A few years ago, Eric Hellman wrote a great piece about library catalog privacy leakage. He showed that because library catalogs transmit data over HTTP connections (rather than over secure, encrypted HTTPS connections), anyone in between the user’s computer and the OPAC server can intercept information about the book record that user was looking at. For his example of the New York Public Library, that included automatically sharing the user’s data with the following for-profit companies:
And these are just the companies authorized to see the data by virtue of having widgets and components installed on the New York Public Library catalog page! This doesn’t count the third parties that all of these companies share user data with, or any malicious actor who might take advantage of an insecure network to see what the user is viewing.
Since then, the Library Freedom Project has encouraged libraries to move services to HTTPS connections, and the Let’s Encrypt initiative has made it easy for organizations with developers to get HTTPS certificates programmatically, free of charge.[1]
Hellman concludes his sobering post like this:
In 1972, Zoia Horn, a librarian at Bucknell University, was jailed for almost three weeks for refusing to testify at the trial of the Harrisburg 7 concerning the library usage of one of the defendants. That was a long time ago. No longer is there a need to put librarians in jail.
This post has been in the back of my mind for a few years, but I wasn’t convinced by Hellman’s parallel between reading books and browsing the catalog. Users can look at all the records they want, but that isn’t the same as reading a book. What does equate with reading a book, however, is reading a book. This past winter I evaluated the eBook providers we offer our users to see how well they stack up in supporting basic reader privacy (that is, do they support HTTPS?).[2] The answers were sobering.
We may on occasion also match or combine the personal information that you provide with information that we obtain from other sources or that is already in our records, whether collected online or offline or by predecessor or affiliated group companies
Many of the other vendors supported HTTPS, but the connections have to be set up correctly by the library in its proxy settings. I set about switching as many of these providers as possible over to HTTPS, working with Mary, who manages our EZProxy. What I assumed would be a simple project for two fairly technically minded folks turned out to be much, much harder than I expected.
Since there is a lot of interest in the library world right now in moving our services to HTTPS, I wanted to share some of the roadblocks you might face with third-party vendor databases. In my mind, eBook vendors are among the most crucial online library services to encrypt, because they provide direct access to the materials a library user is reading. Privacy is the second most prominent value in the American Library Association’s Core Values of Librarianship. Under the Michigan Library Privacy Act, disclosing this information without consent can result in a $250 fine. Do I want to take the chance that we’ll be liable for $250 in fines multiplied by our eBook vendor COUNTER stats? Do I want to be party to using technology that could squash the ability of GVSU’s students to explore unpopular topics? Do I want to be responsible for a Grand Valley student being suspected of crimes or terrorism because of what they read? (This has happened in the UK, and similar accusations have already been made in the States.)
So I’m working with many of these vendors to make sure that their eBook platforms support HTTPS properly, and I would love to get more libraries on board. If you join in, though, be prepared to do a lot of work to get these systems working properly.
To switch a library database or eBook vendor over to HTTPS, we have to tell EZProxy explicitly to work with HTTPS connections for each database. The proxy server will refuse to proxy an HTTPS connection if that protocol hasn’t been specified in the provider’s EZProxy stanza.[3] This means that you can’t just change the target URL in your database A-Z list to https: and be done with it. To successfully set up an HTTPS connection to a database through EZProxy, you need two certificates: one on the vendor’s server, so that the database itself supports HTTPS, and a wildcard certificate for your proxy server.
Even if a vendor does support HTTPS, you still need a wildcard certificate for your proxy server, since that is the server that is actually “hosting” the content. Telling EZProxy about the HTTPS option also lets the proxy server work with your wildcard certificate, which requires working around a technical limitation.
Wildcard certificates cover subdomains of the parent domain, so if you have a wildcard certificate for “*.mywebsite.com” you are all set for subdomains like “hotdogs.mywebsite.com” or “catphotos.mywebsite.com” to be served over HTTPS. You won’t be able to serve sub-subdomains, though, like “awesome.hotdogs.mywebsite.com.” Of course, you can get a wildcard certificate for your subdomain, like “*.hotdogs.mywebsite.com,” and then you can have HTTPS fun with “catsup.hotdogs.mywebsite.com” and “mustard.hotdogs.mywebsite.com” and so on. At GVSU, we have a wildcard certificate for “*.ezproxy.gvsu.edu,” so we’re good for any subdomain of the ezproxy domain. But here’s the problem for EZProxy: EZProxy URLs end up being formed like this:
So, to access a database like Safari eBooks, whose URL is proquest.safaribooksonline.com, you’d end up with a URL like this:
There are just too many subdomain levels for the wildcard certificate to cover this URL on an HTTPS connection. So if we tell EZProxy to include an HTTPS domain in the stanza, it will serve the HTTPS version of a site by changing the dots in the database vendor’s hostname to dashes, collapsing it into a single subdomain. So an encrypted URL for Safari eBooks looks like this:
In this way, EZProxy lets us use our wildcard certificates with database vendor sites. But this work-around only happens if you have your stanza properly set up and you direct your users to the HTTPS version of the site. (As far as I can tell, there is no way within EZProxy to force an HTTPS version of a database. You either have to rely on the incoming proxied link, or rely on the database vendor to handle it on the server end. The smart money is on the proxied link.)
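To see why the dashed hostnames matter, here is a minimal Python sketch of the two pieces involved: a simplified wildcard-certificate check, where the `*` may stand in for exactly one left-most label (the rule browsers apply), and the two proxied hostname forms described above. The function names are hypothetical, and the dots-to-dashes rewrite is a simplification of what EZProxy does; it doesn’t model how EZProxy treats vendor hostnames that already contain dashes.

```python
def wildcard_matches(cert_name: str, hostname: str) -> bool:
    """Simplified check of whether a certificate name covers a hostname.

    A leading '*' may replace exactly one whole label, so '*.ezproxy.gvsu.edu'
    covers 'x.ezproxy.gvsu.edu' but not 'a.b.ezproxy.gvsu.edu'.
    """
    cert_labels = cert_name.lower().split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if "" in host_labels or len(cert_labels) != len(host_labels):
        return False  # '*' never spans a dot, so the label counts must match
    first, *rest = cert_labels
    if first != "*" and first != host_labels[0]:
        return False
    return rest == host_labels[1:]


def proxied_host_http(vendor_host: str, proxy_domain: str) -> str:
    """HTTP-style proxied hostname: the vendor host with the proxy domain appended."""
    return f"{vendor_host}.{proxy_domain}"


def proxied_host_https(vendor_host: str, proxy_domain: str) -> str:
    """HTTPS-style proxied hostname: dots become dashes, leaving a single label."""
    return f"{vendor_host.replace('.', '-')}.{proxy_domain}"


if __name__ == "__main__":
    cert = "*.ezproxy.gvsu.edu"
    vendor = "proquest.safaribooksonline.com"
    for host in (proxied_host_http(vendor, "ezproxy.gvsu.edu"),
                 proxied_host_https(vendor, "ezproxy.gvsu.edu")):
        print(host, wildcard_matches(cert, host))
```

Running it prints `False` for the dotted hostname and `True` for the dashed one, which is the whole reason the HTTPS stanza matters for a wildcard certificate.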
Now, some databases, like LearningTech Library, were fairly easy to set up at this point. I had our EZProxy manager update the stanza to include HTTPS connection information, and then updated the link from our database A-Z list to use the HTTPS connection. LearningTech Library made this especially easy, since they offer the HTTPS configuration stanza right on their website.
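Since EZProxy only serves the dashed HTTPS hostname when the incoming link asks for HTTPS, it makes sense for both the proxy login and the target in every A-Z list link to use https:. Here’s a small sketch for upgrading existing links in bulk. It assumes the links use EZProxy’s `login?url=` starting-point format and that the proxy lives at `ezproxy.gvsu.edu`; `to_https` and `upgrade_starting_point` are hypothetical helpers, not part of EZProxy. It only rewrites links, so the vendor’s stanza still has to include HTTPS first.

```python
from urllib.parse import urlsplit

STARTING_POINT = "/login?url="  # EZProxy's starting-point URL path and parameter


def to_https(url: str) -> str:
    """Swap an http: or https: URL's scheme to https, leaving the rest intact."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not a web URL: {url!r}")
    return parts._replace(scheme="https").geturl()


def upgrade_starting_point(link: str) -> str:
    """Upgrade both the proxy prefix and the proxied target URL to HTTPS."""
    proxy, sep, target = link.partition(STARTING_POINT)
    if not sep:
        raise ValueError(f"not an EZProxy starting-point URL: {link!r}")
    return to_https(proxy) + sep + to_https(target)
```

For example, `upgrade_starting_point("http://ezproxy.gvsu.edu/login?url=http://proquest.safaribooksonline.com/")` returns `https://ezproxy.gvsu.edu/login?url=https://proquest.safaribooksonline.com/`.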
This worked great, until I ran some tests on the site. I discovered that despite the HTTPS connection, several actions reverted to HTTP. For instance, searching for items and then clicking a result loaded the new page over HTTP. Whoops! So much for HTTPS! A few navigation elements also had hard-coded HTTP links, and downloading a PDF, which is the equivalent of reading a book, always happens over HTTP. So much for quick wins. (I reported these issues to their technical support team on March 22nd and was told they were priority fixes. They haven’t been fixed yet.)
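Spot-checking pages by hand gets tedious. One way to catch hard-coded links is to save the HTML of a page you loaded over HTTPS and list every link or resource that still points at http://. This sketch uses only Python’s standard-library `HTMLParser`; the set of attributes it checks is my assumption about what matters, and it can’t see links generated by JavaScript or redirects issued by the server, which still need testing in a browser.

```python
from html.parser import HTMLParser

# Attributes that point the browser at another URL (an assumed, non-exhaustive list).
URL_ATTRIBUTES = {"href", "src", "action", "formaction"}


class InsecureLinkFinder(HTMLParser):
    """Collects (tag, attribute, url) for every URL that uses plain http://."""

    def __init__(self) -> None:
        super().__init__()
        self.insecure: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for bare attributes such as <input disabled>
            if name in URL_ATTRIBUTES and value and value.strip().lower().startswith("http://"):
                self.insecure.append((tag, name, value.strip()))


def find_insecure_links(html: str) -> list[tuple[str, str, str]]:
    """Return every http:// URL found in the given HTML source."""
    finder = InsecureLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


if __name__ == "__main__":
    # Hypothetical page snippet: one insecure PDF link, one secure image.
    page = ('<a href="http://vendor.example.org/book.pdf">PDF</a>'
            '<img src="https://vendor.example.org/logo.png">')
    print(find_insecure_links(page))  # [('a', 'href', 'http://vendor.example.org/book.pdf')]
```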
The bigger dilemma came when I tried to tackle Safari eBooks. Safari supports HTTPS connections, and the default OCLC stanza for Safari includes the HTTPS configuration. Once we got the new stanza in place, I switched over our database A-Z list, and everything was great! Until I looked at the stats.
Most of our Safari usage comes not from the database A-Z list, but from our catalog or discovery layer through MARC records. And those MARC records have hard-coded http:// links in the 856 fields.
When I click this link and go through the proxy, I end up at the following URL, with no HTTPS:
See the subdomains? That was my first clue. Because I passed the URL through EZProxy with HTTPS, it should have changed the dots in the subdomains to dashes. That it didn’t told me something: the Safari server was redirecting the HTTPS connection to HTTP.
Look at the first URL again. See how it’s a GET request, passing the book ID as the value of the key xmlID? Notice that the URL I end up on is a RESTful URL, with no GET variables at all? What I suspect is happening is that Safari receives the GET request and redirects to the new URL using a hard-coded HTTP protocol. I can confirm this by bypassing the redirect and using the following proxy URL. My book comes up over HTTPS, no problem:
I submitted a ticket with ProQuest today about this, and hopefully it will get resolved! (Case #02016344)
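A redirect like the one I suspect in Safari is easy to check without a browser: request the HTTPS URL, don’t follow the redirect, and look at the `Location` header. Here’s a sketch using Python’s standard `http.client`, which never follows redirects on its own. The helper names are hypothetical, the URLs in the tests are placeholders, and a request sent through EZProxy would also need your proxy session cookie, which you could pass in `headers`.

```python
import http.client
from typing import Dict, Optional, Tuple
from urllib.parse import urljoin, urlsplit


def redirect_downgrades(request_url: str, location: str) -> bool:
    """True if following `location` from an HTTPS URL lands on plain HTTP."""
    target = urljoin(request_url, location)  # Location headers may be relative
    return urlsplit(request_url).scheme == "https" and urlsplit(target).scheme == "http"


def first_hop(url: str, headers: Optional[Dict[str, str]] = None) -> Tuple[int, Optional[str]]:
    """Fetch one HTTPS URL without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", path, headers=headers or {})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

In use, you’d call `first_hop` on the HTTPS link from the 856 field and, if it returns a 3xx status, pass the same URL and the returned `Location` to `redirect_downgrades`. The same check would flag the eBrary behavior described below.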
(ProQuest does provide information on editing MARC records in bulk using MarcEdit. While that will solve the problem of directing folks to https: links, it won’t solve the problem of the redirect unless I also restructure all of the URLs in the 856 fields! And since ProQuest always provides the MARC records with the GET request, we’ll have to tinker with every single update we ever get from them. Making the redirect protocol-neutral will make maintenance much easier.)
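For the https: half of the problem, the bulk edit itself is simple. Here’s a sketch that works on MarcEdit’s mnemonic (.mrk) text format, where each field is a line starting with the tag, like `=856  40$u…`, and upgrades every http:// inside 856 $u subfields, including the target URL inside a proxied link. It assumes one field per line and no literal `$` inside URLs; the function names and the sample record in the tests are hypothetical. It does nothing about the vendor’s redirect, and it would have to be rerun on every new batch of records.

```python
import re

# A $u subfield runs until the next "$" subfield delimiter or the end of the line.
SUBFIELD_U = re.compile(r"\$u([^$]*)")


def upgrade_856_line(line: str) -> str:
    """Rewrite http:// to https:// inside the $u subfields of an =856 line."""
    if not line.startswith("=856"):
        return line  # leave every other field untouched
    return SUBFIELD_U.sub(lambda m: "$u" + m.group(1).replace("http://", "https://"), line)


def upgrade_mrk(text: str) -> str:
    """Apply upgrade_856_line to every line of a .mrk file, keeping line endings."""
    return "".join(upgrade_856_line(line) for line in text.splitlines(keepends=True))
```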
In the long run, we need to get database vendors on board to make their products easier to use with HTTPS. ProQuest’s eBrary has an HTTPS certificate (although SSL Labs gives it a failing grade), but it redirects all connections to HTTP. Granted, when you log in to your account it routes you through HTTPS, but as soon as your credentials have been passed over the wire, back to HTTP you go! This makes absolutely no sense. I hope that eBook Central, the system slated to replace eBrary, supports HTTPS. I’ve submitted feature requests for other ProQuest databases, like Early English Books Online, to add a certificate and HTTPS support, so hopefully this is coming down the line. But without the vendors, we’re going to have a hell of a time getting our library privacy house in order.
N.B. I’ll be writing up my experience on moving our eBook vendors and other systems over to HTTPS here in the Worknotes as I go. If you want to contribute, or point out that I’m doing things the hard way, please let me know! The more information we share on this the better.