Matthew Reidsma

Work Notes

Updates from the GVSU Libraries’ Web Team.

The Unbearable Lightness of Vendor Security

A few years ago, Eric Hellman wrote a great piece about library catalog privacy leakage. He showed that because library catalogs transmit data over HTTP connections (rather than over secure, encrypted HTTPS connections), anyone in between the user’s computer and the OPAC server can intercept information about the book record that user was looking at. For his example of the New York Public Library, that included automatically sharing the user’s data with the following for-profit companies:

  • Bibliocommons
  • ContentCafe
  • Google Analytics and Tag Manager
  • FoxyCart
  • IDreamBooks.com
  • ShareThis
  • Scorecard Research

And these are just the sites authorized to see the data by virtue of having widgets and components installed on the New York Public Library catalog page! This doesn’t count the third parties all of these companies share the user data with, or any malicious actor who might take advantage of insecure networks to look at what the user is viewing.

Since then, the Library Freedom Project has encouraged libraries to move services to HTTPS connections, and the Let’s Encrypt initiative has made it easy for organizations with developers to programmatically get HTTPS certificates, free of charge.[1]

Hellman concludes his sobering post like this:

In 1972, Zoia Horn, a librarian at Bucknell University, was jailed for almost three weeks for refusing to testify at the trial of the Harrisburg 7 concerning the library usage of one of the defendants. That was a long time ago. No longer is there a need to put librarians in jail.

This post has been in the back of my mind for a few years, but I wasn’t convinced by Hellman’s parallel between books read and catalog browsing. Users can look at all the records they want, but that doesn’t equate to reading a book. What does equate with reading a book, however, is reading a book. This past winter I did an evaluation of the eBook providers we offer to our users to see how well they stack up in supporting basic reader privacy (that is, do they support HTTPS?).[2] The answers were sobering.

Only one of the providers supported HTTPS by default, redirecting all users to an HTTPS connection. That was Knovel, a database provided by Elsevier, who will happily redirect you to HTTPS but will use the personal information that they collect about you in countless evil ways, according to their privacy policy:

We may on occasion also match or combine the personal information that you provide with information that we obtain from other sources or that is already in our records, whether collected online or offline or by predecessor or affiliated group companies

Many of the other vendors supported HTTPS, but the connections need to be set up correctly by the library in their proxy settings. I set about working to switch over as many of these providers as possible to HTTPS, working with Mary, who manages our EZProxy. What I assumed would be a simple project by two fairly technically-minded folks turned out to be much, much harder than I expected.

Since there is a lot of interest in the library world about moving our services to HTTPS connections right now, I wanted to provide some details about some of the roadblocks you might face in dealing with third-party vendor databases. In my mind, eBook vendors are one of the most crucial online library services to encrypt, because they provide direct access to materials that have been read by a library user. Privacy is the second most prominent value in the American Library Association’s Core Values of Librarianship. According to the Michigan Library Privacy Act, disclosing this information without consent can result in a $250 fine. Do I want to take a chance that we’ll be liable for $250 multiplied by our eBook vendor COUNTER stats in fines? Do I want to be party to using technology that could squash the ability of GVSU’s students to explore unpopular topics? Do I want to be responsible for a Grand Valley student being suspected of crimes or terrorism because of what they read (which has happened in the UK, while similar accusations have already happened in the States)?

No!

So, I’m working with many of these vendors to make sure that their eBook platforms support HTTPS properly, and would love to get more libraries on board to support this, but be prepared to do a lot of work to get these systems working properly.

To switch a library eBook database over to HTTPS, we have to specifically tell EZProxy to work with HTTPS connections for each database. The proxy server will refuse to proxy an HTTPS connection if that protocol hasn’t been specified in the EZProxy stanza for a provider.[3] This means that you can’t just change the target URL in your database A-Z list to https: and be done with it. To successfully set up an HTTPS connection to a database through EZProxy, you need two certificates:

  1. The vendor database must have an SSL certificate.
  2. Your proxy server must have a wildcard certificate.

Even if a vendor does support HTTPS, you still need to have a wildcard certificate for your proxy server, since that is the server that is actually “hosting” the content. And telling EZProxy about the HTTPS possibility will also allow the proxy server to work with your wildcard certificate, since there are some technical limitations that need to be overcome.

Wildcard certificates will cover subdomains of the parent URL, so if you have a wildcard certificate for “*.mywebsite.com” you are all set for subdomains like “hotdogs.mywebsite.com” or “catphotos.mywebsite.com” to be served over HTTPS. You won’t be able to serve sub-subdomains, though, like “awesome.hotdogs.mywebsite.com.” Of course, you can get a wildcard certificate for your subdomain, like “*.hotdogs.mywebsite.com” and then you can have HTTPS fun with “catsup.hotdogs.mywebsite.com” and “mustard.hotdogs.mywebsite.com” and so on. At GVSU, we have a wildcard certificate for “*.ezproxy.gvsu.edu,” so we’re good for any subdomain off the ezproxy URL. But this is the problem for EZProxy. EZProxy URLs end up being formed like this:

http://VENDOR.URL.COM.EZPROXY.GVSU.EDU

So, to access a database like Safari eBooks, whose URL is proquest.safaribooksonline.com, you’d end up with a URL like this:

http://proquest.safaribooksonline.com.ezproxy.gvsu.edu

There are just too many subdomains for the wildcard certificate to work with this URL on an HTTPS connection. So if we tell EZProxy to include an HTTPS domain in the stanza, it will serve up the HTTPS versions of a site by changing the dots in the database vendor URL to dashes. So an encrypted URL for Safari eBooks looks like this:

https://proquest-safaribooksonline-com.ezproxy.gvsu.edu
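To make the certificate arithmetic concrete, here is a minimal sketch of the two rules described above: a wildcard stands in for exactly one subdomain label, and the HTTPS hostname swaps the vendor's dots for dashes. This is not EZProxy's actual code. The function names `wildcardCovers` and `proxiedHost` are made up for illustration, and the rewrite is simplified (for instance, it ignores vendor hostnames that already contain dashes).

```javascript
// Returns true when a wildcard pattern such as "*.ezproxy.gvsu.edu" covers
// the host. The "*" stands in for exactly one label, so deeper subdomains fail.
function wildcardCovers(pattern, host) {
  const patternLabels = pattern.toLowerCase().split(".");
  const hostLabels = host.toLowerCase().split(".");
  if (patternLabels.length !== hostLabels.length) return false;
  return patternLabels.every(
    (label, i) => (i === 0 && label === "*") || label === hostLabels[i]
  );
}

// Hypothetical helper that mimics the hostname forms shown in the post:
// dots are kept for HTTP and turned into dashes for HTTPS.
function proxiedHost(vendorHost, proxyDomain, useHttps) {
  const prefix = useHttps ? vendorHost.replace(/\./g, "-") : vendorHost;
  return `${prefix}.${proxyDomain}`;
}

const wildcard = "*.ezproxy.gvsu.edu";
const vendor = "proquest.safaribooksonline.com";

const httpHost = proxiedHost(vendor, "ezproxy.gvsu.edu", false);
const httpsHost = proxiedHost(vendor, "ezproxy.gvsu.edu", true);

console.log(httpHost, wildcardCovers(wildcard, httpHost));   // too many labels: false
console.log(httpsHost, wildcardCovers(wildcard, httpsHost)); // one label: true
```

The dotted hostname has six labels against the certificate's four, so it is not covered. The dashed version collapses the vendor part into a single label, which the wildcard can match.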

In this way, EZProxy lets us use our wildcard certificates with database vendor sites. But this work-around only happens if you have your stanza properly set up and you direct your users to the HTTPS version of the site. (As far as I can tell, there is no way within EZProxy to force an HTTPS version of a database. You either have to rely on the incoming proxied link, or rely on the database vendor to handle it on the server end. The smart money is on the proxied link.)

Now, some databases, like LearningTech Library, were fairly easy to set up at this point. I had our EZProxy manager update the stanza to include HTTPS connection information, and then updated the link from our database A-Z list to use the HTTPS connection. LearningTech Library made this especially easy, since they offer the HTTPS configuration stanza right on their website.

This worked great, until I ran some tests on the site. What I discovered was that in several circumstances, despite using an HTTPS connection, several actions resulted in reverting to HTTP. For instance, doing a search for items and then clicking on a result would load the new page in HTTP. Whoops! So much for HTTPS! There were also a few navigation elements that have hard-coded HTTP links, and downloading a PDF, which is the equivalent of reading a book, is always transferred over HTTP. So much for quick wins. (I reported these issues to their technical support team on March 22nd and was told they were priority fixes. They haven’t yet been fixed.)
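One rough way to spot hard-coded HTTP links like these is to scan a page's saved HTML for attributes that point at http:// URLs. The sketch below is only an illustration, not the testing process used here: `findInsecureUrls` is a hypothetical name, the sample markup is invented, and a regular expression is a heuristic rather than a real HTML parser.

```javascript
// Collects href/src/action attribute values that use plain http://.
// This is a rough regex scan, not a full HTML parser.
function findInsecureUrls(html) {
  const attr = /\b(?:href|src|action)\s*=\s*(["'])(http:\/\/[^"']*)\1/gi;
  const found = [];
  for (const match of html.matchAll(attr)) {
    found.push(match[2]);
  }
  return found;
}

// Invented sample markup: one secure link, two hard-coded HTTP references.
const sample = `
  <a href="https://example.com/book/1">Secure result</a>
  <a href="http://example.com/book/2">Hard-coded result</a>
  <img src='http://example.com/cover.png'>
`;

const insecure = findInsecureUrls(sample);
console.log(insecure); // lists the two http:// URLs from the sample
```

A scan like this only catches links present in the markup. It will not catch a server-side redirect back to HTTP, which still has to be found by clicking through the site.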

The bigger dilemma came when I tried to tackle Safari eBooks. Safari supports HTTPS connections, and the default OCLC stanza for Safari includes the HTTPS configurations. Once we got the new stanza in place, I switched over our Database A-Z list, and everything was great! Until I looked at the stats.

Most of the Safari usage we have comes not from the Database A-Z list, but from our catalog or discovery layer through MARC records. And those MARC records have hard-coded http:// links in the 856 fields.

What’s more, I did a quick experiment with Safari. I wrote a small piece of jQuery code that parses all 856 links to databases and checks whether the provider is Safari. If it is, it switches out the http: links for https: links. Since we had already confirmed that our proxy server was set up correctly for HTTPS, these MARC record links should have worked perfectly. But when I started testing titles, all of the book-level links redirected to HTTP once they went through the proxy server. This wasn’t a problem with the JavaScript code; it was happening on the Safari server. I started comparing the EZProxied links, and that’s when I discovered the issue. Here’s a sample book-level link from one of our MARC records, after I updated it to https:

http://ezproxy.gvsu.edu/login?url=https://proquest.safaribooksonline.com/?xmlId=9781498731454

When I click this link and go through the proxy, I end up with the following URL, with no HTTPS:

http://proquest.safaribooksonline.com.ezproxy.gvsu.edu/9781498731454

See the subdomains? That was my first clue. Because I passed the URL through EZProxy with HTTPS, it should have changed the subdomains’ dots to dashes. That it didn’t told me something: the Safari server was redirecting the HTTPS connection to HTTP.

Look at the first URL again. See how it’s a GET request, passing the book ID as a value with the key of xmlId? Notice that the URL I end up on is a RESTful structured URL? No GET variable there at all. What I suspect is happening is that Safari gets the GET request and redirects to the new URL, using a hard-coded HTTP protocol. I can confirm this by bypassing the redirect, and using the following proxy URL. My book comes up with HTTPS no problem:

http://ezproxy.gvsu.edu/login?url=https://proquest.safaribooksonline.com/9781498731454

I submitted a ticket with ProQuest today about this, and hopefully it will get resolved! (Case #02016344)
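For illustration, the link rewriting discussed above can be sketched as a plain function. This is not the jQuery code from the experiment. `rewriteSafariLink` is a hypothetical name, the `login?url=` marker and the Safari hostname are taken from the sample links above, and the optional restructuring step assumes the `/ID` path form keeps working the way it did in the test.

```javascript
// Rewrites an EZProxy login link so the proxied target uses https:, and
// (optionally) turns "?xmlId=ID" into the "/ID" path form that avoided the
// redirect in the test above. Links for other vendors are returned unchanged.
function rewriteSafariLink(href, restructure = true) {
  const marker = "login?url=";
  const at = href.indexOf(marker);
  if (at === -1) return href;

  const prefix = href.slice(0, at + marker.length);
  let target;
  try {
    target = new URL(href.slice(at + marker.length));
  } catch (e) {
    return href; // target is not an absolute URL; leave it alone
  }
  if (target.hostname !== "proquest.safaribooksonline.com") return href;

  target.protocol = "https:";
  const id = target.searchParams.get("xmlId");
  if (restructure && id) {
    target.pathname = "/" + id;
    target.search = "";
  }
  return prefix + target.toString();
}

const original =
  "http://ezproxy.gvsu.edu/login?url=http://proquest.safaribooksonline.com/?xmlId=9781498731454";
const rewritten = rewriteSafariLink(original);
console.log(rewritten);

// In a browser, the same function could be applied to the links on a page.
if (typeof document !== "undefined") {
  for (const a of document.querySelectorAll('a[href*="login?url="]')) {
    a.href = rewriteSafariLink(a.href);
  }
}
```

Passing `false` as the second argument only switches the protocol and leaves the `xmlId` query in place, which reproduces the link that Safari redirected back to HTTP.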

Now, I could update my JavaScript to compensate for this, but let’s think about scale. Items in my catalog would be fixed, as long as JavaScript didn’t fail to run. Folks coming from the catalog would get an HTTPS connection. But what about folks who come through discovery? A JavaScript patch in the catalog doesn’t fix the actual data in the 856 field, so the change never propagates to Summon, our discovery service. Those folks will still get HTTP versions of the book. And this is just one of our nearly 300 databases. Even if I just stick with the 20 eBook providers on my initial list, this quickly becomes a scaling fiasco. I spent several hours testing and troubleshooting this one issue, and arguably it was supposed to be one of the easier database vendors to switch over!

(ProQuest does provide information on editing MARC records in bulk using MarcEdit. While that will solve the problem of directing folks to https:, it won’t solve the problem of the redirect, unless I also restructure all of the URLs in the 856 fields! And since ProQuest always provides the MARC records with the GET request, this means that we’ll have to tinker with every single update we ever get from them. Fixing the redirect to be protocol neutral will make maintenance much easier.)

In the long run, we need to get database vendors on board to make their products easier to use with HTTPS. ProQuest’s eBrary has an HTTPS certificate (although SSL Labs gives it a failing grade), but they redirect all connections to HTTP. Granted, when you log in to your account they route you through HTTPS, but as soon as your credentials have been passed over the wire, back to HTTP you go! This makes absolutely no sense. I hope that eBook Central, the system that is slated to replace eBrary, supports HTTPS. I know that I’ve submitted feature requests for other ProQuest databases like Early English Books Online to add a certificate and HTTPS support, so hopefully this will be coming down the line. But without the vendors, we’re going to have a hell of a time getting our library privacy house in order.

N.B. I’ll be writing up my experience on moving our eBook vendors and other systems over to HTTPS here in the Worknotes as I go. If you want to contribute, or point out that I’m doing things the hard way, please let me know! The more information we share on this the better.


  1. Let’s Encrypt is *not* trivial to set up for a library with no technical staff. [It’s a huge step forward](https://www.wired.com/2016/04/scheme-encrypt-entire-web-actually-working/), and I have no doubt that it will get easier for non-technical folks to take advantage of, but we need to acknowledge that Let’s Encrypt is not going to save libraries that do not already have staff who could set up paid certificates.
  2. I only looked at the eBook providers that we list in our database table as being “eBook databases.” There are many, many more database providers we subscribe to that provide eBooks.
  3. To the folks at OCLC who are now gathering buckets of money with their new annual payments for EZProxy: please use that money to make stanzas protocol neutral.