Matthew Reidsma

Work Notes

Updates from the GVSU Libraries’ Web Team.

More Server Problems

Over the past few weeks, the library’s online tools have been noticeably sluggish. As I’ve written about before, this is because our servers are the computer equivalent of a bologna sandwich. We thought we had the issues resolved, but we’ve continued to receive reports of problems and downtime. I’ve spent most of my time these past few weeks trying to solve this, and I wanted to update you on what I’ve done so far.

  1. We moved the main scripts that run the digital displays in Mary I off the production server. This reduced our memory usage by about a fifth, and took the number of connections way down. Kyle also implemented a caching system to further reduce the number of calls that the system made. The displays for the most part have been more stable since the change, but the problems have continued.
  2. I’ve been digging through our server logs, looking for problems and anomalies. I found a few things that I hadn’t noticed before, like requests for files that were not found (returning an HTTP code of 404) because the URLs were wrong or things had moved. I’ve fixed many of those, but I spotted something more problematic while watching the live connections today:
  3. Some bots from China and Russia are hammering our servers. The Chinese bot, pretending to be a spider from Baidu (the “Chinese Google”), requests a page from our status app about 3 times every minute, although it sometimes ramps up to almost a dozen requests a minute. The more aggressive bot, called the Sputnik bot, has been hitting our servers in bursts: about once every 4-6 hours, for 30-60 minutes at a time, it sends as many as 70 requests a minute to the status app. Combined with our normal traffic, these additional requests were enough to shut down the servers. This is where most of our extreme sluggishness (and downtime) has come from. I have blocked the bots’ IP ranges at the server, which now answers them with a 403 HTTP code (Forbidden). Of course, they are still sending the requests, but we’re spending less processing power on dealing with them.
  4. The computer availability map was one of the biggest resource hogs, even though it isn’t running off the production server at all. It runs on the development server, but it calls a JavaScript file and a CSS file from production. The script was designed to refresh every 2 minutes back when we had a single display at Zumberge. Now that we have 8 or 9 displays running it in Mary I, that’s a lot of requests, especially since those displays don’t actually use anything in the JavaScript and CSS files! So I changed the script to not load those assets when the map is in kiosk mode. (They are still loaded if you visit the page directly.) I also changed the automatic refresh to every 3 minutes for kiosks, and every 15 minutes for regular loads. If you’re not running in kiosk mode and need the map to refresh more often, click the refresh button in your browser. :)
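
For anyone curious what the log digging in steps 2 and 3 can look like, here is a minimal sketch in Python. It is not the script we actually used. It assumes the access log is in the common “combined” format (our server’s format may differ), and the function names and the threshold of 30 requests a minute are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: lines follow the "combined" access log format, e.g.
# 203.0.113.5 - - [01/Mar/2013:12:00:01 +0000] "GET /status HTTP/1.1" 200 512 "-" "ExampleBot/1.0"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def requests_per_minute(lines):
    """Count requests for each (user agent, minute) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        minute = stamp.strftime("%Y-%m-%d %H:%M")
        counts[(match.group("agent"), minute)] += 1
    return counts


def heavy_clients(counts, threshold=30):
    """Return user agents that reached `threshold` requests in any one minute.

    The default threshold is an arbitrary illustrative value.
    """
    return sorted({agent for (agent, _), n in counts.items() if n >= threshold})
```

To try it, open a log file and pass the file object to `requests_per_minute`, then hand the result to `heavy_clients`. Keep in mind that a user agent string is whatever the client claims to be (our Chinese bot only pretends to be Baidu), so grouping by the `ip` field instead may be more reliable when deciding which ranges to block.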

I’ll continue to monitor the traffic over the weekend to see how we’re doing, although historically there hasn’t been as much activity on the weekends. Monday will be the real test. Then, on Tuesday, the hosting company we lease our servers from will be upgrading PHP, the language our tools are written in, which means we’ll likely be scrambling to fix things for yet another reason. I’ll update you as we know more!