Drupal scalability, Part 2

This is a follow-up to a write-up I did last year. Read the previous article about Drupal scalability for the back story. In the five months since, nearly nothing has changed in the production environment at work. What has changed is my understanding of disk performance as it relates to databases, and of the power of higher-level caching.

A couple of months ago I spent many hours looking at disk stats in Linux with the iostat utility. How many writes per second can a three-disk RAID 5 array do? About the same as a single disk. As previously suspected, RAID 1 is used where there are two disks and RAID 5 otherwise, as in the database servers.
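
For the curious, the back-of-the-envelope arithmetic behind that observation goes roughly like this (the ~150 write IOPS per disk figure is an assumption for a typical 7200 RPM drive, not a measurement from our array):

    RAID 5 small write = read old data + read old parity
                       + write new data + write new parity = 4 disk operations
    3 disks x ~150 write IOPS / 4 ≈ ~110 random write IOPS, i.e. about one disk's worth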

So how do we get more IOPS? There are really only two ways, and they both cost money. A bigger RAID cache only works until it fills up, and when that happens everything can stall while the cache is flushed out to the disks, which brings us to option two: get more disks. Company politics being what they are, we still don't have a Gigabit network, we still don't have any more disks, and we're still serving NFS from the database server. Oh, and we're launching another website next week. What's a lowly developer to do?

Drupal and Memcached

Moving down my original list to #4: Squid has too steep a learning curve to tackle in a day, so I decided to try out the Drupal Memcache module. The required PHP Memcache library from PECL and Memcached itself take a little bit of sysadmin time to install, but it's fairly easy. The problem is that most of the websites on these servers are members-only sites. That means we can't use Drupal page caching because nobody is anonymous. On the other hand, this new website is small and all of its traffic is anonymous. The results are simply staggering.
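
For anyone who wants to try the same thing, the interesting part is the wiring in settings.php. What follows is only a sketch based on the memcache module's instructions for Drupal 6; the module path, server address, and bin names are examples to adjust for your own install, not our production values:

    <?php
    // Add to settings.php: route Drupal's cache API through the memcache module.
    $conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
    // One local memcached instance on the default port, named cluster 'default'.
    $conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
    // Cache bins not listed here (including the page cache) also use 'default'.
    $conf['memcache_bins'] = array('cache' => 'default');

With that in place, enabling page caching on the Performance page works as usual; the difference is that the cache bins live in memcached instead of MySQL tables, which is exactly what takes the pressure off our overworked disks.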

Most relatively healthy Drupal sites seem to serve pages in 300-600ms. With Memcache and Drupal page caching enabled, our response time dropped from 500ms to 50ms (local testing, so network delays are excluded). Not only that, but we can serve 10-15 times the number of concurrent requests with the response time only rising to about 500ms.

Content Delivery Networks

Response time isn't everything. Pull out tools like YSlow and Page Speed and you'll see that a user's browser spends a lot of time downloading resources to display a web page. Using a Content Delivery Network (CDN) can help, sometimes significantly. Our little trick is to modify the site theme to prefix JavaScript and CSS references with the CDN base URL, which is set up for Origin Pull from the site. Since the CSS background images use relative URLs, those images are served from the CDN as well. For those who have the luxury of Origin Pull, it's an extremely simple way to get some benefit from a CDN.
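
The theme change itself is only a few lines. Here is a rough sketch of the idea as a Drupal 6 preprocess function; the theme name 'mytheme' and the CDN hostname are placeholders, and a real implementation would want something less blunt than str_replace():

    <?php
    // template.php: rewrite the aggregated CSS and JS tags to load from the
    // origin-pull CDN instead of this server. 'mytheme' and the hostname are
    // placeholders, not our actual values.
    function mytheme_preprocess_page(&$variables) {
      $cdn = 'http://cdn.example.com';
      // $styles and $scripts already contain the rendered <link> and <script>
      // tags with root-relative paths, so prefixing the leading "/" is enough.
      $variables['styles']  = str_replace('href="/', 'href="' . $cdn . '/', $variables['styles']);
      $variables['scripts'] = str_replace('src="/', 'src="' . $cdn . '/', $variables['scripts']);
    }

Because the CDN is doing Origin Pull, there is nothing to upload; the first visitor to request each aggregated file primes the CDN from our server and subsequent requests are served from the CDN's edge.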

By using Memcached, Drupal page caching, and CSS and JavaScript aggregation served from a CDN, we're able to serve a super-fast site from already overloaded servers. This new website is Yoga with Rodney Yee. We even managed to score an A in YSlow!

For all the other non-anonymous sites the problems remain. The saga continues...