Chapter 3: Performance Configuration

This chapter introduces Drupal's built-in performance features. It explains how the page cache works and how it can be configured, discusses Drupal's built-in CSS and JavaScript aggregation and compression, and covers the importance of regularly purging Drupal's logs. Finally, it explores Drupal's throttle module.

Section 1: Performance Configuration

There are many things you can do to improve the performance and scalability of a Drupal-powered website. Before adding or upgrading servers, applying performance-oriented patches, or pursuing any of the other topics of varying complexity discussed in this book, you should first enable all of Drupal's relevant built-in performance options.

Page Cache

Find Drupal's performance configuration options by navigating to the Performance page in the Site Configuration section of your website's administration pages. When the page cache is enabled, Drupal will save a fully rendered copy of each page accessed by anonymous visitors in the cache_page database table. When the same page is subsequently visited by the same or another anonymous user, the pre-rendered, cached copy is quickly and efficiently served directly out of the cache_page table. This cached copy will not be served to logged in users because pages are usually customized for logged in users. As most public web pages see significantly more anonymous traffic than logged in traffic, enabling the page cache generally results in a very significant performance improvement.

Drupal's page cache only caches pages accessed by anonymous visitors utilizing the HTTP GET method.

Caching Mode

The page cache has three modes: disabled, normal, and aggressive. Drupal has many built-in caches, but most cannot be disabled. It is a common misconception that setting the cache mode to disabled turns off all caches, but this is not the case -- you are only disabling the page cache.

In the code, the three cache levels are defined as constants in the includes/bootstrap.inc include file. These constants are CACHE_DISABLED, CACHE_NORMAL, and CACHE_AGGRESSIVE. When the Drupal page cache is enabled, the same content is cached for anonymous visitors whether it is in normal mode or aggressive mode. The primary difference between these two cache modes is that Drupal does not invoke the _boot() or _exit() hooks defined by some modules when in aggressive mode.

The first time a page is visited by an anonymous visitor, Drupal includes all necessary modules and invokes a series of functions, hooks and database queries in these modules to generate the page. If the Drupal page cache is enabled, whether in normal or aggressive mode, the resulting output will generally be stored in the cache_page database table. As to how this actually happens, the last line of index.php calls the function drupal_page_footer(), which is defined in includes/common.inc. This function calls page_set_cache() in the same file, where logic checks that the page is being served to an anonymous visitor, that the request used the HTTP GET method, and that there haven't been any Drupal messages set in the current session. If these three conditions are true, Drupal invokes PHP's built-in ob_get_contents() function, which returns a complete copy of the current page held in PHP's output buffer. This content may optionally be compressed depending on your configuration, and then ob_end_flush() is invoked, telling PHP to flush its buffers and actually send the generated page to the remote web browser. Finally, a call is made to cache_set(), which saves a complete copy of the generated page into the cache_page database table for future reuse.
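The write path described above can be sketched in plain PHP. This is only an illustration under stated assumptions, not Drupal's code: the function sketch_page_set_cache(), its parameters, and the $cache_page array standing in for the cache_page table are all hypothetical, and for simplicity the sketch always flushes the buffer.

```php
<?php
// Hypothetical stand-in for the cache_page table: URL => rendered page.
$cache_page = array();

/**
 * Store the buffered page only for anonymous GET requests with no
 * pending messages. Returns TRUE when the page qualified for caching.
 */
function sketch_page_set_cache($url, $uid, $method, array $messages, array &$cache_page) {
  // The three conditions named in the text.
  $cacheable = ($uid == 0 && $method == 'GET' && empty($messages));
  // Take a copy of the page while it is still in PHP's output buffer.
  $data = ob_get_contents();
  // Flush the buffer, sending the page to the client.
  ob_end_flush();
  if ($cacheable && $data) {
    $cache_page[$url] = $data;
  }
  return $cacheable;
}

// Simulate one anonymous GET request.
ob_start();
print "<html><body>Hello, anonymous visitor.</body></html>\n";
$stored = sketch_page_set_cache('http://example.com/node/1', 0, 'GET', array(), $cache_page);
```

A request from a logged-in user (a non-zero $uid in this sketch) is still sent to the browser, but nothing is written to $cache_page.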

The next time this exact same URL is visited by the same or a different anonymous visitor, the already generated copy of the page is efficiently retrieved from the cache_page database table, bypassing the need to include all the modules and regenerate the page. Once again starting in index.php, toward the beginning of the file there is a call to the drupal_bootstrap() function which is defined in includes/bootstrap.inc. This bootstrap function loops step by step through a series of "phases".

The first phase, DRUPAL_BOOTSTRAP_CONFIGURATION, locates and reads the correct settings.php configuration file, initializing Drupal's configuration array. The second phase, DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE, reads the cache_inc variable to determine which cache handler should be used. By default Drupal uses its own core includes/cache.inc cache handler, which stores cache data in the database, but it's also possible to use a contributed handler which stores cache data elsewhere, such as in memcache. This second phase also provides a fastpath mechanism for simply displaying the cached copy of the page and exiting without invoking any further phases. The third phase, DRUPAL_BOOTSTRAP_DATABASE, opens a connection to the database. The fourth phase, DRUPAL_BOOTSTRAP_ACCESS, checks if the IP address of the remote host has been banned by the site administrator, and if so displays a terse explanatory message and exits. The fifth phase, DRUPAL_BOOTSTRAP_SESSION, loads the session data into memory. And finally, in the sixth phase, DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE, Drupal calls the page_get_cache() function to load the cached page from the cache_page table. If a valid copy of the page exists in the cache, no further bootstrap phases are invoked.
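The phase-by-phase loop with its early exit can be illustrated as follows. This is a simplified sketch rather than the real drupal_bootstrap(): the phase names come from the text, while sketch_bootstrap(), the SKETCH_REMAINING_PHASES placeholder, and the array used as a cache are hypothetical.

```php
<?php
// Phase names as listed in the text, in order. The last entry is a
// hypothetical placeholder for everything that follows a cache miss.
$phases = array(
  'DRUPAL_BOOTSTRAP_CONFIGURATION',
  'DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE',
  'DRUPAL_BOOTSTRAP_DATABASE',
  'DRUPAL_BOOTSTRAP_ACCESS',
  'DRUPAL_BOOTSTRAP_SESSION',
  'DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE',
  'SKETCH_REMAINING_PHASES',
);

/**
 * Walk the phases in order and stop as soon as the late page cache
 * phase finds a cached copy of the requested URL.
 */
function sketch_bootstrap(array $phases, $url, array $cache_page) {
  $ran = array();
  foreach ($phases as $phase) {
    $ran[] = $phase;
    if ($phase == 'DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE' && isset($cache_page[$url])) {
      // Cache hit: the page is served and no further phases run.
      return array('served_from_cache' => TRUE, 'phases' => $ran);
    }
  }
  // Cache miss: every phase ran and the page must be generated.
  return array('served_from_cache' => FALSE, 'phases' => $ran);
}

$hit = sketch_bootstrap($phases, 'node/1', array('node/1' => '<html>cached</html>'));
$miss = sketch_bootstrap($phases, 'node/2', array('node/1' => '<html>cached</html>'));
```

In this sketch a hit stops after six phases, while a miss runs all seven entries.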

When in normal page caching mode, the sixth bootstrap phase will first invoke the _boot() hook in all enabled modules where it is defined. Then, it will send the actual cached page to the remote web browser of the anonymous visitor. Finally, it will invoke the _exit() hook in all enabled modules where it is defined, and then Drupal will exit.

When in aggressive page caching mode, neither the _boot() nor the _exit() hooks are invoked, which means Drupal does not have to include these module files when displaying the cached page and can instead simply send the cached page to the remote web browser of the anonymous visitor.

In Drupal 6, the statistics module and the throttle module are the only two modules that define the _exit() hook. No core modules define the _boot() hook. In the statistics module the _exit() hook is used to count how many times each node is viewed and to update the access log. In the throttle module the _exit() hook is used to detect surges in site traffic and enable or disable the automatic throttling mechanism. If you put Drupal into aggressive page caching mode the _exit() hooks are not invoked, so none of this functionality will work.
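The difference between the two modes when serving a cached page can be sketched as below. All names here are hypothetical (the SKETCH_ constants, sketch_serve_cached_page(), and the two stand-in hook functions); the sketch only mirrors the behavior described above and records what would happen instead of sending output.

```php
<?php
// Hypothetical mode constants for this sketch only.
define('SKETCH_CACHE_NORMAL', 1);
define('SKETCH_CACHE_AGGRESSIVE', 2);

// Stand-ins for the two core _exit() hooks mentioned in the text.
function sketch_statistics_exit() { return 'statistics: count page view'; }
function sketch_throttle_exit() { return 'throttle: check site load'; }

/**
 * Return the ordered list of actions taken when serving a cached page.
 * Hooks run only in normal mode; aggressive mode skips them entirely.
 */
function sketch_serve_cached_page($mode, $page, array $boot_hooks, array $exit_hooks) {
  $actions = array();
  if ($mode == SKETCH_CACHE_NORMAL) {
    foreach ($boot_hooks as $hook) {
      $actions[] = call_user_func($hook);
    }
  }
  $actions[] = 'send: ' . $page;
  if ($mode == SKETCH_CACHE_NORMAL) {
    foreach ($exit_hooks as $hook) {
      $actions[] = call_user_func($hook);
    }
  }
  return $actions;
}

$exit_hooks = array('sketch_statistics_exit', 'sketch_throttle_exit');
$normal = sketch_serve_cached_page(SKETCH_CACHE_NORMAL, 'cached page', array(), $exit_hooks);
$aggressive = sketch_serve_cached_page(SKETCH_CACHE_AGGRESSIVE, 'cached page', array(), $exit_hooks);
```

In normal mode the action list contains the page send followed by both exit hooks; in aggressive mode it contains only the page send, which is why statistics and throttling stop working.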

Minimum Cache Lifetime

Configuration of the minimum cache lifetime is found in the page cache section of the performance administration page; however, it actually affects all of Drupal's caches. With regard to the page cache, the idea is to ensure that some benefit is gained from caching generated pages. By default Drupal only caches page content as long as it is known to be valid. As soon as a new comment or node is posted or updated, the entire page cache has to be flushed, as there is no way to determine which pages are affected by the changed content.

The minimum cache lifetime enforces a configurable amount of time that any given page will live in the cache even if new content is posted during that time. The longer pages live in the cache, the higher the "hit rate" and thus the more effective the cache can be.

Enforcement of the minimum cache lifetime happens globally on a per cache table basis, so once any user posts or updates content the countdown to flushing the page cache begins. A variable is also tracked in each user's session when they post new content, simulating a cache flush only for these users. This allows anonymous users to see their own comments immediately when posted rather than waiting for the page cache to first expire and be flushed. When any page is regenerated for a specific user who has posted new content, this new version of the page will update the version in the cache.

When trying to scale a website, the minimum cache lifetime should be enabled and set to the largest time you are willing to make anonymous visitors wait before seeing newly posted content. When determining how long this is, remember that anonymous users will still see content that they have posted themselves immediately.
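The countdown and the per-user simulated flush can be sketched as follows. This is an illustration under assumptions rather than Drupal's implementation: the $state array, sketch_request_flush(), and sketch_cache_get() are hypothetical, and the sketch assumes the real flush is carried out by a later flush request that arrives after the lifetime has elapsed.

```php
<?php
/**
 * Handle a request to flush the cache. The first request only starts
 * the countdown; a later request past the minimum lifetime really flushes.
 */
function sketch_request_flush(array &$state, $now, $lifetime) {
  if ($state['flush_requested'] === 0) {
    $state['flush_requested'] = $now;
    return FALSE;
  }
  if ($now > $state['flush_requested'] + $lifetime) {
    $state['pages'] = array();
    $state['flush_requested'] = 0;
    return TRUE;
  }
  return FALSE;
}

/**
 * Look up a cached page. A user who posted after the entry was created
 * gets a miss, which simulates a flush for that user only.
 */
function sketch_cache_get(array $state, $url, $user_last_post) {
  if (!isset($state['pages'][$url])) {
    return NULL;
  }
  $entry = $state['pages'][$url];
  if ($user_last_post > $entry['created']) {
    return NULL;
  }
  return $entry['data'];
}

$state = array(
  'flush_requested' => 0,
  'pages' => array('node/1' => array('created' => 100, 'data' => 'old page')),
);
// A comment is posted at time 200 with a minimum lifetime of 300 seconds.
$flushed_now = sketch_request_flush($state, 200, 300);
$for_others = sketch_cache_get($state, 'node/1', 0);
$for_poster = sketch_cache_get($state, 'node/1', 200);
```

Here other visitors keep receiving the cached page during the lifetime, while the visitor who posted gets a miss and therefore a freshly generated page.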

Page Compression

When this option is enabled, cached pages are compressed with gzip before they are stored in the cache_page database table. Then, when these pages are served to anonymous visitors, Drupal confirms that the remote client supports gzip encoded pages, and if so it quickly serves the pre-compressed page to the remote client. Fortunately most web browsers do support gzip encoded pages. For the few that do not, Drupal will uncompress the cached page before sending it to the remote client.

Actual compression of cached pages happens in includes/common.inc in the function page_set_cache() with the following code:

  $data = gzencode($data, 9, FORCE_GZIP);

When serving cached pages, Drupal detects whether or not the remote client supports gzip encoding in includes/bootstrap.inc in the function drupal_page_cache_header() with the following code:

  if (@strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') === FALSE &&
      function_exists('gzencode')) {

In the rare case where the remote client does not support gzip encoding the page is uncompressed in the same function with the following code:

  $cache->data = gzinflate(substr(substr($cache->data, 10), 0, -8));
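The three snippets above fit together as a round trip, which the following self-contained sketch demonstrates. It assumes PHP's zlib extension is available; sketch_serve() and its return format are hypothetical and stand in for the header and body handling that Drupal performs.

```php
<?php
// A sample page, compressed the same way the text describes.
$page = '<html><body>' . str_repeat('Cached content. ', 50) . '</body></html>';
$compressed = gzencode($page, 9, FORCE_GZIP);

/**
 * Decide what to send for a given Accept-Encoding request header.
 * A gzip stream produced by gzencode() has a 10-byte header and an
 * 8-byte trailer; stripping both leaves raw deflate data for gzinflate().
 */
function sketch_serve($compressed, $accept_encoding) {
  if (strpos($accept_encoding, 'gzip') === FALSE) {
    // Client cannot handle gzip: uncompress before sending.
    $body = gzinflate(substr(substr($compressed, 10), 0, -8));
    return array('headers' => array(), 'body' => $body);
  }
  // Client supports gzip: send the stored bytes untouched.
  return array('headers' => array('Content-Encoding: gzip'), 'body' => $compressed);
}

$modern = sketch_serve($compressed, 'gzip, deflate');
$legacy = sketch_serve($compressed, '');
```

The client that accepts gzip receives the stored bytes as they are, so the compression cost is paid once when the page is cached rather than on every request.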

Block Cache

While the page cache offers impressive performance gains for anonymous users, it does not improve performance for logged in users. The block cache is a new feature in Drupal 6 which improves performance for logged in users. There are several different ways that the block cache can cache blocks, all controlled by module developers when creating the blocks. By default, when the block cache is enabled one copy of the block is cached per role.

The various block caching modes are defined in modules/block/block.module. Available caching modes are BLOCK_NO_CACHE, BLOCK_CACHE_PER_ROLE, BLOCK_CACHE_PER_USER, BLOCK_CACHE_PER_PAGE, and BLOCK_CACHE_GLOBAL.

When a block sets BLOCK_NO_CACHE, the block will never be cached. This cache mode is generally used either when the block is so simple that it's more efficient to regenerate it each time it is displayed, or when the block changes so frequently that there's no benefit from caching it.

As noted earlier, BLOCK_CACHE_PER_ROLE is the default mode, and means that multiple versions of the block will be cached, one for each role. The BLOCK_CACHE_PER_USER mode means that a unique version of the block will be cached for each user. The BLOCK_CACHE_PER_PAGE mode tells Drupal to cache a unique version of the block for each page it is displayed on. And finally, the BLOCK_CACHE_GLOBAL mode means that a single version of the block is cached and displayed on all pages to all users and all roles.

The cache modes are defined in the code as bitwise flags, allowing developers to set multiple cache modes. For example, the core profile.module defines an 'Author information' block which sets two flags, both BLOCK_CACHE_PER_PAGE and BLOCK_CACHE_PER_ROLE. This means that a unique version of the block is generated on each page and for each role viewing that page. The core book.module sets the same two block caching flags for the 'Book navigation' block.
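To see how bitwise flags can select more than one caching dimension, consider the sketch below. The SKETCH_ constants and their values, the sketch_block_cache_id() function, and the format of the generated key are all assumptions made for illustration; they are not taken from block.module.

```php
<?php
// Hypothetical flag values; each mode occupies its own bit.
define('SKETCH_PER_ROLE', 0x0001);
define('SKETCH_PER_USER', 0x0002);
define('SKETCH_PER_PAGE', 0x0004);
define('SKETCH_GLOBAL', 0x0008);

/**
 * Build a cache key for a block. Each flag that is set adds one
 * component, so combined flags yield one cached copy per combination.
 */
function sketch_block_cache_id($module, $delta, $flags, $uid, array $role_ids, $path) {
  $parts = array($module, $delta);
  if ($flags & SKETCH_PER_ROLE) {
    // Sort so the same set of roles always yields the same key.
    sort($role_ids);
    $parts[] = 'r.' . implode(',', $role_ids);
  }
  elseif ($flags & SKETCH_PER_USER) {
    $parts[] = 'u.' . $uid;
  }
  if ($flags & SKETCH_PER_PAGE) {
    $parts[] = $path;
  }
  return implode(':', $parts);
}

// Two flags combined with bitwise OR, as in the 'Author information' block.
$flags = SKETCH_PER_PAGE | SKETCH_PER_ROLE;
$per_page_and_role = sketch_block_cache_id('example', 0, $flags, 7, array(3, 2), 'node/1');
$global = sketch_block_cache_id('example', 0, SKETCH_GLOBAL, 7, array(3, 2), 'node/1');
```

With both flags set, the key varies by role set and by path, whereas the global key is the same for every user and page.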

When programming Drupal modules, you can control the block cache in hook_block() when defining your block. In the following example, we configure our block to be cached on a per page and per role basis:

function example_block($op = 'list', $delta = 0, $edit = array()) {
  if ($op == 'list') {
    // Describe the block and tell Drupal how it may be cached.
    $blocks[0]['info'] = t('Example block');
    $blocks[0]['cache'] = BLOCK_CACHE_PER_PAGE | BLOCK_CACHE_PER_ROLE;
    return $blocks;
  }
  else if ($op == 'view') {
    // Output the actual block here.
  }
}

Bandwidth Optimizations

The next section on the Performance administration page is titled bandwidth optimizations.

Optimizing CSS Files

Optimizing JavaScript Files

Section 2: Drupal Logs

The default core Drupal distribution has two primary logs. Each of these logs can be quite useful in troubleshooting and developing your website, but as your website grows you need to tune these logs to prevent them from causing performance issues.

Watchdog Logs

Prior to Drupal 6 the core distribution included a module called "watchdog". In Drupal 6 the module has been renamed to dblog, but the watchdog() logging function retains the same name. As the name suggests, the dblog module writes logs to the database. The module's help text explains, "the dblog module monitors your website, capturing system events in a log to be reviewed by an authorized individual at a later time. The dblog log is simply a list of recorded events containing usage data, performance data, errors, warnings and operational information. It is vital to check the dblog report on a regular basis as it is often the only way to tell what is going on."

As your website grows, writing logs to the database can increasingly become a performance bottleneck. For this reason the Drupal 6 core distribution also includes the syslog module, which can replace the default dblog module, routing watchdog logs to the operating system's logging facility. These logs are no longer easily viewed through a web browser with Drupal; however, they cause significantly less performance overhead and can be safely retained much longer. The syslog module's help text explains, "syslog is an operating system administrative logging tool, and provides valuable information for use in system management and security auditing. Most suited to medium and large sites, syslog provides filtering tools that allow messages to be routed by type and severity. On UNIX/Linux systems, the file /etc/syslog.conf defines this routing configuration; on Microsoft Windows, all messages are sent to the Event Log."
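As a rough illustration of what routing a log entry to the operating system involves, the sketch below uses PHP's built-in openlog(), syslog(), and closelog() functions. It is not the syslog module's code: sketch_log_to_syslog(), the 'drupal-sketch' identity string, and the message format are hypothetical.

```php
<?php
/**
 * Send one log entry to the operating system's logging facility
 * instead of inserting a row into a database table.
 */
function sketch_log_to_syslog($type, $message, $severity) {
  // Tag entries with an identity so they can be filtered later.
  openlog('drupal-sketch', LOG_NDELAY, LOG_USER);
  $result = syslog($severity, $type . ': ' . $message);
  closelog();
  return $result;
}

$ok = sketch_log_to_syslog('php', 'Example notice from the sketch.', LOG_NOTICE);
```

Because the entry goes to the system logger, no database write is needed for each logged event.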

The Access Log

The Drupal Access Log is provided by the core statistics module. When this log is enabled, each time Drupal displays a page it logs the session ID of the user visiting the page, the title of the page visited, the internal Drupal path, the complete URL, the IP address of the visitor, the user ID of the visitor, the length of time it took Drupal to display the page, and a timestamp as to when the page was actually displayed. Drupal can then provide several useful administrative reports from this data.

As a website grows in popularity, the amount of data being written to the accesslog can quickly become an unnecessary performance bottleneck. It can help to configure Drupal to discard old access logs more frequently, minimizing the size of the accesslog database table. However, from a performance standpoint the best thing to do is to completely disable the statistics module and instead utilize the logs provided by your web server.
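Discarding old access log entries amounts to keeping only rows newer than a cutoff. The sketch below shows that idea on an in-memory array; sketch_prune_accesslog() and the row layout are hypothetical, and a real site would prune the accesslog table in the database rather than a PHP array.

```php
<?php
/**
 * Keep only the log rows whose timestamp falls within the last
 * $max_age seconds before $now.
 */
function sketch_prune_accesslog(array $rows, $now, $max_age) {
  $cutoff = $now - $max_age;
  $kept = array();
  foreach ($rows as $row) {
    if ($row['timestamp'] >= $cutoff) {
      $kept[] = $row;
    }
  }
  return $kept;
}

$rows = array(
  array('path' => 'node/1', 'timestamp' => 1000),
  array('path' => 'node/2', 'timestamp' => 5000),
  array('path' => 'node/3', 'timestamp' => 9000),
);
// Keep one hour (3600 seconds) of history as of time 10000.
$recent = sketch_prune_accesslog($rows, 10000, 3600);
```

A shorter retention period keeps the table smaller, at the cost of less history for the administrative reports.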

Section 3: The Throttle Module

Background

The Drupal throttle module became a part of the core distribution in Drupal 4.1, released in February of 2003. It was originally written for the Drupal-powered technical website, KernelTrap.org, which at the time was running on a single Pentium I CPU with a 100 megabit connection to the Internet. KernelTrap was receiving regular links from Slashdot.org, and the slow processor was simply unable to keep up with the load, succumbing to a "Slashdotting".

The solution at the time was to ssh into the server and manually disable all modules which were not strictly necessary, until the link moved further down the Slashdot front page and the server could again keep up. The throttle module was written to automate this process, quickly detecting when the website came under a heavy load and automatically disabling unnecessary functionality, then re-enabling it when the load subsided.

As we will learn later in this chapter, the throttle module is little more than a band-aid, attempting to work around a problem rather than solving it.

Configuration

Modules

Blocks

Custom Integration

Why The Throttle Was Removed From Drupal 7