Understanding and Defining the Problem
There are numerous ways that you can use this book. It is designed to be read from cover to cover, while also serving as a reference. Whether you are trying to solve a specific performance problem or researching options for improving the general performance of your website, this book will prove helpful. It can also help anyone deciding whether Drupal can scale sufficiently for an upcoming project.
This first chapter is aimed at anyone who has been tasked with making general performance improvements to a website. It will help you better understand what you need to accomplish and why. Rather than finding problems at random and fixing them in the order they're found, you will review your entire website and identify all of the areas with significant performance and scalability problems. You will then prioritize these problems based on the potential gains, as well as on the size and complexity of each planned solution. Finally, you will start with the "lowest-hanging fruit", often solving the simplest problems first and quickly realizing measurable performance improvements.
Goals versus Requirements
More often than not, there is a specific reason that you have begun to focus on improving the performance of your website. This reason may be a technical problem that needs solving, such as a database server that fails whenever popular websites like Slashdot and Digg link to you. Or your interest in website performance may be business-driven, such as a task handed down the management chain to make every page on your website load within 2 seconds. Either way, it is important to understand the tasks that need to be accomplished, and to distinguish which tasks are requirements and which are goals.
A requirement is something that absolutely must be accomplished, while a goal is something that it would be nice to accomplish. Using the above examples, if your database server fails whenever your website gets too busy, it is reasonable to consider solving this problem a requirement. On the other hand, if you are tasked with achieving sub-two-second page load times, this is more likely to be classified as a goal. You can often achieve sub-two-second page load times for the majority of visitors to most of your pages, but for a variety of reasons it is not always possible to achieve specific page load times for every visitor on every page. It is important to set realistic expectations.
Performance and Scalability Checklist
The following lists will help you better define the areas of your website that need to be improved, and better understand what is driving this need for improvement. They are logically grouped into multiple sections. Review each section to determine which are applicable to your current project, and then work through your selected lists, thoroughly documenting the goals and requirements for your upcoming project. There is a temptation to skip this step and charge headfirst into making changes; however, until you define your goals you have no way to measure your progress, and you may end up trying to fix things that aren't even broken.
Can you quantify the performance improvements you would like to make to your website? Work through this section, clearly listing your quantitative goals to the best of your ability.
- Average page load times: Are your web pages loading too slowly? Are users complaining of slow page loads? Are pages slow for anonymous visitors, for logged-in users, or for both? What are your target page load times for each?
- Maximum page load times: Do most of your web pages load in a reasonable amount of time, while some pages take an abnormally long time? Are the same pages always slow, or does it seem more random than that? What is your current maximum page load time? What would be an acceptable maximum page load time? What would be an optimal maximum page load time?
- Page load times for first-time visitors: Do you need to make a good impression on first-time visitors to your website? How long does it take to load your web pages for someone who has never visited before and has none of your page elements in their web browser cache?
- Number of monthly page views: How many page views has your website seen in each of the previous six months? Have you launched any new advertising campaigns or made any significant announcements that you expect to bring more website traffic? What is your target number of page views for each of the next six months? What are you basing this projected growth on?
- Number of monthly anonymous visitors: What percentage of your traffic comes from anonymous visitors who do not have user accounts or choose not to log in? Where have these anonymous visitors come from in the past? What is your target number of anonymous visitors for each of the next six months?
- Number of monthly logged-in visitors: What percentage of your traffic comes from logged-in users? How much does your logged-in traffic increase from month to month? What is your target number of logged-in users for each of the next six months?
- Number of subscriptions: Is your website subscription-oriented? How do subscribers differ from regular users? How many subscriptions have you seen in each of the previous six months? What is your target number of subscriptions for each of the next six months?
- The time it takes to submit content: How long does it currently take to submit a new story? How long does it take to submit a new comment? Are you using free tagging? How many seconds is an acceptable amount of time for submitting new content? How many seconds would be optimal?
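To turn the page load questions above into numbers, it can help to summarize a sample of measured load times for a page. The following Python sketch assumes you have already collected load times in seconds from your own monitoring; the sample values and the 2-second target are hypothetical. It reports the average, the maximum, and the share of loads that met the target.

```python
from statistics import mean


def summarize_load_times(samples, target_seconds=2.0):
    """Summarize page load times (in seconds) against a target.

    Returns the average, the maximum, and the fraction of samples
    that loaded within target_seconds.
    """
    if not samples:
        raise ValueError("need at least one sample")
    within = sum(1 for s in samples if s <= target_seconds)
    return {
        "average": mean(samples),
        "maximum": max(samples),
        "share_within_target": within / len(samples),
    }


# Hypothetical measurements for one page viewed by anonymous visitors.
anonymous_samples = [0.8, 1.1, 1.4, 0.9, 6.2]
print(summarize_load_times(anonymous_samples))
```

In this made-up sample, four of five loads meet the 2-second target, yet a single 6.2-second load pulls the average just above 2 seconds. This is why the checklist asks about average and maximum load times separately, and why "most visitors on most pages" is often a more realistic goal than "every visitor on every page".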
Are there business needs driving your current performance and scalability efforts? Work through this section of the checklist to fully define these business needs.
- Growth rate: Is there a general business drive to increase the monthly growth rate of your website? How is this growth rate being measured? What is the targeted growth rate? How does this growth rate compare with past growth rates? Is the planned growth rate realistic?
- Advertisement impressions: Does your business model depend on selling a certain number of advertisements on your website? Is online advertising new to your business, or is it an ongoing source of income? Do you plan to sell the same number of advertisements each month, or do you plan to regularly increase the number of advertisements you are selling? Are you managing the ads in-house, or are you using a third-party advertising network?
- Partnerships: Are you partnering with another popular website and expecting a significant increase in web traffic? Will this traffic be mostly anonymous visitors or mostly logged-in users? How much traffic does your partner website see?
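One way to judge whether a planned growth rate is realistic is to compare it with the compound monthly growth you have actually seen. The sketch below assumes you have monthly page view totals, oldest first, and that growth compounds at a constant monthly rate; both the history and the 10% target are hypothetical values for illustration.

```python
def average_monthly_growth(history):
    """Compound average monthly growth rate over monthly totals (oldest first)."""
    if len(history) < 2 or history[0] <= 0:
        raise ValueError("need at least two months, starting above zero")
    months = len(history) - 1
    return (history[-1] / history[0]) ** (1 / months) - 1


def project(current, monthly_rate, months=6):
    """Project monthly totals forward at a constant compound monthly rate."""
    projections = []
    value = current
    for _ in range(months):
        value *= 1 + monthly_rate
        projections.append(round(value))
    return projections


# Hypothetical page view totals for the previous six months.
past_six_months = [100_000, 104_000, 109_000, 113_000, 118_000, 122_000]
past_rate = average_monthly_growth(past_six_months)
planned_rate = 0.10  # hypothetical business target: 10% per month

print(f"past: {past_rate:.1%} per month, planned: {planned_rate:.1%} per month")
print("next six months at planned rate:", project(past_six_months[-1], planned_rate))
```

If the planned rate is well above the historical one, the difference is worth explaining (a new partnership or advertising campaign, for example) before treating the target as realistic.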
Risk Management Goals
Is your website a critical component of your business? Are you currently unable to take regular backups, or even unsure what data needs to be backed up? Work through this section to define how much data loss and downtime are acceptable, setting goals and requirements for your upcoming performance and scalability efforts.
- High availability: How fault tolerant is your existing infrastructure? What happens if your primary database server fails? What happens if a web server fails?
- Minimizing downtime: What is the most downtime your website has experienced so far? What was the effect of this downtime? What are the consequences if your website is down for too long? What qualifies as downtime? What is your budget for building a fault-tolerant infrastructure? How much downtime would be acceptable, and is it measured in seconds, minutes, hours, or days? How much downtime would be catastrophic to your business, and is it measured in seconds, minutes, hours, or days?
- Fast data recovery: Where do you store your backups? How often do you take backups? How many copies of backups do you retain? Have you ever tried restoring data from your backups? How long did it take? If something happened to your database, how long could you afford to spend recovering data from a backup?
- Survival after catastrophic failures: Do you have a plan if a hurricane, earthquake, or explosion wipes out your data center? Do you keep a copy of your data at a completely separate physical location? If using an online backup solution, have you confirmed that their servers are actually in another data center? How long would it take you to build an entire new infrastructure?
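Acceptable downtime is often stated as an availability percentage, which can hide how much downtime a target actually permits. This small Python sketch converts a percentage into minutes of allowed downtime; the 30-day period is an assumption, so substitute your own reporting period.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Minutes of downtime permitted in a period at a given availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_percent / 100)


# Compare a few common availability targets over an assumed 30-day month.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability -> {minutes:.1f} minutes of downtime per 30 days")
```

A 30-day month has 43,200 minutes, so 99% availability still allows 432 minutes (over 7 hours) of downtime, 99.9% allows about 43 minutes, and 99.99% about 4 minutes. Seeing these figures side by side makes it easier to decide whether your acceptable downtime is measured in seconds, minutes, hours, or days.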
What other needs are driving your performance efforts? Review the following goals, and try to come up with some of your own.
- Auditing current site performance: Do you lack a clear picture of how your website currently performs? Are you looking for ways to better understand its current performance, in order to decide what needs to be improved, if anything?
- Solve specific known performance bottlenecks: Do you know exactly where the problems are with your website? Are you receiving complaints from website users, or from management? Can you duplicate the reported problems? Do you have a general idea of what is causing the problems? How can you measure the known performance bottlenecks?
- Improve scalability: Are you expecting to outgrow your existing infrastructure? Do you know how much traffic your current infrastructure can handle? Do you have a budget to add additional servers to your network? Do you need to make do with the hardware you have?
- Contributing back to Drupal: Have you solved some performance issues in ways that you think would be useful to other Drupal users? Would you like to be recognized for contributing code and documentation back to the Drupal project? Would you like to see your improvements merged into Drupal's core code so when you upgrade in the future you don't have to keep solving the same problems?
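For the scalability questions in this list, a rough estimate of how long your current infrastructure will last can help decide between adding servers and making do with existing hardware. The sketch below assumes you already know the traffic level your infrastructure can handle (a figure you would need to measure yourself) and that traffic grows at a constant compound monthly rate; the function name and the example values are hypothetical.

```python
def months_until_capacity(current, capacity, monthly_growth, max_months=120):
    """Count whole months until traffic reaches capacity at constant compound growth.

    `current` and `capacity` must use the same unit (for example, monthly page views).
    Returns 0 if already at or over capacity, and None if capacity is not reached
    within max_months (including when traffic is flat or shrinking).
    """
    months = 0
    value = current
    while value < capacity:
        if monthly_growth <= 0 or months >= max_months:
            return None
        value *= 1 + monthly_growth
        months += 1
    return months


# Hypothetical: 122,000 page views today, capacity measured at 180,000, growing 4% per month.
print(months_until_capacity(122_000, 180_000, 0.04))
```

A simple loop is used instead of a logarithm formula so that rounding at exact boundaries cannot add a spurious extra month; the `max_months` cap keeps very slow growth from looping for a long time.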