Friday, October 31, 2008

Always prepare for high amounts of traffic

Today at work I did some brainstorming about how I can make our website more efficient for high traffic. My company expects about 50,000 visitors by the end of the year so I wanted to figure out the BEST possible way to execute the database queries on the site.

For each ad on our site there is a counter that counts how many page views the ad has. So every time an ad page is visited it requires a select statement and an update statement (to update the page view count).

On a normal website, a solution like this may be fine:

- select * from table where id='123'
- update table set views='4'

But with a high traffic site, things should be done differently if you want it to run as fast as possible. See a MySQL InnoDB table uses row level locking which means nobody can access that particular row while it is being updated.

On a high traffic website, the above solution would do this:

-User A requests ad page.
-User A's page loads and is currently updating row 5 of the ads table.
-Meanwhile user B requests that same ad page. But he/she must wait until User A is done updating row 5 of the ads table.

Not good. On a very high traffic website there would be a lot of waiting. But there is a different way. Instead of using a column in the ads table to hold the pageview count, create a table that is only for holding pageview counts. This way, while the pagecount table is being updated, it doesn't get in the way of the select statements done on the ads table.

The second solution would act like this on a high traffic web site:

-User A requests ad page.
-User A's page loads.
-At the very end of the page, an update statement is executed on the pageview table. -Meanwhile, User B requests that same ad page. The page loads immediately because that row in the ad table is not locked out due to the update in the pagecount table instead of the ads table.

Much better. No wait time, no locked rows. The content is delivered as fast as possible to the user.

You may think, well how do you update the page count without first executing a select statement to figure out what the CURRENT page count is? After all, how can we update the page count to plus 1 if we don't know what it is already?

Well, the answer to that is the following:

Update pageviews set views=views+1 where id='5'

So now we have a fast and efficient high traffic web site serving content uninterrupted to users.