How to manage thousands of visitors, Part III: www.wikipedia.org
Following the earlier articles in the How to manage thousands of visitors series, I will now write about a very big site that uses MySQL and PHP5. It is so big that it handles between 10,000 and 30,000 page requests per second on a normal day. Yes, that is a huge amount of traffic.
At 86,400 seconds per day, that works out to between 864,000,000 and 2,592,000,000 requests per day. A great deal of traffic, don't you think?
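The daily figures follow directly from the per-second rates. A minimal sketch of the arithmetic, where the function name `requests_per_day` is only an illustrative choice:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def requests_per_day(requests_per_second: int) -> int:
    """Scale a steady per-second request rate up to a full day."""
    return requests_per_second * SECONDS_PER_DAY


low = requests_per_day(10_000)
high = requests_per_day(30_000)
print(f"{low:,} to {high:,} requests per day")
```

This assumes the rate stays constant all day, which real traffic does not, so treat the result as a rough range.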
I am talking about Wikipedia, the great open encyclopedia.
Wikipedia runs on dedicated clusters of Linux servers in Florida and in four other locations. Wikipedia employed a single server until 2004, when the server setup was expanded into a distributed multitier architecture. In January 2005, the project ran on 39 dedicated servers located in Florida. This configuration included a single master database server running MySQL, multiple slave database servers, 21 web servers running the Apache HTTP Server, and seven Squid cache servers. By September 2005, its server cluster had grown to around 100 servers in four locations around the world.
Page requests are first passed to a front-end layer of Squid caching servers. Requests that cannot be served from the Squid cache are sent to load-balancing servers running the Linux Virtual Server software, which in turn pass the request to one of the Apache web servers for page rendering from the database. The web servers deliver pages as requested, performing page rendering for all the language editions of Wikipedia. To increase speed further, rendered pages for anonymous users are cached in a filesystem until invalidated, allowing page rendering to be skipped entirely for most common page accesses. Two larger clusters in the Netherlands and Korea now handle much of Wikipedia’s traffic load.
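The request path described above can be sketched in a few lines. This is a toy model, not Wikipedia's actual code: the class `TieredSite`, the server names, and the round-robin choice of web server are all assumptions made for illustration (the real load balancing is done by Linux Virtual Server, whose scheduling policy is not stated here), and the separate filesystem cache is left out to keep the sketch short.

```python
import itertools


class TieredSite:
    """Toy model: front-end cache, then a load balancer, then web servers."""

    def __init__(self, web_servers):
        self.squid_cache = {}  # url -> rendered page (stand-in for Squid)
        # Round-robin is an assumed policy standing in for the load balancer.
        self._next_server = itertools.cycle(web_servers)

    def render(self, server, url):
        # Stand-in for Apache + PHP rendering the page from the database.
        return f"<html>{url} rendered by {server}</html>"

    def handle(self, url, logged_in=False):
        """Return (page, source), where source says who produced the page."""
        # Anonymous requests are answered from the cache when possible.
        if not logged_in and url in self.squid_cache:
            return self.squid_cache[url], "squid"
        # Cache miss or logged-in user: pick a web server and render.
        server = next(self._next_server)
        page = self.render(server, url)
        if not logged_in:
            self.squid_cache[url] = page
        return page, server

    def invalidate(self, url):
        """Drop a cached page, for example after the article is edited."""
        self.squid_cache.pop(url, None)


site = TieredSite(["apache1", "apache2"])
print(site.handle("/wiki/PHP")[1])  # first request is rendered by a web server
print(site.handle("/wiki/PHP")[1])  # repeat request is served from the cache
```

The point of the design is that the expensive step, rendering from the database, only runs when the cheap front-end lookup fails.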
Overall system architecture
- 89 machines in Florida, 11 in Amsterdam, 23 in Yahoo!’s Korean hosting facility.
- The master database servers run MySQL and store article metadata.
- Text is stored on separate database instances running on Apache servers, to avoid consuming expensive database disk space.
- The Apache machines run identically configured Apache web servers. They accept requests from users, fetch data from the database if necessary, and format the responses sent back to the users by running the MediaWiki software, which is implemented in PHP and uses the APC PHP cache. They share their work directories over NFS, so uploads etc. should stay reasonably in sync.
- The Squid systems maintain large caches of pages, so that common or repeated requests don’t need to touch the Apache or database servers. They serve most page requests made by visitors who aren’t logged in. They are currently running at a hit-rate of approximately 75%, effectively quadrupling the capacity of the Apache servers behind them. This is particularly noticeable when a large surge of traffic arrives directed to a particular page via a web link from another site, as the caching efficiency for that page will be nearly 100%. See cache strategy for more details.
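The "quadrupling" claim is simple arithmetic: if the caches answer 75% of requests, only 25% reach the Apache servers, so the same back end can sit behind four times as much total traffic. A minimal sketch, where the function name `backend_capacity_multiplier` is an illustrative choice:

```python
def backend_capacity_multiplier(hit_rate: float) -> float:
    """How much more total traffic the back end can sit behind,
    given the fraction of requests answered by the cache."""
    if not 0 <= hit_rate < 1:
        raise ValueError("hit_rate must be in the range [0, 1)")
    # Only (1 - hit_rate) of all requests ever reach the back end.
    return 1 / (1 - hit_rate)


print(backend_capacity_multiplier(0.75))  # 4.0
```

The same formula shows why a traffic surge to a single page is handled so well: as the hit rate for that page approaches 100%, the multiplier grows without bound.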
June 29th, 2007 at 7:21 pm