The openSUSE download redirector

The openSUSE download redirector (a.k.a. MirrorBrain) automatically
redirects clients, via HTTP redirection, to a mirror server near them. It
works similarly to the systems employed by sourceforge.net, mozilla.com and
other large organizations, which face more download requests than a single
site can practically handle.

To find a mirror close to the client, the redirector uses geolocation of the
client's IP address. If several mirrors are suitable, the redirector
load-balances requests across them based on their capabilities.

Implementation:

The core of the redirector is mod_zrkadlo, a module for the Apache HTTP
server, written in C and designed for high performance and scalability, with
security in mind. mod_zrkadlo is pronounced "mod zurrcat low". Zrkadlo is
Slovak for mirror.

Because the file tree offered by the openSUSE project evolves quickly, the
redirector doesn't simply choose one mirror for a client once. Instead it
decides per file, because mirrors are known to be incomplete, especially if
content changes often. To achieve this, the redirector is supported by an SQL
database which knows the exact contents of each mirror. The database is
periodically updated by a scanner program that scans all mirrors. In
addition, a probing program intermittently checks each mirror for
responsiveness, and can disable or pause redirection to a mirror should it
fail.
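The selection described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the module's actual
code: the Mirror class, its field names and the choose_mirror function are
hypothetical, the mirror contents are held in memory rather than in the SQL
database, and the exact tier order (same country, then same continent, then
any mirror) is an assumption about how "near" is resolved.

```python
import random
from dataclasses import dataclass


@dataclass
class Mirror:
    # All field names are hypothetical, chosen for this sketch only.
    name: str
    country: str
    continent: str
    score: int            # relative capability; must be positive
    files: frozenset      # paths the scanner found on this mirror
    enabled: bool = True  # cleared when the probe finds the mirror failing


def choose_mirror(mirrors, path, country, continent, rng=random):
    """Return a mirror that has `path`, or None if no enabled mirror has it."""
    # File-level granularity: only mirrors known to hold this exact file.
    candidates = [m for m in mirrors if m.enabled and path in m.files]
    # Assumed proximity tiers: same country, then same continent, then any.
    tiers = (
        [m for m in candidates if m.country == country],
        [m for m in candidates if m.continent == continent],
        candidates,
    )
    for tier in tiers:
        if tier:
            # Weighted random pick, so higher-scored mirrors get more requests.
            weights = [m.score for m in tier]
            return rng.choices(tier, weights=weights, k=1)[0]
    return None
```

A caller would look up the client's country and continent from its IP
address, call choose_mirror with the requested path, and answer with an HTTP
redirect to the returned mirror. What to do when None comes back is left to
the caller.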
Features:

- works transparently to the client, through HTTP redirection
- can optionally return a metalink (http://metalinker.org), or a
  human-readable mirror list
- operates with file-level granularity
- involves only a single database query per HTTP request, using a database
  connection pool through the Apache DBD framework
- chooses mirrors per country / continent, using the GeoIP database
- uses a randomized, weighted algorithm for mirror selection (each mirror
  having a score)
- optionally memorizes the client<->mirror association through a memcache
  daemon
- can make sure that mirrors receive requests only from their own country or
  region (important for countries with poor internet connectivity)
- mirrors can be of a special catch-all type, to integrate content delivery
  networks
- is configurable in Apache-style configuration, with automatic per-directory
  configuration merging
- optionally redirects depending on file name pattern, file size, mime type,
  user agent, request origin, ...
- flexible logging options
- has a debug mode which can be enabled per directory, and is thus
  "compatible" with a running production system
- the client IP address can be overridden for diagnostic purposes
- canonicalizes file pathnames before database lookup, so the database needs
  to hold only real files, and is not blown up by symlinks

So how does the redirector Apache module work? The page
http://en.opensuse.org/Build_Service/Redirector shows pseudocode which
outlines how it works.

Software requirements:

Frontend (the redirector):

- Apache HTTP server 2.2.x or newer
- libGeoIP, libapr_memcache, and mod_form
- memcache daemon

Backend (rest of the MirrorBrain framework):

- MySQL server. The tables must be InnoDB tables, because only that engine
  offers row-level write locks. Because the InnoDB engine needs to be tuned
  for high performance, it makes sense to run a separate MySQL instance for
  this database. PostgreSQL should also work, but it hasn't been tested.
- Python, python-mysql, python-sqlobject for the ping process and database
  maintenance
- Perl for the scanner process

There is a small mirror administration web frontend, built upon the
TurboGears framework, but its development has just started.

Hardware requirements:

File storage is attached to the webserver. (Running the redirector without
attached file storage is not implemented, but is being considered.) The
openSUSE project currently hosts more than 700,000 files using 850 GB.

The webserver needs few computational resources. If it has other tasks
besides redirecting, those other tasks mainly determine the needed resources.
However, to handle large numbers of redirects, such as hundreds per second,
it is recommended to run Apache in a hybrid prefork/worker configuration,
with e.g. 32 threads per process, which results in good pooling of database
connections.

The database server needs the most computational resources. For large file
trees they can be considerable, as with the openSUSE project, which redirects
for a total of more than 500,000 files. For performance reasons, the database
server must be able to hold the database and indices completely in memory.
The openSUSE redirector database is currently served by a 4-way Xeon 3.4 GHz
with 4 GB of RAM, which is sufficient for the MySQL server itself as well as
12 parallel scanner processes, and can handle 1000-2000 requests per second.

HA (High Availability) setup:

For HA, the webserver, the database server and the connecting infrastructure
need to be redundant. A geographically distributed array of redirectors would
be one way. Locally, it can be achieved by creating a hot standby for
failover, or by running identical nodes with load sharing / balancing.
This could be implemented by:

- deploying a hardware load balancer, which distributes requests to webserver
  nodes, or using clusterip on the webserver nodes themselves to make them do
  load sharing
- running two or more webserver nodes with an identical setup, for ease of
  maintenance
- running one or more MySQL servers in slave configuration. Database queries
  could be split so that write requests go only to the master, while read
  requests go to master and slaves.
- using mysql-proxy for load balancing and read/write splitting

Obviously, there are different ways of achieving HA, which will vary with
local requirements.

Links:

http://en.opensuse.org/Build_Service/Redirector
http://www.poeml.de/~poeml/talks/apachecon08-mirrors.pdf
http://www.poeml.de/~poeml/talks/redirector/
https://forgesvn1.novell.com/svn/opensuse/trunk/tools/download-redirector-v2
http://www.maxmind.com/app/ip-location
http://www.linux-ha.org/ClusterIP
http://forge.mysql.com/wiki/MySQL_Proxy
http://www.stdlib.net/~colmmacc/Apachecon-EU2005/scaling-apache-handout.pdf

Acknowledgement

This product includes GeoLite data created by MaxMind, available from
http://maxmind.com/