Requirements
Software requirements
Frontend (the redirector):
- Apache HTTP server 2.2.x or newer
- libGeoIP, libapr_memcache, mod_memcache, and mod_form
- memcache daemon
- MySQL server. The tables must be InnoDB tables, because InnoDB, unlike MyISAM, offers row-level write locks. Since the InnoDB engine will be tuned for high performance, it makes sense to run a separate MySQL instance for this database. PostgreSQL should also work, but it has not been tested.
- Perl for the scanner process
- optional: Python, python-mysql, python-sqlobject for the mirror probe and for database maintenance
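The InnoDB requirement can be checked against `information_schema.TABLES`, whose `TABLE_NAME` and `ENGINE` columns report each table's storage engine. The following is a minimal Python sketch, assuming the rows have already been fetched for the redirector's schema (for example with python-mysql); the function name `non_innodb_tables` and the sample table names are hypothetical.

```python
"""Sketch: flag redirector tables that do not use the InnoDB engine."""

from typing import Iterable, List, Optional, Tuple

# Query to run against the redirector's MySQL instance (e.g. via python-mysql).
# TABLE_NAME, ENGINE, TABLE_SCHEMA and TABLE_TYPE are standard columns.
ENGINE_QUERY = (
    "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND TABLE_TYPE = 'BASE TABLE'"
)


def non_innodb_tables(rows: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Return the sorted names of tables whose engine is not InnoDB.

    InnoDB locks individual rows, so concurrent writers (such as parallel
    scanner processes) do not lock whole tables against each other.
    A missing (NULL) engine is treated as "not InnoDB".
    """
    return sorted(
        name for name, engine in rows
        if (engine or "").lower() != "innodb"
    )


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    sample = [("files", "InnoDB"), ("mirrors", "MyISAM"), ("stats", "InnoDB")]
    print(non_innodb_tables(sample))  # prints ['mirrors']
```

Any table reported this way can be converted with `ALTER TABLE <name> ENGINE=InnoDB`.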
Hardware requirements
The file storage is attached to the webserver. (Running the redirector without attached file storage is not implemented yet, but is being considered.) The openSUSE project currently hosts more than 700,000 files, totalling 850 GB. The webserver itself needs few computational resources; if it has tasks other than redirecting, those tasks largely determine the resources it needs. However, to handle a high rate of redirects (hundreds per second), it is recommended to run Apache with the worker MPM, a hybrid multi-process, multi-threaded configuration, with e.g. 32 threads per process. This allows database connections to be pooled efficiently.
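Why threads help with connection pooling can be shown with a little arithmetic. This Python sketch assumes a simplified model: under the prefork MPM every single-threaded process holds its own database connection, while with a threaded MPM each process keeps one connection pool shared by its threads (as Apache's mod_dbd does on threaded MPMs). It also assumes a request holds a connection only briefly, so a small pool can serve many threads. The function name, the pool size of 4 and the load of 256 simultaneous requests are hypothetical, not measurements.

```python
"""Sketch: compare database connection counts for prefork vs. worker MPMs."""

import math


def db_connections(concurrent_requests: int, threads_per_process: int,
                   pool_per_process: int) -> int:
    """Upper bound on DB connections needed for `concurrent_requests`.

    Simplified model: each Apache child process owns one pool of at most
    `pool_per_process` connections shared by its threads. A process never
    needs more connections than it has threads.
    """
    processes = math.ceil(concurrent_requests / threads_per_process)
    return processes * min(pool_per_process, threads_per_process)


if __name__ == "__main__":
    busy = 256  # hypothetical number of simultaneous requests
    # prefork: one thread per process, hence one connection per process
    print("prefork:", db_connections(busy, threads_per_process=1, pool_per_process=1))
    # worker: 32 threads per process sharing a hypothetical pool of 4
    print("worker:", db_connections(busy, threads_per_process=32, pool_per_process=4))
```

Under these assumptions the prefork setup needs 256 connections, while the threaded setup needs 8 processes × 4 pooled connections = 32, which is far easier on the database server.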
The database server needs the most computational resources. For large file trees they can be considerable; the openSUSE project, for example, redirects for a total of more than 500,000 files. For performance reasons, the database server must be able to hold the database and its indices completely in memory. The openSUSE redirector database is currently served by a 4-way 3.4 GHz Xeon with 4 GB of RAM, which is sufficient for the MySQL server itself as well as for 12 parallel scanner processes, and can handle 1000 to 2000 requests per second.
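Whether the database and its indices fit in memory can be estimated from the `DATA_LENGTH` and `INDEX_LENGTH` columns of `information_schema.TABLES`, compared against the InnoDB buffer pool (`SHOW VARIABLES LIKE 'innodb_buffer_pool_size'`). The following Python sketch assumes those values have already been fetched; the function name `fits_in_memory`, the 20 % `headroom` factor and the sample sizes are hypothetical.

```python
"""Sketch: check whether InnoDB data and indexes fit in the buffer pool."""

from typing import Iterable, Optional, Tuple

# Standard information_schema columns; run via python-mysql or the mysql client.
SIZE_QUERY = (
    "SELECT DATA_LENGTH, INDEX_LENGTH FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND ENGINE = 'InnoDB'"
)


def fits_in_memory(rows: Iterable[Tuple[Optional[int], Optional[int]]],
                   buffer_pool_bytes: int, headroom: float = 0.2) -> bool:
    """True if data + index size, plus a safety margin, fits the buffer pool.

    `headroom` is a hypothetical safety factor (20 % by default) for growth
    of the file tree; tune it to local needs. NULL sizes count as 0.
    """
    total = sum((data or 0) + (index or 0) for data, index in rows)
    return total * (1 + headroom) <= buffer_pool_bytes


if __name__ == "__main__":
    MiB = 1024 ** 2
    GiB = 1024 ** 3
    # Hypothetical table sizes: (DATA_LENGTH, INDEX_LENGTH) in bytes
    sample = [(600 * MiB, 900 * MiB), (50 * MiB, 20 * MiB)]
    print(fits_in_memory(sample, buffer_pool_bytes=2 * GiB))
```

With these sample figures, 1570 MiB of data and indexes plus 20 % headroom (about 1884 MiB) fit into a 2 GiB buffer pool, so the check succeeds.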
HA (High Availability) setup
For HA, the webserver, the database server and the connecting infrastructure need to be redundant. One way is a geographically distributed array of redirectors. Locally, HA can be achieved with a hot standby for failover, or by running identical nodes with load sharing or load balancing. This could be implemented by:
- a hardware load balancer that distributes requests to the webserver nodes, or iptables CLUSTERIP on the webserver nodes themselves, to make them share the load
- two or more webserver nodes with an identical setup, for ease of maintenance
- one or more MySQL servers in slave configuration. Database queries could be split so that write requests go only to the master, while read requests go to the master and the slaves.
- mysql-proxy doing the load balancing and read/write splitting
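The read/write split described for the MySQL master and slaves boils down to a simple routing rule. The following Python sketch illustrates that rule only, under stated assumptions: statements are classified by their first keyword, `SELECT ... FOR UPDATE` is treated as a write because it takes row locks, and reads are spread round-robin over the master and all slaves. The class `ReadWriteRouter` and the server names are hypothetical; this is not how mysql-proxy itself is configured.

```python
"""Sketch: route SQL statements to a MySQL master or its slaves."""

import itertools
from typing import Iterator, List

# Statements treated as read-only; everything else is considered a write.
READ_ONLY_KEYWORDS = ("select", "show", "describe", "explain")


class ReadWriteRouter:
    """Hypothetical router: writes go to the master only, reads are
    distributed round-robin across the master and all slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self._readers: Iterator[str] = itertools.cycle([master, *slaves])

    def route(self, sql: str) -> str:
        """Return the name of the server that should execute `sql`."""
        words = sql.lstrip().split(None, 1)
        first = words[0].lower() if words else ""
        # SELECT ... FOR UPDATE takes row locks, so it must hit the master.
        if first in READ_ONLY_KEYWORDS and "for update" not in sql.lower():
            return next(self._readers)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-slave1", "db-slave2"])
    print(router.route("UPDATE files SET size = 0 WHERE id = 1"))  # db-master
    for _ in range(3):
        print(router.route("SELECT id FROM files"))  # db-master, db-slave1, db-slave2
```

A real deployment would also have to keep reads that follow a write in the same transaction on the master, since slaves may lag behind.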
There are, of course, different ways of achieving HA, and the right one depends on local requirements.