
mod_asn looks up the AS and network prefix of IP address.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mod_asn is an Apache module doing lookups of the autonomous system (AS) and the
network prefix that an IP address is contained in.

It is written with scalability in mind. To do high-speed lookups, it uses the
PostgreSQL ip4r datatype that is indexable with a Patricia Trie algorithm to
store network prefixes.

It comes with a script to create such a database and update it with snapshots
from a router's "view of the world".

The module sets the looked-up data as env table variables, so that other Apache
modules can use it or so that it can be logged. It can also send the data to
the client as response headers.


Example HTTP response headers:

HTTP/1.1 200 OK
Date: Thu, 12 Feb 2009 23:24:33 GMT
Server: Apache/2.2.11 (Linux/SUSE)
X-Prefix: 83.133.0.0/16
X-AS: 13237
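
A client that receives these headers could pick them out as in the following
minimal sketch. It assumes the response text shown above is already available
as a string; the function name as_info is made up for illustration and is not
part of mod_asn.

```python
from email.parser import Parser

RAW_RESPONSE = """\
HTTP/1.1 200 OK
Date: Thu, 12 Feb 2009 23:24:33 GMT
Server: Apache/2.2.11 (Linux/SUSE)
X-Prefix: 83.133.0.0/16
X-AS: 13237
"""


def as_info(raw_response):
    """Return (prefix, AS number) from the X-Prefix and X-AS headers."""
    # Drop the status line; the remainder is a block of "Name: value" headers.
    _status_line, _, header_text = raw_response.partition("\n")
    headers = Parser().parsestr(header_text, headersonly=True)
    asn = headers.get("X-AS")
    return headers.get("X-Prefix"), int(asn) if asn is not None else None


prefix, asn = as_info(RAW_RESPONSE)
```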



Performance
~~~~~~~~~~~

The database with all ~250,000 prefixes is about 20-30MB in size in the form of
a PostgreSQL database. Without any tuning, it is able to do >3000 lookups per
second on a MacBook Pro (tested with random IPs, a single connection, and a
client written in Python running on the same machine).
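
A measurement of this kind can be sketched as follows. This is not the client
that produced the number above, only an illustration of the approach: the
function names are hypothetical, and dummy_lookup is a placeholder for a real
query sent to the database over a single connection.

```python
import ipaddress
import random
import time


def random_ipv4(rng):
    """Return a uniformly random IPv4 address as a dotted-quad string."""
    return str(ipaddress.IPv4Address(rng.getrandbits(32)))


def lookups_per_second(lookup, count, seed=0):
    """Call lookup(ip) for `count` random addresses and return the rate."""
    rng = random.Random(seed)
    # Generate the addresses up front so only the lookups are timed.
    ips = [random_ipv4(rng) for _ in range(count)]
    start = time.perf_counter()
    for ip in ips:
        lookup(ip)
    elapsed = time.perf_counter() - start
    return count / max(elapsed, 1e-9)


def dummy_lookup(ip):
    """Placeholder so the sketch runs on its own; does no real work."""
    return None


rate = lookups_per_second(dummy_lookup, 1000)
```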

The Apache module is extremely lightweight. 



Design notes
~~~~~~~~~~~~

Performed with a Patricia Trie algorithm, the lookup is very efficient. The
Patricia Trie is a radix tree that works its way from bit to bit, starting at
the most significant bit. At each bit, there are two alternative "paths". Or
put another way, the space of prefixes is roughly divided into two halves at
each point. The ip4r datatype achieves this by implementing an index that works
this way. Without the index, a full table scan would be required, plus a
bitmask prefix match for each of the ~250,000 candidate rows.
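
The bit-by-bit walk can be illustrated with a small sketch. This is a plain
(uncompressed) binary trie for IPv4, a simplified relative of a Patricia Trie,
and it is not how the ip4r index is implemented. The class and method names
are made up for illustration; the prefixes are taken from the examples in this
document.

```python
import ipaddress


class PrefixTrie:
    """Uncompressed binary trie mapping IPv4 prefixes to AS numbers."""

    def __init__(self):
        # Each node is a list: [child for bit 0, child for bit 1, payload].
        self.root = [None, None, None]

    def insert(self, prefix, asn):
        net = ipaddress.ip_network(prefix)
        addr = int(net.network_address)
        node = self.root
        # Walk from the most significant bit down, one bit per level.
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = (str(net), asn)

    def lookup(self, ip):
        """Return (prefix, asn) of the longest matching prefix, or None."""
        addr = int(ipaddress.ip_address(ip))
        node = self.root
        best = node[2]
        for i in range(32):
            node = node[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # a longer, more specific match
        return best


trie = PrefixTrie()
trie.insert("83.133.0.0/16", 13237)
trie.insert("130.57.0.0/16", 3680)
trie.insert("130.57.19.0/24", 3680)
```

A lookup visits at most 32 nodes regardless of how many prefixes are stored,
which is the property that makes this kind of structure attractive compared to
scanning every row.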

"Conventional" storage in databases is possible with a workaround, e.g. with
two long integers denoting each prefix in a MySQL database. But this would
require an SQL "between" query. An additional column would be needed to store
the prefix length, in order to find the closest match (the narrowest prefix).
The built-in inet/cidr data type in PostgreSQL doesn't help either because it
can't be indexed. With conventional methods, only about 30 lookups per second
can be achieved with a database.
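
The workaround described above can be sketched as follows. SQLite (from the
Python standard library) stands in for MySQL here so that the example is
self-contained; the table name, column names and helper functions are
hypothetical. It only illustrates the shape of the query, not its performance.

```python
import ipaddress
import sqlite3


def prefix_row(prefix, asn):
    """Turn a CIDR prefix into (first_ip, last_ip, prefix_len, asn)."""
    net = ipaddress.ip_network(prefix)
    return (int(net.network_address), int(net.broadcast_address),
            net.prefixlen, asn)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prefixes "
    "(first_ip INTEGER, last_ip INTEGER, prefix_len INTEGER, asn INTEGER)"
)
conn.executemany(
    "INSERT INTO prefixes VALUES (?, ?, ?, ?)",
    [
        prefix_row("83.133.0.0/16", 13237),
        prefix_row("130.57.0.0/16", 3680),
        prefix_row("130.57.19.0/24", 3680),
    ],
)


def lookup(ip):
    """Return (prefix, asn) of the narrowest range containing ip, or None."""
    addr = int(ipaddress.ip_address(ip))
    row = conn.execute(
        "SELECT first_ip, prefix_len, asn FROM prefixes "
        "WHERE ? BETWEEN first_ip AND last_ip "
        # The extra prefix_len column picks the most specific match.
        "ORDER BY prefix_len DESC LIMIT 1",
        (addr,),
    ).fetchone()
    if row is None:
        return None
    first_ip, prefix_len, asn = row
    return f"{ipaddress.ip_address(first_ip)}/{prefix_len}", asn
```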

Having the data in a real database makes it accessible for other purposes as
well; it is easy to query the list of prefixes that an AS announces, for
instance. In addition, storage in the database makes it simple to change and
update the data (or even completely replace it): doing this in a transaction
does not block running queries.
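
Both ideas can be sketched in a few lines. SQLite is used below only to keep
the example self-contained; it does not have the ip4r type, and its locking
behaviour differs from PostgreSQL, so the sketch shows the pattern rather than
the non-blocking property. The table name, column names and function names are
illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pfx2asn (pfx TEXT, asn INTEGER)")
conn.executemany(
    "INSERT INTO pfx2asn VALUES (?, ?)",
    [
        ("83.133.0.0/16", 13237),
        ("130.57.0.0/16", 3680),
        ("130.57.19.0/24", 3680),
    ],
)
conn.commit()


def prefixes_for_as(asn):
    """List the prefixes recorded for one autonomous system."""
    rows = conn.execute(
        "SELECT pfx FROM pfx2asn WHERE asn = ? ORDER BY pfx", (asn,)
    )
    return [pfx for (pfx,) in rows]


def replace_all(rows):
    """Swap in a complete new data set inside a single transaction."""
    # The connection context manager commits on success and rolls back
    # if an exception is raised, so the old data survives a failed update.
    with conn:
        conn.execute("DELETE FROM pfx2asn")
        conn.executemany("INSERT INTO pfx2asn VALUES (?, ?)", rows)


before = prefixes_for_as(3680)
replace_all([("130.57.19.0/24", 3680)])
after = prefixes_for_as(3680)
```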

For usage outside of Apache, a small libpq-based standalone daemon could be
written that queries the database. Alternatively, a small handler could be
written for mod_asn that does nothing but read an IP address from a request
body (or URL) and return the result.

One argument for the ip4r data type in PostgreSQL is that it is IPv6-ready.
Some IPv6 autonomous systems already exist (about 800 as of the beginning of
2009).


Usage with MirrorBrain
~~~~~~~~~~~~~~~~~~~~~~

mod_asn can support mod_mirrorbrain (see http://mirrorbrain.org).
mod_mirrorbrain can use the data (set in the subprocess environment) for its
mirror selection algorithm.

In addition, the database can be queried with the MirrorBrain tool set:

 # mb iplookup mirror.susestudio.com
130.57.19.0/24 (AS3680)
 # mb iplookup mirror.susestudio.com --all-prefixes
130.57.19.0/24 (AS3680)
130.57.0.0/16, 130.57.0.0/20, 130.57.19.0/24, 130.57.32.0/21, 137.65.0.0/16,
147.2.0.0/17, 151.155.0.0/16, 164.99.0.0/16, 192.31.114.0/24, 192.94.118.0/24,
192.108.102.0/24, 192.149.26.0/24, 195.109.215.0/24, 212.153.69.0/24


