Issue 15

Title: add support for mirrors and checksums in Link headers
Priority: wish        Status: chatting
Superseder:           Nosy List: ant, poeml
Assigned To: poeml    Keywords:

Created on 2009-10-09.00:49:17 by poeml, last changed by poeml.

Messages
msg220 Author: poeml Date: 2010-09-05.23:52:30
A lot of groundwork for this has been done:

- issue 40 is resolved. The hash cache was moved from file-based storage into
the database.
- all the required tools are in place.

Now it is just a matter of using the data and writing it to HTTP headers.

This should maybe be made configurable, because it makes Apache use a few more
resources, which may or may not be desired. (Of course, it would be cool if it
just "happens", and the default should probably be that the headers are
included.)
msg159 Author: poeml Date: 2010-03-12.02:51:00
Issue 40, which was blocking this issue, is mostly done.
msg136 Author: poeml Date: 2010-03-08.20:46:21
Cf. issue #40, where the hash cache redesign is tracked now.
msg109 Author: poeml Date: 2009-12-11.21:43:08
Additional thought about the hash store file format: an identifier and a version
number should be added to the beginning.
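A minimal sketch of such a header, assuming the store is read as a byte string and the header is a single NUL-terminated "<identifier> <version>" record. The identifier `mb-hashes` and the helper names are hypothetical, chosen only for illustration:

```python
from typing import Tuple

MAGIC = "mb-hashes"  # hypothetical file identifier
VERSION = 1          # bump whenever the layout after the header changes


def add_header(payload: bytes, version: int = VERSION) -> bytes:
    """Prefix the payload with '<identifier> <version>' and a NUL terminator."""
    return f"{MAGIC} {version}\0".encode("ascii") + payload


def split_header(data: bytes) -> Tuple[int, bytes]:
    """Check identifier and version; return (version, payload)."""
    header, sep, payload = data.partition(b"\0")
    if not sep:
        raise ValueError("no header terminator found")
    magic, _, version = header.decode("ascii", errors="replace").partition(" ")
    if magic != MAGIC:
        raise ValueError("not a hash store file")
    if not version.isdigit():
        raise ValueError(f"bad version field {version!r}")
    if int(version) > VERSION:
        # Written by a newer tool: refuse rather than misparse.
        raise ValueError(f"unsupported version {version}")
    return int(version), payload
```

With the version check in place, a reader can recognize a file in an older or unknown layout and regenerate it instead of misreading it.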
msg108 Author: poeml Date: 2009-12-11.21:41:21
Most of this can be added easily; there is one thing that I need to change first, though:

At the moment, Metalink hashes are cached to disk in a form that is suitable for direct inclusion 
into Metalinks:

 # cat /srv/metalink-hashes/samba/srv/mirrors/samba/pub/samba/MIRRORS.txt.size_68
      <verification>
        <hash type="md5">e8ad5924dcef6c25a3455230c46a4caa</hash>
        <hash type="sha1">8094b506b9115abc2eb174a35e8bc84b8f72f0a9</hash>
        <hash type="sha256">2104ed8aa2f4af920c1669585eeaabb0c94ace6cb92e67cbd3ab04b2bb7356b5</hash>
        <pieces length="262144" type="sha1">
            <hash piece="0">8094b506b9115abc2eb174a35e8bc84b8f72f0a9</hash>
        </pieces>
      </verification>

Or, an example with a PGP signature:

      <verification>
        <signature type="pgp" file="openSUSE-11.2-NET-i586.iso.asc">
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQBK+ATuqE7a6JyACsoRArqBAJ0ViDK4IUQPKYz1qbXivJielVCkDACf
VCZ4fiIU8640lArqhzu9QuTRL0s=
=2F9I
-----END PGP SIGNATURE-----

        </signature>
        <hash type="md5">bfb98c4b2e079f9d147b53d3fc9495c5</hash>
        <hash type="sha1">8e5854c6e00b7a0f124c3060da4184e6d5f8d6b2</hash>
        <hash type="sha256">9de4a0b44f7c474929ece46481a783500078fb3b2f05b885069a74aff198fc7f</hash>
        <pieces length="524288" type="sha1">
            <hash piece="0">8fc60d0c4918bf53ad7858196633b05a4ca4b060</hash>
            <hash piece="1">a4860b0e708063900253be974e7ebcd9a50c660b</hash>
            [...]
            <hash piece="214">11ddc7c352d9f017f6c00822c3d4be540dc9ad38</hash>
            <hash piece="215">e5ec3ff4693175b7da90f3bc1fdf1ee5d7f3f20a</hash>
            <hash piece="216">897256b6709e1a4da9daba92b6bde39ccfccd8c1</hash>
        </pieces>
      </verification>

This has worked well so far, because Apache only has to open the file and can write it
directly to the network while sending the Metalink.

Now we need access to the individual pieces of data in that snippet. Thus, the storage format
needs to be changed (or Apache would have to parse the XML, but that sounds like an ugly option).
I'm thinking of a text-based format, optimized for parsing with low overhead.

Maybe a simple series of null-terminated strings, with no newlines as separators (so that the
multi-line PGP signature string can be stored without modification):

hash <type> <hash string>\0
hash <type> <hash string>\0
hashpieces <type> <length>\0
hashpiece 0 <hash string>\0
hashpiece 1 <hash string>\0
hashpiece 2 <hash string>\0
pgp <signature string with embedded newlines>\0
EOF
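A minimal Python sketch of writing and reading the record layout proposed above. It assumes UTF-8 text, single spaces between fields, and reads `EOF` as simply the end of the data (no literal marker is written). `HashInfo`, `serialize` and `parse` are hypothetical names, not existing code:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HashInfo:
    hashes: Dict[str, str] = field(default_factory=dict)  # hash type -> hex digest
    piece_type: Optional[str] = None
    piece_length: Optional[int] = None
    pieces: List[str] = field(default_factory=list)       # piece hashes, in order
    pgp: Optional[str] = None                             # armored signature text


def serialize(info: HashInfo) -> bytes:
    records = [f"hash {t} {h}" for t, h in info.hashes.items()]
    if info.pieces:
        records.append(f"hashpieces {info.piece_type} {info.piece_length}")
        records += [f"hashpiece {i} {h}" for i, h in enumerate(info.pieces)]
    if info.pgp is not None:
        records.append(f"pgp {info.pgp}")
    # Every record, including the last one, ends with a NUL byte.
    return b"".join(r.encode("utf-8") + b"\0" for r in records)


def parse(data: bytes) -> HashInfo:
    info = HashInfo()
    for raw in data.split(b"\0"):
        if not raw:
            continue  # the empty chunk after the final terminator
        keyword, _, rest = raw.decode("utf-8").partition(" ")
        if keyword == "hash":
            htype, hexdigest = rest.split(" ", 1)
            info.hashes[htype] = hexdigest
        elif keyword == "hashpieces":
            ptype, length = rest.split(" ", 1)
            info.piece_type, info.piece_length = ptype, int(length)
        elif keyword == "hashpiece":
            index, hexdigest = rest.split(" ", 1)
            if int(index) != len(info.pieces):
                raise ValueError(f"piece {index} out of order")
            info.pieces.append(hexdigest)
        elif keyword == "pgp":
            info.pgp = rest  # newlines survive: only NUL ends a record
        else:
            raise ValueError(f"unknown record type {keyword!r}")
    return info
```

Hex digests never contain spaces or NUL bytes, so only the `pgp` record needs the "rest of the record" treatment, and no escaping is required anywhere.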

Maybe, when looking around a bit in Apache, something else will catch the eye that is better
suited for reading in the data quickly.

Once the data can be read in quickly, and only once, during the request processing phase, it's
available for all sorts of wonderful things. In particular, we can also easily implement a
"checksum server" that returns a checksum for any file when .md5, .sha1 or .sha256 is appended to
a URL. And of course we can send instance digests, as requested here.
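A sketch of both ideas, assuming the cached hashes for a file are available as a dict of hex digests keyed by type ("md5", "sha1", "sha256", as in the Metalink snippets above). The helper names are hypothetical. The Digest tokens `MD5` and `SHA` are those of RFC 3230 (base64 of the raw digest), and `SHA-256` was added by RFC 5843:

```python
import base64
import posixpath
from typing import Dict, Iterable, Optional, Tuple

# URL suffix -> hash type, using the type names from the cached data
SUFFIXES = {".md5": "md5", ".sha1": "sha1", ".sha256": "sha256"}

# hash type -> digest-algorithm token for the HTTP Digest header
DIGEST_TOKENS = {"md5": "MD5", "sha1": "SHA", "sha256": "SHA-256"}


def split_checksum_request(url_path: str) -> Optional[Tuple[str, str]]:
    """'/a/b.iso.sha1' -> ('/a/b.iso', 'sha1'); None if no known suffix."""
    for suffix, htype in SUFFIXES.items():
        if url_path.endswith(suffix) and len(url_path) > len(suffix):
            return url_path[: -len(suffix)], htype
    return None


def checksum_body(file_path: str, hexdigest: str) -> str:
    """Body in the '<hex>  <name>' layout that md5sum/sha1sum -c accept."""
    return f"{hexdigest}  {posixpath.basename(file_path)}\n"


def instance_digest(
    hashes: Dict[str, str], order: Iterable[str] = ("sha1", "md5", "sha256")
) -> str:
    """Value for a 'Digest:' header; the cache holds hex, the header wants base64."""
    parts = []
    for htype in order:
        if htype in hashes:
            raw = bytes.fromhex(hashes[htype])
            parts.append(f"{DIGEST_TOKENS[htype]}={base64.b64encode(raw).decode('ascii')}")
    return ", ".join(parts)
```

A real handler would first check whether a file with the literal requested name (an existing `.md5` file, say) is present, so that the checksum service does not shadow it.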
msg37 Author: poeml Date: 2009-10-09.00:49:17
There is a proposal for transmitting information about mirrors and checksums to clients using Link 
headers which looks like:

  Link: <http://www2.example.com/example.ext>; rel="duplicate"
  Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
  Link: <http://example.com/example.ext.torrent>; rel="describedby"; type="application/x-bittorrent"
  Link: <http://example.com/example.ext.metalink>; rel="describedby"; type="application/metalink4+xml"
  Link: <http://example.com/example.ext.asc>; rel="describedby"; type="application/pgp-signature"
  Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

See IETF draft http://tools.ietf.org/html/draft-bryan-metalinkhttp
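A small sketch of producing header lines in the shape shown above from a list of mirror URLs and a list of (URL, media type) pairs for the describing resources. The function name and its inputs are hypothetical; the mirror list is assumed to be supplied by whatever already selects mirrors for a redirect. It emits one header line per link, exactly as in the example:

```python
from typing import Iterable, List, Tuple


def link_headers(
    mirrors: Iterable[str],
    described_by: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Build (name, value) pairs for rel="duplicate" and rel="describedby" links."""
    headers = [("Link", f'<{url}>; rel="duplicate"') for url in mirrors]
    headers += [
        ("Link", f'<{url}>; rel="describedby"; type="{mime}"')
        for url, mime in described_by
    ]
    return headers


if __name__ == "__main__":
    for name, value in link_headers(
        ["http://www2.example.com/example.ext", "ftp://ftp.example.com/example.ext"],
        [("http://example.com/example.ext.metalink", "application/metalink4+xml")],
    ):
        print(f"{name}: {value}")
```

Since several headers share the name `Link`, an Apache module would have to add them rather than overwrite them, i.e. use `apr_table_add` instead of `apr_table_set` on `r->headers_out`.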
History
Date                 User   Action  Args
2010-09-05 23:53:14  poeml  set     assignedto: poeml
2010-09-05 23:52:30  poeml  set     messages: + msg220
2010-03-12 02:51:00  poeml  set     messages: + msg159
2010-03-08 20:46:21  poeml  set     messages: + msg136
2009-12-11 21:43:08  poeml  set     messages: + msg109
2009-12-11 21:41:21  poeml  set     messages: + msg108
2009-11-04 16:32:24  ant    set     nosy: + ant
2009-10-09 00:49:17  poeml  create