Hash cache needs to be more flexible #40

poeml · 2015-06-05T00:55:09Z


Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue40

Title: Hash cache needs to be more flexible
Priority: feature
Status: resolved
Superseder:
Nosy List: ant, poeml
Assigned To: poeml
Keywords:

msg135 (view) Author: poeml Date: 2010-03-08.20:44:36

The hash cache is too inflexible in its current on-disk format. It was fine in the past,
when Apache included the contents in v3 Metalinks; the snippets on disk were prepared
just for that. However, the format makes it difficult to add further features like

- hashes in HTTP headers
- inclusion of hashes into RFC Metalinks (different format)
- inclusion of hashes into the mirror lists
- building a "hash server" (append .md5 to any URL and get the md5 sum)

So this is blocking several good things that could be done.
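
To make the "hash server" idea concrete, here is a minimal sketch of how a request path could be split into the underlying file path and the requested hash type. The function name `split_hash_request` and the set of suffixes are assumptions for illustration only; the issue does not say which suffixes would be supported.

```python
import posixpath

# Hypothetical mapping of URL suffixes to hash types; not taken from MirrorBrain.
HASH_SUFFIXES = {".md5": "md5", ".sha1": "sha1", ".sha256": "sha256"}


def split_hash_request(url_path):
    """Return (file_path, hash_type), or (url_path, None) for a normal request."""
    base, ext = posixpath.splitext(url_path)
    hash_type = HASH_SUFFIXES.get(ext.lower())
    if hash_type:
        return base, hash_type
    return url_path, None
```

A request for `/pub/foo.iso.md5` would resolve to the file `/pub/foo.iso` with hash type `md5`. A real server would also have to decide what to do when a file literally named `foo.iso.md5` exists in the tree.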

Issue 15 contains some ramblings about this, but let's track this change here.

I currently think that moving the hashes into the database might be best. It would definitely
be a flexible option, without the need to invent an on-disk format and write parsers for it.
Also, it would make the data available to a web frontend.
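
A minimal sketch of what storing hashes in a database could look like. It uses Python's built-in `sqlite3` only to stay self-contained (an assumption; it is not the database used here), and the table `file_hash`, its columns, and the function names are hypothetical.

```python
import hashlib
import sqlite3


def open_hash_db(db_path=":memory:"):
    """Open the database and create the (hypothetical) hash table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_hash ("
        " path TEXT PRIMARY KEY, size INTEGER NOT NULL,"
        " md5 TEXT, sha1 TEXT, sha256 TEXT)"
    )
    return conn


def compute_hashes(path, blocksize=1 << 20):
    """Read the file once and feed every digest from the same buffer."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    size = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            size += len(block)
            for digest in digests.values():
                digest.update(block)
    return size, {name: d.hexdigest() for name, d in digests.items()}


def store_hashes(conn, path):
    """Insert or refresh the row for one file."""
    size, h = compute_hashes(path)
    conn.execute(
        "INSERT OR REPLACE INTO file_hash (path, size, md5, sha1, sha256) "
        "VALUES (?, ?, ?, ?, ?)",
        (path, size, h["md5"], h["sha1"], h["sha256"]),
    )
    conn.commit()
```

Once the hashes sit in a table like this, each consumer (HTTP headers, Metalinks, mirror lists, a web frontend) can query the same rows and format them as it needs, which is the flexibility argued for above.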

Before the on-disk format is dropped, we can test how well the database approach works.

As a first step, I have now transferred all functionality from the external metalink-hasher
script into the "mb" tool. Thus, the database functionality is now available there at no extra cost.

msg148 (view) Author: poeml Date: 2010-03-10.00:17:41

In svn trunk, there is now working code that saves the hashes also to the
database. Seems like a good step forward. The code needs more testing to become
robust enough to be used by mod_mirrorbrain.

msg150 (view) Author: poeml Date: 2010-03-11.23:53:05

This is largely done.

Code in metalink-hasher seems to work well, and creates hashes in the
database in addition to the on-disk storage which we keep available for
transition.

The new hashes in the database are not yet cleaned up when they become
obsolete. Maybe "mb db vacuum" should become involved in the cleanup,
but it would need to look into the file tree for that. It will probably
be necessary to let mb makehashes clean up per directory. Otherwise
obsolete hashes could accumulate very quickly.
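
A rough sketch of what a per-directory cleanup could look like. It assumes a `sqlite3` connection and a hypothetical table `file_hash` with a `path` column; neither the table nor the function name `cleanup_directory` comes from MirrorBrain.

```python
import os


def cleanup_directory(conn, directory):
    """Delete hash rows for files that no longer exist directly inside `directory`.

    `conn` is a sqlite3 connection to a database containing the hypothetical
    table file_hash(path TEXT PRIMARY KEY, ...).
    """
    directory = os.path.normpath(directory)
    rows = conn.execute("SELECT path FROM file_hash").fetchall()
    # A row is stale if it belongs to this directory but its file is gone.
    stale = sorted(
        path for (path,) in rows
        if os.path.dirname(path) == directory and not os.path.isfile(path)
    )
    conn.executemany("DELETE FROM file_hash WHERE path = ?", [(p,) for p in stale])
    conn.commit()
    return stale
```

Restricting the check to one directory at a time means the cleanup only has to look at the part of the file tree that is being hashed anyway, rather than walking the whole tree.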

mod_mirrorbrain uses the new hashes from the database and falls back to
on-disk hashes for transition. The new hashes are already used in old
Metalinks, new Metalinks, and also in the mirror lists!
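
The lookup order during the transition can be illustrated as follows. mod_mirrorbrain is an Apache module, so this Python sketch only shows the logic, not the actual implementation; `db_lookup` and `disk_lookup` are hypothetical callables that return the hashes for a path, or `None` if they have none.

```python
def lookup_hashes(path, db_lookup, disk_lookup):
    """Prefer hashes from the database; fall back to the on-disk cache."""
    for source, lookup in (("database", db_lookup), ("disk", disk_lookup)):
        result = lookup(path)
        if result is not None:
            return source, result
    # Neither store knows the file.
    return None, None


# Usage with two dictionaries standing in for the real stores:
db = {"/pub/new.iso": {"md5": "..."}}
disk = {"/pub/old.iso": {"md5": "..."}}
source, hashes = lookup_hashes("/pub/old.iso", db.get, disk.get)
```

Here `source` is `"disk"`, because the file is only known to the old cache. Once every file has been rehashed into the database, the fallback branch is never taken and can be removed.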

msg160 (view) Author: poeml Date: 2010-03-12.02:57:44

Note to self: need to check whether empty files (0 byte size) are still
handled correctly, or if they need a special case.
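
To illustrate why empty files are worth checking: a whole-file digest of zero bytes is well defined, but anything computed per block or per piece yields an empty list. The sketch below uses only `hashlib`; the function names and the piece size are illustrative and not taken from MirrorBrain.

```python
import hashlib


def whole_file_md5(data):
    """MD5 over the full content; also defined for zero bytes."""
    return hashlib.md5(data).hexdigest()


def piece_sha1s(data, piece_size):
    """SHA-1 of each fixed-size piece; an empty input produces no pieces."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


print(whole_file_md5(b""))   # d41d8cd98f00b204e9800998ecf8427e
print(piece_sha1s(b"", 4))   # []
```

Code that consumes the piece list has to cope with it being empty, which is the kind of special case meant here.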

msg182 (view) Author: poeml Date: 2010-04-23.03:03:42

What's also missing is a way to switch off (or on) (per /etc/mirrorbrain.conf)
generation of the "expensive" hashes, like torrents and zsync. Maybe with a file
mask or list of directories.

msg204 (view) Author: poeml Date: 2010-09-01.16:13:33

Generation of hashes for zsync and torrents can now be (separately) switched off
in /etc/mirrorbrain.conf.

For the zsync hashes, the default is "off", because Apache currently allocates
large amounts of memory for this large data.
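
As a sketch of how such switches could be read from an INI-style file like /etc/mirrorbrain.conf: the option names `zsync_hashes` and `chunked_hashes`, the section name `main`, and the "on" default for torrent hashes are assumptions for illustration and should be checked against the MirrorBrain documentation. Only the "off" default for zsync is stated above.

```python
import configparser

# Hypothetical config fragment; section and option names are assumptions.
SAMPLE_CONF = """
[main]
zsync_hashes = 0
chunked_hashes = 1
"""


def hash_settings(conf_text, instance="main"):
    """Return which expensive hash types should be generated."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    return {
        # zsync defaults to off, as described in this issue.
        "zsync": cp.getboolean(instance, "zsync_hashes", fallback=False),
        # The torrent default is not stated in the issue; "on" is assumed here.
        "torrent": cp.getboolean(instance, "chunked_hashes", fallback=True),
    }
```

The hashing step would then consult this dictionary and skip the zsync or torrent pass when the corresponding value is false.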

On another note, empty files seem to be handled as they should.

Hence, I regard this bug resolved.

History
Date                User  Action Args
2010-09-01 16:13:33 poeml set    status: testing -> resolved
                                 messages: + msg204
2010-04-23 03:05:37 poeml set    status: in-progress -> testing
2010-04-23 03:03:42 poeml set    messages: + msg182
2010-03-29 06:44:31 ant   set    nosy: + ant
2010-03-12 02:57:45 poeml set    messages: + msg160
2010-03-11 23:53:06 poeml set    messages: + msg150
2010-03-10 00:17:41 poeml set    messages: + msg148
2010-03-08 20:44:37 poeml create

(end of migrated issue)


poeml added enhancement resolved labels Jun 5, 2015

poeml mentioned this issue Jun 5, 2015

add support for mirrors and checksums in Link headers (RFC 6249) #11

Closed

poeml closed this as completed Jun 5, 2015
