Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue40
msg135 (view) Author: poeml Date: 2010-03-08.20:44:36
The hash cache is too inflexible in its current on-disk format. It was fine in the past,
when Apache included the contents in v3 Metalinks. The snippets on disk were prepared
just for that. However, it's difficult to add further features like
- hashes in HTTP headers
- inclusion of hashes into RFC Metalinks (different format)
- inclusion of hashes into the mirror lists
- building a "hash server" (append .md5 to any URL and get the md5 sum)
So this is blocking several good things that could be done.
Issue 15 contains some ramblings about this, but let's track this change here.
I currently think that moving the hashes into the database might be best. It would definitely
be a flexible option without the need to invent an on-disk format and write parsers for it.
Also, it would make the data available to a web frontend.
Before the on-disk format is dropped, we can try how well it works with the database.
As a first step, I have now transferred all functionality from the external metalink-hasher
script into the "mb" tool. Thus, the database functionality is now available at no cost.
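To make the "hash server" idea mentioned above a little more concrete, here is a minimal Python sketch of how a request path with a hash suffix might be resolved. The function names, the set of suffixes, and the `known_hashes` mapping (a stand-in for whatever storage holds the hashes) are all assumptions for illustration, not part of MirrorBrain.

```python
# Hash types a request suffix may name (an illustrative selection).
SUFFIXES = ("md5", "sha1", "sha256")


def split_hash_request(url_path):
    """Split "/pub/a.iso.md5" into ("/pub/a.iso", "md5").

    Returns (url_path, None) when the path carries no known hash suffix.
    """
    base, dot, suffix = url_path.rpartition(".")
    if dot and base and suffix in SUFFIXES:
        return base, suffix
    return url_path, None


def answer_hash_request(url_path, known_hashes):
    """Return the stored digest for a hash request, or None.

    `known_hashes` is a hypothetical mapping: path -> {hash type: digest}.
    """
    base, algo = split_hash_request(url_path)
    if algo is None:
        return None
    return known_hashes.get(base, {}).get(algo)
```

A real server would also have to decide what to do when a file whose name already ends in `.md5` exists in the tree; this sketch ignores that case.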
msg148 (view) Author: poeml Date: 2010-03-10.00:17:41
In svn trunk, there is now working code that also saves the hashes to the
database. This seems like a good step forward. The code needs more testing to become
robust enough to be used by mod_mirrorbrain.
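As a rough illustration of what saving hashes to a database could look like, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in; the table name `file_hash`, its columns, and the helpers `open_hash_db` and `store_hashes` are assumptions made for this example, not MirrorBrain's actual schema or code.

```python
import hashlib
import sqlite3


def open_hash_db():
    """Create an in-memory database with a hypothetical hash table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_hash ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " md5 TEXT NOT NULL,"
        " sha1 TEXT NOT NULL,"
        " sha256 TEXT NOT NULL)"
    )
    return conn


def store_hashes(conn, path, data):
    """Compute several digests of `data` and store them under `path`."""
    row = (
        path,
        len(data),
        hashlib.md5(data).hexdigest(),
        hashlib.sha1(data).hexdigest(),
        hashlib.sha256(data).hexdigest(),
    )
    # INSERT OR REPLACE keeps exactly one row per path when a file changes.
    conn.execute("INSERT OR REPLACE INTO file_hash VALUES (?, ?, ?, ?, ?)", row)
    conn.commit()
    return row
```

Keeping one row per path means any consumer (Metalinks, mirror lists, a web frontend) can read the same record with an ordinary query instead of parsing a purpose-built file format.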
msg150 (view) Author: poeml Date: 2010-03-11.23:53:05
This is largely done.
Code in metalink-hasher seems to work well, and creates hashes in the
database in addition to the on-disk storage which we keep available for
transition.
The new hashes in the database are not yet cleaned up when they become
obsolete. Maybe "mb db vacuum" should become involved in the cleanup,
but it would need to look into the file tree for that. It is probably
necessary to let mb makehashes clean up per directory. Otherwise, files
could accumulate very quickly.
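A per-directory cleanup of the kind suggested above might be sketched as follows. This is only an illustration under stated assumptions: the `file_hash` table with `path` and `dirname` columns and the function `cleanup_directory` are hypothetical, and SQLite stands in for the real database.

```python
import os
import sqlite3

# Hypothetical schema: one row per hashed file, keyed by its relative path.
SCHEMA = "CREATE TABLE file_hash (path TEXT PRIMARY KEY, dirname TEXT NOT NULL)"


def cleanup_directory(conn, base_dir, rel_dir):
    """Delete hash rows for files that no longer exist directly in rel_dir.

    Returns the list of removed paths.
    """
    abs_dir = os.path.join(base_dir, rel_dir)
    try:
        on_disk = {
            name for name in os.listdir(abs_dir)
            if os.path.isfile(os.path.join(abs_dir, name))
        }
    except FileNotFoundError:
        on_disk = set()  # the directory itself is gone: all its rows are stale

    rows = conn.execute(
        "SELECT path FROM file_hash WHERE dirname = ?", (rel_dir,)
    ).fetchall()
    stale = [p for (p,) in rows if os.path.basename(p) not in on_disk]
    conn.executemany("DELETE FROM file_hash WHERE path = ?", [(p,) for p in stale])
    conn.commit()
    return stale
```

Working one directory at a time keeps the comparison between database rows and the file tree small, which is the reason a whole-database vacuum is awkward for this job.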
mod_mirrorbrain uses the new hashes from the database and falls back to
on-disk hashes for transition. The new hashes are already used in old
Metalinks, new Metalinks, and also in the mirror lists!
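The "database first, on-disk fallback" behaviour described above can be sketched in Python as follows. mod_mirrorbrain itself is an Apache module, so this is not its code; the table layout, the `<path>.md5` cache file naming, and the function `lookup_md5` are assumptions chosen only to show the lookup order.

```python
import os
import sqlite3


def lookup_md5(conn, hash_dir, path):
    """Return (md5, source) for path, preferring the database.

    source is "db", "disk", or None when no hash is known.
    Assumes a hypothetical table file_hash(path, md5).
    """
    row = conn.execute(
        "SELECT md5 FROM file_hash WHERE path = ?", (path,)
    ).fetchone()
    if row is not None:
        return row[0], "db"

    # Transitional fallback: a hypothetical on-disk cache file "<path>.md5".
    cache_file = os.path.join(hash_dir, path + ".md5")
    try:
        with open(cache_file, encoding="ascii") as fh:
            return fh.read().strip(), "disk"
    except OSError:
        return None, None
```

Returning the source alongside the hash makes it easy to see during the transition how many lookups still rely on the old cache.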
msg160 (view) Author: poeml Date: 2010-03-12.02:57:44
Note to self: need to check whether empty files (0 byte size) are still
handled correctly, or if they need a special case.
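For plain digests, a 0-byte file needs no special case when the file is read in chunks, as the following Python sketch shows. The function `hash_file` is a hypothetical helper, not the code in "mb", and the sketch does not cover piece-wise (torrent) or zsync hashes, where empty input may well need separate attention.

```python
import hashlib


def hash_file(path, chunk_size=1 << 20):
    """Return the hex MD5 and SHA-1 digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        # For a 0-byte file the loop body never runs, and the digests
        # are simply those of the empty input.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

For an empty file this yields the well-known digests of the empty string, `d41d8cd98f00b204e9800998ecf8427e` (MD5) and `da39a3ee5e6b4b0d3255bfef95601890afd80709` (SHA-1).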
msg182 (view) Author: poeml Date: 2010-04-23.03:03:42
What's also missing is a way to switch generation of the "expensive" hashes,
like torrents and zsync, off (or on) via /etc/mirrorbrain.conf. Maybe with a file
mask or list of directories.
msg204 (view) Author: poeml Date: 2010-09-01.16:13:33
Generation of hashes for zsync and torrents can now be (separately) switched off
in /etc/mirrorbrain.conf.
For the zsync hashes, the default is "off", because Apache currently allocates
large amounts of memory for this large data.
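Reading two such switches from an INI-style configuration could look like the Python sketch below. The section name `general` and the option names `zsync_hashes` and `torrent_hashes` are assumptions for illustration; the real option names in /etc/mirrorbrain.conf may differ. The zsync default of "off" follows the description above, while the torrent default of "on" is an assumption.

```python
import configparser

# Defaults used when an option is absent. zsync is off by default as
# described above; the torrent default here is an assumption.
DEFAULTS = {"zsync_hashes": "0", "torrent_hashes": "1"}


def read_hash_options(text):
    """Parse config text and return which expensive hashes to generate."""
    parser = configparser.ConfigParser()
    parser.read_dict({"general": DEFAULTS})
    parser.read_string(text)
    general = parser["general"]
    return {
        "zsync": general.getboolean("zsync_hashes"),
        "torrent": general.getboolean("torrent_hashes"),
    }
```

For example, `read_hash_options("[general]\nzsync_hashes = 1\n")` would enable zsync hashes while leaving the torrent setting at its default.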
On another note, empty files seem to be handled as they should.
Hence, I regard this bug as resolved.
(end of migrated issue)