Issue33

Title mirrorprobe: socket timeout doesn't trigger in some cases
Priority bug Status in-progress
Superseder Nosy List poeml
Assigned To poeml Keywords

Created on 2009-12-09.22:40:50 by poeml, last changed by poeml.

Messages
msg103 (view) Author: poeml Date: 2009-12-09.22:45:12
strace brings light into the dark:

13727 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("208.209.50.16")}, 
16) = -1 EINPROGRESS (Operation now in progress)
13727 poll([{fd=5, events=POLLOUT}], 1, 20000 <unfinished ...>
13725 <... select resumed> )            = 0 (Timeout)
13725 gettimeofday({1260397081, 562526}, NULL) = 0
13725 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13725 write(3, "Dec 09 23:18:01 ooo DEBUG    wai"..., 59) = 59
13725 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
13725 gettimeofday({1260397082, 566132}, NULL) = 0
13725 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13725 write(3, "Dec 09 23:18:02 ooo DEBUG    wai"..., 59) = 59
13725 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[...]
13727 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("208.209.50.19")}, 
16) = -1 EINPROGRESS (Operation now in progress)
13727 poll([{fd=5, events=POLLOUT}], 1, 20000 <unfinished ...>
13725 <... select resumed> )            = 0 (Timeout)
13725 gettimeofday({1260397101, 608601}, NULL) = 0
13725 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13725 write(3, "Dec 09 23:18:21 ooo DEBUG    wai"..., 59) = 59
13725 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
13725 gettimeofday({1260397102, 608581}, NULL) = 0
13725 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13725 write(3, "Dec 09 23:18:22 ooo DEBUG    wai"..., 59) = 59
13725 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[...]


The host has three addresses, all three are tried, and thus the timeout applied 3 times...:

 # host openofficeorg.secsup.org
openofficeorg.secsup.org is an alias for mirrors.secsup.org.
mirrors.secsup.org has address 208.209.50.19
mirrors.secsup.org has address 208.209.50.18
mirrors.secsup.org has address 208.209.50.16
mirrors.secsup.org has IPv6 address 2600:803:420:2:b00b:b00b:1234:1111
mirrors.secsup.org has IPv6 address 2600:803:420:2:b00b:b00b:1234:1112
mirrors.secsup.org has IPv6 address 2600:803:420:2:b00b:b00b:1234:1114

Not ideal.

This raises the question whether DNSrr'ed hosts should generally be handled as separate mirrors. 

(For other mirrors it's often no problem to use the IP addresses directly, but in this particular 
case the Apache virtual host for the OOo mirror isn't hit then.)
msg102 (view) Author: poeml Date: 2009-12-09.22:40:50
I'm seeing the mirrorprobe taking more than 60 seconds sometimes, which shouldn't be the case since it runs with a network 
timeout of 20 seconds. 

This happens with a non-reachable mirror (the network stack returns "network not reachable")

Strangely enough, the timeout actually used is the threefold of the one configured!

A little test program reproduces this:


 # cat ~/timeout-test.py       
#!/usr/bin/python

import sys
import socket
import urllib2

socket.setdefaulttimeout(float(sys.argv[1]))
req = urllib2.Request('http://openofficeorg.secsup.org/')

try:
    response = urllib2.urlopen(req)
except urllib2.URLError, e:
    sys.exit(e)



 # time ~/timeout-test.py 1
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 1  0.00s user 0.09s system 2% cpu 3.108 total
 # time ~/timeout-test.py 2
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 2  0.02s user 0.07s system 1% cpu 6.106 total
 # time ~/timeout-test.py 3
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 3  0.01s user 0.08s system 1% cpu 9.107 total
 # time ~/timeout-test.py 4
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 4  0.02s user 0.08s system 0% cpu 12.108 total
 # time ~/timeout-test.py 20
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 20  0.01s user 0.08s system 0% cpu 1:00.10 total


Usage should be correct:

socket.setdefaulttimeout = setdefaulttimeout(...)
    setdefaulttimeout(timeout)
    
    Set the default timeout in floating seconds for new socket objects.
    A value of None indicates that new socket objects have no timeout.
    When the socket module is first imported, the default is None.


This is with Python 2.6.0 on openSUSE 11.1, and I can reproduce on Python 2.5.2 on 11.0.


But the best is, the timeout works 100% as expected for other non-reachable addresses:

 # time ~/timeout-test.py 4 http://mirror.psigrid.gov.ph/openoffice/
<urlopen error timed out>
~/timeout-test.py 4 http://mirror.psigrid.gov.ph/openoffice/  0.02s user 0.08s system 2% cpu 4.107 total
 # time ~/timeout-test.py 4 http://mirror.icis.pcz.pl/OOO/          
<urlopen error timed out>
~/timeout-test.py 4 http://mirror.icis.pcz.pl/OOO/  0.02s user 0.09s system 2% cpu 4.110 total
 # time ~/timeout-test.py 4 http://less.cogeco.net/ftp/openoffice/
<urlopen error timed out>
~/timeout-test.py 4 http://less.cogeco.net/ftp/openoffice/  0.02s user 0.08s system 2% cpu 4.095 total


And, worse, the other mirror is now online again, so the bug isn't reproducible with it anymore:

 # time ~/timeout-test.py 4 http://openofficeorg.secsup.org/      
~/timeout-test.py 4 http://openofficeorg.secsup.org/  0.01s user 0.09s system 32% cpu 0.293 total
 # time ~/timeout-test.py 4 http://openofficeorg.secsup.org/
~/timeout-test.py 4 http://openofficeorg.secsup.org/  0.04s user 0.07s system 34% cpu 0.301 total



Hrm. I think it might be a good idea to implement mirrorprobe's parallelization anew, using process pool and simply kill off 
all forked children after some time. That might be more robust, and might also fix the problems with too much thread memory 
on small machines.
History
Date User Action Args
2009-12-09 22:45:13poemlsetmessages: + msg103
2009-12-09 22:40:50poemlcreate