"fetch -r -R" fails when an existing file is already complete

FStl · Jun 19, 2017

I created a script to download some large files and proceed only if they are completely downloaded; since the files are large I need the resume (-r) and preserve (-R) options:
fetch -r -R -o . URL1 URL2 URL3 ... || exit

Now suppose on the first run of this script, URL1 is completely downloaded, but the rest are not. When the script is run again, fetch fails on URL1 and returns an overall error:
fetch: URL1: Requested Range Not Satisfiable

Is this a bug in fetch, or a bug in the web server? Note that the web server does report the size of each file, because I can see its full size and percent progress of the fetch.

trev · Jun 19, 2017

I ran fetch -r -R -o repeatedly. The first time, the file downloads; the second time, the server returns HTTP code 416 and then the file downloads again. See the apache24 log below.

Code:

[19/Jun/2017:16:19:20 +1000] "GET /~trev/test HTTP/1.1" 200 35191340 "-" "fetch libfetch/2.0"
[19/Jun/2017:16:19:26 +1000] "GET /~trev/test HTTP/1.1" 416 314 "-" "fetch libfetch/2.0"
[19/Jun/2017:16:19:26 +1000] "GET /~trev/test HTTP/1.1" 200 35191340 "-" "fetch libfetch/2.0"

Curious.

FStl · Jun 19, 2017

From my (client) side, the error is always reproducible, once a file has been downloaded. I can't access the web server logs because it is not mine.

FStl · Jun 19, 2017

It seems the bug is in fetch; it blindly issues a GET request with a Range: bytes=EXISTINGFILESIZE- header, which will fail when EXISTINGFILESIZE = TOTALFILESIZE, because that is not satisfiable according to the standard (https://tools.ietf.org/html/rfc7233#section-2.1):

Code:

If a valid byte-range-set includes at least one byte-range-spec with a first-byte-pos that is less than the current length of the representation, or at least one suffix-byte-range-spec with a non-zero suffix-length, then the byte-range-set is satisfiable. Otherwise, the byte-range-set is unsatisfiable.

Surely, it is reasonable to expect fetch to handle this edge case on its own?
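
To make the edge case concrete, here is a minimal sketch of the satisfiability rule for the open-ended form bytes=N- that a resume request uses. The function name range_satisfiable is made up for illustration and is not part of fetch or libfetch; the size is the one from trev's log.

```sh
#!/bin/sh
# Hypothetical helper, not part of fetch(1).
# Succeeds (exit status 0) if "Range: bytes=FIRST-" is satisfiable for a
# representation of LENGTH bytes: first-byte-pos must be less than the
# current length of the representation.
range_satisfiable() {
    first_pos=$1
    length=$2
    [ "$first_pos" -lt "$length" ]
}

# A fully downloaded file: existing size equals total size.
if range_satisfiable 35191340 35191340; then
    echo satisfiable
else
    echo unsatisfiable    # this branch runs
fi
```

So a client that already holds every byte has nothing valid left to ask for with a range request, and the server is within the standard when it answers 416.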

trev · Jun 19, 2017

FStl said:
Surely, it is reasonable to expect fetch to handle this edge case on its own?

It does, for me, by re-downloading the whole file immediately after receiving the 416 code. See the log I posted for timestamps.

FStl · Jun 19, 2017

trev said:
It does, for me, by re-downloading the whole file immediately after receiving the 416 code. See the log I posted for timestamps.

How is "re-downloading the whole file" any better? The whole point of using the resume and preserve flags is to *not* have to do that.

Anyway, in my case, it does not re-download; it just fails.

In either case, the behavior of fetch is unacceptable.

Beastie · Jun 19, 2017

FStl said:
In either case, the behavior of fetch is unacceptable.

I've never noticed that fetch(1) supports multiple URLs in a single command. So I've never used it like that and most likely never will.
Anyway, if I want to use it in a script I insert the command within a loop and get the list of URLs from an external file. This is way more flexible IMHO.
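
A rough sketch of that loop, assuming a plain text file with one URL per line. The names fetch_list and FETCH_CMD are invented for this example (FETCH_CMD only exists so the command can be swapped out for testing); the fetch flags are the ones from the first post.

```sh
#!/bin/sh
# Hypothetical helper: run fetch once per URL listed in a file.
# Returns non-zero if any single download failed, but keeps going
# so that one failing URL does not block the rest.
fetch_list() {
    list=$1
    failed=0
    while IFS= read -r url; do
        # Skip blank lines and lines starting with '#'.
        case $url in
            ''|'#'*) continue ;;
        esac
        # FETCH_CMD is an illustrative override, not a fetch(1) feature.
        if ! ${FETCH_CMD:-fetch} -r -R -o . "$url" < /dev/null; then
            echo "failed: $url" >&2
            failed=1
        fi
    done < "$list"
    return "$failed"
}
```

Usage would be something like fetch_list urls.txt || exit 1. Note that this only isolates the failures per URL; a file that is already complete can still make fetch report an error for that one URL.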

FStl · Jun 19, 2017

Ah, this has already been reported and even fixed in stable:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212065

FStl · Jun 20, 2017

FStl said:
Ah, this has already been reported and even fixed in stable:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212065

So, I downloaded 11.1-BETA2 to test fetch, and was dismayed to find that the behavior of fetch has now become exactly as trev has described. The "fix" that has been applied assumes that the server includes a Content-Range header in the 416 response, but sending that header is only a SHOULD, not a MUST, according to the standard (RFC 7233, section 4.4).

As I have said before, this new behavior of fetch severely undermines the -r and -R options. A more comprehensive fix is needed.
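
Until then, one possible script-side workaround is to compare sizes before asking fetch to resume, so that no range request is sent for a file that is already complete. This is only a sketch under assumptions: the server must report the file size (mine does), fetch -s is used to print the remote size without downloading, and the names resume_action and fetch_resumable are made up here.

```sh
#!/bin/sh
# Hypothetical helper: decide what to do from the two sizes alone.
# Prints one of: skip | resume | restart | unknown
resume_action() {
    local_size=$1
    remote_size=$2
    # A non-numeric remote size means the server did not report one.
    case $remote_size in
        ''|*[!0-9]*) echo unknown; return 0 ;;
    esac
    if [ "$local_size" -eq "$remote_size" ]; then
        echo skip        # already complete, no request needed
    elif [ "$local_size" -lt "$remote_size" ]; then
        echo resume      # bytes=LOCAL- is satisfiable
    else
        echo restart     # local file is larger than the remote one
    fi
}

# Hypothetical wrapper around fetch using the decision above.
fetch_resumable() {
    url=$1
    file=$2
    local_size=0
    [ -f "$file" ] && local_size=$(wc -c < "$file" | tr -d ' ')
    remote_size=$(fetch -s "$url") || return 1
    case $(resume_action "$local_size" "$remote_size") in
        skip)   return 0 ;;
        resume) fetch -r -R -o "$file" "$url" ;;
        *)      echo "$file: size mismatch or unknown remote size" >&2
                return 1 ;;
    esac
}
```

This costs an extra request per file and only compares sizes, not contents, so it is a stopgap rather than a substitute for fixing fetch itself.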

trev · Jun 20, 2017

I take your point.