Solved How to 'cp -a' from a website

I want to copy a subdirectory tree from a website, effectively cp -a remote-host/dir ..

How would I do that?

I can download files individually via my browser but would like to duplicate the remote directory.
 
You can also use rsync if you have SSH access; it is usually a better equivalent of cp -a over the network:

rsync -avz user@remote-host:/path/to/dir .


For HTTP-only access, wget --mirror works, but as mentioned, directory listing must be enabled; otherwise there is no generic way to enumerate files over plain HTTP.
 
You know scp(1) exists? Other than that, wget(1) can download directories, but directory listing (e.g. Apache's autoindex) has to be enabled on the server; there's no way to figure out the contents of a web directory through the regular HTTP(S) protocol.
It's a public website.

I had forgotten that wget does a recursive retrieval so I did that, but got a ton of html files which I don't want.

Not sure if there is a straightforward way of deleting them all.
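
A minimal sketch of one way to clear out those pages after the fact. It assumes the unwanted files are the directory listings, which wget normally saves as index.html or as variants such as index.html?C=N;O=D; the function name clean_listings is made up for illustration.

```shell
#!/bin/sh
# Delete the directory-listing pages left behind by a recursive wget.
# Assumption: they are named index.html or index.html?C=... variants.

clean_listings() {
    # $1 = top of the downloaded tree.
    # -print shows each file as it is removed; other files are untouched.
    find "$1" -type f -name 'index.html*' -print -delete
}
```

Usage would be something like `clean_listings ./http.us.debian.org`. Running the same find without -delete first shows what would be removed.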
 
I'm not bothered about copying a website.

What I wanted was to just get the files from

To clarify, that should recursively go through every link and fetch the data. The fact it is a "website" is not quite so important.

It should also have an ftp http mirror: http://ftp.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/

Though if this is Debian specifically, I think apt-mirror tends to be a good archiving solution.
 
Lftp is a nice program for mirroring websites. From the manpage:-

"lftp has built-in mirror which can download or update a whole directory
tree. There is also reverse mirror (mirror -R) which uploads or updates
a directory tree on server. Mirror can also synchronize directories between
two remote servers, using FXP if available."

Lftp runs as an interactive session. See the description of the 'mirror' command in the manpage; basically you mirror a remote directory to a local one.
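
For a one-off copy the interactive session can be skipped: lftp's -c option runs a command string and exits. The following is only a sketch. The wrapper name lftp_mirror and the LFTP override variable are made up for illustration, and it assumes the server publishes a directory listing that lftp can parse.

```shell
#!/bin/sh
# Mirror a remote directory into a local one without an interactive session.

lftp_mirror() {
    # $1 = remote directory URL, $2 = local target directory.
    # LFTP can be set to another command (hypothetical hook, handy for
    # dry runs); by default the real lftp binary is used.
    ${LFTP:-lftp} -c "mirror --verbose '$1' '$2'"
}
```

For example, `lftp_mirror http://ftp.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/ netboot` would copy the tree into ./netboot.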
 
I have now removed all the gunk from the download and copied all the files onto my PXE server from which I was able to install Debian with little effort.

Having the same facility for FreeBSD would be nice, and I wouldn't be surprised if someone has already put together such a package, although I have not yet come across such a thing.
 
Precisely why I built the tool I mentioned above. It would be interesting to see it in action on the site you mentioned.

Up to you, but if you don't want to, I'd like to try it myself and see if it works. If so, PM me the site and I'll give it a whirl.
 
I think I would have been able to grab just the files I wanted without all the 'index.html' files if I was better acquainted with wget's command line flags.

I did get all the files I wanted, but it was a pain removing those that I didn't want.

The aim was to simply recursively retrieve files from:

 
It's about 1GB to download. Takes one minute for me.
Code:
wget2 -r -np http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/
You might need html to go deeper into subdirectories ...
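
A hedged sketch of how the leftover listings can be avoided with classic GNU wget (whether wget2 treats these flags identically is an assumption). With -R 'index.html*', wget still has to fetch each listing to find the links, but it deletes the matching files afterwards. The helper names cut_dirs_for and mirror_tree are made up for illustration.

```shell
#!/bin/sh
# Recursively fetch a directory tree, keeping only the last path
# component locally and discarding the directory-listing pages.

cut_dirs_for() {
    # Print how many leading path components --cut-dirs should strip
    # so that only the final directory of the URL is kept.
    path=${1#*://}     # drop the scheme
    path=${path#*/}    # drop the host name
    path=${path%/}     # drop a trailing slash
    echo "$path" | awk -F/ '{ n = NF - 1; print (n > 0 ? n : 0) }'
}

mirror_tree() {
    # -r recurse, -np never ascend to the parent, -nH no host directory,
    # -R reject (and remove) the listing pages once they have been parsed.
    wget -r -np -nH --cut-dirs="$(cut_dirs_for "$1")" -R 'index.html*' "$1"
}
```

Called as `mirror_tree http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/`, this should leave a local netboot/ directory containing only the files.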
 
Just to give you an idea of what the tool can do, I scraped for iso files in that url.

It uses the lynx text browser to do all this.

Code:
root@bsd:.Github/shcrapy # ./shcrapy -e iso http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot
Finished. Collected        2 links.

root@bsd:/Github/shcrapy # cat http.us.debian.org_downloadable_files.txt
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/mini.iso
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/mini.iso

root@bsd:.Github/shcrapy # cat http.us.debian.org_visited_urls.txt
http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/debian-installer/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/depthcharge/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/debian-installer/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/debian-installer/amd64/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/debian-installer/amd64/boot-screens/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/debian-installer/amd64/grub/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/debian-installer/amd64/grub/x86_64-efi/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/debian-installer/amd64/linux
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/debian-installer/amd64/pxelinux.cfg/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/debian-installer/amd64/pxelinux.cfg/default
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/depthcharge/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/pxelinux.cfg/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/gtk/pxelinux.cfg/default
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/pxelinux.cfg/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/pxelinux.cfg/default
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/xen/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/xen/vmlinuz
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/debian-installer/amd64/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/debian-installer/amd64/boot-screens/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/debian-installer/amd64/grub/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/debian-installer/amd64/grub/x86_64-efi/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/debian-installer/amd64/linux
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/debian-installer/amd64/pxelinux.cfg/
http://http.us.debian.org/debian/dists/trixie/main/installer-amd64/current/images/netboot/debian-installer/amd64/pxelinux.cfg/default

Depth is limited to just under the given URL by default, but can be changed to go deeper. It also allows excluding certain keywords from a URL. So if you know the extensions of the files you want, this is very useful.
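
The same idea of selecting by extension can be applied to any plain list of URLs, such as the files shown above. This is a generic sketch with grep, not how shcrapy itself works; the function name filter_by_ext is made up for illustration.

```shell
#!/bin/sh
# Keep only the URLs from standard input that end in a given extension.

filter_by_ext() {
    # $1 = extension without the dot, e.g. iso
    grep -E "\.$1\$"
}
```

The result can be piped straight into a downloader, for example `filter_by_ext iso < http.us.debian.org_downloadable_files.txt | wget -i -`.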
 