Shell Scrape the pkg.freebsd.org server for packages

Hi all,

I wrote this script a few weeks back and thought I should share it. It scrapes the HTML listing of a package directory on the web server and downloads every package it finds there.
It should work for OpenBSD too (though their mirrors also offer FTP access, which is probably the better option).

One thing to note is that it downloads each file twice and compares the checksums of the two copies. It is a bit naff as a solution, but unlike OpenBSD's SHA256 file in the packages directory, I couldn't find a similar system for FreeBSD (I imagine because pkg is meant to deal with it). Bear in mind that this only catches corruption during transfer; it cannot tell you that the file matches what the repository intended to serve.

It can also resume an interrupted run: any package already recorded in the checksum file (work/SHA256), meaning it was downloaded and verified earlier, is skipped rather than fetched again.

The whole point of this script compared to using pkg is that I can run it on any OS (even Cygwin) to fetch the packages. This is potentially useful if you don't have fast internet at home and have to use a machine elsewhere, e.g. at a library, which is probably not going to be running FreeBSD (yet ;) ).

Apart from the usual tools (grep, sed, awk and sha256sum), it only requires curl as a dependency. Using FreeBSD's built-in fetch would have been nicer, but curl is more portable to other operating systems, which is what I was going for with this script.
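
Since portability is the goal, one weak spot is the checksum tool: the script calls sha256sum, which is not installed under that name on every system. A minimal sketch of a fallback follows. The function name sha256_of is made up for illustration and is not part of the script below; the sketch assumes that at least one of sha256sum, sha256 (the BSDs) or shasum is available.

```shell
# Hypothetical helper: print the SHA-256 hex digest of a file using
# whichever checksum tool happens to be installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    # GNU coreutils (Linux, Cygwin): output is "<hash>  <file>"
    sha256sum "$1" | awk '{print $1}'
  elif command -v sha256 >/dev/null 2>&1; then
    # BSD sha256: -q prints only the hash
    sha256 -q "$1"
  elif command -v shasum >/dev/null 2>&1; then
    # Perl shasum (e.g. macOS): output is "<hash>  <file>"
    shasum -a 256 "$1" | awk '{print $1}'
  else
    echo "Error: no SHA-256 tool found" >&2
    return 1
  fi
}
```

With something like this in place, a line such as SHA256=`sha256sum work/first.bin | awk '{print $1}'` could become SHA256=`sha256_of work/first.bin`.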

Bash:
#!/bin/sh
URL="http://pkg.freebsd.org/FreeBSD:11:amd64/release_2/All/"
FILETYPE="txz"

mkdir -p work

######################################
# Fetch the package directory listing
######################################
if [ ! -e "work/list.html" ]; then
  # -f: treat HTTP errors as failures instead of saving the error page
  curl -f -o work/list.html "$URL" || exit 1
fi

#########################################
# Scrape the package names from the HTML
#########################################
# The dot is escaped so that only names ending in ".$FILETYPE" match
PACKAGES=$(grep href work/list.html | sed 's/.*href="//; s/".*//' | grep "\.$FILETYPE\$")

mkdir -p packages

for PACKAGE in $PACKAGES; do

  #########################################################
  # Skip the package if it is already contained within the
  # known checksums file
  #########################################################
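  # Compare the first field exactly, so that one package name cannot
  # match as a substring (or as a regex) of another.
  if [ -e work/SHA256 ] && awk -v p="$PACKAGE" '$1 == p { found = 1 } END { exit !found }' work/SHA256; then
    echo "Package: $PACKAGE already exists"
    continue
  fi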

  ##########################################
  # Fetch the package and generate checksum
  ##########################################
  echo "Fetching package: $PACKAGE"
  curl -f -o work/first.bin "$URL$PACKAGE" || exit 1
  SHA256=`sha256sum work/first.bin | awk '{print $1}'`

  ##################################################
  # Re-fetch the same package and generate checksum
  ##################################################
  echo "Re-fetching package: $PACKAGE for comparison"
  curl -f -o work/second.bin "$URL$PACKAGE" || exit 1
  SHA256_SECOND=`sha256sum work/second.bin | awk '{print $1}'`

  if [ "$SHA256" != "$SHA256_SECOND" ]; then
    echo "Error: SHA256 did not agree for package: $PACKAGE"
    exit 1
  fi

  ############################################################
  # If the checksums both match, add it to the known packages
  # and clean up a little.
  ############################################################
  mv work/first.bin packages/"$PACKAGE"
  rm work/second.bin
  echo "$PACKAGE $SHA256" >> work/SHA256
done

cp work/SHA256 packages/SHA256
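
Once the packages directory has been carried over to the FreeBSD machine (on a USB stick, say), the copied SHA256 file can be used to check that nothing was damaged on the way. Here is a rough sketch, assuming the "name hash" line format the script writes. The function name verify_packages is made up for illustration, and sha256sum may need swapping for sha256 -q on a system that lacks it.

```shell
# Hypothetical helper: check every "name hash" line in DIR/SHA256
# against the file DIR/name. Returns non-zero if any file is missing
# or its digest differs from the recorded one.
verify_packages() {
  dir="$1"
  status=0
  while read -r name expected; do
    # Skip blank lines
    [ -n "$name" ] || continue
    if [ ! -f "$dir/$name" ]; then
      echo "MISSING  $name"
      status=1
      continue
    fi
    actual=$(sha256sum "$dir/$name" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
      echo "OK       $name"
    else
      echo "MISMATCH $name"
      status=1
    fi
  done < "$dir/SHA256"
  return $status
}
```

Usage would be along the lines of: verify_packages packages || echo "Some packages need re-fetching"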
 