Guest Post: FreeBSD in Science


Anne Dickison

— contributed by Jason Bacon


FreeBSD has always been an exceptional platform for scientific computing, thanks in large part to its unrivaled reliability. That reliability makes it practical to run computational analyses lasting weeks or months, which is a risky proposition on many platforms. Even if an intensive analysis can be parallelized, restarts are still expensive in terms of time, compute resources, and electric bills.


The FreeBSD ports system is another advantage for scientific computing. It provides instant installation of over 30,000 software packages. It also lets you build any of them from source using non-portable optimizations such as -march=native, which tunes the generated code for the build machine's CPU. This can make a big difference in run time for software that benefits from the latest SSE and AVX CPU features.
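The C sketch below gives a rough idea of why -march=native matters. It contains a simple element-wise kernel of the kind compilers auto-vectorize. It also has a helper that reports which x86 SIMD extension the compiler was allowed to target. The function names are hypothetical. GCC and Clang predefine the __AVX512F__, __AVX2__, __AVX__, and __SSE2__ macros when the corresponding extension is enabled, for example by -march=native on a CPU that supports it.

```c
#include <stddef.h>

/*
 * Element-wise product c[i] = a[i] * b[i]. The restrict qualifiers promise
 * the compiler that the arrays do not overlap, which lets it vectorize the
 * loop using whatever SIMD width the targeted CPU offers.
 */
void vec_mul(size_t n, const double *restrict a, const double *restrict b,
             double *restrict c)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] * b[i];
}

/* Report the widest x86 SIMD extension enabled at compile time. */
const char *simd_target(void)
{
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "no x86 SIMD extension detected";
#endif
}
```

On amd64, a portable build (for example cc -O3) targets only the SSE2 baseline. The same source built with cc -O3 -march=native on a recent CPU will typically report AVX2 or AVX-512 and use the wider vector registers in vec_mul.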


The FreeBSD ports collection has seen continued growth and maturation in the scientific categories. It now boasts over 2,000 scientific ports across astro, biology, cad, math, and science, and the biology category recently surged past the 200 mark to 228. A major milestone is the biostar-tools meta-port, which makes it trivial for FreeBSD users to install all of the software needed to learn from the Biostar Handbook, the premier introductory text for budding bioinformaticians.


The fully integrated ZFS file system makes it possible to deploy a system on ZFS in minutes. ZFS offers major benefits for scientific computing, such as the performance boosts and data integrity provided by software RAID, along with encryption and data compression. The LZ4 compression feature is particularly useful because it reduces the need for explicit compression using gzip, bzip2, or xz. Explicitly compressed files hinder the use of text processing tools, because a compressed stream generally has to be decompressed from the beginning. This is especially costly for tools such as tail, which use seek operations to achieve major speed gains. ZFS compression, by contrast, is transparent, so every tool sees an ordinary, seekable file.
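Seeking matters because a tool like tail can jump straight to the end of a plain file and read backward, touching only the last few kilobytes. The C sketch below illustrates this. It is a simplified example, not FreeBSD's actual tail(1) code, and the function name is hypothetical. It finds where the last n lines of a seekable file begin. Because ZFS compression is transparent, the same code works unchanged on an LZ4-compressed dataset. It cannot work on a raw .gz file without first decompressing everything before the tail.

```c
#include <stdio.h>

#define TAIL_BLOCK 4096

/*
 * Return the byte offset where the last `nlines` lines of a seekable file
 * begin, or -1 on error. The file is read backward from the end in
 * TAIL_BLOCK-sized chunks, so the cost depends on the size of the tail,
 * not on the size of the file.
 */
long tail_offset(FILE *fp, int nlines)
{
    char buf[TAIL_BLOCK];
    long end, pos;
    int seen = 0;

    if (fseek(fp, 0L, SEEK_END) != 0 || (end = ftell(fp)) < 0)
        return -1;
    if (nlines <= 0)
        return end;

    pos = end;
    while (pos > 0) {
        long len = pos < TAIL_BLOCK ? pos : TAIL_BLOCK;

        pos -= len;
        if (fseek(fp, pos, SEEK_SET) != 0 ||
            fread(buf, 1, (size_t)len, fp) != (size_t)len)
            return -1;

        for (long i = len - 1; i >= 0; --i) {
            /* Skip the newline that terminates the final line itself. */
            if (buf[i] == '\n' && pos + i != end - 1 && ++seen == nlines)
                return pos + i + 1;
        }
    }
    return 0;   /* Fewer than nlines lines: the tail is the whole file. */
}
```

A caller would then fseek() to the returned offset and copy the rest of the file to standard output.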


In addition to FreeBSD’s long-standing advantages, there have been many new developments since I wrote for the FreeBSD Journal’s Big Data edition in summer 2018.


Personally, I’ve moved out of HPC (High Performance Computing, largely referring to the use of parallel computing clusters) systems management and into the bioinformatics field, where I can focus less on systems management and more on software development and scientific research. FreeBSD has been my primary development platform as I work to improve access to high-performance, easy-to-use bioinformatics software. That work includes new libraries such as libxtend and biolibc, along with many new applications based on them. One of my major goals is to expedite bioinformatics research with tools that are easier to install and use and that run much faster, often rendering HPC unnecessary. Many current bottlenecks are caused by the use of interpreted languages such as Perl, Python, and R, or simply by poor software design and/or implementation. FreeBSD has facilitated this work immensely by providing easy setup and management, along with all of the mainstream tools and libraries needed for scientific software development.


Many scientific packages not yet available via FreeBSD ports can be installed using the Conda package manager. Conda is also advantageous for users who do not have root privileges. The sysutils/linux-miniconda-installer port makes it trivial for ordinary users to deploy a Miniconda installation in their own directory.


The R statistical language is heavily used in science and is well supported on FreeBSD. The primary repository for R software is CRAN (the Comprehensive R Archive Network), and Bioconductor is another repository devoted to biology-related tools. Several hundred CRAN packages are available in the ports tree for easy installation and updates. However, installing any CRAN package from within the R environment using the traditional install.packages() function works as well on FreeBSD as it does on Linux and macOS. Installing Bioconductor packages using BiocManager::install() is also well supported.


FreeBSD’s Linux compatibility makes it possible to run the vast majority of closed-source scientific applications as well. The auto-install-linux_base script, part of the sysutils/auto-admin port, quickly and easily configures basic Linux compatibility with all the features necessary to run most Linux binaries. Installing and configuring commercial software can be a challenge, but in reality it is not much easier on fully supported systems such as RHEL. The major problem is not the operating system, but the lack of documentation about required dependencies. Many vendors test their software only on desktop installations of RHEL and are not even aware of the minimum set of Yum packages their product requires. This presents problems for Linux-based HPC clusters, where running a desktop Linux distribution on compute nodes is highly undesirable. FreeBSD users face the same problem when trying to figure out which Linux libraries must be installed. Once this is determined, most Linux software will run well on FreeBSD. Note that FreeBSD’s Linux compatibility is not an emulation layer, but a set of system calls that allow the FreeBSD kernel to directly support Linux binaries. As a result, some Linux binaries run slightly faster on FreeBSD than they do on Linux.
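To run a Linux binary directly, the kernel first has to recognize it as one. One cue FreeBSD's ELF loader uses is the EI_OSABI byte in the ELF identification header. The loader also considers other hints, such as ABI note sections and the program interpreter path, and brandelf(1) can set the byte by hand. The C sketch below reads that byte. The function names are hypothetical. The numeric values (0 for System V, 3 for Linux/GNU, 9 for FreeBSD) come from the ELF specification's OSABI list.

```c
#include <stdio.h>
#include <string.h>

/* Layout of the ELF identification block (e_ident[]). */
enum { ELF_IDENT_LEN = 16, ELF_OSABI_OFFSET = 7 };

/* Return the EI_OSABI byte, or -1 if the buffer is too short or is not ELF. */
int elf_osabi(const unsigned char *ident, size_t len)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

    if (len < ELF_IDENT_LEN || memcmp(ident, magic, sizeof magic) != 0)
        return -1;
    return ident[ELF_OSABI_OFFSET];
}

/* Map the OSABI values most relevant here to readable names. */
const char *osabi_name(int osabi)
{
    switch (osabi) {
    case 0:  return "System V (unbranded)";
    case 3:  return "Linux";
    case 9:  return "FreeBSD";
    default: return "other";
    }
}

/* Read the identification block from a file and return its OSABI byte. */
int elf_osabi_file(const char *path)
{
    unsigned char ident[ELF_IDENT_LEN];
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return -1;
    n = fread(ident, 1, sizeof ident, fp);
    fclose(fp);
    return elf_osabi(ident, n);
}
```

A vendor binary reporting "System V (unbranded)" is one case where explicitly branding it as Linux with brandelf(1) can help the kernel pick the right ABI.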

Deployment of FreeBSD systems for scientific computing has also become much faster and easier in recent years. The sysutils/desktop-installer port began as a simple scripted XFCE installer in 2009. It has since evolved into a robust tool for installing any desktop environment or window manager in the ports tree on any hardware platform. Scientists can now easily deploy a FreeBSD laptop or workstation running their desktop of choice and any of thousands of scientific software packages, often in well under an hour.

The auto-admin menu, part of sysutils/auto-admin (which desktop-installer installs as a dependency), makes it easy to manage a FreeBSD system either remotely or on the console. Implemented as a set of simple shell scripts, it runs in any terminal session with no need for a graphical environment. The menu system covers most common administrative tasks, such as user management, system updates, and software management.

For those who truly need HPC, SPCM (Simple, Portable, Cluster Management), available via the sysutils/spcm port, makes it easy to deploy and manage a FreeBSD HPC cluster. With SPCM, a small cluster can be configured from scratch in a few hours. SPCM has been developed on both CentOS Linux and FreeBSD over the past ten years and has now matured into a fairly robust tool for small clusters.


Currently under development for HPC is LPJS (Lightweight, Portable Job Scheduler), a simple and clean system for managing CPU and memory resources on an HPC cluster. Development of this new tool was motivated by the increasing complexity of existing schedulers such as SLURM (Simple Linux Utility for Resource Management). The ‘S’ in SLURM has become somewhat ironic, as SLURM has evolved into the premier batch system for massive, complicated HPC clusters around the world. For example, SLURM now requires configuration of a complex database system just to enable basic job accounting. LPJS is being developed to fill the niche of small-scale HPC that has been abandoned by all of the major players. It is designed to be easy to deploy and easy to use, drawing on many lessons from the design of popular schedulers such as HTCondor, LSF, SGE, SLURM, and Torque/PBS. Notable features include:

- near-zero configuration;
- basic support for all POSIX platforms, which even allows hybrid clusters;
- support for deployment on existing LANs, such as networks of lab PCs, as well as on more traditional dedicated rack-mount hardware.
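At its core, a scheduler like this has to match each job's CPU and memory request against what the nodes have free. The C sketch below uses simple first-fit placement. It is a hypothetical illustration of that problem only. The struct and function names are invented, and the post does not describe LPJS's actual data structures or placement policy.

```c
#include <stddef.h>

/* Hypothetical description of a compute node's free resources. */
struct node {
    const char *name;
    unsigned free_cores;
    unsigned long free_mib;
};

/* Hypothetical resource request for one job. */
struct job {
    unsigned cores;
    unsigned long mib;
};

/*
 * First-fit placement: reserve the job's resources on the first node with
 * enough free cores and memory and return that node's index, or return -1
 * so the job stays queued until resources are released.
 */
int place_job(struct node *nodes, size_t nnodes, const struct job *job)
{
    for (size_t i = 0; i < nnodes; ++i) {
        if (nodes[i].free_cores >= job->cores &&
            nodes[i].free_mib >= job->mib) {
            nodes[i].free_cores -= job->cores;
            nodes[i].free_mib -= job->mib;
            return (int)i;
        }
    }
    return -1;
}

/* Return a finished job's resources to the node it ran on. */
void release_job(struct node *node, const struct job *job)
{
    node->free_cores += job->cores;
    node->free_mib += job->mib;
}
```

Because the nodes are described only by their free cores and memory, a lab PC and a rack-mount server can sit in the same array. The same holds regardless of operating system, which is one simple way a hybrid cluster could be modeled.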


Most likely, FreeBSD will run all the Unix-based scientific software you need. If it doesn’t, you can always use virtualization to run the few applications not available for FreeBSD. FreeBSD works well as either a host or a guest with many popular virtualization packages. You can try out FreeBSD as a guest under Windows or Linux. Any virtualization product should work, but I recommend VirtualBox. It is free and open source and easy to configure, and its guest additions add some nice features like mouse integration and screen resizing. VirtualBox also runs on a FreeBSD host, allowing you to run guest systems such as Windows and Linux. For the more tech-savvy, FreeBSD’s bhyve hypervisor offers an option with lower overhead for those who don’t need a graphical interface to configure and run guests.

The future of FreeBSD as a scientific computing platform has never looked brighter. Looking back over more than two decades of utilizing FreeBSD both for my own work and in production environments at research institutions, I see a continual and marked reduction in barriers to adoption. The vast majority of scientific computing needs are now easily and reliably met by FreeBSD systems. A few remaining proprietary tools, such as CUDA, still present some hurdles, but with open alternatives such as OpenCL and OpenMP gaining ground, these too will likely disappear before long.
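The C function below is a small example of the open alternatives mentioned above. It expresses the classic axpy kernel (y = a*x + y) with standard OpenMP target directives instead of CUDA. Whether the loop actually runs on a GPU depends on the compiler and its offload support. When no offload device is available, the region falls back to the host CPU. When the code is built without -fopenmp, the pragma is ignored and the loop simply runs serially. The function name is illustrative.

```c
#include <stddef.h>

/*
 * y[i] = a * x[i] + y[i], offloaded to an accelerator when one is available.
 * map(to:) copies x to the device; map(tofrom:) copies y there and back.
 */
void axpy(size_t n, double a, const double *x, double *y)
{
#pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The same source compiles with or without OpenMP support. That portability is part of what makes such directive-based approaches attractive compared with vendor-specific kernels.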


The post Guest Post: FreeBSD in Science first appeared on FreeBSD Foundation.
