parallel tasks

I'm putting this out because I suspect there are more ways to accomplish this.

I stumbled onto 'parafly' on FreshPorts. The premise, running a list of commands in parallel, looked very useful.

sysutils/parafly
# pkg install parafly

There is no man page, but running 'ParaFly' (note the caps) with no arguments prints terse help text. ParaFly takes a file of commands, one per line, as input. I'll use ping(8) as an example, but any command is fair game. Here is task.txt:

Code:
ping -c 1 host1
ping -c 1 host2
ping -c 1 host3
ping -c 1 host4
ping -c 1 host5

# ParaFly -c task.txt -CPU 5

The output is exactly what you would expect. What other utility executes commands in parallel from a text file? BSD make(1) (aka 'pmake') comes to mind. Here is the same thing as a makefile (the requisite tabs are invisible here, but they are there, I promise):

Code:
# Makefile
.PHONY: all t1 t2 t3 t4 t5

all: t1 t2 t3 t4 t5

t1:
    ping -c 1 host1

t2:
    ping -c 1 host2

t3:
    ping -c 1 host3

t4:
    ping -c 1 host4

t5:
    ping -c 1 host5

Typing 'make' runs the tasks one at a time; make(1) needs '-j' to allow multiple jobs.

# make -j 5

This gives similar output: the pings all run in parallel. A poor man's nmap, if you will.

What other ways can parallel tasks be done, preferably out of the box?
 
Other ways using freeware? There is pdsh, a parallel shell (though I don't know whether it is available for FreeBSD as a prebuilt port). It turns out good old xargs(1) can run multiple processes too, very easily; see the sketch below. There are a variety of parallelizing makes and cluster makes; about 15 years ago I evaluated some of them, and found all of them terribly lacking (our group ended up writing our own).
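For instance, xargs(1) with -P runs up to the given number of invocations at once. A minimal sketch, reusing the task.txt file from above; the -I flag makes xargs consume one input line per invocation and substitute it for CMD:

Code:
# Run each line of task.txt as a shell command, at most 5 at a time
xargs -I CMD -P 5 sh -c CMD < task.txt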

It is actually quite easy to write small shell scripts to parallelize tasks, using "&" and wait.
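A minimal sketch of that pattern, with placeholder host names: each command is backgrounded with '&', and the wait builtin blocks until every background job has exited.

Code:
#!/bin/sh
# Launch all the pings in the background, then wait for the whole batch.
for host in host1 host2 host3 host4 host5; do
    ping -c 1 "$host" &
done
wait
echo "all pings done"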

Where all of this gets actually interesting is not just running multiple processes on one host, but on multiple hosts. This is most easily done in a uniform cluster, with a large pool of interchangeable worker machines. Ideally, it requires a parallel or cluster file system underneath, because the various tasks need to be able to see each other's files. About 25 years ago there was a very good commercial product called LSF, or Load Sharing Facility, which came with a cluster shell that placed each process on a different machine. I know that product was taken over by IBM, and I don't know whether it still exists. The real issue in building a cluster is data movement. Even a parallel make can expose the shortcomings there: running a dozen compile processes on each node, on a dozen nodes at the same time, is likely to overwhelm the parallel file system unless it is industrial strength.

In real-world applications (with clusters of thousands or tens of thousands of machines), much more heavyweight custom tools are used. The issue of how to distribute tasks pales in comparison with organizing how to place them optimally.
 
Where all of this gets actually interesting is not just running multiple processes on one host, but on multiple hosts.
I'm not looking for something like TORQUE; that's a whole other level. Just local tasks in parallel. It seems like there should be some shell builtin that uses '&' but also waits for all to complete.
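The wait builtin mentioned above is exactly that. A sketch that caps concurrency at 5 using only '&' and wait; hosts.txt (one host per line) is a hypothetical input file:

Code:
#!/bin/sh
# Start up to 5 pings, wait for the whole batch, then start the next.
max=5
count=0
while read -r host; do
    ping -c 1 "$host" &
    count=$((count + 1))
    if [ "$count" -ge "$max" ]; then
        wait
        count=0
    fi
done < hosts.txt
wait    # catch the final partial batch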
 