Please provide a short (approximately 100 word) summary of the following web Content, written in the voice of the original author. If there is anything controversial please highlight the controversy. If there is something surprising, unique, or clever, please highlight that as well.

Content:

Title: Design of GNU Parallel
Site: Design of GNU Parallel

This document describes design decisions made in the development of GNU parallel and the reasoning behind them. It will give an overview of why some of the code looks the way it does, and will help new maintainers understand the code better.

One file program

GNU parallel is a Perl script in a single file. It is object oriented, but contrary to normal Perl scripts each class is not in its own file. This is due to user experience: The goal is that in a pinch the user will be able to get GNU parallel working simply by copying a single file: No need to mess around with environment variables like PERL5LIB.

Choice of programming language

GNU parallel is designed to be able to run on old systems. That means that it cannot depend on a compiler being installed - and especially not a compiler for a language that is younger than 20 years old.

The goal is that you can use GNU parallel on any system, even if you are not allowed to install additional software.

Of all the systems I have experienced, I have yet to see a system that had GCC installed that did not have Perl. The same goes for Rust, Go, Haskell, and other younger languages. I have, however, seen systems with Perl without any of the mentioned compilers.

Most modern systems also have either Python2 or Python3 installed, but you still cannot be certain which version, and since Python2 cannot run under Python3, Python is not an option.

Perl has the added benefit that implementing the {= perlexpr =} replacement string was fairly easy.

The primary drawback is that Perl is slow. So there is an overhead of 3-10 ms/job and 1 ms/MB output (and even more if you use --tag).
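To get a feel for what these figures mean, the per-job and per-megabyte overheads can be combined into a rough estimate. The following is only a back-of-the-envelope sketch: the function name `estimate_overhead_ms` and the linear cost model are assumptions made for illustration, and the extra cost of --tag is not modelled.

```python
def estimate_overhead_ms(jobs, output_mb, per_job_ms=(3.0, 10.0), per_mb_ms=1.0):
    """Return a (low, high) estimate in milliseconds of the total overhead.

    Assumes the overhead grows linearly: 3-10 ms per job plus 1 ms per MB
    of output, the figures quoted above. Hypothetical helper, not part of
    GNU parallel.
    """
    low = jobs * per_job_ms[0] + output_mb * per_mb_ms
    high = jobs * per_job_ms[1] + output_mb * per_mb_ms
    return low, high


# Example: 10,000 short jobs producing 500 MB of output in total.
low, high = estimate_overhead_ms(jobs=10_000, output_mb=500)
print(f"estimated overhead: {low / 1000:.1f}-{high / 1000:.1f} s")
```

Under this simple model the per-job cost dominates when there are many very short jobs, while the per-megabyte cost only matters for jobs that produce a lot of output.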
Old Perl style

GNU parallel uses some old, deprecated constructs. This is due to a goal of being able to run on old installations. Currently the target is CentOS 3.9 and Perl 5.8.0.

Scalability up and down

The smallest system GNU parallel is tested on is a 32 MB ASUS WL500gP. The largest is a 2 TB 128-core machine. It scales up to around 100 machines - depending on the duration of each job.

Exponentially back off

GNU parallel busy waits. This is because the reason why a job is not started may be due to load average (when using --load), and thus it will not make sense to just wait for a job to finish. Instead the load average must be rechecked regularly. Load average is not the only reason: --timeout has a similar problem.

To not burn up too much CPU, GNU parallel sleeps exponentially longer and longer if nothing happens, maxing out at 1 second.

Shell compatibility

It is a goal to have GNU parallel work equally well in any shell. However, in practice GNU parallel is being developed in bash and thus testing in other shells is limited to reported bugs.

When an incompatibility is found there is often not an easy fix: Fixing the problem in csh often breaks it in bash. In these cases the fix is often to use a small Perl script and call that.

env_parallel

env_parallel is a dummy shell script that will run if env_parallel is not an alias or a function and tell the user how to activate the alias/function for the supported shells.

The alias or function will copy the current environment and run the command with GNU parallel in the copy of the environment.

The problem is that you cannot access all of the current environment inside Perl. E.g. aliases, functions and unexported shell variables.

The idea is therefore to take the environment and put it in $PARALLEL_ENV which GNU parallel prepends to every command.
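The prepending idea can be illustrated outside of GNU parallel. The sketch below is not how GNU parallel is implemented (that is Perl; this is Python). It only shows that a shell function and an unexported variable become usable in a fresh shell once their definitions are placed in front of the command. The names `build_command`, `run_with_env`, `env_definitions`, and `greet` are made up for the example, and a POSIX `sh` is assumed to be in the path.

```python
import subprocess


def build_command(env_definitions, command):
    """Prepend shell definitions (functions, variables) to a command string."""
    return env_definitions + "\n" + command


def run_with_env(env_definitions, command):
    """Run the combined script in a fresh POSIX sh and return its stdout."""
    script = build_command(env_definitions, command)
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout


# Stand-in for what would be stored in $PARALLEL_ENV: a function and an
# unexported variable, neither of which a child process would normally see.
env_definitions = 'greet() { echo "hello $1 from $who"; }\nwho=parent'

print(run_with_env(env_definitions, "greet world"), end="")
```

Without the prepended definitions the fresh shell would not know `greet` at all, which is the situation the design is working around.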
The only way to have access to the environment is directly from the shell, so the program must be written in a shell script that will be sourced and therefore has to deal with the dialect of the relevant shell.

env_parallel.*

These are the files that implement the alias or function env_parallel for a given shell. It could be argued that these should be put in some obscure place under /usr/lib, but by putting them in your path it becomes trivial to find the path to them and source them:

    source `which`

The beauty is that they can be put anywhere in the path without the user having to know the location. So if the user's path includes /afs/bin/i386_fc5 or /usr/pkg/parallel/bin or /usr/local/parallel/20161222/sunos5.6/bin the files can be put in the dir that makes most sense for the sysadmin.

env_parallel.bash / env_parallel.sh / env_parallel.ash / env_parallel.dash / env_parallel.zsh / env_parallel.ksh / env_parallel.mksh

env_parallel.(bash|sh|ash|dash|ksh|mksh|zsh) defines the function env_parallel. It uses alias and typeset to dump the configuration (with a few exceptions) into $PARALLEL_ENV before running GNU parallel.

After GNU parallel is finished, $PARALLEL_ENV is deleted.

env_parallel.csh

env_parallel.csh has two purposes: If env_parallel is not an alias: make it into an alias that sets $PARALLEL with arguments and calls env_parallel.csh.

If env_parallel is an alias, then env_parallel.csh uses $PARALLEL as the arguments for GNU parallel.

It exports the environment by writing a variable definition to a file for each variable. The definitions of aliases are appended to this file. Finally the file is put into $PARALLEL_ENV.

GNU parallel is then run and $PARALLEL_ENV is deleted.

env_parallel.fish

First all function definitions are generated using a loop and functions.

Dumping the scalar variable definitions is harder.

fish can represent non-printable characters in (at least) 2 ways. To avoid problems all scalars are converted to \XX quoting.
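As an illustration of why one uniform quoting helps, the sketch below turns every byte of a value into a backslash-X escape with two hex digits, so that no byte is left in a form that could be written in more than one way. It assumes that "\XX quoting" refers to this kind of per-byte hex escape; the function names `hex_quote` and `hex_unquote` and the exact output format are assumptions and are not taken from the env_parallel.fish source.

```python
def hex_quote(value):
    """Quote every byte of a string as \\X followed by two hex digits.

    Assumption: the value is encoded as UTF-8 before quoting.
    """
    return "".join("\\X%02x" % byte for byte in value.encode("utf-8"))


def hex_unquote(quoted):
    """Reverse hex_quote: turn each 4-character \\XHH group back into a byte."""
    groups = [quoted[i + 2:i + 4] for i in range(0, len(quoted), 4)]
    return bytes(int(group, 16) for group in groups).decode("utf-8")


original = "a\tb"  # contains a non-printable tab character
quoted = hex_quote(original)
print(quoted)
```

Because every byte is escaped the same way, the quoted form contains only printable ASCII and can be converted back without ambiguity.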
Then commands to generate the definitions are made and separated by NUL.

This is then piped into a Perl script that quotes all values. List elements will be appended using two spaces.

Finally \n is converted into \1 because fish variables cannot contain \n. GNU parallel will later convert all \1 from $PARALLEL_ENV into \n.

This is then all saved in $PARALLEL_ENV.

GNU parallel is called, and $PARALLEL_ENV is deleted.

parset (supported in sh, ash, dash, bash, zsh, ksh, mksh)

parset is a shell function. This is the reason why parset can set variables: It runs in the shell which is calling it.

It is also the reason why parset does not work when data is piped into it: ... | parset ... makes parset start in a subshell, and any changes in environment can therefore not make it back to the calling shell.

Job slots

The easiest way to explain what GNU parallel does is to assume that there are a number of job slots, and when a slot becomes available a job from the queue will be run in that slot. But originally GNU parallel did not model job slots in the code. Job slots have been added to make it possible to use {%} as a replacement string.

While the job sequence number can be computed in advance, the job slot can only be computed the moment a slot becomes available. So it has been implemented as a stack with lazy evaluation: Draw one from an empty stack and the stack is extended by one. When a job is done, push the available job slot back on the stack.

This implementation also means that if you re-run the same jobs, you cannot assume jobs will get the same slots. And if you use remote executions, you cannot assume that a given job slot will remain on the same remote server. This goes double since the number of job slots can be adjusted on the fly (by giving --jobs a file name).

Rsync protocol version

rsync 3.1.x uses protocol 31 which is unsupported by version 2.5.7.
That means that you cannot push a file to a remote system using rsync protocol 31, if the remote system uses 2.5.7. rsync does not automatically downgrade to protocol 30.

GNU parallel does not require protocol 31, so if the rsync version is >= 3.1.0 then --protocol 30 is added to force newer rsyncs to talk to version 2.5.7.

Compression

GNU parallel buffers output in temporary files. --compress compresses the buffered data. This is a bit tricky because there should be no files to clean up if GNU parallel is killed by a power outage.

GNU parallel first selects a compression program. If the user has not selected one, the first of these that is in $PATH is used: pzstd lbzip2 pbzip2 zstd pixz lz4 pigz lzop plzip lzip gzip lrz pxz bzip2 lzma xz clzip. They are sorted by speed on a 128 core machine.

Schematically the setup is as follows:

    command started by parallel | compress > tmpfile
    cattail tmpfile | uncompress | parallel which reads the output

The setup is duplicated for both standard output (stdout) and standard error (stderr).

GNU parallel pipes output from the command run into the compression program which saves to a tmpfile. GNU parallel rec