Modern Unix Tools

20 Aug 2020

It’s been a rough year. My ex-wife had her assistant drop my name to a drug dealer on 10 March 2020, and hours later I was attacked and my sinus was crushed. I’ve made it a rule not to get into corrupt Iowa politics here, but at some point I will publish a book:

Iowa Gov Herschel Loveless

DSM mob boss “Cock Eyed” Lew Farell

My RapidAPI offering of Microsoft Z3 is in beta; I’m hoping to launch it soon with a series of blog posts. The first will be on using Z3 to fight COVID-19. Posts to follow will include writing plugins for VSCode and Excel, and more problems from Garey and Johnson expressed as SMT scripts.

For relaxation, marketing myself, and fun, I have been helping port GNU Coreutils to Rust as part of the uutils project.

My first question was, “What have Unix utilities looked like over the years?” They range from poetically succinct to thousands of lines.

Unix v7

Results from “The Relevance of Classic Fuzz Testing: Have We Solved This One?” were harsh. The Rust port is very much a work in progress.

I am synthesizing expectation tests from the above projects. This has yielded a surprising number of issues to fix.

I am working on formal input grammars to avoid entire classes of parsing bugs.

Several fuzzing tools need only a BNF-style grammar as input: yagg suffers from Perl bitrot, GramTest has a clean BNF syntax, and Google Domato and Mozilla Dharma are actively developed.

For argument parsing, clap is widely used by other command-line utilities, args and command-line-interface are simple, and arg supposedly has a small memory footprint.

For parsing, lalrpop does LALR(1), pest does PEG, and nom does parser combinators. asp is the new kid on the block that adds algebraic constraints to parser combinators.

I plan on taking one tool, probably ‘head’, and specifying its input grammar with each of those tools for comparison.
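As a baseline for that comparison, here is a minimal hand-rolled sketch in plain Rust that uses none of the crates above. It covers only a small, assumed subset of the ‘head’ command line (-n COUNT, -c COUNT, -- and file operands), not the full GNU grammar, and the names Mode, HeadArgs and parse_head_args are hypothetical.

```rust
// Assumed subset of the `head` argument grammar (NOT the full GNU syntax):
//   args   ::= (option | file)* [ "--" file* ]
//   option ::= ("-n" | "-c") COUNT
//   COUNT  ::= digit+
#[derive(Debug, PartialEq)]
enum Mode {
    Lines(u64),
    Bytes(u64),
}

#[derive(Debug, PartialEq)]
struct HeadArgs {
    mode: Mode,
    files: Vec<String>,
}

fn parse_head_args(args: &[&str]) -> Result<HeadArgs, String> {
    let mut mode = Mode::Lines(10); // `head` prints the first 10 lines by default
    let mut files = Vec::new();
    let mut options_done = false;
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        if options_done {
            // Everything after "--" is a file name, even if it starts with '-'.
            files.push(arg.to_string());
        } else if arg == "--" {
            options_done = true;
        } else if arg == "-n" || arg == "-c" {
            let raw = iter
                .next()
                .ok_or_else(|| format!("{} requires a count", arg))?;
            // Enforce COUNT ::= digit+ explicitly; u64's parser alone would accept "+5".
            if raw.is_empty() || !raw.bytes().all(|b| b.is_ascii_digit()) {
                return Err(format!("invalid count: {}", raw));
            }
            let count: u64 = raw
                .parse()
                .map_err(|_| format!("count out of range: {}", raw))?;
            mode = if arg == "-n" {
                Mode::Lines(count)
            } else {
                Mode::Bytes(count)
            };
        } else if arg.starts_with('-') && arg.len() > 1 {
            return Err(format!("unknown option: {}", arg));
        } else {
            // A lone "-" conventionally means standard input; keep it as an operand.
            files.push(arg.to_string());
        }
    }
    Ok(HeadArgs { mode, files })
}
```

Calling parse_head_args(&["-n", "5", "--", "-n"]) yields a five-line mode and a single file operand named "-n". Writing the grammar down in the comment first makes it easier to check the same rules against a lalrpop, pest or nom version later.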

One surprise was the number of environment variables that GNU Coreutils uses, many of them poorly documented.

For resource smash testing I plan on using Linux ulimit, which is much easier than trying to have Rust or LLVM smash resources at compile time. Neil Mitchell has a classic post on resource smash testing garbage-collected languages.
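A minimal sketch of how a Rust test harness could apply such a limit, assuming sh on the test machine is a shell like bash or dash whose ulimit builtin supports -v (POSIX itself only specifies -f). The helper name limited_command is hypothetical. It only builds the command; the caller decides when to run it.

```rust
use std::process::Command;

/// Build (but do not run) a command that caps virtual memory with the shell's
/// `ulimit -v` and then exec's the tool under test.
/// Assumption: `sh` resolves to a shell whose `ulimit` accepts `-v` (KiB).
fn limited_command(max_kib: u64, program: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new("sh");
    cmd.arg("-c")
        // The limit and the tool's argv are passed as positional parameters,
        // never spliced into the script text, so odd names cannot inject shell code.
        .arg(r#"ulimit -v "$1" && shift && exec "$@""#)
        .arg("sh") // becomes $0 inside the script
        .arg(max_kib.to_string()) // becomes $1, consumed by `shift`
        .arg(program)
        .args(args);
    cmd
}
```

Something like limited_command(65536, "head", &["-n", "1", "big.txt"]).status() would then run the tool under a 64 MiB cap (big.txt is a placeholder file name), and an expectation test can assert on the exit status.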

To study the effect of Rust’s borrow checker on memory, I plan to replicate the results of “What causes Ruby Memory Bloat?” against other implementations, record cache miss rates, and quantify differences in LLVM IR output around malloc/free.

Along the way I expect to use several cache-oblivious and succinct data structures that Unix developers of the 1970s didn’t have available to them. Apparently cargo build --release automatically does dead code elimination, but I still plan on running bloaty to see if more can be stripped.

Profile-guided optimization has delivered huge performance improvements; there is even a machine function splitter in the works. llvm-propeller by Google and BOLT by Facebook are the two major initiatives.

The SOUPER superoptimizer for LLVM IR instructions is also promising.

For modeling Unix tool state I plan on writing TLA+ and Alloy specifications.

gg is the future of distributing builds and thunking them for reproducibility. I’m writing an SMT-backed scheduler for cost-optimizing them within a time budget.

“Rust verification tools” is a good rundown of the latest concolic testing tools for Rust. I also plan on using lots of strace and writing expectation tests on top of that. Tools like cargo-profiler leverage Cachegrind.

One last thing: “How to delete all your files.”

cd /mnt/nfs/Documents
rsync -r * ~/Documents
# injection attack when one of the files is named "--delete"

Definitely need to look into the filename injection attack surface.
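One common mitigation, sketched here in Rust rather than shell, is to build the argument vector so that file names can never be read as options: put "--" before the operands and rewrite any relative name that starts with '-' as "./name". The helpers neutralize and rsync_argv are hypothetical names, and the sketch assumes the target tool honors "--" as end of options, as rsync and the GNU tools do.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Rewrite a relative name that starts with '-' as "./name" so that a tool
/// parses it as a path and never as an option.
fn neutralize(name: &Path) -> PathBuf {
    if name.as_os_str().to_string_lossy().starts_with('-') {
        Path::new(".").join(name)
    } else {
        name.to_path_buf()
    }
}

/// Build the argv for a recursive rsync copy: options first, then "--" to end
/// option parsing, then the (neutralized) sources, then the destination.
fn rsync_argv(sources: &[PathBuf], dest: &Path) -> Vec<OsString> {
    let mut argv: Vec<OsString> = vec!["rsync".into(), "-r".into(), "--".into()];
    argv.extend(sources.iter().map(|s| neutralize(s).into_os_string()));
    argv.push(dest.as_os_str().to_os_string());
    argv
}
```

Either defense alone should stop the "--delete" case above; using both is cheap and also covers tools that ignore "--". The same vector can be handed to std::process::Command without going through a shell, which avoids glob expansion altogether.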