Amtal

Unix Pipes and Pointless Functional Programming

What do Unix pipelines, jQuery method chaining, and Haskell’s . function have in common?

These are properties of the ‘pointfree’ or ‘pointless’ style of function composition. It’s a useful technique that produces very modular code, and is especially important in functional languages. Anyone familiar with a Unix shell can testify to its effectiveness.

Pointfree Style in Unix

From a functional language point of view, programs in a Unix pipeline are functions. The pipe character chains them together, f | g | h being equivalent to h(g(f())). Passing parameters to a program is partial application, specializing the function with immutable external data.

#!/bin/bash
# 
# Take a log file and extract top 3 offending IPs. Sample log entry:
# [Wed Jun 30 20:02:53 2010] [error] [client 209.129.94.61] File does not exist
cat $1 | grep '[[:digit:].]\{7,15\}' -o | sort | uniq -c | sort -r | head -n 3

The immutable data and anonymous intermediate values isolate steps from the context they appear in. Moving a sequence of steps into a helper script is a trivial cut-and-replace. This makes top-down design easy.

Steps also tend to be referentially transparent. This lets them be tested and debugged in isolation, making bottom-up development easy.

While the style has severe limitations1 it is amazingly effective at complex data transformations. If you want the safety immutability and referential transparency bring, or simply can’t use mutable storage, pointfree style is for you.

(No) Pointfree Style in Erlang

So what if you can’t use mutable storage, but your language doesn’t support pointfree style?

Normally, you just think up lots of short, unenlightened temporary “variable” names, with each one being used just once in the following line. They clutter the code and are usually too nondescript and short to help understanding, but look familiar to imperative programmers and thus have their uses.

Of course, sometimes there are no names to think up… And then things get ugly.

That case is extreme, and could be mitigated by adding more helper functions or folding over a list of transformation functions with the CSS in the accumulator. However, the problem constantly crops up on much smaller scales: most Erlang projects contain some Foo2 = f(Foo1) somewhere.

get_hostname(Url) ->
    case Url of % strip protocol
        <<"http://",Url2/binary>> -> ok;
        <<"https://",Url2/binary>> -> ok;
        Url2 -> ok
    end,
    Url3 = hd(binary:split(Url2, <<"/">>)), % strip path
    Url4 = hd(binary:split(Url3, <<":">>)), % strip port
    list_to_binary(string:to_lower(binary_to_list(Url4))).

More importantly, bad code doesn’t appear out of nowhere. Even mediocre programmers rarely produce something terrible on their first try. Disgusting code appears gradually as the codebase evolves and grows, and minor modifications add up. Busy fixing bugs and adding features, the programmer doesn’t notice the growth until it’s too late – and then a large chunk of code needs to be put out of its misery, needing a rewrite too thorough to be called a refactor.

This means that refactoring such patterns into folds when they reach a certain size isn’t sufficient. The problem needs to be solved before it takes root, which means adding syntactic support for pointfree style.

Syntax Sugar with Same Semantics

Fortunately, there’s LFE. A language with identical semantics to Erlang, but Lisp syntax and macros. Macros that can implement new language constructs:

(defn get-hostname [url]
  (-> (case url ; strip protocol
        ((binary "http://" (rest bytes)) rest)
        ((binary "https://" (rest bytes)) rest)
        (x x))
      (binary:split <> (binary "/")) car ; strip path
      (binary:split <> (binary ":")) car ; strip port
      binary_to_list string:to_lower list_to_binary)) ; lower case

The result:

The code is elegant and easy to follow.

(defn gen-url-slug [size]
  (-> (* 3 size)
      crypto:rand_bytes
      base64:encode
      (binary:replace <> (binary "+") (binary "-") `(global))
      (binary:replace <> (binary "/") (binary "_") `(global))))

In addition, it’s conducive to refactoring. Just cut, replace with function name, and paste somewhere:

(defn strip-protocol [url]
  (case url 
        ((binary "http://" (rest bytes)) rest)
        ((binary "https://" (rest bytes)) rest)
        (x x)))

Golf in the Small, Value in the Large

On toy examples, at a small scale, this looks like pointless code golf. The improvements seem miniscule if at all existant to people used to assigning values to variables, especially mutable ones.

However, when dealing with large, growing functional programs the payoff becomes apparent:

Point-free style makes the syntax match the way the code is designed and executed. This is a very powerful advantage.

1 In point-free style, steps can only push data to the step directly below. There is also no looping or control structures unless you use something like arrows rather than simple pipes.