Why Use Erlang?

It isn't obvious what makes the language good, and there are major barriers to entry.

And yet, it's capable of extremely impressive things.

How does such an obscure language serve so well under fire?

The Trick

The trick is built into the very foundation of the language, and its effects are subtle. You notice them when you run your first Erlang service. Its ability to take abuse is uncanny: bad input, unanticipated edge cases, connection failures, dumb mistakes, deadlocks...

No matter what, your poorly written code recovers. It somehow runs for weeks on end with no intervention, and it's hard to understand why.

Services in Software as a Service Systems

To understand the trick, let's look at a familiar example.

A recent presentation from GitHub described their internal services. Let's imagine what they could look like, supervised by admins and something like god. If a service fails, its supervisor restarts it. Supervisors have supervisors too: even admins have someone watching them.

Supervised services.

Most services are independent. If metrics is down, chat still works so the problem may be discussed, and patches can be pushed out using the chatbot and deploy service.

Even the ones with dependencies are fairly isolated. If the database must be restarted, only a small window of writes from logging and metrics will be lost.

If temporary data loss is unacceptable, monitoring can be added between dependencies. Logging monitors the database. If it's down, logging can enter a reduced-functionality mode, cache incoming writes until the problem is resolved, or otherwise handle the fault.

Services monitoring services.

Monitoring handles every kind of fault the same way! It doesn't matter whether the database is down due to a bug, power outage, network partition, or meteor strike. The fault is still handled.
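
Erlang, which is where this is heading, offers monitoring as a built-in primitive. What follows is a minimal sketch rather than anyone's real setup: the module name monitor_sketch, the database/0 stand-in, and the power_outage reason are all invented for illustration. It shows one process playing "logging" and watching another playing "database".

```erlang
-module(monitor_sketch).
-export([run/0]).

%% Hypothetical stand-in for the database: waits for a message, then crashes.
database() ->
    receive
        crash -> exit(power_outage)
    end.

%% Hypothetical stand-in for logging: monitors the database and reacts
%% to its failure, whatever the cause.
run() ->
    %% spawn_monitor/1 starts the process and monitors it in one step.
    {Pid, Ref} = spawn_monitor(fun database/0),
    Pid ! crash,
    receive
        %% This one clause matches any exit reason: bug, kill, or otherwise.
        {'DOWN', Ref, process, Pid, Reason} ->
            {degraded, Reason}
    after 1000 ->
        timeout
    end.
```

Calling monitor_sketch:run() should return {degraded, power_outage}. The watcher never inspects why the database died in order to notice that it died; the reason is just data it can log or act on.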

A stronger version of monitoring is linking. If the chatbot is down, deployment can't be accessed. If deployment is down, the chatbot is useless. Failure in one should cause the other to shut down, minimizing error messages.

Services linking to services.
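
Linking is also an Erlang primitive. The sketch below assumes two made-up processes standing in for the deploy service and the chatbot; the module name link_sketch and the deploy_failed reason are hypothetical. A third process observes the chatbot so the effect of the link is visible.

```erlang
-module(link_sketch).
-export([run/0]).

run() ->
    Parent = self(),
    %% Hypothetical deploy service: crashes when told to.
    Deploy = spawn(fun() ->
        receive crash -> exit(deploy_failed) end
    end),
    %% Hypothetical chatbot: links to deploy, reports that it is linked,
    %% then idles forever.
    Chatbot = spawn(fun() ->
        link(Deploy),
        Parent ! linked,
        receive stop -> ok end
    end),
    %% Observe the chatbot from outside so we can see what happens to it.
    Ref = monitor(process, Chatbot),
    %% Wait until the link is in place before crashing deploy.
    receive linked -> ok after 1000 -> exit(link_timeout) end,
    Deploy ! crash,
    receive
        {'DOWN', Ref, process, Chatbot, Reason} ->
            {chatbot_down, Reason}
    after 1000 ->
        timeout
    end.
```

Calling link_sketch:run() should return {chatbot_down, deploy_failed}: the chatbot did nothing wrong, but it went down with its partner. Links are bidirectional, so a chatbot crash would take the deploy process down in the same way.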

What have we got?

We've got an architecture made of loosely coupled, isolated services. They monitor and link to one another to handle all kinds of failure equally. Distributed across many redundant machines in multiple locations, they form an extremely robust system. Parts can fail, but others will notice and restart them.

Isn't that nice? Wouldn't it be cool if all software was this robust?

Software is Made of Services

Erlang takes the techniques that build robust systems in the large, and applies them to the smallest tasks. Processes, Erlang's actors, are extremely cheap to create, and in-process message passing is nearly instant. You can design a program like you would a large system, and reap the same benefits!
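
To give a feel for how cheap these actors are, here is a small sketch. The module name cheap_sketch and the {done, Pid} message shape are assumptions made for the example. It spawns N short-lived processes, has each send one message back, and counts the replies.

```erlang
-module(cheap_sketch).
-export([run/1]).

%% Spawn N processes; each sends a single message to the caller and exits.
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {done, self()} end)
            || _ <- lists:seq(1, N)],
    collect(Pids, 0).

%% Wait for one reply per spawned process and count them.
collect([], Count) ->
    Count;
collect([Pid | Rest], Count) ->
    receive
        {done, Pid} -> collect(Rest, Count + 1)
    after 5000 ->
        {timeout, Count}
    end.
```

On an ordinary machine, cheap_sketch:run(10000) should return 10000 almost immediately. Ten thousand OS processes or threads would be a serious design decision; ten thousand Erlang processes is routine.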

For example, let's pretend the chatbot is written in Erlang.

Chatbot as a group of services.

You get the properties of a large, well-designed system at the scale of a single OS process. More importantly, these benefits are baked in: this is the default way Erlang code is written. Even poorly written Erlang code can recover from failures, because everything is done by small, isolated services which can be restarted into a (hopefully) clean state.
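
As a rough illustration of "restarted into a clean state", here is a sketch using OTP's standard supervisor behaviour. The worker names irc_listener and command_parser are invented, since the text does not say which pieces the chatbot is made of, and the workers do nothing except wait for messages. The demo/0 function kills one worker and checks that a fresh one takes its place.

```erlang
-module(chatbot_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/1, demo/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_one: when a child dies, restart only that child.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    %% Hypothetical chatbot parts; the names are assumptions.
    Children = [child(irc_listener), child(command_parser)],
    {ok, {SupFlags, Children}}.

child(Name) ->
    #{id => Name, start => {?MODULE, start_worker, [Name]}}.

%% Start a trivial worker linked to the supervisor, registered under Name.
start_worker(Name) ->
    Pid = spawn_link(fun loop/0),
    register(Name, Pid),
    {ok, Pid}.

loop() ->
    receive _Msg -> loop() end.

%% Kill one worker and wait for the supervisor to replace it.
demo() ->
    {ok, Sup} = start_link(),
    Old = whereis(command_parser),
    exit(Old, kill),
    New = wait_restart(command_parser, Old, 100),
    %% Shut the supervisor down again so demo/0 can be rerun.
    unlink(Sup),
    Ref = monitor(process, Sup),
    exit(Sup, shutdown),
    receive {'DOWN', Ref, process, Sup, _} -> ok end,
    is_pid(New).

%% Poll until Name is registered to a pid other than Old.
wait_restart(_Name, _Old, 0) ->
    timeout;
wait_restart(Name, Old, Tries) ->
    case whereis(Name) of
        Pid when is_pid(Pid), Pid =/= Old -> Pid;
        _ -> timer:sleep(10), wait_restart(Name, Old, Tries - 1)
    end.
```

Calling chatbot_sup:demo() should return true, and the supervisor will print a crash report for the killed worker along the way. The worker itself contains no recovery code; the restart comes from the structure around it.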

This is unique.