Elixir is Safe

Here’s a little secret: I like Elixir because it’s safe.

Some languages are risky. Take C for example. You can write very fast code in C. You can also write very serious bugs.

In 2019, a Microsoft security engineer said that over the previous 12 years, about 70 percent of Microsoft’s security patches were fixes for memory safety bugs resulting from mistakes in C or C++ code.

In open source, the infamous Heartbleed bug, disclosed in 2014, allowed an attacker to read whatever happened to be in a server’s memory, which is about as bad as it gets.

As Robert Merkel wrote:

Mistakes of the type that caused Heartbleed have led to security problems since the 1970s. OpenSSL is written in a programming language called C, which also dates from the early 1970s. C is renowned for its speed and flexibility, but the trade-off is that it places all responsibility on programmers to avoid making precisely this kind of mistake.

I want to stay as far as possible from bugs like that. So I’ve always used programming languages which manage memory for me. Since Elixir uses garbage collection (“GC”), it checks that box.

But here’s the thing: what all GC languages do for memory safety, Elixir also does for safe concurrency.

The Thread Safety Challenge

Thread safety is hard. A recent post on the GitHub blog, “How we found and fixed a rare race condition in our session handling,” is a good illustration.

Here’s a snippet:

In summary, if an exception occurred at just the right time and if concurrent request processing happened in just the right sequence across multiple requests, we ended up replacing the session in one response with a session from an earlier response… This bug required very specific conditions: a background thread, shared exception context between the main thread and the background thread, callbacks in the exception context, reusing the env object between requests, and our particular authentication system.

So this was a rare but very serious bug, created by one of the highest-profile development teams in the world, in which User A would accidentally be authenticated as User B.

This summary statement gives a sense of the effort involved in tracking it down:

Taking a step back, a bug such as this is not only challenging from a technical perspective in how to identify complex interactions between multiple threads, deferred callbacks, and object sharing, but it is also a test of an organization’s ability to respond to a problem with an ambiguous cause and risk. A well-developed product security incident response team and process combined with a collaborative team of subject matter experts across our support, security, and engineering teams enabled us to quickly triage, validate, and assess the potential risk of this issue and drive prioritization across the company. This prioritization accelerated our efforts in log analysis, review of recent changes across our code and infrastructure, and ultimately the identification of the underlying issues that led to the bug.

Does your organization have the tools and expertise required to fix a bug like this?

Could you do it? I’m not sure I could.

The Underlying Cause

In Ruby, as in many languages, there are two main options for concurrency.

  1. Operating system processes, which are isolated but expensive
  2. Threads, which are cheaper but have shared mutable memory

Shared mutable state is a common source of bugs, which makes multi-threading risky in languages like Ruby.

But if you’re running Ruby at GitHub’s scale, you have to make some trade-offs for the sake of performance. This may be why the GitHub team is not abandoning threads but working “to make our code more robust for various threading contexts.”

That sounds hard. But there’s another way.

Concurrency for Mortals

In Elixir, we don’t use threads directly. Like garbage collection, thread management is the runtime’s responsibility.

Instead, we use a special kind of lightweight process, provided by the Erlang runtime. Creating one is easy, as a tiny experiment will show.

# Print the current process's PID, then do the same from a newly spawned process.
IO.inspect(["Hello from process ", self()])
spawn(fn -> IO.inspect(["Hello from process ", self()]) end)
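To get a feel for how lightweight these processes are, here’s a minimal sketch that spawns ten thousand of them and waits for each one to report back. The `ManyProcesses` module and its `run/1` function are names I made up for illustration; `spawn/1`, `send/2`, and `receive` are the standard building blocks.

```elixir
defmodule ManyProcesses do
  # Hypothetical helper for illustration; not part of any library.
  # Spawns `count` processes. Each one sends its own number back to the caller.
  def run(count) do
    parent = self()

    for n <- 1..count do
      spawn(fn -> send(parent, {:done, n}) end)
    end

    # Collect exactly one reply per spawned process and add up the numbers.
    Enum.reduce(1..count, 0, fn _, sum ->
      receive do
        {:done, n} -> sum + n
      end
    end)
  end
end

# Sum of 1..10_000, assembled from 10,000 separate processes.
IO.inspect(ManyProcesses.run(10_000))
# prints 50005000
```

Each of those ten thousand processes is a full, isolated unit of execution, yet creating one costs far less than creating an operating system thread.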

These processes give us:

  1. Concurrent (non-blocking) IO
  2. Concurrent execution
  3. “Shared nothing” concurrency

Item 3 is the killer one for safety. Like two people, two processes cannot share memory; they can only communicate by sending each other messages.
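Here’s a small sketch of what “shared nothing” means in practice. A child process receives its own copy of a map, builds a changed version, and sends it back; the parent’s original is untouched. The `SharedNothing` module and the `session` map are hypothetical names chosen for this example.

```elixir
defmodule SharedNothing do
  # Hypothetical module for illustration only.
  def demo do
    parent = self()
    session = %{user: "A"}

    # The closure carries a copy of `session` into the new process.
    spawn(fn ->
      # Map.put/3 returns a new map; it cannot alter the parent's data.
      changed = Map.put(session, :user, "B")
      send(parent, {:child_sees, changed})
    end)

    # The only way to learn what the child did is to receive its message.
    receive do
      {:child_sees, child_session} -> {session, child_session}
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(SharedNothing.demo())
# prints {%{user: "A"}, %{user: "B"}}
```

The parent still sees user "A" no matter what the child does with its copy. There is no reference the child could use to reach back into the parent’s memory.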

And whether we spawn them directly or not, Elixir developers use processes all the time.

When we use Phoenix, every web request or web socket connection is a process. Every test we run with ExUnit is a process. Every background task, every supervisor, every file handle or logger is a process.

In Elixir, you’ll never have to wonder if your server library is leaking state across threads of execution. It isn’t. It can’t.

And although it’s not impossible to write a concurrency bug in Elixir, you’re much less likely to do it.

As Matt Nowack of Discord said on Elixir Wizards:

I came from a background of doing massively parallel distributed systems, but all written in Python, and I was constantly worrying about race conditions. “Oh, what if I check this variable, but then something else comes and messes with it before I’m going to add a number to it?” When I finally internalized the guarantees of how processes work [in Elixir]… it made parallel programming so much easier, so much faster, so much simpler.
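The “check this variable, then add a number to it” worry can be sketched with a counter. In this example, which is my own illustration and not something from the podcast, the counter lives inside one process (an `Agent` from Elixir’s standard library), and fifty concurrent tasks each ask it to increment a hundred times. The `SafeCounter` module name and its arguments are made up.

```elixir
defmodule SafeCounter do
  # Hypothetical example module.
  # Starts `tasks` concurrent tasks; each increments a shared counter
  # `increments` times. Returns the final count.
  def run(tasks, increments) do
    # The counter's state is owned by a single Agent process.
    {:ok, counter} = Agent.start_link(fn -> 0 end)

    1..tasks
    |> Enum.map(fn _ ->
      Task.async(fn ->
        for _ <- 1..increments do
          # The read-and-add happens inside the Agent process,
          # which handles one message at a time.
          Agent.update(counter, fn n -> n + 1 end)
        end
      end)
    end)
    |> Enum.each(&Task.await/1)

    total = Agent.get(counter, fn n -> n end)
    Agent.stop(counter)
    total
  end
end

IO.inspect(SafeCounter.run(50, 100))
# prints 5000
```

Because the Agent processes its mailbox one message at a time, no increment is ever lost. You can still write a race if you split the work into a separate `Agent.get/2` followed by an `Agent.update/2`, which is why concurrency bugs are unlikely in Elixir but not impossible.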

Two Asides

The concept of processes which don’t share memory and communicate only with messages has come to be known as the “Actor Model.” It has been implemented in many languages. But unlike bolted-on implementations, Elixir’s runtime guarantees that this contract is honored; one “actor” simply cannot mess with the memory of another.

Also, remember garbage collection? Truly isolated processes make that better, too. When a process terminates, the runtime knows that nothing else is using its memory, so the GC algorithm is “everything goes.” Simple and fast.

That covers most cases, since most processes are short-lived. For long-lived processes, GC runs per process, one tiny heap at a time, so collecting one process’s garbage doesn’t pause the others. There’s never a “stop the world” pause, where every thread of execution has to wait while the garbage collector runs.
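You can watch a process’s private memory come and go with a rough experiment like the one below. It’s only a sketch: `HeapDemo` is a made-up module, the list size is arbitrary, and the exact byte count will vary by machine and runtime version. `Process.info/2` and `Process.monitor/1` are standard functions.

```elixir
defmodule HeapDemo do
  # Hypothetical module for illustration only.
  def run do
    parent = self()

    pid =
      spawn(fn ->
        # Build a list on this process's own heap.
        data = Enum.to_list(1..100_000)
        send(parent, :ready)

        # Keep `data` alive until we are told to stop.
        receive do
          :stop -> length(data)
        end
      end)

    receive do
      :ready -> :ok
    end

    # Memory used by the child process alone, in bytes.
    {:memory, bytes} = Process.info(pid, :memory)

    # Ask the child to exit and wait until the runtime confirms it is gone.
    ref = Process.monitor(pid)
    send(pid, :stop)

    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    end

    # Process.info/2 returns nil for a process that no longer exists.
    {bytes, Process.info(pid, :memory)}
  end
end

IO.inspect(HeapDemo.run())
```

The first element is the child’s memory while it was holding the list; the second is `nil`, because once the process exits there is nothing left to report or to trace.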

That’s a lot less to worry about.

Come On In

Are you looking for a language where you can build something quickly? A language where, if you succeed, scaling up will be simple?

Check out Elixir.

And if you want help with development or training, get in touch.

DockYard is a digital product agency offering custom software, mobile, and web application development consulting. We provide exceptional professional services in strategy, user experience, design, and full stack engineering using Ember.js, React.js, Elixir, Ruby, and other technologies. With staff nationwide, we’ve got consultants in key markets across the U.S., including Portland, San Francisco, Los Angeles, Salt Lake City, Minneapolis, Dallas, Miami, Washington D.C., and Boston.