Elixir Deployment Tools Update - February 2018

By: Paul Schoenfelder

In the time since I was hired by DockYard to start working on deployment tools, I have been pretty quiet, and I can finally remedy that today by giving you a look at what I have planned for this year, and what you can expect from your tools in the future. This plan is based in part on my own experience with Distillery, discussions with the Elixir core team, as well as feedback from the community in general. I’m really excited about where things are headed, and by the time you’ve finished reading this update, it is my hope that you will be too!

A Quick Recap

To better understand where we are headed, it is useful to know where we have been. When it comes to deployment of Elixir projects, there has always been some degree of confusion, because there were really two different ways you could choose to deploy your application: either as an OTP release, or in the form of a Mix project, source code and all. I don’t believe the latter is an acceptable option; not only is it not reproducible, but it opens up your production hosts to additional attack vectors, and worse, it throws out a huge part of OTP’s design. We want deployments to be reproducible, require the minimum amount of resources and dependencies, and take full advantage of the features provided to us by Erlang/OTP.

Releases are a fundamental piece in the design of OTP, and if you ever take the time to read through the Erlang source code, you will see just how pervasive they are. In short, they are the means by which you are intended to package together OTP applications which are then run as a single unit. Releases both build on, and extend, the low-level tooling by which the runtime boots and manages OTP applications. For example, there is a script which defines how to load and boot applications at startup, used both within and without releases; but only the release handler provides the script which describes how to upgrade from one version of an application to another, and vice versa. While you can manually reload modules without the release handler, that is a blunt instrument in comparison to the carefully structured appup process, which not only loads the new code, but also ensures that running processes are transitioned into the new code in an organized fashion. These low-level tools work in concert with one another, so that the release handler can even change the version of the runtime being used in the middle of the upgrade/downgrade process. It is important to understand how fundamental releases are, because to dismiss them is to dismiss all of the experience and effort that went into the design of OTP, which has stood the test of time.
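To make the point about transitioning running processes a little more concrete, here is a minimal sketch of the callback a GenServer can implement so that its state survives an upgrade. The Counter module, its two state shapes and the version strings are all hypothetical; the only real API involved is the code_change/3 callback, which is invoked for a running process when an upgrade instructs it to change code.

```elixir
defmodule Counter do
  use GenServer

  # Version 2 of this hypothetical module keeps its state in a map;
  # version 1 kept a bare integer.
  @impl true
  def init(count) when is_integer(count), do: {:ok, %{count: count, step: 1}}

  @impl true
  def handle_call(:value, _from, state), do: {:reply, state.count, state}

  # Called for a running process during an upgrade, so that state created
  # by the old code can be converted to the shape the new code expects.
  @impl true
  def code_change(_old_vsn, count, _extra) when is_integer(count) do
    {:ok, %{count: count, step: 1}}
  end

  # State that already has the new shape is passed through unchanged.
  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end
```

Manually reloading the module would swap the code but leave any running Counter holding an integer that the new handle_call/3 cannot use, which is the kind of problem the structured upgrade path is meant to avoid.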

While the community generally prefers deploying OTP releases, it isn’t always an easy choice for those who are new to the language. They are told to use Distillery to build a release when they want to deploy, run into some issues, and struggle to understand what went wrong and how to fix it. They end up asking themselves why they shouldn’t just deploy source code and use mix run --no-halt or mix phx.server, particularly if they come from scripting languages where deploying source code is the norm. That’s an absolutely valid reaction in that situation. After all, we all want our tools to just work, and when the only tool to build a release is a third-party library written by someone they don’t know, one that doesn’t integrate seamlessly with some of the features of the language, it’s no surprise that people choose the easiest path.

The core team realizes that this is untenable - we can’t afford to have this kind of fragmentation and pain, or it will spread and infect other aspects of the language. In fact, this has already happened to some degree; you only have to look at how many variations on configuring libraries based on environment variables there are to see the impact this has had. Let’s take a look at what I see as the major issues facing us, and what my plan is for solving them.

Development vs Production Tooling

In my opinion, the biggest obstacle to consolidating the community around OTP releases has been the discrepancy between how things are done in development and how things are done when you are ready to go to production. This is due in part to the fact that the two main development tasks we rely on, iex -S mix and mix run (or mix phx.server for Phoenix applications), are not founded on OTP releases. Instead, Mix starts the runtime with only :kernel, :stdlib, and itself started, and then dynamically loads and starts applications based on the project definition found in mix.exs. It works this way because doing so unified the Mix task infrastructure and provided the means to handle configuration via config.exs, since Mix could inject configuration into the application environment before starting other applications. At the time, I suspect that José wasn’t concerned with whether releases were supported or not, and was more concerned with making sure the tooling was powerful and intuitive; it wasn’t until later that the friction became apparent.

Tasks

Mix tasks don’t work within releases for a simple reason: Mix is designed to work in the context of the typical Mix project directory structure, as well as the required Mix project definition in mix.exs. Some aspects of Mix’s API, such as Mix.env, are also meaningless in the context of a release. Instead, Distillery provides a facility for executing custom commands. For example, a bin/myapp migrate command might be a thin shell script like so:

#!/usr/bin/env bash

# Invoke MyApp.ReleaseTasks.migrate/0 through the release's own boot script.
"$RELEASE_ROOT_DIR/bin/myapp" command Elixir.MyApp.ReleaseTasks migrate

This results in MyApp.ReleaseTasks.migrate/0 being invoked in a context where all of the code is loaded, but none of the applications are started, effectively equivalent to a Mix task. The problem is that you can’t reuse, for example, Ecto’s migration task. Instead, you have to use Ecto’s migrator API to implement similar functionality. You have to know which applications to start, how to discover the migrations, and a variety of other small differences, which Ecto would have already solved for you if you could use the Mix task. You also have to read up on Distillery’s shell context and API so you are aware of things like the RELEASE_ROOT_DIR environment variable.
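For illustration, a module along these lines is roughly what ends up on the other side of that command. Treat it as a sketch under several assumptions rather than a recommended implementation: the application name :my_app, the MyApp.Repo module, the Postgres-oriented list of applications to start and the migrations path are all placeholders, and it only runs in a project that actually has Ecto as a dependency. Ecto.Migrator.run/4 and Application.app_dir/2 are the real APIs it leans on.

```elixir
defmodule MyApp.ReleaseTasks do
  # Assumptions: the OTP app is :my_app, the repo is MyApp.Repo, and
  # Postgres is the database. Adjust these for a real project.
  @app :my_app
  @repo MyApp.Repo
  @start_apps [:crypto, :ssl, :postgrex, :ecto]

  def migrate do
    # Code is loaded but nothing is started, so load the app environment
    # and start only what the migrator needs.
    load_app()
    Enum.each(@start_apps, &Application.ensure_all_started/1)

    # Start the repo on its own, without booting the rest of the application.
    {:ok, _pid} = @repo.start_link(pool_size: 2)

    # Discover the migrations shipped inside the release and run them.
    path = Application.app_dir(@app, "priv/repo/migrations")
    Ecto.Migrator.run(@repo, path, :up, all: true)

    # Shut the node down cleanly once the task is done.
    :init.stop()
  end

  defp load_app do
    case Application.load(@app) do
      :ok -> :ok
      {:error, {:already_loaded, @app}} -> :ok
    end
  end
end
```

Every decision in there (which applications to start, where the migrations live, when to stop the node) is one that the Mix task would otherwise have made for you.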

Other than the question of migrations, this hasn’t been a huge issue in practice, but not being able to use Mix in production has certainly presented friction, particularly if you want to expose a variety of custom commands for interacting with the running application both in dev and prod. You ultimately end up hacking something together which wraps a Mix task around the module actually implementing the task, so that you can use the task in dev, and invoke the module via custom command in prod - not ideal.
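The workaround usually looks something like the following sketch. The module names and the greeting itself are made up for the example; the point is only the shape: a plain module with no dependency on Mix does the work, and a thin Mix task delegates to it.

```elixir
defmodule MyApp.Tasks.Greet do
  # Plain module with no dependency on Mix, so it can also run in a release.
  def run(args) do
    name = List.first(args) || "world"
    message = "Hello, #{name}!"
    IO.puts(message)
    message
  end
end

defmodule Mix.Tasks.MyApp.Greet do
  use Mix.Task

  @shortdoc "Prints a greeting (thin wrapper around MyApp.Tasks.Greet)"

  # In development this is available as: mix my_app.greet NAME
  @impl Mix.Task
  def run(args), do: MyApp.Tasks.Greet.run(args)
end
```

In development you call the Mix task; in production a custom command like the shell script shown earlier would point at MyApp.Tasks.Greet instead. It works, but you are maintaining two entry points for every task.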

Configuration

Erlang projects traditionally rely on the system configuration file, sys.config, only for configuring the runtime, and more rarely for configuring applications. This config file is static: you can’t call functions in it, and you can’t fetch the value of an environment variable. That type of dynamic configuration was expected to be performed in application code.
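As a small sketch of what "performed in application code" tends to mean in practice, the helper below reads an environment variable at runtime and stores the result in the application environment. The module name, the PORT variable, the :my_app application and the default of 4000 are all assumptions made for the example.

```elixir
defmodule MyApp.RuntimeConfig do
  # Hypothetical helper; the PORT variable and the default are assumptions.
  def port(default \\ 4000) do
    case System.get_env("PORT") do
      nil -> default
      value -> String.to_integer(value)
    end
  end

  # Something an Application.start/2 callback could call before it starts
  # its supervision tree.
  def load do
    Application.put_env(:my_app, :port, port())
  end
end
```

Nothing here can be expressed in sys.config itself, which is why this logic ends up living in the application.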

Elixir came on the scene, and introduced its own configuration file, config.exs, which unlike sys.config is dynamic. After all, it is effectively an Elixir script, so you are limited only by what code you are willing to write. Because Mix was taking care of starting your application and its dependencies, it could evaluate the config file before doing so, and ensure everything was pushed into the application environment (which is what you access via Application.get_env/3 and Application.put_env/3). The fundamental issue here is that this didn’t take into account how Elixir applications would work in OTP releases: there is no Mix project, and Mix is no longer in charge of choosing how and when to boot your application. Instead, that is the job of :init and the boot script, which instructs the runtime how to load and start applications and in what order.

When I wrote the initial tooling for releases in Elixir, I saw little choice but to translate the Mix config file to sys.config, by reading it at build time and writing out the resulting data structure in Erlang term format. This decision had several implications: it implicitly changes the semantics of config.exs from that of a runtime config file to a build time config file; it requires that usages of System.get_env/1 be changed to use some other mechanism, of which there are numerous fragmented variants; it prevents you from fetching something from the environment and transforming it prior to setting the config; and worst of all, it creates a rift between development and production that is only surfaced when it is time to deploy to production, which is confusing to many, and for good reason.
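The change in semantics is easy to reproduce in miniature. The snippet below is a simplified stand-in for the translation step, not Distillery's actual code: it evaluates a config-like keyword list, writes it out in Erlang term format, and then shows that changing the environment afterwards has no effect on the generated text. The :my_app application and the PORT variable are assumptions.

```elixir
# The value present on the build machine when the release is assembled.
System.put_env("PORT", "4000")

# What evaluating config.exs produces: a plain keyword list.
config = [my_app: [port: System.get_env("PORT")]]

# Writing it out as Erlang terms, which is the format sys.config uses.
sys_config = IO.iodata_to_binary(:io_lib.format("~p.~n", [config]))
IO.puts(sys_config)

# Changing the variable later, e.g. on the production host, changes nothing:
# the text holds the literal value, not the call to System.get_env/1.
System.put_env("PORT", "8080")
IO.puts("mentions 8080? #{String.contains?(sys_config, "8080")}")
```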

Errors

Speaking of confusion, another area in which I see room for significant improvement is that of errors which occur early in the boot process. When the Erlang VM starts, it does a number of things before Elixir itself is even loaded, and even once it is, Elixir isn’t in charge of the boot process, the runtime is. To give you an idea of what I mean, let’s take a very quick look at how the runtime boots, regardless of whether it’s a release or a Mix project:

  1. The ERTS (Erlang Runtime System) emulator (written in C) gets to a point where it calls the very first piece of Erlang code in the system. This code is found in the :otp_ring0 module, in its start/2 function. :otp_ring0 and a handful of other modules are preloaded into the emulator, which is how the system bootstraps itself.
  2. :otp_ring0.start/2 simply calls :init.boot/1 with the command line arguments passed to the runtime.
  3. :init loads some core NIFs, such as zlib, erl_tracer, and others; parses arguments and sets boot options; and then begins to evaluate the instructions found in the boot script. By default this boot script is found in the Erlang distribution (as start.boot or start_clean.boot), but releases provide their own with instructions for the applications they contain. This file is simply a binary containing an Erlang term, which is a list of tuples containing instructions like load, apply and others. :init remains running during the entire life of the VM, until a shutdown is initiated (e.g. via :init.stop/0) or a crashing application requires that the node itself crash. This module is responsible for handling errors which are unrecoverable. Such errors result in a crash dump being written, and then termination of the process.
  4. Early in the boot script, there are instructions to load and start the :application_controller module, which is then sent messages by :init to load and start applications in the order they are specified in the boot script. This module handles any errors with loading or starting applications, as well as errors when applications crash. It will generally print information about those errors, and then crash itself, which will bubble up to :init.
  5. Once all applications are started (or more specifically, all boot instructions have been processed), the boot process is complete.
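If you are curious what a boot script actually contains, you can decode one yourself. The sketch below assumes that your Erlang installation ships bin/start.boot under its root directory, which is typical but not guaranteed, and the BootScript module is just a hypothetical helper for the example. A decoded script is a {:script, {name, vsn}, instructions} tuple, and in the decoded form the code-loading instructions are tagged :primLoad.

```elixir
defmodule BootScript do
  # A decoded .boot file is {:script, {name, vsn}, instructions}, where each
  # instruction is a tuple tagged with its kind (:primLoad, :apply, ...).
  def summarize({:script, {name, vsn}, instructions}) do
    counts = Enum.frequencies_by(instructions, &elem(&1, 0))
    %{name: to_string(name), vsn: to_string(vsn), counts: counts}
  end

  # Reads a .boot file from disk and decodes the Erlang term inside it.
  def read(path) do
    with {:ok, binary} <- File.read(path) do
      {:ok, :erlang.binary_to_term(binary)}
    end
  end
end

# Assumption: the Erlang installation ships bin/start.boot under its root.
path = Path.join([List.to_string(:code.root_dir()), "bin", "start.boot"])

case BootScript.read(path) do
  {:ok, script} -> IO.inspect(BootScript.summarize(script))
  {:error, reason} -> IO.puts("could not read #{path}: #{inspect(reason)}")
end
```

The summary only counts instructions by kind, but it is enough to see that the file is plain data which :init walks through from top to bottom.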

The reason we sometimes get really ugly or undecipherable errors when crashes occur is that those errors are handled so early in the boot process that Elixir can’t intercept them and print them in a nicer format. Furthermore, they are handled at a level where it is not guaranteed that even the Erlang standard library is fully available. Because hardly any modules can be relied on at this phase of the program, no effort is made to pretty print errors; instead, they are dumped to standard output in raw form. If you’ve ever seen an {"init terminating in do_boot", ...} error, that is something which failed in :init, or which produced an unrecoverable error. As an example, if an application in the release is missing, that error will be hit inside :application_controller when :init sends the message to load that application. The application controller will crash because it could not find the application it needed to load, and :init will receive the EXIT message for the application controller and determine that an unrecoverable error has occurred. This is an error which you would expect to be able to print a friendly message for, but we can’t currently. There are a variety of such errors, and it has been a pain point when those errors occur.
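You can get a feel for the gap between raw and friendly errors from a running Elixir session, where the formatting helpers are available. This is only an illustration of the difference in presentation, not of the early boot path itself; :no_such_app is a deliberately nonexistent application name.

```elixir
# :no_such_app is a deliberately missing, hypothetical application name.
{:error, reason} = Application.load(:no_such_app)

# The raw Erlang term, similar in spirit to what ends up in boot-time output.
IO.inspect(reason, label: "raw")

# The same error rendered by Elixir as a human-readable message.
IO.puts("formatted: " <> Application.format_error(reason))
```

Once Elixir is up, producing the second line is trivial. During boot, the same failure happens before any of this machinery can be relied on.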

A More Perfect Union

Elixir has evolved to a point where we need to close the gap on some, if not all, of these issues. Friction early in the development of a language is acceptable, but we’re now at the stage where core tooling like this really needs to be rock solid and well integrated. This means we need to ensure that Mix embraces releases fully, and provides the necessary API to write tools which work both in the context of a Mix project and in the context of a release - but such things should mostly be transparent to the author of libraries, Mix tasks, etc. Now that you have a better understanding of the problems, let’s take a look at what we’re planning to do to fix them!

Releases in Elixir Core

To remove the awkward transition between development and production, we really need our development tooling to be built on releases under the covers. In short, if you run iex -S mix or mix run, these should effectively be the equivalents of bin/myapp console or bin/myapp foreground. If you are always running releases, then there is no transition to be made between development and production.

Just making those commands generate and run a release isn’t enough though; we want our Mix config files to work the same as they do now in development, but in production too. If we just took the current version of how releases work and made you work that way all the time, Mix config files would lose all their usefulness.

Luckily, I’ve come up with a solution for this which is beautiful in its simplicity, but also frustrating because it has been right in front of me for so long and I missed it. By leveraging instructions in the boot script, we can have Mix evaluate the config file before any further actions are taken, which gives us the best of all worlds - we no longer need to translate to sys.config, and config.exs retains its runtime semantics across development and production. The only caveat to this is that some configuration options will still need to be set via vm.args, for example any :kernel config settings, since :kernel is loaded and started before Elixir. That is a very small price to pay for being able to fully support Mix configuration.
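To give a rough idea of the shape this could take, here is a sketch of a function that a boot script instruction might call. None of this is the actual design: MyApp.ConfigLoader is a hypothetical module, the {:apply, {MyApp.ConfigLoader, :load, [path]}} instruction mentioned in the comments is an assumption about how it would be wired in, and the example uses Config.Reader, which only exists in Elixir 1.9 and later, as a stand-in for whatever Mix ends up providing.

```elixir
defmodule MyApp.ConfigLoader do
  # Evaluates a Mix-style config file and pushes every entry into the
  # application environment. A boot script could invoke it with a
  # hypothetical instruction such as:
  #   {:apply, {MyApp.ConfigLoader, :load, [path]}}
  # placed before the instructions that start the applications.
  def load(path) do
    for {app, settings} <- Config.Reader.read!(path),
        {key, value} <- settings do
      # persistent: true keeps the value when the application is loaded later.
      Application.put_env(app, key, value, persistent: true)
    end

    :ok
  end
end

# Usage: a throwaway config file that reads the environment at runtime.
path = Path.join(System.tmp_dir!(), "config_loader_demo.exs")

File.write!(path, """
import Config
config :my_app, port: String.to_integer(System.get_env("PORT") || "4000")
""")

MyApp.ConfigLoader.load(path)
IO.inspect(Application.get_env(:my_app, :port), label: "port")
```

Because the file is evaluated when the node boots rather than when the release is built, calls like System.get_env/1 see the environment of the host that is actually running the release.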

Supporting Mix tasks is a more complicated issue, and will likely not be solved in the first pass we make, but if Mix itself understands releases, I don’t think it’s a huge leap to get to a point where we can define Mix tasks which work in a release. Fundamentally, there will always be some subset of Mix tasks which are designed for working within a Mix project directory and alongside source code, so it is important to provide a clear delineation between those which need a Mix project, and those which don’t, so that proper errors can be produced if one tries to invoke a task in the wrong context. It is something I will continue to have conversations with the core team about as we get closer to release.

Better Errors

The question of how to better deal with errors is still an open one. I am exploring a few approaches that may work, which involve making it possible to inject custom behaviour into :init and :application_controller, or potentially taking over both roles entirely by supporting custom :init or :application_controller modules. The former is definitely more desirable, because we want a model by which we can easily extend or customize behaviour in these early phases, without having to reimplement all of the really critical work they do. The bottom line is that I think we can get to a point where errors are either more readable, or even better, presented in a format which is native to the project (e.g. Elixir projects get Elixir-formatted error data, rather than Erlang-formatted). This work will require coordination and cooperation with the OTP team to realize, but I’ve spoken with a number of people who see that there is room for improvement here, so I have high hopes that we can find a solution that will work for them as well as the community at large.

Better Documentation

A major issue for beginners has been the quality and style of documentation around releases. While the current docs do cover the vast majority of questions one may have, they are poorly organized, contain duplication, and are not broken up into smaller topics which are easier to find. I’ve come up with a new structure for the docs, which will likely be carried forward into the Elixir guides once releases are in the core tooling, and which I’ve already started working on. It is my hope that by the end of March, the docs will be completely reworked from the ground up in a way that makes them far friendlier to beginners and veterans alike.

Looking Forward

The initial version of release tooling in Mix will be relatively simple in comparison to Distillery; it will support console and foreground modes, but not daemon mode, it will not support hot upgrades/downgrades, and will likely have limited extension points (i.e. plugins/hooks). That said, we plan to bring these things back in some form or another as quickly as possible.

The question of how to deal with cross-compilation, for example building on your MacBook and deploying to a DigitalOcean VM running Debian, is very much still open. My intent is to provide much more detailed docs on how to get started with properly preparing for deployments like this, and rely on those as the primary solution for the near future; but I am always looking for ways by which we can make that process smoother. Nerves projects have the benefit of their toolchain, but I’m not yet convinced that it carries over well to non-embedded projects. I am keeping my mind open about this though, and always welcome feedback on the topic!

What About Distillery?

Distillery will continue to be the primary release tooling for the near future. I am planning to port some of these changes to Distillery itself, namely the Mix configuration support. That said, once releases are part of Mix, Distillery will be rewritten to build on top of that tooling and extend it where needed. If it turns out that it is not needed, then all the better! Ideally, once we get releases in Mix itself, then Distillery can be deprecated, and we can all focus our attention on a single path forward.

TL;DR

If you didn’t care about all of that stuff, and just want to know the major strokes:

  • Mix will be extended with the ability to generate OTP releases
  • The mix run and iex -S mix commands will be modified to use OTP releases
  • Mix configuration (i.e. config.exs) will be fully supported
  • Mix tasks may be supported in the future, but that is likely something for a later phase
  • You may see better errors for node crashes, it depends on some external factors
  • Expect much better and richer documentation in the next month or so

While no commitment has been made, we’re expecting that the initial release of this tooling will be made available as part of Elixir 1.7. I will continue to provide updates moving forward on a monthly basis. Feel free to reach out with any feedback you may have!