Unlocking the power of Transformers with Bumblebee

Introduction

Perhaps the most popular machine learning library in existence is the HuggingFace Transformers library.

HuggingFace is a platform focused on democratizing artificial intelligence. Its Transformers library implements hundreds of transformer models and lets you load pre-trained checkpoints from some of the most popular AI labs in the world, including Facebook, Microsoft, and Google.

The transformers library also simplifies the process of using these models for both training and inference. With HuggingFace Transformers, you can implement machine translation, text completion, text summarization, zero-shot classification, and much more in just a few lines of Python code.

And now, with Elixir’s recently announced Bumblebee library, you can do all of the same things with a few lines of Elixir code.

What are Transformers?

Transformer models are a type of deep learning architecture that has achieved state-of-the-art performance in a wide range of tasks in natural language processing, computer vision, and more. Many of the most famous models in the tech world, including GPT3 and DALL-E, make use of a transformer architecture.

Transformer models, in contrast to other architectures, scale exceptionally well.

More data and a larger model generally yield better performance for transformers. While scale yields better models, it also requires more resources to train. That means it’s more difficult for individual researchers and small businesses to take advantage of the power of transformers.

To bridge this gap, HuggingFace offers a hub where large labs can release their models to the public, allowing individuals to take advantage of massive models without needing to train them on their own.

What is Bumblebee?

Bumblebee is a project out of the Elixir Nx ecosystem which aims to implement a number of pre-trained Axon models and integrate with popular model “hubs”–most notably HuggingFace Hub.

The Python ecosystem has a number of model hubs that essentially act as git repositories for pre-trained model parameters and checkpoints. The HuggingFace hub is easily the most popular, with over 60,000 pre-trained models and millions of downloads across them.

Access to pre-trained models and accessible machine learning applications is a massive advantage of the Python ecosystem. Bumblebee is an attempt to bridge the gap between the Python ecosystem and the Elixir ecosystem–without imposing any runtime constraints on the user.

The Bumblebee library is written in 100% pure Elixir. All of the models in Bumblebee are implemented in Axon and converted from supported checkpoints using pure Elixir code. At the time of this writing, Bumblebee interfaces with HuggingFace hub and can convert PyTorch checkpoints to Axon parameters for use directly with Axon models.

Much like HuggingFace Transformers, Bumblebee also aims to simplify common machine learning tasks such as text generation, translation, and more.

All of this code is written in Nx, meaning that Bumblebee can support pluggable backends and compilers out of the box. Depending on your use case, you can use the same library to target deployment scenarios at the edge all the way up to massive server deployments–just by changing the Nx backend or compiler.

At the time of this writing, Bumblebee is still in the early phases of development; however, it’s already quite powerful. In this post, I’ll highlight some of the incredible things you can do with Bumblebee right now.

Setup

Before running these examples, you’ll want to fire up a Livebook and install the following dependencies:

Mix.install([
  {:bumblebee, "~> 0.1.0"},
  {:axon, "~> 0.3"},
  {:exla, "~> 0.4"},
  {:nx, "~> 0.4"}
])

You’ll also want to configure your environment to use EXLA:

Nx.global_default_backend(EXLA.Backend)

If you have access to a GPU or another accelerator, it will speed up these examples, but it’s not necessary to run them.
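
Because everything runs on Nx, choosing where the computation happens is a one-line change. The snippet below is a minimal sketch of the options, assuming the dependencies above are installed. In practice you would keep only one of the backend calls; if you run several, the last one wins.

```elixir
# Sketch only: assumes :nx and :exla from the Mix.install above.

# Option 1: the pure-Elixir binary backend. It has no native dependencies,
# but it is slow for models of this size.
Nx.global_default_backend(Nx.BinaryBackend)

# Option 2: EXLA, which runs tensor operations through the XLA compiler
# on the CPU or, if one is available, a GPU.
Nx.global_default_backend(EXLA.Backend)

# Optionally, also make EXLA the default compiler for defn functions.
Nx.Defn.global_default_options(compiler: EXLA)
```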

Text Summarization

Text summarization is a language understanding task that requires a model to produce a condensed representation of a long-form text. Thanks to the unreasonable effectiveness of transformer models, text summarization is relatively easy. With Bumblebee, you can implement it in a few lines of Elixir code. First, you just need to load a model:

model_name = "facebook/bart-large-cnn"

{:ok, model} = Bumblebee.load_model({:hf, model_name})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, model_name})

The model name is the name of a model checkpoint from the HuggingFace hub. This example uses a pre-trained model from Facebook. Bumblebee.load_model/1 and Bumblebee.load_tokenizer/1 do the work of importing the correct model, parameters, and tokenizer from the HuggingFace Hub. Both functions are designed to support other types of model hubs, so you need the {:hf, model_name} tuple to tell Bumblebee that the model you’re trying to pull is from HuggingFace.
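
To illustrate why the source is part of the tuple, here is a hedged sketch of loading the same checkpoint from disk instead of the hub. It assumes your Bumblebee version accepts the {:local, directory} repository form, and the path is a hypothetical placeholder for a directory that already holds the checkpoint, its configuration, and the tokenizer files.

```elixir
# Hypothetical path: replace with a directory containing the downloaded
# checkpoint, its config, and the tokenizer files.
local_dir = "/path/to/bart-large-cnn"

# Same functions as before; only the repository tuple changes.
{:ok, model} = Bumblebee.load_model({:local, local_dir})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:local, local_dir})
```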

Summarization is really just a generation task:

serving = Bumblebee.Text.Generation.generation(model, tokenizer, min_length: 10, max_length: 20)

This will return an %Nx.Serving{} struct which wraps the generation logic in an easy-to-use struct. Now you can run your task with:

article = """
Elixir is a dynamic, functional language for building scalable and maintainable applications.
Elixir leverages the Erlang VM, known for running low-latency, distributed, and fault-tolerant systems. Elixir is successfully used in web development, embedded software, data ingestion, and multimedia processing, across a wide range of industries.
"""

Nx.Serving.run(serving, article)

And after a while you will see the summary:

"Elixir is a dynamic, functional language for building scalable and maintainable applications."

Not too bad! With just a handful of function calls you have a pretty powerful text summarization implementation!

Behind the scenes, the serving takes care of three steps for you. First, it tokenizes the input into discrete integer tokens for the model to consume, which is the job of Bumblebee.apply_tokenizer/2. Bumblebee’s tokenizers are backed by HuggingFace’s fast Rust tokenizer implementations.

Next, the generation step uses your pre-trained model and parameters to perform text generation based on the input text. In this case, the goal of the generation task is to generate a summary of the input text.

Finally, the generated integer tokens are decoded, as Bumblebee.Tokenizer.decode/2 does, back into string values which you can understand and interpret.
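
Since the result is an %Nx.Serving{} struct, it can also be run as a supervised process that batches concurrent requests. The following is a minimal sketch rather than a production setup: it assumes the dependencies from the Setup section, an Nx version that ships Nx.Serving.batched_run/2, and a hypothetical process name, SummaryServing.

```elixir
# Build the same summarization serving as above.
model_name = "facebook/bart-large-cnn"
{:ok, model} = Bumblebee.load_model({:hf, model_name})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, model_name})

serving =
  Bumblebee.Text.Generation.generation(model, tokenizer, min_length: 10, max_length: 20)

# SummaryServing is a hypothetical name chosen for this example.
# batch_timeout is how long (in ms) the process waits to fill a batch.
children = [
  {Nx.Serving, serving: serving, name: SummaryServing, batch_timeout: 100}
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

# Any process can now send text to the named serving.
text = "Elixir is a dynamic, functional language for building scalable and maintainable applications."
Nx.Serving.batched_run(SummaryServing, text)
```

Nx.Serving.run/2 executes in the calling process, while the supervised form lets many callers share one loaded model.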

Machine Translation

Another common language understanding task is machine translation. If you think about it, machine translation is very similar to text summarization; however, rather than mapping a longer representation to a shorter one, the goal is to map a representation in one language to an equivalent representation in another.

It should make sense then that you can implement the machine translation task in Bumblebee just by changing the model from your previous example:

model_name = "facebook/mbart-large-en-ro"

{:ok, model} = Bumblebee.load_model({:hf, model_name},
  module: Bumblebee.Text.Mbart,
  architecture: :for_conditional_generation
)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, model_name})

article = """
Elixir is a dynamic, functional language for building scalable and maintainable applications.
"""

serving = Bumblebee.Text.Generation.generation(model, tokenizer,
  max_new_tokens: 20,
  forced_bos_token_id: 250041
)
Nx.Serving.run(serving, article)

And after a while you will see the following output:

"un limbaj dinamic, funcţional pentru ameliora aplicaţiile cu o capacitate ridicat"

Not much has changed here! All you need to do is change the model to a machine translation model (also from Facebook), and pass an additional option, forced_bos_token_id, to Bumblebee.Text.Generation.generation.

This option essentially tells the Bumblebee model that the translation should be in Romanian. By changing the token to the code associated with another of the model’s supported languages, you can translate English sentences into that language instead!

Text Completion

Perhaps the first large language model to go mainstream was GPT3. Its completions endpoint can generate incredibly realistic text that can be useful in a number of applications such as creating realistic chatbots. With Bumblebee, you can make use of GPT3’s predecessor: GPT2.

All you need to do is load the model:

model_name = "gpt2"

{:ok, model} = Bumblebee.load_model({:hf, model_name}, architecture: :for_causal_language_modeling)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, model_name})

Notice you need to specify the architecture here to tell Bumblebee you’d like to use this model for a generation task. Now with a simple prompt and a few lines of Elixir:

prompt = "Elixir is a dynamic, functional language for"

serving = Bumblebee.Text.Generation.generation(model, tokenizer, min_length: 10, max_length: 30)
Nx.Serving.run(serving, prompt)

And after a while you will see:

"Elixir is a dynamic, functional language for building and manipulating data structures. It is a powerful tool"

Once again, not bad! And for only a few lines of Elixir code, I would say this is pretty impressive!

And much more…

These examples only scratch the surface of what Bumblebee is truly capable of. Bumblebee also supports a number of vision models that were not highlighted here.

In addition to the tasks presented here, you can also use models loaded in Bumblebee for zero-shot and one-shot classification, question answering, sentence similarity, and much more. Additionally, you can make use of Bumblebee as a medium for importing pre-trained checkpoints for fine-tuning in Axon. This means you can tailor powerful models to your own specific use cases.

Of course, Bumblebee is still in its early stages, but it’s a promising step in the right direction that enables Elixir programmers to take advantage of powerful pre-trained models with minimal code.

If there are models you’d like to see implemented, or pipelines not yet built into Bumblebee that you’d like to see, feel free to open an issue. Additionally, if you’d like to get involved, we’d love to have you work with us in the Erlang Ecosystem Foundation Machine Learning Working Group.

Until next time :)