Introduction
We recently released Nx v0.1.0. In honor of the release, today’s post breaks down the absolute basics of Nx. If you’re interested in learning more about Nx, machine learning in Elixir, and driving the ecosystem forward, join us in the Erlang Ecosystem Foundation ML Working Group Slack.
Nx (Numerical Elixir) is a library for creating and manipulating multidimensional arrays. It is intended to serve as the core of numerical computing and data science in the Elixir ecosystem. Programming in Nx requires a bit of a different way of thinking. If you’re familiar with the Python ecosystem, Nx will remind you a lot of NumPy. While this is true, there are some key differences - mostly due to the difference in language constructs between Elixir and Python. As one example, Nx tensors are completely immutable.
At the core of Nx is the Nx.Tensor. The Nx.Tensor is analogous to the NumPy ndarray or TensorFlow/PyTorch Tensor objects. It is the main data structure the Nx library is designed to manipulate. All of the Nx functionality such as gradient computations, just-in-time compilation, pluggable backends, etc. is built on top of implementations of the Nx.Tensor behaviour.
In this post, we’ll go over what exactly an Nx.Tensor is, how to create tensors, and how to manipulate them. This post intentionally ignores some of the more in-depth offerings within the Nx API in order to focus on the basics. If you’re interested in learning more, I suggest checking out the Nx documentation and following me and DockYard on Twitter to stay up to date on the latest Nx content.
Installation
Nx is a regular Elixir library, so you can install it in the same way you would any other Elixir library. Since this post is designed for you to follow along in a Livebook, we’ll use Mix.install:
Mix.install([
{:nx, "~> 0.1.0"}
])
Lists vs. Tensors
When you first create and inspect a tensor, you’re probably inclined to think of it as a list or a nested list of numbers:
Nx.tensor([[1, 2, 3], [4, 5, 6]])
#Nx.Tensor<
s64[2][3]
[
[1, 2, 3],
[4, 5, 6]
]
>
That line of thinking is reasonable - after all, inspecting the values yields a nested list representation of the tensor! The truth, though, is that this visual representation is just a matter of convenience. Thinking of a tensor as a nested list is misleading and can make it difficult to grasp some of the fundamental concepts in Nx.
The Nx.Tensor is a data structure with four key fields:
- :data
- :shape
- :type
- :names
Let’s look at each of these fields in-depth.
Tensors have data
In order to perform any computations at all, tensors need some underlying data which contains their values. The most common way to represent a tensor’s data is with a flat VM binary - essentially just an array of bytes. This is an important implementation detail; Nx mostly operates on the raw bytes which represent individual values in a tensor. Those values are stored in a flat container - Nx doesn’t operate on lists or nested lists.
Binaries are just C byte arrays, so we’re able to perform some very efficient operations on large tensors. While this gives us a nice performance boost, it also constrains us. Our tensor operations need to know what type the byte values represent in order to perform operations correctly. This means every value in a tensor must have the same type.
Finally, the choice of representing tensor data as a flat binary leads to some interesting (and annoying) scenarios to consider. At the very least, we need to be conscious of endianness - you can’t guarantee the raw byte values of a tensor will be interpreted the same way on different machines.
Nx.tensor([[1, 2, 3], [4, 5, 6]]) |> Nx.to_binary()
<<1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 0,
0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0>>
Nx.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) |> Nx.to_binary()
<<0, 0, 128, 63, 0, 0, 0, 64, 0, 0, 64, 64, 0, 0, 128, 64, 0, 0, 160, 64, 0, 0,
192, 64>>
Tensors have shape
The “nested list” representation you see when inspecting a tensor is actually a manifestation of its shape. A tensor’s shape is best described as the size of each dimension. While two tensors might have the same underlying data, they can have different shapes, which fundamentally change the nature of the operations performed on them.
We describe a tensor’s shape with a tuple of integers: {size_d1, size_d2, ..., size_dn}. For example, if a tensor has a shape {2, 1, 2}, it means the tensor’s first dimension has size 2, second dimension has size 1, and third dimension has size 2:
Nx.tensor([[[1, 2]], [[3, 4]]])
#Nx.Tensor<
s64[2][1][2]
[
[
[1, 2]
],
[
[3, 4]
]
]
>
We can also describe the number of dimensions in a tensor as its rank. As you start to work more in the scientific computing space, you’ll inevitably come across descriptions of shape which reference 0-D shapes as scalars:
Nx.tensor(1)
#Nx.Tensor<
s64
1
>
1-D shapes as vectors:
Nx.tensor([1, 2, 3])
#Nx.Tensor<
s64[3]
[1, 2, 3]
>
2-D shapes as matrices:
Nx.tensor([[1, 2, 3], [4, 5, 6]])
#Nx.Tensor<
s64[2][3]
[
[1, 2, 3],
[4, 5, 6]
]
>
and so on.
Those descriptions aren’t inaccurate, but if you have experience with advanced mathematics, the notation in Nx will probably confuse you. This is another important note - the choice of terms and notation in Nx, such as rank and tensor, reflects what is ubiquitous in the numerical computing space rather than strict mathematical correctness.
Practically speaking, a tensor’s shape tells us two things:
- How to traverse and index a tensor
- How to perform shape-dependent operations
Theoretically, we could write all of our operations to work on a flat binary, but that doesn’t map very well to the real world. We reason about things with dimensionality. Let’s consider the example of an image. A common representation of images in numerical computing is {color_channels, height, width}. A 32x32 RGB image will have shape {3, 32, 32}. Now imagine you were asked to access the green value of the pixel at height 5 and width 17. If you had no understanding of the tensor’s shape, this would be an impossible task. However, since you do know the shape, you just need to perform a few calculations to efficiently access any value in the tensor.
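To make that concrete, here is a rough sketch of that arithmetic (the img, c, h, and w bindings are illustrative; we assume the row-major {channels, height, width} layout described above, 64-bit signed integers, and Nx.iota as a stand-in image):
# A stand-in 32x32 "image" in {channels, height, width} layout
img = Nx.iota({3, 32, 32})
{_channels, height, width} = Nx.shape(img)
# Green channel is index 1; pixel at height 5, width 17
{c, h, w} = {1, 5, 17}
# Row-major offset into the flat data, then read the 8 bytes (one s64 value) at that position
offset = (c * height + h) * width + w
<<value::64-signed-native>> = binary_part(Nx.to_binary(img), offset * 8, 8)
value
1201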
To access a tensor’s shape, you can use Nx.shape:
Nx.shape(Nx.tensor([[1, 2, 3], [4, 5, 6]]))
{2, 3}
To access its rank, you can use Nx.rank:
Nx.rank(Nx.tensor([[1, 2, 3], [4, 5, 6]]))
2
Tensors have names
As a consequence of working in multiple dimensions, you often want to perform operations only on certain dimensions of an input tensor. Some Nx functions give you the option to specify an axis or axes to reduce, permute, traverse, slice, etc. The norm is to access axes by their index in a tensor’s shape. For example, axis 1 in shape {2, 6, 3} has size 6. Unfortunately, writing code that relies on integer axis values is fragile and difficult to read. One problem you’ll often run into is the choice of channels-first or channels-last tensor representations of images. In a channels-first configuration, the shape of an image is {channels, height, width}. In a channels-last configuration, the shape of an image is {height, width, channels}. Now consider that I write code which computes a grayscale representation by taking the maximum color value along an image’s color channels. If I write my code like:
defn grayscale(img), do: Nx.reduce_max(img, axes: [0])
It breaks if somebody attempts to use a channels-last representation! Nx remedies this with named tensors. Named tensors give semantic meaning to the axes of a tensor. We can more accurately describe an image’s shape with the keyword list [channels: 3, height: 32, width: 32]. This affords you the ability to write code like this:
defn grayscale(img), do: Nx.reduce_max(img, axes: [:channels])
To learn more about what named tensors offer, I suggest you read Tensor considered harmful, which describes the initial idea.
To access the list of axes in a tensor, you can use Nx.axes:
Nx.axes(Nx.tensor([[1, 2, 3], [4, 5, 6]]))
[0, 1]
To access the list of names in a tensor, you can use Nx.names:
Nx.names(Nx.tensor([[1, 2, 3], [4, 5, 6]], names: [:x, :y]))
[:x, :y]
Tensors have a type
As mentioned before, a consequence of operating on binaries is the need for tensors with homogeneous types. In other words, every value in the tensor must be the same type. This is important for efficiency, which is why tensors exist - to support efficient, parallel computation. If we know that every value in a 1-D tensor is 16 bits long in memory and that the tensor is 128 bits long, we can quickly calculate that there are 8 values in it: 128 / 16 = 8. We can also easily grab individual values for parallel calculation because we know that there’s a new value every 16 bits. Imagine if this weren’t the case; that is, if the first value were 8 bits long, the second value 32 bits, and so on. To count the items or divide them into groups, we’d have to walk through the entire tensor every time (a waste of time), and each value would have to declare its length (a waste of space).
All tensors are instantiated with a datatype which describes their type and size. The type is represented as a tuple of {type, size}.
Valid types are:
- :f - floating point types
- :s - signed integer types
- :u - unsigned integer types
- :bf - brain floating point types
Valid sizes are:
- 8, 16, 32, 64 for signed and unsigned integer types
- 16, 32, 64 for floating point types
- 16 for brain floating point types
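As a quick sanity check of the arithmetic above (just a sketch using Nx.iota and byte_size/1), eight 16-bit signed integers occupy exactly 128 bits, or 16 bytes:
# 8 values * 16 bits = 128 bits = 16 bytes
Nx.iota({8}, type: {:s, 16})
|> Nx.to_binary()
|> byte_size()
16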
The size of the type more accurately describes its precision. While 64-bit types consume more memory and are slower to operate on, they are more precise than their 32-bit counterparts. The default integer type in Nx is {:s, 64}. The default float type is {:f, 32}. When creating tensors with mixed values, Nx will promote the values to the “highest” type, preferring to (for example) waste some space by representing a 16-bit float in 32 bits rather than lose information by chopping a 32-bit float down to 16 bits. This is called type promotion. Type promotion is outside the scope of this post, but it’s something to be aware of.
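As a small illustration (a hedged sketch), mixing integers and a float in a single tensor promotes every value to the default float type rather than truncating the float:
Nx.tensor([1, 2, 3.5])
#Nx.Tensor<
f32[3]
[1.0, 2.0, 3.5]
>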
You can get the type of a tensor with Nx.type:
Nx.type(Nx.tensor([[1, 2, 3], [4, 5, 6]]))
{:s, 64}
Nx.type(Nx.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
{:f, 32}
As you can see, tensors are not lists. Tensors are a data structure designed to do one thing well: crunch numbers. Lists are much more general purpose. While I have no doubt you could probably perform the same operations the Nx API implements on nested lists, it would be a nightmare - even more so than it already was to write implementations on flat binaries. If you can write a general purpose Nx.conv/3 implementation on a nested list, please contact me so I can learn your ways.
Creating Tensors
Now you know what a tensor is, but how can you create one? You’ve already seen one way in this post: using the Nx.tensor/2 factory function. Nx.tensor/2 provides a simple interface for creating tensors with values that are known in advance:
Nx.tensor([[1, 2, 3]])
#Nx.Tensor<
s64[1][3]
[
[1, 2, 3]
]
>
You can also specify the :type and :names of the tensor:
Nx.tensor([[1, 2, 3], [4, 5, 6]], type: {:f, 64}, names: [:x, :y])
#Nx.Tensor<
f64[x: 2][y: 3]
[
[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0]
]
>
As you’ve seen already, if you don’t specify a type, Nx will infer the type from the highest type in the tensor:
Nx.tensor([[1.0, 2, 3]])
#Nx.Tensor<
f32[1][3]
[
[1.0, 2.0, 3.0]
]
>
If you need more complex creation routines, Nx offers a number of them. For example, you can create tensors of random uniform and normal values:
# between 0 and 10
Nx.random_uniform({2, 2}, 0, 10, type: {:s, 32})
#Nx.Tensor<
s32[2][2]
[
[1, 0],
[3, 9]
]
>
# mean 0, standard deviation 2
Nx.random_normal({2, 2}, 0.0, 2.0)
#Nx.Tensor<
f32[2][2]
[
[-0.7820066213607788, -0.37923309206962585],
[-0.04907086119055748, -2.698871374130249]
]
>
You can also fill a tensor with a constant value:
Nx.broadcast(1, {2, 2})
#Nx.Tensor<
s64[2][2]
[
[1, 1],
[1, 1]
]
>
Or create a tensor which counts along an axis:
Nx.iota({5})
#Nx.Tensor<
s64[5]
[0, 1, 2, 3, 4]
>
Nx.iota({5, 5}, axis: 0)
#Nx.Tensor<
s64[5][5]
[
[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3],
[4, 4, 4, 4, 4]
]
>
You can also create tensors from Elixir binaries:
Nx.from_binary(<<1::64-native, 2::64-native, 3::64-native>>, {:s, 64})
#Nx.Tensor<
s64[3]
[1, 2, 3]
>
Remember, be aware of endianness!
Nx.from_binary(<<1::64, 2::64, 3::64>>, {:s, 64})
#Nx.Tensor<
s64[3]
[72057594037927936, 144115188075855872, 216172782113783808]
>
Finally, if you’re coming over from NumPy, you can load tensors in from NumPy files:
Nx.from_numpy("path/to/numpy.npy")
Or NumPy archives:
Nx.from_numpy("path/to/numpy_archive.npz")
When you start working with Nx, you’ll quickly realize a lot of your time is spent trying to get your data into a tensor. Right now the ecosystem has relatively good support for creating tensors from images (stb_image and nx_evision) and structured data (Explorer), but it lacks support in other areas such as text, audio, signals, and video. If you’d like to see any of these bumped up in priority, feel free to reach out with your use case.
Manipulating Tensor Shapes
So now you have a tensor, but it’s in the wrong shape! Can you change it? Yes! Nx has a number of shape manipulation functions in its API. Let’s look at a few:
Reshape
The simplest shape manipulation you might want to do is a basic reshape:
Nx.tensor([[1, 2, 3], [4, 5, 6]])
|> Nx.reshape({6})
#Nx.Tensor<
s64[6]
[1, 2, 3, 4, 5, 6]
>
Reshaping is a constant-time operation - it only changes the shape property of a tensor. Remember, the data itself is still a flat binary. You are only changing the “view” of the data.
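If you want to convince yourself of that, here is a quick check (just a sketch): the underlying binary is identical before and after a reshape:
t = Nx.tensor([[1, 2, 3], [4, 5, 6]])
Nx.to_binary(t) == Nx.to_binary(Nx.reshape(t, {6}))
true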
When you’re reshaping, the “size” of the tensor must stay the same. You can’t reshape a tensor with shape {2, 3} to a tensor with shape {8}. You can also use Nx.reshape to change a tensor’s names:
Nx.tensor([[1, 2, 3], [4, 5, 6]])
|> Nx.reshape({2, 3}, [:x, :y])
#Nx.Tensor<
s64[2][3]
[
[1, 2, 3],
[4, 5, 6]
]
>
Transpose
While messing around with Nx.reshape, you might have attempted to permute the dimensions of the tensor with something like:
Nx.tensor([[1, 2, 3], [4, 5, 6]])
|> Nx.reshape({3, 2})
#Nx.Tensor<
s64[3][2]
[
[1, 2],
[3, 4],
[5, 6]
]
>
only to be surprised by the result. What you actually wanted to do was transpose the tensor:
Nx.transpose(Nx.tensor([[1, 2, 3], [4, 5, 6]]))
#Nx.Tensor<
s64[3][2]
[
[1, 4],
[2, 5],
[3, 6]
]
>
Transposing a tensor reorders the dimensions of the tensor according to the permutation you give it. It’s easier to see this happening with named tensors:
Nx.tensor([[[1], [2]], [[3], [4]]], names: [:x, :y, :z])
|> Nx.transpose(axes: [:z, :x, :y])
#Nx.Tensor<
s64[z: 1][x: 2][y: 2]
[
[
[1, 2],
[3, 4]
]
]
>
Notice how dimension :z is now where dimension :x was, dimension :x is now where dimension :y was, and dimension :y is now where dimension :z was.
Adding and Squeezing Axes
If you have a tensor with a “scalar” shape {}, and you want to give it some dimensionality, you can use Nx.new_axis:
Nx.tensor(1)
|> Nx.new_axis(-1, :baz)
|> Nx.new_axis(-1, :bar)
|> Nx.new_axis(-1, :foo)
#Nx.Tensor<
s64[baz: 1][bar: 1][foo: 1]
[
[
[1]
]
]
>
Nx.new_axis/3 will insert a new axis at the given position (-1 means at the end of the shape) with the given name. The new axis always has size 1. Alternatively, you might want to get rid of size-1 dimensions. You can do this with Nx.squeeze:
Nx.tensor([[[[[[[[1]]]]]]]])
|> Nx.squeeze()
#Nx.Tensor<
s64
1
>
Nx.squeeze/1 will “squeeze out” any size-1 dimensions in the tensor.
You might be thinking, “What’s so special about these functions? Couldn’t I have just reshaped the tensors?” Absolutely. As a matter of fact, all of these functions are built on top of an Nx.reshape operation. However, using Nx.new_axis and Nx.squeeze is a much better illustration of your intent than simply reshaping.
Manipulating Tensor Types
On top of manipulating shape, you might want to manipulate a tensor’s type. There are two methods which allow you to do this: Nx.as_type and Nx.bitcast.
Nx.as_type does an element-wise type conversion:
Nx.tensor([[1.0, 2.0, -3.0], [4.0, 5.0, 6.0]])
|> Nx.as_type({:s, 64})
#Nx.Tensor<
s64[2][3]
[
[1, 2, -3],
[4, 5, 6]
]
>
You should note that if you are “downcasting”, this conversion can result in underflow, overflow, or a loss of precision and cause some hard-to-debug issues:
Nx.tensor([[1.6, 2.8, -1.2], [3.5, 2.3, 3.2]])
|> Nx.as_type({:u, 8})
#Nx.Tensor<
u8[2][3]
[
[1, 2, 255],
[3, 2, 3]
]
>
The Nx.as_type operation returns entirely new bytes for the underlying tensor data. Alternatively, Nx.bitcast just returns a new “view” of the tensor data. Rather than interpreting the bytes as {:f, 32}, you might want to interpret them as {:s, 32}. This means a bitcast is also a constant-time operation, but there are no guarantees about its behavior.
Nx.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
|> Nx.bitcast({:s, 32})
#Nx.Tensor<
s32[2][3]
[
[1065353216, 1073741824, 1077936128],
[1082130432, 1084227584, 1086324736]
]
>
Basic Tensor Operations
So you’ve created some tensors, got them in the right shape and type, and now you want to do something with them. But, what can you actually do? A lot!
The most basic operations you can perform on tensors are element-wise unary operations. These operations “loop” through each value in the tensor and perform some mathematical operation on the value to return a new value in its place. For example, you can compute the element-wise exponential with Nx.exp:
Nx.exp(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
f32[3]
[2.7182817459106445, 7.389056205749512, 20.08553695678711]
>
Or you can compute element-wise trigonometric functions:
Nx.sin(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
f32[3]
[0.8414709568023682, 0.9092974066734314, 0.14112000167369843]
>
Nx.cos(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
f32[3]
[0.5403022766113281, -0.416146844625473, -0.9899924993515015]
>
Nx.tan(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
f32[3]
[1.5574077367782593, -2.185039758682251, -0.14254654943943024]
>
Or even an element-wise natural log:
Nx.log(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
f32[3]
[0.0, 0.6931471824645996, 1.0986123085021973]
>
If it helps to think of these functions as an Enum.map/2, then you can - just remember that tensors are not lists and do not implement the Enumerable protocol. The element-wise implementations on tensors are more efficient than calling an Enum.map. This is because, if you’re using a compiler or backend, you’ll be able to take advantage of specialized kernels which target the CPU or GPU. Additionally, as the dimensionality of your tensor increases, so too would the “nesting” of your Enum.map/2 implementation which attempts to mimic the element-wise operation:
Nx.sin(Nx.tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]))
#Nx.Tensor<
f32[2][2][3]
[
[
[0.8414709568023682, 0.9092974066734314, 0.14112000167369843],
[-0.756802499294281, -0.9589242935180664, -0.279415488243103]
],
[
[0.6569865942001343, 0.9893582463264465, 0.41211849451065063],
[-0.5440211296081543, -0.9999902248382568, -0.5365729331970215]
]
]
>
You should never write code that loops through the values in a tensor to perform operations on individual elements. Instead, you should write those kinds of operations in terms of the existing element-wise unary functions.
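For example (just an illustrative sketch), something like exp(sin(x)) is a pipeline of existing element-wise functions - no explicit looping required:
# Composes element-wise functions instead of mapping over values
Nx.tensor([1.0, 2.0, 3.0])
|> Nx.sin()
|> Nx.exp()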
Broadcasting
On top of supporting unary operators, Nx also lets you perform a number of binary operations on tensors, such as add, subtract, multiply, and divide:
Nx.add(Nx.tensor([1, 2, 3]), Nx.tensor([4, 5, 6]))
#Nx.Tensor<
s64[3]
[5, 7, 9]
>
Nx.multiply(Nx.tensor([1, 2, 3]), Nx.tensor([4, 5, 6]))
#Nx.Tensor<
s64[3]
[4, 10, 18]
>
These binary operations work element-wise. The values between the two tensors are zipped and added, multiplied, subtracted, etc. But what happens if you encounter a situation where the shapes of the tensors don’t match? Nx will attempt to broadcast them.
Recall from the creation examples that we used Nx.broadcast to fill a tensor with a constant value. What Nx.broadcast actually does is apply Nx’s broadcasting semantics. A tensor can be broadcast to a certain shape if:
- It has the scalar shape {}, OR
- The size of each dimension in the tensor matches the corresponding size of each dimension in the target shape, OR
- The dimensions which do not match the target shape have size 1
Broadcasting gives us a way to efficiently “repeat” values without consuming any additional memory. For example, say you have two tensors with shapes {1, 1000} and {1000, 1000} respectively. You want to add the first tensor to the second tensor, repeating the 1000 elements in its second dimension across the first dimension of the second tensor. Broadcasting allows you to accomplish this without explicitly repeating values yourself:
# {1, 3}
a = Nx.tensor([[1, 2, 3]])
b = Nx.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Nx.add(a, b)
#Nx.Tensor<
s64[3][3]
[
[2, 4, 6],
[5, 7, 9],
[8, 10, 12]
]
>
Notice how in the above example [1, 2, 3] is added to [1, 2, 3], [4, 5, 6], and [7, 8, 9] in the second tensor. We didn’t need to do anything! Nx took care of the repetition for us!
Aggregates
In a previous example, you saw how you could compute a grayscale image by calculating the maximum value along an image’s channels. In that example, we used the reduce_max aggregate function. Nx offers a number of other aggregates for computing the sum, product, min, and mean of a tensor along an axis.
To understand how these aggregates work, let’s consider Nx.sum. You can use Nx.sum to compute the sum of all values in the tensor:
Nx.sum(Nx.tensor([[1, 2, 3], [4, 5, 6]]))
#Nx.Tensor<
s64
21
>
Or you can use it to compute the sum along an axis. It’s easy to see what this looks like with named tensors:
Nx.sum(Nx.tensor([[1, 2, 3], [4, 5, 6]], names: [:x, :y]), axes: [:y])
#Nx.Tensor<
s64[x: 2]
[6, 15]
>
Notice how the :y axis “disappears”. You reduce the axis away by summing all of its elements. You can see this contraction a little better if we keep the axes:
Nx.sum(Nx.tensor([[1, 2, 3], [4, 5, 6]], names: [:x, :y]), axes: [:y], keep_axes: true)
#Nx.Tensor<
s64[x: 2][y: 1]
[
[6],
[15]
]
>
Nx.sum(Nx.tensor([[1, 2, 3], [4, 5, 6]], names: [:x, :y]), axes: [:x], keep_axes: true)
#Nx.Tensor<
s64[x: 1][y: 3]
[
[5, 7, 9]
]
>
Immutability
One final detail that is essential for you to understand about Nx is that tensors are immutable. You cannot change the underlying data of a tensor. All of the operations you saw illustrated today return new tensors with brand new data. This might lead you to think, “But wait, isn’t that really inefficient? We’re copying really large binaries every time!” The answer, at least for the Elixir implementations of tensors, is yes. Fortunately, though, Nx offers pluggable backends and compilers which stage out calculations to external libraries with efficient implementations of these operations. Immutability is actually a huge benefit when constructing graphs for just-in-time compilation and automatic differentiation.
We will cover graph construction in a future post. For now, just understand that in Nx we don’t ever change the contents of a tensor; we just call functions which return new tensors.
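As one final small sketch, you can see immutability in action - operating on a tensor leaves the original binding untouched:
a = Nx.tensor([1, 2, 3])
# Returns a brand new tensor; `a` is not modified
Nx.add(a, 10)
a
#Nx.Tensor<
s64[3]
[1, 2, 3]
>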
Conclusion
This post was designed to be an introduction to Nx for those who haven’t worked with machine learning or numerical computing before. I hope you now feel a little more comfortable with the idea of a tensor and with some of the functions in the Nx API. If you want to see tensors in action, I recommend checking out some of my previous posts and staying tuned for future posts with applications of Nx to the real world.
Until next time!