This post is a high-level primer on what is happening “under the hood” with our code. It will lend some insight into what terms like “tokenizing,” “interpreting,” and “compiling” mean. You’ll gain a better sense of what the concept of a virtual machine encapsulates. And hopefully you’ll leave with a better understanding of what your script is doing before it hits your computer’s processor.
Machine code is binary that is executed directly by your computer’s CPU. The bit patterns correspond directly to the architecture design of the processor.
Before code can run on the CPU, a compiler has to translate it into machine code. For a scripting language like Ruby, the code that gets compiled this way is the interpreter itself, which is written in C.
On most Unix-based machines, that compiler is GCC or Clang (which is built on LLVM). It generates the machine code for the processor during compilation, which is just the process of translating one language to another.
The virtual machine executes your code. In Ruby MRI it is written in C and is known as the YARV interpreter. It is at the heart of a scripting language’s “implementation,” as it executes the source code via whatever language the scripting language is built upon (C in the case of Ruby MRI).
YARV doesn’t receive the Ruby statement as you typed it. Your code is first turned into an abstraction known as an Abstract Syntax Tree (AST), which gets compiled to YARV bytecode and run.
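One way to peek at the bytecode mentioned above is MRI's built-in `RubyVM::InstructionSequence` class. The snippet below is a minimal sketch that assumes you are running MRI (CRuby); other Ruby implementations may not provide this class, and the exact instruction names in the listing vary between Ruby versions.

```ruby
# RubyVM::InstructionSequence is specific to MRI (CRuby).
source = "print 'Hello, World'"

# Compile the source string to YARV bytecode without running it yet.
iseq = RubyVM::InstructionSequence.compile(source)

# disasm returns a human-readable listing of the compiled instructions.
listing = iseq.disasm
puts listing

# eval hands the compiled instructions to the virtual machine to run.
iseq.eval
```

The listing should mention both the string literal and the call to `print`, which is roughly what the virtual machine sees instead of the text you typed.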
This “tree” is made up of nodes assembled by something called the parser.
You can think of a node on the Abstract Syntax Tree as an atomic representation of a Ruby grammar rule. The reason that Ruby knows to print “Hello, World” when it sees print 'Hello, World' is that the parser knows that print is a method and 'Hello, World' is its argument. These syntax rules are located inside of a language’s grammar rule file.
Again, the parser creates the Abstract Syntax Tree that the virtual machine compiles and interprets.
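To get a feel for what such a tree looks like, you can ask Ripper, the parser library that ships with Ruby, for a nested-array view of a statement. This is only an illustrative sketch: Ripper's output is a parse tree in S-expression form, not the exact node structures MRI uses internally, and its precise shape can differ between Ruby versions.

```ruby
require 'ripper'
require 'pp'

source = "print 'Hello, World'"

# Ripper.sexp parses the source and returns the tree as nested arrays.
tree = Ripper.sexp(source)

# Pretty-print the tree; the outermost node is :program.
pp tree
```

In the printed tree, the method name and the string end up in separate nodes under a single command node, which mirrors the method/argument relationship described above.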
If you’re wondering how Ruby knows that print is a separate “word” from 'Hello, World', then you’re beginning to understand the function of the lexer, or tokenizer. The tokenizer scans your line of Ruby code character by character and determines where the “words” of the language begin and end. The tokenizer can tell the difference between a space separating words and a space separating a method name from its arguments.
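Ripper can also show the tokenizing step on its own. The sketch below uses `Ripper.lex` from Ruby's standard library to list each token with its position and type; the variable names are just for illustration.

```ruby
require 'ripper'

source = "print 'Hello, World'"

# Ripper.lex returns one entry per token: [[line, column], type, text, ...].
tokens = Ripper.lex(source)

tokens.each do |(line, col), type, text|
  # %p prints the token text with quotes so whitespace tokens are visible.
  puts format("%d:%-2d %-20s %p", line, col, type, text)
end
```

Notice that the space after print is reported as its own token, while the space inside the string stays part of the string's content.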
And that’s the 10,000-foot view of the lifecycle of a Ruby statement, as it goes from tokenization to being executed on your processor. If you’re looking for the microscopic explanation, I’d recommend Ruby Under a Microscope.