What Is Rust
74 Days Until I Can Walk
Introduction
This article is the first in what I hope will be a series of approximately ten articles discussing the Rust programming language. I am learning about the language as I write these, and so I do not speak from any place of authority. These articles are subject to updates, revisions and corrections as I learn more. I do not have a plan for what these ten articles will contain, but rather will be guided by whatever catches my interest, or raises questions for me as I develop my understanding of Rust.
One of the more exciting aspects of getting involved in the Rust community is the sense that anyone may become a real and meaningful contributor to the project. Even at this early stage in my own journey, I believe that with some work I may be able to make a real impact on those working with and learning Rust. My hope is that these articles will provide some kind of road-map for people like me who would like to become active participants in the Rust project.
This week marks the end of my first seven days writing code in Rust. It has been a very fun experience, and I have documented the day-to-day happenings of that process in my dev-logs on the Fourteen Screws website. My intention is for these articles to be a lot more detailed, and more dynamic as well. The dev-log blog posts will remain as written, but these articles will be updated as I learn more.
What is Rust?
In a nutshell, Rust is a systems programming language designed with a focus on memory and thread safety. It began as a personal project by Graydon Hoare in 2006, while he was working for Mozilla Research. Mozilla officially backed the project in 2009 and the first stable release (Version 1.0.0) was announced on May 15th, 2015. As of the time of writing, the latest stable version of Rust is 1.70.0.
But what is Rust? What do we mean by “systems programming language”? What do we mean by “memory and thread safety”? How does Rust deliver on its promise of being a “safe” language? These questions will be addressed in turn in the sections below.
Systems Programming Language
Systems programming generally means writing software that is designed to speak directly with hardware, for example operating systems or embedded systems. It may also be called “low-level programming”. Software written in this space often needs to be extremely fast and extremely small. Historically, C and C++ have been the heavy-hitters in the systems programming world. Famously, the Linux kernel is written in C, meaning that the C language is used to instruct hardware ranging from your mobile phone to satellites in orbit, and innumerable systems in between.
However, programming at this level comes with a number of unique challenges. Because we are so much closer to the metal, and because speed and size are so important, we as the developers generally take greater responsibility for managing the resources that we use. For example, we may need to explicitly tell the machine to allocate some memory for us. We need to remember to free that memory when we are finished with it. We need to think about whether we are copying data around or simply sharing references between functions and threads. Minor oversights in the management of system resources can have serious consequences, such as the Heartbleed bug in 2014 (a read past the end of a buffer in OpenSSL), or the Therac-25 radiation therapy machine, where race conditions in its software contributed to overdoses that killed or seriously injured six patients in the 1980s.
Rust aims to be a safe systems programming language. With it we should have all the control and speed that we would get from working with C/C++. But we should also have safeguards in place which stop us from accidentally mishandling system resources. For the purposes of this article, I want to discuss two challenging areas that Rust directly addresses: memory safety and thread safety.
Memory and Thread Safety
Let’s say, hypothetically, that we write an application which requests 1 MB of memory from the operating system. When we are finished with this memory, we must return it to the OS, otherwise we have what is called a memory leak. If we have a leak, and this memory allocation takes place within a loop that fires 1,000 times per second, then every second we will allocate and lose roughly 1 GB of system memory, which will most likely end in a crash.
Languages such as Java handle this problem using a garbage collector (GC). The GC periodically wakes up and inspects the memory of your application while it is running. Wherever it finds memory that you are no longer using, it frees that memory so that it can be reused. The problem is that this process adds runtime overhead and pauses that are hard to predict, which is often unacceptable when we are programming systems. We cannot always afford to periodically pause our application so that a garbage collector can loop over our memory and free resources.
So how does a language like Rust protect us from memory leaks without incurring an expensive runtime penalty?
Furthermore, what happens if we have two variables which both point to the same memory address and one of those variables frees the memory, leaving the second variable pointing at an invalid location?
Before answering these questions, let us quickly look at thread safety. Threads in an application allow us to perform several tasks in parallel. If done correctly, this essentially allows our application to multitask, performing several different jobs at the same time and improving the speed of our application. However, if multiple threads are sharing and using the same resources, then we run the risk of having a race condition, which can lead to unexpected behaviour.
Consider two threads \(A\) and \(B\), both of which access and increment a variable count. Initially, let us say count=0. \(A\) and \(B\) run at the same time and both attempt to access count. Let’s say \(A\) gets there first and reads the value of count. \(A\) does some processing and sets count=1. However, before it can write this result back to memory, \(B\) also reads count and gets the value 0. \(A\) now writes back to memory and count is set to 1. \(B\) then increments its value of count, setting count=1 (because the value \(B\) received was 0). \(B\) then writes back to memory. The final value for count is 1, despite the fact that it has been incremented twice and should have the value 2.
This is a typical example of a problem faced when working with concurrent systems.
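The interleaving described above can be replayed step by step in a single thread, which makes the lost update easy to see. This is only a sketch under my own assumptions: the function name lost_update and the local variable names are invented for illustration, and a Cell stands in for the shared memory location. Real threads would interleave unpredictably, and safe Rust would refuse to compile an unsynchronised multi-threaded version of this.

```rust
use std::cell::Cell;

// Replays the interleaving from the text without real threads.
// `lost_update` is a hypothetical name used only for this illustration.
fn lost_update() -> i32 {
    let count = Cell::new(0); // the shared counter, initially 0

    let a_local = count.get(); // A reads 0
    let a_result = a_local + 1; // A computes 1 but has not written it yet
    let b_local = count.get(); // B reads 0, because A has not written yet
    count.set(a_result); // A writes 1
    let b_result = b_local + 1; // B computes 1 from its stale copy
    count.set(b_result); // B writes 1, overwriting A's update

    count.get()
}

fn main() {
    // Two increments were attempted, but one of them was lost.
    println!("final count = {}", lost_update());
}
```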
Rust tackles both of these issues at compile time rather than at runtime using a concept of ownership and borrowing.
Ownership and Borrowing
Core to the whole idea of Rust is the concept of Ownership and Borrowing. These concepts are so fundamental that they were mentioned as the focus of the roadmap to releasing Rust 1.0.
First, let us start with the concept of a lifetime. A variable’s lifetime spans from when it comes into scope until it passes out of scope. When a variable passes out of scope, the resources that it is pointing to should be freed. This, at least, helps to prevent memory leaks. But how do we deal with the issue of releasing memory that is pointed to by multiple variables? For this we have ownership and borrowing.
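To make the idea of freeing at the end of scope concrete, here is a small sketch using the standard Drop trait, which Rust calls when a value goes out of scope. The Resource type, its freed flag, and the scope_demo function are hypothetical names of my own; the flag simply records whether the clean-up code has run.

```rust
use std::cell::Cell;
use std::rc::Rc;

// A hypothetical resource that records when it has been cleaned up.
struct Resource {
    freed: Rc<Cell<bool>>,
}

impl Drop for Resource {
    // Rust calls `drop` automatically when a Resource goes out of scope.
    fn drop(&mut self) {
        self.freed.set(true);
    }
}

// Returns the flag as seen inside the inner scope and again after it ends.
fn scope_demo() -> (bool, bool) {
    let freed = Rc::new(Cell::new(false));
    let during_scope;
    {
        let _resource = Resource { freed: Rc::clone(&freed) };
        during_scope = freed.get(); // still alive, so not yet freed
    } // `_resource` passes out of scope here and is dropped
    (during_scope, freed.get())
}

fn main() {
    let (during, after) = scope_demo();
    println!("freed inside scope: {}, freed after scope: {}", during, after);
}
```

No explicit call to free the resource appears anywhere; the clean-up is tied to the end of the inner block.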
Suppose we declare a variable x and assign it a vector containing three integers, assign the value of x to a new variable y, and then try to print x with println!. Rather than y storing a reference to the same memory as x, or y receiving a copy of the vector (duplicating the memory), y actually takes ownership of the vector. This means that x is no longer a valid variable. The code will not compile because the println! statement is attempting to print x after y has taken ownership of its value.
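A minimal sketch of the move described above might look like the following. The wrapper function move_demo is my own addition so that the result can be inspected, and the offending println! is left commented out so that the example compiles.

```rust
// `move_demo` is a hypothetical wrapper used only for this illustration.
fn move_demo() -> Vec<i32> {
    let x = vec![1, 2, 3];
    let y = x; // ownership of the vector moves from x to y

    // Uncommenting the next line fails to compile with error E0382,
    // because x was moved into y and is no longer valid:
    // println!("{:?}", x);

    y
}

fn main() {
    println!("{:?}", move_demo());
}
```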
This applies to functions as well. If we pass x to a function do_something and then try to print x, we see exactly the same behaviour, because do_something takes ownership of the value of x before x is printed.
However, the reverse order is perfectly valid: we declare x, use it, and only then allow do_something to take ownership of its value.
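Both orderings can be sketched in one short program. The signature of do_something here is an assumption on my part (all we know is that it takes ownership of its argument), so I have it accept a vector of integers and return its length.

```rust
// Hypothetical signature: the function takes ownership of the vector.
fn do_something(v: Vec<i32>) -> usize {
    v.len()
} // v passes out of scope here, so the vector is freed

fn main() {
    let x = vec![1, 2, 3];
    println!("{:?}", x); // using x before the move is fine

    let n = do_something(x); // ownership of the vector moves into the function

    // Uncommenting the next line fails to compile with error E0382,
    // because x was moved into do_something:
    // println!("{:?}", x);

    println!("{}", n);
}
```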
However, it is possible to pass a value around without invalidating the variable to which it is assigned. We do this using references, which allow one variable to borrow the value of another. If y is declared as a reference to x (written &x), the code will compile just fine because y only stores a reference to x; it does not take ownership of x’s value.
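As a small sketch of borrowing, the following keeps x usable after y has been created. The function name borrow_demo is hypothetical and exists only so that both variables can be read afterwards.

```rust
// `borrow_demo` is a hypothetical name used only for this illustration.
fn borrow_demo() -> (usize, usize) {
    let x = vec![1, 2, 3];
    let y = &x; // y borrows the vector; x remains the owner

    // Both x and y can still be used, because nothing was moved.
    (x.len(), y.len())
}

fn main() {
    let (via_owner, via_reference) = borrow_demo();
    println!("{} {}", via_owner, via_reference);
}
```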
Here is where the concept of lifetimes becomes important. Because we know which variable owns the value, any other variables which reference the value cannot access it after the owner passes out of scope. Consider a case where y is declared inside an inner block and owns a vector, x is declared outside that block and is assigned a reference to y, and we try to print x after the inner block has ended. y is the owner of the vector, and x references it, but because y passes out of scope before we print x, the compiler knows that x is trying to access invalid memory and can flag this.
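The situation described above could be sketched as follows. The function name sum_while_owner_alive and the summing step are my own additions, there to show that x may be used freely while y is still alive. The line that the compiler would reject is commented out.

```rust
// `sum_while_owner_alive` is a hypothetical name used only for this illustration.
fn sum_while_owner_alive() -> i32 {
    let x; // declared in the outer scope
    let total: i32;
    {
        let y = vec![1, 2, 3]; // y owns the vector
        x = &y; // x borrows from y
        total = x.iter().sum(); // fine: y is still alive here
    } // y passes out of scope and the vector is freed

    // Uncommenting the next line fails to compile with error E0597,
    // because y does not live long enough:
    // println!("{:?}", x);

    total
}

fn main() {
    println!("{}", sum_while_owner_alive());
}
```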
Variables in Rust are immutable by default. That means that in order to be able to modify a variable, we must explicitly declare that it is mutable using the mut keyword.
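A minimal sketch of the difference, with hypothetical variable names of my own choosing:

```rust
// `mutable_demo` is a hypothetical name used only for this illustration.
fn mutable_demo() -> i32 {
    let fixed = 5; // immutable by default

    // Uncommenting the next line fails to compile with error E0384,
    // because fixed was not declared with `mut`:
    // fixed += 1;

    let mut counter = fixed; // explicitly declared as mutable
    counter += 1; // allowed
    counter
}

fn main() {
    println!("{}", mutable_demo());
}
```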
References too can be declared as mutable. However, we can only have one mutable reference to a variable at a time, and we cannot have a mutable reference while immutable references are still in use.
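These rules can be sketched as below. The function and variable names are my own, and the rejected line is commented out. Note that the mutable reference is accepted here only because the two immutable references are not used again after it is created.

```rust
// `reference_rules_demo` is a hypothetical name used only for this illustration.
fn reference_rules_demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3];

    let r1 = &v;
    let r2 = &v; // any number of immutable references may coexist
    let shared_len = r1.len() + r2.len();

    // Taking a mutable reference while r1 is still used afterwards fails
    // to compile with error E0502:
    // let m = &mut v; m.push(0); println!("{}", r1.len());

    let m = &mut v; // allowed: r1 and r2 are not used after this point
    m.push(shared_len as i32);

    v
}

fn main() {
    println!("{:?}", reference_rules_demo());
}
```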
Hence, if we have multiple threads trying to modify the same resource without any synchronisation, the compiler can see that this will happen and will report an error.
Because the Rust compiler is aware of lifetimes, and this concept of ownership and borrowing, it is possible to analyse your code at compile time and determine when it is safe to free resources, or when there is a risk that multiple threads will attempt to write to the same resource. So rather than checking for problems at runtime, as happens with a garbage collector, Rust resolves these issues before your application even starts, meaning you don’t incur the overhead of GC but still benefit from the protections of automatic memory management.
Of course, sometimes we do need multiple threads to access and write to the same memory, and there are ways to do this. But you must actively opt in, either by using explicit synchronisation or by deliberately stepping outside the compiler’s checks and sacrificing safety. This is very different to the developer experience when working with C/C++, where it is easy to accidentally deallocate some memory that is still being pointed to elsewhere, or have multiple threads attempt to modify a shared resource.
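One of the ways to share a counter between threads is to wrap it in the standard library’s Arc and Mutex types. The sketch below is my own illustration rather than anything prescribed by the Rust project: two threads each increment a shared counter, and the lock ensures that neither update is lost. The function name synchronised_count is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// `synchronised_count` is a hypothetical name used only for this illustration.
fn synchronised_count() -> i32 {
    // Arc allows several threads to own the counter; Mutex guards access to it.
    let count = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();

    for _ in 0..2 {
        let count = Arc::clone(&count);
        handles.push(thread::spawn(move || {
            // Only one thread at a time can hold the lock.
            let mut guard = count.lock().unwrap();
            *guard += 1;
        }));
    }

    // Wait for both threads to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *count.lock().unwrap();
    total
}

fn main() {
    println!("final count = {}", synchronised_count());
}
```

Unlike the race described earlier, both increments are applied here, because each read-modify-write happens while the lock is held.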
Interestingly, the concept of “safety” in Rust is more than just a lofty claim. The RustBelt project, led by Ralf Jung and collaborators, has produced machine-checked proofs of safety for a formal model of a subset of Rust, and has used them to verify several standard library types whose implementations contain code that is marked as unsafe.
Who Owns Rust?
While Rust started as a personal project by Graydon Hoare, he stepped away from the project in 2013 due to a combination of burnout and challenges in his personal life. The project was backed by Mozilla until 2020 when, due to the COVID pandemic and widespread layoffs, it was decided that Rust should exist as its own distinct legal entity. This led to the establishment of the Rust Foundation in 2021, an independent non-profit organisation run by a small full-time team and governed by a board of directors. The Rust Foundation now has legal ownership of the Rust trademark.
The Rust Foundation is distinct from The Rust Project which is responsible for the actual development of the Rust programming language. The Project is divided into several teams, with each being responsible for developing a different aspect of the Rust language. Contributors are almost entirely volunteers, although I believe there are a small number of exceptions who are now paid through the Rust Foundation.
Plans for future enhancements and development of Rust are submitted using an RFC (Request for Comments) document. An RFC is submitted and debated, and if it is accepted it is given a unique ID and added to the RFC book.
Rust Release Cycle
There are three release channels of Rust at any given time:
- Nightly
- Beta
- Stable
Rust releases a new stable version every six weeks. As of the time of writing, the current version of Rust is 1.70.0. The “nightly” is released every night and contains experimental features which may be of use to some developers. The beta is forked from the nightly branch and will ultimately become a stable release after testing and bug fixes.
Contributions to Rust land on the main branch of the Rust repository by way of pull requests, and it is from this branch that the nightly builds are produced.
Conclusion
I’ve written this article in more of a rush than I had initially intended, so it is more high level than I would like. I’ve learned a lot about the Rust Project and the Rust Foundation while writing it, however. I expect to revisit this page periodically and improve it over time. There is an awful lot more that could be said, and more resources which could comfortably be cited here.