The Rise of Memory Safe Languages

This is not a single event but a general trend over the past decade or two, and one that is rapidly accelerating as more individuals, projects, and companies adopt languages that are inherently memory safe. All computers have a large block of storage known as memory. Programs use it to hold data such as game assets, text, and much else. All the programs running on a computer share this space, and they must coexist. It is extremely insecure when data in use by one part of a program, or by another program entirely, is overwritten by code that was never meant to touch it. In severe cases this can even lead to remote code execution over the internet, with attackers gaining full control of your device. Many solutions to this problem exist, and although careful debugging and sane coding help, language-level solutions are necessary. For a long time we had two choices: use memory-unsafe languages like C and C++, or use memory-safe languages such as Go, Java, and Python, which are slower and have less predictable performance. However, a newer language, Rust, combines the benefits of both models.

Memory safety eliminates most, but not all, memory errors by guaranteeing that any memory a program uses is memory it actually owns.
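
As a small illustration of that guarantee, here is a minimal sketch in Rust, the language this post builds toward. The helper name `byte_at` is made up for this example. Because a slice carries its own length, an out-of-range read can be detected instead of silently returning whatever sits past the end of the buffer.

```rust
/// Returns the byte at `index`, or `None` if `index` is out of bounds.
/// `byte_at` is a hypothetical helper written for this illustration.
fn byte_at(data: &[u8], index: usize) -> Option<u8> {
    // A slice knows its own length, so `get` can check the index
    // before touching memory.
    data.get(index).copied()
}

fn main() {
    let buffer = [10u8, 20, 30];
    println!("{:?}", byte_at(&buffer, 1)); // index inside the buffer
    println!("{:?}", byte_at(&buffer, 7)); // index past the end
}
```

Plain indexing with an out-of-range index stops the program with a panic rather than reading neighbouring memory, which is still a crash but not a security hole.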

The first programs were written in raw binary machine code, followed closely by assembly. Neither was memory safe in any sense, and both allowed you to do basically anything you wished, from raw access to drivers to putting the CPU in an invalid state. If you were not careful, you could even break hardware and permanently brick your computer (as with corrupted EFI variables). The first assembly languages had only a few instructions, like LOAD, MV, and STORE, but as time went on, more and more instructions were added to make programming easier. No one wanted to write tons of boilerplate to do a simple task. So CISC (Complex Instruction Set Computer) designs were developed. These allowed tasks that used to take four or five instructions (like dividing a floating-point number) to be done in one. The downside was increased hardware complexity and reduced clock speed.

Even with the most powerful assembly language, people needed something more. Programming languages provided a way to avoid boilerplate on an even grander scale without making the hardware running them any more complex. While some programming languages (like C) evolved from the bottom up as a way to simplify assembly, others (such as Lisp) evolved from mathematical and logical notation (lambda calculus, in Lisp’s case). Lisp, like some other languages, was designed from the start with a form of automatic memory management. C has no such facilities. All memory on the heap must be manually allocated and freed by the programmer. In addition, its string handling functions perform no bounds checks, so it only takes one malformed string to crash the whole process.

C, however, can be considered the “great grandfather” of most modern programming languages. C++, Java, C#, D, Go, and many other languages are influenced by it, whether in syntax, memory model, or other features. C is by nature a simple and elegant language. It contains very few keywords, very few language features, and nothing beyond the basic necessities. This sparse and minimalistic nature has led it to become one of the most popular and widespread programming languages of all time. It is very easy to port to a new platform, and thus is one of the most used embedded systems languages. Personally, I view it as one of the best languages of all time.

But being so lightweight has its drawbacks. For one thing, the lack of a proper memory management system leaves memory allocation filled with pitfalls. Once you have allocated an array in C and hold only a pointer to it, the language gives you no way to find out how large it is or whether it has already been freed. Any mistake invites undefined behavior. C++, as an extension of C, has many of the same drawbacks. They are somewhat mitigated by C++’s concept of Resource Acquisition Is Initialization, or RAII for short. This allows a simpler system where memory is automatically deallocated when the owning object is destroyed. However, all safety comes at the price of freedom: C++ is a more complex language, and some of its abstractions can cost speed.

Other languages take it a step further than either of those and automatically clean up unused memory. Examples include Java and Lisp. Both of these languages use a method called “garbage collection”, where all memory that has no references pointing to it is reclaimed. Although garbage collection has only become mainstream fairly recently, it was first implemented in Lisp, the second-oldest high-level programming language still in widespread use, after FORTRAN.
Garbage collection requires either periodic sweeps of all objects while the program is paused, to find and reclaim unused patches of memory, or a collector that runs constantly in the background (on another thread), checking for memory and disposing of it as it becomes unused. These garbage collector passes slow down the process and make it unsuitable for real-time work, either by inducing unpredictable pauses or by increasing running time overall. However, the benefits of automated memory management are such that nearly all new languages have this feature to some extent, despite the tradeoff in speed and language simplicity. Programming with a GC is much simpler than dealing with memory yourself, and it helps programmers avoid heartache and bugs, because you don’t have to worry about when memory is in use, when it’s done being used, whether it’s already been freed, or how much memory your program is using.
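
The core idea that memory is reclaimed once nothing refers to it can be sketched with reference counting. Note that this is a different and simpler technique than the tracing collectors described above (there are no sweeps and no pauses, and it cannot reclaim cycles on its own), so treat it only as an illustration of "no references left means freed". The sketch uses Rust's standard `Rc` type; the `Asset` struct and the `share_and_release` function are names invented for this example.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical resource that records when it has been freed.
struct Asset {
    freed: Rc<Cell<bool>>,
}

impl Drop for Asset {
    // Runs automatically when the last reference to the asset goes away.
    fn drop(&mut self) {
        self.freed.set(true);
    }
}

/// Returns (owner count while shared, freed after first drop, freed after last drop).
fn share_and_release() -> (usize, bool, bool) {
    let freed = Rc::new(Cell::new(false));

    let first = Rc::new(Asset { freed: Rc::clone(&freed) });
    let second = Rc::clone(&first); // a second reference to the same asset
    let owners = Rc::strong_count(&first);

    drop(first); // one reference remains, so the asset stays alive
    let freed_after_first = freed.get();

    drop(second); // no references remain, so the asset is freed here
    let freed_after_last = freed.get();

    (owners, freed_after_first, freed_after_last)
}

fn main() {
    println!("{:?}", share_and_release());
}
```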

Rust is a relatively new language, started by Graydon Hoare in 2006 and sponsored by Mozilla from 2009. It brings a relatively new and unique concept that promises all the memory-safety benefits of garbage collection without the garbage collector, and so without reducing speed. It does this via an extremely strict compiler that allows a value to be mutably referenced only if no other references to it exist at that moment. This lets the compiler use the automatically detected scope of a variable to insert allocations and frees at compile time, at the scope boundaries, while guaranteeing that you won’t use the variable in a way that clashes with that inserted memory management. It prevents you from using the variable after it has been freed or while other parts of the program are modifying it, and alerts the programmer to other unsafe situations. The basic method is RAII, as in C++, except that the compiler now checks that it is used correctly instead of leaving that to the programmer.

All of the guarantees about how many pointers you can have, and about allocations and frees, are represented in the type system, so you can specify what kind of lifetime a pointer should have, among other things, in the types of functions, methods, and fields. Representing this kind of data in types required the Rust creators to delve into type theory, usually a domain left to academics and Haskell, to create a very powerful type system that can represent many constraints and a lot of information. They are continually working to make it more expressive, with associated types and other advanced type-system features. Higher-kinded types, which Haskell and Scala support, have been discussed as well, but Rust does not have them.
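
The borrowing and ownership rules described above can be sketched in a few lines. The `Buffer` type and the `total` and `append` functions are names made up for this illustration, and the commented-out line shows the kind of code the compiler refuses to build.

```rust
/// Hypothetical type that owns some heap memory.
struct Buffer {
    data: Vec<u8>,
}

/// Reads through a shared (read-only) reference.
fn total(buffer: &Buffer) -> u32 {
    buffer.data.iter().map(|&b| u32::from(b)).sum()
}

/// Modifies through a mutable (exclusive) reference.
fn append(buffer: &mut Buffer, byte: u8) {
    buffer.data.push(byte);
}

fn main() {
    let mut buffer = Buffer { data: vec![1, 2, 3] };

    {
        // Any number of shared references may exist at the same time.
        let first = &buffer;
        let second = &buffer;
        println!("{} {}", total(first), total(second));
    } // the shared references end here

    // With no other references alive, a mutable reference is allowed.
    append(&mut buffer, 4);

    // Ownership moves to `moved`; `buffer` can no longer be used.
    let moved = buffer;
    // println!("{}", total(&buffer)); // rejected at compile time: value was moved
    println!("{}", total(&moved));
} // `moved` goes out of scope here and its heap memory is freed
```

Nothing in this program calls an allocator or a free function by hand, and there is no collector running alongside it. The release of the memory is placed at the end of the owner's scope by the compiler.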

However, Rust’s type-system-based approach also has its drawbacks. For one thing, it is often harder for novices to program in, since the rules of borrow checking and the ownership model are somewhat complex and grow more so the larger the program is. Even though you can get used to the ownership model, and eventually program within the rules without ever running into compiler issues, it still limits what you can express and makes coding slower. It also makes certain types of data structures, like meshes or graphs, significantly more difficult to write, as these by nature require many mutable references. The Rust community is aware of this and is working on extending the compiler so that it accepts some programs that are obviously fine but break the rules today. They plan to do this with non-lexical lifetimes, so that the “life” of a reference isn’t necessarily tied to a lexical scope. Despite its shortcomings, Rust has won Stack Overflow’s “Most Loved” language award for multiple years in a row. These were all features that contributed to my interest in Rust.
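
One common way around the graph problem is to avoid references between nodes altogether: keep every node in a single `Vec` and let nodes refer to each other by index. This is only a minimal sketch of that workaround, not the one true way to do it, and the `Graph` and `Node` types and their methods are names invented for this example.

```rust
/// Hypothetical graph that owns all of its nodes in one vector.
struct Graph {
    nodes: Vec<Node>,
}

/// A node refers to its neighbours by index, not by reference,
/// so the borrow checker never sees overlapping mutable references.
struct Node {
    value: i32,
    neighbours: Vec<usize>,
}

impl Graph {
    fn new() -> Self {
        Graph { nodes: Vec::new() }
    }

    /// Adds a node and returns its index.
    fn add_node(&mut self, value: i32) -> usize {
        self.nodes.push(Node { value, neighbours: Vec::new() });
        self.nodes.len() - 1
    }

    /// Connects two nodes in both directions, which creates a cycle
    /// between them without any special handling.
    fn add_edge(&mut self, a: usize, b: usize) {
        self.nodes[a].neighbours.push(b);
        self.nodes[b].neighbours.push(a);
    }

    /// Sums the values of all neighbours of the node at `id`.
    fn neighbour_sum(&self, id: usize) -> i32 {
        self.nodes[id]
            .neighbours
            .iter()
            .map(|&n| self.nodes[n].value)
            .sum()
    }
}

fn main() {
    let mut graph = Graph::new();
    let a = graph.add_node(1);
    let b = graph.add_node(2);
    let c = graph.add_node(3);
    graph.add_edge(a, b);
    graph.add_edge(b, c);
    graph.add_edge(c, a);
    println!("{}", graph.neighbour_sum(a));
}
```

The tradeoff is that an index is just a number: the compiler cannot tell whether it still points at the node you meant, so some of the checking moves back to the programmer.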

Works Cited
“Rust Documentation.” Rust Documentation · The Rust Programming Language, www.rust-lang.org/en-US/documentation.html.

2 thoughts on “The Rise of Memory Safe Languages”

    1. Glad to see someone saw this blog! (:
      I’ve had some experience with Clojure, which has persistent data structures built-in, and it is a very powerful new paradigm. Rust’s ability to deal with them, though, is not first class, nor built-in, and so it is still a little awkward.
