I gave a talk the other day at Python meetup in Stockholm with the same title. I wanted to also put it online as a post in case anybody is interested in using Rust from Python. (If you fancy watching a video instead, scroll to the bottom).
I like working with Python. I have been working with it for a few years building web-applications. It is an expressive language, there are libraries for almost everything I need, it is quick to try out ideas or to build features and I feel productive.
However, for critical paths in applications, Python is not a great fit. These paths could be parts of code which are executed very often or which need to be as quick as possible. For example, Jinja2 is a very popular templating library in Python. In turn, Jinja2 depends on MarkupSafe. In order to make escaping strings as fast as possible, the functionality is implemented in C.
This has also been the recommended approach to solving performance problems in Python — to write C/C++ extensions. However, low level languages like C/C++ don’t provide any safety to the programmer and this many times leads to vulnerabilities like Heartbleed or Ghost. These exploits were due to buffer overflow. These exploits also happened in a very widely deployed codebase. The point is, it is quite hard to manage memory properly when there is no support from the language.
Rust is an open source, modern, systems programming language sponsored by Mozilla. It is sponsored by Mozilla in the sense that, several of the core contributors are employees of Mozilla and work full time on Rust. The language grew out of a personal project by Mozilla employee Graydon Hoare and was first announced in 2010. V1.0 was released in May 2015, and there is a new release every 6 weeks. I consider Rust as a young teenager (or maybe just-starting-school) bringing fresh ideas to programming languages. It probably moved out of infancy when companies started using it in production.
Of the several features that Rust has, I find 3 are particularly relevant to building Python extensions using Rust — zero-cost abstractions, minimal runtime and guaranteed memory safety.
Zero cost abstractions is something Rust borrows from C++.
C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better
So this language principle would mean that, for features of the language that I don’t use, there should be no penalty. For features of the language that I do use, it shouldn’t be possible to do any better by going down in the stack (i.e. by writing generated machine instructions, for example). One obvious evidence of this is in the trait implementation. Traits is one of the cornerstone’s of the abstraction model in Rust. It allows us to add behaviour to types in a variety of cases. But the design principle and the implementation of this feature guarantees that there is no penalty/overhead for using Traits. See this excellent post for details.
Minimal runtime ensures that there is no overhead when we run our code. For example, in a dynamic language, the runtime determines where the method should be dispatched. In a language with garbage collection, the runtime would have to periodically stop execution while the garbage collector runs. The minimal runtime of Rust guarantees that the program will not suffer this overhead.
Guaranteed memory safety means that the program only accesses memory that belongs to it. It shouldn’t be possible to write past this memory or access it after it has been released. Also, remember that there is no garbage collector in Rust. So in this case, the language constructs need to ensure that these guarantees are fulfilled. It is solved in Rust using ownership and borrowing.
Let us now create a ‘hello-world’ of a Rust extension that we can call from Python. I will use the cffi package in Python to interface between the two languages.
Finally, to show you the difference in performance, I will consider the following problem — I want to read a text file, split it into words on whitespace, and find the most common words in that file. This is useful, for example, in writing a spelling-corrector. It is simple to do this in Python since it has a built in data structure. collections.Counter is a kind of dictionary and has a method called most_common, that returns the common words with their count. Rust doesn’t have this method. So we will write an implementation of most_common that takes a dictionary (HashMap in Rust) as input and returns the most common key after aggregating.
Before I show the difference in numbers, I want to take a moment to talk about the method signature in Rust.
/// Find the most common words in the bag based on
/// number of occurrences
fn n_most_common<T>(bag: HashMap<T, usize>, n: usize) -> Vec<(T, usize)> where T: Eq + Hash + Clone
So the n_most_common method takes a HashMap (dictionary) as the first argument. The key in this HashMap is a generic type T but the where clause specifies that this generic type T has to implement 3 traits — Eq, Hash and Clone. This is exactly same in a Python dictionary, where the keys have be hashable. But in the case of Rust, this is checked at compile type and doesn’t need unit-tests for this behaviour to hold.
As for the difference in performance, I used the same file that was used in the original article. Make 3/4 copies of this file and put it in ‘data’ directory (the file names don’t matter).
My take aways from this exercise are — types in Rust are awesome and it is relatively easy to move code from Python to Rust when performance matters.
Something I didn’t explore was to check if the benefits will be much higher for concurrent code.
You can also watch a recording of the video, but if you have already read so far, there is nothing new in the video. I start at 3 mins.
Happy to hear comments/suggestions as comments below or on twitter.