Rust

Like every other developer on the internet, I’ve been watching the rise of Rust with interest. I’ve spent the majority of my career in high-level language land, although I did do a lot of C++ once upon a time, and Rust may have passed me by with it’s focus around low-level concerns, bringing back bad memories of C++¹. However, the sheer buzz surrounding it, and the fact that they’re now close to version 1.0 made me take another look.

And I’m glad I did take another look as it has rapidly progressed since the last time I looked into it. It’s good to see a new language with new ideas, rather than a “a mix of X and Y” which seems more common. It bills itself as a “systems language”, which it certainly is, but it also has a wide-array of high-level features too - first-class functions, higher-order functions, etc., being particularly interesting to see as someone with prior experience of functional programming languages.

Inspired by this, I decided to write something in Rust. Particularly the LazySort algorithm I’ve used for testing previously².

Equivalences and differences

The first and most obvious difference between Clojure and Rust is that Rust is statically typed, and very strictly statically typed too, there’s no Java-style bodging and casting, things aren’t nullable by default. This means there will be no general lazy-seq as per Clojure.

The second and most fundamental difference between Rust and, well, everything really, is it’s approach to memory management. It is not garbage collected, nor does it have C-style manual memory management³. Instead it has a concept of a lifetime: everything allocated has a deterministic point where it will be deallocated; and passing these values around by reference involves the compiler checking that the borrowed reference will not out-live the lifetime. To return a value up the stack, higher than the place it was allocated, you either need to: copy the value, or return a boxed value, which is a special type of reference that can be moved to a different owner. This is only the tip-of-the-iceberg into the compile-time checking Rust performs, it also protects against data-races by allowing only one mutable reference to exist at any point-in-time, for example.

But it does have something that behaves a bit like a Clojure lazy sequence: the Iterator. So in theory it will be possible to create a kind of ‘sorted’ iterator that will return the values in a collection in sorted order, by sorting the underlying collection element-by-element, in the same way the Clojure implementation did.

The implementation

The finished version is available here: https://github.com/benashford/rust-lazysort, and also on crates.io (the Rust package platform) here: https://crates.io/crates/lazysort

One of the higher-level features that Rust supports is extension methods, which proved very useful in this case, it enabled me to add a method to any iterator which would convert it into a sorted iterator. This allows it to be chained with other Iterator functions:

The compiler’s impressive type-checking is so sophisticated that trying to call sorted() on an iterator of T is a compile-error unless T implements the Ord trait, which means the type has a default order. The function sorted simply doesn’t exist otherwise. There is a second function sorted_by that can be called on any iterator, regardless of content, this requires that a closure be given defining the order:

The extension methods are hygenic in the sense that they only apply if you import them:

The algorithm

Originally it was a fairly naive port of the Clojure version, i.e. it used immutable vectors containing the broken-down underlying data as it was gradually sorted. This was later replaced with a version that was more idiomatic for Rust. It was the same algorithm, a lazy quicksort, but the second one was in-place rather than immutable. Although it worked on a copy of the underlying data, or original vector (or whatever it may have been) was untouched and didn’t need to be marked as mutable.

Performance

Rust aims for C++-levels of performance, which is a very high-bar indeed. It uses LLVM for the final compilation steps, so it already takes advantage of a mature compiler. Rust is also a fast-moving target, so much so that during the short period I was working on this, testing it against nightly builds the measured performance got noticibly faster without me having to do anything.

This, however, made judging the utility of lazy-sorting as a concept really quite difficult. At one stage the cost of lazily-sorting a full large vector was only 15% worse than the sort function in the standard library; small enough that a developer could consider using it for all purposes, not just the usual optimisation for short-cutting the sorting process of only partially consuming the full set of data. But at the time of writing, even though my code got faster with the latest nightly builds of Rust, the standard library sort got much, much faster! The end result is that the overhead for a full large vector is around 2.5 times, so probably not something you’d want to use all the time.

However the same benefits as found in the Clojure version were also found in the Rust version - e.g. taking the top 1,000 results. In the tests of 50,000 random numbers, the cross-over point is just over 10,000. Interestingly this is much better than the Clojure version, where the cross-over is around 1,250.

The headline numbers are that the Rust version is significantly faster than the Clojure version. Using the test of picking the top 1,000 out of 50,000 random numbers; the Clojure version manages an average of 11.2ms, the Rust version 0.9ms.

In conclusion

Bring on version 1.0.

Footnotes

I appreciate that C++ has changed a lot since then. In those dark days finding a compiler that supported STL was a nice surprise. ↩
Don’t worry, the next blog post will be on a different subject entirely. ↩
Not strictly true, depending on how deep you want to get into unsafe blocks. ↩