H A Q


Qoreutils: GNU Coreutils implementation in Rust

It started with my hope to learn Rust beyond the online tutorial, some algorithmic problems, and studying existing Rust projects. The wish is to write something useful in the real world, yet small enough to be achievable. Virtually everything that can be rewritten in Rust, however unnecessarily, has been rewritten in Rust already. So rather than looking to start something new, I'm focusing on my own learning needs.

I started the Qoreutils project, a Rust implementation of GNU coreutils. There is already a fairly popular and mature effort (uutils) that aims to be a "drop-in replacement" and "cross-platform". Those are certainly not my goals (nor likely within my reach, given my limited talent and time). Instead, I'm hoping to:

  1. Write some of my most familiar utilities in idiomatic Rust, targeting macOS.
  2. Favor simplicity and minimize dependencies. This means I'll probably ignore some edge cases and reinvent the wheel as I see fit.
  3. Support only the subset of features that I use most often, i.e. not a drop-in replacement.
  4. Along the way, gain some deeper knowledge of both the language and the utilities themselves.

This blog series is meant to document the journey and hopefully you will find it an interesting read.

ls

As the first command, I picked the one I use most often, good old ls. I don't remember the last time I gave ls's manpage a good read. Just from skimming its sheer length, I know that I definitely will not have a drop-in replacement. Although it's next to nothing, I'm actually fairly happy with what I managed to achieve:

  1. Well, the project started. Making progress is better than being perfect.
  2. Straightforward scaffolding: a top-level workspace with individual packages, nothing fancy.
  3. Although I hope to minimize dependencies, clap is used for argument parsing. I just can't see myself wrangling command-line parsing code first. Might come back to this, who knows.

There are several things I discovered immediately that I had barely paid attention to before. ls only displays a section header (for each input directory, INPUT_DIR:\n) if multiple inputs are present. It makes total sense, but I only noticed it when I was actually writing the code and comparing it to the reference implementation.
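To make the header rule concrete, here is a minimal sketch. It is not the actual Qoreutils code: the function name render and the shape of its input (each directory paired with its already-collected entry names) are made up for illustration. The blank line between sections mirrors what the reference ls prints.

```rust
/// Renders the listing for every input directory. A `DIR:` header is
/// printed only when more than one input was given.
/// (`render` and its input shape are hypothetical, chosen for this sketch.)
fn render(inputs: &[(&str, Vec<&str>)]) -> String {
    let show_header = inputs.len() > 1;
    let mut out = String::new();
    for (i, (dir, entries)) in inputs.iter().enumerate() {
        if show_header {
            // Blank line between sections, but not before the first one.
            if i > 0 {
                out.push('\n');
            }
            out.push_str(dir);
            out.push_str(":\n");
        }
        for name in entries {
            out.push_str(name);
            out.push('\n');
        }
    }
    out
}
```

With a single input the output is just the entry names. With two or more, each group is introduced by its own header.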

At its bare bones, ls is an expanded utility around std::fs::ReadDir. One caveat is that ls displays . and .. when -a is selected, whereas std::fs::ReadDir does not include them. I mean, why would it? It leads to some not-so-sexy code in uutils.
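One way to deal with the . and .. caveat is to add them by hand. The sketch below assumes a helper named list_names (hypothetical, not taken from Qoreutils or uutils) and uses a plain byte-order sort, so the ordering may differ from what the real ls prints.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Collects the entry names of `dir`, sorted by byte order.
/// With `all` set (like `-a`), hidden entries are kept and `.` and `..`
/// are pushed manually, because `fs::read_dir` never yields them.
/// (`list_names` is a hypothetical helper for this sketch.)
fn list_names(dir: &Path, all: bool) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    if all {
        names.push(".".to_string());
        names.push("..".to_string());
    }
    for entry in fs::read_dir(dir)? {
        // Non-UTF-8 names are converted lossily to keep the sketch short.
        let name = entry?.file_name().to_string_lossy().into_owned();
        if all || !name.starts_with('.') {
            names.push(name);
        }
    }
    names.sort();
    Ok(names)
}
```

Calling list_names(Path::new("."), true) returns the current directory's entries with . and .. included. Passing false drops every name that starts with a dot.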

tee

Tee is one of my favorite commands. It does a simple thing: it duplicates standard input to standard output and to any files given, and that's it.

There are several things I learned along the way.

Multi-writer

std used to have a tee method, but it has been deprecated. Essentially, to move data from an io::Read to a single io::Write, there's a convenient io::copy function. The tricky part is that I can't repeatedly call io::copy on each writer, e.g. in a for loop, because the first call drains the reader and leaves nothing for the rest. To still leverage io::copy, I need a "multi-writer" whose single io::Write::write method writes to all inner writers.
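The problem with the for loop is easy to reproduce. The sketch below uses a hypothetical helper, copy_to_each, with in-memory Vec<u8> buffers standing in for the real writers. It returns how many bytes each writer received.

```rust
use std::io::{self, Read};

/// Naive attempt: call `io::copy` once per writer and record how many
/// bytes each call moved. (`copy_to_each` is a hypothetical helper.)
fn copy_to_each<R: Read>(reader: &mut R, writers: &mut [Vec<u8>]) -> io::Result<Vec<u64>> {
    let mut counts = Vec::new();
    for w in writers.iter_mut() {
        // The first call reads until EOF, so later calls find the
        // reader already exhausted and copy nothing.
        counts.push(io::copy(&mut *reader, w)?);
    }
    Ok(counts)
}
```

Fed a five-byte input and two buffers, this reports 5 bytes for the first writer and 0 for the second. That is why a single writer that fans out is needed.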

Trait objects

The multi-writer needs to hold a collection of "writers", and io::Write is a trait in Rust. The recommended way to keep a vector of trait objects is to put each one inside a Box, i.e. Vec<Box<dyn Write>>. The dyn keyword is required now, but lifetimes can be inferred.

Implementing the two required methods of io::Write, write and flush, is trivial given that I simply print the underlying individual errors. The rest is just a for loop.
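Put together, a multi-writer along these lines might look like the sketch below. It is an illustration rather than the actual Qoreutils code: the struct and field names are assumptions, and using write_all on each inner writer is a choice made here so that a short write on one writer cannot leave it behind the others.

```rust
use std::io::{self, Write};

/// Fans every write out to all inner writers, so that one `io::copy`
/// call can feed stdout and any number of files at once.
/// (Names are hypothetical; this is a sketch.)
struct MultiWriter {
    writers: Vec<Box<dyn Write>>,
}

impl Write for MultiWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        for w in self.writers.iter_mut() {
            // Print the individual error and carry on, so one failing
            // writer does not stop the others.
            if let Err(e) = w.write_all(buf) {
                eprintln!("tee: {}", e);
            }
        }
        // Report the whole buffer as consumed.
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        for w in self.writers.iter_mut() {
            if let Err(e) = w.flush() {
                eprintln!("tee: {}", e);
            }
        }
        Ok(())
    }
}
```

To use it, box io::stdout() and each opened file into writers, then call io::copy(&mut io::stdin(), &mut multi_writer) once.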

base64

Base64 itself is fairly straightforward, and I tried to implement it in a reader-friendly way rather than optimizing for performance etc. Regarding Rust, I learned 2 things this time:

  1. You don't need to define a custom std::error::Error if all you want is a Result<T, E>. E here can simply be &'static str and you can just do Err("some error").
  2. Tests living alongside the functionality is pretty neat, actually.
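Both points fit in one small sketch. The code below is an illustration, not the Qoreutils implementation: the function names, the error messages and the test module name are assumptions. It favors readability over speed, and the decoder is deliberately lenient in one respect: it does not reject padding that appears in a group other than the last.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes bytes as standard Base64 with `=` padding.
fn encode(input: &[u8]) -> String {
    let mut out = String::new();
    for chunk in input.chunks(3) {
        // Pack up to three bytes into 24 bits; missing bytes count as zero.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[(n >> 6) as usize & 63] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[n as usize & 63] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Maps one Base64 character back to its 6-bit value.
fn sextet(c: u8) -> Result<u32, &'static str> {
    match c {
        b'A'..=b'Z' => Ok((c - b'A') as u32),
        b'a'..=b'z' => Ok((c - b'a') as u32 + 26),
        b'0'..=b'9' => Ok((c - b'0') as u32 + 52),
        b'+' => Ok(62),
        b'/' => Ok(63),
        // A string literal is already a `&'static str`; no custom error type.
        _ => Err("invalid base64 character"),
    }
}

/// Decodes padded Base64. The error type is a plain `&'static str`.
fn decode(input: &str) -> Result<Vec<u8>, &'static str> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return Err("input length is not a multiple of 4");
    }
    let mut out = Vec::new();
    for chunk in bytes.chunks(4) {
        // Count trailing `=` in this group of four characters.
        let pad = chunk.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 {
            return Err("too much padding");
        }
        let mut n = 0u32;
        for &c in &chunk[..4 - pad] {
            n = (n << 6) | sextet(c)?;
        }
        // Shift so the decoded bytes always sit in the low 24 bits.
        n <<= 6 * pad as u32;
        out.push((n >> 16) as u8);
        if pad < 2 {
            out.push((n >> 8) as u8);
        }
        if pad < 1 {
            out.push(n as u8);
        }
    }
    Ok(out)
}

// Tests live in the same file as the code they exercise.
#[cfg(test)]
mod base64_tests {
    use super::*;

    #[test]
    fn round_trip() {
        assert_eq!(encode(b"Ma"), "TWE=");
        assert_eq!(decode("TWE="), Ok(b"Ma".to_vec()));
    }

    #[test]
    fn rejects_bad_length() {
        assert_eq!(decode("TWE"), Err("input length is not a multiple of 4"));
    }
}
```

Running cargo test picks up the test module automatically, and the ? operator in decode passes the &'static str error straight through to the caller.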