Qoreutils Part II
It's been a while since I touched Qoreutils. Recently I picked it back up and added a few more utilities.
wc
Word count is deceptively simple until you think about Unicode. Counting bytes is trivial, but counting characters requires understanding UTF-8 encoding. A Chinese character like 你 is 1 character but 3 bytes in UTF-8; an emoji like 😀 is 1 character (a single code point) but 4 bytes.
The trick is identifying UTF-8 continuation bytes. In UTF-8, continuation bytes have the bit pattern 10xxxxxx, so they fall in the range 128-191 (0x80-0xBF). Any byte outside that range starts a new character, so for valid UTF-8 the character count is simply the number of non-continuation bytes.
```rust
// Every byte that is not a continuation byte (0x80..=0xBF) starts a new character.
if !(128u8..192u8).contains(&b) {
    counts.chars += 1;
}
```
Simple once you see it, but I had to look it up.
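To put that check in context, here is a minimal sketch of a full counting pass over a byte buffer. The Counts struct and count function are hypothetical names, the input is assumed to be valid UTF-8, and words are split on ASCII whitespace only, which is a simplification of how GNU wc handles locales.

```rust
/// Hypothetical result type mirroring wc's -l, -w, -m and -c counts.
#[derive(Debug, Default, PartialEq)]
pub struct Counts {
    pub lines: usize,
    pub words: usize,
    pub chars: usize,
    pub bytes: usize,
}

/// Count lines, words, characters and bytes in a buffer assumed to be valid UTF-8.
pub fn count(buf: &[u8]) -> Counts {
    let mut counts = Counts::default();
    let mut in_word = false;
    for &b in buf {
        counts.bytes += 1;
        // Continuation bytes (10xxxxxx, 0x80..=0xBF) never start a character.
        if !(128u8..192u8).contains(&b) {
            counts.chars += 1;
        }
        if b == b'\n' {
            counts.lines += 1;
        }
        // Simplification (assumption): only ASCII whitespace separates words,
        // so multi-byte characters always count as part of a word.
        if b.is_ascii_whitespace() {
            in_word = false;
        } else if !in_word {
            in_word = true;
            counts.words += 1;
        }
    }
    counts
}
```

For the input "你好 😀\n" this reports 12 bytes but only 5 characters (你, 好, the space, 😀 and the newline).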
chmod
I always use octal mode (chmod 644 file), but apparently symbolic mode (chmod u+rw file) exists and some people prefer it. Supporting both required a bit of parsing logic, but nothing too complicated.
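Symbolic mode boils down to a small grammar: who (u, g, o, a), one operator (+, -, =) and permissions (r, w, x), with clauses separated by commas. The sketch below is a hypothetical apply_symbolic helper that handles only that subset. It ignores the setuid, setgid and sticky bits, allows one operator per clause, and treats a missing who as a, whereas POSIX chmod applies the umask in that case.

```rust
/// Apply a symbolic mode such as "u+x", "go-w" or "a=r,u+w" to an existing mode.
/// Hypothetical helper covering only the [ugoa][+-=][rwx] subset.
pub fn apply_symbolic(mode: u32, spec: &str) -> Result<u32, String> {
    let mut mode = mode;
    for clause in spec.split(',') {
        // Split the clause into the "who" prefix and the operator + permissions.
        let op_pos = clause
            .find(|c: char| matches!(c, '+' | '-' | '='))
            .ok_or_else(|| format!("invalid mode clause: '{}'", clause))?;
        let (who, rest) = clause.split_at(op_pos);

        // Mask of the permission classes selected by "who".
        let mut who_mask = 0u32;
        for c in who.chars() {
            who_mask |= match c {
                'u' => 0o700,
                'g' => 0o070,
                'o' => 0o007,
                'a' => 0o777,
                _ => return Err(format!("invalid who '{}' in '{}'", c, clause)),
            };
        }
        if who_mask == 0 {
            who_mask = 0o777; // simplification: no umask handling
        }

        let mut chars = rest.chars();
        let op = chars.next().unwrap(); // rest always starts with the operator
        // rwx bits replicated across all classes, then restricted by "who".
        let mut perm = 0u32;
        for c in chars {
            perm |= match c {
                'r' => 0o444,
                'w' => 0o222,
                'x' => 0o111,
                _ => return Err(format!("invalid permission '{}' in '{}'", c, clause)),
            };
        }
        let bits = perm & who_mask;
        mode = match op {
            '+' => mode | bits,
            '-' => mode & !bits,
            _ => (mode & !who_mask) | bits, // '=' replaces the selected classes
        };
    }
    Ok(mode)
}
```

For example, applying u+x to 644 gives 744, and go-w turns 777 into 755.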
The recursive flag -R was straightforward. Just walk the directory tree and apply the same mode to everything.
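One subtlety worth keeping in mind: if the new mode strips the execute bit from directories (chmod -R 644 dir), changing a directory before descending into it can leave its contents unreachable. This hypothetical chmod_recursive sketch (Unix only, std only) sidesteps that case by updating children before their parent, and it skips symbolic links instead of following them. It is an illustration under those assumptions, not necessarily the traversal order Qoreutils or GNU chmod uses.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Recursively apply an octal `mode` to `path` and everything beneath it.
/// Hypothetical helper: children are changed before their parent directory,
/// and symlinks are skipped because set_permissions would follow them.
pub fn chmod_recursive(path: &Path, mode: u32) -> io::Result<()> {
    let meta = fs::symlink_metadata(path)?; // does not follow symlinks
    if meta.file_type().is_symlink() {
        return Ok(());
    }
    if meta.is_dir() {
        for entry in fs::read_dir(path)? {
            chmod_recursive(&entry?.path(), mode)?;
        }
    }
    fs::set_permissions(path, fs::Permissions::from_mode(mode))
}
```

Note that a directory which already lacks read permission still cannot be listed, so read_dir fails there regardless of ordering.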
Error Handling
I finally bit the bullet and standardized on anyhow across all utilities. Previously I had a mix of panics, custom error types, and &'static str as errors. The with_context method is particularly nice for adding file paths to error messages.
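```rust
use anyhow::Context; // with_context is a method of anyhow's Context trait
use std::fs::File;
let file = File::open(path)
    .with_context(|| format!("cannot open '{}'", path.display()))?;
```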
Testing
I went back and added proper tests to tee. The original implementation was basically untestable because it read directly from stdin. Refactoring to accept a Box<dyn Read> made it possible to inject test data. Dependency injection isn't just a Java thing after all.
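A minimal sketch of that kind of refactor, assuming a hypothetical tee function that takes its input as a Box<dyn Read> and its destinations as a slice of writers. In production the input would be Box::new(io::stdin()); in a test it can be an io::Cursor over a byte vector, and the outputs can be plain Vec<u8> buffers.

```rust
use std::io::{self, Read, Write};

/// Copy everything from `input` to every writer in `outputs` and return the
/// number of bytes read. Hypothetical signature, not the actual Qoreutils API.
pub fn tee(mut input: Box<dyn Read>, outputs: &mut [&mut dyn Write]) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = match input.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        // Write the same chunk to every destination.
        for out in outputs.iter_mut() {
            out.write_all(&buf[..n])?;
        }
        total += n as u64;
    }
    for out in outputs.iter_mut() {
        out.flush()?;
    }
    Ok(total)
}
```

Because Vec<u8> implements Write, a test can inspect exactly what each destination received without touching the terminal or the filesystem.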
The tempfile crate is great for testing file operations. It creates temporary files that clean up after themselves when they are dropped.