H A Q


Qoreutils Part II

It's been a while since I touched Qoreutils. Recently I picked it back up and added a few more utilities.

wc

Word count is deceptively simple until you think about Unicode. Counting bytes is trivial, but counting characters requires understanding UTF-8. A Chinese character like 你 is 1 character but 3 bytes; an emoji like 😀 is 1 character but 4 bytes.

The trick is identifying UTF-8 continuation bytes. Continuation bytes have the bit pattern 10xxxxxx, so they fall in the range 128-191 (0x80-0xBF). In valid UTF-8, every byte outside that range starts a new character, so counting those bytes counts characters.

// Skip continuation bytes (0b10xxxxxx); every other byte starts a character.
if !(128u8..192u8).contains(&b) {
    counts.chars += 1;
}

Simple once you see it, but I had to look it up.
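
To make the idea concrete, here is a minimal self-contained sketch of the byte and character counting. It is not the actual Qoreutils code: the `Counts` struct and the `count` function are names made up for illustration, and the sketch assumes the input is valid UTF-8.

```rust
/// Hypothetical tally struct; only `chars` appears in the snippet above.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    bytes: usize,
    chars: usize,
}

/// Counts bytes and UTF-8 characters (code points) in a single pass.
/// Assumes valid UTF-8; malformed input can yield a misleading char count.
fn count(input: &[u8]) -> Counts {
    let mut counts = Counts::default();
    for &b in input {
        counts.bytes += 1;
        // Continuation bytes look like 0b10xxxxxx (128..=191).
        // Any other byte is the first byte of a character.
        if !(128u8..192u8).contains(&b) {
            counts.chars += 1;
        }
    }
    counts
}
```

Calling `count("你好 😀\n".as_bytes())` yields 12 bytes and 5 characters: two 3-byte characters, a space, one 4-byte emoji, and a newline.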

chmod

I always use octal mode (chmod 644 file) but apparently symbolic mode (chmod u+rw file) exists and some people prefer it. Supporting both required a bit of parsing logic but nothing too complicated.
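
For a rough idea of what that parsing might involve, here is a simplified sketch. It is an illustration under assumptions, not the Qoreutils implementation: `apply_mode` is a hypothetical name, it handles only a single symbolic clause (no comma-separated lists, no X, s, or t), and it ignores the umask, which real chmod consults when no class letter is given.

```rust
/// Applies a mode spec to `current` permission bits and returns the new bits.
/// Accepts octal ("644") or one symbolic clause ("u+rw", "go-w", "a=r").
/// Returns None if the spec cannot be parsed.
fn apply_mode(spec: &str, current: u32) -> Option<u32> {
    // Octal mode replaces the permission bits outright.
    if !spec.is_empty() && spec.bytes().all(|b| (b'0'..=b'7').contains(&b)) {
        return u32::from_str_radix(spec, 8).ok().filter(|m| *m <= 0o7777);
    }

    // Symbolic mode: [ugoa]* followed by one of + - = followed by [rwx]*.
    let op_pos = spec.find(|c: char| matches!(c, '+' | '-' | '='))?;
    let (who, rest) = spec.split_at(op_pos);
    let op = rest.chars().next()?;
    let perms = &rest[1..]; // the operator is ASCII, so it is exactly one byte

    // Mask of the rwx bits that belong to the selected classes.
    let mut who_mask: u32 = 0;
    for c in who.chars() {
        who_mask |= match c {
            'u' => 0o700,
            'g' => 0o070,
            'o' => 0o007,
            'a' => 0o777,
            _ => return None,
        };
    }
    if who_mask == 0 {
        who_mask = 0o777; // simplification: no class letter is treated like "a"
    }

    // Each permission letter replicated across all classes, then narrowed.
    let mut perm_bits: u32 = 0;
    for c in perms.chars() {
        perm_bits |= match c {
            'r' => 0o444,
            'w' => 0o222,
            'x' => 0o111,
            _ => return None,
        };
    }
    let bits = perm_bits & who_mask;

    Some(match op {
        '+' => current | bits,
        '-' => current & !bits,
        _ => (current & !who_mask) | bits, // '=': replace the selected classes
    })
}
```

The difference between the two styles shows up in the signature: octal mode ignores `current`, while symbolic mode is an edit applied to whatever bits the file already has.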

The recursive flag -R was straightforward. Just walk the directory tree and apply the same mode to everything.
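
A tree walk like that can be sketched with just the standard library. This is a Unix-only illustration with a made-up function name, `chmod_recursive`, and two design choices that are assumptions rather than anything the post describes: it skips symlinks, and it changes a directory's mode only after visiting its children, so that a mode without the execute bit does not lock the walk out of the directory halfway through.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Sets `mode` on `path` and, if it is a directory, on everything beneath it.
fn chmod_recursive(path: &Path, mode: u32) -> io::Result<()> {
    // symlink_metadata inspects the link itself instead of following it.
    let meta = fs::symlink_metadata(path)?;
    if meta.file_type().is_symlink() {
        return Ok(()); // assumption: leave symlinks and their targets alone
    }
    if meta.is_dir() {
        // Visit children first, while the directory is still traversable.
        for entry in fs::read_dir(path)? {
            chmod_recursive(&entry?.path(), mode)?;
        }
    }
    fs::set_permissions(path, fs::Permissions::from_mode(mode))
}
```

This version stops at the first error. A real utility would more likely report the failure and keep going with the rest of the tree.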

Error Handling

I finally bit the bullet and standardized on anyhow across all utilities. Previously I had a mix of panics, custom error types, and &'static str as errors. The with_context method is particularly nice for adding file paths to error messages.

use anyhow::Context; // brings with_context into scope for Result
use std::fs::File;
let file = File::open(path)
    .with_context(|| format!("cannot open '{}'", path.display()))?;

Testing

I went back and added proper tests to tee. The original implementation was basically untestable because it read directly from stdin. Refactoring to accept a Box<dyn Read> made it possible to inject test data. Dependency injection isn't just a Java thing after all.
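
Roughly, the refactored shape could look like the sketch below. The signature is an assumption for illustration (the post only says the input became a `Box<dyn Read>`); the name `tee`, the slice of writers, and the buffer size are all made up.

```rust
use std::io::{self, Read, Write};

/// Copies everything from `input` to every writer in `outputs`.
/// Returns the number of bytes read from `input`.
fn tee(mut input: Box<dyn Read>, outputs: &mut [&mut dyn Write]) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        for out in outputs.iter_mut() {
            out.write_all(&buf[..n])?;
        }
        total += n as u64;
    }
    Ok(total)
}
```

In production the input would be `Box::new(io::stdin())`. In a test it can be `Box::new(io::Cursor::new(b"hello\n".to_vec()))` with plain `Vec<u8>` buffers as the outputs, so nothing touches the real terminal or filesystem.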

The tempfile crate is great for testing file operations. It creates temporary files and directories that are deleted automatically when they go out of scope.