Goal

The goal of today is to finish the commands to encode and decode chunks without encryption. I started this last week, but ran out of time to finish it. I have a tendency to underestimate how long things take.

  • I will add a verification scenario to ensure a round trip of encode-decode works correctly.
  • I will refactor the code to move the decoding (by calling postcard::from_bytes) into the library part from the command part.
  • I will add a command chunk inspect, which decodes a chunk and outputs the decoded chunk in a structured format such as JSON. This will help future me troubleshoot problems more easily.
  • I will add versioning to the clear text chunks, and have at least two versions.

The acceptance criterion for today is that all of the above works and has automated tests.

Plan

  • Add round trip verification scenario.
  • Remove the chunk hello command.
  • Refactor decoding.
  • Implement a first version of chunk inspect, with a verification scenario.
  • Add clear text chunk versioning.
  • Change chunk encode to allow choosing what chunk version to use.
  • Change the round trip verification scenario to use each chunk version.

Notes

Add round trip verification scenario

  • Ran obnam chunk encode and obnam chunk decode manually to make sure a round trip works.
  • It doesn't, because decode also outputs the chunk id and label, and outputs the data as a vector of bytes.
  • Changed decode to output only the data, as binary data, to stdout.
  • Added an --output option to decode, because Subplot scenarios have trouble with Unix command I/O redirection, until I fix that.
  • Added a utility module with functions for writing binary data to stdout or a named file. I expect to need these often enough that it's worth having them as separate functions. Also, they're clearer this way.
  • Added a round trip scenario.
  • This worked first time, because I cheated, and had verified manually that the round trip works.
  • Re-based the branch to make the change history a little clearer.
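
The output helpers mentioned above could look roughly like the following. This is a minimal sketch, not Obnam's actual code: the names write_bytes and write_output are hypothetical, and it assumes the --output option arrives as an optional path.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to `sink` as raw bytes, with no text encoding, then flush.
/// (Hypothetical helper name, for illustration only.)
pub fn write_bytes<W: Write>(mut sink: W, data: &[u8]) -> io::Result<()> {
    sink.write_all(data)?;
    sink.flush()
}

/// Write to the named file if one is given, otherwise to stdout.
/// Assumption: `output` corresponds to an optional `--output` argument.
pub fn write_output(output: Option<&Path>, data: &[u8]) -> io::Result<()> {
    match output {
        Some(path) => write_bytes(File::create(path)?, data),
        None => write_bytes(io::stdout().lock(), data),
    }
}
```

Taking any Write implementation in write_bytes keeps the file and stdout cases on one code path, and lets a test write into a Vec<u8> instead of a real file.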

Break

  • A short break for breakfast.

Drop the hello command

  • The obnam chunk hello command was there as a placeholder so I could add Subplot scaffolding, but it's not useful any more.
  • Code deletion is fun.

Refactor decoding

  • For some reason I implemented the chunk decoding in the src/cmd/chunk.rs module, instead of src/chunk.rs where it belongs. Past me was, again, a lazy and careless slob, but they can't be taught, so there's no point in getting angry about it.
  • Moving that code into the right place is easy.
  • I see a need to refactor: I'll add a utility function to read a named file. This can already be used in several places, and it's a logical complement to the file writing function.

Chunk inspection

  • It would be useful to have a command obnam chunk inspect, which reads an encoded chunk and outputs the information in it in a format that's reasonably easy for humans to consume, but particularly easy for programs. JSON is such a format.
  • The new command is basically just the decode command, except it serializes the decoded chunk as JSON.
  • The serde_json crate is good for JSON.
  • However, the output is useless to humans: it's just arrays of small integers, as the clear text chunk fields are byte vectors.
  • I'll need to make a more fancy output format by defining a new type, struct InspectedChunk, and creating one from a clear text chunk.

        pub struct InspectedChunk {
            id: String,
            label: String,
            data: String,
        }

  • The strings will be lossy conversions from the binary data. That's OK for now.
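
The lossy conversion could be sketched as follows. The CleartextChunk struct here is a stand-in with three byte-vector fields, assumed from the description above rather than taken from the real code, and the lossy helper is a hypothetical name. JSON serialization is left out to keep the sketch free of external crates.

```rust
/// Stand-in for the clear text chunk (assumed shape: three byte vectors).
pub struct CleartextChunk {
    pub id: Vec<u8>,
    pub label: Vec<u8>,
    pub data: Vec<u8>,
}

/// Human-readable view of a chunk, as in the notes.
#[derive(Debug, PartialEq)]
pub struct InspectedChunk {
    id: String,
    label: String,
    data: String,
}

/// Convert bytes to a String, replacing invalid UTF-8 with U+FFFD.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

impl From<&CleartextChunk> for InspectedChunk {
    fn from(chunk: &CleartextChunk) -> Self {
        Self {
            id: lossy(&chunk.id),
            label: lossy(&chunk.label),
            data: lossy(&chunk.data),
        }
    }
}
```

Each invalid byte sequence becomes the Unicode replacement character, so the original bytes cannot be recovered from the inspected form. That is what "lossy" means here, and it is why this output is for troubleshooting, not for decoding.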

Chunk versioning

  • I can't always predict the future, but some things are obviously very likely. For example, I expect to need to make changes to the structure of the clear text chunk. I don't know what they'll be (or else I'd make them now), but I can prepare for the highly likely.
  • Specifically, I can make the chunk be versioned.

        pub enum CleartextChunk {
            V1(CleartextChunkV1),
        }

  • However, I don't want every user of chunks to have to match on the enum; that's just too tedious. So I'll add methods to access the various information in a chunk.
  • However squared, those methods can't always succeed: a future version of the chunk might not have the information in the first version, so the methods will need to return optional values.
  • Also, the inspect command should show the version.
  • The postcard crate will encode the enum variant in its wire format, so we don't need to deal with that explicitly.
  • I don't think I care to implement versioning for chunk ids and labels. They're quite simple, but if they ever need to change, I'll add versioned new variants then.
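
The accessor idea could be sketched like this. The fields of CleartextChunkV1, the v1 constructor, and the method names are assumptions for illustration. Only the enum shape comes from the notes, and serialization with postcard is omitted.

```rust
/// First version of the chunk payload (field layout is an assumption).
pub struct CleartextChunkV1 {
    id: Vec<u8>,
    label: Vec<u8>,
    data: Vec<u8>,
}

/// Versioned chunk: each future format gets its own variant.
pub enum CleartextChunk {
    V1(CleartextChunkV1),
}

impl CleartextChunk {
    /// Hypothetical convenience constructor for a version 1 chunk.
    pub fn v1(id: Vec<u8>, label: Vec<u8>, data: Vec<u8>) -> Self {
        Self::V1(CleartextChunkV1 { id, label, data })
    }

    /// Version number, for example for the inspect command to display.
    pub fn version(&self) -> u32 {
        match self {
            Self::V1(_) => 1,
        }
    }

    /// Accessors return Option, because a future variant may lack a field.
    pub fn id(&self) -> Option<&[u8]> {
        match self {
            Self::V1(c) => Some(c.id.as_slice()),
        }
    }

    pub fn label(&self) -> Option<&[u8]> {
        match self {
            Self::V1(c) => Some(c.label.as_slice()),
        }
    }

    pub fn data(&self) -> Option<&[u8]> {
        match self {
            Self::V1(c) => Some(c.data.as_slice()),
        }
    }
}
```

With this shape, callers never match on the enum themselves. Adding a V2 variant later means extending each match in one place, and the compiler flags every accessor that has not been updated.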

Magic cookie

  • I am now reminded that an encoded chunk can't readily be identified as an Obnam chunk, without trying to decode it.
  • The usual solution for this is to add a magic cookie.
  • I can prepend the cookie to the postcard output, or I can include it in the data that postcard serializes.
  • If I include it in the data, I'm stuck with using postcard forever, so I'll prepend it.
  • I recently saw https://hackers.town/@zwol/114155595855705796, where the author makes suggestions for magic cookies:
    • MUST be the very first N bytes in the file
    • MUST be at least four bytes long, eight is better
    • MUST include at least one byte with the high bit set
    • MUST include a byte sequence that is invalid UTF-8
    • SHOULD include a zero byte, but you can usually get away with having that be part of the overall version number that immediately follows the magic number (did I mention that you really SHOULD put an overall version number right after the magic number, unless you know and have documented exactly why it's not necessary, e.g. PNG?)
  • I want the Obnam magic cookie to include "Obnam".
  • Using the above, a first draft of a magic cookie for Obnam chunks:
    • the string "Obnam"
    • one or more bytes for the encoding version
    • a byte 0xFF
    • a byte 0x00
  • I'll start with this, and make changes as needed later.
  • Prefixing the encoding with a cookie is easy enough, as is checking that the cookie is there when decoding.
  • Added a second cookie to make sure I handle the cookies correctly everywhere. The "cookie 0" encoding uses JSON instead of Postcard.
  • Also, "inspect" needs to include the encoding version. That means the function to decode needs to return it.
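
Prepending and checking the draft cookie could be sketched as below. This assumes a single version byte (the draft allows one or more), and the names add_cookie, strip_cookie, and CookieError are hypothetical. The payload is treated as opaque bytes, so it does not matter here whether it came from Postcard or JSON.

```rust
/// Draft cookie layout: "Obnam", one version byte, 0xFF, 0x00.
/// Assumption: exactly one byte for the encoding version.
const MAGIC: &[u8] = b"Obnam";
const TRAILER: [u8; 2] = [0xFF, 0x00];
const COOKIE_LEN: usize = 5 + 1 + 2;

#[derive(Debug, PartialEq)]
pub enum CookieError {
    TooShort,
    BadMagic,
    BadTrailer,
}

/// Prefix an already serialized payload with the cookie.
pub fn add_cookie(version: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(COOKIE_LEN + payload.len());
    out.extend_from_slice(MAGIC);
    out.push(version);
    out.extend_from_slice(&TRAILER);
    out.extend_from_slice(payload);
    out
}

/// Check the cookie and return the encoding version plus the payload,
/// so that the caller can pick a decoder and report the version.
pub fn strip_cookie(encoded: &[u8]) -> Result<(u8, &[u8]), CookieError> {
    if encoded.len() < COOKIE_LEN {
        return Err(CookieError::TooShort);
    }
    let (cookie, payload) = encoded.split_at(COOKIE_LEN);
    if !cookie.starts_with(MAGIC) {
        return Err(CookieError::BadMagic);
    }
    if !cookie.ends_with(&TRAILER) {
        return Err(CookieError::BadTrailer);
    }
    Ok((cookie[MAGIC.len()], payload))
}
```

Returning the version together with the payload matches the note that the decode function needs to hand the encoding version back for inspect to show it. The 0xFF byte covers both the high-bit and the invalid-UTF-8 suggestions from the list above.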

Merge

I'm merging this branch now. It's a big chunk of work. I really should have merged the branch last week already, and made a new branch today. In the future I shall merge every time.

Summary

I feel I've made good progress today. I reached today's goal: Obnam can now encode data as a chunk, decode that, and additionally has a useful helper command to inspect an encoded chunk.

Comments?

If you have feedback on this development session, please use the following fediverse thread: https://toot.liw.fi/@liw/114210761810311968