Goal
Plan
Notes
Summary
Comments?

Goal

The goal of today is to finish the commands to encode and decode chunks without encryption. I started this last week, but ran out of time to finish it. I have a tendency to underestimate how long things take.

I will add a verification scenario to ensure a round trip of encode-decode works correctly.
I will refactor the code to move the decoding (by calling postcard::from_bytes) into the library part from the command part.
I will add a command chunk inspect, which decodes a chunk and outputs the decoded chunk in a structured format such as JSON. This will help future me troubleshoot problems more easily.
I will add versioning to the clear text chunks, and have at least two versions.

The acceptance criteria for today are that all of the above works and has automated tests.

Plan

Add round trip verification scenario.
Remove the chunk hello command.
Refactor decoding.
Implement a first version of chunk inspect, with a verification scenario.
Add clear text chunk versioning.
Change chunk encode to allow choosing what chunk version to use.
Change the round trip verification scenario to use each chunk version.

Notes

Add round trip verification scenario

Ran obnam chunk encode and obnam chunk decode manually to make sure a round trip works.
It doesn't, because decode also outputs the chunk id and label, and outputs the data as a vector of bytes.
Changed decode to output only the data, as binary data, to stdout.
Added an --output option to decode, because Subplot scenarios have trouble with Unix command I/O redirection, until I fix that.
Added a utility module with functions for writing binary data to stdout or a named file. I expect to need these often enough that it's worth having them as separate functions. Also, they're clearer this way.
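A minimal sketch of what such output helpers might look like, using only the standard library. The function names (`write_bytes`, `write_to_stdout`, `write_to_file`, `write_output`) are hypothetical and are not taken from the Obnam code base; the `Option<&Path>` parameter stands in for an optional `--output` value.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Write all bytes to any writer, then flush it.
pub fn write_bytes<W: Write>(mut writer: W, data: &[u8]) -> io::Result<()> {
    writer.write_all(data)?;
    writer.flush()
}

/// Write binary data to stdout, without any text conversion.
pub fn write_to_stdout(data: &[u8]) -> io::Result<()> {
    write_bytes(io::stdout().lock(), data)
}

/// Write binary data to a named file, creating or truncating it.
pub fn write_to_file(filename: &Path, data: &[u8]) -> io::Result<()> {
    write_bytes(File::create(filename)?, data)
}

/// Pick the destination: a named file if one was given, else stdout.
/// (Hypothetical helper for an optional `--output` argument.)
pub fn write_output(output: Option<&Path>, data: &[u8]) -> io::Result<()> {
    match output {
        Some(filename) => write_to_file(filename, data),
        None => write_to_stdout(data),
    }
}
```

A command could then call `write_output(args_output.as_deref(), &data)` for an `Option<PathBuf>` argument, so the choice between stdout and a file lives in one place.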
Added a round trip scenario.
This worked first time, because I cheated, and had verified manually that the round trip works.
Re-based the branch to make the change history a little clearer.

Break

A short break for breakfast.

Drop the `hello` command

The obnam chunk hello command was there as a placeholder so I could add Subplot scaffolding, but it's not useful any more.
Code deletion is fun.

Refactor decoding

For some reason I implemented the chunk decoding in the src/cmd/chunk.rs module, instead of src/chunk.rs where it belongs. Past me was, again, a lazy and careless slob, but they can't be taught, so there's no point in getting angry about it.
Moving that code into the right place is easy.
I see a need to refactor: I'll add a utility function to read a named file. This can already be used in several places, and it's a logical complement to the file writing function.
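A sketch of such a file reading helper, assuming it should return the whole file as bytes and mention the file name in errors. The name `read_file` and the error wording are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read a named file into memory as raw bytes.
/// On failure, keep the original error kind but add the file name
/// to the message, so the caller can report something useful.
pub fn read_file(filename: &Path) -> io::Result<Vec<u8>> {
    fs::read(filename).map_err(|err| {
        io::Error::new(
            err.kind(),
            format!("failed to read {}: {}", filename.display(), err),
        )
    })
}
```

Wrapping `std::fs::read` this way is a small thing, but it means every command reports unreadable files in the same way.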

Chunk inspection

It would be useful to have a command obnam chunk inspect, which reads an encoded chunk and outputs the information in it in a format that's reasonably easy for humans to consume, but particularly easy for programs. JSON is such a format.
The new command is basically just the decode command, except it serializes the decoded chunk as JSON.
The serde-json crate is good for JSON.
However, the output is useless to humans: it's just arrays of small integers, as the clear text chunk fields are byte vectors.
I'll need to make a more fancy output format by defining a new type, struct InspectedChunk, and creating one from a clear text chunk.

~~~rust
pub struct InspectedChunk {
    id: String,
    label: String,
    data: String,
}
~~~
The strings will be lossy conversions from the binary data. That's OK for now.
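The lossy conversion can be sketched with `String::from_utf8_lossy`, which replaces invalid UTF-8 sequences with U+FFFD. The `CleartextChunk` struct here is a hypothetical stand-in with three byte vector fields; the real type may differ, and the serde derive needed for JSON output is left out to keep the sketch free of external crates.

```rust
/// Hypothetical stand-in for the clear text chunk: every field is raw bytes.
pub struct CleartextChunk {
    pub id: Vec<u8>,
    pub label: Vec<u8>,
    pub data: Vec<u8>,
}

/// Human-friendly view of a chunk, meant for serializing as JSON.
#[derive(Debug)]
pub struct InspectedChunk {
    id: String,
    label: String,
    data: String,
}

/// Convert bytes to a string, replacing invalid UTF-8 with U+FFFD.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

impl From<&CleartextChunk> for InspectedChunk {
    fn from(chunk: &CleartextChunk) -> Self {
        Self {
            id: lossy(&chunk.id),
            label: lossy(&chunk.label),
            data: lossy(&chunk.data),
        }
    }
}
```

The conversion cannot fail, which suits a troubleshooting command, but the original bytes cannot be recovered from the output.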

Chunk versioning

I can't always predict the future, but some things are obviously very likely. For example, I expect to need to make changes to the structure of the clear text chunk. I don't know what they'll be (or else I'd make them now), but I can prepare for the highly likely.
Specifically, I can make the chunk be versioned.

~~~rust
pub enum CleartextChunk {
    V1(CleartextChunkV1),
}
~~~

However, I don't want every user of chunks to have to match on the enum; that's just too tedious. So I'll add methods to access the various information in a chunk.
However squared, those methods can't always succeed: a future version of the chunk might not have the information in the first version, so the methods will need to return optional values.
Also, the inspect command should show the version.
The postcard crate will encode the enum variant in its wire format, so we don't need to deal with that explicitly.
I don't think I care to implement versioning for chunk ids and labels. They're quite simple, but if they ever need to change, I'll add versioned new variants then.
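The accessor idea can be sketched as follows. The fields of `CleartextChunkV1` are assumed, and `CleartextChunkV2` is entirely hypothetical: it is invented here, without a label, only to show why the accessors return optional values.

```rust
/// Assumed shape of the first chunk version.
pub struct CleartextChunkV1 {
    pub id: Vec<u8>,
    pub label: Vec<u8>,
    pub data: Vec<u8>,
}

/// Hypothetical second version that no longer carries a label.
pub struct CleartextChunkV2 {
    pub id: Vec<u8>,
    pub data: Vec<u8>,
}

pub enum CleartextChunk {
    V1(CleartextChunkV1),
    V2(CleartextChunkV2),
}

impl CleartextChunk {
    /// Version number, for example for the inspect command to show.
    pub fn version(&self) -> u8 {
        match self {
            Self::V1(_) => 1,
            Self::V2(_) => 2,
        }
    }

    /// The label, if this version of the chunk has one.
    pub fn label(&self) -> Option<&[u8]> {
        match self {
            Self::V1(chunk) => Some(chunk.label.as_slice()),
            Self::V2(_) => None,
        }
    }

    /// The data, if this version of the chunk has any.
    pub fn data(&self) -> Option<&[u8]> {
        match self {
            Self::V1(chunk) => Some(chunk.data.as_slice()),
            Self::V2(chunk) => Some(chunk.data.as_slice()),
        }
    }
}
```

Callers only match on the enum inside these methods; everywhere else they deal with an `Option` and decide for themselves what a missing field means.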

Magic cookie

I am now reminded that an encoded chunk can't readily be identified as an Obnam chunk, without trying to decode it.
The usual solution for this is to add a magic cookie.
I can prepend the cookie to the postcard output, or I can include it in the data that postcard serializes.
If I include it in the data, I'm stuck with using postcard forever, so I'll prepend it.
I recently saw https://hackers.town/@zwol/114155595855705796, where the author makes suggestions for magic cookies:
- MUST be the very first N bytes in the file
- MUST be at least four bytes long, eight is better
- MUST include at least one byte with the high bit set
- MUST include a byte sequence that is invalid UTF-8
- SHOULD include a zero byte, but you can usually get away with having that be part of the overall version number that immediately follows the magic number (did I mention that you really SHOULD put an overall version number right after the magic number, unless you know and have documented exactly why it's not necessary, e.g. PNG?)
I want the Obnam magic cookie to include "Obnam".
Using the above, a first draft of a magic cookie for Obnam chunks:
- the string "Obnam"
- one or more bytes for the encoding version
- a byte 0xFF
- a byte 0x00
I'll start with this, and make changes as needed later.
Prefixing the encoding with a cookie is easy enough, as is checking that the cookie is there when decoding.
Added a second cookie to make sure I handle the cookies correctly everywhere. The "cookie 0" encoding uses JSON instead of Postcard.
Also, "inspect" needs to include the encoding version. That means the function to decode needs to return it.
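A sketch of prepending and checking the draft cookie, assuming a single byte for the encoding version (the draft allows more). The names `add_cookie`, `split_cookie` and `CookieError` are hypothetical; the payload would be whatever the serializer for that version produced.

```rust
const COOKIE_PREFIX: &[u8] = b"Obnam";
const COOKIE_SUFFIX: &[u8] = &[0xFF, 0x00];
/// "Obnam" (5 bytes) + version (1 byte, an assumption) + 0xFF + 0x00.
const COOKIE_LEN: usize = 8;

#[derive(Debug, PartialEq)]
pub enum CookieError {
    /// Input is shorter than the cookie itself.
    TooShort,
    /// Input does not start with an Obnam cookie.
    NotObnam,
}

/// Prepend the magic cookie for the given encoding version to a payload.
pub fn add_cookie(version: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(COOKIE_LEN + payload.len());
    out.extend_from_slice(COOKIE_PREFIX);
    out.push(version);
    out.extend_from_slice(COOKIE_SUFFIX);
    out.extend_from_slice(payload);
    out
}

/// Check the cookie and return the encoding version and the payload after it.
pub fn split_cookie(encoded: &[u8]) -> Result<(u8, &[u8]), CookieError> {
    if encoded.len() < COOKIE_LEN {
        return Err(CookieError::TooShort);
    }
    let (cookie, payload) = encoded.split_at(COOKIE_LEN);
    if !cookie.starts_with(COOKIE_PREFIX) || !cookie.ends_with(COOKIE_SUFFIX) {
        return Err(CookieError::NotObnam);
    }
    Ok((cookie[COOKIE_PREFIX.len()], payload))
}
```

Because `split_cookie` returns the version, the decoder can pick the matching deserializer and the inspect command can report the version without decoding anything twice.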

Merge

I'm merging this branch now. It's a big chunk of work. I really should have merged the branch last week already, and made a new branch today. In the future I shall merge every time.

Summary

I feel I've made good progress today. I reached today's goal: Obnam can now encode data as a chunk, decode that, and additionally has a useful helper command to inspect an encoded chunk.

Comments?

If you have feedback on this development session, please use the following fediverse thread: https://toot.liw.fi/@liw/114210761810311968