Goal
The goal of today is to finish the commands to encode and decode chunks without encryption. I started this last week, but ran out of time to finish it. I have a tendency to underestimate how long things take.
- I will add a verification scenario to ensure a round trip of encode-decode works correctly.
- I will refactor the code to move the decoding (by calling `postcard::from_bytes`) into the library part from the command part.
- I will add a command `chunk inspect`, which decodes a chunk and outputs the decoded chunk in a structured format such as JSON. This will help future me troubleshoot problems more easily.
- I will add versioning to the clear text chunks, and have at least two versions.
The acceptance criterion for today is that all of the above works and has automated tests.
Plan
- Add round trip verification scenario.
- Remove the `chunk hello` command.
- Refactor decoding.
- Implement a first version of `chunk inspect`, with a verification scenario.
- Add clear text chunk versioning.
- Change `chunk encode` to allow choosing what chunk version to use.
- Change the round trip verification scenario to use each chunk version.
Notes
Add round trip verification scenario
- Ran `obnam chunk encode` and `obnam chunk decode` manually to make sure a round trip works.
- It doesn't, because `decode` also outputs the chunk id and label, and outputs the data as a vector of bytes.
- Changed `decode` to output only the data, as binary data, to stdout.
- Added an `--output` option to `decode`, because Subplot scenarios have trouble with Unix command I/O redirection, until I fix that.
, because Subplot scenarios have trouble with Unix command I/O redirection, until I fix that. - Added a utility module with functions for writing binary data to stdout or a named file. I expect to need these often enough that it's worth having them as separate functions. Also, they're clearer this way.
- Added a round trip scenario.
- This worked first time, because I cheated, and had verified manually that the round trip works.
- Re-based the branch to make the change history a little clearer.
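The utility functions for writing binary data could look roughly like the following. This is a minimal sketch using only the standard library; the names `write_to_stdout`, `write_to_file`, and `write_output` are made up for illustration and are not taken from the Obnam code.

```rust
use std::io::{self, Write};
use std::path::Path;

/// Write `data` unchanged to stdout. (Hypothetical helper name.)
pub fn write_to_stdout(data: &[u8]) -> io::Result<()> {
    let stdout = io::stdout();
    let mut handle = stdout.lock();
    handle.write_all(data)?;
    handle.flush()
}

/// Write `data` unchanged to the named file, replacing any old content.
/// (Hypothetical helper name.)
pub fn write_to_file(filename: &Path, data: &[u8]) -> io::Result<()> {
    std::fs::write(filename, data)
}

/// Write to the named file if one is given, otherwise to stdout. This is
/// the choice an `--output` option would make.
pub fn write_output(output: Option<&Path>, data: &[u8]) -> io::Result<()> {
    match output {
        Some(filename) => write_to_file(filename, data),
        None => write_to_stdout(data),
    }
}
```

With helpers like these, a command only needs to pass along its optional `--output` value and the bytes to write.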
Break
- A short break for breakfast.
Drop the `hello` command
- The `obnam chunk hello` command was there as a placeholder so I could add Subplot scaffolding, but it's not useful any more.
- Code deletion is fun.
Refactor decoding
- For some reason I implemented the chunk decoding in the `src/cmd/chunk.rs` module, instead of `src/chunk.rs` where it belongs. Past me was, again, a lazy and careless slob, but they can't be taught, so there's no point in getting angry about it.
- Moving that code into the right place is easy.
- I see a need to refactor: I'll add a utility function to read a named file. This can already be used in several places, and it's a logical complement to the file writing function.
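A file reading helper to complement the writing functions might be as small as this. It is a sketch under assumptions: the name `read_file` and the wording of the error message are invented here, and the real code may report errors differently.

```rust
use std::io;
use std::path::Path;

/// Read the whole named file into memory. On failure, keep the original
/// error kind but mention the file name in the message.
/// (Hypothetical helper name and message.)
pub fn read_file(filename: &Path) -> io::Result<Vec<u8>> {
    std::fs::read(filename).map_err(|err| {
        io::Error::new(
            err.kind(),
            format!("failed to read {}: {}", filename.display(), err),
        )
    })
}
```

Including the file name in the error is the main reason to wrap `std::fs::read` at all: the plain error does not say which file was the problem.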
Chunk inspection
- It would be useful to have a command `obnam chunk inspect`, which reads an encoded chunk and outputs the information in it in a format that's reasonably easy for humans to consume, but particularly easy for programs. JSON is such a format.
- The new command is basically just the `decode` command, except it serializes the decoded chunk as JSON.
- The `serde_json` crate is good for JSON.
- However, the output is useless to humans: it's just arrays of small integers, as the clear text chunk fields are byte vectors.
- I'll need to make a more fancy output format by defining a new type, `struct InspectedChunk`, and creating one from a clear text chunk.

```rust
pub struct InspectedChunk {
    id: String,
    label: String,
    data: String,
}
```

- The strings will be lossy conversions from the binary data. That's OK for now.
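The lossy conversion can be sketched with `String::from_utf8_lossy` from the standard library. The `CleartextChunk` struct below is an assumed stand-in with three byte-vector fields, not the real type, and the fields are made public only to keep the example short.

```rust
/// Assumed stand-in for the clear text chunk: three byte vectors.
pub struct CleartextChunk {
    pub id: Vec<u8>,
    pub label: Vec<u8>,
    pub data: Vec<u8>,
}

/// Human-readable view of a chunk, for `inspect` to serialize.
#[derive(Debug, PartialEq)]
pub struct InspectedChunk {
    pub id: String,
    pub label: String,
    pub data: String,
}

impl From<&CleartextChunk> for InspectedChunk {
    fn from(chunk: &CleartextChunk) -> Self {
        // Invalid UTF-8 sequences become U+FFFD, so this loses
        // information for arbitrary binary data.
        Self {
            id: String::from_utf8_lossy(&chunk.id).into_owned(),
            label: String::from_utf8_lossy(&chunk.label).into_owned(),
            data: String::from_utf8_lossy(&chunk.data).into_owned(),
        }
    }
}
```

Because the conversion is lossy, the inspected form is only suitable for looking at a chunk, not for reconstructing it.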
Chunk versioning
- I can't always predict the future, but some things are obviously very likely. For example, I expect to need to make changes to the structure of the clear text chunk. I don't know what they'll be (or else I'd make them now), but I can prepare for the highly likely.
- Specifically, I can make the chunk be versioned.

```rust
pub enum CleartextChunk {
    V1(CleartextChunkV1),
}
```

- However, I don't want every user of chunks to have to match on the enum; that's just too tedious. So I'll add methods to access the various information in a chunk.
- However squared, those methods can't always succeed: a future version of the chunk might not have the information in the first version, so the methods will need to return optional values.
- Also, the `inspect` command should show the version.
- The `postcard` crate will encode the enum variant in its wire format, so we don't need to deal with that explicitly.
- I don't think I care to implement versioning for chunk ids and labels. They're quite simple, but if they ever need to change, I'll add versioned new variants then.
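The accessor methods on the versioned enum might look like this. It is a sketch, not the actual code: the fields of `CleartextChunkV1` are assumed, and `CleartextChunkV2` is a made-up second version that lacks a label, included only to show why the accessors return `Option`.

```rust
/// Version 1 of the clear text chunk. The fields are an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct CleartextChunkV1 {
    pub id: Vec<u8>,
    pub label: Vec<u8>,
    pub data: Vec<u8>,
}

/// Hypothetical version 2 without a label, invented for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct CleartextChunkV2 {
    pub id: Vec<u8>,
    pub data: Vec<u8>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum CleartextChunk {
    V1(CleartextChunkV1),
    V2(CleartextChunkV2),
}

impl CleartextChunk {
    /// Version number, so that `inspect` can show it.
    pub fn version(&self) -> u8 {
        match self {
            Self::V1(_) => 1,
            Self::V2(_) => 2,
        }
    }

    /// The label, if this version of the chunk has one.
    pub fn label(&self) -> Option<&[u8]> {
        match self {
            Self::V1(chunk) => Some(chunk.label.as_slice()),
            Self::V2(_) => None,
        }
    }

    /// The data, if this version of the chunk has any.
    pub fn data(&self) -> Option<&[u8]> {
        match self {
            Self::V1(chunk) => Some(chunk.data.as_slice()),
            Self::V2(chunk) => Some(chunk.data.as_slice()),
        }
    }
}
```

Callers use `chunk.label()` or `chunk.data()` and handle `None`, without ever matching on the enum themselves. Only the `impl` block needs to change when a new version is added.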
Magic cookie
- I am now reminded that an encoded chunk can't readily be identified as an Obnam chunk, without trying to decode it.
- The usual solution for this is to add a magic cookie.
- I can prepend the cookie to the `postcard` output, or I can include it in the data that `postcard` serializes.
- If I include it in the data, I'm stuck with using `postcard` forever, so I'll prepend it.
- I recently saw https://hackers.town/@zwol/114155595855705796, where the author makes suggestions for magic cookies:
- MUST be the very first N bytes in the file
- MUST be at least four bytes long, eight is better
- MUST include at least one byte with the high bit set
- MUST include a byte sequence that is invalid UTF-8
- SHOULD include a zero byte, but you can usually get away with having that be part of the overall version number that immediately follows the magic number (did I mention that you really SHOULD put an overall version number right after the magic number, unless you know and have documented exactly why it's not necessary, e.g. PNG?)
- I want the Obnam magic cookie to include "Obnam".
- Using the above, a first draft of a magic cookie for Obnam chunks:
- the string "Obnam"
- one or more bytes for the encoding version
- a byte 0xFF
- a byte 0x00
- I'll start with this, and make changes as needed later.
- Prefixing the encoding with a cookie is easy enough, as is checking that the cookie is there when decoding.
- Added a second cookie to make sure I handle the cookies correctly everywhere. The "cookie 0" encoding uses JSON instead of Postcard.
- Also, "inspect" needs to include the encoding version. That means the function to decode needs to return it.
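Prepending and checking the cookie could be sketched as follows. The layout follows the first draft above, but using exactly one byte for the encoding version is an assumption, and the function names `add_cookie` and `split_cookie` are invented for this example.

```rust
/// Fixed text at the start of every cookie.
const COOKIE_PREFIX: &[u8] = b"Obnam";
/// Bytes after the version: 0xFF is never valid in UTF-8 and has the
/// high bit set, and 0x00 is the zero byte.
const COOKIE_SUFFIX: &[u8] = &[0xFF, 0x00];

/// Prepend a cookie for encoding `version` to an already serialized chunk.
/// Assumes the version fits in a single byte.
pub fn add_cookie(version: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(
        COOKIE_PREFIX.len() + 1 + COOKIE_SUFFIX.len() + payload.len(),
    );
    out.extend_from_slice(COOKIE_PREFIX);
    out.push(version);
    out.extend_from_slice(COOKIE_SUFFIX);
    out.extend_from_slice(payload);
    out
}

/// Check the cookie and return the encoding version and the bytes after
/// the cookie. Returns `None` if the data does not start with a cookie.
pub fn split_cookie(encoded: &[u8]) -> Option<(u8, &[u8])> {
    let rest = encoded.strip_prefix(COOKIE_PREFIX)?;
    let (&version, rest) = rest.split_first()?;
    let payload = rest.strip_prefix(COOKIE_SUFFIX)?;
    Some((version, payload))
}
```

The decoder would call `split_cookie` first, pick JSON or Postcard based on the returned version, and pass that version on so that `inspect` can report it.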
Merge
I'm merging this branch now. It's a big chunk of work. I really should have merged the branch last week already, and made a new branch today. In the future I shall merge every time.
Summary
I feel I've made good progress today. I reached today's goal: Obnam can now encode data as a chunk, decode that, and additionally has a useful helper command to inspect an encoded chunk.
Comments?
If you have feedback on this development session, please use the following fediverse thread: https://toot.liw.fi/@liw/114210761810311968