Goal

The goal for today is to make the server persist chunks on disk. This continues from last time.

Plan

Change the server API for uploading a chunk so that it accepts just the chunk, and parse the chunk to get its metadata. Currently the request gives the metadata separately.
Change the server to store chunks using the client's backup repository code. This will give us persistence.
Add a verification scenario for server persistence.

Notes

I want to change the API for uploading a chunk to be only PUT /chunks, with the body of the request containing the chunk. The change is to drop the chunk ID and label from the request, since the server can extract those from the chunk. This prevents the mistake where the client provides the wrong metadata for a chunk. It may be an unlikely mistake, but I'd still like to avoid it. Parsing the chunk should be pretty fast.
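A minimal sketch of what that extraction could look like. The chunk layout here (a 16-byte ID, a one-byte label length, the label as UTF-8, then the data) is invented for illustration and is not Obnam's actual encoding; ChunkMeta, encode_chunk, and parse_chunk_meta are hypothetical names.

```rust
/// Metadata the server would derive from the chunk itself instead of
/// trusting separate values sent by the client.
#[derive(Debug, PartialEq)]
struct ChunkMeta {
    id: [u8; 16],
    label: String,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    TooShort,
    LabelNotUtf8,
}

/// Hypothetical layout: 16-byte ID, 1-byte label length, label, data.
fn encode_chunk(id: [u8; 16], label: &str, data: &[u8]) -> Vec<u8> {
    let label_len = u8::try_from(label.len()).expect("label longer than 255 bytes");
    let mut out = Vec::with_capacity(17 + label.len() + data.len());
    out.extend_from_slice(&id);
    out.push(label_len);
    out.extend_from_slice(label.as_bytes());
    out.extend_from_slice(data);
    out
}

/// Split an uploaded chunk into its metadata and its data.
fn parse_chunk_meta(chunk: &[u8]) -> Result<(ChunkMeta, &[u8]), ParseError> {
    if chunk.len() < 17 {
        return Err(ParseError::TooShort);
    }
    let mut id = [0u8; 16];
    id.copy_from_slice(&chunk[..16]);
    let label_len = chunk[16] as usize;
    let rest = &chunk[17..];
    if rest.len() < label_len {
        return Err(ParseError::TooShort);
    }
    let label = std::str::from_utf8(&rest[..label_len])
        .map_err(|_| ParseError::LabelNotUtf8)?
        .to_string();
    Ok((ChunkMeta { id, label }, &rest[label_len..]))
}

fn main() {
    let body = encode_chunk([1; 16], "example-label", b"chunk data");
    // With PUT /chunks, the server derives id and label from the body,
    // so they cannot disagree with the chunk.
    match parse_chunk_meta(&body) {
        Ok((meta, data)) => println!("{meta:?}, {} data bytes", data.len()),
        Err(e) => println!("rejected upload: {e:?}"),
    }
}
```

Note that parse_chunk_meta hard-codes the chunk layout, so in this design the server has to know how chunks are encoded.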
I should benchmark chunk parsing to check that. I can't get cargo bench to do anything useful, and its benchmark support may not be in stable Rust yet, so I'll implement the benchmark as an example program instead.
I made the benchmark as obnam/examples/parse-chunk.rs. It's not a sophisticated benchmark, but it'll do for now. Running it with hyperfine:

$ hyperfine '/scratch/cargo-cache/release/examples/parse-chunk 100000'
Benchmark 1: /scratch/cargo-cache/release/examples/parse-chunk 100000
  Time (mean ± σ):      10.5 ms ±   0.7 ms    [User: 9.8 ms, System: 0.6 ms]
  Range (min … max):     9.3 ms …  15.4 ms    264 runs

My configuration puts binaries in /scratch/cargo-cache. This binary was built in release mode.

Parsing a chunk is plenty fast enough, I think.
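The benchmark example itself isn't reproduced here. As a sketch of the general pattern, an example program takes an iteration count from the command line, repeats the work that many times, and uses std::hint::black_box to keep the compiler from optimizing the work away. The workload below, decoding a 16-byte ID from a buffer, is a stand-in and not Obnam's chunk parser; parse_id and run are hypothetical names. If the 100000 argument above is the number of parses, a mean of 10.5 ms works out to at most about 105 ns per parse (10.5 ms / 100,000), since that time also covers process startup.

```rust
use std::env;
use std::hint::black_box;
use std::time::Instant;

/// Stand-in for "parse a chunk": read a 16-byte little-endian ID.
fn parse_id(buf: &[u8]) -> Option<u128> {
    let bytes: [u8; 16] = buf.get(..16)?.try_into().ok()?;
    Some(u128::from_le_bytes(bytes))
}

/// Run parse_id n times and return how many parses succeeded.
fn run(n: u64, buf: &[u8]) -> u64 {
    let mut ok = 0;
    for _ in 0..n {
        // black_box hides the input from the optimizer so the loop isn't removed.
        if parse_id(black_box(buf)).is_some() {
            ok += 1;
        }
    }
    ok
}

fn main() {
    // Iteration count from the command line, defaulting to 100000.
    let n: u64 = env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(100_000);
    let buf = [1u8; 32];
    let start = Instant::now();
    let ok = run(n, &buf);
    let elapsed = start.elapsed();
    println!("{ok} parses in {elapsed:?}");
}
```

Placed in a crate's examples directory, such a program can be built with cargo build --release --example and then timed as a whole process with hyperfine, as above.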
However, after doing all that, I realize that making the server parse the chunk introduces a dependency on how chunks are encoded: the client and server need to agree on the encoding. If it ever changes in the client, the server needs to be updated as well. I don't like that. If I keep the chunk data as an opaque blob and give the metadata to the server separately, Obnam users will have less need to upgrade client and server in sync.
While I will want to change the chunk encoding as rarely as possible, I'm sure it will need to be changed.
The metadata may also need to change. Maybe ID and label won't be enough some day?
Both kinds of changes will hopefully be rare, but metadata changes feel like they'd be rarer.
Change is constant and unpredictable. I have to choose which I value more:
- the type safety and mistake prevention that I get from parsing the chunk in the server, which means fewer bugs
- reducing the need for Obnam clients and servers to be upgraded together
I'm going to choose to decouple the client and server. I'll rely on careful testing to avoid bugs.
So I won't change the API for uploading a chunk.
It also means that, in the long run, I'll need code that stores chunks on disk without understanding their encoding. The current repository code in the client uses the client's Chunk type. That's fine for the client, but it means that, in the long run, I won't be able to reuse that code in the server.
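A minimal sketch of what an encoding-agnostic store could look like. BlobStore, its one-file-per-chunk layout, the hex-encoded file names, and the separate label file are all hypothetical; the point is only that the store treats chunk data as opaque bytes and takes the metadata from its caller.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stores chunks as opaque byte blobs; it never inspects their encoding.
struct BlobStore {
    dir: PathBuf,
}

impl BlobStore {
    fn new(dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(dir)?;
        Ok(BlobStore { dir: dir.to_path_buf() })
    }

    /// Hypothetical naming scheme: the ID bytes as lowercase hex.
    fn base_name(id: &[u8]) -> String {
        id.iter().map(|b| format!("{b:02x}")).collect()
    }

    /// Write the blob and its label; the metadata comes from the caller.
    fn put(&self, id: &[u8], label: &str, data: &[u8]) -> io::Result<()> {
        let name = Self::base_name(id);
        fs::write(self.dir.join(format!("{name}.data")), data)?;
        fs::write(self.dir.join(format!("{name}.label")), label)?;
        Ok(())
    }

    /// Return the label and data, or None if no such chunk exists.
    fn get(&self, id: &[u8]) -> io::Result<Option<(String, Vec<u8>)>> {
        let name = Self::base_name(id);
        let data = match fs::read(self.dir.join(format!("{name}.data"))) {
            Ok(d) => d,
            Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
            Err(e) => return Err(e),
        };
        let label = fs::read_to_string(self.dir.join(format!("{name}.label")))?;
        Ok(Some((label, data)))
    }
}

fn main() -> io::Result<()> {
    let store = BlobStore::new(&std::env::temp_dir().join("blob-store-demo"))?;
    store.put(&[0xab, 0xcd], "example-label", b"opaque bytes")?;
    println!("{:?}", store.get(&[0xab, 0xcd])?);
    Ok(())
}
```

A real implementation would also need to care about crash safety, for example by writing to a temporary file and renaming it into place.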
For today, I could parse the uploaded blob as a Chunk to get a trial version of persistence working, as a prototype. On the other hand, since I know that would be temporary, I could instead start a new module that deals with untyped chunks.
I choose the prototype, so that I sooner reach a point where I have persistence and can verify it with Subplot scenarios. In the next development session I can start writing the new module. It'll need to be shared between client and server, and I'll want benchmarks for it, so it'll be more than a quick hack.
I need to change the server's internal abstraction for the backup repository to be fallible. Currently it isn't, because everything is in memory, but once things are stored on disk, operations can fail. At the HTTP API level, I'll change all endpoints to return a 500 error if there's a repository error.
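A sketch of that fallible abstraction, independent of any particular HTTP framework: the repository trait returns Result even for the in-memory version, and a handler maps any repository error to status 500. Repository, RepoError, MemoryRepo, BrokenRepo, and handle_get_chunk are illustrative names, and returning 404 for a missing chunk is an assumption rather than something stated above.

```rust
use std::collections::HashMap;
use std::fmt;
use std::io;

/// Errors the repository can report once it is backed by disk.
#[derive(Debug)]
enum RepoError {
    Io(io::Error),
}

impl fmt::Display for RepoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RepoError::Io(e) => write!(f, "repository I/O error: {e}"),
        }
    }
}

/// Every operation returns a Result, even when it cannot actually fail.
trait Repository {
    fn get_chunk(&self, id: &str) -> Result<Option<Vec<u8>>, RepoError>;
    fn put_chunk(&mut self, id: &str, data: Vec<u8>) -> Result<(), RepoError>;
}

struct MemoryRepo {
    chunks: HashMap<String, Vec<u8>>,
}

impl Repository for MemoryRepo {
    fn get_chunk(&self, id: &str) -> Result<Option<Vec<u8>>, RepoError> {
        Ok(self.chunks.get(id).cloned())
    }
    fn put_chunk(&mut self, id: &str, data: Vec<u8>) -> Result<(), RepoError> {
        self.chunks.insert(id.to_string(), data);
        Ok(())
    }
}

/// Always fails, standing in for a disk that has gone wrong.
struct BrokenRepo;

impl Repository for BrokenRepo {
    fn get_chunk(&self, _id: &str) -> Result<Option<Vec<u8>>, RepoError> {
        Err(RepoError::Io(io::Error::new(io::ErrorKind::Other, "simulated disk failure")))
    }
    fn put_chunk(&mut self, _id: &str, _data: Vec<u8>) -> Result<(), RepoError> {
        Err(RepoError::Io(io::Error::new(io::ErrorKind::Other, "simulated disk failure")))
    }
}

/// Framework-independent handler: returns (HTTP status, response body).
fn handle_get_chunk(repo: &dyn Repository, id: &str) -> (u16, Vec<u8>) {
    match repo.get_chunk(id) {
        Ok(Some(data)) => (200, data),
        Ok(None) => (404, Vec::new()),
        Err(e) => {
            // Log the details on the server; the client only sees a 500.
            eprintln!("{e}");
            (500, Vec::new())
        }
    }
}

fn main() {
    let mut repo = MemoryRepo { chunks: HashMap::new() };
    repo.put_chunk("abc", b"data".to_vec()).expect("in-memory put cannot fail");
    println!("{}", handle_get_chunk(&repo, "abc").0);
    println!("{}", handle_get_chunk(&BrokenRepo, "abc").0);
}
```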
To implement persistence, the first step is to add the repository directory to the server configuration. When the server is initialized, its repository needs to be created. When serving requests, the repository needs to be opened. So far, so easy.
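The init/serve split might look roughly like this, assuming the configuration holds a single repository directory path. ServerConfig, init_repository, open_repository, and the chunks subdirectory are hypothetical; so is the choice to refuse initializing over an existing repository. How Obnam's server actually reads its configuration is not shown.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical server configuration: only the new field is shown.
struct ServerConfig {
    repository: PathBuf,
}

/// Called once when the server is initialized: create the repository.
/// fs::create_dir fails if the directory exists, so an existing
/// repository is never silently reused or overwritten.
fn init_repository(config: &ServerConfig) -> io::Result<()> {
    fs::create_dir(&config.repository)?;
    fs::create_dir(config.repository.join("chunks"))?;
    Ok(())
}

/// Called when serving requests: open an existing repository.
fn open_repository(config: &ServerConfig) -> io::Result<PathBuf> {
    let chunks = config.repository.join("chunks");
    if !chunks.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("no repository at {}", config.repository.display()),
        ));
    }
    Ok(chunks)
}

fn main() -> io::Result<()> {
    let config = ServerConfig {
        repository: std::env::temp_dir().join("obnam-server-demo-repo"),
    };
    if open_repository(&config).is_err() {
        init_repository(&config)?;
    }
    let chunks = open_repository(&config)?;
    println!("serving chunks from {}", chunks.display());
    Ok(())
}
```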
Actually serving requests means dealing with Id values now being base64 encoded. I can no longer just use /chunks/1; I need to use /chunks/AQAAAAAAAAAAAAAAAAAAAA instead. I need to add tooling to obnam to generate these.
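A sketch of what such a helper could compute. The example ID decodes to the byte 0x01 followed by fifteen zero bytes, which is consistent with a 16-byte ID holding the number 1 in little-endian order, encoded as base64 without padding. That reading is an assumption, as is the URL-safe alphabet; for this example the URL-safe and standard alphabets give the same result. encode_id and chunk_path are hypothetical names, and the encoder is written out by hand to avoid assuming any particular crate's API.

```rust
/// URL-safe base64 alphabet (RFC 4648, section 5). Whether Obnam uses this
/// or the standard alphabet is an assumption; they differ only in the last
/// two characters.
const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode bytes as base64 without '=' padding.
fn base64_unpadded(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 4 + 2) / 3);
    for group in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit number, zero-filling the tail.
        let b0 = group[0] as u32;
        let b1 = group.get(1).copied().unwrap_or(0) as u32;
        let b2 = group.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        // k input bytes produce k + 1 output characters when unpadded.
        for i in 0..group.len() + 1 {
            let index = (n >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[index as usize] as char);
        }
    }
    out
}

/// Hypothetical: a chunk ID as a 128-bit number stored little-endian.
fn encode_id(id: u128) -> String {
    base64_unpadded(&id.to_le_bytes())
}

fn chunk_path(id: u128) -> String {
    format!("/chunks/{}", encode_id(id))
}

fn main() {
    // Print the path for the ID given on the command line, defaulting to 1.
    let id: u128 = std::env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(1);
    println!("{}", chunk_path(id));
}
```

For ID 1, the first three bytes 0x01 0x00 0x00 encode as AQAA, and every following zero byte contributes only A characters, giving 22 characters for the 16 bytes.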
Hmm. This is taking too long. Today's session is already an hour over time, and I'm not going to get this done today. In hindsight, it might have been better to skip the prototype phase. It requires too many changes that are needed only to adapt the server to the existing client backup repository implementation. Live and learn.

Summary

Made some progress towards adding a persistent backup repository to the server, but also wasted some time on an evolutionary dead end. Next time I'll start over with a fresh repository implementation that avoids client-specific types.

Comments?

If you have feedback on this, please use the following fediverse thread: https://toot.liw.fi/@liw/116588760176929111.

If you'd like to fund Obnam development, see my funding page.