Goal

I have at least initial support for chunk encryption working. It is very rudimentary, and there is no management of keys. My plan is to have a special per-client chunk where chunk encryption keys are stored. Before I start work on that, I want a convenient place to store chunks.

Basically, I want to be able to say "decrypt the chunk with this ID using the key stored in the client chunk for such and such client". I don't want to have to specify the filename for the chunk I want to decrypt, or the client chunk.

I will want to be able to specify filenames too, but by default, I want to use chunk IDs.

If this software is ever to become a usable backup application, I will also want to support different locations for storing chunks: for example, the local file system, a remote S3 server, an SFTP server, and maybe others.

The goal for today is to implement the first iteration of an interface for chunk storage. I will implement it only for the local file system. Remote chunk storage will bring in requirements and constraints that I can't foresee, so I'm sure the interface will change. For example, some sort of async behavior ("start uploading chunk, but don't wait until that is done") is likely to be desirable. But it's too early to go deep on that.

However, as it's only an internal interface, changing it is easy. The storage layout needs more thought, though, as it will have to work across many versions of the software.

Plan

  • Sketch a list of anticipated needs and wants for how storage is laid out in local files.
  • Sketch a list of initial needs and wants for the storage interface.
  • Sketch an implementation of a type for local file system storage.
  • Implement a command obnam repo init DIR to initialize a directory as chunk storage.
  • Implement a command obnam repo is DIR to check if a directory is initialized as chunk storage.
  • Implement a command obnam repo list DIR to list IDs of all chunks in chunk storage.
  • Implement a command obnam repo add DIR ID LABEL to add a chunk to storage.
  • Implement a command obnam repo find DIR LABEL to find IDs of chunks with a specific label.
  • Implement a command obnam repo path DIR ID to show the fully qualified path name of a chunk given its ID.
  • Implement a command obnam repo remove DIR ID to remove a chunk.
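Every planned subcommand takes a DIR argument, and they differ only in what else they need. A hypothetical sketch of how those argument shapes could map to a Rust type; the RepoCommand enum and parse_repo_args function are invented here for illustration, and a real command line would more likely be built with an argument-parsing crate:

```rust
/// Hypothetical representation of the planned `obnam repo` subcommands.
#[derive(Debug, PartialEq)]
enum RepoCommand {
    Init { dir: String },
    Is { dir: String },
    List { dir: String },
    Add { dir: String, id: String, label: String },
    Find { dir: String, label: String },
    Path { dir: String, id: String },
    Remove { dir: String, id: String },
}

/// Parse the words that follow `obnam repo`, e.g. `["add", "DIR", "ID", "LABEL"]`.
fn parse_repo_args(args: &[&str]) -> Result<RepoCommand, String> {
    match args {
        ["init", dir] => Ok(RepoCommand::Init { dir: dir.to_string() }),
        ["is", dir] => Ok(RepoCommand::Is { dir: dir.to_string() }),
        ["list", dir] => Ok(RepoCommand::List { dir: dir.to_string() }),
        ["add", dir, id, label] => Ok(RepoCommand::Add {
            dir: dir.to_string(),
            id: id.to_string(),
            label: label.to_string(),
        }),
        ["find", dir, label] => Ok(RepoCommand::Find {
            dir: dir.to_string(),
            label: label.to_string(),
        }),
        ["path", dir, id] => Ok(RepoCommand::Path {
            dir: dir.to_string(),
            id: id.to_string(),
        }),
        ["remove", dir, id] => Ok(RepoCommand::Remove {
            dir: dir.to_string(),
            id: id.to_string(),
        }),
        // Wrong subcommand name or wrong number of arguments.
        _ => Err(format!("unrecognized repo command: {:?}", args)),
    }
}
```

Wrong argument counts fall through to the error case, so each command's arity is checked in one place.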

That's a lot of things to do, but I'll try to keep things simple.

Notes

Storage layout

I will first only consider local file system storage. Basically, a directory inside which all chunks are stored.

A crucial need is for the backup program to know if a directory is used for chunk storage. So there needs to be some way to recognize that, by looking only at the directory. I don't want to keep a central registry of chunk store directories as that would easily get out of sync.

Each chunk will be in its own file named after the ID and with a .chunk suffix. The directory structure will be flat. This doesn't scale up very far, but it will do for this iteration, while I have other things to worry about.
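A minimal sketch of the flat layout, assuming chunk IDs can be treated as plain strings (the real code has its own chunk ID type); chunk_path and list_chunk_ids are hypothetical names. Listing chunks is then just a scan of the directory for files with the .chunk suffix:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File name suffix for chunk files in the store directory.
const CHUNK_SUFFIX: &str = ".chunk";

/// Path of the file holding the chunk with the given ID, in a flat store.
fn chunk_path(dir: &Path, id: &str) -> PathBuf {
    dir.join(format!("{}{}", id, CHUNK_SUFFIX))
}

/// List chunk IDs by scanning the flat directory for `*.chunk` files.
fn list_chunk_ids(dir: &Path) -> io::Result<Vec<String>> {
    let mut ids = Vec::new();
    for entry in fs::read_dir(dir)? {
        let name = entry?.file_name();
        // Skip names that aren't valid UTF-8 or don't end in ".chunk",
        // such as the index file.
        if let Some(id) = name.to_str().and_then(|n| n.strip_suffix(CHUNK_SUFFIX)) {
            ids.push(id.to_string());
        }
    }
    ids.sort(); // read_dir returns entries in an unspecified order
    Ok(ids)
}
```

A flat scan like this reads every directory entry, which is part of why the layout doesn't scale very far.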

For each chunk I need to store the label as well. I need to find chunks using the label. This requires some sort of index, and I'll use SQLite, for convenience.

Thus, a chunk store directory contains:

  • index.sqlite is an SQLite file for looking up chunks by label.
  • ID.chunk is the file for each chunk, where ID is the chunk ID.

This is simple and easy, and will do for the first iteration.
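Since a store must be recognizable by looking only at the directory, one possible check is whether the index file exists. A minimal sketch, assuming the presence of index.sqlite is treated as the marker; it does not verify that the file is a valid SQLite database:

```rust
use std::path::Path;

/// Name of the index file that marks a directory as a chunk store.
const INDEX_FILE: &str = "index.sqlite";

/// Is `dir` initialized for chunk storage? This sketch only checks that
/// the index file exists and is a regular file.
fn is_store(dir: &Path) -> bool {
    dir.join(INDEX_FILE).is_file()
}
```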

Storage interface

I will need the following operations in the interface:

  • is this directory initialized for chunk storage?
  • initialize this directory as chunk storage
  • list chunks
  • add a chunk, with metadata (ID and label)
  • get chunk by ID
  • find IDs of chunks with a given label
  • remove a chunk by ID

As Rust:

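use std::path::Path;

// Interface sketch. Chunk, ChunkId, Label, and StoreError are assumed to be
// defined elsewhere; the method bodies are not written yet.
struct ChunkStore { /* fields to be decided */ }

impl ChunkStore {
    fn is_store(dir: &Path) -> bool { todo!() }
    fn init(dir: &Path) -> Result<Self, StoreError> { todo!() }
    // The chunk carries its own metadata (ID and label).
    fn add_chunk(&self, chunk: &Chunk) -> Result<(), StoreError> { todo!() }
    fn all_chunks(&self) -> Result<Vec<ChunkId>, StoreError> { todo!() }
    fn get_chunk(&self, id: &ChunkId) -> Result<Chunk, StoreError> { todo!() }
    fn find_chunks(&self, label: &Label) -> Result<Vec<ChunkId>, StoreError> { todo!() }
    fn remove_chunk(&self, id: &ChunkId) -> Result<(), StoreError> { todo!() }
}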

Storage implementation

  • I'll use unit tests to verify the store implementation works.
  • I'll use the rusqlite crate to use SQLite. I've used it before and it's still recommended by https://blessed.rs/crates#section-databases-subsection-sql-databases.
  • I don't care about performance today, so I'll use the simplest possible table and no index. When I start caring about performance, I'll want to have benchmarks, and that's too much for today.
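For unit tests of code that uses the store, an in-memory stand-in with the same operations can be handy. A sketch under stated assumptions: MemStore and the placeholder ChunkId, Label, Chunk, and StoreError types are invented here and differ from the real ones. Rows sit in a Vec and every lookup is a linear scan, mirroring a table with no index. It takes &mut self for add and remove for simplicity, where the interface sketch above takes &self:

```rust
/// Placeholder types for this sketch only.
#[derive(Clone, Debug, PartialEq, Eq)]
struct ChunkId(String);
#[derive(Clone, Debug, PartialEq, Eq)]
struct Label(String);
#[derive(Clone, Debug, PartialEq)]
struct Chunk {
    id: ChunkId,
    label: Label,
    data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound(ChunkId),
    Duplicate(ChunkId),
}

/// One "row" per chunk, scanned linearly, like a table with no index.
#[derive(Default)]
struct MemStore {
    rows: Vec<Chunk>,
}

impl MemStore {
    fn add_chunk(&mut self, chunk: &Chunk) -> Result<(), StoreError> {
        if self.rows.iter().any(|c| c.id == chunk.id) {
            return Err(StoreError::Duplicate(chunk.id.clone()));
        }
        self.rows.push(chunk.clone());
        Ok(())
    }

    fn all_chunks(&self) -> Result<Vec<ChunkId>, StoreError> {
        Ok(self.rows.iter().map(|c| c.id.clone()).collect())
    }

    fn get_chunk(&self, id: &ChunkId) -> Result<Chunk, StoreError> {
        self.rows
            .iter()
            .find(|c| &c.id == id)
            .cloned()
            .ok_or_else(|| StoreError::NotFound(id.clone()))
    }

    fn find_chunks(&self, label: &Label) -> Result<Vec<ChunkId>, StoreError> {
        Ok(self.rows.iter().filter(|c| &c.label == label).map(|c| c.id.clone()).collect())
    }

    fn remove_chunk(&mut self, id: &ChunkId) -> Result<(), StoreError> {
        let pos = self
            .rows
            .iter()
            .position(|c| &c.id == id)
            .ok_or_else(|| StoreError::NotFound(id.clone()))?;
        self.rows.remove(pos);
        Ok(())
    }
}
```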

Interlude

  • Oops, I hadn't removed the old chunk.rs module, and was confused about where the encryption changes I made last week were, until I found chunk2.rs.
  • Made a quick patch to rename chunk2 to chunk. Merged that, and rebased my chunk store branch.
  • Phew! A moment of panic there.

Storage implementation, continued

  • API design question: should find_chunks return chunk IDs, or full chunk metadata? For now, the metadata is the ID and the label. Can I think of a situation where I have the ID and want the label, but would want to avoid getting the whole chunk? Well, I'll need the label to decrypt, so I'll return the full metadata. That means I'll need to store the metadata in the index database. I could even use SQLite's JSON support, but I won't today, to save time.
  • I'll store the chunk ID and metadata as blobs. SQLite can handle them, and that avoids any loss of accuracy. Converting to text would lose accuracy.
  • However, converting to a blob is fallible. This should never fail, as both the ID and the label are already blobs, but I use postcard::to_allocvec to do the serialization, and that is fallible. That's not a big deal, just one more failure to handle.
  • This was all straightforward, because I'd written similar code before.
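A sketch of the metadata-as-blob idea, showing how a fallible conversion becomes one more StoreError handled with `?`. To stay dependency-free it swaps postcard for a hand-rolled encoding (each field as a 4-byte little-endian length followed by its bytes), which is not postcard's format; ChunkMetadata, StoreError, to_blob, and from_blob are hypothetical names:

```rust
use std::convert::TryFrom;

/// Metadata returned by find_chunks in this sketch: ID and label as raw bytes.
#[derive(Debug, Clone, PartialEq)]
struct ChunkMetadata {
    id: Vec<u8>,
    label: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum StoreError {
    Encode(String),
    Decode(String),
}

/// Encode metadata as one blob. Fails if a field is longer than u32::MAX bytes.
fn to_blob(meta: &ChunkMetadata) -> Result<Vec<u8>, StoreError> {
    let mut blob = Vec::new();
    for field in [&meta.id, &meta.label] {
        let len = u32::try_from(field.len()).map_err(|e| StoreError::Encode(e.to_string()))?;
        blob.extend_from_slice(&len.to_le_bytes());
        blob.extend_from_slice(field);
    }
    Ok(blob)
}

/// Decode a blob produced by to_blob, rejecting truncated or trailing data.
fn from_blob(blob: &[u8]) -> Result<ChunkMetadata, StoreError> {
    let mut rest = blob;
    let mut fields = Vec::new();
    for _ in 0..2 {
        if rest.len() < 4 {
            return Err(StoreError::Decode("truncated length".to_string()));
        }
        let len = u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        let body = &rest[4..];
        if body.len() < len {
            return Err(StoreError::Decode("truncated field".to_string()));
        }
        fields.push(body[..len].to_vec());
        rest = &body[len..];
    }
    if !rest.is_empty() {
        return Err(StoreError::Decode("trailing bytes".to_string()));
    }
    // Fields were pushed in order (id, label), so pop them in reverse.
    let label = fields.pop().unwrap_or_default();
    let id = fields.pop().unwrap_or_default();
    Ok(ChunkMetadata { id, label })
}
```

Storing the result as a BLOB column keeps the bytes exact, which matches the reasoning for not converting to text.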

Subcommands

  • I ran out of time, so I'll continue from here next time.

Summary

Today's goal was ambitious and I didn't reach it, but only because I ran out of time. I don't want to do too much in one session. Most of the useful work today was thinking, but I did get a basic chunk store implemented using local files.

Comments?

If you have feedback on this development session, please use the following fediverse thread: https://toot.liw.fi/@liw/114527569737306660