Goal
I have at least initial support for chunk encryption working. It is very rudimentary, and there is no management of keys. My plan is to have a special per-client chunk where chunk encryption keys are stored. Before I start work on that I will want to have a convenient place to store chunks.
Basically, I want to be able to say "decrypt the chunk with this ID using the key stored in the client chunk for such and such client". I don't want to have to specify the filename for the chunk I want to decrypt, or the client chunk.
I will want to be able to do that, but by default, I want to use chunk IDs.
If this software is ever to become a usable backup application, I will also want to support different locations for storing chunks: for example, the local file system, a remote S3 server, an SFTP server, and maybe others.
The goal for today is to implement the first iteration of an interface for chunk storage. I will implement it only for the local file system. Remote chunk storage will bring in requirements and constraints that I can't foresee, so I'm sure the interface will change. For example, some sort of async behavior ("start uploading chunk, but don't wait until that is done") is likely to be desirable. But it's too early to go deep on that.
However, as it's only an internal interface, changing it is easy. The storage layout will need more thought, though, as it will have to be usable by a variety of versions of the software.
Plan
- Sketch a list of anticipated needs and wants for how storage is laid out in local files.
- Sketch a list of initial needs and wants for the storage interface.
- Sketch an implementation of a type for local file system storage.
- Implement a command, obnam repo init DIR, to initialize a directory as chunk storage.
- Implement a command, obnam repo is DIR, to check if a directory is initialized as chunk storage.
- Implement a command, obnam repo list DIR, to list IDs of all chunks in chunk storage.
- Implement a command, obnam repo add DIR ID LABEL, to add a chunk to storage.
- Implement a command, obnam repo find DIR LABEL, to find IDs of chunks with a specific label.
- Implement a command, obnam repo path DIR ID, to show the fully qualified path name of a chunk given its ID.
- Implement a command, obnam repo remove DIR ID, to remove a chunk.
That's a lot of things to do, but I'll try to keep things simple.
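As a rough illustration of the planned subcommands, here is a minimal sketch of how they might be represented and parsed using only the standard library. The RepoCommand enum and the parse_repo_command function are hypothetical names made up for this example; the real program may well use a command line parsing crate instead.

```rust
use std::path::PathBuf;

/// Hypothetical in-memory form of the planned `obnam repo` subcommands.
#[derive(Debug, PartialEq, Eq)]
enum RepoCommand {
    Init { dir: PathBuf },
    Is { dir: PathBuf },
    List { dir: PathBuf },
    Add { dir: PathBuf, id: String, label: String },
    Find { dir: PathBuf, label: String },
    Path { dir: PathBuf, id: String },
    Remove { dir: PathBuf, id: String },
}

/// Parse the words that follow `obnam repo` on the command line.
/// Returns None for an unknown subcommand or a wrong number of arguments.
fn parse_repo_command(args: &[&str]) -> Option<RepoCommand> {
    match args {
        ["init", dir] => Some(RepoCommand::Init { dir: PathBuf::from(*dir) }),
        ["is", dir] => Some(RepoCommand::Is { dir: PathBuf::from(*dir) }),
        ["list", dir] => Some(RepoCommand::List { dir: PathBuf::from(*dir) }),
        ["add", dir, id, label] => Some(RepoCommand::Add {
            dir: PathBuf::from(*dir),
            id: (*id).to_string(),
            label: (*label).to_string(),
        }),
        ["find", dir, label] => Some(RepoCommand::Find {
            dir: PathBuf::from(*dir),
            label: (*label).to_string(),
        }),
        ["path", dir, id] => Some(RepoCommand::Path {
            dir: PathBuf::from(*dir),
            id: (*id).to_string(),
        }),
        ["remove", dir, id] => Some(RepoCommand::Remove {
            dir: PathBuf::from(*dir),
            id: (*id).to_string(),
        }),
        _ => None,
    }
}
```

Keeping the parsed command as a plain enum would make it easy to unit test the parsing separately from the store itself.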
Notes
Storage layout
I will first only consider local file system storage. Basically, a directory inside which all chunks are stored.
A crucial need is for the backup program to know if a directory is used for chunk storage. So there needs to be some way to recognize that, by looking only at the directory. I don't want to keep a central registry of chunk store directories as that would easily get out of sync.
Each chunk will be in its own file named after the ID and with a .chunk suffix. The directory structure will be flat. This doesn't scale up very far, but it will do for this iteration, while I have other things to worry about.
For each chunk I need to store the label as well. I need to find chunks using the label. This requires some sort of index, and I'll use SQLite, for convenience.
Thus, a chunk store directory contains:
- index.sqlite is an SQLite file for looking up chunks by label.
- ID.chunk for each chunk, where ID is the chunk ID.
This is simple and easy, and will do for the first iteration.
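The layout can be sketched in a few lines of Rust. This is only an illustration under two assumptions that the notes above do not settle: that the presence of index.sqlite is what marks a directory as a chunk store, and that a chunk ID has a textual form that is safe to use as a file name. The function names are made up for the example.

```rust
use std::path::{Path, PathBuf};

/// Name of the index file inside a chunk store directory.
const INDEX_FILE: &str = "index.sqlite";

/// Suffix of chunk files, without the leading dot.
const CHUNK_SUFFIX: &str = "chunk";

/// Path of the index file inside a store directory.
fn index_path(dir: &Path) -> PathBuf {
    dir.join(INDEX_FILE)
}

/// Path of the file holding the chunk with the given ID.
/// Assumption: `id` is a textual form of the ID that is safe as a file name.
fn chunk_path(dir: &Path, id: &str) -> PathBuf {
    dir.join(format!("{}.{}", id, CHUNK_SUFFIX))
}

/// Assumption: a directory counts as a chunk store if it contains the
/// index file. This looks only at the directory itself, with no registry.
fn looks_like_store(dir: &Path) -> bool {
    index_path(dir).is_file()
}
```

With a flat layout, finding a chunk file by ID is a single path join, which is all the "path" operation needs.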
Storage interface
I will need the following operations in the interface:
- is this directory initialized for chunk storage?
- initialize this directory as chunk storage
- list chunks
- add a chunk, with metadata (ID and label)
- get chunk by ID
- find IDs of chunks with a given label
- remove a chunk by ID
As Rust:
struct ChunkStore {...}

impl ChunkStore {
    fn is_store(dir: &Path) -> bool {...}
    fn init(dir: &Path) -> Result<Self, StoreError> {...}
    fn add_chunk(&self, chunk: &Chunk) -> Result<(), StoreError> {...}
    fn all_chunks(&self) -> Result<Vec<ChunkId>, StoreError> {...}
    fn get_chunk(&self, id: &ChunkId) -> Result<Chunk, StoreError> {...}
    fn find_chunks(&self, label: &Label) -> Result<Vec<ChunkId>, StoreError> {...}
    fn remove_chunk(&self, id: &ChunkId) -> Result<(), StoreError> {...}
}
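To get a feel for how such an interface behaves, here is a hedged sketch of an in-memory stand-in covering the add, list, get, find, and remove operations. Everything in it is an assumption for illustration: the Chunk, ChunkId, Label, and StoreError definitions are placeholders rather than the real types, MemoryStore is a made-up name, treating a duplicate ID as an error is a guess, and the mutating methods take &mut self because there is no file system or database behind them.

```rust
use std::collections::BTreeMap;

// Placeholder types for illustration only; the real ones will differ.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ChunkId(Vec<u8>);

#[derive(Clone, Debug, PartialEq, Eq)]
struct Label(Vec<u8>);

#[derive(Clone, Debug, PartialEq, Eq)]
struct Chunk {
    id: ChunkId,
    label: Label,
    data: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
enum StoreError {
    Duplicate(ChunkId),
    NotFound(ChunkId),
}

/// In-memory stand-in for a chunk store, keyed by chunk ID.
#[derive(Default)]
struct MemoryStore {
    chunks: BTreeMap<ChunkId, Chunk>,
}

impl MemoryStore {
    /// Add a chunk. Assumption: adding the same ID twice is an error.
    fn add_chunk(&mut self, chunk: &Chunk) -> Result<(), StoreError> {
        if self.chunks.contains_key(&chunk.id) {
            return Err(StoreError::Duplicate(chunk.id.clone()));
        }
        self.chunks.insert(chunk.id.clone(), chunk.clone());
        Ok(())
    }

    /// List the IDs of all chunks, in sorted order.
    fn all_chunks(&self) -> Result<Vec<ChunkId>, StoreError> {
        Ok(self.chunks.keys().cloned().collect())
    }

    /// Get a chunk by ID.
    fn get_chunk(&self, id: &ChunkId) -> Result<Chunk, StoreError> {
        self.chunks
            .get(id)
            .cloned()
            .ok_or_else(|| StoreError::NotFound(id.clone()))
    }

    /// Find the IDs of chunks with a given label, by scanning every chunk.
    fn find_chunks(&self, label: &Label) -> Result<Vec<ChunkId>, StoreError> {
        Ok(self
            .chunks
            .values()
            .filter(|c| &c.label == label)
            .map(|c| c.id.clone())
            .collect())
    }

    /// Remove a chunk by ID.
    fn remove_chunk(&mut self, id: &ChunkId) -> Result<(), StoreError> {
        self.chunks
            .remove(id)
            .map(|_| ())
            .ok_or_else(|| StoreError::NotFound(id.clone()))
    }
}
```

A stand-in like this could also be handy in unit tests of code that only needs some chunk store, not specifically the on-disk one.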
Storage implementation
- I'll use unit tests to verify the store implementation works.
- I'll use the rusqlite crate to use SQLite. I've used it before and it's still recommended by https://blessed.rs/crates#section-databases-subsection-sql-databases.
- I don't care about performance today, so I'll use the simplest possible table and no index. When I start caring about performance, I'll want to have benchmarks, and that's too much for today.
Interlude
- Oops, I hadn't removed the old chunk.rs module, and was confused about where the encryption changes I made last week were, until I found chunk2.rs.
- Made a quick patch to rename chunk2 to chunk. Merged that, and rebased my chunk store branch.
- Phew. A moment of panic there.
Storage implementation, continued
- API design question: should find_chunks return chunk IDs, or full chunk metadata? For now, the metadata is ID and label. Can I think of a situation where I have the ID, and want the label, but would want to avoid getting the whole chunk? Well, I'll need the label to decrypt, so I'll return the full metadata. That means I'll need to store the metadata in the index database. I could even use SQLite's JSON support, but I won't today, to save time.
- I'll store chunk ID and metadata as blobs. SQLite can handle them, and that has no loss of accuracy. Converting to text would lose accuracy.
- However, converting to a blob is fallible. This should never fail, as both ID and label are already blobs, but I use postcard::to_allocvec to do the serialization, and that is fallible. That's not a big deal, just one more failure to handle.
- This was all straightforward, because I'd written similar code before.
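The point about text losing accuracy can be shown with the standard library alone. This is a small sketch, assuming IDs and labels are arbitrary byte strings; the two helper functions are hypothetical and exist only for this example.

```rust
use std::string::FromUtf8Error;

/// Convert raw bytes to text, replacing every invalid UTF-8 sequence
/// with the replacement character U+FFFD. The original bytes cannot be
/// recovered from the result if any replacement happened.
fn to_text_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Convert raw bytes to text strictly: fails if the bytes are not
/// valid UTF-8, so this conversion is fallible too.
fn to_text_strict(bytes: &[u8]) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes.to_vec())
}
```

For example, the bytes 0xff 0x00 0x41 are not valid UTF-8: the strict conversion returns an error, and the lossy one yields text whose bytes differ from the input. Storing the same bytes as a blob keeps them exactly as they are.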
Subcommands
- I ran out of time, so I'll continue from here next time.
Summary
Today's goal was ambitious and I didn't reach it, but only because I ran out of time. I don't want to do too much in one session. Most of the actual useful work today was in thinking, but I got a basic chunk store using local files implemented.
Comments?
If you have feedback on this development session, please use the following fediverse thread: https://toot.liw.fi/@liw/114527569737306660