## Assessment of the iteration that has ended
The goal of the previous iteration was:
> The goal for this iteration is to write a dedicated program for running benchmarks for Obnam.
This was completed: Lars wrote a rudimentary program, obnam-benchmark, see below. We also made release 0.7.0.
The iteration ran over a few weeks, mostly due to end-of-year holidays, and the northern hemisphere Darkness affecting Lars's ability to be productive.
## Discussion
### Current development theme
The current theme of development for Obnam is performance, because that is currently Lars's primary worry. The candidate themes were performance, security, and convenience.
### Policy for cargo deny
We still need more discussion about how to tighten up the policy for cargo deny, in obnam#157. Otherwise, Lars will make an executive decision on his own during this iteration.
### Lars wants to use Obnam for real
Lars is test driving Obnam on a small subset of his data: his local email archive. This is roughly a million files, but doesn't change much from day to day. The backup takes several minutes to run. He'd like to start using Obnam as his primary backup system, and has identified two things to deal with first: use of the server API needs to be authenticated, and the overall speed needs to be massively improved.
Lars opened obnam#186 for the authentication. The performance aspect is an ongoing development theme in any case.
Lars arbitrarily defines a personal performance requirement as follows: given a data set of three million files that hasn't changed since the previous backup, an incremental backup should take at most 60 seconds on his laptop, with the server running on the same gigabit LAN.
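To make the target concrete, here is a quick back-of-envelope calculation (plain arithmetic, not a statement about Obnam's current code) of what it implies per file:

```python
# Back-of-envelope check of the performance requirement:
# an unchanged data set of 3 million files, incremental backup in 60 seconds.
files = 3_000_000
budget_s = 60

files_per_second = files / budget_s
print(files_per_second)  # 50000.0

# Per-file time budget, in microseconds: on average, the client has this
# long to examine a file, decide it is unchanged, and move on.
per_file_us = budget_s / files * 1_000_000
print(per_file_us)  # 20.0
```

In other words, the client needs to dismiss an unchanged file in about 20 microseconds on average, which rules out reading file contents and pushes towards metadata checks and concurrency.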
### Plans for Debian bookworm
We are still collecting thoughts in obnam#162 about having Obnam in Debian 12. What are we willing to commit to supporting for the expected three years of Debian's security support for that release, without adding features or major new upstream versions?
### The obnam-benchmark tool
Lars has written a rudimentary tool for doing benchmarks of Obnam.
It's called obnam-benchmark, with source on gitlab.com and ugly Debian packages in Lars's APT repository. (Everyone else probably wants to install by building from source, for now.)
The obnam-benchmark tool reads a YAML file that describes a suite of benchmarks to run. Each benchmark defines one or more backups to make, and the test data to generate for each backup. When the benchmarks are run, the tool generates the data, starts the server, makes each backup, and restores it.
For each backup and restore, it measures the duration and other metrics. The measurements are written out as a JSON file. The tool can generate a report from a set of such JSON files, as a Markdown file with tables of results; other document formats can be generated from Markdown. (The report is possibly the weakest part of the tool. Lars is not great at that stuff. Please help, if only by opening issues to explain how to make the report more useful to you.)
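As a sketch of how a set of result files could be turned into a Markdown report, here is a minimal example that formats measurements as a table. Note that the field names (`version`, `benchmark`, `backup_s`, `restore_s`) and the values are invented for illustration; the actual JSON schema produced by obnam-benchmark may differ.

```python
import json

# Two invented result files, inline as strings for the example.
# Real obnam-benchmark output may use different field names.
results = [
    json.loads(s)
    for s in (
        '{"version": "0.7.0", "benchmark": "many-small-files", "backup_s": 312.4, "restore_s": 201.9}',
        '{"version": "0.7.0", "benchmark": "one-big-file", "backup_s": 44.1, "restore_s": 38.0}',
    )
]

# Build a Markdown table, one row per result.
lines = [
    "| version | benchmark | backup (s) | restore (s) |",
    "|---------|-----------|------------|-------------|",
]
for r in results:
    lines.append(
        f"| {r['version']} | {r['benchmark']} | {r['backup_s']:.1f} | {r['restore_s']:.1f} |"
    )
report = "\n".join(lines)
print(report)
```

The point of keeping the raw JSON around is exactly this: any such report, in any format, can be regenerated from the stored measurements at any time.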
The tool can use an installed version of Obnam, or it can build Obnam from any git commit. The version of Obnam is included in the report. The intent is that we'll run the tool for every Obnam release, and optionally every commit since the previous release, to see how Obnam performance changes over time.
The tool is ready to be used, at least for simple benchmarks, but needs improvements to be good. For example, it's quite slow at creating test data, and can't use pre-created test data. Help improving the tool would be most welcome.
Now that we have the tool, we need to start using it. This requires, at minimum:
- a set of standard benchmarks for Obnam that represents real world usage patterns, as well as any artificial ones that make sense for developing Obnam
- one or more standard environments in which to run the benchmarks
- a place to publish results and reports
Lars proposes the following:
- Lars will set up a VM on his development hardware for running benchmarks. This will have 4 virtual CPUs, 4 GiB of RAM, and 100 GB of storage. The CPU count and memory size are intentionally limited to make sure Obnam performs well when not running on a supercomputer.
- It'd be nice if others ran Obnam benchmarks on their own systems and contributed the results.
- We'll create a git repository for storing the benchmark specifications, tentatively to be called obnam-benchmark-specs. It might be a single file for now.
- We'll create another repository, obnam-benchmark-results, where the JSON result files get stored. Each run of obnam-benchmark will add a new result file. Old files may get removed once they're no longer useful; examples might be results for commits between releases, or for quite old releases.
- Lars plans to run obnam-benchmark himself, manually, and upload the results to the repository via merge requests.
- Others can also submit merge requests to add their results, as usual with GitLab.
- Any changes to the results repository will trigger a CI job to generate the report, and publish it on doc.obnam.org. This CI job will run on Lars's personal CI system, which has upload access to the server running that site.
- This assumes the generated reports are not useful to store, and that only the latest one is important. If the result data files are stored, in principle any report can be re-generated, and it doesn't seem worth keeping old versions of the report. Thoughts?
### Splitting off the server part of the obnam crate?
The discussion of whether to split the obnam crate into a client crate and a server crate continues in obnam#175. Further opinions would be welcome, though Lars is leaning towards splitting. No action is planned at the moment.
## Repository review
Lars reviewed all the open issues, merge requests, and CI pipelines for all the projects in the Obnam group on gitlab.com.
### Container Images
This is https://gitlab.com/obnam/container-images. There were no open issues, no extra branches, and no merge requests. CI pipelines have been passing, and Lars ran the pipeline to freshen up the container image.
### obnam.org
This is https://gitlab.com/obnam/obnam.org. There is one issue, regarding a need for benchmark results, which Lars closed as no longer being relevant to this repository. There were no extra branches, and no open merge requests. There is no CI for this repository.
### obnam-benchmark
This is https://gitlab.com/obnam/obnam-benchmark, and is the new repository for the new tool. There were 11 open issues, about missing features, and performance. The one about making a release and uploading the tool to crates.io seems worth doing in this iteration. Any other improvements that block actually running benchmarks and automatically generating and publishing reports also need to be done, if any are found.
### obnam
This is https://gitlab.com/obnam/obnam. There were 62 open issues. Lars reviewed all of them, made comments and other updates as needed, and closed:
- obnam#16 Doesn't restore the access time
  - access times change when backups are done, and are generally not very useful
- obnam#64 Use a CAM
  - Lars doesn't want to use content based addressing on the server
- obnam#69 On Collision Resistance and Content Addressable Storage
  - Lars doesn't want to use content based addressing on the server
- obnam#85 Use case: anonymous user T
  - all done
- obnam#92 Lacks a way to verify a backup can still be restored
  - duplicate of obnam#50, and the number of open issues is starting to be large enough that duplicates are better closed
- obnam#99 Should maybe use the ring crate for AEAD
  - closed as unnecessary
- obnam#163 Client could do with a built-in dummy server mode for benchmarks
  - closed as unnecessary
There were 54 open issues after this.
## Goals
### Goal for 1.0 (not changed this iteration)
The goal for version 1.0 is for Obnam to be an utterly boring backup solution for Linux command line users. It should just work, be performant, secure, and well-documented.
It is not a goal for version 1.0 to have been ported to other operating systems, but if there are volunteers to do that, and to commit to supporting their port, ports will be welcome.
Other user interfaces are likely to happen only after 1.0.
The server component will support multiple clients in a way that doesn’t let them see each other’s data. It is not a goal for clients to be able to share data, even if the clients trust each other.
### Goal for the next few iterations (not changed for this iteration)
The goal for the next few iterations is to have Obnam be performant. This will include, at least, making the client use more concurrency so that it can use more CPU cores to compute checksums for de-duplication.
### Goal for this iteration (new for this iteration)
The goal for this iteration is to define an initial set of benchmarks for Obnam, and to run them, and to publish the results on doc.obnam.org. All of this should be made as automatic as possible.
## Commitments for this iteration
We collect issues for this iteration in milestone 12. Lars intends to work on:
- [obnam-benchmark issue 16](https://gitlab.com/obnam/obnam-benchmark/-/issues/16) Is not on crates.io
  - this should be quick, but it involves making a release, and that tends to go wrong the first few times, or when it hasn't been done for a while
  - 1h
- obnam#157 "cargo deny" policy is not strict
  - make the policy stricter, to deny yanked versions and security vulnerabilities
  - make sure the test suite still passes, and fix any issues
  - 1h (optimistic, assuming nothing goes wrong)
- obnam#166 Lacks comprehensive benchmark suite
  - carried over from the previous iteration: define and run the benchmarks, publish the results, and automate as much of that as possible
  - 4h
- obnam#170 Should record MSRV in Cargo.toml rust-version
  - 0.25h
- obnam#173 Should have as a requirement that it doesn't cache very much locally
  - 0.25h
- obnam#174 Doesn't log performance metrics
  - add at least some collection of performance metrics, even if not all the ones in the issue are added yet
  - 1h
- obnam#176 Doesn't report version with much detail
  - 1h
- obnam#177 What does BackupReason::Skipped actually mean?
  - research, then document in the code
  - 1h
- obnam#178 Is src/benchmark.rs useful to export?
  - delete it if it's not used, or comment on the issue if it is
  - 0.25h
- obnam#179 Chunker is a silly name for an iterator
  - rename
  - 0.25h
- obnam#180 Chunk metadata should be in AAD, not in headers?
  - add a copy of the metadata to the AAD, but keep the headers for now
  - not performance related, but seems worth doing earlier rather than later
  - 1h
- obnam#181 The name AsyncBackupClient implies a non-async version
  - rename
  - 0.25h
- obnam#182 Does it make sense to keep AsyncBackupClient and AsyncChunkClient separate?
  - merge them if it seems worthwhile
  - if not, comment on the issue and close it, and note the decision in the code
  - 0.25h
- obnam#184 README has unnecessary YAML metadata
  - drop it
  - 0.25h
That's about 12 hours of estimated work. Hopefully not too much. These are not all performance related, but it's important to also tidy up as development goes forward.
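The obnam#180 item above is about binding chunk metadata to the encrypted chunk via the AEAD's associated data (AAD), so that metadata can't be swapped or tampered with independently of the ciphertext. The following toy sketch illustrates that property using an HMAC over ciphertext plus metadata; a real AEAD such as AES-GCM provides it directly via its associated-data input. This is not Obnam's actual scheme, and the key and values are made up.

```python
import hashlib
import hmac

# Toy stand-in for AEAD associated data: the authentication tag covers
# both the (already encrypted) chunk and its metadata, so changing
# either one invalidates the tag. NOT Obnam's real cryptography.
KEY = b"not-a-real-key"

def seal(ciphertext: bytes, metadata: bytes) -> bytes:
    # Tag over metadata and ciphertext together, separated unambiguously.
    return hmac.new(KEY, metadata + b"\0" + ciphertext, hashlib.sha256).digest()

def verify(ciphertext: bytes, metadata: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(seal(ciphertext, metadata), tag)

chunk = b"\x01\x02\x03"            # stands in for an encrypted chunk
meta = b'{"sha256": "abc123"}'     # stands in for chunk metadata

tag = seal(chunk, meta)
print(verify(chunk, meta, tag))                    # True
print(verify(chunk, b'{"sha256": "evil"}', tag))   # False: metadata is bound
```

The practical effect, which is what the issue is after, is that a server (or an attacker with server access) cannot reattach a chunk to different metadata without the client noticing.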
## Meeting participants
- Lars Wirzenius