Assessment of the iteration that has ended

The goal of the previous iteration was:

The goal for this iteration is to define an initial set of benchmarks for Obnam, and to run them, and to publish the results on doc.obnam.org. All of this should be made as automatic as possible.

This was completed. The running of benchmarks is manual, and Lars will do that for every release, going forward. The results will be put into the obnam-benchmark-results repository, which will trigger CI to update the benchmark results page with a summary of the results.

This iteration was meant to also fix the following issues:

  • obnam#174 -- Doesn't log performance metrics
    • Lars created !214, but it's not merged yet. Alexander made good suggestions for tidying up the code, but Lars failed to make them work, and then got distracted by performance investigations. They can be worked on later.
  • obnam#176 -- Doesn't report version with much detail
    • Lars didn't work on this after all.
  • obnam#180 -- Chunk metadata should be in AAD, not in headers?
    • Lars didn't work on this after all.

The iteration ran over a few weeks, mostly because the northern hemisphere Darkness affected Lars's ability to be productive, and also because Lars got distracted by looking into improving Obnam performance.

Discussion

Current development theme

The current theme of development for Obnam is performance, because that is currently Lars's primary worry. The candidate themes, at least currently, are performance, security, and convenience.

Performance

Lars has been investigating where Obnam performance bottlenecks are, by running benchmarks, and looking at profiling results from cargo flamegraph. For an Obnam run with a good number of files that haven't changed, most of the time in Obnam goes into inserting rows into an SQLite database for the new generation. This led Lars to do some investigation into how fast he can make this happen.

Lars wrote a little program that creates an SQLite database and then inserts a million rows into a table modelled after the Obnam files table. The first, naive approach resulted in about 80,000 rows inserted per second on his laptop, and nearly 120,000 on his development server. After reading an article by Jason Wyatt, Lars made the following changes:

  • use a single transaction for all million inserts
  • use the rusqlite prepared statement cache instead of preparing a new statement for each insert

The resulting speeds were (best speed of three runs, compiled in release mode, on development server with NVMe drives):

program                       inserts/s
individual-insert                117509
individual-one-transaction       209512
individual-prepared.rs           970874

That's almost a million inserts per second. That'll do for now.
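
The difference between the naive approach and the single-transaction approach can be sketched as follows. This is only an illustration, not Lars's program: it uses Python's standard sqlite3 module rather than Rust and rusqlite, the table schema is a hypothetical stand-in for the Obnam files table, and the function names are made up for this example. In Python, executemany plays roughly the role of a reused prepared statement.

```python
import sqlite3
import time

# Hypothetical schema: a stand-in for the Obnam files table, not the real one.
SCHEMA = "CREATE TABLE files (fileno INTEGER PRIMARY KEY, filename BLOB, json TEXT)"
INSERT = "INSERT INTO files (fileno, filename, json) VALUES (?, ?, ?)"


def rows(n):
    """Generate n synthetic rows."""
    for i in range(n):
        yield (i, f"/data/file-{i}".encode(), "{}")


def insert_individually(conn, n):
    """Naive approach: commit after every insert, so one transaction per row."""
    for row in rows(n):
        conn.execute(INSERT, row)
        conn.commit()


def insert_in_one_transaction(conn, n):
    """One transaction for all rows; executemany reuses a single statement."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(INSERT, rows(n))


def benchmark(func, n, path=":memory:"):
    """Run func against a fresh database; return (row count, inserts per second)."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    start = time.perf_counter()
    func(conn, n)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return count, n / elapsed


if __name__ == "__main__":
    for func in (insert_individually, insert_in_one_transaction):
        count, rate = benchmark(func, 100_000)
        print(f"{func.__name__}: {count} rows, {rate:.0f} inserts/s")
```

The sketch defaults to an in-memory database, which hides most of the per-commit cost. Passing the path of a new file on disk to benchmark should make the effect of committing once instead of per row more visible. The absolute numbers will not be comparable to the Rust figures above.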

Another approach might be to modify a copy of the previous generation, but the logic gets trickier than with the approach of starting with an empty database and inserting what we find in live data.

Lars also looked at what it would take to change the current Obnam abstractions around SQLite to use the approach described above. He feels the abstractions he originally wrote are messy and could do with a cleaner design. He intends to work on that in the new iteration.

Repository review

Lars didn't review any issues, merge requests, or CI pipelines this time. He wants to work on database abstractions first.

Goals

Goal for 1.0 (not changed this iteration)

The goal for version 1.0 is for Obnam to be an utterly boring backup solution for Linux command line users. It should just work, be performant, secure, and well-documented.

It is not a goal for version 1.0 to have been ported to other operating systems, but if there are volunteers to do that, and to commit to supporting their port, ports will be welcome.

Other user interfaces are likely to happen only after 1.0.

The server component will support multiple clients in a way that doesn’t let them see each other’s data. It is not a goal for clients to be able to share data, even if the clients trust each other.

Goal for the next few iterations (not changed for this iteration)

The goal for the next few iterations is to have Obnam be performant. This will include, at least, making the client use more concurrency so that it can use more CPU cores to compute checksums for de-duplication.

Goal for this iteration (new for this iteration)

The goal for this iteration is to tidy up database abstraction code in the Obnam client and implement the performance improvements for which Lars wrote prototype code.

Commitments for this iteration

Lars will work on Obnam client database abstractions and performance. The goal for these is for Obnam to be able to run obnam backup on a live data set of a million files that haven't changed since the previous backup in less than 30 seconds, on Lars's development server.
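
As a rough sanity check of this target, the arithmetic below uses only figures from these notes. It assumes, without evidence, that the insert rate of the prototype carries over to a real backup run; the prototype measured inserts only, not scanning the live data. The function names are made up for this example.

```python
# Figures taken from the notes above.
ROWS = 1_000_000                        # files in the live data set
BUDGET_SECONDS = 30                     # target for the whole backup run
MEASURED_INSERTS_PER_SECOND = 970_874   # best prototype result, development server


def minimum_rate(rows, budget_seconds):
    """Rows per second needed if inserts alone used the entire budget."""
    return rows / budget_seconds


def insert_time(rows, inserts_per_second):
    """Seconds spent on inserts at a given rate."""
    return rows / inserts_per_second


if __name__ == "__main__":
    print(f"minimum rate: {minimum_rate(ROWS, BUDGET_SECONDS):.0f} rows/s")
    print(f"insert time:  {insert_time(ROWS, MEASURED_INSERTS_PER_SECOND):.2f} s")
```

Inserts alone would need to sustain roughly 33,333 rows per second to fit in the budget. At the prototype's rate they would take about 1.03 seconds, which would leave nearly all of the 30 seconds for everything else the backup does.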

This work is not captured in issues.

Meeting participants

  • Lars Wirzenius