Assessment of the iteration that has ended

The goal of the previous iteration was:

The goal for this iteration is to prepare for future schema changes.

This was completed. The Obnam client now supports more than one version of the schema for backup generations, and can restore from any of them. The server does not do that yet: if anything about its database schema or its API changes, the change is breaking and necessitates starting over with a new, empty repository with the new server version. This will need to be addressed later. (See obnam#199.)

Discussion

Current development theme

The current theme of development for Obnam is convenience. The choices are performance, security, convenience, and tidy-up, at least currently.

Breaking changes ahead

Lars foresees several upcoming breaking changes to how Obnam does encryption, client/server authentication, and more. Ideally these would all be done in ways that don't require users to start over with their backups, but it seems like a lot of effort for a short-term gain. Thus, Lars intends to take advantage of the fact that nobody uses Obnam yet and make some fundamental changes over the next few iterations. Part of those changes will be to make it easier to evolve Obnam without redo-all-your-backups changes, but some will be so fundamental that it doesn't seem worth supporting both the old and new ways.

Lars plans the following such fundamental changes at the moment:

  • Add a "trusted root object" to the Obnam system, to replace the current approach of "independent backup generation" objects. This will increase security, as well as make the ordering of backup generations more reliable.
    • this change is planned for this iteration
    • the old "generation chunk" approach will be dropped
  • Add authentication to the client/server protocol. Details to be discussed later.
  • Refactor the server to have database schema versioning.
  • Add versioning to the client/server protocol.

After these, Lars hopes that Obnam will be in a state where it's feasible to evolve the client and server mostly without the kind of breaking changes that require starting over with an empty backup repository.

User root object

Currently, the Obnam client stores backup generations on the server without an explicit ordering. Each generation has a timestamp, which is used to sort the generations into an order, but that's not good enough. See obnam#34 (Uses timestamps to order backup generations).

  • If two backup runs overlap in time, they might create new backups that are incremental to a common ancestor, but not related to each other. This will at minimum be confusing.
  • There's no guarantee the backup with a later timestamp is actually newer: clock skew, and other errors, may affect things.
  • Timestamps are cleartext data. This leaks information. Not good.
  • Timestamps are not covered by a signature. Double-plus ungood. This allows an attacker to change them and make the client think the latest backup is actually the oldest one. At minimum, this means further incremental backups may back up files needlessly, but it may also mean the wrong backup gets restored.

Lars proposes a change to protect against this threat model:

  • An attacker who can delete or modify files in the backup repository must not be able to alter the contents or ordering of backup generations.

To fix this, Lars is thinking of the following approach:

  • Each user has a "root object", which lists their backup generations, and metadata of each generation.
  • The root object is a chunk, so it's encrypted and authenticated with AEAD. This prevents an attacker from modifying or inspecting it.
  • The root object chunk has a random chunk id, but the label "user", so that a client can easily find it. It is otherwise exactly like other chunks.
  • Chunk metadata will be reduced to only the label. The generation and ended metadata for chunks will be removed. This will force another breaking change, sorry.
  • The root object will have a reference to the previous one.
  • The client will find all root objects, and pick the newest one. This is because the client has no way to tell the server to delete any chunk, so it can't delete an old root object.
  • Later, when we add client authentication, the server will store the data in the root object associated with the client account, and allow the client to update it. However, adding authentication is too big a change for this iteration.
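
As a minimal sketch of how a client might pick the newest root object: assume the root objects have already been found by their "user" label, decrypted, and decoded into dictionaries. The newest root is then taken to be the one that no other root names as its previous_root. The function name newest_root, the dictionary layout, and the rule for rejecting forks are assumptions for illustration, not part of the Obnam design.

```python
def newest_root(roots):
    """Return the chunk id of the newest root object.

    `roots` maps chunk id -> decoded root object (a dict). This layout
    and the function name are hypothetical, chosen for illustration.
    """
    # Every id that some root names as its predecessor has been superseded.
    superseded = {
        root["previous_root"]
        for root in roots.values()
        if root.get("previous_root") is not None
    }
    heads = [chunk_id for chunk_id in roots if chunk_id not in superseded]
    # Exactly one head means a single unbroken chain of root objects.
    if len(heads) != 1:
        raise ValueError(f"expected one newest root, found {len(heads)}")
    return heads[0]
```

Finding two heads would mean that two root objects were written on top of the same predecessor, which is what overlapping backup runs could produce. How the client should react to that is left open here.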

A root object could be serialized into JSON like this:

{
    "client": "exolobe1",
    "previous_root": "6d381c04-a83a-11ec-a3e9-fba06bac23fd",
    "timestamp": "2022-03-20T09:07:17+00:00",
    "backups": [
        {
            "chunk-id": "7cb90434-a82d-11ec-9383-b31e0b1b81aa",
            "ended": "2022-03-20T09:07:17+00:00"
        },
        {
            "chunk-id": "7aa3681a-a82d-11ec-b24b-e3adb644891b",
            "ended": "2022-03-21T09:07:17+00:00"
        }
    ]
}

The backups field lists the generations in order, with the oldest one first.

With this approach, everything linked from the root object chunk, or found by following links further, can be assured to be in the right order and to be unmodified.
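
To illustrate, here is a small sketch of reading the generation order from a root object. It assumes the root chunk has already been decrypted and authenticated, and that its payload is JSON shaped like the example above. The function name generation_ids is hypothetical. The order comes only from the position in the backups list, never from the ended timestamps.

```python
import json

# The example root object from the text above.
EXAMPLE_ROOT = """
{
    "client": "exolobe1",
    "previous_root": "6d381c04-a83a-11ec-a3e9-fba06bac23fd",
    "timestamp": "2022-03-20T09:07:17+00:00",
    "backups": [
        {"chunk-id": "7cb90434-a82d-11ec-9383-b31e0b1b81aa",
         "ended": "2022-03-20T09:07:17+00:00"},
        {"chunk-id": "7aa3681a-a82d-11ec-b24b-e3adb644891b",
         "ended": "2022-03-21T09:07:17+00:00"}
    ]
}
"""


def generation_ids(root_json):
    """Return generation chunk ids in list order, oldest first."""
    root = json.loads(root_json)
    return [backup["chunk-id"] for backup in root["backups"]]


ids = generation_ids(EXAMPLE_ROOT)
print("oldest:", ids[0])
print("newest:", ids[-1])
```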

An attacker can still replace the root object chunk with an older one. This can be mitigated by checking the root object timestamp: if it's unexpectedly old, something is wrong. A stronger mitigation would be for the client to store the timestamp locally and check it on the next backup run. However, that requires data to not be lost on the client end, which is what backups are meant to protect against, so it's not a very satisfactory solution.

If the limit for how old the root object chunk can be is too long, an attacker can keep replacing the latest one with one that's as old as it can be without triggering an alarm. That would mean that any intervening backups get lost, which would be bad.
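
The age check could be sketched as follows. The function name root_is_stale and the seven-day limit are assumptions for illustration; these notes do not fix a limit.

```python
from datetime import datetime, timedelta, timezone


def root_is_stale(root, now, max_age):
    """Return True if the root object's timestamp is older than max_age."""
    written = datetime.fromisoformat(root["timestamp"])
    return now - written > max_age


root = {"timestamp": "2022-03-20T09:07:17+00:00"}
limit = timedelta(days=7)  # assumed limit, not specified anywhere

day_later = datetime(2022, 3, 21, 9, 7, 17, tzinfo=timezone.utc)
month_later = datetime(2022, 4, 20, 9, 7, 17, tzinfo=timezone.utc)

print(root_is_stale(root, day_later, limit))    # False
print(root_is_stale(root, month_later, limit))  # True
```

With this limit, a replayed root object that is six days old would still pass the check, so up to six days of backups could be hidden. A shorter limit narrows that window, at the cost of false alarms for clients that back up rarely.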

Attacks on the root object may need to be mitigated in future iterations.

Repository review

Lars reviewed all the open issues, merge requests, and CI pipelines for all the projects in the Obnam group on gitlab.com.

Container Images

  • Open issues: 0
  • Merge requests: 0
  • Additional branches: 0
  • CI: OK, ran on Monday, March 14

obnam.org

  • Open issues: 0
  • Merge requests: 0
  • Additional branches: 0
  • CI: not defined

obnam-benchmark

  • Open issues: 11
  • Merge requests: 0
  • Additional branches: 0
  • CI: not defined

summain

  • Open issues: 0
  • Merge requests: 0
  • Additional branches: 0
  • CI: not defined

obnam

  • Open issues: 54
  • Merge requests: 2
    • !214 - performance metrics
      • needs thinking and further work
    • !222 - add backup database schema to evolve; break server database
      • to be merged on Tuesday
  • Additional branches: 0
  • CI: OK

Goals

Goal for 1.0 (not changed this iteration)

The goal for version 1.0 is for Obnam to be an utterly boring backup solution for Linux command line users. It should just work, be performant, secure, and well-documented.

It is not a goal for version 1.0 to have been ported to other operating systems, but if there are volunteers to do that, and to commit to supporting their port, ports will be welcome.

Other user interfaces are likely to happen only after 1.0.

The server component will support multiple clients in a way that doesn’t let them see each other’s data. It is not a goal for clients to be able to share data, even if the clients trust each other.

Goal for the next few iterations (not changed for this iteration)

The goal for the next few iterations is to have Obnam be easier and safer to change, both for developers and end users. This means that developers need to be able to make breaking changes without users having to suffer. Users shall be able to migrate their data when they feel it worthwhile, not just because there is a new version.

Goal for this iteration (new for this iteration)

The goal of this iteration is to add a "root object" for a user's backups, which lists the backup generations in order.

Commitments for this iteration

Lars intends to work on the "root object" change, as described above. This will affect, and hopefully resolve, the following issues:

  • obnam#34 - Uses timestamps to order backup generations
  • obnam#62 - Describe how chunks relate to each other

Meeting participants

  • Lars Wirzenius