This is a DRAFT road map for making the new repository format (GREEN ALBATROSS, or GA) the default format in Obnam. Feedback on this road map is welcome via the obnam-devel mailing list.

Note that only things that affect the new repository format are relevant for this road map. All other bugs or features are off-topic and will not be included.

Success criteria for being done with the new format

  • All of the major operations listed below need to be tolerably fast and to run in 4 GiB of RAM. The test data set is to be a snapshot of almost 5 TiB of data from my own file server, and the backup repository should use encryption and compression.

    • backup
    • forget
    • restore (or possibly verify, to avoid needing another 5 TiB space)
    • fsck

  • Obnam supports Attic-style chunking (a chunk ends wherever the lowest N bits of a weak rolling checksum have a particular value, such as all zero), and the operations above are still tolerably fast.

  • The manual needs to be updated to cover everything about GA. It needs a comparison between repository format 6 and GA, and advice on converting to the new format ("start over" is sufficient, though a conversion tool would be nice).

  • The obnam.org website needs to be reviewed, and any design docs updated.

  • Obnam users must have been actively asked to use GA for at least three months, without any showstopper bugs being reported.

  • There is no known need to change the new repository format.
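The Attic-style chunking criterion can be illustrated with a minimal content-defined chunker. This is a sketch, not Obnam's or Attic's code: it assumes a simple Rabin-Karp style polynomial rolling hash in place of Attic's buzhash, and the names `chunk_boundaries` and `split_chunks` and the `WINDOW`, `MIN_SIZE` and `MAX_SIZE` parameters are hypothetical.

```python
# Minimal sketch of content-defined ("Attic-style") chunking.
# Assumption: a Rabin-Karp style polynomial rolling hash stands in for
# Attic's buzhash; all names and parameters here are hypothetical.

WINDOW = 48            # bytes covered by the rolling hash
BASE = 257             # polynomial base
MOD = (1 << 31) - 1    # prime modulus, so the low bits are well mixed
MIN_SIZE = 64          # never end a chunk before this many bytes (>= WINDOW)
MAX_SIZE = 8192        # always end a chunk at this many bytes


def chunk_boundaries(data: bytes, n_bits: int = 10) -> list:
    """Return the end offset of each chunk of data.

    A chunk ends where the lowest n_bits of the rolling hash of the last
    WINDOW bytes are all zero, subject to MIN_SIZE and MAX_SIZE.
    """
    mask = (1 << n_bits) - 1
    # Weight of the oldest byte in the window, used to drop it again.
    out_factor = pow(BASE, WINDOW - 1, MOD)
    boundaries = []
    start = 0  # offset where the current chunk begins
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: remove the byte that falls out of it.
            h = (h - data[i - WINDOW] * out_factor) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if (length >= MIN_SIZE and (h & mask) == 0) or length >= MAX_SIZE:
            boundaries.append(i + 1)
            start = i + 1
            h = 0  # the next chunk starts with an empty window
    if start < len(data):
        boundaries.append(len(data))  # whatever is left is the last chunk
    return boundaries


def split_chunks(data: bytes, n_bits: int = 10) -> list:
    """Cut data into the chunks described by chunk_boundaries()."""
    ends = chunk_boundaries(data, n_bits)
    return [data[s:e] for s, e in zip([0] + ends[:-1], ends)]
```

Because `MIN_SIZE` is at least `WINDOW`, the decision to end a chunk depends only on the last `WINDOW` bytes of content, so inserting bytes near the start of a file typically changes only the first chunk or two; later chunks stay identical and deduplicate. Each extra bit in `n_bits` roughly doubles the typical chunk size, which is why the setting affects how many chunks the repository has to handle.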

Road map

This is the rough order of things that I know need doing. There are certainly things missing from the list; reality always wins over my most careful planning.

  • Add new benchmarks for all the success criteria. All the operations listed above should be benchmarked. Then analyze the results and make any needed optimizations.

  • Add Attic-style chunking. These chunks are neither of fixed size nor at fixed positions in the files; their boundaries depend on the file contents. This matters to the repository format because, depending on settings, there may be many more chunks, and the format needs to handle that.

  • Add sparse file handling to GA. Not everyone uses sparse files, but those who do really want them handled well. Obnam currently doesn't handle them optimally: it has no way of representing them in the repository except as long sequences of all-bits-zero bytes.

  • Make sure Obnam handles files whose user or group name is unknown (only the numeric uid or gid is known). This is important when it's not feasible to get the user or group name from an SFTP server. This affects the repository format only a little: it needs to be able to store a value that represents "not known", and all code needs to be able to deal with that.

  • Implement in-process symmetric encryption. This is not directly relevant to GA, but since it will not be backwards compatible with the old way of doing encryption, I'm lumping it in with GA so that all the incompatible changes happen at once.

  • Implement a reasonably fast fsck for GA.

  • Review obnam.org website and make any necessary updates.

  • Review Obnam manual and make any necessary updates. Co-ordinate with translators so that the non-English versions of the manual are also updated.

  • Make an Obnam release with a beta-level version of GA, and ask people to use it and report results.
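For the sparse file item, one possible repository representation is the file size plus a list of non-zero data extents, with every gap between extents implied to be a hole. The sketch below is only an illustration under stated assumptions, not Obnam's design: it detects holes as all-zero blocks of a hypothetical `BLOCK` size, and `sparse_encode` and `sparse_decode` are hypothetical names.

```python
# Minimal sketch of a sparse-file representation: the file size plus
# only the non-zero data extents; every gap between extents is a hole.
# Assumption: BLOCK and the function names are hypothetical, not Obnam's.

BLOCK = 4096  # granularity at which all-zero regions are detected


def sparse_encode(data: bytes, block: int = BLOCK):
    """Return (size, extents), where extents is a list of (offset, payload).

    Blocks that are entirely zero bytes are left out of the extents.
    """
    extents = []
    run_start = None  # offset where the current non-zero run began
    for offset in range(0, len(data), block):
        piece = data[offset:offset + block]
        if piece.count(0) == len(piece):
            # All-zero block: close any open data run; store nothing.
            if run_start is not None:
                extents.append((run_start, data[run_start:offset]))
                run_start = None
        elif run_start is None:
            run_start = offset  # a new data run begins here
    if run_start is not None:
        extents.append((run_start, data[run_start:]))
    return len(data), extents


def sparse_decode(size: int, extents) -> bytes:
    """Rebuild the full contents; holes come back as zero bytes."""
    out = bytearray(size)  # starts zero-filled, i.e. all hole
    for offset, payload in extents:
        out[offset:offset + len(payload)] = payload
    return bytes(out)
```

When restoring to disk rather than to memory, a tool could call the file object's `truncate(size)` and then `seek()` to each extent offset and `write()` only the payloads, leaving the gaps unwritten so that the file system can typically keep them as holes. On platforms that provide them, `os.lseek` with `os.SEEK_DATA` and `os.SEEK_HOLE` can locate holes in an existing file without reading the zero bytes.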