Difference between revisions of "Repository snapshot objects"
Revision as of 06:53, 16 August 2016
WARNING: work in progress blueprint
Introduction
A repository snapshot object, or simply snapshot object, is a Merkle DAG node used to capture the current state of a VCS repository.
Conceptually, a snapshot object is a complete map from repository entry points ("branches" in Software Heritage terminology, "refs" in Git) to other objects in the repository, including other snapshot objects if repository entry points point to them.
Practically, the map is serialized into a manifest consisting of a list of triples <object type, object ID, branch name>.
Entries in snapshots can point to the following object kinds:
- contents (Git terminology: blobs)
- directories (trees)
- releases (annotated tags)
- revisions (commits)
- snapshots
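The correspondence between the object kinds listed above and their Git names can be sketched as a small lookup. This is only an illustration: `swh_kind_of` is a hypothetical helper name, not part of Git or Software Heritage tooling, and "snapshot" is not a native Git object type.

```shell
#!/bin/sh
# Hypothetical helper: map a Git object type (as printed by
# `git cat-file -t`) to the object kind name used in the list above.
swh_kind_of() {
    case "$1" in
        blob)     echo content ;;
        tree)     echo directory ;;
        tag)      echo release ;;    # annotated tags only
        commit)   echo revision ;;
        snapshot) echo snapshot ;;   # the proposed kind, not native to Git
        *)        echo "unknown object type: $1" >&2; return 1 ;;
    esac
}
```

For instance, `swh_kind_of commit` prints `revision`, while an unrecognized type returns a non-zero status.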
The object ID of a repository object is the cryptographic hash (the same kind of hash used elsewhere in the Merkle DAG) of its manifest.
Manifest
The manifest of a repository object is a canonical representation of it as a sequence of bytes.
Two different formats for such a manifest are proposed below:
- a-la Software Heritage: how we would implement it on our own, without regard for compatibility with, or the stylistic choices of, other VCSs
- a-la Git: manifest implementation similar to how Git implements manifests for other DAG objects
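Whichever format is chosen, "canonical" implies that the same map must always serialize to the same bytes, so the order of entries has to be fixed. A minimal sketch, assuming one entry per line in the form `<object type> <object ID> <branch name>` and assuming entries are ordered by branch name compared byte by byte; `canonicalize_manifest` is a hypothetical helper name and the ordering rule is an assumption, not something this blueprint has settled.

```shell
#!/bin/sh
# Hypothetical helper: read manifest lines on stdin and print them in
# one fixed order, so that the same map always yields the same bytes.
# ASSUMPTION: entries are ordered by branch name (third field onward),
# compared byte by byte; LC_ALL=C keeps the order locale-independent.
canonicalize_manifest() {
    LC_ALL=C sort -t ' ' -k 3
}
```

Feeding the same entries in any order through `canonicalize_manifest` should then produce identical output, which is what makes hashing the manifest meaningful.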
a-la Software Heritage
TODO
a-la Git
TODO
In the spirit of other Git objects, snapshot objects for Git repositories can be implemented as follows.
    # create repo with some commits, branches, and tags
    $ git init test
    $ cd test/
    $ echo foo > foo.txt
    $ git add foo.txt
    $ git commit -m 'checkin foo'
    $ git branch foo
    $ echo bar >> foo.txt
    $ git commit -a -m 'add bar'
    $ git tag bar
    $ echo baz >> foo.txt
    $ git commit -a -m 'add baz'

    # ASSUMPTION: the output of git show-ref is sorted by ref name using
    # the usual Git sort algorithm for textual object manifests. This is
    # currently the case as of Git 2.8.1, but it is not documented
    # behavior in git-show-ref(1).

    # repository object in full (the manifest)
    $ git show-ref | \
      while read id ref ; do
        type=$(git cat-file -t $id)
        echo $type $id $ref
      done \
      > /tmp/snapshot-object.txt
    $ cat /tmp/snapshot-object.txt
    commit 585f6e27f540012af621a18d0155aae2a8ec0276 refs/heads/foo
    commit 6d976a397fe0b28a5bc59540e64f7f36a861af68 refs/heads/master
    commit 521cb6d728f9fa3d6c4d73ddd309c0796ddf6995 refs/tags/bar

    # repository object ID, as a Git SHA1
    $ git hash-object -w --stdin --literally -t snapshot < /tmp/snapshot-object.txt
    470d2daa27715987685708b816bf2b52ba5a47c8

    # raw content of the repository object, including Git header
    $ zlib-flate -uncompress < .git/objects/47/0d2daa27715987685708b816bf2b52ba5a47c8
    snapshot 191commit 585f6e27f540012af621a18d0155aae2a8ec0276 refs/heads/foo
    commit 6d976a397fe0b28a5bc59540e64f7f36a861af68 refs/heads/master
    commit 521cb6d728f9fa3d6c4d73ddd309c0796ddf6995 refs/tags/bar
    # i.e., a 191-byte long object of type "snapshot"
    # (note that a "\0" before the first "commit" string has been stripped)