Difference between revisions of "Repository snapshot objects"

From Software Heritage Wiki

Revision as of 15:21, 9 August 2016

Repository snapshot objects

A repository snapshot object is a Merkle DAG node used to capture the state of a VCS repository.

Conceptually, a snapshot object is a map from branch names to revision identifiers.
Practically, the map is serialized as an association list sorted by branch name.
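The serialization step can be sketched in shell as follows. This is a minimal illustration, not a specified format: the helper name serialize_snapshot, the "<revision ID> <branch name>" line layout (borrowed from the Git example on this page) and the byte-wise sort order are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): read
# "<revision ID> <branch name>" pairs on stdin, one per line, and emit
# them sorted by branch name (the second space-separated field).
# LC_ALL=C forces byte-wise ordering so the result is locale-independent.
serialize_snapshot() {
    LC_ALL=C sort -t ' ' -k 2
}

# Example input, deliberately out of order.
printf '%s\n' \
    'a9ae6b8e315d71b5a8aaef1e7bc5d588b2426f0e refs/tags/bar' \
    '502cb84dca18c16658fe9341a2a2b2301bba2777 refs/heads/foo' \
    '2982e6bd42c65ddbc6023861ccab67300aed41c8 refs/heads/master' \
    | serialize_snapshot
```

With this input the lines come out in the order refs/heads/foo, refs/heads/master, refs/tags/bar. Sorting before serializing means that two maps with the same entries always produce the same text, whatever order the entries were collected in.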

Object ID

The ID of a snapshot object is the cryptographic hash value of a textual serialization of its <branch name, revision ID> association list.
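As a rough sketch of how such an ID could be derived, the snippet below sorts the association list and hashes the result. The helper name snapshot_id, the placeholder revision IDs and the choice of SHA1 via sha1sum (GNU coreutils, as in the Git example on this page) are assumptions made for illustration; the text above only requires some cryptographic hash.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): read
# "<revision ID> <branch name>" pairs on stdin, sort them by branch name,
# and print the SHA1 of the sorted text as the snapshot object ID.
snapshot_id() {
    LC_ALL=C sort -t ' ' -k 2 | sha1sum | cut -d ' ' -f 1
}

# The same map given in two different orders (placeholder revision IDs).
id1=$(printf '%s\n' '1111 refs/heads/a' '2222 refs/heads/b' | snapshot_id)
id2=$(printf '%s\n' '2222 refs/heads/b' '1111 refs/heads/a' | snapshot_id)

# A different map: refs/heads/b now points to another revision.
id3=$(printf '%s\n' '1111 refs/heads/a' '3333 refs/heads/b' | snapshot_id)

echo "same map, different input order: $id1 $id2"
echo "different map:                   $id3"
```

The first two IDs are equal because the list is sorted before hashing, while the third differs because one branch points to a different revision. This is the property that lets a snapshot ID stand for the whole repository state.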

Git repository snapshot objects

In the spirit of other Git objects, a snapshot object for a Git repository can be implemented as follows.

$ git init test
Initialized empty Git repository in /home/zack/test/.git/
$ cd test/
$ echo foo > foo.txt
$ git add foo.txt 
$ git commit -m 'checkin foo'
[master (root-commit) 502cb84] checkin foo
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt
$ git branch foo
$ echo bar >> foo.txt 
$ git commit -a -m 'add bar'
[master a9ae6b8] add bar
 1 file changed, 1 insertion(+)
$ git tag bar
$ echo baz >> foo.txt 
$ git commit -a -m 'add baz'
[master 2982e6b] add baz
 1 file changed, 1 insertion(+)

$ # ASSUMPTION: the output of git show-ref is sorted by ref name using
$ # the usual Git sort algorithm for textual object manifests. This is
$ # currently the case as of Git 2.8.1, but it is not documented
$ # behavior in git-show-ref(1).

$ # object content in full (the manifest)
$ git show-ref > snapshot-object.txt
$ sha1sum snapshot-object.txt | cut -f 1 -d' ' > snapshot-object.id.txt

$ # object ID (the SHA1 of the manifest, as computed by sha1sum above)
$ cat snapshot-object.id.txt
11ef7425e723ae003eaa120cc0a5d5f9b6274457

$ cat snapshot-object.txt 
502cb84dca18c16658fe9341a2a2b2301bba2777 refs/heads/foo
2982e6bd42c65ddbc6023861ccab67300aed41c8 refs/heads/master
a9ae6b8e315d71b5a8aaef1e7bc5d588b2426f0e refs/tags/bar
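The session above relies on git show-ref happening to emit refs in sorted order. One possible way to avoid that assumption is to request the order explicitly with git for-each-ref, whose --sort and --format options are documented. The sketch below is an alternative under stated assumptions, not the method used above: the function names are hypothetical, and the output has not been checked to be byte-identical to git show-ref in every repository.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): print one
# "<object ID> <ref name>" line per ref, explicitly sorted by ref name.
git_snapshot_manifest() {
    git for-each-ref --sort=refname --format='%(objectname) %(refname)'
}

# Hypothetical helper: SHA1 of the manifest, computed with sha1sum as in
# the session above.
git_snapshot_id() {
    git_snapshot_manifest | sha1sum | cut -d ' ' -f 1
}

# Print the manifest and its ID only when run inside a Git repository.
if git rev-parse --git-dir >/dev/null 2>&1; then
    git_snapshot_manifest
    git_snapshot_id
fi
```

Run from within the test repository created above, this should list the same three refs (refs/heads/foo, refs/heads/master, refs/tags/bar) followed by the manifest's SHA1.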