Difference between revisions of "Google Summer of Code 2019/Graph compression"

From Software Heritage Wiki
Jump to: navigation, search
(Add graph compression subject)
 
Line 1: Line 1:
= Subject =
+
* '''Title:''' graph compression
 +
* '''Description:''' The Software Heritage [https://docs.softwareheritage.org/devel/swh-model/data-model.html data model] is a big [https://en.wikipedia.org/wiki/Merkle_tree Merkle] DAG made of nodes like revisions, releases, directories, etc. It is a very big graph, with ~10 B nodes and ~100 B edges, which makes it hard to fit in memory using naive approaches. Graph compression techniques have been successfully used to compress the Web graph (which is slightly larger than the Software Heritage one) and make it fit in memory. The goal of this GSoC project is review existing graph compression techniques and apply the most appropriate one to the Software Heritage case, enabling in-memory processing of its Merkle DAG.
 +
* '''Student:''' Thibault Allançon
 +
* '''Mentors:'''
 +
** [[User:StefanoZacchiroli|Stefano Zacchiroli]]
 +
** Antoine Pietri
 +
* '''Weekly reports:'''
 +
** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-05/msg00000.html 2019/19]
  
[https://www.softwareheritage.org/ Software Heritage] is an ambitious research project whose goal is to collect, preserve in the very long term, and share the whole publicly accessible Free/Open Source Software (FOSS) in source code form.
+
== Links ==
  
The Software Heritage [https://docs.softwareheritage.org/devel/swh-model/data-model.html data model] is a big [https://en.wikipedia.org/wiki/Merkle_tree Merkle] DAG made of nodes like revisions, releases, directories, etc. It is a very big graph, with ~10 B nodes and ~100 B edges, which makes it hard to fit in memory using naive approaches. Graph compression techniques have been successfully used to compress the Web graph (which is slightly larger than the Software Heritage one) and make it fit in memory. The goal of this GSoC is review existing graph compression techniques and apply the most appropriate one to the Software Heritage case, enabling in-memory processing of its Merkle DAG.
+
* [https://forge.softwareheritage.org/source/graph-compression/ source code repository]
 +
* see project [https://summerofcode.withgoogle.com/projects/#4670175868092416 on the GSoC portal]

Revision as of 13:38, 15 May 2019

  • Title: graph compression
  • Description: The Software Heritage data model is a big Merkle DAG made of nodes like revisions, releases, directories, etc. It is a very big graph, with ~10 B nodes and ~100 B edges, which makes it hard to fit in memory using naive approaches. Graph compression techniques have been successfully used to compress the Web graph (which is slightly larger than the Software Heritage one) and make it fit in memory. The goal of this GSoC project is review existing graph compression techniques and apply the most appropriate one to the Software Heritage case, enabling in-memory processing of its Merkle DAG.
  • Student: Thibault Allançon
  • Mentors:
  • Weekly reports:

Links