Graph query language for the archive (internship)
Revision as of 10:33, 2 March 2021
Context: Software Heritage is an ambitious initiative whose goal is to collect, preserve forever, and make publicly available the entire body of software, in the preferred form for making modifications to it.
Description: The Software Heritage archive is structured as a graph (specifically, a Merkle DAG) and is huge: tens of billions of nodes and hundreds of billions of edges. It has recently been verified that a compressed representation of the graph structure can fit in memory, whereas node/edge properties can be memory-mapped from secondary storage (see the documentation, and in particular the SANER 2020 paper referenced there). An ad hoc API is available to traverse the graph, but it offers very limited querying capabilities. The goal of this internship is to experiment with querying the archive graph via state-of-the-art graph query languages. The ideal outcome will be a prototype query engine that answers queries on top of the compressed graph representation plus the associated property maps. The tentative target platform for the implementation is Neo4j extended with a compatibility layer for WebGraph ImmutableGraph, but other platforms might be considered during the internship.
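To make the idea of "a query on top of the graph structure plus property maps" more concrete, here is a minimal, self-contained sketch. It is purely illustrative: the class ArchiveGraphSketch, the NodeType enum, the toy graph, and the method reachableOfType are all hypothetical names invented for this example, and the code uses neither the actual Software Heritage API, nor WebGraph, nor Neo4j. It assumes the structure is stored as plain adjacency arrays and that node types live in a separate array indexed by node id, and it answers one fixed query: "which nodes of a given type are reachable from a given node?"

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch; not the Software Heritage, WebGraph, or Neo4j API. */
public class ArchiveGraphSketch {
    // Illustrative node types, loosely inspired by a source-code archive.
    enum NodeType { SNAPSHOT, REVISION, DIRECTORY, CONTENT }

    // Graph structure in compressed-sparse-row form: the successors of node v
    // are targets[offsets[v]] .. targets[offsets[v + 1] - 1].
    private final int[] offsets;
    private final int[] targets;
    // Property map kept apart from the structure, indexed by node id.
    private final NodeType[] types;

    ArchiveGraphSketch(int[] offsets, int[] targets, NodeType[] types) {
        this.offsets = offsets;
        this.targets = targets;
        this.types = types;
    }

    /** Ids of all nodes of the wanted type reachable from src, in BFS order. */
    List<Integer> reachableOfType(int src, NodeType wanted) {
        boolean[] seen = new boolean[types.length];
        Deque<Integer> queue = new ArrayDeque<>();
        List<Integer> result = new ArrayList<>();
        seen[src] = true;
        queue.add(src);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            // The filter consults the property map, not the structure.
            if (types[v] == wanted) {
                result.add(v);
            }
            for (int i = offsets[v]; i < offsets[v + 1]; i++) {
                int w = targets[i];
                if (!seen[w]) {
                    seen[w] = true;
                    queue.add(w);
                }
            }
        }
        return result;
    }

    /** Toy DAG: 0 -> 1; 1 -> 2, 3; 2 -> 3; 3 -> 4, 5 (node 3 is shared). */
    static ArchiveGraphSketch example() {
        int[] offsets = {0, 1, 3, 4, 6, 6, 6};
        int[] targets = {1, 2, 3, 3, 4, 5};
        NodeType[] types = {
            NodeType.SNAPSHOT, NodeType.REVISION, NodeType.REVISION,
            NodeType.DIRECTORY, NodeType.CONTENT, NodeType.CONTENT
        };
        return new ArchiveGraphSketch(offsets, targets, types);
    }

    public static void main(String[] args) {
        System.out.println(example().reachableOfType(0, NodeType.CONTENT));
    }
}
```

On the toy graph, asking for CONTENT nodes reachable from node 0 yields nodes 4 and 5. A declarative graph query language would let users express such a traversal-plus-filter without writing the loop by hand, which is the gap the internship aims to explore.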
Desirable skills to obtain this internship:
- Java development
- Query languages for structured, semi-structured, or graph data (e.g., one or more among: SQL, XQuery, GraphQL, SPARQL, GQL, etc.)
The following will be considered a plus:
- Experience with graph databases (e.g., Neo4j)
Workplace: LIRIS (Univ. Lyon 1, Lyon) or Inria Paris; contact the mentors about remote opportunities.
Environment: you will work shoulder to shoulder with all members of the Software Heritage team, and you will have a chance to witness from within the construction of the great library of source code.
Internship mentors:
- Angela Bonifati <angela.bonifati@univ-lyon1.fr>
- Stefano Zacchiroli <zack@upsilon.cc> (zack on IRC)