Difference between revisions of "Virtual Software Heritage filesystem (internship)"

From Software Heritage Wiki
(fill in internship details)
Line 1: Line 1:
 
'''Context''': {{Internship context}}
 
  
'''Description''': ''TO BE FILLED IN''
+
'''Description''': the Software Heritage [https://docs.softwareheritage.org/devel/swh-model/data-model.html data model] is a giant [https://en.wikipedia.org/wiki/Merkle_tree Merkle DAG], not unlike the [https://git-scm.com/book/en/v2/Git-Internals-Git-Objects Git data model].
 +
The goal of this internship is to develop a [https://en.wikipedia.org/wiki/Filesystem_in_Userspace FUSE] virtual filesystem that lets users mount parts of the Software Heritage graph on a Linux machine and navigate them as if they were locally available.
 +
The filesystem backend will use the Software Heritage [https://docs.softwareheritage.org/devel/swh-storage/ storage] and/or [https://docs.softwareheritage.org/devel/swh-graph/ graph] APIs to fetch the relevant data from the archive and cache them locally as needed for efficiency reasons.
 +
Use cases that will be explored are both interactive archive navigation for retrieving archived source code and VCS analysis for research purposes.
  
 
'''Desirable skills''' to obtain this internship:
 
''TO BE FILLED IN''
+
* Python development
 +
* familiarity with Git or similar [https://en.wikipedia.org/wiki/Distributed_version_control distributed version control systems]
 +
* familiarity with system level programming (C language, POSIX API)
 +
 
 +
Will be considered a plus:
 +
* working knowledge of filesystem architectures and/or implementations
  
 
'''Workplace''': {{Internship workplace}}
 
Line 11: Line 19:
  
 
'''Internship mentors''':
 
''TO BE FILLED IN''
+
* Antoine Pietri <antoine.pietri1@gmail.com>
 
+
* Stefano Zacchiroli <zack@upsilon.cc>
  
  

Revision as of 10:25, 29 January 2020

Context: Software Heritage is an ambitious research project whose goal is to collect, preserve in the very long term, and share the whole publicly accessible Free/Open Source Software (FOSS) in source code form.

Description: the Software Heritage data model is a giant Merkle DAG, not unlike the Git data model. The goal of this internship is to develop a FUSE virtual filesystem that lets users mount parts of the Software Heritage graph on a Linux machine and navigate them as if they were locally available. The filesystem backend will use the Software Heritage storage and/or graph APIs to fetch the relevant data from the archive and cache it locally as needed for efficiency. Two use cases will be explored: interactive archive navigation for retrieving archived source code, and VCS analysis for research purposes.
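
To make the idea concrete, here is a minimal Python sketch of path lookup over a Merkle DAG with a local cache. It is only an illustration under stated assumptions: the in-memory ARCHIVE dictionary stands in for the remote archive, the "kind, NUL byte, payload" object encoding and the use of SHA-1 are simplifications rather than the Software Heritage identifier scheme, and put_file, put_dir, fetch and read_path are hypothetical helper names, not part of the storage or graph APIs. A real FUSE layer would call something like read_path from its filesystem callbacks.

```python
import hashlib
import json
from functools import lru_cache

# Hypothetical in-memory stand-in for the remote archive: object id -> raw bytes.
ARCHIVE = {}

# Ids actually requested from the "archive", kept only to make cache hits visible.
FETCHES = []


def _put(kind, payload):
    """Store an object under the hash of its kind and payload; return its id."""
    data = kind.encode() + b"\0" + payload
    oid = hashlib.sha1(data).hexdigest()
    ARCHIVE[oid] = data
    return oid


def put_file(content):
    """Add a file object; identical contents always get the same id."""
    return _put("file", content)


def put_dir(entries):
    """Add a directory object mapping entry names to child ids."""
    # Sorted keys make the serialization, and therefore the id, deterministic.
    return _put("dir", json.dumps(entries, sort_keys=True).encode())


@lru_cache(maxsize=1024)
def fetch(oid):
    """Fetch and decode one object; repeated calls are served from the cache."""
    FETCHES.append(oid)
    kind, _, payload = ARCHIVE[oid].partition(b"\0")
    return kind.decode(), payload


def read_path(root, path):
    """Walk the DAG from the root directory id and return the file's bytes."""
    oid = root
    for name in (part for part in path.split("/") if part):
        kind, payload = fetch(oid)
        if kind != "dir":
            raise NotADirectoryError(name)
        oid = json.loads(payload)[name]
    kind, payload = fetch(oid)
    if kind != "file":
        raise IsADirectoryError(path)
    return payload
```

For example, after building a tree with root = put_dir({"README": put_file(b"hello\n")}), calling read_path(root, "README") returns the file contents, and a second call is answered entirely from the cache. Because a directory's id depends on the ids of its children, any change to a file changes the id of every directory above it, which is the property that lets a Merkle DAG deduplicate and verify content.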

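Desirable skills to obtain this internship:

  • Python development
  • familiarity with Git or similar distributed version control systems
  • familiarity with system-level programming (C language, POSIX API)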

Will be considered a plus:

  • working knowledge of filesystem architectures and/or implementations

Workplace: on site at Inria Paris (contact mentors for remote opportunities)

Environment: you will work shoulder to shoulder with all members of the Software Heritage team, and you will have a chance to witness from within the construction of the great library of source code.

Internship mentors:

  • Antoine Pietri <antoine.pietri1@gmail.com>
  • Stefano Zacchiroli <zack@upsilon.cc>