Virtual Software Heritage filesystem (internship)
Context: Software Heritage is an ambitious initiative whose goal is to collect, preserve forever, and make publicly available the entire body of software, in the preferred form for making modifications to it.
Description: the Software Heritage data model is a giant Merkle DAG, not unlike the Git data model. The goal of this internship is to develop a FUSE virtual filesystem that lets users mount parts of the Software Heritage graph on a Linux machine and navigate them as if they were locally available. The filesystem backend will use the Software Heritage storage and/or graph APIs to fetch the relevant data from the archive and cache it locally as needed for efficiency. The use cases to be explored are interactive archive navigation, for retrieving archived source code, and VCS analysis for research purposes.
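To make the Merkle DAG idea and the local cache more concrete, here is a minimal Python sketch. It is not the Software Heritage implementation. blob_id follows the well-known Git blob hashing scheme; directory_id is a deliberately simplified encoding (not the real Git tree format); CachingStore and the fetch callback are hypothetical names standing in for whatever archive API the filesystem backend would call.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Git-style content identifier: SHA-1 over a "blob <size>" header,
    # a NUL byte, and then the raw file content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def directory_id(entries: dict) -> str:
    # Simplified directory identifier (assumption: NOT the real Git tree
    # encoding). `entries` maps an entry name to the hex id of its child.
    # Because child ids are hashed into the parent, any change in a file
    # changes the id of every directory above it: the Merkle property.
    h = hashlib.sha1()
    for name in sorted(entries):
        h.update(name.encode("utf-8") + b"\0" + bytes.fromhex(entries[name]))
    return h.hexdigest()


class CachingStore:
    # Hypothetical local cache in front of a remote archive. `fetch` is any
    # callable taking an object id and returning its bytes. Objects in a
    # Merkle DAG are immutable, so cached entries never need invalidation.
    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.remote_calls = 0

    def get(self, object_id: str) -> bytes:
        if object_id not in self._cache:
            self.remote_calls += 1
            self._cache[object_id] = self._fetch(object_id)
        return self._cache[object_id]


# Tiny in-memory "archive" used only for this illustration.
readme = b"hello\n"
archive = {blob_id(readme): readme}
store = CachingStore(archive.__getitem__)

root_v1 = directory_id({"README": blob_id(readme)})
root_v2 = directory_id({"README": blob_id(b"changed\n")})
```

In this sketch, reading the same object twice through store.get reaches the archive only once, and root_v1 differs from root_v2 because the content of README differs. A FUSE backend could rely on the same two properties: identifiers fully determine content, so lookups can be served from a local cache indefinitely.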
Desirable skills for this internship:
- Python development
- familiarity with Git or similar distributed version control systems
- familiarity with system level programming (C language, POSIX API)
Considered a plus:
- working knowledge of filesystem architectures and/or implementations
Workplace: on site at Inria Paris (contact mentors for remote opportunities)
Environment: you will work shoulder to shoulder with all members of the Software Heritage team, and you will have a chance to witness from within the construction of the great library of source code.
Internship mentors:
- Antoine Pietri
- Stefano Zacchiroli <zack@upsilon.cc>