Difference between revisions of "Virtual Software Heritage filesystem (internship)"

From Software Heritage Wiki
(fill in internship details)
(use main internship template)
Line 1: Line 1:
'''Context''': {{Internship context}}
+
{{Internship
 
+
|description=the Software Heritage [https://docs.softwareheritage.org/devel/swh-model/data-model.html data model] is a giant [https://en.wikipedia.org/wiki/Merkle_tree Merkle DAG], not unlike the [https://git-scm.com/book/en/v2/Git-Internals-Git-Objects Git data model].
'''Description''': the Software Heritage [https://docs.softwareheritage.org/devel/swh-model/data-model.html data model] is a giant [https://en.wikipedia.org/wiki/Merkle_tree Merkle DAG], not unlike the [https://git-scm.com/book/en/v2/Git-Internals-Git-Objects Git data model].
 
 
The goal of this internship is to develop a [https://en.wikipedia.org/wiki/Filesystem_in_Userspace FUSE] virtual filesystem that allows to mount parts of the Software Heritage graph on a Linux machine and navigate them as if they were locally available.
 
The filesystem backend will use the Software Heritage [https://docs.softwareheritage.org/devel/swh-storage/ storage] and/or [https://docs.softwareheritage.org/devel/swh-graph/ graph] APIs to fetch the relevant data from the archive and cache them locally as needed for efficiency reasons.
 
Use cases that will be explored are both interactive archive navigation for retrieving archived source code and VCS analysis for research purposes.
 
'''Desirable skills''' to obtain this internship:
+
|skills=
 
* Python development
 
* familiarity with Git or similar [https://en.wikipedia.org/wiki/Distributed_version_control distributed version control systems]
 
Line 14: Line 13:
 
* working knowledge of filesystem architectures and/or implementations
 
'''Workplace''': {{Internship workplace}}
+
|mentors=
 
 
'''Environment''': {{Internship environment}}
 
 
 
'''Internship mentors''':
 
 
* Antoine Pietri <antoine.pietri1@gmail.com>
 
* Stefano Zacchiroli <zack@upsilon.cc>
 
+
}}
  
 
[[Category:Available internship]]
 
[[Category:Internship]]
 
[[Category:Lang:English]]
 

Revision as of 15:34, 31 January 2020

Context: Software Heritage is an ambitious research project whose goal is to collect, preserve in the very long term, and share all publicly accessible Free/Open Source Software (FOSS) in source code form.

Description: the Software Heritage data model is a giant Merkle DAG, not unlike the Git data model.

The goal of this internship is to develop a FUSE virtual filesystem that lets users mount parts of the Software Heritage graph on a Linux machine and navigate them as if they were locally available.

The filesystem backend will use the Software Heritage storage and/or graph APIs to fetch the relevant data from the archive, caching it locally as needed for efficiency.

The use cases to be explored are interactive archive navigation, for retrieving archived source code, and VCS analysis for research purposes.
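To make the description more concrete, here is a minimal Python sketch of the two ideas it combines: content-addressed nodes in a Merkle DAG, and a backend that fetches nodes on demand and caches them locally. It is an illustration under stated assumptions, not the project's design: `CachingBackend`, `resolve`, `directory_id` and the in-memory `ARCHIVE` are hypothetical names, the directory hashing is a toy format, and no real Software Heritage API or FUSE binding is called. Only `git_blob_id` follows an existing convention, namely the way Git hashes file contents.

```python
import hashlib
import json
from typing import Callable, Dict


def git_blob_id(data: bytes) -> str:
    """Git-style identifier of file content: SHA-1 over a short header plus the bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def directory_id(entries: Dict[str, str]) -> str:
    """Toy directory identifier (NOT the real Git or Software Heritage format).

    It only shows the Merkle property: a directory's identifier depends on
    the names and identifiers of its children.
    """
    payload = json.dumps(sorted(entries.items())).encode()
    return hashlib.sha1(b"toy-dir\x00" + payload).hexdigest()


class CachingBackend:
    """Read-through cache in front of a (hypothetical) remote fetch function."""

    def __init__(self, fetch: Callable[[str], object]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, object] = {}
        self.remote_calls = 0  # counts cache misses, i.e. simulated API calls

    def get(self, object_id: str) -> object:
        if object_id not in self._cache:
            self.remote_calls += 1
            self._cache[object_id] = self._fetch(object_id)
        return self._cache[object_id]


def resolve(backend: CachingBackend, root_id: str, path: str) -> object:
    """Walk `path` starting at directory `root_id`, fetching nodes on demand."""
    node = backend.get(root_id)
    for name in filter(None, path.split("/")):
        if not isinstance(node, dict):
            raise NotADirectoryError(name)
        if name not in node:
            raise FileNotFoundError(name)
        node = backend.get(node[name])
    return node


# A tiny in-memory "archive" standing in for the remote storage/graph APIs.
# Files are stored as bytes, directories as {name: child identifier} dicts.
readme = b"hello world\n"
main_py = b"print('hi')\n"
src = {"main.py": git_blob_id(main_py)}
root = {"README": git_blob_id(readme), "src": directory_id(src)}
ARCHIVE = {
    git_blob_id(readme): readme,
    git_blob_id(main_py): main_py,
    directory_id(src): src,
    directory_id(root): root,
}
ROOT_ID = directory_id(root)

backend = CachingBackend(ARCHIVE.__getitem__)
content = resolve(backend, ROOT_ID, "src/main.py")
```

Resolving the same path a second time triggers no further fetches, because every node on the way is already cached. An actual filesystem would expose lookups like `resolve` through FUSE operations such as directory listing and file reads, which this sketch leaves out.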

Desirable skills to obtain this internship:

  • Python development
  • familiarity with Git or similar distributed version control systems

Will be considered a plus:

  • working knowledge of filesystem architectures and/or implementations

Workplace: on site at Inria Paris (contact mentors for remote opportunities)

Environment: you will work shoulder to shoulder with all members of the Software Heritage team, and you will have a chance to witness from within the construction of the great library of source code.

Internship mentors:

  • Antoine Pietri <antoine.pietri1@gmail.com>
  • Stefano Zacchiroli <zack@upsilon.cc>

See also