Integrate Software Heritage and ClearlyDefined (internship)
Context: Software Heritage is an ambitious research project whose goal is to collect, preserve for the very long term, and share all publicly accessible Free/Open Source Software (FOSS) in source code form.
Description: ClearlyDefined is a project whose goal is to collaboratively and semi-automatically curate information about Free/Open Source Software (FOSS) projects, including licensing and vulnerability information. As one of its main outputs, ClearlyDefined maintains an open data knowledge base that cross-references FOSS source code artifacts found in version control systems, package repositories, etc. with curated information about their licenses and vulnerabilities. The same source code artifacts are archived by Software Heritage for long-term preservation purposes. The goal of this internship is to integrate ClearlyDefined and Software Heritage, for mutual benefit. Software Heritage will benefit from mirroring ClearlyDefined data, making it possible to query that data both while navigating the archive and at scale; ClearlyDefined will benefit from learning about the existence of FOSS projects that have not yet been analyzed for "clarity".
Desirable skills to obtain this internship:
- experience with database management systems (of any kind)
Workplace: on site at Inria Paris (contact mentors for remote opportunities)
Mentors:
- Philippe Ombredanne <firstname.lastname@example.org> (nexB)
- Stefano Zacchiroli <email@example.com> (Software Heritage)