Expand package metadata coverage (internship)
Latest revision as of 15:07, 4 February 2024
Context: Software Heritage is an ambitious initiative whose goal is to collect, preserve forever, and make publicly available the entire body of software, in the preferred form for making modifications to it.
Description: projects in the Software Heritage archive can currently be searched either by (parts of) their URL or by package metadata. Only a limited number of package metadata formats are supported so far, including Maven, NPM, PyPI, and Gems. The goal of this internship is to extend metadata support to additional package managers, with the long-term goal of supporting all package managers indexed by Libraries.io (https://libraries.io/).
For more information on the existing tools, you can read our metadata blog post (https://www.softwareheritage.org/2019/05/28/mining-software-metadata-for-80-m-projects-and-even-more/) or dive into the technical tutorial (https://docs.softwareheritage.org/devel/swh-indexer/metadata-workflow.html#adding-support-for-additional-ecosystem-specific-metadata).
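To give a rough idea of what supporting a new package manager involves, here is a minimal sketch in Python. It assumes the target vocabulary is CodeMeta (JSON-LD) and that the manifest has already been parsed into a flat dictionary. The field names in CROSSWALK, the translate_manifest function, and the example manifest are all hypothetical, chosen for illustration. The actual mapping classes of swh-indexer are described in the technical tutorial and are not reproduced here.

```python
import json

# CodeMeta 2.0 JSON-LD context (assumed target vocabulary for this sketch).
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"

# Hypothetical crosswalk: manifest field name -> CodeMeta property name.
CROSSWALK = {
    "name": "name",
    "version": "version",
    "description": "description",
    "license": "license",
    "repository": "codeRepository",
    "homepage": "url",
}


def translate_manifest(manifest: dict) -> dict:
    """Map a flat package manifest to a CodeMeta-style JSON-LD document."""
    doc = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    for source_key, codemeta_key in CROSSWALK.items():
        value = manifest.get(source_key)
        if value:  # skip missing or empty fields
            doc[codemeta_key] = value
    # In this illustrative manifest, authors are a plain list of names.
    authors = [{"@type": "Person", "name": n} for n in manifest.get("authors", [])]
    if authors:
        doc["author"] = authors
    return doc


if __name__ == "__main__":
    # Made-up manifest, not taken from any real package.
    example = {
        "name": "demo",
        "version": "0.1.0",
        "license": "MIT",
        "repository": "https://example.org/demo.git",
        "authors": ["Jane Doe"],
    }
    print(json.dumps(translate_manifest(example), indent=2))
```

A real mapping would also need to handle nested fields, several manifest syntaxes (XML, TOML, JSON, and so on), and malformed input, which is where most of the work for each new ecosystem is likely to be.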
Desirable skills to obtain this internship:
- Python development
Will be considered a plus:
- knowledge of linked data technologies and ontologies (e.g., RDFa, JSON-LD, OWL, etc.)
Workplace: on site at Inria Paris (contact mentors for remote opportunities)
Environment: you will work shoulder to shoulder with all members of the Software Heritage team, and you will have a chance to witness from within the construction of the great library of source code.
Internship mentors:
- Morane Gruenpeter (moranegg on IRC)
- Valentin Lorentz (vlorentz on IRC)