Mine information from external sources (GSoC task)
Edit by LibEqualizer (Task description: added subsection for Possible metadata sources)
Revision as of 19:51, 11 April 2021
Introduction
In addition to archiving source code artifacts, Software Heritage is interested in archiving metadata from external sources and correlating it with source code artifacts. This would also enable semantic searches on the archive and support scientific research.
Collecting this extrinsic metadata is a work in progress, and you are welcome to contribute to its implementation.
Task description
You would contribute to the design of our metadata-fetching architecture. This includes:
- Reviewing what metadata we want to fetch
- Working out how to fetch it efficiently at regular intervals and store it
- Implementing metadata fetching from at least one source, in a way that can be generalized to other sources
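
To make the last two points more concrete, here is a minimal Python sketch of one possible shape for such a fetcher. It is not Software Heritage's actual architecture: the names `MetadataSource`, `InMemorySource`, `MetadataStore`, `RawMetadata` and `refresh` are all hypothetical and introduced only for illustration. The idea is that each external source implements a single small interface, while the scheduling and storage logic is written once and shared.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RawMetadata:
    """One fetched metadata record (hypothetical type for this sketch)."""
    source: str           # name of the external source
    origin_url: str       # software origin the metadata describes
    fetched_at: datetime  # when the record was retrieved
    payload: bytes        # kept unparsed so it can be reinterpreted later


class MetadataSource(ABC):
    """Interface every external source would implement (an assumption)."""
    name: str

    @abstractmethod
    def fetch(self, origin_url: str) -> Optional[bytes]:
        """Return raw metadata for an origin, or None if the source has none."""


class InMemorySource(MetadataSource):
    """Stand-in source for the sketch; a real one would query a remote service."""
    name = "in-memory-demo"

    def __init__(self, records: Dict[str, bytes]) -> None:
        self._records = records

    def fetch(self, origin_url: str) -> Optional[bytes]:
        return self._records.get(origin_url)


class MetadataStore:
    """Append-only store keyed by (source name, origin URL)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], List[RawMetadata]] = {}

    def add(self, item: RawMetadata) -> None:
        key = (item.source, item.origin_url)
        self._entries.setdefault(key, []).append(item)

    def latest(self, source: str, origin_url: str) -> Optional[RawMetadata]:
        history = self._entries.get((source, origin_url))
        return history[-1] if history else None


def refresh(
    source: MetadataSource,
    store: MetadataStore,
    origin_urls: Iterable[str],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> int:
    """Fetch metadata for origins whose stored copy is missing or stale.

    Returns the number of new records stored. Meant to be called at
    regular intervals by whatever scheduler is in use.
    """
    now = now or datetime.now(timezone.utc)
    stored = 0
    for url in origin_urls:
        previous = store.latest(source.name, url)
        if previous is not None and now - previous.fetched_at < max_age:
            continue  # stored copy is still fresh; skip the fetch
        payload = source.fetch(url)
        if payload is None:
            continue  # the source knows nothing about this origin
        store.add(RawMetadata(source.name, url, now, payload))
        stored += 1
    return stored
```

Under these assumptions, supporting another source would mean writing one more `MetadataSource` subclass, while `refresh` and the store stay unchanged. Storing the payload unparsed is a design choice of this sketch: it lets the interpretation of the metadata evolve without fetching it again.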
Possible metadata sources
- the [FSF Directory](https://directory.fsf.org)
Desirable skills
- Python 3 and Git are a must to work on any Software Heritage project
- Prior experience in working with software metadata is a plus, but not required