Mine information from external sources (GSoC task)

From Software Heritage Wiki

Revision as of 10:12, 2 March 2021

Introduction

In addition to archiving source code artifacts, Software Heritage is interested in archiving metadata from external sources and correlating it with source code artifacts. This would also enable semantic searches on the archive and support scientific research.

Collecting this extrinsic metadata is a work in progress (https://forge.softwareheritage.org/T1739), and you are welcome to contribute to its implementation.

Task description

You would contribute to the design of our metadata-fetching architecture. This includes:

  • Reviewing which metadata we want to fetch
  • Designing how to fetch it efficiently at regular intervals, and how to store it
  • Implementing metadata fetching from at least one source, in a way that generalizes to other sources
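The points above can be illustrated with a minimal, hypothetical sketch in Python. It is not Software Heritage's actual design or API. Every name in it (`MetadataRecord`, `MetadataFetcher`, `StaticFetcher`, `MetadataStore`, `due_origins`, `run_once`) is invented for illustration. The store is an in-memory stand-in for a database, and "regular intervals" is assumed to mean a single fixed refresh interval. The sketch shows one way to put each external source behind a common interface, so that supporting a new source means writing one more fetcher class while the scheduling and storage code stay unchanged.

```python
from __future__ import annotations

import abc
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MetadataRecord:
    """Raw metadata about one archived origin, as returned by one source."""
    origin_url: str   # archived origin the metadata describes
    source: str       # name of the external source it came from
    fetched_at: datetime
    fmt: str          # format of the raw payload, e.g. "json"
    payload: bytes    # kept verbatim; parsing/indexing can happen later


class MetadataFetcher(abc.ABC):
    """Hypothetical per-source interface: one subclass per external source."""
    name: str

    @abc.abstractmethod
    def fetch(self, origin_url: str, now: datetime) -> list[MetadataRecord]:
        """Return the metadata this source has for `origin_url` (may be empty)."""


class StaticFetcher(MetadataFetcher):
    """Toy source backed by a dict, standing in for a real API client."""
    name = "static"

    def __init__(self, data: dict[str, bytes]) -> None:
        self.data = data

    def fetch(self, origin_url: str, now: datetime) -> list[MetadataRecord]:
        if origin_url not in self.data:
            return []
        return [MetadataRecord(origin_url, self.name, now, "json", self.data[origin_url])]


@dataclass
class MetadataStore:
    """In-memory stand-in for persistent storage."""
    records: list[MetadataRecord] = field(default_factory=list)
    # (source name, origin URL) -> time of the last fetch attempt
    last_fetch: dict[tuple[str, str], datetime] = field(default_factory=dict)

    def add(self, source: str, origin_url: str, now: datetime,
            records: list[MetadataRecord]) -> None:
        self.records.extend(records)
        # Remember the attempt even when nothing was found, so the origin
        # is not re-queried before the interval elapses.
        self.last_fetch[(source, origin_url)] = now


def due_origins(store: MetadataStore, source: str, origins: list[str],
                now: datetime, interval: timedelta) -> list[str]:
    """Origins never fetched from `source`, or last fetched >= `interval` ago."""
    due = []
    for url in origins:
        last = store.last_fetch.get((source, url))
        if last is None or now - last >= interval:
            due.append(url)
    return due


def run_once(fetchers: list[MetadataFetcher], store: MetadataStore,
             origins: list[str], now: datetime, interval: timedelta) -> int:
    """One scheduling pass; returns how many records were stored."""
    stored = 0
    for fetcher in fetchers:
        for url in due_origins(store, fetcher.name, origins, now, interval):
            records = fetcher.fetch(url, now)
            store.add(fetcher.name, url, now, records)
            stored += len(records)
    return stored


# Usage: a daily refresh over two origins, only one of which has metadata.
t0 = datetime(2021, 3, 2, tzinfo=timezone.utc)
day = timedelta(days=1)
store = MetadataStore()
fetchers = [StaticFetcher({"https://example.org/repo": b'{"stars": 3}'})]
origins = ["https://example.org/repo", "https://example.org/other"]
print(run_once(fetchers, store, origins, t0, day))                       # 1
print(run_once(fetchers, store, origins, t0 + timedelta(hours=1), day))  # 0
print(run_once(fetchers, store, origins, t0 + day, day))                 # 1
```

Recording the time of each fetch attempt, even when a source returns nothing, is one simple way to avoid hammering external services between refreshes. A real deployment would also need persistent storage, retries, rate limiting, and possibly different intervals per source, all of which this sketch leaves out.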