Difference between revisions of "WG/Source Discovery and Ingestion"

From Software Heritage Wiki
(First version of the complete description of the SODI working group, including charter and expected outcomes)
 +
= Working group on Source Discovery and Ingestion =
 
== Charter ==
 
 +
Collecting the source code that is publicly available is an essential part of Software Heritage's mission.
 +
To fulfill this mission, we will need to discover, harvest, and keep up to date content coming from a very diverse
 +
set of possible origins, such as:
  
This working group, code-named SODI, is in charge of issues related to adding external sources of content to the archive and keeping them up to date.
+
* several kinds of [https://en.wikipedia.org/wiki/Comparison_of_source_code_hosting_facilities source code repositories], including:
It is expected to work on both technical and non-technical aspects.
+
** mainstream development platforms, like GitHub, Bitbucket, SourceForge or CodePlex;
 +
** institutional forges, like Inria's, Cenatic's or Adullact's FusionForge;
 +
** community repositories like Debian's FusionForge or Joomla's gForge, GNU's [http://savannah.gnu.org/ Savane], and the [http://www.apache.org/dyn/closer.cgi/ Apache] or [https://git.eclipse.org/c/ Eclipse] custom repositories;
 +
** a variety of different source code archives, ranging from the [ftp://ftp.gnu.org GNU FTP server] to individual web pages
  
=== Technical aspects ===
+
This is a challenging task, and success requires the involvement of a large community.
* an API for the archive to interact with an external source (forge, repository, web page, etc.) in order to perform at least the following tasks:
 
** list the full content of a source
 
** get the list of updates in a source since a given moment in time
 
* mechanisms for discovering new sources
 
  
=== Non technical aspects ===
+
=== Mission ===
 +
The SODI working group's mission is to foster ''the development'' and
 +
''the adoption'' of software components that can, for a given origin, make it
 +
''discoverable'', and ''list'' its contents, both ''in its entirety'', which is
 +
useful for a newly discovered origin, and ''incrementally'', which is useful
 +
to keep the Software Heritage archive up to date with the origin's evolving
 +
content.
  
The working group members will help raise awareness and foster broad adoption of the APIs and mechanisms defined.
+
=== Duration ===
The main objective is to make sure that there will be numerous external contributors willing to write software components adapted to the different kinds of existing sources of content.
+
This working group is open-ended.
  
== Group animation ==
+
=== Expected outcomes ===
 +
The main expected outcomes of the SODI working group are listed below:
  
== Resources ==
+
''APIs for discovering and tracking origins''
 +
The SODI working group will define and evolve, in collaboration with the Software Heritage core team, standard APIs for software components that can be plugged into the Software Heritage infrastructure to track a (class of) origin(s). Whenever possible, proactive mechanisms for informing the Software Heritage infrastructure of content updates, like event feeds, will be preferred to approaches that require periodic polling of an origin.
  
Mailing list: [https://sympa.inria.fr/sympa/info/sodi-wg-swh sodi-wg-swh]
+
''Adoption''
 +
The SODI working group will strive, whenever possible, to have such components incorporated in the upstream code of the forges, and adopted widely.
  
 +
''Awareness''
 +
The SODI working group will establish the relevant connections in order to raise awareness among all the interested parties.
  
 +
=== Milestones ===
 +
A first set of requirements for the API, and a preliminary draft of the API, are
 +
expected to emerge from the work performed to ensure that Inria's own forge(s)
 +
are properly tracked in the Software Heritage architecture.
  
 +
=== Related working groups ===
 +
This working group is related to: [[Modeling_and_Ingesting_Version_control_systems | Modeling and Ingesting Version control systems (MIV)]]
 +
 +
== Team contact(s) ==
 +
* Roberto Di Cosmo
 +
 +
* Stefano Zacchiroli
 +
 +
== Documents ==
 +
Documents produced by the working group will be listed in this section.
 +
 +
== Connections ==
 +
Active or planned connections to other initiatives, and activities, will be listed in this section.
 +
 +
== Infrastructure ==
 +
=== Mailing list ===
 +
* https://sympa.inria.fr/sympa/info/sodi-wg-swh
  
  
 
[[Category: technical]]
 
 
[[Category: ingestion]]
 

Revision as of 15:53, 25 March 2016

Working group on Source Discovery and Ingestion

Charter

Collecting the source code that is publicly available is an essential part of Software Heritage's mission. To fulfill this mission, we will need to discover, harvest, and keep up to date content coming from a very diverse set of possible origins, such as:

  • several kinds of source code repositories, including:
    • mainstream development platforms, like GitHub, Bitbucket, SourceForge or CodePlex;
    • institutional forges, like Inria's, Cenatic's or Adullact's FusionForge;
    • community repositories like Debian's FusionForge or Joomla's gForge, GNU's Savane, and the Apache or Eclipse custom repositories;
    • a variety of different source code archives, ranging from the GNU FTP server to individual web pages

This is a challenging task, and success requires the involvement of a large community.

Mission

The SODI working group's mission is to foster the development and the adoption of software components that can, for a given origin, make it discoverable, and list its contents, both in its entirety, which is useful for a newly discovered origin, and incrementally, which is useful to keep the Software Heritage archive up to date with the origin's evolving content.
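
As an illustration only, the two listing modes described above (a full listing for a newly discovered origin, and an incremental listing to keep the archive up to date) could be sketched as a small Python interface. Every name here (`Entry`, `OriginLister`, `list_all`, `list_since`, `InMemoryLister`) and the example URLs are hypothetical; none of them are defined by the working group or by Software Heritage.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List


@dataclass(frozen=True)
class Entry:
    """One item hosted by an origin (hypothetical shape)."""
    url: str
    last_update: datetime


class OriginLister(ABC):
    """Hypothetical interface for a component that tracks one origin."""

    @abstractmethod
    def list_all(self) -> Iterator[Entry]:
        """Yield every entry the origin hosts (full listing)."""

    @abstractmethod
    def list_since(self, moment: datetime) -> Iterator[Entry]:
        """Yield only entries updated strictly after `moment`."""


class InMemoryLister(OriginLister):
    """Toy lister backed by a list, standing in for a real forge."""

    def __init__(self, entries: List[Entry]) -> None:
        self._entries = list(entries)

    def list_all(self) -> Iterator[Entry]:
        yield from self._entries

    def list_since(self, moment: datetime) -> Iterator[Entry]:
        return (e for e in self._entries if e.last_update > moment)


if __name__ == "__main__":
    utc = timezone.utc
    lister = InMemoryLister([
        Entry("https://forge.example.org/a.git", datetime(2016, 1, 10, tzinfo=utc)),
        Entry("https://forge.example.org/b.git", datetime(2016, 3, 20, tzinfo=utc)),
    ])
    # First visit: everything. Later visits: only what changed.
    print([e.url for e in lister.list_all()])
    print([e.url for e in lister.list_since(datetime(2016, 3, 1, tzinfo=utc))])
```

Keeping the two operations separate means an origin that cannot answer "what changed since T" cheaply can still be supported through repeated full listings, at a higher cost.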

Duration

This working group is open-ended.

Expected outcomes

The main expected outcomes of the SODI working group are listed below:

APIs for discovering and tracking origins: The SODI working group will define and evolve, in collaboration with the Software Heritage core team, standard APIs for software components that can be plugged into the Software Heritage infrastructure to track a (class of) origin(s). Whenever possible, proactive mechanisms for informing the Software Heritage infrastructure of content updates, like event feeds, will be preferred to approaches that require periodic polling of an origin.
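
The stated preference for event feeds over periodic polling can be illustrated with a minimal Python sketch. It assumes a hypothetical `OriginTracker` that records updates pushed by an origin through `on_event`, and falls back to calling a `poll` function only when the origin offers no feed. All names and the data shapes are assumptions made for illustration, not an API defined by the working group.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Tuple

# (entry identifier, time of update); a hypothetical shape
Update = Tuple[str, datetime]


@dataclass
class OriginTracker:
    """Hypothetical tracker: push events preferred, polling as fallback."""
    poll: Callable[[datetime], Iterable[Update]]
    has_event_feed: bool = False
    seen: Dict[str, datetime] = field(default_factory=dict)
    last_sync: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def on_event(self, entry_id: str, when: datetime) -> None:
        """Push path: the origin notifies us as soon as something changes."""
        self._record(entry_id, when)

    def sync(self, now: datetime) -> int:
        """Polling path: ask the origin for updates since the last sync.

        Skipped when the origin pushes events. Returns the number of
        updates fetched by polling.
        """
        if self.has_event_feed:
            return 0
        fetched = 0
        for entry_id, when in self.poll(self.last_sync):
            self._record(entry_id, when)
            fetched += 1
        self.last_sync = now
        return fetched

    def _record(self, entry_id: str, when: datetime) -> None:
        # Keep only the most recent update time per entry.
        previous = self.seen.get(entry_id)
        if previous is None or when > previous:
            self.seen[entry_id] = when
```

With a feed, the cost of tracking is proportional to the number of changes; with polling, every `sync` call costs a round trip to the origin even when nothing has changed.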

Adoption: The SODI working group will strive, whenever possible, to have such components incorporated in the upstream code of the forges, and adopted widely.

Awareness: The SODI working group will establish the relevant connections in order to raise awareness among all the interested parties.

Milestones

A first set of requirements for the API, and a preliminary draft of the API, are expected to emerge from the work performed to ensure that Inria's own forge(s) are properly tracked in the Software Heritage architecture.

Related working groups

This working group is related to: Modeling and Ingesting Version control systems (MIV)

Team contact(s)

  • Roberto Di Cosmo
  • Stefano Zacchiroli

Documents

Documents produced by the working group will be listed in this section.

Connections

Active or planned connections to other initiatives, and activities, will be listed in this section.

Infrastructure

Mailing list

  • https://sympa.inria.fr/sympa/info/sodi-wg-swh