WG/Source Discovery and Ingestion

From Software Heritage Wiki

Latest revision as of 13:43, 31 July 2016

Charter

Collecting the source code that is publicly available is an essential part of Software Heritage's mission. To fulfill this mission, we will need to discover, harvest, and keep up to date content coming from a very diverse set of possible origins, such as:

  • several kinds of source code repositories, including:
    • mainstream development platforms, like GitHub, Bitbucket, SourceForge or CodePlex;
    • institutional forges, like Inria's, Cenatic's or Adullact's FusionForge;
    • community repositories, like Debian's FusionForge, Joomla's GForge, GNU's Savane, and the custom repositories of Apache and Eclipse;
    • a variety of source code archives, ranging from the GNU FTP server (ftp://ftp.gnu.org) to individual web pages.

This is a challenging task, and succeeding at it requires the involvement of a large community.

Mission

The SODI working group's mission is to foster the development and the adoption of software components that, for a given origin, make it discoverable and list its contents, both in its entirety (useful for a newly discovered origin) and incrementally (useful to keep the Software Heritage archive up to date with the origin's evolving content).
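The charter does not prescribe an interface for listing an origin. As a purely illustrative sketch, assume a hypothetical origin whose items each carry a monotonically increasing update counter. Under that assumption, the two listing modes could look as follows. `Entry`, `InMemoryOrigin`, `list_full`, `list_incremental` and `sync` are invented names and are not part of any Software Heritage API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """One listed item of an origin, e.g. a repository URL (hypothetical shape)."""
    url: str
    updated_at: int  # assumed monotonically increasing update counter


class InMemoryOrigin:
    """Hypothetical stand-in for a forge; not a real Software Heritage component."""

    def __init__(self, entries: List[Entry]) -> None:
        self._entries = list(entries)

    def add(self, entry: Entry) -> None:
        """Simulate new content appearing on the origin."""
        self._entries.append(entry)

    def list_full(self) -> List[Entry]:
        """Full listing, as needed for a newly discovered origin."""
        return sorted(self._entries, key=lambda e: e.updated_at)

    def list_incremental(self, since: int) -> List[Entry]:
        """Incremental listing: only entries updated after the `since` cursor."""
        return [e for e in self.list_full() if e.updated_at > since]


def sync(origin: InMemoryOrigin, cursor: Optional[int]) -> Tuple[List[Entry], Optional[int]]:
    """Return the entries to ingest and the cursor to persist for the next run."""
    if cursor is None:
        batch = origin.list_full()  # first visit: list everything
    else:
        batch = origin.list_incremental(cursor)  # later visits: only what changed
    new_cursor = max((e.updated_at for e in batch), default=cursor)
    return batch, new_cursor


if __name__ == "__main__":
    origin = InMemoryOrigin([
        Entry("https://forge.example.org/a", 1),
        Entry("https://forge.example.org/b", 2),
    ])
    batch, cursor = sync(origin, None)
    print(len(batch), cursor)  # 2 2
    origin.add(Entry("https://forge.example.org/c", 3))
    batch, cursor = sync(origin, cursor)
    print([e.url for e in batch], cursor)  # ['https://forge.example.org/c'] 3
```

Persisting the returned cursor between runs is what keeps the incremental mode cheap. A newly discovered origin is listed once in full, and later runs only fetch what changed since the stored cursor.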

Duration

This working group is open-ended.

Expected outcomes

The main expected outcomes of the SODI working group are listed below:

APIs for discovering and tracking origins: The SODI working group will define and evolve, in collaboration with the Software Heritage core team, standard APIs for software components that can be plugged into the Software Heritage infrastructure to track an origin or a class of origins. Whenever possible, proactive mechanisms that inform the Software Heritage infrastructure of content updates, such as event feeds, will be preferred over approaches that require periodically polling an origin.

Adoption: The SODI working group will strive, whenever possible, to have such components incorporated into the forges' upstream code and widely adopted.

Awareness: The SODI working group will establish the relevant connections to raise awareness among all interested parties.
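The preference for proactive mechanisms over polling can be illustrated with a minimal sketch, under stated assumptions. It uses a hypothetical forge-side `EventFeed` that pushes the URL of each updated repository to its subscribers. It also uses a polling fallback, `poll_for_changes`, which has to compare two full snapshots of repository revisions. None of these names come from the charter or from an existing Software Heritage component.

```python
from typing import Callable, Dict, List

# A handler receives the URL of a repository that has changed.
UpdateHandler = Callable[[str], None]


class EventFeed:
    """Hypothetical forge-side event feed (push model)."""

    def __init__(self) -> None:
        self._subscribers: List[UpdateHandler] = []

    def subscribe(self, handler: UpdateHandler) -> None:
        """Register a callback, e.g. one that schedules an ingestion task."""
        self._subscribers.append(handler)

    def publish(self, repo_url: str) -> None:
        """Called by the forge whenever a repository receives new content."""
        for handler in self._subscribers:
            handler(repo_url)


def poll_for_changes(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Polling fallback (pull model): diff two snapshots of url -> revision.

    Reports new or modified repositories; removed ones are not reported.
    """
    return sorted(url for url, rev in current.items() if previous.get(url) != rev)


if __name__ == "__main__":
    to_ingest: List[str] = []
    feed = EventFeed()
    feed.subscribe(to_ingest.append)  # push: the forge tells us what changed
    feed.publish("https://forge.example.org/project-a")
    print(to_ingest)  # ['https://forge.example.org/project-a']

    # pull: we must fetch and compare full snapshots ourselves
    print(poll_for_changes({"a": 1, "b": 1}, {"a": 1, "b": 2, "c": 1}))  # ['b', 'c']
```

With the feed, the archive learns about each update as it happens and fetches only that repository. Polling, by contrast, must repeatedly retrieve and compare the whole snapshot, even when nothing has changed. This cost is why event feeds are preferred whenever a forge can provide them.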

Milestones

A first set of requirements for the API, and a preliminary draft of the API itself, are expected to emerge from the work being done to ensure that Inria's own forge(s) are properly tracked in the Software Heritage architecture.

Related working groups

This working group is related to the Modeling and Ingesting Version control systems (MIV) working group.

Team contact(s)

  • Roberto Di Cosmo
  • Stefano Zacchiroli (http://upsilon.cc/~zack/)

Documents

Documents produced by the working group will be listed in this section.

Connections

Active or planned connections to other initiatives and activities will be listed in this section.

Infrastructure

Mailing list

  • https://sympa.inria.fr/sympa/info/sodi-wg-swh