Difference between revisions of "WG/Source Discovery and Ingestion"
= Charter =
Collecting publicly available source code is an essential part of Software Heritage's mission.
To fulfill this mission, we need to discover, harvest, and keep up to date content coming from a very diverse
set of possible origins, such as:
* several kinds of [https://en.wikipedia.org/wiki/Comparison_of_source_code_hosting_facilities source code repositories], including:
** mainstream development platforms, like GitHub, Bitbucket, SourceForge or CodePlex;
** institutional forges, like Inria's, Cenatic's or Adullact's FusionForge;
** community repositories, like Debian's FusionForge, Joomla's GForge, GNU's [http://savannah.gnu.org/ Savane], and the [http://www.apache.org/dyn/closer.cgi/ Apache] or [https://git.eclipse.org/c/ Eclipse] custom repositories;
* a variety of source code archives, ranging from the [ftp://ftp.gnu.org GNU ftp server] to individual web pages.
This is a challenging task, and succeeding at it requires the involvement of a large community.
== Mission ==
The SODI working group's mission is to foster ''the development'' and
''the adoption'' of software components that can, for a given origin, make it
''discoverable'' and ''list'' its contents, both ''in its entirety'', which is
useful for a newly discovered origin, and ''incrementally'', which is useful
to keep the Software Heritage archive up to date with the origin's evolving
content.
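
As a rough illustration of the two listing modes described above, the following Python sketch shows one possible shape for such a component. It is not the working group's API: the names <code>OriginEntry</code>, <code>OriginLister</code>, <code>InMemoryLister</code> and <code>sync</code>, as well as the integer <code>last_update</code> cursor, are assumptions made purely for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class OriginEntry:
    """One piece of content found at an origin (hypothetical record)."""
    url: str
    last_update: int  # illustrative logical timestamp, not a real field


class OriginLister(ABC):
    """Hypothetical interface for a component that tracks one origin."""

    @abstractmethod
    def list_full(self) -> Iterator[OriginEntry]:
        """List the origin's contents in their entirety."""

    @abstractmethod
    def list_incremental(self, since: int) -> Iterator[OriginEntry]:
        """List only the contents updated after the `since` cursor."""


class InMemoryLister(OriginLister):
    """Toy lister backed by a fixed list, standing in for a real forge."""

    def __init__(self, entries: Iterable[OriginEntry]) -> None:
        self._entries = list(entries)

    def list_full(self) -> Iterator[OriginEntry]:
        yield from self._entries

    def list_incremental(self, since: int) -> Iterator[OriginEntry]:
        return (e for e in self._entries if e.last_update > since)


def sync(lister: OriginLister,
         cursor: Optional[int]) -> Tuple[List[OriginEntry], Optional[int]]:
    """Full listing for a new origin (no cursor), incremental otherwise.

    Returns the entries to ingest and the cursor to store for next time.
    """
    if cursor is None:
        entries = list(lister.list_full())
    else:
        entries = list(lister.list_incremental(cursor))
    new_cursor = max((e.last_update for e in entries), default=cursor)
    return entries, new_cursor
```

In this sketch, a newly discovered origin is synchronised with <code>sync(lister, None)</code>, and later runs pass back the returned cursor so that only newer entries are listed.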
== Duration ==
This working group is open-ended.
== Expected outcomes ==
The main expected outcomes of the SODI working group are listed below:
''APIs for discovering and tracking origins''
The SODI working group will define and evolve, in collaboration with the Software Heritage core team, standard APIs for software components that can be plugged into the Software Heritage infrastructure to track an origin or a class of origins. Whenever possible, proactive mechanisms for informing the Software Heritage infrastructure of content updates, like event feeds, will be preferred to approaches that require periodic polling of an origin.
''Adoption''
The SODI working group will strive, whenever possible, to have such components incorporated in the upstream code of the forges, and adopted widely.
''Awareness''
The SODI working group will establish the relevant connections in order to raise awareness among all the interested parties.
== Milestones ==
A first set of requirements for the API and a preliminary draft of the API are
expected to emerge from the work performed to ensure that Inria's own forge(s)
are properly tracked in the Software Heritage architecture.
== Related working groups ==
This working group is related to: [[Modeling_and_Ingesting_Version_control_systems | Modeling and Ingesting Version control systems (MIV)]]
= Team contact(s) =
* [http://upsilon.cc/~zack/ Stefano Zacchiroli]

= Documents =
Documents produced by the working group will be listed in this section.

= Connections =
Active or planned connections to other initiatives and activities will be listed in this section.

= Infrastructure =
== Mailing list ==
* https://sympa.inria.fr/sympa/info/sodi-wg-swh


[[Category:Working group]]
Latest revision as of 13:43, 31 July 2016