Difference between revisions of "Google Summer of Code 2019/Increase archive coverage"

From Software Heritage Wiki
Line 1: Line 1:
 
* '''Title:''' Increase archive coverage
 
* '''Title:''' Increase archive coverage
* '''Description:''' As Software Heritage works on archiving and sharing source code. One of the major tasks is to ingest the latest source code available in the database from time to time and from all the possible sources where you can fetch the source code using listers and ingest them using loaders. Listers are components that crawl the APIs of software forges (e.g., Bitbucket, Gitorious, Sourceforge, ...) and return a list of the software available in it whereas Loaders take a bundle of software (tarball, Git repository, Python package, ...) and load it into Software Heritage, by adapting it so that it matches the archive data model.
+
* '''Description:''' As Software Heritage works on archiving and sharing source code, one of the major tasks is to ingest the latest source code available in the database from time to time and from all the possible sources where you can fetch the source code using listers and ingest them using loaders. Listers are components that crawl the APIs of software forges (e.g., Bitbucket, Gitorious, Sourceforge, ...) and return a list of the software available in it whereas Loaders take a bundle of software (tarball, Git repository ...) and load it into Software Heritage, by adapting it so that it matches the archive data model. The goal of this project is to increase the archive coverage by making listers and loaders for different websites that which stores source code, so that Software Heritage can fetch as much source code as possible and store it in the database to preserve it for future generations.
 
 
So the goal of this project is to make specialised softwares which are called Listers and Loaders
 
The goal of this  to increase the archive coverage by making listers and loaders for different websites that which stores source code, so that Software Heritage can fetch as much source code as possible and store it in the database to preserve it for future generations
 
 
* '''Student:''' Archit Agrawal
 
* '''Student:''' Archit Agrawal
 
** [https://forge.softwareheritage.org/p/nahimilega/ forge activity]
 
** [https://forge.softwareheritage.org/p/nahimilega/ forge activity]
Line 15: Line 12:
  
 
* [http://example.com source code repository]
 
* [http://example.com source code repository]
* see project [http://example.com on the GSoC portal]
+
* see project [https://summerofcode.withgoogle.com/projects/#5658995887439872 on the GSoC portal]
  
  
 
[[Category:Google Summer of Code]]
 
[[Category:Google Summer of Code]]
 
[[Category:Google Summer of Code 2019]]
 
[[Category:Google Summer of Code 2019]]

Revision as of 15:03, 15 May 2019

  • Title: Increase archive coverage
  • Description: Software Heritage archives and shares source code, so one of its major tasks is to regularly ingest the latest source code into its database from every source where code can be fetched: listers find the code and loaders ingest it. Listers are components that crawl the APIs of software forges (e.g., Bitbucket, Gitorious, SourceForge, ...) and return a list of the software available on them, whereas loaders take a bundle of software (tarball, Git repository, ...) and load it into Software Heritage, adapting it so that it matches the archive data model. The goal of this project is to increase archive coverage by writing listers and loaders for more websites that host source code, so that Software Heritage can fetch as much source code as possible and store it in its database to preserve it for future generations.
  • Student: Archit Agrawal
  • Mentors:
    • Nicolas Dandrimont
    • Antoine R. Dumont
  • Activity reports:
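
The division of work between listers and loaders described above can be illustrated with a minimal sketch. It is not the Software Heritage API: `Origin`, `list_origins`, `load_origin`, `fake_forge`, the `forge.example` URLs, and the dictionary standing in for the archive are all hypothetical names chosen for illustration. The lister walks a paginated forge listing, and the loader adapts each result to a simplified record before storing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Origin:
    """A piece of software found on a forge (hypothetical, simplified)."""
    url: str
    kind: str  # e.g. "git" or "tarball"


# A page fetcher maps a page token to (items on that page, next token or None).
Page = Tuple[List[Dict[str, str]], Optional[str]]


def list_origins(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Origin]:
    """Lister: walk a paginated forge listing and yield every origin found."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        for item in items:
            yield Origin(url=item["url"], kind=item["kind"])
        if token is None:  # no further pages
            break


def load_origin(origin: Origin, archive: Dict[str, Dict[str, str]]) -> bool:
    """Loader: adapt one origin to the (assumed) archive record and store it.

    Returns False if the origin was already archived, True otherwise.
    """
    if origin.url in archive:
        return False
    archive[origin.url] = {"url": origin.url, "type": origin.kind}
    return True


def fake_forge(token: Optional[str]) -> Page:
    """Stand-in for a forge API with two pages of results (made-up data)."""
    pages: Dict[Optional[str], Page] = {
        None: ([{"url": "https://forge.example/a.git", "kind": "git"}], "p2"),
        "p2": ([{"url": "https://forge.example/b.tar.gz", "kind": "tarball"}], None),
    }
    return pages[token]


if __name__ == "__main__":
    archive: Dict[str, Dict[str, str]] = {}
    for origin in list_origins(fake_forge):
        load_origin(origin, archive)
    print(sorted(archive))
```

Keeping the two roles separate means that, in a design like this one, supporting a new forge only requires a new page fetcher, while a new bundle type only requires a new loader.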

Links