Difference between revisions of "Google Summer of Code 2019/Increase archive coverage"

From Software Heritage Wiki
Line 15: Line 15:
 
**** [https://forge.softwareheritage.org/D1584 Packagist Lister]
 
**** [https://forge.softwareheritage.org/D1584 Packagist Lister]
 
**** [https://forge.softwareheritage.org/D1610 CGit Lister]
 
**** [https://forge.softwareheritage.org/D1610 CGit Lister]
***  [https://forge.softwareheritage.org/rDLS08ade29e6de0616a3964360454ab52b58c082b75 Add tests to PyPI Lister]
+
*** Researched methods for implementing the following listers and drew up an implementation plan for each
*** [https://forge.softwareheritage.org/rDLSf424f07c7e628eb7a19d25f4fdb749682d97a21f Refactor base tests for listers]
 
***[https://forge.softwareheritage.org/D1441 Add documentation on *How to run a new lister*]
 
*** Researched methods for implementing the following listers and drew up an implementation plan for each:
 
 
**** [https://forge.softwareheritage.org/T1734 Launchpad Lister]
 
**** [https://forge.softwareheritage.org/T1734 Launchpad Lister]
 
**** [https://forge.softwareheritage.org/T1777 RubyGems Lister]
 
**** [https://forge.softwareheritage.org/T1777 RubyGems Lister]
 
**** [https://forge.softwareheritage.org/T1718 NuGet (.NET) Lister]
 
**** [https://forge.softwareheritage.org/T1718 NuGet (.NET) Lister]
 
**** [https://forge.softwareheritage.org/T1724 Maven Lister]
 
**** [https://forge.softwareheritage.org/T1724 Maven Lister]
 +
***  [https://forge.softwareheritage.org/rDLS08ade29e6de0616a3964360454ab52b58c082b75 Add tests to PyPI Lister]
 +
*** [https://forge.softwareheritage.org/rDLSf424f07c7e628eb7a19d25f4fdb749682d97a21f Refactor base tests for listers]
 +
***[https://forge.softwareheritage.org/D1441 Add documentation on *How to run a new lister*]
 
** '''Loaders:'''
 
** '''Loaders:'''
 
*** Base Loader
 
*** Base Loader
 
*** GNU Loader
 
*** GNU Loader
  
*'''Challenges:'''
 
 
* '''Learnings:'''
 
* '''Learnings:'''
 +
**How to work on a huge codebase
 +
**How to plan and design before jumping to code
 +
**Writing clean and well-commented code
 +
**How projects made in industry differ from projects made in college (Spoiler Alert: '''A lot''')
 +
**Multiple language integration in a Python library (used in the CRAN Lister)
 +
**Different programming methodologies explained to me by my mentors (e.g., [https://en.wikipedia.org/wiki/Test-driven_development TDD])
 +
**Working with Git and the forge
 +
**Docker
 +
 
* '''TO-DO:'''
 
* '''TO-DO:'''
 
* '''Activity reports:'''
 
* '''Activity reports:'''
Line 55: Line 63:
  
 
* [https://forge.softwareheritage.org/source/swh-lister/    Lister source code repository]
 
* [https://forge.softwareheritage.org/source/swh-lister/    Lister source code repository]
* [https://forge.softwareheritage.org/source/swh-loader-core//    Loader source code repository]
+
* [https://forge.softwareheritage.org/source/swh-loader-core/   Loader source code repository]
 
* see project [https://summerofcode.withgoogle.com/projects/#5658995887439872 on the GSoC portal]
 
* see project [https://summerofcode.withgoogle.com/projects/#5658995887439872 on the GSoC portal]
  

Revision as of 17:58, 21 August 2019

  • Title: Increase archive coverage
  • Description: Software Heritage archives and shares source code, so one of its major recurring tasks is to ingest the latest source code from every source where it can be fetched. Listers discover what is available, and loaders ingest it. Listers are components that crawl the APIs of software forges (e.g., Bitbucket, Gitorious, SourceForge) and return a list of the software hosted there, whereas loaders take a bundle of software (tarball, Git repository, ...) and load it into Software Heritage, adapting it to match the archive data model. The goal of this project is to increase archive coverage by writing listers and loaders for more websites that store source code, so that Software Heritage can fetch as much source code as possible and store it in its database to preserve it for future generations.
  • Student: Archit Agrawal
  • Mentors:
    • Nicolas Dandrimont
    • Antoine R. Dumont
  • Experience:
  • What I did:
  • Learnings:
    • How to work on a huge codebase
    • How to plan and design before jumping to code
    • Writing clean and well-commented code
    • How projects made in industry differ from projects made in college (Spoiler Alert: A lot)
    • Multiple language integration in a Python library (used in the CRAN Lister)
    • Different programming methodologies explained to me by my mentors (e.g., TDD)
    • Working with Git and the forge
    • Docker
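
The lister/loader split from the project description can be pictured with a small, self-contained sketch. This is not the swh-lister or swh-loader-core API: the names Origin, list_origins and Archive, the page format and the example.org URLs are all hypothetical, made up only to show a lister turning paginated API responses into a list of origins and a loader ingesting each origin once.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Origin:
    """A software origin discovered by a lister (hypothetical type)."""
    url: str
    kind: str  # e.g. "git" or "tarball"


def list_origins(pages: Iterable[List[dict]]) -> Iterator[Origin]:
    """Lister step: walk paginated API responses and yield one Origin per entry."""
    for page in pages:
        for entry in page:
            # Assumption: entries without a "type" key are Git repositories.
            yield Origin(url=entry["url"], kind=entry.get("type", "git"))


@dataclass
class Archive:
    """In-memory stand-in for the archive: maps origin URL to its kind."""
    stored: Dict[str, str] = field(default_factory=dict)

    def load(self, origin: Origin) -> bool:
        """Loader step: ingest one origin; return False if it was already stored."""
        if origin.url in self.stored:
            return False
        self.stored[origin.url] = origin.kind
        return True


# Made-up API pages; the second page repeats an origin from the first.
fake_pages = [
    [
        {"url": "https://example.org/a.git"},
        {"url": "https://example.org/b.tar.gz", "type": "tarball"},
    ],
    [{"url": "https://example.org/a.git"}],
]

archive = Archive()
results = [archive.load(origin) for origin in list_origins(fake_pages)]
print(results)
```

Keeping discovery (the lister) separate from ingestion (the loader) means a new forge only needs a new lister when its content is already in a format an existing loader understands.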


Links