Google Summer of Code 2019/Increase archive coverage

From Software Heritage Wiki

Revision as of 13:14, 26 August 2019

Title:

Increase archive coverage

Description:

The goal of this project is to increase archive coverage by writing listers and loaders for different forges. Listers are components that crawl the APIs of software forges (e.g., Bitbucket, GitHub, SourceForge, ...) and return a list of the software projects they host. Loaders take a bundle of software (tarball, Git repository, ...) and load it into Software Heritage, converting it to match the archive's data model.
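To make the lister role more concrete, here is a minimal Python sketch of a lister that crawls a paginated forge API and yields one origin per repository. It is not the Software Heritage lister API: `ListedOrigin`, `list_forge`, the `fetch_page` callback, and the `results`/`clone_url`/`next` fields are hypothetical names chosen for illustration. A real lister would issue HTTP requests, handle rate limits, and record the origins for loaders to visit, instead of reading canned pages.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical page format returned by a forge API: a batch of repositories
# plus an opaque cursor pointing at the next page (None on the last page).
Page = Dict[str, Any]


@dataclass(frozen=True)
class ListedOrigin:
    """One software origin discovered on a forge (hypothetical model)."""
    url: str
    visit_type: str  # which kind of loader should later ingest it, e.g. "git"


def list_forge(
    fetch_page: Callable[[Optional[str]], Page],
    visit_type: str = "git",
) -> Iterator[ListedOrigin]:
    """Crawl a paginated forge API and yield every origin it exposes."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)  # one API request per page
        for repo in page["results"]:
            yield ListedOrigin(url=repo["clone_url"], visit_type=visit_type)
        cursor = page.get("next")
        if cursor is None:  # last page reached
            return


# Usage with a canned two-page response instead of real HTTP calls.
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"results": [{"clone_url": "https://forge.example/a.git"}], "next": "p2"},
    "p2": {"results": [{"clone_url": "https://forge.example/b.git"}], "next": None},
}
origins = list(list_forge(FAKE_PAGES.__getitem__))
print([o.url for o in origins])
```

Injecting `fetch_page` keeps the crawling loop independent of any one forge's HTTP details, which is also what lets the sketch run offline against canned pages.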

Student:

Archit Agrawal

Mentors:

  • Nicolas Dandrimont
  • Antoine R. Dumont

Work Done:

TO-DO:

  • Implement the Launchpad and RubyGems listers, based on the research already done and the implementation plan drafted for them.
  • Find workarounds for the challenges in building the Maven and NuGet (.NET) listers.
  • Complete the remaining steps of the Base Package Manager Loader.

Learnings:

Working at Software Heritage was a wholesome experience; I learned something new almost every day. Here are a few of the most prominent lessons:

  • Work on a huge codebase
  • Plan and design before jumping to code
  • Write clean and well-commented code
  • Learn the difference between doing projects in college and in industry (spoiler alert: a lot)
  • Integrate multiple languages in a Python library (used in the CRAN lister)
  • Different programming methodologies explained to me by my mentors (e.g., TDD)
  • Work with tools: DVCS (Git), issue tracker (Phabricator forge), containerization/virtualization (Docker)

Activity reports:

Links