Difference between revisions of "Google Summer of Code 2019/Increase archive coverage"

From Software Heritage Wiki
Line 59:
  ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00009.html Week 23 Second Week (Coding)]
  ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00016.html Week 24 Third Week  (Coding)]
- ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00026.html Week 25 Fourth Week (Coding)]
+ ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00026.html Week 25 Fourth Week (Coding)(Work Summary)]
  ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00033.html Week 26 Fifth Week  (First Evaluation)]
  * July 2019
Line 70:
  ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00004.html Week 32 Eleventh Week (Coding)]
  ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00008.html Week 33 Twelfth Week (Coding)]
- ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00008.html Week 34 Thirteenth Week (Final Evaluation)]
+ ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00011.html Week 34 Thirteenth Week (Final Evaluation)]

  == Links ==

Revision as of 17:58, 24 August 2019

Title:

Increase archive coverage

Description:

As Software Heritage works on archiving and sharing source code, one of its major tasks is to regularly ingest the latest source code from every place where it can be fetched. Listers and loaders do this work. Listers are components that crawl the APIs of software forges (e.g., Bitbucket, GitHub, SourceForge, ...) and return a list of the software available there, whereas loaders take a bundle of software (tarball, Git repository, ...) and load it into Software Heritage, adapting it so that it matches the archive data model. The goal of this project is to increase archive coverage by writing listers and loaders for more of the websites that host source code, so that Software Heritage can fetch as much source code as possible and preserve it for future generations.
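To make the lister/loader split concrete, here is a minimal sketch in Python. It is not the Software Heritage API: `Origin`, `list_origins`, `load_origin`, `fake_forge`, the page layout with `items` and `next` keys, and the `forge.example` URLs are all hypothetical names chosen for illustration. The lister walks a paginated forge listing and yields origins; the loader adapts each origin to a toy archive model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Origin:
    """A place source code can be fetched from (hypothetical model)."""
    url: str
    kind: str  # e.g. "git" or "tar"


# A page fetcher takes a pagination token (None for the first page)
# and returns a dict with "items" and "next" keys (assumed layout).
FetchPage = Callable[[Optional[str]], Dict[str, Any]]


def list_origins(fetch_page: FetchPage) -> Iterator[Origin]:
    """Lister: walk a paginated forge listing and yield every origin."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        for item in page["items"]:
            yield Origin(url=item["url"], kind=item["kind"])
        token = page.get("next")
        if token is None:
            break


def load_origin(origin: Origin, archive: List[Dict[str, str]]) -> None:
    """Loader: adapt one origin to the toy archive model and store it."""
    archive.append(
        {"origin": origin.url, "type": origin.kind, "status": "loaded"}
    )


def fake_forge(token: Optional[str]) -> Dict[str, Any]:
    """Stand-in for a real forge API, serving two fixed pages."""
    pages = {
        None: {
            "items": [{"url": "https://forge.example/a.git", "kind": "git"}],
            "next": "p2",
        },
        "p2": {
            "items": [{"url": "https://forge.example/b.tar.gz", "kind": "tar"}],
            "next": None,
        },
    }
    return pages[token]


archive: List[Dict[str, str]] = []
for origin in list_origins(fake_forge):
    load_origin(origin, archive)
```

Keeping the two roles separate means that covering a new forge mostly requires a new page fetcher, while covering a new bundle format mostly requires a new loader.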

Student:

Archit Agrawal

Mentors:

  • Nicolas Dandrimont
  • Antoine R. Dumont

Work Done:

TO-DO:

  • Implement the listers for Launchpad and RubyGems, using the research done and the implementation plan made for them.
  • Find workarounds for the challenges in making the Maven and NuGet (.NET) listers.
  • Work on the remaining steps in order to complete the Base Package Manager Loader.

Learnings:

Working at Software Heritage was a wholesome experience. I got to learn something new almost every day. It would be an injustice to say I can account for all my learnings in one section of a blog; however, here is a list of a few of the most prominent ones:

  • Working on a huge codebase
  • Plan and design before jumping to code
  • Writing clean and well-commented code
  • Difference between doing projects in college and in the industry (Spoiler Alert: A lot)
  • Multiple language integration in a Python library (used in the CRAN Lister)
  • Different programming methodologies explained to me by my mentors (e.g., TDD)
  • Working with Git and the forge
  • Working with Docker

Activity reports:

Links