Google Summer of Code 2019/Increase archive coverage
=== Title: ===
'''Increase archive coverage'''

=== Description: ===

The goal of this project is to increase the archive coverage by writing listers and loaders for different forges.
[https://docs.softwareheritage.org/devel/swh-lister/index.html#swh-lister Listers] are components that crawl the APIs of software forges (e.g., Bitbucket, GitHub, SourceForge, ...) and return the list of software each forge hosts. Loaders take a bundle of software (tarball, Git repository, ...) and load it into Software Heritage, converting it to match the archive data model.
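To make the lister idea concrete, here is a minimal Python sketch of a lister that follows the pagination of a forge API and collects repository URLs. It is an illustration under stated assumptions, not the swh-lister API: the page format (`repositories` and `next` keys), the `fetch_page` callable and the `list_origins` function are all hypothetical. A real lister would query the forge over HTTP instead of reading canned pages.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical page format: {"repositories": [{"url": ...}, ...], "next": token or None}
Page = Dict[str, Any]


def list_origins(fetch_page: Callable[[Optional[str]], Page]) -> List[str]:
    """Follow the pagination of a (hypothetical) forge API and collect repository URLs."""
    origins: List[str] = []
    token: Optional[str] = None  # None requests the first page
    while True:
        page = fetch_page(token)
        origins.extend(repo["url"] for repo in page["repositories"])
        token = page.get("next")
        if token is None:  # the last page carries no "next" token
            return origins


# In-memory stand-in for the forge API, keyed by page token
FAKE_API: Dict[Optional[str], Page] = {
    None: {"repositories": [{"url": "https://forge.example.org/a.git"}], "next": "p2"},
    "p2": {"repositories": [{"url": "https://forge.example.org/b.git"}], "next": None},
}

print(list_origins(FAKE_API.__getitem__))
# ['https://forge.example.org/a.git', 'https://forge.example.org/b.git']
```

Passing the page fetcher in as a parameter keeps the crawling logic independent of HTTP, which also makes it easy to test against canned responses.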

=== Student: ===
Archit Agrawal
* [https://forge.softwareheritage.org/p/nahimilega/ Forge activity]
* [https://wiki.softwareheritage.org/wiki/Google_Summer_of_Code_2019/Increase_archive_coverage/Commit_list List of commits]
+ | |||
+ | === Mentors:=== | ||
+ | * Nicolas Dandrimont | ||
+ | * Antoine R. Dumont | ||
+ | |||
+ | ===Work Done:=== | ||
+ | * '''Listers:''' | ||
+ | ** Completed and merged | ||
+ | *** [https://forge.softwareheritage.org/rDLSfedfd73c8e4be8ce1d08b31c9a5cb99f9ca40fd6 Phabricator Lister] | ||
+ | *** [https://forge.softwareheritage.org/D1482 GNU Lister] | ||
+ | *** [https://forge.softwareheritage.org/rDLSa9a37a85bf9efac416cfdd152588bf01b7a063b2 CRAN Lister] | ||
+ | *** [https://forge.softwareheritage.org/D1584 Packagist Lister] | ||
+ | *** [https://forge.softwareheritage.org/D1610 CGit Lister] | ||
** Researched possible approaches for building the following listers and drafted an implementation plan for each
*** [https://forge.softwareheritage.org/T1734 Launchpad Lister]
*** [https://forge.softwareheritage.org/T1777 Rubygem Lister]
*** [https://forge.softwareheritage.org/T1718 NuGet (.NET) Lister]
*** [https://forge.softwareheritage.org/T1724 Maven Lister]
** [https://forge.softwareheritage.org/rDLS08ade29e6de0616a3964360454ab52b58c082b75 Add tests to PyPI Lister]
** [https://forge.softwareheritage.org/rDLSf424f07c7e628eb7a19d25f4fdb749682d97a21f Refactor base tests for listers]
** [https://forge.softwareheritage.org/D1441 Add documentation on ''How to run a new lister'']
* '''Loaders:'''
** '''[https://forge.softwareheritage.org/T1389 Base Package Manager Loader]'''
*** Ingesting source code is a broadly similar process for every package manager, which calls for a common base implementation for loading content from package managers into the archive. I worked on this idea, analysing the steps required to build a loader and the implementation of the existing package manager loaders. I came up with a plan to implement the base loader and submitted the corresponding patches ([https://forge.softwareheritage.org/D1694 D1694], [https://forge.softwareheritage.org/D1810 D1810], [https://forge.softwareheritage.org/D1811 D1811], [https://forge.softwareheritage.org/D1812 D1812], [https://forge.softwareheritage.org/D1813 D1813], [https://forge.softwareheritage.org/D1814 D1814], [https://forge.softwareheritage.org/D1744 D1744]). However, on my mentor's recommendation, we changed our approach: instead of building the whole base loader in one go, we split it into three steps and followed an incremental approach.
** '''[https://forge.softwareheritage.org/D1824 GNU Loader]'''
*** As the first step towards the Base Package Manager Loader, the GNU Loader was implemented.
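The idea behind the Base Package Manager Loader (one shared ingestion workflow, with only the package-manager-specific parts overridden) can be illustrated with the template method pattern. The sketch below is hypothetical: `PackageLoader`, `get_versions`, `get_artifact`, `InMemoryLoader` and `make_tarball` are illustrative names, not the swh-loader-core code, and a real subclass such as the GNU Loader would download tarballs rather than read them from a dict. The one grounded detail is the hashing step: Software Heritage identifies file contents with the same SHA-1 that Git computes for a blob (`sha1_git`).

```python
import hashlib
import io
import tarfile
from abc import ABC, abstractmethod
from typing import Dict, List


def sha1_git(data: bytes) -> str:
    """Hash a file the way Git hashes a blob: SHA-1 over 'blob <size>', a NUL byte, then the data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class PackageLoader(ABC):
    """Hypothetical base loader: the workflow is shared, only the data sources differ."""

    @abstractmethod
    def get_versions(self) -> List[str]:
        """Return the package versions to ingest."""

    @abstractmethod
    def get_artifact(self, version: str) -> bytes:
        """Return the raw tarball of one version."""

    def load(self) -> Dict[str, Dict[str, str]]:
        """Common steps: fetch each tarball, unpack it in memory, hash every regular file."""
        result: Dict[str, Dict[str, str]] = {}
        for version in self.get_versions():
            tarball = self.get_artifact(version)
            with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
                result[version] = {
                    member.name: sha1_git(tar.extractfile(member).read())
                    for member in tar.getmembers()
                    if member.isfile()
                }
        return result


class InMemoryLoader(PackageLoader):
    """Hypothetical package-manager-specific subclass backed by a dict of releases."""

    def __init__(self, releases: Dict[str, bytes]) -> None:
        self.releases = releases

    def get_versions(self) -> List[str]:
        return sorted(self.releases)

    def get_artifact(self, version: str) -> bytes:
        return self.releases[version]


def make_tarball(files: Dict[str, bytes]) -> bytes:
    """Build an in-memory .tar.gz standing in for a downloaded release."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


loader = InMemoryLoader({"1.0": make_tarball({"hello-1.0/README": b"hello\n"})})
print(loader.load())
# {'1.0': {'hello-1.0/README': 'ce013625030ba8dba906f756967f9e9ca394464a'}}
```

With this split, adding support for a new package manager means implementing only the two abstract methods, which matches the motivation for a common base loader given above.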
+ | |||
+ | ===TO-DO:=== | ||
+ | * Implement the Listers using the research done and the implementation plan made for Launchpad, Rubygem. | ||
+ | * Find the workarounds to solve the challenges in making the Maven and NuGET(.NET) Lister. | ||
+ | * Work on the remaining steps in order to complete the Base Package Manager Loader. | ||
+ | |||
+ | === Learnings: === | ||
+ | Working in Software Heritage was a wholesome experience. I got to learn a new thing almost every day. Here is a few of the most prominent ones: | ||
+ | *Work on a huge codebase | ||
+ | *Plan and design before jumping to code | ||
+ | *Write clean and well-commented code | ||
+ | *Learn the difference between doing projects in college and in the industry(Spoiler Alert: '''A lot''') | ||
+ | *Multiple language integration in a python library (Used in CRAN Lister) | ||
+ | *Different programming methodologies explained to me by my mentors(eg [https://en.wikipedia.org/wiki/Test-driven_development TDD]) | ||
+ | *Work with tools; DVCS (git), issue tracker (phabricator forge), containerization/virtualization (docker) | ||
+ | |||
+ | === Activity reports:=== | ||
* May 2019 | * May 2019 | ||
− | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-05/msg00003.html Week 20 Second Week(Community Bonding)] | + | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-05/msg00003.html Week 20 Second Week (Community Bonding)] |
** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-05/msg00010.html Week 21 Third Week (Community Bonding)] | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-05/msg00010.html Week 21 Third Week (Community Bonding)] | ||
** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-05/msg00017.html Week 22 First Week (Coding)] | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-05/msg00017.html Week 22 First Week (Coding)] | ||
Line 19: | Line 62: | ||
** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00009.html Week 23 Second Week (Coding)] | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00009.html Week 23 Second Week (Coding)] | ||
** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00016.html Week 24 Third Week (Coding)] | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00016.html Week 24 Third Week (Coding)] | ||
− | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00026.html Week 25 Fourth Week (Coding)] | + | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00026.html Week 25 Fourth Week (Coding)(Work Summary)] |
** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00033.html Week 26 Fifth Week (First Evaluation)] | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-06/msg00033.html Week 26 Fifth Week (First Evaluation)] | ||
* July 2019 | * July 2019 | ||
** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-07/msg00003.html Week 27 Sixth Week (Coding)] | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-07/msg00003.html Week 27 Sixth Week (Coding)] | ||
** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-07/msg00006.html Week 28 Seventh Week (Coding)] | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-07/msg00006.html Week 28 Seventh Week (Coding)] | ||
+ | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-07/msg00011.html Week 29 Eight Week (Coding)(Work Summary)] | ||
+ | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-07/msg00015.html Week 30 Nineth Week (Second Evaluation)] | ||
+ | * August 2019 | ||
+ | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00002.html Week 31 Tenth Week (Coding)] | ||
+ | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00004.html Week 32 Eleventh Week (Coding)] | ||
+ | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00008.html Week 33 Twelfth Week (Coding)] | ||
+ | ** [https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00011.html Week 34 Thirteenth Week (Final Evaluation)] | ||

== Links ==
* [https://forge.softwareheritage.org/source/swh-lister/ Lister source code repository]
* [https://forge.softwareheritage.org/source/swh-loader-core/ Loader source code repository]
* See the project [https://summerofcode.withgoogle.com/projects/#5658995887439872 on the GSoC portal]
Latest revision as of 21:46, 28 August 2019