Google Summer of Code 2019

From Software Heritage Wiki
Revision as of 13:34, 5 February 2019 by Ardumont (talk | contribs) (Improve and extend the archive Web UI)


General information

This page is the central point of information for Software Heritage's participation in the Google Summer of Code program.

Google Summer of Code is a program where Google pays students stipends to work over the (northern hemisphere) summer on free software projects such as Software Heritage. Each student works with mentors from the community to complete a software project.

I want to participate as a student

Great! We are very glad that you are interested in contributing to Software Heritage, and we look forward to working together.


The following prerequisites apply to Software Heritage GSoC projects:

  • Python 3 is our language of choice; you should be fluent in it to apply
  • Git is our version control system of choice; you should be familiar with it to apply
  • additional prerequisites depend on the project you will work on; check project descriptions for details

Before you apply

Here are the steps you should follow before applying, to make sure you have a good grasp of what we are doing at Software Heritage and how we do it:

  1. Follow our getting started guide: it will make sure you can locally run a (small) copy of the archive and ingest source code into it
  2. Create an account on our development forge
  3. Familiarize yourself with our code review workflow
  4. Make a simple change to any one of our software components and submit it as a diff for code review, following the above workflow. Easy hacks and Web UI issues are good options for what to fix, but feel free to submit any patch you think might be useful.

What to include in your application

Make sure that your application includes the following information:

  • Describe the specific project you want to work on. What do you want to achieve? Why is it important? Why is it useful for Software Heritage? The project might be one of the project ideas that we have prepared, or something else entirely that you want to contribute to Software Heritage. If you have a source code archival pet peeve, surprise us!
  • Detail your work plan: a brief description of how you plan to go about your project, including a list of deliverables and a timeline of when you expect them to be available.
  • Include a reference to the diff you submitted before applying (see the "Before you apply" section above).

Ideas list

Below you can find a list of project ideas that are good options for a reasonably sized GSoC project. They are just suggestions, though: don't feel obliged to pick one of them if nothing fits your taste and abilities. Feel free to propose something else that you are excited about and that helps improve the Software Heritage archive: we will be happy to consider it!

Increase archive coverage

Software Heritage aims to archive all the software. We naturally started with the place where most of the software is easily available today: Git repositories on GitHub.

As Software Heritage grows, we're incrementally increasing the coverage of the archive by expanding the sources from which we archive software. We have built ways of archiving things like Mercurial repositories, Debian packages, PyPI bundles, etc.

Expanding the coverage of the archive has two different components:

1. Create origin listers. Listers are pieces of code that crawl the APIs of Software Forges[1] (like Bitbucket, Gitorious, SourceForge, NPM...) and return a list of the software available on them. See the documentation on listers for details.

2. Create loaders. Loaders take a bundle of software (tarball, git repository, Python package, ...) and load it into Software Heritage, by adapting it so that it matches our uniform data model[2].

In a few words, a lister can be a way of asking "what are all the repositories available on", while a loader would be "how do I load the NPM package I downloaded into Software Heritage?".
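
To make the split between the two components more concrete, here is a minimal sketch in Python. It is only an illustration and not the actual Software Heritage code: the names `Origin`, `Content`, `list_origins`, `load_bundle` and `fake_forge` are hypothetical, the forge is simulated with in-memory pages instead of a real API, and content is identified with a plain SHA-1 of the file bytes as a simplifying assumption about the data model.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a page token (None for the first page)
# and returns the URLs on that page plus the token of the next page, if any.
FetchPage = Callable[[Optional[int]], Tuple[List[str], Optional[int]]]


@dataclass(frozen=True)
class Origin:
    """A place where software can be found (simplified, hypothetical)."""
    url: str
    kind: str  # for instance "git", "hg" or "tarball"


@dataclass(frozen=True)
class Content:
    """A file body, identified by a hash of its bytes (simplified)."""
    sha1: str
    length: int


def list_origins(fetch_page: FetchPage, kind: str) -> Iterator[Origin]:
    """Lister: walk a paginated forge listing and yield every origin."""
    token: Optional[int] = None
    while True:
        urls, token = fetch_page(token)
        for url in urls:
            yield Origin(url=url, kind=kind)
        if token is None:  # no further page to crawl
            break


def load_bundle(files: Dict[str, bytes]) -> Tuple[Dict[str, Content], Dict[str, str]]:
    """Loader: map a bundle (path -> bytes) onto a uniform model.

    Returns the deduplicated contents keyed by hash, and a tree that maps
    each path to the hash of its content.
    """
    contents: Dict[str, Content] = {}
    tree: Dict[str, str] = {}
    for path, data in sorted(files.items()):
        digest = hashlib.sha1(data).hexdigest()
        contents.setdefault(digest, Content(sha1=digest, length=len(data)))
        tree[path] = digest
    return contents, tree


def fake_forge(pages: List[List[str]]) -> FetchPage:
    """Stand-in for a real forge API, backed by a non-empty list of pages."""
    def fetch_page(token: Optional[int]) -> Tuple[List[str], Optional[int]]:
        index = 0 if token is None else token
        next_token = index + 1 if index + 1 < len(pages) else None
        return pages[index], next_token
    return fetch_page


if __name__ == "__main__":
    forge = fake_forge([["https://example.org/a.git"], ["https://example.org/b.git"]])
    for origin in list_origins(forge, "git"):
        print(origin)
    contents, tree = load_bundle({"README": b"hello", "COPYING": b"hello"})
    print(len(contents), tree)
```

In this sketch the lister only knows how to enumerate a source, and the loader only knows how to convert one bundle; keeping them separate means a new forge that hosts an already supported format needs just a new lister.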

Writing a lister or a loader is a great way to contribute to Software Heritage by expanding its coverage! We have a list of software sources we would like to archive here[3], but you're free to suggest more.

Mine information from archived content


Improve and extend the archive Web UI

To make the archive content easy to navigate, a web application is currently being developed. So far it offers the following main features:

There are still numerous improvements and new features to add to that web application, for instance:

  • improve overall design
  • improve navigation for mobile browsers
  • add new source code search criteria and improve the search interface
  • implement new developer-oriented features: source file history, blame interface, ...
  • improve web application accessibility

If you are interested in web development and want to contribute to the application that lets users navigate the biggest public source code archive collected so far, feel free to apply.


GSoC students are encouraged to get in touch with the Software Heritage community using the standard development communication channels, i.e.:

See our development information page for more details.


See the official Google Summer of Code timeline.