Google Summer of Code 2021
General information
This page is the central point of information for Software Heritage's participation in the Google Summer of Code program in 2021.
Google Summer of Code is a program where Google pays students stipends to work over the (northern hemisphere) summer on free software projects such as Software Heritage. Each student works with mentors from the community to complete a software project.
I want to participate as a student
Great! We are very glad that you are interested in contributing to Software Heritage, and we look forward to working together.
Prerequisites
The following prerequisites apply to all Software Heritage GSoC projects:
- Python 3 is our language of choice; you should be fluent in it to apply
- Git is our version control system of choice; you should be familiar with it to apply
- basic knowledge of using a command-line interface (CLI)
- additional prerequisites depend on the project you will work on; check project descriptions for details
Before you apply
Here are the steps you should follow before applying, to make sure you have a good grasp of what we are doing at Software Heritage and how we do it:
- Follow our developer setup tutorial: it will make sure you have the source code of our software stack locally available and that you can run unit tests
- Create an account on our development forge
- Familiarize yourself with our code review workflow
- Make at least one simple change to any one of our software components and submit it as a diff for code review, following the above workflow. Easy hacks and Web UI issues are good options for what to fix, but feel free to submit any patch you think might be useful.
What to include in your application
Make sure that your application includes the following information:
- Describe the specific project you want to work on. What do you want to achieve? Why is it important? Why is it useful for Software Heritage? The project might be one of the project ideas that we have prepared below, or something else entirely that you want to contribute to Software Heritage. Do you have a source code archival pet peeve? Surprise us!
- Detail your work plan: a brief description of how you plan to go about your project, including a list of deliverables and a timeline of when you expect them to be available.
- Include a reference to the diff you submitted before applying (see the "Before you apply" section above).
Ideas list
Below you can find a list of project ideas that are good options for a reasonably-sized GSoC project:
- Add sources to the project search engine (GSoC task)
- Create a browser extension (GSoC task)
- Create embeddable widgets (GSoC task)
- Dashboard UI for the Code Scanner (GSoC task)
- Improve and extend the archive Web UI (GSoC task)
- Improve project search engine (GSoC task)
- Improve the Code Scanner (GSoC task)
- Improve the Vault (GSoC task)
- Improving the scheduler (GSoC task)
- Make the software deposit service (swh-deposit) modular (GSoC task)
- Mine information from archived content (GSoC task)
- Mine information from external sources (GSoC task)
We also maintain the following list of internships.
They are usually reserved for on-site university students, but during GSoC they are also available as GSoC project ideas:
- Expand package metadata coverage (internship)
- Git remote support for Software Heritage (internship)
- Graph query language for the archive (internship)
- Ingest all Debian derivatives (internship)
- Ingest Wikidata software origins (internship)
- Integrate Software Heritage and ClearlyDefined (internship)
- Language and infrastructure for analyzing the archive (internship)
- Large-scale license text recognition (internship)
- Reverse project phylogenesis (internship)
- Software Heritage in a bottle - local repository mining toolchain (internship)
- Source code search engine prototype (internship)
Both GSoC tasks and internship topics are just suggestions, though: don't feel
obliged to pick one of them if there is nothing that fits your taste and
abilities. Feel free to propose something else that you are excited about and
that contributes to improving the Software Heritage archive: we will be happy to
consider it!
Contact
GSoC students are encouraged to get in touch with the Software Heritage community using the standard development communication channels, and in particular our IRC channel (#swh-devel on FreeNode) and mailing list (swh-devel).
See our development information page for details.
Timeline
See the official Google Summer of Code timeline.