Make the software deposit service (swh-deposit) modular (GSoC task)

From Software Heritage Wiki
Revision as of 15:48, 10 February 2022 by Vlorentz

Introduction

The Software Heritage archive is the most comprehensive open data knowledge base about source code that has been published openly.

In addition to fetching source code from public repositories, it offers a Deposit service, which allows platforms to send code to Software Heritage for archiving.

This service is currently written as a monolith that grew over the years to include a complete SWORDv2 server, a partial SWORDv2 client with extensions, and business logic specific to Software Heritage on both sides. As a result, the current code is hard to maintain and impossible to reuse.

Task description

swh-deposit would need to be split into the following parts:

  • a generic SWORDv2 server (based on Django)
  • a generic SWORDv2 client
  • server-side business logic
  • client-side business logic

The generic server and client will need to be extensively documented, so they can be reused by other software projects.
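
As a rough illustration of the intended separation on the server side, here is a minimal sketch in plain Python. It is not taken from swh-deposit: every name in it (Deposit, DepositHooks, GenericDepositServer, ExampleArchiveHooks, the origin_url metadata key) is hypothetical, and Django and the SWORDv2 wire format are left out so that the example stays self-contained. It only shows the idea that the generic layer handles the deposit life cycle and delegates every project-specific decision to a pluggable object.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Deposit:
    """A deposit as seen by the generic layer: opaque payload plus metadata."""

    collection: str
    payload: bytes
    metadata: Dict[str, str] = field(default_factory=dict)
    status: str = "received"


class DepositHooks(ABC):
    """Hypothetical extension point for server-side business logic."""

    @abstractmethod
    def validate(self, deposit: Deposit) -> Optional[str]:
        """Return an error message, or None if the deposit is acceptable."""

    @abstractmethod
    def on_accepted(self, deposit: Deposit) -> None:
        """Called after the generic server has accepted the deposit."""


class GenericDepositServer:
    """Generic layer: tracks deposits, knows nothing about any archive."""

    def __init__(self, hooks: DepositHooks) -> None:
        self.hooks = hooks
        self.deposits: List[Deposit] = []

    def create_deposit(
        self, collection: str, payload: bytes, metadata: Dict[str, str]
    ) -> Deposit:
        deposit = Deposit(collection, payload, dict(metadata))
        # All project-specific checks are delegated to the hooks object.
        error = self.hooks.validate(deposit)
        if error is not None:
            deposit.status = "rejected"
            deposit.metadata["error"] = error
            return deposit
        deposit.status = "accepted"
        self.deposits.append(deposit)
        self.hooks.on_accepted(deposit)
        return deposit


class ExampleArchiveHooks(DepositHooks):
    """Illustrative business logic: require an 'origin_url' metadata field."""

    def __init__(self) -> None:
        self.scheduled: List[str] = []

    def validate(self, deposit: Deposit) -> Optional[str]:
        if "origin_url" not in deposit.metadata:
            return "missing origin_url"
        return None

    def on_accepted(self, deposit: Deposit) -> None:
        # A real implementation might schedule a loading task here.
        self.scheduled.append(deposit.metadata["origin_url"])
```

Under these assumptions, another project could reuse GenericDepositServer unchanged and supply its own DepositHooks subclass. A similar split could apply on the client side, with a generic client for the protocol and a thin wrapper for project-specific metadata.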

Stretch goals include:

  • Designing the code so that it can later be extended to support SWORDv3, should that ever be needed
  • A new administration front-end and/or addition of administrative tools in swh-web

Desirable skills

  • Python 3 and Git are a must to work on any Software Heritage project
  • Basic understanding of the Software Heritage data model and of SWHID identifiers
  • Experience with Django

Potential mentors

  • Stefano Zacchiroli <zack@upsilon.cc> (zack on IRC)
  • Valentin Lorentz <vlorentz@softwareheritage.org> (vlorentz on IRC)