Make the software deposit service (swh-deposit) modular (GSoC task)

From Software Heritage Wiki
Revision as of 16:12, 10 February 2022 by Vlorentz (talk | contribs)

Introduction

The Software Heritage archive is the most comprehensive open data knowledge base about source code that has been published openly.

In addition to fetching source code from public repositories, it offers a Deposit service, which allows platforms to send code to Software Heritage for archival.

This service is currently written as a monolith that grew over the years to include a complete SWORDv2 server, a partial SWORDv2 client with extensions, and business logic specific to Software Heritage in both. This makes the current code hard to maintain and impossible to reuse.

Task description

swh-deposit would need to be split into the following parts:

  • a generic SWORDv2 server (based on Django)
  • a generic SWORDv2 client
  • server-side business logic (currently implemented mostly in swh/deposit/api/common.py, but tightly coupled with the views)
  • client-side business logic

The generic server and client will need to be extensively documented, so they can be reused by other software projects.
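
As an illustration of the intended separation, here is a minimal sketch of a protocol-level server that delegates every archive-specific decision to a pluggable hooks object. It is not taken from swh-deposit: the names Deposit, DepositHooks, GenericSwordServer and ArchiveHooks, as well as the "received" and "accepted" statuses, are hypothetical, and plain Python classes stand in for Django views and models to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Deposit:
    """Minimal stand-in for a deposit record (hypothetical model)."""

    deposit_id: int
    collection: str
    payload: bytes
    status: str = "received"


class DepositHooks(ABC):
    """Business-logic interface called by the generic server (hypothetical)."""

    @abstractmethod
    def validate(self, collection: str, payload: bytes) -> None:
        """Raise ValueError if the payload must be rejected."""

    @abstractmethod
    def on_created(self, deposit: Deposit) -> None:
        """React to a newly stored deposit."""


class GenericSwordServer:
    """Protocol-level handling only; contains no archive-specific rules."""

    def __init__(self, hooks: DepositHooks) -> None:
        self._hooks = hooks
        self._deposits: Dict[int, Deposit] = {}
        self._next_id = 1

    def create_deposit(self, collection: str, payload: bytes) -> Deposit:
        # Ask the business logic whether the payload is acceptable.
        self._hooks.validate(collection, payload)
        deposit = Deposit(self._next_id, collection, payload)
        self._deposits[deposit.deposit_id] = deposit
        self._next_id += 1
        # Let the business logic react once the deposit is stored.
        self._hooks.on_created(deposit)
        return deposit


class ArchiveHooks(DepositHooks):
    """Example archive-specific rules, plugged in from outside the server."""

    def __init__(self) -> None:
        self.scheduled: List[int] = []

    def validate(self, collection: str, payload: bytes) -> None:
        if not payload:
            raise ValueError("empty deposit")

    def on_created(self, deposit: Deposit) -> None:
        # Placeholder for real work such as scheduling a loading task.
        deposit.status = "accepted"
        self.scheduled.append(deposit.deposit_id)


hooks = ArchiveHooks()
server = GenericSwordServer(hooks)
first = server.create_deposit("demo-collection", b"archive bytes")
print(first.deposit_id, first.status)
```

With this shape, another project could reuse the server by supplying its own DepositHooks subclass, and the hooks could be unit-tested without any HTTP layer. The same idea would apply on the client side.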

Possible extensions include:

  • Designing the code so that it can later be extended to support SWORDv3, should we ever need it
  • A new administration front-end and/or addition of administrative tools in swh-web

Desirable skills

  • Python 3 and Git are a must to work on any Software Heritage project
  • Basic understanding of the Software Heritage data model and of SWHID identifiers
  • Experience with Django

Potential mentors

  • Antoine Dumont <ardumont@softwareheritage.org> (ardumont on IRC)
  • Valentin Lorentz <vlorentz@softwareheritage.org> (vlorentz on IRC)