WG/Distribution Replication and Query




let us save what remains: not by vaults and locks which fence them from the public eye and use, in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.
Thomas Jefferson, February 18, 1791

One of Software Heritage's main missions is to ensure that source code assets are preserved in the long term. There are a variety of threats to digital information, ranging from operational failures to physical accidents, from human error to malicious attacks, and from technical obsolescence to legal uncertainty.

The most promising approach to withstand all these challenges is to ensure we build the Software Heritage archive on an infrastructure that is distributed and replicated in all respects:

  • technically: using a variety of different technologies
  • geographically: on different continents
  • administratively: under different control structures
  • legally: under different legal systems

The main goal of the DIREQ working group is to monitor and evaluate existing and forthcoming approaches to distributed resilient archival, and to develop and evolve an API that lets the Software Heritage network of peers abstract away from the particular technologies used to implement the storage backends and the metadata database. This API will need to cover reading and writing information on the storage backend, as well as updating and querying the metadata database, which may also be distributed.


This working group is open-ended.

Expected outcomes

The main expected outcomes are listed below.

A common API for the distributed object storage, abstracting away the details of the different underlying technologies that may be adopted by the network of peers. The proposed solution must allow several distinct technologies to be in operation at the same time, and must make it possible to monitor the degree of replication achieved.
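As a rough illustration of what such an abstraction could look like, the following minimal sketch defines a backend-neutral interface, a toy in-memory backend, and a wrapper that writes to several backends at once and reports how many of them hold a given object. All names here (ObjectStorage, InMemoryStorage, ReplicatedStorage, replication_degree) are hypothetical and are not part of any existing Software Heritage API; identifying objects by the SHA-256 of their content is likewise an assumption made for the example.

```python
import hashlib
from abc import ABC, abstractmethod


class ObjectStorage(ABC):
    """Hypothetical backend-neutral interface (an assumption, not a real API)."""

    @abstractmethod
    def put(self, content: bytes) -> str:
        """Store content and return its identifier."""

    @abstractmethod
    def get(self, obj_id: str) -> bytes:
        """Return the content stored under obj_id."""

    @abstractmethod
    def has(self, obj_id: str) -> bool:
        """Tell whether this backend holds obj_id."""


class InMemoryStorage(ObjectStorage):
    """Toy backend standing in for a real storage technology."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # Assumption: objects are addressed by the SHA-256 of their content.
        obj_id = hashlib.sha256(content).hexdigest()
        self._objects[obj_id] = content
        return obj_id

    def get(self, obj_id: str) -> bytes:
        return self._objects[obj_id]

    def has(self, obj_id: str) -> bool:
        return obj_id in self._objects


class ReplicatedStorage:
    """Writes to every backend and reports the replication degree per object."""

    def __init__(self, backends: list[ObjectStorage]) -> None:
        self.backends = list(backends)

    def put(self, content: bytes) -> str:
        ids = {backend.put(content) for backend in self.backends}
        if len(ids) != 1:
            raise RuntimeError("backends disagree on the object identifier")
        return ids.pop()

    def get(self, obj_id: str) -> bytes:
        # Read from the first backend that holds the object.
        for backend in self.backends:
            if backend.has(obj_id):
                return backend.get(obj_id)
        raise KeyError(obj_id)

    def replication_degree(self, obj_id: str) -> int:
        # Number of backends currently holding a copy of the object.
        return sum(1 for backend in self.backends if backend.has(obj_id))
```

In this sketch, heterogeneous technologies would each be wrapped in their own ObjectStorage subclass, so the replication check does not depend on how any single peer stores its data.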

State-of-the-art approaches to distributed metadata databases that support queries and updates on the database holding all the Software Heritage metadata. The append-only nature of this database may affect how the well-known CAP trade-offs for distributed databases apply.
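One way to see why an append-only design may soften these trade-offs: if records are never modified or deleted, and each key always maps to the same value, then two replicas that accepted writes independently can be reconciled by a plain union, with no conflict resolution needed. The sketch below illustrates this under those assumptions; the AppendOnlyReplica class and its methods are hypothetical and do not describe the actual Software Heritage database.

```python
class AppendOnlyReplica:
    """Hypothetical replica of an append-only key/value metadata store."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}

    def append(self, key: str, value: str) -> None:
        # Records are immutable: re-adding the same record is a no-op,
        # while changing an existing one is rejected.
        existing = self._records.get(key)
        if existing is not None and existing != value:
            raise ValueError(f"record {key!r} already exists with another value")
        self._records[key] = value

    def merge(self, other: "AppendOnlyReplica") -> None:
        # Union of the two record sets; the result does not depend on
        # merge order, and merging twice changes nothing.
        for key, value in other._records.items():
            self.append(key, value)

    def snapshot(self) -> dict[str, str]:
        return dict(self._records)


# Two replicas accept different writes while disconnected from each other.
left, right = AppendOnlyReplica(), AppendOnlyReplica()
left.append("rev:1", "initial import")
right.append("rev:2", "fix typo")

# Once reconnected, merging in both directions makes them converge.
left.merge(right)
right.merge(left)
```

This does not remove the CAP constraints: a replica that has not yet merged may still answer queries with incomplete data. It only shows that convergence after a partition is straightforward when the assumptions above hold.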

Monitoring and evaluation of existing and forthcoming approaches to distributed resilient archival and databases.

Awareness: The DIREQ working group will establish the relevant connections in order to raise awareness among all the interested parties.


Team contact(s)


Documents produced by the working group will be listed in this section.


Active or planned connections to other initiatives and activities will be listed in this section.


Mailing list