Suggestion box: source code to add
The Archive is growing organically. We started "small", tracking three software origins (GitHub, Debian, and GNU), and we will add new origins bit by bit, depending on how urgently they need archiving and on the development effort available to integrate them into Software Heritage.
Use this page to suggest software origins that we are not following yet but should be. You can include whom to contact for technical collaboration, how urgent archival is, and any other useful information. To do so, just add a row to the table below. The legend explains the meaning of each column.
Entries are currently listed simply in the order they were added to this page; we will add more structure once the list starts growing.
Legend
- Software origin
- any publicly accessible "place" on the Internet that hosts software in source code form. Please give it a title and hyperlink it to the relevant URL
- Type of origin
- the kind of hosting, e.g., whether it is a forge, a collection of repositories, a homepage publishing tarballs, or a one-shot source code repository. For any kind of repository, please specify which VCS is in use (Git, SVN, CVS, etc.)
- Contact
- whom to contact for technical collaboration on how best to archive the source code hosted on the software origin. You can list yourself if you are the relevant person, or provide the most relevant contact point if you know it
- Conservation status
- how likely the software origin is to disappear; a high likelihood makes it more urgent for us to archive the software hosted there. We suggest using the species conservation status scale, i.e., one of: Critically endangered (CR), Endangered (EN), Vulnerable (VU), Near threatened (NT), Least concern (LC).
- How to mirror
- (pointers to) technical information on how to make a full mirror of all the source code available at the software origin, ideally in one shot and in batch
- How to keep up
- (pointers to) technical information on how to incrementally retrieve new source code accumulated since the last visit; usually this should be based on some kind of incremental change feed or event API
- Notes
- anything else you think we should know about this software origin
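The conservation status codes form an ordered scale, from Least concern (LC) up to Critically endangered (CR), so triaging suggestions amounts to sorting by that order. Below is a minimal Python sketch, assuming each table row has been reduced to a single status code (rows such as "varying, depending on the distro" would need a manual decision first). The `ConservationStatus`, `Suggestion`, and `by_urgency` names are hypothetical and not part of any Software Heritage tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class ConservationStatus(IntEnum):
    """Species-style conservation codes; a higher value means more urgent to archive.

    Hypothetical helper for illustration, not a Software Heritage API.
    """
    LC = 1  # Least concern
    NT = 2  # Near threatened
    VU = 3  # Vulnerable
    EN = 4  # Endangered
    CR = 5  # Critically endangered


@dataclass
class Suggestion:
    origin: str
    status: ConservationStatus


def by_urgency(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Return suggestions sorted from most to least endangered.

    Python's sort is stable (also with reverse=True), so origins sharing
    a status keep their order of addition to the table.
    """
    return sorted(suggestions, key=lambda s: s.status, reverse=True)
```

For example, rows tagged LC, EN, VU, and CR would come back as CR, EN, VU, LC; two EN rows would stay in the order they were added.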
Suggestions
Software origin | Type of origin | Contact | Conservation status | How to mirror | How to keep up | Notes |
---|---|---|---|---|---|---|
(sample entry) GitHubBub forge | Git hosting | John Doe <john@example.com> | LC | retrieve full repo list at /api/list, then git clone on each entry | poll RSS feed at /api/updates?since=YYYY-MM-DD | nothing special to add |
Gérard Huet's seminal work on 3D | Scanned source code | Gérard Huet <gerard.huet@inria.fr> | EN | retrieve the listing images from the web pages | N/A | links are partly broken: replace yquem with pauillac everywhere it appears |
Gentoo | | Johannes Kellner <gentoo@johannes-kellner.eu> | LC | | | |
Software Preservation Project | Website with a collection of archives | Paul McJones <paul@mcjones.org> | LC | | | |
253 NASA open source software projects | | | LC | | | |
Smaky, the Swiss microcomputer series | Partial code dump | Pierre Arnaud (current CEO of Epsitec) and/or Jean-Daniel Nicoud (founder of the computer series) | EN | Probably manually | No new updates | Some references to this history: Fondation Mémoires Informatiques, Smaky.ch (in particular, the short history) |
All Debian derivatives | Debian-based distributions | Paul Wise <pabs@debian.org> | varies by distribution | see details | see details | |
SourceForge | CVS, SVN, Mercurial, Git | | VU | | | |
Wikidata endangered software | depends on the source code repository property | Loic Dachary <loic@dachary.org> | The risk is higher than LC | A script should obtain the source code repository property for the software and mirror it according to the protocol qualifier. If the source code repository property has no value, the streaming media URL of the preferred software version should be downloaded instead. | Once Software Heritage has secured a copy, a URL pointing to it in Software Heritage should be added to the source code repository property and the conservation status property removed, so that the software defaults to Least concern. The software will then no longer appear in the list of endangered software. | This is work in progress, part of the Wikidata FLOSS project; the scripts do not exist yet. |
KDE | CVS/SVN/Git | KDE sysadmin team <sysadmin@kde.org> | NT | See details | See details | |
Java.net & Kenai.com | hg, git, svn | | EN | | | Shutting down on April 28, 2017 |
fedorahosted.org | git, svn, hg, bzr | infrastructure@lists.fedoraproject.org | EN | | | Shutting down on Feb. 28, 2017 |
Langage symbolique d'Enseignement (LSE) | archives | Association Enseignement Public et Informatique (EPI) <bureau@epi.asso.fr> | CR | | | + educational software (INRP-CNDP) |
The Netlib collection of numerical software | structured website with links to archives | | LC | | | many of these libraries are already mirrored in sources collected by Software Heritage, but the curation information is certainly valuable in itself. |
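The sample entry at the top of the table describes a two-phase workflow: a one-shot full mirror driven by /api/list, then incremental updates from a change feed at /api/updates?since=YYYY-MM-DD. The Python sketch below illustrates that workflow for the fictional GitHubBub forge. Everything about the forge is a hypothetical assumption: the base URL, that /api/list returns one repository URL per line, and that each RSS `<item>` carries a `<link>` to a changed repository. It uses `git clone --mirror` on the first visit and `git remote update --prune` on later visits; that is one reasonable choice, not a description of Software Heritage's own loaders.

```python
import subprocess
import xml.etree.ElementTree as ET
from datetime import date
from pathlib import Path
from urllib.parse import urlencode, urlparse
from urllib.request import urlopen

# Hypothetical forge from the sample entry; not a real service.
BASE_URL = "https://githubbub.example.com"


def fetch_text(url: str) -> str:
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_repo_list(body: str) -> list[str]:
    """Assumption: /api/list returns one repository URL per line."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def updates_url(base_url: str, since: date) -> str:
    """Build the change-feed URL /api/updates?since=YYYY-MM-DD."""
    query = urlencode({"since": since.isoformat()})
    return f"{base_url.rstrip('/')}/api/updates?{query}"


def parse_rss_links(rss_xml: str) -> list[str]:
    """Assumption: each RSS <item> has a <link> to the repository that changed.

    Only <link> elements inside <item> are read, so the channel's own
    <link> is ignored.
    """
    root = ET.fromstring(rss_xml)
    links = (item.findtext("link") for item in root.iter("item"))
    return [link.strip() for link in links if link and link.strip()]


def mirror_path(repo_url: str, dest_root: Path) -> Path:
    """Map https://host/owner/name(.git) to dest_root/host/owner/name.git."""
    parsed = urlparse(repo_url)
    rel = parsed.path.strip("/")
    if not rel.endswith(".git"):
        rel += ".git"
    return dest_root / parsed.netloc / rel


def sync_command(repo_url: str, dest_root: Path) -> list[str]:
    """Full mirror on the first visit, incremental fetch afterwards."""
    path = mirror_path(repo_url, dest_root)
    if path.exists():
        return ["git", "-C", str(path), "remote", "update", "--prune"]
    return ["git", "clone", "--mirror", repo_url, str(path)]


def full_mirror(dest_root: Path) -> None:
    """How to mirror: list every repository once, then clone each one."""
    for url in parse_repo_list(fetch_text(f"{BASE_URL}/api/list")):
        subprocess.run(sync_command(url, dest_root), check=True)


def keep_up(dest_root: Path, since: date) -> None:
    """How to keep up: only touch repositories changed since the last visit."""
    for url in parse_rss_links(fetch_text(updates_url(BASE_URL, since))):
        subprocess.run(sync_command(url, dest_root), check=True)
```

A caller would run `full_mirror` once, record the date of each visit, and pass that date as `since` to `keep_up` on the next run, so that only repositories reported by the feed are fetched again.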