Difference between revisions of "Suggestion box: source code to add"

From Software Heritage Wiki
(sourceforge is already being ingested)
 
(67 intermediate revisions by 17 users not shown)
Line 1: Line 1:
'''Work in progress'''
 
 
 
The [[Archive]] is growing organically. We started "small", tracking 3 '''software origins''' (GitHub + Debian + GNU), and we will be adding new origins bit by bit, depending on the urgency of archiving them and available development energies to integrate them into Software Heritage.
 
  
 
Using this page you can add suggestions of software origins that we aren't following yet, but should. You can include information about who to contact for technical collaboration, the urgency of archival, and other useful information. To that end, just add a row to the table below. Here is some information about the meaning of the various columns.
 +
 +
Entries are currently listed simply in order of addition to this page; we will add more structure once the list starts growing.
  
 
=== Legend ===
 
Line 13: Line 13:
 
;Contact
 
 
: who to contact for technical collaboration on how to best archive source code hosted on the software origin. You can list yourself if you're the relevant person, or provide the most relevant contact point if you know it
 
 +
;Conservation status
 +
: information about how likely it is that the software origin will disappear; a high likelihood makes it more urgent for us to archive the software hosted there. We suggest using the [https://en.wikipedia.org/wiki/Conservation_status species conservation status], i.e., one of: Critically endangered (CR), Endangered (EN), Vulnerable (VU), Near threatened (NT), Least concern (LC).
 +
;How to mirror
 +
: (pointers to) technical information on how to do a full mirror of ''all'' the source code available at the software origin, ideally one shot and in batch
 +
;How to keep up
 +
: (pointers to) technical information on how to incrementally retrieve new source code accumulated since the last visit; usually this should be based on some kind of incremental change feed or event API
 +
;Notes
 +
: anything else you think we should know about this software origin
 +
;Forge task
 +
: pointer to the task on our [[forge]] tracking the work needed to ingest the software origin
  
 +
== Suggestions ==
  
+
* https://archive.codeplex.com/ (was Microsoft's free, open source project hosting). It is now dead, but there seems to be a dump here: https://archive.org/details/sylirana_ms_codeplex_zips
 +
* https://wiki.p2pfoundation.net/List_of_Community-Hosted_GitLab_Instances
 +
* https://git.sdf.org/humanacollaborator/humanacollabora/src/branch/master/forge_comparison.md <= list of forges to eventually sync with the table below.  Note that there may be some [https://framablog.org/2019/09/26/lets-de-frama-tify-the-internet urgency] to harvest framagit.org.
  
 
{| class="wikitable"
 
Line 23: Line 36:
 
!Conservation status
!How to mirror
!How to keep up
!Notes
 +
![https://forge.softwareheritage.org/ Forge] task
 +
|-
 +
|'''(sample entry)''' ''GitHubBub forge''
 +
|''Git hosting''
 +
|''John Doe <john@example.com>''
 +
|''LC''
 +
|''retrieve full repo list at /api/list, then git clone on each entry''
 +
|''poll RSS feed at /api/updates?since=YYYY-MM-DD''
 +
|''nothing special to add''
 +
|''[https://forge.softwareheritage.org/T123456 T123456]''
 +
|-
 +
 +
|[https://wiki.debian.org/Derivatives/Census all Debian derivatives]
 +
|Debian-based distros
 +
|Paul Wise <pabs@debian.org>
 +
|varying, depending on the distro
 +
|see [[Suggestion_box:_source_code_to_add/Debian_derivatives|details]]
 +
|see [[Suggestion_box:_source_code_to_add/Debian_derivatives|details]]
 +
|
 +
|
 +
|-
 +
|[https://www.gentoo.org/ Gentoo]
 +
|
 +
|Johannes Kellner <gentoo@johannes-kellner.eu>
 +
|style="background-color: lightgreen"|LC
 +
|
 +
|
 +
|
 +
|
 +
|-
 +
|[http://pauillac.inria.fr/~huet/cea.html Gérard Huet's seminal work on 3D]
 +
|Scanned source code
 +
|Gérard Huet <gerard.huet@inria.fr>
 +
|style="background-color: red"|EN
 +
|retrieve listing images from the web pages
 +
|N/A
 +
|links are half broken, yquem should be replaced with pauillac everywhere it appears
 +
|
 +
|-
 +
|[http://www.softwarepreservation.org/projects Software Preservation Project]
 +
|Website with a collection of archives
 +
|Paul McJones <paul@mcjones.org>
 +
|style="background-color: lightgreen"|LC
 +
|
 +
|
 +
|
 +
|
 +
|-
 +
|[https://code.nasa.gov/ 253 NASA open source software projects]
 +
|
 +
|
 +
|style="background-color: lightgreen"|LC
 +
|
 +
|
 +
|
 +
|
 +
|-
 +
|[http://smaky.ch/ Smaky], the Swiss micro-computer series
 +
|[http://infini.smaky.ch/sources.html Partial code dump]
 +
|[mailto:arnaud@epsitec.ch Pierre Arnaud] (current CEO of Epsitec) and/or [mailto:jean-daniel.nicoud@epfl.ch Jean-Daniel Nicoud] (founder of the computer series)
 +
|style="background-color: red"|EN
 +
|Probably manually
 +
|No new updates
 +
|Some references to this history: [http://www.memoires-informatiques.org/ Fondation Mémoires Informatiques], [http://smaky.ch/ Smaky.ch] (in particular, [http://smaky.ch/theme.php?id=lami the short history])
 +
|
 +
|-
 +
|[https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/FLOSS#Conservation_status_2 wikidata endangered software]
 +
|depends on the "source code repository" property
 +
|Loic Dachary <loic@dachary.org>
 +
|style="background-color: yellow"|The risk is higher than [https://www.wikidata.org/wiki/Property_talk:P141 LC]
 +
|A script should obtain the "source code repository" property for the software and mirror it depending on the [https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/FLOSS#source_code_repository protocol] qualifier. If the "source code repository" is "no value", the [https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/Software#streaming_media_URL streaming media URL] of the "preferred" [https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/Software#software_version_.28P348.29 software version] should be downloaded instead.
 +
|Once a copy is secured by Software Heritage, a URL to the Software Heritage repository should be added to the "source code repository" property and the "conservation status" property should be removed, meaning it is "least concern" by default. The software will no longer show in the list of endangered software.
 +
|This is work in progress, part of the [https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/FLOSS wikidata FLOSS project] and the scripts do not exist yet.
 +
|
 +
|-
 +
|historical KDE repos
 +
|CVS/SVN/Git
 +
|KDE sysadmin team <sysadmin@kde.org>
 +
|style="background-color: yellow"|NT
 +
|See [[Suggestion box: source code to add/KDE|details]]
 +
|See [[Suggestion box: source code to add/KDE|details]]
 +
|
 +
|
 +
|-
 +
|[https://java.net/projects Java.net] & [https://kenai.com/ Kenai.com]
 +
|hg, git, svn
 +
|communitymanager@java.net
 +
|style="background-color: black; color: white"|CR
 +
|
 +
|
 +
|[https://community.oracle.com/community/java/javanet-forge-sunset Shutting down on April 28, 2017]
 +
|
 +
|-
 +
|[https://fedorahosted.org/ fedorahosted.org]
 +
|git, svn, hg, bzr
 +
|[https://lists.fedoraproject.org/admin/lists/infrastructure@lists.fedoraproject.org infrastructure@lists.fedoraproject.org]
 +
|style="background-color: black; color: white"|CR
 +
|
 +
|
 +
|[https://communityblog.fedoraproject.org/fedorahosted-sunset-2017-02-28/ Shutting down on Feb. 28, 2017]
 +
|
 +
|-
 +
|[http://www.societe-informatique-de-france.fr/wp-content/uploads/2015/12/1024-no7-Baude.pdf Langage symbolique d'Enseignement (LSE)]
 +
|archives
 +
|Association Enseignement Public et Informatique (EPI) <bureau@epi.asso.fr>
 +
|style="background-color: black; color: white"|CR
 +
|
 +
|
 +
| + educational software (INRP-CNDP)
 +
|
 +
|-
 +
|[http://www.netlib.org The Netlib collection of numerical software]
 +
|structured website with links to archives
 +
|
 +
|style="background-color: lightgreen"|LC
 +
|
 +
|
 +
|many of these libraries are mirrored in sources already collected in Software Heritage; the curation information is certainly valuable.
 +
|
 +
|-
 +
|[https://codebender.cc/ codebender]
 +
|IoT and educational resources
 +
|
 +
|style="background-color: black; color: white"|CR
 +
|
 +
|
 +
|closure expected on Dec. 31, 2016, according to this announcement by the founders: https://codebender.cc/next-chapter
 +
|
 +
|-
 +
|[https://git.oschina.net/ OS China]
 +
|Chinese GitHub equivalent
 +
|
 +
|style="background-color: lightgreen"|LC
 +
|
 +
|
 +
|interesting test case for all the unicode tooling in Software Heritage
 +
|
 +
|-
 +
|Usenet source code archives
 +
|NNTP
 +
|
 +
|style="background-color: yellow"|NT
 +
|crawl relevant newsgroup archives (e.g., at Google Groups), retrieve archives (possibly chunked), ingest
 +
|one shot might be enough?
 +
|suggestion by John Gilmore
 +
|
 +
|-
 +
|[https://www.x.org/releases/ X11/XOrg archives]
 +
|http
 +
|
 +
|style="background-color: lightgreen"|LC
 +
|
 +
|one shot might be enough
 +
|
 +
|''[https://forge.softwareheritage.org/T1774 T1774]''
 +
|-
 +
|[https://www.curseforge.com/ Curse mods]
 +
|Code distributed as versioned tarballs
 +
|
 +
|style="background-color: lightgreen"|LC
 +
|The API should be sufficient, maybe some scraping will be required
 +
|
 +
|
 +
|
 +
|-
 +
|Symbian source code
 +
|HG
 +
|carlo.daffara@nodeweaver.eu
 +
|style="background-color: orange"|VU
 +
|
 +
|
 +
|Cloned from Nokia's Symbian Mercurial repository a few days before the repo was closed and the license changed to a proprietary one. The Mercurial (HG) files are the only publicly available copy; I made a snapshot of the code and placed it on SourceForge, but that snapshot lacks the entire project history and commit log.
 +
|
 
|-
|[https://gist.github.com/ GitHub's gists]
|git
|
|style="background-color: lightgreen"|LC
|No proper listing API, but there's [https://developer.github.com/v3/gists/#list-all-public-gists an endpoint] to get gists created after a given date
|use the same endpoint
|
|
 +
|-
 +
|[https://puszcza.gnu.org.ua/ Puszcza]
 +
|cvs/git/hg/svn, VCS snapshot tarballs, tarballs
 +
|https://puszcza.gnu.org.ua/contact.php
 +
|style="background-color: lightgreen"|LC
 +
|Savane instance, https://download.gnu.org.ua/ (also has ftp), http://git.gnu.org.ua/
 +
|Savane instance
 +
|
 +
|
 +
|-
 +
|[https://directory.fsf.org/ Free Software Directory]
 +
|solely contains links (to VCS and tarballs)
 +
|https://lists.gnu.org/archive/html/directory-discuss/
 +
|style="background-color: lightgreen"|LC
 +
|MediaWiki instance, use API to clone (or git-remote-mediawiki)
 +
|MediaWiki instance, use API for updates (or git-remote-mediawiki)
 +
|
 +
|
 +
|-
 +
|[https://kb.netgear.com/2649/NETGEAR-Open-Source-Code-for-Programmers-GPL NetGear GPL tarballs]
 +
|Tarballs
 +
|
 +
|style="background-color: yellow"|NT
 +
|
 +
|
 +
|
 +
|
 
|-
|[https://www.erasmatazz.com/library/source-code/ Source code for historical games from 1979-1990]
|Zipballs
|
|style="background-color: yellow"|NT
|
|
|Copied to archive.org already too.
|
|}

[[Category:Archive]]
[[Category:Suggestions]]

Latest revision as of 05:19, 4 October 2023

The Archive is growing organically. We started "small", tracking 3 software origins (GitHub + Debian + GNU), and we will be adding new origins bit by bit, depending on the urgency of archiving them and available development energies to integrate them into Software Heritage.

Using this page you can add suggestions of software origins that we aren't following yet, but should. You can include information about who to contact for technical collaboration, the urgency of archival, and other useful information. To that end, just add a row to the table below. Here is some information about the meaning of the various columns.

Entries are currently listed simply in order of addition to this page; we will add more structure once the list starts growing.

Legend

Software origin
any (publicly accessible) "place" on the Internet that hosts software in source code form. Please provide a title for it and hyperlink it to the relevant URL
Type of origin
information about the kind of hosting, e.g., whether it is a forge, a collection of repositories, a homepage publishing tarballs, or a one-shot source code repository. For all kinds of repositories please specify which VCS is in use (Git, SVN, CVS, etc.)
Contact
who to contact for technical collaboration on how to best archive source code hosted on the software origin. You can list yourself if you're the relevant person, or provide the most relevant contact point if you know it
Conservation status
information about how likely it is that the software origin will disappear; a high likelihood makes it more urgent for us to archive the software hosted there. We suggest using the species conservation status, i.e., one of: Critically endangered (CR), Endangered (EN), Vulnerable (VU), Near threatened (NT), Least concern (LC).
How to mirror
(pointers to) technical information on how to do a full mirror of all the source code available at the software origin, ideally one shot and in batch
How to keep up
(pointers to) technical information on how to incrementally retrieve new source code accumulated since the last visit; usually this should be based on some kind of incremental change feed or event API
Notes
anything else you think we should know about this software origin
Forge task
pointer to the task on our forge tracking the work needed to ingest the software origin
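
As an illustration of the legend above, here is a minimal Python sketch that models one suggestion and formats it as a row of the wikitable. The names `Suggestion`, `to_wikitext_row` and `by_urgency` are hypothetical helpers, not part of any Software Heritage tooling; the cell colours are simply the ones already used by rows of the table on this page.

```python
from dataclasses import dataclass
from enum import Enum


class ConservationStatus(Enum):
    """Statuses from the legend, declared from most to least urgent."""
    CR = "Critically endangered"
    EN = "Endangered"
    VU = "Vulnerable"
    NT = "Near threatened"
    LC = "Least concern"


# Cell styles copied from the rows already present in the table.
STATUS_STYLE = {
    ConservationStatus.CR: "background-color: black; color: white",
    ConservationStatus.EN: "background-color: red",
    ConservationStatus.VU: "background-color: orange",
    ConservationStatus.NT: "background-color: yellow",
    ConservationStatus.LC: "background-color: lightgreen",
}


@dataclass
class Suggestion:
    """One table row; field order follows the column order of the legend."""
    origin: str
    origin_type: str = ""
    contact: str = ""
    status: ConservationStatus = ConservationStatus.LC
    how_to_mirror: str = ""
    how_to_keep_up: str = ""
    notes: str = ""
    forge_task: str = ""

    def to_wikitext_row(self) -> str:
        """Render the suggestion as a row separator plus eight cells."""
        cells = [
            self.origin,
            self.origin_type,
            self.contact,
            f'style="{STATUS_STYLE[self.status]}"|{self.status.name}',
            self.how_to_mirror,
            self.how_to_keep_up,
            self.notes,
            self.forge_task,
        ]
        return "\n".join(["|-"] + [f"|{cell}" for cell in cells])


def by_urgency(suggestions):
    """Sort suggestions so that the most endangered origins come first."""
    order = list(ConservationStatus)
    return sorted(suggestions, key=lambda s: order.index(s.status))
```

Calling `Suggestion(origin="[https://www.gentoo.org/ Gentoo]").to_wikitext_row()` yields text that can be pasted just before the closing `|}` of the table. Sorting by urgency is only a suggestion here: the page itself lists entries in order of addition.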

Suggestions

{| class="wikitable"
!Software origin
!Type of origin
!Contact
!Conservation status
!How to mirror
!How to keep up
!Notes
!Forge task
|-
|(sample entry) GitHubBub forge
|Git hosting
|John Doe <john@example.com>
|LC
|retrieve full repo list at /api/list, then git clone on each entry
|poll RSS feed at /api/updates?since=YYYY-MM-DD
|nothing special to add
|T123456
|-
|all Debian derivatives
|Debian-based distros
|Paul Wise <pabs@debian.org>
|varying, depending on the distro
|see details
|see details
|
|
|-
|Gentoo
|
|Johannes Kellner <gentoo@johannes-kellner.eu>
|LC
|
|
|
|
|-
|Gérard Huet's seminal work on 3D
|Scanned source code
|Gérard Huet <gerard.huet@inria.fr>
|EN
|retrieve listing images from the web pages
|N/A
|links are half broken, yquem should be replaced with pauillac everywhere it appears
|
|-
|Software Preservation Project
|Website with a collection of archives
|Paul McJones <paul@mcjones.org>
|LC
|
|
|
|
|-
|253 NASA open source software projects
|
|
|LC
|
|
|
|
|-
|Smaky, the Swiss micro-computer series
|Partial code dump
|Pierre Arnaud (current CEO of Epsitec) and/or Jean-Daniel Nicoud (founder of the computer series)
|EN
|Probably manually
|No new updates
|Some references to this history: Fondation Mémoires Informatiques, Smaky.ch (in particular, the short history)
|
|-
|wikidata endangered software
|depends on the "source code repository" property
|Loic Dachary <loic@dachary.org>
|The risk is higher than LC
|A script should obtain the "source code repository" property for the software and mirror it depending on the protocol qualifier. If the "source code repository" is "no value", the streaming media URL of the "preferred" software version should be downloaded instead.
|Once a copy is secured by Software Heritage, a URL to the Software Heritage repository should be added to the "source code repository" property and the "conservation status" property should be removed, meaning it is "least concern" by default. The software will no longer show in the list of endangered software.
|This is work in progress, part of the wikidata FLOSS project and the scripts do not exist yet.
|
|-
|historical KDE repos
|CVS/SVN/Git
|KDE sysadmin team <sysadmin@kde.org>
|NT
|See details
|See details
|
|
|-
|Java.net & Kenai.com
|hg, git, svn
|communitymanager@java.net
|CR
|
|
|Shutting down on April 28, 2017
|
|-
|fedorahosted.org
|git, svn, hg, bzr
|infrastructure@lists.fedoraproject.org
|CR
|
|
|Shutting down on Feb. 28, 2017
|
|-
|Langage symbolique d'Enseignement (LSE)
|archives
|Association Enseignement Public et Informatique (EPI) <bureau@epi.asso.fr>
|CR
|
|
| + educational software (INRP-CNDP)
|
|-
|The Netlib collection of numerical software
|structured website with links to archives
|
|LC
|
|
|many of these libraries are mirrored in sources already collected in Software Heritage; the curation information is certainly valuable.
|
|-
|codebender
|IoT and educational resources
|
|CR
|
|
|closure expected on Dec. 31, 2016, according to this announcement by the founders: https://codebender.cc/next-chapter
|
|-
|OS China
|Chinese GitHub equivalent
|
|LC
|
|
|interesting test case for all the unicode tooling in Software Heritage
|
|-
|Usenet source code archives
|NNTP
|
|NT
|crawl relevant newsgroup archives (e.g., at Google Groups), retrieve archives (possibly chunked), ingest
|one shot might be enough?
|suggestion by John Gilmore
|
|-
|X11/XOrg archives
|http
|
|LC
|
|one shot might be enough
|
|T1774
|-
|Curse mods
|Code distributed as versioned tarballs
|
|LC
|The API should be sufficient, maybe some scraping will be required
|
|
|
|-
|Symbian source code
|HG
|carlo.daffara@nodeweaver.eu
|VU
|
|
|Cloned from Nokia's Symbian Mercurial repository a few days before the repo was closed and the license changed to a proprietary one. The Mercurial (HG) files are the only publicly available copy; I made a snapshot of the code and placed it on SourceForge, but that snapshot lacks the entire project history and commit log.
|
|-
|GitHub's gists
|git
|
|LC
|No proper listing API, but there's an endpoint to get gists created after a given date
|use the same endpoint
|
|
|-
|Puszcza
|cvs/git/hg/svn, VCS snapshot tarballs, tarballs
|https://puszcza.gnu.org.ua/contact.php
|LC
|Savane instance, https://download.gnu.org.ua/ (also has ftp), http://git.gnu.org.ua/
|Savane instance
|
|
|-
|Free Software Directory
|solely contains links (to VCS and tarballs)
|https://lists.gnu.org/archive/html/directory-discuss/
|LC
|MediaWiki instance, use API to clone (or git-remote-mediawiki)
|MediaWiki instance, use API for updates (or git-remote-mediawiki)
|
|
|-
|NetGear GPL tarballs
|Tarballs
|
|NT
|
|
|
|
|-
|Source code for historical games from 1979-1990
|Zipballs
|
|NT
|
|
|Copied to archive.org already too.
|
|}
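
To make the "GitHub's gists" entry more concrete, here is a minimal Python sketch of how the incremental listing could work. It assumes the public gists endpoint of the GitHub REST API (`GET /gists/public` with the `since`, `page` and `per_page` parameters) and the `git_pull_url` field of each returned gist. As far as we can tell, `since` filters on the last update time rather than the creation time, so treat this as a sketch to validate against the API documentation. The helper names and the `max_pages` cap are hypothetical, and rate limiting and authentication are left out.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://api.github.com/gists/public"


def http_get_json(url):
    """Fetch a URL and decode its JSON body (unauthenticated request)."""
    request = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(request) as response:
        return json.load(response)


def gists_page_url(since, page, per_page=100):
    """Build the listing URL for one page of gists newer than `since`.

    `since` must be a timezone-aware datetime; it is sent as ISO 8601 UTC.
    """
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {"since": stamp, "page": page, "per_page": per_page}
    return f"{API_URL}?{urlencode(params)}"


def list_gist_clone_urls(since, fetch=http_get_json, max_pages=10):
    """Collect the git clone URLs of public gists, page by page.

    `fetch` is injectable so the function can be tried without network
    access; `max_pages` is an arbitrary safety cap for this sketch.
    """
    urls = []
    for page in range(1, max_pages + 1):
        gists = fetch(gists_page_url(since, page))
        if not gists:  # an empty page means there is nothing left to list
            break
        urls.extend(gist["git_pull_url"] for gist in gists)
    return urls
```

Each returned URL can then be handed to `git clone`, and the timestamp of the last visit can be stored and reused as `since` on the next run, which is what "use the same endpoint" in the "How to keep up" column amounts to.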