Pointers to papers, research projects, and other "science" material related to the scientific side of Software Heritage (AKA the "CERN" of software engineering).
- Hash studies
- Boa: mining ultra-large-scale software repositories (language/infrastructure for software mining)
- Dremel: engine used to process GitHub Archive queries
- GitHub Archive
- paper presentation: Analyzing Millions of GitHub Commits
- PLINY: Bayesian auto-completion for code
Digital preservation, digital dark age
- Avoiding "Bit Rot": Long-Term Preservation of Digital Information, Vinton Cerf, Proceedings of the IEEE, June 2011, pages 915-916
- The digital dark age, David Anderson, Communications of the ACM, Vol. 58 No. 12, Pages 20-23, December 2015
- Castagné, Michel. Consider the Source: The Value of Source Code to Digital Preservation Strategies. iSchool Student Research Journal 2.2 (2013): 5.
- Brian Matthews, Arif Shaon, Juan Bicarregui, Catherine Jones. A Framework for Software Preservation. International Journal of Digital Curation, Vol. 5, No. 1, pp. 91-105, 2010.
- Jonathan A. Rees, Digital preservation and independently held copies (blog post)
  - on the need for redundancy all the way down the stack, including things like domain registration, DNS, etc.