Difference between revisions of "Ingest Wikidata software origins (internship)"

From Software Heritage Wiki
(add IRC nicknames)
(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
- '''Context''': [https://www.softwareheritage.org/ Software Heritage] is an
- ambitious research project whose goal is to collect, preserve in the very long
- term, and share the whole publicly accessible Free/Open Source Software
- (FOSS) in source code form.
-
- '''Description''': The Software Heritage archive currently contains source code
+ {{Internship
+ |description=The Software Heritage archive currently contains source code
  coming mostly from major development forges and distributions.
  [https://www.wikidata.org/ Wikidata] is a free and open knowledge base about
Line 12: Line 8:
  and ingested into the Software Heritage archive.
-
- '''Desirable skills''' to obtain this internship:
- * familiarity with the Distributed Version Control Systems (DVCs)
+ |skills=
+ * familiarity with the Version Control Systems
  * familiarity with Wikipedia and/or Wikidata
  * Python development
-
- '''Workplace''': Inria Paris
-
- '''Environnement''': you will work shoulder to shoulder with all members of the
- Software Heritage team, and you will have a chance to witness from within the
- construction of the ultimate source code archive.
-
- '''Internship mentors''':
- * Stefano Zacchiroli <zack@upsilon.cc>
+ |mentors=
+ * Stefano Zacchiroli <zack@upsilon.cc> (zack on [[IRC]])
+ }}

  [[Category:Available internship]]
- [[Category:Internship]]
- [[Category:Lang:English]]

Revision as of 10:34, 2 March 2021

Context: Software Heritage is an ambitious research project whose goal is to collect, preserve in the very long term, and share all publicly accessible Free/Open Source Software (FOSS) in source code form.

Description: The Software Heritage archive currently contains source code coming mostly from major development forges and distributions. Wikidata is a free and open knowledge base about everything, including software development projects. The goal of this internship is to list the software origins described in Wikidata (in particular, but not only, version control system repositories) and make sure they are periodically crawled and ingested into the Software Heritage archive.
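
As an illustration of the listing step only, here is a minimal Python sketch of how candidate origins might be pulled from Wikidata. It is not the Software Heritage lister and does not reflect any existing Software Heritage API. It assumes the public Wikidata SPARQL endpoint and the Wikidata property P1324 ("source code repository URL"). The function names, the User-Agent string, and the result limit are hypothetical choices made for this example.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata query service endpoint (assumption: reachable from your network).
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# P1324 is Wikidata's "source code repository URL" property.
# Doubled braces are literal braces in str.format templates.
QUERY_TEMPLATE = """
SELECT ?item ?repo WHERE {{
  ?item wdt:P1324 ?repo .
}}
LIMIT {limit}
"""


def build_query(limit=100):
    """Return a SPARQL query listing items that declare a repository URL."""
    return QUERY_TEMPLATE.format(limit=int(limit))


def parse_origins(sparql_json):
    """Extract sorted, de-duplicated repository URLs from SPARQL JSON results."""
    bindings = sparql_json.get("results", {}).get("bindings", [])
    urls = {b["repo"]["value"] for b in bindings if "repo" in b}
    return sorted(urls)


def fetch_origins(limit=100):
    """Query Wikidata over HTTP and return candidate origin URLs (hypothetical helper)."""
    params = urllib.parse.urlencode({"query": build_query(limit), "format": "json"})
    request = urllib.request.Request(
        f"{WIKIDATA_SPARQL_ENDPOINT}?{params}",
        # Hypothetical User-Agent; replace with one identifying your own tool.
        headers={"User-Agent": "swh-wikidata-lister-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_origins(json.load(response))
```

Calling `fetch_origins(10)` would return up to ten repository URLs, which a periodic job could then hand over for crawling. Parsing is kept separate from the network call so that it can be exercised offline on a saved response.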

Desirable skills to obtain this internship:

  • familiarity with version control systems
  • familiarity with Wikipedia and/or Wikidata
  • Python development

Workplace: on site at Inria Paris (contact mentors for remote opportunities)

Environment: you will work shoulder to shoulder with all members of the Software Heritage team, and you will have a chance to witness from within the construction of the great library of source code.

Internship mentors:

  • Stefano Zacchiroli <zack@upsilon.cc> (zack on IRC)

See also