Difference between revisions of "Google Season of Docs 2019"

From Software Heritage Wiki
(copy gsoc 2019 page as initial skeleton)
(→‎Contact: freenode -> libera)
 
(21 intermediate revisions by 5 users not shown)
Line 1: Line 1:
'''TODO WORK IN PROGRESS — mostly a copy of [[Google Summer of Code 2019]] for now, please come back later'''
+
[[File:GSoDLogo.png|512px]]
 
 
[[File:GSoCLogo.png|1024px]]
 
  
 
== General information ==
 
== General information ==
  
This page is the central point of information for [[Software Heritage]] participation into the [https://summerofcode.withgoogle.com/ Google Summer of Code] program.
+
This page is the central point of information for [[Software Heritage]] participation into the [https://developers.google.com/season-of-docs/ Google Season of Docs] program.
  
Google Summer of Code is a program where Google pays students stipends to work over the (northern hemisphere) summer on free software projects such as Software Heritage. Each student works with mentors from the community to complete a software project.
+
Google Season of Docs is a program where Google pays technical writers stipends to work on free software projects such as Software Heritage. Each writer works with mentors from the community to complete a documentation project.
  
== I want to participate as a student ==
+
== I want to participate as a technical writer ==
  
 
Great!, we are very glad for your interest in contributing to Software Heritage and we are looking forward to work together.
 
Great!, we are very glad for your interest in contributing to Software Heritage and we are looking forward to work together.
Line 15: Line 13:
 
=== Prerequisites ===
 
=== Prerequisites ===
  
The following prerequisites apply to Software Heritage GSoC projects:
+
The following prerequisites apply to Software Heritage GSoD projects:
  
* [https://www.python.org Python] 3 is our language of choice, you should be fluent with that language to apply
+
* [http://www.sphinx-doc.org/ Sphinx] is our documentation system of choice, you should be familiar with it to apply. In particular, we generally use reStructuredText markup and (for API references) Python docstrings with the [https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html Napoleon style]
 
* [https://git-scm.com Git] is our version control system of choice, you should be familiar with it to apply
 
* [https://git-scm.com Git] is our version control system of choice, you should be familiar with it to apply
 
* additional prerequisites depend on the project you will work on; check project descriptions for details
 
* additional prerequisites depend on the project you will work on; check project descriptions for details
Line 23: Line 21:
 
=== Before you apply ===
 
=== Before you apply ===
  
Here are the steps you should follow before applying, to make sure you have a good grasp of what we are doing at Software Heritage and how we do it:
+
Here are the steps you should follow before applying, to make sure you have a general idea of the current state of Software Heritage technical documentation:
  
# Follow our [https://docs.softwareheritage.org/devel/getting-started.html getting started guide]: it will make sure you can locally run a (small) copy of the archive and ingest source code into it
+
* Learn about our project via our [https://www.softwareheritage.org/ main website] and the actual [https://archive.softwareheritage.org/ source code archive]
# Create an account on our [https://forge.softwareheritage.org development forge]
+
* Check out the index of our [https://www.softwareheritage.org/community/developers/ resources for developers]
# Familiarize yourself with our [[Code review in Phabricator|code review workflow]]
+
* In particular make sure to have a look at:
# Make a simple change to any one of our [https://docs.softwareheritage.org/devel/ software components] and submit it as a [https://forge.softwareheritage.org/differential/ diff] for code review, following the above workflow. [[Easy hacks]] and [https://forge.softwareheritage.org/project/view/20/ Web UI] issues are good options for what to fix, but feel free to submit any patch you think it might be useful.
+
** our [https://wiki.softwareheritage.org/ public wiki]
 +
** our [https://docs.softwareheritage.org/devel/ documentation index]
 +
* Technical setup:
 +
** Create an account on our [https://forge.softwareheritage.org development forge]
 +
** Familiarize yourself with our [[Code review in Phabricator|code review workflow]]
 +
** Make a simple change to the documentation of any one of our [https://docs.softwareheritage.org/devel/ software components] and submit it as a [https://forge.softwareheritage.org/differential/ diff] for code review, following the above workflow. Feel free to submit any patch you think it might be useful.
  
 
=== What to include in your application ===
 
=== What to include in your application ===
Line 34: Line 37:
 
Make sure that your application includes the following information:
 
Make sure that your application includes the following information:
  
* Describe the '''specific project''' you want to work on. What do you want to achieve? Why is it important? Why is it useful for Software Heritage? The project might be one of the project ideas that we have prepared below, or something else entirely that you want to contribute to Software Heritage. Your source code archival pet peeve, surprise us!
+
* Describe the '''specific project''' you want to work on. What do you want to achieve? Why is it important? Why is it useful for Software Heritage? The project might be one of the project ideas that we have prepared below, or something else entirely that you want to contribute to Software Heritage. Your pet peeve, surprise us!
 
* Detail your '''work plan''': a brief description of how you plan to go about your project, including a list of  ''deliverables'' and a ''timeline'' of when do you expect them to be available.
 
* Detail your '''work plan''': a brief description of how you plan to go about your project, including a list of  ''deliverables'' and a ''timeline'' of when do you expect them to be available.
 
* Include a reference to '''the diff''' you submitted before applying (see the "Before you apply" section above).
 
* Include a reference to '''the diff''' you submitted before applying (see the "Before you apply" section above).
Line 41: Line 44:
  
 
Below you can find a list of project ideas that are good options for a
 
Below you can find a list of project ideas that are good options for a
reasonably sized GSoC project.  They are just suggestion though, don't feel
+
reasonably sized GSoD project.  They are just suggestion though, don't feel
 
obliged to pick one of them if there is nothing that fits your taste and
 
obliged to pick one of them if there is nothing that fits your taste and
 
abilities.  Feel free to propose something else that you are excited about and
 
abilities.  Feel free to propose something else that you are excited about and
that contributes to improve the Software Heritage archive: we will be happy to
+
that contributes to improve the Software Heritage documentation: we will be
consider it!
+
happy to consider it!
  
=== Increase archive coverage ===
+
=== Reorganize developer documentation as tutorials/how-to/discussions/references ===
  
Software Heritage aims to archive ''all publicly available software source
+
Our [https://docs.softwareheritage.org/devel/ developer documentation] is not particularly structured, making it hard to use it properly.
code''. We naturally started with the place where most of the software is
+
We would like to reorganize it following the [https://www.youtube.com/watch?v=t4vKPhjcMZg tutorials/how-to/discussions/references taxonomy], or something equally sensible.
easily available today: git repositories on GitHub.
 
  
As Software Heritage grows, we're incrementally increasing archive coverage by
+
This project will require discussing the intended use cases of this page, conceptually structure the future version of it, implement it, and (re)write suitable language to introduce the various parts.
expanding the sources from which we archive software; a list of currently
 
crawled sources is listed on the
 
[https://archive.softwareheritage.org main archive page]. As you can see there,
 
we have already built ways of archiving Mercurial repositories, Debian packages,
 
PyPI bundles, and more.
 
  
Further expansions of archive coverage are very suitable GSoC project.
+
=== Write a high-level technical overview of the project, data model, and data flow ===
  
Practically, to expand archive coverage two kinds of software components need
+
We have described the data model and archival data flow of Software Heritage in [https://upsilon.cc/~zack/research/publications/ipres-2018-doi.pdf various] [https://upsilon.cc/~zack/research/publications/cacm-2018-software-heritage.pdf scientific] [https://upsilon.cc/~zack/research/publications/msr-2019-swh.pdf papers], but haven't really worked on a general, high-level, technical presentation of it that targets developers.
to be implemented: listers and loaders.
+
The current [https://docs.softwareheritage.org/devel/swh-model/data-model.html#data-model data model documentation] has been copy-pasted from scientific papers.
 +
We want to revisit it, to make sure the language description is suitable for a more general technical public, possibly complementing it with more abstract data-structure descriptions and cross-references to the code implementing the model and the workflow in the actual implementation.
  
'''Listers''' are components that crawl the APIs of software
+
=== Design and document consistent writing conventions for Python docstrings ===
[https://en.wikipedia.org/wiki/Forge_(software) forges] (e.g., Bitbucket,
 
Gitorious, Sourceforge, ...) or package managers (a larges list is maintained
 
by [https://libraries.io Libraries.io]) and return a list of the software
 
available in it. See the official
 
[https://docs.softwareheritage.org/devel/swh-lister/ listers documentation] for
 
more details.
 
  
'''Loaders''' take a bundle of software (tarball, Git repository, Python
+
The docstrings of the [https://docs.softwareheritage.org/devel/#components various software components] in the Software Heritage stack are not very consistent.
package, ...) and load it into Software Heritage, by adapting it so that it
+
They have been written by a number of different developers, with varying degrees of English proficiency.
matches the archive
 
[https://docs.softwareheritage.org/devel/swh-model/data-model.html data model].
 
  
While listers answer questions like "what are all the repositories available on
+
We would like to document, as a set of writing guidelines, how to write them to the benefits of current and future developers.
npm.org?", a loader addresses the "how do I load the NPM package I downloaded
+
The guidelines should include what to write in docstrings and whatnot, writing styles and guidance, examples of good and bad content for them, etc.
into Software Heritage?" problem.
 
  
Writing a missing lister or a loader is a great way to contribute to expand the
+
To the extent it is possible, this project will also explore automating checks for guidelines conformance, in the form of basic structural (e.g., are all function parameters documented? is the function name correct? etc.) and writing checks (e.g., spellchecking, grammar checking, etc.).
coverage of the Software Heritage archive! Feel free to propose the
 
implementation of one (or several!) listers or loaders that are currently
 
missing. For inspiration you can check out our [[Suggestion box]] for code to
 
archive, or propose your favorite missing forge or package repository.
 
  
=== Mine information from archived content ===
+
=== Revamp new (code) contributor tutorial ===
  
In addition to archival, Software Heritage indexes the retrieved source code
+
The current [https://docs.softwareheritage.org/devel/#getting-started getting started documentation] include two overlapping documents: one about how to ''run'' a local instance of the full Software Heritage stack, another about how to create a local setup for current and future project ''developers''.
artifacts, to enable semantic searches on the archive and scientific research.
+
We would like to review those documents for consistency and, more generally, revamp documentation that will help wannabe code contributors to quickly get started and submit patches.
  
Indexing can happen at the individual file-level (e.g., detect the programming
+
As part of this project we might also want to revamp glossary, data model and other cross-cutting documents that help newcomers understand where-is-what in the code base and how to practically prepare and send us patches.
language the file is written in or the license declared in its header), or at
 
more coarse grained granularity (e.g., what metadata are declared for the most
 
recently archived version of a given project).
 
  
A number of indexes are [https://forge.softwareheritage.org/source/swh-indexer/ currently supported],
+
=== Restructure wiki landing page as main community entry point ===
such as:
 
  
* file level mining:
+
Our [[Main_Page|public wiki landing page]] is, well, a mess.
** MIME type detection (using libmagic)
+
It has grown inorganically by accumulating links to pages used more as working drafts than reasonably structured documents.
** license detection (using FOSSology/nomossa)
+
We would like to restructure it to cater for various use cases (users, developers, students, curators, scientists, etc.) in the attempt of making it the primary entry point into Software Heritage for all community members.
** language detection (using Pygments)
 
** ctags extraction (using universal-ctags)
 
* project level mining:
 
** Ruby gemspec metadata
 
** Python PKG-INFO metadata
 
** Maven pom.xml metadata
 
** NPM package.json metadata
 
  
Writing additional indexers that extract more information from archived source
+
This project will require discussing the various uses cases, design a suitable page layout, implement it in the wiki, write the introductory parts for each path into the wiki.
code is welcome and would constitute a suitable GSoC project.
+
A related task, if time permits, will be to reorganize the wiki [[Special:Categories|category taxonomy]] and document when/how to use the available categories in the future.
  
Name the kind of data mining you want to do!
+
Working on this project will require familiarity with [https://www.mediawiki.org/wiki/MediaWiki Mediawiki] and its markup language.
 
 
For inspiration you can have a look at [https://libraries.io Libraries.io], as
 
most package formats/package managers support dedicated ways of expressing
 
metadata and we only support a small number of them up-to-now. But do not
 
restrict your ambition to those, any kind of data extraction/mining you want to
 
do on the archive could work.
 
 
 
=== Improve and extend the archive Web UI ===
 
 
 
As you probably know already, The Software Heritage archive can be
 
[https://archive.softwareheritage.org browsed on the Web]. The
 
[https://forge.softwareheritage.org/source/swh-web/ code] powering that
 
interface is a Django application that also implements a
 
[https://archive.softwareheritage.org/api/ Web API].
 
 
 
Several improvements are possible on the archive Web interface and would make
 
great GSoC projects, some ideas to whet your appetite:
 
 
 
* improve navigation on mobile devices and browsers
 
* add new source code search criteria and improve the search interface
 
* add developer-oriented features, e.g., source file history, blame/praise interface, in-browser edit (with patch download), ...
 
* improve [https://www.w3.org/WAI/ accessibility]
 
* add missing API endpoints (name your pet peeves!)
 
* add end to end tests using [https://www.seleniumhq.org/ Selenium]
 
 
 
=== Research internships ===
 
 
 
For the more research-inclined students, we also maintain a separate list of [[Internships]].
 
Any topic there is also a viable GSoC project.
 
  
 
== Contact ==
 
== Contact ==
  
GSoC students are encouraged to get in touch with the Software Heritage community using the standard development communication channels, i.e.:
+
GSoD applicants are encouraged to get in touch with the Software Heritage community using the standard development communication channels, i.e.:
  
* the #swh-devel IRC channel on [https://freenode.net Freenode]
+
* the #swh-devel IRC channel on [https://libera.chat/ Libera Chat]
 
* the [https://sympa.inria.fr/sympa/info/swh-devel swh-devel mailing list]
 
* the [https://sympa.inria.fr/sympa/info/swh-devel swh-devel mailing list]
  
Line 157: Line 102:
 
== Timeline ==
 
== Timeline ==
  
See the official [https://developers.google.com/open-source/gsoc/timeline Google Summer of Code timeline].
+
See the official [https://developers.google.com/season-of-docs/docs/timeline Google Season of Docs timeline].
 +
 
 +
[[Category:Google Season of Docs]]

Latest revision as of 07:30, 15 June 2021


General information

This page is the central point of information for Software Heritage participation in the Google Season of Docs program.

Google Season of Docs is a program where Google pays technical writers stipends to work on free software projects such as Software Heritage. Each writer works with mentors from the community to complete a documentation project.

I want to participate as a technical writer

Great! We are very glad of your interest in contributing to Software Heritage, and we look forward to working together.

Prerequisites

The following prerequisites apply to Software Heritage GSoD projects:

  • Sphinx is our documentation system of choice; you should be familiar with it to apply. In particular, we generally use reStructuredText markup and (for API references) Python docstrings in the Napoleon style
  • Git is our version control system of choice; you should be familiar with it to apply
  • additional prerequisites depend on the project you will work on; check project descriptions for details
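
As an illustration of the docstring prerequisite, here is a minimal sketch of a Python function documented in the Google flavour that the Sphinx Napoleon extension understands. The function `count_lines` and its parameters are hypothetical and do not come from the Software Heritage code base; whether a given component uses the Google or the NumPy flavour is an assumption to verify against its existing docstrings.

```python
def count_lines(text: str, skip_blank: bool = False) -> int:
    """Count the lines in a piece of text.

    Args:
        text: the text to inspect
        skip_blank: if True, ignore lines that contain only whitespace

    Returns:
        The number of lines found in ``text``.

    Raises:
        TypeError: if ``text`` is not a string
    """
    # Hypothetical example function, used only to show the docstring layout.
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    lines = text.splitlines()
    if skip_blank:
        lines = [line for line in lines if line.strip()]
    return len(lines)
```

Napoleon turns the `Args`, `Returns` and `Raises` sections into regular reStructuredText fields when Sphinx builds the API reference.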

Before you apply

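Here are the steps you should follow before applying, to make sure you have a general idea of the current state of Software Heritage technical documentation:

  • Learn about our project via our main website and the actual source code archive
  • Check out the index of our resources for developers
  • In particular, make sure to have a look at:
      • our public wiki
      • our documentation index
  • Technical setup:
      • Create an account on our development forge
      • Familiarize yourself with our code review workflow
      • Make a simple change to the documentation of any one of our software components and submit it as a diff for code review, following the above workflow. Feel free to submit any patch you think might be useful.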

What to include in your application

Make sure that your application includes the following information:

  • Describe the specific project you want to work on. What do you want to achieve? Why is it important? Why is it useful for Software Heritage? The project might be one of the project ideas that we have prepared below, or something else entirely that you want to contribute to Software Heritage. If you have a pet peeve, surprise us!
  • Detail your work plan: a brief description of how you plan to go about your project, including a list of deliverables and a timeline of when you expect them to be available.
  • Include a reference to the diff you submitted before applying (see the "Before you apply" section above).

Ideas list

Below you can find a list of project ideas that are good options for a reasonably sized GSoD project. They are just suggestions, though: don't feel obliged to pick one of them if nothing fits your taste and abilities. Feel free to propose something else that you are excited about and that contributes to improving the Software Heritage documentation: we will be happy to consider it!

Reorganize developer documentation as tutorials/how-to/discussions/references

Our developer documentation is not particularly structured, which makes it hard to use. We would like to reorganize it following the tutorials/how-to/discussions/references taxonomy, or something equally sensible.

This project will require discussing the intended use cases of this documentation, conceptually structuring its future version, implementing it, and (re)writing suitable language to introduce the various parts.

Write a high-level technical overview of the project, data model, and data flow

We have described the data model and archival data flow of Software Heritage in various scientific papers, but we have not yet written a general, high-level technical presentation of them that targets developers. The current data model documentation has been copy-pasted from scientific papers. We want to revisit it to make sure the language is suitable for a more general technical audience, possibly complementing it with more abstract data-structure descriptions and cross-references to the code that implements the model and the workflow.

Design and document consistent writing conventions for Python docstrings

The docstrings of the various software components in the Software Heritage stack are not very consistent. They have been written by a number of different developers, with varying degrees of English proficiency.

We would like to document, as a set of writing guidelines, how to write them for the benefit of current and future developers. The guidelines should cover what does and does not belong in docstrings, writing style, examples of good and bad content, etc.

To the extent possible, this project will also explore automating checks for conformance to the guidelines, in the form of basic structural checks (e.g., are all function parameters documented? is the function name correct?) and writing checks (e.g., spellchecking, grammar checking).

Revamp new (code) contributor tutorial

The current getting started documentation includes two overlapping documents: one about how to run a local instance of the full Software Heritage stack, another about how to create a local setup for current and future project developers. We would like to review those documents for consistency and, more generally, revamp the documentation that helps prospective code contributors get started quickly and submit patches.

As part of this project we might also want to revamp the glossary, the data model, and other cross-cutting documents that help newcomers understand what is where in the code base and how to prepare and send us patches in practice.

Restructure wiki landing page as main community entry point

Our public wiki landing page is, well, a mess. It has grown inorganically by accumulating links to pages used more as working drafts than as reasonably structured documents. We would like to restructure it to cater for various use cases (users, developers, students, curators, scientists, etc.) in an attempt to make it the primary entry point into Software Heritage for all community members.

This project will require discussing the various use cases, designing a suitable page layout, implementing it in the wiki, and writing the introductory parts for each path into the wiki. A related task, if time permits, will be to reorganize the wiki category taxonomy and document when and how to use the available categories in the future.

Working on this project will require familiarity with MediaWiki and its markup language.

Contact

GSoD applicants are encouraged to get in touch with the Software Heritage community using the standard development communication channels, i.e.:

  • the #swh-devel IRC channel on Libera Chat
  • the swh-devel mailing list

See our development information page for more details.

Timeline

See the official Google Season of Docs timeline.