Difference between revisions of "Google Season of Docs 2019"

From Software Heritage Wiki
(Ideas list)
(Ideas list)
Line 50: Line 50:
 
happy to consider it!
 
happy to consider it!
  
=== Increase archive coverage ===
+
=== Restructure wiki landing page as main community entry point ===
  
Software Heritage aims to archive ''all publicly available software source
+
'''TODO'''
code''. We naturally started with the place where most of the software is
 
easily available today: git repositories on GitHub.
 
  
As Software Heritage grows, we're incrementally increasing archive coverage by
+
=== Reorganize developer documentation as tutorials/how-to/discussions/references ===
expanding the sources from which we archive software; a list of currently
 
crawled sources is listed on the
 
[https://archive.softwareheritage.org main archive page]. As you can see there,
 
we have already built ways of archiving Mercurial repositories, Debian packages,
 
PyPI bundles, and more.
 
  
Further expansions of archive coverage are a very suitable GSoC project.
+
''' TODO'''
  
Practically, to expand archive coverage two kinds of software components need
+
=== Elicit and document consistent writing conventions for Python docstrings ===
to be implemented: listers and loaders.
 
  
'''Listers''' are components that crawl the APIs of software
+
'''TODO'''
[https://en.wikipedia.org/wiki/Forge_(software) forges] (e.g., Bitbucket,
 
Gitorious, Sourceforge, ...) or package managers (a larger list is maintained
 
by [https://libraries.io Libraries.io]) and return a list of the software
 
available in it. See the official
 
[https://docs.softwareheritage.org/devel/swh-lister/ listers documentation] for
 
more details.
 
  
'''Loaders''' take a bundle of software (tarball, Git repository, Python
+
=== Revamp new (code) contributor tutorial ===
package, ...) and load it into Software Heritage, by adapting it so that it
 
matches the archive
 
[https://docs.softwareheritage.org/devel/swh-model/data-model.html data model].
 
  
While listers answer questions like "what are all the repositories available on
+
'''TODO'''
npm.org?", a loader addresses the "how do I load the NPM package I downloaded
 
into Software Heritage?" problem.
 
  
Writing a missing lister or a loader is a great way to contribute to expand the
+
=== Write a high-level technical overview of the project, data model, and data flow ===
coverage of the Software Heritage archive! Feel free to propose the
 
implementation of one (or several!) listers or loaders that are currently
 
missing. For inspiration you can check out our [[Suggestion box]] for code to
 
archive, or propose your favorite missing forge or package repository.
 
  
=== Mine information from archived content ===
+
'''TODO'''
 
 
In addition to archival, Software Heritage indexes the retrieved source code
 
artifacts, to enable semantic searches on the archive and scientific research.
 
 
 
Indexing can happen at the individual file-level (e.g., detect the programming
 
language the file is written in or the license declared in its header), or at
 
more coarse grained granularity (e.g., what metadata are declared for the most
 
recently archived version of a given project).
 
 
 
A number of indexes are [https://forge.softwareheritage.org/source/swh-indexer/ currently supported],
 
such as:
 
 
 
* file level mining:
 
** MIME type detection (using libmagic)
 
** license detection (using FOSSology/nomossa)
 
** language detection (using Pygments)
 
** ctags extraction (using universal-ctags)
 
* project level mining:
 
** Ruby gemspec metadata
 
** Python PKG-INFO metadata
 
** Maven pom.xml metadata
 
** NPM package.json metadata
 
 
 
Writing additional indexers that extract more information from archived source
 
code is welcome and would constitute a suitable GSoC project.
 
 
 
Name the kind of data mining you want to do!
 
 
 
For inspiration you can have a look at [https://libraries.io Libraries.io], as
 
most package formats/package managers support dedicated ways of expressing
 
metadata and we only support a small number of them up-to-now. But do not
 
restrict your ambition to those, any kind of data extraction/mining you want to
 
do on the archive could work.
 
 
 
=== Improve and extend the archive Web UI ===
 
 
 
As you probably know already, the Software Heritage archive can be
 
[https://archive.softwareheritage.org browsed on the Web]. The
 
[https://forge.softwareheritage.org/source/swh-web/ code] powering that
 
interface is a Django application that also implements a
 
[https://archive.softwareheritage.org/api/ Web API].
 
 
 
Several improvements are possible on the archive Web interface and would make
 
great GSoC projects, some ideas to whet your appetite:
 
 
 
* improve navigation on mobile devices and browsers
 
* add new source code search criteria and improve the search interface
 
* add developer-oriented features, e.g., source file history, blame/praise interface, in-browser edit (with patch download), ...
 
* improve [https://www.w3.org/WAI/ accessibility]
 
* add missing API endpoints (name your pet peeves!)
 
* add end to end tests using [https://www.seleniumhq.org/ Selenium]
 
 
 
=== Research internships ===
 
 
 
For the more research-inclined students, we also maintain a separate list of [[Internships]].
 
Any topic there is also a viable GSoC project.
 
  
 
== Contact ==
 
== Contact ==

Revision as of 13:03, 22 April 2019


General information

This page is the central point of information for Software Heritage's participation in the Google Season of Docs program.

Google Season of Docs is a program where Google pays technical writers stipends to work on free software projects such as Software Heritage. Each writer works with mentors from the community to complete a documentation project.

I want to participate as a technical writer

Great! We are very glad for your interest in contributing to Software Heritage and we are looking forward to working together.

Prerequisites

The following prerequisites apply to Software Heritage GSoD projects:

  • Sphinx is our documentation system of choice; you should be familiar with it to apply. In particular, we generally use reStructuredText markup and (for API references) Python docstrings in the Napoleon style.
  • Git is our version control system of choice; you should be familiar with it to apply.
  • additional prerequisites depend on the project you will work on; check project descriptions for details
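
To give an idea of what the docstring prerequisite looks like in practice, here is a minimal sketch of a Python function documented in a style that the Sphinx Napoleon extension can parse. It assumes the Google-style variant of Napoleon (Napoleon also accepts NumPy-style sections), and the function name count_lines and its parameters are hypothetical, chosen only for illustration; they are not taken from the Software Heritage code base.

```python
def count_lines(text: str, skip_blank: bool = False) -> int:
    """Count the lines in a piece of text.

    Hypothetical example function, used only to illustrate the
    docstring layout (summary line, then Args/Returns/Raises sections).

    Args:
        text: The text to inspect.
        skip_blank: If True, ignore lines that are empty or contain
            only whitespace.

    Returns:
        The number of lines counted.

    Raises:
        TypeError: If ``text`` is not a string.
    """
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    lines = text.splitlines()
    if skip_blank:
        # Keep only lines that contain at least one non-whitespace character.
        lines = [line for line in lines if line.strip()]
    return len(lines)
```

With the sphinx.ext.napoleon extension enabled, Sphinx turns the Args, Returns, and Raises sections into the corresponding reStructuredText fields when building an API reference.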

Before you apply

Here are the steps you should follow before applying, to make sure you have a general idea of the current state of Software Heritage technical documentation:

What to include in your application

Make sure that your application includes the following information:

  • Describe the specific project you want to work on. What do you want to achieve? Why is it important? Why is it useful for Software Heritage? The project might be one of the project ideas that we have prepared below, or something else entirely that you want to contribute to Software Heritage. Your pet peeve, surprise us!
  • Detail your work plan: a brief description of how you plan to go about your project, including a list of deliverables and a timeline of when you expect them to be available.
  • Include a reference to the diff you submitted before applying (see the "Before you apply" section above).

Ideas list

Below you can find a list of project ideas that are good options for a reasonably sized GSoD project. They are just suggestions, though; don't feel obliged to pick one of them if there is nothing that fits your taste and abilities. Feel free to propose something else that you are excited about and that contributes to improving the Software Heritage documentation: we will be happy to consider it!

Restructure wiki landing page as main community entry point

TODO

Reorganize developer documentation as tutorials/how-to/discussions/references

TODO

Elicit and document consistent writing conventions for Python docstrings

TODO

Revamp new (code) contributor tutorial

TODO

Write a high-level technical overview of the project, data model, and data flow

TODO

Contact

GSoD applicants are encouraged to get in touch with the Software Heritage community using the standard development communication channels, i.e.:

See our development information page for more details.

Timeline

See the official Google Season of Docs timeline.