Improve the Code Scanner (GSoC task)

WORK IN PROGRESS

Introduction

The Software Heritage archive is the most comprehensive open-data knowledge base about openly published source code.

As such, it can be used to scan local source code bases and detect which parts of them come from public code, including Free and Open Source Software.

The Software Heritage scanner (swh-scanner) (documentation, code) is a command-line tool that does exactly that.
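As a rough illustration of what "coming from public code" can mean at the level of a single file, the sketch below computes the SWHID of a file's content. A content SWHID has the form swh:1:cnt:<hash>, where the hash is the same SHA-1 that Git computes for a blob. The function name content_swhid is hypothetical and is not part of swh-scanner; this is a minimal sketch of the identifier scheme only, not a description of how the scanner is implemented or how it queries the archive.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file content (hypothetical helper).

    The hash is the Git blob hash: SHA-1 over the header
    b"blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


if __name__ == "__main__":
    # Identifier of a small in-memory content; a real scan would read
    # the bytes of each local file instead.
    print(content_swhid(b"hello world\n"))
```

Two files with identical bytes always get the same identifier, regardless of name or location, which is what makes it possible to ask whether a local file is already known to the archive.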

Task description

swh-scanner is currently an experimental tool: it works well in practice, but needs some polishing before it can be used in production for real use cases.

Several improvements are possible:

make
use our in-memory graph database swh-graph to speed up fetching the necessary subgraphs.
write cookers to output new formats (e.g., git tarballs/git bundles or even other VCS?)
improve unit and end-to-end testing
other general code improvements (better progress/error reporting in the frontend, etc.)

Desirable skills

Python 3 and Git are a must to work on any Software Heritage project
Basic understanding of the Software Heritage data model and of SWHID identifiers
JavaScript and front-end web development, if you want to work on the interactive dashboard

Potential mentors

Stefano Zacchiroli <zack@upsilon.cc> (zack on IRC)
