Dashboard UI for the Code Scanner (GSoC task)
Introduction
The Software Heritage archive is the most comprehensive open-data knowledge base of publicly available source code.
As such, it can be used to scan local source code bases and detect which parts of them come from public code, including Free and Open Source Software.
The Software Heritage scanner, swh-scanner (documentation, code, slides of a 2021 presentation about swh-scanner), is a command-line tool that does exactly that.
Task description
swh-scanner is currently an experimental tool that works well in practice but needs a real dashboard user interface to be useful.
Several output options are currently available when invoking the swh scanner scan command, in particular batch output in textual and JSON format, and an interactive dashboard (with the -i/--interactive option).
The interactive view currently works by producing a local HTML file and opening it using the local browser. The goal of this project is to improve the interactive view, making it a serious dashboard-style UI to peruse scanning results.
The following improvements are suggested, although more can be proposed (and even more could be discovered during the project work):
- Technology: generating a local HTML file is not necessarily the best way to render results. Alternative solutions should be explored, including a self-hosted web app, rendering results with state-of-the-art frontend web frameworks (HTML/CSS/JavaScript)
- Scalability: rendering currently does not work when scanning large code bases such as the Linux kernel. Rendering should be made lazy, loading data only when it needs to be shown
- Functionality: dashboard rendering should make it possible to open the local source code files that have been scanned, e.g., users will want to open in the browser files that have been detected as known/unknown, in order to figure out why
- Functionality: in the future, additional information will be added to scanning results, including license and provenance information. This information is not yet available due to backend limitations, but the proposed UI should plan ahead for how and where to display it
- Paper cuts: various small issues affect the usability of swh-scanner, and fixing them would be welcome as part of this project
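To make the scalability item more concrete, here is a minimal Python sketch of lazy loading: the function returns a single level of the scanned tree on request, so a dashboard backend could send only what the user has expanded. The name list_children, the returned fields, and the "known" placeholder are assumptions made for illustration; they are not part of swh-scanner.

```python
import os
from pathlib import Path


def list_children(root: Path, rel_path: str = "") -> list:
    """Return one level of the tree under root/rel_path, nothing deeper.

    Hypothetical helper for illustration only, not a swh-scanner API.
    """
    root = root.resolve()
    target = (root / rel_path).resolve()
    # Refuse paths that escape the scanned root (e.g. "..").
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes scan root: {rel_path!r}")
    entries = []
    for entry in sorted(os.scandir(target), key=lambda e: e.name):
        entries.append(
            {
                "name": entry.name,
                "path": str(Path(entry.path).relative_to(root)),
                "is_dir": entry.is_dir(follow_symlinks=False),
                # Placeholder (assumption): a real dashboard would fill this
                # in from the scanner's known/unknown results.
                "known": None,
            }
        )
    return entries
```

A frontend would call such a function once for the root and again each time the user expands a directory, which keeps the initial payload small even for a tree the size of the Linux kernel.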
Expected duration
350 hours. Difficulty: medium
Desirable skills
- Python 3 and Git are a must to work on any Software Heritage project
- Basic understanding of the Software Heritage data model and of SWHID identifiers
- HTML/CSS/JavaScript and web development in general
- Working knowledge of UI/UX design principles
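As a pointer for the SWHID item above, the following Python sketch computes the identifier of a single file content. It relies on the fact that content SWHIDs (object type cnt) use the same hash as Git blobs; the function name content_swhid is chosen here for illustration and is not taken from the Software Heritage code base.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Compute the SWHID of a file content (object type "cnt")."""
    # Same scheme as a Git blob: SHA-1 over a "blob <size>\0" header
    # followed by the raw bytes of the file.
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"
```

For example, content_swhid(b"") returns swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which matches the well-known hash of the empty Git blob.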