Improve project search engine (GSoC task)

From Software Heritage Wiki
Revision as of 12:09, 10 February 2021 by Vlorentz

The homepage of the Software Heritage archive (https://archive.softwareheritage.org/) features a small search engine that searches project URLs and project metadata. Project metadata includes the name, description, authors, etc.

By the time GSoC starts, this search engine will be implemented as a very small Python service (under 100 lines of code) backed by Elasticsearch.

This service is an MVP (Minimum Viable Product) that was written to replace an older service, based on PostgreSQL, which was too slow. There is therefore a lot of room for improvement, both in adding features and in making results more relevant.

Most of the features we have in mind would allow finer-grained search on individual project metadata fields, instead of a single full-text search over the entire metadata; but we are open to suggestions.
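
To make the difference concrete, here is a minimal sketch of the two kinds of Elasticsearch query bodies, written as plain Python dictionaries. It is not the code of the actual service: the field names (`metadata.name`, `metadata.author`, and the `metadata.*` prefix) and the helper functions `full_text_query` and `fielded_query` are assumptions made for illustration, and the real index mapping may differ.

```python
from typing import Any, Dict


def full_text_query(text: str) -> Dict[str, Any]:
    """Coarse approach: match the text against every metadata field at once.

    The "metadata.*" field pattern is a hypothetical mapping.
    """
    return {"query": {"multi_match": {"query": text, "fields": ["metadata.*"]}}}


def fielded_query(**criteria: str) -> Dict[str, Any]:
    """Finer approach: each named metadata field must match its own value.

    Keyword names (e.g. name, author) are hypothetical field names.
    """
    clauses = [
        {"match": {f"metadata.{field}": value}}
        for field, value in sorted(criteria.items())
    ]
    return {"query": {"bool": {"must": clauses}}}


if __name__ == "__main__":
    # Any project whose metadata mentions "parser" anywhere.
    print(full_text_query("parser"))
    # Only projects named like "parser" AND authored by "alice".
    print(fielded_query(name="parser", author="alice"))
```

With the first form, a project whose description merely mentions a term ranks alongside projects actually named after it; the second form lets a user state which field each term should apply to.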