Software ontologies

From Software Heritage Wiki
Revision as of 08:35, 27 April 2017 by Morane (talk | contribs) (Ontologies, vocabularies and metadata terms)

Definition of Software Ontology

In computer science, the term ontology refers to a structure of concepts or entities within a domain, organized by relationships. [1]

The specification takes the form of definitions of a representational vocabulary (classes, relations, and so forth), which provide meanings for the vocabulary and formal constraints on its coherent use. [2]

A software ontology is a classification of categories describing software with explicit specifications of its entities and relationships.
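As a rough illustration of "entities and relationships", the sketch below stores a tiny ontology as (subject, relation, object) triples in plain Python. Every class and relation name in it (SoftwareProject, has_release, and so on) is hypothetical and is not taken from any of the ontologies listed on this page.

```python
from collections import defaultdict

# Hypothetical mini-ontology: all names below are made up for illustration.
# Each triple reads (subject, relation, object).
TRIPLES = [
    ("SoftwareProject", "is_a", "Entity"),
    ("Release", "is_a", "Entity"),
    ("Person", "is_a", "Entity"),
    ("SoftwareProject", "has_release", "Release"),
    ("SoftwareProject", "maintained_by", "Person"),
]


def relations_of(subject, triples=TRIPLES):
    """Group the objects linked to `subject` by relation name."""
    grouped = defaultdict(list)
    for s, relation, obj in triples:
        if s == subject:
            grouped[relation].append(obj)
    return dict(grouped)


print(relations_of("SoftwareProject"))
```

Real ontologies usually express such triples in RDF rather than in Python tuples; the list form is only meant to make the structure visible.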

We are working on a list of all ontologies, vocabularies and metadata formats describing software. The categorization of contexts is defined as follows:

  • software ontology: a well-defined ontology expressed in XML/RDF, with a direct link to the ontology itself
  • linked data: vocabularies used by search engines
  • generic: metadata terms used in other contexts as well as in the software domain
  • research: metadata terms used in a research context, in particular for software citation
  • catalog: metadata terms used in a specific catalog
  • dev: metadata terms used in the development process. They can be contained in the software source code package, usually depending on the programming language
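The context categories above can be sketched as a small enumeration, which makes it easy to tag an entry with several contexts and to filter by one of them. This is only an illustrative sketch: the `Context` class and the `with_context` helper are hypothetical names, and the sample entries are a hand-copied subset of the table on this page.

```python
from enum import Enum


class Context(Enum):
    """The six context categories used on this page."""
    SOFTWARE_ONTOLOGY = "software ontology"
    LINKED_DATA = "linked data"
    GENERIC = "generic"
    RESEARCH = "research"
    CATALOG = "catalog"
    DEV = "dev"


# A few entries and their contexts, copied by hand from the table.
ENTRIES = {
    "ADMS.SW": {Context.SOFTWARE_ONTOLOGY, Context.RESEARCH},
    "DOAP": {Context.SOFTWARE_ONTOLOGY},
    "DataCite": {Context.GENERIC, Context.RESEARCH},
    "Maven": {Context.DEV},
}


def with_context(context, entries=ENTRIES):
    """Return the names of all entries tagged with `context`, sorted."""
    return sorted(name for name, contexts in entries.items() if context in contexts)


print(with_context(Context.RESEARCH))
```

Using a set of contexts per entry reflects the fact that several rows of the table carry more than one category.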

Ontologies, vocabularies and metadata terms

Pointers to relevant software ontologies and software metadata that might be used as inspiration for the upper-level/metadata part of the Software Heritage data model.

| name | description | context | created | last update | version | links | in CodeMeta crosswalk table | file name |
|---|---|---|---|---|---|---|---|---|
| ADMS.SW | Asset Description Metadata Schema for Software, and specifically FOSS | software ontology, research | 2012 | 2015 | 1.00 | global description, specification, rdf2html | no | - |
| DOAP | Description of a project | software ontology | 2010 | 2017 | no version | on github, on wikipedia, rdf | waiting pull request | doap.xml, doap.json |
| schema.org | Vocabularies for structured data use on the internet and beyond. Code, SoftwareSourceCode and SoftwareApplication are the main classes describing software. | linked data | 2011 | 2017 | 3.2 | homepage, wikipedia, github | no but used by CodeMeta | - |
| SEON | A family of Software Evolution ONtologies | software ontology | 2012 | unknown | no version | homepage, documentation | no | - |
| SWOP | The Software Ontology Project: "is a resource for describing software tools, their types, tasks, versions, provenance and data associated." Funded by the JISC | software ontology | 2011 | 2016 | - | SWOP, sourceforge project, overview blog post | no | - |
| TOTEM | Trustworthy Online Technical Environment Metadata Database for digital objects in general | generic, catalog | 2008 | unknown | unknown | homepage | no | - |
| Wikidata | provides data about software with Q7397 | linked data, catalog | 2012 | 2017 | no version | generic software properties, FOSS-specific properties, github, Source code page | no | - |
| DBpedia | Multi-domain ontology; between dbpedia and without Software entities; between dbpedia and wikidata and a wikiparser | linked data | 2007 | 2015 | 3.11 | software page, software_code page | no | - |
| DataCite | the schema is not software specific | generic, research | 2009 | 2016 | 4.0 | schema, guidelines | yes | |
| Dublin Core | - | generic | - | - | - | - | yes | - |
| Zenodo | export options: MARCXML, Dublin Core, and DataCite Metadata Schema | generic, research | - | - | - | github citable-code | yes | no file in source code |
| Figshare | Making research outputs available online | generic | 2011 | - | - | - | yes | no file in source code |
| code.jsonld | in CodeMeta crosswalk table but can't find source | - | - | - | - | - | yes | - |
| R Package Description | DESCRIPTION file stored in an R package, containing important metadata | dev | - | - | - | - | yes | TBD |
| Debian Package | An effort to collect meta-information about projects, trying to use the DOAP vocabulary. Captured in a file called debian/upstream/metadata in YAML format. Another file with the EDAM ontology can be provided at debian/upstream/edam | dev | - | 2017 | - | wiki | yes | debian/upstream/metadata |
| debtags | debtags are terms used to describe package content in a non-formal way. | dev | 2005 | 2017 | no version | vocabulary | no | not in source code |
| Python Distutils (PyPI) | The Python Package Index (PyPI) stores metadata that describes a package. The setup.py file is used when a package is packaged and distributed with Distutils (the standard for distributing Python modules) | dev | 1999 | 2017 | 35.0.1 (Setuptools) | setuptools doc, wikipedia, analyzing pypi metadata 2015 | | |
| Trove Software Map | Distutils Trove Classification | dev | 1998 | 2002 | - | Trove project, usage with Python | yes | - |
| CPAN::Meta | Comprehensive Perl Archive Network (CPAN), used somewhat like a package manager. The CPAN::Meta file, known as META.yml or META.json, is typically created by other tools such as Module::Build and ExtUtils::MakeMaker. The raw form of the metadata doesn't exist in the source code. | dev | 2003 | - | 2.150010 | wikipedia, homepage, on github | yes | META.json, META.yml, .spec |
| Ruby Gem | Specifications in Ruby, called gemspec, that can hold arbitrary metadata in a .gemspec file or a Rakefile | dev | 2006 | 2015 | 0.3.1 | guide, on github | yes | .gemspec, Rakefile |
| JavaScript - npm | A package.json file containing specifications about an npm package. The name and version of the package together form a unique identifier; changes to the package should come with changes to the version | dev | 2010 | 2017 | - | documentation | yes | package.json |
| Maven | pom.xml file in the project root. POM stands for "Project Object Model", an XML representation of a Maven project. | dev | - | 2017 | 3.5.0 | POM reference, maven model | yes | pom.xml |
| Octave | DESCRIPTION file in the package, containing various information about the package | dev | - | - | - | howto | yes | DESCRIPTION |
| CodeMeta | a minimal metadata schema for science software and code, in JSON and XML | software ontology, research, linked data | 2014 | 2017 | no version | homepage, on github, schema | yes | code.json |
| MARC | MAchine-Readable Cataloging; MIT is using MARC records | generic | 1960s | - | - | homepage | no | - |
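Since several "dev" formats are identified by a file name inside the source package, one way to use the "file name" column is to scan a source tree for those names. The sketch below is a minimal illustration under stated assumptions: `detect_metadata` and the three lookup tables are hypothetical names, the file names are a hand-copied selection from the table (the R entry comes from its description, since its file name is marked TBD, and .spec is left out), and matching by name alone says nothing about whether the file content is valid.

```python
from pathlib import Path

# File names copied by hand from the "file name" column (a selection).
EXACT_NAMES = {
    "doap.xml": ["DOAP"],
    "doap.json": ["DOAP"],
    "META.json": ["CPAN::Meta"],
    "META.yml": ["CPAN::Meta"],
    "Rakefile": ["Ruby Gem"],
    "package.json": ["JavaScript - npm"],
    "pom.xml": ["Maven"],
    "code.json": ["CodeMeta"],
    # R and Octave packages both use a file called DESCRIPTION,
    # so the name alone cannot tell them apart.
    "DESCRIPTION": ["R Package Description", "Octave"],
}
SUFFIXES = {".gemspec": ["Ruby Gem"]}
RELATIVE_PATHS = {"debian/upstream/metadata": ["Debian Package"]}


def detect_metadata(root):
    """Map each recognised metadata file under `root` to its candidate formats."""
    root = Path(root)
    found = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        formats = (
            RELATIVE_PATHS.get(rel)
            or EXACT_NAMES.get(path.name)
            or SUFFIXES.get(path.suffix)
        )
        if formats:
            found[rel] = formats
    return found
```

Calling `detect_metadata(".")` on an unpacked source package returns a dictionary keyed by relative path. An ambiguous name such as DESCRIPTION yields more than one candidate, and telling the candidates apart would require inspecting the file content.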

PRONOM - [3]

DOLCE - Outdated?

CSO - Outdated?