Publishing Taxonomies

From Evolutionary Interoperability and Outreach
Revision as of 15:22, 11 November 2009 by Hilmar (talk | contribs)

What 'format' should we ask publishers to make their biological classification available in?

Candidates include

  • Don't define anything - clients must do all the conversion to a class hierarchy themselves.
  • Use a generic RDF vocabulary that doesn't imply a class hierarchy
  • Use SKOS (Simple Knowledge Organization System)
  • Use a defined class hierarchy based on rdfs:subClassOf (see Roger Hyam's Linked Data Tutorial)

Outcomes:

  • A recommendation document

Simple Subsumption Hierarchy

In this approach we would represent taxonomic hierarchies as subsumption hierarchies. Each taxon would be an RDFS class and joined to the others with rdfs:subClassOf links.
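As a rough illustration of what this representation buys, the sketch below stores a tiny hierarchy as plain (subject, predicate, object) tuples and follows rdfs:subClassOf links to work out every class a specimen belongs to. The `ex:` identifiers, the specimen, and the function names are assumptions made up for this example; a real system would use an RDF store and a reasoner rather than hand-rolled traversal.

```python
RDFS_SUBCLASS_OF = "rdfs:subClassOf"
RDF_TYPE = "rdf:type"

# Hypothetical data: each taxon is a class, each specimen an individual.
triples = {
    ("ex:Bellis_perennis", RDFS_SUBCLASS_OF, "ex:Bellis"),
    ("ex:Bellis", RDFS_SUBCLASS_OF, "ex:Asteraceae"),
    ("ex:specimen42", RDF_TYPE, "ex:Bellis_perennis"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(individual, triples):
    """Return the asserted types of an individual plus all their superclasses."""
    direct = {o for s, p, o in triples if s == individual and p == RDF_TYPE}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, triples)
    return result


print(sorted(inferred_types("ex:specimen42", triples)))
```

The specimen is only asserted to be a member of the species class, yet the subsumption links place it in the genus and family classes as well.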

The strength of this approach is that it seems "natural": many would consider taxonomic, and possibly phylogenetic, hierarchies to be a series of nested sets (classes) of individuals.

The weakness is that if anyone makes an assertion directly about a class, OWL reasoners will treat that resource as simultaneously an individual and a class. That puts the data in OWL Full, where reasoning is not guaranteed to be decidable, and it also makes the model difficult to think about. To avoid this, every property that has the class resource as its subject must be declared an owl:AnnotationProperty, which excludes it from inference (except possibly under OWL 2). This is a real cost, because people may wish to infer over these classes.

Using a class-based publishing 'format' may be dangerous: if some data suppliers 'polluted' their data with assertions about classes, their data could not be imported directly into systems that rely on inference.
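One way an importer might guard against such data is to scan it before loading. The sketch below is a rough heuristic, not an OWL species validator: it treats anything used with rdfs:subClassOf, or as the object of rdf:type, as a class, and flags triples that say something else about that class with a property outside an allowed list. The `ex:` names and the contents of the allowed lists are assumptions for illustration.

```python
# Properties we assume are safe to use with a class as subject.
SCHEMA_PROPERTIES = {"rdfs:subClassOf", "rdf:type"}
ANNOTATION_PROPERTIES = {"rdfs:label", "rdfs:comment"}


def class_resources(triples):
    """Collect resources that are used as classes somewhere in the data."""
    classes = set()
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            classes.update((s, o))
        elif p == "rdf:type":
            classes.add(o)
    return classes


def polluting_triples(triples):
    """Return triples that make a non-annotation assertion about a class."""
    classes = class_resources(triples)
    allowed = SCHEMA_PROPERTIES | ANNOTATION_PROPERTIES
    return sorted(t for t in triples if t[0] in classes and t[1] not in allowed)


# Hypothetical supplier data; ex:publishedIn is an invented property.
data = [
    ("ex:Bellis_perennis", "rdfs:subClassOf", "ex:Bellis"),
    ("ex:Bellis", "rdfs:label", "Bellis"),
    ("ex:Bellis", "ex:publishedIn", "ex:SomePublication"),
    ("ex:specimen42", "rdf:type", "ex:Bellis_perennis"),
    ("ex:specimen42", "ex:collectedBy", "ex:someCollector"),
]

for triple in polluting_triples(data):
    print("assertion about a class:", triple)
```

Note that this check deliberately lets rdf:type through, so it would not catch a class being typed as a member of another class; a stricter importer would need to handle that case too.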

Monophyly Class Examples

This is described on another page: Monophyly_in_DL [1]

Taxa are Logical Individuals with Parent/Child Links

This seems like a more practical approach in that it is less likely to be broken by a rogue publisher. We can then induce useful class hierarchies through restriction.

Much of this is summed up in this pdf

This is the preferred option for publishing classification - but we need to work on it.
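To make the idea of inducing classes from individuals more concrete, here is a minimal sketch. Taxa are ordinary individuals joined by a parent link, and the "class" for a taxon is computed on demand as the set of specimens identified to that taxon or to any of its descendants, which mimics what a restriction over a transitive parent property would give. The dictionary layout and all names (`parent_of`, `identified_to`, the `ex:` identifiers) are assumptions for illustration only.

```python
# Hypothetical data: taxa are individuals; each maps to its parent taxon.
parent_of = {
    "ex:Bellis_perennis": "ex:Bellis",
    "ex:Bellis_sylvestris": "ex:Bellis",
    "ex:Bellis": "ex:Asteraceae",
}

# Hypothetical identifications: specimen -> taxon it was identified to.
identified_to = {
    "ex:specimen1": "ex:Bellis_perennis",
    "ex:specimen2": "ex:Bellis_sylvestris",
    "ex:specimen3": "ex:Asteraceae",
}


def lineage(taxon, parent_of):
    """Return the taxon followed by its ancestors, nearest first."""
    chain = [taxon]
    seen = {taxon}
    while chain[-1] in parent_of:
        nxt = parent_of[chain[-1]]
        if nxt in seen:
            raise ValueError(f"cycle in parent links at {nxt}")
        seen.add(nxt)
        chain.append(nxt)
    return chain


def induced_class_members(taxon, parent_of, identified_to):
    """Specimens identified to the taxon itself or to any descendant of it."""
    return {
        specimen
        for specimen, named in identified_to.items()
        if taxon in lineage(named, parent_of)
    }


print(sorted(induced_class_members("ex:Bellis", parent_of, identified_to)))
```

Because the taxa are plain individuals, a publisher can attach any properties to them without affecting how the induced classes are computed.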

Use Cases

This overlaps with Taxo Reasoning

  • Give me the most recent common ancestor
  • Give me all the specimens that have been identified to this taxon
  • Give me specimens identified to this taxon or a synonym of this taxon or a taxon with the same name in another classification or a synonym of that taxon - as a class hierarchy.
  • Give me give me give me
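The first two use cases, plus the synonym part of the third, can be sketched over simple parent links as follows. This assumes taxa are modelled as individuals with one parent each, and that synonymy is available as a lookup table; the data, the `ex:` names (including the placeholder synonym), and the function names are all invented for the example. Here the common ancestor is inclusive, so a taxon counts as its own ancestor.

```python
parent_of = {
    "ex:Bellis_perennis": "ex:Bellis",
    "ex:Bellis_sylvestris": "ex:Bellis",
    "ex:Bellis": "ex:Asteraceae",
}
identified_to = {
    "ex:specimen1": "ex:Bellis_perennis",
    "ex:specimen2": "ex:placeholder_synonym",
    "ex:specimen3": "ex:Bellis_sylvestris",
}
# Hypothetical synonymy table: accepted taxon -> set of synonyms.
synonyms = {"ex:Bellis_perennis": {"ex:placeholder_synonym"}}


def lineage(taxon, parent_of):
    """Return the taxon followed by its ancestors, nearest first."""
    chain = [taxon]
    seen = {taxon}
    while chain[-1] in parent_of:
        nxt = parent_of[chain[-1]]
        if nxt in seen:
            raise ValueError(f"cycle in parent links at {nxt}")
        seen.add(nxt)
        chain.append(nxt)
    return chain


def most_recent_common_ancestor(a, b, parent_of):
    """Nearest taxon on both lineages, or None if they share none."""
    ancestors_of_a = set(lineage(a, parent_of))
    for taxon in lineage(b, parent_of):
        if taxon in ancestors_of_a:
            return taxon
    return None


def specimens_identified_to(taxon, identified_to, synonyms):
    """Specimens identified to the taxon or to one of its synonyms."""
    names = {taxon} | synonyms.get(taxon, set())
    return {s for s, named in identified_to.items() if named in names}


print(most_recent_common_ancestor("ex:Bellis_perennis", "ex:Bellis_sylvestris", parent_of))
print(sorted(specimens_identified_to("ex:Bellis_perennis", identified_to, synonyms)))
```

Matching names across different classifications, as the third use case also asks for, would need an additional mapping that this sketch does not attempt.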

People with OWL / RDF skills, biological taxonomists, and ecologists, please.


We are currently playing with some test data and failing to get it into Protege.

If this doesn't work, we will try hand-coding "Hello World"-type data to test the theories.