In what 'format' should we ask publishers of linked-data RDF to make their biological classification available?
- Don't define anything - clients must do ad hoc conversions of many representations into their own internal form
- Use a generic RDF vocabulary that doesn't imply a class hierarchy
- Use SKOS (Simple Knowledge Organization System), with taxa as members of the skos:Concept class and subtaxon = skos:narrower
- Use a defined class hierarchy based on rdfs:subClassOf (see Roger Hyam's Linked Data Tutorial)
- A recommendation document
- Roger Hyam
- Jonathan Rees
- Greg Whitbread
- Mark Wilkinson
Simple Subsumption Hierarchy
In this approach we would represent taxonomic hierarchies as subsumption hierarchies. Each taxon would be an RDFS class and joined to the others with rdfs:subClassOf links.
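As a minimal sketch of what this buys a client, the snippet below holds a tiny hierarchy as plain (subject, predicate, object) tuples and follows rdfs:subClassOf transitively, which is the inference an RDFS reasoner would make. The ex: names and the sample taxa are assumptions chosen for illustration, not part of any published vocabulary.

```python
# Triples as plain (subject, predicate, object) tuples.
# Everything under the "ex:" prefix is hypothetical example data.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = [
    ("ex:QuercusRobur", SUBCLASS, "ex:Quercus"),
    ("ex:Quercus", SUBCLASS, "ex:Fagaceae"),
    ("ex:Fagaceae", SUBCLASS, "ex:Fagales"),
    ("ex:specimen1", TYPE, "ex:QuercusRobur"),
]


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            # The 'found' check also stops us looping on cyclic data.
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(individual, triples):
    """Asserted types of an individual plus all their superclasses."""
    direct = {o for s, p, o in triples if s == individual and p == TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


print(sorted(inferred_types("ex:specimen1", triples)))
```

A specimen typed only as ex:QuercusRobur is therefore also found when a client asks for members of ex:Fagales, without the publisher having to state it.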
A strength of this approach is that it seems "natural". Many would consider taxonomic, and possibly phylogenetic, hierarchies to be a series of nested sets (classes) of individuals.
Another strength is the possibility that it would better leverage OWL-DL, as compared to a taxon-as-individual approach.
A weakness is that if anyone makes an assertion directly about a class, OWL reasoners will treat it as simultaneously an individual and a class. That is OWL Full, which is not guaranteed to be decidable, and it also makes the model difficult to think about. To avoid this, every property that has the class resource as its subject must be declared an owl:AnnotationProperty, which excludes it from inference (except possibly under OWL 2). That is a limitation, since people may wish to infer over classes of taxa and properties of taxa.
Using a class-based publishing 'format' may be dangerous: if some data suppliers 'polluted' their data with assertions about classes, it would prevent their data being imported directly into systems that rely on inference.
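An importing system could screen incoming data for this kind of pollution before handing it to a reasoner. The following is a rough heuristic sketch only, not an OWL DL validator: it treats anything used with rdfs:subClassOf, or typed as rdfs:Class or owl:Class, as a class, and flags any other triple with that class as subject unless the predicate is in a list of annotation properties. The list of annotation properties and all ex: names are assumptions.

```python
# Heuristic check for assertions made directly about classes.
# The "ex:" names and the annotation property list are assumptions.
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
ANNOTATION_PROPERTIES = {"rdfs:label", "rdfs:comment", "rdfs:seeAlso"}


def classes_in(triples):
    """Collect every resource that is used as a class."""
    classes = set()
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            classes.update((s, o))
        elif p == "rdf:type" and o in CLASS_TYPES:
            classes.add(s)
    return classes


def polluting_triples(triples, annotation_properties=ANNOTATION_PROPERTIES):
    """Return triples that treat a class as an ordinary individual."""
    classes = classes_in(triples)
    flagged = []
    for s, p, o in triples:
        if s not in classes:
            continue
        harmless = (
            p == "rdfs:subClassOf"
            or p in annotation_properties
            or (p == "rdf:type" and o in CLASS_TYPES)
        )
        if not harmless:
            flagged.append((s, p, o))
    return flagged


triples = [
    ("ex:Quercus", "rdf:type", "owl:Class"),
    ("ex:Quercus", "rdfs:subClassOf", "ex:Fagaceae"),
    ("ex:Quercus", "rdfs:label", "Quercus"),
    ("ex:Quercus", "rdf:type", "ex:Genus"),             # class used as an instance
    ("ex:Quercus", "ex:publishedIn", "ex:somePublication"),  # assertion about a class
]

for triple in polluting_triples(triples):
    print(triple)
```

With this sample data the last two triples are flagged. Declaring ex:publishedIn as an annotation property, by adding it to the allowed set, would make that triple pass, at the price of excluding it from inference.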
Monophyly Class Examples
An exercise attempting to show some advantages of taxa-as-classes is described in VoCamp1/Monophyly in DL.
Taxa are Logical Individuals with Parent/Child Links
This seems like a more practical approach in that it is less likely to be broken by a rogue publisher. We can then induce useful class hierarchies through restriction.
(Jonathan is skeptical that any RDF authored without OWL in mind can be combined with some OWL and fed to a reasoner and get sane answers out, but would be glad to be proved wrong.)
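To make the individuals approach concrete, here is a small sketch in which each taxon is an ordinary resource linked to its parent by a hypothetical ex:hasParent property. The function induced_class collects every taxon that has a given taxon among its ancestors, which is loosely analogous to defining a class by a restriction over a transitive ancestor property. No reasoner is involved, and the property name, the single-parent assumption, and the sample taxa are all assumptions.

```python
# Taxa as individuals joined by parent links.
# "ex:hasParent" and the sample taxa are hypothetical.
HAS_PARENT = "ex:hasParent"

triples = [
    ("ex:QuercusRobur", HAS_PARENT, "ex:Quercus"),
    ("ex:Quercus", HAS_PARENT, "ex:Fagaceae"),
    ("ex:Fagus", HAS_PARENT, "ex:Fagaceae"),
    ("ex:Fagaceae", HAS_PARENT, "ex:Fagales"),
]


def ancestors(taxon, triples):
    """Ancestors of a taxon, nearest first, assuming one parent each."""
    parents = {s: o for s, p, o in triples if p == HAS_PARENT}
    chain = []
    current = parents.get(taxon)
    # Stop at the root, or if bad data sends us round a cycle.
    while current is not None and current not in chain and current != taxon:
        chain.append(current)
        current = parents.get(current)
    return chain


def induced_class(root, triples):
    """All taxa that have 'root' somewhere among their ancestors."""
    taxa = set()
    for s, p, o in triples:
        if p == HAS_PARENT:
            taxa.update((s, o))
    return {t for t in taxa if root in ancestors(t, triples)}


print(sorted(induced_class("ex:Fagaceae", triples)))
```

Because the taxa are plain individuals, a publisher can attach any property to them without changing what kind of thing they are; the class of "everything under Fagaceae" is derived by the client rather than asserted in the data.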
Much of this is summed up in this pdf
This approach is taken by the current TDWG recommended option for publishing classifications (i.e. the TDWG ontology [?]) - but we need to work on it.
This overlaps with Taxonomic Reasoning.
- Give me the most recent common ancestor
- Give me all the specimens that have been identified to this taxon
- Give me specimens identified to this taxon or a synonym of this taxon or a taxon with the same name in another classification or a synonym of that taxon - as a class hierarchy
- Give me taxa whose members possess some trait
- Give me give me give me
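Two of the simpler requests above can be sketched against parent-linked data without any reasoner. The property names ex:hasParent and ex:identifiedTo, the sample taxa and the specimens are all hypothetical, and "most recent common ancestor" is read here as the lowest taxon shared by both lineages within a single classification, with a taxon counted as part of its own lineage.

```python
# Hypothetical data: parent links between taxa, identifications of specimens.
HAS_PARENT = "ex:hasParent"
IDENTIFIED_TO = "ex:identifiedTo"

triples = [
    ("ex:QuercusRobur", HAS_PARENT, "ex:Quercus"),
    ("ex:Quercus", HAS_PARENT, "ex:Fagaceae"),
    ("ex:FagusSylvatica", HAS_PARENT, "ex:Fagus"),
    ("ex:Fagus", HAS_PARENT, "ex:Fagaceae"),
    ("ex:specimen1", IDENTIFIED_TO, "ex:QuercusRobur"),
    ("ex:specimen2", IDENTIFIED_TO, "ex:Quercus"),
    ("ex:specimen3", IDENTIFIED_TO, "ex:FagusSylvatica"),
]


def lineage(taxon, triples):
    """The taxon followed by its ancestors, nearest first."""
    parents = {s: o for s, p, o in triples if p == HAS_PARENT}
    chain = []
    while taxon is not None and taxon not in chain:
        chain.append(taxon)
        taxon = parents.get(taxon)
    return chain


def most_recent_common_ancestor(a, b, triples):
    """Lowest taxon present in both lineages, or None if there is none."""
    lineage_b = set(lineage(b, triples))
    for taxon in lineage(a, triples):
        if taxon in lineage_b:
            return taxon
    return None


def specimens_identified_to(taxon, triples, include_subtaxa=False):
    """Specimens identified to a taxon, optionally including its subtaxa."""
    result = set()
    for s, p, o in triples:
        if p != IDENTIFIED_TO:
            continue
        if o == taxon or (include_subtaxa and taxon in lineage(o, triples)):
            result.add(s)
    return result


print(most_recent_common_ancestor("ex:QuercusRobur", "ex:FagusSylvatica", triples))
print(sorted(specimens_identified_to("ex:Quercus", triples, include_subtaxa=True)))
```

The synonym and cross-classification request would need additional links (for example between names and taxon concepts) that this sketch does not attempt to model.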
People with OWL / RDF skills, biological taxonomists and ecologists, please.
We are currently playing with some test data and failing to get it into Protege.
If this doesn't work we will try hand coding "Hello World" type data to test the theories.
Stefan Schulz, Holger Stenzhorn, Martin Boeker: The ontology of biological taxa. ISMB 2008: 313-321 - Roger and Jonathan both found this to be inspirational, and it appears some of our work in the VoCamp was simply to rediscover some of what's in this paper.