CommentsOnOpenAnnotationDataModel

From Evolutionary Interoperability and Outreach

comments from Arlin after attending OA workshop

Yesterday I attended the east-coast rollout of the Open Annotation (OA) Data Model [1]. I attended because I thought it would be relevant to MIAPA, and I would like to share what I learned.

The context for this is that, based on our published analysis of current practices [7], I assume as a background condition that, even if our community pushed for authors to archive their trees with a MIAPA-compliant report (e.g., in NeXML), it will be many *years* before even 10% of trees are archived with a compliant report. Meanwhile, those of us who want to build infrastructure for synthesizing and sharing trees (e.g., OToL, Phylotastic) face a problem: the trees we want to use now do not have MIAPA-compliant reports, either from the authors or from anyone else. How do we gather, encode and manage the metadata for the trees that we want to re-use and distribute now? The answer IMHO is that we will have to rely on third-party annotators to create non-authoritative 'stand-off' annotations, i.e., annotations that are created, managed, and stored separately from the trees themselves. This is the preferred way for the semantic web to work, and there is an impressive amount of activity in academia and industry around supporting this kind of annotation. Such annotations are broadly useful (in various domains) for sharing information, improving discovery of the primary resources, organizing the resources, and providing a means for resource-users to interact. The OA model appears to be the emerging standard for such annotations.

The OA model [1] is basically this:

annotation ::=

  • body = what you are saying, i.e., the annotation content
  • target = what it is about, i.e., the resource
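As a rough illustration of the body/target structure above, here is a minimal Python sketch that writes one annotation as three RDF triples. The `oa:Annotation`, `oa:hasBody` and `oa:hasTarget` terms are from the OA vocabulary; the `example.org` URIs and the helper names `make_annotation` and `to_ntriples` are made up for this example and do not come from the OA spec or from any existing tool.

```python
# Vocabulary namespaces (OA and RDF).
OA = "http://www.w3.org/ns/oa#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def make_annotation(anno_uri, body_uri, target_uri):
    """Return the three triples of a minimal annotation (hypothetical helper)."""
    return [
        (anno_uri, RDF + "type", OA + "Annotation"),
        (anno_uri, OA + "hasBody", body_uri),      # what you are saying
        (anno_uri, OA + "hasTarget", target_uri),  # what it is about
    ]


def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples style, one statement per line."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)


# Hypothetical URIs: a third-party metadata report (body) about a tree (target).
triples = make_annotation(
    "http://example.org/anno/1",
    "http://example.org/reports/tree42-metadata",
    "http://example.org/trees/tree42",
)
print(to_ntriples(triples))
```

The point of the sketch is only that the annotation is its own resource, separate from both the tree and the report, so it can be stored and managed apart from either one.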

This core is extended with a small number of concepts from key vocabularies (OA, RDF, Dublin Core, etc.) to cover the complications of a dozen different kinds of annotating. For example, an annotation typically applies specifically to a *part* of the target, so there must be a means to reference the part. For instance, ancient manuscripts are stored in image files, and annotations reference pixel locations in bitmapped images. See MapHub [2] for a way to combine the images and maps. The body can refer to an RDF graph, and I think it can also embed an RDF graph (need to double-check this), so OA extends automatically to include RDF statements in our own favorite domain-specific languages (e.g., CDAO).
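To make the "part of the target" idea concrete, here is a hedged Python sketch of an annotation on a rectangular pixel region of an image. As I understand the OA vocabulary, the target becomes an `oa:SpecificResource` that points to the full image via `oa:hasSource` and to a selector via `oa:hasSelector`; the sketch uses an `oa:FragmentSelector` whose `rdf:value` is a media-fragment string. The `example.org` URIs, the `annotate_region` helper, and the choice to give the target and selector their own URIs (rather than blank nodes) are assumptions made for illustration.

```python
OA = "http://www.w3.org/ns/oa#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def annotate_region(anno, body, image, x, y, w, h):
    """Return triples for an annotation on a pixel rectangle of an image.

    Hypothetical helper: the target and selector URIs are derived from the
    annotation URI purely for simplicity.
    """
    target = anno + "/target"
    selector = anno + "/selector"
    fragment = "xywh=%d,%d,%d,%d" % (x, y, w, h)  # media-fragment style region
    return [
        (anno, RDF + "type", OA + "Annotation"),
        (anno, OA + "hasBody", body),
        (anno, OA + "hasTarget", target),
        (target, RDF + "type", OA + "SpecificResource"),
        (target, OA + "hasSource", image),         # the whole image
        (target, OA + "hasSelector", selector),    # which part of it
        (selector, RDF + "type", OA + "FragmentSelector"),
        (selector, RDF + "value", fragment),       # object is a plain literal
    ]


# Hypothetical URIs: a transcription note on one region of a manuscript scan.
region = annotate_region(
    "http://example.org/anno/2",
    "http://example.org/notes/transcription-7",
    "http://example.org/scans/folio12.png",
    10, 20, 300, 200,
)
for s, p, o in region:
    print(s, p, o)
```

A selector for part of a phylogenetic tree (say, a clade) would need some other addressing scheme; nothing here should be read as the OA answer to that.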

The OA data model is the product of the W3C Open Annotation Community Group [3]. This is a highly organized and well-funded project in which a data model has been refined over several years in collaboration with use-case projects, and has gone through a comment and revision period. Some of the driving informatics problems motivating the data model, and the tools built on it, come from the library-science and humanities research communities. Annotating digital collections, such as ancient manuscripts, is a critical use-case.

However, some of the use-cases are scientific, such as the Domeo annotation toolkit for ontology-based stand-off annotations [4]. Those of you familiar with TDWG may know Bob Morris, who was at the OA rollout to present his "filtered push" project for managing annotations of biological collections data. Even projects that aren't aimed at scientists, such as lorestore [5] or the Open Knowledge Foundation's Annotator [6], have broad usefulness for annotating documents, web pages, digital collections, etc.

The take-home message (for me) is that the information science community is developing tools to support the logical association of secondary information (metadata, comments, likes, whatever) with primary resources.

We will not have to create the back end of a system for the phylogenetics community to annotate trees with metadata, or to rate them using social bookmarking or some other means. These systems are already being developed for other uses. Other people are solving the generic problems associated with this kind of system. We would just need to adapt their tools to our uses.

[1] OA Data Model - http://www.openannotation.org/spec/core/
[2] MapHub - http://maphub.github.io/
[3] W3C Open Annotation Community Group - http://www.w3.org/community/openannotation/
[4] Domeo - http://swan.mindinformatics.org/
[5] lorestore - http://austese.net/lorestore/index.html
[6] OKF annotator - http://okfnlabs.org/annotator/
[7] Analysis of current practices - http://www.biomedcentral.com/1756-0500/5/574