Tree Annotation

Revision as of 16:51, 1 February 2013

Synopsis: Annotate a small set of large trees used as sources of phylogenetic knowledge in "Phylotastic", an automated system for delivering tree-of-life knowledge.

Quick links

Reports

AnnotatedPhylotasticSourceTrees - report on the set of source trees, focusing on the types of metadata available, and how they might be used in phylotastic systems
- spreadsheet
TreestoreMetadataQueryDemonstration - report on the model of semantic encoding, the technology for translation, the treestore technology, and the implications of this for supporting phylotastic querying
AdvancingMIAPA report page - report on the adequacy of the MIAPA checklist, recommendations for revisions, ontology development, challenges of semantic encoding, and (overlapping with the report above) the model of semantic encoding.

Other tangible outcomes

new MIAPA ontology
GSOC project proposal

Key resources

checklist from TDWG 2011

Overview

Metadata annotations represent an essential part of the design of phylotastic systems, enabling users to find trees based on sources and methods, and to generate a credible report of provenance for phylotastically generated trees. Yet, metadata play no role in current phylotastic component implementations. The TreeAnnotation team of hackathon 2 (Enrico, Hilmar, Joachim, Arlin, Ramona and 0.5 of Andrea) set out to address this deficiency. We developed an approach with 3 inter-connected goals:

create a set of 10 usefully annotated source trees
demonstrate metadata-based querying in a treestore
leverage this exercise to advance the MIAPA project

Our approach consisted of the following steps:

identify 10 useful source trees with available publications
generate free-text annotations
encode citations and annotations in computable form
load the citation, annotations, and trees into a treestore
demonstrate querying based on metadata

In particular, we chose to gather metadata corresponding to the MIAPA draft checklist, to encode it as RDF using a new ontology that imports several other ontologies, and to load the results into Ben Morris's Virtuoso-based treestore implementation.

During the hackathon, group members spent their time developing and revising a strategy, interpreting source materials, developing language support, encoding annotations, implementing tools, and addressing emerging challenges.

The tangible outcomes of the group relate to phylotastic source trees (a set of trees with metadata); software tools for processing, storage and querying; an ontology to support MIAPA annotations, along with a revised MIAPA checklist and form; and written reports on these 3 types of outputs, available on this wiki.

Detailed approach

develop plan (day 1)
- revise as needed
- some work is done in parallel

main workflow

identify 10 trees for use as phylotastic source trees
annotate them in free-text form
- create a web form in Google Docs for entering annotations, based on the MIAPA draft checklist from the TDWG 2011 workshop
- the spreadsheet has pull-down menus, plus options for free-text entries under "other"
transform annotations into formal-language statements in RDF
- encoding process is iterative with ontology editing
- Hilmar is working on language support
- Joachim is working on the technology for getting this into a triplestore
- Get URI for tree from TreeStore, add annotations to that URI in Protege
Load trees into TreeStore
- Will need to have trees in the correct format
execute queries to demonstrate success

Log and accomplishments

initial plan (day 1)
initial MIAPA checklist-based input form (day 1)
revised input form
plan for (temporarily) storing trees and matrices (data) separate from metadata
annotations of 10 trees
translation technology
- NEXUS issues, DendroPy
- Protege deals poorly with unnamed (anonymous) individuals
ontology for annotation

From day 4: follow-up goals from the whiteboard (Media:followup_goals.jpg).

citation exercise

goal: annotate trees with citation data, encode, import into treestore, demonstrate querying based on citation metadata

notes on encoding

after some discussion, we decided to use BIBO (not Dublin Core or PRISM alone)
we failed to find any pre-existing method to auto-convert EndNote (or BibTeX or Zotero) records into BIBO
so we started hand-encoding them as Protege instances
- authors
- articles
  - used Data property "short title" instead of object property title
  - used date of issue for publication year
- author lists (rdf:List?)
ultimately we obtained the encoded citations via PubMed --> EndNote --> BibTeX export --> Zotero --> BIBO export (Bibliontology RDF).
- the result is File:10trees bibliontology.rdf
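The open question above is how to keep authors in order. One standard option in RDF is an rdf:List (a chain of rdf:first/rdf:rest cells ending in rdf:nil), which BIBO's authorList property can point to. The sketch below writes such a list as N-Triples using only the standard library; the article and author URIs and the blank-node labels are hypothetical placeholders, not values taken from the 10trees file.

```python
# Sketch: encode an ordered author list as an RDF collection (rdf:List)
# in N-Triples. Article/author URIs and blank-node labels are hypothetical.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBO = "http://purl.org/ontology/bibo/"


def author_list_triples(article_uri, author_uris):
    """Return N-Triples lines linking an article to an rdf:List of authors."""
    if not author_uris:
        # An empty collection is simply rdf:nil.
        return [f"<{article_uri}> <{BIBO}authorList> <{RDF}nil> ."]
    nodes = [f"_:list{i}" for i in range(len(author_uris))]
    lines = [f"<{article_uri}> <{BIBO}authorList> {nodes[0]} ."]
    for i, (node, author) in enumerate(zip(nodes, author_uris)):
        lines.append(f"{node} <{RDF}first> <{author}> .")
        # Each cell points to the next one; the last cell points to rdf:nil.
        rest = nodes[i + 1] if i + 1 < len(nodes) else f"<{RDF}nil>"
        lines.append(f"{node} <{RDF}rest> {rest} .")
    return lines


if __name__ == "__main__":
    for line in author_list_triples(
        "http://example.org/pub1",
        ["http://example.org/author/a", "http://example.org/author/b"],
    ):
        print(line)
```

Because list order is carried by the rdf:rest chain, author order survives loading into a triple store, at the cost of somewhat awkward SPARQL queries over the chain.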

additional suggestions for MIAPA ontology

From annotation session on afternoon of 1/31.

Add a class for parsimony under algorithm. Filed as Issue #7 (http://github.com/miapa/miapa/issues/7)

It would be good to generate an instance of useMaximumLikelihood ("Maximum Likelihood algorithm") in MIAPA, so we don't have to create one for each annotation. Filed as Issue #8 (http://github.com/miapa/miapa/issues/8)

Alternatively, maybe make classes of software (like PhyML or RAxML) implement the ML algorithm, rather than having to assert it for each instance we create. However, some software can use multiple algorithms, so this will not work in every case.
- Note that in OWL, classes cannot be asserted to have property values; only instances can. We can put property restrictions with existential quantification on classes, and an OWL reasoner could then infer that an instance must have at least one such property association (and thus a DL query should in principle return the instance). However, such inferences are not materialized in an RDF triple store, so we could not actually query for them in SPARQL.
- Note also that there can be multiple swo:implements assertions for a software instance, so multiple algorithms can easily be asserted. However, this would not also say which of those implemented algorithms was the one used to generate the tree or alignment. The idea is that this would be evident from the miapa:'Parameter specification'.
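To make the first note concrete, here is a toy illustration in plain Python (not a real triple store or reasoner). A class-level existential restriction is stored as ordinary OWL restriction triples on a blank node, so a triple-pattern match for swo:implements on the instance finds nothing until the property is asserted on the instance itself. All ex: identifiers are hypothetical.

```python
# Toy illustration: a triple-pattern query over asserted triples does not see
# facts that only an OWL reasoner would derive from a class-level restriction.
# All ex: names are hypothetical.

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Class axiom: ex:RAxML_software SubClassOf (swo:implements some ex:ML_algorithm),
# serialized as the usual owl:Restriction triples on a blank node.
triples = {
    ("ex:RAxML_software", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "swo:implements"),
    ("_:r1", "owl:someValuesFrom", "ex:ML_algorithm"),
    ("ex:raxml_run_software", "rdf:type", "ex:RAxML_software"),
}

# Without a reasoner, the instance has no swo:implements triple to match.
print(match(triples, s="ex:raxml_run_software", p="swo:implements"))  # []

# Asserting the property on the instance makes it directly queryable.
triples.add(("ex:raxml_run_software", "swo:implements", "ex:ml_algorithm_1"))
print(match(triples, s="ex:raxml_run_software", p="swo:implements"))
```

This is why the discussion above leans toward instance-level assertions: they are what SPARQL over the treestore can actually retrieve.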

Remove class for SILVA. Filed as Issue #9 (http://github.com/miapa/miapa/issues/9)

Add new class for set of trees. Filed as Issue #10 (http://github.com/miapa/miapa/issues/10)

more annotations

miapa ontology

topology
- gene tree vs species tree: Network:Tree:'Gene tree' or SpeciesTree
- rooted: Network:Tree:RootedTree or UnrootedTree
- 'Consensus tree'
otus
- toTaxon, object property, points to taxon concept, can be URI from NCBI or other authority
- derived_from specimen
- location imported from geo
branch properties
- branch lengths:
  - data property edge length
  - object property has_Annotation edge_length
- branch support: data property "has support value", holding either a bootstrap value or a posterior probability
character matrix
alignment method
- name of software, version
- parameters
- manual correction
tree inference method
- name of software, version: tree wasGeneratedBy a software procedure (an activity); the software procedure wasAssociatedWith an instance of software agent named "RAxML"
- parameters: the activity used an instance of a parameter specification (which is a kind of plan)
- character weights
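The method annotations above are what make queries such as "which trees were inferred with RAxML?" possible. In the treestore this would be a SPARQL graph pattern; the sketch below mimics that pattern over an in-memory list of triples so the join is easy to see. Naming the software agent via rdfs:label, and all ex: identifiers, are assumptions for illustration only.

```python
# Sketch of a metadata query: which trees were inferred with a given program?
# Pattern: tree prov:wasGeneratedBy activity; activity prov:wasAssociatedWith
# agent. Using rdfs:label for the agent name and the ex: names are assumptions.

TRIPLES = [
    ("ex:tree1", "prov:wasGeneratedBy", "ex:tree_activity1"),
    ("ex:tree_activity1", "prov:wasAssociatedWith", "ex:raxml"),
    ("ex:raxml", "rdfs:label", "RAxML"),
    ("ex:tree2", "prov:wasGeneratedBy", "ex:tree_activity2"),
    ("ex:tree_activity2", "prov:wasAssociatedWith", "ex:phyml"),
    ("ex:phyml", "rdfs:label", "PhyML"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def trees_built_with(triples, software_name):
    """Trees whose generating activity is associated with the named software."""
    hits = set()
    for tree, pred, activity in triples:
        if pred != "prov:wasGeneratedBy":
            continue
        for agent in objects(triples, activity, "prov:wasAssociatedWith"):
            if software_name in objects(triples, agent, "rdfs:label"):
                hits.add(tree)
    return sorted(hits)


print(trees_built_with(TRIPLES, "RAxML"))  # ['ex:tree1']
```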

semantic links for tree, citation, methods, etc

tree has unique URI produced during loading: http://phylotastic.org/hack2/...number.../...treename...#tree1

how rooted tree connects together

:tree1 has_root :node0 ;

how unrooted tree connects together, using the belongs_to_tree relation

 :node9 obo:CDAO_0000200 :tree1 ;

and the same for all the other nodes and edges.

how tree connects with citation (assuming :pub1 is the subject of the bibo:AcademicArticle individual)

 :tree1 dcterms:isReferencedBy :pub1 ;

some other ideas
- :pub1 IAO:is_about :tree1
- :pub1 documents :tree1
- :pub1 cito:providesMethodFor :tree1
- :pub1 cito:providesDataFor :tree1

how tree connects with methods annotation

:tree1 prov:wasGeneratedBy :tree_activity1 ;

how char matrix connects with methods annotation

:align1 prov:wasGeneratedBy :align_activity1 ;

how tree connects with char matrix

:tree1 prov:wasDerivedFrom :align1 ;
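The links above use abbreviated prefixes. As a minimal sketch, assuming the standard namespaces for dcterms (Dublin Core terms), prov (W3C PROV-O) and obo (OBO, where the CDAO terms live), plus a hypothetical base namespace standing in for the `:` prefix, the same links can be written out as N-Triples, the format that `treestore.py add ... ntriples ...` loads in the workflow below.

```python
# Sketch: emit the tree/citation/methods/alignment links as N-Triples with
# full URIs. BASE and the node/activity names are hypothetical placeholders.

BASE = "http://example.org/tree1#"  # hypothetical stand-in for ':'
DCTERMS = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"
OBO = "http://purl.obolibrary.org/obo/"


def iri(ns, local):
    """Wrap a namespace + local name as an N-Triples IRI."""
    return f"<{ns}{local}>"


def link_triples(node_ids):
    """N-Triples lines linking :tree1 to its nodes, citation, methods, alignment."""
    tree = iri(BASE, "tree1")
    lines = [
        f"{tree} {iri(DCTERMS, 'isReferencedBy')} {iri(BASE, 'pub1')} .",
        f"{tree} {iri(PROV, 'wasGeneratedBy')} {iri(BASE, 'tree_activity1')} .",
        f"{iri(BASE, 'align1')} {iri(PROV, 'wasGeneratedBy')} {iri(BASE, 'align_activity1')} .",
        f"{tree} {iri(PROV, 'wasDerivedFrom')} {iri(BASE, 'align1')} .",
    ]
    # belongs_to_tree (CDAO_0000200) for every node, as shown for :node9.
    for node in node_ids:
        lines.append(f"{iri(BASE, node)} {iri(OBO, 'CDAO_0000200')} {tree} .")
    return lines


if __name__ == "__main__":
    print("\n".join(link_triples(["node0", "node9"])))
```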

Annotation Workflow

Example file: Tree_2_Peters_et_al.newick

1. `python treestore.py add Tree_2_Peters_et_al.newick newick Peters2011hymenoptera`

reads Newick file `Tree_2_Peters_et_al.newick`
stores the tree in the named graph `http://prefix/Peters2011hymenoptera`
the URI prefix is automatically generated; it is a hash that (more or less) uniquely identifies the data loaded

2. `python treestore.py uri`

lists tree URIs in the triple store
will show something along the lines of: "Peters2011hymenoptera http://phylotastic.org/hack2/bd414f8f72a8fabb9454b4ea72cf0e8a760171ba/Peters2011hymenoptera#tree0000001"

3. `rdfcat -out N-TRIPLE annotations.rdf > annotations.ntriples`

takes annotations (saved with Protege as RDF/XML, Turtle, or another format)
outputs N-Triples

4. `python treestore.py add annotations.ntriples ntriples http://phylotastic.org/hack2/bd414f8f72a8fabb9454b4ea72cf0e8a760171ba/Peters2011hymenoptera`

adds the annotations to the named graph `http://phylotastic.org/hack2/bd414f8f72a8fabb9454b4ea72cf0e8a760171ba/Peters2011hymenoptera`
the URI for the named graph is the URI returned by `python treestore.py uri` up to the `#` character
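Steps 2 and 4 are linked by a simple string rule: the named graph is the tree URI up to the `#`. The sketch below applies that rule to a line of `treestore.py uri` output and builds the step-4 argument list. It assumes each listing line is "<name> <tree URI>" separated by whitespace, as in the example output above; the helper names are hypothetical.

```python
# Sketch: derive the named-graph URI for step 4 from a `treestore.py uri`
# listing line. Assumes "<name> <tree URI>" per line; helper names are
# hypothetical.
import shlex


def graph_uri_from_listing(line):
    """Return (name, graph URI), where graph URI is the tree URI before '#'."""
    name, tree_uri = line.split()
    graph_uri, _, _fragment = tree_uri.partition("#")
    return name, graph_uri


def add_annotations_command(graph_uri, ntriples_file="annotations.ntriples"):
    """Build the step-4 command as an argument list."""
    return ["python", "treestore.py", "add", ntriples_file, "ntriples", graph_uri]


if __name__ == "__main__":
    listing = ("Peters2011hymenoptera http://phylotastic.org/hack2/"
               "bd414f8f72a8fabb9454b4ea72cf0e8a760171ba/"
               "Peters2011hymenoptera#tree0000001")
    name, graph = graph_uri_from_listing(listing)
    print(shlex.join(add_annotations_command(graph)))
```

Keeping the command as an argument list means it could be passed directly to `subprocess.run` without shell quoting concerns.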
