Advancing MIAPA: Difference between revisions

From Evolutionary Interoperability and Outreach
** Emily McTavish's [https://docs.google.com/a/utexas.edu/spreadsheet/ccc?key=0Av6YLPeop2n0dGZiMjNGS2NleWNvQlVSeThVQXdSdmc#gid=0 Tree Annotation Vocabulary] as resulting from the [[Phylotastic I]] hackathon. Includes a mapping to MIAPA draft checklist attributes.  
** [http://www.slideshare.net/ElliottHauser/phylogenetics-data-provenance-survey-results Slideshow] with some results from the [http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/MIAPADraft#Community_survey MIAPA community survey] orchestrated by the Open Tree of Life project in fall 2012.
* Ontologies  
** [http://www.evolutionaryontology.org/cdao CDAO]
* Development:
** [http://www.w3.org/TR/prov-o/ W3C Provenance Ontology]
** [http://github.com/miapa/miapa MIAPA repo on Github]
** [http://code.google.com/p/information-artifact-ontology/ Information Artefact Ontology]
** Template development after the [https://github.com/phylotastic/treestore/tree/master/terms TNRS ontology] developed at (and after) Phylotastic I.
** [http://theswo.sourceforge.net/ Software Ontology] (includes the [http://edamontology.org/ EDAM ontologies])
** [http://bioportal.bioontology.org/ontologies/49276?p=terms phylont] from [http://arnetminer.org/publication/phylont-a-domain-specific-ontology-for-phylogeny-analysis-3601839.html;jsessionid=E93A88FA38E826E1113A957BBA5BB352.tt Maryam Panahiazar, et al]. (We did not use this.)


== Annotation ==

Revision as of 00:28, 3 February 2013

Factors shaping our conception of source-tree annotations

  1. what folks in the evoinfo community believe is the Minimal Information About a Phylogenetic Analysis (MIAPA)
    • The current synopsis of this is the MIAPA checklist from the 2011 TDWG meeting.
  2. need to support assignment of credit (blame) for tree-producers and phylotastic service-providers
  3. need to support licensing that protects creators, resource-providers and end-users
  4. need to contribute to a credible provenance report for phylotastic-generated trees
    • e.g., a tree might be returned with information as follows (free text form): "This tree was obtained on Jan 29, 2013. An input list of 58 names was submitted to Taxosaurus, resulting in 45 valid species binomials (list). This list was sent to a pruner with instructions to prune out the indicated species from the phylogeny of Bininda-Emonds, et al. 2007. The resulting sub-tree with 40 species was scaled using the DateLife service."
  5. support the most common user criteria for phylotastic searching
    • limits on source trees
      • return tree with maximal coverage of list ::= { species }
        • specify namespace of <list>
      • restrict by publication status (published or not)
      • restrict by publication year (e.g., no trees older than 5 years)
      • restrict by author (e.g., author = bininda-emonds)
      • restrict by other citation information (e.g., jrnl = index fungorum)
      • restrict by method
        • use (exclude) trees made with method class = { parsimony, likelihood, Bayesian, distance, supertree, supermatrix, hand-crafted. . . }
        • use (exclude) trees made with evolutionary model = { }
        • use (exclude) trees made with software = { RAxML, BEAST, PAUP*, . . . }
        • other restrictions (e.g., supertree, . . . )
      • restrict by availability of source data such as character matrix
        • restrict by minimum number of characters in matrix
      • restrict by type of source data = { molecular, morphological, mixed }
      • restrict by annotation level = { platinum, gold, silver, bronze, polystyrene }
      • require feature
        • require support values
        • require rooting
        • require branch lengths
        • require fully resolved
        • require other features
    • limits on phylotastic manipulations
      • disallow species substitution
      • disallow grafting of source trees
      • scaling: provide median age estimates
      • scaling: provide only lower age estimates
      • TNRS: prohibit fuzzy matching
      • TNRS: use matches from source = { NCBI, ITIS, etc. }
  6. support adequate annotation of the provenance of the resulting phylotastic tree
    • imagine that you have to publish an analysis using a phylogenetic tree: how do you describe it? "The phylogeny of 40 mammal species was obtained via phylotastic (phylotastic.org
    • citation information
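
The search criteria in item 5 can be read as a filter over annotated source trees, followed by a choice of the tree with maximal coverage of the input list. The following is only a minimal sketch under stated assumptions: the record type `SourceTree`, its field names, and the function `select_tree` are hypothetical illustrations, not part of the MIAPA checklist or of any phylotastic service.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class SourceTree:
    """Hypothetical annotation record for one source tree."""
    label: str
    species: FrozenSet[str]
    year: Optional[int] = None          # publication year; None if unpublished
    method_class: Optional[str] = None  # e.g. "parsimony", "supertree"
    has_branch_lengths: bool = False
    is_rooted: bool = False


def select_tree(
    trees: Iterable[SourceTree],
    names: Iterable[str],
    *,
    min_year: Optional[int] = None,
    exclude_methods: Iterable[str] = (),
    require_branch_lengths: bool = False,
    require_rooted: bool = False,
) -> Optional[SourceTree]:
    """Return the tree covering the most requested names, or None."""
    wanted = frozenset(names)
    excluded = frozenset(exclude_methods)
    candidates = []
    for tree in trees:
        # Restrict by publication year; unpublished trees fail this limit.
        if min_year is not None and (tree.year is None or tree.year < min_year):
            continue
        # Exclude trees made with unwanted method classes.
        if tree.method_class in excluded:
            continue
        # Required features.
        if require_branch_lengths and not tree.has_branch_lengths:
            continue
        if require_rooted and not tree.is_rooted:
            continue
        candidates.append(tree)
    if not candidates:
        return None
    # Maximal coverage of the input list of species names.
    return max(candidates, key=lambda t: len(t.species & wanted))
```

A caller would pass the list of names returned by the TNRS step together with whatever limits the user chose. A return value of `None` signals that no source tree satisfies the limits, which is itself something a provenance report might record.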

Lessons learned from tree-finding and annotation

  • data are frequently not readily accessible online
    • e.g., for Jetz one must go to birdtree.org
    • e.g., we obtained the Hymenoptera tree by pers. communication from Peters
    • only one of the studies has trees in TreeBASE (GEBA)
    • another study has a tree in Dryad (Smith)
  • authors often don't provide minimal information explicitly
    • whether or not a tree is rooted is usually not explicit
      • e.g., taxonomic hierarchies (APG, ToLWeb, NCBI) imply rootedness
      • e.g., Peters, et al 2011 never invokes the term "root" but we infer rooting from the description of outgroups not present in the final tree
    • the methodology is largely unexplained for curated or hand-crafted taxonomic frameworks like APGIII or NCBI
  • a study frequently has several trees by slightly different methods, but the checklist seems to imply one method
    • e.g., Goloboff molecules only vs. molecules with morphology
    • e.g., Bininda-Emonds best dates vs min dates vs max dates
  • sometimes a study has many trees that all represent outputs of the same method
    • e.g., Jetz provide a large sample from the posterior distribution
  • the process of constructing a tree does not always follow sequences --> alignment --> tree
    • e.g., supertree method in Bininda-Emonds
    • e.g., hand-crafted trees (APG, Smith)
  • process of constructing tree cannot be condensed easily
    • e.g., Goloboff, iterative procedure with divide-and-conquer search to find parsimony tree ("tree fuse" & "sector" search repeated)
    • GEBA tree, { missing explanation }
    • partitioned alignments (e.g., Smith, et al): MIAPA implies a single "model of evolution", whereas in Peters, et al there are 2 models for 2 partitions
  • clustering to define orthologs not included in checklist, but seems important
    • Smith phlawd, alignment: no pre-orthology.
  • mixed data is common
    • Goloboff has morpho and molecular
    • multiple studies have DNA (e.g., SSU rDNA) and protein sequences
  • concatenated alignments are common, e.g., multiple proteins
    • this means accession:OTU mapping is not 1:1 but many to one
  • not encountered in our inputs but sometimes the OTU is <genus_sp> and the data are fused from multiple species (this is common in MorphoBank)
  • many important trees do not have branch lengths
    • e.g., APGIII is a taxonomic framework
    • e.g., some supertrees don't have branch lengths
  • do binomials count as meaningful external identifiers for OTUs?
    • in some cases, the methods make clear that these come from a specific source
      • e.g., Goloboff names clearly come directly from NCBI via their bioinfo pipeline
      • e.g., the Bininda-Emonds publication declares that the naming authority is Wilson & Reeder (Mammal Species of the World)
      • e.g., NCBI taxonomy comes as a database dump with taxids and synonyms, so it represents its own authority
    • usually the naming authority is not clear
  • was any study straightforward?
  • OTUs checklist question may be redundant: why have external identifier and then ask for collections information?
  • OTU external refs in checklist: not applicable for supertree methods, consensus methods, hand-crafted
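
Several of the lessons above concern the shape of an annotation record: a tree may have more than one partition and model, accessions may map many-to-one onto OTUs, and features such as rooting may simply not be stated. The following is only a minimal sketch of a record that tolerates these cases. The class names, field names and placeholder values are hypothetical and are not drawn from the MIAPA checklist.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Partition:
    """One data partition with its own (optional) evolutionary model."""
    name: str
    data_type: str                 # "molecular" or "morphological"
    model: Optional[str] = None    # None for e.g. hand-crafted trees


@dataclass
class TreeAnnotation:
    """Hypothetical annotation record for one tree of a study."""
    label: str
    rooted: Optional[bool] = None              # None = not stated by authors
    has_branch_lengths: Optional[bool] = None  # None = not stated by authors
    partitions: List[Partition] = field(default_factory=list)
    # Many-to-one: each OTU may be backed by several accessions.
    otu_accessions: Dict[str, List[str]] = field(default_factory=dict)

    def add_accession(self, otu: str, accession: str) -> None:
        self.otu_accessions.setdefault(otu, []).append(accession)

    def data_type(self) -> Optional[str]:
        """Return the single data type, "mixed", or None if no partitions."""
        kinds = {p.data_type for p in self.partitions}
        if not kinds:
            return None
        return kinds.pop() if len(kinds) == 1 else "mixed"

    def models(self) -> Set[str]:
        """All models used, one per partition where stated."""
        return {p.model for p in self.partitions if p.model is not None}

    def unstated(self) -> List[str]:
        """Names of features the authors did not make explicit."""
        return [
            name
            for name in ("rooted", "has_branch_lengths")
            if getattr(self, name) is None
        ]
```

Using `None` for "not stated" keeps an inferred value (for example rooting inferred from a description of outgroups) distinct from a value the authors gave explicitly. A study with several trees would carry one such record per tree.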

MIAPA Resources

Annotation