Conference Plan 2011
Highlights from the MIAPA Survey (Lightning talk)
topic: results from the survey.
Authors: Nico, Brandon, Emily, Sudhir, Ross, Arlin, Rutger
Abstract: Developing a standard involves community engagement to understand user needs. To develop a MIAPA standard for a phylogenetic record, one must understand how and why scientists re-use data from phylogenetic studies, and what barriers they face. To understand this issue better, members of the MIAPA Survey Team documented stories of re-use from their own experience and from interviews with secondary consumers of phylogenetic results (http://www.evoio.org/wiki/BarriersToReUse). These stories suggest that there is a considerable market for re-use. Secondary consumers systematically search for, locate, download, and inspect phylogenies, inferred ancestral dates and states, sequence alignments, character matrices, unaligned characters, and workflow descriptions. Such searches are frequently carried out with only limited awareness of the scope of available phylogenetic resources. Some re-use is casual, aimed at gaining preliminary knowledge, while other re-use aims to create a flow of coded information from a primary study into a secondary study. However, re-use is rarely automatic and frequently involves manual evaluation of a primary publication. Secondary consumers may spend large amounts of time replicating a study, or attempting to evaluate the quality of re-usable results by examining data sources and methods. Such attempts frequently end in disappointment. To gain a broader and more quantitative picture of barriers to re-use, the survey team is developing an online survey to be disseminated to thousands of scientists.
Building a Foundation to Enable Semantic Technologies for Phylogenetically-Based Comparative Analyses (Lightning talk)
presenter: Maryam Panahiazar
topic: Progress report on development of an ontology for concepts relating to tree estimation.
Authors: Maryam Panahiazar, Rutger Vos, Enrico Pontelli, Todd Vision, Arlin Stoltzfus, Jim Leebens-Mack
In revealing historical relationships among genes and species, phylogenies provide a unifying context across the life sciences for investigating the diversification of biological form and function. The utility of phylogenies for addressing a wide variety of biological questions is evident in the rapidly increasing number of published gene and species trees. Further, this trend is certain to accelerate with the explosion of data being generated by next-generation sequencing technologies. The impact that this deluge of species and gene tree estimates will have on our understanding of the forces that shape biodiversity will be limited by the accessibility of these trees and of the underlying data and methods of analysis.
The true structure of species trees and gene trees is rarely known. Rather, estimates are obtained by applying increasingly sophisticated phylogenetic inference methods to increasingly large and complicated datasets. The need for a Minimum Information About a Phylogenetic Analysis (MIAPA) reporting standard is clear, but specification of the standard has been hampered by the absence of controlled vocabularies to describe phylogenetic methodologies and workflows. PhylOnt is an extensible ontology being developed to describe the methods employed to estimate trees from a data matrix, and thus to support the specification of MIAPA. PhylOnt will be linked with the Comparative Data Analysis Ontology (CDAO) to provide a comprehensive set of concepts relating to phylogeny estimation that can be used by searchable tree databases and web services. Moreover, we aim to use PhylOnt/CDAO concepts that describe tree estimation procedures to explicitly relate tree descriptions to data matrices within NeXML files. We view this as an important step in the development and specification of MIAPA.
Publishing Re-usable Phylogenetic Trees, in Theory and Practice
topic: practices (not necessarily the best) for publishing a phylogenetic tree, based on the TDWG report and subsequent analyses.
Authors: Brian O'Meara, Jamie Whitacre, Ross Mounce, Dan Rosauer, Rutger Vos, Arlin Stoltzfus
Abstract: Sharing and re-use of data are essential to the progressive and self-correcting nature of science. In recognition of this principle, journals and funding agencies have adopted policies to encourage the sharing of information ('data'), including empirical data as well as computed inferences such as phylogenetic trees. Shared and re-used data help to validate previous findings and to address new questions not envisaged by the creators of the data.
Here we summarize an ongoing analysis of 1) current practices for sharing phylogenetic trees and associated data; 2) current barriers to effective sharing and re-use of such data; and 3) prospects for reducing these barriers to promote more widespread sharing and re-use. Currently, the technical infrastructure is available to support (with some limitations) rudimentary archiving in conjunction with manuscript publication. Yet most published trees are not archived, and there is no community standard governing the recommended format or content to ensure a re-usable phylogenetic record. Without a shift in emphasis toward re-usability, along with technology and standards to support such a shift, the value of trees (whether disseminated via public archives or by other means) will be limited. Interviews with actual or potential secondary consumers of phylogenetic results suggest that there is a considerable market for re-use, but that most attempts end in disappointment. Phylogenetic results available via author requests, journal web sites, archival repositories and project web sites rarely include the critical information that secondary consumers seek, such as unique identifiers for biological sources (including species sources and accession numbers), indicators of quality, and documentation of the analytical methods used to obtain the results.
Based on the analysis presented here, we suggest that enabling effective re-use entails a commitment by the research community to several changes from current practice: 1) using globally unique identifiers (GUIDs) to reference informational and material entities; 2) developing and using technology for documenting and exchanging the metadata that facilitate re-use; and 3) supporting the development and use of a minimal reporting standard that indicates which data and metadata are considered essential for a re-usable phylogenetic record. We suggest that re-use may be catalyzed most rapidly by identifying and targeting (with appropriate technology) the most promising circumstances for re-use. These might include the extraction of sub-trees from large trees (for use in reconciliation, classification, and comparative analysis); the re-use of seed alignments, sub-alignments and homologized characters; the linking of phylogenies to geographic information (for use in ecology, phylogeography and biogeography); and the construction of supertrees and supermatrices.