Phylotastic

From Evolutionary Interoperability and Outreach
Revision as of 13:50, 17 February 2012 by Hilmar (talk | contribs) (Added HIP category)

This is the public page for the Phylotastic hackathon (as distinct from the Leadership Team's planning page).

to do

  • best tag line:
    • Phylotastic: a web-services infrastructure to make megatrees accessible for research use
    • Phylotastic: the Tree of Life meets Phyloinformatics
    • Phylotastic: trees when you need them
    • Phylotastic: MegaTrees for everyone. Automagically.
  • We need to get the main flowchart in here.
  • Content below can be re-factored.
  • Create a layout that can be used for more detailed planning when the hackathon starts, e.g., the accepted participants and their areas of expertise.

Draft Plan

Please note that this page is a draft plan and a place to develop ideas. The overall target of the hackathon is fixed (build phylotastic), but no single aspect of the plan has been fixed. Participants will have the opportunity to re-think things on day 1 of the hackathon.

Overview

Statement of goals. 1. Build phylotastic, a collection of interoperable web services that collectively provide the means to extract a subtree (specified by tips) from any of several large species trees, and to supply branch lengths and provenance annotation. 2. For demonstration purposes, leverage these services within a graphical interface that also integrates the resulting species tree with the user's choice of several high-value types of data. Optionally, this may involve adapting an existing environment (e.g., Galaxy, Taverna) to manage a phylotastic workflow.

problem

In its most typical use, Phylomatic works as follows: starting with a huge topology for plant genera and a user-supplied list of species, it grafts each species onto the tree wherever it can match the genus name, and it prunes away the rest of the tree. The result is only a topology, so users often find ways to add branch lengths to it. As long as she is only interested in plants, the user can thus get a phylogeny for an arbitrary list of named species.
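The graft-and-prune step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Phylomatic's actual code: the `Node` class, the `graft_and_prune` and `to_newick` helpers, and the toy backbone are all hypothetical, genus matching is done on the exact first word of each binomial, and species whose genus is absent from the backbone are silently dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; tips have no children."""
    name: str = ""
    children: list = field(default_factory=list)


def graft_and_prune(node, species):
    """Return a pruned copy of the tree holding only the requested species.

    `species` is a list of binomials such as "Quercus alba". Each species is
    grafted at the tip whose name equals its genus; tips with no matching
    species are dropped, and internal nodes left with a single child are
    collapsed. Returns None when nothing under `node` matches.
    """
    if not node.children:  # genus-level tip
        matched = sorted(s for s in species if s.split()[0] == node.name)
        if not matched:
            return None
        if len(matched) == 1:
            return Node(matched[0])
        # Several species in one genus become a polytomy under the genus.
        return Node(node.name, [Node(s) for s in matched])
    pruned = (graft_and_prune(child, species) for child in node.children)
    kept = [k for k in pruned if k is not None]
    if not kept:
        return None
    if len(kept) == 1:  # collapse a node that lost all but one child
        return kept[0]
    return Node(node.name, kept)


def to_newick(node):
    """Serialize a topology (no branch lengths) as a Newick fragment."""
    label = node.name.replace(" ", "_")
    if not node.children:
        return label
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + label


# Toy genus-level backbone for illustration only, not taken from a real megatree.
backbone = Node("", [
    Node("", [Node("Quercus"), Node("Fagus")]),
    Node("", [Node("Pinus"), Node("Picea")]),
])
wanted = ["Quercus alba", "Quercus rubra", "Pinus strobus", "Rosa canina"]
subtree = graft_and_prune(backbone, wanted)
print(to_newick(subtree) + ";")
```

Because only genus names are matched, congeneric species end up as an unresolved polytomy, and the output carries no branch lengths, which is why users typically add them in a separate step.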

Phylomatic rocks: its frequent use shows that big species trees are highly useful for applications in ecology, biodiversity, and trait analysis when the interfaces that serve user needs, and the megatree providing vast coverage, are available. But Phylomatic would rock harder if:

  • the back-end data store were populated with large phylogenies available for fungi, fish, mammals and prokaryotes (not just plants)
  • the core functionality (name-matching, grafting & pruning) were modularized in an open-source bioinfo library
  • methods for adding branch lengths were easier and more generalized
  • all of the above were wrapped up as web services that could be invoked from computing environments

If this were a web service, we could plug it into Mesquite, and users could load up their species-based character matrix, then get a tree for it. In fact, let's go back a step and consider users with only a list of species and no data to compare. Imagine an even more open-ended discovery environment, which we could implement in Galaxy or Taverna (given that this is all based on web services). The user starts with a list of species (or a higher taxon) and a request for some useful types of data that could be obtained by querying various available sources, e.g., whether a species has a cytochrome oxidase sequence in GenBank, whether it is found in California, where the nearest specimen is, etc.
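The species-wise lookup in that discovery scenario might be sketched as follows. This is only an illustration under assumptions: `annotate_species` and the source labels are hypothetical names, the data sources are in-memory stubs standing in for remote web-service calls, and the species names and values are placeholders rather than real records.

```python
from typing import Callable

# Each data source is modeled as a function from species name to an answer.
# In a real workflow these would wrap remote web-service calls; here they are
# in-memory stubs with made-up values so that the sketch runs offline.
Source = Callable[[str], object]


def annotate_species(species: list[str], sources: dict[str, Source]) -> dict:
    """Return {species: {source_label: answer}} by querying every source."""
    return {
        name: {label: lookup(name) for label, lookup in sources.items()}
        for name in species
    }


# Placeholder names ("Aus bus" etc.), not real taxa or real records.
with_sequence = {"Aus bus", "Xus yus"}
regional_checklist = {"Aus cus", "Xus yus"}

sources = {
    "has_sequence": lambda name: name in with_sequence,
    "in_region": lambda name: name in regional_checklist,
}

table = annotate_species(["Aus bus", "Aus cus", "Xus yus"], sources)
for name, row in table.items():
    print(name, row)
```

Keeping every source behind the same one-argument interface means a new source can be added without touching the loop, which suits a workflow environment where each step is an independent service.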

approach

Hackathon agenda and guiding principles

  • create a demo implementation of a system based on open standards
  • allow alternative implementations, at least for some steps
  • allow flexibility for multiple use cases

Architecture

[Figure: architecture diagram; thumbnail not available]

scoping statements

In Scope

  • Populating data store of existing trees
  • Evolution of PhyloWS to support the needs of Phylomatic
  • Taxonomic name resolution (embedding existing TNRS capacities)
  • Pruning trees and grafting species on them
  • Branch length (existing methods for incorporating branch lengths)
  • Integration of data and trees (e.g., mashups) - species-wise integration
  • Display of resulting trees (using existing technologies)
  • Wrap all these existing tools as web services
  • NeXML syntax extensions if needed
  • If needed, determine methods for compressing NeXML representations
  • Simple user interface (web form)

Not In Scope

  • Constructing new input trees
  • New Data Generation
  • Arguing or evaluating the correctness of trees
  • Design of new TNRS systems
  • Debates about which naming system is best
  • Developing new techniques to derive branch lengths

Uncertain, depends on participant skills and perspectives

  • Phylo-referencing
  • MIAPA annotations of the steps; provenance annotations

approach

In addition to the basic functionality needed for power users, it would be helpful to have a graphical display to show off the results.

demos and other links