Phylotastic1

From Evolutionary Interoperability and Outreach

The first Phylotastic hackathon was held at NESCent in Durham from June 4 to 8; this page documents what happened there. Phylotastic [1] is a project to enable convenient, computable, credible access to the "Tree of Life" comprising expert knowledge of phylogeny: the species tree you want, in ready-to-use form, when you want it. The project was started by a NESCent working group called HIP (Hackathons, Interoperabilities and Phylogenetics). The second hackathon takes place at iPlant from January 28 through February 1.

Participants self-assembled into task groups to work on pieces of the project.

Background

A problem faced in many areas of life sciences research, from community ecology to comparative genomics to biomedical genetics, is how to put the data available for a set of species into a phylogenetic context, based on a "species tree". For all we know, scientists face this type of problem hundreds of times every day. The past decade of efforts to assemble a large "tree of life", a phylogeny for all species, has produced many "megatrees" or "supertrees", usually limited to a particular group of organisms such as fungi, mammals or plants. Most scientists don't know how to use such huge trees. Yet it ought to be possible to meet the scientific demand for species trees by taking the existing supertrees, pruning away unneeded parts, and grafting on (where possible) missing species.

An existing tool called "phylomatic" does precisely this: starting with a user-supplied list of species and a huge phylogenetic topology for plant families, it grafts the species onto the tree wherever it can match the family name, and it prunes away all the rest. The result is just a topology, so users must find their own ways to add branch lengths to the resulting tree. Even so, the user, so long as she is only interested in plants, can get a phylogeny for an arbitrary list of named species. Phylomatic rocks: its frequent use shows that big species trees are highly useful for applications in ecology, biodiversity, and trait analysis when both the interfaces that serve user needs and the megatree providing vast coverage are available.
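The graft-and-prune procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not Phylomatic's actual code: the nested-list tree representation, the toy `BACKBONE` topology, the species-to-family map, and the function names `graft_and_prune` and `to_newick` are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Union

# A tree is either a tip label (str) or a list of subtrees.
Tree = Union[str, List["Tree"]]

# Hypothetical family-level backbone topology (illustrative only).
BACKBONE: Tree = [["Fagaceae", "Betulaceae"], ["Rosaceae", "Fabaceae"]]


def graft_and_prune(backbone: Tree, species_to_family: Dict[str, str]) -> Optional[Tree]:
    """Graft species onto matching family tips and prune everything else."""
    # Group the requested species by the family they belong to.
    by_family: Dict[str, List[str]] = {}
    for species, family in species_to_family.items():
        by_family.setdefault(family, []).append(species)

    def visit(node: Tree) -> Optional[Tree]:
        if isinstance(node, str):
            # Family tip: replace it with its species, or prune it if none were requested.
            species = sorted(by_family.get(node, []))
            if not species:
                return None
            return species[0] if len(species) == 1 else species
        # Internal node: keep only the children that survived pruning.
        kept = [child for child in map(visit, node) if child is not None]
        if not kept:
            return None
        # Collapse nodes left with a single child.
        return kept[0] if len(kept) == 1 else kept

    return visit(backbone)


def to_newick(tree: Tree) -> str:
    """Serialize a nested-list tree as a Newick topology (no branch lengths)."""
    if isinstance(tree, str):
        return tree.replace(" ", "_")
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


requested = {
    "Quercus alba": "Fagaceae",
    "Quercus rubra": "Fagaceae",
    "Malus domestica": "Rosaceae",
}
result = graft_and_prune(BACKBONE, requested)
print(to_newick(result) + ";")  # ((Quercus_alba,Quercus_rubra),Malus_domestica);
```

In this sketch, species from the same family are grafted as an unresolved group, and species whose family is missing from the backbone are silently dropped; a real service would probably report those to the user.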

The popularity of Phylomatic suggests that a more general tool would be extraordinarily useful, especially if

  • it is an open standard that can be implemented in many ways
  • the back-end data store is populated with large phylogenies available for fungi, fish, mammals, butterflies, etc. (not just plants)
  • the core functionality (name-matching, grafting and pruning) is modularized in open-source bioinformatics toolboxes
  • methods for adding branch lengths are easier and more generalized
  • all of the above operations are wrapped up as web services that can be invoked from existing computing environments

If such a tool were a web service, we could plug it into Mesquite, and users could load up their species-based character matrix, then get a tree for it. In fact, let's go back a step and consider users with only a list of species and no data to compare: an even more open-ended discovery environment, which we could implement in Galaxy or Taverna (given that this is all based on web services). The user starts with a list of species (or a higher taxon) and a request for some useful types of data that could be obtained by querying various available sources, e.g., whether each species has a cytochrome oxidase sequence in GenBank, whether it is found in California, where the nearest specimen is, etc.
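Any such workflow begins with the name-matching step listed among the core functions above: the user's species names have to be reconciled with the names used in the tree. The sketch below shows one way that step could look, using fuzzy string matching from the Python standard library. It is not the TNRS API developed at the hackathon; the function names `normalize` and `resolve_names`, the `cutoff` value, and the list of accepted names are assumptions made for illustration.

```python
import difflib
from typing import Dict, Iterable, Optional


def normalize(name: str) -> str:
    """Turn underscores into spaces, collapse whitespace, and fix capitalization."""
    parts = name.replace("_", " ").split()
    if not parts:
        return ""
    # Genus is capitalized; the remaining epithets are lowercased.
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve_names(
    queries: Iterable[str], accepted: Iterable[str], cutoff: float = 0.85
) -> Dict[str, Optional[str]]:
    """Map each query to an accepted name, or to None when nothing is close enough."""
    accepted_list = sorted(set(accepted))
    resolved: Dict[str, Optional[str]] = {}
    for query in queries:
        cleaned = normalize(query)
        if cleaned in accepted_list:
            resolved[query] = cleaned  # exact match after normalization
            continue
        # Fall back to the single closest accepted name above the similarity cutoff.
        close = difflib.get_close_matches(cleaned, accepted_list, n=1, cutoff=cutoff)
        resolved[query] = close[0] if close else None
    return resolved


# Hypothetical list of names present in the tree.
tree_names = ["Homo sapiens", "Pan troglodytes", "Mus musculus"]
matches = resolve_names(["homo_sapiens", "Pan troglodites", "Felis catus"], tree_names)
for query, match in matches.items():
    print(f"{query!r} -> {match!r}")
```

Names that resolve to `None` are the ones a grafting step would need to handle separately, for example by placing them according to a higher taxon.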

Subgroups

Currently, the best places to go for information are the sub-group pages here:

Phylotastic pages

In addition, there are separate pages that continue to be updated:

Older material

Some of this material, developed during pre-hackathon planning, is no longer relevant or has gone stale:

Projected tangible outcomes

The table below lists tangible outcomes of the hackathon such as code repositories, live demos, specifications, and documentation.

Group | Description | Item (link) | Documentation (link) | NEAD | Responsible person
all | manuscript (evol bioinfo?) | draft ms | NA | NA | Arlin
all | iEvoBio talk | slides at slideshare | NA | yes | Karen, Arlin
all | promo (screencast) | PhylotasticPromo | NA | NA | Rutger, Arlin
all | swag - phylotastic t-shirts, anyone? | PhyloT Vote for Phylotastic | NA | no | Meg?
arch | demo galaxy server | live demo and code (github) | base class and screencast | yes | Rutger
arch | demo topology server | live demo and code on github | README.pod | yes | Rutger
arch | extensions to phylomatic | github | NA | yes | Cam
arch | prototype controller architecture in nodeJS | github project | [2] | no | Helena
arch | prototype controller as Perl CGI script | https://github.com/phylotastic/cgi | README on github | yes | Ben
arch | report: a reference architecture for phylotastic services | draft | NA | no | Helena
branch | DateLife demo service to annotate tree with dates | http://datelife.org | NA | yes | Brian O.
branch | iEvoBio challenge talk | YouTube video | NA | yes | Brian O.
branch | Publication | a specialized journal | NA | NA | Brian O.
shiny | demo for reconcile-tree use-case | live demo | NA | yes | Chris B.
shiny | Mesquite-o-tastic demo module | Java code on github | screencast | yes | Arlin & Peter
shiny | scripts to convert Goloboff tree from TNT | dir with perl code | POD within code | yes | Arlin
shiny | 5 blogs about the event | blogspot | NA | no | Holly
shiny | refinement of gene duplication inference algorithm implementation | dir with Java code | limited | no | Christian Z.
TNRS | API specification | API | TNRS | yes | Naim
TNRS | NCBI implementation of the API | github | NCBI | no | Siavash
TNRS | MSW2 implementation of the API | github | MSW3 | no | Siavash
TNRS | Demo server (TaxoSaurus) | Demo | TNRS | yes | Naim
TNRS (treestore) | RDF model and ontology for TNRS requests and results | link to release | NA | yes | Hilmar
treestore | New release of CDAO ontology adopting OBO conventions | link to release | NA | yes | Jim
treestore | Prototype tree-pruning SADI service | Github | NA | yes | Jim
treestore | Perl ingestor of Newick trees/TNRS connection | github | NA | no | Enrico
treestore | PhyloWS REST wrapper around tree store | live demo | NA | no | Mark

After the hackathon

Opportunities right after the hackathon to build on the phylotastic momentum:

  • do a challenge project for Geneious, present it at iEvoBio
  • develop slide presentation to accompany PhylotasticiEvoBio abstract for iEvoBio 2012
  • do the iEvoBio challenge at iEvoBio
  • work on Galaxy integration at a workshop

Manuscript

Phylotastic Architecture

A draft design resulted from pre-hackathon planning. This was then completely overhauled and superseded by the results of the work of the architecture subgroup.