Phylotastic

Revision as of 12:51, 10 June 2012

Warning

The Hackathon took place at NESCent June 4 to 8. This wiki, which was used as a central resource for pre-hackathon planning, is in a state of flux. Some parts reflect pre-hackathon brainstorming and are out of date. Other parts reflect outcomes of the hackathon.

Where to go

This is the public page for the Phylotastic hackathon (as distinct from the Leadership Team's planning page). Participants self-assembled into task groups to work on pieces of the project. For now, the best place to go for up-to-the-minute information is the sub-group pages here:

Phylotastic/Architecture: Architecture and API for Phylotastic
Phylotastic/shiny: subgroup producing demos to showcase Phylotastic capabilities
TNRS
Branch length group
Tree Store Group

In addition, there are separate pages developed during pre-hackathon planning. Some of this material may be out of date:

PhylotasticParticipants - roster with photos (see also PhylotasticPictures)
PhylotasticSchedule - hackathon event schedule
PhylotasticUseCases - use cases (ideally with data files and outputs for testing)
PhylotasticSpec - for developing detailed specifications (includes scoping statements)
PhylotasticDatastore - for considering options for a Phylotastic data store

Projected tangible outcomes

The table below includes tangible outcomes of the hackathon such as code repositories, live demos, specifications, and documentation.

Group	Description	Item (link)	Documentation (link)	has Galaxy adapter	responsible person
arch	demo topology server	live demo	README.pod	yes	Rutger
arch	prototype controller architecture in nodeJS	github project	[1]	NA	Helena
arch	report: a reference architecture for phylotastic services	draft	NA	NA	Helena
arch	extensions to phylomatic	NA	NA	NA	Cam
branch	demo service to annotate tree with dates	http://datelife.org	NA	NA	Brian O.
branch	iEvoBio lightning talk	NA	NA	NA	Brian O.
branch	Publication	a specialized journal	NA	NA	Brian O.
shiny	demo for reconcile-tree use-case	live demo	NA	NA	NA
all	swag - phylotastic t-shirts, anyone?	PhyloT Vote for Phylotastic	NA	NA	NA
TNRS	API specification	API	TNRS	N/A	Not as yet
TNRS	Demonstration	Demo	TNRS	N/A	Not as yet
branch	NA	NA	NA	NA	NA
treestore	New release of CDAO ontology adopting OBO conventions	link to release	NA	NA	Jim
treestore	Perl ingestor of Newick trees/TNRS connection	github	NA	NA	Enrico
treestore	PhyloWS REST wrapper around tree store	live demo	NA	NA	Mark
shiny	scripts to convert Goloboff tree from TNT	dir with perl code	POD within code	NA	Arlin
shiny	mesquite-o-tastic screencast	youtube video	NA	NA	Arlin

Background

A problem faced in many areas of life sciences research, from community ecology to comparative genomics to biomedical genetics, is to put the data available for a set of species into a phylogenetic context, based on a "species tree". For all we know, scientists are facing this type of problem hundreds of times every day. The past decade of efforts to assemble a large "tree of life", a phylogeny for all species, has produced many "megatrees" or "supertrees", usually limited to a particular group of organisms such as fungi, mammals or plants. Most scientists don't know how to use such huge trees. Yet it ought to be possible to address the scientific demand for species trees by taking the existing supertrees, pruning away unneeded parts, and grafting on (where possible) missing species.

An existing tool called "phylomatic" does precisely this: starting with a user-supplied list of species and a huge phylogenetic topology for plant families, it grafts the species onto the tree wherever it can match the family name, and it prunes away all the rest. The result is just a topology, so users have to find their own ways to add branch lengths. The upshot is that the user, so long as she is only interested in plants, can get a phylogeny for an arbitrary list of named species. Phylomatic rocks: its frequent use shows that big species trees are highly useful for applications in ecology, biodiversity, and trait analysis when two things are available: interfaces that serve user needs, and a megatree providing vast coverage.
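The graft-and-prune idea described above can be sketched in a few lines of Python. This is a minimal illustration, not Phylomatic's actual code: the `Node` class, the `graft_and_prune` and `to_newick` functions, the toy family-level megatree, and the assumption that each requested species arrives already labeled with its family are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A tree node; the tips of the toy megatree are family names."""
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def graft_and_prune(node: Node, species_by_family: Dict[str, List[str]]) -> Optional[Node]:
    """Return a new tree holding only the requested species, or None if none match."""
    if not node.children:
        # Megatree tip: graft the requested species onto a matching family name.
        species = [Node(s) for s in sorted(species_by_family.get(node.name, []))]
        if not species:
            return None  # no requested species in this family: prune it
        return species[0] if len(species) == 1 else Node(node.name, species)
    kept = [k for k in (graft_and_prune(c, species_by_family) for c in node.children)
            if k is not None]
    if not kept:
        return None
    # Collapse internal nodes that are left with a single child.
    return kept[0] if len(kept) == 1 else Node(node.name, kept)


def to_newick(node: Node) -> str:
    """Serialize a tree as a Newick string (topology only, no terminating semicolon)."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


# Toy family-level megatree (assumption: a real one would be far larger).
megatree = Node("", [
    Node("Fagales", [Node("Fagaceae"), Node("Betulaceae")]),
    Node("Rosales", [Node("Rosaceae"), Node("Moraceae")]),
])
# The user's species list, keyed by family (assumed to be known up front).
wanted = {"Fagaceae": ["Quercus_alba"], "Rosaceae": ["Rosa_canina", "Malus_domestica"]}

newick = to_newick(graft_and_prune(megatree, wanted)) + ";"
print(newick)
```

Species within a family are grafted as an unresolved polytomy, which is why the output is a topology without branch lengths.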

This suggests that if a more general tool can be built, it will be extraordinarily useful, especially if

it is an open standard that can be implemented in many ways
the back-end data store is populated with large phylogenies available for fungi, fish, mammals, butterflies, etc (not just plants)
the core functionality (name-matching, grafting & pruning) is modularized in open-source bioinfo toolboxes
methods for adding branch lengths are easier and more generalized
all of the above operations are wrapped up as web services that can be invoked from existing computing environments

If this were a web service, we could plug it into Mesquite, and users could load up their species-based character matrix, then get a tree for it. In fact, let's go back a step, to consider users with only a list of species, and no data to compare: consider an even more open-ended discovery environment, which we could implement in Galaxy or Taverna (given that this is all based on web services). The user starts with a list of species (or a higher taxon), and a request for some useful types of data that could be obtained by querying various available sources, e.g., whether a species has a cytochrome oxidase sequence in GenBank, whether it is found in California, where the nearest specimen is, etc.

resources: software, references, tutorials, and other useful links

Add links to papers, websites, code, tutorials, etc that would help people get up to speed on any of the proposed tasks.

about the Phylotastic project
- Phylotastic slide presentation in ppt or PDF format
pruning and grafting
- Phylomatic web home
- Phylomatic: tree assembly for applied phylogenetics (PDF) by Webb & Donoghue, 2005
- Rutger's proof-of-concept uses map-reduce
- Phylogenetic Diversity within Seconds shows that pruning-grafting with 10^5 leaf nodes can be done in seconds.
about phyloinformatics web services APIs
- PhyloWS
- TreeBASE web API, an example of phyloWS
- CIPRES REST API (basic) (note that the service is no longer online)
Taxonomic Name Resolution
- iPlant's TNRS
- taxize, an R package that interfaces with phylomatic, TNRS, ITIS, uBio, EoL
ideas and resources for species-wise mashups
- Rod Page's http://ispecies.org creates an on-the-fly web page for a species based on info from NCBI, google scholar, etc
standards for representing data
- "NeXML: rich, extensible, and verifiable representation of comparative data and metadata" describes an extensible XML format for comparative data
adaptable viz environments
adaptable workflow environments

after the hackathon

Opportunities right after the hackathon to build on the phylotastic momentum

do a challenge project for Geneious, present it at iEvoBio
Develop PhylotasticDILS2012 poster for Data Integration in the Life Sciences conference
develop slide presentation to accompany PhylotasticiEvoBio abstract for iEvoBio 2012
do the iEvoBio challenge at iEvoBio
work on Galaxy integration at a workshop
- ISMB in Long Beach, July 13: Bioinformatics Software Interoperability (BSI SIG) - approaches to interoperability, including Cytoscape, Galaxy, GenePattern, GenomeSpace, and others, including the opportunity to adapt tools to one of these environments in a hackathon session (http://www.broadinstitute.org/bsi-sig/)
- go to Chicago, July 25 to 27 for the 2012 Galaxy Community Conference (GCC2012, http://galaxyproject.org/GCC2012).

Manuscript

Phylotastic design

don't look at this

This is the draft design section from pre-hackathon planning. It has been superseded by the work of the architecture subgroup.

goal statement

Statement of goals. 1. Build phylotastic, a collection of interoperable web services that collectively provide the means to extract a subtree (specified by tips) from any of several large species trees, and to supply branch lengths and provenance annotation. 2. For demonstration purposes, leverage these services within a graphical interface that also integrates the resulting species tree with the user's choice of several high-value types of data. Optionally, this may involve adapting an existing environment (e.g., Galaxy, Taverna) to manage a phylotastic workflow.

inputs and outputs for a simple case

inputs = {

the user's list of species { S }; # the main input under the control of user
optionally, the user's character data, one row for each species in { S } ;
repository of megatrees that we have built for the project ;
any information on { S } conveniently available online via web services (e.g., NCBI, GBIF)

}

outputs = {

phylogeny (with branch lengths) including only species in { S }; # main output
optionally, user's comparative data with tree (NEXUS or NeXML), ready for phylogenetic character analysis;
optionally, a mash-up with other information on { S } from online resources

}

where this output is presented graphically in some viewer that is relatively adaptable, e.g., Mesquite.
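As one way to picture the optional "comparative data with tree" output, the sketch below writes a species list, a character matrix, and a tree into a single NEXUS file that a viewer could open. The function name `to_nexus`, the tree label `phylotastic`, and the example data are assumptions made for illustration. Only standard NEXUS blocks (TAXA, CHARACTERS, TREES) are used, and species without character data get all-missing rows so that there is one row for each species in { S }.

```python
from typing import Dict, List, Optional


def to_nexus(species: List[str], newick: str,
             characters: Optional[Dict[str, str]] = None) -> str:
    """Bundle taxa, optional character rows, and a Newick tree as NEXUS text."""
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={len(species)};",
        "  TAXLABELS " + " ".join(species) + ";",
        "END;",
    ]
    if characters:
        # Assumes every supplied row has the same number of characters.
        nchar = len(next(iter(characters.values())))
        lines += [
            "BEGIN CHARACTERS;",
            f"  DIMENSIONS NCHAR={nchar};",
            "  FORMAT DATATYPE=STANDARD MISSING=?;",
            "  MATRIX",
        ]
        # One row per species; species with no data are filled with '?'.
        lines += [f"    {s} {characters.get(s, '?' * nchar)}" for s in species]
        lines += ["  ;", "END;"]
    lines += ["BEGIN TREES;", f"  TREE phylotastic = {newick}", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical user input: the list { S }, a pruned tree, and partial character data.
species_list = ["Quercus_alba", "Rosa_canina", "Malus_domestica"]
tree = "(Quercus_alba,(Malus_domestica,Rosa_canina));"
matrix = {"Quercus_alba": "01", "Rosa_canina": "10"}

nexus_text = to_nexus(species_list, tree, matrix)
print(nexus_text)
```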

a bit more about the issue of integration and mashups

The main work of this project is to develop the "engine", the stuff that is "under the hood". But if this is going to benefit users all over the world, we need to show what the engine can do. For this reason, a substantial fraction of the energy will be devoted to creating integration tools that combine the engine of phylotastic, with species information that is easily gathered via existing services, such as:

images of an individual of the species, collected from EoL or wikipedia; or silhouettes from phylopic
geographic distribution of the species, from GBIF
the location of the nearest museum specimen of the species
whether a genome is available for this species, from NCBI
the number of protein sequences known for this species, from NCBI
the rDNA or cytochrome C sequence for this species, if available from NCBI
the average

thinking about phylotastic in an MVC design pattern

background This is an application of the Model-View-Controller (MVC) design pattern (http://en.wikipedia.org/wiki/Model–view–controller, or see the discussion here: http://msdn.microsoft.com/en-us/library/ff649643.aspx). In the design sketched below, the model (the M in MVC) is precisely the USER's tree. This may sound odd at first, if you've been thinking of "phylotastic" as a centralized resource with back-end megatrees at its heart. The design below gives us considerable freedom (to imagine different kinds of phylotastic implementations) by abstracting the operations away from the model. It frees us from thinking of a conventional workflow, because many operations can be done asynchronously (e.g., we can decorate OTUs with images before or after getting the topology). Because of this potential for multiple asynchronous operations, it may be helpful to add an "Observer" element to the MVC design.

model The "model" is the user's tree along with its metadata. Of course, the user typically doesn't begin with a tree, but with a kind of pre-tree. In mathematics it's OK for a "graph" to be a set of unlinked nodes. We'll borrow that way of thinking and imagine that the initial state of the tree is (typically) a list of OTUs that will become the terminal nodes. The final state of the tree typically is a fully connected tree with a topology and branch lengths. The final tree may be missing some nodes that could not be found. Also, there may be annotations of individual nodes, and annotations (metadata) for the tree-as-a-whole (e.g., this tree was assembled on a particular date by a particular service).

operations If that is the model, then here is how we would conceptualize the KINDS of operations that update the model:

a "TNRS" updates the model by replacing input OTU names, or annotating input names, with qualified OTU names.
a "topology service" updates the model by linking some or all of the OTUs into a connected graph
a "scaling service" updates the model by estimating the lengths of branches connecting nodes
a decorating or annotating service updates the model by adding annotations to nodes or branches, such as
- collecting images of OTUs
- gathering fossil-based dates for internal nodes
- assessing quality or reliability of a node
- and so on

In addition

every service updates the model by adding provenance information (e.g., describing how it has modified the model)
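The kinds of operations listed above might look like this in code. It is a rough sketch only: the `TreeModel` class, the three service functions, the synonym table, the `parent_of` lookup standing in for a megatree, and the unit branch lengths are all hypothetical placeholders rather than real Phylotastic services. The model starts as a set of unlinked OTUs, and each service both updates it and appends a provenance record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TreeModel:
    """The user's tree: initially just a list of unlinked OTUs."""
    otus: List[str]
    # Each edge is (parent, child, branch_length); length is None until scaled.
    edges: List[Tuple[str, str, Optional[float]]] = field(default_factory=list)
    provenance: List[str] = field(default_factory=list)


def tnrs_service(model: TreeModel, synonyms: Dict[str, str]) -> None:
    """Replace input OTU names with qualified names from a lookup table."""
    resolved = [synonyms.get(name, name) for name in model.otus]
    changed = sum(1 for old, new in zip(model.otus, resolved) if old != new)
    model.otus = resolved
    model.provenance.append(f"tnrs: corrected {changed} of {len(resolved)} names")


def topology_service(model: TreeModel, parent_of: Dict[str, str]) -> None:
    """Link OTUs into a connected graph by walking child -> parent lookups."""
    seen = set()
    for otu in model.otus:
        child = otu
        while child in parent_of:  # stops at the root, which has no parent
            edge = (parent_of[child], child)
            if edge not in seen:
                seen.add(edge)
                model.edges.append((edge[0], edge[1], None))
            child = parent_of[child]
    model.provenance.append(f"topology: added {len(model.edges)} edges")


def scaling_service(model: TreeModel, default_length: float = 1.0) -> None:
    """Placeholder scaling: give every branch the same length."""
    model.edges = [(p, c, default_length) for p, c, _ in model.edges]
    model.provenance.append(f"scaling: set all branch lengths to {default_length}")


model = TreeModel(otus=["Quercus albus", "Rosa canina"])
tnrs_service(model, {"Quercus albus": "Quercus alba"})
topology_service(model, {"Quercus alba": "root", "Rosa canina": "root"})
scaling_service(model)
print(model.edges)
print(model.provenance)
```

Because each function touches only the model, the same operations could be called in a different order or by different controllers.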

controllers and views The typical view of the model is going to be a phylogeny or an OTU-based table. A controller invokes services to modify the model (the user's tree) in response to user commands. Frequently we have discussed phylotastic in terms of automated controllers, such as workflow engines that manage the inputs and outputs of a series of operations. But we also could think of an interactive controller.
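To make the controller and "Observer" ideas concrete, here is a small sketch using Python's asyncio. Everything in it is assumed for illustration: the `TreeModel` class with `subscribe` and `update`, the two stand-in services (which sleep briefly instead of calling a remote server), and the star-tree topology they return. The controller launches both services concurrently, and an observer is notified whenever either one updates the model, so a view could redraw as results arrive in any order.

```python
import asyncio
from typing import Callable, Dict, List


class TreeModel:
    """The user's tree plus a list of observers notified on every update."""

    def __init__(self, otus: List[str]) -> None:
        self.otus = list(otus)
        self.state: Dict[str, object] = {}
        self._observers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._observers.append(callback)

    def update(self, key: str, value: object) -> None:
        self.state[key] = value
        for notify in self._observers:
            notify(key)  # tell each observer which part of the model changed


async def topology_service(model: TreeModel) -> None:
    await asyncio.sleep(0.02)  # stands in for a remote call
    # Placeholder result: an unresolved star tree over the OTUs.
    model.update("topology", "(" + ",".join(model.otus) + ");")


async def image_service(model: TreeModel) -> None:
    await asyncio.sleep(0.01)  # stands in for a remote call
    model.update("images", {otu: f"{otu}.png" for otu in model.otus})


async def controller(model: TreeModel) -> None:
    """Invoke both services concurrently; neither waits for the other."""
    await asyncio.gather(topology_service(model), image_service(model))


events: List[str] = []
model = TreeModel(["Quercus_alba", "Rosa_canina"])
model.subscribe(events.append)  # a view would redraw here instead of logging
asyncio.run(controller(model))
print(events)
```

An interactive controller could use the same model and observer hooks, calling one service at a time in response to user commands.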

Architecture
