Phylotastic/shiny

From Evolutionary Interoperability and Outreach
Revision as of 17:10, 10 June 2012 by Hilmar

Members

Quick links

Overview

In the Day 1 discussion, Mark Holder suggested it was important to make something "shiny" to showcase phylotastic capabilities. This gave the "shiny" group its name. The group aims to develop a small set of well-documented, user-oriented demonstration projects that show what phylotastic can do.

Progress reports

Day 2. We developed a list of five targets, gathered feedback from other participants, and then prioritized the list. We plan to implement the targets in order of priority. One sticking point in our discussion was what the task actually was. Some members thought it was to build a multi-capable web front end. Others thought it was to build several stand-alone demos targeted at specific use cases. We compromised by deciding to build several separate demos, one of which is a multi-capable web front end; that demo was given high priority. Other decisions and conclusions:

  • Christian is excited about getting to work on reconcili-o-tastic
  • Chris and Meg agree on a web-based implementation using Python and JavaScript; both have experience building this kind of application.

Day 3. Today we did the following:

  • installed the web2py web framework
  • set up a web home for phyloshiny
  • made mock-ups of some web interfaces
  • designed and implemented most of the parts for reconcili-o-tastic (source code is on Google Code)
    • see Christian's demo for getting species and gene names from an input gene tree
  • developed precise test files and data inputs for the interface
    • names in the ToL and mammal trees
    • test case for reconciliation using a phylotastic tree via Rutger's service

Day 4. Today we did the following:

  • developed Java code to preprocess gene trees (obtain species names from EBI; delete nodes for which a species cannot be established); see the gene tree preprocessing download
  • did a lot of web2py back-end work to get reconcili-o-tastic working
  • worked more on test cases based on real data
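The preprocessing step that deletes nodes lacking a species assignment works in two parts. It first prunes those leaves from the gene tree, then suppresses any internal node left with a single child. The project's version is written in Java. The Python sketch below only illustrates the idea, under these assumptions: a tree is represented as nested tuples of leaf labels, `prune_unknown` is a hypothetical helper, and the label-to-species mapping is made up, with `None` marking a failed species lookup.

```python
from typing import Dict, Optional, Tuple, Union

# A tree is either a leaf label or a tuple of child subtrees (hypothetical representation).
Tree = Union[str, Tuple["Tree", ...]]


def prune_unknown(tree: Tree, species_of: Dict[str, Optional[str]]) -> Optional[Tree]:
    """Drop leaves with no known species and collapse nodes left with one child.

    Returns None if every leaf under `tree` was dropped.
    """
    if isinstance(tree, str):
        # Keep the leaf only if a species was established for it.
        return tree if species_of.get(tree) else None
    kept = [sub for sub in (prune_unknown(child, species_of) for child in tree)
            if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # an internal node with one child adds no branching
    return tuple(kept)


# Hypothetical gene labels; "g3" stands for a sequence whose species lookup failed.
species_of = {"g1": "Homo sapiens", "g2": "Pan troglodytes", "g3": None, "g4": "Mus musculus"}
gene_tree = (("g1", "g2"), ("g3", "g4"))
print(prune_unknown(gene_tree, species_of))  # (('g1', 'g2'), 'g4')
```

Removing "g3" leaves its parent with only "g4", so that parent is replaced by "g4" itself rather than kept as a one-child node.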

Day 5. Today we did the following

Plan of action

  • (done) Develop a list of targets. Each target is a shiny demo that showcases a phylotastic capability
  • (done) Solicit feedback & additional ideas from hackathon participants
  • (done) Prioritize the targets considering feasibility and impact
  • Implement targets in order of priority
    • (1) Reconcili-o-tastic
      • web interface (done; needs to be connected to other services once they are ready)
    • (2) Phylotastic Web Front-end
      • Mock-ups started
    • (2) Phylo-Taxic
    • (3) Character Analysis Workflow Integration
    • (3) Tree from PDF

Prioritized targets

Reconcili-o-tastic (priority level 1)

Synopsis: Starting with only a gene tree, carry out gene-tree/species-tree reconciliation.

Rationale: This is directly useful to molecular evolution researchers. Genome annotation, in particular functional annotation, depends on distinguishing orthology from paralogy. This is best done by phylogenetic tree reconciliation, although ad hoc approaches are also possible. Because of its role in genome annotation and genome analysis pipelines, reconciliation is potentially a high-volume use case for phylotastic services.

Current reconciliation approaches assume that the user will supply a species tree. According to Christian, this is a significant barrier: the user must first figure out which species are implicated by the gene (protein) tree, and then find a tree for those species. To our knowledge, no current tool automatically obtains the species phylogeny to use in reconciliation. However, the EnsemblCompara pipeline [1], and possibly some other reconciliation packages, can obtain a species "tree" using a tool that prunes a topology from the NCBI taxonomy hierarchy.
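The taxonomy-based fallback mentioned at the end of the rationale can be pictured as follows. Given each species' lineage (a root-to-tip list of taxon names), nest the lineages into one hierarchy and skip ranks that do not branch. The result is a Newick topology that is usually only partly resolved. This is a minimal sketch of the general idea, not the EnsemblCompara tool. The helper names are hypothetical, and the lineages are hand-written, abbreviated examples rather than records fetched from NCBI.

```python
from typing import Dict, List

Node = Dict[str, "Node"]  # taxon name -> child taxa (an empty dict marks a species)


def build_hierarchy(lineages: Dict[str, List[str]]) -> Node:
    """Merge root-to-tip lineages into one nested dict."""
    root: Node = {}
    for species, lineage in lineages.items():
        node = root
        for taxon in lineage:
            node = node.setdefault(taxon, {})
        node.setdefault(species, {})
    return root


def to_newick(children: Node) -> str:
    """Write the hierarchy as Newick, skipping ranks with a single child."""

    def render(name: str, sub: Node) -> str:
        # Descend through non-branching ranks; they contribute no topology.
        while len(sub) == 1:
            name, sub = next(iter(sub.items()))
        if not sub:
            return name  # a species (tip)
        # A rank with more than two children becomes a polytomy.
        return "(" + ",".join(render(n, s) for n, s in sorted(sub.items())) + ")"

    return render("", children) + ";"


# Abbreviated, hand-written lineages (underscores keep the Newick labels unquoted).
lineages = {
    "Homo_sapiens": ["Mammalia", "Primates", "Hominidae", "Homo"],
    "Pan_troglodytes": ["Mammalia", "Primates", "Hominidae", "Pan"],
    "Mus_musculus": ["Mammalia", "Rodentia", "Muridae", "Mus"],
    "Rattus_norvegicus": ["Mammalia", "Rodentia", "Muridae", "Rattus"],
}
print(to_newick(build_hierarchy(lineages)))
# ((Homo_sapiens,Pan_troglodytes),(Mus_musculus,Rattus_norvegicus));
```

Any rank containing three or more sampled lineages comes out as an unresolved polytomy. This is the main reason a taxonomy-derived topology is a weaker input to reconciliation than a real phylogeny.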

Preconditions

  • A gene tree

Steps

  1. User uploads a gene tree or selects one from a list ==> demo tool responds by
    1. uploading and consuming the gene tree
    2. displaying the gene tree
    3. parsing out accession numbers
    4. querying a DB service to get species names for the accession numbers
    5. querying a phylotastic server to get the species tree
    6. displaying the species tree (not yet implemented)
    7. reconciling the gene tree and species tree
    8. displaying the reconciled tree
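Step 7 is where the species tree is used. The demo delegates this step to Christian's SDI Java code. The Python sketch below is only an illustration of the general speciation/duplication inference idea, not that code. It assumes both trees are binary and given as nested tuples, and that each gene leaf already has a species assigned (steps 3 to 5). Each gene-tree node is mapped to the smallest species-tree clade containing all of its species, i.e., their lowest common ancestor. An internal node is called a duplication when it maps to the same clade as one of its children, and a speciation otherwise. All names and data here are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label or a binary (left, right) pair
Clade = FrozenSet[str]


def species_clades(tree: Tree) -> List[Clade]:
    """Return every clade (set of leaf species) of the species tree; the last is the root."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = species_clades(tree[0]), species_clades(tree[1])
    return left + right + [left[-1] | right[-1]]


def lca(species: Clade, clades: List[Clade]) -> Clade:
    """Smallest species-tree clade containing all the given species."""
    return min((c for c in clades if species <= c), key=len)


def leaf_labels(node: Tree) -> Tuple[str, ...]:
    if isinstance(node, str):
        return (node,)
    return leaf_labels(node[0]) + leaf_labels(node[1])


def reconcile(gene_tree: Tree, species_of: Dict[str, str], species_tree: Tree):
    """Label each internal gene-tree node 'speciation' or 'duplication', in post-order."""
    clades = species_clades(species_tree)
    events: List[Tuple[Tuple[str, ...], str]] = []

    def visit(node: Tree) -> Clade:
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]), clades)
        left, right = visit(node[0]), visit(node[1])
        mapped = lca(left | right, clades)
        # LCA-mapping rule: same mapping as a child implies a duplication.
        event = "duplication" if mapped in (left, right) else "speciation"
        events.append((leaf_labels(node), event))
        return mapped

    visit(gene_tree)
    return events


species_tree = (("human", "chimp"), "mouse")
gene_tree = ((("a_human", "a_chimp"), "a_mouse"), (("b_human", "b_chimp"), "b_mouse"))
species_of = {g: g.split("_")[1] for g in leaf_labels(gene_tree)}
for leaves, event in reconcile(gene_tree, species_of, species_tree):
    print(event, leaves)
```

In this toy input, the "a" and "b" gene copies each follow the species tree, so their internal nodes are speciations. The root, which joins two copies that both span all three species, is labeled a duplication. That root is exactly the paralogy signal the rationale refers to.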

Approach

  • Sample data are in the sample_data directory on GitHub (see also the example input gene tree)
  • Use NCBI E-utilities to go from GI (accession) number to taxid to species name
  • Use Rutger's phylotastic demo server
  • Do the reconciliation in the back end using Christian's "SDI" Java code
  • Display the reconciled tree using an embedded Archaeopteryx applet (see the Archaeopteryx applet manual)
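The GI-to-species lookup in the second bullet could be done with two ESummary calls to NCBI E-utilities. The first queries the protein database to get each record's TaxId. The second queries the taxonomy database to get each taxid's ScientificName. The sketch below assumes those item names as they appear in the classic ESummary XML layout. It only builds request URLs and parses responses; the HTTP call itself is left out. The sample XML is hand-written to mirror that layout rather than copied from a real response, the UIDs 1001 and 1002 are placeholders, and the helper names are hypothetical. Real use is subject to NCBI's usage policies, such as rate limits and identifying the calling tool.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Optional
from urllib.parse import urlencode

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(db: str, uids: Iterable[str]) -> str:
    """Build one ESummary request for a batch of UIDs (sent comma-separated)."""
    return ESUMMARY + "?" + urlencode({"db": db, "id": ",".join(uids)})


def parse_items(xml_text: str, item_name: str) -> Dict[str, str]:
    """Map each <DocSum>'s <Id> to the text of its <Item Name="item_name">."""
    found: Dict[str, str] = {}
    for docsum in ET.fromstring(xml_text).iter("DocSum"):
        uid = docsum.findtext("Id")
        for item in docsum.findall("Item"):
            if item.get("Name") == item_name and uid is not None:
                found[uid] = item.text or ""
    return found


def gi_to_species(gi_to_taxid: Dict[str, str],
                  taxid_to_name: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Chain the two lookups; None marks a GI whose taxid had no name."""
    return {gi: taxid_to_name.get(taxid) for gi, taxid in gi_to_taxid.items()}


# Hand-written stand-ins for the two responses (placeholder UIDs, not real records).
PROTEIN_XML = """<eSummaryResult>
<DocSum><Id>1001</Id><Item Name="TaxId" Type="Integer">9606</Item></DocSum>
<DocSum><Id>1002</Id><Item Name="TaxId" Type="Integer">10090</Item></DocSum>
</eSummaryResult>"""
TAXONOMY_XML = """<eSummaryResult>
<DocSum><Id>9606</Id><Item Name="ScientificName" Type="String">Homo sapiens</Item></DocSum>
<DocSum><Id>10090</Id><Item Name="ScientificName" Type="String">Mus musculus</Item></DocSum>
</eSummaryResult>"""

print(esummary_url("protein", ["1001", "1002"]))
taxids = parse_items(PROTEIN_XML, "TaxId")
# In a live run, the second request would be esummary_url("taxonomy", taxids.values()).
print(gi_to_species(taxids, parse_items(TAXONOMY_XML, "ScientificName")))
```

Batching all identifiers into one request per database keeps the number of calls to two, however large the gene tree is.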

Generalized web interface (priority level 2)

Synopsis: Provide users with a flexible way to access phylotastic services and integrate the results with available species data.

Rationale: This demo does not target a particular use case; instead, it aims to demonstrate flexibility. To our knowledge, nothing similar currently exists.

Steps

  • input species lists
    • user types names with auto-completion
    • pre-determined lists (e.g., pet species, scary things, bugs that live in your gut)
  • user chooses metadata (lists or checkboxes)
    • pics?
    • genome availability from NCBI, via an E-utilities EBot script that returns a UID (File:Genome mining.txt)
    • gene marker availability from NCBI (e.g., cytC, cytOX, 16S rDNA)
  • user chooses output formats
  • user selects "get tree" ==> app responds by
    1. obtaining the phylogeny and metadata
    2. displaying the phylogeny
    3. displaying notes about how we got this tree phylotastically

Approach

  • Lists of names are in the sample_data directory on GitHub; we can match against these to make sure that the user's species list can be satisfied.
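Checking a user's list against the curated name lists, and offering completions as the user types, needs little more than the Python standard library. In this sketch, the name list is a hard-coded placeholder rather than the contents of sample_data, and the function names are hypothetical. Completion is a case-insensitive prefix match. Near misses are suggested with `difflib.get_close_matches`.

```python
import difflib
from typing import Dict, List, Tuple

# Placeholder name list; the real lists live in the sample_data directory.
KNOWN_NAMES = ["Canis lupus", "Felis catus", "Homo sapiens", "Mus musculus", "Rattus norvegicus"]


def complete(prefix: str, names: List[str] = KNOWN_NAMES, limit: int = 5) -> List[str]:
    """Case-insensitive prefix completion, as a type-ahead box might use."""
    p = prefix.strip().lower()
    return [name for name in names if name.lower().startswith(p)][:limit]


def check_list(user_names: List[str],
               names: List[str] = KNOWN_NAMES) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split the user's list into known names and unknown names with close-match suggestions."""
    known = set(names)
    matched = [name for name in user_names if name in known]
    unmatched = {
        name: difflib.get_close_matches(name, names, n=3, cutoff=0.8)
        for name in user_names
        if name not in known
    }
    return matched, unmatched


print(complete("ra"))                                   # ['Rattus norvegicus']
print(check_list(["Mus musculus", "Felis cattus"]))
```

A misspelled name returns suggestions rather than failing silently. The interface can then ask the user to confirm the suggestion before any tree is requested.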

Phylo-Taxic (priority level 2)

Synopsis: From a user-supplied taxon name, provide a tree for a subset of its species.

Rationale: This is directed at relatively undemanding users who just want a phylogeny representing the diversity of a higher-level taxon, e.g., for a Wikipedia page. ToLWeb provides taxon pages, but they typically show only a few levels of a taxon hierarchy rather than a complete tree.

Preconditions

  • a taxon name, possibly a common name

Steps

  • user inputs a name (e.g., Rodentia, Lagomorpha)
  • resolve that into a list of species, e.g., via NCBI taxonomy
  • sample from the list of species by user-selected criteria:
    • retain species with an available genome sequence
    • retain only 1 species per genus
    • retain species with the most popular Wikipedia pages
  • obtain a phylogeny for the species phylotastically
  • display the phylogeny along with notes about how we got this tree phylotastically
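The sampling criteria in the third step are simple filters over a species list. The sketch below assumes each species record already carries two values: a genome-availability flag and a popularity score, such as a page-view count. How those values are obtained is left open. The `Species` record, its field names, and the `sample` function are hypothetical. "One species per genus" keeps the most popular species in each genus, taking the genus to be the first word of the binomial.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Species:
    name: str          # binomial, e.g. "Mus musculus"; the genus is the first word
    has_genome: bool   # hypothetical flag from a genome-availability lookup
    popularity: int    # hypothetical score, e.g. Wikipedia page views


def sample(species: List[Species], require_genome: bool = False,
           one_per_genus: bool = False, top_n: Optional[int] = None) -> List[str]:
    """Apply the user-selected criteria, most popular species first."""
    pool = [s for s in species if s.has_genome or not require_genome]
    pool.sort(key=lambda s: s.popularity, reverse=True)
    if one_per_genus:
        seen = set()
        kept = []
        for s in pool:
            genus = s.name.split()[0]
            if genus not in seen:  # pool is sorted, so the first hit is the most popular
                seen.add(genus)
                kept.append(s)
        pool = kept
    if top_n is not None:
        pool = pool[:top_n]
    return [s.name for s in pool]


# Invented values for illustration only; not real genome or page-view data.
rodents = [
    Species("Mus musculus", True, 100),
    Species("Mus spretus", True, 20),
    Species("Rattus norvegicus", True, 80),
    Species("Castor canadensis", False, 60),
    Species("Sciurus vulgaris", False, 50),
]
print(sample(rodents, require_genome=True, one_per_genus=True))
```

Because the criteria compose, the same function covers each checkbox on its own or any combination of them.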

Approach

Facilitating phylogenetic comparative analysis (priority level 3)

Synopsis: Obtain a tree for the species referenced in a user-supplied character matrix, ideally within an analysis environment familiar to users.

Rationale: This demo reflects use cases that culminate in a phylogenetic comparative analysis of species traits (e.g., the fishy, Riek, and Walls use cases on the use-cases wiki). It is similar to an analysis that Naim described in his talk about the iPlant Discovery Environment.

Preconditions

  • Input matrices:
    • user-supplied NEXUS files
    • pre-selected NEXUS files
    • samples from the Leaf Economics Spectrum (LES) data of Wright et al., 2004
    • MorphoBank

Steps

  • user loads a file
  • user selects the option to get a tree ==> app responds by
    • obtaining a phylogeny for the species phylotastically
    • displaying the phylogeny along with notes about how we got this tree phylotastically
  • user continues with downstream tree-dependent analysis steps

Approach

  • Mesquite-o-tastic (see http://www.youtube.com/watch?v=Lak-zjwFuhQ&feature=youtube_gdata_player)
    1. Load a NEXUS file into Mesquite
    2. Select the (new) menu item Taxa & Trees > Get Tree Phylotastically
    3. Carry out visualization and analysis steps, e.g., tracing characters, reconstructing ancestral states
  • an R workflow implemented as an R script, using the LES data
    1. Load the LES data frame
    2. Randomly pick 25 species from the data frame
    3. Carry out pre-designed character analysis steps on the data
    4. Create a graph of the results to display
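Step 2 of the R workflow is worth making reproducible, so that a demo run can be repeated with the same 25 species. The plan calls for an R script; for consistency with the other sketches on this page, the step is shown here in Python with a fixed random seed. The function name, the default seed, the `Species` column name, and the tiny generated table are all hypothetical stand-ins. The actual layout of the LES data is not described here.

```python
import csv
import io
import random
from typing import List


def pick_species(csv_text: str, k: int = 25, seed: int = 2012,
                 column: str = "Species") -> List[str]:
    """Return up to k distinct species names, sampled reproducibly from a CSV table."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Deduplicate while preserving first-seen order, so the sample depends only on the seed.
    names = list(dict.fromkeys(row[column] for row in rows))
    rng = random.Random(seed)
    return rng.sample(names, min(k, len(names)))


# Hypothetical stand-in for the LES table: 40 made-up species with one trait column.
table = "Species,Trait\n" + "\n".join(f"Species_{i},{50 + i}" for i in range(40))
chosen = pick_species(table)
print(len(chosen), chosen[:3])
```

Using a private `random.Random` instance keeps the sample fixed for a given seed without touching the global random state.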

PDF-to-tree (priority level 3)

Synopsis: Obtain a tree for the species referenced in a PDF, e.g., a scientific article.

Rationale: Just because we can.

Preconditions: one of

  • a PDF document that refers to multiple species
  • a literature resource that includes or links to PDFs (Mendeley, EndNote, etc.)

Steps

  1. user chooses from
    • a user file
    • pre-selected PDFs that show different topology services or back-end trees
      • plant article (processed using the Phylomatic web API)
      • mammal article (processed using Rutger's demo)
  2. user selects "process" ==> app responds by
    1. obtaining a phylogeny for the species phylotastically
    2. displaying the phylogeny along with notes about how we got this tree phylotastically

Approach

  • name ideas: PaperTree
  • names can be parsed out with GNRD (Global Names Recognition and Discovery)
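GNRD is the suggested name finder. Purely to show the shape of this step, the sketch below substitutes a crude regular expression: it matches a capitalized genus followed by a lowercase epithet in text already extracted from the PDF. It is not how GNRD works, and `find_binomials` is a hypothetical helper. It will miss abbreviated names such as "M. musculus" and will produce false positives, such as a sentence-initial word followed by a lowercase word.

```python
import re
from typing import List

# Capitalized genus followed by a lowercase epithet of 3+ letters. A crude stand-in, not GNRD.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_binomials(text: str) -> List[str]:
    """Return candidate binomials in order of first appearance, without duplicates."""
    names: List[str] = []
    for genus, epithet in BINOMIAL.findall(text):
        name = f"{genus} {epithet}"
        if name not in names:
            names.append(name)
    return names


text = "Samples of Mus musculus and Rattus norvegicus were compared; Mus musculus grew faster."
print(find_binomials(text))  # ['Mus musculus', 'Rattus norvegicus']
```

The candidate list would still need to be checked against a name list or resolver before requesting a tree, since a pattern this loose accepts ordinary phrases like "The mouse".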

Notes from other participants

  • comparative analysis
    • use NeXML rather than NEXUS
  • people like
    • PDF-to-tree
    • Phylo-Taxic
    • web app
  • Suggestions
    • QR codes + phylotaxic + mobile app ==> way for citizens to see on-the-fly phylogeny for a group of organisms