Phylotastic/shiny

From Evolutionary Interoperability and Outreach


Members

Quick links

Overview

In the day 1 discussion, Mark Holder suggested it was important to make something "shiny" to showcase phylotastic capabilities. This led to the name of the "shiny" group, which aims to develop a small set of well-documented, user-oriented demonstration projects to show what phylotastic can do.

Progress reports

Day 2. We developed a list of 5 targets, got feedback from other participants, and then prioritized the list. Our plan is to implement them in order of priority. One of the sticking points in our discussion was that some members thought that our task was to build a multi-capable web front end, while others thought that our task was to build several stand-alone demos targeted at specific use-cases. We compromised by deciding to build several separate demos, one of them being a multi-capable web front end, which was given a high priority. Some other decisions and conclusions:

  • Christian is excited about getting to work on reconcili-o-tastic
  • Chris and Meg agree on a web-based implementation using Python and JavaScript, and both of them have experience doing this.

Day 3. Today we did the following

  • installed web server framework web2py
  • set up web home for phyloshiny
  • made mock-ups of some web interfaces
  • designed and implemented most of the parts for reconcili-o-tastic (source code is on Google Code)
    • see Christian's demo for getting species and gene name from input gene tree
  • developed precise test files and data inputs for interface
    • names in ToL and mammal trees
    • test case for reconciliation using phylotastic tree via Rutger's service

Day 4. Today we did the following

  • developed Java code to preprocess gene trees (obtain species names from EBI; delete nodes for which a species cannot be established); see: gene tree preprocessing download
  • did a lot of web2py back-end work to get reconcili-o-tastic running
  • worked more on test cases based on real data

Day 5. Today we did the following

Plan of action

  • (done) Develop a list of targets. Each target is a shiny demo that showcases something phylotastic
  • (done) Solicit feedback & additional ideas from hackathon participants
  • (done) Prioritize the targets considering feasibility and impact.
  • Implement targets in order of priority
    • (1) Reconcili-o-tastic
      • (done, needs to be connected to other services once they are ready) web interface
    • (2) Phylotastic Web Front-end
      • Mock-ups started
    • (2) Phylo-Taxic
    • (3) Character Analysis Workflow Integration
    • (3) Tree from PDF

Prioritized targets

Reconcili-o-tastic (priority level 1)

Synopsis Starting with only a gene tree, carry out gene-tree-species-tree reconciliation.

Rationale This is directly useful to molecular evolution researchers. Genome annotation, in particular functional annotation, depends on distinguishing orthology from paralogy, and this is best done via phylogenetic methods of tree reconciliation (although ad hoc approaches are also possible). Because of its association with genome annotation and genome analysis pipelines, reconciliation is potentially a high-volume use-case for phylotastic services. Current reconciliation approaches assume that the user will supply a species tree. According to Christian, this is a significant barrier: first the user must figure out which species are implicated by the gene (protein) tree, then the user must find a tree for those species. To our knowledge, there is no current tool that automatically gets the species phylogeny to use in reconciliation. However, in the case of the EnsemblCompara pipeline [1] and possibly some other reconciliation packages, a species "tree" may be obtained using a tool that prunes a topology from the NCBI taxonomy hierarchy.

Preconditions

  • A gene tree

Steps

  1. User uploads a gene tree or selects one from a list ==> demo tool responds by
    1. uploading and consuming gene tree
    2. displaying gene tree
    3. parsing out accession numbers
    4. querying a DB service to get species names for accession numbers
    5. querying phylotastic server to get species tree
    6. displaying species tree (not implemented)
    7. reconciling gene tree and species tree
    8. displaying reconciled tree

Approach

  • Sample data are in the sample_data directory on GitHub (see also example input gene tree)
  • Use NCBI e-utils to go from GI (Accn) to taxid to species name
  • Use Rutger's phylotastic demo server to get the species tree
  • Do reconciliation in back-end using Christian's "SDI" Java code
  • Display reconciled tree using embedded Archaeopteryx (Archaeopteryx applet manual)
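
The first steps of this approach (parse accession numbers out of the gene tree, then ask NCBI about them) can be sketched as below. This is only an illustration, not the demo's actual code: the leaf label format `gi|<number>|...`, the choice of the `protein` database, and the function names are all assumptions. The sketch only builds the E-utilities ESummary request URL; fetching it and reading the taxid from the response is not shown.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities ESummary endpoint.
ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def leaf_labels(newick):
    """Return the leaf labels of a Newick string, in order of appearance.

    A leaf label directly follows "(" or ","; internal node labels follow ")"
    and are therefore skipped. Quoted labels are not handled.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def gi_number(label):
    """Extract the GI number from a label like 'gi|12345|...' (assumed format)."""
    match = re.search(r"gi\|(\d+)", label)
    return match.group(1) if match else None


def esummary_url(gi_numbers, db="protein"):
    """Build an ESummary request URL for a batch of GI numbers.

    The database name is an assumption; a nucleotide tree would need another.
    """
    params = {"db": db, "id": ",".join(gi_numbers)}
    return ESUMMARY + "?" + urlencode(params, safe=",")


# Hypothetical three-leaf gene tree.
tree = "((gi|111|a:0.1,gi|222|b:0.2):0.3,gi|333|c:0.4);"
ids = [gi for gi in map(gi_number, leaf_labels(tree)) if gi]
url = esummary_url(ids)
print(url)
```

Labels that do not match the assumed pattern are dropped here, which loosely mirrors the preprocessing step of deleting nodes for which a species cannot be established.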

Generalized web interface (priority level 2)

Synopsis Provide users with a flexible way to access phylotastic services and integrate the results with available species data.

Rationale This demo does not target a particular use-case, but attempts to display flexibility. There is nothing similar to it presently.

Steps

  • input species lists
    • user types names with auto-completion
    • pre-determined lists (e.g., pet species, scary things, bugs that live in your gut)
  • user chooses metadata (lists or checkboxes)
    • pics?
    • genome availability from NCBI, via an E-utilities Ebot script that returns a UID (File:Genome mining.txt)
    • gene marker availability from NCBI (e.g., cytC, cytOX, 16S rDNA, etc.)
  • user chooses output formats
  • user selects "get tree" ==> app responds by
    1. Obtaining phylogeny and metadata
    2. Displaying phylogeny
    3. displaying notes about how we got this tree phylotastically

Approach

  • lists of names are in the GitHub sample_data directory; we can match against these to make sure that the user's species list can be satisfied.
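
A minimal sketch of that matching step, assuming the name lists are simple collections of binomials and that case and underscore-versus-space differences should be ignored (both assumptions; the function names are hypothetical):

```python
def normalize(name):
    """Lower-case a name and treat underscores and runs of spaces alike."""
    return " ".join(name.replace("_", " ").split()).lower()


def check_names(requested, available):
    """Split the user's names into those the service knows and those it lacks."""
    known = {normalize(name) for name in available}
    matched = [name for name in requested if normalize(name) in known]
    missing = [name for name in requested if normalize(name) not in known]
    return matched, missing


# Hypothetical lists for illustration only.
available = ["Homo_sapiens", "Mus_musculus", "Canis_lupus"]
requested = ["Homo sapiens", "mus_musculus", "Felis catus"]
matched, missing = check_names(requested, available)
print("can satisfy:", matched)
print("cannot satisfy:", missing)
```

Reporting the missing names back to the user, rather than silently dropping them, would also give a natural hook for a name-resolution step later on.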

Phylo-Taxic (priority level 2)

Synopsis From a user-supplied taxon name, provide a tree for a subset of its species.

Rationale This is directed at relatively non-demanding users who just want a phylogeny to represent the diversity of a higher-level taxon, e.g., for the purpose of making a Wikipedia page. ToLWeb provides taxon pages, but they typically do not show a complete tree, only a few levels of a taxon hierarchy.

Preconditions

  • taxon name, possibly common name

Steps

  • user inputs name (Rodentia, Lagomorpha)
  • resolve that into a list of species, e.g., via NCBI taxonomy
  • sample from the list of species by user-selected criteria:
    • retain species with an available genome sequence
    • retain only 1 species per genus
    • retain species with the most popular Wikipedia pages
  • Obtain phylogeny for species phylotastically
  • Display phylogeny along with notes about how we got this tree phylotastically
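
One of the sampling criteria above, retaining only 1 species per genus, could look like the following sketch. It assumes names are binomials whose first word is the genus and simply keeps the first species seen for each genus; a real implementation might instead prefer the species with a genome or the most popular page.

```python
import re


def genus_of(name):
    """Return the genus, taken to be the first word of a binomial name."""
    return re.split(r"[ _]+", name.strip(), maxsplit=1)[0].capitalize()


def one_per_genus(species):
    """Keep the first species encountered for each genus, preserving order."""
    kept = {}
    for name in species:
        kept.setdefault(genus_of(name), name)
    return list(kept.values())


# Hypothetical species list for illustration only.
species = ["Rattus norvegicus", "Rattus rattus", "Mus_musculus", "Lepus europaeus"]
sample = one_per_genus(species)
print(sample)
```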

Approach We might be able to make use of some stuff that Jim Balhoff has produced, which includes the NCBI taxonomy. Jim writes:

"I'm finally getting around to sending a link to my RDF-based tree pruner. I loaded the OBO foundry version (already RDF) of the NCBI taxonomy into the Virtuoso triplestore, then wrote a SADI service which takes an RDF document containing a tree node and the required leaves. It returns an RDF document containing the actual tree.
The actual structure of the RDF needs some refinement, along with the class definitions I've used to describe the input and output for SADI, but I just wanted to make it run, first.
You can post the attached document to the service like so:
curl -X POST --header "Content-Type:application/rdf+xml" -d @sadi-input.rdf http://pkb.nescent.org/phylotastic/tree
The source is here:
https://github.com/balhoff/phylotastic-sadi"
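
Since the shiny demos are built in Python, the curl call quoted above could be issued roughly as follows with the standard library. This is a sketch only: the service URL and content type come from Jim's message, but the function names are hypothetical, the format of the response is not known here, and the service may no longer be reachable.

```python
from urllib.request import Request, urlopen

# Service URL as given in the quoted message.
SERVICE = "http://pkb.nescent.org/phylotastic/tree"


def build_request(rdf_bytes):
    """Build a POST request carrying an RDF/XML document."""
    return Request(
        SERVICE,
        data=rdf_bytes,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def prune_tree(path):
    """Post the RDF file at `path` and return the response body as text."""
    with open(path, "rb") as handle:
        request = build_request(handle.read())
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; prune_tree() would.
request = build_request(b"<rdf:RDF/>")
print(request.get_method(), request.full_url)
```

Calling `prune_tree("sadi-input.rdf")` would correspond to the curl command, given the input document attached to the original message.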

Facilitating phylogenetic comparative analysis (priority level 3)

Synopsis Obtain a tree for the species referenced in a user-supplied character matrix, ideally in the context of an analysis environment familiar to users.

Rationale This demo reflects use-cases that culminate in a phylogenetic comparative analysis of the traits of species (e.g., on the usecases wiki, the fishy, Riek and Walls use-cases). It is similar to an analysis that Naim described in his talk about the iPlant discovery environment.

Preconditions

  • Input matrices:
    • user-supplied NEXUS files
    • pre-selected NEXUS files
    • samples from Leaf Economics Spectrum (LES) data of Wright, et al., 2004
    • MorphoBank

Steps

  • user loads file
  • user selects option to get tree ==> app responds by
    • obtaining phylogeny for species phylotastically
    • Displaying phylogeny along with notes about how we got this tree phylotastically
  • user continues with downstream tree-dependent analysis steps

Approach

  • Mesquite-o-tastic
    1. Load NEXUS into Mesquite
    2. select (new) menu item Taxa & Trees > Get Tree Phylotastically
    3. carry out visualization and analysis steps, e.g., trace characters, ancestral states, etc
  • an R workflow implemented as an R script. Use LES data.
    1. Load LES data frame
    2. Randomly pick 25 species from data frame
    3. Carry out pre-designed character analysis steps on the data
    4. Create a graph of results to display

Mesquite-o-tastic demo

  • Screencast at http://www.youtube.com/watch?v=Lak-zjwFuhQ&feature=youtube_gdata_player
    • go to MorphoBank
    • get study 291
    • open it in Mesquite
      • Delete all taxa but the first 5
      • change the first one to E. europaeus.
  • looking for other possible mammal data sets from MorphoBank, without much luck. Most of these data sets do not have fully resolved species as OTUs, e.g., project 246. They appear to be composite matrices with no resolution below the genus level. Sometimes the name is just the genus name, and sometimes it is "Genus spp." Where a name is given as "Genus species", this probably means they only had data for one species.
    • Project 417: Lehmann, T. 2009. Phylogeny and systematics of the Orycteropodidae (Mammalia, Tubulidentata). Zoological Journal of the Linnean Society. 155 (3):649–702. This has 12 taxa, but the phylotastic query only gets 3 of them.
    • Project 696: Macrini, T. E. 2012. Comparative morphology of the internal nasal skeleton of adult marsupials based on X-ray computed tomography. Bulletin of the American Museum of Natural History. 365:1-91. This has 34 taxa, an excellent case, but it only lists the genus.
    • Bingo: Project 216: Voss, R. S., and S. A. Jansa. 2009. Phylogenetic Relationships and Classification of Didelphid Marsupials, an Extant Radiation of New World Metatherian Mammals. Bulletin of the American Museum of Natural History. 322:1-177. This has 51 taxa, and we can get most of them.
    • Second best: Project 599: Velazco P.M. 2005. Morphological Phylogeny of the Bat Genus Platyrrhinus Saussure, 1860 (Chiroptera: Phyllostomidae) with the Description of Four New Species. Fieldiana, Zoology. 105:1-53. This has 18 taxa, all bat species, and I can get most of them.
  • revised screencast plan - updates: no longer need to delete taxa; found better source study; want to explain services better
    • MorphoBank-Mesquite
      • go to MorphoBank, open study 216, select matrices, download NEXUS
      • open in Mesquite, get tree
      • trace chars
    • what's under the hood?
      • go to windows: log to see web services invocation; go there
      • MapReduce server examples
    • explain larger project
      • TNRS - extract out some missing names, see if TNRS tells us anything: positive example T. macrurus --> T. macrura
      • DateLife - try Caluromys_philander,Gracilinanus_agilis,Perameles_gunnii
      • Galaxy controller

PhyloWidgetastic

Using PhyloWidget's get-tree-by-URL feature

  • see the Phylotastic/Use_Cases for the use of PhyloWidget to display a phylogeny corresponding to that used by Riek, 2011. Here is the complete URL
http://phylotastic-wg.nescent.org/script/phylotastic.cgi?species=Bettongia_penicillata,Macropus_eugenii,Pseudocheirus_peregrinus,Phyllostomus_hastatus,Suricata_suricatta,Mustela_vison,Mephitis_mephitis,Felis_silvestris,Canis_lupus,Ursus_americanus,Ursus_arctos,Cystophora_cristata,Erignathus_barbatus,Halichoerus_grypus,Phoca_groenlandica,Callorhinus_ursinus,Arctocephalus_australis,Equus_caballus,Lama_glama,Camelus_dromedarius,Sus_scrofa,Ovis_aries,Bos_taurus,Capra_hircus,Capra_ibex,Oreamnos_americanus,Ovibos_moschatus,Cervus_elaphus,Alces_alces,Odocoileus_hemionus,Rangifer_tarandus,Cephalophus_monticola,Gazella_dorcas,Papio_hamadryas,Homo_sapiens,Rattus_norvegicus,Mus_musculus,Cavia_porcellus,Oryctolagus_cuniculus,Lepus_europaeus&format=newick&tree=mammals
  • Greg Jordan writes:
I've run into this URL problem before, and it has to do with the way PhyloWidget parses URL parameters. If you paste a URL with "&" symbols into the main text box ("PhyloWidget Quickstart") then it will be passed to the next page as a key-value URL parameter... and I didn't do any sort of smart escaping (d'oh!), so the "&" in Rutger's URL breaks the way the PhyloWidget javascript tries to parse the URL parameters.
A solution is to use the "PhyloWidget Full" applet, and to go File -> Load Tree ->Manual Input..., then paste the URL into the text box that shows up. The URL should be auto-recognized and the tree should load (at least it did for me when I tried just now...)
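
For reference, a query URL of the form shown above can be assembled from a species list as in this sketch. The parameter names (`species`, `format`, `tree`) are taken from the example URL; the function names are hypothetical. The second function percent-encodes the whole URL so that it contains no bare "&", which is the generic way to pass one URL as a parameter of another. Whether PhyloWidget would decode such a value correctly is an assumption and has not been tested, so Greg's workaround remains the known-good route.

```python
from urllib.parse import quote, unquote, urlencode

# Script location as it appears in the example URL.
BASE = "http://phylotastic-wg.nescent.org/script/phylotastic.cgi"


def phylotastic_url(species, tree="mammals", fmt="newick"):
    """Build a pruning query from a list of underscore-joined species names."""
    params = {"species": ",".join(species), "format": fmt, "tree": tree}
    # Keep commas literal so the result matches the form used on this page.
    return BASE + "?" + urlencode(params, safe=",")


def escape_for_parameter(url):
    """Percent-encode every reserved character, including '&', '?' and '='."""
    return quote(url, safe="")


url = phylotastic_url(["Homo_sapiens", "Mus_musculus", "Rattus_norvegicus"])
escaped = escape_for_parameter(url)
print(url)
print(escaped)
```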

PDF-to-tree (priority level 3)

Synopsis Obtain a tree for the species referenced in a PDF, e.g., a scientific article.

Rationale Just because we can.

Preconditions one of

  • a PDF document that refers to multiple species
  • a literature resource that includes or links to PDFs (Mendeley, EndNote, etc)

Steps

  1. user chooses from
    • user file
    • pre-selected PDFs to show different topology services or back-end trees
      • plant article (process using phylomatic web api)
      • mammal article (use Rutger's demo)
  2. user selects "process" ==> app responds by
    1. Obtaining phylogeny for species phylotastically
    2. Displaying phylogeny along with notes about how we got this tree phylotastically

Approach

  • name ideas: PaperTree
  • Names can be parsed out with gnrd

Notes from other participants

  • comparative analysis
    • use nexml rather than nexus
  • people like
    • pdf to tree
    • phylotaxic
    • webapp
  • Suggestions
    • QR codes + phylotaxic + mobile app ==> way for citizens to see on-the-fly phylogeny for a group of organisms

Phylotastic Blog/Twitter Coverage

Plan for workday July 31, 2012

Team Shiny - work day plan for July 31

Possible priorities

  • manuscript section - draft is ready, not a priority for today
  • reconcili-o-tastic - bugs, interface redesign, important (Chris & Arlin)
  • phylotastic.org - respond to feedback, improve text, demos (Meg, Arlin)
  • dynamic logo to illustrate phylotastic (Meg)

1. dynamic logo (Meg)

  • use frames so that this is easier to update or split (use in slides)
  • some of the ways that we can use dynamics to show steps:
    • the names float around and then attach themselves to a tree
    • the tips of the tree light up when a name matches
    • the branches connecting the highlighted tips also light up, showing the subtree
    • the framework tree fades away, revealing the subtree (shows pruning)
    • the subtree gets rescaled

2. Reconciliotastic (Chris B, Arlin, Chris Z)

  • fix github repo
  • fix the problem with the species tree file (bug in current demo)
  • remove the option for user upload
  • find another sample data set
  • implement choice of 2 sample data sets
  • changes to text (Arlin)
    • finish description of sample data sets
    • change "The Shiny Team" to "Credits"
    • change "Reconciliotastic" about box to "What this demo does" or something

3. web site (Meg for icons, Arlin for text)

  • "about" - ok
  • "demos"
    • reduce font of "More to come"
    • lead with general intro on why we have demos
    • rewrite blurbs for newbies
    • rewrite blurbs to identify audience ("do you want to see how to put together a workflow in galaxy? then use this demo")
    • put best demos at top
    • talk to Aaron about playground - Arlin has list
    • talk to Datelife
      • landing page should be "about", then go to web form
      • define your audience
  • "services"
    • change text icon to "web services" (Meg)
    • add general statement
    • add TNRS, DateLife, MapReduce
    • add blurbs for these
    • reduce font of "More to come"
  • "wiki" - ok
  • "gitrepo" - change to "code" (Meg)