User:Shannon.lynn.oliver

From Evolutionary Interoperability and Outreach


Architastic Team

Karen Cranston, NESCent
Cody Hinchliff, Washington State University
Mark Holder, University of Kansas
Naim Matasci, iPlant Collaborative
Ben Morris, University of North Carolina
Shannon Oliver, iPlant Collaborative
Rutger Vos, NCB Naturalis
Derrick Zwickl, University of Arizona

Overview

Our architecture is based on the Model-View-Controller design pattern and takes the following modules into consideration:

Phylotastic controller

The controller is the "central element" of the Phylotastic workflow. It performs many core functions including:

  • coordinating communication between TNRS, TreeStore, PhyloGeoTastic, and DateLife.
  • communicating with and accepting queries from clients like PhyloGeoTastic.
  • implementing an interface for user interaction (disambiguation) with the results from the TNRS (or internal logic to auto-disambiguate, if possible).
  • possibly translating among formats (at the very least, converting trees to the desired format for final output).

Controller implementation

Available within the Phylotastic/Architastic repository here.

Controller behavior

(Controller behavior diagram: to be updated.)

Prior art

Standards Compliance

  • For inter-component queries, use the PhyloWS specification.
  • Use CQL and the OBO ontology (CDAO).
  • This may seem counter-intuitive for opentree queries, but there does not seem to be anything to specify how to break ties among conflicting source trees.
    • You do need a pruning/grafting query specification.
    • From the PhyloWS specification page, this task is not yet specified:

Task: Project tree to subtree induced by a set of nodes
- Input: specifications of nodes, such as labels and identifiers, that induce a subtree.
- Output: the subtree induced by the specified nodes with all other nodes pruned.

  • Use CQL in the query string: /phylows/tree/<identifier>?query=<CQL query>
  • A potential URI is /phylows/tree/subtree?query=pt.identifier.tree=<tree URI> and pt.taxaForSubtree=<taxon JSON>.
    • where:
      • identifier is a valid and unique identifier of the tree of which a subtree is to be returned.
      • specieslist specifies that only these species (nodeIDs) should be included in the returned clade. Note: this is a novel parameter not within the existing specifications.
      • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, you will receive an error.

Composing and processing CQL on server side

In the naive case, HTML form parameters are all contained in a query string as key/value pairs separated by ampersands.

However, this does not work for this project. In CQL, a query is composed of search clauses, each pairing an index with a value, and these clauses are combined into larger syntax trees using boolean operators (do not use "foo=bar&baz=quux"; use "foo=bar" and "baz=quux").

A sample client side web page is available here that takes HTML form values and turns them into CQL. Note that the search values, URIs and JSON, are both quoted and URL-encoded.
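
As a rough illustration of the composition step, the sketch below builds the potential subtree URI described earlier from a tree URI and a list of TreeStore IDs. The function names, the example.org host in the usage note, and the choice to backslash-escape double quotes inside quoted CQL values are assumptions made for illustration; they are not taken from the sample page.

```python
import json
from urllib.parse import quote


def cql_term(index, value):
    """Return one CQL search clause with the value double-quoted."""
    # Escape backslashes and double quotes so the JSON survives inside the quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index}="{escaped}"'


def subtree_query_url(base, tree_uri, treestore_ids):
    """Build a subtree request URL whose single query parameter is a CQL string."""
    taxa_json = json.dumps([{"treestoreId": i} for i in treestore_ids])
    # Clauses are joined with the CQL boolean "and", not with "&".
    cql = " and ".join([
        cql_term("pt.identifier.tree", tree_uri),
        cql_term("pt.taxaForSubtree", taxa_json),
    ])
    # URL-encode the whole CQL string so it travels as one parameter value.
    return f"{base}/phylows/tree/subtree?query={quote(cql, safe='')}"
```

For instance, subtree_query_url("http://example.org", "http://example.org/tree/1", [14, 16]) yields a URL with exactly one query parameter and no ampersands.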

Subsequently, on the server side, your framework will probably perform the URL decoding.

The following is what you need to do programmatically:

  • First, parse the CQL.
  • Lastly, parse the JSON.
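
The two parsing steps above might look roughly like the following sketch. It assumes the framework has already URL-decoded the query, and it only understands a small subset of CQL: double-quoted clauses joined by "and". The function names and the regular expression are illustrative assumptions; a real service would more likely rely on a full CQL parser.

```python
import json
import re

# One clause of the form index="value", allowing backslash-escaped characters.
TERM = re.compile(r'([\w.]+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_simple_cql(cql):
    """Parse 'index="value" and index="value"' into a dict (subset of CQL only)."""
    terms = {}
    for index, raw in TERM.findall(cql):
        # Undo the backslash escaping applied when the value was quoted.
        terms[index] = re.sub(r"\\(.)", r"\1", raw)
    return terms


def taxa_from_query(cql):
    """Step 1: parse the CQL. Step 2: parse the JSON held in pt.taxaForSubtree."""
    terms = parse_simple_cql(cql)
    taxa = json.loads(terms["pt.taxaForSubtree"])
    return terms["pt.identifier.tree"], [t["treestoreId"] for t in taxa]
```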

The workflow will perform the TNRS resolution, and the TreeStore will then modify that clean list for its own purposes, such as picking the preferred taxon URIs from zero or more taxonomies for each dirty label and decorating those names with its own internal IDs.

The JSON that is passed should look similar to this:

[
	{ "treestoreId" : 14 },
	{ "treestoreId" : 16 },
	{ "treestoreId" : 17 },
	{ "treestoreId" : 20 },
	{ "treestoreId" : 37 },
	{ "treestoreId" : 43 },
	{ "treestoreId" : 46 },
	{ "treestoreId" : 51 },
	{ "treestoreId" : 57 },
	{ "treestoreId" : 77 },
	{ "treestoreId" : 79 },
	{ "treestoreId" : 86 },
	{ "treestoreId" : 114 },
	{ "treestoreId" : 115 },
	{ "treestoreId" : 122 },
	{ "treestoreId" : 123 },
	{ "treestoreId" : 124 },
	{ "treestoreId" : 125 },
	{ "treestoreId" : 126 },
	{ "treestoreId" : 127 },
	{ "treestoreId" : 128 },
	{ "treestoreId" : 130 },
	{ "treestoreId" : 133 },
	{ "treestoreId" : 134 },
	{ "treestoreId" : 136 },
	{ "treestoreId" : 138 },
	{ "treestoreId" : 139 },
	{ "treestoreId" : 150 },
	{ "treestoreId" : 198 },
	{ "treestoreId" : 203 },
	{ "treestoreId" : 204 },
	{ "treestoreId" : 216 },
	{ "treestoreId" : 219 },
	{ "treestoreId" : 221 },
	{ "treestoreId" : 222 },
	{ "treestoreId" : 223 },
	{ "treestoreId" : 225 },
	{ "treestoreId" : 226 },
	{ "treestoreId" : 227 },
	{ "treestoreId" : 233 },
	{ "treestoreId" : 234 },
	{ "treestoreId" : 235 },
	{ "treestoreId" : 236 },
	{ "treestoreId" : 237 },
	{ "treestoreId" : 239 },
	{ "treestoreId" : 240 },
	{ "treestoreId" : 241 },
	{ "treestoreId" : 245 },
	{ "treestoreId" : 248 },
	{ "treestoreId" : 249 },
	{ "treestoreId" : 251 },
	{ "treestoreId" : 260 },
	{ "treestoreId" : 267 },
	{ "treestoreId" : 269 },
	{ "treestoreId" : 295 },
	{ "treestoreId" : 298 },
	{ "treestoreId" : 299 },
	{ "treestoreId" : 304 },
	{ "treestoreId" : 305 },
	{ "treestoreId" : 328 },
	{ "treestoreId" : 329 },
	{ "treestoreId" : 336 },
	{ "treestoreId" : 343 },
	{ "treestoreId" : 361 },
	{ "treestoreId" : 373 },
	{ "treestoreId" : 374 },
	{ "treestoreId" : 380 },
	{ "treestoreId" : 403 },
	{ "treestoreId" : 418 },
	{ "treestoreId" : 426 },
	{ "treestoreId" : 428 },
	{ "treestoreId" : 431 },
	{ "treestoreId" : 436 },
	{ "treestoreId" : 439 },
	{ "treestoreId" : 472 },
	{ "treestoreId" : 475 },
	{ "treestoreId" : 476 },
	{ "treestoreId" : 490 },
	{ "treestoreId" : 491 },
	{ "treestoreId" : 498 },
	{ "treestoreId" : 514 },
	{ "treestoreId" : 516 },
	{ "treestoreId" : 524 },
	{ "treestoreId" : 525 }
]

Workflow

The architecture diagram below presents the interaction between the various modules. Each module is treated as a "black box" in this architecture; the only concern here is how the modules interoperate.

These interactions should drive the input/output specifications for each module.

(Architecture diagram: image unavailable.)

Workflow Steps: (see example)

  1. Input a list of names.
  2. The controller will query TNRS with the list of names.
  3. TNRS will then provide a token for the location of results.
  4. Controller will get the TNRS names.
  5. Controller then matches TNRS results against the list of names and URIs provided by the TreeStore.
  6. Controller passes a list of URIs to the TreeStore.
  7. TreeStore will prune to a subtree.
  8. A tree will be returned containing a map of input names to resolved names.
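
The steps above can be sketched as a single controller function. Every service is passed in as a callable so that each module stays a black box; all parameter names, and the shapes of the values exchanged (a token, a dict of input names to resolved names, a dict of names to TreeStore URIs), are assumptions for illustration rather than the actual interfaces.

```python
def run_workflow(names, submit_to_tnrs, fetch_tnrs_results, treestore_names, prune_tree):
    """Run steps 2-8 of the workflow; each service is a hypothetical callable."""
    token = submit_to_tnrs(names)          # steps 2-3: query TNRS, receive a token
    resolved = fetch_tnrs_results(token)   # step 4: {input name: resolved name}
    known = treestore_names()              # names and URIs provided by the TreeStore
    # Step 5: keep only resolved names that the TreeStore knows about.
    name_map = {name: res for name, res in resolved.items() if res in known}
    uris = [known[res] for res in name_map.values()]  # step 6: list of URIs
    tree = prune_tree(uris)                # step 7: TreeStore prunes to a subtree
    return tree, name_map                  # step 8: tree plus input-to-resolved map
```

Passing the services in as arguments keeps the controller testable with simple stand-ins before any real TNRS or TreeStore endpoint is wired up.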

Phylotastic Sample Workflow

See the Workflow page for a workflow discussion and a basic workflow script (used from command line or as web service).

API

TNRS

See the TNRS API page.

Querying the TreeStore

The TreeStore must implement the following queries:

  • Namedump
    • Contains an exhaustive list of names, each with provenance info, and a URI specific to the treestore (e.g. internal UID).
    • Uses the format defined in the gist.
    • This will be a large file, so it is recommended to generate the file at intervals and make it available in a repository.
  • Namedump version
    • This provides the current version of the namedump.
  • Namedump diff
    • Given a previous namedump version, give a namedump consisting of ONLY names whose metadata has changed, such as new external source IDs, and new names since that version.
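
A namedump diff could be computed along the following lines. The sketch assumes each namedump has been loaded into a dict keyed by the treestore-specific URI, with one metadata record per name; that in-memory layout and the "name" field used in the usage are hypothetical, and the real record format is the one defined in the gist.

```python
def namedump_diff(previous, current):
    """Return entries of `current` that are new or whose record differs from `previous`."""
    return {
        uri: record
        for uri, record in current.items()
        # previous.get(uri) is None for new names, so they are always included.
        if previous.get(uri) != record
    }
```

Called with a previous dump holding "ts:1" and "ts:2" and a current dump in which "ts:2" gained an external source ID and "ts:3" is new, the result contains only "ts:2" and "ts:3".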

A preliminary opentree namedump is available here.

This information does not need to be accessible through direct interaction with the database itself.

Each versioned namedump will consist of two files which may be accessed via other mechanisms, such as cgi scripts, in order to answer the queries defined above. This will avoid putting additional computational overhead on the database itself, which may be considerable in the case of diffs.

At a minimum, each treestore must provide two files for each namedump:

  1. The namedump file, including all metadata (see https://gist.github.com/4675090).
  2. The metadata, which should correspond to "metadata" and "externalSources" items from the JSON defined in the gist.

Subtree queries for OpenTree taxonomy:
Try it:

curl -X POST http://opentree-dev.bio.ku.edu:7474/db/data/ext/GetJsons/graphdb/subtree -H "Content-Type: Application/json" -d '{"query":"pt.taxaForSubtree=Python,Homo,Malus, Bacillus, Buteo, Agaricus campestris ,Pinus sylvestris"}'
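
For clients that prefer Python over curl, an equivalent request can be prepared with the standard library, as in the sketch below. The function name is an assumption, and the sketch only builds the request object; sending it requires network access to the server.

```python
import json
import urllib.request

SUBTREE_URL = "http://opentree-dev.bio.ku.edu:7474/db/data/ext/GetJsons/graphdb/subtree"


def build_subtree_request(taxa):
    """Build (but do not send) the POST request issued by the curl command."""
    body = json.dumps({"query": "pt.taxaForSubtree=" + ",".join(taxa)})
    return urllib.request.Request(
        SUBTREE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen and read the response body.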

Example

Request:


Results:



Output:


TNRS

The taxosaurus TNRS now supports the optional parameters "source" and "code" to allow a finer selection of the taxonomic sources.

Both can be specified; in case of conflict, "code" takes precedence.

For example, the iPlant_TNRS (source=iPlant_TNRS) contains only plant names (code=ICN). If a request specifies iPlant_TNRS as a source but requires a zoological code (e.g. http://taxosaurus.org/submit?query=Buteo+swainsoni%0ACircus+cyaneus%0AHaliaeetus+leucocephalus%0APandion+haliaetus&source=iPlant_TNRS&code=ICZN), taxosaurus will ignore the "source" parameter and return matches from zoological sources.
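
A submit URL of this shape can be assembled with the standard library, as sketched below. The function name is an assumption; the endpoint and the "query", "source", and "code" parameters are the ones shown in the example URL, with names separated by newlines.

```python
from urllib.parse import urlencode


def taxosaurus_submit_url(names, source=None, code=None):
    """Build a taxosaurus submit URL; names are joined with newlines."""
    params = {"query": "\n".join(names)}
    if source:
        params["source"] = source  # optional: restrict to one taxonomic source
    if code:
        params["code"] = code      # optional: nomenclatural code, wins on conflict
    # urlencode turns spaces into "+" and newlines into "%0A".
    return "http://taxosaurus.org/submit?" + urlencode(params)
```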

Common Interface Parameters

Term | Value(s) | Namespace | Example | Description
format | newick, nexus, json, nexml, rdf | TBD (maybe Dublin Core) | format=json | The output format of the service. Alternatively, use content negotiation.
identifier.taxon | URI | TBD | identifier.taxon=http://www.ncbi.nlm.nih.gov/taxonomy/9606 | Input taxon identifier; the value is a (URL-encoded) URI into a taxonomic data resource.
pt.identifier.tree | URI | (pt=PhyloTastic) | pt.identifier.tree=<uri> | Input tree identifier.
pt.taxaForSubtree | JSON | (pt=PhyloTastic) | pt.taxaForSubtree=<json> | {need}