Main Page

From Evolutionary Interoperability and Outreach
Revision as of 11:36, 28 June 2010 by Hilmar (talk | contribs) (→‎What's happening)

What's happening

Participants in EvoIO activities are preparing for the iEvoBio satellite conference at Evolution 2010 in Portland, OR. Planned presentations include:

  • Rutger Vos is preparing a talk on TreeBase2 (Rutger Vos, Hilmar Lapp, Bill Piel, Val Tannen)
  • Brandon Chisham will present "CDAO-Store: A New Vision for Data Integration" (Brandon Chisham, Trung Le, Enrico Pontelli, Tran Son, Ben Wright)
  • Arlin Stoltzfus will present "EvoIO: Community-driven standards for sustainable interoperability" (Arlin Stoltzfus, Nico Cellinese, Karen Cranston, Hilmar Lapp, Sheldon McKay, Enrico Pontelli, Rutger Vos)

The EvoIO group staged a successful Phyloinformatics VoCamp on November 7-11, 2009 in Montpellier, France, co-located with the annual meeting of the International Biodiversity Information Standards Organization (TDWG). A VoCamp is a hands-on meeting for investigators to create and develop ontologies and lightweight vocabularies in support of data integration and re-use, in this case the integration and re-use of phylogenetic trees and associated data and metadata. More information is available at VoCamp1.

ToLWeb2?

Based on some meetings in the winter, we developed a whitepaper calling for a meeting of ToLWeb stakeholders (contributors, research users, educational users, linked data providers) to develop a vision for ToLWeb2. This meeting would be followed by a process to develop a concrete plan and a proposal for funding.

The EvoIO INTEROP project

Background

Over several years, a variety of people, including NESCent's informatics staff, NESCent's Evolutionary Informatics working group, and the participants in the recent Evolutionary Database Interoperability hackathon, laid the foundation that put us in a position to apply to the NSF INTEROP program. This program provides up to $250K per year to support a data interoperability network. The network should be multidisciplinary, and the network proposal should have both a community aspect and a technology aspect. The deadline for this program in 2009 was July 23.

What makes us competitive:

  • our past success in developing the interop technologies NeXML, CDAO, and PhyloWS
  • the 3-part interop formula of data syntax (NeXML), semantics (CDAO), and services (PhyloWS)
  • our past success in actual demonstration projects that show off interop technology
  • our demonstrated commitment to including diverse projects
  • our connections with a network of researchers, programmers, and data providers

In light of this, we developed a proposal for a data interoperability network focused on trees and associated data and metadata. Two key features of the proposal are the use of hackathons and the use of the "EvoIO Stack" (NeXML, CDAO, PhyloWS) as a technological nucleus for growing an interoperability network. The proposal is currently (as of Oct 2009) under review; its project summary and description are given below.

The EvoIO NSF Interop Proposal

Project Summary

INTEROP: A network for enabling community-driven standards to link evolution into the global web of data (EvoIO)

PI: Arlin Stoltzfus, Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute (CDAO, Bio::NEXUS, Nexplorer). Co-PIs: Karen Cranston, EOL and Field Museum of Natural History; Enrico Pontelli, New Mexico State University, Computer Science (CDAO); Sheldon McKay, Cold Spring Harbor Laboratory (GMOD, modENCODE, iPlant); Hilmar Lapp, NESCent (PhyloWS, BioSQL); Nico Cellinese, University of Florida, Florida Museum of Natural History (TOLKIN, RegNum).

Components of the proposed EvoIO interoperability network and their interactions

Intellectual Merit. Evolutionary trees (“phylogenies”) organize knowledge of biodiversity and provide a framework for rigorous methods of comparative analysis used throughout the biosciences. In the past, the scope of a tree-based analysis was limited to the researcher’s “own” small data set. The great mass of currently available data makes possible far-reaching and systematic analyses, but only if trees (and associated data and metadata) can be accessed, searched, retrieved, and repurposed; that is, only if the data are interoperable. An integrated solution to this problem requires attention to the syntax and semantics of data, metadata, and services. Over the past 3 years, an “Evolutionary Informatics” working group funded by NESCent (an NSF Center) developed an interoperability “stack” consisting of NeXML (a file format for comparative data), the Comparative Data Analysis Ontology (CDAO), and PhyloWS (a web services standard). Recently, the group staged a “hackathon” that engaged a fresh group of researcher-programmers (chosen to represent community data resources) to learn, apply, and extend the EvoIO Stack, with results that show the remarkable promise of this approach to train early-career scientists, disseminate standards, and improve interoperability. The investigators will build on this approach and on their unique technology and experience to engage a larger community in improving interoperability of trees with associated data and metadata (e.g., taxonomic affiliations, sources, and character data). The EvoIO Network will organize hackathons, hold training workshops, host working groups, and implement infrastructure for community-building around emerging standards. Network staff will provide technical expertise in knowledge representation and bioinformatics, working to support standards and to build reference implementations.
The resulting EvoIO community will extend broadly into systematics-biodiversity, comparative genomics, and phylogenetics, and will penetrate into key areas of community ecology, phylogenetic epidemiology and paleobiology.

Broader impacts. The research areas affected by this proposal (all those areas in which phylogenetic trees are used routinely) are diverse and currently are not unified by professional organizations, software platforms, or standards. By bringing together scientists from various disciplines, we will develop awareness of the need for standards, cohesion around preferred approaches to interoperability, and ultimately a broad consensus on specific standards. This will be accomplished by building on the momentum of work done under prior NSF funding via NESCent. The key to developing a cohesive community in the absence of pre-existing cohesion is the hackathon mechanism, which generates success stories and arms young researcher-programmers with the know-how to create further successes. Through this mechanism, user requirements will be translated into standards and specifications, and implemented in community software tools. Reference implementations (developed concurrently with standards and specifications) will be used to aid in standards development and training. Hackathons will take place in eastern, western, and central locations to maximize diversity in impact, and will include strategically selected participants as well as a large fraction of participants chosen in response to a broad solicitation in the biodiversity, systematics, genomics, and phylogenetics communities. Standards and specifications developed by the Network will be disseminated via the relevant international standards group (the TDWG Phylogenetics Standard Interest Group). Efforts will be made to integrate ideas from this project into existing educational and outreach programs, with particular focus on involving students from NMSU (a minority-serving institution).

Project Description

The Project Description is available as a PDF.
