- 1 Overall Plan
- 1.1 Sunday, Nov 8
- 1.1.1 The Day's Plan
- 1.1.2 High Level Topics
- 1.1.3 Data Sets Useful for Each Broad Use Case
- 1.1.4 Use Case Refinement
- 1.1.5 Miscellaneous Observations
- 1.1.6 Next Steps
- 1.2 Tuesday
- 1.3 Wednesday
- 2 Concluding Comments
- A set of questions involving inference using phylogenies and related data
- Details of how an ontology should look to support those inferences
- Description of current representation systems and what inferences they support
Sunday, Nov 8
The Day's Plan
- Come up with high level topics
- Develop one or two very defined questions that could use ontologies for inference
- Develop one use case using inference that might be implemented now
High Level Topics
Identification Use Case
- I saw a black bird in Montpellier in November, what is it?
- The bird I saw looks like a bird I know, does that help?
Ecosystem shift Use Case
- Is there a connection between ice cover reduction and a change in fish population in the Barents Sea?
- The fish population in the Barents Sea is changing, infer some reasons why that might be.
Comparative anatomy Use Case
- When cancers stop responding to hormone therapy, things get worse quickly. When bears hibernate, they take a "hormone vacation".
Some bears hibernate, some, like the sun bear, don't. Compare humans, hibernating bears and non-hibernating bears to see if we can isolate cellular or genetic differences that account for the ability to hibernate.
Data Sets Useful for Each Broad Use Case
- biogeography over time, including range, habitat and occurrences
- ice cover over time
- fish phylogeny
- fish taxonomy
- fishery data, fish landings, trawls
- fish traits to possibly infer food webs
Comparative Anatomy Use Case
- pathology information for humans and bears
- tissue and endocrine interaction model
- genomes for species
- difference representation
- bear phylogeny, check whether some sloth bears hibernate and others don't
Use Case Refinement
- Come up with specific questions in each use case
- Narrow them to something we *can* make inferences about
- Outline what the ontologies would have to look like to support those inferences
Given our access to data, we're moving from fish to mammals. Here's the use case. Given a temperature gradient over a large transect, generate phylogenetic trees using observations of species at various points along the transect. From these trees, gather traits and also infer new species that may be of interest. Compare the resulting traits at different points along the transect to infer what sorts of change might be occurring between measurement points.
This use case uses the following data sets. Each dash between two data sets signifies that a mapping from one type of data to the other will be necessary.
temperature - geography (transects) - observations (range maps) - taxonomy - phylogeny - traits
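The chain above can be sketched as a series of lookups. This is only a toy illustration: the site names, temperatures, species and traits are all made up, and the taxonomy and phylogeny mappings are collapsed into a single species-to-traits table for brevity.

```python
# Hypothetical toy data: every name and value below is invented for illustration.
temperature_at = {"site_low": 18.0, "site_high": 9.5}   # temperature - geography
species_at = {                                           # geography - observations
    "site_low": {"capybara", "agouti"},
    "site_high": {"agouti", "chinchilla"},
}
traits_of = {                                            # species - traits
    "capybara": {"big_size", "semi_aquatic"},
    "agouti": {"small_size"},
    "chinchilla": {"small_size", "dense_fur"},
}


def traits_at(site):
    """Union of the traits of all species observed at a site."""
    return set().union(*(traits_of[s] for s in species_at[site]))


def trait_shift(site_a, site_b):
    """Temperature difference and traits lost/gained going from site_a to site_b."""
    a, b = traits_at(site_a), traits_at(site_b)
    return {
        "delta_t": temperature_at[site_b] - temperature_at[site_a],
        "lost": a - b,
        "gained": b - a,
    }


shift = trait_shift("site_low", "site_high")
```

Comparing the "lost" and "gained" sets between neighbouring transect points is one simple way to look for the kind of change the use case asks about.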
Much talk about confidence values for such things as traits and identifications.
Interest in ranking results
Using phylogeny to study community composition
Ideas about what kind of phylogeny tree metadata should be returned - MarkS
Do we need alternative forms of reasoning?
- non monotonic
Next Steps
The real next step is to figure out what the next steps should be. Here are some ideas:
- Map out use case more thoroughly
- Identify points of connection between a subset of the ontologies
- Mock something up, maybe in protege
Established the workflow
observations (species name and location)
phylogeny (nodes, ancestors, descendants)
need to calculate most recent ancestor via some external process from that
descendants of most recent ancestor
traits in an inheritance hierarchy
The ontology links the data sets. Instances in the data sets are linked via properties. Many of the properties have inverses which may be used in reasoning.
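A minimal sketch of what inverse properties buy us, written in plain Python rather than OWL. The property names hasAncestor, hasDescendant and hasResident come from our ontology; residesAt and the individual names are assumptions made up for this example.

```python
# Declared inverse pairs. "residesAt" is a hypothetical name for the inverse
# of hasResident; it is not taken from the actual ontology file.
INVERSES = {"hasAncestor": "hasDescendant", "hasResident": "residesAt"}


def with_inverses(triples, inverses):
    """Return the triples plus every triple implied by an inverse declaration."""
    both = dict(inverses)
    both.update({v: k for k, v in inverses.items()})  # inverses work both ways
    closed = set(triples)
    for subject, prop, obj in triples:
        if prop in both:
            closed.add((obj, both[prop], subject))
    return closed


asserted = {
    ("node_capybara", "hasAncestor", "node_rodentia"),
    ("pantanal", "hasResident", "node_capybara"),
}
inferred = with_inverses(asserted, INVERSES)
```

A reasoner does this for us automatically; the point is that asserting a fact in one direction is enough to query it in the other.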
Hacked out ontology
Here it is: File:Vocamp taxo reasoning.owl.txt
Discussed the kinds of data we needed. Struggled to get those data. Just started making things up, which resulted in progress.
Encountered a challenge trying to determine the most recent ancestor using DL. The working hypothesis is that the open world assumption is making it difficult. Further research necessary. A bit of investigation showed that Jena has a special getLCA() function that could be used to find the most recent ancestor. Alternatively, we could use a workflow system like Kepler to pass the output of the first part of our reasoning to another process that determines the most recent ancestor and passes it back to our ontology for further processing.
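For illustration, here is a rough sketch of what such an external process could look like. It is not Jena's function and not what we ran; it assumes the phylogeny is available as a simple child-to-parent map, and the tree below is a toy example.

```python
def path_to_root(node, parent):
    """Return [node, its parent, its grandparent, ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def most_recent_common_ancestor(nodes, parent):
    """Deepest node lying on every path to the root.

    A node counts as its own ancestor here, so if one input is an ancestor
    of all the others, that input is returned.
    """
    nodes = list(nodes)
    if not nodes:
        raise ValueError("need at least one node")
    common = set(path_to_root(nodes[0], parent))
    for node in nodes[1:]:
        common &= set(path_to_root(node, parent))
    # Walk upwards from the first node; the first shared node is the deepest.
    for candidate in path_to_root(nodes[0], parent):
        if candidate in common:
            return candidate
    return None  # nodes do not share a root


# Toy child -> parent map (illustrative only).
parent = {
    "capybara": "caviidae",
    "guinea_pig": "caviidae",
    "caviidae": "rodentia",
    "beaver": "rodentia",
}
```

Unlike a DL reasoner, this procedure treats the parent map as complete, which is exactly the closed-world step that was hard to express inside the ontology.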
The phylogeny is not a subsumption hierarchy. Instead nodes are instances of a node class. Each node can have a name. Nodes are linked via hasAncestor/hasDescendant properties.
Other data sets, such as the trait data set, do have an inheritance hierarchy, which we leverage to provide interesting inferences (see below).
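A small sketch of the kind of inference the trait hierarchy gives us: a trait asserted at a specific level also counts as an instance of every more general trait class above it. The hierarchy and trait assignments below are hypothetical, not copied from the ontology file.

```python
# Hypothetical trait subsumption hierarchy: subclass -> superclass.
TRAIT_PARENT = {
    "small_size": "body_size",
    "big_size": "body_size",
    "body_size": "morphology",
    "body_shape": "morphology",
}

# Hypothetical asserted traits per species.
ASSERTED = {"capybara": {"big_size"}, "guinea_pig": {"small_size"}}


def superclasses(trait, parent):
    """The trait itself plus every more general class above it."""
    chain = [trait]
    while trait in parent:
        trait = parent[trait]
        chain.append(trait)
    return chain


def has_trait_of_kind(species, kind, asserted, parent):
    """True if any asserted trait of the species falls under `kind`."""
    return any(kind in superclasses(t, parent) for t in asserted.get(species, ()))
```

So asking for species with some body_size trait returns both species, even though body_size itself was never asserted for either.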
Having the nodes as instances permits us to link the different types of data directly via properties. We did not try using classes to represent phylogenetic nodes; it would be interesting to try.
- From location determine species present there
- We can then determine the traits of those species
- Have to leave system to get the most recent ancestor of that set of species
- Once we have that we can determine all descendants
- Then we can determine the traits of those descendants
What's nice is that we attach traits to species, and the final results (traits of observed species and of the clade) are automatically determined.
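The five steps can be sketched end to end as follows. This is a plain Python stand-in for what the ontology plus the external ancestor step do together; the location, tree and traits are invented toy data, and the function names are our own for this example.

```python
def ancestors(node, parent):
    """All ancestors of a node, nearest first (node itself excluded)."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def mrca(species, parent):
    """Stand-in for the step that happens outside the reasoner."""
    chains = [[s] + ancestors(s, parent) for s in species]
    shared = set(chains[0]).intersection(*chains[1:])
    return next(n for n in chains[0] if n in shared)


def descendants(node, parent):
    """Every node that has `node` among its ancestors."""
    return {n for n in parent if node in ancestors(n, parent)}


def run_workflow(location, observed_at, parent, traits):
    present = sorted(observed_at[location])                        # step 1
    observed_traits = set().union(*(traits.get(s, set()) for s in present))  # step 2
    root = mrca(present, parent)                                   # step 3
    clade = descendants(root, parent)                              # step 4
    clade_traits = set().union(*(traits.get(n, set()) for n in clade))       # step 5
    return {
        "mrca": root,
        "clade": clade,
        "unobserved": clade - set(present),
        "observed_traits": observed_traits,
        "clade_traits": clade_traits,
    }


# Toy data (illustrative only).
observed_at = {"transect_pt_1": {"capybara", "guinea_pig"}}
parent = {"capybara": "caviidae", "guinea_pig": "caviidae", "mara": "caviidae",
          "caviidae": "rodentia", "beaver": "rodentia"}
traits = {"capybara": {"big_size"}, "guinea_pig": {"small_size"},
          "mara": {"long_legs"}, "beaver": {"flat_tail"}}

result = run_workflow("transect_pt_1", observed_at, parent, traits)
```

The "unobserved" entry holds clade members that were not seen at the location, which is one way to surface the new species of interest mentioned in the use case.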
We're relating three different data sets:
- traits of species in a subsumption hierarchy
- observations of species with location and name
- phylogenetic relationships amongst the species
Look at other types of reasoning
Reasoning through inverse and subsumption relations
We checked how easy it would be to do a new kind of inference by simply adding a new relation and saying it's the inverse of a relation we already used. And it was EASY!
Tell me the locations where we know we have species for which we have body shape data.
hasResident some (hasTrait some body_shape)
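To make the class expression concrete, here is roughly what it selects, evaluated by hand over toy data. The locations, species, traits and trait classes are made up, and this closed-world loop is only an approximation of what the reasoner does.

```python
# Hypothetical instance data.
has_resident = {
    "montpellier": {"blackbird"},
    "barents_sea": {"cod"},
    "pantanal": {"capybara"},
}
has_trait = {
    "blackbird": {"slender_body"},
    "capybara": {"barrel_body"},
    "cod": {"spawns_in_spring"},
}
# Which class each trait individual belongs to (hypothetical).
trait_class = {
    "slender_body": "body_shape",
    "barrel_body": "body_shape",
    "spawns_in_spring": "phenology",
}


def locations_with_trait_kind(kind):
    """Locations with at least one resident having at least one trait of `kind`.

    Mirrors: hasResident some (hasTrait some <kind>)
    """
    return {
        location
        for location, residents in has_resident.items()
        if any(
            trait_class.get(trait) == kind
            for species in residents
            for trait in has_trait.get(species, ())
        )
    }
```

Each "some" in the expression becomes one "at least one" test: one over the residents of a location, one over the traits of a resident.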
Challenge: What about specimens?
In our representation, species are represented as instances. What if we wanted to have specimens as the instances? Then capybara could have both small_size and big_size if we saw examples of both.
Is there a way to do both?
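One possible answer, which we did not try, is to keep specimens as the instances and derive species-level traits by aggregating over them. The sketch below only illustrates that idea in plain Python; the specimen records are invented.

```python
from collections import defaultdict

# Hypothetical specimen records: each specimen is an instance with its own traits.
specimens = [
    {"id": "spec_1", "species": "capybara", "traits": {"small_size"}},
    {"id": "spec_2", "species": "capybara", "traits": {"big_size"}},
    {"id": "spec_3", "species": "guinea_pig", "traits": {"small_size"}},
]


def species_traits(records):
    """Species-level view: every trait seen on at least one specimen."""
    by_species = defaultdict(set)
    for record in records:
        by_species[record["species"]] |= record["traits"]
    return dict(by_species)


derived = species_traits(specimens)
```

Under this scheme the species-level statement means "observed in at least one specimen", which is weaker than "true of the species" and would need to be modelled as such.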
Computer science folks and biologists concur that working together as a group was mutually beneficial. Participants agreed that jointly working out a use case, even though not entirely practical, was extremely useful in clarifying both the data discovery and integration needs of domain scientists, and the capabilities (and limitations) of semantics-assisted approaches. Having the advantage of an experienced knowledge modeler and Protege user, and spending the time together working on our use case, had tremendous bi-directional heuristic value.
Observations of our Ontology Development Process
We didn't really have an idea how to go about developing the ontology. We did a fair amount of work determining use cases that we thought we had data for, interest in, and would exercise the area of our interest. At some point it became clear that progress would best be made by sitting down around a keyboard and implementing a prototype.
Here is a list of the things that we needed to clarify and would have been nice to know before we tried doing anything.
- what is inference?
- why are we using semantic web technologies?
- what are some good ways to go about designing ontologies?
We probably should have STARTED with a reasoning mini-bootcamp. We got to that about halfway through the second day.
We iteratively added knowledge to the ontology and generated questions that would require additional knowledge. The cycle of building the knowledge base and then determining the queries that the base could and should answer was only possible because of the collaboration between the biologists and computer scientists in the group.
In conclusion, the biologists learned a lot about how to use informatics tools. Anne will go home and try to use them by herself on a small scale and later expand and get other biologists interested in using informatics tools. Dave T will go home and work on a grant proposal revolving around the integration of the data set types we attempted to integrate here.
The primary accomplishments of the group were:
- A list of grand challenges within the biological domain that could benefit from applying reasoning techniques in data integration and data discovery
- A small set of use cases within those grand challenges
- A working ontology that integrated many types of data sets and used reasoning to answer domain-relevant queries
- A slogan: don't romanticize, semanticize!
Data discovery: How do you find the data sets? How can you plug the data sets in? Placement of the ontology within a workflow for the parts of the application that cannot be satisfied via description logic representations and reasoning. How to settle on the best possible model, given that there are many ways to model anything.