TNRS - Name Cleaner
Background: Arlin pitched an idea for a simple tool/app that the Tree Annotation group could use to overcome a problem they ran into while obtaining trees and trying to input them.
Name Cleaner (mr-naims) is a tool for generating a CSV report on the validity of species names in a document. It is written in Python and designed to be omnivorous about the types of files on which it can operate. This is aided in large part by the [Global Names Discovery Service] API, which accepts PDFs, Office documents, images, or plain text. DendroPy is used to read trees in Newick and NeXML formats.
Source code and tool documentation are on GitHub: [mr-naims]
Usage:

    usage: simple.py [options] file-input
       or  simple.py [options] --file file-input

    Options:
      -h, --help            show this help message and exit
      -f FILE, --file=FILE  the file, FILE, read from...
      -s, --skip-gnrd       Do not lookup names at GNRD. Only valid for a text
                            file or newick tree
      -n, --newick          The file is a newick tree
      -x, --nexml           The file is NeXML
      --source=LIMIT_SOURCE
                            Limit taxosaurus to a single source:
                            [MSW3|iPlant|NCBI]
      --match-threshold=MATCH_SCORE_THRESHOLD
                            the matching score threshold to use, defined as a
                            decimal, all matches equal to or greater will be
                            replaced. The default is 0.9
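The option syntax in the usage text (for example "-f FILE, --file=FILE") resembles the help output of Python's standard optparse module. As a minimal sketch, assuming optparse is what simple.py uses, a parser accepting the same options might look like the code below. The helper names build_parser and parse_args, the destination names filename, skip_gnrd, newick and nexml, and the use of a "choice" type for --source are assumptions for illustration, not taken from the mr-naims source.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    usage = ("usage: %prog [options] file-input\n"
             "   or  %prog [options] --file file-input")
    parser = OptionParser(usage=usage)
    parser.add_option("-f", "--file", dest="filename", metavar="FILE",
                      help="the file, FILE, read from...")
    parser.add_option("-s", "--skip-gnrd", dest="skip_gnrd",
                      action="store_true", default=False,
                      help="Do not lookup names at GNRD. Only valid for a "
                           "text file or newick tree")
    parser.add_option("-n", "--newick", dest="newick",
                      action="store_true", default=False,
                      help="The file is a newick tree")
    parser.add_option("-x", "--nexml", dest="nexml",
                      action="store_true", default=False,
                      help="The file is NeXML")
    # Assumption: restrict --source to the three names listed in the usage text.
    parser.add_option("--source", dest="limit_source", type="choice",
                      choices=["MSW3", "iPlant", "NCBI"],
                      help="Limit taxosaurus to a single source: "
                           "[MSW3|iPlant|NCBI]")
    parser.add_option("--match-threshold", dest="match_score_threshold",
                      type="float", default=0.9,
                      help="the matching score threshold to use, defined as "
                           "a decimal")
    return parser


def parse_args(argv):
    """Parse argv (without the program name) and return the options object."""
    parser = build_parser()
    options, args = parser.parse_args(argv)
    # Accept the input file either via --file or as the first positional argument.
    if options.filename is None and args:
        options.filename = args[0]
    return options
```

With this sketch, parse_args(["--newick", "--match-threshold", "0.8", "tree.nwk"]) would yield an options object with the Newick flag set, a threshold of 0.8, and tree.nwk as the input file (tree.nwk is a made-up file name).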
Milestones from Day 1 (Tue):
- Read txt file as list of names, call Taxosaurus for cleaning [milestone].
Milestones from Day 2 (Wed):
- Accept minimum score, only replace if match meets or exceeds minimum score [milestone]
- Reading PDF Input and extracting names using GNRD API
- Initial reporting output (CSV)
Milestones from Day 3 (Thu):
- Fix UTF-8 issues
- Catch additional stats from GNRD: occurrence count and location in document
- Allow limiting to a specific source [milestone]
- Read Newick tree files [milestone]
- Investigational NeXML reading via DendroPy [milestone]
Milestones from Day 4 (Fri):
- Integrated NeXML reading/writing into simple.py [milestone]
- Allow skipping of GNRD name lookups [milestone]
- Reporting output (CSV)
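The score-threshold and CSV-reporting milestones can be illustrated together. The following is a minimal sketch, not the actual mr-naims code: the shape of the match records (keys submitted, matched, score), the report columns, and the functions apply_threshold and write_report are all assumptions made for illustration. It follows the rule stated in the usage text, where a name is replaced only when its matching score is equal to or greater than the threshold (default 0.9).

```python
import csv
import io

DEFAULT_THRESHOLD = 0.9  # default stated in the usage text

# Hypothetical report columns; the real tool's CSV layout may differ.
FIELDNAMES = ["submitted_name", "matched_name", "score", "replaced", "final_name"]


def apply_threshold(matches, threshold=DEFAULT_THRESHOLD):
    """Return one report row per submitted name.

    `matches` is a hypothetical list of dicts with the keys
    'submitted', 'matched' (None if no match) and 'score'.
    """
    rows = []
    for m in matches:
        # Replace only when a match exists and its score meets the threshold.
        accepted = m["matched"] is not None and m["score"] >= threshold
        rows.append({
            "submitted_name": m["submitted"],
            "matched_name": m["matched"] or "",
            "score": m["score"],
            "replaced": accepted,
            "final_name": m["matched"] if accepted else m["submitted"],
        })
    return rows


def write_report(rows, handle):
    """Write the report rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Invented example records, for illustration only.
    example = [
        {"submitted": "Homo sapien", "matched": "Homo sapiens", "score": 0.95},
        {"submitted": "Pan trogl", "matched": "Pan troglodytes", "score": 0.5},
    ]
    buffer = io.StringIO()
    write_report(apply_threshold(example), buffer)
    print(buffer.getvalue())
```

Keeping the threshold decision separate from the CSV writing means the same rows could also drive name replacement in a Newick or NeXML tree, although how simple.py actually structures this is not shown here.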
Background: Nirav Merchant mentioned a student project at iPlant trying to create a widget that would help suggest scientific names or provide name resolution within a rich user interface.