These are the things we would like to know, and how they might be measured. Each one needs a baseline, based on current BioStars activity in all domains.
- Level of activity
- number of new evolution-related questions (responses, answers) in the past month
- up-votes and responses to existing evolution-related questions in the past month
- Responsiveness and effectiveness
- how many questions get an answer within a week? a response?
- what is the distribution of waiting times for answers? responses?
- how many users respond to (view) evolution-related questions?
- what is the average expertise of users in this community?
- what is the apparent male:female ratio?
- how many different people view evolution-related content in a month?
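The responsiveness measures above (answers within a week, distribution of waiting times) could be computed from question and first-answer timestamps. The following is a minimal sketch, assuming a hypothetical export of `(asked, first_answered)` pairs. The function name `responsiveness` and the sample records are made up for illustration and are not BioStars data.

```python
from datetime import datetime, timedelta
from statistics import median


def responsiveness(questions, window=timedelta(days=7)):
    """Summarise answer waiting times.

    questions: iterable of (asked, first_answered) datetime pairs, where
    first_answered is None for questions that were never answered.
    """
    questions = list(questions)
    # Waiting time is only defined for questions that received an answer.
    waits = [answered - asked for asked, answered in questions if answered is not None]
    within = sum(1 for w in waits if w <= window)
    return {
        "total": len(questions),
        "answered": len(waits),
        "answered_within_window": within,
        # Fraction is taken over all questions, answered or not.
        "fraction_within_window": within / len(questions) if questions else 0.0,
        "median_wait_days": (
            median(w.total_seconds() for w in waits) / 86400 if waits else None
        ),
    }


# Illustrative, made-up records (hypothetical, not real BioStars data).
sample = [
    (datetime(2013, 8, 1), datetime(2013, 8, 2)),   # answered after 1 day
    (datetime(2013, 8, 3), datetime(2013, 8, 13)),  # answered after 10 days
    (datetime(2013, 8, 5), None),                   # never answered
    (datetime(2013, 8, 6), datetime(2013, 8, 9)),   # answered after 3 days
]
summary = responsiveness(sample)
print(summary)
```

The same function could be run separately on "response" timestamps (comments as well as answers) to cover both variants of each question.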
Caveat: academic activity is seasonal. We already know of some really strange patterns, e.g., a spike of activity just before winter break. If we take the "before" measurement in August, we really should wait until the following August to take the "after" measurement.
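One way to respect this caveat is to compare only the same calendar month across years. A minimal sketch, assuming a hypothetical mapping from `(year, month)` to a monthly count of new questions; the helper names and the numbers are invented for illustration only.

```python
def year_over_year(monthly_counts, month):
    """Return {year: count} for one calendar month across all years present."""
    return {
        year: count
        for (year, m), count in sorted(monthly_counts.items())
        if m == month
    }


def relative_change(before, after):
    """Fractional change from the 'before' value to the 'after' value."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return (after - before) / before


# Made-up monthly counts of new questions (hypothetical values).
counts = {
    (2013, 8): 40,
    (2013, 12): 90,  # e.g. a pre-winter-break spike
    (2014, 8): 50,
}
august = year_over_year(counts, 8)
change = relative_change(august[2013], august[2014])
print(august, change)
```

Comparing August 2013 with December 2013 here would mostly measure the seasonal spike, which is why the sketch filters on the month first.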
Methods and APIs
- You can get a picture of activity by clicking on the tag cloud. Activity can be sorted by votes. For example: http://www.biostars.org/show/tag/phylogenentics/?sort=votes&since=all%20time
- Everything can be measured retroactively, and there are APIs to query site history. For example, a snapshot of the site 100 days ago can be obtained with http://www.biostars.org/api/stats/100/
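The snapshot URL pattern above could be scripted to collect a series of baselines. This is a sketch only: it assumes the endpoint still exists at this address and returns JSON, neither of which is confirmed here. The helper names `stats_url`, `fetch_stats`, and `snapshot_urls` are hypothetical.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the example above.
BASE_URL = "http://www.biostars.org/api/stats"


def stats_url(days_ago):
    """Build the snapshot URL for the site as it was `days_ago` days back."""
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    return f"{BASE_URL}/{int(days_ago)}/"


def fetch_stats(days_ago, timeout=10):
    """Fetch one snapshot.

    Assumption (not verified): the endpoint responds with a JSON document.
    """
    with urlopen(stats_url(days_ago), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def snapshot_urls(start, stop, step):
    """List snapshot URLs for a range of day offsets, e.g. monthly steps."""
    return [stats_url(days) for days in range(start, stop, step)]


# Building the URLs needs no network access; fetching them does.
urls = snapshot_urls(0, 90, 30)
print(urls)
```

Calling `fetch_stats(100)` would request the 100-day-old snapshot mentioned above; the fields of the returned document would need to be inspected before relying on them.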
Reported number of search results, based on a search on August 20, 2013
- phylostars 0
- evolution 760
- ortholog* 1790
- paralog* 186
Phylogenetics (interestingly, most hits for "distance matrix" and "mega" use the terms in their phylogenetic sense)
- phylogen* 2920
- Phylip 241
- distance matrix 221
- MEGA 137
- MrBayes 124
- PAUP 70
- BEAST 68
- phylogram 36
- neighbor joining 28
- cladogram 4
- population genetics 260
- "dN/dS" 175
- Tajima's D 78
- "nucleotide diversity" 22
- coalescent 19
- McDonald-Kreitman 1
- "site frequency spectrum" 1