Sunday, 26 January 2014

A critique of the [adjective][species] survey methodology

The [adjective][species] (AJ) surveys, along with their results and the analyses built on them, have been getting quite a bit of attention lately, so I want to point out the fundamental flaws in how the data they collect is analysed.

I'll start with a slight disclaimer about this critique: I'm not a social scientist. I am, however, a zoologist, or more specifically an ethologist, and a core part of my work involves collecting data on the behaviour of animals and then analysing it statistically.

For this critique I'll focus on their article on the furry fandom and re-evaluating one's sexual orientation, as an example of the site's methodological failings. These failings are so fundamental that I'd be surprised if any of the authors have any training in basic statistics or the scientific method.

So, where to begin...

A pretty visualisation does not analysis make

One of the major problems with this article is that it conflates making a pretty graph with analysing data. The writer uses the graph to assert a trend that supposedly confirms the hypothesis: that the furry fandom leads people to re-assess their sexual orientation. But there are no analytical statistics to back up this assertion, and there is no test (such as a chi-squared test) to show that this distribution is unlikely to be the product of random variation.
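To make concrete what such a test involves, here is a minimal sketch in plain Python. It computes the Pearson chi-squared statistic for a table of counts and estimates a p-value by shuffling the labels (a permutation test, swapped in here so that no statistics library is needed). The counts are made up for illustration, not taken from the AJ data, and the function names are my own.

```python
import random

def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a table of counts (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if row and column were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

def permutation_p_value(table, n_shuffles=2000, seed=0):
    """Share of label shuffles whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed_stat = chi_squared_statistic(table)
    # Expand the table into one (row label, column label) pair per respondent.
    row_labels = [i for i, row in enumerate(table) for count in row for _ in range(count)]
    col_labels = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(col_labels)  # break any link between the two variables
        shuffled = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            shuffled[i][j] += 1
        if chi_squared_statistic(shuffled) >= observed_stat:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical counts: rows are "newer" and "older" members of the fandom,
# columns are three self-reported orientation categories.
example_table = [[30, 40, 30], [20, 45, 35]]
print(chi_squared_statistic(example_table))
print(permutation_p_value(example_table))
```

A small p-value would suggest the two groups really do differ in their mix of orientations; a large one means the apparent difference is the sort of thing shuffling alone produces. Either way, this is the kind of check the article never reports.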

Some data points on the distribution make it obvious that you can't take anything for granted with this graph without a significance test, as I've highlighted in the image below, and these are just the worst offenders:

The number of heterosexuals can increase by about half in 3 years across the distribution (years 8-11) but apparently this isn't worthy of discussion. The number of pansexuals can more than double in a 1 year span and then drop off but this isn't worthy of discussion. There has been no critical analysis of the reliability of the dataset anywhere in the article. There has been no actual analytical test to check the statistical significance of the dataset anywhere in the article. This is a very basic principle of using any dataset like this.

Without testing whether the distribution could have occurred by chance, why would I accept your hypothesis? By convention, a scientist will want a p-value of 0.05 or less, meaning that data at least this extreme would turn up less than 5% of the time if chance alone were at work. Simply looking at the distribution, I'd posit that the data would likely fail this test.
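To illustrate how easily chance alone produces swings like the ones I highlighted, here is a small simulation sketch. It assumes the true share of one orientation is identical in every year-in-fandom group and that the groups shrink as the years go up. Both the share and the group sizes are invented for the example; I don't have the article's actual sample sizes.

```python
import random

def simulate_yearly_shares(true_share, respondents_per_year, seed=0):
    """Observed share per group when the true share is the same in every group."""
    rng = random.Random(seed)
    shares = []
    for n in respondents_per_year:
        # Each simulated respondent independently falls in the category
        # with probability true_share.
        count = sum(1 for _ in range(n) if rng.random() < true_share)
        shares.append(count / n)
    return shares

# Hypothetical setup: a constant 30% share, with fewer respondents in the
# groups that have been in the fandom longest.
group_sizes = [200, 150, 100, 60, 40, 25, 15]
for size, share in zip(group_sizes, simulate_yearly_shares(0.30, group_sizes)):
    print(f"n={size:3d}  observed share={share:.2f}")
```

Try a few different seeds. The smaller groups bounce around noticeably even though nothing about the underlying population changes, which is exactly why a line on a graph needs a significance test behind it.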

Even if you showed that the distribution is not random, you would still have to quantify the strength of the relationship with further tests, for example by establishing a correlation coefficient.

Look, ma! No hands control!

For any scientific test, you really need a control sample. Re-evaluation of one's sexuality occurs all the time in the general population; running something like a Mann-Whitney test against a data sample from the general population is basically essential to establish that it differs at all from other populations.
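As a rough sketch of what that comparison could look like, here is a Mann-Whitney U test written in plain Python. It uses the normal approximation with a tie correction and no continuity correction, so it is only reasonable for samples that are not tiny. The scores are invented: I'm assuming each respondent gives some ordinal score (say, how far their self-reported orientation has shifted on a numeric scale), which is not something the AJ survey necessarily collects.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value) via the normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Positions i..j-1 hold equal values: give each the average rank.
        midrank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    # Variance of U under the null hypothesis, reduced to account for ties.
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # every value identical: no evidence of a difference
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value

# Hypothetical ordinal scores for a fandom sample and a general-population control.
fandom_scores = [2, 3, 1, 4, 2, 3, 5, 2, 3, 4]
control_scores = [1, 2, 1, 3, 2, 1, 2, 3, 1, 2]
print(mann_whitney_u(fandom_scores, control_scores))
```

Without a control sample there is nothing to pass as the second argument, which is the whole problem.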

Lack of alternative hypotheses

They present no alternative hypotheses that don't involve the fandom. There are a couple of obvious ones:

  • Furries who have been in the fandom for a shorter period of time are by definition going to be younger on average than those who have been in the fandom for longer; it's established that many furries first join the fandom in their teenage years. The author's own hypothesis asserts that the fandom doesn't "turn" people gay or bi, just makes them re-evaluate their sexuality. If these younger furries are in their teens, it stands to reason that they may re-evaluate their sexuality anyway.
  • Since this is not a longitudinal study but based on a question asking someone to report how long they've been in the fandom for, it is impossible to determine whether they have re-evaluated their sexuality during their time in the fandom based on this data, or whether the growth and greater attention the fandom has received recently has attracted a greater proportion of heterosexuals to the fandom among younger cohorts.

Spurious claims

"The trend is almost certainly starker than the chart shows."
Says what statistical test?

"It’s safe to conclude that more than half of the heterosexual furries coming into the community will change their sexual preference."
Even if you were to prove that this is not a random distribution and establish a correlation coefficient, correlation does not imply causality, as I've demonstrated with my above alternative hypotheses.


This analysis is based entirely on a single data visualisation with no statistical testing whatsoever. The conclusions reached by the author of the article are completely pseudo-scientific: even assuming, for the sake of argument, that the data is non-random and has a significant correlation coefficient, the author still draws the spurious conclusion that correlation implies causality.