Any research data (survey or otherwise) should be better understood by people in tech.
While I was watching the #ZeitDay live feed, one of the speakers had a slide that showed all of the things that go into their development process:
Our seeming lack of ability to meaningfully interpret research data was still on my mind, and my immediate train of thought went something like this:
“This is a great list. Wow that’s a lot of things to know! Yay us. Wait, not yay us. How is it possible that we know how to do all of these things (including Accessibility, ahem) and still not understand how to interpret research results?”
Since developers tend to live in echo chambers (sometimes out of willful ignorance, other times because we have too much to do already; refer to the list above), it’s vital that we are able to think clearly when presented with survey data. Now, I’m not claiming to have this process down perfectly, but I do have a grasp of the basics. My spouse is a research scientist (biomedical engineering), and we discuss this type of thing frequently.
Here are the steps I have found useful for the critical thinking process:
- Understand the context
- Understand the collection methods
- Understand the backers
- Understand why the research was even done in the first place
Understanding the context
Since a survey is only made up of the people who respond to it, let’s examine ways we could understand the context more completely. For the purposes of this guide, we will be examining a claim:
“The survey data says that 50% of people who use X do Y.”
Who took the survey?
Do you have a realistic estimate of how many people the survey was meant to reach, compared with the number of folks who actually filled it out?
Let’s compare two (of many) scenarios that might apply to this claim:
Scenario A
- if we can approximate that 10,000 people use X
- we know 1,000 people took the survey
- the survey represents 10% of all users
- the claim (“50%”) is backed by answers from only 5% of all users
- Ask yourself: if 500 people (50% of 1000) answered this way, out of the 10,000 total, do I think this is a statement that has enough significance for me to care about?
Scenario B
- if we can approximate that 10,000 people use X
- we know that 8,000 people took the survey
- the survey represents 80% of all users
- the claim (“50%”) is backed by answers from 40% of all users
- Ask yourself: if 4,000 people (50% of 8,000) answered this way, out of the 10,000 total, do I think this is a statement that has enough significance for me to care about?
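Here is the arithmetic behind both scenarios as a small Python sketch. The function name `survey_coverage` and its parameter names are made up for illustration; the numbers are the ones from Scenario A and Scenario B.

```python
def survey_coverage(total_users, respondents, claim_share):
    """Put a survey claim in the context of the whole user base.

    All names here are hypothetical. total_users is an approximation,
    exactly as in the scenarios above.
    """
    response_rate = respondents / total_users        # share of all users who took the survey
    people_behind_claim = respondents * claim_share  # respondents who answered "Y"
    share_of_all_users = people_behind_claim / total_users
    return response_rate, people_behind_claim, share_of_all_users


scenario_a = survey_coverage(total_users=10_000, respondents=1_000, claim_share=0.5)
scenario_b = survey_coverage(total_users=10_000, respondents=8_000, claim_share=0.5)

for name, (rate, people, share) in [("A", scenario_a), ("B", scenario_b)]:
    print(f"Scenario {name}: {rate:.0%} responded, "
          f"{people:.0f} people behind the claim, {share:.0%} of all users")
```

Same headline number, very different coverage: 5% of all users in Scenario A versus 40% in Scenario B.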
However, this alone isn’t enough information to determine whether or not the data should be considered valid, because it *is* possible to get a small sample that fairly represents the whole. So let’s look at some of the other variables.
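To put a rough number on how a small group can still represent the whole, here is a sketch of the textbook 95% margin of error for a proportion, with a finite population correction. It rests on a big assumption that most developer surveys do not meet: respondents were picked at random from the whole population. The function name and the default z-value of 1.96 are my additions for illustration, not something taken from any particular survey.

```python
import math


def margin_of_error(population, sample, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    ASSUMPTION: respondents were chosen at random. For a self-selected
    survey this number says nothing about bias.
    z=1.96 corresponds to a 95% confidence level.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample)
    # Finite population correction: shrinks the error as the sample
    # approaches the size of the whole population.
    fpc = math.sqrt((population - sample) / (population - 1))
    return z * standard_error * fpc


moe_a = margin_of_error(population=10_000, sample=1_000)
moe_b = margin_of_error(population=10_000, sample=8_000)
print(f"Scenario A: +/- {moe_a:.1%}")
print(f"Scenario B: +/- {moe_b:.1%}")
```

Under that random-sampling assumption, even 1,000 respondents out of 10,000 land within roughly three percentage points, which is why how the respondents were obtained matters more than the raw count.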
Understanding the collection methods
- Research should be reproducible. If the collection methods are not available, disregard the study entirely.
- Who can you contact? There should be someone you can contact if you have concerns about the data or the research. If no one is listed, it shouldn’t be considered a viable study.
- How did they obtain survey respondents? This matters. Let’s consider our example claim, and think about places where survey respondents could have been obtained (such as coding bootcamps, social media, meetup groups, users of a specific website, etc.)
- Did they obtain a survey population that accurately represents the whole? Again, it’s possible to have a small sample of respondents that fairly represents the whole population, and it’s important to be up front about that.
- Did they apply some sort of data normalization to correct for who responded? If not, did they acknowledge that? For example, a group of engaged coding bootcamp participants is more likely to answer a technical survey than a group of developers in a large enterprise organization.
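As a sketch of what that kind of normalization can look like, here is a simple reweighting by group (sometimes called post-stratification). Every number below is invented for illustration, including the two groups and their shares of the population; a real study would need a defensible source for those shares.

```python
def raw_and_weighted_share(groups):
    """Compare the raw survey result with a population-weighted one.

    Each group is a dict with hypothetical keys:
      respondents      - how many people from this group answered
      share_doing_y    - fraction of those respondents who do Y
      population_share - fraction of the real population in this group
    """
    total_respondents = sum(g["respondents"] for g in groups)
    # Raw result: every respondent counts equally, so over-represented
    # groups dominate the headline number.
    raw = sum(g["respondents"] * g["share_doing_y"] for g in groups) / total_respondents
    # Weighted result: each group counts according to its real-world size.
    weighted = sum(g["population_share"] * g["share_doing_y"] for g in groups)
    return raw, weighted


# HYPOTHETICAL data: bootcamp participants answer eagerly, enterprise developers rarely do.
groups = [
    {"name": "bootcamp", "respondents": 600, "share_doing_y": 0.7, "population_share": 0.1},
    {"name": "enterprise", "respondents": 400, "share_doing_y": 0.2, "population_share": 0.9},
]

raw, weighted = raw_and_weighted_share(groups)
print(f"raw: {raw:.0%}, weighted: {weighted:.0%}")
```

With these made-up numbers, the raw result matches our example claim of 50%, but weighting each group by its share of the population gives 25%.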
Understanding the backers
- Who funded this work? If you can’t figure that out, throw away the entire study. Scientific research is required to say who funded the study in the paper (it’s usually the last section of the paper, or in the footnote if it’s printed in a journal). It usually goes hand-in-hand with a conflict of interest statement. In general, if it’s the most amazing thing you’ve ever read, but you can tell that the person who funded the research will somehow profit from these results, then you should be careful when interpreting the results.
- Who benefits from this work? It’s easy to tweak any results to reflect what we want them to say.
- Who did the research? If the research was meant to be, say, a study about the community at large, but then only has committee members from a specific circle of that same community, assume implicit bias.
Understanding why
- Why was this research done in the first place? If it was a survey, what was the survey trying to determine? What outcomes or goals do they have? Is this research meant to only target an internal community, or a larger group of technologists?
- Are the reasons likely to produce a viable set of results? Consider some potential goals:
  - We want to show that X is more popular than Y
  - We want to demonstrate a new technology & someone had the idea that doing a survey and putting those results on a website would be a fun thing
  - We do an annual survey
Concluding thoughts
These are my own reasons for writing this guide:
- I am disappointed with a few surveys that have gone around, such as this one and this one.
- A tweet
- Another tweet
- because this happens way too often
- because this happens way too infrequently
- because it’s important to keep learning and be part of a community who continues to learn
- because there will be a test
Here are some resources for more learning that I found interesting:
- Redefining statistical significance: https://psyarxiv.com/mky9j/?_ga=2.29887741.370827084.1500902659-399963933.1500902659
- The Vox article about it: https://www.vox.com/science-and-health/2017/7/31/16021654/p-values-statistical-significance-redefine-0005
- John Oliver’s realistic-but-funny take on Scientific Studies: https://www.youtube.com/watch?v=0Rnq1NpHdmw
- Test your implicit biases: https://implicit.harvard.edu/implicit/takeatest.html
We could do better in this area, and we can! Consider this: it may take a person less time to get up to speed on interpreting research results than it would take to figure out how to use that shiny new JS framework. Or something.