Today, I’ve been trying to make notes on the section of my dissertation entitled ‘methodologies’. Frankly, it hasn’t gone well.
Methodologies aren’t necessarily the same as methods. In a chemistry experiment, the ‘method’ states each step in the process of setting up and carrying out an experiment. You have to do that so that your process is explicit, can be repeated by anyone, and can be critiqued by others who may feel that some part of the method was flawed, or could perhaps have been refined. After all, it’s the method that facilitates the outcome, and that’s the bit everyone is interested in. For the type of research I’m doing, there are basically two methodologies I can use and, just like the school chemistry experiment, I have to explain them and justify them on academic grounds – why am I using them, and what will they offer me as a way of testing my hypothesis and demonstrating the results I’m hoping for?
The first one is qualitative research. This often involves things like interviewing people, carrying out research, and/or reviewing a collection of research papers covering a particular area. It’s subjective, and the results are often largely down to the skill of the researcher in gathering appropriate and relevant evidence (i.e. designing the study effectively in the first place) and in interpreting the results. Part of my research is reviewing research papers around the topic of online harassment, flaming or trolling, however you want to term it. I intend to use this to demonstrate the scope of the problem of misogynist abuse aimed at women who use the web.
My literature review (and I’m not clear if this is also part of the methodology) will also include some reading around the topic of language as an indicator of culture(s), cultural norms, beliefs and attitudes. My position is that the way we use language encodes how we feel about certain things; this was the subject of my previous blog post.
Finally, the methodology includes data analysis, making my research officially a ‘mixed method’ approach. The data will be a sample of tweets from Twitter that contain one or more of a series of keywords, all of which have been gleaned from the qualitative review of papers on ‘e-bile’ (a very handy and appropriate term that I’ve borrowed from a key research paper). This is the part I’m having the most issues with.
Unless you want to pay a commercial company for access to a set of data that contains every tweet (well, nearly every one) over, say, the period of a week, and that’s a lot of money, you have to settle for accessing a sample of the Twitter stream. The key word there is ‘sample’. Twitter will give anyone access to a 1% sample of the Twitter stream based on keywords, but how the sample is generated is, in effect, a secret. Straight away, to a serious researcher, it’s unreliable. Researchers have cottoned on to this, and have spent some time (and money) exploring ways of assessing a) how biased the streaming sample is compared with a 1% sample of ‘everything’, and b) what other researchers can do to mitigate the bias. Now, I can see how bias would arise if you’re searching for all tweets that use a hashtag. If, for example, the hashtag is one that relates to an event in Australia, then you’re unlikely to generate much of a sample if you query the Twitter stream in UK time. Actually, it’s not as easy as you’d think to find the geographic location (geotags) of most tweets. Not all of them contain a) geotags, or b) the correct geotags. The real problem, though, is the fact that the sample is generated using a parameter or parameters chosen by you, the researcher, which means that you have absolutely no idea if the results are representative of what’s actually going on on Twitter, because you have no background ‘norm’ to measure them against.
The second problem is one of semantics, or the actual words used if you’re searching for keywords, as I will be. As an example, if I decide to select tweets on the basis that they’re using the word ‘slut’, I’ll get ‘what a slut’ AND ‘calling someone a slut is wrong’. Both are valid, but the sentiment expressed in each is very different. It is now possible to create sophisticated sentiment analysis models that will look at how keywords are used in combination with other words, and attach a numerical ‘weighting’ to the word ‘slut’ in the context of the whole tweet, which is brilliant, but beyond the scope of what I can do for my research right now. Also, consider the use of the word ‘rape’. Gamers use the word all the time to mean ‘beaten’ or ‘thrashed’ in the sense that they won a game by a considerable margin. In that context, the word has a different meaning, although I would argue that the attitude encoded within the use of the word is an indicator of unconscious bias against women.
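To make the keyword problem concrete, here is a minimal sketch of a plain keyword filter in Python. It is not the collection tool I’ll actually use; the keyword list, the function name `contains_keyword` and the example tweets are assumptions for illustration only. It shows that a whole-word match picks up both of the example tweets above and cannot tell them apart.

```python
import re

# Hypothetical keyword list; the real one would come from the literature review.
KEYWORDS = ["slut"]

def contains_keyword(text, keywords=KEYWORDS):
    """Return True if any keyword appears in the text as a whole word."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return pattern.search(text) is not None

# Made-up example tweets, taken from the two phrases discussed above.
tweets = [
    "what a slut",
    "calling someone a slut is wrong",
    "lovely weather today",
]

# Both keyword tweets are selected, even though their sentiment differs.
matched = [t for t in tweets if contains_keyword(t)]
print(matched)
```

The filter only answers ‘does the word appear?’, so separating abuse from criticism of abuse still has to happen afterwards, either by a sentiment model or by a human reader.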
Given that 350,000 tweets are sent every minute, I get why access to the entire stream (also known as the firehose) is limited. The maximum number of tweets I can get, based on keywords I select, is 18,000, although it’s unlikely I would actually get that many as the number I get will be based on the 1% sample of the Twitter stream, and I certainly hope not all of them will contain the keywords I’m looking for. Oh, and let’s not forget that making one misogynist remark doesn’t mean the person who made it is a misogynist. Ideally, the individual behind the tweet should be the subject of some more research over time.
The fact is, I’m going to have to read the tweets that I collect myself. I don’t have the time to learn how to use sentiment analysis in any meaningful way for my Masters thesis, and so I’m going to have to select the tweets that use misogynist language myself, and perhaps rank them in some way for the purposes of creating a visual representation. This is fine, and I can make my ranking criteria explicit. What I can’t do is deal with 8,000 or so tweets, which, if the examples of data sets I’ve seen recently are anything to go by, is what I’ll be looking at; and of course every one of those tweets will have been selected because it contains one of my keywords. Of course, the irony here is that although the data is quantitative, the way the data is being assessed is qualitative.
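One way to get from thousands of collected tweets to a number I can realistically read is to draw a random subsample with a fixed seed, so that anyone repeating the process gets the same selection. The following is only a sketch under assumptions: the function name `draw_sample`, the seed value and the placeholder data are all hypothetical, and it uses nothing beyond Python’s standard `random` module.

```python
import random

def draw_sample(tweets, n, seed=42):
    """Return n tweets chosen uniformly at random, without replacement.

    A fixed seed makes the selection repeatable. If fewer than n tweets
    are available, all of them are returned.
    """
    rng = random.Random(seed)
    if n >= len(tweets):
        return list(tweets)
    return rng.sample(tweets, n)

# Placeholder data standing in for roughly 8,000 collected tweets.
collected = [f"tweet {i}" for i in range(8000)]

# The subset that would then be read and ranked by hand.
to_read = draw_sample(collected, 1000)
print(len(to_read))
```

Because the selection is random rather than hand-picked, the choice of which tweets get read cannot be influenced by what they say, and the seed can be stated in the write-up alongside the ranking criteria.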
So, my questions thus far are as follows:
- Should I collect all tweets, based on my keywords, in batches over a period of a week so that I get a good spread of days and times? Why / why not?
- How will that benefit me from the point of view of an academic, quantitative research methodology, and how can I demonstrate the benefits if they exist?
- How many tweets should I collect per batch?
- I want to end up with a sample of, say, 1000 tweets as a case study. Will this be sufficient for me to conclude – as the sample data I’ve generated suggests I’ll be able to – that the use of misogynist language is commonplace among users? How will I ‘prove’ that I have a methodology that’s robust enough to stand up to scrutiny?
- Am I even asking the right questions in the right way?
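On the question of whether 1000 tweets is enough, one standard starting point is the margin of error for a proportion estimated from a simple random sample. The sketch below uses the usual normal approximation at a 95% confidence level (z = 1.96); the function name `margin_of_error` is my own. The big assumption is that the 1000 tweets are a simple random sample of the set I want to describe, which, given how little is known about how the streaming sample is generated, only holds for the tweets I actually collected and not for Twitter as a whole.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a random sample of size n.

    Uses the normal approximation; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Worst case is p = 0.5, which gives the widest interval for a given n.
moe = margin_of_error(0.5, 1000)
print(round(moe, 3))
```

For 1000 tweets the worst-case margin is roughly plus or minus 3 percentage points. That describes the proportion of collected, keyword-matching tweets that I judge to be misogynist; it says nothing by itself about how common such language is among users in general.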