Imagine you’re in a hotel room, or some other building. It’s a tall building, so you have a panoramic view of the city or mountainous landscape below you. In one hand you have a piece of A4-sized cardboard, and in the other, a pin. Use the pin to make a hole in the middle of the cardboard, and then hold the cardboard up against the window. Get up close, and look through the pinhole. What can you see now? A tiny fraction of the total view that was available to you just a moment ago.
That’s what getting data from Twitter is like. You end up with a tiny fraction, and no idea what the rest of the landscape that the data represents looks like. What if your pinhole view was trained on the distant mountains, and completely blotted out the city that makes up most of the rest of your view? You’d assume you were in the Alps until you had the chance to widen your view. That perfectly demonstrates the other problem with using Twitter data: you have no idea whether the tiny fraction you’re seeing is representative of the whole picture. And, to make matters worse, as the number of active Twitter users grows, your pinhole effectively gets smaller, because the landscape is becoming denser. Furthermore, some bits of your view (the view you get when you look out of the window as normal) are empty. Large swathes of the global population aren’t on Twitter at all, and you’ll only be able to see the middle- and upper-class districts. Most of the poor urban suburbs are behind you, out of sight, because in general the poor can’t afford internet access often enough to justify using social networks much, if at all.
So what can you do? Well, you can make lots more pinpricks in your cardboard and try to build a more representative picture that way, but in reality the number of holes you can make is limited, and if you happen upon one of those blank areas, you’ll see nothing. And you’re still not seeing what’s behind you. You could get another piece of cardboard, put it up against a different part of the window, and try there. Again, the number of holes is limited, but the picture is a tiny fraction better, a bit like looking at both Twitter and a Facebook page.
Let’s consider the time of day as well. If you look in the morning, at dawn, the view will be very different from midday or ten o’clock at night. So you make as many holes as you can, and hold your cardboard up at different times. Your picture is improving, but that’s probably as good as it’s going to get. And there are still blanked-out areas, and the areas behind you where the socially disadvantaged live, along with the older people who can’t engage with technology.
The very best you can do, then, is to assume that the picture you can piece together, using as many pinpricks as possible and gathering data at different times of day, is as good as it gets, and to extrapolate from that.
And I think that’s pretty much how astronomy works, too, not just social media research.