It’s quite late on Tuesday evening – 11.45pm – but if I don’t get this blog out I’m going to be even more behind so it has to be done!
Today has been a full-on day. There’s been endless sitting around listening to talks on the ways data from the web has been, or can be, harvested and analysed to reveal various things. This started at 10am and didn’t finish until nearly six, when it was on to a posh hotel nearby for a meal organised by our hosts, followed by a few beers back at the bar in our hotel. The talks were, on the whole, very interesting, although they can be hard to follow at times as the distinguished lecturers were, with two exceptions, from China, Korea or Singapore. Tuning your ear in to pick up words said with a pronounced accent is hard, but it can be mastered quite quickly if you concentrate. I’ll blog more about the speakers later when I have more time, but suffice to say sitting down for such a long time is quite tiring, and I wasn’t the only one who nodded off for a few minutes…
The university campus is really spread out. There’s even a bridge over a viaduct built, I presume, to deal with excess water (I’ve added new images to the Shenzhen page of this blog). Incidentally, the weather yesterday was dry and sunny, more like late summer in the UK than winter. We had a brief welcome yesterday morning, before the projects we were expected to work on were outlined, and we chose which group we wanted to work with. I chose a group led by a professor from Singapore, which was tasked with looking at Twitter data gathered in the run-up to the elections in India. (I can’t remember his name and I’m too tired to get out of bed and retrieve the programme from my bag. I’ll tell you tomorrow.)
The professor was very clear about the research question and the tasks we should undertake to address it. I’ve only participated in one other event like this, and in that instance we were given sets of data and left to come up with our own ideas and analytical processes. Given that I was the only English-speaking student in the group, though, it probably wasn’t a bad idea to have some direction. Our biggest problem was getting the data itself. As some of you may know, the Great Firewall of China restricts access to huge chunks of the web, and even when access is available, the filters make it extremely slow to use. It took us a good couple of hours to download the data sets from Singapore, where they were held, and when I finally got a look at them I could see that it was going to take a lot of work before they were in any kind of state to work with.
I don’t have the skills to strip out the useful data myself, but one of the Korean students provided me with sub-sets of it, and I confess I spent some time on my laptop today (yes, during the talks) playing with Python code. I managed to clean one of the sub-sets up so that it’s almost in a fit state to work with. I’m not sure how many tweets make up the entire data set, but the sub-set I worked on consisted of 44,000 tweets, and that’s just one of 18 sub-sets. Excel spreadsheets are useless with file sizes like that.
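For anyone curious what “cleaning” tweets involves, here is a minimal sketch of the general idea. It is not the actual script from the workshop: the function names, the regular expressions and the sample tweets are all made up for illustration, and it assumes each tweet is just a plain string of text.

```python
import re

# Hypothetical patterns, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+")   # anything that looks like a web link
WHITESPACE_RE = re.compile(r"\s+")     # runs of spaces, tabs or newlines


def clean_tweet(text):
    """Remove links and tidy up the whitespace in one tweet."""
    text = URL_RE.sub("", text)
    text = WHITESPACE_RE.sub(" ", text)
    return text.strip()


def clean_tweets(tweets):
    """Clean every tweet, dropping empty ones and exact duplicates.

    The original order of the remaining tweets is preserved.
    """
    seen = set()
    cleaned = []
    for raw in tweets:
        text = clean_tweet(raw)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


# Made-up sample data, not taken from the real data set.
sample = [
    "Big rally today!   http://example.com/abc",
    "Big rally today! http://example.com/xyz",
    "   ",
    "Polling\nstarts  soon",
]
print(clean_tweets(sample))
```

Run on the made-up sample, the two “rally” tweets collapse into one once their links are stripped, and the blank entry disappears. A real data set would need a good deal more than this (odd encodings, retweets, missing fields), but it gives a flavour of why a few lines of Python beat a spreadsheet at this scale.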
Anyway, in response to a Facebook comment, here’s a look at part of yesterday’s breakfast, followed by some of last night’s dinner…
Some more observations:
- Everyone drives like a lunatic.
- The price of stuff here is, generally, pretty similar to the UK. Except the cost of a taxi, which is very cheap. Presumably this is to compensate you for the white-knuckle ride you’re going to have.
- Clearly, ‘ethics’ is a word unknown in this part of the world.
Finally, we have a whole day to ourselves to explore Hong Kong, so if anyone has any ‘must see’ suggestions, I’d love to hear them!