As some of you know, I’m in the final year of my PhD in Web Science. For whatever reason, I decided I’d learn a whole load of new stuff from the ground up. In my 50s. With zero knowledge to start with except some very basic maths. I needed to learn to write code, and although my MSc year included a module on writing code in Python, it did nothing more than get me familiar with what code actually looks like on the page.
I cried every Sunday night, prior to the workshop on Monday, because I just couldn’t see how to make things work.
Today, over two years on, I get it. I can write it (although I still have to refer to a book or previous code I’ve written as a reminder) and my ability to think logically has improved considerably. During that time, I’ve amassed a range of books and URLs that have been, and still are, incredibly useful. It’s time to share and provide myself with a post of curated resources at the same time.
First of all, you absolutely need a pencil (preferably with a rubber on the end), some coloured pens if you’re a bit creative, and plenty of A3 paper. Initially, this is just for taking notes, but I found them incredibly useful further along when I wanted to write out the task that I needed my code to carry out, step by step.
Post-it notes – as many colours and sizes as you fancy. Great for scribbling notes as you go, acting as bookmarks, and if you combine them with the coloured pens and A3 paper, you can make a flow chart.
Codecademy is a good place to start. It takes you through the basics step by step, and helps you to see both what code looks like on screen and how it should be written. There are words that act as commands, e.g. print, while, for etc., that appear in different colours so you can see you’ve written something that’s going to do something, and you can see straight away that indents are important, as they signal which lines belong together and the order in which tasks are carried out (indents act like brackets in maths).
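To make the point about indents concrete, here is a minimal sketch. The list of words and the variable names are made up purely for illustration; the only thing to notice is how far each line is pushed in from the left margin.

```python
# A made-up list of words, purely for illustration.
words = ["pencil", "paper", "post-it"]

long_words = []
for word in words:
    # Indented once: these lines run once for every word in the list.
    if len(word) > 5:
        # Indented twice: this line only runs when the 'if' test is true.
        long_words.append(word)

# Back at the left margin: this runs once, after the loop has finished.
print(long_words)
```

Moving the last line in by one indent would make it print on every pass through the loop instead of once at the end, which is the "brackets in maths" effect in action.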
Just about every book that covers writing code includes a basic tutorial, but one that I bought and still keep referring back to is Automate The Boring Stuff With Python. By the time you get here, you’ll be wanting to start writing your own code. For that, I recommend you install Anaconda which will give you a suite of excellent tools. Oh, and I use Python 3.6.
Once you’ve opened Anaconda, Spyder is the basic code editor. I also use the Jupyter Notebook a lot. I like it because it’s much easier to try out code bit by bit, so for example when I’m cleaning up some text data and want to remove white space, or ‘new line’ commands, I can clear things one step at a time and see the results at the end of each one. You can do the same using Spyder, but it isn’t as easy.
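As a rough sketch of that one-step-at-a-time clean-up, each step below could sit in its own notebook cell so you can look at the result before moving on. The sample text and variable names are invented for the example, and this is just one way of doing it using Python's built-in re module.

```python
import re

# A made-up scrap of messy text, standing in for a scraped blog post.
raw_text = "  My first post\n\nIt was   a long\tweek.\r\n  "

# Step 1: swap 'new line' and tab characters for plain spaces.
no_newlines = re.sub(r"[\r\n\t]+", " ", raw_text)

# Step 2: squeeze runs of two or more spaces down to a single space.
single_spaced = re.sub(r" {2,}", " ", no_newlines)

# Step 3: trim any space left at the very start and end.
clean_text = single_spaced.strip()

# repr() shows the string with its quotes, so stray spaces are easy to spot.
print(repr(clean_text))
```

Printing each intermediate variable with repr() after its step is a handy way to check that the step did what you expected.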
I’m going to list some books next, but before I do I should mention FutureLearn. I have done several of the coding courses – current ones include ‘Data Mining With WEKA’, ‘Advanced Data Mining With WEKA’ and ‘Learning To Code For Data Analysis’. While these may not cover exactly what you have in mind to do (more on that in a minute), they will all familiarise you with gathering data, doing things with the data by writing code, and visualising the results. They also help to get you thinking about the whole process.
I had a series of tasks I needed code to do for me. In fact, I think the easiest way to learn how to write code is to have something in mind that you want it to do. I needed to be able to gather text from blog posts and store it in a way that would make it easily accessible. Specifically, I needed to store the content of a blog post, the title of the post and the date it was published. I later added the URL, as I discovered that for various reasons sometimes the title or the date (or both) were missing, and that information is usually in the URL. I then identified various other things I needed to do with the data, which led to identifying more things I needed to do with the data... and so on. This is where I find books so useful, so here’s a list:
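Before the list, a minimal sketch of what storing a post and filling the gaps from its URL might look like. This is not the code I actually used: the BlogPost class, the fill_gaps_from_url function and the assumed URL layout (year/month/day followed by the words of the title) are all hypothetical, chosen because many WordPress blogs build their URLs that way.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlogPost:
    """One stored post; title and date may be missing after scraping."""
    url: str
    content: str
    title: Optional[str] = None
    date: Optional[str] = None


# Assumed URL layout: .../YYYY/MM/DD/words-of-the-title/
URL_PATTERN = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/([^/]+)/?$")


def fill_gaps_from_url(post: BlogPost) -> BlogPost:
    """Fill in a missing date or title using the pieces of the URL."""
    match = URL_PATTERN.search(post.url)
    if match is None:
        # The URL does not follow the assumed layout, so leave the post alone.
        return post
    year, month, day, slug = match.groups()
    if post.date is None:
        post.date = f"{year}-{month}-{day}"
    if post.title is None:
        # Turn 'words-of-the-title' into 'Words of the title'.
        post.title = slug.replace("-", " ").capitalize()
    return post


post = BlogPost(
    url="https://datasciencelab.wordpress.com/2013/12/12/clustering-with-k-means-in-python/",
    content="(the text of the post would go here)",
)
fill_gaps_from_url(post)
print(post.date, "|", post.title)
```

A title rebuilt this way is only an approximation (hyphens and capital letters in the original title are lost), so it is best treated as a fallback rather than a replacement for the real thing.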
- Mining The Social Web, 2nd Edition. The code examples given in this book are a little dated, and in fact rather than write the code line-by-line to do some things, you’d be better off employing what I’ll call for the sake of simplicity an app to do it for you. It was the book that got me started, though, and I found the simple explanations for some of the things I needed to achieve very useful.
- Data Science From Scratch. I probably should have bought this book earlier, but it’s been invaluable for general information.
- Python For Data Analysis, 2nd Edition. Again, good for general stuff, especially how to use Pandas. Imagine all the things you can do with an Excel spreadsheet, but once your sheet gets large, it becomes very difficult to navigate, and calculations can take forever. Pandas can handle spreadsheet-style stuff with consummate ease and will only display what you want to see. I love it.
- Programming Collective Intelligence. This book answered pretty much all the other questions I had, but also added a load more. It takes you through all sorts of interesting algorithms and introduces things like building classifiers, but the main problem for me is that the examples draw on data that has already been supplied for you. That’s great, but like so many other examples in all sorts of other books (and on the web, see below) that’s all fine until you want to use your own data.
- This book began to answer the questions about how to gather your own data, and how to apply the models from the books cited above: Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data. This book has real-world examples which were relatively easy for me to adapt, as well as straightforward explanations as to how the code works.
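To give a flavour of the spreadsheet-style work that Pandas handles, here is a small sketch. The rows, column names and numbers are all made up for illustration; it assumes Pandas is installed (it comes with Anaconda).

```python
import pandas as pd

# Made-up rows standing in for a table of blog posts.
posts = pd.DataFrame(
    [
        {"title": "First day", "date": "2017-01-09", "words": 420},
        {"title": "Half term", "date": "2017-02-13", "words": 150},
        {"title": "Exam week", "date": "2017-02-20", "words": 930},
    ]
)

# Display only what you want to see: the longer posts, and just two columns.
long_posts = posts.loc[posts["words"] > 400, ["title", "words"]]

# A spreadsheet-style calculation across a whole column.
average_words = posts["words"].mean()

print(long_posts)
print(average_words)
```

The same two lines of filtering and averaging would work unchanged on a table with many thousands of rows, which is where a spreadsheet starts to struggle.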
Finally, some useful web sites. The first represented a real breakthrough for me. Not only did it present a real-world project from the ground up, but the man behind it, Brandon Rose (who also contributed to the last book in my list), is on Twitter and he answered a couple of questions from me when I couldn’t get his code to work with my data. In fact, he re-wrote bits of my code for me, with explanations, which was incredibly helpful and got me started. http://brandonrose.org/ is amazing.
This is the one and only video tutorial I’ve found useful. Very useful, actually. I find video tutorials impossible to learn anything from on the whole – you can’t beat a book for being able to go back, re-read, bookmark, write notes etc. – but this one was just what I needed to help me write my code to scrape blog posts, which are just web pages: https://www.youtube.com/watch?v=BCJ4afDX4L4&t=34s.
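To show what "blog posts are just web pages" means in practice, here is a rough sketch that pulls the title and paragraphs out of a page. It is not the method from the video: it uses only Python's built-in html.parser module, the PostParser class is my own invention, and the sample page is made up so the example runs without fetching anything from the web.

```python
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects the page title and the text of every <p> tag.

    A deliberately simple sketch: it does not cope with tags nested
    inside a paragraph, such as links or bold text.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current_tag = None

    def handle_starttag(self, tag, attrs):
        # Remember which tag we are inside; start a new paragraph on <p>.
        self._current_tag = tag
        if tag == "p":
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        self._current_tag = None

    def handle_data(self, data):
        # Keep text only when it sits inside <title> or <p>.
        if self._current_tag == "title":
            self.title += data
        elif self._current_tag == "p":
            self.paragraphs[-1] += data


# A made-up page standing in for a real blog post.
sample_page = """
<html><head><title>My week in school</title></head>
<body><h1>My week in school</h1>
<p>Monday was busy.</p>
<p>Friday was better.</p>
</body></html>
"""

parser = PostParser()
parser.feed(sample_page)
print(parser.title)
print(parser.paragraphs)
```

Real blog pages are much messier than this, which is why a dedicated library such as Beautiful Soup is the more usual choice once you move beyond a toy example.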
https://datasciencelab.wordpress.com/2013/12/12/clustering-with-k-means-in-python/ and other blog posts.
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/ does what it says, and more.
http://www.ritchieng.com/machine-learning-multinomial-naive-bayes-vectorization/ useful walk-through.
The URLs listed above are quite specific to the project I’ve been working on. I’d also like to add Scikit-Learn, which provided all the apps I’ve been using. The explanations and documentation included on the site were less than helpful, as they assumed a level of knowledge that was, and to a certain extent still is, way above my head. However, what it gave me was the language to use when I was searching for how to write a piece of code. Stack Overflow is the best resource there is for this, and most of my bookmarks are links to various questions and responses. However, it did take me a while a) to learn what form of words would elicit an answer to my problem, and b) to understand the answers. I even tried asking a question myself. Never again. Unless you’re a fully-fledged computer science geek (and if you were, you wouldn’t be here) it’s hostile territory.
Finally, an excellent site that has been useful again and again: DataVizTools.
Going back to Anaconda for a minute, when you’re feeling a bit more confident, have a look at the Orange application. I’ve blogged about it several times, and the blog on the site is an excellent source of information and example projects. The help pages are excellent for all the basic apps, although some of the newer ones don’t have anything yet.
And to finish, a site that I found, courtesy of Facebook, this very morning. This site lets you see how your code works with a visualiser, something I found myself doing with pencil and paper when my code wasn’t doing what it should and I didn’t know why.