Category Archives: My Education

The Problem With Guessing K-Means

I’ve been grappling with the problem of how to find out what a group of professionals blog about. That seems simple enough on the face of it, but when there are over 9,000 blogs in a sample set of data, it’s not so easy. I can’t read every one, and even if I could, can you imagine how long it might take me to group them into topics?

Enter computer science in the form of algorithms.

I’ll gloss over the hours…. days…. weeks of researching how the various alternatives work, and why algorithm A is better than algorithm B for my type of data. Turns out k-means is the one I need.

Put very simply, each blog post (document) is made up of words. Each word is used a certain number of times, both in the document and in the entire collection of documents (corpus). An adjustment must be made for the overall length of the document (a word used ten times in a document of 100 words doesn’t have the same significance as the same word used ten times in a document of 1,000 words), and words that turn up in lots of documents count for less. Once this has been done, it’s possible to give each word in each document a ‘score’ (a weight), and the document’s full set of scores can be treated as a position (or vector) within the corpus.
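To make the idea concrete, here is a minimal sketch using scikit-learn’s TfidfVectorizer (TF-IDF: term frequency, inverse document frequency), which is one common way of doing this weighting. The three example posts are invented. By default the vectoriser scales each document’s row to a length of 1 (norm="l2"), which is one way of handling the adjustment for document length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented example posts, standing in for the scraped blog content
posts = [
    "Students made good progress in maths this term",
    "Progress 8 results for secondary schools in England",
    "Reading for pleasure with primary children",
]

# Words that appear in many posts get lower weights (the IDF part);
# stop_words="english" drops scikit-learn's built-in list of very common words
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)   # sparse matrix: one row per post, one column per word

print(X.shape)                                # (number of posts, number of distinct words)
print(np.linalg.norm(X.toarray(), axis=1))    # every row has been scaled to length 1
```

Each column corresponds to one word in `vectorizer.get_feature_names_out()`, so each post really is a point in a space with one dimension per word.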

It helps to think of the position as a ‘vector’ in a space with a huge number of dimensions (one for every distinct word in the corpus), even if you can’t visualise it, which I can’t. But, having done this, it’s then possible to use k-means. You decide on the number of clusters (k) in advance, and the algorithm picks k starting points at random. Each document is assigned to whichever starting point it’s closest to, and each starting point is then moved to the centre (the average position) of the documents assigned to it. The algorithm does this over and over again, reassigning documents and moving the centres, until the groups stop changing (or until it reaches a maximum number of tries, or iterations, that you set), and then it tells you how many documents it’s put in each cluster.
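The loop k-means repeats can be sketched in a few lines of NumPy. This is a teaching version under my own naming (`kmeans`, `max_iter`, `seed` are hypothetical), not production code: scikit-learn’s KMeans does the same job with a smarter way of choosing starting points (k-means++). Because the distance step builds an n × k × d array, it is only suitable for small examples, not a full TF-IDF matrix.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means on the rows of a dense array X; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    # 1. pick k distinct documents at random as the starting centres
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # 2. assign every document to its nearest centre (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # 3. stop once no document has changed cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 4. move each centre to the mean position of its documents
        for j in range(k):
            members = X[labels == j]
            if len(members):   # an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return labels, centres
```

With a TF-IDF matrix from scikit-learn you would call it as `kmeans(X.toarray(), k=10)`, but only for a small sample.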

In theory, the algorithm should produce the same clusters every time you run it, although that doesn’t always happen, as I found with my data: because the starting points are picked at random, the documents can be grouped differently from one run to the next. The other thing is that, without grouping the set manually, there’s no way of telling what the actual value of k should be, which rather defeats the point of the algorithm…. except that when you’re dealing with large data sets, you’ve got no choice.

Of course, you CAN just keep clustering, adding 1 to your chosen number for k until you think you’ve got results you’re happy with. I started doing that, beginning with 10 and working up to 15, by which time I was totally bored and considering the possibility that my actual optimum number of clusters might be over 100…. Every time I ran the algorithm, the number of posts in each cluster changed, although two were stable. That seemed to be telling me that I was a long way from finding the optimum number.
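One way to make that trial and error less confusing is to fix the random starting points, so that rerunning the same k gives the same answer and any change really comes from changing k. A sketch with scikit-learn’s KMeans; `cluster_sizes` is a hypothetical helper, and `n_init=10` (try ten different starts and keep the best) is a choice rather than a recommendation.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

def cluster_sizes(X, k_values, seed=42):
    """Run k-means once per value of k and return the cluster sizes, largest first."""
    results = {}
    for k in k_values:
        # random_state fixes the random starting points, so reruns are repeatable;
        # n_init=10 tries ten starts and keeps the one with the lowest inertia
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = km.fit_predict(X)
        results[k] = sorted(Counter(labels.tolist()).values(), reverse=True)
    return results

# Random numbers standing in for a TF-IDF matrix
X_demo = np.random.default_rng(0).random((30, 4))
for k, sizes in cluster_sizes(X_demo, range(10, 16)).items():
    print(k, sizes)
```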

Enter another load of algorithms that can help you estimate the optimum number for k. They aren’t a magic bullet – they can only help with an estimation, and each one goes about the process in a different way. I chose the one I did because a) I found the code in a very recent book written by a data scientist, and b) he gave an example of how to write the code AND IT WORKED.

Guess how many clusters it estimated I had? Go on, guess….. seven hundred and sixty. Of course I now have to go back and evaluate the results, but still. Seven hundred and sixty.

Good job I stopped at 15.
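For anyone wanting to try estimating k themselves, here is a sketch of one widely used method, the silhouette score, which measures how much closer each document is to its own cluster than to the next nearest one. It is not necessarily the method from the book mentioned above, and like all of these methods it only gives an estimate; trying every k up to several hundred on thousands of posts would also be slow. `best_k_by_silhouette` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_values, seed=42):
    """Return the k with the highest mean silhouette score, plus every score tried."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        # ranges from -1 to 1; higher means tighter, better-separated clusters
        scores[k] = silhouette_score(X, labels)
    best = max(scores, key=scores.get)
    return best, scores

# Made-up data: three tight, well-separated groups of points
rng = np.random.default_rng(0)
group_centres = np.array([[0, 0], [10, 10], [20, 0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in group_centres])
best, scores = best_k_by_silhouette(X, range(2, 6))
print(best, scores)
```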



Having successfully divided my data set up into separate years yesterday, I thought I’d go back to basics and have a look at stopwords.

In language processing, it’s apparent that there are quite a few words that add absolutely no value to a text. These are words like ‘a’, ‘all’, ‘with’ etc. NLTK (the Natural Language Toolkit, a module that can be used to process text in various ways; you can have a play with it here) has a list of 127 words that could be considered the most basic ones. Scikit-learn (which I’m using for some of the more complicated text processing algorithms) uses a list of 318 words taken from research carried out by the University of Glasgow. A research paper published by them makes it clear that a fixed list is of limited use, and in fact a bespoke list should be produced if the corpus is drawn from a specific domain, as I’m doing with my blogs written by teachers and other Edu-professionals.

Basically, the more frequently a word is used in a corpus, the less useful it is. For example, if you were presented with a database of blogs written by teachers, and you wanted to find the blogs written about ‘progress 8’, that’s what your search term would be, possibly with some extra filtering-words like ‘secondary’ and ‘England’. You would know not to bother with ‘student’, ‘children’ or ‘education’ because they’re words you’d expect to find in pretty much everything. Those words are often referred to as ‘noise’.

The problem is that if the word ‘student’ was taken out of the corpus altogether, and treated as a stopword, that might have an adverse effect on the subsequent analysis of the data.  In other words, just because the word is used frequently doesn’t make it ‘noise’.   The bigger problem, then, is how to decide which of the most frequently used terms in a corpus can safely be removed.  And of course there’s the issue of whether the words on the default list should be used as well.

The paper I referred to above addresses this very problem, with some success.  I’m still trying to understand exactly how it works, but it seems to be based on the idea that a frequently-used word may in fact be an important search term.  And the reason I’ve spent so much time on this is because the old adage ‘rubbish in, rubbish out’ is indeed true, and before I go any further with the data I have, I at least need to understand the factors that may impact the results.

Thinking it through… Part 2

Having had the chance to think about, and articulate, some ideas as to how to deal with my data set, I started dividing it up into blog posts by year. I like using Pandas for Python, although it can be difficult to find help with it that is pitched at the right level. Anyway, I separated out all the years from 2004 to 2017 and saved them in individual .csv files.
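A sketch of how that split can be done in pandas, assuming a text column called ‘Date’ (the column and file names are hypothetical). Each string is parsed on its own because the formats are mixed, and, rather than trust a parser’s guess for a date with no year (such as ‘21st December’), anything without a four-digit year is set aside in undated.csv.

```python
import re
from pathlib import Path

import pandas as pd

YEAR = re.compile(r"(?:19|20)\d{2}")   # any four-digit year from 1900 to 2099

def parse_or_nat(text):
    """Parse one date string; NaT if parsing fails or there is no four-digit year."""
    if not isinstance(text, str) or not YEAR.search(text):
        return pd.NaT
    return pd.to_datetime(text, errors="coerce")

def split_by_year(df, out_dir, date_col="Date"):
    """Write one CSV per year plus undated.csv; return the CSV names in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dates = pd.to_datetime(df[date_col].map(parse_or_nat))
    undated = dates.isna()
    df[undated].to_csv(out_dir / "undated.csv", index=False)
    for year, group in df[~undated].groupby(dates[~undated].dt.year):
        group.to_csv(out_dir / f"posts_{year}.csv", index=False)
    return sorted(p.name for p in out_dir.glob("*.csv"))
```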

Then I had a go at clustering posts from 2017. With ‘only’ 230 blog posts, this was relatively easy in terms of processing using the hardware available on my laptop. I stuck with 10 clusters as I’d used this arbitrary number when I clustered the whole set. I’ll talk in more detail about the results in the next post, but some issues remain to be addressed:

  • What to do with the entries that don’t include the year they were posted.
  • The stop words obviously need sorting out, as I’m getting rubbish like ‘facebooktwittergoogleprintmoreemaillinkedinreddit’ as one of the top terms in a cluster.  Two clusters, in fact.
  • As mentioned in the previous post, some of the titles include ‘posted on’ followed by the date of posting, and/or the category, and sometimes the blog post itself rather than the title. I should probably try and remove the ‘posted on’ from the beginning, and I can probably get rid of the category as well. Following that, the first sentence would probably do as the title.

The big question, though, is this: should I use the data from the entire set as training data for these subsequent subsets? That would probably mean experimenting with different numbers of clusters until I got what looked like a coherent set of topics (which will obviously be down to my own professional judgement and inevitable researcher bias) and labelling them. Or should I subject each subset to the principles of unsupervised learning and see what happens?
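The two options can be sketched with scikit-learn as hypothetical helpers. In the first, the vocabulary and clusters are learned from every post and a year’s posts are then placed into those existing clusters, so a cluster number means the same thing in every year; strictly, this is still unsupervised, with the whole set acting as a shared reference rather than labelled training data. In the second, each year is clustered from scratch, so its cluster numbers can’t be compared with another year’s.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_subset_with_whole_set(all_posts, subset_posts, k, seed=42):
    """Option 1: learn vocabulary and clusters from every post, then place the subset."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X_all = vectorizer.fit_transform(all_posts)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_all)
    # words the whole set never saw are simply ignored by transform()
    return km.predict(vectorizer.transform(subset_posts))

def cluster_subset_on_its_own(subset_posts, k, seed=42):
    """Option 2: treat the subset as a fresh, unsupervised problem."""
    X = TfidfVectorizer(stop_words="english").fit_transform(subset_posts)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
```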

Then there’s presenting my data.  I would like something like this, explained here by the late, great Hans Rosling.

I’m imagining my timeline along the horizontal axis, probably starting around 2004 and finishing with the present.  This will probably be broken down into quarters.  The vertical axis will be the topics discussed, summed up in one or two words if possible.  How cool would that be?
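The table behind a chart like that could be built in pandas. This sketch assumes the dates are already clean and that each post already has a topic label in a hypothetical ‘Topic’ column; `topics_by_quarter` is a hypothetical helper, and the plotting itself is left out.

```python
import pandas as pd

def topics_by_quarter(df, date_col="Date", topic_col="Topic"):
    """Count posts per topic in each calendar quarter: one row per quarter, one column per topic."""
    quarters = pd.to_datetime(df[date_col]).dt.to_period("Q").astype(str)   # e.g. '2016Q1'
    return pd.crosstab(quarters.rename("Quarter"), df[topic_col])
```

With matplotlib installed, calling `.plot()` on the resulting table would draw one line per topic across the quarters.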

Thinking It Through…

This blog is intended to be a record of the things I’ve been thinking about as I’ve looked over a sample of my data.  You might find it a bit boring…. that’s allowed.  You don’t have to read it.

Dealing with Data: Dates

I’m working on a sample of blog post data that I scraped for my PhD upgrade report (and a paper for the Web Science conference that wasn’t accepted, sadly). The data contains ‘just’ 11,197 rows of text data: the contents of each blog post, the date it was posted, and the title of the post. Well, that’s what I wanted when I wrote the code that went through a list of URLs and scraped the data.

A spreadsheet with 12,000 rows is just about manageable, by which I mean you wouldn’t want to print the data out, but you can scroll through and have a look at what you’ve got using Excel.  A sample like this is useful because you can observe the data you’ve gathered, and anticipate some of the problems with it.

The first thing I noticed is that rows 1486 to 2971 appear to be duplicates of the previous batch of rows.  Obviously this has happened because the source URLs have become duplicated.  Now, when I got my first list of URLs together, not all of them could be scraped.  There are several reasons for this:

  • wrong address;
  • URL no longer available;
  • password protected blog;
  • the code simply won’t work on the given URL.

My code stops running when it encounters a URL it can’t access. Up to now, I’ve been manually cutting out the offending URL, and copying it into a separate document that I can look at later. This is the first place an error could be made, by me of course.

Task 1: amend code so that a URL that can’t be processed is automatically written to a separate file, and the code continues to iterate through the rest of the list.

When you’re dealing with around 1000 URLs, as I hope to do, the less intervention by me the better.
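Task 1 could look something like this minimal sketch, which uses only Python’s standard library. The function names and the failed_urls.csv file are hypothetical, and `fetch` can be swapped for whatever scraping code is already in use. Catching every exception is deliberate here, so one bad URL never stops the run; note that a password-protected blog may return an ordinary login page rather than an error, so it would need a separate check.

```python
import csv
from urllib.request import urlopen

def fetch_page(url, timeout=30):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

def scrape_all(urls, failed_path="failed_urls.csv", fetch=fetch_page):
    """Fetch every URL; log any that fail to a CSV instead of stopping the whole run."""
    pages = {}
    with open(failed_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "error"])
        for url in urls:
            try:
                pages[url] = fetch(url)
            except Exception as exc:   # wrong address, page gone, parsing failure...
                writer.writerow([url, repr(exc)])
    return pages
```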

Then, there’s the data that’s gathered. First, Excel is a very poor tool for viewing data scraped from the web. I used Pandas (a Python module) to clean it up a bit (removing the whitespace at the beginning and end of the text) before opening it up in Excel. Then, it’s possible to see what’s in each cell, and align it top/left if necessary. As I was only interested in reviewing the ‘date’ and ‘title’ columns at this stage, I saved the file with a slightly different name and deleted the ‘content’ column. The reduction in file size makes it a bit easier to manage.
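That tidying, plus removing duplicated rows like the repeated batch above, can be sketched in pandas. `tidy_scrape` is a hypothetical helper, and the file and column names in the usage comments are assumptions. `drop_duplicates()` only removes rows that match in every column; if a repeated batch differs in some column (a scrape timestamp, say), you would pass `subset=[...]` with the columns that matter.

```python
import pandas as pd

def tidy_scrape(df):
    """Strip leading/trailing whitespace from text cells, then drop exact duplicate rows."""
    df = df.copy()
    for col in df.columns:
        df[col] = df[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    before = len(df)
    df = df.drop_duplicates().reset_index(drop=True)
    return df, before - len(df)

# Usage (hypothetical file and column names):
# posts, removed = tidy_scrape(pd.read_csv("scraped_posts.csv"))
# print(f"Removed {removed} duplicate rows")
# posts.drop(columns="Content").to_csv("scraped_dates_titles.csv", index=False)
```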

All looks good.  This is a typical entry:

65 September 11, 2012 Reading

65 is the index number given to the entry when the data was scraped, so it’s the 66th blog post from this URL (entries start at zero).

Then there’s this:

Problem 1

0 Posted on December 5, 2016 Carnival of Mathematics 140

The way the date is represented is crucial to my project.

Task 2: Remove ‘posted on ’ from the string.

Easy enough to do you’d think, but actually not.  It is possible to strip the first n-characters from the beginning of a string, but the code will iterate through every row and do the same, which is not what I want.  The other option is to split the string and copy the ‘posted on ’ (the space after ‘on’ is deliberate) bit to another column.  So, the pseudo-code would look like this:

if row in ‘Date column’ contains the string ‘posted on ’;

split string after ‘posted on ’;

write to row in ‘Posted On’ column.
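The pseudo-code above translates fairly directly into pandas. The key point is the mask: the first ten characters are only removed from rows that actually start with ‘posted on ’, which answers the worry about stripping every row. The match ignores case because the data has a capital P; the ‘Posted On’ column follows the pseudo-code, and the function name is hypothetical.

```python
import pandas as pd

PREFIX = "posted on "   # the trailing space is deliberate, as in the pseudo-code

def split_posted_on(df, col="Date"):
    """Move a leading 'Posted on ' out of the date cell, touching only the rows that have it."""
    has_prefix = df[col].str.lower().str.startswith(PREFIX, na=False)
    df.loc[has_prefix, "Posted On"] = "Posted on"
    # strip exactly len(PREFIX) characters, but only from the matching rows
    df.loc[has_prefix, col] = df.loc[has_prefix, col].str.slice(len(PREFIX))
    return df
```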

Problem 2

1 Posted on January 29, 2017January 29, 2017 Education

So much is such a waste of time

Posted on January 29, 2017January 29, 2017

There are a couple of problems here.  If I split the date string as I did previously, it’s not going to help me.  I’d be left with ‘January 29, 2017January 29, 2017’.  Now what?

Secondly, the title cell looks to me as if it contains a category for the post, the title, and the date the post was made (again). At this point, I’m thinking of finding this particular blog post via a Google search, and looking at the HTML structure of the page to see why I’m getting these extra bits of unnecessary information. It may not look like much, but:

  • when my spreadsheet has one hundred and eleven thousand rows, or more, that’s a lot of extra data;
  • I eventually want to use the titles when I present my data visually to an audience;
  • The title itself may be useful to add some substance to my analysis, so I don’t want it ‘dirtied’ with useless characters.

Problem 3

This row has a similar issue, although there is no category.  I’ve added the stars to protect the identity of the blogger.

0 Posted on November 9, 2015 What did I learn?

Posted on November 9, 2015 by C******* M*****

I’m not sure what to do about the date here, so let’s move on.  I can do this though:

Task 3: examine the HTML structure of this blog URL with a view to modifying the code used to scrape the data.

Problem 4

Here’s something else interesting:

183 Posted on March 1, 2010September 9, 2010 Software and websites I couldn’t do without

Two dates. I suspect that the first date is the one on which the blog entry was posted, and the second is the date it was amended/updated. Again, how am I going to deal with this? I think I’m going to have to go back to the HTML again and see if I can make another modification to my code. I’m only on row 925… let’s move on.
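Both this row and the doubled date in Problem 2 could be handled by pulling out every date written in the ‘January 29, 2017’ style and keeping the first as the posting date. Treating a different second date as the ‘updated’ date is only the guess made above and needs checking against the HTML; the ‘Posted’ and ‘Updated’ column names are hypothetical.

```python
import pandas as pd

# Dates written in the 'January 29, 2017' style
LONG_DATE = r"[A-Z][a-z]+ \d{1,2}, \d{4}"

def split_posted_and_updated(df, col="Date"):
    """First date in the cell -> 'Posted'; a different second date -> 'Updated'."""
    found = df[col].str.findall(LONG_DATE)
    df["Posted"] = found.str[0]
    df["Updated"] = found.str[1]   # NaN when there is no second date
    # 'January 29, 2017January 29, 2017' is the same date twice, not an update
    df.loc[df["Updated"] == df["Posted"], "Updated"] = None
    return df
```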

Problem 5

Here’s my next oddity:

0 2016-09-12 by k***** National Drama CPD Training for secondary teachers

I can split the string here:

if row in ‘Date column’ contains the string ‘ by’;

split string before ‘ by’;

write to row in ‘By’ column.

The space is in a different place now.  This matters, because while you and I see a space in Excel, there is in fact a character there, and it counts.  It quite literally ‘counts’ too, because it has a place.  It’s number 10 in the string (remember, counting begins at zero).  So, if I were to split the string at the space before ‘by’, it might actually split at a different place in a different cell (remember my code will iterate through every row of the column, so I need to be sure that it will only impact the cells I want it to).

Task 4: split string at ‘ by’.
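Rather than counting character positions, pandas can look for the text ‘ by ’ itself (with a space on both sides) wherever it falls in the cell, so the exact position of the space stops mattering. A sketch, with ‘By’ as the new column from the pseudo-code and a hypothetical function name:

```python
import pandas as pd

def split_by_author(df, col="Date"):
    """Move ' by <name>' out of the date cell into a 'By' column, only where it occurs."""
    # partition() splits at the first ' by ': columns 0 (before), 1 (' by '), 2 (after)
    parts = df[col].str.partition(" by ")
    has_by = parts[1] == " by "
    df.loc[has_by, "By"] = parts.loc[has_by, 2]
    df.loc[has_by, col] = parts.loc[has_by, 0]
    return df
```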

The date that’s left in the cell will be in a different format from previous dates, i.e. it’s 2016-09-12 rather than September 12, 2016. Will this make a difference? I don’t know yet.
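Probably not, as long as every format is converted to a proper date before any analysis. Here is a sketch using only Python’s standard library: try each format that has turned up in this sample in turn, and return None when nothing fits, so unparseable rows stay visible instead of being guessed. The format list is an assumption and would grow as new oddities appear; `parse_date` is a hypothetical name.

```python
from datetime import datetime

# Formats seen in the sample rows; extend as new ones turn up
KNOWN_FORMATS = [
    "%B %d, %Y",           # September 12, 2016
    "%Y-%m-%d",            # 2016-09-12
    "%Y-%m-%d %H:%M:%S",   # 2017-02-05 00:00:00
    "%b%d%Y",              # Feb172017
    "%d %b %Y",            # 04 Apr 2016
]

def parse_date(text):
    """Return a datetime.date, or None if no known format matches (e.g. no year)."""
    text = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue   # not this format; try the next one
    return None
```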

Problem 6

0 2017-02-05 00:00:00 314. Maths is a foreign language

This date has the time as well.  Again, I don’t know what difference this will make.

Problem 7

1 21st December Phase diagrams

Now here’s a problem: no year. A crucial piece of information is missing, and it’s missing for 696 rows (from row 4603). Previously, I used Pandas to do a quick audit (locating rows containing 2017, 2016, 2015 etc.) and had established that 786 rows were unaccounted for. It looks as if I’ve found some of them.

p.s. rows 9522 to 9552 are similar, so there’s another 30.  Only 40 unaccounted for.

30 Posted by  b********1 Hello world!
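That audit can be rerun after each fix with a small helper like this. It is a sketch: it treats any four digits starting 19 or 20 as a year, an assumption that a stray number in the date cell could fool, and it counts empty cells as missing. `rows_without_year` is a hypothetical name.

```python
import pandas as pd

def rows_without_year(df, col="Date"):
    """Return the rows whose date text contains no four-digit year from 1900 to 2099."""
    has_year = df[col].str.contains(r"(?:19|20)\d{2}", regex=True, na=False)
    return df[~has_year]   # empty cells (NaN/None) count as having no year
```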

Problem 8

This cell indicates there are no spaces between ‘Feb’, ‘17’ and ‘2017’, although when I pasted the row into this Word document, each element was on a different line.

0 Feb172017 Learning & Teaching GCSE Mathematics

This will probably be ok because when I come to analyse my data, the important pieces are the month and the year, both of which are clear.

Problem 9

And what about this?

36 8. März 201330. März 2016 Build your own low-cost slate! | Baue dein eigenes low-cost Slate!

I know from looking at this blog before that not all of it is in a foreign language (I’m assuming it’s a foreign language teacher), so do I leave this entire blog out of my master list?

Problem 10

I could split these strings, although the figure given for the number of comments varies.

0 04 Apr 2016 Leave a comment The World is Upside Down
6 22 Apr 2015 3 Comments Revision – what works best?

if row in ‘Date column’ contains the string ‘ leave a comment’;

split string before ‘ leave’;

write to row in ‘Leave’ column.

if row in ‘Date column’ contains the string ‘ (number) comments’;

split string before ‘ (number)’;

write to row in ‘Leave’ column.

It’s possible to write code that will take any numerical value for ‘(number)’.
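One way to do that is a regular expression where `\d+` stands for ‘any number’, matched at the end of the cell and ignoring case. This sketch puts both kinds of suffix into a single hypothetical ‘Comments’ column rather than the ‘Leave’ column of the pseudo-code.

```python
import re

import pandas as pd

# ' Leave a comment', ' 1 Comment' or ' 3 Comments' at the very end of the cell
COMMENT_SUFFIX = r"\s+(?P<comments>leave a comment|\d+ comments?)\s*$"

def split_comment_suffix(df, col="Date"):
    """Move a trailing comment note out of the date cell into a 'Comments' column."""
    df["Comments"] = df[col].str.extract(COMMENT_SUFFIX, flags=re.IGNORECASE)["comments"]
    df[col] = df[col].str.replace(COMMENT_SUFFIX, "", regex=True, flags=re.IGNORECASE)
    return df
```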

Problem 11

Then there’s this – no title at all.

333 March 3, 2012  

I really need something here, but what?  I could amend my code so that, if it fails to find a blog post title, the phrase ‘No Title’ is written into the row instead.  Alternatives include:

  • use the first sentence from the blog post itself (which can be extracted from the ‘Contents’ cell);
  • use the three most common terms from the post (obtained from the TF-IDF analysis I’m doing on the whole data set);
  • deploy some other text analysis technique to summarise the post in one sentence, which, when you think about it, is exactly what we try and do when we come up with a title for our own blogs.

This affects quite a few rows, so it needs addressing.
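The first of those alternatives is the easiest to sketch. Assumptions: the columns are called ‘Title’ and ‘Content’, and a sentence ends at ‘.’, ‘!’ or ‘?’ followed by a space, which is crude (it will trip over ‘Mr. Smith’, for instance). Posts with no usable content fall back to ‘No Title’; both function names are hypothetical.

```python
import re

import pandas as pd

def first_sentence(text, max_words=12):
    """Rough first sentence of a post, shortened to max_words; 'No Title' if there's no text."""
    if not isinstance(text, str) or not text.strip():
        return "No Title"
    sentence = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]
    words = sentence.split()
    if len(words) > max_words:
        sentence = " ".join(words[:max_words]) + "…"
    return sentence

def fill_missing_titles(df, title_col="Title", content_col="Content"):
    """Replace empty or missing titles with the first sentence of the post."""
    missing = df[title_col].isna() | (df[title_col].astype(str).str.strip() == "")
    df.loc[missing, title_col] = df.loc[missing, content_col].map(first_sentence)
    return df
```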

Problem 12



Posted by


Posted on

March 14, 2015

Posted under



Leave a comment

Peer Observation – Priceless CPD, for free!


I’ve copied and pasted this ‘as is’, although in the spreadsheet the data in the date cell appears on one line.  This highlights one of the issues when viewing data – it will appear differently when looked at through different windows, and yet each window has its advantages.  Excel is good for scrolling through data, and for basic numerical functions.  For everything else, I use Pandas for Python, usually via the Jupyter notebook that’s part of the Anaconda suite.

Problem 13




Dec 2016

Hi Guys. This page will contain all the BSGP (bronze, silver, gold, platinum) skill sheets for your perusal.

All resources are free to a good home and are intended to be used for what they are… banks of questions rising in difficulty to help complement your teaching, not replace it!

As I create new resources I’ll add them here so check back often. At some point i’ll probably give the project a formal name and organise it a little better than I am at the minute.

All answer sheets can be found in a password protected blog post (called ‘answer sheets’ of all things!).

Hit me up on twitter  ( @mrlyonsmaths ) for the password

Here’s a row where the contents are appearing where the title should be. I’m willing to bet that this is because of the HTML structure of the page, so I need to revisit my master code. Nor is this the only blog where it happens.

Task 5: revisit master code for extracting ‘Title’ from this blog URL.
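A sketch of the kind of change Task 5 might lead to, using only Python’s built-in html.parser: collect the text of elements carrying a particular class, rather than everything in the post header. The class name ‘entry-title’ is an assumption; it’s common in WordPress themes, but each blog’s HTML has to be checked by hand, which is the point of Tasks 3 and 5. `TitleFinder` is a hypothetical name, and it assumes reasonably well-nested HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class TitleFinder(HTMLParser):
    """Collect the text of every element whose class list includes target_class."""

    def __init__(self, target_class="entry-title"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while inside a matching element
        self.current = []     # text pieces of the title being collected
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1   # an element nested inside the title, e.g. <a>
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # join the pieces and collapse runs of whitespace
            self.titles.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

html = ('<article><h2 class="entry-title"><a href="#">Reading</a></h2>'
        '<div class="entry-meta">Posted on September 11, 2012</div></article>')
finder = TitleFinder()
finder.feed(html)
print(finder.titles)
```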

And all these problems are, of course, the ones I’ve uncovered in my sample.  The ones I know about.  My final data set will be huge, and I’ll have little chance of spotting anomalies unless I accidentally stumble upon them.

Welcome to my world of big data.




I’m a slow learner.  By that I mean it can take me a while to put all the pieces together so I can see the whole picture.  If I was a detective, I’d be the plodding kind that takes ages to interrogate every witness, look at every piece of evidence, and use one of those huge pin boards to visually represent the case.  I wouldn’t have a Eureka! moment part way through when I could suddenly see whodunnit and spend just a few seconds demonstrating how everything that remained fitted together.

I’ve just spent the best part of three months teaching myself to write code so that I can copy blog posts from over 800 bloggers, together with the date the blog was posted and the title.  Actually, in the end I’ve written code that will do that for most of the blogs in my list, for reasons I’ll explain in the next post.

Writing computer code to do a variety of somethings, and do them in the right order, is hard.  I’ve been using Python, which is pretty straightforward and relatively easy to read if you’ve never seen code before.  While lots of things still happen ‘under the bonnet’ so to speak, the commands that make those things happen are pretty transparent.  It does exactly what you tell it to do, and executes your commands in a precise and logical order.  This is how it works:

(1 + 2) + (3 x 4) = ?

3 + 12 = 15

It will do the calculations in the brackets first before moving on to the second stage, where it adds the totals from the bracketed calculations together.

A similar instruction in Python would be:

if len(blogPostTitle) > len(blogPostDate):
    blogPostTitle.pop()

So, if the length of the list ‘blogPostTitle’ is greater than the length of the list ‘blogPostDate’, remove the last item in the blogPostTitle list. The second line is indented so that Python knows it belongs to the ‘if’ statement, and only runs it when that condition is true. My code goes through a sequence of instructions, not all of which have to be carried out if certain conditions aren’t met, and it must execute this code several times before it can move on to repeat the process (in my case, on every item in a list) before it ends.

Typing it out like that makes it sound extremely simple, but the form of words, and the sequential structure of those words, have kept me occupied for weeks. I’ve no doubt someone with a better grasp of maths than I have would understand the logical structure behind it, and learn the language, much faster than I did. In fact, a long piece of code that does a specific thing can be labelled as a ‘function’ and given a name, and called on to do its work using just the name, saving you from copying and pasting all the code again (and having to make numerous corrections if it needs amending).
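To illustrate, the length check from the example above could be wrapped up as a function. This is a hypothetical helper, not the project’s code: `titles` and `dates` play the parts of blogPostTitle and blogPostDate, and it uses `while` rather than `if` so that it keeps trimming until the lists match, then pairs each title with its date.

```python
def trim_extra_titles(titles, dates):
    """Drop surplus titles from the end of the list, then pair each title with a date."""
    while len(titles) > len(dates):
        titles.pop()   # remove the last item in the list
    return list(zip(titles, dates))

pairs = trim_extra_titles(["Title A", "Title B", "Title C"], ["2016-01-01", "2016-02-01"])
print(pairs)
```

Once it exists, the same few lines can be reused on every blog in the list just by calling `trim_extra_titles(...)`.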

During the course of this project, I’ve written a bit of code.  Searched on Google for how to write the next bit of code.  Read bits from books on programming.  Searched again.  Written a bit.  Got one little thing working (like putting all the URLs in a list Python can read).  Written the next bit of code.  Or rather, tried, and repeated the process above several times over.  And believe me, reading coding solutions online, when you’re a coding novice, is less than helpful.  Just knowing what key words to put into your search is a major leap forward.  Finally, I ended up drawing diagrams of what I needed my code to do, printing it out and cutting it up with scissors so I could visualise the sequence of events and the result if I changed anything around, and then I went back to that last piece of working code I put together and I could see the final thing I needed to do to make it work.

I strongly suspected that getting to grips with code would improve my maths skills, and I was right.  It really made me think about the sequence of events as much as the language used to describe them, and of course if you want to be any kind of an engineer, you have to understand the rules of logic.  I feel as if I’ve really actually learned something properly, and that was one of my main goals in doing this PhD.  I’ve levelled up.


My personal blood, sweat and time, but no tears.

What Big Data Can’t Tell You

I’ve spent what seems like months writing Python code that will let me download the content of blog posts.  You can  do this using what’s known as an RSS (Rich Site Summary) feed, but that only yields a summary of the most recent posts, when I need the whole post, and every post the blogger has written.  In some cases, this goes back years.  It’s been a painful process, and will be the subject of my next blog post, but just for a bit of ‘fun’ I thought I’d look at the comments feed instead.

A while ago, Tom Starkey (@tstarkey1212) asked on Twitter if there was any way of finding out which Edu-blogs might be the most popular.  One way of finding out might be to look at the number of comments made on posts, so I thought I’d use the RSS feed this time to download the latest ones and have a look.  I wrote some Python code, and bingo! there they all were in a nice tidy spreadsheet.  There are some issues, though.  Quite a few, actually.

  1. I have no idea if I have the http address of every Edu-blogger out there.  My source was the list in a spreadsheet provided by Andrew Old.  How complete it is depends on a) whether you’ve heard of Andrew, or b) whether you’ve heard of him but don’t want to add your blog to ‘his’ spreadsheet.  Still, there are over 800 blogs on there so it’s a big enough sample to be getting on with.
  2. The information I needed was the blog post title, the name of the commenter, the date the comment was made, the comment itself, and the http link to the comment.  The link is important because it contains the title of the blog site.  RSS feeds yield particular information as they’re kind of standardised.  However, the title of the blog post contained in the link, the bit in bold in fact: isn’t always the actual title.  Nor is it always after the //, so any attempt to automatically extract the title based on its position in the http address was difficult.  When you’ve got over 8000 rows in your spreadsheet, you so want to automate the process if you can.  I chose not to, because….
  3. The name of the commenter might also be the title of the blog.  In fact, this was the case for quite a few posts, something that only becomes obvious when you slowly scroll through each of those 8000-plus rows.
  4. The name of the commenter should be the very last item in the field yielded by the RSS feed.  In theory, it should be easy to extract because it would come after a comma or possibly even a | symbol.  So, I could write some code that would iterate over every one of those 8000 rows and just extract the commenter’s name and put it in a separate column, right?  Wrong.  Some fields were truncated because they were too long.  Relying on commas to demarcate the right characters risked getting the wrong information.  Sometimes there was nothing more than a space.  In the end, I did it manually, copying and pasting.  That also helped me to identify names that related to the blog title and the name of the commenter, so I could match them up.
  5. Finally, the most obvious thing. Not everyone who reads a blog leaves a comment. In fact, I’m willing to bet most people don’t. And if they do react, I bet they do it by either posting a link to the blog with a recommendation, or simply retweeting the link that brought the blog to their attention in the first place. The only way of knowing who reads a blog is in the hands of the blogger themselves via their stats pages, or possibly Google with their page link algorithm. Still, I think the real proof (in spite of what some bloggers have claimed) lies in those stats. And given I’ve been accessing some sites repeatedly in an effort to see if my code works, there may be some glitches there as well.
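For anyone wanting to repeat this, here is a sketch of reading a downloaded RSS comments feed with Python’s built-in XML parser. Assumption: many feeds (WordPress comment feeds, as far as I know) also give the commenter’s name in its own dc:creator element, which, where it exists, would avoid splitting names out of a longer field by hand; feeds without it just return an empty ‘commenter’ value. `parse_comment_feed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core 'creator' element used by many feeds
DC = "{http://purl.org/dc/elements/1.1/}"

def parse_comment_feed(xml_text):
    """Turn the <item> entries of an RSS comments feed into a list of dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title", default=""),
            "commenter": item.findtext(DC + "creator", default=""),   # empty if absent
            "date": item.findtext("pubDate", default=""),
            "comment": item.findtext("description", default=""),
            "link": item.findtext("link", default=""),
        })
    return rows
```

The list of dicts can go straight into `pandas.DataFrame(rows)` and out to a spreadsheet.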

In spite of all this, I gathered my data and used NodeXL to produce a graph.  Three, in fact.  The basic one is here and is best viewed using a laptop or PC.  I’ve made some notes based on the graph metrics (graph-notes) and there are two other versions here and here .  Again, it’s best you view them using a laptop or a PC.

Finally, if your blog isn’t on Andrew’s spreadsheet, and you want to see how it compares with everyone else’s (or you’d like me to include it in the data I’ll be using for my PhD), you can either add it yourself or let me know the address and I’ll add it to my own records. I intend to anonymise all my data before I publish it because I know how sensitive it is even though it’s public (I’m an ex-teacher myself). Or you can send me your viewing stats because, after all, they paint the clearer picture.

The thing is, though, that while it’s easy to think your blog might be the one that’s influencing everyone and getting them ‘on your side’, knowing and proving it are completely different things altogether.

A Mini Review

I’m a bit behind with my blogs, I know. This is an attempt to try and catch up a bit, pending a longer post tomorrow that will form the bare bones of a chapter in my PhD.

  1. I bought an electric bike a few months ago. The journey to Soton Uni is 2/3 uphill and takes half-an-hour to walk. This is fine when I have time to spare, and the weather is nice (but not too hot). If I drive, it takes five minutes, which is not good for a car with an old diesel engine, and I have to move the car every two hours. As well as having a battery that helps take the effort out of pedalling, my bike also folds in half and so I can store it under my desk. It’s been reasonably well-used so far, and given that the journey is no more than about 10 minutes, I’m not at too much risk from Southampton’s shocking drivers. And I wear a helmet and cycle in the middle of my half of the road.
  2. I’ve been to Amsterdam three times now, and I absolutely love it. I have photos to publish, and I’ll put them in an album soon. I’ve also been to Berlin, and in October I’m going to Rome.
  3. My beloved car is 15 years old, and this year I paid for the air conditioning to be fixed.  I don’t care what anyone says, this was essential, and there’s nothing more annoying than having something in the car that doesn’t work.  Oh, and it’s also gained a dent that’s classic city damage.  Some numpty has swung into the car parking space beside me and rubbed their bumper up against my front passenger-side wing.
  4. I’ve been on HRT for a whole month. It’s completely transformed my life, in so far as I’m back to being the person I was when I came here. I’d been experiencing hot flushes, which were disturbing my sleep, and consequently my ability to concentrate on my work. Frankly, the jury is out as to the health risks of HRT. If you don’t have to do a demanding job (or work at all) then you can probably manage. Otherwise, don’t suffer. I know my symptoms will return when I come off the HRT, but when that happens will be down to me. In the meantime, I’m in control and back as a fully-functioning human being.
  5. I have become a really good cook of vegetarian food.  The thing about being a vegetarian is that you can’t just cook what you did before and simply leave out the meat.  It’s a whole new way of thinking about food.  It’s easy, and it’s delicious, and I can honestly say that I can’t get excited over meat-based food any more.  I will eat it occasionally, and I still like the unidentified pink stuff that comes  in the middle of your average greasy sausage roll as much as I ever did.  Just not as often.  My small vegetable garden here is fully planted  up with veggies for the winter, and I’ve applied for an allotment here in Southampton.
  6. After several weeks – months, probably – of writing and re-writing code, I’ve finally written some (with some help from friends) that does what I want it to. I now know more than I ever thought possible about the structure of web pages and how to deploy Python code to open them and gather the contents. When I mentioned what I wanted to do to someone from the Uni back in December last year, he suggested that rather than spend lots of time learning how to do it myself, I should get someone else to do it for me. This goes against my very soul. I was brought up to be independent and resourceful, to always be in control and to be self-reliant. The idea of getting someone else to write the code I need….. well, it just wasn’t going to happen. So I’ve written a bit, tested a bit, asked for help, written a bit more, tested and tested lines of code, and now I have a working model. Ok, it only works with one set of web pages, but I know I can modify it to work with others. I came here to learn, and learn is what I’ll do.
  7. I will never cease to be grateful for the chance to do this.  Thank you to everyone who has had a hand in it, whether you wanted to (or anticipated the consequences of your actions) or not.  My dreams aren’t big, but I’m living them.