Sunday, March 31, 2013

How MOOCs will take over the world

Everyone in the e-learning world is talking about MOOCs right now. In a lot of ways they aren't anything new; just the stuff we've been doing for a decade or two, scaled up rather dramatically. They are succeeding and failing in interesting ways - they are getting very large numbers of enrolments, but they have enormous drop-out rates. There are all sorts of questions and assertions floating around, ranging from "are they a fad, with no quality learning emerging from them" to "they are going to destroy all Universities within five years".

In reality, they are just a segment of the existing spectrum of learning experiences. In fact it's more than a spectrum - there are multiple dimensions in play. Perhaps something like this would start to give a picture of where they sit:

The claims that MOOCs will take over the world are driven largely by economics, ignoring the various types of learning experience that exist, and the value each of them delivers. The argument is basically that if Stanford can deliver the best quality lectures on a topic, and can give them out for free, then other Universities won't be able to compete, and people will stop coming to Unis.

There are two reasons we enrol in a course. Firstly, to learn; and secondly, to be able to claim knowledge. The 10% or so of people currently finishing their MOOC courses are interested in the first part - the learning. Currently a certificate of completion from a MOOC doesn't hold a lot of value in the employment market, so the ability to claim knowledge from these courses is a bit tenuous. I believe that this is a very significant factor in the drop-out rates - it doesn't matter if you fail a MOOC, and further, it doesn't matter if you pass. So the second reason for enrolling, that extrinsic motivator which pushes us through the hard parts and the boring parts, is a lot weaker for MOOCs. We won't start to see more normal patterns of completion in MOOCs until this changes.

The thing that will change it is employer acceptance of MOOC courses as valid indicators of expertise in a topic. This will be a gradual process, as awareness of these courses becomes greater.

(Warning: here's where I start waffling about what would happen in an ideal world). What is really needed is some decent certification of courses - probably to be done by the industry bodies that already certify University degrees in many disciplines. To do it properly, you'd need a set of standard learning objectives in each discipline, and then each MOOC could get itself certified to a certain level against those learning objectives by these independent industry bodies. Only by doing this will learners be able to objectively demonstrate that a collection of courses they have done - from MOOCs, Universities and commercial training bodies - is equivalent to a University degree. What's more, it opens up the ability to demonstrate narrower or wider expertise, or cross-disciplinary expertise.

It's an interesting time - I'm looking forward to seeing what happens in this space. I'd hate to see Universities go, but I'd love to see degree structures shaken up and for people to have the ability to create their own degrees from a range of sources.

Sunday, March 24, 2013

CAM: both easy and bewildering

As I'm sure I've mentioned before, I have several data sources to analyse for my PhD research - log file data, and database records of postings, comments, and ratings. In my reading of the literature I came across CAM - Contextualized Attention Metadata. It looks like exactly what I'm looking for to do my analysis - a generalized schema for tracking information about learners' interactions with learning tools (and each other). It's a sensible schema, and there are a few pages about it on the net:

Fraunhofer Institute
CAM Schema (which seems to also be done by the fine folks at Fraunhofer)

As well as a few academic papers that describe the schema, such as:

Muñoz-Merino, PJ, Pardo, A, Kloos, CD, Munoz-Organero, M, Wolpers, M, Niemann, K & Friedrich, M 2010, ‘CAM in the semantic web world’, ACM Press, New York, New York, USA, p. 1.

Scheffel, M, Niemann, K, Pardo, A, Leony, D, Friedrich, M, Schmidt, K, Wolpers, M & Kloos, CD 2011, ‘Usage Pattern Recognition in Student Activities’, Springer Berlin Heidelberg, pp. 341–355.

However, the information available online about it seems awfully sparse, compared to my experiences with open source software. There are a bunch of downloadable items, but no real "getting started with CAM" documents, nor SQL create scripts, etc. The SQL binding is distributed as a PNG file or GraphML, not as SQL. This is not a major hurdle; just not what I'm used to from my experience as a software developer. I spent a couple of hours searching for some sample SQL create scripts; in the end I spent twenty minutes writing my own based on the available documentation.
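For anyone in the same boat, writing your own create script really is a twenty-minute job. Here's a minimal sketch of the sort of thing I mean - the table and column names here are my own guesses at a CAM-flavoured layout (events referencing items and sessions), not the official binding, and I'm using SQLite purely for illustration:

```python
import sqlite3

# A guessed, minimal CAM-style schema: events reference sessions and
# items. Names are illustrative, not the official CAM SQL binding.
SCHEMA = """
CREATE TABLE items (
    item_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    item_type TEXT
);
CREATE TABLE sessions (
    session_id INTEGER PRIMARY KEY,
    user_id    TEXT NOT NULL,
    started_at TEXT NOT NULL
);
CREATE TABLE events (
    event_id    INTEGER PRIMARY KEY,
    session_id  INTEGER NOT NULL REFERENCES sessions(session_id),
    item_id     INTEGER REFERENCES items(item_id),
    action      TEXT NOT NULL,       -- e.g. view, post, rate
    occurred_at TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['events', 'items', 'sessions']
```

The point is just that once you've read the schema diagrams, transcribing them into DDL is the easy part.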

The other oddity is the number of different versions of the schema, which seem to differ quite radically. I've settled on version 1.5, this one:

though I think I'll need to make some changes (the same Item is referenced by events in different Feeds; rather than keeping separate Items for the separate Feeds, I'd rather link the feed to the Events table).

It seems pretty good - there is space in the schema for most of the things I'd like to log. I've had a chat with Abelardo Pardo who, in addition to being very intelligent, is an author on both the papers mentioned above, and he seemed to agree that the schema isn't something set in stone; it's a flexible framework that allows you to use the various components as you need them. The version he was using also had a User table, which will also be useful and which I'm considering adding to my schema.

So I have my data in there. I'm very pleased with my session splitting code (for detecting user sessions in the log data). My importer is loading over a thousand records per second* (which is important when each day's logs are five to thirty thousand lines - over half a million lines for January, which is a very light month in terms of site usage). Currently I'm recreating the database each time and loading all the data, but at some point I'll set it up to run with each day's logs as they are created, and just have a database sitting there with all my data, ready to go. Also on the list to investigate (at Abelardo's suggestion) is a NoSQL database - which apparently simplifies queries and improves performance.
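For the curious: the usual approach to session splitting (and roughly the idea behind mine, though the details here are a sketch rather than my actual code) is an inactivity timeout - a user's next event starts a new session if they've been idle longer than some threshold. The 30-minute cutoff below is an assumed value, not a magic number from CAM:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold

def split_sessions(events):
    """Group (user_id, timestamp) events into per-user sessions.

    `events` is assumed to be sorted by timestamp. A new session
    starts whenever a user has been idle longer than SESSION_TIMEOUT.
    """
    sessions = {}   # user_id -> list of sessions (each a list of timestamps)
    last_seen = {}  # user_id -> timestamp of that user's previous event
    for user, ts in events:
        if user not in last_seen or ts - last_seen[user] > SESSION_TIMEOUT:
            sessions.setdefault(user, []).append([ts])  # start a new session
        else:
            sessions[user][-1].append(ts)               # continue the session
        last_seen[user] = ts
    return sessions

# Example: three events by one user; the 2+ hour gap splits them in two.
log = [
    ("alice", datetime(2013, 1, 15, 9, 0)),
    ("alice", datetime(2013, 1, 15, 9, 10)),
    ("alice", datetime(2013, 1, 15, 11, 30)),
]
by_user = split_sessions(log)
print(len(by_user["alice"]))  # 2 sessions
```

Real log data needs more care than this (unsorted lines, missing user identifiers, and so on), but the timeout idea is the core of it.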

* From the latest import: "636600 events inserted in 416 seconds - 1530 events per second". I am fond of my shiny new iMac.
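Throughput like that is mostly about avoiding per-row overhead. I won't claim this is how my importer works, but the standard trick is batched inserts inside a single transaction - sketched here with SQLite and made-up rows for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id TEXT, action TEXT, occurred_at TEXT)")

# 10,000 made-up log rows standing in for a day's worth of events.
rows = [("u%d" % (i % 50), "view", "2013-01-15T09:00:00")
        for i in range(10000)]

# One transaction plus executemany keeps per-row overhead low, which is
# what makes thousand-plus inserts per second easy on modest hardware.
with conn:  # commits the whole batch as a single transaction
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 10000
```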