Now that I'm gathering data, I'm starting to look at ways to analyze it. My first attempt has been to use some off-the-shelf tools to see how students are moving through the website. I've got a nice pile of data to look at (535 megabytes of bzip2-compressed log data), so I decided to analyze a day's worth of logs using Statviz. I've come to a couple of conclusions:
- I need to put a lot more work into cleaning and massaging the data before I'll get useful results out of it. There is just too much noise in these logs for the output to be meaningful.
- I need a faster machine! I set Statviz running on my PhD machine, and gave up three days later when it was still processing. I started a run on another, much faster machine in the house; it took over 24 hours, but eventually produced a result.
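To give a feel for the kind of cleaning I have in mind, here's a minimal sketch in Python that drops the obvious noise from Apache combined-format log lines before they reach an analysis tool. The regex, the static-asset extensions, and the bot heuristics are all illustrative assumptions on my part, not something Statviz requires:

```python
import re

# Apache Combined Log Format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed noise filters -- tune these to your own traffic.
STATIC_EXT = ('.css', '.js', '.png', '.gif', '.jpg', '.ico')
BOT_HINTS = ('bot', 'crawler', 'spider')

def keep(line):
    """Return parsed fields for a line worth analysing, else None."""
    m = LOG_RE.match(line)
    if not m:
        return None                              # malformed line: treat as noise
    d = m.groupdict()
    if d['path'].lower().endswith(STATIC_EXT):
        return None                              # static asset, not a page view
    if any(h in d['agent'].lower() for h in BOT_HINTS):
        return None                              # crawler traffic
    if not d['status'].startswith('2'):
        return None                              # keep only successful requests
    return d

sample = [
    '10.0.0.1 - - [12/Mar/2008:10:00:00 +0000] "GET /course/view.php?id=3 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [12/Mar/2008:10:00:01 +0000] "GET /theme/styles.css HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [12/Mar/2008:10:00:02 +0000] "GET /index.php HTTP/1.1" 200 2048 "-" "Googlebot/2.1"',
]
kept = [d['path'] for line in sample if (d := keep(line))]
print(kept)  # only the real page view survives
```

Even this crude pass cuts static assets and crawler hits, which between them account for a lot of the clutter in the raw graphs.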
As an example, here's one of the graphs generated from the log files:
It's substantially shrunk down from the original, which is 49680 by 6460 pixels in size.
