Since moving to the analytics team at Khan Academy, I have worked to grow my knowledge and skills in machine learning and data analysis, to help balance my data science Venn diagram. Thankfully, Coursera offers quite a few free online courses that cover these topics in great detail. Over the second half of 2013, I completed several of these courses, and I wanted to write a quick review of each of them.
This course is great not only for its content, but also as an experience in the evolution of online education itself. This was the first successful MOOC put out by Stanford, and it became the basis for Andrew Ng and Daphne Koller founding Coursera. Each week, the lectures introduce the mathematics behind each concept, walk through some visualizations to build an intuition for how it works, and then show how to put these tools together to make useful predictions. The course uses Octave, a free alternative to MATLAB, for all of the programming assignments. You upload your completed programming assignment to the website, and it immediately responds with how your code performed against the test cases. This immediate feedback loop was very beneficial in working through the homework assignments and debugging until everything was perfect. The course does a great job of exposing and building intuition for most of the fundamental concepts of machine learning, but since the programming assignments are so well contained, it is light on end-to-end model-building skills.
Hours/week: 8 + 20 hours for 2 peer-graded papers
This was a great course! The lectures were full of worked examples in the R programming language, which were very helpful in illustrating the key concepts while also explaining some of the tips and tricks required to get things working. The weekly quizzes were cleverly composed to ask correlated questions that required critical thinking on top of the material described in the lectures.
The analysis assignments were structured to take you through an entire workflow: visualizing and exploring data to find interesting patterns, boiling down the most important factors into a statistical model, and then communicating the entire process to interested parties. The final result was a whitepaper-style report, which was submitted to the website for peer grading. After the submission deadline, you were required to evaluate your own paper and four of your peers' papers using a system of ~15 Likert scales. Your final grade was a combination of the self and peer evaluations you received. The open-ended nature of the project had me obsessively sleuthing through the datasets, while the great communication on the forums helped to pull me out of some rabbit holes when I went too deep. I spent more time on these forums than I have in any other course, and it was all time very well spent.
Although the content of this course does a good job of exploring the landscape of recent research in educational data mining, the style and depth leave a lot to be desired. The first few weeks gave me a reason to download and try out RapidMiner, but the assignments after that were algebraic plug’n’play equations from the lecture notes. The lectures themselves consisted of the professor reading directly from his PowerPoint slides. I found myself watching the lectures at 2x speed and then following up by skimming through the research papers that were referenced. I am glad I went through the course and think it will inspire new ideas and provide good research references, but I cannot recommend it beyond that.
In the next few months, I plan to complete Computing for Data Analysis to continue honing my R skills, and Model Thinking to learn more about existing models that have proved useful. Courses high on my watch list are Probabilistic Graphical Models and Social Network Analysis.
I’ll keep you updated as I make my way through these courses. Let me know in the comments if you have encountered any other particularly insightful learning resources!