Assignment 5 is due by 11:59pm on Sunday, April 8th

Machine Learning – CS 158
Programming Assignment 5 – Expectation Maximization
(60 Points)

The purpose of this exercise is to get you further acquainted with expectation maximization. This assignment is to be completed SOLO or in PAIRS.

On this assignment, you will build on the learner.py file that you used in Assignments 2 and 3 and the dataset.py file that you have used in all past assignments. Additionally, you'll write a very short document answering the questions listed below (called writeup.txt or .pdf). Submit these three files in the usual way.

This week you will implement the Expectation Maximization algorithm with a Naïve Bayes model. As we discussed in class (http://www.cs.pomona.edu/classes/cs158/resources/158-10(EM).pdf), the benefit of Expectation Maximization is that you can begin with only a small amount of labeled training data and still make use of unlabeled training data. Of course, all of your datasets are completely labeled, so for this task you will have to pretend that some of the data is unlabeled. Also, there is NO NEED to implement a Naïve Bayes classifier for this assignment – you should already have one from your completion of Assignment 3. The implementation outlined below differs slightly from the one we discussed in class: our implementation uses hard assignment, whereas the version in class used soft assignment.

Complete the following steps for this assignment:

1. Create an ExpectationMaximizationLearner class (similar to the other Learner classes that we've already implemented). This class must include:

   a. A train method that takes in a dataset (containing labeled data) as well as another dataset (containing unlabeled data). This method is where the actual EM algorithm should be implemented. Here you'll want to iterate between the Expectation step (labeling all unlabeled examples) and the Maximization step (recomputing your Naïve Bayes model) until it converges on a Naïve Bayes model. (A minimal sketch of this loop appears at the end of this handout.)

   b. A predict method that takes in an example and labels it using the Naïve Bayes model that has been computed.

2. Write an em_cross_validation function to do k-fold cross validation on your Expectation Maximization implementation. (A sketch also appears at the end of this handout.)

   a. In addition to the arguments already passed in to the existing cross_validation function, this function should take an additional argument num_labeled, which tells you how many examples to partition into the labeled dataset (keeping the rest of the training examples as unlabeled).

   b. Keep in mind that you can copy and modify your existing cross_validation function for this task.

3. Use your em_cross_validation function to evaluate your Expectation Maximization implementation using 1/20th, 1/10th, 1/2, and all of the wine and iris datasets as labeled training data (using the remaining portion of each dataset as unlabeled training data). Report your results as a chart in your writeup file.

Extra Credit – 3 points

As noted above, the implementation outlined here uses hard assignment, whereas the version we discussed in class used soft assignment. As a bonus, create another version of the EM classifier that uses soft assignment.
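
Sketch of the EM training loop (step 1). This is a minimal sketch, not a definitive implementation: it assumes a NaiveBayesLearner class from Assignment 3 with train and predict methods, that a dataset exposes its examples as a list, and two hypothetical Dataset helpers (copy and add_example). Adapt all of these names to whatever your learner.py and dataset.py actually provide.

    from learner import NaiveBayesLearner  # your Assignment 3 classifier

    class ExpectationMaximizationLearner:
        def __init__(self):
            self.model = None  # the underlying Naive Bayes model

        def train(self, labeled, unlabeled):
            # Initialize the model from the labeled examples alone.
            self.model = NaiveBayesLearner()
            self.model.train(labeled)

            prev_labels = None
            while True:
                # Expectation step: hard-assign a label to every
                # unlabeled example using the current model.
                labels = [self.model.predict(ex) for ex in unlabeled.examples]

                # Converged once no assignment changes between iterations.
                if labels == prev_labels:
                    break
                prev_labels = labels

                # Maximization step: retrain Naive Bayes on the original
                # labeled examples plus the newly labeled ones.
                combined = labeled.copy()  # hypothetical helper
                for ex, label in zip(unlabeled.examples, labels):
                    combined.add_example(ex, label)  # hypothetical helper
                self.model = NaiveBayesLearner()
                self.model.train(combined)

        def predict(self, example):
            # Delegate to the converged Naive Bayes model.
            return self.model.predict(example)

With hard assignment, comparing consecutive label lists is a reasonable convergence test, since the model only changes when at least one assignment changes.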
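
Sketch of em_cross_validation (step 2). The fold-handling helpers here (split_into_folds, merge, subset) are hypothetical stand-ins for whatever your existing cross_validation function uses, and each test example is assumed to expose its true label as ex.label; the point is only to show where num_labeled enters.

    def em_cross_validation(dataset, k, num_labeled):
        folds = dataset.split_into_folds(k)  # hypothetical helper
        accuracies = []
        for i in range(k):
            test = folds[i]
            training = merge(folds[:i] + folds[i + 1:])  # hypothetical helper

            # Partition the training examples: the first num_labeled keep
            # their labels; the rest are treated as unlabeled.
            labeled = training.subset(0, num_labeled)  # hypothetical helper
            unlabeled = training.subset(num_labeled, len(training.examples))

            learner = ExpectationMaximizationLearner()
            learner.train(labeled, unlabeled)

            correct = sum(1 for ex in test.examples
                          if learner.predict(ex) == ex.label)
            accuracies.append(correct / float(len(test.examples)))
        return sum(accuracies) / k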
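
For the extra credit, only the Expectation step needs to change: instead of committing each unlabeled example to a single label, compute a posterior distribution over classes and use those probabilities as fractional counts in the Maximization step. Below is a minimal sketch of the soft E-step, assuming a hypothetical class_log_scores method that returns the unnormalized log score your Naïve Bayes model assigns to each class.

    import math

    def soft_expectation(model, example):
        # Hypothetical: class_log_scores returns a dict mapping each
        # class to log P(class) + the sum of log P(feature | class).
        log_scores = model.class_log_scores(example)

        # Subtract the max before exponentiating for numerical
        # stability, then normalize to get posterior probabilities.
        m = max(log_scores.values())
        exps = {c: math.exp(s - m) for c, s in log_scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

In the M-step you would then recompute each Naïve Bayes count as a sum of these per-example probabilities rather than a count of whole examples.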