6 Boosted Decision Trees and Random Forest (Henry)
In this assignment, you are going to fit and tune a random forest and a boosted decision tree model to predict whether a given passenger survived the sinking of the Titanic, based on the variables provided in the data set. You may use sklearn, including its grid search utilities.
6.1
First, download the Titanic.csv file from Sakai, and drop the columns that contain missing (“na”) values from the data set. Convert the gender variable into a binary variable, with 0 representing male and 1 representing female.
Then split the data into a training set (80% of the data) and a test set (the remaining 20%).
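The preprocessing steps above can be sketched as follows. This is only a minimal illustration, not a reference solution: a small in-memory table stands in for Titanic.csv, and the column names (Survived, Pclass, Sex, Age) are assumptions that may not match the file on Sakai. The fixed random_state and the stratified split are likewise choices made for the sketch, not requirements of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for Titanic.csv; the column names are assumptions.
# With the real file, something like: df = pd.read_csv("Titanic.csv")
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
    "Pclass":   [3, 1, 3, 1, 2, 3, 1, 2, 3, 2],
    "Sex":      ["male", "female", "female", "male", "female",
                 "male", "male", "female", "female", "male"],
    "Age":      [22.0, 38.0, np.nan, 35.0, 27.0, np.nan, 54.0, 14.0, 4.0, 30.0],
})

# Drop every column that contains at least one missing value.
df = df.dropna(axis=1)

# Encode gender as a binary variable: 0 = male, 1 = female.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Separate the features from the target (assumed to be "Survived").
X = df.drop(columns="Survived")
y = df["Survived"]

# 80% train / 20% test; stratifying on y is an optional choice for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
```

Note that dropna(axis=1) removes whole columns rather than rows, which is what the instructions ask for; any remaining non-numeric columns in the real file would still need to be handled before fitting.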
6.2
Fit a random forest and a boosted decision tree model on the training set with default parameters, and
measure the time required to fit each model. Which model required less time to train? Why do
you think that is? (Hint: look at the default values of the parameters.)
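One possible way to time the two fits is sketched below. It uses synthetic data from make_classification in place of the Titanic training set, and it takes GradientBoostingClassifier as the boosted decision tree model; that choice is an assumption (AdaBoostClassifier would be another reading). The helper fit_time is a hypothetical name introduced for this sketch.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in for the Titanic training set.
X_train, y_train = make_classification(n_samples=500, n_features=8, random_state=0)


def fit_time(model, X, y):
    """Fit the model and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Both models are built with default parameters (only the seed is fixed).
models = {
    "random forest": RandomForestClassifier(random_state=0),
    "boosted trees": GradientBoostingClassifier(random_state=0),
}

timings = {}
for name, model in models.items():
    timings[name] = fit_time(model, X_train, y_train)
    params = model.get_params()
    # Print the defaults that the hint refers to, next to the measured time.
    print(
        f"{name}: {timings[name]:.3f} s, "
        f"n_estimators={params['n_estimators']}, max_depth={params['max_depth']}"
    )
```

A single measurement is noisy, so repeating each fit a few times and averaging may give a more reliable comparison. The printed defaults are a starting point for explaining whatever difference shows up.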
6.3
Choose a range of parameter values for each of the algorithms (tell us which parameters you varied),
and tune on the training set using 5-fold cross-validation. Then, draw the ROC curve and report the AUC for each of the
tuned models on the whole training set (in one figure) and on the test set (in another figure). Comment on the
similarities/differences between the two models (if any).
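The tuning and plotting workflow might look roughly like the sketch below. It again uses synthetic data and GradientBoostingClassifier as stand-ins, and the parameter grids are deliberately tiny and purely illustrative; which parameters and ranges to explore is left to you. Scoring by ROC AUC inside the grid search is also an assumption of this sketch.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed Titanic data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative grids only; cv=5 gives 5-fold cross-validation on the training set.
searches = {
    "random forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=5,
        scoring="roc_auc",
    ),
    "boosted trees": GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
        cv=5,
        scoring="roc_auc",
    ),
}
for search in searches.values():
    search.fit(X_train, y_train)  # refits the best model on the whole training set

# One figure per data split, with both tuned models in each figure.
aucs = {}
splits = {"training": (X_train, y_train), "test": (X_test, y_test)}
for split, (X_part, y_part) in splits.items():
    fig, ax = plt.subplots()
    for name, search in searches.items():
        scores = search.predict_proba(X_part)[:, 1]  # probability of class 1
        fpr, tpr, _ = roc_curve(y_part, scores)
        aucs[(split, name)] = auc(fpr, tpr)
        ax.plot(fpr, tpr, label=f"{name} (AUC = {aucs[(split, name)]:.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance level
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.set_title(f"ROC curves on the {split} set")
    ax.legend()
```

Calling plt.show() afterwards displays both figures, and each search's best_params_ attribute reports the selected parameter values.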