Report length: approximately 2,500 words, with some flexibility (excluding references and appendix).
Assignment Requirements:
Question.
You are given a training dataset, “trainDataset.csv”, and a testing dataset, “testDataset.csv”, both provided in electronic form. The data are extracted and pre-processed from the original Titanic dataset. The attributes of each object (a passenger in this case) are defined as follows:
• Survived: indicates whether the passenger survived (1) or did not survive (0);
• PC (Passenger Class): the passenger’s class on the ship;
• Sex: indicates the passenger’s sex;
• Age: indicates the passenger’s age group at the time of the ship’s departure;
• SS (Sibling Spouse): indicates the number of siblings/spouses the passenger has on the ship.
You are required to apply the decision tree classification technique and association rule evaluation to the above case. Specifically, you are required to:
1. Using the training dataset, apply the basic Hunt’s Algorithm to train a fully-grown decision tree model, where the selection of attributes follows the sequence PC -> Age -> Sex -> SS. If an attribute has multiple attribute values, use a multiway split (do not use a binary split). Each leaf node should be declared as a single class label (do not use a probability/fraction). (An illustrative sketch of such a builder follows this task list.)
2. Using the training dataset, apply the greedy strategy combined with the Gini impurity measure to rebuild a fully-grown decision tree. If an attribute has multiple attribute values, use a multiway split (do not use a binary split). Each leaf node should be declared as a single class label (do not use a probability/fraction). Provide sample calculations and explanations to demonstrate how the greedy strategy and the Gini impurity measure are applied. (A sketch of the Gini quantities follows this task list.)
3. Use the test dataset to test the two fully-grown decision tree models, and discuss the results.
4. Perform post-pruning on the two fully-grown decision trees by applying the following rules: (i) prune any sub-tree whose leaf nodes all have the same class label, and (ii) prune any sub-tree in which no leaf node contains more than one object (passenger). After pruning, test the two pruned decision trees using the test dataset and discuss the results. (A sketch of the two pruning rules follows this task list.)
5. From the two pruned decision trees, extract an association rule for each leaf node based on the path from the root node to that leaf node. Evaluate the support, confidence, and lift of the identified association rules using the training dataset, and discuss the results. (A sketch of these three metrics follows this task list.)
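For illustration, the sketch below shows one possible way to grow the task 1 tree with the basic Hunt’s Algorithm, the fixed attribute order PC -> Age -> Sex -> SS, and multiway splits. The column names are taken from the attribute list above and the file name from the brief; the tie-breaking rule for an impure node with no attributes left (majority class) is an assumption, not part of the brief.

```python
# Minimal Hunt's-style tree builder (illustrative sketch only).
import pandas as pd

ATTRIBUTE_ORDER = ["PC", "Age", "Sex", "SS"]

def majority_class(labels):
    # Tie-breaking is a modelling choice; here the most frequent label wins.
    return labels.value_counts().idxmax()

def hunt(data, attributes):
    labels = data["Survived"]
    if labels.nunique() == 1:        # pure node: declare the single class label
        return labels.iloc[0]
    if not attributes:               # attributes exhausted: fall back to majority
        return majority_class(labels)
    attr, rest = attributes[0], attributes[1:]
    # Multiway split: one child branch per observed value of the attribute.
    return {attr: {value: hunt(group, rest)
                   for value, group in data.groupby(attr)}}

train = pd.read_csv("trainDataset.csv")
tree = hunt(train, ATTRIBUTE_ORDER)
```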
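For task 2, the greedy strategy compares, at each node, the weighted Gini impurity of every candidate multiway split and chooses the attribute with the lowest value, where the node impurity is Gini(t) = 1 - sum over classes of p(c|t)^2 and the split impurity is the child impurities weighted by the fraction of objects in each child. A minimal sketch of these two quantities, assuming the same column names as above, is:

```python
# Node Gini impurity and weighted Gini of a candidate multiway split (sketch).
import pandas as pd

def gini(labels):
    p = labels.value_counts(normalize=True)   # class proportions p(c|t)
    return 1.0 - (p ** 2).sum()

def split_gini(data, attr, target="Survived"):
    n = len(data)
    # Weighted sum of child impurities, one child per attribute value.
    return sum(len(g) / n * gini(g[target]) for _, g in data.groupby(attr))

train = pd.read_csv("trainDataset.csv")
# The greedy strategy picks, at each node, the attribute with the lowest split Gini.
best = min(["PC", "Age", "Sex", "SS"], key=lambda a: split_gini(train, a))
```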
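One possible reading of the two post-pruning rules in task 4 is sketched below for the nested-dictionary tree representation used in the sketches above. It applies the rules bottom-up and only collapses a sub-tree once all of its children are leaves; replacing a sub-tree collapsed under rule (ii) with the majority class of the objects reaching it is an assumption, not something the brief specifies.

```python
# Bottom-up application of pruning rules (i) and (ii) (illustrative sketch only).
def prune(node, data, target="Survived"):
    if not isinstance(node, dict):               # already a leaf
        return node
    (attr, branches), = node.items()
    pruned = {value: prune(child, data[data[attr] == value], target)
              for value, child in branches.items()}
    children = list(pruned.values())
    all_leaves = all(not isinstance(c, dict) for c in children)
    # Rule (i): collapse if every leaf carries the same class label.
    if all_leaves and len(set(children)) == 1:
        return children[0]
    # Rule (ii): collapse if no leaf holds more than one training object.
    counts = [len(data[data[attr] == v]) for v in pruned]
    if all_leaves and all(c <= 1 for c in counts):
        return data[target].value_counts().idxmax()   # assumed replacement label
    return {attr: pruned}
```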
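For task 5, each extracted rule has the form {attribute = value pairs on the root-to-leaf path} -> {Survived = label}. A minimal sketch of the three metrics on the training set is given below, using the standard definitions support = P(A and C), confidence = P(C | A), and lift = confidence / P(C); the example antecedent in the comment is hypothetical.

```python
# Support, confidence and lift of one path rule on the training set (sketch).
import pandas as pd

def rule_metrics(data, antecedent, consequent_label, target="Survived"):
    n = len(data)
    mask_a = pd.Series(True, index=data.index)
    for attr, value in antecedent.items():     # e.g. {"PC": 3, "Sex": "male"}
        mask_a &= data[attr] == value
    mask_c = data[target] == consequent_label
    # Rules extracted from a tree grown on this data always cover >= 1 object,
    # so the divisions below are safe for this use case.
    support = (mask_a & mask_c).sum() / n
    confidence = (mask_a & mask_c).sum() / mask_a.sum()
    lift = confidence / (mask_c.sum() / n)
    return support, confidence, lift
```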