Data Loading
Q1. Load the data from the source file and set up the target y and predictors X as expected by scikit-learn.
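A minimal sketch of the setup Q1 asks for, assuming the source file is a CSV readable with pandas. The inline CSV text, the column names, and the target column name `label` are all hypothetical stand-ins, since the actual file is not shown here; in practice a file path would be passed to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the source file (assumption: CSV format).
# With a real file, replace io.StringIO(csv_text) with the file path.
csv_text = """age,income,label
25,40000,0
32,52000,1
47,,0
51,88000,1
"""

df = pd.read_csv(io.StringIO(csv_text))

TARGET = "label"  # assumed name of the target column

# scikit-learn expects a 1-D target y of shape (n_samples,)
# and a 2-D predictor matrix X of shape (n_samples, n_features).
y = df[TARGET]
X = df.drop(columns=[TARGET])

print(X.shape, y.shape)
```

Keeping X as a DataFrame rather than converting it to a NumPy array preserves the column names, which is convenient if later preprocessing steps select features by name.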
Train-Test Split
Q2. Create an 80-20 train-test split of the data, maintaining the same target value proportions in both the training and testing partitions. Use the training partition for all subsequent analysis, and use the testing partition only at the last step, for final validation of the best model found.
Optional Step: Create some charts for data exploration to gain an understanding of the data with respect to the given prediction problem. Explain your observations along with each chart.
Data Preprocessing & Feature Selection/Engineering
Q3. Set up a data preparation pipeline using scikit-learn to perform the following preprocessing steps.
A. If there are missing values in the data, take appropriate measures.
B. Select the features as follows:
Q5. Select any two classification algorithms listed in Q6 below and demonstrate how to find the “best” hyperparameters for each of these two models using grid search with 5-fold cross-validation, experimenting with 1-2 parameters in each case.
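One way this could look is sketched below. Several things here are assumptions rather than part of the assignment: the data is synthetic (from `make_classification`), the two algorithms (logistic regression and a decision tree) are chosen only for illustration because the Q6 list is not shown here, and the parameter grids and median imputation are arbitrary examples. The sketch also includes a stratified 80-20 split so that the search runs on the training partition only.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): replace with the real X and y.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.7, 0.3], random_state=0
)

# 80-20 split; stratify=y keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Each candidate is a (pipeline, parameter grid) pair. Grid keys use the
# "<step name>__<parameter>" convention to reach into the pipeline.
candidates = {
    "logreg": (
        Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(max_iter=1000)),
        ]),
        {"clf__C": [0.01, 0.1, 1, 10]},
    ),
    "tree": (
        Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("clf", DecisionTreeClassifier(random_state=0)),
        ]),
        {"clf__max_depth": [2, 4, 8], "clf__min_samples_leaf": [1, 5]},
    ),
}

searches = {}
for name, (pipe, grid) in candidates.items():
    # cv=5 gives 5-fold cross-validation on the training partition only.
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    searches[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))
```

Putting the preprocessing inside the pipeline means imputation and scaling are refit on each training fold, so no information from the validation fold leaks into the search. `X_test` and `y_test` are left untouched here and would only be used once, on the final chosen model.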