Identify the most suitable technology for data conversion, data cleaning, and data munging. Apply it to your selected dataset and produce a single merged dataset for further analysis.
Identify the research question or a broader goal, and the characteristics (variables) you will need to study.
Identify the need, or the potential need, for distributed computing to store, manipulate, or analyze the data.
Conduct a preliminary analysis by running one of the data mining techniques (e.g., clustering or regression).
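
As a rough illustration of how the cleaning, merging, and preliminary-analysis tasks might fit together, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for the example: the two in-memory CSV sources, the column names (`id`, `region`, `population`, `sales`), the helper functions, and the choice of simple linear regression as the data mining technique. A real project would likely use a dedicated library and its own dataset.

```python
import csv
import io

# Hypothetical raw inputs: two small CSV sources that share an "id" key.
RAW_SITES = """id,region,population
1, North ,1200
2,south,
3,East,950
3,East,950
4,West,2100
"""

RAW_SALES = """id,sales
1,30.5
2,12.0
3,22.1
4,51.3
5,9.9
"""


def read_rows(text):
    """Data conversion: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_sites(rows):
    """Data cleaning: trim whitespace, normalize case, drop duplicate ids
    and rows with a missing population value."""
    seen = set()
    cleaned = []
    for row in rows:
        key = row["id"].strip()
        population = row["population"].strip()
        if key in seen or not population:
            continue
        seen.add(key)
        cleaned.append({
            "id": key,
            "region": row["region"].strip().title(),
            "population": int(population),
        })
    return cleaned


def merge_on_id(sites, sales_rows):
    """Data munging: inner-join the cleaned sites with sales on "id"."""
    sales_by_id = {r["id"].strip(): float(r["sales"]) for r in sales_rows}
    return [dict(s, sales=sales_by_id[s["id"]])
            for s in sites if s["id"] in sales_by_id]


def fit_line(xs, ys):
    """Preliminary analysis: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Produce the single merged dataset, then regress sales on population.
merged = merge_on_id(clean_sites(read_rows(RAW_SITES)), read_rows(RAW_SALES))
slope, intercept = fit_line([r["population"] for r in merged],
                            [r["sales"] for r in merged])
print(merged)
print(slope, intercept)
```

The sketch keeps everything in memory, which is reasonable only for small data. If the merged dataset no longer fits on one machine, that is one sign of the potential need for distributed computing mentioned above.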