The goal of this assignment is to perform exploratory data analysis on your selected dataset(s). The purpose of the analysis is to explain your dataset to someone interested in the topic it was collected for, and to highlight its interesting properties through charts, slices, and summaries. The analysis should reveal interesting aspects of the data (e.g., the properties that the worst-reviewed airports share).
- Create an overview of the dataset covering its size, its columns, the data type of each column, and the distinct values of any categorical columns.
- Clean the data if necessary (including handling null values).
- Make sure each column has the appropriate data type.
- Using the tidy data definition, determine whether your dataset is tidy. Give your reasons, and convert it to tidy format if necessary.
- Merge, melt, and pivot the dataset if necessary.
- Highlight notable findings in the dataset using grouped statistics.
- Create subsets of the dataset that reveal something interesting.
- Use charts and plots to visualize the distributions of interesting variables, both individually and against each other.
- If your dataset contains textual data, use regular expressions to select a subset of the dataset that is of interest.
- Use the apply function with your own custom functions to aggregate, transform, or filter your dataset.
- If your dataset contains string values, analyze them using regular expressions or NLP.
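The overview, cleaning, and data-type steps might look like the following minimal sketch. It assumes Python with pandas, which the assignment does not prescribe, and uses a small hypothetical airport-review table; the column names and the choice to drop rows that have no airport are illustrative assumptions, not requirements.

```python
import pandas as pd

# Hypothetical toy data standing in for your dataset; column names are illustrative only.
raw = pd.DataFrame({
    "airport": ["IST", "SAW", "IST", "ESB", None],
    "rating": ["4", "2", "5", None, "3"],
    "review_date": ["2021-03-01", "2021-03-05", "2021-04-10", "2021-04-12", "2021-05-01"],
})

# Overview: size, column data types, and distinct values of a categorical column.
print(raw.shape)   # (rows, columns)
print(raw.dtypes)  # all three columns hold strings at this point
print(raw["airport"].value_counts(dropna=False))

# Cleaning: count nulls per column, then drop rows with no airport (an assumed rule).
print(raw.isna().sum())
clean = raw.dropna(subset=["airport"]).copy()

# Proper dtypes: numeric ratings, datetime dates, categorical airport codes.
clean["rating"] = pd.to_numeric(clean["rating"], errors="coerce")
clean["review_date"] = pd.to_datetime(clean["review_date"])
clean["airport"] = clean["airport"].astype("category")
print(clean.dtypes)
```

Whether to drop, fill, or keep null values depends on what they mean in your dataset; here the missing rating is kept as a null after conversion.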
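For the tidy-data and reshaping steps, here is a sketch on a hypothetical wide table in which years are stored as column headers. Under the tidy data definition (each variable forms a column, each observation forms a row, each type of observational unit forms a table), that layout is not tidy, so `melt` converts it to long form; `pivot` and `merge` are then shown on the same toy data. It assumes pandas, and the `cities` lookup table is likewise hypothetical.

```python
import pandas as pd

# Hypothetical wide table: "year" is a variable, but its values are stored in the
# column headers, so this table is not tidy.
wide = pd.DataFrame({
    "airport": ["IST", "SAW", "ESB"],
    "2021": [4.1, 3.2, 3.8],
    "2022": [4.3, 3.0, 3.9],
})

# Melt: one row per (airport, year) observation, one column per variable.
tidy = wide.melt(id_vars="airport", var_name="year", value_name="avg_rating")
tidy["year"] = tidy["year"].astype(int)

# Pivot: back to wide form, e.g. for a side-by-side comparison table.
back = tidy.pivot(index="airport", columns="year", values="avg_rating")

# Merge: attach attributes from a second (hypothetical) table on a shared key.
cities = pd.DataFrame({
    "airport": ["IST", "SAW", "ESB"],
    "city": ["Istanbul", "Istanbul", "Ankara"],
})
merged = tidy.merge(cities, on="airport", how="left")
print(merged)
```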
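Grouped statistics, subsets, and apply with custom functions can be combined as in this sketch, again assuming pandas and hypothetical review-level data. The 60-minute threshold in the hypothetical `wait_label` helper and the below-3 rating cutoff are arbitrary illustrative choices.

```python
import pandas as pd

# Hypothetical review-level data; replace with your own tidy dataset.
reviews = pd.DataFrame({
    "airport": ["IST", "IST", "SAW", "SAW", "ESB", "ESB"],
    "rating": [5, 4, 2, 1, 4, 3],
    "wait_minutes": [20, 35, 90, 75, 30, 40],
})

# Grouped statistics: mean rating, mean wait, and review count per airport,
# sorted so the worst-rated airport comes first.
summary = (
    reviews.groupby("airport")
    .agg(mean_rating=("rating", "mean"),
         mean_wait=("wait_minutes", "mean"),
         n=("rating", "size"))
    .sort_values("mean_rating")
)
print(summary)

# Subset: reviews of the worst-rated airport only.
worst = summary.index[0]
worst_reviews = reviews[reviews["airport"] == worst]

# apply with a custom function: label each review using an assumed 60-minute threshold.
def wait_label(minutes):
    return "long" if minutes > 60 else "short"

reviews["wait_label"] = reviews["wait_minutes"].apply(wait_label)

# groupby().filter with a custom predicate: keep airports whose mean rating is below 3.
low_rated = reviews.groupby("airport").filter(lambda g: g["rating"].mean() < 3)
print(low_rated)
```

Sorting the grouped summary makes the worst-rated airport easy to pull out as a subset, which is the kind of slice the introduction's example describes.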
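For the visualization step, a sketch using matplotlib through pandas' plotting methods (an assumption; any plotting library works) that draws one single-variable distribution and one variable-against-variable chart on hypothetical data. It saves the figure to a temporary file rather than displaying it.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical review-level data; substitute your own columns.
reviews = pd.DataFrame({
    "rating": [5, 4, 2, 1, 4, 3, 5, 2],
    "wait_minutes": [20, 35, 90, 75, 30, 40, 15, 80],
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Distribution of a single variable.
reviews["wait_minutes"].plot.hist(bins=5, ax=ax_hist)
ax_hist.set_xlabel("Wait (minutes)")
ax_hist.set_title("Distribution of wait times")

# One variable plotted against another.
reviews.plot.scatter(x="wait_minutes", y="rating", ax=ax_scatter)
ax_scatter.set_title("Rating vs. wait time")

fig.tight_layout()
out_path = os.path.join(tempfile.gettempdir(), "eda_charts.png")
fig.savefig(out_path)
plt.close(fig)
```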
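For textual columns, regular expressions can both select a subset and extract structured values. This sketch assumes pandas' `.str` accessor and hypothetical review texts; the patterns are illustrative, and the word-frequency count is only a very light stand-in for fuller NLP.

```python
import pandas as pd

# Hypothetical free-text reviews; the patterns below are illustrative only.
reviews = pd.DataFrame({
    "airport": ["IST", "SAW", "ESB", "SAW"],
    "text": [
        "Clean terminal, fast security.",
        "Waited 90 minutes at passport control!",
        "Lost my luggage, staff unhelpful.",
        "Security queue took 45 min.",
    ],
})

# Select the subset of reviews that mention waiting or queues (case-insensitive).
mask = reviews["text"].str.contains(r"\b(?:wait\w*|queue\w*)\b", case=False, regex=True)
waiting = reviews[mask]

# Extract stated wait times such as "90 minutes" or "45 min" into a numeric column.
reviews["minutes"] = pd.to_numeric(
    reviews["text"].str.extract(r"(\d+)\s*min", expand=False), errors="coerce"
)

# Simple word-frequency count as a lightweight text analysis.
words = reviews["text"].str.lower().str.findall(r"[a-z]+").explode()
print(words.value_counts().head())
```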