An analysis that tidies a data set.
The files in the repo are:
- data/UCI HAR Dataset - contains copies of the original readme and features file.
- CodeBook.md - a thorough description of the analysis and a list of the variables.
- tidySet.txt - a copy of the independent set output from run_analysis.R
- README.md - this file
- run_analysis.R - The R script that completes the data transformation
To keep things simple, this repo contains a single R script that performs the whole analysis. The details are laid out in CodeBook.md, but the steps to transform the data are as follows:
- Import the test and training data, labels (activity codes) and subjects.
- Import the activity and feature labels.
- Attach the subject and activity codes to the data sets.
- Merge the data sets with `rbind()`.
- Attach the feature labels to the merged data set.
- Map the activity names to the activity codes using `plyr::mapvalues()`.
- Filter and keep only the data columns that match `subject`, `activity`, `mean()` and `std()`. Columns that do not match these expressions are discarded.
- Clean the activity and feature labels with `gsub` and `grep`. Clean variable/label names are free of spaces and of special characters such as `-`, `()` and `_`. In addition, clean names follow the format `firstSecondThird`.
- Reduce the data to molten form via `melt()`, with id variables `subject` and `activity`. All remaining variables are considered measured variables.
- Recast the molten (tall) data into its wide form via `dcast()`, aggregating with `mean`.
- Finally, write this separate, aggregated data to a file called tidySet.txt.
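
The steps above can be sketched on a tiny made-up data set. This is not a copy of run_analysis.R: the toy values, the helper `cleanName()`, the exact regular expressions, the assumption that `melt()` and `dcast()` come from the reshape2 package, and the choice to lower-case activity names before cleaning are all illustrative assumptions. The output is written to a temporary directory so the sketch does not overwrite the real tidySet.txt.

```r
library(plyr)      # mapvalues()
library(reshape2)  # melt(), dcast() (assumed source of these functions)

# Toy stand-ins for the imported files; every value here is invented.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-meanFreq()-X")
activityLabels <- data.frame(code = 1:2,
                             name = c("WALKING", "WALKING_UPSTAIRS"))

# Subject and activity codes are already attached to each data set.
test  <- data.frame(subject = c(1, 1), activity = c(1, 2),
                    V1 = c(1, 3), V2 = c(10, 30), V3 = c(9, 9))
train <- data.frame(subject = c(1, 2), activity = c(1, 2),
                    V1 = c(3, 5), V2 = c(30, 50), V3 = c(9, 9))

# Merge the data sets and attach the feature labels.
merged <- rbind(test, train)
names(merged) <- c("subject", "activity", features)

# Map activity codes to activity names.
merged$activity <- mapvalues(merged$activity,
                             from = activityLabels$code,
                             to   = as.character(activityLabels$name))

# Keep subject, activity, and the mean()/std() columns only.
keep <- grep("^subject$|^activity$|mean\\(\\)|std\\(\\)", names(merged))
merged <- merged[, keep]

# Hypothetical helper: drop "()" and turn "-", "_" and spaces into camelCase.
cleanName <- function(x) {
  x <- gsub("\\(\\)", "", x)
  gsub("[-_ ]+([A-Za-z])", "\\U\\1", x, perl = TRUE)
}
names(merged)   <- cleanName(names(merged))
merged$activity <- cleanName(tolower(merged$activity))

# Melt to tall form, then recast to wide form, averaging each variable
# per subject/activity pair.
molten <- melt(merged, id.vars = c("subject", "activity"))
tidy   <- dcast(molten, subject + activity ~ variable, fun.aggregate = mean)

# Write the aggregated set (temporary location used for this sketch only).
outFile <- file.path(tempdir(), "tidySet.txt")
write.table(tidy, outFile, row.names = FALSE)
```

With this toy input, `tidy` has one row per subject/activity pair and one column per retained feature, for example `tBodyAccMeanX` and `tBodyAccStdX`; the `meanFreq()` column is dropped because it does not match `mean()` exactly.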
Questions or comments can be directed to [email protected]