Unsupervised learning is where you only have input data (X) and no corresponding output variables. The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about it. It is called unsupervised because, unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
Unsupervised learning problems can be further grouped into clustering and association problems.
Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
Association: an association rule learning problem is where you want to discover rules that describe large portions of your data, such as "people who buy X also tend to buy Y".
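Association rules are usually scored with two standard measures: support (the fraction of all transactions containing both items) and confidence (the fraction of transactions containing X that also contain Y). The snippet below is a minimal sketch of those two measures on a made-up list of shopping baskets; the items and the `rule_stats` helper are hypothetical, for illustration only. Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    Hypothetical helper: each transaction is a set of item names.
    """
    n = len(transactions)
    # Transactions that contain both X and Y
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    # Transactions that contain X at all
    with_antecedent = sum(1 for t in transactions if antecedent in t)
    support = both / n
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Made-up baskets, not real data
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"butter", "milk"},
]

support, confidence = rule_stats(baskets, "bread", "butter")
print(f"bread -> butter: support {support:.2f}, confidence {confidence:.2f}")
# prints "bread -> butter: support 0.50, confidence 0.67"
```

Here two of the four baskets contain both bread and butter (support 0.5), and two of the three bread baskets also contain butter (confidence 2/3).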
In this project I worked with real data from Arvato Financial Solutions about a company that performs mail-order sales in Germany. The task was to identify the population groups most likely to buy the company's products, so that they could be targeted in a mailout campaign. I used unsupervised learning techniques to group the general population into clusters and then checked which clusters make up the bulk of the company's customer base.
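The project description does not say which clustering algorithm was used, so the following is only a sketch of the general idea using k-means, one common choice. It clusters a toy "general population", then compares each cluster's share of the population with its share of customers; a ratio above 1 means customers are over-represented in that cluster, which makes it a candidate target for the mailout. The 2-D data, the feature meanings, `k=2`, the farthest-point initialisation, and all helper names are hypothetical assumptions. On real demographic data you would typically scale the features (and often reduce dimensions) first and use a library implementation such as scikit-learn's `KMeans`.

```python
import math


def assign(point, centroids):
    """Index of the nearest centroid by Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def init_centroids(points, k):
    """Farthest-point initialisation: deterministic and spreads the seeds out."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns the final list of centroids."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # New centroid = mean of its cluster; keep the old one if the cluster is empty
        new = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids


def cluster_shares(points, centroids):
    """Fraction of the given points that fall into each cluster."""
    counts = [0] * len(centroids)
    for p in points:
        counts[assign(p, centroids)] += 1
    return [c / len(points) for c in counts]


# Hypothetical, already-scaled 2-D features (toy data only)
population = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
              (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2)]
customers = [(5.1, 5.0), (4.9, 5.2), (5.0, 4.8), (1.0, 0.9)]

centroids = kmeans(population, k=2)
pop_share = cluster_shares(population, centroids)
cust_share = cluster_shares(customers, centroids)

for i, (p, c) in enumerate(zip(pop_share, cust_share)):
    # Ratio > 1: cluster is over-represented among customers
    print(f"cluster {i}: population {p:.2f}, customers {c:.2f}, ratio {c / p:.2f}")
```

With this toy data, cluster 1 holds half of the population but three quarters of the customers (ratio 1.50), so it would be the group to target.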