Article made in LaTeX - "Data Analysis and Their Visualization", Unicorn University, Winter 2022/2023
Attached:
- Article: pdf
- R code: R
- CSV file: Wine Catalogue Dataset. The original has over 320,000 records; because of its size, only a sample of 120,000 records was uploaded to GitHub.
Content:
- Large dataset cleaning
- Regression tree analysis
The regression tree variables:
- Wine price - dependent variable
- Wine category and country of wine production - two independent variables.
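The variable setup above can be illustrated with a minimal sketch of a regression tree on two categorical predictors. This is not the R code attached to the article: it is written in plain Python, the records in `ROWS` are invented toy values, and the function names (`sse`, `best_split`, `build_tree`, `predict`, `count_leaves`) are hypothetical. Each split asks "does this feature equal this value?" and is chosen to minimise the sum of squared errors of price.

```python
from statistics import mean

# Hypothetical toy records, NOT taken from the Wine Catalogue Dataset.
ROWS = [
    {"category": "dessert", "country": "Italy", "price": 60.0},
    {"category": "dessert", "country": "France", "price": 64.0},
    {"category": "red", "country": "Italy", "price": 30.0},
    {"category": "red", "country": "Italy", "price": 34.0},
    {"category": "red", "country": "France", "price": 28.0},
    {"category": "red", "country": "France", "price": 32.0},
    {"category": "white", "country": "Italy", "price": 15.0},
    {"category": "white", "country": "Italy", "price": 17.0},
    {"category": "white", "country": "France", "price": 14.0},
    {"category": "white", "country": "France", "price": 18.0},
]
FEATURES = ["category", "country"]  # independent variables; price is dependent


def sse(rows):
    """Sum of squared deviations of price from the mean price of the rows."""
    if not rows:
        return 0.0
    m = mean(r["price"] for r in rows)
    return sum((r["price"] - m) ** 2 for r in rows)


def best_split(rows, features):
    """Return (feature, value, yes_rows, no_rows) with the lowest total SSE,
    or None when no split improves on leaving the rows together."""
    best, best_score = None, sse(rows)
    for f in features:
        for v in sorted({r[f] for r in rows}):
            yes = [r for r in rows if r[f] == v]
            no = [r for r in rows if r[f] != v]
            if not yes or not no:
                continue
            score = sse(yes) + sse(no)
            if score < best_score - 1e-12:
                best, best_score = (f, v, yes, no), score
    return best


def build_tree(rows, features, max_depth=2, min_rows=2):
    """Recursively grow the tree; leaves store the mean price of their rows."""
    split = None
    if max_depth > 0 and len(rows) >= min_rows:
        split = best_split(rows, features)
    if split is None:
        return {"leaf": True, "value": mean(r["price"] for r in rows)}
    f, v, yes, no = split
    return {
        "leaf": False, "feature": f, "value": v,
        "yes": build_tree(yes, features, max_depth - 1, min_rows),
        "no": build_tree(no, features, max_depth - 1, min_rows),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean price."""
    while not tree["leaf"]:
        tree = tree["yes"] if row[tree["feature"]] == tree["value"] else tree["no"]
    return tree["value"]


def count_leaves(tree):
    """Number of leaves, a simple measure of how complex the tree is."""
    if tree["leaf"]:
        return 1
    return count_leaves(tree["yes"]) + count_leaves(tree["no"])


tree = build_tree(ROWS, FEATURES, max_depth=2)
print(tree["feature"], tree["value"], count_leaves(tree))
```

A leaf count such as the one returned by `count_leaves` is one possible way to compare the complexity of two trees, for example one grown on a full dataset and one grown on a reduced dataset. Whether the article measures complexity this way is an assumption.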
Testing of four hypotheses:
- The hypothesis that dessert wine is the most expensive wine category was confirmed.
- The hypothesis that white wines are less expensive than red wines was also confirmed.
- The hypothesis that Italy produces the most expensive wines was confirmed only partially, as the result depends on the category of wine.
- The hypothesis that the regression tree model is more complex than the regression tree model built from the reduced dataset was not confirmed.
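How the hypotheses were tested is described in the article itself; the summary above does not say. As a rough illustration only, price hypotheses of this kind can be inspected by comparing group means. The sketch below is plain Python rather than the attached R code, the records in `ROWS` are invented, and the helper names `mean_price_by` and `top_country_per_category` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records; the prices are invented for illustration only.
ROWS = [
    {"category": "dessert", "country": "Italy", "price": 60.0},
    {"category": "dessert", "country": "France", "price": 64.0},
    {"category": "red", "country": "Italy", "price": 34.0},
    {"category": "red", "country": "France", "price": 28.0},
    {"category": "white", "country": "Italy", "price": 15.0},
    {"category": "white", "country": "France", "price": 18.0},
]


def mean_price_by(rows, *keys):
    """Mean price per group; each group is identified by a tuple of key values."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r["price"])
    return {group: mean(prices) for group, prices in groups.items()}


def top_country_per_category(rows):
    """For every category, the country with the highest mean price."""
    means = mean_price_by(rows, "category", "country")
    top = {}
    for (category, country), m in means.items():
        if category not in top or m > means[(category, top[category])]:
            top[category] = country
    return top


by_category = mean_price_by(ROWS, "category")
# Category with the highest mean price (keys are 1-tuples such as ("red",)).
most_expensive = max(by_category, key=by_category.get)
# True when white wines are cheaper than red wines on average.
white_cheaper = by_category[("white",)] < by_category[("red",)]
print(most_expensive, white_cheaper, top_country_per_category(ROWS))
```

Splitting the country comparison by category, as `top_country_per_category` does, shows why a statement about a single country can hold for some wine categories and fail for others.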