Problem: given an IMDB dataset of movie reviews, predict the sentiment of each review.
root
├── data
│   ├── dev
│   │   ├── feed_count.npz
│   │   └── feed_tfidf_lite.npz
│   ├── submission.csv
│   ├── test.csv
│   └── train.csv
└── notebooks
    ├── data.ipynb
    ├── model.ipynb
    └── model_ensemble.ipynb
- data.ipynb - notebook for data cleaning and efficient preprocessing.
- model.ipynb - notebook for model selection.
- model_ensemble.ipynb - notebook estimating the performance gain from ensemble techniques.
- feed_count.npz - preprocessed documents vectorized with CountVectorizer.
- feed_tfidf_lite.npz - preprocessed documents vectorized with TF-IDF.
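The two `.npz` files presumably hold sparse document-term matrices. Below is a minimal sketch of how such files can be produced and reloaded, assuming scikit-learn's `CountVectorizer` and `TfidfVectorizer` plus SciPy's `save_npz`/`load_npz`. The toy reviews, the temporary output directory, and the `max_features` cap used to keep the TF-IDF matrix "lite" are hypothetical and not taken from `data.ipynb`.

```python
# A minimal sketch, assuming scikit-learn and SciPy. The toy reviews, the
# temporary paths, and max_features are hypothetical stand-ins.
import os
import tempfile

import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical cleaned reviews (the real ones would come from data.ipynb).
docs = [
    "a wonderful film with great acting",
    "boring plot and terrible acting",
    "great soundtrack but a boring film",
]

# Raw term counts, analogous to feed_count.npz.
count_vec = CountVectorizer()
X_count = count_vec.fit_transform(docs)  # sparse CSR matrix, one row per doc

# TF-IDF weights, analogous to feed_tfidf_lite.npz. Capping the vocabulary
# with max_features is an ASSUMED way of making the matrix "lite".
tfidf_vec = TfidfVectorizer(max_features=1000)
X_tfidf = tfidf_vec.fit_transform(docs)  # rows are L2-normalized by default

# Persist both sparse matrices in SciPy's .npz format, then reload them.
out_dir = tempfile.mkdtemp()
count_path = os.path.join(out_dir, "feed_count.npz")
tfidf_path = os.path.join(out_dir, "feed_tfidf_lite.npz")
sparse.save_npz(count_path, X_count)
sparse.save_npz(tfidf_path, X_tfidf)

X_count_loaded = sparse.load_npz(count_path)
X_tfidf_loaded = sparse.load_npz(tfidf_path)

print(X_count_loaded.shape, X_tfidf_loaded.shape)
```

Storing the matrices in sparse form keeps the files small, since each review uses only a tiny fraction of the vocabulary.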
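Model selection and ensembling can be sketched together: each candidate is scored with cross-validation, and a soft-voting ensemble is compared against its members. This assumes scikit-learn. The toy reviews, the choice of logistic regression and multinomial Naive Bayes as members, and 3-fold cross-validation are hypothetical and not necessarily what `model.ipynb` or `model_ensemble.ipynb` use.

```python
# A minimal sketch, assuming scikit-learn. The data and candidate models are
# hypothetical, not the ones used in the notebooks.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews; 1 = positive, 0 = negative.
positive = ["great film", "wonderful acting", "loved the story",
            "a brilliant movie", "superb and moving", "great cast and story"]
negative = ["terrible film", "awful acting", "hated the story",
            "a boring movie", "dull and slow", "awful cast and story"]
docs = positive + negative
y = np.array([1] * len(positive) + [0] * len(negative))

# Each candidate is a pipeline, so the vectorizer is refit inside every fold.
candidates = {
    "logreg": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    "nb": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    # Soft voting averages the predicted class probabilities of both members.
    "ensemble": make_pipeline(
        TfidfVectorizer(),
        VotingClassifier(
            estimators=[("logreg", LogisticRegression(max_iter=1000)),
                        ("nb", MultinomialNB())],
            voting="soft",
        ),
    ),
}

# Model selection: keep the candidate with the best mean CV accuracy.
scores = {name: cross_val_score(model, docs, y, cv=3).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Putting the vectorizer inside each pipeline fits it on the training folds only, so the cross-validated scores are not inflated by vocabulary leaking from the validation folds.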