A taste of "install hell."
The install would be:
- Clone the repository.
- If an old environment is present:
conda remove --name a_taste_of_data_science --all
conda env create -f a_taste_of_data_science.yaml
And that would be it. Some days this works, some days it does not.
Honestly, installing and versions are a bit of a nightmare for Python, Jupyter, and some combinations of data science packages.
Our instructions are oriented towards macOS. Similar steps should work on Linux or Windows, though Linux also has its own package managers.
The basic install idea is:
git clone git@github.com:WinVector/ATasteOfDataScience.git
- Read XKCD 1987: https://xkcd.com/1987/
- Install Anaconda from https://www.anaconda.com
- Start the Anaconda Navigator app (installed either where applications are installed or in HOME/opt)
- Select the Environments panel
- Press Import and use the pop-up file browser to import data_science_examples.yaml
- Return to the Home panel
To run, in a running Anaconda Navigator:
- Make sure the Applications pull-down is on data_science_examples
- Click Launch on the JupyterLab pane (if that fails, one can fall back to JupyterNotebook)
The installation YAML is data_science_examples.yaml, and the exact versions used (listed via conda list) are in data_science_examples_versions.txt.
We suggest re-running some of the example .ipynb files to see if the install is working.
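Besides re-running notebooks, one can compare the running environment against the recorded versions. The following is a minimal sketch, assuming data_science_examples_versions.txt has the usual conda list layout (comment lines starting with #, then whitespace-separated name and version columns). The helper names parse_conda_list and find_mismatches are ours for illustration and are not part of the repository. Note conda package names do not always match pip distribution names, so some packages may need manual checking.

```python
from importlib import metadata


def parse_conda_list(text):
    """Parse `conda list`-style text into a {package_name: version} dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            # Assumed layout: first column is the name, second the version.
            versions[fields[0].lower()] = fields[1]
    return versions


def find_mismatches(expected, names):
    """Return {name: (expected_version, installed_version)} where they differ.

    installed_version is None when the package is not installed at all.
    """
    mismatches = {}
    for name in names:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        wanted = expected.get(name.lower())
        if wanted != installed:
            mismatches[name] = (wanted, installed)
    return mismatches


# Example use (package names here are only illustrative):
#   with open("data_science_examples_versions.txt") as f:
#       expected = parse_conda_list(f.read())
#   print(find_mismatches(expected, ["numpy", "tensorflow"]))
```

An empty result means the checked packages match the recorded versions; anything else is a hint about where the install drifted.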
Installing software is a pain. However, we feel it is worth the effort when possible.
Always using remote services and pre-built containers has its own risks and promotes a learned helplessness. By working through a single install once, we try to isolate many issues into one session. Also, installing must be possible; otherwise, how are remote services and containers provisioned in the first place?
Everything needed to re-run the examples is installed by the above instructions.
The only variation is that to train on a different data set using GloVe encodings, one needs to download glove.840B.300d.zip into data/GloVe from https://nlp.stanford.edu/projects/glove/ . We have not automated this as a courtesy to the authors.
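Once the download is in place, a quick check that it landed where expected, plus a parser for the vector format, may save some debugging. This is a sketch under assumptions: that the archive unpacks to a plain text file with one token per line followed by 300 space-separated numbers, and that paths are relative to the repository root. The names glove_is_present and parse_glove_line are hypothetical helpers, not functions from the example notebooks.

```python
from pathlib import Path

# Location named in the instructions above, relative to the repository root.
GLOVE_ZIP = Path("data") / "GloVe" / "glove.840B.300d.zip"


def glove_is_present(path=GLOVE_ZIP):
    """Return True if the downloaded archive exists at the expected path."""
    return Path(path).is_file()


def parse_glove_line(line, dim=300):
    """Split one line of GloVe text into (token, list_of_floats).

    The last `dim` fields are taken as the vector, so a token that
    itself contains spaces is still recovered intact.
    """
    fields = line.rstrip("\n").split(" ")
    if len(fields) <= dim:
        raise ValueError(f"expected a token plus {dim} numbers, got {len(fields)} fields")
    token = " ".join(fields[:-dim])
    vector = [float(x) for x in fields[-dim:]]
    return token, vector
```

Splitting from the right is a defensive choice: it does not rely on the token being a single whitespace-free word.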
Find out which of your home startup dot-files Anaconda wrote its "added by Anaconda" block into, and copy that code into whichever startup file is actually executed on shell startup. Candidates include: .bash_profile, .zsh_profile, .profile, .bashrc (depending on your system).
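The search for the Anaconda block can be scripted. Below is a minimal sketch that looks through the candidate files listed above for a marker string. The "added by Anaconda" marker is from the text; the second marker, ">>> conda initialize >>>", is our assumption about what newer installers (via conda init) write, and files_with_conda_block is a hypothetical helper name.

```python
from pathlib import Path

# Candidate startup files, as listed in the text above.
CANDIDATES = [".bash_profile", ".zsh_profile", ".profile", ".bashrc"]
# First marker is from the text; the second is an assumed newer-style marker.
MARKERS = ["added by Anaconda", ">>> conda initialize >>>"]


def files_with_conda_block(home=None, candidates=CANDIDATES, markers=MARKERS):
    """Return the candidate dot-files under `home` that contain any marker."""
    home = Path(home) if home is not None else Path.home()
    found = []
    for name in candidates:
        path = home / name
        if not path.is_file():
            continue
        # errors="replace" avoids failing on stray non-UTF-8 bytes.
        text = path.read_text(errors="replace")
        if any(marker in text for marker in markers):
            found.append(name)
    return found
```

Calling files_with_conda_block() with no arguments scans the current user's home directory. It only reads files; copying the block into the right startup file is still a manual step.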
Note: to get JupyterLab from Anaconda to run on a Mac we have found one must run once:
conda activate data_science_examples
jupyter server extension disable nbclassic
on the command line in the conda environment (source).
Barring that, one can run JupyterNotebook or VSCode.
Make sure one has selected the data_science_examples environment.
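One way to confirm the selection from inside a notebook or script is to ask Python which environment it is running in. This is a sketch, assuming conda sets the CONDA_DEFAULT_ENV variable on activation and that named environments live in a directory named after the environment. The function active_conda_env is a hypothetical helper.

```python
import os
import sys
from pathlib import Path


def active_conda_env(environ=None, prefix=None):
    """Best-effort guess at the name of the active conda environment.

    Prefers CONDA_DEFAULT_ENV; falls back to the last path component of
    the interpreter prefix (e.g. .../envs/data_science_examples).
    """
    environ = os.environ if environ is None else environ
    name = environ.get("CONDA_DEFAULT_ENV")
    if name:
        return name
    return Path(prefix if prefix is not None else sys.prefix).name


# Example use:
#   if active_conda_env() != "data_science_examples":
#       print("Warning: running in", active_conda_env())
```

The fallback matters for kernels started from a launcher, where the shell variable may not be set even though the interpreter belongs to the right environment.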
This is a version incompatibility between Tensorflow and numpy. The web-advice is to pin numpy at something like 1.9.1 (ref).
We instead moved forward from the conda versions to pip versions that seem to be past this era of incompatibility. Exact versions known to work for us are here.
One no longer imports from Keras when using Tensorflow. Instead one imports Tensorflow's Keras API adapters as:
# used to be: import keras
import tensorflow.keras as keras