This is a full example of TensorBoard with Word2Vec, as seen in the Google AI Experiments video.
Word2Vec is one of the most commonly used NLP machine learning tools for clustering words based on co-occurrence (skip-gram model). By design, the resulting word vectors are high-dimensional (often more than 200 dimensions). To make these word vectors human-interpretable, dimensionality reduction techniques such as PCA and t-SNE are commonly applied.
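Purely as an illustration (this repo relies on TensorBoard's built-in projector rather than scikit-learn, and the random array below is only a stand-in for real embeddings), such a reduction could look like this:

```python
# Illustrative sketch (not part of this repo): reducing word vectors
# to 2-D with PCA or t-SNE using scikit-learn.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for a trained embedding matrix: one 200-dimensional vector per word.
vectors = np.random.rand(1000, 200)

pca_2d = PCA(n_components=2).fit_transform(vectors)                   # linear projection
tsne_2d = TSNE(n_components=2, perplexity=30).fit_transform(vectors)  # non-linear embedding
```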
With Google's TensorBoard, users can follow the stepwise learning process of the algorithm and explore the word vector projections in a D3.js-based interactive interface.
In this repo, the advanced word2vec.py example from TensorFlow is connected to TensorBoard through a pipeline that runs all necessary training steps and opens the browser.
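The actual logic lives in pipeline.py and word2vec.py; as a rough sketch only (TensorFlow 1.0 API, with placeholder variable names, shapes, and paths), hooking an embedding tensor into the TensorBoard projector looks roughly like this:

```python
# Simplified sketch of exposing an embedding variable to the TensorBoard
# projector (TensorFlow 1.0); names, shapes, and paths are placeholders.
import os
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

log_dir = "/tmp/log"
with tf.Session() as sess:
    # "emb" stands in for the trained word-embedding matrix.
    emb = tf.get_variable("emb", shape=[50000, 200],
                          initializer=tf.random_uniform_initializer())
    sess.run(tf.global_variables_initializer())

    # Write a checkpoint so the projector can read the tensor values.
    tf.train.Saver([emb]).save(sess, os.path.join(log_dir, "model.ckpt"))

    # Tell the projector which tensor to visualize and which metadata
    # file (one word per row) to use as labels.
    writer = tf.summary.FileWriter(log_dir)
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = emb.name
    embedding.metadata_path = os.path.join(log_dir, "metadata.tsv")
    projector.visualize_embeddings(writer, config)
```

TensorBoard then lets you switch between PCA and t-SNE views of that tensor directly in the browser.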
- TensorFlow 1.0
- g++ compiler (latest)
- Python 2.7
- git clone https://github.com/tensorflow/models
- Follow the word2vec steps from that repo's manual: download the example text and compile the ops with g++ (execute those steps in the /tutorials/embedding folder)
- copy pipeline.py and word2vec.py from this repo into the /tutorials/embedding folder
- Run the pipeline from the command line:

      python pipeline.py --epochs_to_train 3 --train_data text8 --eval_data questions-words.txt --save_path /tmp/log
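The pipeline is meant to open TensorBoard in the browser by itself; if that does not happen automatically, TensorBoard can also be started by hand against the same log directory and opened at http://localhost:6006 (the default port):

      tensorboard --logdir=/tmp/log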
The standard interface (with t-SNE)
The focused interface (with PCA)
- add document2vec algorithms
- improve speed & efficiency of the word2vec.py file