Kohze/TFBoard_Word2Vec

Tensorflow Board + Word2Vec

This is a full example of TensorBoard with Word2Vec, as seen in the Google AI Experiments video.


Introduction

Word2Vec is a commonly used NLP machine learning tool that clusters words based on co-occurrence (skip-gram model). The resulting word vectors are high dimensional (often 200 dimensions or more), so they cannot be inspected directly. To make them human interpretable, dimensionality reduction techniques such as PCA and t-SNE are regularly applied.
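To illustrate the dimensionality reduction step, here is a minimal PCA sketch in NumPy. It is not code from this repo: the `pca_project` helper is hypothetical, and the input is random stand-in data rather than trained word vectors.

```python
import numpy as np


def pca_project(vectors, n_components=2):
    """Project row vectors onto their top principal components."""
    # PCA requires zero-mean data, so center each dimension first.
    centered = vectors - vectors.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal
    # directions, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered.dot(vt[:n_components].T)


# Stand-in data (assumption): 1000 random "word vectors", 200 dimensions each.
rng = np.random.RandomState(0)
embeddings = rng.randn(1000, 200)

points_2d = pca_project(embeddings)
print(points_2d.shape)
```

Each row of `points_2d` is a 2D coordinate that can be drawn in a scatter plot. t-SNE serves the same purpose but is nonlinear and iterative, so it is not covered by this sketch.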

With TensorBoard (the Google TF Board), users can watch the stepwise learning process of the algorithm and explore the word vectors in a D3.js-based interactive interface.

In this repo, the advanced word2vec.py example from TensorFlow is connected to TensorBoard through a pipeline that runs all necessary training steps and then opens the browser.
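The general shape of such a pipeline can be sketched as follows. This is an assumption about how the steps might be chained, not the actual contents of pipeline.py: the function names, the 5-second wait, and the port handling are illustrative. The word2vec.py flags are the ones used in the run command in this README, and `tensorboard --logdir ... --port ...` is the standard TensorBoard command line.

```python
import subprocess
import sys
import time
import webbrowser


def build_train_command(train_data, eval_data, save_path, epochs):
    """Command line for the training step (flags as shown in this README)."""
    return [sys.executable, "word2vec.py",
            "--train_data", train_data,
            "--eval_data", eval_data,
            "--save_path", save_path,
            "--epochs_to_train", str(epochs)]


def build_board_command(log_dir, port=6006):
    """Command line that serves TensorBoard on the training output folder."""
    return ["tensorboard", "--logdir", log_dir, "--port", str(port)]


def run_pipeline(train_data, eval_data, save_path, epochs, port=6006):
    # 1. Train; check_call raises if word2vec.py exits with an error.
    subprocess.check_call(
        build_train_command(train_data, eval_data, save_path, epochs))
    # 2. Start TensorBoard in the background on the saved logs.
    board = subprocess.Popen(build_board_command(save_path, port))
    # 3. Give the server a moment to start (arbitrary wait), then open the UI.
    time.sleep(5)
    webbrowser.open("http://localhost:%d" % port)
    return board


# Example call (not executed here, since it starts a full training run):
# run_pipeline("text8", "questions-words.txt", "/tmp/log", 3)
print(" ".join(build_board_command("/tmp/log")))
```

Keeping the command construction in separate functions makes it possible to inspect the exact command lines without starting a training run.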

Build pre-requisites

TensorFlow 1.0
g++ compiler (latest)
Python 2.7

Installation Steps:

  • git clone https://github.com/tensorflow/models
  • Follow the word2vec steps in the manual of that repository: download the example text and compile with g++ (execute those steps in the /tutorials/embedding folder)
  • Copy pipeline.py and word2vec.py from this repo into the /tutorials/embedding folder
  • Run python pipeline.py --epochs_to_train 3 --train_data text8 --eval_data questions-words.txt --save_path /tmp/log in the command line
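Before starting the final step, it can help to confirm that the files named in the steps above are in place. The check below is a small sketch and not part of this repo. The `missing_files` helper is hypothetical, the file list is taken from the steps and the run command above, and it assumes the check is run from the /tutorials/embedding folder.

```python
import os

# Files the steps above place in the working folder.
REQUIRED_FILES = ["pipeline.py", "word2vec.py", "text8", "questions-words.txt"]


def missing_files(folder, required=REQUIRED_FILES):
    """Return the names from `required` that are not regular files in `folder`."""
    return [name for name in required
            if not os.path.isfile(os.path.join(folder, name))]


# Check the current directory (assumed to be /tutorials/embedding).
missing = missing_files(".")
if missing:
    print("Missing before training: " + ", ".join(missing))
else:
    print("All input files found.")
```

The check does not cover the compiled g++ output, because its file name is not stated in this README.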

Result:

The standard interface (with t-SNE)

The focused interface (with PCA)

Next Steps:

  • add document2vec algorithms
  • improve speed & efficiency of the word2vec.py file
