Skip to content

Implementation Algorithmic Explanation

Sam Nosenzo edited this page Apr 30, 2018 · 3 revisions

t-SNE Algorithm (t-Distributed Stochastic Neighbor Embedding)

At it's core what the algorithm does is it reduces the dimensionality of vectors while trying to preserve the same cluster's in the higher dimensionality. There are many other algorithms that have this same purpose like:

  • PCA (linear)
  • Sammon mapping (nonlinear)
  • Isomap (nonlinear)
  • LLE (nonlinear)
  • CCA (nonlinear)
  • SNE (nonlinear)
  • MVU (nonlinear)
  • Laplacian Eigenmaps (nonlinear)

t-SNE is one that is non-parametric and nonlinear. How it works is that it identifies patterns in the higher dimensionality data by finding clusters based on the similarities of the vectors within those clusters. It uses these clusters and the similarities within them as a comparison metric for when it maps the points to fewer dimensions. However, in enforcing the fewer dimensions to represent the data it loses any semblance of what the original numbers that used to represent it looked like. The only thing that t-SNE tries to preserve is the relative distances within the clusters and sometimes to neighboring clusters.

This algorithm doesn't have a finite time that it can run to find a solution because it's essentially a search algorithm. At the initial step it initializes all of the points initial vector in 3D space, then it iterates. At each iteration the algorithm is comparing all of the distances between the points to what they were before being mapped to a smaller dimension, and calculating an error that is the difference in similarities of the lower dimensionality and the higher dimensionality. Based on the learning rate (theta), it will move all of the points in a way to attempt to reduce that error value. Over many iterations the error becomes lower and lower and the points start to converge into their respective clusters that existed in the higher dimensionality.

The t-SNE implementation is the same used in Gene Kogan's ofxTSNE. As the basis for this project, I actually forked his code, and adapted it to my needs. The visualization iterates the algorithm every frame, and that creates the movement seen. However, I have programmed in a start point for the application to start the path from a specific iteration because the first few hundred iterations of t-SNE result in very erratic behavior that doesn't lend itself to the form I'm trying to emphasize in this project.