Principal Component Analysis (PCA)

Principal component analysis in Ruby. Uses GSL for calculations.

PCA can be used to map data to a lower dimensional space while minimizing information loss. It's useful for data visualization, where you're limited to 2-D and 3-D plots.

For example, here's a plot of the 4-D iris flower dataset mapped to 2-D via PCA:

PCA is also used to compress the features of a dataset before feeding it into a machine learning algorithm, potentially speeding up training time with a minimal loss of data detail.

Install

GSL must be installed first. On OS X it can be installed via homebrew: brew install gsl

gem install pca

Example Usage

require 'pca'

pca = PCA.new components: 1

data_2d = [ 
  [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
  [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]
]

data_1d = pca.fit_transform data_2d

# Transforms 2d data into 1d:
# data_1d ~= [
#   [-0.8], [1.8], [-1.0], [-0.3], [-1.7],
#   [-0.9], [0.1], [1.1], [0.4], [1.2]
# ]

more_data_1d = pca.transform [ [3.1, 2.9] ]

# Transforms new data into previously fitted 1d space:
# more_data_1d ~= [ [-1.6] ]

reconstructed_2d = pca.inverse_transform data_1d

# Reconstructs original data (approximate, b/c data compression):
# reconstructed_2d ~= [
#   [2.4, 2.5], [0.6, 0.6], [2.5, 2.6], [2.0, 2.1], [2.9, 3.1]
#   [2.4, 2.6], [1.7, 1.8], [1.0, 1.1], [1.5, 1.6], [1.0, 1.0]
# ]

evr = pca.explained_variance_ratio

# Proportion of data variance explained by each component
# Here, the first component explains 99.85% of the data variance:
# evr ~= [0.99854]

See examples for more. Also, peruse the source code (~ 100 loc.)

Options

The following options can be passed in to PCA.new:

option	default	description
:components	nil	number of components to extract. If nil, will just rotate data onto first principal component
:scale_data	false	scales features before running PCA by dividing each feature by its standard deviation.

Working with Returned GSL::Matrix

PCA#transform, #fit_transform, #inverse_transform and #components return instances of GSL::Matrix.

Some useful methods to work with these are the #each_row and #each_col iterators, and the #row(i) and #col(i) accessors.

Or if you'd prefer to work with a standard Ruby Array, you can just call #to_a and get an array of row arrays.

See GSL::Matrix RDoc for more.

Plotting Results With GNUPlot

Requires GNUPlot and gnuplot gem.

require 'pca'
require 'gnuplot'

pca = PCA.new components: 2
data_2d = pca.fit_transform data

Gnuplot.open do |gp|
  Gnuplot::Plot.new(gp) do |plot|
    plot.title "Transformed Data"
    plot.terminal "png"
    plot.output "out.png"

    # Use #col accessor to get separate x and y arrays
    # #col returns a GSL::Vector, so be sure to call #to_a before passing to DataSet
    xy = [data_2d.col(0).to_a, data_2d.col(1).to_a]

    plot.data << Gnuplot::DataSet.new(xy) do |ds|
      ds.title = "Points"
    end
  end
end

Sources and Inspirations

A tutorial on Principal Components Analysis (PDF) a great introduction to PCA
Principal Component Analyisis Explained Visually
scikit-learn PCA
Lecture video and notes (requires Coursera login) from Andrew Ng's Machine Learning Coursera class
Implementing a Principal Component Analysis (PCA) in Python step by step
Dimensionality Reduction: Principal Component Analysis in-depth

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
examples		examples
lib		lib
test		test
.gitignore		.gitignore
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
MIT-LICENSE		MIT-LICENSE
README.md		README.md
Rakefile		Rakefile
pca.gemspec		pca.gemspec

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Principal Component Analysis (PCA)

Install

Example Usage

Options

Working with Returned GSL::Matrix

Plotting Results With GNUPlot

Sources and Inspirations

About

Releases

Packages

Languages

License

gbuesing/pca

Folders and files

Latest commit

History

Repository files navigation

Principal Component Analysis (PCA)

Install

Example Usage

Options

Working with Returned GSL::Matrix

Plotting Results With GNUPlot

Sources and Inspirations

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages