Trouble with Statsample::Bivariate#correlation_matrix #5

Open
agarie opened this issue Mar 18, 2015 · 1 comment

agarie commented Mar 18, 2015

(Original: clbustos/statsample#17)

Hi, I'm having trouble using statsample to run PCA on a large data set. Does anyone have a good idea?

I want to run PCA on a very large data set (3000 variables, 50 samples),
so I wrote this code:

data_raw = IO.readlines('data1.txt').map{|v| v.split }[1..-1]

hash_tmp = {}

data_raw[1..3000].each do |ary|
  hash_tmp[ary[0]] = ary[1..-1].map(&:to_i).to_scale
end

ds = hash_tmp.to_dataset

puts "Input data done!"

cor_matrix=Statsample::Bivariate.correlation_matrix(ds)

puts "cor_matrix was prepared."

pca=Statsample::Factor::PCA.new(cor_matrix)

binding.pry

But Ruby on my Mac never prints "cor_matrix was prepared.".
To investigate the cause, I wrote the following code:

# Reopening the module to find where the bottleneck is
module Statsample
  module Bivariate
    class << self
      def covariance_matrix_optimized(ds)
        x=ds.to_gsl
        n=x.row_size
        m=x.column_size
        puts "calculating means..."
        means=((1/n.to_f)*GSL::Matrix.ones(1,n)*x).row(0)
        puts "centering matrix..."
        centered=x-(GSL::Matrix.ones(n,m)*GSL::Matrix.diag(means))
        puts "calculating covariance matrix..."
        ss=centered.transpose*centered
        puts "calculating n..."
        s=((1/(n-1).to_f))*ss
        puts "done!"              #<= This line is executed
        s
      end



      def correlation_matrix(ds)
        vars,cases=ds.fields.size,ds.cases
        if !ds.has_missing_data? and Statsample.has_gsl? and prediction_optimized(vars,cases) < prediction_pairwise(vars,cases)
          binding.pry
          cm=correlation_matrix_optimized(ds)
          binding.pry             #<= This line is never reached. :(
        else
          cm=correlation_matrix_pairwise(ds)
        end
        binding.pry
        cm.extend(Statsample::CovariateMatrix)
        binding.pry
        cm.fields=ds.fields
        binding.pry
        cm
      end
    end
  end
end

Ruby then runs up to "done!" but never returns from the Statsample::Bivariate#covariance_matrix_optimized method.
I have never seen a Ruby method that doesn't return.

If someone knows how to solve this problem, or how to investigate the cause more deeply, please tell me.
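
One way to narrow this down further might be to time each stage separately and print the result as soon as the stage finishes. The sketch below is only an illustration: it uses Ruby's standard `matrix` and `benchmark` libraries on a small random matrix instead of `ds.to_gsl`, and it is not statsample's implementation. The names `timed` and `correlation_matrix_steps` are hypothetical, and the last stage (scaling the covariance matrix by the inverse standard deviations) is an assumption about what has to happen after "done!" is printed.

```ruby
require 'matrix'
require 'benchmark'

# Hypothetical helper: runs the block, reports the elapsed wall-clock time
# on stderr right away, and returns the block's value.
def timed(label)
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  $stderr.puts format('%-28s %.4fs', label, elapsed)
  result
end

# Hypothetical stand-in for the optimized path. x has one row per case and
# one column per variable, like the n x m matrix in the code above.
def correlation_matrix_steps(x)
  n = x.row_count
  m = x.column_count

  means = timed('means') do
    (0...m).map { |j| x.column(j).sum / n.to_f }
  end

  centered = timed('centering') do
    Matrix.build(n, m) { |i, j| x[i, j] - means[j] }
  end

  cov = timed('covariance') do
    (centered.transpose * centered) / (n - 1).to_f
  end

  # Assumed final stage: divide each entry by the two standard deviations.
  timed('scaling to correlation') do
    inv_sd = (0...m).map { |j| 1.0 / Math.sqrt(cov[j, j]) }
    Matrix.build(m, m) { |i, j| cov[i, j] * inv_sd[i] * inv_sd[j] }
  end
end

srand(42)
x = Matrix.build(50, 20) { rand(100).to_f } # 50 cases, 20 variables
cm = correlation_matrix_steps(x)
```

Raising the number of columns step by step (for example 20, 200, 1000) and watching which stage's time grows fastest would show whether a stage is truly stuck or only very slow for 3000 variables.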

v0dro commented Jan 24, 2016

Has this been solved in the latest release? Could you check?
