exceptions-based exceptions #141

Open
ocramz opened this issue Mar 28, 2018 · 13 comments
Comments

@ocramz
Contributor

ocramz commented Mar 28, 2018

Rather than returning Nothing, 0, NaN, etc. (the first option being far better than the others), it would be great to generalize code that may throw to the MonadThrow class from the exceptions package.

This way, functions that call throwM e (for any e with an Exception instance) would have the signature MonadThrow m => ... -> m ( ... ), where m may become Maybe, or Either SomeException, or even IO, according to the calling context.
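As a minimal sketch of the idea, assuming the `exceptions` package is available: the exception type `StatError` and the helper `safeMean` below are hypothetical names made up for illustration and are not part of statistics. The same function is instantiated at three monads purely through type annotations.

```haskell
import Control.Exception (Exception, SomeException)
import Control.Monad.Catch (MonadThrow, throwM)

-- Hypothetical exception type, for illustration only.
data StatError = EmptySample
  deriving (Show, Eq)

instance Exception StatError

-- Hypothetical helper: mean of a list, failing on empty input.
-- The caller picks the monad m, and with it the failure behaviour.
safeMean :: MonadThrow m => [Double] -> m Double
safeMean [] = throwM EmptySample
safeMean xs = pure (sum xs / fromIntegral (length xs))

-- In Maybe the exception collapses to Nothing.
asMaybe :: Maybe Double
asMaybe = safeMean []

-- In Either the exception is kept, wrapped in SomeException.
asEither :: Either SomeException Double
asEither = safeMean []

-- In IO the exception would be thrown as a regular runtime exception.
inIO :: IO Double
inIO = safeMean [1, 2, 3]
```

Note that the `Either` instance provided by `exceptions` is fixed to `Either SomeException`, so a caller that wants a specific error type back has to recover it with `fromException`.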

@ocramz
Contributor Author

ocramz commented Mar 28, 2018

Related: #128, #100, #111, #118 ...

@Shimuuar
Collaborator

That's an excellent suggestion!

@ocramz
Contributor Author

ocramz commented Jul 19, 2018

I've started addressing this here: https://github.com/DataHaskell/statistics/tree/exceptions-not-error

@Shimuuar
Collaborator

Shimuuar commented Jul 20, 2018 via email

@ocramz
Contributor Author

ocramz commented Jul 20, 2018

Yes, I noticed: error is used pretty much throughout. We could skip refactoring the input validation parts for now (e.g. zero input size or negative parameters) and focus on the important ones, such as the NaN correlations. For example, I've replaced Sample.correlation with this:

-- | Correlation coefficient for a sample of pairs. Also known as
--   Pearson's correlation. For an empty sample it is set to zero;
--   throws 'NaNE' if the variance of either component is near zero.
correlation :: (G.Vector v (Double,Double), G.Vector v Double, MonadThrow m)
           => v (Double,Double)
           -> m Double
correlation xy
  | n == 0    = pure 0
  | nearZero varX = throwM $ NaNE "Variance of X == 0"
  | nearZero varY = throwM $ NaNE "Variance of Y == 0"
  | otherwise = pure corr
  where
    corr = cov / sqrt (varX * varY)
    n       = G.length xy
    (xs,ys) = G.unzip xy
    (muX,varX) = meanVariance xs
    (muY,varY) = meanVariance ys
    cov = mean $ G.zipWith (*)
            (G.map (\x -> x - muX) xs)
            (G.map (\y -> y - muY) ys)
{-# SPECIALIZE correlation :: U.Vector (Double,Double) -> Maybe Double #-}
{-# SPECIALIZE correlation :: V.Vector (Double,Double) -> Maybe Double #-}
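The snippet above does not show how `NaNE` or `nearZero` are defined, so the following is only a sketch under assumptions: the `NumericError` type, the tolerance in `nearZero`, and the helpers `safeCorr` and `describe` are all hypothetical stand-ins, not code from the branch. It illustrates how a caller working in `Either SomeException` could recover the specific exception.

```haskell
import Control.Exception (Exception, SomeException, fromException)
import Control.Monad.Catch (MonadThrow, throwM)

-- Assumed shape of the exception; the actual branch may define it differently.
newtype NumericError = NaNE String
  deriving (Show, Eq)

instance Exception NumericError

-- Assumed tolerance check; the threshold is an arbitrary illustrative value.
nearZero :: Double -> Bool
nearZero x = abs x < 1e-12

-- Simplified stand-in for the final division step of correlation,
-- taking the covariance and the two variances as plain arguments.
safeCorr :: MonadThrow m => Double -> Double -> Double -> m Double
safeCorr cov varX varY
  | nearZero varX = throwM (NaNE "Variance of X == 0")
  | nearZero varY = throwM (NaNE "Variance of Y == 0")
  | otherwise     = pure (cov / sqrt (varX * varY))

-- Recover the typed exception from the SomeException wrapper.
describe :: Either SomeException Double -> String
describe (Right r) = "correlation = " ++ show r
describe (Left e)  = case fromException e of
  Just (NaNE msg) -> "undefined correlation: " ++ msg
  Nothing         -> "unexpected error: " ++ show e
```

A caller that does not care about the reason can instantiate the same function at `Maybe`, which matches the `SPECIALIZE` pragmas above.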

@ocramz
Contributor Author

ocramz commented Jul 20, 2018

@Shimuuar would you like to join forces on this? I don't have an efficient implementation in mind for Matrix.generateSym, though.

@Shimuuar
Collaborator

Shimuuar commented Jul 20, 2018 via email

@ocramz
Contributor Author

ocramz commented Jul 24, 2018

Hi @Shimuuar :) as discussed, if you point me to your working branch for this we can figure out how to collaborate :)

@Shimuuar
Collaborator

I just pushed the branch exception2 (exception was a complete failure). It's mostly complete except for:

  • Statistics.Sample: some functions are commented out, and I'm thinking about using type classes from monoid-statistics for things like calculating mean and variance in a single call (saving one evaluation of the mean). Having dedicated functions is not terribly good, since that leads to a combinatorial explosion.
  • Resampling: again, I'm thinking about the jackknife, which is clearly monoidal (although that's obscured by the API).
  • Bootstrap: I didn't even touch it.
  • Regression: depends on resampling.
  • KruskalWallis test.
  • A few other things I've certainly forgotten about.

monoid-statistics is in a rather poor state currently. I got lost in figuring out the numeric precision and performance of different algorithms for variance.
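To make the "mean and variance in a single call" point concrete, here is a sketch of one well-known single-pass approach, Welford's online update. It is one candidate among the variance algorithms mentioned above, not what the branch or monoid-statistics actually does; the names `MV`, `step`, and `meanVarianceOnePass` are hypothetical.

```haskell
import Data.List (foldl')

-- Running state: element count, running mean, and sum of squared
-- deviations from the running mean (often called M2).
data MV = MV !Int !Double !Double

-- Welford's update for one new observation.
step :: MV -> Double -> MV
step (MV n m m2) x = MV n' m' (m2 + d * (x - m'))
  where
    n' = n + 1
    d  = x - m                      -- deviation from the old mean
    m' = m + d / fromIntegral n'    -- updated mean

-- Mean and population variance (dividing by n) in one traversal.
-- Returns Nothing for an empty input.
meanVarianceOnePass :: [Double] -> Maybe (Double, Double)
meanVarianceOnePass xs = case foldl' step (MV 0 0 0) xs of
  MV 0 _ _  -> Nothing
  MV n m m2 -> Just (m, m2 / fromIntegral n)
```

Whether this is faster or more accurate than a two-pass computation on unboxed vectors would have to be measured.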

@ocramz
Contributor Author

ocramz commented Jul 25, 2018

@Shimuuar Re. monoid-statistics: do you know of foldl-statistics? https://hackage.haskell.org/package/foldl-statistics

@Shimuuar
Collaborator

Yes. The main difference is that monoid-statistics exposes the accumulator types and allows merging estimates from several data sets without refolding them.
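A rough sketch of what an exposed, mergeable accumulator can look like. This is not the monoid-statistics API; `MeanAcc`, `addValue`, `accumulate`, and `calcMean` are hypothetical names, and the naive running sum ignores the numeric precision concerns raised earlier in the thread.

```haskell
import Data.List (foldl')

-- Hypothetical accumulator for the mean: element count and running sum.
data MeanAcc = MeanAcc !Int !Double
  deriving (Show, Eq)

-- Merging two accumulators combines the estimates of two data sets
-- without traversing either data set again.
instance Semigroup MeanAcc where
  MeanAcc n1 s1 <> MeanAcc n2 s2 = MeanAcc (n1 + n2) (s1 + s2)

instance Monoid MeanAcc where
  mempty = MeanAcc 0 0

-- Add a single observation.
addValue :: MeanAcc -> Double -> MeanAcc
addValue (MeanAcc n s) x = MeanAcc (n + 1) (s + x)

-- Fold a whole data set into an accumulator.
accumulate :: [Double] -> MeanAcc
accumulate = foldl' addValue mempty

-- Extract the estimate; Nothing for an empty accumulator.
calcMean :: MeanAcc -> Maybe Double
calcMean (MeanAcc 0 _) = Nothing
calcMean (MeanAcc n s) = Just (s / fromIntegral n)
```

With this shape, `calcMean (accumulate xs <> accumulate ys)` gives the mean of the combined data, which is the kind of merge a fold-only interface that hides its accumulator cannot offer directly.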

@ocramz
Contributor Author

ocramz commented Jul 25, 2018 via email

@Shimuuar
Collaborator

Why, of course! Without benchmarks, all performance statements are just hopes and prayers.
