More Stats Functions #1732
t-test-pooled, t-test-independent, chi-square brownplt#1732
Thanks, @ds26gte! Can you add some tests, please?
…lt#1732 - statistics.arr: added exceptions for t-test-{pooled, independent}
@ds26gte awesome to see this progress! I'm still hoping we can add a z-test function as well (see checklist in the issue).
@team, the z-test seems to require, in addition to the two samples, also the population (rather than the sample) variances. Please add what you think are the right arguments for the z-test and the other functions that I've already added.
@ds26gte waiting to hear back about the desired contract from one of the teachers who requested these functions, which should give me a sense for whether these are close enough to what they need that I couldn't bridge the gap in a teachpack. Will wait to hear back.
@ds26gte I spoke with Nancy Pfenning today, who gave the following descriptions of what the inputs to various functions should be: z-test: list of numbers, stddev, hypothesized mean. I think this is all in line with what you have, with the exception of the z-test. Can you double-check your implementation, and let me know why it has two lists of numbers?
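To make the one-sample contract above concrete (list of numbers, stddev, hypothesized mean), here is a minimal JavaScript sketch of the textbook z statistic. The name `zScore` and the argument order are assumptions for illustration, not the code in statistics.arr, and `popStdDev` is assumed to be the known population standard deviation.

```javascript
// Hypothetical helper (not the PR's implementation): one-sample z statistic.
// sample    : array of numbers
// popStdDev : known population standard deviation (assumed > 0)
// hypMean   : hypothesized population mean
function zScore(sample, popStdDev, hypMean) {
  if (sample.length === 0) {
    throw new Error("zScore: sample must be non-empty");
  }
  if (!(popStdDev > 0)) {
    throw new Error("zScore: population stddev must be positive");
  }
  const n = sample.length;
  const sampleMean = sample.reduce((acc, x) => acc + x, 0) / n;
  // z = (sample mean - hypothesized mean) / standard error of the mean
  return (sampleMean - hypMean) / (popStdDev / Math.sqrt(n));
}
```

Under this contract only one list is needed; a two-list signature would correspond to a two-sample z-test instead.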
list of known x-inputs, and a list of the corresponding known y-outputs, and returns a predictor function that takes a list of x-inputs and returns its estimated y-output brownplt#1732 - js/trove/multiple-regression.js contains the JS implementation of multiple-regression and all its matrix subroutines - tests/test-statistics.arr: added a basic test (can add more from curriculum examples, when these are added)
representing one input (setting of indep vars to values). The returned predictor fn also takes an N-tuple brownplt#1732
multiple-regression.js: clean-up w/ better row/col indexing names
- check mulreg test on 1 var matches our linreg on same var - add mulreg test for 2 vars statistics.arr: add pointers to docs for formulas used
…ts arg tuple's elts are numbers brownplt#1732
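As a rough illustration of the shape described in the commits above (known inputs and outputs in, predictor function out), here is a minimal JavaScript sketch that fits ordinary least squares by solving the normal equations with Gauss-Jordan elimination. It is not the code in js/trove/multiple-regression.js; the name `multipleRegression`, the use of plain arrays for each input row, and the singularity threshold are all assumptions.

```javascript
// Hypothetical sketch: least-squares fit with an intercept term.
// xs : array of rows, each row an array of N independent-variable values
// ys : array of the corresponding observed outputs
// Returns a predictor that takes one row of N values.
function multipleRegression(xs, ys) {
  const n = xs.length;
  const k = xs[0].length + 1; // +1 for the intercept column
  const X = xs.map((row) => [1, ...row]);

  // Build the augmented matrix [X^T X | X^T y].
  const A = [];
  for (let i = 0; i < k; i++) {
    const r = new Array(k + 1).fill(0);
    for (let t = 0; t < n; t++) {
      for (let j = 0; j < k; j++) r[j] += X[t][i] * X[t][j];
      r[k] += X[t][i] * ys[t];
    }
    A.push(r);
  }

  // Gauss-Jordan elimination with partial pivoting.
  for (let col = 0; col < k; col++) {
    let piv = col;
    for (let r = col + 1; r < k; r++) {
      if (Math.abs(A[r][col]) > Math.abs(A[piv][col])) piv = r;
    }
    if (Math.abs(A[piv][col]) < 1e-12) {
      throw new Error("multipleRegression: singular system");
    }
    const tmp = A[col];
    A[col] = A[piv];
    A[piv] = tmp;
    for (let r = 0; r < k; r++) {
      if (r === col) continue;
      const f = A[r][col] / A[col][col];
      for (let c = col; c <= k; c++) A[r][c] -= f * A[col][c];
    }
  }

  // After elimination the matrix is diagonal; read off the coefficients.
  const beta = A.map((r, i) => r[k] / r[i]);
  return (row) => row.reduce((acc, x, i) => acc + beta[i + 1] * x, beta[0]);
}
```

With a single independent variable this reduces to ordinary single-variable linear regression, which is the consistency check the test commit describes.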
@ds26gte Sorry for the delay on this! I was hoping to hear back from the teacher who was requesting them, but they're overwhelmed with end-of-year stuff so I hopped on Zoom with Joy instead. :) Below are the contract and purpose statements for the various functions that Bootstrap would export:
You'll want to replace |
(BTW, our naming needs to move away from contrasting linear against multiple. They are both linear -- it's actually single vs multiple.)
@ds26gte good call. I propose |
Looks like at least the googleable literature also contrasts linear against multiple. To be sure, multiple-regression describes an n-dimensional plane, which is not, in a geometric sense, linear. On the other hand, even in the single-dimensional case, we can contrast linear against quadratic and other higher powers, which we don't use. Essentially, our code and curriculum only deal with predictor functions that operate on one or multiple independent variables, but in both cases only take the first power of the independent variable(s). We want names that capture this and also don't mislead.
OK, apropos the various The score gives us an abscissa to associate with our sample. The confidence level identifies one or two contiguous areas under the probability density function (normal, t, F, etc). The tailness is additional input that helps us identify this area. We then find the terminus abscissa associated with this area. Finally, we check if our own sample's abscissa is on the correct side of this terminus abscissa. So the test's result is a boolean. As a coding task, what we need is the ability to find an abscissa given an area. At a lower level, this means finding the root ("zero") of the difference between the integral of the function (with one integration bound varying) and a known area. This requires me to implement a suitable numerical integration function and a Newton-Raphson root-finding function. I can do both, but it is a big undertaking, so... Do we want to do this? Could you check with Nancy or against our curriculum goals? (The current texts don't mention anything, but maybe I'm not grepping expertly.)
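The "find an abscissa given an area" task described above can be sketched in JavaScript for the standard normal case only. This is an illustrative sketch, not code from the PR: the names `normalPdf`, `simpson` and `normalCriticalValue`, the choice of Simpson's rule, the 200 subintervals and the tolerance are all assumptions.

```javascript
// Standard normal probability density function.
function normalPdf(x) {
  return Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
}

// Composite Simpson's rule on [a, b]; n must be even.
function simpson(f, a, b, n = 200) {
  const h = (b - a) / n;
  let sum = f(a) + f(b);
  for (let i = 1; i < n; i++) {
    sum += (i % 2 === 1 ? 4 : 2) * f(a + i * h);
  }
  return (sum * h) / 3;
}

// Find z such that the area under the pdf to the left of z equals `area`.
// Newton-Raphson on g(z) = 0.5 + integral_0^z pdf - area, with g'(z) = pdf(z).
function normalCriticalValue(area) {
  if (!(area > 0 && area < 1)) {
    throw new Error("normalCriticalValue: area must be strictly between 0 and 1");
  }
  let z = 0;
  for (let iter = 0; iter < 100; iter++) {
    const g = 0.5 + simpson(normalPdf, 0, z) - area;
    if (Math.abs(g) < 1e-12) break;
    z -= g / normalPdf(z);
  }
  return z;
}
```

A one-tailed test at a given confidence level would then compare the sample's score against `normalCriticalValue(level)` and return a boolean; the t and F cases would need their own density functions.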
Latest changes to z-, t- and chi- functions in commit 207d18b. Using Note: if the original spec setter did mean Important: there is a non-glaring typo on the Investopedia website in its formula for the pooled t-test. So I've checked all the |
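Since a published formula for the pooled t-test is flagged above as containing a typo, it may help to have the standard textbook form written out. The JavaScript sketch below is that textbook formula, not the code from commit 207d18b; the names `mean`, `sampleVariance` and `tPooled` are assumptions.

```javascript
// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((acc, x) => acc + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) * (x - m), 0) / (xs.length - 1);
}

// Hypothetical helper: pooled two-sample t statistic.
// Assumes both samples come from populations with equal variance.
function tPooled(a, b) {
  const n1 = a.length;
  const n2 = b.length;
  if (n1 < 2 || n2 < 2) {
    throw new Error("tPooled: each sample needs at least two values");
  }
  // Pooled variance: variances weighted by their degrees of freedom.
  const sp2 =
    ((n1 - 1) * sampleVariance(a) + (n2 - 1) * sampleVariance(b)) /
    (n1 + n2 - 2);
  return (mean(a) - mean(b)) / Math.sqrt(sp2 * (1 / n1 + 1 / n2));
}
```

The statistic has n1 + n2 - 2 degrees of freedom, the same count that appears in the pooled-variance denominator.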
@ds26gte In Bootstrap:DS, everything is done via tables. The previous domain of The current domain of Can we bring the domains of LR and MR into alignment, so that both consume a list of values on different axes? |
This is a 1-line wrapper for you, e.g.
But that only works for 2 lists. What about 10? 20?
Do you actually have any such scenarios in BS:DS?
If we never needed more than x and y, we'd be happy to stay with linear regression. The whole point of adding multiple regression is to allow for such scenarios, right? And if 10 is extreme, how about 5? 4? At some point relying on I have a solution that does what I want already, but I have real concerns about doing all this list munging in Pyret instead of JS. For a table with 5k rows, even a 3 column MR will require a pretty huge number of swaps in memory.
First of all, no, the signature for multiple regression does not currently use tuples at all: pyret-lang/src/arr/trove/statistics.arr Line 227 in 448cfdf
It's a list of lists of numbers, where each inner list is an individual sample of the data. You want the transpose of this, if you're trying to extract columns and do it that way. Second, @ds26gte , the easiest way for you to support this is to implement
that does the same thing as
that has converted Third, @schanzer, you should use this API via
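The transpose discussed above, from a list of columns to the list-of-samples shape that multiple regression consumes, can be sketched in a few lines of JavaScript. The name `columnsToRows` is hypothetical and this is not the wrapper proposed in the thread; it only illustrates the data movement being debated.

```javascript
// Hypothetical helper: turn a list of equal-length columns into a list of rows,
// so that each inner list is one sample (one value per independent variable).
function columnsToRows(columns) {
  if (columns.length === 0) return [];
  const n = columns[0].length;
  for (const col of columns) {
    if (col.length !== n) {
      throw new Error("columnsToRows: all columns must have the same length");
    }
  }
  const rows = [];
  for (let i = 0; i < n; i++) {
    // Row i collects the i-th entry of every column.
    rows.push(columns.map((col) => col[i]));
  }
  return rows;
}
```

This does one read and one write per cell, so for a table with r rows and c columns the cost is proportional to r times c and works for any number of columns, not just two.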
I know MR doesn't use tuples - the issue is having to transpose tens of thousands of cells into the list format MR needs, and having to do it all in Pyret when it feels like this is a task for JS. Having this supported in the stats library as you propose would be fantastic. @ds26gte is this something you can add? If so, I'll use the proper select-columns API to pass you the right table.
@ds26gte never mind -- Ben and I spoke by phone, and he explained that I'm worrying about the wrong performance hit. If it's going to be slow anywhere, it'll happen in the matrix inversion. I'm ready to sign off on this as-is, and if we find a real dataset for which this is a problem we can revisit the issue. Thanks for all your work on this!
Issue brownplt/code.pyret.org#520 filed by @schanzer
We've had a few teachers ask if Pyret supports various stats functions:
Getting these implemented as a Pyret program would be great, but implementing them as part of Pyret's stats library would be much better.
(In keeping with the other stats functions, these should all operate on lists. I'll wrap them to work with tables in the DS teachpack.)