-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Allow cum_returns to accept DataFrame #39
Conversation
@@ -140,11 +142,11 @@ def cum_returns(returns, starting_value=0): | |||
if len(returns) < 1: | |||
return type(returns)([]) | |||
|
|||
if np.isnan(np.asanyarray(returns)[0]): | |||
if np.any(np.isnan(np.asanyarray(returns)[0])): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twiecki Can you comment again on this (and the corresponding change below)? Not clear to me what's expected in the new test case with one of the returns streams starting with a nan
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only catch the case where the whole first row are nan
s, as it's the only case produced by .pct_change()
.
@@ -1019,6 +1019,62 @@ def empyrical(self): | |||
return ReturnTypeEmpyricalProxy(self, (pd.Series, float)) | |||
|
|||
|
|||
class TestDataFrameStats(TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems surprising for readers to have this 2D TestCase split up the 1D TestClasses. What do you think about moving it below them?
|
||
@property | ||
def empyrical(self): | ||
return ReturnTypeEmpyricalProxy(self, (pd.DataFrame)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add subclasses that test the other inputs we (I assume) now support, like ndarray
s of more than 1 dimension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal - the extra parens here aren't necessary.
4) | ||
|
||
@property | ||
def empyrical(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you do me a favor and add a docstring here that says it returns "empyrical", so that my dev env will autocomplete empyrical's functions for self.empyrical
? Would be good to mention what we're testing with this property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gusgordon! Had some requests - take a look.
@@ -140,11 +142,11 @@ def cum_returns(returns, starting_value=0): | |||
if len(returns) < 1: | |||
return type(returns)([]) | |||
|
|||
if np.isnan(np.asanyarray(returns)[0]): | |||
if np.any(np.isnan(returns)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twiecki wrote "We should only catch the case where the whole first row are nans, as it's the only case produced by .pct_change()." Seems like the behavior here is to replace nans anywhere in the array with zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel deeply about this, it's really a corner case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I wasn't sure, if we start seeing nans at arbitrary locations, whether we should mask it or raise an error or something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is to alert early instead of masking the unexpected input.
|
||
df_cum = np.exp(nancumsum(np.log1p(returns))) | ||
df_cum = (returns + 1).cumprod(axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this has the same results (and speed) as the original, we should remove the helpers that we were using: nancumsum
and array_wrap
.
(df_input.as_matrix(), 0, df_0_expected.as_matrix()), | ||
(df_input.as_matrix(), 100, df_100_expected.as_matrix()) | ||
]) | ||
def test_cum_returns_matrix(self, returns, starting_value, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of duplicating the test methods for the input type, can we reuse the EmpyricalProxy
machinery to add a subclass?
@property
def empyrical(self):
return PassArraysEmpyricalProxy(self, np.ndarray)
If we have one class that just deals with DataFrames and another with ndarrays, then each can check that the output type matches the input type. Right now, we're asserting that DataFrame input will come back as either a DataFrame or an ndarray (and same with ndarray).
""" | ||
|
||
input_one = [np.nan, 0.01322056, 0.03063862, -0.01422057, | ||
-0.00489779, 0.01268925, -0.03357711, 0.01797036] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the edge cases that we have for 1D, like empty and nans in the final position?
class Test2DStatsArrays(Test2DStats): | ||
""" | ||
Tests pass np.ndarray inputs to empyrical and assert that outputs are of | ||
type np.ndarray or float. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like copy/paste mistake - No float!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me @gusgordon ! My only concern is about replacing nans everywhere.
Thanks @richafrank! We were doing that before anyway, and I think if we want to just replace the NaNs in the first row we would have to break apart the logic for different types. |
I opened this issue #40 |
Adds support for passing a
DataFrame
tostats.cum_returns
. Adds related tests.I messed up the commits, so I made a new PR; sorry about that.
cc @richafrank @twiecki