Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Allow cum_returns to accept DataFrame #39

Merged
merged 8 commits into from
Dec 7, 2016
Merged

Conversation

gusgordon
Copy link
Contributor

@gusgordon gusgordon commented Dec 5, 2016

Adds support for passing a DataFrame to stats.cum_returns. Adds related tests.

I messed up the commits, so I made a new PR; sorry about that.

cc @richafrank @twiecki

@@ -140,11 +142,11 @@ def cum_returns(returns, starting_value=0):
if len(returns) < 1:
return type(returns)([])

if np.isnan(np.asanyarray(returns)[0]):
if np.any(np.isnan(np.asanyarray(returns)[0])):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twiecki Can you comment again on this (and the corresponding change below)? Not clear to me what's expected in the new test case with one of the returns streams starting with a nan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only catch the case where the whole first row are nans, as it's the only case produced by .pct_change().

@@ -1019,6 +1019,62 @@ def empyrical(self):
return ReturnTypeEmpyricalProxy(self, (pd.Series, float))


class TestDataFrameStats(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems surprising for readers to have this 2D TestCase split up the 1D TestClasses. What do you think about moving it below them?


@property
def empyrical(self):
return ReturnTypeEmpyricalProxy(self, (pd.DataFrame))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add subclasses that test the other inputs we (I assume) now support, like ndarrays of more than 1 dimension?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal - the extra parens here aren't necessary.

4)

@property
def empyrical(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do me a favor and add a docstring here that says it returns "empyrical", so that my dev env will autocomplete empyrical's functions for self.empyrical? Would be good to mention what we're testing with this property.

Copy link
Member

@richafrank richafrank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gusgordon! Had some requests - take a look.

@@ -140,11 +142,11 @@ def cum_returns(returns, starting_value=0):
if len(returns) < 1:
return type(returns)([])

if np.isnan(np.asanyarray(returns)[0]):
if np.any(np.isnan(returns)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twiecki wrote "We should only catch the case where the whole first row are nans, as it's the only case produced by .pct_change()." Seems like the behavior here is to replace nans anywhere in the array with zero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel deeply about this, it's really a corner case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I wasn't sure, if we start seeing nans at arbitrary locations, whether we should mask it or raise an error or something else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is to alert early instead of masking the unexpected input.


df_cum = np.exp(nancumsum(np.log1p(returns)))
df_cum = (returns + 1).cumprod(axis=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this has the same results (and speed) as the original, we should remove the helpers that we were using: nancumsum and array_wrap.

(df_input.as_matrix(), 0, df_0_expected.as_matrix()),
(df_input.as_matrix(), 100, df_100_expected.as_matrix())
])
def test_cum_returns_matrix(self, returns, starting_value, expected):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of duplicating the test methods for the input type, can we reuse the EmpyricalProxy machinery to add a subclass?

@property
def empyrical(self):
    return PassArraysEmpyricalProxy(self, np.ndarray)

If we have one class that just deals with DataFrames and another with ndarrays, then each can check that the output type matches the input type. Right now, we're asserting that DataFrame input will come back as either a DataFrame or an ndarray (and same with ndarray).

"""

input_one = [np.nan, 0.01322056, 0.03063862, -0.01422057,
-0.00489779, 0.01268925, -0.03357711, 0.01797036]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the edge cases that we have for 1D, like empty and nans in the final position?

class Test2DStatsArrays(Test2DStats):
"""
Tests pass np.ndarray inputs to empyrical and assert that outputs are of
type np.ndarray or float.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like copy/paste mistake - No float!

Copy link
Member

@richafrank richafrank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me @gusgordon ! My only concern is about replacing nans everywhere.

@gusgordon
Copy link
Contributor Author

Thanks @richafrank! We were doing that before anyway, and I think if we want to just replace the NaNs in the first row we would have to break apart the logic for different types.

@gusgordon gusgordon merged commit e82f3f4 into master Dec 7, 2016
@gusgordon
Copy link
Contributor Author

I opened this issue #40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants