ENH Allow cum_returns to accept DataFrame #39

gusgordon · 2016-12-05T21:35:03Z

Adds support for passing a DataFrame to stats.cum_returns. Adds related tests.

I messed up the commits, so I made a new PR; sorry about that.

richafrank · 2016-12-05T21:54:12Z

empyrical/stats.py

@@ -140,11 +142,11 @@ def cum_returns(returns, starting_value=0):
    if len(returns) < 1:
        return type(returns)([])

-    if np.isnan(np.asanyarray(returns)[0]):
+    if np.any(np.isnan(np.asanyarray(returns)[0])):


@twiecki Can you comment again on this (and the corresponding change below)? Not clear to me what's expected in the new test case with one of the returns streams starting with a nan.

We should only catch the case where the whole first row is NaN, since that is the only case produced by .pct_change().
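A small sketch of the distinction being discussed, assuming a toy price table (the `prices` frame and its column names are made up for illustration). `.pct_change()` leaves the entire first row as NaN, which `np.all` detects, whereas `np.any` also fires when only one column starts with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two assets; not taken from the PR's test data.
prices = pd.DataFrame({"a": [100.0, 101.0, 99.0], "b": [50.0, 50.5, 51.0]})
returns = prices.pct_change()

# pct_change has no prior row to compare against, so row 0 is NaN everywhere.
first_row = np.asanyarray(returns)[0]
whole_first_row_nan = bool(np.all(np.isnan(first_row)))

# A frame where only one return stream starts with NaN.
partial = returns.copy()
partial.iloc[0, 1] = 0.0
partial_first_row = np.asanyarray(partial)[0]
any_nan = bool(np.any(np.isnan(partial_first_row)))
all_nan = bool(np.all(np.isnan(partial_first_row)))
```

With the `np.any` check from the diff, the `partial` frame is treated the same way as the `.pct_change()` output; an `np.all` check would treat only the latter as the expected case.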

richafrank · 2016-12-05T21:55:20Z

empyrical/tests/test_stats.py

@@ -1019,6 +1019,62 @@ def empyrical(self):
        return ReturnTypeEmpyricalProxy(self, (pd.Series, float))


+class TestDataFrameStats(TestCase):


Seems surprising for readers to have this 2D TestCase split up the 1D TestClasses. What do you think about moving it below them?

richafrank · 2016-12-05T22:02:45Z

empyrical/tests/test_stats.py

+
+    @property
+    def empyrical(self):
+        return ReturnTypeEmpyricalProxy(self, (pd.DataFrame))


Should we add subclasses that test the other inputs we (I assume) now support, like ndarrays of more than 1 dimension?

Not a big deal - the extra parens here aren't necessary.
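For reference, a quick check of why the parentheses are redundant: without a trailing comma they do not create a tuple, so `(pd.DataFrame)` is simply the class itself. `isinstance` accepts either a single type or a tuple of types, so both spellings work as a type specification.

```python
import pandas as pd

# Parentheses alone do not make a tuple; this is just the class.
same_object = (pd.DataFrame) is pd.DataFrame

# A trailing comma is what creates a one-element tuple.
one_tuple = (pd.DataFrame,)
is_tuple = isinstance(one_tuple, tuple)

# isinstance accepts a bare type or a tuple of types.
frame = pd.DataFrame({"a": [1.0]})
ok_bare = isinstance(frame, pd.DataFrame)
ok_tuple = isinstance(frame, one_tuple)
```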

richafrank · 2016-12-05T22:05:19Z

empyrical/tests/test_stats.py

+                    4)
+
+    @property
+    def empyrical(self):


Could you do me a favor and add a docstring here that says it returns "empyrical", so that my dev env will autocomplete empyrical's functions for self.empyrical? Would be good to mention what we're testing with this property.

richafrank

Thanks @gusgordon! Had some requests - take a look.

richafrank · 2016-12-06T17:59:07Z

empyrical/stats.py

@@ -140,11 +142,11 @@ def cum_returns(returns, starting_value=0):
    if len(returns) < 1:
        return type(returns)([])

-    if np.isnan(np.asanyarray(returns)[0]):
+    if np.any(np.isnan(returns)):


@twiecki wrote "We should only catch the case where the whole first row are nans, as it's the only case produced by .pct_change()." Seems like the behavior here is to replace nans anywhere in the array with zero.

I don't feel deeply about this, it's really a corner case.

OK. I wasn't sure whether, if we start seeing NaNs at arbitrary locations, we should mask them, raise an error, or do something else.

My preference is to alert early instead of masking the unexpected input.
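One way the "alert early" option could look, as a sketch only: `check_nans_only_in_first_row` is a hypothetical helper, not something in empyrical. It tolerates NaNs in the first row (the `.pct_change()` case) and raises if any appear later.

```python
import numpy as np


def check_nans_only_in_first_row(returns):
    """Hypothetical guard: raise if NaNs occur anywhere after row 0."""
    values = np.asanyarray(returns, dtype=float)
    # values[1:] is every row after the first, for both 1D and 2D input.
    if values.shape[0] > 1 and np.isnan(values[1:]).any():
        raise ValueError("NaN found after the first row of returns")


# Passes silently: NaNs are confined to the first row.
check_nans_only_in_first_row(np.array([[np.nan, np.nan], [0.01, 0.02]]))
```

Whether raising is preferable to masking is exactly the open question in this thread; the sketch only shows that the check itself is cheap.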

richafrank · 2016-12-06T17:59:10Z

empyrical/stats.py


-    df_cum = np.exp(nancumsum(np.log1p(returns)))
+    df_cum = (returns + 1).cumprod(axis=0)


Assuming this has the same results (and speed) as the original, we should remove the helpers that we were using: nancumsum and array_wrap.
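A rough way to check the "same results" assumption on NaN-free data. The sample returns are made up, and plain `cumsum` stands in for the `nancumsum` helper, so this says nothing about how the two forms compare when NaNs are present.

```python
import numpy as np
import pandas as pd

# Hypothetical returns with no NaNs.
returns = pd.DataFrame({
    "a": [0.0, 0.01, -0.02, 0.03],
    "b": [0.0, -0.01, 0.02, 0.005],
})

# Original form: sum the log returns, then exponentiate.
via_logs = np.exp(np.log1p(returns).cumsum(axis=0))

# New form: multiply the growth factors directly down each column.
via_cumprod = (returns + 1).cumprod(axis=0)

# The same expression also works on a 2D ndarray.
arr_cumprod = (returns.to_numpy() + 1).cumprod(axis=0)

matches = np.allclose(via_logs.to_numpy(), via_cumprod.to_numpy())
```

The two forms agree up to floating-point error because exp(sum(log(1 + r))) equals prod(1 + r); speed would need to be measured separately.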

richafrank · 2016-12-06T18:05:10Z

empyrical/tests/test_stats.py

+        (df_input.as_matrix(), 0, df_0_expected.as_matrix()),
+        (df_input.as_matrix(), 100, df_100_expected.as_matrix())
+    ])
+    def test_cum_returns_matrix(self, returns, starting_value, expected):


Instead of duplicating the test methods for the input type, can we reuse the EmpyricalProxy machinery to add a subclass?

    @property
    def empyrical(self):
        return PassArraysEmpyricalProxy(self, np.ndarray)

If we have one class that just deals with DataFrames and another with ndarrays, then each can check that the output type matches the input type. Right now, we're asserting that DataFrame input will come back as either a DataFrame or an ndarray (and same with ndarray).
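To illustrate the idea of one proxy per input type, here is a simplified stand-in. `ReturnTypeCheckingProxy`, `toy_cum_returns`, and `toy_stats` are hypothetical names for this sketch; the real EmpyricalProxy classes in the test suite may work differently.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


class ReturnTypeCheckingProxy:
    """Hypothetical proxy: forwards calls and asserts the result's type."""

    def __init__(self, namespace, return_types):
        self._namespace = namespace
        self._return_types = return_types

    def __getattr__(self, name):
        func = getattr(self._namespace, name)

        def checked(*args, **kwargs):
            result = func(*args, **kwargs)
            assert isinstance(result, self._return_types), type(result)
            return result

        return checked


def toy_cum_returns(returns):
    # Stand-in for the function under test; preserves the input's type.
    return (returns + 1).cumprod(axis=0) - 1


toy_stats = SimpleNamespace(cum_returns=toy_cum_returns)
df_proxy = ReturnTypeCheckingProxy(toy_stats, pd.DataFrame)
array_proxy = ReturnTypeCheckingProxy(toy_stats, np.ndarray)

df_result = df_proxy.cum_returns(pd.DataFrame({"a": [0.01, 0.02]}))
array_result = array_proxy.cum_returns(np.array([[0.01], [0.02]]))
```

Because each proxy allows exactly one return type, a DataFrame test class would fail if the function handed back an ndarray, and vice versa.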

richafrank · 2016-12-06T18:10:18Z

empyrical/tests/test_stats.py

+    """
+
+    input_one = [np.nan, 0.01322056, 0.03063862, -0.01422057,
+                 -0.00489779, 0.01268925, -0.03357711, 0.01797036]


Should we add the edge cases that we have for 1D, like empty and nans in the final position?
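A sketch of why these edge cases are worth covering in 2D, using made-up data. pandas' `cumprod` skips NaNs by default, while NumPy's `cumprod` propagates them, so the two input types can diverge once a NaN appears mid-series. `to_numpy()` is used here as the current replacement for the `as_matrix()` call in the diff.

```python
import numpy as np
import pandas as pd

# Hypothetical returns with a NaN mid-series and a NaN in the final position.
df = pd.DataFrame({
    "a": [0.01, np.nan, 0.02],
    "b": [0.01, 0.02, np.nan],
})

# DataFrame: the NaN stays NaN at its own row, later rows keep accumulating.
df_cum = (df + 1).cumprod(axis=0)

# ndarray: once a NaN is hit, every later entry in that column is NaN.
arr_cum = (df.to_numpy() + 1).cumprod(axis=0)

# Empty input: zero rows in, zero rows out.
empty = pd.DataFrame(columns=["a", "b"], dtype=float)
empty_cum = (empty + 1).cumprod(axis=0)
```

If NaNs are replaced before the cumulative product, as this PR does, both input types should behave the same; the sketch only shows what the tests would be guarding against.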

richafrank · 2016-12-06T22:21:30Z

empyrical/tests/test_stats.py

+class Test2DStatsArrays(Test2DStats):
+    """
+    Tests pass np.ndarray inputs to empyrical and assert that outputs are of
+    type np.ndarray or float.


Looks like copy/paste mistake - No float!

richafrank

This looks great to me @gusgordon ! My only concern is about replacing nans everywhere.

gusgordon · 2016-12-07T15:03:37Z

Thanks @richafrank! We were doing that before anyway, and I think if we want to just replace the NaNs in the first row we would have to break apart the logic for different types.

gusgordon · 2016-12-07T15:11:09Z

I opened this issue #40

gusgordon added 2 commits December 5, 2016 21:31

Add support for passing DataFrame to stats.cum_returns

99ef007

Add tests for cum_returns with DataFrame

a9919b8

richafrank requested changes Dec 5, 2016

gusgordon added 2 commits December 6, 2016 15:45

Slightly refactor so cum_returns works generally for dfs and ndarrays…

e2d27d6

…. Forego replacing NaNs twice.

Add cum_returns test for 2D ndarray. Move and rename 2D test.

5dee4ce

richafrank requested changes Dec 6, 2016

gusgordon added 3 commits December 6, 2016 18:22

Remove unused nancumsum and array_wrap

39fd05b

Remove

7de1038

Update tests to use same machinery as other tests; add test for NaN w…

c8496e9

…ithin data and empty df

richafrank reviewed Dec 6, 2016

Fix doc string

22c4366

richafrank approved these changes Dec 6, 2016

gusgordon merged commit e82f3f4 into master Dec 7, 2016
