
Slight differences in CoxPHFitter and CoxTimeVaryingFitter test cases #1599

Open
benslack19 opened this issue Feb 23, 2024 · 0 comments
This is likely very minor in most cases, but I still don't understand why there would be a difference. This is a result of comparing standard errors between the CoxPHFitter and CoxTimeVaryingFitter models when the data is equivalent (only one time period per subject). It originally stemmed from this discussion about left truncation.

I was using cluster_col in the CoxPHFitter and saw in the documentation that the sandwich estimator gets used, which is why the SE changes compared to the CoxTimeVaryingFitter model. When I attempted to match the robust SEs exactly (along the way I discovered issue #544 and created issue #1598), I could not match summary values past 3 decimal points.

Here's a reproducible example with my comments:

import numpy.testing as npt
import pandas as pd
from lifelines import CoxPHFitter, CoxTimeVaryingFitter
from lifelines.datasets import load_stanford_heart_transplants
from lifelines.utils import to_long_format

stanford = load_stanford_heart_transplants()

# Keep only the last record for each subject, drop all covariate columns except age to simplify data
stanford_last = (
    stanford.groupby("id")
    .tail(1)
    .drop(["year", "surgery", "transplant"], axis="columns")
)
stanford_last.head()

[screenshot: stanford_last.head() output]

# Format the data for CPH model
stanford_last_cph_wid = stanford_last.rename(
    columns={"start": "W", "stop": "T", "event": "E"}
)
stanford_last_cph_wid.head()

[screenshot: stanford_last_cph_wid.head() output]

The best I can do to match the standard errors between the CPH and CTV models is to not use cluster_col with the CPH model and to use id_col in the CTV model. But now the coefficient is slightly off (0.03616 vs. 0.36163).

[screenshot: model summaries showing the coefficient difference]

When using npt.assert_array_almost_equal, I could not match summary values past 3 decimal points. Why would this difference be observed?
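For anyone reproducing this, the check I ran has this shape (the numbers below are illustrative placeholders, not the actual summary values):

```python
import numpy as np
import numpy.testing as npt

# Illustrative SEs that agree to 3 decimal places but diverge at the 4th.
se_cph = np.array([0.01412])
se_ctv = np.array([0.01417])

# Passes: |difference| of 5e-5 is within the 1.5e-3 tolerance at decimal=3.
npt.assert_array_almost_equal(se_cph, se_ctv, decimal=3)

# Fails: 5e-5 exceeds the 1.5e-5 tolerance at decimal=5.
try:
    npt.assert_array_almost_equal(se_cph, se_ctv, decimal=5)
except AssertionError:
    print("matches to 3 decimals, not beyond")
```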

lifelines version: 0.27.8
