robust is ignored when cluster_col is set #1598

Open
benslack19 opened this issue Feb 23, 2024 · 0 comments

TL;DR: In CoxPHFitter.fit(), the value passed for robust has no effect once cluster_col is specified. In that case the Huber sandwich estimator is presumably always used.

I was using cluster_col in CoxPHFitter and saw in the docstring that the sandwich estimator is then used automatically. I was trying to match the standard errors in a test case against a CoxTimeVaryingFitter model by setting robust to the same value in both fitters. (This explains my test data below.) However, I saw from issue #544 that robust has not been implemented for CoxTimeVaryingFitter, leaving me only the option of setting robust=False in the CoxPHFitter model. For the test case, I can simply leave cluster_col unspecified. I think an error or warning should be raised when cluster_col is set and robust=False. It looks like this conditional needs to be edited.
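To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind. It is not lifelines code: resolve_robust is a hypothetical helper, and it assumes robust could default to None so that an explicit robust=False can be told apart from "not specified" (with the current default of False, the two cases are indistinguishable).

```python
import warnings
from typing import Optional


def resolve_robust(robust: Optional[bool], cluster_col: Optional[str]) -> bool:
    """Hypothetical helper (not part of lifelines).

    Decide whether sandwich standard errors are used, and warn when an
    explicit robust=False conflicts with cluster_col being set.
    """
    if cluster_col is None:
        # No clustering: honour the user's choice; None is treated as False.
        return bool(robust)
    if robust is False:
        # Explicit robust=False together with cluster_col: say so out loud
        # instead of silently overriding the argument.
        warnings.warn(
            "robust=False is ignored because cluster_col is set; "
            "sandwich standard errors will be used.",
            UserWarning,
            stacklevel=2,
        )
    return True
```

Raising a ValueError instead of warning would be the stricter alternative. Either way, the user learns that robust=False was not honoured.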

Here's a reproducible example with my comments:

import numpy.testing as npt
import pandas as pd
from lifelines import CoxPHFitter, CoxTimeVaryingFitter
from lifelines.datasets import load_stanford_heart_transplants
from lifelines.utils import to_long_format

stanford = load_stanford_heart_transplants()

# Keep only the last record for each subject, drop all covariate columns except age to simplify data
stanford_last = (
    stanford.groupby("id")
    .tail(1)
    .drop(["year", "surgery", "transplant"], axis="columns")
)

# Format the data for CPH model
stanford_last_cph_wid = stanford_last.rename(
    columns={"start": "W", "stop": "T", "event": "E"}
)
stanford_last_cph_wid.head()

[image: output of stanford_last_cph_wid.head()]

Create a CoxPHFitter model and fit it with the cluster_col specified.

cph_stanford_last_wid = CoxPHFitter()
cph_stanford_last_wid.fit(
    stanford_last_cph_wid,
    duration_col="T",
    event_col="E",
    entry_col="W",
    cluster_col="id",
)
cph_stanford_last_wid.summary

[image: output of cph_stanford_last_wid.summary]

However, when both cluster_col and robust are specified, the standard error is always the same (0.14374), regardless of the value of robust.

[image: model summaries with cluster_col set, for different values of robust]

When cluster_col is not specified and robust is left at its default value of False, the standard error is different (0.13862).

[image: model summary without cluster_col]

lifelines version: 0.27.8
