Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

includes RMST, difference in RMST and confidence intervals #1526

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

bayesfactor
Copy link

Implements RMST, difference in RMST from the R package: https://cran.r-project.org/package=survRM2

Includes RMST of one population (point estimate and confidence interval), difference in RMST of 2 populations (point estimate, p-value, and confidence intervals). Addresses #821

Implements RMST, difference in RMST from the R package:
https://cran.r-project.org/package=survRM2

Includes RMST of one population (point estimate and confidence interval), difference in RMST of 2 populations (point estimate, p-value, and confidence intervals). Addresses CamDavidsonPilon#821
@bayesfactor
Copy link
Author

@CamDavidsonPilon, I would very much appreciate your review (and approval, conditional on acceptance). My organization is really struggling without a sound method to generate confidence intervals on a difference in 2 survival populations.

add check on point_in_time argument for difference in RMST
@CamDavidsonPilon
Copy link
Owner

Hi @bayesfactor! Thanks for the PR (sorry about the delay).

We have an lifelines.utils.restricted_mean_survival_time function now - have you compared against that? We don't have a difference_ though, so that helps.

@bayesfactor
Copy link
Author

@CamDavidsonPilon, thank you for your response. I'm sorry, I overlooked lifelines.utils.restricted_mean_survival_time. I now validated that the 2 implementations produce the same point estimate on a test set. My newer implementation provides an estimate of standard error, and lower/upper confidence intervals based on that standard error, which I think is a nice feature. Perhaps you or I should put SE, LCI, and UCI into your original implementation of RMST. If you agree, please let me know if you would rather do it or ask me to.

As you said, the main point of this pull request is to provide a confidence interval on difference in RMST, which is a new feature as far as I know.

@bayesfactor
Copy link
Author

@CamDavidsonPilon, I'm hoping to hear back from you about whether you or I should include the other outputs from RMST, and therefore how to proceed with the difference in RMST (including confidence intervals) functionality. Can you please let me know what is your preference?

Changes:
1. fixed an issue of returning NaN confidence intervals if the followup time `point_in_time` is greater than the last observed event. Using `.replace(np.inf, 0)`, we correct the issue and return correct confidence intervals for that edge case.
2. cosmetic refactoring to do less recalculation  and rely more on pre-calculated values stored in `fitterA`
@bayesfactor
Copy link
Author

I fixed an issue with RMST where, if the followup interval is greater than the last observed event, the confidence intervals were NaN. The latest commit resolves that issue and makes some minor refactoring for efficiency.

@CamDavidsonPilon, I'm hoping to hear back from you about whether you or I should include the other outputs from RMST, and therefore how to proceed with the difference in RMST (including confidence intervals) functionality. Can you please let me know what is your preference?

wk_var = wk_var.tolist() + [0]
rmst_var = sum((np.flip(areas[1:])).cumsum() ** 2 * np.flip(wk_var)[1:])
wk_var = wk_n_event.observed / (wk_n_risk * (wk_n_risk - wk_n_event.observed))
wk_var = wk_var.replace(np.inf, 0).tolist()[1:] + [0]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the part that fixes the NaN issue. By adding .replace(np.inf, 0), the confidence intervals are not NaN now.

@CamDavidsonPilon
Copy link
Owner

Hi @bayesfactor,

Thanks for keeping up with this! Can you explain how this works for parametric fitters? In my head, to compute the AUC of a parametric fitter, some scipy.integrate.quad procedure is necessary.

@bayesfactor
Copy link
Author

Great point, this won't work for all fitters. The current implementation depends on a fitter.event_table, so it only works for fitters that have an event_table property. Using data from event_table, the code does a numerical integration:
https://github.com/bayesfactor/lifelines/blob/107da27a637cc2ae4125666b8ea2b3e217f2c699/lifelines/statistics.py#L392

I don't know how often people use other fitters; I could
a) modify the code to check if event_table exists, and if not, compute it
b) modify the existing utils.restricted_mean_survival_time to optionally return the RMST variance needed for the calculation of a confidence interval in the difference in RMST

def restricted_mean_survival_time(

@CamDavidsonPilon
Copy link
Owner

CamDavidsonPilon commented Nov 15, 2023

I think we can combine the functions! Have a global restricted_mean_survival_time that, based on the fitter, chooses an implementation. The current implementation for computing the variance for KMF's is imprecise compared to yours.

@bayesfactor
Copy link
Author

upon further thought, should we push one step further upstream and give all fitters an event_table? It seems like a handy attribute and that making fitters more similar would be beneficial.

@bayesfactor
Copy link
Author

Unless I'm missing something, my last commit moved the event_table property into the BaseFitter class so that all fits get an event_table property. This passes tests -- is it correct and kosher?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants