Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch fix for long formula names #152

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

jwallib
Copy link

@jwallib jwallib commented Jul 7, 2021

Patching a bug caused by an issue with the terms function in the stats package.

For very long feature names, such as those introduced by using the ridge function with many parameters the terms function splits the variable names by inserting a new line character.

Heres a demo example:
terms(as.formula(paste0('~ridge(',paste0('variable',1:50,collapse = ','),',theta = 1,scale = TRUE)')))
returns an object where attr(,"term.labels") and attr(,"factors") names have been split by a new line character '\n' (for me this was around variable 42).

The reason this is a problem is because later on in the coxph function (lines 541 and 543 here) it tries to match a version of the string with the newline character (attr(Terms, 'term.labels') and names(assign)) and one without (pname). It fails and returns NA. This in turn falsely triggers the error on line 542.

@therneau
Copy link
Owner

therneau commented Jul 7, 2021

Please take note of the very first lines in coxph.r. --- the file is automatically derived from the actual source in the now directory. Any patches have to be there.

Patching a bug caused by an issue with the terms function in the stats package.

For very long feature names, such as those introduced by using the ridge function with many parameters the terms function splits the variable names by inserting a new line character.

Heres a demo example:
terms(as.formula(paste0('~ridge(',paste0('variable',1:50,collapse = ','),',theta = 1,scale = TRUE)')))
returns an object where attr(,"term.labels") and attr(,"factors") names have been split by a new line character '\n' (for me this was around variable 42).

The reason this is a problem is because later on in the coxph function (lines 541 and 543 here) it tries to match a version of the string with the newline character (attr(Terms, 'term.labels') and names(assign)) and one without (pname). It fails and returns NA. This in turn falsely triggers the error on line 542.

@therneau
Copy link
Owner

therneau commented Jul 7, 2021

Typo in the last message: the "noweb" directory. (Actually, I typed correctly but a spell checker intervened).

@jwallib
Copy link
Author

jwallib commented Jul 8, 2021

Ahh sorry about that, I have now done the patch in the coxph.Rwd file in the noweb directory and rerun "make all -f Makefile".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants