R C API tips
This wiki page collects information that is useful for efficient use of the R C API, which is itself not very well documented.
The pryr package does that very well.
#install.packages("pryr")
print(sum) # take the body and paste into pryr::show_c_source
pryr::show_c_source(.Primitive("sum"))
truelength is the allocated length; length is the amount used. truelength was an unused field in R until recently. Now, finally, truelength is used as Ross originally intended (allocated length).
You can't set a new truelength. It is the actual allocation, either on R's heap or obtained by R through malloc (R can do both, depending on the size of the vector and how R has been configured/compiled). If length is set smaller than truelength, though, which we do in data.table (e.g. at the end of fread), the resulting memory leak can be solved. I was told by an R core member that there is a new 'growable' bit that can be set. When growable is set, gc() releases truelength rather than length, so the workarounds at the top of assign.c can be removed. It should have been like that in R in the first place, but for whatever reason truelength was not used at all.
A very good example is the code contributed in fcaseR by @2005m; the related lines are https://github.com/Rdatatable/data.table/pull/4021/files#diff-25cd0b0c089d5976de15097388ff5683R153-R162
We should not use restrict when two threads update a shared variable, for example from within an atomic or critical section, if I understand correctly. I also found a source online suggesting that combining const with restrict is beneficial too.
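As a minimal sketch of the kind of code where restrict fits, consider a loop over arrays that never overlap. The function `add_vectors` below is a hypothetical helper written for illustration, not code from data.table. The restrict qualifier is a promise from the programmer that, within the function, each array is reached only through its own pointer; the compiler does not verify this.

```c
#include <stddef.h>

// Hypothetical helper (not from data.table): element-wise sum of two arrays.
// 'restrict' promises that x, y and out never overlap, so the compiler may
// reorder or vectorize the loads and stores. 'const' additionally documents
// that the inputs are only read.
void add_vectors(const double *restrict x, const double *restrict y,
                 double *restrict out, size_t n) {
  for (size_t i = 0; i < n; ++i)
    out[i] = x[i] + y[i];
}
```

Calling it with overlapping arguments, for example `add_vectors(a, a, a, n)`, would break that promise and is undefined behavior, which is the same reason the qualifier does not fit a variable that several threads update.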
Use LENGTH only when you're sure it's a vector.
Use xlength, not length, to support long vectors as intended by the int64_t type.
xlength returns R_xlen_t, length returns R_len_t.
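The difference between the two return types can be sketched without the R headers. In the example below, `xlen_t` and `len_t` are hypothetical stand-ins for R_xlen_t and R_len_t, assuming a 64-bit build of R where R_xlen_t is ptrdiff_t and R_len_t is int; `fits_len_t` and `sum_dbl` are illustrative helpers, not part of the R API.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

// Stand-ins for the R types (assumption: 64-bit platform with long vector support).
typedef ptrdiff_t xlen_t;  // plays the role of R_xlen_t
typedef int len_t;         // plays the role of R_len_t

// True when an element count can still be represented by the short length type.
bool fits_len_t(xlen_t n) {
  return n >= 0 && n <= INT_MAX;
}

// The loop index has the long type, so the loop stays valid for vectors
// with more than INT_MAX elements.
double sum_dbl(const double *x, xlen_t n) {
  double s = 0.0;
  for (xlen_t i = 0; i < n; ++i)
    s += x[i];
  return s;
}
```

In package code the same pattern would read the count with xlength into an R_xlen_t variable and use that type for the loop index.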
We use OpenMP to parallelize many routines. Special care has to be taken inside the regions that use OpenMP. One of the restrictions is that you must not print to the console or raise exceptions there. One way to deal with this is to defer them and emit them outside of the parallel region. If an exception occurs, set your own flag variable and, based on that flag, skip all further computation (in all threads). Once outside of the parallel region, raise the exception or emit the output.
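The flag approach described above can be sketched as follows. This is an illustration under stated assumptions, not code from data.table: `reciprocal_all` is a hypothetical function that treats a zero input as the error condition, and it hands the message back to the caller instead of calling R's error(), so that the sketch stays free of R headers.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical worker: writes 1/x[i] into out[i] and treats a zero as an error.
// Nothing is printed or raised inside the parallel region; the problem is only
// recorded, and the message is produced after the region has ended.
bool reciprocal_all(const double *x, double *out, int n,
                    char *msg, size_t msg_size) {
  int failed = 0;   // shared flag, set by whichever thread hits a problem
  int bad = n;      // smallest failing index that was actually visited
  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    int stop;
    #pragma omp atomic read
    stop = failed;
    if (stop) continue;      // no break in an omp for loop: skip the remaining work
    if (x[i] == 0.0) {
      #pragma omp atomic write
      failed = 1;
      #pragma omp critical
      { if (i < bad) bad = i; }
      continue;
    }
    out[i] = 1.0 / x[i];
  }
  // Outside of the parallel region: now it is safe to report.
  if (failed) {
    snprintf(msg, msg_size, "zero found at index %d", bad);
    return false;            // in an R package, error() would be called here
  }
  return true;
}
```

Because the other threads stop early once the flag is set, the reported index is one failing element, not necessarily the first one in the input.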
In data.table we have a dedicated structure meant to carry the results of a computation together with console output, messages, warnings and errors. This makes it easy to pass all of that information between functions as a single object. The structure, named ans_t and defined in src/types.h, is used in the rolling functions and the NA fill function. If you would like to use ans_t, please see how it is used in those functions.
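To illustrate the idea of carrying results and diagnostics in one object, here is a simplified sketch. The type `result_t`, its fields, and the functions `scale` and `report` are hypothetical and written only for this example; the real ans_t definition in src/types.h is the authoritative one and may differ.

```c
#include <stdint.h>
#include <stdio.h>

#define MSG_SIZE 500

// Hypothetical container in the spirit of ans_t (see src/types.h for the real one).
typedef struct {
  double *dbl_v;              // result values, allocated by the caller
  uint8_t status;             // 0: ok, 1: message, 2: warning, 3: error
  char message[4][MSG_SIZE];  // one buffer per level: output, message, warning, error
} result_t;

// Worker: never prints or raises, it only fills the container.
void scale(const double *x, int n, double divisor, result_t *ans) {
  if (divisor == 0.0) {
    ans->status = 3;
    snprintf(ans->message[3], MSG_SIZE, "%s: divisor must be non-zero", __func__);
    return;
  }
  for (int i = 0; i < n; ++i)
    ans->dbl_v[i] = x[i] / divisor;
  snprintf(ans->message[0], MSG_SIZE, "%s: processed %d elements\n", __func__, n);
}

// Caller side, outside of any parallel region: emit whatever was collected.
// An R package would use Rprintf, warning and error here instead of stdio.
int report(const result_t *ans) {
  if (ans->message[0][0] != '\0') printf("%s", ans->message[0]);
  if (ans->message[1][0] != '\0') fprintf(stderr, "message: %s\n", ans->message[1]);
  if (ans->message[2][0] != '\0') fprintf(stderr, "warning: %s\n", ans->message[2]);
  if (ans->message[3][0] != '\0') fprintf(stderr, "error: %s\n", ans->message[3]);
  return ans->status == 3;    // non-zero only when an error was recorded
}
```

A caller would zero-initialize a `result_t`, point `dbl_v` at its output buffer, run the worker (possibly from many threads, one container each), and call `report` once the parallel work has finished.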
An easy way to measure time in a platform-independent way is to use the OpenMP omp_get_wtime function.
const bool verbose = GetVerbose();
double tic = 0.0, toc = 0.0; /* initialized to avoid "may be used uninitialized" warnings */
if (verbose)
  tic = omp_get_wtime();
/* my processing */
if (verbose)
  toc = omp_get_wtime();
if (verbose)
  Rprintf("My processing took %.3fs\n", toc - tic);
We use the codecov R package for code coverage. It works for C code as well, but not as precisely as for R code. Because of that, to get proper code coverage, if branches should have their body on a new line.
if (verbose) Rprintf("this message will never be properly checked");
if (verbose)
  Rprintf("but this message will");
This is because the if (verbose) condition alone already marks the first line as covered, whether or not the body runs.
An internal error, that is, an error that should not be reachable through normal use of the package, is usually something we cannot test. In that case the nocov keyword should be used to exclude the specific line from the code coverage report. To make it work in C we still have to keep R's comment sign # before the nocov keyword.
error("the comment on the right is not the proper one"); // nocov
error("this is the proper way to use nocov in C"); // # nocov