-
Notifications
You must be signed in to change notification settings - Fork 276
Regression analysis functions
On this page we have documented the EPPlus implementations of some of the more complex statistical functions. Note that the below functions in some edge cases might return different results depending on which spreadsheet application (Microsoft Excel, LibreOffice, etc) you are using.
Added in version 7.2.2
Returns statistics for a straight line that best fits the inputted data. The function can also calculate the statistics for multiple linear regression when there are several independent variables in the input data (more than one column/row in knownXs
). The inputs for LINEST are knownYs
(required), knownXs
(optional), const
(optional) and stats
(optional). knownYs
is the y-coordinates and knownXs
is the x-coordinates. If knownXs
is omitted it is assumed to be 1, 2, ..., {length of knownYs
}. const
is a boolean flag, and if false
forces the intercept to go through the origin. stats
determines if descriptive statistics is included (if stats
= true
).
The statistics calculated with LINEST are the following:
- Coefficient(s) for all independent variables. Variables deemed collinear are represented by a zero.
- Standard error(s) for all coefficients, including the intercept. The standard error for the intercept is set to zero if
const
=false
. - R-squared (coefficient of determination). This value tells us how well the independent variables explain the dependent variable. This is calculated as the regression sum of squares divided by the total sum of squares.
- Standard error for the y-estimate.
- F-statistic. Do keep in mind that LINEST calculates the observed F-value and not an F-test.
- Degrees of Freedom (of ssresid, see below).
- The regression sum of squares (ssreg).
- The residual sum of squares (ssresid).
In some cases, an independent variable can be fully explained by another independent variable. This is called collinearity (or multicollinearity) and poses an issue since the parameter estimation can have infinitely many solutions and the moment matrix X'X can not be inverted due to it being singular. In this situation, LINEST removes collinear columns by transforming the moment matrix to echelon format with complete pivoting. The process of removing collinear columns affects the degrees of freedom of the regression model and some of the statistics.
EPPLUS uses a gaussian elimination process to identify collinear columns and which ones to remove. The idea is to reduce the matrix (X'X) to echelon format, which is done with complete pivoting. After the row and column swaps, if a value on the diagonal contains zero (EPPLUS LINEST version uses a threshold of 2e-13), that corresponding column is deemed redundant. In some rare cases the result might differ from the LINEST implementation in Microsoft Excel.
High multicollinearity between variables is fairly uncommon, and there is no standard practice for removing collinear variables. When pivoting equal columns, EPPLUS always chooses to remove the second column.
Added in version 7.2.2
LOGEST is based on the same calculations as LINEST, including how it handles collinearity for multiple regression analysis. However, the function returns statistics to a curve that best fits the data, rather than a line. The calculations for the LOGEST coefficients are calculated as EXP(LINEST(LN(knownYs), knownXs))
. It contains the same argument and outputs as LINEST.
Added in version 7.2.2
This function takes four arguments, knownYs
, knownXs
, newXs
and const
. TREND calculates the y-values on the regression line that is fitted to knownYs
and knownXs
. If newXs
is given, TREND returns the y-values corresponding to those x-values. TREND uses LINEST in order to construct the regression line and find all necessary coefficients.
The dimensions of newXs
have to correspond to knownXs
in terms of variables, but can be shorter in terms of amount of observations. For example, if knownXs
is 4 columns wide and 20 rows long, newXs
must be 4 columns wide and have any number of rows.
If const
is set to false
, the regression line is forced to the origin. If knownXs
or newXs
is omitted, they are assumed to be an array 1, 2, ..., {length of knownYs
}.
Added in version 7.2.3
This function takes four arguments, knownYs
, knownXs
, newXs
and const
. GROWTH calculates the exponential growth based on the curve that is fitted to knownYs
and knownXs
. If newXs
is given, GROWTH returns the y-values corresponding to those x-values. GROWTH uses LOGEST in order to construct the regression curve and find all necessary coefficients.
The dimensions of newXs
have to correspond to knownXs
in terms of variables, but can be shorter in terms of amount of observations. For example, if knownXs
is 4 columns wide and 20 rows long, newXs
must be 4 columns wide and have any number of rows.
If const
is set to false, the regression line is forced to the origin. If knownXs
or newXs
is omitted, they are assumed to be an array 1, 2, ..., {length of knownYs
}.
EPPlus Software AB - https://epplussoftware.com
- What is new in EPPlus 5+
- Breaking Changes in EPPlus 5
- Breaking Changes in EPPlus 6
- Breaking Changes in EPPlus 7
- Addressing a worksheet
- Dimension/Used range
- Copying ranges/sheets
- Insert/Delete
- Filling ranges
- Sorting ranges
- Taking and skipping columns/rows
- Data validation
- Comments
- Freeze and Split Panes
- Header and Footer
- Autofit columns
- Grouping and Ungrouping Rows and Columns
- Formatting and styling
- Conditional formatting
- Using Themes
- Working with custom named table- or slicer- styles