---
slug: day-8-100-days-of-code
title: Day 8 of 100 Days of Code
description:
image: https://github.com/rickmff/100DiasDeCodigo-landing/blob/master/public/thumb.png
authors: paulohfs
tags: [100DaysOfCode]
---

Today I had a Data Science class and I learned about correlation. You can see my notes [here](https://www.paulohernane.me/my-brain/data-science/correlation); they are incomplete, but I will complete them at the end of this week when reviewing the content to work on a dataset that I have to finish by Thursday.

After that I had a mentorship session about distributed systems and some projects to build to improve my skills. I will post my notes soon.

Want to know about the 100 Days of Code Challenge? Check it out [here](https://www.100daysofcode.com/). [PT-BR](<https://www.100diasdecodigo.dev/>)

---
id: correlation
title: Correlation
tags:
  - Data Science
  - Statistics
  - Correlation
---

# Correlation (or Dependence)

> WORK IN PROGRESS

Correlation is a statistical measure that describes the interdependence of two or more variables. In its most common form (the Pearson coefficient below), it measures how close two variables are to having a linear relationship with each other. The closer the correlation value is to −1 or 1, the stronger the relationship between the variables; the closer it is to 0, the weaker the relationship.

## Warning

Correlation does not imply causation.

Take a look at this graph:

![Alt text](image.png)

The graph shows a strong correlation, but it is obviously a coincidence. Correlation is not proof of causation.

## Covariance

Before looking at how to calculate correlation, we need to understand covariance.

Covariance is a measure of the joint variability of two random variables: it measures how changes in one variable are associated with changes in a second variable. It is similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.
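
Using the same notation as the Pearson formula below, the (population) covariance can be written as:

$$
COV(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
$$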

## Pearson Correlation Coefficient

The Pearson correlation is commonly used for linear correlations. It is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.

$$
\rho_{X,Y} = CORR(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}
$$
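
As a minimal sketch (assuming NumPy and SciPy are installed; the arrays below are made-up sample data), the coefficient can be computed directly from the formula or with `scipy.stats.pearsonr`:

```python
import numpy as np
from scipy import stats

# Made-up sample data: y is roughly a noisy linear function of x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

# Directly from the definition: covariance divided by the product of standard deviations
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
pearson_manual = cov_xy / (x.std() * y.std())

# Using SciPy, which also returns a p-value
pearson_scipy, p_value = stats.pearsonr(x, y)

print(pearson_manual, pearson_scipy, p_value)
```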

## Spearman Correlation Coefficient

The Spearman correlation is commonly used for non-linear, monotonic relationships. It is a measure of the monotonicity of the relationship between two variables X and Y. It has a value between +1 and −1, where 1 is total positive monotonicity, 0 is no monotonicity, and −1 is total negative monotonicity.

$$
\rho = \frac{COV(rank(X), rank(Y))}{\sigma_{rank(X)} \sigma_{rank(Y)}} = \frac{E[(rank(X) - \mu_{rank(X)})(rank(Y) - \mu_{rank(Y)})]}{\sigma_{rank(X)} \sigma_{rank(Y)}}
$$
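
A small sketch of the rank-based idea (again with made-up data, assuming SciPy): the Spearman coefficient is the Pearson coefficient applied to the ranks, which `scipy.stats.spearmanr` also computes directly:

```python
import numpy as np
from scipy import stats

# Made-up data with a monotonic but non-linear relationship (y = x^3 plus noise)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3 + np.array([0.5, -0.3, 0.8, -0.2, 0.4, -0.6])

# Spearman = Pearson applied to the ranks of the data
rho_manual, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))

# Or directly with SciPy, which also returns a p-value
rho_scipy, p_value = stats.spearmanr(x, y)

print(rho_manual, rho_scipy, p_value)
```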

## Kendall Correlation Coefficient

The Kendall correlation is also commonly used for non-linear relationships. It is a measure of the ordinal association between two measured quantities. It has a value between +1 and −1, where 1 is total positive association, 0 is no association, and −1 is total negative association.

This is the most robust of the three correlation coefficients, but it is also the most computationally expensive.
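
A minimal sketch with `scipy.stats.kendalltau` (same made-up data as in the Spearman example):

```python
import numpy as np
from scipy import stats

# Same made-up monotonic data as in the Spearman example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3 + np.array([0.5, -0.3, 0.8, -0.2, 0.4, -0.6])

# Kendall's tau compares concordant vs. discordant pairs of observations
tau, p_value = stats.kendalltau(x, y)

print(tau, p_value)
```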

## P-value

The p-value is the probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct. The smaller the p-value, the stronger the evidence against the null hypothesis.

- p < 0.001: strong evidence against the null hypothesis
- p < 0.05: moderate evidence against the null hypothesis
- p < 0.1: weak evidence against the null hypothesis
- p > 0.1: little or no evidence against the null hypothesis
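
In practice, the SciPy correlation functions used above already return a p-value for the null hypothesis of zero correlation, so interpreting it is just a comparison against these thresholds (sketch, made-up data):

```python
import numpy as np
from scipy import stats

# Made-up data: test whether x and y are linearly correlated
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

r, p_value = stats.pearsonr(x, y)  # null hypothesis: no linear correlation

if p_value < 0.001:
    strength = "strong"
elif p_value < 0.05:
    strength = "moderate"
elif p_value < 0.1:
    strength = "weak"
else:
    strength = "little or no"

print(f"r = {r:.3f}, p = {p_value:.4f}: {strength} evidence against the null hypothesis")
```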

### References

- <https://en.wikipedia.org/wiki/Joint_probability_distribution>
- <https://en.wikipedia.org/wiki/Correlation>
- <https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation>