day 8 of 100 days of code

PauloHFS · Aug 22, 2023 · 0d9fdc0
1 parent 95d53ac
commit 0d9fdc0
Showing 3 changed files with 81 additions and 0 deletions.
diff --git a/blog/2023-08-21-day-8-100-days-of-code.md b/blog/2023-08-21-day-8-100-days-of-code.md
@@ -0,0 +1,14 @@
+---
+slug: day-8-100-days-of-code
+title: Day 8 of 100 Days of Code
+description: 
+image: https://github.com/rickmff/100DiasDeCodigo-landing/blob/master/public/thumb.png
+authors: paulohfs
+tags: [100DaysOfCode, ]
+---
+
+Today I had a Data Science class and learned about correlation. You can see my notes [here](https://www.paulohernane.me/my-brain/data-science/correlation). They are incomplete, but I will finish them at the end of this week when I review the content to work on a dataset that I have to finish by Thursday.
+
+After that, I had a mentorship session about distributed systems and got some project ideas to build to improve my skills. I will post my notes soon.
+
+Want to know about the 100 Days of Code Challenge? Check it out [here](https://www.100daysofcode.com/). [PT-BR](<https://www.100diasdecodigo.dev/>)
diff --git a/my-brain/data-science/correlation.md b/my-brain/data-science/correlation.md
@@ -0,0 +1,67 @@
+---
+id: correlation
+title: Correlation
+tags: 
+    - Data Science
+    - Statistics
+    - Correlation
+---
+
+# Correlation (or Dependence)
+
+> WORK IN PROGRESS
+
+Correlation is a statistical measure that describes the interdependence of two or more variables. In its most common form, it measures how close two variables are to having a linear relationship with each other. The closer the correlation value is to -1 or 1, the stronger the relationship between the variables. The closer the correlation value is to 0, the weaker the relationship between the variables.
+
+## Warning
+
+    Correlation does not imply causation.
+
+Take a look at this graph:
+
+![Alt text](image.png)
+
+The graph shows a strong correlation, but it is obviously a coincidence. Correlation is not proof of causation.
+
+## Covariance
+
+Before looking at how to calculate correlation, we need to understand covariance.
+
+Covariance is a measure of the joint variability of two random variables. It is a measure of how changes in one variable are associated with changes in a second variable. It is similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.
+
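As a minimal sketch of the definition above, the covariance of two samples can be computed as the average product of their deviations from the mean. The function name `covariance` and the sample data are made up for illustration, and the sketch uses the population form (dividing by `n`), which is an assumption; many libraries divide by `n - 1` instead.

```python
from statistics import mean


def covariance(xs, ys):
    """Population covariance: the mean of (x - mean_x) * (y - mean_y)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 68, 74]

print(covariance(hours, scores))           # positive: they increase together
print(covariance(hours, hours))            # covariance with itself is the variance
print(covariance(hours, [5, 4, 3, 2, 1]))  # negative: one goes up, the other down
```

The sign tells the direction of the joint movement, but the magnitude depends on the units of both variables, which is why correlation coefficients normalize it.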
+## Pearson Correlation Coefficient
+
+The Pearson correlation coefficient is the one most commonly used for linear relationships. It measures the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.
+
+$$
+    \rho_{X,Y} = CORR(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}
+$$
+
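The formula can be translated almost term by term into code. This is a sketch rather than a reference implementation: the function name `pearson` and the example data are hypothetical, and it uses population standard deviations so that the numerator and denominator share the same `1/n` factor.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson coefficient: COV(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mu_x, mu_y = mean(xs), mean(ys)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - mu_x) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - mu_y) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sigma_x * sigma_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # close to +1: perfect increasing line
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # close to -1: perfect decreasing line

# y = x^2 on a symmetric range: fully dependent on x, yet no *linear* trend.
xs = [-2, -1, 0, 1, 2]
print(pearson(xs, [x * x for x in xs]))     # 0
```

The last case is a reminder that a coefficient of 0 only rules out a linear relationship, not dependence in general.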
+## Spearman Correlation Coefficient
+
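+The Spearman correlation coefficient is commonly used for relationships that are monotonic but not necessarily linear. It is the Pearson correlation computed on the ranks of the values, so it measures the monotonicity of the relationship between two variables X and Y. It has a value between +1 and −1, where 1 is a perfectly increasing monotonic relationship, 0 is no monotonic relationship, and −1 is a perfectly decreasing monotonic relationship.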
+
+$$
+    r_s = \frac{COV(rank(X), rank(Y))}{\sigma_{rank(X)} \sigma_{rank(Y)}} = \frac{E[(rank(X) - \mu_{rank(X)})(rank(Y) - \mu_{rank(Y)})]}{\sigma_{rank(X)} \sigma_{rank(Y)}}
+$$
+
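One way to read the formula is "rank both variables, then apply Pearson to the ranks". The sketch below follows that reading; the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical, and giving tied values the average of their positions is an assumed (though common) convention.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson coefficient using population covariance and deviations."""
    mu_x, mu_y = mean(xs), mean(ys)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    var_x = sum((x - mu_x) ** 2 for x in xs)
    var_y = sum((y - mu_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # the 1/n factors cancel out


def spearman(xs, ys):
    """Spearman coefficient: Pearson applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]   # monotonic but not linear
print(average_ranks([10, 20, 20, 30]))
print(pearson(xs, cubes))      # below 1: the points do not lie on a line
print(spearman(xs, cubes))     # close to 1: the order is perfectly preserved
```

Because only the ordering matters, any strictly increasing transformation of either variable leaves the Spearman value unchanged.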
+## Kendall Correlation Coefficient
+
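+The Kendall correlation coefficient is also commonly used for non-linear relationships. It measures the ordinal association between two measured quantities: how often pairs of observations appear in the same order in both variables. It has a value between +1 and −1, where 1 is total positive association, 0 is no association, and −1 is total negative association.
+
+Of the three coefficients covered here, it is often considered the most robust, but it is also the most computationally expensive: a naive implementation compares every pair of observations, which takes O(n²) time.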
+
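The pairwise comparison behind Kendall's coefficient can be sketched as follows. This is the simple tau-a variant, chosen here as an assumption because it has no tie correction and is the easiest to read; the function name `kendall_tau_a` is hypothetical. A pair of observations is concordant when both variables order it the same way and discordant when they disagree.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    concordant = discordant = 0
    # Naive O(n^2) loop over every pair of observations.
    for i, j in combinations(range(n), 2):
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie; tau-a counts it as neither.
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0: every pair agrees
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0: every pair disagrees
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 of 6 pairs agree, 1 disagrees
```

The double loop over pairs is what makes this approach quadratic in the number of observations.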
+## P-value
+
+The p-value is the probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct (for a correlation test, the null hypothesis is typically that the variables are uncorrelated). The smaller the p-value, the stronger the evidence against the null hypothesis. A common rule of thumb:
+
+- p < 0.001: strong evidence that the null hypothesis is false
+- 0.001 ≤ p < 0.05: moderate evidence that the null hypothesis is false
+- 0.05 ≤ p < 0.1: weak evidence that the null hypothesis is false
+- p ≥ 0.1: no evidence that the null hypothesis is false
+
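To make "at least as extreme" concrete, here is a sketch of an exact permutation test for a Pearson correlation. This technique is swapped in for illustration and is not necessarily how a statistics library computes its p-values. Under the null hypothesis of no association, every reordering of `ys` is equally likely, so the two-sided p-value is the fraction of reorderings whose correlation is at least as strong as the observed one. The function names and data are hypothetical, and enumerating all permutations is only practical for very small samples.

```python
from itertools import permutations
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson coefficient using population covariance and deviations."""
    mu_x, mu_y = mean(xs), mean(ys)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    var_x = sum((x - mu_x) ** 2 for x in xs)
    var_y = sum((y - mu_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys):
    """Exact two-sided p-value for the null hypothesis of no association."""
    observed = abs(pearson(xs, ys))
    total = extreme = 0
    for shuffled in permutations(ys):  # n! reorderings, keep n small
        total += 1
        # Small tolerance so floating-point noise does not hide exact ties.
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]  # perfectly linear in xs

# Only the original order and its exact reverse reach |r| = 1,
# so 2 of the 720 reorderings are at least as extreme.
print(permutation_p_value(xs, ys))
```

With a p-value this small, the rule of thumb above would call the evidence against "no association" moderate.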
+### References
+
+- <https://en.wikipedia.org/wiki/Joint_probability_distribution>
+- <https://en.wikipedia.org/wiki/Correlation>
+- <https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation>
diff --git a/my-brain/data-science/image.png b/my-brain/data-science/image.png