Review of the required Statistics for Data Science - Part 1 - 26/11/2023
A quick review of concepts such as mean, variance, covariance, correlation coefficient, standard deviation, and z-score.
This article is a quick review of the essential statistics for working with data.
Mean
The mean is the average of a data set.
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
$\bar{x}$ = Arithmetic Mean
$N$ = total number of samples
$x_i$ = dataset value
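The definition of the mean can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `mean` and the sample values are assumptions made up for this example.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count N."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Illustrative values only (not data from this article).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))  # 5.0
```

The standard library's `statistics.mean` computes the same quantity and is usually the better choice in real code.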
Variance
Variance is a measure of how far data points spread from the mean: it is the average of the squared deviations from the mean.
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
$\sigma^2$ = Variance
$N$ = total number of samples
$x_i$ = dataset value
$\bar{x}$ = Arithmetic Mean
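As a rough sketch, variance can be computed by averaging the squared deviations from the mean. The code below assumes the population form (dividing by N); the sample form divides by N - 1 instead. The helper name `variance` and the data are illustrative assumptions.

```python
def variance(values):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    m = sum(values) / n
    # Divide by n (population form); use n - 1 for the sample form.
    return sum((x - m) ** 2 for x in values) / n


# Illustrative values only (not data from this article).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))  # 4.0
```

The mean of this data is 5, the squared deviations sum to 32, and 32 / 8 = 4. In the standard library, `statistics.pvariance` gives the population form and `statistics.variance` the sample form.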
Covariance
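Covariance measures the direction of the linear relationship between two variables: it is positive when they tend to increase together and negative when one tends to decrease as the other increases.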
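$$\operatorname{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$
$\operatorname{cov}(x, y)$ = Covariance
$N$ = total number of samples
$x_i, y_i$ = dataset values
$\bar{x}, \bar{y}$ = Arithmetic Means of x and y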
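A minimal sketch of covariance follows, again assuming the population form (dividing by N). The helper name `covariance` and the paired values are made up for illustration.

```python
def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average the products of paired deviations from each mean.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Illustrative values only (not data from this article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))  # 1.2
```

The positive result indicates that x and y tend to move in the same direction. Its magnitude depends on the units of x and y, so it is hard to interpret on its own.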
Correlation Coefficient
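A correlation coefficient is a numerical measure of the strength and direction of a statistical relationship between two variables. The Pearson coefficient used here normalizes the covariance by the variances of the two variables, so it always lies between -1 and 1.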
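$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\sigma_x^2 \, \sigma_y^2}}$$
$r$ = Correlation Coefficient
$\operatorname{cov}(x, y)$ = Covariance
$\sigma_x^2$ = Variance of x
$\sigma_y^2$ = Variance of y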
Example Problem
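As a small illustrative problem, suppose we want the correlation between two short series. The values of x and y below are made-up assumptions, not data from this article, and the helper name `pearson_r` is hypothetical. The sketch uses the population (divide by N) forms of covariance and variance; the 1/N factors cancel in the ratio, so the sample forms would give the same r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: cov(x, y) / sqrt(var(x) * var(y))."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (not data from this article).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

Here the covariance is 1.2, the variance of x is 2.0 and the variance of y is 1.2, so r = 1.2 / sqrt(2.4), which is about 0.77: a fairly strong positive linear relationship.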
Standard Deviation
A standard deviation (or σ) is a measure of how dispersed the data is in relation to the mean.
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
$\sigma$ = Standard Deviation
$N$ = total number of samples
$x_i$ = dataset values
$\bar{x}$ = Arithmetic Mean
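The standard deviation is the square root of the variance, which a short sketch makes concrete. The code assumes the population form (dividing by N); the helper name `std_dev` and the data are illustrative assumptions.

```python
import math


def std_dev(values):
    """Population standard deviation: square root of the population variance."""
    n = len(values)
    if n == 0:
        raise ValueError("std_dev requires at least one value")
    m = sum(values) / n
    var = sum((x - m) ** 2 for x in values) / n
    return math.sqrt(var)


# Illustrative values only (not data from this article).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))  # 2.0
```

Unlike the variance, the result is in the same units as the data. The standard library offers `statistics.pstdev` (population) and `statistics.stdev` (sample) for the same purpose.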
Z-Score
A z-score, or standard score, puts values on a common scale by dividing each value's deviation from the mean by the standard deviation of the data set. It expresses how many standard deviations a value lies above or below the mean.
$$z_i = \frac{x_i - \bar{x}}{\sigma}$$
$z_i$ = Z-Score for sample i
$\sigma$ = Standard Deviation
$x_i$ = dataset values
$\bar{x}$ = Arithmetic Mean
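Standardizing a data set with z-scores might look like the sketch below. It assumes the population standard deviation (dividing by N); the helper name `z_scores` and the data are made up for illustration.

```python
import math


def z_scores(values):
    """Return (x_i - mean) / sigma for every value, using population sigma."""
    n = len(values)
    if n == 0:
        raise ValueError("z_scores requires at least one value")
    m = sum(values) / n
    sigma = math.sqrt(sum((x - m) ** 2 for x in values) / n)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - m) / sigma for x in values]


# Illustrative values only (not data from this article).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(data))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

With a mean of 5 and a standard deviation of 2, the value 9 sits two standard deviations above the mean, while 2 sits one and a half below it.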