Statistics library (Statistics.h)

The Statistics library implements the most popular statistical functions for data analysis. It computes measures of location, variability, shape and others. The library provides functions that describe different aspects of normally distributed values, and also functions from robust statistics such as the median, quartiles and interquartile range. The latter usually require sorting the data, but this implementation uses a fast quantile location algorithm instead, which significantly speeds up these methods. See the benchmarks.

Measures of location
Mean
Median
Lower quartile
Upper quartile
Mid-range
Measures of variability
Variance
Standard deviation
Absolute deviation
Interquartile range
Range
Measures of shape
Skewness
Kurtosis
Covariance
Correlation
Pearson correlation
Fechner correlation
Spearman correlation

Function list

C function name	Functions	C++ function name	Functions
AbsoluteDeviation	2 functions	AbsoluteDeviation	2 functions
Covariance	2 functions	Covariance	2 functions
FechnerCorrelation	2 functions	FechnerCorrelation	2 functions
InterquartileRange	11 functions	InterquartileRange	11 functions
Kurtosis	2 functions	Kurtosis	2 functions
LowerQuartile	11 functions	LowerQuartile	11 functions
Mean	2 functions	Mean	2 functions
Median	11 functions	Median	11 functions
Midrange	2 functions	Midrange	2 functions
PearsonCorrelation	2 functions	PearsonCorrelation	2 functions
Range	2 functions	Range	2 functions
Skewness	2 functions	Skewness	2 functions
SpearmanCorrelation	2 functions	SpearmanCorrelation	2 functions
StandardDeviation	2 functions	StandardDeviation	2 functions
UpperQuartile	11 functions	UpperQuartile	11 functions
Variance	2 functions	Variance	2 functions

Measures of location

A fundamental task in many statistical analyses is to estimate a location parameter for the distribution. In other words, the analyst looks for a typical or central value that best describes the data. The most popular measures of location are the mean, the median and the mid-range.

Mean

C
flt32_t Statistics_Mean_flt32 (const flt32_t array[], size_t size);
flt64_t Statistics_Mean_flt64 (const flt64_t array[], size_t size);

C++
flt32_t Statistics::Mean (const flt32_t array[], size_t size);
flt64_t Statistics::Mean (const flt64_t array[], size_t size);

Description: Compute sample mean.

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)

Return value:

Sample mean value.
NaN (not a number) if size of sample array is equal to 0, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.

Tip: The sample mean works best as a measure of location for a normal (Gaussian) distribution. For heavy-tailed distributions, use the sample median instead.
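
To make the documented return values concrete, here is a minimal reference sketch of a sample mean in plain C. It is not the library's implementation: the name sketch_mean is hypothetical, and double stands in for the library's flt64_t type.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine, not the library's code. */
double sketch_mean(const double array[], size_t size)
{
    if (size == 0)
        return NAN;              /* empty sample: the mean is undefined */

    double sum = 0.0;
    for (size_t i = 0; i < size; i++)
        sum += array[i];         /* a NaN element propagates into the sum */

    return sum / (double)size;   /* an overflowed sum yields Inf here */
}
```

With the library itself, the equivalent call would be Statistics_Mean_flt64 (array, size), and its result is the value that the mean parameter of the functions below expects.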

Median

C
Unsigned integer types
uint8_t Statistics_Median_uint8 (uint8_t array[], size_t size);
uint16_t Statistics_Median_uint16 (uint16_t array[], size_t size);
uint32_t Statistics_Median_uint32 (uint32_t array[], size_t size);
uint64_t Statistics_Median_uint64 (uint64_t array[], size_t size);

Signed integer types
sint8_t Statistics_Median_sint8 (sint8_t array[], size_t size);
sint16_t Statistics_Median_sint16 (sint16_t array[], size_t size);
sint32_t Statistics_Median_sint32 (sint32_t array[], size_t size);
sint64_t Statistics_Median_sint64 (sint64_t array[], size_t size);

Floating-point types
flt32_t Statistics_Median_flt32 (flt32_t array[], size_t size);
flt64_t Statistics_Median_flt64 (flt64_t array[], size_t size);

Other types
size_t Statistics_Median_size (size_t array[], size_t size);

C++
Unsigned integer types
uint8_t Statistics::Median (uint8_t array[], size_t size);
uint16_t Statistics::Median (uint16_t array[], size_t size);
uint32_t Statistics::Median (uint32_t array[], size_t size);
uint64_t Statistics::Median (uint64_t array[], size_t size);

Signed integer types
sint8_t Statistics::Median (sint8_t array[], size_t size);
sint16_t Statistics::Median (sint16_t array[], size_t size);
sint32_t Statistics::Median (sint32_t array[], size_t size);
sint64_t Statistics::Median (sint64_t array[], size_t size);

Floating-point types
flt32_t Statistics::Median (flt32_t array[], size_t size);
flt64_t Statistics::Median (flt64_t array[], size_t size);

Other types
size_t Statistics::Median (size_t array[], size_t size);

Description: Compute sample median.

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)

Return value:

Sample median value.
0 (zero) if size of sample array of integer numbers is equal to 0.
NaN (not a number) if size of sample array of floating-point numbers is equal to 0, or if the sample data has NaN values and one of them falls at the median position.

Note: These functions rearrange the array elements while computing the median.

Tip: The sample median is a tool of robust statistics. It works well with almost any distribution and is strongly recommended for heavy-tailed distributions.
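
The note about rearranged elements is typical of selection algorithms that avoid a full sort. The library's own algorithm is not described here, so the sketch below swaps in quickselect, a well-known technique of that kind, purely as an illustration. The names sketch_select and sketch_median are hypothetical, and taking the element at sorted position size / 2 for even-sized samples is an assumption.

```c
#include <stddef.h>

/* Swap two elements. */
static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Hypothetical quickselect: returns the k-th smallest element (0-based)
   and partially reorders the array in place. Requires k < size. */
static int sketch_select(int array[], size_t size, size_t k)
{
    size_t lo = 0, hi = size;            /* active range is [lo, hi) */

    while (hi - lo > 1) {
        /* Move the middle element to the end and use it as the pivot. */
        size_t mid = lo + (hi - lo) / 2;
        swap_int(&array[mid], &array[hi - 1]);
        int pivot = array[hi - 1];

        /* Lomuto partition: smaller elements go to the front. */
        size_t store = lo;
        for (size_t i = lo; i + 1 < hi; i++) {
            if (array[i] < pivot) {
                swap_int(&array[i], &array[store]);
                store++;
            }
        }
        swap_int(&array[store], &array[hi - 1]);  /* pivot is now in place */

        if (k == store)
            return array[k];
        if (k < store)
            hi = store;                  /* continue in the left part */
        else
            lo = store + 1;              /* continue in the right part */
    }
    return array[k];
}

/* Assumption: the median is the element at sorted position size / 2. */
int sketch_median(int array[], size_t size)
{
    if (size == 0)
        return 0;    /* mirrors the documented result for empty integer samples */
    return sketch_select(array, size, size / 2);
}
```

Because only the part of the array that contains the requested position is processed further, the average cost is linear in the sample size, whereas a full sort costs O(n log n).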

Lower quartile

C
Unsigned integer types
uint8_t Statistics_LowerQuartile_uint8 (uint8_t array[], size_t size);
uint16_t Statistics_LowerQuartile_uint16 (uint16_t array[], size_t size);
uint32_t Statistics_LowerQuartile_uint32 (uint32_t array[], size_t size);
uint64_t Statistics_LowerQuartile_uint64 (uint64_t array[], size_t size);

Signed integer types
sint8_t Statistics_LowerQuartile_sint8 (sint8_t array[], size_t size);
sint16_t Statistics_LowerQuartile_sint16 (sint16_t array[], size_t size);
sint32_t Statistics_LowerQuartile_sint32 (sint32_t array[], size_t size);
sint64_t Statistics_LowerQuartile_sint64 (sint64_t array[], size_t size);

Floating-point types
flt32_t Statistics_LowerQuartile_flt32 (flt32_t array[], size_t size);
flt64_t Statistics_LowerQuartile_flt64 (flt64_t array[], size_t size);

Other types
size_t Statistics_LowerQuartile_size (size_t array[], size_t size);

C++
Unsigned integer types
uint8_t Statistics::LowerQuartile (uint8_t array[], size_t size);
uint16_t Statistics::LowerQuartile (uint16_t array[], size_t size);
uint32_t Statistics::LowerQuartile (uint32_t array[], size_t size);
uint64_t Statistics::LowerQuartile (uint64_t array[], size_t size);

Signed integer types
sint8_t Statistics::LowerQuartile (sint8_t array[], size_t size);
sint16_t Statistics::LowerQuartile (sint16_t array[], size_t size);
sint32_t Statistics::LowerQuartile (sint32_t array[], size_t size);
sint64_t Statistics::LowerQuartile (sint64_t array[], size_t size);

Floating-point types
flt32_t Statistics::LowerQuartile (flt32_t array[], size_t size);
flt64_t Statistics::LowerQuartile (flt64_t array[], size_t size);

Other types
size_t Statistics::LowerQuartile (size_t array[], size_t size);

Description: Compute the sample lower quartile (the value that separates the lowest 25% of the data from the rest).

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)

Return value:

Sample lower quartile value.
0 (zero) if size of sample array of integer numbers is equal to 0.
NaN (not a number) if size of sample array of floating-point numbers is equal to 0, or if the sample data has NaN values and one of them falls at the lower quartile position.

Note: These functions rearrange the array elements while computing the lower quartile.

Upper quartile

C
Unsigned integer types
uint8_t Statistics_UpperQuartile_uint8 (uint8_t array[], size_t size);
uint16_t Statistics_UpperQuartile_uint16 (uint16_t array[], size_t size);
uint32_t Statistics_UpperQuartile_uint32 (uint32_t array[], size_t size);
uint64_t Statistics_UpperQuartile_uint64 (uint64_t array[], size_t size);

Signed integer types
sint8_t Statistics_UpperQuartile_sint8 (sint8_t array[], size_t size);
sint16_t Statistics_UpperQuartile_sint16 (sint16_t array[], size_t size);
sint32_t Statistics_UpperQuartile_sint32 (sint32_t array[], size_t size);
sint64_t Statistics_UpperQuartile_sint64 (sint64_t array[], size_t size);

Floating-point types
flt32_t Statistics_UpperQuartile_flt32 (flt32_t array[], size_t size);
flt64_t Statistics_UpperQuartile_flt64 (flt64_t array[], size_t size);

Other types
size_t Statistics_UpperQuartile_size (size_t array[], size_t size);

C++
Unsigned integer types
uint8_t Statistics::UpperQuartile (uint8_t array[], size_t size);
uint16_t Statistics::UpperQuartile (uint16_t array[], size_t size);
uint32_t Statistics::UpperQuartile (uint32_t array[], size_t size);
uint64_t Statistics::UpperQuartile (uint64_t array[], size_t size);

Signed integer types
sint8_t Statistics::UpperQuartile (sint8_t array[], size_t size);
sint16_t Statistics::UpperQuartile (sint16_t array[], size_t size);
sint32_t Statistics::UpperQuartile (sint32_t array[], size_t size);
sint64_t Statistics::UpperQuartile (sint64_t array[], size_t size);

Floating-point types
flt32_t Statistics::UpperQuartile (flt32_t array[], size_t size);
flt64_t Statistics::UpperQuartile (flt64_t array[], size_t size);

Other types
size_t Statistics::UpperQuartile (size_t array[], size_t size);

Description: Compute the sample upper quartile (the value that separates the highest 25% of the data from the rest).

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)

Return value:

Sample upper quartile value.
0 (zero) if size of sample array of integer numbers is equal to 0.
NaN (not a number) if size of sample array of floating-point numbers is equal to 0, or if the sample data has NaN values and one of them falls at the upper quartile position.

Note: These functions rearrange the array elements while computing the upper quartile.

Mid-range

C
flt32_t Statistics_Midrange_flt32 (const flt32_t array[], size_t size);
flt64_t Statistics_Midrange_flt64 (const flt64_t array[], size_t size);

C++
flt32_t Statistics::Midrange (const flt32_t array[], size_t size);
flt64_t Statistics::Midrange (const flt64_t array[], size_t size);

Description: Compute sample mid-range.

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)

Return value:

Sample mid-range value.
NaN (not a number) if size of sample array is equal to 0, or sample data has NaN values.
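
The mid-range is the arithmetic mean of the smallest and largest sample values. A minimal reference sketch, assuming that definition, follows. The name sketch_midrange is hypothetical and double stands in for flt64_t; this is not the library's code.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine: (min + max) / 2. */
double sketch_midrange(const double array[], size_t size)
{
    if (size == 0)
        return NAN;                      /* empty sample */

    double min = array[0], max = array[0];
    for (size_t i = 1; i < size; i++) {
        if (isnan(array[i]))
            return NAN;                  /* NaN in the data gives NaN */
        if (array[i] < min) min = array[i];
        if (array[i] > max) max = array[i];
    }
    /* If array[0] is NaN, min and max stay NaN and so does the result. */
    return 0.5 * (min + max);
}
```

The same single pass also yields the sample range, which is max - min.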


Measures of variability

Another fundamental task in many statistical analyses is to characterize the spread, or variability, of a data set. Measures of scale are simply attempts to estimate this variability. The most popular of them are variance, standard deviation, absolute deviation, interquartile range and range.

Variance

C
flt32_t Statistics_Variance_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_Variance_flt64 (const flt64_t array[], size_t size, flt64_t mean);

C++
flt32_t Statistics::Variance (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::Variance (const flt64_t array[], size_t size, flt64_t mean);

Description: Compute sample variance.

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)
mean - sample mean value (theoretic or computed by Mean function)

Return value:

Sample variance value.
NaN (not a number) if size of sample array is less than or equal to 1, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.
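
The mean is passed in rather than recomputed, so a known theoretical mean can be used directly. The sketch below shows one way such a function can work. The divisor size - 1 (the unbiased estimator) is an assumption, chosen because it matches the documented NaN result for samples of one element or fewer. The name sketch_variance is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine; assumes the (size - 1) divisor. */
double sketch_variance(const double array[], size_t size, double mean)
{
    if (size <= 1)
        return NAN;                      /* too few elements */

    double sum = 0.0;
    for (size_t i = 0; i < size; i++) {
        double d = array[i] - mean;      /* deviation from the given mean */
        sum += d * d;
    }
    return sum / (double)(size - 1);
}
```

The standard deviation described next is the square root of this value.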

Standard deviation

C
flt32_t Statistics_StandardDeviation_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_StandardDeviation_flt64 (const flt64_t array[], size_t size, flt64_t mean);

C++
flt32_t Statistics::StandardDeviation (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::StandardDeviation (const flt64_t array[], size_t size, flt64_t mean);

Description: Compute sample standard deviation (square root of variance).

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)
mean - sample mean value (theoretic or computed by Mean function)

Return value:

Sample standard deviation value.
NaN (not a number) if size of sample array is less than or equal to 1, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.

Absolute deviation

C
flt32_t Statistics_AbsoluteDeviation_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_AbsoluteDeviation_flt64 (const flt64_t array[], size_t size, flt64_t mean);

C++
flt32_t Statistics::AbsoluteDeviation (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::AbsoluteDeviation (const flt64_t array[], size_t size, flt64_t mean);

Description: Compute sample absolute deviation (sum of absolute differences between elements and sample mean or median).

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)
mean - sample mean or median value

Return value:

Sample absolute deviation value.
NaN (not a number) if size of sample array is less than or equal to 1, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.

Interquartile range

C
Unsigned integer types
uint8_t Statistics_InterquartileRange_uint8 (uint8_t array[], size_t size);
uint16_t Statistics_InterquartileRange_uint16 (uint16_t array[], size_t size);
uint32_t Statistics_InterquartileRange_uint32 (uint32_t array[], size_t size);
uint64_t Statistics_InterquartileRange_uint64 (uint64_t array[], size_t size);

Signed integer types
uint8_t Statistics_InterquartileRange_sint8 (sint8_t array[], size_t size);
uint16_t Statistics_InterquartileRange_sint16 (sint16_t array[], size_t size);
uint32_t Statistics_InterquartileRange_sint32 (sint32_t array[], size_t size);
uint64_t Statistics_InterquartileRange_sint64 (sint64_t array[], size_t size);

Floating-point types
flt32_t Statistics_InterquartileRange_flt32 (flt32_t array[], size_t size);
flt64_t Statistics_InterquartileRange_flt64 (flt64_t array[], size_t size);

Other types
size_t Statistics_InterquartileRange_size (size_t array[], size_t size);

C++
Unsigned integer types
uint8_t Statistics::InterquartileRange (uint8_t array[], size_t size);
uint16_t Statistics::InterquartileRange (uint16_t array[], size_t size);
uint32_t Statistics::InterquartileRange (uint32_t array[], size_t size);
uint64_t Statistics::InterquartileRange (uint64_t array[], size_t size);

Signed integer types
uint8_t Statistics::InterquartileRange (sint8_t array[], size_t size);
uint16_t Statistics::InterquartileRange (sint16_t array[], size_t size);
uint32_t Statistics::InterquartileRange (sint32_t array[], size_t size);
uint64_t Statistics::InterquartileRange (sint64_t array[], size_t size);

Floating-point types
flt32_t Statistics::InterquartileRange (flt32_t array[], size_t size);
flt64_t Statistics::InterquartileRange (flt64_t array[], size_t size);

Other types
size_t Statistics::InterquartileRange (size_t array[], size_t size);

Description: Compute sample interquartile range (difference between upper and lower quartiles).

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)

Return value:

Sample interquartile range value.
0 (zero) if size of sample array of integer numbers is equal to 0.
NaN (not a number) if size of sample array of floating-point numbers is equal to 0, or if the sample data has NaN values and one of them falls at a quartile position.

Note: These functions rearrange the array elements while computing the interquartile range.

Tip: The interquartile range is a robust statistic with a breakdown point of 25%. You may prefer it to the total range described below.

Tip: For a symmetric distribution, half of the interquartile range equals the median absolute deviation.
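
For comparison with the library's quantile-based approach, the sketch below computes an interquartile range the straightforward way, by sorting the array with qsort. The quartile positions (size / 4 from each end of the sorted data) are an assumption, since the exact convention is not stated here. The function names are hypothetical, and the data is assumed to contain no NaN values.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Ascending comparison for qsort. Assumes no NaN values. */
static int compare_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical reference routine; sorts the array in place. */
double sketch_interquartile_range(double array[], size_t size)
{
    if (size == 0)
        return NAN;

    qsort(array, size, sizeof array[0], compare_double);

    /* Assumed convention: quartiles sit size / 4 elements from each end. */
    double lower = array[size / 4];
    double upper = array[size - 1 - size / 4];
    return upper - lower;
}
```

Like the library functions, this sketch rearranges the array, but it pays the full O(n log n) sorting cost that a quantile location algorithm avoids.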

Range

C
flt32_t Statistics_Range_flt32 (const flt32_t array[], size_t size);
flt64_t Statistics_Range_flt64 (const flt64_t array[], size_t size);

C++
flt32_t Statistics::Range (const flt32_t array[], size_t size);
flt64_t Statistics::Range (const flt64_t array[], size_t size);

Description: Compute sample range (difference between largest and smallest values).

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)

Return value:

Sample range value.
NaN (not a number) if size of sample array is equal to 0, or sample data has NaN values.


Measures of shape

An important aspect of describing a variable is the shape of its distribution, which shows how frequently values fall into different ranges of the variable. Typically one wants to know how well the distribution can be approximated by the normal distribution.

Skewness

C
flt32_t Statistics_Skewness_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_Skewness_flt64 (const flt64_t array[], size_t size, flt64_t mean);

C++
flt32_t Statistics::Skewness (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::Skewness (const flt64_t array[], size_t size, flt64_t mean);

Description: Compute sample skewness.

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)
mean - sample mean value (theoretic or computed by Mean function)

Return value:

Sample skewness value.
NaN (not a number) if size of sample array is less than or equal to 2, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.

Kurtosis

C
flt32_t Statistics_Kurtosis_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_Kurtosis_flt64 (const flt64_t array[], size_t size, flt64_t mean);

C++
flt32_t Statistics::Kurtosis (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::Kurtosis (const flt64_t array[], size_t size, flt64_t mean);

Description: Compute sample kurtosis.

Parameters:

array - pointer to array which holds sample data
size - size of sample array (count of elements)
mean - sample mean value (theoretic or computed by Mean function)

Return value:

Sample kurtosis value.
NaN (not a number) if size of sample array is less than or equal to 3, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.
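
As with skewness, the kurtosis formula is not stated above. The sketch below assumes the bias-corrected sample excess kurtosis, which is undefined for fewer than four elements and therefore consistent with the documented NaN result. Whether the library reports excess kurtosis (zero for a normal distribution) or plain kurtosis is not confirmed here. The name sketch_kurtosis is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine; assumes bias-corrected excess kurtosis. */
double sketch_kurtosis(const double array[], size_t size, double mean)
{
    if (size <= 3)
        return NAN;                      /* too few elements */

    double n = (double)size;
    double sum2 = 0.0, sum4 = 0.0;
    for (size_t i = 0; i < size; i++) {
        double d2 = (array[i] - mean) * (array[i] - mean);
        sum2 += d2;                      /* second central moment sum */
        sum4 += d2 * d2;                 /* fourth central moment sum */
    }
    double var = sum2 / (n - 1.0);       /* sample variance */

    double scale = n * (n + 1.0) / ((n - 1.0) * (n - 2.0) * (n - 3.0));
    double shift = 3.0 * (n - 1.0) * (n - 1.0) / ((n - 2.0) * (n - 3.0));
    return scale * sum4 / (var * var) - shift;
}
```

For the sample 1, 2, 3, 4, 5 with mean 3 this formula gives -1.2, a flatter shape than the normal distribution.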


Covariance

C
flt32_t Statistics_Covariance_flt32 (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics_Covariance_flt64 (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);

C++
flt32_t Statistics::Covariance (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics::Covariance (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);

Description: Compute sample covariance value.

Parameters:

arr1 - pointer to first array which holds sample data
arr2 - pointer to second array which holds sample data
size - size of sample arrays (count of elements)
mean1 - sample mean value of first array (theoretic or computed by Mean function)
mean2 - sample mean value of second array (theoretic or computed by Mean function)

Return value:

Sample covariance value.
NaN (not a number) if size of sample array is less than or equal to 1, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.
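
A minimal reference sketch of sample covariance with supplied means follows. As with variance, the divisor size - 1 is an assumption that fits the documented NaN result for samples of one element or fewer. The name sketch_covariance is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine; assumes the (size - 1) divisor. */
double sketch_covariance(const double arr1[], const double arr2[],
                         size_t size, double mean1, double mean2)
{
    if (size <= 1)
        return NAN;                      /* too few elements */

    double sum = 0.0;
    for (size_t i = 0; i < size; i++)
        sum += (arr1[i] - mean1) * (arr2[i] - mean2);  /* paired deviations */

    return sum / (double)(size - 1);
}
```

Passing the same array and mean for both samples reduces this to the sample variance.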


Correlation

Pearson correlation

C
flt32_t Statistics_PearsonCorrelation_flt32 (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics_PearsonCorrelation_flt64 (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);

C++
flt32_t Statistics::PearsonCorrelation (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics::PearsonCorrelation (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);

Description: Compute Pearson product-moment correlation coefficient.

Parameters:

arr1 - pointer to first array which holds sample data
arr2 - pointer to second array which holds sample data
size - size of sample arrays (count of elements)
mean1 - sample mean value of first array (theoretic or computed by Mean function)
mean2 - sample mean value of second array (theoretic or computed by Mean function)

Return value:

Sample correlation value.
NaN (not a number) if size of sample arrays is equal to 0, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.

Fechner correlation

C
flt32_t Statistics_FechnerCorrelation_flt32 (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics_FechnerCorrelation_flt64 (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);

C++
flt32_t Statistics::FechnerCorrelation (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics::FechnerCorrelation (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);

Description: Compute Fechner sign correlation coefficient.

Parameters:

arr1 - pointer to first array which holds sample data
arr2 - pointer to second array which holds sample data
size - size of sample arrays (count of elements)
mean1 - sample mean value of first array (theoretic or computed by Mean function)
mean2 - sample mean value of second array (theoretic or computed by Mean function)

Return value:

Sample correlation value.
NaN (not a number) if size of sample arrays is equal to 0, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.
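
The Fechner coefficient only looks at the signs of the deviations from the means: it is (C - H) / (C + H), where C counts pairs whose deviations have the same sign and H counts pairs whose signs differ. The sketch below assumes that definition. Treating a zero deviation as its own sign class is an assumption, since conventions vary. The names sketch_sign and sketch_fechner are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Sign of a deviation: -1, 0 or +1. */
static int sketch_sign(double x)
{
    return (x > 0.0) - (x < 0.0);
}

/* Hypothetical reference routine for the sign correlation coefficient. */
double sketch_fechner(const double arr1[], const double arr2[],
                      size_t size, double mean1, double mean2)
{
    if (size == 0)
        return NAN;                      /* empty samples */

    size_t agree = 0;
    for (size_t i = 0; i < size; i++) {
        if (isnan(arr1[i]) || isnan(arr2[i]))
            return NAN;                  /* NaN in the data gives NaN */
        if (sketch_sign(arr1[i] - mean1) == sketch_sign(arr2[i] - mean2))
            agree++;                     /* deviations point the same way */
    }
    size_t differ = size - agree;
    return ((double)agree - (double)differ) / (double)size;
}
```

Because only signs matter, a single extreme value cannot dominate the result the way it can with the Pearson coefficient.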

Spearman correlation

C
flt32_t Statistics_SpearmanCorrelation_flt32 (flt32_t arr1[], size_t rank1[], flt32_t arr2[], size_t rank2[], flt32_t tarr[], size_t trank[], size_t size);
flt64_t Statistics_SpearmanCorrelation_flt64 (flt64_t arr1[], size_t rank1[], flt64_t arr2[], size_t rank2[], flt64_t tarr[], size_t trank[], size_t size);

C++
flt32_t Statistics::SpearmanCorrelation (flt32_t arr1[], size_t rank1[], flt32_t arr2[], size_t rank2[], flt32_t tarr[], size_t trank[], size_t size);
flt64_t Statistics::SpearmanCorrelation (flt64_t arr1[], size_t rank1[], flt64_t arr2[], size_t rank2[], flt64_t tarr[], size_t trank[], size_t size);

Description: Compute Spearman rank correlation coefficient.

Parameters:

arr1 - pointer to first array which holds sample data
rank1 - pointer to first buffer for data ranks (should be able to hold at least "size" elements)
arr2 - pointer to second array which holds sample data
rank2 - pointer to second buffer for data ranks (should be able to hold at least "size" elements)
tarr - pointer to temporary data array (should be able to hold at least "size" elements)
trank - pointer to temporary ranks array (should be able to hold at least "size" elements)
size - size of sample arrays (count of elements)

Return value:

Sample correlation value.
NaN (not a number) if size of sample arrays is equal to 0, or sample data has NaN values.
Inf (infinity) if floating-point overflow occurs during summation.
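
The Spearman coefficient replaces each value by its rank within its own sample and then measures how well the two rank sequences agree. The sketch below illustrates the idea with caller-supplied rank buffers, as in the signatures above, but it is deliberately simplified: it assumes there are no tied values, uses a quadratic-time ranking loop, and omits the temporary arrays. It applies the classic formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference of a pair. The function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Rank of each element = number of smaller elements (0-based).
   Assumes all values are distinct. */
static void sketch_rank(const double array[], size_t rank[], size_t size)
{
    for (size_t i = 0; i < size; i++) {
        size_t r = 0;
        for (size_t j = 0; j < size; j++)
            if (array[j] < array[i])
                r++;
        rank[i] = r;
    }
}

/* Hypothetical reference routine; valid only for data without ties. */
double sketch_spearman(const double arr1[], size_t rank1[],
                       const double arr2[], size_t rank2[], size_t size)
{
    if (size <= 1)
        return NAN;                      /* the formula needs two or more pairs */

    sketch_rank(arr1, rank1, size);
    sketch_rank(arr2, rank2, size);

    double sum = 0.0;
    for (size_t i = 0; i < size; i++) {
        double d = (double)rank1[i] - (double)rank2[i];
        sum += d * d;                    /* squared rank difference */
    }
    double n = (double)size;
    return 1.0 - 6.0 * sum / (n * (n * n - 1.0));
}
```

Any monotonically increasing relationship gives +1 even when it is not linear, which is the main practical difference from the Pearson coefficient.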
