Statistics library (Statistics.h)
The Statistics library implements the most popular statistical functions for data analysis. It computes measures of location, variability, shape and others. The library provides functions that estimate different properties of normally distributed values, and it also includes robust statistics such as the median, quartiles and interquartile range. Robust statistics usually require sorting the data, but this implementation uses a fast quantile location algorithm, which makes these functions significantly faster. See the benchmarks.
Function list
C function name | Functions | C++ function name | Functions |
---|---|---|---|
AbsoluteDeviation | 2 functions | AbsoluteDeviation | 2 functions |
Covariance | 2 functions | Covariance | 2 functions |
FechnerCorrelation | 2 functions | FechnerCorrelation | 2 functions |
InterquartileRange | 11 functions | InterquartileRange | 11 functions |
Kurtosis | 2 functions | Kurtosis | 2 functions |
LowerQuartile | 11 functions | LowerQuartile | 11 functions |
Mean | 2 functions | Mean | 2 functions |
Median | 11 functions | Median | 11 functions |
Midrange | 2 functions | Midrange | 2 functions |
PearsonCorrelation | 2 functions | PearsonCorrelation | 2 functions |
Range | 2 functions | Range | 2 functions |
Skewness | 2 functions | Skewness | 2 functions |
SpearmanCorrelation | 2 functions | SpearmanCorrelation | 2 functions |
StandardDeviation | 2 functions | StandardDeviation | 2 functions |
UpperQuartile | 11 functions | UpperQuartile | 11 functions |
Variance | 2 functions | Variance | 2 functions |
Measures of location
A fundamental task in many statistical analyses is to estimate a location parameter for the distribution. In other words, the analyst looks for a typical or central value that best describes the data. The most popular measures of location are the mean, median and mid-range.
Mean
C:
flt32_t Statistics_Mean_flt32 (const flt32_t array[], size_t size);
flt64_t Statistics_Mean_flt64 (const flt64_t array[], size_t size);
C++:
flt32_t Statistics::Mean (const flt32_t array[], size_t size);
flt64_t Statistics::Mean (const flt64_t array[], size_t size);
Description: Compute sample mean.
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
Return value:
- Sample mean value.
- NaN (not a number) if size of sample array is equal to 0, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
Tip: The sample mean works best as a measure of location for normal (Gaussian) distributions. For heavy-tailed distributions, use the sample median instead.
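The return-value rules above can be illustrated with a minimal C sketch. It is not the library's implementation: `sketch_mean` is a hypothetical name, `double` stands in for `flt64_t`, and plain left-to-right summation is an assumption.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference sketch of a sample mean; not the library's code. */
double sketch_mean(const double array[], size_t size)
{
    if (size == 0)
        return NAN;                 /* empty sample: no mean exists */

    double sum = 0.0;
    for (size_t i = 0; i < size; i++)
        sum += array[i];            /* a NaN element propagates; overflow yields Inf */

    return sum / (double)size;
}
```

The empty-sample check and the NaN propagation mirror the documented return values; everything else is illustrative.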
Median
C:
Unsigned integer types
uint8_t Statistics_Median_uint8 (uint8_t array[], size_t size);
uint16_t Statistics_Median_uint16 (uint16_t array[], size_t size);
uint32_t Statistics_Median_uint32 (uint32_t array[], size_t size);
uint64_t Statistics_Median_uint64 (uint64_t array[], size_t size);
Signed integer types
sint8_t Statistics_Median_sint8 (sint8_t array[], size_t size);
sint16_t Statistics_Median_sint16 (sint16_t array[], size_t size);
sint32_t Statistics_Median_sint32 (sint32_t array[], size_t size);
sint64_t Statistics_Median_sint64 (sint64_t array[], size_t size);
Floating-point types
flt32_t Statistics_Median_flt32 (flt32_t array[], size_t size);
flt64_t Statistics_Median_flt64 (flt64_t array[], size_t size);
Other types
size_t Statistics_Median_size (size_t array[], size_t size);
C++:
Unsigned integer types
uint8_t Statistics::Median (uint8_t array[], size_t size);
uint16_t Statistics::Median (uint16_t array[], size_t size);
uint32_t Statistics::Median (uint32_t array[], size_t size);
uint64_t Statistics::Median (uint64_t array[], size_t size);
Signed integer types
sint8_t Statistics::Median (sint8_t array[], size_t size);
sint16_t Statistics::Median (sint16_t array[], size_t size);
sint32_t Statistics::Median (sint32_t array[], size_t size);
sint64_t Statistics::Median (sint64_t array[], size_t size);
Floating-point types
flt32_t Statistics::Median (flt32_t array[], size_t size);
flt64_t Statistics::Median (flt64_t array[], size_t size);
Other types
size_t Statistics::Median (size_t array[], size_t size);
Description: Compute sample median.
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
Return value:
- Sample median value.
- 0 (zero) if size of sample array of integer numbers is equal to 0.
- NaN (not a number) if size of sample array of floating-point numbers is equal to 0, or if the sample data contains NaN values and the median itself falls on a NaN.
Note: These functions rearrange the array elements while computing the median.
Tip: The sample median is a tool of robust statistics. It works well with almost any distribution and is strongly recommended for heavy-tailed distributions.
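Why a median routine would rearrange its input can be seen in a selection-based sketch. The code below uses Hoare's selection algorithm in Wirth's formulation, which partitions the array instead of sorting it fully. It is only an illustration: the names `sketch_select` and `sketch_median` are hypothetical, the library's actual quantile location algorithm is not described here, averaging the two middle values for even sizes is an assumption, and NaN inputs are not handled.

```c
#include <math.h>
#include <stddef.h>

/* Selection by partitioning: on return a[k] holds the k-th smallest value
   (0-based), with a[0..k-1] <= a[k] <= a[k+1..n-1]. The array is reordered. */
void sketch_select(double a[], long n, long k)
{
    long lo = 0, hi = n - 1;
    while (lo < hi) {
        double pivot = a[k];
        long i = lo, j = hi;
        do {
            while (a[i] < pivot) i++;
            while (pivot < a[j]) j--;
            if (i <= j) {
                double t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        } while (i <= j);
        if (j < k) lo = i;          /* k-th element lies in the right part */
        if (k < i) hi = j;          /* k-th element lies in the left part  */
    }
}

/* Hypothetical median sketch; assumes the sample contains no NaN values. */
double sketch_median(double a[], size_t size)
{
    if (size == 0)
        return NAN;

    long n = (long)size, k = n / 2;
    sketch_select(a, n, k);
    if (n % 2 == 1)
        return a[k];

    /* Even size: the lower middle value is the largest element left of a[k]. */
    double lower = a[0];
    for (long i = 1; i < k; i++)
        if (a[i] > lower) lower = a[i];
    return 0.5 * (lower + a[k]);
}
```

Selection runs in linear time on average, whereas a full sort costs O(n log n). The price is that the caller's array is left partially ordered, so pass a copy if the original order matters.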
Lower quartile
C:
Unsigned integer types
uint8_t Statistics_LowerQuartile_uint8 (uint8_t array[], size_t size);
uint16_t Statistics_LowerQuartile_uint16 (uint16_t array[], size_t size);
uint32_t Statistics_LowerQuartile_uint32 (uint32_t array[], size_t size);
uint64_t Statistics_LowerQuartile_uint64 (uint64_t array[], size_t size);
Signed integer types
sint8_t Statistics_LowerQuartile_sint8 (sint8_t array[], size_t size);
sint16_t Statistics_LowerQuartile_sint16 (sint16_t array[], size_t size);
sint32_t Statistics_LowerQuartile_sint32 (sint32_t array[], size_t size);
sint64_t Statistics_LowerQuartile_sint64 (sint64_t array[], size_t size);
Floating-point types
flt32_t Statistics_LowerQuartile_flt32 (flt32_t array[], size_t size);
flt64_t Statistics_LowerQuartile_flt64 (flt64_t array[], size_t size);
Other types
size_t Statistics_LowerQuartile_size (size_t array[], size_t size);
C++:
Unsigned integer types
uint8_t Statistics::LowerQuartile (uint8_t array[], size_t size);
uint16_t Statistics::LowerQuartile (uint16_t array[], size_t size);
uint32_t Statistics::LowerQuartile (uint32_t array[], size_t size);
uint64_t Statistics::LowerQuartile (uint64_t array[], size_t size);
Signed integer types
sint8_t Statistics::LowerQuartile (sint8_t array[], size_t size);
sint16_t Statistics::LowerQuartile (sint16_t array[], size_t size);
sint32_t Statistics::LowerQuartile (sint32_t array[], size_t size);
sint64_t Statistics::LowerQuartile (sint64_t array[], size_t size);
Floating-point types
flt32_t Statistics::LowerQuartile (flt32_t array[], size_t size);
flt64_t Statistics::LowerQuartile (flt64_t array[], size_t size);
Other types
size_t Statistics::LowerQuartile (size_t array[], size_t size);
Description: Compute sample lower quartile (the value that cuts off the lowest 25% of the data).
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
Return value:
- Sample lower quartile value.
- 0 (zero) if size of sample array of integer numbers is equal to 0.
- NaN (not a number) if size of sample array of floating-point numbers is equal to 0, or if the sample data contains NaN values and the lower quartile itself falls on a NaN.
Note: These functions rearrange the array elements while computing the lower quartile.
Upper quartile
C:
Unsigned integer types
uint8_t Statistics_UpperQuartile_uint8 (uint8_t array[], size_t size);
uint16_t Statistics_UpperQuartile_uint16 (uint16_t array[], size_t size);
uint32_t Statistics_UpperQuartile_uint32 (uint32_t array[], size_t size);
uint64_t Statistics_UpperQuartile_uint64 (uint64_t array[], size_t size);
Signed integer types
sint8_t Statistics_UpperQuartile_sint8 (sint8_t array[], size_t size);
sint16_t Statistics_UpperQuartile_sint16 (sint16_t array[], size_t size);
sint32_t Statistics_UpperQuartile_sint32 (sint32_t array[], size_t size);
sint64_t Statistics_UpperQuartile_sint64 (sint64_t array[], size_t size);
Floating-point types
flt32_t Statistics_UpperQuartile_flt32 (flt32_t array[], size_t size);
flt64_t Statistics_UpperQuartile_flt64 (flt64_t array[], size_t size);
Other types
size_t Statistics_UpperQuartile_size (size_t array[], size_t size);
C++:
Unsigned integer types
uint8_t Statistics::UpperQuartile (uint8_t array[], size_t size);
uint16_t Statistics::UpperQuartile (uint16_t array[], size_t size);
uint32_t Statistics::UpperQuartile (uint32_t array[], size_t size);
uint64_t Statistics::UpperQuartile (uint64_t array[], size_t size);
Signed integer types
sint8_t Statistics::UpperQuartile (sint8_t array[], size_t size);
sint16_t Statistics::UpperQuartile (sint16_t array[], size_t size);
sint32_t Statistics::UpperQuartile (sint32_t array[], size_t size);
sint64_t Statistics::UpperQuartile (sint64_t array[], size_t size);
Floating-point types
flt32_t Statistics::UpperQuartile (flt32_t array[], size_t size);
flt64_t Statistics::UpperQuartile (flt64_t array[], size_t size);
Other types
size_t Statistics::UpperQuartile (size_t array[], size_t size);
Description: Compute sample upper quartile (the value that cuts off the highest 25% of the data).
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
Return value:
- Sample upper quartile value.
- 0 (zero) if size of sample array of integer numbers is equal to 0.
- NaN (not a number) if size of sample array of floating-point numbers is equal to 0, or if the sample data contains NaN values and the upper quartile itself falls on a NaN.
Note: These functions rearrange the array elements while computing the upper quartile.
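Quartiles are quantiles at probabilities 0.25 and 0.75, and several conventions exist for locating them in a finite sample. The C sketch below uses a full in-place sort and linear interpolation between neighbouring order statistics. Both choices are assumptions made for brevity: the library may use a different quantile rule, and it reports using a faster quantile location algorithm rather than sorting. All `sketch_` names are hypothetical.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Ascending comparison for qsort; assumes the data contains no NaN values. */
int sketch_cmp_double(const void *p, const void *q)
{
    double x = *(const double *)p, y = *(const double *)q;
    return (x > y) - (x < y);
}

/* Quantile at probability p in [0, 1], interpolating linearly between
   order statistics. Sorts (and therefore rearranges) the caller's array. */
double sketch_quantile(double array[], size_t size, double p)
{
    if (size == 0)
        return NAN;

    qsort(array, size, sizeof array[0], sketch_cmp_double);

    double pos  = p * (double)(size - 1);    /* fractional index into sorted data */
    size_t idx  = (size_t)pos;
    double frac = pos - (double)idx;
    if (idx + 1 >= size)
        return array[size - 1];
    return array[idx] + frac * (array[idx + 1] - array[idx]);
}

double sketch_lower_quartile(double array[], size_t size)
{
    return sketch_quantile(array, size, 0.25);
}

double sketch_upper_quartile(double array[], size_t size)
{
    return sketch_quantile(array, size, 0.75);
}
```

Like the documented functions, this sketch reorders its input, so callers that need the original order should work on a copy.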
Mid-range
C:
flt32_t Statistics_Midrange_flt32 (const flt32_t array[], size_t size);
flt64_t Statistics_Midrange_flt64 (const flt64_t array[], size_t size);
C++:
flt32_t Statistics::Midrange (const flt32_t array[], size_t size);
flt64_t Statistics::Midrange (const flt64_t array[], size_t size);
Description: Compute sample mid-range.
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
Return value:
- Sample mid-range value.
- NaN (not a number) if size of sample array is equal to 0, or sample data has NaN values.
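The mid-range is the arithmetic mean of the smallest and largest sample values. A minimal C sketch follows; `sketch_midrange` is a hypothetical name and the code is not the library's implementation. The explicit NaN check is there because ordinary comparisons silently skip NaN values.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical mid-range sketch: (min + max) / 2. */
double sketch_midrange(const double array[], size_t size)
{
    if (size == 0)
        return NAN;

    double lo = array[0], hi = array[0];
    for (size_t i = 0; i < size; i++) {
        if (isnan(array[i]))
            return NAN;             /* comparisons ignore NaN, so test explicitly */
        if (array[i] < lo) lo = array[i];
        if (array[i] > hi) hi = array[i];
    }
    return 0.5 * lo + 0.5 * hi;     /* halve first so the sum cannot overflow */
}
```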
Measures of variability
Another fundamental task in many statistical analyses is to characterize the spread, or variability, of a data set. Measures of scale estimate this variability. The most popular of them are the variance, standard deviation, absolute deviation, interquartile range and range.
Variance
C:
flt32_t Statistics_Variance_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_Variance_flt64 (const flt64_t array[], size_t size, flt64_t mean);
C++:
flt32_t Statistics::Variance (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::Variance (const flt64_t array[], size_t size, flt64_t mean);
Description: Compute sample variance.
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
- mean - sample mean value (theoretic or computed by Mean function)
Return value:
- Sample variance value.
- NaN (not a number) if size of sample array is less than or equal to 1, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
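Because the mean is passed in as a parameter, the variance can be computed in a single pass over the data. The C sketch below shows the idea. The n − 1 denominator is an assumption, suggested by the NaN result for samples of size 1 or less; `sketch_variance` is a hypothetical name, not the library's code.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sample-variance sketch around a caller-supplied mean. */
double sketch_variance(const double array[], size_t size, double mean)
{
    if (size <= 1)
        return NAN;                 /* n - 1 would be zero */

    double sum = 0.0;
    for (size_t i = 0; i < size; i++) {
        double d = array[i] - mean; /* deviation from the supplied mean */
        sum += d * d;
    }
    return sum / (double)(size - 1);
}
```

Taking the square root of this value gives the corresponding sample standard deviation.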
Standard deviation
C:
flt32_t Statistics_StandardDeviation_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_StandardDeviation_flt64 (const flt64_t array[], size_t size, flt64_t mean);
C++:
flt32_t Statistics::StandardDeviation (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::StandardDeviation (const flt64_t array[], size_t size, flt64_t mean);
Description: Compute sample standard deviation (square root of variance).
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
- mean - sample mean value (theoretic or computed by Mean function)
Return value:
- Sample standard deviation value.
- NaN (not a number) if size of sample array is less than or equal to 1, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
Absolute deviation
C:
flt32_t Statistics_AbsoluteDeviation_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_AbsoluteDeviation_flt64 (const flt64_t array[], size_t size, flt64_t mean);
C++:
flt32_t Statistics::AbsoluteDeviation (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::AbsoluteDeviation (const flt64_t array[], size_t size, flt64_t mean);
Description: Compute sample absolute deviation (sum of absolute differences between elements and sample mean or median).
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
- mean - sample mean or median value
Return value:
- Sample absolute deviation value.
- NaN (not a number) if size of sample array is less than or equal to 1, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
Interquartile range
C:
Unsigned integer types
uint8_t Statistics_InterquartileRange_uint8 (uint8_t array[], size_t size);
uint16_t Statistics_InterquartileRange_uint16 (uint16_t array[], size_t size);
uint32_t Statistics_InterquartileRange_uint32 (uint32_t array[], size_t size);
uint64_t Statistics_InterquartileRange_uint64 (uint64_t array[], size_t size);
Signed integer types
uint8_t Statistics_InterquartileRange_sint8 (sint8_t array[], size_t size);
uint16_t Statistics_InterquartileRange_sint16 (sint16_t array[], size_t size);
uint32_t Statistics_InterquartileRange_sint32 (sint32_t array[], size_t size);
uint64_t Statistics_InterquartileRange_sint64 (sint64_t array[], size_t size);
Floating-point types
flt32_t Statistics_InterquartileRange_flt32 (flt32_t array[], size_t size);
flt64_t Statistics_InterquartileRange_flt64 (flt64_t array[], size_t size);
Other types
size_t Statistics_InterquartileRange_size (size_t array[], size_t size);
C++:
Unsigned integer types
uint8_t Statistics::InterquartileRange (uint8_t array[], size_t size);
uint16_t Statistics::InterquartileRange (uint16_t array[], size_t size);
uint32_t Statistics::InterquartileRange (uint32_t array[], size_t size);
uint64_t Statistics::InterquartileRange (uint64_t array[], size_t size);
Signed integer types
uint8_t Statistics::InterquartileRange (sint8_t array[], size_t size);
uint16_t Statistics::InterquartileRange (sint16_t array[], size_t size);
uint32_t Statistics::InterquartileRange (sint32_t array[], size_t size);
uint64_t Statistics::InterquartileRange (sint64_t array[], size_t size);
Floating-point types
flt32_t Statistics::InterquartileRange (flt32_t array[], size_t size);
flt64_t Statistics::InterquartileRange (flt64_t array[], size_t size);
Other types
size_t Statistics::InterquartileRange (size_t array[], size_t size);
Description: Compute sample interquartile range (difference between upper and lower quartiles).
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
Return value:
- Sample interquartile range value.
- 0 (zero) if size of sample array of integer numbers is equal to 0.
- NaN (not a number) if size of sample array of floating-point numbers is equal to 0, or if the sample data contains NaN values and the interquartile range itself evaluates to NaN.
Note: These functions rearrange the array elements while computing the interquartile range.
Tip: The interquartile range is a robust statistic with a breakdown point of 25%. You may prefer it to the total range described below.
Tip: For a symmetric distribution, half of the interquartile range equals the median absolute deviation.
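In the signatures above, the signed-integer variants return unsigned types of the same width. A plausible reading, offered here as an assumption rather than a documented rationale, is that the difference between two signed quartiles is never negative but can exceed the signed range. The C sketch below demonstrates the arithmetic with the standard `int8_t` in place of `sint8_t`; `sketch_spread_sint8` is a hypothetical helper.

```c
#include <stdint.h>

/* Difference of two signed 8-bit quartiles, with upper >= lower.
   The result can be as large as 255, which does not fit in int8_t. */
uint8_t sketch_spread_sint8(int8_t lower, int8_t upper)
{
    /* Widen to int first so the subtraction itself cannot overflow. */
    return (uint8_t)((int)upper - (int)lower);
}
```

With quartiles of -100 and 100, for instance, the spread is 200, which an 8-bit signed result could not represent.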
Range
C:
flt32_t Statistics_Range_flt32 (const flt32_t array[], size_t size);
flt64_t Statistics_Range_flt64 (const flt64_t array[], size_t size);
C++:
flt32_t Statistics::Range (const flt32_t array[], size_t size);
flt64_t Statistics::Range (const flt64_t array[], size_t size);
Description: Compute sample range (difference between largest and smallest values).
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
Return value:
- Sample range value.
- NaN (not a number) if size of sample array is equal to 0, or sample data has NaN values.
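As a minimal illustration of the definition, the C sketch below finds the extremes in one pass and returns their difference. `sketch_range` is a hypothetical name and the code is not taken from the library.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical range sketch: largest value minus smallest value. */
double sketch_range(const double array[], size_t size)
{
    if (size == 0)
        return NAN;

    double lo = array[0], hi = array[0];
    for (size_t i = 0; i < size; i++) {
        if (isnan(array[i]))
            return NAN;             /* report NaN data instead of skipping it */
        if (array[i] < lo) lo = array[i];
        if (array[i] > hi) hi = array[i];
    }
    return hi - lo;
}
```

A single extreme observation changes this result directly, which is why the range is not considered a robust measure.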
Measures of shape
An important aspect of describing a variable is the shape of its distribution, which shows how frequently values fall into different ranges of the variable. Typically one wants to know how well the distribution can be approximated by the normal distribution.
Skewness
C:
flt32_t Statistics_Skewness_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_Skewness_flt64 (const flt64_t array[], size_t size, flt64_t mean);
C++:
flt32_t Statistics::Skewness (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::Skewness (const flt64_t array[], size_t size, flt64_t mean);
Description: Compute sample skewness.
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
- mean - sample mean value (theoretic or computed by Mean function)
Return value:
- Sample skewness value.
- NaN (not a number) if size of sample array is less than or equal to 2, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
Kurtosis
C:
flt32_t Statistics_Kurtosis_flt32 (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics_Kurtosis_flt64 (const flt64_t array[], size_t size, flt64_t mean);
C++:
flt32_t Statistics::Kurtosis (const flt32_t array[], size_t size, flt32_t mean);
flt64_t Statistics::Kurtosis (const flt64_t array[], size_t size, flt64_t mean);
Description: Compute sample kurtosis.
Parameters:
- array - pointer to array which holds sample data
- size - size of sample array (count of elements)
- mean - sample mean value (theoretic or computed by Mean function)
Return value:
- Sample kurtosis value.
- NaN (not a number) if size of sample array is less than or equal to 3, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
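As with skewness, the exact kurtosis estimator is not stated above. The C sketch below computes the bias-corrected sample excess kurtosis, whose (n − 1)(n − 2)(n − 3) denominator is consistent with the NaN result for samples of size 3 or less. Both the estimator and the "excess" convention (a normal distribution scores 0) are assumptions; `sketch_kurtosis` is a hypothetical name.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch of the bias-corrected sample excess kurtosis. */
double sketch_kurtosis(const double array[], size_t size, double mean)
{
    if (size <= 3)
        return NAN;

    double n = (double)size, m2 = 0.0, m4 = 0.0;
    for (size_t i = 0; i < size; i++) {
        double d2 = (array[i] - mean) * (array[i] - mean);
        m2 += d2;                   /* sum of squared deviations      */
        m4 += d2 * d2;              /* sum of fourth-power deviations */
    }
    double var    = m2 / (n - 1.0);         /* sample variance */
    double scaled = m4 / (var * var);       /* sum of ((x - mean) / s)^4 */

    return n * (n + 1.0) / ((n - 1.0) * (n - 2.0) * (n - 3.0)) * scaled
         - 3.0 * (n - 1.0) * (n - 1.0) / ((n - 2.0) * (n - 3.0));
}
```

For the evenly spaced sample 1, 2, 3, 4, 5 this estimator gives -1.2, reflecting tails lighter than those of a normal distribution.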
Covariance
C:
flt32_t Statistics_Covariance_flt32 (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics_Covariance_flt64 (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);
C++:
flt32_t Statistics::Covariance (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics::Covariance (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);
Description: Compute sample covariance value.
Parameters:
- arr1 - pointer to first array which holds sample data
- arr2 - pointer to second array which holds sample data
- size - size of sample arrays (count of elements)
- mean1 - sample mean value of first array (theoretic or computed by Mean function)
- mean2 - sample mean value of second array (theoretic or computed by Mean function)
Return value:
- Sample covariance value.
- NaN (not a number) if size of sample arrays is less than or equal to 1, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
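The role of the two mean parameters is easiest to see in code. The C sketch below sums the products of paired deviations and divides by n − 1. That denominator is an assumption, suggested by the NaN result for samples of size 1 or less, and `sketch_covariance` is a hypothetical name rather than the library's code.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sample-covariance sketch around caller-supplied means. */
double sketch_covariance(const double arr1[], const double arr2[], size_t size,
                         double mean1, double mean2)
{
    if (size <= 1)
        return NAN;

    double sum = 0.0;
    for (size_t i = 0; i < size; i++)
        sum += (arr1[i] - mean1) * (arr2[i] - mean2);   /* paired deviations */

    return sum / (double)(size - 1);
}
```

Passing the same array and mean for both arguments reduces this to the sample variance.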
Correlation
Pearson correlation
C:
flt32_t Statistics_PearsonCorrelation_flt32 (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics_PearsonCorrelation_flt64 (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);
C++:
flt32_t Statistics::PearsonCorrelation (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics::PearsonCorrelation (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);
Description: Compute Pearson product-moment correlation coefficient.
Parameters:
- arr1 - pointer to first array which holds sample data
- arr2 - pointer to second array which holds sample data
- size - size of sample arrays (count of elements)
- mean1 - sample mean value of first array (theoretic or computed by Mean function)
- mean2 - sample mean value of second array (theoretic or computed by Mean function)
Return value:
- Sample correlation value.
- NaN (not a number) if size of sample arrays is equal to 0, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
Fechner correlation
C:
flt32_t Statistics_FechnerCorrelation_flt32 (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics_FechnerCorrelation_flt64 (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);
C++:
flt32_t Statistics::FechnerCorrelation (const flt32_t arr1[], const flt32_t arr2[], size_t size, flt32_t mean1, flt32_t mean2);
flt64_t Statistics::FechnerCorrelation (const flt64_t arr1[], const flt64_t arr2[], size_t size, flt64_t mean1, flt64_t mean2);
Description: Compute Fechner sign correlation coefficient.
Parameters:
- arr1 - pointer to first array which holds sample data
- arr2 - pointer to second array which holds sample data
- size - size of sample arrays (count of elements)
- mean1 - sample mean value of first array (theoretic or computed by Mean function)
- mean2 - sample mean value of second array (theoretic or computed by Mean function)
Return value:
- Sample correlation value.
- NaN (not a number) if size of sample arrays is equal to 0, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.
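The Fechner coefficient compares only the signs of the deviations from the means: it is the number of pairs whose signs agree minus the number whose signs disagree, divided by the number of pairs. The C sketch below follows that definition. Treating a zero deviation as its own sign class is an assumption, since conventions differ, and the `sketch_` names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Returns -1, 0 or +1 according to the sign of v. */
int sketch_sign(double v)
{
    return (v > 0.0) - (v < 0.0);
}

/* Hypothetical Fechner sign-correlation sketch. */
double sketch_fechner(const double arr1[], const double arr2[], size_t size,
                      double mean1, double mean2)
{
    if (size == 0)
        return NAN;

    long balance = 0;               /* agreements minus disagreements */
    for (size_t i = 0; i < size; i++) {
        if (isnan(arr1[i]) || isnan(arr2[i]))
            return NAN;
        int s1 = sketch_sign(arr1[i] - mean1);
        int s2 = sketch_sign(arr2[i] - mean2);
        balance += (s1 == s2) ? 1 : -1;
    }
    return (double)balance / (double)size;
}
```

Because only signs are used, the magnitude of an outlier has no influence on the result.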
Spearman correlation
C:
flt32_t Statistics_SpearmanCorrelation_flt32 (flt32_t arr1[], size_t rank1[], flt32_t arr2[], size_t rank2[], flt32_t tarr[], size_t trank[], size_t size);
flt64_t Statistics_SpearmanCorrelation_flt64 (flt64_t arr1[], size_t rank1[], flt64_t arr2[], size_t rank2[], flt64_t tarr[], size_t trank[], size_t size);
C++:
flt32_t Statistics::SpearmanCorrelation (flt32_t arr1[], size_t rank1[], flt32_t arr2[], size_t rank2[], flt32_t tarr[], size_t trank[], size_t size);
flt64_t Statistics::SpearmanCorrelation (flt64_t arr1[], size_t rank1[], flt64_t arr2[], size_t rank2[], flt64_t tarr[], size_t trank[], size_t size);
Description: Compute Spearman rank correlation coefficient.
Parameters:
- arr1 - pointer to first array which holds sample data
- rank1 - pointer to first buffer for data ranks (should be able to hold at least "size" elements)
- arr2 - pointer to second array which holds sample data
- rank2 - pointer to second buffer for data ranks (should be able to hold at least "size" elements)
- tarr - pointer to temporary data array (should be able to hold at least "size" elements)
- trank - pointer to temporary ranks array (should be able to hold at least "size" elements)
- size - size of sample arrays (count of elements)
Return value:
- Sample correlation value.
- NaN (not a number) if size of sample arrays is equal to 0, or sample data has NaN values.
- Inf (infinity) if floating-point overflow occurs during summation.