Skip to main content

Statistical Functions

Functions that perform statistical analysis on data.

Frameworks and implementations for basic Descriptive statistics, frequency distributions, bivariate regression, and t-, chi-square and ANOVA test statistics. All these functions utilise numeric precision and missing value settings

geomean

Returns the geometric average value from a list of values.

Geometric mean, sometimes referred to as compounded annual growth rate or time-weighted rate of return, is the average rate of return of a set of values calculated using the products of the terms.

Geometric mean is an important tool for calculating portfolio performance for many reasons, but one of the most significant is it takes into account the effects of compounding.

Syntax

var = geomean(List or TimeSeries)

Result

A number representing the geometric average value from the input List or TimeSeries

Example

data = [1,2,3]
print geomean(data)
1.817121

max

Returns the largest value from a list of values.

Syntax

var = max(List or TimeSeries)

Result

A number representing the largest value from the input List or TimeSeries

Example

data = [5,6.7,-2]
print max(data)
6.7

mean

Returns the arithmetic average value from a list of values

Returns the arithmetic average value from a list of values using the formula:

Syntax

var = mean(List or TimeSeries)

Result

A number representing the arithmetic average value from the input List or TimeSeries

Example

data = [1,2,3]
print mean(data)
2

min

Returns the smallest value from a list of values.

Syntax

var = min(List or TimeSeries)

Result

A number representing the smallest value from the input List or TimeSeries

Example

data = [5,6.7,-2]
print min(data)
-2

simpleRegression

SimpleRegression provides ordinary least squares regression with one independent variable estimating the linear model: y = intercept + slope * x or y = slope * x

Standard errors for intercept and slope are available as well as ANOVA, r-square and Pearson's r statistics.

Observations (x,y pairs) are added as time=x and value=y

The function result can also be used to predict other values given a Date input as shown in the example

Syntax

var = simpleRegression(TimeSeries)

Result

The output from this function is an Object with properties representing some statistical calculations on the input time-series as shown in the table below:

PropertyDescription
interceptThe intercept (often labeled the constant) is the expected mean value of Y when all X=0. Start with a regression equation with one predictor, X. If X sometimes equals 0, the intercept is simply the expected mean value of Y at that value. If X never equals 0, then the intercept has no intrinsic meaning.
slopeSlope is calculated by finding the ratio of the "vertical change" to the "horizontal change" between (any) two distinct points on a line. Sometimes the ratio is expressed as a quotient ("rise over run"), giving the same number for every two distinct points on the same line.
slopeStdErrThe standard error of the regression slope, s (also called the standard error of estimate) represents the average distance that your observed values deviate from the regression line. The smaller the “s” value, the closer your values are to the regression line.
interceptStdErrThe standard error of the the intercept allows you to test whether or not the estimated intercept is statistically significant from a specified(hypothesized) value ... normally 0.0 . If you test against 0.0 and fail to reject then you can then re-estimate your model without the intercept term being present.
meanSquareErrThe mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value
NN is the number of elements in a population
RThe Pearson correlation coefficient, also referred to as Pearson's r, the Pearson product-moment correlation coefficient, or the bivariate correlation, is a statistic that measures linear correlation between two variables X and Y. It has a value between +1 and −1.
regressionSumSquaresRegression sum of squares (also known as the sum of squares due to regression or explained sum of squares) The regression sum of squares describes how well a regression model represents the modeled data. A higher regression sum of squares indicates that the model does not fit the data well.
RSquareR-squared (R2) is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model. ... So, if the R2 of a model is 0.50, then approximately half of the observed variation can be explained by the model's inputs
significanceStatistical significance is a determination that a relationship between two or more variables is caused by something other than chance. Statistical hypothesis testing is used to determine whether the result of a data set is statistically significant.
slopeConfidenceIntervalWith simple linear regression, to compute a confidence interval for the slope, the critical value is a t score with degrees of freedom equal to n - 2. ... df = n - 2 = 101 - 2 = 99. The critical value is the t statistic having 99 degrees of freedom and a cumulative probability equal to 0.995.
sumOfCrossProductsThe sum of cross products between all the elements of columns j and k is represented by Σ XrjXrk, summed over r. A matrix of sums of squares and sums of cross products is represented by X' X, as shown below. Thus, the diagonal elements of X' X are sums of squares, and the off-diagonal elements are cross products.
sumSquaredErrorsIn statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared estimate of errors (SSE), is the sum of the squares of residuals (deviations predicted from actual empirical values of data).
totalSumSquaresThe Total SS (TSS or SST) tells you how much variation there is in the dependent variable. Total SS = Σ(Yi – mean of Y)2. Note: Sigma (Σ) is a mathematical term for summation or “adding up.” It's telling you to add up all the possible results from the rest of the equation
XSumSquaresThe sum of squares is a measure of deviation from the mean. In statistics, the mean is the average of a set of numbers and is the most commonly used measure of central tendency. The arithmetic mean is simply calculated by summing up the values in the data set and dividing by the number of values

The result also has methods as detailed in the following table:

MethodDescription
predict(date)Use the regression model to predict the y value for a new x (Date) value

Example

input = TimeSeries("DAILY")
input.add("2020-11-01", 12.5)
input.add("2020-11-02", 12.8)
input.add("2020-11-03", 12.9)
input.add("2020-11-04", 11.5)
input.add("2020-11-05", 11.9)

reg = simpleRegression(input)

print reg.slope
print reg.slopeStdErr
print reg.slopeConfidenceInterval
print reg.intercept
print reg.interceptStdErr
print reg.meanSquareErr
print reg.N
print reg.R
print reg.regressionSumSquares
print reg.RSquare
print reg.significance
print reg.sumOfCrossProducts
print reg.sumSquaredErrors
print reg.totalSumSquares
print reg.XSumSquares

// Predict the next value
print reg.predict(Date("2020-11-06"))