QuickStart: Analysis
Using aggregation pipelines and statistical functions to analyse your data.
Aggregation
Aggregation summarises portions of data to give you an overview of what is stored.
Distinct Aggregation
Get a unique list of values for any property:
//#region
// All unique IDs of public calendars
calendars = find unique _id from ${calendar:public}
print calendars
//#endregion
//#region
use training
// Unique dataset values in your private objects
datasets = find unique dataset from ${object}
print datasets
//#endregion
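Conceptually, `find unique` returns each distinct value of one property once. The plain-Python sketch below (not OpenDataDSL) illustrates that behaviour. The sample records and the `find_unique` helper are hypothetical, and keeping values in first-seen order is a choice made here; the platform may return them in a different order.

```python
from typing import Any, Dict, List


def find_unique(records: List[Dict[str, Any]], prop: str) -> List[Any]:
    """Return the distinct values of `prop` in first-seen order (hypothetical helper)."""
    seen = set()
    result = []
    for rec in records:
        if prop not in rec:
            continue  # skip records that do not have the property at all
        value = rec[prop]
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical sample objects; real ones would come from ${object}
objects = [
    {"_id": "A", "dataset": "alpha"},
    {"_id": "B", "dataset": "beta"},
    {"_id": "C", "dataset": "alpha"},
    {"_id": "D"},
]

datasets = find_unique(objects, "dataset")
print(datasets)  # ['alpha', 'beta']
```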
Aggregation Pipelines
A pipeline runs a sequence of stages (here `match`, `group`, and `sort`) between `aggregate` and `end`, letting you filter, group, and sort while aggregating one or more fields:
//#region
use training
// Count ETL process executions grouped by status, sorted by count descending
summary = aggregate ${exec}
match service = "ETL"
group _id = "$status", qty = count()
sort qty desc
end
print summary
//#endregion
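The stages run in order, each feeding its output to the next. The `"$status"` field reference resembles MongoDB aggregation syntax. As a rough mental model only, the plain-Python sketch below applies the same match, group-and-count, and sort sequence to hypothetical in-memory execution records. The record fields and status values are assumptions for illustration, not how the platform actually executes the pipeline.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical process-execution records; real ones would come from ${exec}
executions: List[Dict[str, Any]] = [
    {"service": "ETL", "status": "success"},
    {"service": "ETL", "status": "failed"},
    {"service": "ETL", "status": "success"},
    {"service": "Report", "status": "success"},
    {"service": "ETL", "status": "success"},
]

# match service = "ETL": keep only ETL executions
matched = [e for e in executions if e.get("service") == "ETL"]

# group _id = "$status", qty = count(): one output document per status
counts = Counter(e["status"] for e in matched)
summary = [{"_id": status, "qty": qty} for status, qty in counts.items()]

# sort qty desc: largest group first
summary.sort(key=lambda doc: doc["qty"], reverse=True)

print(summary)  # [{'_id': 'success', 'qty': 3}, {'_id': 'failed', 'qty': 1}]
```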
Statistical Functions
Simple Regression
The simpleRegression function fits a linear model to a TimeSeries and can predict future values:
//#region
// Create a TimeSeries with 5 values
input = TimeSeries("2021-10-01", "DAILY", [12.5, 12.8, 12.9, 11.5, 11.9])
// Run regression
reg = simpleRegression(input)
print reg.slope
print reg.intercept
print reg.RSquare
// Predict the next day's value
print reg.predict(Date("2021-10-06"))
//#endregion
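To see what `slope`, `intercept`, and `RSquare` mean, the same fit can be reproduced by hand with ordinary least squares. The property names match Apache Commons Math's `SimpleRegression` class, but whether OpenDataDSL uses it internally is an assumption. This plain-Python sketch also assumes the x-axis is the day index (0 for 2021-10-01 through 4 for 2021-10-05). If OpenDataDSL encodes dates differently, the intercept and the units of the slope would change, but R² and the predicted value for 2021-10-06 would not, because least-squares predictions are unaffected by a linear rescaling of x.

```python
from typing import List, Tuple


def simple_regression(y: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y against x = 0, 1, ..., n-1.

    Returns (slope, intercept, r_square). Treating x as the day index is an assumption.
    """
    n = len(y)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(y) / n
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(xs, y))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * x)) ** 2 for x, yi in zip(xs, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1 - ss_res / ss_tot
    return slope, intercept, r_square


values = [12.5, 12.8, 12.9, 11.5, 11.9]  # 2021-10-01 to 2021-10-05
slope, intercept, r_square = simple_regression(values)

# Day index 5 corresponds to 2021-10-06, the day after the last observation
prediction = intercept + slope * 5

print(round(slope, 4), round(intercept, 4), round(r_square, 4), round(prediction, 4))
# -0.25 12.82 0.4316 11.57
```

An R² of about 0.43 means the straight line explains less than half of the day-to-day variation, so the prediction for 2021-10-06 should be read as a rough trend estimate.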
Statistical Function Library
OpenDataDSL includes a growing library of statistical functions for TimeSeries analysis, including mean, stdev, min, max, sma, csum, and many more.
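For readers new to these names, here is a rough plain-Python equivalent of several of them, applied to the values from the regression example. The mapping is an assumption based on common usage: `sma` is treated as a trailing simple moving average and `csum` as a cumulative sum. OpenDataDSL's own functions may differ in details such as window alignment, how incomplete windows are reported, or whether `stdev` uses the sample or population formula.

```python
import statistics
from itertools import accumulate
from typing import List, Optional

values = [12.5, 12.8, 12.9, 11.5, 11.9]


def sma(xs: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until a full window is available."""
    out: List[Optional[float]] = []
    for i in range(len(xs)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(xs[i + 1 - window : i + 1]) / window)
    return out


print(statistics.mean(values))   # arithmetic mean
print(statistics.stdev(values))  # sample standard deviation (n - 1 denominator)
print(min(values), max(values))  # smallest and largest values
print(sma(values, 3))            # 3-day trailing moving average
print(list(accumulate(values)))  # cumulative sum, the assumed meaning of csum
```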
Next Step
In QuickStart: ETL, you will extract data from remote sources, transform it, and load it into the platform.