Analysis
QuickStart Module
This quickstart module shows you how to utilise aggregation pipelines and statistical functions to analyse your data.
Aggregation
Aggregation is the process of summarising portions of data to allow you to get an overview of the stored data. In OpenDataDSL we have 2 forms of aggregation:
- Distinct of unique list of values for a property
- Aggregation pipelines which can summarise multiple properties
Distinct Aggregation
Distinct aggregation gets a unique list of values from the property of any entity in OpenDataDSL.
The syntax for this is:
list = find unique property from ${service:source}
Source can be omitted if you are querying your private repository
Example of getting a unique list of ID's of public calendars:
calendars = find unique _id from ${calendar:public}
print calendars
You can use any this command on any property on any entity, but the most common use is with Objects. To get a unique list values for the property DataSet of Objects from the public database use:
datasets = find unique dataset from ${object:public}
print datasets
To do the same with your own private repository, just omit the public modifier, e.g.
datasets = find unique dataset from ${object}
print datasets
Aggregation Pipeline
An aggregation pipeline allows you to filter, group and sort whilst aggregating 1 or more fields.
An example of summing up the number of process executions grouped by status and sorted by quantity and description:
summary = aggregate ${exec}
match service="ETL"
group _id="$status", qty=count()
sort qty desc
end
print summary
Statistical Functions
To analyse TimeSeries data, we have an ever-growing list of statistical functions
An interesting function for this QuickStart module is the SimpleRegression predict function which uses regression to determine the value at a specific point in time.
The following code:
- Creates a TimeSeries with 5 values
- Runs the simpleRegression function on the TimeSeries
- Predicts the 6th value using regression
input = TimeSeries("2021-10-01", "DAILY", [12.5, 12.8, 12.9, 11.5, 11.9])
reg = simpleRegression(input)
// Predict the next value
print reg.predict(Date("2021-10-06"))
The simpleRegression function is a special type of function that returns an object with calculations and extra functionality