Skip to main content

Working with TimeSeries in OpenDataDSL

Introduction

This tutorial explores a few aspects of TimeSeries in the OpenDataDSL language

Prerequisites

User requirements

This tutorial assumes some familiarity with writing code as we will be doing some basic coding in OpenDataDSL in Visual Studio Code

System requirements

You will need to have the following:

Tutorial Steps

Calendars

A TimeSeries needs a calendar in order to manage the intervals between datapoints, there are a number of built in or standard calendars, but you can also create your own.

NOTE: If the data you are storing is not observed at regular intervals, you can use the special SPARSE calendar to record both the time and value of the observation

Daily calendar

A simple example of a calendar is a Daily Calendar where we expect a value to be recorded every calendar day, e.g.

// Basic daily calendar
daily = DailyCalendar()
dates = daily.getDates("2021-01-01", "2021-01-07")
print dates

This is the output from that script:

[
"2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05",
"2021-01-06",
"2021-01-07"
]

Holiday calendar

A holiday calendar is an extension of the Daily calendar and allows us to define non-working days or days where we don’t expect to observe any data. A simple example of a holiday calendar is a Business Calendar where every Saturday and Sunday are non-working days, e.g.

// Business day calendar
business = BusinessCalendar()
dates = business.getDates("2021-01-01", "2021-01-07")
print dates

This displays the following:

[
"2021-01-01",
"2021-01-04",
"2021-01-05",
"2021-01-06",
"2021-01-07"
]

If we want to also exclude Mondays, we can add a rule to the calendar like this:

// Add a holiday rule
business.addRule("Every Monday")
dates = business.getDates("2021-01-01", "2021-01-07")
print dates

This now displays:

[
"2021-01-01",
"2021-01-05",
"2021-01-06",
"2021-01-07"
]

Note how Saturday, Sunday and Monday are now excluded from the output dates.

For further details about the rules you can use, see Holiday Calendar.

Creating a TimeSeries

Construction

To create a TimeSeries variable, we use the TimeSeries constructor method. This has various parameter combinations but the easiest is to construct the TimeSeries with a calendar, e.g.

// Basic Construction with a named calendar
ts1 = TimeSeries("BUSINESS")

// Basic Construction with a pre-defined calendar
business = BusinessCalendar()
ts2 = TimeSeries(business)

The 2 methods shown above construct a TimeSeries which expects values for every weekday but not on weekends, if you attempt to add a value on a weekend, you will be shown an error message.

Data Type

By default, any scalar data (numeric, string, boolean) can be added to a TimeSeries, but you can explicitly set the data type you want to store in the TimeSeries with a second parameter like this:

// Construction with a calendar and data type
ts3 = TimeSeries("BUSINESS", "string")

This will restrict the type of data that you add to the TimeSeries to only accepts strings.

Adding Observations

Once you have constructed your TimeSeries, you can now add observations to it using the add method on the TimeSeries variable, e.g.

// Add a value for the 1st of January
ts1.add("2021-01-01", 21.5)
print json(ts1)

This displays:

{
"_type": "VarTimeSeries",
"_id": "ts1",
"description": "",
"start": "2021-01-01",
"calendar": "BUSINESS",
"timezone": "UTC",
"valueType": "TRACKED",
"dataType": "any",
"properties": {},
"data": [
{
"time": "2021-01-01",
"value": 21.500000,
"status": {},
"_type": "TimeValue",
"_links": {}
}
]
}

If we try to add a value for the 2nd of January, we will get an error message because that is a Saturday and our calendar won’t allow it, e.g

ts1.add("2021-01-02", 21.5)

[9001] Index: 2021-01-02 does not align with calendar: BUSINESS

We can add a value for the following Tuesday and the calendar automaticall understands that you have missed a value for the Monday which it expected, e.g.

ts1.add("2021-01-05", 21.7)
print json(ts1)

Which displays:

{
"_type": "VarTimeSeries",
"_id": "ts1",
"description": "",
"start": "2021-01-01",
"calendar": "BUSINESS",
"timezone": "UTC",
"valueType": "TRACKED",
"dataType": "any",
"properties": {},
"data": [
{
"time": "2021-01-01",
"value": 21.500000,
"status": {},
"_type": "TimeValue",
"_links": {}
},
{
"time": "2021-01-04",
"value": "NaN",
"status": {},
"_type": "TimeValue",
"_links": {}
},
{
"time": "2021-01-05",
"value": 21.700000,
"status": {},
"_type": "TimeValue",
"_links": {}
}
]
}

Notice how it displays “NaN” for the 4th of January to show that we are missing a value for that day.

Setting Value Statuses

We can add some context about an observation by adding some status information. Status information is categorised as follows:

  • Quality
  • Source
  • Reliability

Quality Status

The quality status defines whether the value of the observation is deemed to to be representative of the TimeSeries at that point in time, so a missing value or a value that is much higher or lower than the other observations around it would be deemed to be of bad quality.

The states of the quality status are:

  • Unchecked (default)
  • Valid
  • Failed

Source Status

The source status defines where the value came from. This documents the provenance of the observations in the TimeSeries, for example if we fill in some gaps with an interpolation algorithm or create a value based on the values from other TimeSeries etc.

The states of the source status are:

  • New (default)
  • Changed
  • Implied
  • Filled
  • Converted
  • Calculated

Reliability Status

The reliability status is used for regulatory reporting regarding the trustworthiness of the observation if it is used for accounting purposes.

The states of the reliability status are:

  • Unknown (default)
  • Quoted
  • Observed
  • Unobserved

Examples

Setting the status when the value is added
// Setting the status when the value is added
ts1 = TimeSeries("BUSINESS")
ts1.add("2021-01-01", 21.5, ["Valid", "Calculated"])
print json(ts1)

Results:

{
"_type": "VarTimeSeries",
"_id": "ts1",
"description": "",
"start": "2021-01-01",
"calendar": "BUSINESS",
"timezone": "UTC",
"valueType": "TRACKED",
"dataType": "any",
"properties": {},
"data": [
{
"time": "2021-01-01",
"value": 21.500000,
"status": {
"quality": "VALID",
"source": "CALCULATED"
},
"_type": "TimeValue",
"_links": {}
}
]
}
Setting the value on an observation in a TimeSeries
// Setting the status on a value
ts1 = TimeSeries("BUSINESS")
ts1.add("2021-01-01", 21.5)
ts1["2021-01-01"].status.quality = "Valid"
print json(ts1)
Setting additional information
// Setting the status and info on a value
ts1 = TimeSeries("BUSINESS")
ts1.add("2021-01-01", 21.5)
ts1["2021-01-01"].status.quality = "Valid"
ts1["2021-01-01"].status.quality.info = "Tested for missing and spikes"
print json(ts1)

Results:

{
"_type": "VarTimeSeries",
"_id": "ts1",
"description": "",
"start": "2021-01-01",
"calendar": "BUSINESS",
"timezone": "UTC",
"valueType": "TRACKED",
"dataType": "any",
"properties": {},
"data": [
{
"time": "2021-01-01",
"value": 21.500000,
"status": {
"quality": "VALID:Tested for missing and spikes"
},
"_type": "TimeValue",
"_links": {}
}
]
}

Properties on TimeSeries

There are some standard properties on TimeSeries, but you can also add your own custom properties on a TimeSeries.

Standard properties

Standard Properties are:

  • currency
  • units
  • timezone
  • tenor
  • source
  • description

Here is an example of setting these properties:

// Standard properties
ts1 = TimeSeries("BUSINESS")
ts1.currency = "GBP"
ts1.units = "KG"
ts1.timezone = "Europe/London"
ts1.tenor = "SPOT"
ts1.source = "Trader A"
ts1.description = "My Timeseries"
print json(ts1)

Results:

{
"_type": "VarTimeSeries",
"_id": "ts1",
"description": "My Timeseries",
"start": "",
"calendar": "BUSINESS",
"currency": "GBP",
"units": "KG",
"timezone": "Europe/London",
"tenor": "SPOT",
"source": "Trader A",
"valueType": "TRACKED",
"dataType": "any",
"properties": {},
"data": []
}

Custom properties

Adding custom properties is similar to adding properties to an Object, e.g.

// Custom properties
ts1 = TimeSeries("BUSINESS")
ts1.name = "My TimeSeries"

// You can add any type of variable
ts1.geolocation = Point([51.51958, -0.12695])

// You can also nest properties
ts1.address = Object()
ts1.address.city = "London"
ts1.address.country = "England"
print json(ts1)

Results:

{
"_type": "VarTimeSeries",
"_id": "ts1",
"description": "",
"start": "",
"calendar": "BUSINESS",
"timezone": "UTC",
"valueType": "TRACKED",
"dataType": "any",
"properties": {
"address": {
"_id": "address",
"city": "London",
"country": "England"
},
"geolocation": {
"type": "Point",
"coordinates": [
51.51958,
-0.12695
]
},
"name": "My TimeSeries"
},
"data": []
}

TimeSeries operations and functions

You can perform mathematical operations and functions on TimeSeries, this section of the tutorial explorers some of the things you can do.

Operations

TimeSeries can be used in expressions involving other TimeSeries and scalars with the following operators:

    • (add)
    • (subtract)
  • / (divide)
    • (multiply)
  • ^ (power)

Scalar operations

If operated on using a scalar, all observations of the TimeSeries will be operated on using that scalar, e.g.

// Create a TimeSeries
t1 = TimeSeries("BUSINESS")

// Add some values
t1.add("2021-01-01", [21.5, 22.6, 23.5, 24.1])

// Create a new TimeSeries with the value 2 added to all values of t1
t2 = t1 + 2

// Display the values
print t2.values

Results:

[
2021-01-01 23.500000
2021-01-04 24.600000
2021-01-05 25.500000
2021-01-06 26.100000
]

TimeSeries operations

if operated on using another TimeSeries, only observations that are present at the same index in both TimeSeries are operated on, e.g.

// Create a TimeSeries
t1 = TimeSeries("BUSINESS")

// Add some values
t1.add("2021-01-01", [21.5, 22.6, 23.5, 24.1])

// Create another TimeSeries
t2 = TimeSeries("BUSINESS")

// Add some values
t2.add("2021-01-01", [1.5, 2.6, 2.5])

// Add them together
t3 = t1 + t2

// Display the values
print t3.values

Results:

[
2021-01-01 23.0
2021-01-04 25.200000
2021-01-05 26.0
]

### Functions

There are various functions in OpenDataDSL that work with TimeSeries - all of them are listed [here].

#### Basic Statistics

In this tutorial we will take a look at some of the statistics functions which allow you to statistically analyse the observations of a TimeSeries.

```js
// Create a TimeSeries
t1 = TimeSeries("2021-01-01", "BUSINESS", [12.5,12.6,12.7,12.8,12.9])

// Print out some statistics
print min(t1)
print max(t1)
print mean(t1)
print geomean(t1)
print count(t1)
print sum(t1)
print sumsq(t1)
print stdev(t1)
print var(t1)

Results:

12.500000
12.900000
12.700000
12.699213
5
63.500000
806.550000
0.158114
0.025000

Regression

Some TimeSeries functions produce an object with multiple resultant values, and example of this is the simpleRegression function.

// Create a TimeSeries
t1 = TimeSeries("2021-01-01", "BUSINESS", [12.5,12.6,12.7,12.8,12.9])

// Execute and print out the results of the simple regression
sr = simpleRegression(t1)
print sr

Results:

{
"_id": "sr",
"properties": {
"N": 5.0,
"R": 0.961524,
"RSquare": 0.924528,
"XSumSquares": 1.58257152E11,
"intercept": -1217.688679,
"interceptStdErr": 202.961497,
"meanSquareError": 0.002516,
"regressionSumSquares": 0.092453,
"significance": 0.009007,
"slope": 1.0E-6,
"slopeConfidenceInterval": 0.0,
"slopeStdErr": 0.0,
"sumOfCrossProducts": 120960.0,
"sumSquaredErrors": 0.007547,
"totalSumSquares": 0.1
},
"_version": {
"version": 0,
"tag": []
},
"_type": "VarObjectWrapper",
"_links": {}
}

The individual values can be extracted simply using the name, e.g.

print sr.intercept

> -1217.688679