Data Monitoring

This guide describes best practices for using the data monitoring capabilities of the OpenDataDSL platform.

Monitoring Terminology

Monitoring of the loading of data is organised using the following entities:

Process

A process defines the workflow or script that is run to extract, transform and load data into the platform.

Dataset feed

A dataset feed is a configuration which defines the time window that you expect to receive all the data for a specific feed.

The identifier for a dataset feed is of the following format:

{provider id}.{feed id}

e.g. ICE.IFEU

Dataset

A dataset is an individual product within a dataset feed. It defines the quantity of data you expect to receive on a daily basis.

The identifier for a dataset is of the following format:

{provider id}.{feed id}.{product id}

e.g. ICE.IFEU.B

Dataset delivery

A dataset delivery is a record of all information regarding the process of getting the data for a dataset for a single day. It also calculates a score which identifies how well the process worked for each day.

The identifier for a dataset delivery is of the following format:

{provider id}.{feed id}.{product id}:{ondate}

e.g. ICE.IFEU.B:2024-06-12

Quality Group

A quality group names the script containing the functions that perform quality checks on the data. It also contains the list of checks to perform on a dataset, including any parameters.

A check consists of:

  • name - The free-form name you want to give to this check, e.g. Check for zero values
  • expression - The expression to use to run the check. This is usually the function call, e.g. zeroCheck(['SETTLEMENT_PRICE'])
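
These two fields pair up directly with the addCheck call on a quality group, as shown later in this guide:

g = QualityGroup()
g.addCheck("Check for zero values", "zeroCheck(['SETTLEMENT_PRICE'])")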

Scoring

Every day that dataset loading occurs, a score is calculated that indicates how well the process worked that day.

The scores are as follows:

  • 4 - Data was loaded on time
  • 3 - Data was at most 1 hour late
  • 2 - Data was between 1 hour and 4 hours late
  • 1 - Data was more than 4 hours late
  • 0 - No data was expected, e.g. a holiday
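
Purely as an illustration (this is not a platform API), the bands above can be sketched as an ODSL function, assuming lateness is measured in minutes:

function latenessScore(minutesLate)
  // illustrative sketch of the documented score bands
  latenessScore = 4
  if minutesLate > 0
    latenessScore = 3
  end
  if minutesLate > 60
    latenessScore = 2
  end
  if minutesLate > 240
    latenessScore = 1
  end
end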

Monitoring Lifecycle

The dataset delivery record represents the lifecycle for the dataset loading for a single day.

Initialisation

Each day begins with an initialisation of the dataset delivery record. This sets the following default values:

  • status - waiting
  • score - 4, or 0 if it is a holiday (any lateness reduces this value)
  • completeness - 0
  • initialised - the timestamp of when the initialisation was run
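
For illustration, a freshly initialised delivery record holds values along these lines (the DatasetDelivery type name is an assumption in this sketch):

delivery = object as DatasetDelivery
dsid = "ICE.IFEU.B:2024-06-12"
status = "waiting"
score = 4
completeness = 0
end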

Data loaded

When data is loaded that includes a _dsid property, the following occurs:

  • Check for corrections - if data for this date was previously loaded, it is checked to see if any of the values have changed.
  • Calculate metrics
    • Determine the quantity of tenors by tenor type loaded during this update
    • Determine if all the expected tenors have been loaded
  • Update the timeline and delivery information in the dataset delivery
  • Update the status
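
Loaded data is linked to monitoring via its _dsid property. A minimal sketch, assuming a TimeSeries object type with _dsid set as a plain property:

ts = object as TimeSeries
_dsid = "ODSL.TRADER2.NBP"
end

save ts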

Quality checks

Any quality checks that have been configured on the dataset are triggered after data has been loaded. The quality checks are always performed on the entire dataset, not just the most recently loaded data.

Check for lateness

Periodically, a process runs that determines if any datasets are late according to the dataset feed time window. If a dataset is determined to be late, the dataset delivery score is updated and a message is sent to any subscriptions that are triggered by a late action.

Using the Dataset Monitoring API

This section guides you through using the dataset monitoring features.

Datasets and Dataset Feeds

Datasets and dataset feeds are 'managed' in 3 different locations:

  • private - these are configured and managed by your company
  • common - these are configured and managed by us; you will only see the common datasets you have access to
  • public - these are configured and managed by us; you have access to all of these

Managing private datasets and dataset feeds

With private datasets, you can configure everything about the dataset and dataset feed, specifically:

  • Dataset ID

    As mentioned above, the dataset id comprises 3 sections: provider, feed and product. For private datasets, it is recommended to set the provider to a short-form version of your company name, e.g. for OpenDataDSL we use ODSL

  • Expected tenors

    You can either manually enter the list of expected tenors or, if you have already loaded some data, you can have the system calculate the minimum actual loaded tenors

  • Calendar

    This is the calendar that defines the days on which you expect this data to be available; any non-calendar days are marked as holidays

  • Timings

    You can specify the time window when you expect the data to be ready to collect or be loaded into the system

  • Quality Group

    You can specify the quality group which defines the checks you want to perform on the data loaded to this dataset

Examples

An example of creating a private dataset feed

dsf = object as DatasetFeed
dsid = "ODSL.TRADER2"
name = "ODSL Trader2"
calendar = "BUSINESS"
timezone = "Europe/London"
time = "19:00 EU1"
late = "21:00 EU1"
end

save dsf

An example of creating a private dataset

ds = object as Dataset
dsid = "ODSL.TRADER2.NBP"
name = "Trader2 NBP Prices"
expected = SimpleObject()
end
ds.expected.set("*", 12)
ds.expected.set("Month", 12)

save ds

Managing public and common datasets and dataset feeds

If you want to add public and common datasets into your monitoring, you can create references to them in your private database.

With public and common datasets, you can only configure the following:

  • Quality Group

    You can specify the quality group which defines the checks you want to perform on the data loaded to this dataset

Examples

Getting a list of all the available datasets

print ${dataset:"allds"}

Creating a reference to a common dataset

ds = object as Dataset
dsid = "ICE.NDEX.NLB"
source = "common"
end
save ds

Removing a reference to a common dataset

delete ${dataset:"info/ICE.NDEX.NLB"}

Dataset Deliveries

You can retrieve the delivery information for any datasets that you are monitoring. Any specific quality check results that you have added will also be shown in the dataset delivery information.

Examples

// Get all dataset deliveries for the current ondate
find ${dataset:"delivery"}

// Get all dataset deliveries for a specific ondate
find ${dataset:"delivery/2024-06-25"}

Dataset Quality Groups

You can create quality groups that define the quality checks to perform on a dataset. Assign a quality group to a dataset and the checks will run each time data is loaded.

Examples

// Create a quality group
g = QualityGroup()
g.name = "ICE.NDEX.QUALITY"
g.description = "ICE Endex Dataset Quality Checks"
g.script = "#QualityCheckDatasets"
g.addCheck("Check for zero values", "zeroCheck(['SETTLEMENT_PRICE'])")
save g

Dataset Quality Scripts

The ODSL scripts you create contain functions to perform quality checks on datasets.

The functions have access to the following variables:

  • #DSID - The string dataset id
  • #ONDATE - The date for the dataset update
  • #LOG - A log object to place all the failure information
  • #EVENTS - A list of all the events for this dataset update

Example Functions

function zeroCheck(properties)
  zeroCheck = "valid"
  #LOG.failures = []
  print "Checking " + #EVENTS.size() + " events for dataset: " + #DSID + " for " + #ONDATE
  for pc in #EVENTS
    for p in properties
      // Test for a zero value in this property
      v = variable(pc, p)
      tenor = variable(pc, "absolute")
      if v.isZero()
        zeroCheck = "failed"
        #LOG.failures.add("Zero value for " + tenor)
      end
    next
  next
end