Data model

DP³ data model

The basic elements of the DP³ data model are entities (or objects). Each entity record (object instance) has a set of attributes. Each attribute has a value (associated with a particular entity), optionally accompanied by a timestamp (so that a history of previous values can be stored) and a confidence value.

There can also be relations between entities. A relation can also have attributes associated with it.
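As a rough illustration of these concepts, the following Python sketch models entities, attribute values, and relations with dataclasses. The class, field, and method names (`Entity`, `AttributeValue`, `Relation`, `set`) are hypothetical and not taken from DP³'s code. The sketch only mirrors the description above: a value with an optional timestamp and confidence, where a history is kept only for timestamped values.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class AttributeValue:
    """A single attribute value with optional timestamp and confidence."""
    value: Any
    timestamp: Optional[datetime] = None
    confidence: Optional[float] = None


@dataclass
class Entity:
    """An entity record (object instance), e.g. one IP address."""
    type: str  # entity type (object class), e.g. "ip"
    id: str    # identification of this instance, e.g. "192.168.0.1"
    attributes: dict[str, list[AttributeValue]] = field(default_factory=dict)

    def set(self, attr: str, value: AttributeValue) -> None:
        if value.timestamp is None:
            # Untimestamped value: keep only the current value, no history.
            self.attributes[attr] = [value]
        else:
            # Timestamped value: append it to the attribute's history.
            self.attributes.setdefault(attr, []).append(value)


@dataclass
class Relation:
    """A relation between two entities; it may carry attributes of its own."""
    name: str
    source: Entity
    target: Entity
    attributes: dict[str, list[AttributeValue]] = field(default_factory=dict)


router = Entity(type="ip", id="192.168.0.1")
router.set("note", AttributeValue("My home router"))
```

Because no timestamp is given, setting `note` a second time would replace the stored value instead of growing a history.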

TODO scheme

TODO: make a clear distinction between entity type (object class) and entity (object instance), etc.

TODO example

Attributes

There are three main types of attributes supported by DP³, each handled quite differently:

Plain attributes
- Common attributes with only one value of some data type.
- No history is stored.
- Confidence can be stored optionally.
Observations
- A history of attribute values is stored as tuples containing the value and observation time (or time interval), optionally with confidence estimation.
- A mechanism to derive the most probable value (and its confidence) of the attribute at any given time is provided.
- These attributes may be single-value or multi-value.
  - TODO: describe multi-value
Timeseries
- Regular or irregular timeseries, i.e. a sequence of timestamped numerical data.
- Multiple values per time instant are supported (multivariate time-series).
- Types of timeseries:
  - regular - regularly-sampled timeseries, i.e. time is divided into intervals of a fixed length and exactly one value (or one set of values) is assigned to each interval. For example, a temperature measured every 5 minutes. If no data are received for an interval, it's filled with N/A (nan). (TODO make it configurable, zero or nan?)
  - irregular - irregularly-sampled timeseries, i.e. a timestamp is explicitly attached to each value (or a set of values) and these timestamps are generally not evenly spaced.
  - irregular_intervals - same as irregular, but an interval (two timestamps) is attached to each value instead of a single timestamp. The intervals may overlap.
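The regular sampling described above can be sketched as follows, assuming samples arrive as `(timestamp, value)` pairs. `to_regular` is a hypothetical helper, not part of DP³. It fills empty intervals with NaN as described (whether zero or NaN is used is still an open configuration question) and, as an assumption, rejects a second sample in the same interval rather than aggregating it.

```python
import math
from datetime import datetime, timedelta


def to_regular(samples, start: datetime, step: timedelta, n: int) -> list[float]:
    """Place (timestamp, value) samples into n intervals of length `step`
    beginning at `start`; intervals that received no sample become NaN."""
    series = [math.nan] * n
    filled = [False] * n
    for ts, value in samples:
        index = (ts - start) // step  # timedelta // timedelta gives an int
        if not 0 <= index < n:
            raise ValueError(f"sample at {ts.isoformat()} is outside the series")
        if filled[index]:
            # Assumption: exactly one value per interval, so duplicates are errors.
            raise ValueError(f"interval {index} already has a value")
        series[index] = value
        filled[index] = True
    return series


start = datetime(2022, 8, 1, 12, 0)
samples = [(start + timedelta(minutes=1), 20.5), (start + timedelta(minutes=11), 21.0)]
print(to_regular(samples, start, timedelta(minutes=5), 3))  # [20.5, nan, 21.0]
```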

Configuration

TODO

Data ingestion (datapoint API)

Data-points

All data are written to DP³ in the form of data-points. A data-point sets the value of a given attribute of a given entity. It is a JSON-encoded object with the keys defined in the table below. Which keys are present depends on the primary type of the attribute (plain/observations/timeseries).

| Key | Description | Data type | Required? | Plain | Observations | Timeseries |
|---|---|---|---|---|---|---|
| `type` | Entity type | string | mandatory | ✔ | ✔ | ✔ |
| `id` | Entity identification | string | mandatory | ✔ | ✔ | ✔ |
| `attr` | Attribute name | string | mandatory | ✔ | ✔ | ✔ |
| `v` | The value to set; depends on the attribute type and data type, see below | -- | mandatory | ✔ | ✔ | ✔ |
| `t1` | Start time of the observation interval | string (RFC 3339 format) | mandatory | -- | ✔ | ✔ |
| `t2` | End time of the observation interval | string (RFC 3339 format) | optional, default=`t1` | -- | ✔ | ✔ |
| `c` | Confidence | float (0.0-1.0) | optional, default=1.0 | ✔ | ✔ | ✔ |
| `src` | Identification of the information source | string | optional, default="" | ✔ | ✔ | ✔ |
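The key table can be read as a set of per-type rules. The following Python sketch checks a data-point against them and fills in the documented defaults for `c` and `src` (and `t2` for observations). The function name `normalize_datapoint` and its `attr_kind` parameter are hypothetical. For timeseries, `t2` is left untouched because regular timeseries compute it from the time step. The `t1 <= t2` check is an added assumption, and `datetime.fromisoformat` accepts the timestamp format used in the examples (full RFC 3339 with a trailing `Z` needs Python 3.11 or later).

```python
from datetime import datetime

# Key rules from the data-point table, per primary attribute type.
MANDATORY = {
    "plain": {"type", "id", "attr", "v"},
    "observations": {"type", "id", "attr", "v", "t1"},
    "timeseries": {"type", "id", "attr", "v", "t1"},
}
OPTIONAL = {
    "plain": {"c", "src"},
    "observations": {"t2", "c", "src"},
    "timeseries": {"t2", "c", "src"},
}


def normalize_datapoint(dp: dict, attr_kind: str) -> dict:
    """Check a data-point's keys and return a copy with defaults filled in."""
    keys = set(dp)
    missing = MANDATORY[attr_kind] - keys
    if missing:
        raise ValueError(f"missing mandatory keys: {sorted(missing)}")
    unknown = keys - MANDATORY[attr_kind] - OPTIONAL[attr_kind]
    if unknown:
        raise ValueError(f"keys not used by {attr_kind} attributes: {sorted(unknown)}")
    out = {"c": 1.0, "src": "", **dp}  # documented defaults for c and src
    if attr_kind == "observations":
        out.setdefault("t2", out["t1"])  # t2 defaults to t1
    if "t2" in out and datetime.fromisoformat(out["t2"]) < datetime.fromisoformat(out["t1"]):
        raise ValueError("t2 must not precede t1")  # assumption: t1 <= t2
    return out


dp = normalize_datapoint({"type": "ip", "id": "192.168.0.1", "attr": "open_ports",
                          "v": [22, 80, 443], "t1": "2022-08-01T12:00:00"}, "observations")
print(dp["t2"], dp["c"], repr(dp["src"]))  # 2022-08-01T12:00:00 1.0 ''
```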

Further details depend on the particular type of the attribute:

Plain

TODO

Example:

{
  "type": "ip",
  "id": "192.168.0.1",
  "attr": "note",
  "v": "My home router",
  "src": "web_gui"
}

Observation

TODO (existing data-points)

Example:

{
  "type": "ip",
  "id": "192.168.0.1",
  "attr": "open_ports",
  "v": [22, 80, 443],
  "t1": "2022-08-01T12:00:00",
  "t2": "2022-08-01T12:10:00",
  "src": "open_ports_module"
}

Timeseries

Timeseries are sent to DP³ in "chunks": short timeseries that can later be joined together. Each chunk carries values for one or more time instants.

//The time-series datapoint looks like the other ones, but its value (v) is a 2D array - an array of arrays containing values of sub-series. In case of irregular (or irregular_intervals) timeseries, the first one (or two) arrays are the timestamps. All arrays must have the same length.

TODO: wouldn't it be better to have this as a dict of arrays, so that it is clear which array belongs to which sub-series? Relying only on the order seems too error-prone.

t1 and t2 of the data-point should specify the observation period covered by this chunk. All times within v must lie between t1 and t2.

In regular time-series, time is not passed explicitly. The first value of each sub-series is the value for the interval starting at t1, the second is for the next interval (t1 + time_step), etc. If t2 is given, it must equal t1 + n*time_step, where n is the number of items in the sub-series. t2 can also be omitted, in which case it is computed automatically.
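The `t2` rule can be expressed directly in code; for instance, four values with a 5-minute step starting at 12:00 give t2 = 12:20. This is a minimal Python sketch with hypothetical helper names (`regular_chunk_t2`, `interval_starts`), assuming that `v` maps sub-series names to lists of equal length.

```python
from datetime import datetime, timedelta
from typing import Optional


def regular_chunk_t2(t1: datetime, time_step: timedelta, v: dict,
                     t2: Optional[datetime] = None) -> datetime:
    """Return t2 of a regular chunk; compute it if omitted, else validate it."""
    lengths = {len(values) for values in v.values()}
    if len(lengths) != 1:
        raise ValueError("v must hold one or more sub-series of equal length")
    n = lengths.pop()
    expected = t1 + n * time_step  # t2 = t1 + n*time_step
    if t2 is not None and t2 != expected:
        raise ValueError(f"t2 must be {expected.isoformat()}, got {t2.isoformat()}")
    return expected


def interval_starts(t1: datetime, time_step: timedelta, n: int) -> list[datetime]:
    """Start of the interval that the i-th value of each sub-series belongs to."""
    return [t1 + i * time_step for i in range(n)]


t1 = datetime(2022, 8, 1, 12, 0)
print(regular_chunk_t2(t1, timedelta(minutes=5), {"a": [1, 3, 0, 2]}))  # 2022-08-01 12:20:00
```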

For regular timeseries, the intervals of individual chunks must not overlap. Any gaps between intervals will be filled by "N/A" values (or zeros, depending on configuration - TODO).
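Joining regular chunks can be sketched in Python as below. `join_regular_chunks` is a hypothetical helper that takes `(t1, values)` pairs for a single sub-series. It assumes every chunk starts on the `time_step` grid of the earliest chunk, rejects overlapping chunks, and fills gaps with NaN (the zero-fill alternative is still undecided).

```python
import math
from datetime import datetime, timedelta


def join_regular_chunks(chunks, time_step: timedelta) -> tuple[datetime, list[float]]:
    """Join (t1, values) chunks of one sub-series into a single regular series.

    Returns the start time and the joined values; overlapping chunks raise
    ValueError and gaps between chunks are filled with NaN.
    """
    if not chunks:
        raise ValueError("no chunks to join")
    chunks = sorted(chunks, key=lambda chunk: chunk[0])
    start = chunks[0][0]
    series: list[float] = []
    for t1, values in chunks:
        if (t1 - start) % time_step:
            raise ValueError(f"chunk at {t1.isoformat()} is not aligned to time_step")
        offset = (t1 - start) // time_step  # interval index where the chunk begins
        if offset < len(series):
            raise ValueError(f"chunk at {t1.isoformat()} overlaps a previous chunk")
        series.extend([math.nan] * (offset - len(series)))  # fill the gap
        series.extend(values)
    return start, series


t0 = datetime(2022, 8, 1, 12, 0)
step = timedelta(minutes=5)
start, values = join_regular_chunks([(t0 + 3 * step, [2]), (t0, [1, 3])], step)
print(values)  # [1, 3, nan, 2]
```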

Example of regularly sampled timeseries:

{
  ...
  "t1": "2022-08-01T12:00:00",
  "t2": "2022-08-01T12:20:00", // assuming time_step = 5 min
  "v": {
    "a": [1, 3, 0, 2]
  }
}

In irregular time-series, timestamps must always be passed explicitly.

Example of irregular timeseries:

{
  ...
  "t1": "2022-08-01T12:00:00",
  "t2": "2022-08-01T12:05:00",
  "v": {
    "time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10", "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
    "x": [0.5, 0.8, 1.2, 0.7],
    "y": [-1, 3, 0, 0]
  }
}
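A consistency check for the irregular example above might look like this Python sketch. The function name `check_irregular_chunk` is hypothetical and the `"time"` key is taken from the example. It assumes the bounds t1 and t2 are inclusive and that `t2` defaults to `t1` when absent, as in the data-point table.

```python
from datetime import datetime


def check_irregular_chunk(dp: dict, time_key: str = "time") -> None:
    """Raise ValueError if an irregular chunk is inconsistent."""
    t1 = datetime.fromisoformat(dp["t1"])
    t2 = datetime.fromisoformat(dp.get("t2", dp["t1"]))  # t2 defaults to t1
    v = dp["v"]
    # Every sub-series must have one value per timestamp.
    if len({len(values) for values in v.values()}) != 1:
        raise ValueError("all arrays in v must have the same length")
    # All times within v must lie between t1 and t2 (inclusive here).
    for ts in v[time_key]:
        t = datetime.fromisoformat(ts)
        if not t1 <= t <= t2:
            raise ValueError(f"{ts} lies outside the chunk interval")


example = {
    "t1": "2022-08-01T12:00:00",
    "t2": "2022-08-01T12:05:00",
    "v": {
        "time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10",
                 "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
        "x": [0.5, 0.8, 1.2, 0.7],
        "y": [-1, 3, 0, 0],
    },
}
check_irregular_chunk(example)  # passes silently
```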
