DP³ has an HTTP API which you can use to post datapoints and to read data stored in DP³.
As the API is built with FastAPI, there is also interactive documentation available at the /docs
endpoint.
There are several API endpoints:
+GET /
: check if API is running (just returns It works!
message)POST /datapoints
: insert datapoints into DP³GET /entity/<entity_type>
: list current snapshots of all entities of given typeGET /entity/<entity_type>/<entity_id>
: get data of entity with given entity idGET /entity/<entity_type>/<entity_id>/get/<attr_id>
: get attribute valuePOST /entity/<entity_type>/<entity_id>/set/<attr_id>
: set attribute valueGET /entities
: list entity configurationGET /control/<action>
: send a pre-defined action into execution queue.Health check.
+GET /
200 OK
:
{
+ "detail": "It works!"
+}
POST /datapoints
All data is written to DP³ in the form of datapoints. A datapoint sets the value of a given attribute of a given entity.
It is a JSON-encoded object with the set of keys defined in the table below. The presence of some keys depends on the primary type of the attribute (plain/observations/timeseries).
The payload of this endpoint is a JSON array of datapoints. For example:
+ +Key | +Description | +Data-type | +Required? | +Plain | +Observations | +Timeseries | +
---|---|---|---|---|---|---|
type |
+Entity type | +string | +mandatory | +✔ | +✔ | +✔ | +
id |
+Entity identification | +string | +mandatory | +✔ | +✔ | +✔ | +
attr |
+Attribute name | +string | +mandatory | +✔ | +✔ | +✔ | +
v |
+The value to set, depends on attr. type and data-type, see below | +-- | +mandatory | +✔ | +✔ | +✔ | +
t1 |
+Start time of the observation interval | +string (RFC 3339 format) | +mandatory | +-- | +✔ | +✔ | +
t2 |
+End time of the observation interval | +string (RFC 3339 format) | +optional, default=t1 |
+-- | +✔ | +✔ | +
c |
+Confidence | +float (0.0-1.0) | +optional, default=1.0 | +-- | +✔ | +✔ | +
src |
+Identification of the information source | +string | +optional, default="" | +✔ | +✔ | +✔ | +
More details depend on the particular type of the attribute.
+{
+ "type": "ip",
+ "id": "192.168.0.1",
+ "attr": "note",
+ "v": "My home router",
+ "src": "web_gui"
+}
+
{
+ "type": "ip",
+ "id": "192.168.0.1",
+ "attr": "open_ports",
+ "v": [22, 80, 443],
+ "t1": "2022-08-01T12:00:00",
+ "t2": "2022-08-01T12:10:00",
+ "src": "open_ports_module"
+}
+
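Datapoints like the ones above are posted as a JSON array. A minimal Python sketch using only the standard library (the base URL is an assumption -- point it at wherever your DP³ API listens):

```python
import json
import urllib.request

def build_payload(datapoints: list[dict]) -> bytes:
    """Serialize a list of datapoint dicts into the JSON array body
    expected by POST /datapoints."""
    return json.dumps(datapoints).encode()

def send_datapoints(datapoints: list[dict], base_url: str = "http://127.0.0.1:5000") -> int:
    # base_url is an assumption -- use your own DP3 API instance
    req = urllib.request.Request(
        base_url + "/datapoints",
        data=build_payload(datapoints),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 on success

dp = {
    "type": "ip",
    "id": "192.168.0.1",
    "attr": "open_ports",
    "v": [22, 80, 443],
    "t1": "2022-08-01T12:00:00",
    "t2": "2022-08-01T12:10:00",
    "src": "open_ports_module",
}
body = build_payload([dp])
```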
regular
:
{
+ ...
+ "t1": "2022-08-01T12:00:00",
+ "t2": "2022-08-01T12:20:00", // assuming time_step = 5 min
+ "v": {
+ "a": [1, 3, 0, 2]
+ }
+}
+
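For regular timeseries, the relation between t1, t2, the number of samples and time_step can be checked like this (a sketch matching the example above: 4 samples at a 5-minute step give a 20-minute interval):

```python
from datetime import datetime, timedelta

def expected_t2(t1: str, time_step: timedelta, n_samples: int) -> str:
    """t2 of a regular timeseries datapoint: t1 + n_samples * time_step."""
    return (datetime.fromisoformat(t1) + n_samples * time_step).isoformat()

t2 = expected_t2("2022-08-01T12:00:00", timedelta(minutes=5), 4)
# t2 == "2022-08-01T12:20:00"
```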
irregular
: timestamps must always be present
{
+ ...
+ "t1": "2022-08-01T12:00:00",
+ "t2": "2022-08-01T12:05:00",
+ "v": {
+ "time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10", "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
+ "x": [0.5, 0.8, 1.2, 0.7],
+ "y": [-1, 3, 0, 0]
+ }
+}
+
irregular_interval
:
{
+ ...
+ "t1": "2022-08-01T12:00:00",
+ "t2": "2022-08-01T12:05:00",
+ "v": {
+ "time_first": ["2022-08-01T12:00:00", "2022-08-01T12:01:10", "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
+ "time_last": ["2022-08-01T12:01:00", "2022-08-01T12:01:15", "2022-08-01T12:03:00", "2022-08-01T12:03:40"],
+ "x": [0.5, 0.8, 1.2, 0.7],
+ "y": [-1, 3, 0, 0]
+ }
+}
+
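All series in one timeseries datapoint must be aligned -- each series carries exactly one value per sample. A simple sanity check (an illustration, not DP³'s actual validation code):

```python
def series_aligned(v: dict) -> bool:
    """True if every series in the datapoint's "v" object has the same length."""
    return len({len(values) for values in v.values()}) == 1

v = {
    "time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10"],
    "x": [0.5, 0.8],
    "y": [-1, 3],
}
ok = series_aligned(v)  # True
```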
Links can be represented using both plain attributes and observations; the difference is only in the time specification. Two examples using observations:
+no data - link<mac>
: just the eid is sent
{
+ "type": "ip",
+ "id": "192.168.0.1",
+ "attr": "mac_addrs",
+ "v": "AA:AA:AA:AA:AA",
+ "t1": "2022-08-01T12:00:00",
+ "t2": "2022-08-01T12:10:00"
+}
+
with additional data - link<ip, int>
: The eid and the data are sent as a dictionary.
{
+ "type": "ip",
+ "id": "192.168.0.1",
+ "attr": "ip_dep",
+ "v": {"eid": "192.168.0.2", "data": 22},
+ "t1": "2022-08-01T12:00:00",
+ "t2": "2022-08-01T12:10:00"
+}
+
200 OK
:
400 Bad request
:
Returns some validation error message, for example:
+1 validation error for DataPointObservations_some_field
+v -> some_embedded_dict_field
+ field required (type=value_error.missing)
+
Lists the latest snapshots of all eids present in the database under the given entity type.
Contains only the latest snapshot of each entity.
Uses pagination.
+GET /entity/<entity_type>
Optional query parameters:
Gets data of the entity with the given eid.
Contains all snapshots and the master record. Snapshots are ordered by ascending creation time.
+GET /entity/<entity_type>/<entity_id>
Optional query parameters:
+Get attribute value
+Value is either of:
+GET /entity/<entity_type>/<entity_id>/get/<attr_id>
Optional query parameters:
Sets the current value of an attribute.
Internally, this just creates a datapoint for the specified attribute and value.
+This endpoint is meant for editable
plain attributes -- for direct user edit on DP3 web UI.
POST /entity/<entity_type>/<entity_id>/set/<attr_id>
Required request body:
+ +List entities
+Returns dictionary containing all entities configured -- their simplified configuration and current state information.
+GET /entities
{
+ "<entity_id>": {
+ "id": "<entity_id>",
+ "name": "<entity_spec.name>",
+ "attribs": "<MODEL_SPEC.attribs(e_id)>",
+ "eid_estimate_count": "<DB.estimate_count_eids(e_id)>"
+ },
+ ...
+}
+
Execute Action - Sends the given action into execution queue.
+You can see the enabled actions in /config/control.yml
, available are:
make_snapshots
- Makes an out-of-order snapshot of all entitiesGET /control/<action>
DP³ is a generic platform for data processing. It is currently used in systems for management of network devices in CESNET, but during development we focused on making DP³ as universal as possible.
+This page describes the high-level architecture of DP³ and the individual components.
+The base unit of data that DP³ uses is called a data-point, which looks like this:
+{
+ "type": "ip", // (1)!
+ "id": "192.168.0.1", // (2)!
+ "attr": "open_ports", // (3)!
+ "v": [22, 80, 443], // (4)!
+ "t1": "2022-08-01T12:00:00", // (5)!
+ "t2": "2022-08-01T12:10:00",
+ "src": "open_ports_module" // (6)!
+}
+
type
.id
. attr
field specifies the attribute of the data-point.v
field.t1
and t2
field. src
field.

This is an example of an observations data-point (note the validity interval). To learn more about the different types of data-points, please see the API documentation.
+The DP³ architecture as shown in the figure above consists of several components, +where the DP³ provided components are shown in blue:
The application-specific components, shown in yellow-orange, are as follows:
+yml
files determines the entities and their attributes,
+ together with the specifics of platform behavior on these entities.
+ For details of entity configuration, please see the database entities configuration page.The distinction between primary and secondary modules is such that primary modules
+ send data-points into the system using the HTTP API, while secondary modules react
+ to the data present in the system, e.g.: altering the data-flow in an application-specific manner,
+ deriving additional data based on incoming data-points or performing data correlation on entity snapshots.
+ For primary module implementation, the API documentation may be useful,
+ also feel free to check out the dummy_sender script in /scripts/dummy_sender.py
.
+ A comprehensive secondary module API documentation is under construction, for the time being,
+ refer to the CallbackRegistrar code reference or
+ check out the test modules in /modules/
or /tests/modules/
.
The final remaining component is the web interface, which is ultimately application-specific. + A generic web interface, or a set of generic components is a planned part of DP³, but is yet to be implemented. + The API provides a variety of endpoints which should enable you to create any view of the data you may require.
+This section describes the data flow within the platform.
+ +The above figure shows a zoomed in view of the worker-process from the architecture figure.
+Incoming Tasks, which carry data-points from the API,
+are passed to secondary module callbacks configured on new data point, or around entity creation.
+These modules may create additional data points or perform any other action.
+When all registered callbacks are processed, the resulting data is written to two collections:
The data-point (DP) history collection, where the raw data-points are stored until archival,
+and the profile history collection, where a document is stored for each entity id with the relevant history.
+You can find these collections in the database under the names {entity}#raw
and {entity}#master
.
DP³ periodically creates new profile snapshots, triggered by the Scheduler.
+Snapshots take the profile history, and compute the current value of the profile,
+reducing each attribute history to a single value.
+The snapshot creation frequency is configurable.
+Snapshots are created on a per-entity basis, but all linked entities are processed at the same time.
+This means that when snapshots are created, the registered snapshot callbacks
+can access any linked entities for their data correlation needs.
+After all the correlation callbacks are called, the snapshot is written to the profile snapshot collection,
where it can be accessed via the API. The collection is accessible under the name {entity}#snapshots
.
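The per-entity collection names follow a simple pattern, so they can be derived in code when reading the database directly (the helper below is just an illustration):

```python
def collection_name(entity: str, kind: str) -> str:
    """Name of a per-entity collection: {entity}#raw, {entity}#master
    or {entity}#snapshots."""
    if kind not in ("raw", "master", "snapshots"):
        raise ValueError(f"unknown collection kind: {kind}")
    return f"{entity}#{kind}"

name = collection_name("ip", "snapshots")  # "ip#snapshots"
# e.g. with pymongo: db[name].find({"eid": "192.168.0.1"})
```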
File database.yml
specifies mainly MongoDB database connection details and credentials.
It looks like this:
+connection:
+ username: "dp3_user"
+ password: "dp3_password"
+ address: "127.0.0.1"
+ port: 27017
+ db_name: "dp3_database"
+
Connection details contain:
+Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
username |
+string | +dp3 |
+Username for connection to DB. Escaped using urllib.parse.quote_plus . |
+
password |
+string | +dp3 |
+Password for connection to DB. Escaped using urllib.parse.quote_plus . |
+
address |
+string | +localhost |
+IP address or hostname for connection to DB. | +
port |
+int | +27017 | +Listening port of DB. | +
db_name |
+string | +dp3 |
+Database name to be utilized by DP³. | +
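Since the username and password are escaped with urllib.parse.quote_plus, special characters in credentials are handled safely. A sketch of how such a connection URI may be assembled (the exact URI format DP³ builds internally is an assumption):

```python
from urllib.parse import quote_plus

def mongo_uri(cfg: dict) -> str:
    """Build a MongoDB URI from the `connection` section, escaping
    credentials with quote_plus as DP3 does."""
    return "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(cfg["username"]),
        quote_plus(cfg["password"]),
        cfg["address"],
        cfg["port"],
        cfg["db_name"],
    )

uri = mongo_uri({
    "username": "dp3_user",
    "password": "p@ss/word",  # characters that need escaping
    "address": "127.0.0.1",
    "port": 27017,
    "db_name": "dp3_database",
})
# "mongodb://dp3_user:p%40ss%2Fword@127.0.0.1:27017/dp3_database"
```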
Files in db_entities
folder describe entities and their attributes. You can think of an entity as a class in object-oriented programming.
Below is a YAML file (e.g. db_entities/bus.yml
) corresponding to the bus tracking system example from the Data model chapter.
entity:
+ id: bus
+ name: Bus
+attribs:
+ # Attribute `label`
+ label:
+ name: Label
+ description: Custom label for the bus.
+ type: plain
+ data_type: string
+ editable: true
+
+ # Attribute `location`
+ location:
+ name: Location
+ description: Location of the bus at a particular time. Value is GPS \
+ coordinates (array of latitude and longitude).
+ type: observations
+ data_type: array<float>
+ history_params:
+ pre_validity: 1m
+ post_validity: 1m
+ max_age: 30d
+
+ # Attribute `speed`
+ speed:
+ name: Speed
+ description: Speed of the bus in a particular time. In km/h.
+ type: observations
+ data_type: float
+ history_params:
+ pre_validity: 1m
+ post_validity: 1m
+ max_age: 30d
+
+ # Attribute `passengers_in_out`
+ passengers_in_out:
+ name: Passengers in/out
+ description: Number of passengers getting in or out of the bus. Distinguished by the doors used (front, middle, back). Regularly sampled every 10 minutes.
+ type: timeseries
+ timeseries_type: regular
+ timeseries_params:
+ max_age: 14d
+ time_step: 10m
+ series:
+ front_in:
+ data_type: int
+ front_out:
+ data_type: int
+ middle_in:
+ data_type: int
+ middle_out:
+ data_type: int
+ back_in:
+ data_type: int
+ back_out:
+ data_type: int
+
+ # Attribute `driver` to link the driver of the bus at a given time.
+ driver:
+ name: Driver
+ description: Driver of the bus at a given time.
+ type: observations
+ data_type: link<driver>
+ history_params:
+ pre_validity: 1m
+ post_validity: 1m
+ max_age: 30d
+
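A datapoint matching the location attribute defined above could look like this (the eid and src values are hypothetical):

```python
# Observations datapoint for `location` (data_type: array<float>):
location_dp = {
    "type": "bus",
    "id": "bus_1234",          # hypothetical bus eid
    "attr": "location",
    "v": [50.0755, 14.4378],   # [latitude, longitude]
    "t1": "2022-08-01T12:00:00",
    "t2": "2022-08-01T12:00:00",
    "src": "gps_tracker",      # hypothetical data source
}

assert all(isinstance(x, float) for x in location_dp["v"])
```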
An entity is described simply by:
+Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
id |
+string (identifier) | +(mandatory) | +Short string identifying the entity type, it's machine name (must match regex [a-zA-Z_][a-zA-Z0-9_-]* ). Lower-case only is recommended. |
+
name |
+string | +(mandatory) | +Attribute name for humans. May contain any symbols. | +
Each attribute is specified by the following set of parameters:
+These apply to all types of attributes (plain, observations and timeseries).
+Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
id |
+string (identifier) | +(mandatory) | +Short string identifying the attribute, it's machine name (must match this regex [a-zA-Z_][a-zA-Z0-9_-]* ). Lower-case only is recommended. |
+
type |
+string | +(mandatory) | +Type of attribute. Can be either plain , observations or timeseries . |
+
name |
+string | +(mandatory) | +Attribute name for humans. May contain any symbols. | +
description |
+string | +"" |
+Longer description of the attribute, if needed. | +
color |
+#xxxxxx |
+null |
+Color to use in GUI (useful mostly for tag values), not used currently. | +
Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
data_type |
+string | +(mandatory) | +Data type of attribute value, see Supported data types. | +
categories |
+array of strings | +null |
+List of categories if data_type=category and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example). |
+
editable |
+bool | +false |
+Whether value of this attribute is editable via web interface. | +
Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
data_type |
+string | +(mandatory) | +Data type of attribute value, see Supported data types. | +
categories |
+array of strings | +null |
+List of categories if data_type=category and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example). |
+
editable |
+bool | +false |
+Whether value of this attribute is editable via web interface. | +
confidence |
+bool | +false |
+Whether a confidence value should be stored along with data value or not. | +
multi_value |
+bool | +false |
+Whether multiple values can be set at the same time. | +
history_params |
+object, see below | +(mandatory) | +History and time aggregation parameters. A subobject with fields described in the table below. | +
history_force_graph |
+bool | +false |
+By default, if data type of attribute is array, we show it's history on web interface as table. This option can force tag-like graph with comma-joined values of that array as tags. | +
Description of history_params
subobject (see table above).
Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
max_age |
+<int><s/m/h/d> (e.g. 30s , 12h , 7d ) |
+null |
+How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed). | +
max_items |
+int (> 0) | +null |
+How many data-points/intervals to store (oldest ones are removed when limit is exceeded). Currently not implemented. | +
expire_time |
+<int><s/m/h/d> or inf (infinity) |
+infinity | +How long after the end time (t2 ) is the last value considered valid (i.e. is used as "current value"). Zero (0 ) means to strictly follow t1 , t2 . Zero can be specified without a unit (s/m/h/d ). Currently not implemented. |
+
pre_validity |
+<int><s/m/h/d> (e.g. 30s , 12h , 7d ) |
+0s |
+Max time before t1 for which the data-point's value is still considered to be the "current value" if there's no other data-point closer in time. |
+
post_validity |
+<int><s/m/h/d> (e.g. 30s , 12h , 7d ) |
+0s |
+Max time after t2 for which the data-point's value is still considered to be the "current value" if there's no other data-point closer in time. |
+
Note: At least one of max_age
and max_items
SHOULD be defined, otherwise the amount of stored data can grow unbounded.
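The pre_validity/post_validity semantics can be sketched as follows (a simplified illustration that ignores the "no other data-point closer in time" condition):

```python
from datetime import datetime, timedelta

def is_current(t1: datetime, t2: datetime, now: datetime,
               pre_validity: timedelta, post_validity: timedelta) -> bool:
    """A data-point's value still counts as the "current value" up to
    pre_validity before t1 and post_validity after t2."""
    return t1 - pre_validity <= now <= t2 + post_validity

t1 = datetime(2022, 8, 1, 12, 0)
t2 = datetime(2022, 8, 1, 12, 10)
m = timedelta(minutes=1)

inside = is_current(t1, t2, datetime(2022, 8, 1, 12, 10, 30), m, m)   # True
outside = is_current(t1, t2, datetime(2022, 8, 1, 12, 12, 0), m, m)   # False
```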
Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
timeseries_type |
+string | +(mandatory) | +One of: regular , irregular or irregular_intervals . See chapter Data model for explanation. |
+
series |
+object of objects, see below | +(mandatory) | +Configuration of series of data represented by this timeseries attribute. | +
timeseries_params |
+object, see below | ++ | Other timeseries parameters. A subobject with fields described by the table below. | +
Description of series
subobject (see table above).
Key for series
object is id
- short string identifying the series (e.g. bytes
, temperature
, parcels
).
Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
type |
+string | +(mandatory) | +Data type of series. Only int and float are allowed (also time , but that's used internally, see below). |
+
Time series
(axis) is added implicitly by DP³ and this behaviour is specific to selected timeseries_type
:
"time": { "data_type": "time" }
"time": { "data_type": "time" }
"time_first": { "data_type": "time" }, "time_last": { "data_type": "time" }
Description of timeseries_params
subobject (see table above).
Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
max_age |
+<int><s/m/h/d> (e.g. 30s , 12h , 7d ) |
+null |
+How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed). | +
time_step |
+<int><s/m/h/d> (e.g. 30s , 12h , 7d ) |
+(mandatory) for regular timeseries, null otherwise |
+"Sampling rate in time" of this attribute. For example, with time_step = 10m we expect data-point at 12:00, 12:10, 12:20, 12:30,... Only relevant for regular timeseries. |
+
Note: max_age
SHOULD be defined, otherwise the amount of stored data can grow unbounded.
List of supported values for parameter data_type
:
tag
: set/not_set (When the attribute is set, its value is always assumed to be true
, the "v" field doesn't have to be stored.)binary
: true
/false
/not_set (Attribute value is true
or false
, or the attribute is not set at all.)category<data_type; category1, category2, ...>
: Categorical values. Use only when a fixed set of values should be allowed, which should be specified in the second part of the type definition. The first part of the type definition describes the data_type of the category.string
int
: 32-bit signed integer (range from -2147483648 to +2147483647)int64
: 64-bit signed integer (use when the range of normal int
is not sufficient)float
time
: Timestamp in YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [±]HH[:]MM]
format or timestamp since 1.1.1970 in seconds or milliseconds.ip4
: IPv4 address (passed as dotted-decimal string)ip6
: IPv6 address (passed as string in short or full format)mac
: MAC address (passed as string)link<entity_type>
: Link to a record of the specified type, e.g. link<ip>
link<entity_type,data_type>
: Link to a record of the specified type, carrying additional data, e.g. link<ip,int>
array<data_type>
: An array of values of specified data type (which must be one of the types above), e.g. array<int>
set<data_type>
: Same as array, but values can't repeat and order is irrelevant.dict<keys>
: Dictionary (object) containing multiple values as subkeys. keys should contain a comma-separated list of key names and types separated by colon, e.g. dict<port:int,protocol:string,tag?:string>
. By default, all fields are mandatory (i.e. a data-point missing some subkey will be refused), to mark a field as optional, put ?
after its name. Only the following data types can be used here: binary,category,string,int,float,time,ip4,ip6,mac
. Multi-level dicts are not supported.json
: Any JSON object can be stored; all processing is handled by the user's code. This is here for special cases which can't be mapped to any data type above.

Event logging is done using Redis and allows counting arbitrary events across multiple processes (using shared counters in Redis) and in various time intervals.
+More information can be found in Github repository of EventCountLogger.
+Configuration file event_logging.yml
looks like this:
redis:
+ host: localhost
+ port: 6379
+ db: 1
+
+groups:
+ # Main events of Task execution
+ te:
+ events:
+ - task_processed
+ - task_processing_error
+ intervals: [ "5m", "2h" ] # (1)!
+ sync-interval: 1 # (2)!
+ # Number of processed tasks by their "src" attribute
+ tasks_by_src:
+ events: [ ]
+ auto_declare_events: true
+ intervals: [ "5s", "5m" ]
+ sync-interval: 1
+
This section describes Redis connection details:
+Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
host |
+string | +localhost |
+IP address or hostname for connection to Redis. | +
port |
+int | +6379 | +Listening port of Redis. | +
db |
+int | +0 | +Index of Redis DB used for the counters (it shouldn't be used for anything else). | +
The default configuration groups enable logging of events in task execution, namely
+task_processed
and task_processing_error
.
To learn more about the group configuration for EventCountLogger, +please refer to the official documentation.
+ + + + + + +History manager is reponsible for deleting old records from master records in database.
+Configuration file history_manager.yml
is very simple:
Parameter tick_rate
sets interval how often (in minutes) should DP³ check if any data in master record of observations and timeseries attributes isn't too old and if there's something too old, removes it. To control what is considered as "too old", see parameter max_age
in Database entities configuration.
DP³ configuration folder consists of these files and folders:
+db_entities/
+modules/
+common.yml
+database.yml
+event_logging.yml
+history_manager.yml
+processing_core.yml
+snapshots.yml
+
Their meaning and usage is explained in following chapters.
+Example configuration is included config/
folder in DP³ repository.
Folder modules/
optionally contains any module-specific configuration.
This configuration doesn't have to follow any required format (except being YAML files).
+In secondary modules, you can access the configuration:
+ +Here, the MODULE_NAME
corresponds to MODULE_NAME.yml
file in modules/
folder.
Processing core's configuration in processing_core.yml
file looks like this:
msg_broker:
+ host: localhost
+ port: 5672
+ virtual_host: /
+ username: dp3_user
+ password: dp3_password
+worker_processes: 2
+worker_threads: 16
+modules_dir: "../dp3_modules"
+enabled_modules:
+ - "module_one"
+ - "module_two"
+
The message broker section describes connection details for a RabbitMQ (or compatible) broker.
+Parameter | +Data-type | +Default value | +Description | +
---|---|---|---|
host |
+string | +localhost |
+IP address or hostname for connection to broker. | +
port |
+int | +5672 | +Listening port of broker. | +
virtual_host |
+string | +/ |
+Virtual host for connection to broker. | +
username |
+string | +guest |
+Username for connection to broker. | +
password |
+string | +guest |
+Password for connection to broker. | +
Number of worker processes. This has to be at least 1.
If the number of worker processes is changed, the following procedure must be followed:
+/scripts/rmq_reconfigure.sh
supervisorctl
) and start all inputs againNumber of worker threads per process.
+This may be higher than number of CPUs, because this is not primarily intended +to utilize computational power of multiple CPUs (which Python cannot do well +anyway due to the GIL), but to mask long I/O operations (e.g. queries to +external services via network).
+Path to directory with plug-in (secondary) modules.
+Relative path is evaluated relative to location of this configuration file.
+List of plug-in modules which should be enabled in processing pipeline.
+Name of module filename without .py
extension must be used!
Snapshots configuration is straightforward. Currently, it only sets creation_rate
- period in minutes for creating new snapshots (30 minutes by default).
File snapshots.yml
looks like this:
Basic elements of the DP³ data model are entities (or objects), each entity +record (object instance) has a set of attributes. +Each attribute has some value (associated to a particular entity), +timestamp (history of previous values can be stored) +and optionally confidence value.
+Entities may be mutually connected. See Relationships below.
+In this chapter, we will illustrate details on an exemplary system. Imagine you +are developing data model for bus tracking system. You have to store these data:
+Also, map displaying current position of all buses is required.
+(In case you are interested, configuration of database entities for this system +is available in DB entities chapter.)
+To make everything clear and more readable, all example references below are +typesetted as quotes.
+There are 3 types of attributes:
+Common attributes with only one value of some data type. +There's no history stored, but timestamp of last change is available.
+Very useful for:
+data from external source, when you only need to have current value
+notes and other manually entered information
+++This is exactly what we need for label in our bus tracking system. +Administor labels particular bus inside web interface and we use this label +until it's changed - particularly display label next to a marker on a map. +No history is needed and it has 100% confidence.
+
Attributes with history of values at some time or interval of time. +Consequently, we can derive value at any time (most often not now) from these values.
+Each value may have associated confidence.
+These attributes may be single or multi value (multiple current values in one point in time).
+Very useful for data where both current value and history is needed.
+++In our example, location is great use-case for observations type. +We need to track position of the bus in time and store the history. Current +location is very important. Let's suppose, we also need to do oversampling by +predicting where is the bus now, eventhout we received last data-point 2 minutes +ago. This is all possible (predictions using custom secondary modules).
+The same applies to speed. It can also be derived from location.
+
One or more numeric values for a particular time.
+In this attribute type: history > current value. +In fact, no explicit current value is provided.
+Very useful for:
+any kind of history-based analysis
+logging of events/changes
+May be:
+regular: sampling is regular
+ Example: datapoint is created every x minutes
irregular: sampling is irregular
+ Example: datapoint is created when some event occurs
irregular intervals: sampling is irregular and includes two timestamps (from when till when were provided data gathered)
+ Example: Some event triggers 5 minute monitoring routine. When this routine finishes, it creates datapoint containing all the data from past 5 minutes.
++Timeseries are very useful for passengers getting in and out (from our example). +As we need to count two directions (in/out) for three doors (front/middle/back), +we create 6 series (e.g.
+front_in
,front_out
, ...,back_out
). +Counter data-points are received in 10 minute interval, so regular timeseries +are best fit for this use-case. +Every 10 minutes we receive values for all 6 series and store them. +Current value is not important as these data are only useful for passenger +flow analysis throught whole month/year/...
Relationships between entities can be represented with or without history. +They are realized using the link attribute type. +Depedning on whether the history is important, they can be configured using as the mentioned +plain data or observations.
+Relationships can contain additional data, if that fits the modelling needs of your use case.
+Very useful for:
+++As our example so far contains only one entity, we currently have no need for relationships. +However, if we wanted to track the different bus drivers driving individual buses, +relationships would come in quite handy. +The bus driver is a separate entity, and can drive multiple buses during the day. +The current bus driver will be represented as an observation link between the bus and the driver, +as can be seen in the resulting configuration.
+
Now that you have an understanding of the data model and the types of attributes, +you might want to check out the details of DB configuration, +where you will find the parameters for each attribute type +and the data types supported by the platform.
+ + + + + + +This page provides the basic info on where to start with writing documentation. +If you feel lost at any point, please check out the documentation of MkDocs +and Material for MkDocs, with which this documentation is built.
+mkdocs.yml # The configuration file.
+docs/
+ index.md # The documentation homepage.
+ gen_ref_pages.py # Script for generating the code reference.
+ ... # Other markdown pages, images and other files.
+
The docs/
folder contains all source Markdown files for the documentation.
You can find all documentation settings in mkdocs.yml
. See the nav
section for mapping of the left navigation tab and the Markdown files.
To see the changes made to the documentation page locally, a local instance of mkdocs
is required.
+You can install all the required packages using:
After installing, you can use the following mkdocs
commands:
mkdocs serve
- Start the live-reloading docs server.mkdocs build
- Build the documentation site.mkdocs -h
- Print help message and exit.As the entire documentation is written in Markdown, all base Markdown syntax is supported. This means headings, bold text, italics, inline code
, tables and many more.
This set of options can be further extended, if you ever find the need. See the possibilities in the Material theme reference.
+markdown_extensions
section in mkdocs.yml
for all enabled extensions.To reference an anchor within a page, such as a heading, use a Markdown link to the specific anchor, for example: Commands.
+If you're not sure which identifier to use, you can look at a heading's anchor by clicking the heading in your Web browser, either in the text itself, or in the table of contents.
+If the URL is https://example.com/some/page/#anchor-name
then you know that this item is possible to link to with [<displayed text>](#anchor-name)
. (Tip taken from mkdocstrings)
To make a reference to another page within the documentation, use the path to the Markdown source file, followed by the desired anchor. For example, this link was created as [link](index.md#repository-structure)
.
When making references to the generated Code Reference, there are two options. Links can be made either using the standard Markdown syntax, where some reverse-engineering of the generated files is required, or, with the support of mkdocstrings, using the [example][full.path.to.object]
syntax. A real link like this can be for example this one to the Platform Model Specification.
Code reference is generated using mkdocstrings and the Automatic code reference pages recipe from their documentation.
+The generation of pages is done using the docs/gen_ref_pages.py
script. The script is a slight modification of what is recommended within the mentioned recipe.
Mkdocstrings itself enables generating code documentation from its docstrings using a path.to.object
syntax.
+Here is an example of documentation for dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.register
method:
register(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)
+
Registers passed timeseries hook to be called during snapshot creation.
+Binds hook to specified entity_type and attr_type (though same hook can be bound +multiple times). +If entity_type and attr_type do not specify a valid timeseries attribute, +a ValueError is raised.
+ +Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hook` | `Callable[[str, str, list[dict]], list[DataPointTask]]` | | *required* |
| `entity_type` | `str` | specifies entity type | *required* |
| `attr_type` | `str` | specifies attribute type | *required* |
There are additional options that can be specified, which affect the way the documentation is presented. For more on these options, see here.
+Even if you create a duplicate code reference description, the mkdocstring-style link still leads to the code reference, as you can see here.
+The documentation is updated and deployed automatically with each push to selected branches thanks to the configured GitHub Action, which can be found in: .github/workflows/deploy.yml
.
DP³ is a platform that helps to keep a database of information (attributes) about individual entities (designed for IP addresses and other network identifiers, but they may be anything), where the data constantly changes in time.
DP³ doesn't do much by itself; it must be supplemented by application-specific modules that provide and process data.
It is the basis of CESNET's "Asset Discovery Classification and Tagging" (ADiCT) project, focused on discovery and classification of network devices, but the platform itself is general and should be usable for any kind of data.
For an introduction to how it works, please check out the architecture, data-model and database config pages.
+Then you should be able to create a DP³ app using the provided setup utility as described in the install page and start tinkering!
+dp3
- Python package containing code of the processing core and the APIconfig
- default/example configurationinstall
- deployment configurationWhen talking about installing the DP³ platform, a distinction must be made between installing +for platform development, installing for application development (i.e. platform usage) +and installing for application and platform deployment. +We will cover all three cases separately.
+Pre-requisites: Python 3.9 or higher, pip
(with virtualenv
installed), git
, Docker
and Docker Compose
.
Create a virtualenv and install the DP³ platform using:
python3 -m venv venv                 # (1)!
source venv/bin/activate             # (2)!
python -m pip install --upgrade pip  # (3)!
pip install git+https://github.com/CESNET/dp3.git@new_dp3#egg=dp3
python3
does not work, try py -3
or python
instead.venv/Scripts/activate.bat
pip>=21.0.1
for the pyproject.toml
support.
+ If your pip is up-to-date, you can skip this step.To create a new DP³ application we will use the included dp3-setup
utility. Run:
So for example, to create an application called my_app
in the current directory, run:
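Assuming the setup utility is exposed as a `dp3 setup` subcommand taking a target directory and an application name (this invocation is a hypothetical sketch — verify the exact form with `dp3 --help`):

```shell
# Hypothetical invocation - check `dp3 --help` for the exact subcommand and arguments.
dp3 setup . my_app
```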
This produces the following directory structure: +
📂 .
├── 📁 config # (1)!
│   ├── 📄 api.yml
│   ├── 📄 control.yml
│   ├── 📄 database.yml
│   ├── 📁 db_entities # (2)!
│   ├── 📄 event_logging.yml
│   ├── 📄 history_manager.yml
│   ├── 📁 modules # (3)!
│   ├── 📄 processing_core.yml
│   ├── 📄 README.md
│   └── 📄 snapshots.yml
├── 📁 docker # (4)!
│   ├── 📁 python
│   └── 📁 rabbitmq
├── 📄 docker-compose.app.yml
├── 📄 docker-compose.yml
├── 📁 modules # (5)!
│   └── 📄 test_module.py
├── 📄 README.md # (6)!
└── 📄 requirements.txt
config
directory contains the configuration files for the DP³ platform. For more details,
+ please check out the configuration documentation.config/db_entities
directory contains the database entities of the application.
+ This defines the data model of your application.
+ For more details, you may want to check out the data model and the
+ DB entities documentation.config/modules
directory is where you can place the configuration specific to your modules.docker
directory contains the Dockerfiles for the RabbitMQ and python images,
+ tailored to your application. modules
directory contains the modules of your application. To get started,
+ a single module called test_module
is included.
+ For more details, please check out the Modules page.README.md
file contains some instructions to get started.
Edit it to your liking.

To run the application, we first need to set up the other services the platform depends on,
+such as the MongoDB database, the RabbitMQ message distribution and the Redis database.
+This can be done using the supplied docker-compose.yml
file. Simply run:
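Presumably the standard Docker Compose invocation, with the flags discussed below:

```shell
docker compose up -d --build
```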
The -d flag runs the services in the background, so you can continue working in the same terminal. The --build flag forces Docker to rebuild the images, so you can be sure you are running the latest version. If you want to run the services in the foreground, omit the -d flag.

The state of running containers can be checked using:
+ +which will display the state of running processes. The logs of the services can be displayed using:
+ +which will display the logs of all services, or:
+ +which will display only the logs of the given service. + (In this case, the services are rabbitmq, mongo, mongo_express, and redis)
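The status and log commands referenced above are, presumably, the standard Docker Compose ones:

```shell
docker compose ps               # state of running containers
docker compose logs             # logs of all services
docker compose logs rabbitmq    # logs of a single service
```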
+We can now focus on running the platform and developing or testing. After you are done, simply run:
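The teardown is presumably the standard Compose command (the `-v` flag also removes the volumes, matching the description below):

```shell
docker compose down -v
```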
+ +which will stop and remove all containers, networks and volumes created by docker compose up
.
There are two main ways to run the application itself. The first is a little more hands-on and allows easier debugging. There are two main kinds of processes in the application: the API and the worker processes.
+To run the API, simply run:
+ +The starting configuration sets only a single worker process, which you can run using:
+ +The second way is to use the docker-compose.app.yml
file, which runs the API and the worker processes
+in separate containers. To run the API, simply run:
Either way, to test that everything is running properly, you can run: +
+Which should return a JSON response with the following content: +
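Assuming the API is exposed on port 5000 (as in the supplied compose file), the health check and its expected response are:

```shell
curl http://localhost:5000/
# → {"detail": "It works!"}
```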
+You are now ready to start developing your application!
+Pre-requisites: Python 3.9 or higher, pip
(with virtualenv
installed), git
, Docker
and Docker Compose
.
Pull the repository and install using:
git clone --branch new_dp3 git@github.com:CESNET/dp3.git dp3
cd dp3
python3 -m venv venv                 # (1)!
source venv/bin/activate             # (2)!
python -m pip install --upgrade pip  # (3)!
pip install --editable ".[dev]"      # (4)!
pre-commit install                   # (5)!
python3
does not work, try py -3
or python
instead.venv/Scripts/activate.bat
pip>=21.0.1
for the pyproject.toml
support.
+ If your pip is up-to-date, you can skip this step.pre-commit
and mkdocs
.pre-commit
hooks to automatically format and lint the code before committing.With the dependencies, the pre-commit package is installed.
+You can verify the installation using pre-commit --version
.
+Pre-commit is used to automatically unify code formatting and perform code linting.
+The hooks configured in .pre-commit-config.yaml
should now run automatically on every commit.
In case you want to make sure, you can run pre-commit run --all-files
to see it in action.
The DP³ platform is now installed and ready for development.
+To run it, we first need to set up the other services the platform depends on,
+such as the MongoDB database, the RabbitMQ message distribution and the Redis database.
+This can be done using the supplied docker-compose.yml
file. Simply run:
-d
flag runs the services in the background, so you can continue working in the same terminal.
+ The --build
flag forces Docker to rebuild the images, so you can be sure you are running the latest version.
+ If you want to run the services in the foreground, omit the -d
flag.Docker Compose can be installed as a standalone (older v1) or as a plugin (v2), +the only difference is when executing the command:
+++Note that Compose standalone uses the dash compose syntax instead of current’s standard syntax (space compose). +For example: type
+docker-compose up
when using Compose standalone, instead ofdocker compose up
.
This documentation uses the v2 syntax, so if you have the standalone version installed, adjust accordingly.
+After the first compose up
command, the images for RabbitMQ, MongoDB and Redis will be downloaded,
+their images will be built according to the configuration and all three services will be started.
+On subsequent runs, Docker will use the cache, so if the configuration does not change, the download
+and build steps will not be repeated.
The configuration is taken implicitly from the docker-compose.yml
file in the current directory.
+The docker-compose.yml
configuration contains the configuration for the services,
+as well as a testing setup of the DP³ platform itself.
+The full configuration is in tests/test_config
.
+The setup includes one worker process and one API process to handle requests.
+The API process is exposed on port 5000, so you can send requests to it using curl
or from your browser:
curl -X 'POST' 'http://localhost:5000/datapoints' \
  -H 'Content-Type: application/json' \
  --data '[{"type": "test_entity_type", "id": "abc", "attr": "test_attr_int", "v": 123, "t1": "2023-07-01T12:00:00", "t2": "2023-07-01T13:00:00"}]'
+
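The same request can be made from Python using only the standard library. This is an illustrative sketch — the host, port and payload mirror the curl call above; adjust them to your deployment:

```python
import json
from urllib import request

DP3_API = "http://localhost:5000"  # assumed local deployment, as in the curl example

datapoint = {
    "type": "test_entity_type",
    "id": "abc",
    "attr": "test_attr_int",
    "v": 123,
    "t1": "2023-07-01T12:00:00",
    "t2": "2023-07-01T13:00:00",
}

def post_datapoints(datapoints: list) -> bytes:
    """POST a JSON array of datapoints, mirroring the curl call above."""
    req = request.Request(
        f"{DP3_API}/datapoints",
        data=json.dumps(datapoints).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:  # requires a running API
        return resp.read()

if __name__ == "__main__":
    print(post_datapoints([datapoint]))
```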
The state of running containers can be checked using:
+ +which will display the state of running processes. The logs of the services can be displayed using:
+ +which will display the logs of all services, or:
+ +which will display only the logs of the given service. + (In this case, the services are rabbitmq, mongo, redis, receiver_api and worker)
+We can now focus on running the platform and developing or testing. After you are done, simply run:
+ +which will stop and remove all containers, networks and volumes created by docker compose up
.
With the testing platform setup running, we can now run tests.
+Tests are run using the unittest
framework and can be run using:
python -m unittest discover \
  -s tests/test_common \
  -v

CONF_DIR=tests/test_config \
python -m unittest discover \
  -s tests/test_api \
  -v
To extend this documentation, please refer to the Extending page.
+ + + + + + +DP³ enables its users to create custom modules to perform application specific data analysis. +Modules are loaded using a plugin-like architecture and can influence the data flow from the +very first moment upon handling the data-point push request.
+As described in the Architecture page, DP³ uses a categorization of modules +into primary and secondary modules. +The distinction between primary and secondary modules is such that primary modules +send data-points into the system using the HTTP API, while secondary modules react +to the data present in the system, e.g.: altering the data-flow in an application-specific manner, +deriving additional data based on incoming data-points or performing data correlation on entity snapshots.
+This page covers the DP³ API for secondary modules,
+for primary module implementation, the API documentation may be useful,
+also feel free to check out the dummy_sender script in /scripts/dummy_sender.py
.
First, make a directory that will contain all modules of the application.
+For example, let's assume that the directory will be called /modules/
.
As mentioned in the Processing core configuration page,
+the modules directory must be specified in the modules_dir
configuration option.
+Let's create the main module file now - assuming the module will be called my_awesome_module
,
+create a file /modules/my_awesome_module.py
.
Finally, to make the processing core load the module, add the module name to the enabled_modules
+configuration option, e.g.:
modules_dir: "/modules/"
+enabled_modules:
+ - "my_awesome_module"
+
Here is a basic skeleton for the module file:
+import logging
+
+from dp3.common.base_module import BaseModule
+from dp3.common.config import PlatformConfig
+from dp3.common.callback_registrar import CallbackRegistrar
+
+
+class MyAwesomeModule(BaseModule):
+ def __init__(self,
+ _platform_config: PlatformConfig,
+ _module_config: dict,
+ _registrar: CallbackRegistrar
+ ):
+ self.log = logging.getLogger("MyAwesomeModule")
+
All modules must subclass the BaseModule
class.
+If a class does not subclass the BaseModule
class,
+it will not be loaded and activated by the main DP³ worker.
+The declaration of BaseModule
is as follows:
class BaseModule(ABC):

    @abstractmethod
    def __init__(
        self,
        platform_config: PlatformConfig,
        module_config: dict,
        registrar: CallbackRegistrar
    ):
        pass
At initialization, each module receives a PlatformConfig
,
+a module_config
dictionary and a
+CallbackRegistrar
.
+For the module to do anything, it must read the provided configuration from platform_config
and
+module_config
and register callbacks to perform data analysis using the registrar
object.
+Let's go through them one at a time.
PlatformConfig
contains the entire DP³ platform configuration,
+which includes the application name, worker counts, which worker processes is the module running in
+and a ModelSpec
which contains the entity specification.
If you want to create configuration specific to the module itself, create a .yml
configuration file
+named as the module itself inside the modules/
folder,
+as described in the modules configuration page.
+This configuration will be then loaded into the module_config
dictionary for convenience.
The registrar:
CallbackRegistrar
object
+provides the API to register callbacks to be called during the data processing.
For callbacks that need to be called periodically,
+the scheduler_register
+is used.
+The specific times the callback will be called are defined using the CRON schedule expressions.
+Here is a simplified example from the HistoryManager module:
registrar.scheduler_register(
    self.delete_old_dps, minute="*/10"  # (1)!
)
registrar.scheduler_register(
    self.archive_old_dps, minute=0, hour=2  # (2)!
)
By default, the callback will receive no arguments, but you can pass static arguments for every call
+using the func_args
and func_kwargs
keyword arguments.
+The function return value will always be ignored.
The complete documentation can be found at the
+scheduler_register
page.
+As DP³ utilizes the APScheduler package internally
+to realize this functionality, specifically the CronTrigger
, feel free to check their documentation for more details.
There are a number of possible places to register callback functions during data-point processing.
+on_task_start
hook¶A hook will be called on task processing start.
+The callback is registered using the
+register_task_hook
method.
+Required signature is Callable[[DataPointTask], Any]
, as the return value is ignored.
+It may be useful for implementing custom statistics.
def task_hook(task: DataPointTask):
+ print(task.etype)
+
+registrar.register_task_hook("on_task_start", task_hook)
+
allow_entity_creation
hook¶Receives eid and Task, may prevent entity record creation (by returning False).
+The callback is registered using the
+register_entity_hook
method.
+Required signature is Callable[[str, DataPointTask], bool]
.
def entity_creation(eid: str, task: DataPointTask) -> bool:
+ return eid.startswith("1")
+
+registrar.register_entity_hook(
+ "allow_entity_creation", entity_creation, "test_entity_type"
+)
+
on_entity_creation
hook¶Receives eid and Task, may return new DataPointTasks.
+The callback is registered using the
+register_entity_hook
method.
+Required signature is Callable[[str, DataPointTask], list[DataPointTask]]
.
def processing_function(eid: str, task: DataPointTask) -> list[DataPointTask]:
+ output = does_work(task)
+ return [DataPointTask(
+ model_spec=task.model_spec,
+ etype="mac",
+ eid=eid,
+ data_points=[{
+ "etype": "test_enitity_type",
+ "eid": eid,
+ "attr": "derived_on_creation",
+ "src": "secondary/derived_on_creation",
+ "v": output
+ }]
+ )]
+
+registrar.register_entity_hook(
+ "on_entity_creation", processing_function, "test_entity_type"
+)
+
There are register points for all attribute types:
+on_new_plain
, on_new_observation
, on_new_ts_chunk
.
Callbacks are registered using the
+register_attr_hook
method.
+The callback allways receives eid, attribute and Task, and may return new DataPointTasks.
+The required signature is Callable[[str, DataPointBase], list[DataPointTask]]
.
def attr_hook(eid: str, dp: DataPointBase) -> list[DataPointTask]:
+ ...
+ return []
+
+registrar.register_attr_hook(
+ "on_new_observation", attr_hook, "test_entity_type", "test_attr_type",
+)
+
Timeseries hooks are run before snapshot creation, and allow to process the accumulated +timeseries data into observations / plain attributes to be accessed in snapshots.
+Callbacks are registered using the
+register_timeseries_hook
method.
+The expected callback signature is Callable[[str, str, list[dict]], list[DataPointTask]]
,
+as the callback should expect entity_type, attr_type and attribute history as arguments
+and return a list of DataPointTask objects.
def timeseries_hook(
+ entity_type: str, attr_type: str, attr_history: list[dict]
+) -> list[DataPointTask]:
+ ...
+ return []
+
+
+registrar.register_timeseries_hook(
+ timeseries_hook, "test_entity_type", "test_attr_type",
+)
+
Correlation callbacks are called during snapshot creation, and allow to perform analysis +on the data of the snapshot.
+The register_correlation_hook
+method expects a callable with the following signature:
+Callable[[str, dict], None]
, where the first argument is the entity type, and the second is a dict
+containing the current values of the entity and its linked entities.
As correlation hooks can depend on each other, the hook inputs and outputs must be specified
+using the depends_on and may_change arguments. Both arguments are lists of lists of strings,
+where each list of strings is a path from the specified entity type to individual attributes (even on linked entities).
+For example, if the entity type is test_entity_type
, and the hook depends on the attribute test_attr_type1
,
+the path is simply [["test_attr_type1"]]
. If the hook depends on the attribute test_attr_type1
+of an entity linked using test_attr_link
, the path will be [["test_attr_link", "test_attr_type1"]]
.
def correlation_hook(entity_type: str, values: dict):
+ ...
+
+registrar.register_correlation_hook(
+ correlation_hook, "test_entity_type", [["test_attr_type1"]], [["test_attr_type2"]]
+)
+
The order of running callbacks is determined automatically, based on the dependencies.
+If there is a cycle in the dependencies, a ValueError
will be raised at registration.
+Also, if the provided dependency / output paths are invalid, a ValueError
will be raised.
The module is free to run its own code in separate threads or processes.
+To synchronize such code with the platform, use the start()
and stop()
+methods of the BaseModule
class.
+the start()
method is called after the platform is initialized, and the stop()
method
+is called before the platform is shut down.
class MyModule(BaseModule):
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+ self._thread = None
+ self._stop_event = threading.Event()
+ self.log = logging.getLogger("MyModule")
+
+ def start(self):
+ self._thread = threading.Thread(target=self._run, daemon=True)
+ self._thread.start()
+
+ def stop(self):
+ self._stop_event.set()
+ self._thread.join()
+
+ def _run(self):
+ while not self._stop_event.is_set():
+ self.log.info("Hello world!")
+ time.sleep(1)
+
Datapoint logger
+Logs good/bad datapoints into file for further analysis. +They are logged in JSON format. +Bad datapoints are logged together with their error message.
+Logging may be disabled in api.yml
configuration file:
dp3/api/internal/dp_logger.py
Creates new logger instance with log_file
as target
dp3/api/internal/dp_logger.py
Logs good datapoints
+Datapoints are logged one-by-one in processed form. +Source should be IP address of incomping request.
+ +dp3/api/internal/dp_logger.py
Logs bad datapoints including the validation error message
+Whole request body is logged at once (JSON string is expected). +Source should be IP address of incomping request.
+ +dp3/api/internal/dp_logger.py
+ Bases: BaseModel
Entity specification and current state
+Merges (some) data from DP3's EntitySpec
and state information from Database
.
+Provides estimate count of master records in database.
+ Bases: BaseModel
List of entity eids and their data based on latest snapshot
+Includes timestamp of latest snapshot creation.
+Data does not include history of observations attributes and timeseries.
+ + + + + +
+ Bases: BaseModel
Data of entity eid
+Includes all snapshots and master record.
+empty
signalizes whether this eid includes any data.
+ Bases: BaseModel
Value and/or history of entity attribute for given eid
+Depends on attribute type: +- plain: just (current) value +- observations: (current) value and history stored in master record (optionally filtered) +- timeseries: just history stored in master record (optionally filtered)
+ + + + + +
+ Bases: BaseModel
Value of entity attribute for given eid
+The value is fetched from master record.
+ + + + + +Converts API datapoint values to DP3 datapoint
+If etype-attr pair doesn't exist in DP3 config, raises ValueError
.
+If values are not valid, raises pydantic's ValidationError.
dp3/api/internal/helpers.py
+ Bases: BaseModel
Data-point for API
+Contains single raw data value received on API. +This is generic class for plain, observation and timeseries datapoints.
+Provides front line of validation for this data value.
+This differs slightly compared to DataPoint
from DP3 in naming of attributes due to historic
+reasons.
After validation of this schema, datapoint is validated using attribute-specific validator to +ensure full compilance.
+ + + + + +
+ Bases: BaseModel
Healthcheck endpoint response
+ + + + + +
+ Bases: BaseModel
Generic success response
+ + + + + +
+ Bases: HTTPException
HTTP exception wrapper to simplify path and query validation
+ + +dp3/api/internal/response_models.py
async
+
+
+¶Middleware to check entity existence
+ + +async
+
+
+¶List latest snapshots of all id
s present in database under entity
.
Contains only latest snapshot.
+Uses pagination.
+ +dp3/api/routers/entity.py
async
+
+
+¶get_eid_data(entity: str, eid: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None) -> EntityEidData
+
Get data of entity
's eid
.
Contains all snapshots and master record. +Snapshots are ordered by ascending creation time.
+ +dp3/api/routers/entity.py
async
+
+
+¶get_eid_attr_value(entity: str, eid: str, attr: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None) -> EntityEidAttrValueOrHistory
+
Get attribute value
+Value is either of: +- current value: in case of plain attribute +- current value and history: in case of observation attribute +- history: in case of timeseries attribute
+ +dp3/api/routers/entity.py
async
+
+
+¶set_eid_attr_value(entity: str, eid: str, attr: str, body: EntityEidAttrValue, request: Request) -> SuccessResponse
+
Set current value of attribute
+Internally just creates datapoint for specified attribute and value.
+This endpoint is meant for editable
plain attributes -- for direct user edit on DP3 web UI.
dp3/api/routers/entity.py
async
+
+
+¶Health check
+Returns simple 'It works!' response.
+ + +async
+
+
+¶Insert datapoints
+Validates and pushes datapoints into task queue, so they are processed by one of DP3 workers.
+ +dp3/api/routers/root.py
async
+
+
+¶List entities
+Returns dictionary containing all entities configured -- their simplified configuration +and current state information.
+ +dp3/api/routers/root.py
Run the DP3 API using uvicorn.
+ + + +DP3 Setup Script for creating a DP3 application.
+ + + +Replace all occurrences of template
with the given text.
dp3/bin/setup.py
+ Bases: Flag
Enum of attribute types
+PLAIN
= 1
+OBSERVATIONS
= 2
+TIMESERIES
= 4
classmethod
+
+
+¶Convert string representation like "plain" to AttrType.
+ +dp3/common/attrspec.py
+ Bases: BaseModel
History parameters field of observations attribute
+ + + + + +
+ Bases: BaseModel
Timeseries parameters field of timeseries attribute
+ + + + + +
+ Bases: BaseModel
Series of timeseries attribute
+ + + + + +
+ Bases: BaseModel
Base of attribute specification
+Parent of other AttrSpec
classes.
+ Bases: AttrSpecGeneric
Parent of non-timeseries AttrSpec
classes.
+ Bases: AttrSpecClassic
Plain attribute specification
+ + +dp3/common/attrspec.py
+ Bases: AttrSpecClassic
Observations attribute specification
+ + +dp3/common/attrspec.py
+ Bases: AttrSpecGeneric
Timeseries attribute specification
+ + +dp3/common/attrspec.py
Factory for AttrSpec
classes
dp3/common/attrspec.py
+ Bases: ABC
Abstract class for platform modules. +Every module must inherit this abstract class for automatic loading of module!
+ + +Initialize the module and register callbacks.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
platform_config |
+
+ PlatformConfig
+ |
+
+
+
+ Platform configuration class + |
+ + required + | +
module_config |
+
+ dict
+ |
+
+
+
+ Configuration of the module,
+equivalent of |
+ + required + | +
registrar |
+
+ CallbackRegistrar
+ |
+
+
+
+ A callback / hook registration interface + |
+ + required + | +
dp3/common/base_module.py
Run the module - used to run own thread if needed.
+Called after initialization, may be used to create and run a separate +thread if needed by the module. Do nothing unless overridden.
+ +dp3/common/base_module.py
Stop the module - used to stop own thread.
+Called before program exit, may be used to finalize and stop the +separate thread if it is used. Do nothing unless overridden.
+ +dp3/common/base_module.py
Interface for callback registration.
+ + +dp3/common/callback_registrar.py
scheduler_register(func: Callable, *, func_args: Union[list, tuple] = None, func_kwargs: dict = None, year: Union[int, str] = None, month: Union[int, str] = None, day: Union[int, str] = None, week: Union[int, str] = None, day_of_week: Union[int, str] = None, hour: Union[int, str] = None, minute: Union[int, str] = None, second: Union[int, str] = None, timezone: str = 'UTC') -> int
+
Register a function to be run at specified times.
+Pass cron-like specification of when the function should be called, +see docs +of apscheduler.triggers.cron for details. +`
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
func |
+
+ Callable
+ |
+
+
+
+ function or method to be called + |
+ + required + | +
func_args |
+
+ Union[list, tuple]
+ |
+
+
+
+ list of positional arguments to call func with + |
+
+ None
+ |
+
func_kwargs |
+
+ dict
+ |
+
+
+
+ dict of keyword arguments to call func with + |
+
+ None
+ |
+
year |
+
+ Union[int, str]
+ |
+
+
+
+ 4-digit year + |
+
+ None
+ |
+
month |
+
+ Union[int, str]
+ |
+
+
+
+ month (1-12) + |
+
+ None
+ |
+
day |
+
+ Union[int, str]
+ |
+
+
+
+ day of month (1-31) + |
+
+ None
+ |
+
week |
+
+ Union[int, str]
+ |
+
+
+
+ ISO week (1-53) + |
+
+ None
+ |
+
day_of_week |
+
+ Union[int, str]
+ |
+
+
+
+ number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun) + |
+
+ None
+ |
+
hour |
+
+ Union[int, str]
+ |
+
+
+
+ hour (0-23) + |
+
+ None
+ |
+
minute |
+
+ Union[int, str]
+ |
+
+
+
+ minute (0-59) + |
+
+ None
+ |
+
second |
+
+ Union[int, str]
+ |
+
+
+
+ second (0-59) + |
+
+ None
+ |
+
timezone |
+
+ str
+ |
+
+
+
+ Timezone for time specification (default is UTC). + |
+
+ 'UTC'
+ |
+
Returns:
| Type | Description |
|---|---|
| `int` | job ID |
dp3/common/callback_registrar.py
Registers one of available task hooks
+See: TaskGenericHooksContainer
+in task_hooks.py
dp3/common/callback_registrar.py
Registers one of available task entity hooks
+See: TaskEntityHooksContainer
+in task_hooks.py
dp3/common/callback_registrar.py
Registers one of available task attribute hooks
+See: TaskAttrHooksContainer
+in task_hooks.py
dp3/common/callback_registrar.py
register_timeseries_hook(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)
+
Registers passed timeseries hook to be called during snapshot creation.
Binds hook to specified entity_type and attr_type (though the same hook can be bound multiple times).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hook` | `Callable[[str, str, list[dict]], list[DataPointTask]]` | | *required* |
| `entity_type` | `str` | specifies entity type | *required* |
| `attr_type` | `str` | specifies attribute type | *required* |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If entity_type and attr_type do not specify a valid timeseries attribute. |
dp3/common/callback_registrar.py
register_correlation_hook(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]])
+
Registers passed hook to be called during snapshot creation.
+Binds hook to specified entity_type (though same hook can be bound multiple times).
The entity_type and attribute specifications are validated; a ValueError is raised on failure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hook` | `Callable[[str, dict], None]` | | *required* |
| `entity_type` | `str` | specifies entity type | *required* |
| `depends_on` | `list[list[str]]` | each item should specify an attribute that is depended on, in the form of a path from the specified entity_type to individual attributes (even on linked entities). | *required* |
| `may_change` | `list[list[str]]` | each item should specify an attribute that the hook may change, in the same path format. | *required* |
Raises:
| Type | Description |
|---|---|
| `ValueError` | On failure of specification validation. |
dp3/common/callback_registrar.py
Platform config file reader and config model.
+ + + +
+ Bases: dict
Extension of built-in dict
that simplifies working with a nested hierarchy of dicts.
Key may be a path (in dot notation) into a hierarchy of dicts. For example
+ dictionary.get('abc.x.y')
+is equivalent to
+ dictionary['abc']['x']['y']
.
:returns: self[key]
or default
if key is not found.
dp3/common/config.py
Update HierarchicalDict
with other dictionary and merge common keys.
If a key is present in both the current and the other dictionary, and both values are dictionaries, they are merged together.
+Example: +
HierarchicalDict({'a': {'b': 1, 'c': 2}}).update({'a': {'b': 10, 'd': 3}})
+->
+HierarchicalDict({'a': {'b': 10, 'c': 2, 'd': 3}})
+
None
.
+
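The dot-notation `get` and the merging `update` described above can be sketched in a few lines — a simplified stand-in, not the exact DP3 implementation:

```python
class HierarchicalDict(dict):
    """Simplified sketch of dp3.common.config.HierarchicalDict."""

    def get(self, key, default=None):
        # Walk the nested dicts along the dot-separated path.
        node = self
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node

    def update(self, other):
        # Merge keys whose values are dicts on both sides; overwrite otherwise.
        for k, v in (other or {}).items():
            if k in self and isinstance(self[k], dict) and isinstance(v, dict):
                merged = HierarchicalDict(self[k])
                merged.update(v)
                self[k] = merged
            else:
                self[k] = v

d = HierarchicalDict({'a': {'b': 1, 'c': 2}})
d.update({'a': {'b': 10, 'd': 3}})
print(d)                          # → {'a': {'b': 10, 'c': 2, 'd': 3}}
print(d.get('a.b'))               # → 10
print(d.get('a.x.y', 'missing'))  # → missing
```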
+ dp3/common/config.py
+ Bases: BaseModel
Class representing full specification of an entity.
+ +Attributes:
| Name | Type | Description |
|---|---|---|
| `entity` | `EntitySpec` | Specification and settings of entity itself. |
| `attribs` | `dict[str, AttrSpecType]` | A mapping of attribute id -> AttrSpec |
+ Bases: BaseModel
Class representing the platform's current entity and attribute specification.
+ +Attributes:
| Name | Type | Description |
|---|---|---|
| `config` | `dict[str, EntitySpecDict]` | Legacy config format, exactly mirrors the config files. |
| `entities` | `dict[str, EntitySpec]` | Mapping of entity id -> EntitySpec |
| `attributes` | `dict[tuple[str, str], AttrSpecType]` | Mapping of (entity id, attribute id) -> AttrSpec |
| `entity_attributes` | `dict[str, dict[str, AttrSpecType]]` | Mapping of entity id -> attribute id -> AttrSpec |
| `relations` | `dict[tuple[str, str], AttrSpecType]` | Mapping of (entity id, attribute id) -> AttrSpec; only contains attributes which are relations. |
Provided configuration must be a dict of following structure: +
{
+ <entity type>: {
+ 'entity': {
+ entity specification
+ },
+ 'attribs': {
+ <attr id>: {
+ attribute specification
+ },
+ other attributes
+ }
+ },
+ other entity types
+}
+
Raises:
| Type | Description |
|---|---|
| `ValueError` | if the specification is invalid. |
dp3/common/config.py
+ Bases: BaseModel
An aggregation of configuration available to modules.
+ +Attributes:
| Name | Type | Description |
|---|---|---|
| `app_name` | `str` | Name of the application, used when naming various structures of the platform |
| `config_base_path` | `str` | Path to directory containing platform config |
| `config` | `HierarchicalDict` | A dictionary that contains the platform config |
| `model_spec` | `ModelSpec` | Specification of the platform's model (entities and attributes) |
| `num_processes` | `PositiveInt` | Number of worker processes |
| `process_index` | `NonNegativeInt` | Index of current process |
Read configuration file and return config as a dict-like object.
+The configuration file should contain a valid YAML document.
+- Comments may be included as lines starting with #
(optionally preceded
+ by whitespaces).
This function reads the file and converts it to a HierarchicalDict
.
+The only difference from built-in dict
is its get
method, which allows
+hierarchical keys (e.g. abc.x.y
).
+See doc of get method for more information.
dp3/common/config.py
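The hierarchical get described above can be sketched like this (a simplified stand-in for HierarchicalDict.get, assuming a dotted key path):

```python
def hierarchical_get(d: dict, key: str, default=None):
    """Look up a dotted key such as 'abc.x.y' by descending
    one dict level per path component; return `default` when
    any component is missing."""
    node = d
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

config = {"abc": {"x": {"y": 42}}}
hierarchical_get(config, "abc.x.y")              # -> 42
hierarchical_get(config, "abc.x.z", "fallback")  # -> 'fallback'
```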
Same as read_config, but it loads a whole configuration directory of YAML files; only files ending with ".yml" are loaded. Each loaded configuration is located under a key named after its configuration filename.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
dir_path |
+
+ str
+ |
+
+
+
+ Path to read config from. + |
+ + required + | +
recursive |
+
+ bool
+ |
+
+
+
+ If |
+
+ False
+ |
+
dp3/common/config.py
Module enabling remote control of the platform's internal events.
+ + + +Class enabling remote control of the platform's internal events.
+ + +dp3/common/control.py
Connect to RabbitMQ and start consuming from TaskQueue.
+ +dp3/common/control.py
Stop consuming from TaskQueue, disconnect from RabbitMQ.
+ + +Sets the handler for the given action
+ + +Acknowledges the received message and executes an action according to the task
.
This function should not be called directly, but set as callback for TaskQueueReader.
+ +dp3/common/control.py
+ Bases: BaseModel
Data-point
Contains single raw data value received on API. +This is just a base class - plain, observations or timeseries datapoints inherit from this class +(see below).
+Provides front line of validation for this data value.
+Internal usage: inside Task, created by TaskExecutor
+ + + + + +
+ Bases: DataPointBase
Plain attribute data-point
+Contains single raw data value received on API for plain attribute.
+In case of plain data-point, it's not really a data-point, but we use +the same naming for simplicity.
+ + + +
+ Bases: DataPointBase
Observations attribute data-point
+Contains single raw data value received on API for observations attribute.
+ + + + + +
+ Bases: DataPointBase
Timeseries attribute data-point
Contains single raw data value received on API for timeseries attribute.
+ + + + + +Validates or sets t2 of irregular timeseries datapoint
+ +dp3/common/datapoint.py
Validates or sets t2 of irregular intervals timeseries datapoint
+ +dp3/common/datapoint.py
+ Bases: BaseModel
Data type container
+Represents one of primitive data types:
+or composite data type:
+Attributes:
+Name | +Type | +Description | +
---|---|---|
data_type |
+
+ str
+ |
+
+
+
+ type for incoming value validation + |
+
hashable |
+
+ bool
+ |
+
+
+
+ whether contained data is hashable + |
+
is_link |
+
+ bool
+ |
+
+
+
+ whether this data type is link + |
+
link_to |
+
+ str
+ |
+
+
+
+ if |
+
dp3/common/datatype.py
Determines value validator (inner data_type
)
This is not implemented inside @validator
, because it apparently doesn't work with
+__root__
models.
dp3/common/datatype.py
Returns linked entity id. Raises ValueError if DataType is not a link.
+ +dp3/common/datatype.py
Whether link has data. Raises ValueError if DataType is not a link.
+ +dp3/common/datatype.py
+ Bases: BaseModel
Entity specification
+This class represents specification of an entity type (e.g. ip, asn, ...)
+ + +dp3/common/entityspec.py
Common modules which are used throughout the platform.
+Allows modules to register functions (callables) to be run at +specified times or intervals (like cron does).
+Based on APScheduler package
+ + + +Allows modules to register functions (callables) to be run +at specified times or intervals (like cron does).
+ + +dp3/common/scheduler.py
register(func: Callable, func_args: Union[list, tuple] = None, func_kwargs: dict = None, year: Union[int, str] = None, month: Union[int, str] = None, day: Union[int, str] = None, week: Union[int, str] = None, day_of_week: Union[int, str] = None, hour: Union[int, str] = None, minute: Union[int, str] = None, second: Union[int, str] = None, timezone: str = 'UTC') -> int
+
Register a function to be run at specified times.
+Pass cron-like specification of when the function should be called, +see docs +of apscheduler.triggers.cron for details.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
func |
+
+ Callable
+ |
+
+
+
+ function or method to be called + |
+ + required + | +
func_args |
+
+ Union[list, tuple]
+ |
+
+
+
+ list of positional arguments to call func with + |
+
+ None
+ |
+
func_kwargs |
+
+ dict
+ |
+
+
+
+ dict of keyword arguments to call func with + |
+
+ None
+ |
+
year |
+
+ Union[int, str]
+ |
+
+
+
+ 4-digit year + |
+
+ None
+ |
+
month |
+
+ Union[int, str]
+ |
+
+
+
+ month (1-12) + |
+
+ None
+ |
+
day |
+
+ Union[int, str]
+ |
+
+
+
+ day of month (1-31) + |
+
+ None
+ |
+
week |
+
+ Union[int, str]
+ |
+
+
+
+ ISO week (1-53) + |
+
+ None
+ |
+
day_of_week |
+
+ Union[int, str]
+ |
+
+
+
+ number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun) + |
+
+ None
+ |
+
hour |
+
+ Union[int, str]
+ |
+
+
+
+ hour (0-23) + |
+
+ None
+ |
+
minute |
+
+ Union[int, str]
+ |
+
+
+
+ minute (0-59) + |
+
+ None
+ |
+
second |
+
+ Union[int, str]
+ |
+
+
+
+ second (0-59) + |
+
+ None
+ |
+
timezone |
+
+ str
+ |
+
+
+
+ Timezone for time specification (default is UTC). + |
+
+ 'UTC'
+ |
+
Returns:
+Type | +Description | +
---|---|
+ int
+ |
+
+
+
+ job ID + |
+
dp3/common/scheduler.py
+ Bases: BaseModel
, ABC
A generic task type class.
+An abstraction for the task queue classes to depend upon.
+ + + + + + + +
+ Bases: Task
DataPointTask
+Contains single task to be pushed to TaskQueue and processed.
+ +Attributes:
+Name | +Type | +Description | +
---|---|---|
etype |
+
+ str
+ |
+
+
+
+ Entity type + |
+
eid |
+
+ str
+ |
+
+
+
+ Entity id / key + |
+
data_points |
+
+ list[DataPointBase]
+ |
+
+
+
+ List of DataPoints to process + |
+
tags |
+
+ list[Any]
+ |
+
+
+
+ List of tags + |
+
ttl_token |
+
+ Optional[datetime]
+ |
+
+
+
+ ... + |
+
+ Bases: Task
Snapshot
+Contains a list of entities, the meaning of which depends on the type
.
+If type
is "task", then the list contains linked entities for which a snapshot
+should be created. Otherwise type
is "linked_entities", indicating which entities
+must be skipped in a parallelized creation of unlinked entities.
Attributes:
+Name | +Type | +Description | +
---|---|---|
entities |
+
+ list[tuple[str, str]]
+ |
+
+
+
+ List of (entity_type, entity_id) + |
+
time |
+
+ datetime
+ |
+
+
+
+ timestamp for snapshot creation + |
+
auxiliary/utility functions and classes
+ + + +Parse time in RFC 3339 format and return it as naive datetime in UTC.
+Timezone specification is optional (UTC is assumed when none is specified).
+ +dp3/common/utils.py
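The described conversion - parse RFC 3339, assume UTC when no offset is given, and return a naive datetime in UTC - can be sketched with the standard library (an illustration of the behavior, not the actual dp3 implementation):

```python
from datetime import datetime, timezone

def rfc3339_to_naive_utc(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp. If no timezone offset is given,
    assume UTC; otherwise convert to UTC. Drop tzinfo so the result
    is a naive datetime in UTC."""
    # fromisoformat() in older Pythons does not accept a literal "Z"
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)

rfc3339_to_naive_utc("2023-07-01T12:00:00+02:00")  # -> datetime(2023, 7, 1, 10, 0)
```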
Parse duration in format (or just "0").
Return datetime.timedelta
+ +dp3/common/utils.py
Convert special types to JSON (use as "default" param of json.dumps)
+Supported types/objects: +- datetime +- timedelta
+ +dp3/common/utils.py
Convert special JSON keys created by conv_to_json back to Python objects +(use as "object_hook" param of json.loads)
+Supported types/objects: +- datetime +- timedelta
+ +dp3/common/utils.py
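The conv_to_json / conv_from_json round-trip pattern can be sketched as follows. The special key names used here ("$datetime", "$timedelta") are illustrative assumptions, not necessarily the keys dp3 uses:

```python
import json
from datetime import datetime, timedelta

def to_json_default(obj):
    """Serializer for json.dumps(default=...): wrap unsupported
    types in a dict with a special marker key (names illustrative)."""
    if isinstance(obj, datetime):
        return {"$datetime": obj.isoformat()}
    if isinstance(obj, timedelta):
        return {"$timedelta": obj.total_seconds()}
    raise TypeError(f"not JSON serializable: {type(obj)!r}")

def from_json_hook(d):
    """Deserializer for json.loads(object_hook=...): undo the
    wrapping done by to_json_default."""
    if "$datetime" in d:
        return datetime.fromisoformat(d["$datetime"])
    if "$timedelta" in d:
        return timedelta(seconds=d["$timedelta"])
    return d

payload = json.dumps({"seen": datetime(2023, 1, 1), "ttl": timedelta(hours=2)},
                     default=to_json_default)
restored = json.loads(payload, object_hook=from_json_hook)
```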
Get name of function or method as pretty string.
+ +dp3/common/utils.py
+ Bases: BaseModel
MongoDB host.
+ + + + + +
+ Bases: BaseModel
MongoDB standalone configuration.
+ + + + + +
+ Bases: BaseModel
MongoDB replica set configuration.
+ + + + + +
+ Bases: BaseModel
Database configuration.
+ + + + + +MongoDB database wrapper responsible for whole communication with database server. +Initializes database schema based on database configuration.
+db_conf - configuration of database connection (content of database.yml) +model_spec - ModelSpec object, configuration of data model (entities and attributes)
+ + +dp3/database/database.py
insert_datapoints(etype: str, eid: str, dps: list[DataPointBase], new_entity: bool = False) -> None
+
Inserts datapoint to raw data collection and updates master record.
+Raises DatabaseError when insert or update fails.
+ +dp3/database/database.py
Replace master record of etype
:eid
with the provided record
.
Raises DatabaseError when update fails.
+ +dp3/database/database.py
Delete old datapoints from master collection.
+Periodically called for all etype
s from HistoryManager.
dp3/database/database.py
Get current master record for etype/eid.
+If it doesn't exist, returns {}.
+ +dp3/database/database.py
Get cursor to current master records of etype.
+ +dp3/database/database.py
get_worker_master_records(worker_index: int, worker_cnt: int, etype: str, **kwargs: str) -> pymongo.cursor.Cursor
+
Get cursor to current master records of etype.
+ +dp3/database/database.py
Get latest snapshot of given etype/eid.
+If doesn't exist, returns {}.
+ +dp3/database/database.py
Get latest snapshots of given etype
.
This method is useful for displaying data on web.
+ +dp3/database/database.py
get_snapshots(etype: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None) -> pymongo.cursor.Cursor
+
Get all (or filtered) snapshots of given eid
.
This method is useful for displaying eid
's history on web.
Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
etype |
+
+ str
+ |
+
+
+
+ entity type + |
+ + required + | +
eid |
+
+ str
+ |
+
+
+
+ id of entity, to which data-points correspond + |
+ + required + | +
t1 |
+
+ Optional[datetime]
+ |
+
+
+
+ left value of time interval (inclusive) + |
+
+ None
+ |
+
t2 |
+
+ Optional[datetime]
+ |
+
+
+
+ right value of time interval (inclusive) + |
+
+ None
+ |
+
dp3/database/database.py
get_value_or_history(etype: str, attr_name: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None) -> dict
+
Gets current value and/or history of attribute for given eid
.
Depends on attribute type: +- plain: just (current) value +- observations: (current) value and history stored in master record (optionally filtered) +- timeseries: just history stored in master record (optionally filtered)
+Returns dict with two keys: current_value
and history
(list of values).
dp3/database/database.py
Estimates count of eid
s in given etype
dp3/database/database.py
Saves snapshot to specified entity of current master document.
+ +dp3/database/database.py
Saves a list of snapshots of current master documents.
+All snapshots must belong to same entity type.
+ +dp3/database/database.py
Saves snapshot to specified entity of current master document.
+ +dp3/database/database.py
get_observation_history(etype: str, attr_name: str, eid: str, t1: datetime = None, t2: datetime = None, sort: int = None) -> list[dict]
+
Get full (or filtered) history of observation attribute.
+This method is useful for displaying eid
's history on web.
+Also used to feed data into get_timeseries_history()
.
Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
etype |
+
+ str
+ |
+
+
+
+ entity type + |
+ + required + | +
attr_name |
+
+ str
+ |
+
+
+
+ name of attribute + |
+ + required + | +
eid |
+
+ str
+ |
+
+
+
+ id of entity, to which data-points correspond + |
+ + required + | +
t1 |
+
+ datetime
+ |
+
+
+
+ left value of time interval (inclusive) + |
+
+ None
+ |
+
t2 |
+
+ datetime
+ |
+
+
+
+ right value of time interval (inclusive) + |
+
+ None
+ |
+
sort |
+
+ int
+ |
+
+
+
+ sort by timestamps - 0: ascending order by t1, 1: descending order by t2, +None: don't sort + |
+
+ None
+ |
+
Returns:
+Type | +Description | +
---|---|
+ list[dict]
+ |
+
+
+
+ list of dicts (reduced datapoints) + |
+
dp3/database/database.py
get_timeseries_history(etype: str, attr_name: str, eid: str, t1: datetime = None, t2: datetime = None, sort: int = None) -> list[dict]
+
Get full (or filtered) history of timeseries attribute, output as a list of dicts.
This method is useful for displaying eid's history on web.
+ Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
etype |
+
+ str
+ |
+
+
+
+ entity type + |
+ + required + | +
attr_name |
+
+ str
+ |
+
+
+
+ name of attribute + |
+ + required + | +
eid |
+
+ str
+ |
+
+
+
+ id of entity, to which data-points correspond + |
+ + required + | +
t1 |
+
+ datetime
+ |
+
+
+
+ left value of time interval (inclusive) + |
+
+ None
+ |
+
t2 |
+
+ datetime
+ |
+
+
+
+ right value of time interval (inclusive) + |
+
+ None
+ |
+
sort |
+
+ int
+ |
+
+
+
+ sort by timestamps - |
+
+ None
+ |
+
Returns:
+Type | +Description | +
---|---|
+ list[dict]
+ |
+
+
+
+ list of dicts (reduced datapoints) - each represents just one point at time + |
+
dp3/database/database.py
Delete old snapshots.
+Periodically called for all etype
s from HistoryManager.
dp3/database/database.py
Return a persistent cache collection for given module name.
+ +dp3/database/database.py
Returns the name of the caller method's class, or function name if caller is not a method.
+ +dp3/database/database.py
A wrapper responsible for communication with the database server.
+ + + +
+ Bases: JSONEncoder
JSONEncoder to encode datetime using the standard ADiCT format string.
+ + + + + +HistoryManager(db: EntityDatabase, platform_config: PlatformConfig, registrar: CallbackRegistrar) -> None
+
dp3/history_management/history_manager.py
Deletes old data points from master collection.
+ +dp3/history_management/history_manager.py
Deletes old snapshots.
+ +dp3/history_management/history_manager.py
Archives old data points from raw collection.
+Updates already saved archive files, if present.
+ +dp3/history_management/history_manager.py
Merge datapoints in the history with equal values and overlapping time validity.
+Averages the confidence.
+ +dp3/history_management/history_manager.py
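The described merging - datapoints with equal values and overlapping validity intervals are collapsed, averaging the confidence - can be sketched as follows. The field names (t1, t2, v for value, c for confidence) follow the datapoint format used elsewhere in these docs; treat them as assumptions about the stored record shape:

```python
def merge_overlapping(dps: list) -> list:
    """Merge datapoints with equal value and overlapping [t1, t2]
    validity; the merged confidence is the average of all merged
    confidences. Sketch of the described behavior only."""
    merged = []
    for dp in sorted(dps, key=lambda d: d["t1"]):
        last = merged[-1] if merged else None
        if last and last["v"] == dp["v"] and dp["t1"] <= last["t2"]:
            # overlapping interval with equal value: extend and average
            last["t2"] = max(last["t2"], dp["t2"])
            last["_cs"].append(dp["c"])
            last["c"] = sum(last["_cs"]) / len(last["_cs"])
        else:
            merged.append({**dp, "_cs": [dp["c"]]})
    for m in merged:
        m.pop("_cs")  # drop the bookkeeping list
    return merged
```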
Module responsible for managing history saved in database, currently to clean old data.
+ + + +Platform directory structure:
+Worker - The main worker process.
+Common - Common modules which are used throughout the platform.
+Database.EntityDatabase - A wrapper responsible for communication +with the database server.
+HistoryManagement.HistoryManager - Module responsible +for managing history saved in database, currently to clean old data.
+Snapshots - SnapShooter, a module responsible +for snapshot creation and running configured data correlation and fusion hooks, +and Snapshot Hooks, which manage the registered hooks and their +dependencies on one another.
+TaskProcessing - Module responsible for task +distribution, +processing and running configured +hooks. Task distribution is possible due to the +task queue.
+SnapShooter, a module responsible +for snapshot creation and running configured data correlation and fusion hooks, +and Snapshot Hooks, which manage the registered hooks and their +dependencies on one another.
+ + + +Module managing creation of snapshots, enabling data correlation and saving snapshots to DB.
+Snapshots are created periodically (user configurable period)
+When a snapshot is created, several things need to happen:
+observations
or plain
datapoints, which will be saved to db
+ and forwarded in processingprofile
SnapShooter(db: EntityDatabase, task_queue_writer: TaskQueueWriter, task_executor: TaskExecutor, platform_config: PlatformConfig, scheduler: Scheduler) -> None
+
Class responsible for creating entity snapshots.
+ + +dp3/snapshots/snapshooter.py
Connect to RabbitMQ and start consuming from TaskQueue.
+ +dp3/snapshots/snapshooter.py
Stop consuming from TaskQueue, disconnect from RabbitMQ.
+ +dp3/snapshots/snapshooter.py
register_timeseries_hook(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)
+
Registers passed timeseries hook to be called during snapshot creation.
+Binds hook to specified entity_type
and attr_type
(though same hook can be bound
+multiple times).
Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
hook |
+
+ Callable[[str, str, list[dict]], list[DataPointTask]]
+ |
+
+
+
+
|
+ + required + | +
entity_type |
+
+ str
+ |
+
+
+
+ specifies entity type + |
+ + required + | +
attr_type |
+
+ str
+ |
+
+
+
+ specifies attribute type + |
+ + required + | +
Raises:
+Type | +Description | +
---|---|
+ ValueError
+ |
+
+
+
+ If entity_type and attr_type do not specify a valid timeseries attribute, +a ValueError is raised. + |
+
dp3/snapshots/snapshooter.py
register_correlation_hook(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]])
+
Registers passed hook to be called during snapshot creation.
+Binds hook to specified entity_type (though same hook can be bound multiple times).
+entity_type
and attribute specifications are validated, ValueError
is raised on failure.
Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
hook |
+
+ Callable[[str, dict], None]
+ |
+
+
+
+
|
+ + required + | +
entity_type |
+
+ str
+ |
+
+
+
+ specifies entity type + |
+ + required + | +
depends_on |
+
+ list[list[str]]
+ |
+
+
+
+ each item should specify an attribute that is depended on +in the form of a path from the specified entity_type to individual attributes +(even on linked entities). + |
+ + required + | +
may_change |
+
+ list[list[str]]
+ |
+
+
+
+ each item should specify an attribute that |
+ + required + | +
Raises:
+Type | +Description | +
---|---|
+ ValueError
+ |
+
+
+
+ On failure of specification validation. + |
+
dp3/snapshots/snapshooter.py
Adds the given (entity, eid) pair to the cache of all linked entities.
+ +dp3/snapshots/snapshooter.py
Creates snapshots for all entities currently active in database.
+ +dp3/snapshots/snapshooter.py
Get weakly connected components from entity graph.
+ +dp3/snapshots/snapshooter.py
Acknowledges the received message and makes a snapshot according to the task
.
This function should not be called directly, but set as callback for TaskQueueReader.
+ +dp3/snapshots/snapshooter.py
Make snapshots for all entities with routing key belonging to this worker.
+ +dp3/snapshots/snapshooter.py
Make a snapshot for given entity master_record
and time
.
Runs timeseries and correlation hooks. +The resulting snapshot is saved into DB.
+ +dp3/snapshots/snapshooter.py
Make a snapshot for entities and time specified by task
.
Runs timeseries and correlation hooks. +The resulting snapshots are saved into DB.
+ +dp3/snapshots/snapshooter.py
observations
or plain
datapoints, which will be saved to db
+ and forwarded in processingdp3/snapshots/snapshooter.py
staticmethod
+
+
+¶Update existing master record with datapoints from new tasks
+ +dp3/snapshots/snapshooter.py
Loads the subgraph of entities linked to the current entity, +returns a list of their types and ids.
+ +dp3/snapshots/snapshooter.py
Returns a set of tuples (entity_type, entity_id) identifying entities linked by
+current_values
.
dp3/snapshots/snapshooter.py
get_value_at_time(attr_spec: AttrSpecObservations, attr_history: AttrSpecObservations, time: datetime) -> tuple[Any, float]
+
Get current value of an attribute from its history. Assumes multi_value = False
.
dp3/snapshots/snapshooter.py
get_multi_value_at_time(attr_spec: AttrSpecObservations, attr_history: AttrSpecObservations, time: datetime) -> tuple[list, list[float]]
+
Get current value of a multi_value attribute from its history.
+ +dp3/snapshots/snapshooter.py
staticmethod
+
+
+¶extrapolate_confidence(datapoint: dict, time: datetime, history_params: ObservationsHistoryParams) -> float
+
Get the confidence value at given time.
+ +dp3/snapshots/snapshooter.py
Module managing registered hooks and their dependencies on one another.
+ + + +Container for timeseries analysis hooks
+ + +dp3/snapshots/snapshot_hooks.py
register(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)
+
Registers passed timeseries hook to be called during snapshot creation.
+Binds hook to specified entity_type and attr_type (though same hook can be bound +multiple times). +If entity_type and attr_type do not specify a valid timeseries attribute, +a ValueError is raised.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
hook |
+
+ Callable[[str, str, list[dict]], list[DataPointTask]]
+ |
+
+
+
+
|
+ + required + | +
entity_type |
+
+ str
+ |
+
+
+
+ specifies entity type + |
+ + required + | +
attr_type |
+
+ str
+ |
+
+
+
+ specifies attribute type + |
+ + required + | +
dp3/snapshots/snapshot_hooks.py
Runs registered hooks.
+ +dp3/snapshots/snapshot_hooks.py
Container for data fusion and correlation hooks.
+ + +dp3/snapshots/snapshot_hooks.py
register(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]]) -> str
+
Registers passed hook to be called during snapshot creation.
+Binds hook to specified entity_type (though same hook can be bound multiple times).
The entity_type and attribute specifications are validated; ValueError is raised on failure.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
hook |
+
+ Callable[[str, dict], None]
+ |
+
+
+
+
|
+ + required + | +
entity_type |
+
+ str
+ |
+
+
+
+ specifies entity type + |
+ + required + | +
depends_on |
+
+ list[list[str]]
+ |
+
+
+
+ each item should specify an attribute that is depended on +in the form of a path from the specified entity_type to individual attributes +(even on linked entities). + |
+ + required + | +
may_change |
+
+ list[list[str]]
+ |
+
+
+
+ each item should specify an attribute that |
+ + required + | +
Returns:
+Type | +Description | +
---|---|
+ str
+ |
+
+
+
+ Generated hook id. + |
+
dp3/snapshots/snapshot_hooks.py
Runs registered hooks.
+ +dp3/snapshots/snapshot_hooks.py
dataclass
+
+
+¶Vertex in a graph of dependencies
+ + + + + +Class representing a graph of dependencies between correlation hooks.
+ + +dp3/snapshots/snapshot_hooks.py
Add hook to dependency graph and recalculate if any cycles are created.
+ +dp3/snapshots/snapshot_hooks.py
Add oriented edge between specified vertices.
+ +dp3/snapshots/snapshot_hooks.py
Calculate number of incoming edges for each vertex. Time complexity O(V + E).
+ +dp3/snapshots/snapshot_hooks.py
Implementation of Kahn's algorithm for topological sorting. +Raises ValueError if there is a cycle in the graph.
+See https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm
+ +dp3/snapshots/snapshot_hooks.py
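Kahn's algorithm itself can be sketched briefly (a generic version, independent of the hook containers above):

```python
from collections import deque

def topological_sort(vertices: list, edges: list) -> list:
    """Kahn's algorithm: repeatedly emit a vertex with in-degree 0
    and decrement the in-degree of its successors. If not all
    vertices can be emitted, the graph contains a cycle."""
    in_degree = {v: 0 for v in vertices}
    for _, dst in edges:
        in_degree[dst] += 1
    queue = deque(v for v, d in in_degree.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for src, dst in edges:
            if src == v:
                in_degree[dst] -= 1
                if in_degree[dst] == 0:
                    queue.append(dst)
    if len(order) != len(vertices):
        raise ValueError("dependency graph contains a cycle")
    return order

topological_sort(["a", "b", "c"], [("a", "b"), ("b", "c")])  # -> ['a', 'b', 'c']
```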
Module responsible for task +distribution, +processing and running configured +hooks. Task distribution is possible due to the +task queue.
+ + + +TaskDistributor(task_executor: TaskExecutor, platform_config: PlatformConfig, registrar: CallbackRegistrar, daemon_stop_lock: threading.Lock) -> None
+
TaskDistributor uses task queues to distribute tasks between all running processes.
+Tasks are assigned to worker processes based on hash of entity key, so each +entity is always processed by the same worker. Therefore, all requests +modifying a particular entity are done sequentially and no locking is +necessary.
+Tasks that are assigned to the current process are passed to task_executor
for execution.
Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
platform_config |
+
+ PlatformConfig
+ |
+
+
+
+ Platform config + |
+ + required + | +
task_executor |
+
+ TaskExecutor
+ |
+
+
+
+ Instance of TaskExecutor + |
+ + required + | +
registrar |
+
+ CallbackRegistrar
+ |
+
+
+
+ Interface for callback registration + |
+ + required + | +
daemon_stop_lock |
+
+ threading.Lock
+ |
+
+
+
+ Lock used to control when the program stops. (see dp3.worker) + |
+ + required + | +
dp3/task_processing/task_distributor.py
Run the worker threads and start consuming from TaskQueue.
+ +dp3/task_processing/task_distributor.py
Stop the worker threads.
+ +dp3/task_processing/task_distributor.py
TaskExecutor manages updates of entity records,
+which are being read from task queue (via parent
+TaskDistributor
)
Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
db |
+
+ EntityDatabase
+ |
+
+
+
+ Instance of EntityDatabase + |
+ + required + | +
platform_config |
+
+ PlatformConfig
+ |
+
+
+
+ Current platform configuration. + |
+ + required + | +
dp3/task_processing/task_executor.py
Registers one of available task hooks
+See: TaskGenericHooksContainer
+in task_hooks.py
dp3/task_processing/task_executor.py
Registers one of available task entity hooks
+See: TaskEntityHooksContainer
+in task_hooks.py
dp3/task_processing/task_executor.py
Registers one of available task attribute hooks
+See: TaskAttrHooksContainer
+in task_hooks.py
dp3/task_processing/task_executor.py
Main processing function - push datapoint values, running all registered hooks.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
task |
+
+ DataPointTask
+ |
+
+
+
+ Task object to process. + |
+ + required + | +
Returns:
+Type | +Description | +
---|---|
+ bool
+ |
+
+
+
+ True if a new record was created, False otherwise, + |
+
+ list[DataPointTask]
+ |
+
+
+
+ and a list of new tasks created by hooks + |
+
dp3/task_processing/task_executor.py
Container for generic hooks
+Possible hooks:
+on_task_start
: receives Task, no return value requirementsdp3/task_processing/task_hooks.py
Container for entity hooks
+Possible hooks:
+allow_entity_creation
: receives eid and Task, may prevent entity record creation (by
+ returning False)on_entity_creation
: receives eid and Task, may return list of DataPointTasksdp3/task_processing/task_hooks.py
Container for attribute hooks
+Possible hooks:
+on_new_plain
, on_new_observation
, on_new_ts_chunk
:
+ receives eid and DataPointBase, may return a list of DataPointTasksdp3/task_processing/task_hooks.py
Functions to work with the main task queue (RabbitMQ)
+There are two queues for each worker process: +- "normal" queue for tasks added by other components, this has a limit of 100 + tasks. +- "priority" one for tasks added by workers themselves, this has no limit since + workers mustn't be stopped by waiting for the queue.
+These queues are presented as a single one by this wrapper. +The TaskQueueReader first looks into the "priority" queue and only if there +is no task waiting, it reads the normal one.
Tasks are distributed to worker processes (and threads) by hash of the entity +which is to be modified. The destination queue is decided by the message source, +so each source must know how many worker processes there are.
+Exchange and queues must be declared externally!
+Related configuration keys and their defaults: +(should be part of global DP3 config files) +
rabbitmq:
+ host: localhost
+ port: 5672
+ virtual_host: /
+ username: guest
+ password: guest
+
+worker_processes: 1
+
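The read order described above - drain the priority queue first, touch the normal queue only when no priority task is waiting - can be sketched with plain in-memory queues (not actual RabbitMQ/AMQP code):

```python
from collections import deque

class TwoQueueReader:
    """Presents a priority queue and a normal queue as a single one:
    get() always drains the priority queue before the normal one.
    In-memory sketch of the TaskQueueReader ordering only."""
    def __init__(self):
        self.priority = deque()
        self.normal = deque()

    def get(self):
        if self.priority:
            return self.priority.popleft()
        if self.normal:
            return self.normal.popleft()
        return None  # nothing waiting in either queue

r = TwoQueueReader()
r.normal.append("task-1")
r.priority.append("task-2")
r.get()  # -> 'task-2' (priority queue is read first)
```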
Common TaskQueue wrapper, handles connection to RabbitMQ server with automatic reconnection. +TaskQueueWriter and TaskQueueReader are derived from this.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
rabbit_config |
+
+ dict
+ |
+
+
+
+ RabbitMQ connection parameters, dict with following keys (all optional): +host, port, virtual_host, username, password + |
+
+ None
+ |
+
dp3/task_processing/task_queue.py
Create a connection (or reconnect after error).
+If connection can't be established, try it again indefinitely.
+ +dp3/task_processing/task_queue.py
TaskQueueWriter(app_name: str, workers: int = 1, rabbit_config: dict = None, exchange: str = None, priority_exchange: str = None, parent_logger: logging.Logger = None) -> None
+
+ Bases: RobustAMQPConnection
Writes tasks into main Task Queue
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
app_name |
+
+ str
+ |
+
+
+
+ DP3 application name (used as prefix for RMQ queues and exchanges) + |
+ + required + | +
workers |
+
+ int
+ |
+
+
+
+ Number of worker processes in the system + |
+
+ 1
+ |
+
rabbit_config |
+
+ dict
+ |
+
+
+
+ RabbitMQ connection parameters, dict with following keys (all optional): +host, port, virtual_host, username, password + |
+
+ None
+ |
+
exchange |
+
+ str
+ |
+
+
+
+ Name of the exchange to write tasks to
+(default: |
+
+ None
+ |
+
priority_exchange |
+
+ str
+ |
+
+
+
+ Name of the exchange to write priority tasks to
+(default: |
+
+ None
+ |
+
parent_logger |
+
+ logging.Logger
+ |
+
+
+
+ Logger to inherit prefix from. + |
+
+ None
+ |
+
dp3/task_processing/task_queue.py
Check that needed exchanges are declared, return True or raise RuntimeError.
+If needed exchanges are not declared, reconnect and try again. (max 5 times)
+ +dp3/task_processing/task_queue.py
Broadcast task to all workers
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
task |
+
+ Task
+ |
+
+
+
+ prepared task + |
+ + required + | +
priority |
+
+ bool
+ |
+
+
+
+ if true, the task is placed into priority queue +(should only be used internally by workers) + |
+
+ False
+ |
+
dp3/task_processing/task_queue.py
Put task (update_request) to the queue of corresponding worker
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
task |
+
+ Task
+ |
+
+
+
+ prepared task + |
+ + required + | +
priority |
+
+ bool
+ |
+
+
+
+ if true, the task is placed into priority queue +(should only be used internally by workers) + |
+
+ False
+ |
+
dp3/task_processing/task_queue.py
TaskQueueReader(callback: Callable, parse_task: Callable[[str], Task], app_name: str, worker_index: int = 0, rabbit_config: dict = None, queue: str = None, priority_queue: str = None, parent_logger: logging.Logger = None) -> None
+
+ Bases: RobustAMQPConnection
TaskQueueReader consumes messages from two RabbitMQ queues +(normal and priority one for given worker) +and passes them to the given callback function.
+Tasks from the priority queue are passed before the normal ones.
+Each received message must be acknowledged by calling .ack(msg_tag)
.
Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
callback |
+
+ Callable
+ |
+
+
+
+ Function called when a message is received, prototype: func(tag, Task) + |
+ + required + | +
parse_task |
+
+ Callable[[str], Task]
+ |
+
+
+
+ Function called to parse message body into a task, prototype: func(body) -> Task + |
+ + required + | +
app_name |
+
+ str
+ |
+
+
+
+ DP3 application name (used as prefix for RMQ queues and exchanges) + |
+ + required + | +
worker_index |
+
+ int
+ |
+
+
+
+ index of this worker +(filled into DEFAULT_QUEUE string using .format() method) + |
+
+ 0
+ |
+
rabbit_config |
+
+ dict
+ |
+
+
+
+ RabbitMQ connection parameters, dict with following keys +(all optional): host, port, virtual_host, username, password + |
+
+ None
+ |
+
queue |
+
+ str
+ |
+
+
+
+ Name of RabbitMQ queue to read from (default: |
+
+ None
+ |
+
priority_queue |
+
+ str
+ |
+
+
+
+ Name of RabbitMQ queue to read from (priority messages)
+(default: |
+
+ None
+ |
+
parent_logger |
+
+ logging.Logger
+ |
+
+
+
+ Logger to inherit prefix from. + |
+
+ None
+ |
+
dp3/task_processing/task_queue.py
Start receiving tasks.
+ +dp3/task_processing/task_queue.py
Stop receiving tasks.
+ +dp3/task_processing/task_queue.py
Check that needed queues are declared, return True or raise RuntimeError.
+If needed queues are not declared, reconnect and try again. (max 5 times)
+ +dp3/task_processing/task_queue.py
Hash function used to distribute tasks to worker processes.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
key |
+
+ str
+ |
+
+
+
+ to be hashed + |
+ + required + | +
Returns:
+Type | +Description | +
---|---|
+ int
+ |
+
+
+
+ last 4 bytes of MD5 + |
+
dp3/task_processing/task_queue.py
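The mapping described above ("last 4 bytes of MD5") can be sketched as follows. Note this is for illustration only: the byte order and the final modulo by worker count are assumptions, not necessarily DP³'s exact implementation.

```python
import hashlib

def worker_index(key: str, num_workers: int) -> int:
    """Distribute a key to a worker: last 4 bytes of its MD5, mod num_workers."""
    digest = hashlib.md5(key.encode("utf8")).digest()
    return int.from_bytes(digest[-4:], "big") % num_workers
```

Because the hash is deterministic, all datapoints for the same entity key always land on the same worker process.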
Code of the main worker process.
+Don't run directly. Import and run the main() function.
+ + + +load_modules(modules_dir: str, enabled_modules: dict, log: logging.Logger, registrar: CallbackRegistrar, platform_config: PlatformConfig) -> list
+
Load plug-in modules
+Import Python modules with names in 'enabled_modules' from 'modules_dir' directory +and return all found classes derived from BaseModule class.
+ +dp3/worker.py
Run worker process.
+ +Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
app_name |
+
+ str
+ |
+
+
+
+ Name of the application to distinct it from other DP3-based apps. +For example, it's used as a prefix for RabbitMQ queue names. + |
+ + required + | +
config_dir |
+
+ str
+ |
+
+
+
+ Path to directory containing configuration files. + |
+ + required + | +
process_index |
+
+ int
+ |
+
+
+
+ Index of this worker process. For each application +there must be N processes running simultaneously, each started with a +unique index (from 0 to N-1). N is read from configuration +('worker_processes' in 'processing_core.yml'). + |
+ + required + | +
verbose |
+
+ bool
+ |
+
+
+
+ More verbose output (set log level to DEBUG). + |
+ + required + | +
dp3/worker.py
|
|
DP\u00b3 is a platform that helps to keep a database of information (attributes) about individual entities (designed for IP addresses and other network identifiers, but they may be anything) whose data constantly change over time.
DP\u00b3 doesn't do much by itself; it must be supplemented by application-specific modules providing and processing data.
This is the basis of CESNET's \"Asset Discovery Classification and Tagging\" (ADiCT) project, focused on discovery and classification of network devices, but the platform itself is general and should be usable for any kind of data.
For an introduction to how it works, please check out the architecture, data-model and database config pages.
Then you should be able to create a DP\u00b3 app using the provided setup utility as described in the install page and start tinkering!
"},{"location":"#repository-structure","title":"Repository structure","text":"dp3
- Python package containing code of the processing core and the APIconfig
- default/example configurationinstall
- deployment configurationDP\u00b3 has an HTTP API which you can use to post datapoints and to read data stored in DP\u00b3. As the API is built with FastAPI, there is also interactive documentation available at the /docs
endpoint.
There are several API endpoints:
GET /
: check if API is running (just returns It works!
message)POST /datapoints
: insert datapoints into DP\u00b3GET /entity/<entity_type>
: list current snapshots of all entities of given typeGET /entity/<entity_type>/<entity_id>
: get data of entity with given entity idGET /entity/<entity_type>/<entity_id>/get/<attr_id>
: get attribute valueGET /entity/<entity_type>/<entity_id>/set/<attr_id>
: set attribute valueGET /entities
: list entity configurationGET /control/<action>
: send a pre-defined action into execution queue.Health check.
"},{"location":"api/#request","title":"Request","text":"GET /
200 OK
:
{ \"detail\": \"It works!\" }
POST /datapoints
All data are written to DP\u00b3 in the form of datapoints. A datapoint sets a value of a given attribute of a given entity.
It is a JSON-encoded object with the set of keys defined in the table below. The presence of some keys depends on the primary type of the attribute (plain/observations/timeseries).
The payload of this endpoint is a JSON array of datapoints. For example:
[\n{ DATAPOINT1 },\n{ DATAPOINT2 }\n]\n
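A minimal Python sketch of composing such a payload and sending it. The URL and port come from the local setup in the install instructions and are assumptions; the actual POST is left commented out so the snippet has no side effects:

```python
import json

# One observations datapoint with the mandatory keys described in the table below.
datapoint = {
    "type": "ip",
    "id": "192.168.0.1",
    "attr": "open_ports",
    "v": [22, 80, 443],
    "t1": "2022-08-01T12:00:00",  # RFC 3339 timestamps
    "t2": "2022-08-01T12:10:00",
    "src": "open_ports_module",
}

# The endpoint expects a JSON *array* of datapoints, even when sending one.
payload = json.dumps([datapoint])

# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:5000/datapoints",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"},
#     method="POST",
# )
# urllib.request.urlopen(req)
```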
Key Description Data-type Required? Plain Observations Timeseries type
Entity type string mandatory \u2714 \u2714 \u2714 id
Entity identification string mandatory \u2714 \u2714 \u2714 attr
Attribute name string mandatory \u2714 \u2714 \u2714 v
The value to set, depends on attr. type and data-type, see below -- mandatory \u2714 \u2714 \u2714 t1
Start time of the observation interval string (RFC 3339 format) mandatory -- \u2714 \u2714 t2
End time of the observation interval string (RFC 3339 format) optional, default=t1
-- \u2714 \u2714 c
Confidence float (0.0-1.0) optional, default=1.0 -- \u2714 \u2714 src
Identification of the information source string optional, default=\"\" \u2714 \u2714 \u2714 More details depend on the particular type of the attribute.
"},{"location":"api/#examples-of-datapoints","title":"Examples of datapoints","text":""},{"location":"api/#plain","title":"Plain","text":"{\n\"type\": \"ip\",\n\"id\": \"192.168.0.1\",\n\"attr\": \"note\",\n\"v\": \"My home router\",\n\"src\": \"web_gui\"\n}\n
"},{"location":"api/#observations","title":"Observations","text":"{\n\"type\": \"ip\",\n\"id\": \"192.168.0.1\",\n\"attr\": \"open_ports\",\n\"v\": [22, 80, 443],\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:10:00\",\n\"src\": \"open_ports_module\"\n}\n
"},{"location":"api/#timeseries","title":"Timeseries","text":"regular
:
{\n...\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:20:00\", // assuming time_step = 5 min\n\"v\": {\n\"a\": [1, 3, 0, 2]\n}\n}\n
irregular
: timestamps must always be present
{\n...\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:05:00\",\n\"v\": {\n\"time\": [\"2022-08-01T12:00:00\", \"2022-08-01T12:01:10\", \"2022-08-01T12:01:15\", \"2022-08-01T12:03:30\"],\n\"x\": [0.5, 0.8, 1.2, 0.7],\n\"y\": [-1, 3, 0, 0]\n}\n}\n
irregular_interval
:
{\n...\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:05:00\",\n\"v\": {\n\"time_first\": [\"2022-08-01T12:00:00\", \"2022-08-01T12:01:10\", \"2022-08-01T12:01:15\", \"2022-08-01T12:03:30\"],\n\"time_last\": [\"2022-08-01T12:01:00\", \"2022-08-01T12:01:15\", \"2022-08-01T12:03:00\", \"2022-08-01T12:03:40\"],\n\"x\": [0.5, 0.8, 1.2, 0.7],\n\"y\": [-1, 3, 0, 0]\n}\n}\n
"},{"location":"api/#relations","title":"Relations","text":"Can be represented using both plain attributes and observations. The difference will be only in time specification. Two examples using observations:
no data - link<mac>
: just the eid is sent
{\n\"type\": \"ip\",\n\"id\": \"192.168.0.1\",\n\"attr\": \"mac_addrs\",\n\"v\": \"AA:AA:AA:AA:AA\",\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:10:00\"\n}\n
with additional data - link<ip, int>
: The eid and the data are sent as a dictionary.
{\n\"type\": \"ip\",\n\"id\": \"192.168.0.1\",\n\"attr\": \"ip_dep\",\n\"v\": {\"eid\": \"192.168.0.2\", \"data\": 22},\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:10:00\"\n}\n
"},{"location":"api/#response_1","title":"Response","text":"200 OK
:
Success\n
400 Bad request
:
Returns some validation error message, for example:
1 validation error for DataPointObservations_some_field\nv -> some_embedded_dict_field\n field required (type=value_error.missing)\n
"},{"location":"api/#list-entities","title":"List entities","text":"List latest snapshots of all ids present in database under entity.
Contains only latest snapshot.
Uses pagination.
"},{"location":"api/#request_2","title":"Request","text":"GET /entity/<entity_type>
Optional query parameters:
{\n\"time_created\": \"2023-07-04T12:10:38.827Z\",\n\"data\": [\n{}\n]\n}\n
"},{"location":"api/#get-eid-data","title":"Get Eid data","text":"Get data of entity's eid.
Contains all snapshots and master record. Snapshots are ordered by ascending creation time.
"},{"location":"api/#request_3","title":"Request","text":"GET /entity/<entity_type>/<entity_id>
Optional query parameters:
{\n\"empty\": true,\n\"master_record\": {},\n\"snapshots\": [\n{}\n]\n}\n
"},{"location":"api/#get-attr-value","title":"Get attr value","text":"Get attribute value
Value is either of:
GET /entity/<entity_type>/<entity_id>/get/<attr_id>
Optional query parameters:
{\n\"attr_type\": 1,\n\"current_value\": \"string\",\n\"history\": []\n}\n
"},{"location":"api/#set-attr-value","title":"Set attr value","text":"Set current value of attribute
Internally, this just creates a datapoint for the specified attribute and value.
This endpoint is meant for editable
plain attributes -- for direct user edits in the DP3 web UI.
POST /entity/<entity_type>/<entity_id>/set/<attr_id>
Required request body:
{\n\"value\": \"string\"\n}\n
"},{"location":"api/#response_5","title":"Response","text":"{\n\"detail\": \"OK\"\n}\n
"},{"location":"api/#entities","title":"Entities","text":"List entities
Returns a dictionary containing all configured entities -- their simplified configuration and current state information.
"},{"location":"api/#request_6","title":"Request","text":"GET /entities
{\n\"<entity_id>\": {\n\"id\": \"<entity_id>\",\n\"name\": \"<entity_spec.name>\",\n\"attribs\": \"<MODEL_SPEC.attribs(e_id)>\",\n\"eid_estimate_count\": \"<DB.estimate_count_eids(e_id)>\"\n},\n...\n}\n
"},{"location":"api/#control","title":"Control","text":"Execute Action - Sends the given action into execution queue.
You can see the enabled actions in /config/control.yml
, the available actions are:
make_snapshots
- Makes an out-of-order snapshot of all entitiesGET /control/<action>
{\n\"detail\": \"OK\"\n}\n
"},{"location":"architecture/","title":"Architecture","text":"DP\u00b3 is generic platform for data processing. It's currently used in systems for management of network devices in CESNET, but during development we focused on making DP\u00b3 as universal as possible.
This page describes the high-level architecture of DP\u00b3 and the individual components.
"},{"location":"architecture/#data-points","title":"Data-points","text":"The base unit of data that DP\u00b3 uses is called a data-point, which looks like this:
{\n\"type\": \"ip\", // (1)!\n\"id\": \"192.168.0.1\", // (2)!\n\"attr\": \"open_ports\", // (3)!\n\"v\": [22, 80, 443], // (4)!\n\"t1\": \"2022-08-01T12:00:00\", // (5)!\n\"t2\": \"2022-08-01T12:10:00\",\n\"src\": \"open_ports_module\" // (6)!\n}\n
type
.id
. attr
field specifies the attribute of the data-point.v
field.t1
and t2
field. src
field.This example shows an example of an observations data-point (given it has a validity interval), to learn more about the different types of data-points, please see the API documentation.
"},{"location":"architecture/#platform-architecture","title":"Platform Architecture","text":"DP\u00b3 architectureThe DP\u00b3 architecture as shown in the figure above consists of several components, where the DP\u00b3 provided components are shown in blue:
The application-specific components, shown in yellow-orange, are as following:
yml
files determines the entities and their attributes, together with the specifics of platform behavior on these entities. For details of entity configuration, please see the database entities configuration page.The distinction between primary and secondary modules is such that primary modules send data-points into the system using the HTTP API, while secondary modules react to the data present in the system, e.g.: altering the data-flow in an application-specific manner, deriving additional data based on incoming data-points or performing data correlation on entity snapshots. For primary module implementation, the API documentation may be useful, also feel free to check out the dummy_sender script in /scripts/dummy_sender.py
. A comprehensive secondary module API documentation is under construction, for the time being, refer to the CallbackRegistrar code reference or check out the test modules in /modules/
or /tests/modules/
.
The final remaining component is the web interface, which is ultimately application-specific. A generic web interface, or a set of generic components is a planned part of DP\u00b3, but is yet to be implemented. The API provides a variety of endpoints which should enable you to create any view of the data you may require.
This section describes the data flow within the platform.
DP\u00b3 Data flow
The above figure shows a zoomed in view of the worker-process from the architecture figure. Incoming Tasks, which carry data-points from the API, are passed to secondary module callbacks configured on new data point, or around entity creation. These modules may create additional data points or perform any other action. When all registered callbacks are processed, the resulting data is written to two collections: The data-point (DP) history collection, where the raw data-points are stored until archivation, and the profile history collection, where a document is stored for each entity id with the relevant history. You can find these collections in the database under the names {entity}#raw
and {entity}#master
.
DP\u00b3 periodically creates new profile snapshots, triggered by the Scheduler. Snapshots take the profile history, and compute the current value of the profile, reducing each attribute history to a single value. The snapshot creation frequency is configurable. Snapshots are created on a per-entity basis, but all linked entities are processed at the same time. This means that when snapshots are created, the registered snapshot callbacks can access any linked entities for their data correlation needs. After all the correlation callbacks are called, the snapshot is written to the profile snapshot collection, for which it can be accessed via the API. The collection is accessible under the name {entity}#snapshots
.
Basic elements of the DP\u00b3 data model are entities (or objects), each entity record (object instance) has a set of attributes. Each attribute has some value (associated to a particular entity), timestamp (history of previous values can be stored) and optionally confidence value.
Entities may be mutually connected. See Relationships below.
"},{"location":"data_model/#exemplary-system","title":"Exemplary system","text":"In this chapter, we will illustrate details on an exemplary system. Imagine you are developing data model for bus tracking system. You have to store these data:
Also, map displaying current position of all buses is required.
(In case you are interested, configuration of database entities for this system is available in DB entities chapter.)
To make everything clear and more readable, all example references below are typesetted as quotes.
"},{"location":"data_model/#types-of-attributes","title":"Types of attributes","text":"There are 3 types of attributes:
"},{"location":"data_model/#plain","title":"Plain","text":"Common attributes with only one value of some data type. There's no history stored, but timestamp of last change is available.
Very useful for:
data from external source, when you only need to have current value
notes and other manually entered information
This is exactly what we need for label in our bus tracking system. Administor labels particular bus inside web interface and we use this label until it's changed - particularly display label next to a marker on a map. No history is needed and it has 100% confidence.
"},{"location":"data_model/#observations","title":"Observations","text":"Attributes with history of values at some time or interval of time. Consequently, we can derive value at any time (most often not now) from these values.
Each value may have associated confidence.
These attributes may be single or multi value (multiple current values in one point in time).
Very useful for data where both current value and history is needed.
In our example, location is great use-case for observations type. We need to track position of the bus in time and store the history. Current location is very important. Let's suppose, we also need to do oversampling by predicting where is the bus now, eventhout we received last data-point 2 minutes ago. This is all possible (predictions using custom secondary modules).
The same applies to speed. It can also be derived from location.
"},{"location":"data_model/#timeseries","title":"Timeseries","text":"One or more numeric values for a particular time.
In this attribute type: history > current value. In fact, no explicit current value is provided.
Very useful for:
any kind of history-based analysis
logging of events/changes
May be:
regular: sampling is regular Example: datapoint is created every x minutes
irregular: sampling is irregular Example: datapoint is created when some event occurs
irregular intervals: sampling is irregular and includes two timestamps (from when till when were provided data gathered) Example: Some event triggers 5 minute monitoring routine. When this routine finishes, it creates datapoint containing all the data from past 5 minutes.
Timeseries are very useful for passengers getting in and out (from our example). As we need to count two directions (in/out) for three doors (front/middle/back), we create 6 series (e.g. front_in
, front_out
, ..., back_out
). Counter data-points are received in 10 minute interval, so regular timeseries are best fit for this use-case. Every 10 minutes we receive values for all 6 series and store them. Current value is not important as these data are only useful for passenger flow analysis throught whole month/year/...
Relationships between entities can be represented with or without history. They are realized using the link attribute type. Depedning on whether the history is important, they can be configured using as the mentioned plain data or observations.
Relationships can contain additional data, if that fits the modelling needs of your use case.
Very useful for:
As our example so far contains only one entity, we currently have no need for relationships. However, if we wanted to track the different bus drivers driving individual buses, relationships would come in quite handy. The bus driver is a separate entity, and can drive multiple buses during the day. The current bus driver will be represented as an observation link between the bus and the driver, as can be seen in the resulting configuration.
"},{"location":"data_model/#continue-to","title":"Continue to ...","text":"Now that you have an understanding of the data model and the types of attributes, you might want to check out the details of DB configuration, where you will find the parameters for each attribute type and the data types supported by the platform.
"},{"location":"extending/","title":"Extending Documentation","text":"This page provides the basic info on where to start with writing documentation. If you feel lost at any point, please check out the documentation of MkDocs and Material for MkDocs, with which this documentation is built.
"},{"location":"extending/#project-layout","title":"Project layout","text":"mkdocs.yml # The configuration file.\ndocs/\n index.md # The documentation homepage.\n gen_ref_pages.py # Script for generating the code reference.\n ... # Other markdown pages, images and other files.\n
The docs/
folder contains all source Markdown files for the documentation.
You can find all documentation settings in mkdocs.yml
. See the nav
section for mapping of the left navigation tab and the Markdown files.
To see the changes made to the documentation page locally, a local instance of mkdocs
is required. You can install all the required packages using:
pip install -r requirements.doc.txt\n
After installing, you can use the following mkdocs
commands:
mkdocs serve
- Start the live-reloading docs server.mkdocs build
- Build the documentation site.mkdocs -h
- Print help message and exit.As the entire documentation is written in Markdown, all base Markdown syntax is supported. This means headings, bold text, italics, inline code
, tables and many more.
This set of options can be further extended, if you ever find the need. See the possibilities in the Material theme reference.
Some of the enabled extensionsmarkdown_extensions
section in mkdocs.yml
for all enabled extensions.To reference an anchor within a page, such as a heading, use a Markdown link to the specific anchor, for example: Commands. If you're not sure which identifier to use, you can look at a heading's anchor by clicking the heading in your Web browser, either in the text itself, or in the table of contents. If the URL is https://example.com/some/page/#anchor-name
then you know that this item is possible to link to with [<displayed text>](#anchor-name)
. (Tip taken from mkdocstrings)
To make a reference to another page within the documentation, use the path to the Markdown source file, followed by the desired anchor. For example, this link was created as [link](index.md#repository-structure)
.
When making references to the generated Code Reference, there are two options. Links can be made either using the standard Markdown syntax, where some reverse-engineering of the generated files is required, or, with the support of mkdocstrings, using the [example][full.path.to.object]
syntax. A real link like this can be for example this one to the Platform Model Specification.
Code reference is generated using mkdocstrings and the Automatic code reference pages recipe from their documentation. The generation of pages is done using the docs/gen_ref_pages.py
script. The script is a slight modification of what is recommended within the mentioned recipe.
Mkdocstrings itself enables generating code documentation from its docstrings using a path.to.object
syntax. Here is an example of documentation for dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.register
method:
There are additional options that can be specified, which affect the way the documentation is presented. For more on these options, see here.
Even if you create a duplicate code reference description, the mkdocstring-style link still leads to the code reference, as you can see here.
"},{"location":"extending/#dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.register","title":"register","text":"register(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)\n
Registers passed timeseries hook to be called during snapshot creation.
Binds hook to specified entity_type and attr_type (though same hook can be bound multiple times). If entity_type and attr_type do not specify a valid timeseries attribute, a ValueError is raised.
Parameters:
Name Type Description Defaulthook
Callable[[str, str, list[dict]], list[DataPointTask]]
hook
callable should expect entity_type, attr_type and attribute history as arguments and return a list of Task
objects.
entity_type
str
specifies entity type
requiredattr_type
str
specifies attribute type
required"},{"location":"extending/#deployment","title":"Deployment","text":"The documentation is updated and deployed automatically with each push to selected branches thanks to the configured GitHub Action, which can be found in: .github/workflows/deploy.yml
.
When talking about installing the DP\u00b3 platform, a distinction must be made between installing for platform development, installing for application development (i.e. platform usage) and installing for application and platform deployment. We will cover all three cases separately.
"},{"location":"install/#installing-for-application-development","title":"Installing for application development","text":"Pre-requisites: Python 3.9 or higher, pip
(with virtualenv
installed), git
, Docker
and Docker Compose
.
Create a virtualenv and install the DP\u00b3 platform using:
python3 -m venv venv # (1)!\nsource venv/bin/activate # (2)!\npython -m pip install --upgrade pip # (3)!\npip install git+https://github.com/CESNET/dp3.git@new_dp3#egg=dp3\n
python3
does not work, try py -3
or python
instead.venv/Scripts/activate.bat
pip>=21.0.1
for the pyproject.toml
support. If your pip is up-to-date, you can skip this step.To create a new DP\u00b3 application we will use the included dp3-setup
utility. Run:
dp3-setup <application_directory> <your_application_name>
So for example, to create an application called my_app
in the current directory, run:
dp3-setup . my_app\n
This produces the following directory structure:
\ud83d\udcc2 .\n \u251c\u2500\u2500 \ud83d\udcc1 config # (1)! \n\u2502 \u251c\u2500\u2500 \ud83d\udcc4 api.yml\n \u2502 \u251c\u2500\u2500 \ud83d\udcc4 control.yml\n \u2502 \u251c\u2500\u2500 \ud83d\udcc4 database.yml\n \u2502 \u251c\u2500\u2500 \ud83d\udcc1 db_entities # (2)!\n\u2502 \u251c\u2500\u2500 \ud83d\udcc4 event_logging.yml\n \u2502 \u251c\u2500\u2500 \ud83d\udcc4 history_manager.yml\n \u2502 \u251c\u2500\u2500 \ud83d\udcc1 modules # (3)!\n\u2502 \u251c\u2500\u2500 \ud83d\udcc4 processing_core.yml\n \u2502 \u251c\u2500\u2500 \ud83d\udcc4 README.md\n \u2502 \u2514\u2500\u2500 \ud83d\udcc4 snapshots.yml\n \u251c\u2500\u2500 \ud83d\udcc1 docker # (4)!\n\u2502 \u251c\u2500\u2500 \ud83d\udcc1 python\n \u2502 \u2514\u2500\u2500 \ud83d\udcc1 rabbitmq\n \u251c\u2500\u2500 \ud83d\udcc4 docker-compose.app.yml\n \u251c\u2500\u2500 \ud83d\udcc4 docker-compose.yml\n \u251c\u2500\u2500 \ud83d\udcc1 modules # (5)!\n\u2502 \u2514\u2500\u2500 \ud83d\udcc4 test_module.py\n \u251c\u2500\u2500 \ud83d\udcc4 README.md # (6)!\n\u2514\u2500\u2500 \ud83d\udcc4 requirements.txt\n
config
directory contains the configuration files for the DP\u00b3 platform. For more details, please check out the configuration documentation.config/db_entities
directory contains the database entities of the application. This defines the data model of your application. For more details, you may want to check out the data model and the DB entities documentation.config/modules
directory is where you can place the configuration specific to your modules.docker
directory contains the Dockerfiles for the RabbitMQ and python images, tailored to your application. modules
directory contains the modules of your application. To get started, a single module called test_module
is included. For more details, please check out the Modules page.README.md
file contains some instructions to get started. Edit it to your liking.To run the application, we first need to setup the other services the platform depends on, such as the MongoDB database, the RabbitMQ message distribution and the Redis database. This can be done using the supplied docker-compose.yml
file. Simply run:
docker compose up -d --build # (1)!\n
-d
flag runs the services in the background, so you can continue working in the same terminal. The --build
flag forces Docker to rebuild the images, so you can be sure you are running the latest version. If you want to run the services in the foreground, omit the -d
flag.The state of running containers can be checked using:
docker compose ps\n
which will display the state of running processes. The logs of the services can be displayed using:
docker compose logs\n
which will display the logs of all services, or:
docker compose logs <service name>\n
which will display only the logs of the given service. (In this case, the services are rabbitmq, mongo, mongo_express, and redis)
We can now focus on running the platform and developing or testing. After you are done, simply run:
docker compose down\n
which will stop and remove all containers, networks and volumes created by docker compose up
.
There are two main ways to run the application itself. First is a little more hand-on, and allows easier debugging. There are two main kinds of processes in the application: the API and the worker processes.
To run the API, simply run:
APP_NAME=my_app CONF_DIR=config api\n
The starting configuration sets only a single worker process, which you can run using:
worker my_app config 0
The second way is to use the docker-compose.app.yml
file, which runs the API and the worker processes in separate containers. To run the API, simply run:
docker compose -f docker-compose.app.yml up -d --build\n
Either way, to test that everything is running properly, you can run:
curl -X 'GET' 'http://localhost:5000/' \\\n-H 'Accept: application/json'
Which should return a JSON response with the following content:
{\n\"detail\": \"It works!\"\n}\n
You are now ready to start developing your application!
"},{"location":"install/#installing-for-platform-development","title":"Installing for platform development","text":"Pre-requisites: Python 3.9 or higher, pip
(with virtualenv
installed), git
, Docker
and Docker Compose
.
Pull the repository and install using:
git clone --branch new_dp3 git@github.com:CESNET/dp3.git dp3 cd dp3\npython3 -m venv venv # (1)!\nsource venv/bin/activate # (2)!\npython -m pip install --upgrade pip # (3)!\npip install --editable \".[dev]\" # (4)!\npre-commit install # (5)!\n
python3
does not work, try py -3
or python
instead.venv/Scripts/activate.bat
pip>=21.0.1
for the pyproject.toml
support. If your pip is up-to-date, you can skip this step.pre-commit
and mkdocs
.pre-commit
hooks to automatically format and lint the code before committing.With the dependencies, the pre-commit package is installed. You can verify the installation using pre-commit --version
. Pre-commit is used to automatically unify code formatting and perform code linting. The hooks configured in .pre-commit-config.yaml
should now run automatically on every commit.
In case you want to make sure, you can run pre-commit run --all-files
to see it in action.
The DP\u00b3 platform is now installed and ready for development. To run it, we first need to set up the other services the platform depends on, such as the MongoDB database, the RabbitMQ message distribution and the Redis database. This can be done using the supplied docker-compose.yml
file. Simply run:
docker compose up -d --build # (1)!\n
-d
flag runs the services in the background, so you can continue working in the same terminal. The --build
flag forces Docker to rebuild the images, so you can be sure you are running the latest version. If you want to run the services in the foreground, omit the -d
flag.Docker Compose can be installed as a standalone (older v1) or as a plugin (v2), the only difference is when executing the command:
Note that Compose standalone uses the dash compose syntax instead of current\u2019s standard syntax (space compose). For example: type docker-compose up
when using Compose standalone, instead of docker compose up
.
This documentation uses the v2 syntax, so if you have the standalone version installed, adjust accordingly.
After the first compose up
command, the images for RabbitMQ, MongoDB and Redis will be downloaded, their images will be built according to the configuration and all three services will be started. On subsequent runs, Docker will use the cache, so if the configuration does not change, the download and build steps will not be repeated.
The configuration is taken implicitly from the docker-compose.yml
file in the current directory. The docker-compose.yml
configuration contains the configuration for the services, as well as a testing setup of the DP\u00b3 platform itself. The full configuration is in tests/test_config
. The setup includes one worker process and one API process to handle requests. The API process is exposed on port 5000, so you can send requests to it using curl
or from your browser:
curl -X 'GET' 'http://localhost:5000/' \\\n-H 'Accept: application/json'
curl -X 'POST' 'http://localhost:5000/datapoints' \\\n-H 'Content-Type: application/json' \\\n--data '[{\"type\": \"test_entity_type\", \"id\": \"abc\", \"attr\": \"test_attr_int\", \"v\": 123, \"t1\": \"2023-07-01T12:00:00\", \"t2\": \"2023-07-01T13:00:00\"}]'\n
Docker Compose basics The state of running containers can be checked using:
docker compose ps\n
which will display the state of running processes. The logs of the services can be displayed using:
docker compose logs\n
which will display the logs of all services, or:
docker compose logs <service name>\n
which will display only the logs of the given service. (In this case, the services are rabbitmq, mongo, redis, receiver_api and worker)
We can now focus on running the platform and developing or testing. After you are done, simply run:
docker compose down\n
which will stop and remove all containers, networks and volumes created by docker compose up
.
With the testing platform setup running, we can now run tests. Tests are run using the unittest
framework and can be run using:
python -m unittest discover \\\n-s tests/test_common \\\n-v\nCONF_DIR=tests/test_config \\\npython -m unittest discover \\\n-s tests/test_api \\\n-v\n
"},{"location":"install/#documentation","title":"Documentation","text":"To extend this documentation, please refer to the Extending page.
"},{"location":"modules/","title":"Modules","text":"DP\u00b3 enables its users to create custom modules to perform application-specific data analysis. Modules are loaded using a plugin-like architecture and can influence the data flow from the very first moment, upon handling the data-point push request.
As described in the Architecture page, DP\u00b3 uses a categorization of modules into primary and secondary modules. The distinction between primary and secondary modules is such that primary modules send data-points into the system using the HTTP API, while secondary modules react to the data present in the system, e.g.: altering the data-flow in an application-specific manner, deriving additional data based on incoming data-points or performing data correlation on entity snapshots.
This page covers the DP\u00b3 API for secondary modules. For primary module implementation, the API documentation may be useful; also feel free to check out the dummy_sender script in /scripts/dummy_sender.py
.
First, make a directory that will contain all modules of the application. For example, let's assume that the directory will be called /modules/
.
As mentioned in the Processing core configuration page, the modules directory must be specified in the modules_dir
configuration option. Let's create the main module file now - assuming the module will be called my_awesome_module
, create a file /modules/my_awesome_module.py
.
Finally, to make the processing core load the module, add the module name to the enabled_modules
configuration option, e.g.:
modules_dir: \"/modules/\"\nenabled_modules:\n- \"my_awesome_module\"\n
Here is a basic skeleton for the module file:
import logging\nfrom dp3.common.base_module import BaseModule\nfrom dp3.common.config import PlatformConfig\nfrom dp3.common.callback_registrar import CallbackRegistrar\nclass MyAwesomeModule(BaseModule):\ndef __init__(self,\n_platform_config: PlatformConfig, \n_module_config: dict, \n_registrar: CallbackRegistrar\n):\nself.log = logging.getLogger(\"MyAwesomeModule\")\n
All modules must subclass the BaseModule
class. If a class does not subclass the BaseModule
class, it will not be loaded and activated by the main DP\u00b3 worker. The declaration of BaseModule
is as follows:
class BaseModule(ABC):\n@abstractmethod\ndef __init__(\nself, \nplatform_config: PlatformConfig, \nmodule_config: dict, \nregistrar: CallbackRegistrar\n):\npass\n
At initialization, each module receives a PlatformConfig
, a module_config
dictionary and a CallbackRegistrar
. For the module to do anything, it must read the provided configuration from platform_config
and module_config
and register callbacks to perform data analysis using the registrar
object. Let's go through them one at a time.
PlatformConfig
contains the entire DP\u00b3 platform configuration, which includes the application name, worker counts, which worker processes is the module running in and a ModelSpec
which contains the entity specification.
If you want to create configuration specific to the module itself, create a .yml
configuration file named after the module itself inside the modules/
folder, as described in the modules configuration page. This configuration will then be loaded into the module_config
dictionary for convenience.
The registrar:
CallbackRegistrar
object provides the API to register callbacks to be called during the data processing.
For callbacks that need to be called periodically, the scheduler_register
is used. The specific times the callback will be called are defined using the CRON schedule expressions. Here is a simplified example from the HistoryManager module:
registrar.scheduler_register(\nself.delete_old_dps, minute=\"*/10\" # (1)!\n)\nregistrar.scheduler_register(\nself.archive_old_dps, minute=0, hour=2 # (2)!\n) \n
By default, the callback will receive no arguments, but you can pass static arguments for every call using the func_args
and func_kwargs
keyword arguments. The function return value will always be ignored.
The complete documentation can be found at the scheduler_register
page. As DP\u00b3 utilizes the APScheduler package internally to realize this functionality, specifically the CronTrigger
, feel free to check their documentation for more details.
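The func_args / func_kwargs mechanism described above can be sketched with a stdlib-only stub. StubRegistrar is hypothetical and merely records registrations — only the call shape of scheduler_register (function, func_args, func_kwargs, CRON keyword arguments) follows the documentation; the real CallbackRegistrar hands the job to APScheduler instead.

```python
class StubRegistrar:
    """Hypothetical stand-in for CallbackRegistrar, for illustration only."""

    def __init__(self):
        self.jobs = []

    def scheduler_register(self, func, func_args=None, func_kwargs=None, **cron):
        # The real registrar would create an APScheduler CronTrigger here;
        # the stub just records what was registered.
        self.jobs.append((func, func_args or [], func_kwargs or {}, cron))


def report(label, *, limit):
    # Example callback; its return value would be ignored by the platform.
    return f"{label}: keeping {limit} items"


registrar = StubRegistrar()
registrar.scheduler_register(
    report,
    func_args=["cleanup"],
    func_kwargs={"limit": 100},
    minute="*/10",  # CRON expression: run every 10 minutes
)

func, args, kwargs, cron = registrar.jobs[0]
print(func(*args, **kwargs))  # -> cleanup: keeping 100 items
```

The same static arguments are passed on every scheduled invocation; per-call state has to come from elsewhere (e.g. the module instance).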
There are a number of possible places to register callback functions during data-point processing.
"},{"location":"modules/#task-on_task_start-hook","title":"Taskon_task_start
hook","text":"A hook will be called on task processing start. The callback is registered using the register_task_hook
method. Required signature is Callable[[DataPointTask], Any]
, as the return value is ignored. It may be useful for implementing custom statistics.
def task_hook(task: DataPointTask):\nprint(task.etype)\nregistrar.register_task_hook(\"on_task_start\", task_hook)\n
"},{"location":"modules/#entity-allow_entity_creation-hook","title":"Entity allow_entity_creation
hook","text":"Receives eid and Task, may prevent entity record creation (by returning False). The callback is registered using the register_entity_hook
method. Required signature is Callable[[str, DataPointTask], bool]
.
def entity_creation(eid: str, task: DataPointTask) -> bool:\nreturn eid.startswith(\"1\")\nregistrar.register_entity_hook(\n\"allow_entity_creation\", entity_creation, \"test_entity_type\"\n)\n
"},{"location":"modules/#entity-on_entity_creation-hook","title":"Entity on_entity_creation
hook","text":"Receives eid and Task, may return new DataPointTasks.
The callback is registered using the register_entity_hook
method. Required signature is Callable[[str, DataPointTask], list[DataPointTask]]
.
def processing_function(eid: str, task: DataPointTask) -> list[DataPointTask]:\noutput = does_work(task)\nreturn [DataPointTask(\nmodel_spec=task.model_spec,\netype=\"mac\",\neid=eid,\ndata_points=[{\n\"etype\": \"test_enitity_type\",\n\"eid\": eid,\n\"attr\": \"derived_on_creation\",\n\"src\": \"secondary/derived_on_creation\",\n\"v\": output\n}]\n)]\nregistrar.register_entity_hook(\n\"on_entity_creation\", processing_function, \"test_entity_type\"\n)\n
"},{"location":"modules/#attribute-hooks","title":"Attribute hooks","text":"There are register points for all attribute types: on_new_plain
, on_new_observation
, on_new_ts_chunk
.
Callbacks are registered using the register_attr_hook
method. The callback allways receives eid, attribute and Task, and may return new DataPointTasks. The required signature is Callable[[str, DataPointBase], list[DataPointTask]]
.
def attr_hook(eid: str, dp: DataPointBase) -> list[DataPointTask]:\n...\nreturn []\nregistrar.register_attr_hook(\n\"on_new_observation\", attr_hook, \"test_entity_type\", \"test_attr_type\",\n)\n
"},{"location":"modules/#timeseries-hook","title":"Timeseries hook","text":"Timeseries hooks are run before snapshot creation, and allow to process the accumulated timeseries data into observations / plain attributes to be accessed in snapshots.
Callbacks are registered using the register_timeseries_hook
method. The expected callback signature is Callable[[str, str, list[dict]], list[DataPointTask]]
, as the callback should expect entity_type, attr_type and attribute history as arguments and return a list of DataPointTask objects.
def timeseries_hook(\nentity_type: str, attr_type: str, attr_history: list[dict]\n) -> list[DataPointTask]:\n...\nreturn []\nregistrar.register_timeseries_hook(\ntimeseries_hook, \"test_entity_type\", \"test_attr_type\",\n)\n
"},{"location":"modules/#correlation-callbacks","title":"Correlation callbacks","text":"Correlation callbacks are called during snapshot creation, and allow to perform analysis on the data of the snapshot.
The register_correlation_hook
method expects a callable with the following signature: Callable[[str, dict], None]
, where the first argument is the entity type, and the second is a dict containing the current values of the entity and its linked entities.
As correlation hooks can depend on each other, the hook inputs and outputs must be specified using the depends_on and may_change arguments. Both arguments are lists of lists of strings, where each list of strings is a path from the specified entity type to individual attributes (even on linked entities). For example, if the entity type is test_entity_type
, and the hook depends on the attribute test_attr_type1
, the path is simply [[\"test_attr_type1\"]]
. If the hook depends on the attribute test_attr_type1
of an entity linked using test_attr_link
, the path will be [[\"test_attr_link\", \"test_attr_type1\"]]
.
def correlation_hook(entity_type: str, values: dict):\n...\nregistrar.register_correlation_hook(\ncorrelation_hook, \"test_entity_type\", [[\"test_attr_type1\"]], [[\"test_attr_type2\"]]\n)\n
The order of running callbacks is determined automatically, based on the dependencies. If there is a cycle in the dependencies, a ValueError
will be raised at registration. Also, if the provided dependency / output paths are invalid, a ValueError
will be raised.
The module is free to run its own code in separate threads or processes. To synchronize such code with the platform, use the start()
and stop()
methods of the BaseModule
class. the start()
method is called after the platform is initialized, and the stop()
method is called before the platform is shut down.
class MyModule(BaseModule):\ndef __init__(self, *args, **kwargs):\nsuper().__init__(*args, **kwargs)\nself._thread = None\nself._stop_event = threading.Event()\nself.log = logging.getLogger(\"MyModule\")\ndef start(self):\nself._thread = threading.Thread(target=self._run, daemon=True)\nself._thread.start()\ndef stop(self):\nself._stop_event.set()\nself._thread.join()\ndef _run(self):\nwhile not self._stop_event.is_set():\nself.log.info(\"Hello world!\")\ntime.sleep(1)\n
"},{"location":"configuration/","title":"Configuration","text":"DP\u00b3 configuration folder consists of these files and folders:
db_entities/\nmodules/\ncommon.yml\ndatabase.yml\nevent_logging.yml\nhistory_manager.yml\nprocessing_core.yml\nsnapshots.yml\n
Their meaning and usage is explained in following chapters.
"},{"location":"configuration/#example-configuration","title":"Example configuration","text":"Example configuration is included config/
folder in DP\u00b3 repository.
File database.yml
specifies mainly MongoDB database connection details and credentials.
It looks like this:
connection:\nusername: \"dp3_user\"\npassword: \"dp3_password\"\naddress: \"127.0.0.1\"\nport: 27017\ndb_name: \"dp3_database\"\n
"},{"location":"configuration/database/#connection","title":"Connection","text":"Connection details contain:
Parameter Data-type Default value Descriptionusername
string dp3
Username for connection to DB. Escaped using urllib.parse.quote_plus
. password
string dp3
Password for connection to DB. Escaped using urllib.parse.quote_plus
. address
string localhost
IP address or hostname for connection to DB. port
int 27017 Listening port of DB. db_name
string dp3
Database name to be utilized by DP\u00b3."},{"location":"configuration/db_entities/","title":"DB entities","text":"Files in db_entities
folder describe entities and their attributes. You can think of entity as class from object-oriented programming.
Below is YAML file (e.g. db_entities/bus.yml
) corresponding to bus tracking system example from Data model chapter.
entity:\nid: bus\nname: Bus\nattribs:\n# Attribute `label`\nlabel:\nname: Label\ndescription: Custom label for the bus.\ntype: plain\ndata_type: string\neditable: true\n# Attribute `location`\nlocation:\nname: Location\ndescription: Location of the bus in a particular time. Value are GPS \\\ncoordinates (array of latitude and longitude).\ntype: observations\ndata_type: array<float>\nhistory_params:\npre_validity: 1m\npost_validity: 1m\nmax_age: 30d\n# Attribute `speed`\nspeed:\nname: Speed\ndescription: Speed of the bus in a particular time. In km/h.\ntype: observations\ndata_type: float\nhistory_params:\npre_validity: 1m\npost_validity: 1m\nmax_age: 30d\n# Attribute `passengers_in_out`\npassengers_in_out:\nname: Passengers in/out\ndescription: Number of passengers getting in or out of the bus. Distinguished by the doors used (front, middle, back). Regularly sampled every 10 minutes.\ntype: timeseries\ntimeseries_type: regular\ntimeseries_params:\nmax_age: 14d\ntime_step: 10m\nseries:\nfront_in:\ndata_type: int\nfront_out:\ndata_type: int\nmiddle_in:\ndata_type: int\nmiddle_out:\ndata_type: int\nback_in:\ndata_type: int\nback_out:\ndata_type: int\n# Attribute `driver` to link the driver of the bus at a given time.\ndriver:\nname: Driver\ndescription: Driver of the bus at a given time.\ntype: observations\ndata_type: link<driver>\nhistory_params:\npre_validity: 1m\npost_validity: 1m\nmax_age: 30d\n
"},{"location":"configuration/db_entities/#entity","title":"Entity","text":"Entity is described simply by:
Parameter Data-type Default value Descriptionid
string (identifier) (mandatory) Short string identifying the entity type, it's machine name (must match regex [a-zA-Z_][a-zA-Z0-9_-]*
). Lower-case only is recommended. name
string (mandatory) Attribute name for humans. May contain any symbols."},{"location":"configuration/db_entities/#attributes","title":"Attributes","text":"Each attribute is specified by the following set of parameters:
"},{"location":"configuration/db_entities/#base","title":"Base","text":"These apply to all types of attributes (plain, observations and timeseries).
Parameter Data-type Default value Descriptionid
string (identifier) (mandatory) Short string identifying the attribute, it's machine name (must match this regex [a-zA-Z_][a-zA-Z0-9_-]*
). Lower-case only is recommended. type
string (mandatory) Type of attribute. Can be either plain
, observations
or timeseries
. name
string (mandatory) Attribute name for humans. May contain any symbols. description
string \"\"
Longer description of the attribute, if needed. color
#xxxxxx
null
Color to use in GUI (useful mostly for tag values), not used currently."},{"location":"configuration/db_entities/#plain-specific-parameters","title":"Plain-specific parameters","text":"Parameter Data-type Default value Description data_type
string (mandatory) Data type of attribute value, see Supported data types. categories
array of strings null
List of categories if data_type=category
and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example). editable
bool false
Whether value of this attribute is editable via web interface."},{"location":"configuration/db_entities/#observations-specific-parameters","title":"Observations-specific parameters","text":"Parameter Data-type Default value Description data_type
string (mandatory) Data type of attribute value, see Supported data types. categories
array of strings null
List of categories if data_type=category
and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example). editable
bool false
Whether value of this attribute is editable via web interface. confidence
bool false
Whether a confidence value should be stored along with data value or not. multi_value
bool false
Whether multiple values can be set at the same time. history_params
object, see below (mandatory) History and time aggregation parameters. A subobject with fields described in the table below. history_force_graph
bool false
By default, if data type of attribute is array, we show it's history on web interface as table. This option can force tag-like graph with comma-joined values of that array as tags."},{"location":"configuration/db_entities/#history-params","title":"History params","text":"Description of history_params
subobject (see table above).
max_age
<int><s/m/h/d>
(e.g. 30s
, 12h
, 7d
) null
How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed). max_items
int (> 0) null
How many data-points/intervals to store (oldest ones are removed when limit is exceeded). Currently not implemented. expire_time
<int><s/m/h/d>
or inf
(infinity) infinity How long after the end time (t2
) is the last value considered valid (i.e. is used as \"current value\"). Zero (0
) means to strictly follow t1
, t2
. Zero can be specified without a unit (s/m/h/d
). Currently not implemented. pre_validity
<int><s/m/h/d>
(e.g. 30s
, 12h
, 7d
) 0s
Max time before t1
for which the data-point's value is still considered to be the \"current value\" if there's no other data-point closer in time. post_validity
<int><s/m/h/d>
(e.g. 30s
, 12h
, 7d
) 0s
Max time after t2
for which the data-point's value is still considered to be the \"current value\" if there's no other data-point closer in time. Note: At least one of max_age
and max_items
SHOULD be defined, otherwise the amount of stored data can grow unbounded.
timeseries_type
string (mandatory) One of: regular
, irregular
or irregular_intervals
. See chapter Data model for explanation. series
object of objects, see below (mandatory) Configuration of series of data represented by this timeseries attribute. timeseries_params
object, see below Other timeseries parameters. A subobject with fields described by the table below."},{"location":"configuration/db_entities/#series","title":"Series","text":"Description of series
subobject (see table above).
Key for series
object is id
- short string identifying the series (e.g. bytes
, temperature
, parcels
).
type
string (mandatory) Data type of series. Only int
and float
are allowed (also time
, but that's used internally, see below). Time series
(axis) is added implicitly by DP\u00b3 and this behaviour is specific to selected timeseries_type
:
\"time\": { \"data_type\": \"time\" }
\"time\": { \"data_type\": \"time\" }
\"time_first\": { \"data_type\": \"time\" }, \"time_last\": { \"data_type\": \"time\" }
Description of timeseries_params
subobject (see table above).
max_age
<int><s/m/h/d>
(e.g. 30s
, 12h
, 7d
) null
How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed). time_step
<int><s/m/h/d>
(e.g. 30s
, 12h
, 7d
) (mandatory) for regular timeseries, null
otherwise \"Sampling rate in time\" of this attribute. For example, with time_step = 10m
we expect data-point at 12:00, 12:10, 12:20, 12:30,... Only relevant for regular timeseries. Note: max_age
SHOULD be defined, otherwise the amount of stored data can grow unbounded.
List of supported values for parameter data_type
:
tag
: set/not_set (When the attribute is set, its value is always assumed to be true
, the \"v\" field doesn't have to be stored.)binary
: true
/false
/not_set (Attribute value is true
or false
, or the attribute is not set at all.)category<data_type; category1, category2, ...>
: Categorical values. Use only when a fixed set of values should be allowed, which should be specified in the second part of the type definition. The first part of the type definition describes the data_type of the category.string
int
: 32-bit signed integer (range from -2147483648 to +2147483647)int64
: 64-bit signed integer (use when the range of normal int
is not sufficent)float
time
: Timestamp in YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [\u00b1]HH[:]MM]
format or timestamp since 1.1.1970 in seconds or milliseconds.ip4
: IPv4 address (passed as dotted-decimal string)ip6
: IPv6 address (passed as string in short or full format)mac
: MAC address (passed as string)link<entity_type>
: Link to a record of the specified type, e.g. link<ip>
link<entity_type,data_type>
: Link to a record of the specified type, carrying additional data, e.g. link<ip,int>
array<data_type>
: An array of values of specified data type (which must be one of the types above), e.g. array<int>
set<data_type>
: Same as array, but values can't repeat and order is irrelevant.dict<keys>
: Dictionary (object) containing multiple values as subkeys. keys should contain a comma-separated list of key names and types separated by colon, e.g. dict<port:int,protocol:string,tag?:string>
. By default, all fields are mandatory (i.e. a data-point missing some subkey will be refused), to mark a field as optional, put ?
after its name. Only the following data types can be used here: binary,category,string,int,float,time,ip4,ip6,mac
. Multi-level dicts are not supported.json
: Any JSON object can be stored, all processing is handled by user's code. This is here for special cases which can't be mapped to any data type above.Event logging is done using Redis and allows to count arbitrary events across multiple processes (using shared counters in Redis) and in various time intervals.
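The dict rules above (all keys mandatory by default, a trailing ? marking a key as optional) can be sketched as a small validator. This is an illustration only, not DP\u00b3's actual validation code:

```python
# Spec mirroring dict<port:int,protocol:string,tag?:string>;
# a key ending with "?" is optional.
spec = {"port": int, "protocol": str, "tag?": str}

def validate(value: dict, spec: dict) -> bool:
    types = {k.rstrip("?"): t for k, t in spec.items()}
    required = {k for k in spec if not k.endswith("?")}
    return (
        required <= set(value)        # all mandatory keys present
        and set(value) <= set(types)  # no unknown keys
        and all(isinstance(value[k], types[k]) for k in value)
    )

print(validate({"port": 443, "protocol": "tcp"}, spec))   # True (tag is optional)
print(validate({"protocol": "tcp", "tag": "web"}, spec))  # False (port missing)
```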
More information can be found in Github repository of EventCountLogger.
Configuration file event_logging.yml
looks like this:
redis:\nhost: localhost\nport: 6379\ndb: 1\ngroups:\n# Main events of Task execution\nte:\nevents:\n- task_processed\n- task_processing_error\nintervals: [ \"5m\", \"2h\" ] # (1)!\nsync-interval: 1 # (2)!\n# Number of processed tasks by their \"src\" attribute\ntasks_by_src:\nevents: [ ]\nauto_declare_events: true\nintervals: [ \"5s\", \"5m\" ]\nsync-interval: 1\n
This section describes Redis connection details:
Parameter Data-type Default value Descriptionhost
string localhost
IP address or hostname for connection to Redis. port
int 6379 Listening port of Redis. db
int 0 Index of Redis DB used for the counters (it shouldn't be used for anything else)."},{"location":"configuration/event_logging/#groups","title":"Groups","text":"The default configuration groups enables logging of events in task execution, namely task_processed
and task_processing_error
.
To learn more about the group configuration for EventCountLogger, please refer to the official documentation.
"},{"location":"configuration/history_manager/","title":"History manager","text":"History manager is reponsible for deleting old records from master records in database.
Configuration file history_manager.yml
is very simple:
datapoint_cleaning:\ntick_rate: 10\n
Parameter tick_rate
sets interval how often (in minutes) should DP\u00b3 check if any data in master record of observations and timeseries attributes isn't too old and if there's something too old, removes it. To control what is considered as \"too old\", see parameter max_age
in Database entities configuration.
Folder modules/
optionally contains any module-specific configuration.
This configuration doesn't have to follow any required format (except being YAML files).
In secondary modules, you can access the configuration:
from dp3 import g\nprint(g.config[\"modules\"][\"MODULE_NAME\"])\n
Here, the MODULE_NAME
corresponds to MODULE_NAME.yml
file in modules/
folder.
Processing core's configuration in processing_core.yml
file looks like this:
msg_broker:\nhost: localhost\nport: 5672\nvirtual_host: /\nusername: dp3_user\npassword: dp3_password\nworker_processes: 2\nworker_threads: 16\nmodules_dir: \"../dp3_modules\"\nenabled_modules:\n- \"module_one\"\n- \"module_two\"\n
"},{"location":"configuration/processing_core/#message-broker","title":"Message broker","text":"Message broker section describes connection details to RabbitMQ (or compatible) broker.
Parameter Data-type Default value Descriptionhost
string localhost
IP address or hostname for connection to broker. port
int 5672 Listening port of broker. virtual_host
string /
Virtual host for connection to broker. username
string guest
Username for connection to broker. password
string guest
Password for connection to broker."},{"location":"configuration/processing_core/#worker-processes","title":"Worker processes","text":"Number of worker processes. This has to be at least 1.
If changing number of worker processes, the following process must be followed:
/scripts/rmq_reconfigure.sh
supervisorctl
) and start all inputs againNumber of worker threads per process.
This may be higher than number of CPUs, because this is not primarily intended to utilize computational power of multiple CPUs (which Python cannot do well anyway due to the GIL), but to mask long I/O operations (e.g. queries to external services via network).
"},{"location":"configuration/processing_core/#modules-directory","title":"Modules directory","text":"Path to directory with plug-in (secondary) modules.
Relative path is evaluated relative to location of this configuration file.
"},{"location":"configuration/processing_core/#enabled-modules","title":"Enabled modules","text":"List of plug-in modules which should be enabled in processing pipeline.
Name of module filename without .py
extension must be used!
Snapshots configuration is straightforward. Currently, it only sets creation_rate
- period in minutes for creating new snapshots (30 minutes by default).
File snapshots.yml
looks like this:
creation_rate: 30\n
"},{"location":"reference/","title":"dp3","text":""},{"location":"reference/#dp3","title":"dp3","text":""},{"location":"reference/#dp3--dynamic-profile-processing-platform-dp3","title":"Dynamic Profile Processing Platform (DP\u00b3)","text":"Platform directory structure:
Worker - The main worker process.
Common - Common modules which are used throughout the platform.
Database.EntityDatabase - A wrapper responsible for communication with the database server.
HistoryManagement.HistoryManager - Module responsible for managing history saved in database, currently to clean old data.
Snapshots - SnapShooter, a module responsible for snapshot creation and running configured data correlation and fusion hooks, and Snapshot Hooks, which manage the registered hooks and their dependencies on one another.
TaskProcessing - Module responsible for task distribution, processing and running configured hooks. Task distribution is possible due to the task queue.
Code of the main worker process.
Don't run directly. Import and run the main() function.
"},{"location":"reference/worker/#dp3.worker.load_modules","title":"load_modules","text":"load_modules(modules_dir: str, enabled_modules: dict, log: logging.Logger, registrar: CallbackRegistrar, platform_config: PlatformConfig) -> list\n
Load plug-in modules
Import Python modules with names in 'enabled_modules' from 'modules_dir' directory and return all found classes derived from BaseModule class.
Source code indp3/worker.py
def load_modules(\nmodules_dir: str,\nenabled_modules: dict,\nlog: logging.Logger,\nregistrar: CallbackRegistrar,\nplatform_config: PlatformConfig,\n) -> list:\n\"\"\"Load plug-in modules\n Import Python modules with names in 'enabled_modules' from 'modules_dir' directory\n and return all found classes derived from BaseModule class.\n \"\"\"\n# Get list of all modules available in given folder\n# [:-3] is for removing '.py' suffix from module filenames\navailable_modules = []\nfor item in os.scandir(modules_dir):\n# A module can be a Python file or a Python package\n# (i.e. a directory with \"__init__.py\" file)\nif item.is_file() and item.name.endswith(\".py\"):\navailable_modules.append(item.name[:-3]) # name without .py\nif item.is_dir() and \"__init__.py\" in os.listdir(os.path.join(modules_dir, item.name)):\navailable_modules.append(item.name)\nlog.debug(f\"Available modules: {', '.join(available_modules)}\")\nlog.debug(f\"Enabled modules: {', '.join(enabled_modules)}\")\n# Check if all desired modules are in modules folder\nmissing_modules = set(enabled_modules) - set(available_modules)\nif missing_modules:\nlog.fatal(\n\"Some of desired modules are not available (not in modules folder), \"\nf\"specifically: {missing_modules}\"\n)\nsys.exit(2)\n# Do imports of desired modules from 'modules' folder\n# (rewrite sys.path to modules_dir, import all modules and rewrite it back)\nlog.debug(\"Importing modules ...\")\nsys.path.insert(0, modules_dir)\nimported_modules: list[tuple[str, str, type[BaseModule]]] = [\n(module_name, name, obj)\nfor module_name in enabled_modules\nfor name, obj in inspect.getmembers(import_module(module_name))\nif inspect.isclass(obj) and BaseModule in obj.__bases__\n]\ndel sys.path[0]\n# Final list will contain main classes from all desired modules,\n# which has BaseModule as parent\nmodules_main_objects = []\nfor module_name, _, obj in imported_modules:\n# Append instance of module class (obj is class --> obj() is instance)\n# --> call 
init, which registers handler\nmodule_config = platform_config.config.get(f\"modules.{module_name}\", {})\nmodules_main_objects.append(obj(platform_config, module_config, registrar))\nlog.info(f\"Module loaded: {module_name}:{obj.__name__}\")\nreturn modules_main_objects\n
"},{"location":"reference/worker/#dp3.worker.main","title":"main","text":"main(app_name: str, config_dir: str, process_index: int, verbose: bool) -> None\n
Run worker process.
Parameters:
Name Type Description Defaultapp_name
str
Name of the application to distinct it from other DP3-based apps. For example, it's used as a prefix for RabbitMQ queue names.
requiredconfig_dir
str
Path to directory containing configuration files.
requiredprocess_index
int
Index of this worker process. For each application there must be N processes running simultaneously, each started with a unique index (from 0 to N-1). N is read from configuration ('worker_processes' in 'processing_core.yml').
requiredverbose
bool
More verbose output (set log level to DEBUG).
required Source code indp3/worker.py
def main(app_name: str, config_dir: str, process_index: int, verbose: bool) -> None:\n\"\"\"\n Run worker process.\n Args:\n app_name: Name of the application to distinct it from other DP3-based apps.\n For example, it's used as a prefix for RabbitMQ queue names.\n config_dir: Path to directory containing configuration files.\n process_index: Index of this worker process. For each application\n there must be N processes running simultaneously, each started with a\n unique index (from 0 to N-1). N is read from configuration\n ('worker_processes' in 'processing_core.yml').\n verbose: More verbose output (set log level to DEBUG).\n \"\"\"\n##############################################\n# Initialize logging mechanism\nLOGFORMAT = \"%(asctime)-15s,%(threadName)s,%(name)s,[%(levelname)s] %(message)s\"\nLOGDATEFORMAT = \"%Y-%m-%dT%H:%M:%S\"\nlogging.basicConfig(\nlevel=logging.DEBUG if verbose else logging.INFO, format=LOGFORMAT, datefmt=LOGDATEFORMAT\n)\nlog = logging.getLogger()\n# Disable INFO and DEBUG messages from some libraries\nlogging.getLogger(\"requests\").setLevel(logging.WARNING)\nlogging.getLogger(\"urllib3\").setLevel(logging.WARNING)\nlogging.getLogger(\"amqpstorm\").setLevel(logging.WARNING)\n##############################################\n# Load configuration\nconfig_base_path = os.path.abspath(config_dir)\nlog.debug(f\"Loading config directory {config_base_path}\")\n# Whole configuration should be loaded\nconfig = read_config_dir(config_base_path, recursive=True)\ntry:\nmodel_spec = ModelSpec(config.get(\"db_entities\"))\nexcept ValidationError as e:\nlog.fatal(\"Invalid model specification: %s\", e)\nsys.exit(2)\n# Print whole attribute specification\nlog.debug(model_spec)\nnum_processes = config.get(\"processing_core.worker_processes\")\nplatform_config = 
PlatformConfig(\napp_name=app_name,\nconfig_base_path=config_base_path,\nconfig=config,\nmodel_spec=model_spec,\nprocess_index=process_index,\nnum_processes=num_processes,\n)\n##############################################\n# Create instances of core components\nlog.info(f\"***** {app_name} worker {process_index} of {num_processes} start *****\")\ndb = EntityDatabase(config.get(\"database\"), model_spec)\nglobal_scheduler = scheduler.Scheduler()\ntask_executor = TaskExecutor(db, platform_config)\nsnap_shooter = SnapShooter(\ndb,\nTaskQueueWriter(app_name, num_processes, config.get(\"processing_core.msg_broker\")),\ntask_executor,\nplatform_config,\nglobal_scheduler,\n)\nregistrar = CallbackRegistrar(global_scheduler, task_executor, snap_shooter)\nHistoryManager(db, platform_config, registrar)\nTelemetry(db, platform_config, registrar)\n# Lock used to control when the program stops.\ndaemon_stop_lock = threading.Lock()\ndaemon_stop_lock.acquire()\n# Signal handler releasing the lock on SIGINT or SIGTERM\ndef sigint_handler(signum, frame):\nlog.debug(\n\"Signal {} received, stopping worker\".format(\n{signal.SIGINT: \"SIGINT\", signal.SIGTERM: \"SIGTERM\"}.get(signum, signum)\n)\n)\ndaemon_stop_lock.release()\nsignal.signal(signal.SIGINT, sigint_handler)\nsignal.signal(signal.SIGTERM, sigint_handler)\nsignal.signal(signal.SIGABRT, sigint_handler)\ntask_distributor = TaskDistributor(task_executor, platform_config, registrar, daemon_stop_lock)\ncontrol = Control(platform_config)\ncontrol.set_action_handler(ControlAction.make_snapshots, snap_shooter.make_snapshots)\n##############################################\n# Load all plug-in modules\nos.path.dirname(__file__)\ncustom_modules_dir = config.get(\"processing_core.modules_dir\")\ncustom_modules_dir = os.path.abspath(os.path.join(config_base_path, custom_modules_dir))\nmodule_list = 
load_modules(\ncustom_modules_dir,\nconfig.get(\"processing_core.enabled_modules\"),\nlog,\nregistrar,\nplatform_config,\n)\n################################################\n# Initialization completed, run ...\n# Run update manager thread\nlog.info(\"***** Initialization completed, starting all modules *****\")\n# Run modules that have their own threads (TODO: there are no such modules, should be kept?)\n# (if they don't, the start() should do nothing)\nfor module in module_list:\nmodule.start()\n# start TaskDistributor (which starts TaskExecutors in several worker threads)\ntask_distributor.start()\n# Run scheduler\nglobal_scheduler.start()\n# Run SnapShooter\nsnap_shooter.start()\ncontrol.start()\n# Wait until someone wants to stop the program by releasing this Lock.\n# It may be a user by pressing Ctrl-C or some program module.\n# (try to acquire the lock again,\n# effectively waiting until it's released by signal handler or another thread)\nif os.name == \"nt\":\n# This is needed on Windows in order to catch Ctrl-C, which doesn't break the waiting.\nwhile not daemon_stop_lock.acquire(timeout=1):\npass\nelse:\ndaemon_stop_lock.acquire()\n################################################\n# Finalization & cleanup\n# Set signal handlers back to their defaults,\n# so the second Ctrl-C closes the program immediately\nsignal.signal(signal.SIGINT, signal.SIG_DFL)\nsignal.signal(signal.SIGTERM, signal.SIG_DFL)\nsignal.signal(signal.SIGABRT, signal.SIG_DFL)\nlog.info(\"Stopping running components ...\")\ncontrol.stop()\nsnap_shooter.stop()\nglobal_scheduler.stop()\ntask_distributor.stop()\nfor module in module_list:\nmodule.stop()\nlog.info(\"***** Finished, main thread exiting. *****\")\nlogging.shutdown()\n
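The worker's shutdown handling above uses a `threading.Lock` as a one-shot latch: the main thread acquires it at startup, the signal handler releases it, and the main thread blocks on a second `acquire()` until that happens. A minimal sketch of the same pattern (a timer thread stands in for the signal handler here, since real signals would complicate the example):

```python
import threading

# The latch: acquired at startup, released by whoever wants to stop the program.
stop_lock = threading.Lock()
stop_lock.acquire()

def request_stop():
    # Stands in for the SIGINT/SIGTERM handler in the worker above.
    stop_lock.release()

# Simulate a stop request arriving shortly after startup.
threading.Timer(0.1, request_stop).start()

# Main thread blocks here until request_stop() releases the latch.
stop_lock.acquire()
# ... finalization & cleanup would follow here
```

On Windows the worker instead polls `acquire(timeout=1)` in a loop, because a plain blocking `acquire()` would prevent Ctrl-C from being delivered.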
"},{"location":"reference/api/","title":"api","text":""},{"location":"reference/api/#dp3.api","title":"dp3.api","text":""},{"location":"reference/api/main/","title":"main","text":""},{"location":"reference/api/main/#dp3.api.main","title":"dp3.api.main","text":""},{"location":"reference/api/internal/","title":"internal","text":""},{"location":"reference/api/internal/#dp3.api.internal","title":"dp3.api.internal","text":""},{"location":"reference/api/internal/config/","title":"config","text":""},{"location":"reference/api/internal/config/#dp3.api.internal.config","title":"dp3.api.internal.config","text":""},{"location":"reference/api/internal/config/#dp3.api.internal.config.ConfigEnv","title":"ConfigEnv","text":" Bases: BaseModel
Configuration environment variables container
"},{"location":"reference/api/internal/dp_logger/","title":"dp_logger","text":""},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger","title":"dp3.api.internal.dp_logger","text":""},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger.DPLogger","title":"DPLogger","text":"DPLogger(config: dict)\n
Datapoint logger
Logs good/bad datapoints into a file for further analysis. They are logged in JSON format. Bad datapoints are logged together with their error message.
Logging may be disabled in api.yml
configuration file:
# ...\ndatapoint_logger:\n good_log: false\n bad_log: false\n# ...\n
Source code in dp3/api/internal/dp_logger.py
def __init__(self, config: dict):\nif not config:\nconfig = {}\ngood_log_file = config.get(\"good_log\", False)\nbad_log_file = config.get(\"bad_log\", False)\n# Setup loggers\nself._good_logger = self.setup_logger(\"GOOD\", good_log_file)\nself._bad_logger = self.setup_logger(\"BAD\", bad_log_file)\n
"},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger.DPLogger.setup_logger","title":"setup_logger","text":"setup_logger(name: str, log_file: str)\n
Creates a new logger instance with log_file
as target
dp3/api/internal/dp_logger.py
def setup_logger(self, name: str, log_file: str):\n\"\"\"Creates new logger instance with `log_file` as target\"\"\"\n# Create log handler\nif log_file:\nparent_path = pathlib.Path(log_file).parent\nif not parent_path.exists():\nraise FileNotFoundError(\nf\"The directory {parent_path} does not exist,\"\n\" check the configured path or create the directory.\"\n)\nlog_handler = logging.FileHandler(log_file)\nlog_handler.setFormatter(self.LOG_FORMATTER)\nelse:\nlog_handler = logging.NullHandler()\n# Get logger instance\nlogger = logging.getLogger(name)\nlogger.addHandler(log_handler)\nlogger.setLevel(logging.INFO)\nreturn logger\n
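As `setup_logger` shows, records go to a `FileHandler` when a path is configured and to a `NullHandler` otherwise, so setting `good_log`/`bad_log` to `false` silently drops records. A condensed, self-contained sketch of that switch (logger names and the temp path are illustrative):

```python
import logging
import os
import tempfile

def make_logger(name, log_file):
    # FileHandler when a path is given, NullHandler (discard) otherwise --
    # mirrors the good_log/bad_log switch described above.
    handler = logging.FileHandler(log_file) if log_file else logging.NullHandler()
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

path = os.path.join(tempfile.mkdtemp(), "good.log")
enabled = make_logger("GOOD-demo", path)
disabled = make_logger("BAD-demo", False)

enabled.info('{"attr": "test"}')
disabled.info("this goes nowhere")

for h in enabled.handlers:
    h.flush()
with open(path) as f:
    content = f.read()
```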
"},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger.DPLogger.log_good","title":"log_good","text":"log_good(dps: list[DataPointBase], src: str = UNKNOWN_SRC_MSG)\n
Logs good datapoints
Datapoints are logged one-by-one in processed form. Source should be the IP address of the incoming request.
Source code indp3/api/internal/dp_logger.py
def log_good(self, dps: list[DataPointBase], src: str = UNKNOWN_SRC_MSG):\n\"\"\"Logs good datapoints\n Datapoints are logged one-by-one in processed form.\n Source should be the IP address of the incoming request.\n \"\"\"\nfor dp in dps:\nself._good_logger.info(dp.json(), extra={\"src\": src})\n
"},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger.DPLogger.log_bad","title":"log_bad","text":"log_bad(request_body: str, validation_error_msg: str, src: str = UNKNOWN_SRC_MSG)\n
Logs bad datapoints including the validation error message
The whole request body is logged at once (a JSON string is expected). Source should be the IP address of the incoming request.
Source code indp3/api/internal/dp_logger.py
def log_bad(self, request_body: str, validation_error_msg: str, src: str = UNKNOWN_SRC_MSG):\n\"\"\"Logs bad datapoints including the validation error message\n The whole request body is logged at once (a JSON string is expected).\n Source should be the IP address of the incoming request.\n \"\"\"\n# Remove newlines from request body\nrequest_body = request_body.replace(\"\\n\", \" \")\n# Prepend error message with tabs\nvalidation_error_msg = validation_error_msg.replace(\"\\n\", \"\\n\\t\")\nself._bad_logger.info(f\"{request_body}\\n\\t{validation_error_msg}\", extra={\"src\": src})\n
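The two `replace` calls above flatten the request body onto one line and indent every line of the validation error with a tab, so each bad-datapoint record forms one visually grouped block in the log file. A sketch of just that transformation, with a made-up body and error message:

```python
def format_bad_record(request_body: str, error_msg: str) -> str:
    body = request_body.replace("\n", " ")   # body on a single line
    err = error_msg.replace("\n", "\n\t")    # continuation lines indented
    return f"{body}\n\t{err}"

record = format_bad_record('[{"type": "ip",\n "attr": "x"}]',
                           "value error\nfield required")
```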
"},{"location":"reference/api/internal/entity_response_models/","title":"entity_response_models","text":""},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models","title":"dp3.api.internal.entity_response_models","text":""},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models.EntityState","title":"EntityState","text":" Bases: BaseModel
Entity specification and current state
Merges (some) data from DP3's EntitySpec
and state information from Database
. Provides an estimated count of master records in the database.
Bases: BaseModel
List of entity eids and their data based on latest snapshot
Includes timestamp of latest snapshot creation.
Data does not include the history of observation attributes or timeseries.
"},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models.EntityEidData","title":"EntityEidData","text":" Bases: BaseModel
Data of entity eid
Includes all snapshots and master record.
empty
indicates whether this eid includes any data.
Bases: BaseModel
Value and/or history of entity attribute for given eid
Depends on attribute type:

- plain: just (current) value
- observations: (current) value and history stored in master record (optionally filtered)
- timeseries: just history stored in master record (optionally filtered)
"},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models.EntityEidAttrValue","title":"EntityEidAttrValue","text":" Bases: BaseModel
Value of entity attribute for given eid
The value is fetched from master record.
"},{"location":"reference/api/internal/helpers/","title":"helpers","text":""},{"location":"reference/api/internal/helpers/#dp3.api.internal.helpers","title":"dp3.api.internal.helpers","text":""},{"location":"reference/api/internal/helpers/#dp3.api.internal.helpers.api_to_dp3_datapoint","title":"api_to_dp3_datapoint","text":"api_to_dp3_datapoint(api_dp_values: dict) -> DataPointBase\n
Converts API datapoint values to DP3 datapoint
If the etype-attr pair doesn't exist in the DP3 config, raises ValueError
. If values are not valid, raises pydantic's ValidationError.
dp3/api/internal/helpers.py
def api_to_dp3_datapoint(api_dp_values: dict) -> DataPointBase:\n\"\"\"Converts API datapoint values to DP3 datapoint\n If etype-attr pair doesn't exist in DP3 config, raises `ValueError`.\n If values are not valid, raises pydantic's ValidationError.\n \"\"\"\netype = api_dp_values[\"type\"]\nattr = api_dp_values[\"attr\"]\n# Convert to DP3 datapoint format\ndp3_dp_values = api_dp_values\ndp3_dp_values[\"etype\"] = etype\ndp3_dp_values[\"eid\"] = api_dp_values[\"id\"]\n# Get attribute-specific model\ntry:\nmodel = MODEL_SPEC.attr(etype, attr).dp_model\nexcept KeyError as e:\nraise ValueError(f\"Combination of type '{etype}' and attr '{attr}' doesn't exist\") from e\n# Parse using the model\n# This may raise pydantic's ValidationError, but that's intentional (to get\n# a JSON-serializable trace as a response from API).\nreturn model.parse_obj(dp3_dp_values)\n
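The conversion above keeps the incoming keys and adds `etype`/`eid` aliases for `type`/`id` before delegating validation to the attribute-specific pydantic model. The renaming step on its own can be sketched like this (the sample values are invented):

```python
def api_keys_to_dp3(api_dp: dict) -> dict:
    # DP3's internal datapoint model uses etype/eid instead of type/id.
    dp3_dp = dict(api_dp)
    dp3_dp["etype"] = api_dp["type"]
    dp3_dp["eid"] = api_dp["id"]
    return dp3_dp

dp = api_keys_to_dp3(
    {"type": "ip", "id": "192.168.0.1", "attr": "opened_ports", "v": [22, 80]}
)
```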
"},{"location":"reference/api/internal/models/","title":"models","text":""},{"location":"reference/api/internal/models/#dp3.api.internal.models","title":"dp3.api.internal.models","text":""},{"location":"reference/api/internal/models/#dp3.api.internal.models.DataPoint","title":"DataPoint","text":" Bases: BaseModel
Data-point for API
Contains a single raw data value received on the API. This is a generic class for plain, observation and timeseries datapoints.
Provides the first line of validation for this data value.
This differs slightly compared to DataPoint
from DP3 in naming of attributes due to historic reasons.
After validation against this schema, the datapoint is validated using an attribute-specific validator to ensure full compliance.
"},{"location":"reference/api/internal/response_models/","title":"response_models","text":""},{"location":"reference/api/internal/response_models/#dp3.api.internal.response_models","title":"dp3.api.internal.response_models","text":""},{"location":"reference/api/internal/response_models/#dp3.api.internal.response_models.HealthCheckResponse","title":"HealthCheckResponse","text":" Bases: BaseModel
Healthcheck endpoint response
"},{"location":"reference/api/internal/response_models/#dp3.api.internal.response_models.SuccessResponse","title":"SuccessResponse","text":" Bases: BaseModel
Generic success response
"},{"location":"reference/api/internal/response_models/#dp3.api.internal.response_models.RequestValidationError","title":"RequestValidationError","text":"RequestValidationError(loc, msg)\n
Bases: HTTPException
HTTP exception wrapper to simplify path and query validation
Source code indp3/api/internal/response_models.py
def __init__(self, loc, msg):\nsuper().__init__(422, [{\"loc\": loc, \"msg\": msg, \"type\": \"value_error\"}])\n
"},{"location":"reference/api/routers/","title":"routers","text":""},{"location":"reference/api/routers/#dp3.api.routers","title":"dp3.api.routers","text":""},{"location":"reference/api/routers/control/","title":"control","text":""},{"location":"reference/api/routers/control/#dp3.api.routers.control","title":"dp3.api.routers.control","text":""},{"location":"reference/api/routers/control/#dp3.api.routers.control.execute_action","title":"execute_action async
","text":"execute_action(action: ControlAction) -> SuccessResponse\n
Sends the given action into the execution queue.
Source code indp3/api/routers/control.py
@router.get(\"/{action}\")\nasync def execute_action(action: ControlAction) -> SuccessResponse:\n\"\"\"Sends the given action into execution queue.\"\"\"\nCONTROL_WRITER.put_task(ControlMessage(action=action))\nreturn SuccessResponse(detail=\"Action sent.\")\n
"},{"location":"reference/api/routers/entity/","title":"entity","text":""},{"location":"reference/api/routers/entity/#dp3.api.routers.entity","title":"dp3.api.routers.entity","text":""},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.check_entity","title":"check_entity async
","text":"check_entity(entity: str)\n
Middleware to check entity existence
Source code indp3/api/routers/entity.py
async def check_entity(entity: str):\n\"\"\"Middleware to check entity existence\"\"\"\nif entity not in MODEL_SPEC.entities:\nraise RequestValidationError([\"path\", \"entity\"], f\"Entity '{entity}' doesn't exist\")\nreturn entity\n
"},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.list_entity_eids","title":"list_entity_eids async
","text":"list_entity_eids(entity: str, skip: NonNegativeInt = 0, limit: PositiveInt = 20) -> EntityEidList\n
List latest snapshots of all id
s present in database under entity
.
Contains only the latest snapshot.
Uses pagination.
Source code indp3/api/routers/entity.py
@router.get(\"/{entity}\")\nasync def list_entity_eids(\nentity: str, skip: NonNegativeInt = 0, limit: PositiveInt = 20\n) -> EntityEidList:\n\"\"\"List latest snapshots of all `id`s present in database under `entity`.\n Contains only latest snapshot.\n Uses pagination.\n \"\"\"\ncursor = DB.get_latest_snapshots(entity).skip(skip).limit(limit)\ntime_created = None\n# Remove _id field\nresult = list(cursor)\nfor r in result:\ntime_created = r[\"_time_created\"]\ndel r[\"_time_created\"]\ndel r[\"_id\"]\nreturn EntityEidList(time_created=time_created, data=result)\n
"},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.get_eid_data","title":"get_eid_data async
","text":"get_eid_data(entity: str, eid: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None) -> EntityEidData\n
Get data of entity
's eid
.
Contains all snapshots and the master record. Snapshots are ordered by ascending creation time.
Source code indp3/api/routers/entity.py
@router.get(\"/{entity}/{eid}\")\nasync def get_eid_data(\nentity: str, eid: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None\n) -> EntityEidData:\n\"\"\"Get data of `entity`'s `eid`.\n Contains all snapshots and master record.\n Snapshots are ordered by ascending creation time.\n \"\"\"\n# Get master record\n# TODO: This is probably not the most efficient way. Maybe gather only\n# plain data from master record and then call `get_timeseries_history`\n# for timeseries.\nmaster_record = DB.get_master_record(entity, eid)\nif \"_id\" in master_record:\ndel master_record[\"_id\"]\nif \"#hash\" in master_record:\ndel master_record[\"#hash\"]\n# Get filtered timeseries data\nfor attr in master_record:\nif MODEL_SPEC.attr(entity, attr).t == AttrType.TIMESERIES:\nmaster_record[attr] = DB.get_timeseries_history(\nentity, attr, eid, t1=date_from, t2=date_to\n)\n# Get snapshots\nsnapshots = list(DB.get_snapshots(entity, eid, t1=date_from, t2=date_to))\nfor s in snapshots:\ndel s[\"_id\"]\n# Whether this eid contains any data\nempty = not master_record and len(snapshots) == 0\nreturn EntityEidData(empty=empty, master_record=master_record, snapshots=snapshots)\n
"},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.get_eid_attr_value","title":"get_eid_attr_value async
","text":"get_eid_attr_value(entity: str, eid: str, attr: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None) -> EntityEidAttrValueOrHistory\n
Get attribute value
Value is either of:

- current value: in case of plain attribute
- current value and history: in case of observation attribute
- history: in case of timeseries attribute
Source code indp3/api/routers/entity.py
@router.get(\"/{entity}/{eid}/get/{attr}\")\nasync def get_eid_attr_value(\nentity: str,\neid: str,\nattr: str,\ndate_from: Optional[datetime] = None,\ndate_to: Optional[datetime] = None,\n) -> EntityEidAttrValueOrHistory:\n\"\"\"Get attribute value\n Value is either of:\n - current value: in case of plain attribute\n - current value and history: in case of observation attribute\n - history: in case of timeseries attribute\n \"\"\"\n# Check if attribute exists\nif attr not in MODEL_SPEC.attribs(entity):\nraise RequestValidationError([\"path\", \"attr\"], f\"Attribute '{attr}' doesn't exist\")\nvalue_or_history = DB.get_value_or_history(entity, attr, eid, t1=date_from, t2=date_to)\nreturn EntityEidAttrValueOrHistory(\nattr_type=MODEL_SPEC.attr(entity, attr).t, **value_or_history\n)\n
"},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.set_eid_attr_value","title":"set_eid_attr_value async
","text":"set_eid_attr_value(entity: str, eid: str, attr: str, body: EntityEidAttrValue, request: Request) -> SuccessResponse\n
Set current value of attribute
Internally, this just creates a datapoint for the specified attribute and value.
This endpoint is meant for editable
plain attributes -- for direct user edit on DP3 web UI.
dp3/api/routers/entity.py
@router.post(\"/{entity}/{eid}/set/{attr}\")\nasync def set_eid_attr_value(\nentity: str, eid: str, attr: str, body: EntityEidAttrValue, request: Request\n) -> SuccessResponse:\n\"\"\"Set current value of attribute\n Internally just creates datapoint for specified attribute and value.\n This endpoint is meant for `editable` plain attributes -- for direct user edit on DP3 web UI.\n \"\"\"\n# Check if attribute exists\nif attr not in MODEL_SPEC.attribs(entity):\nraise RequestValidationError([\"path\", \"attr\"], f\"Attribute '{attr}' doesn't exist\")\n# Construct datapoint\ntry:\ndp = DataPoint(\ntype=entity,\nid=eid,\nattr=attr,\nv=body.value,\nt1=datetime.now(),\nsrc=f\"{request.client.host} via API\",\n)\ndp3_dp = api_to_dp3_datapoint(dp.dict())\nexcept ValidationError as e:\nraise RequestValidationError([\"body\", \"value\"], e.errors()[0][\"msg\"]) from e\n# This shouldn't fail\ntask = DataPointTask(model_spec=MODEL_SPEC, etype=entity, eid=eid, data_points=[dp3_dp])\n# Push tasks to task queue\nTASK_WRITER.put_task(task, False)\n# Datapoints from this endpoint are intentionally not logged using `DPLogger`.\n# If for some reason, in the future, they need to be, just copy code from data ingestion\n# endpoint.\nreturn SuccessResponse()\n
"},{"location":"reference/api/routers/root/","title":"root","text":""},{"location":"reference/api/routers/root/#dp3.api.routers.root","title":"dp3.api.routers.root","text":""},{"location":"reference/api/routers/root/#dp3.api.routers.root.health_check","title":"health_check async
","text":"health_check() -> HealthCheckResponse\n
Health check
Returns simple 'It works!' response.
Source code indp3/api/routers/root.py
@router.get(\"/\", tags=[\"Health\"])\nasync def health_check() -> HealthCheckResponse:\n\"\"\"Health check\n Returns simple 'It works!' response.\n \"\"\"\nreturn HealthCheckResponse()\n
"},{"location":"reference/api/routers/root/#dp3.api.routers.root.insert_datapoints","title":"insert_datapoints async
","text":"insert_datapoints(dps: list[DataPoint], request: Request) -> SuccessResponse\n
Insert datapoints
Validates datapoints and pushes them into the task queue, so they are processed by one of the DP3 workers.
Source code indp3/api/routers/root.py
@router.post(DATAPOINTS_INGESTION_URL_PATH, tags=[\"Data ingestion\"])\nasync def insert_datapoints(dps: list[DataPoint], request: Request) -> SuccessResponse:\n\"\"\"Insert datapoints\n Validates and pushes datapoints into task queue, so they are processed by one of DP3 workers.\n \"\"\"\n# Convert to DP3 datapoints\n# This should not fail as all datapoints are already validated\ndp3_dps = [api_to_dp3_datapoint(dp.dict()) for dp in dps]\n# Group datapoints by etype-eid\ntasks_dps = defaultdict(list)\nfor dp in dp3_dps:\nkey = (dp.etype, dp.eid)\ntasks_dps[key].append(dp)\n# Create tasks\ntasks = []\nfor k in tasks_dps:\netype, eid = k\n# This shouldn't fail either\ntasks.append(\nDataPointTask(model_spec=MODEL_SPEC, etype=etype, eid=eid, data_points=tasks_dps[k])\n)\n# Push tasks to task queue\nfor task in tasks:\nTASK_WRITER.put_task(task, False)\n# Log datapoints\nDP_LOGGER.log_good(dp3_dps, src=request.client.host)\nreturn SuccessResponse()\n
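The ingestion endpoint above groups incoming datapoints by their `(etype, eid)` pair so that each task carries all datapoints belonging to one entity. That grouping step in isolation, with plain dicts standing in for the validated datapoint objects:

```python
from collections import defaultdict

datapoints = [
    {"etype": "ip", "eid": "1.2.3.4", "attr": "a"},
    {"etype": "ip", "eid": "5.6.7.8", "attr": "a"},
    {"etype": "ip", "eid": "1.2.3.4", "attr": "b"},
]

# Group datapoints by etype-eid, exactly as insert_datapoints does.
tasks_dps = defaultdict(list)
for dp in datapoints:
    tasks_dps[(dp["etype"], dp["eid"])].append(dp)
# -> one task per entity: two groups here
```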
"},{"location":"reference/api/routers/root/#dp3.api.routers.root.list_entities","title":"list_entities async
","text":"list_entities() -> dict[str, EntityState]\n
List entities
Returns a dictionary containing all configured entities -- their simplified configuration and current state information.
Source code indp3/api/routers/root.py
@router.get(\"/entities\", tags=[\"Entity\"])\nasync def list_entities() -> dict[str, EntityState]:\n\"\"\"List entities\n Returns dictionary containing all entities configured -- their simplified configuration\n and current state information.\n \"\"\"\nentities = {}\nfor e_id in MODEL_SPEC.entities:\nentity_spec = MODEL_SPEC.entity(e_id)\nentities[e_id] = {\n\"id\": e_id,\n\"name\": entity_spec.name,\n\"attribs\": MODEL_SPEC.attribs(e_id),\n\"eid_estimate_count\": DB.estimate_count_eids(e_id),\n}\nreturn entities\n
"},{"location":"reference/bin/","title":"bin","text":""},{"location":"reference/bin/#dp3.bin","title":"dp3.bin","text":""},{"location":"reference/bin/api/","title":"api","text":""},{"location":"reference/bin/api/#dp3.bin.api","title":"dp3.bin.api","text":"Run the DP3 API using uvicorn.
"},{"location":"reference/bin/setup/","title":"setup","text":""},{"location":"reference/bin/setup/#dp3.bin.setup","title":"dp3.bin.setup","text":"DP3 Setup Script for creating a DP3 application.
"},{"location":"reference/bin/setup/#dp3.bin.setup.replace_template","title":"replace_template","text":"replace_template(directory: Path, template: str, replace_with: str)\n
Replace all occurrences of template
with the given text.
dp3/bin/setup.py
def replace_template(directory: Path, template: str, replace_with: str):\n\"\"\"Replace all occurrences of `template` with the given text.\"\"\"\nfor file in directory.rglob(\"*\"):\nif file.is_file():\ntry:\nwith file.open(\"r+\") as f:\ncontents = f.read()\ncontents = contents.replace(template, replace_with)\nf.seek(0)\nf.write(contents)\nf.truncate()\nexcept UnicodeDecodeError:\npass\nexcept PermissionError:\npass\n
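`replace_template` walks the directory tree and rewrites every text file in place, silently skipping binary files (`UnicodeDecodeError`) and unwritable ones (`PermissionError`). A self-contained sketch of the same in-place rewrite, exercised on a temporary directory (file name and template token are made up):

```python
import tempfile
from pathlib import Path

def replace_in_tree(directory: Path, template: str, replace_with: str):
    for file in directory.rglob("*"):
        if file.is_file():
            try:
                text = file.read_text()
                file.write_text(text.replace(template, replace_with))
            except (UnicodeDecodeError, PermissionError):
                pass  # skip binary or read-only files, as the original does

root = Path(tempfile.mkdtemp())
(root / "config.yml").write_text("app_name: __APP__\n")
replace_in_tree(root, "__APP__", "my_app")
result = (root / "config.yml").read_text()
```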
"},{"location":"reference/bin/worker/","title":"worker","text":""},{"location":"reference/bin/worker/#dp3.bin.worker","title":"dp3.bin.worker","text":""},{"location":"reference/common/","title":"common","text":""},{"location":"reference/common/#dp3.common","title":"dp3.common","text":"Common modules which are used throughout the platform.
Bases: Flag
Enum of attribute types
PLAIN
= 1 OBSERVATIONS
= 2 TIMESERIES
= 4
classmethod
","text":"from_str(type_str: str)\n
Convert string representation like \"plain\" to AttrType.
Source code indp3/common/attrspec.py
@classmethod\ndef from_str(cls, type_str: str):\n\"\"\"\n Convert string representation like \"plain\" to AttrType.\n \"\"\"\ntry:\nreturn cls(cls[type_str.upper()])\nexcept Exception as e:\nraise AttrTypeError(f\"Invalid attribute type '{type_str}'\") from e\n
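Because `AttrType` is a `Flag` enum with power-of-two values, types can be combined and tested with bitwise operators. A minimal reimplementation of the lookup, assuming the same three members (the original raises its own `AttrTypeError`; a plain `ValueError` stands in here):

```python
from enum import Flag

class AttrType(Flag):
    PLAIN = 1
    OBSERVATIONS = 2
    TIMESERIES = 4

    @classmethod
    def from_str(cls, type_str: str) -> "AttrType":
        try:
            return cls[type_str.upper()]  # name lookup: "plain" -> PLAIN
        except KeyError as e:
            raise ValueError(f"Invalid attribute type '{type_str}'") from e

t = AttrType.from_str("observations")
combined = AttrType.PLAIN | AttrType.OBSERVATIONS
```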
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.ObservationsHistoryParams","title":"ObservationsHistoryParams","text":" Bases: BaseModel
History parameters field of observations attribute
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.TimeseriesTSParams","title":"TimeseriesTSParams","text":" Bases: BaseModel
Timeseries parameters field of timeseries attribute
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.TimeseriesSeries","title":"TimeseriesSeries","text":" Bases: BaseModel
Series of timeseries attribute
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecGeneric","title":"AttrSpecGeneric","text":" Bases: BaseModel
Base of attribute specification
Parent of other AttrSpec
classes.
Bases: AttrSpecGeneric
Parent of non-timeseries AttrSpec
classes.
property
","text":"is_relation: bool\n
Returns whether specified attribute is a link.
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecClassic.relation_to","title":"relation_toproperty
","text":"relation_to: str\n
Returns linked entity id. Raises ValueError if attribute is not a link.
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecPlain","title":"AttrSpecPlain","text":"AttrSpecPlain(**data)\n
Bases: AttrSpecClassic
Plain attribute specification
Source code indp3/common/attrspec.py
def __init__(self, **data):\nsuper().__init__(**data)\nself._dp_model = create_model(\nf\"DataPointPlain_{self.id}\",\n__base__=DataPointPlainBase,\nv=(self.data_type.data_type, ...),\n)\n
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecObservations","title":"AttrSpecObservations","text":"AttrSpecObservations(**data)\n
Bases: AttrSpecClassic
Observations attribute specification
Source code indp3/common/attrspec.py
def __init__(self, **data):\nsuper().__init__(**data)\nvalue_validator = self.data_type.data_type\nself._dp_model = create_model(\nf\"DataPointObservations_{self.id}\",\n__base__=DataPointObservationsBase,\nv=(value_validator, ...),\n)\n
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecTimeseries","title":"AttrSpecTimeseries","text":"AttrSpecTimeseries(**data)\n
Bases: AttrSpecGeneric
Timeseries attribute specification
Source code indp3/common/attrspec.py
def __init__(self, **data):\nsuper().__init__(**data)\n# Typing of `v` field\ndp_value_typing = {}\nfor s in self.series:\ndata_type = self.series[s].data_type.data_type\ndp_value_typing[s] = ((list[data_type]), ...)\n# Validators\ndp_validators = {\n\"v_validator\": dp_ts_v_validator,\n}\n# Add root validator\nif self.timeseries_type == \"regular\":\ndp_validators[\"root_validator\"] = dp_ts_root_validator_regular_wrapper(\nself.timeseries_params.time_step\n)\nelif self.timeseries_type == \"irregular\":\ndp_validators[\"root_validator\"] = dp_ts_root_validator_irregular\nelif self.timeseries_type == \"irregular_intervals\":\ndp_validators[\"root_validator\"] = dp_ts_root_validator_irregular_intervals\nself._dp_model = create_model(\nf\"DataPointTimeseries_{self.id}\",\n__base__=DataPointTimeseriesBase,\n__validators__=dp_validators,\nv=(create_model(f\"DataPointTimeseriesValue_{self.id}\", **dp_value_typing), ...),\n)\n
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpec","title":"AttrSpec","text":"AttrSpec(id: str, spec: dict[str, Any]) -> AttrSpecType\n
Factory for AttrSpec
classes
dp3/common/attrspec.py
def AttrSpec(id: str, spec: dict[str, Any]) -> AttrSpecType:\n\"\"\"Factory for `AttrSpec` classes\"\"\"\nattr_type = AttrType.from_str(spec.get(\"type\"))\nsubclasses = {\nAttrType.PLAIN: AttrSpecPlain,\nAttrType.OBSERVATIONS: AttrSpecObservations,\nAttrType.TIMESERIES: AttrSpecTimeseries,\n}\nreturn subclasses[attr_type](id=id, **spec)\n
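The `AttrSpec` factory above dispatches on the parsed `AttrType` through a dict of subclasses. The same dispatch shape, reduced to plain placeholder classes for illustration (these are not the real spec classes):

```python
class PlainSpec:
    kind = "plain"

class ObservationsSpec:
    kind = "observations"

class TimeseriesSpec:
    kind = "timeseries"

SUBCLASSES = {
    "plain": PlainSpec,
    "observations": ObservationsSpec,
    "timeseries": TimeseriesSpec,
}

def make_spec(spec: dict):
    # A KeyError on an unknown type plays the role of AttrTypeError above.
    return SUBCLASSES[spec["type"]]()

s = make_spec({"type": "timeseries"})
```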
"},{"location":"reference/common/base_attrs/","title":"base_attrs","text":""},{"location":"reference/common/base_attrs/#dp3.common.base_attrs","title":"dp3.common.base_attrs","text":""},{"location":"reference/common/base_module/","title":"base_module","text":""},{"location":"reference/common/base_module/#dp3.common.base_module","title":"dp3.common.base_module","text":""},{"location":"reference/common/base_module/#dp3.common.base_module.BaseModule","title":"BaseModule","text":"BaseModule(platform_config: PlatformConfig, module_config: dict, registrar: CallbackRegistrar)\n
Bases: ABC
Abstract class for platform modules. Every module must inherit this abstract class for automatic loading of module!
Initialize the module and register callbacks.
Parameters:
Name Type Description Defaultplatform_config
PlatformConfig
Platform configuration class
requiredmodule_config
dict
Configuration of the module, equivalent of platform_config.config.get(\"modules.<module_name>\")
registrar
CallbackRegistrar
A callback / hook registration interface
required Source code indp3/common/base_module.py
@abstractmethod\ndef __init__(\nself, platform_config: PlatformConfig, module_config: dict, registrar: CallbackRegistrar\n):\n\"\"\"Initialize the module and register callbacks.\n Args:\n platform_config: Platform configuration class\n module_config: Configuration of the module,\n equivalent of `platform_config.config.get(\"modules.<module_name>\")`\n registrar: A callback / hook registration interface\n \"\"\"\n
"},{"location":"reference/common/base_module/#dp3.common.base_module.BaseModule.start","title":"start","text":"start() -> None\n
Run the module - used to run own thread if needed.
Called after initialization, may be used to create and run a separate thread if needed by the module. Do nothing unless overridden.
Source code indp3/common/base_module.py
def start(self) -> None:\n\"\"\"\n Run the module - used to run own thread if needed.\n Called after initialization, may be used to create and run a separate\n thread if needed by the module. Do nothing unless overridden.\n \"\"\"\nreturn None\n
"},{"location":"reference/common/base_module/#dp3.common.base_module.BaseModule.stop","title":"stop","text":"stop() -> None\n
Stop the module - used to stop own thread.
Called before program exit, may be used to finalize and stop the separate thread if it is used. Do nothing unless overridden.
Source code indp3/common/base_module.py
def stop(self) -> None:\n\"\"\"\n Stop the module - used to stop own thread.\n Called before program exit, may be used to finalize and stop the\n separate thread if it is used. Do nothing unless overridden.\n \"\"\"\nreturn None\n
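A module that does need its own thread can follow the `start()`/`stop()` contract above: create the thread in `start()` and signal it to finish in `stop()`. A sketch of such a module (the worker loop and its period are invented for illustration):

```python
import threading
import time

class PeriodicModule:
    """Toy module with its own worker thread, honoring start()/stop()."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._thread = None
        self.ticks = 0

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()  # ask the loop to exit ...
        self._thread.join()     # ... and wait until it does

    def _run(self):
        # wait() returns False on timeout (keep looping), True once set (exit)
        while not self._stop_event.wait(timeout=0.05):
            self.ticks += 1     # periodic work would go here

m = PeriodicModule()
m.start()
time.sleep(0.2)
m.stop()
```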
"},{"location":"reference/common/callback_registrar/","title":"callback_registrar","text":""},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar","title":"dp3.common.callback_registrar","text":""},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar","title":"CallbackRegistrar","text":"CallbackRegistrar(scheduler: Scheduler, task_executor: TaskExecutor, snap_shooter: SnapShooter)\n
Interface for callback registration.
Source code indp3/common/callback_registrar.py
def __init__(\nself, scheduler: Scheduler, task_executor: TaskExecutor, snap_shooter: SnapShooter\n):\nself._scheduler = scheduler\nself._task_executor = task_executor\nself._snap_shooter = snap_shooter\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.scheduler_register","title":"scheduler_register","text":"scheduler_register(func: Callable, *, func_args: Union[list, tuple] = None, func_kwargs: dict = None, year: Union[int, str] = None, month: Union[int, str] = None, day: Union[int, str] = None, week: Union[int, str] = None, day_of_week: Union[int, str] = None, hour: Union[int, str] = None, minute: Union[int, str] = None, second: Union[int, str] = None, timezone: str = 'UTC') -> int\n
Register a function to be run at specified times.
Pass a cron-like specification of when the function should be called; see docs of apscheduler.triggers.cron for details.
Parameters:
Name Type Description Defaultfunc
Callable
function or method to be called
requiredfunc_args
Union[list, tuple]
list of positional arguments to call func with
None
func_kwargs
dict
dict of keyword arguments to call func with
None
year
Union[int, str]
4-digit year
None
month
Union[int, str]
month (1-12)
None
day
Union[int, str]
day of month (1-31)
None
week
Union[int, str]
ISO week (1-53)
None
day_of_week
Union[int, str]
number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)
None
hour
Union[int, str]
hour (0-23)
None
minute
Union[int, str]
minute (0-59)
None
second
Union[int, str]
second (0-59)
None
timezone
str
Timezone for time specification (default is UTC).
'UTC'
Returns:
Type Descriptionint
job ID
Source code indp3/common/callback_registrar.py
def scheduler_register(\nself,\nfunc: Callable,\n*,\nfunc_args: Union[list, tuple] = None,\nfunc_kwargs: dict = None,\nyear: Union[int, str] = None,\nmonth: Union[int, str] = None,\nday: Union[int, str] = None,\nweek: Union[int, str] = None,\nday_of_week: Union[int, str] = None,\nhour: Union[int, str] = None,\nminute: Union[int, str] = None,\nsecond: Union[int, str] = None,\ntimezone: str = \"UTC\",\n) -> int:\n\"\"\"\n Register a function to be run at specified times.\n Pass cron-like specification of when the function should be called,\n see [docs](https://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html)\n of apscheduler.triggers.cron for details.\n `\n Args:\n func: function or method to be called\n func_args: list of positional arguments to call func with\n func_kwargs: dict of keyword arguments to call func with\n year: 4-digit year\n month: month (1-12)\n day: day of month (1-31)\n week: ISO week (1-53)\n day_of_week: number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)\n hour: hour (0-23)\n minute: minute (0-59)\n second: second (0-59)\n timezone: Timezone for time specification (default is UTC).\n Returns:\n job ID\n \"\"\"\nreturn self._scheduler.register(\nfunc,\nfunc_args=func_args,\nfunc_kwargs=func_kwargs,\nyear=year,\nmonth=month,\nday=day,\nweek=week,\nday_of_week=day_of_week,\nhour=hour,\nminute=minute,\nsecond=second,\ntimezone=timezone,\n)\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_task_hook","title":"register_task_hook","text":"register_task_hook(hook_type: str, hook: Callable)\n
Registers one of the available task hooks
See: TaskGenericHooksContainer
in task_hooks.py
dp3/common/callback_registrar.py
def register_task_hook(self, hook_type: str, hook: Callable):\n\"\"\"Registers one of available task hooks\n See: [`TaskGenericHooksContainer`][dp3.task_processing.task_hooks.TaskGenericHooksContainer]\n in `task_hooks.py`\n \"\"\"\nself._task_executor.register_task_hook(hook_type, hook)\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_entity_hook","title":"register_entity_hook","text":"register_entity_hook(hook_type: str, hook: Callable, entity: str)\n
Registers one of the available task entity hooks
See: TaskEntityHooksContainer
in task_hooks.py
dp3/common/callback_registrar.py
def register_entity_hook(self, hook_type: str, hook: Callable, entity: str):\n\"\"\"Registers one of available task entity hooks\n See: [`TaskEntityHooksContainer`][dp3.task_processing.task_hooks.TaskEntityHooksContainer]\n in `task_hooks.py`\n \"\"\"\nself._task_executor.register_entity_hook(hook_type, hook, entity)\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_attr_hook","title":"register_attr_hook","text":"register_attr_hook(hook_type: str, hook: Callable, entity: str, attr: str)\n
Registers one of the available task attribute hooks
See: TaskAttrHooksContainer
in task_hooks.py
dp3/common/callback_registrar.py
def register_attr_hook(self, hook_type: str, hook: Callable, entity: str, attr: str):\n\"\"\"Registers one of available task attribute hooks\n See: [`TaskAttrHooksContainer`][dp3.task_processing.task_hooks.TaskAttrHooksContainer]\n in `task_hooks.py`\n \"\"\"\nself._task_executor.register_attr_hook(hook_type, hook, entity, attr)\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_timeseries_hook","title":"register_timeseries_hook","text":"register_timeseries_hook(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)\n
Registers passed timeseries hook to be called during snapshot creation.
Binds hook to specified entity_type
and attr_type
(though same hook can be bound multiple times).
Parameters:
Name Type Description Defaulthook
Callable[[str, str, list[dict]], list[DataPointTask]]
hook
callable should expect entity_type, attr_type and attribute history as arguments and return a list of DataPointTask
objects.
entity_type
str
specifies entity type
requiredattr_type
str
specifies attribute type
requiredRaises:
Type DescriptionValueError
If entity_type and attr_type do not specify a valid timeseries attribute, a ValueError is raised.
Source code indp3/common/callback_registrar.py
def register_timeseries_hook(\nself,\nhook: Callable[[str, str, list[dict]], list[DataPointTask]],\nentity_type: str,\nattr_type: str,\n):\n\"\"\"\n Registers passed timeseries hook to be called during snapshot creation.\n Binds hook to specified `entity_type` and `attr_type` (though same hook can be bound\n multiple times).\n Args:\n hook: `hook` callable should expect entity_type, attr_type and attribute\n history as arguments and return a list of `DataPointTask` objects.\n entity_type: specifies entity type\n attr_type: specifies attribute type\n Raises:\n ValueError: If entity_type and attr_type do not specify a valid timeseries attribute,\n a ValueError is raised.\n \"\"\"\nself._snap_shooter.register_timeseries_hook(hook, entity_type, attr_type)\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_correlation_hook","title":"register_correlation_hook","text":"register_correlation_hook(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]])\n
Registers passed hook to be called during snapshot creation.
Binds hook to specified entity_type (though same hook can be bound multiple times).
entity_type
and attribute specifications are validated; ValueError
is raised on failure.
Parameters:
Name Type Description Defaulthook
Callable[[str, dict], None]
hook
callable should expect entity type as str and its current values, including linked entities, as dict
entity_type
str
specifies entity type
requireddepends_on
list[list[str]]
each item should specify an attribute that is depended on in the form of a path from the specified entity_type to individual attributes (even on linked entities).
requiredmay_change
list[list[str]]
each item should specify an attribute that hook
may change. Specification format is identical to depends_on
.
Raises:
Type DescriptionValueError
On failure of specification validation.
Source code indp3/common/callback_registrar.py
def register_correlation_hook(\nself,\nhook: Callable[[str, dict], None],\nentity_type: str,\ndepends_on: list[list[str]],\nmay_change: list[list[str]],\n):\n\"\"\"\n Registers passed hook to be called during snapshot creation.\n Binds hook to specified entity_type (though same hook can be bound multiple times).\n `entity_type` and attribute specifications are validated, `ValueError` is raised on failure.\n Args:\n hook: `hook` callable should expect entity type as str\n and its current values, including linked entities, as dict\n entity_type: specifies entity type\n depends_on: each item should specify an attribute that is depended on\n in the form of a path from the specified entity_type to individual attributes\n (even on linked entities).\n may_change: each item should specify an attribute that `hook` may change.\n specification format is identical to `depends_on`.\n Raises:\n ValueError: On failure of specification validation.\n \"\"\"\nself._snap_shooter.register_correlation_hook(hook, entity_type, depends_on, may_change)\n
"},{"location":"reference/common/config/","title":"config","text":""},{"location":"reference/common/config/#dp3.common.config","title":"dp3.common.config","text":"Platform config file reader and config model.
"},{"location":"reference/common/config/#dp3.common.config.HierarchicalDict","title":"HierarchicalDict","text":" Bases: dict
Extension of built-in dict
that simplifies working with a nested hierarchy of dicts.
get(key, default = NoDefault)\n
Key may be a path (in dot notation) into a hierarchy of dicts. For example dictionary.get('abc.x.y')
is equivalent to dictionary['abc']['x']['y']
.
:returns: self[key]
or default
if key is not found.
dp3/common/config.py
def get(self, key, default=NoDefault):\n\"\"\"\n Key may be a path (in dot notation) into a hierarchy of dicts. For example\n `dictionary.get('abc.x.y')`\n is equivalent to\n `dictionary['abc']['x']['y']`.\n :returns: `self[key]` or `default` if key is not found.\n \"\"\"\nd = self\ntry:\nwhile \".\" in key:\nfirst_key, key = key.split(\".\", 1)\nd = d[first_key]\nreturn d[key]\nexcept (KeyError, TypeError):\npass # not found - continue below\nif default is NoDefault:\nraise MissingConfigError(\"Mandatory configuration element is missing: \" + key)\nelse:\nreturn default\n
"},{"location":"reference/common/config/#dp3.common.config.HierarchicalDict.update","title":"update","text":"update(other, **kwargs)\n
Update HierarchicalDict
with other dictionary and merge common keys.
If there is a key in both current and the other dictionary and values of both keys are dictionaries, they are merged together.
Example:
HierarchicalDict({'a': {'b': 1, 'c': 2}}).update({'a': {'b': 10, 'd': 3}})\n->\nHierarchicalDict({'a': {'b': 10, 'c': 2, 'd': 3}})\n
Changes the dictionary directly, returns None
. Source code in dp3/common/config.py
def update(self, other, **kwargs):\n\"\"\"\n Update `HierarchicalDict` with other dictionary and merge common keys.\n If there is a key in both current and the other dictionary and values of\n both keys are dictionaries, they are merged together.\n Example:\n ```\n HierarchicalDict({'a': {'b': 1, 'c': 2}}).update({'a': {'b': 10, 'd': 3}})\n ->\n HierarchicalDict({'a': {'b': 10, 'c': 2, 'd': 3}})\n ```\n Changes the dictionary directly, returns `None`.\n \"\"\"\nother = dict(other)\nfor key in other:\nif key in self:\nif isinstance(self[key], dict) and isinstance(other[key], dict):\n# The key is present in both dicts and both key values are dicts -> merge them\nHierarchicalDict.update(self[key], other[key])\nelse:\n# One of the key values is not a dict -> overwrite the value\n# in self by the one from other (like normal \"update\" does)\nself[key] = other[key]\nelse:\n# key is not present in self -> set it to value from other\nself[key] = other[key]\n
"},{"location":"reference/common/config/#dp3.common.config.EntitySpecDict","title":"EntitySpecDict","text":" Bases: BaseModel
Class representing full specification of an entity.
Attributes:
Name Type Descriptionentity
EntitySpec
Specification and settings of entity itself.
attribs
dict[str, AttrSpecType]
A mapping of attribute id -> AttrSpec
"},{"location":"reference/common/config/#dp3.common.config.ModelSpec","title":"ModelSpec","text":"ModelSpec(config: HierarchicalDict)\n
Bases: BaseModel
Class representing the platform's current entity and attribute specification.
Attributes:
Name Type Descriptionconfig
dict[str, EntitySpecDict]
Legacy config format, exactly mirrors the config files.
entities
dict[str, EntitySpec]
Mapping of entity id -> EntitySpec
attributes
dict[tuple[str, str], AttrSpecType]
Mapping of (entity id, attribute id) -> AttrSpec
entity_attributes
dict[str, dict[str, AttrSpecType]]
Mapping of entity id -> attribute id -> AttrSpec
relations
dict[tuple[str, str], AttrSpecType]
Mapping of (entity id, attribute id) -> AttrSpec only contains attributes which are relations.
Provided configuration must be a dict of following structure:
{\n <entity type>: {\n 'entity': {\n entity specification\n },\n 'attribs': {\n <attr id>: {\n attribute specification\n },\n other attributes\n }\n },\n other entity types\n}\n
Raises:
Type DescriptionValueError
if the specification is invalid.
Source code indp3/common/config.py
def __init__(self, config: HierarchicalDict):\n\"\"\"\n Provided configuration must be a dict of following structure:\n ```\n {\n <entity type>: {\n 'entity': {\n entity specification\n },\n 'attribs': {\n <attr id>: {\n attribute specification\n },\n other attributes\n }\n },\n other entity types\n }\n ```\n Raises:\n ValueError: if the specification is invalid.\n \"\"\"\nsuper().__init__(\nconfig=config, entities={}, attributes={}, entity_attributes={}, relations={}\n)\n
"},{"location":"reference/common/config/#dp3.common.config.PlatformConfig","title":"PlatformConfig","text":" Bases: BaseModel
An aggregation of configuration available to modules.
Attributes:
Name Type Descriptionapp_name
str
Name of the application, used when naming various structures of the platform
config_base_path
str
Path to directory containing platform config
config
HierarchicalDict
A dictionary that contains the platform config
model_spec
ModelSpec
Specification of the platform's model (entities and attributes)
num_processes
PositiveInt
Number of worker processes
process_index
NonNegativeInt
Index of current process
"},{"location":"reference/common/config/#dp3.common.config.read_config","title":"read_config","text":"read_config(filepath: str) -> HierarchicalDict\n
Read configuration file and return config as a dict-like object.
The configuration file should contain a valid YAML - Comments may be included as lines starting with #
(optionally preceded by whitespaces).
This function reads the file and converts it to a HierarchicalDict
. The only difference from built-in dict
is its get
method, which allows hierarchical keys (e.g. abc.x.y
). See doc of get method for more information.
dp3/common/config.py
def read_config(filepath: str) -> HierarchicalDict:\n\"\"\"\n Read configuration file and return config as a dict-like object.\n The configuration file should contain a valid YAML\n - Comments may be included as lines starting with `#` (optionally preceded\n by whitespaces).\n This function reads the file and converts it to a `HierarchicalDict`.\n The only difference from built-in `dict` is its `get` method, which allows\n hierarchical keys (e.g. `abc.x.y`).\n See [doc of get method][dp3.common.config.HierarchicalDict.get] for more information.\n \"\"\"\nwith open(filepath) as file_content:\nreturn HierarchicalDict(yaml.safe_load(file_content))\n
"},{"location":"reference/common/config/#dp3.common.config.read_config_dir","title":"read_config_dir","text":"read_config_dir(dir_path: str, recursive: bool = False) -> HierarchicalDict\n
Same as read_config, but it loads a whole configuration directory of YAML files (only files ending with \".yml\" are loaded). Each loaded configuration is placed under a key named after its configuration file (without the \".yml\" suffix).
Parameters:
Name Type Description Defaultdir_path
str
Path to read config from.
requiredrecursive
bool
If recursive
is set, then the configuration directory will be read recursively (including configuration files inside directories).
False
Source code in dp3/common/config.py
def read_config_dir(dir_path: str, recursive: bool = False) -> HierarchicalDict:\n\"\"\"\n Same as [read_config][dp3.common.config.read_config],\n but it loads whole configuration directory of YAML files,\n so only files ending with \".yml\" are loaded.\n Each loaded configuration is located under key named after configuration filename.\n Args:\n dir_path: Path to read config from.\n recursive: If `recursive` is set, then the configuration directory will be read\n recursively (including configuration files inside directories).\n \"\"\"\nall_files_paths = os.listdir(dir_path)\nconfig = HierarchicalDict()\nfor config_filename in all_files_paths:\nconfig_full_path = os.path.join(dir_path, config_filename)\nif os.path.isdir(config_full_path) and recursive:\nloaded_config = read_config_dir(config_full_path, recursive)\nelif os.path.isfile(config_full_path) and config_filename.endswith(\".yml\"):\ntry:\nloaded_config = read_config(config_full_path)\nexcept TypeError:\n# configuration file is empty\ncontinue\n# remove '.yml' suffix of filename\nconfig_filename = config_filename[:-4]\nelse:\ncontinue\n# place configuration files into another dictionary level named by config dictionary name\nconfig[config_filename] = loaded_config\nreturn config\n
"},{"location":"reference/common/control/","title":"control","text":""},{"location":"reference/common/control/#dp3.common.control","title":"dp3.common.control","text":"Module enabling remote control of the platform's internal events.
"},{"location":"reference/common/control/#dp3.common.control.Control","title":"Control","text":"Control(platform_config: PlatformConfig) -> None\n
Class enabling remote control of the platform's internal events.
Source code indp3/common/control.py
def __init__(\nself,\nplatform_config: PlatformConfig,\n) -> None:\nself.log = logging.getLogger(\"Control\")\nself.action_handlers: dict[ControlAction, Callable] = {}\nself.enabled = False\nif platform_config.process_index != 0:\nself.log.debug(\"Control will be disabled in this worker to avoid race conditions.\")\nreturn\nself.enabled = True\nself.config = ControlConfig.parse_obj(platform_config.config.get(\"control\"))\nself.allowed_actions = set(self.config.allowed_actions)\nself.log.debug(\"Allowed actions: %s\", self.allowed_actions)\nqueue = f\"{platform_config.app_name}-control\"\nself.control_queue = TaskQueueReader(\ncallback=self.process_control_task,\nparse_task=ControlMessage.parse_raw,\napp_name=platform_config.app_name,\nworker_index=platform_config.process_index,\nrabbit_config=platform_config.config.get(\"processing_core.msg_broker\", {}),\nqueue=queue,\npriority_queue=queue,\nparent_logger=self.log,\n)\n
"},{"location":"reference/common/control/#dp3.common.control.Control.start","title":"start","text":"start()\n
Connect to RabbitMQ and start consuming from TaskQueue.
Source code indp3/common/control.py
def start(self):\n\"\"\"Connect to RabbitMQ and start consuming from TaskQueue.\"\"\"\nif not self.enabled:\nreturn\nunconfigured_handlers = self.allowed_actions - set(self.action_handlers)\nif unconfigured_handlers:\nraise ValueError(\nf\"The following configured actions are missing handlers: {unconfigured_handlers}\"\n)\nself.log.info(\"Connecting to RabbitMQ\")\nself.control_queue.connect()\nself.control_queue.check() # check presence of needed queues\nself.control_queue.start()\nself.log.debug(\"Configured handlers: %s\", self.action_handlers)\n
"},{"location":"reference/common/control/#dp3.common.control.Control.stop","title":"stop","text":"stop()\n
Stop consuming from TaskQueue, disconnect from RabbitMQ.
Source code indp3/common/control.py
def stop(self):\n\"\"\"Stop consuming from TaskQueue, disconnect from RabbitMQ.\"\"\"\nif not self.enabled:\nreturn\nself.control_queue.stop()\nself.control_queue.disconnect()\n
"},{"location":"reference/common/control/#dp3.common.control.Control.set_action_handler","title":"set_action_handler","text":"set_action_handler(action: ControlAction, handler: Callable)\n
Sets the handler for the given action
Source code indp3/common/control.py
def set_action_handler(self, action: ControlAction, handler: Callable):\n\"\"\"Sets the handler for the given action\"\"\"\nself.log.debug(\"Setting handler for action %s: %s\", action, handler)\nself.action_handlers[action] = handler\n
"},{"location":"reference/common/control/#dp3.common.control.Control.process_control_task","title":"process_control_task","text":"process_control_task(msg_id, task: ControlMessage)\n
Acknowledges the received message and executes an action according to the task
.
This function should not be called directly, but set as callback for TaskQueueReader.
Source code indp3/common/control.py
def process_control_task(self, msg_id, task: ControlMessage):\n\"\"\"\n Acknowledges the received message and executes an action according to the `task`.\n This function should not be called directly, but set as callback for TaskQueueReader.\n \"\"\"\nself.control_queue.ack(msg_id)\nif task.action in self.allowed_actions:\nself.log.info(\"Executing action: %s\", task.action)\nself.action_handlers[task.action]()\nelse:\nself.log.error(\"Action not allowed: %s\", task.action)\n
"},{"location":"reference/common/datapoint/","title":"datapoint","text":""},{"location":"reference/common/datapoint/#dp3.common.datapoint","title":"dp3.common.datapoint","text":""},{"location":"reference/common/datapoint/#dp3.common.datapoint.DataPointBase","title":"DataPointBase","text":" Bases: BaseModel
Data-point
Contains single raw data value received on API. This is just base class - plain, observation or timeseries datapoints inherit from this class (see below).
Provides front line of validation for this data value.
Internal usage: inside Task, created by TaskExecutor
"},{"location":"reference/common/datapoint/#dp3.common.datapoint.DataPointPlainBase","title":"DataPointPlainBase","text":" Bases: DataPointBase
Plain attribute data-point
Contains single raw data value received on API for plain attribute.
In case of plain data-point, it's not really a data-point, but we use the same naming for simplicity.
"},{"location":"reference/common/datapoint/#dp3.common.datapoint.DataPointObservationsBase","title":"DataPointObservationsBase","text":" Bases: DataPointBase
Observations attribute data-point
Contains single raw data value received on API for observations attribute.
"},{"location":"reference/common/datapoint/#dp3.common.datapoint.DataPointTimeseriesBase","title":"DataPointTimeseriesBase","text":" Bases: DataPointBase
Timeseries attribute data-point
Contains single raw data value received on API for observations attribute.
"},{"location":"reference/common/datapoint/#dp3.common.datapoint.is_list_ordered","title":"is_list_ordered","text":"is_list_ordered(to_check: list)\n
Checks if list is ordered (not decreasing anywhere)
Source code indp3/common/datapoint.py
def is_list_ordered(to_check: list):\n\"\"\"Checks if list is ordered (not decreasing anywhere)\"\"\"\nreturn all(to_check[i] <= to_check[i + 1] for i in range(len(to_check) - 1))\n
"},{"location":"reference/common/datapoint/#dp3.common.datapoint.dp_ts_root_validator_irregular","title":"dp_ts_root_validator_irregular","text":"dp_ts_root_validator_irregular(cls, values)\n
Validates or sets t2 of irregular timeseries datapoint
Source code indp3/common/datapoint.py
@root_validator\ndef dp_ts_root_validator_irregular(cls, values):\n\"\"\"Validates or sets t2 of irregular timeseries datapoint\"\"\"\nif \"v\" in values:\nfirst_time = values[\"v\"].time[0]\nlast_time = values[\"v\"].time[-1]\n# Check t1 <= first_time\nif \"t1\" in values:\nassert (\nvalues[\"t1\"] <= first_time\n), f\"'t1' is above first item in 'time' series ({first_time})\"\n# Check last_time <= t2\nif \"t2\" in values and values[\"t2\"]:\nassert (\nvalues[\"t2\"] >= last_time\n), f\"'t2' is below last item in 'time' series ({last_time})\"\nelse:\nvalues[\"t2\"] = last_time\n# time must be ordered\nassert is_list_ordered(values[\"v\"].time), \"'time' series is not ordered\"\nreturn values\n
"},{"location":"reference/common/datapoint/#dp3.common.datapoint.dp_ts_root_validator_irregular_intervals","title":"dp_ts_root_validator_irregular_intervals","text":"dp_ts_root_validator_irregular_intervals(cls, values)\n
Validates or sets t2 of irregular intervals timeseries datapoint
Source code indp3/common/datapoint.py
@root_validator\ndef dp_ts_root_validator_irregular_intervals(cls, values):\n\"\"\"Validates or sets t2 of irregular intervals timeseries datapoint\"\"\"\nif \"v\" in values:\nfirst_time = values[\"v\"].time_first[0]\nlast_time = values[\"v\"].time_last[-1]\n# Check t1 <= first_time\nif \"t1\" in values:\nassert (\nvalues[\"t1\"] <= first_time\n), f\"'t1' is above first item in 'time_first' series ({first_time})\"\n# Check last_time <= t2\nif \"t2\" in values and values[\"t2\"]:\nassert (\nvalues[\"t2\"] >= last_time\n), f\"'t2' is below last item in 'time_last' series ({last_time})\"\nelse:\nvalues[\"t2\"] = last_time\n# Check time_first[i] <= time_last[i]\nassert all(\nt[0] <= t[1] for t in zip(values[\"v\"].time_first, values[\"v\"].time_last)\n), \"'time_first[i] <= time_last[i]' isn't true for all 'i'\"\nreturn values\n
"},{"location":"reference/common/datatype/","title":"datatype","text":""},{"location":"reference/common/datatype/#dp3.common.datatype","title":"dp3.common.datatype","text":""},{"location":"reference/common/datatype/#dp3.common.datatype.DataType","title":"DataType","text":"DataType(**data)\n
Bases: BaseModel
Data type container
Represents one of primitive data types:
or composite data type:
Attributes:
Name Type Descriptiondata_type
str
type for incoming value validation
hashable
bool
whether contained data is hashable
is_link
bool
whether this data type is link
link_to
str
if is_link
is True, what is linked target
dp3/common/datatype.py
def __init__(self, **data):\nsuper().__init__(**data)\nstr_type = data[\"__root__\"]\nself._hashable = not (\n\"dict\" in str_type\nor \"set\" in str_type\nor \"array\" in str_type\nor \"special\" in str_type\nor \"json\" in str_type\nor \"link\" in str_type\n)\nself.determine_value_validator(str_type)\n
"},{"location":"reference/common/datatype/#dp3.common.datatype.DataType.determine_value_validator","title":"determine_value_validator","text":"determine_value_validator(str_type: str)\n
Determines value validator (inner data_type
)
This is not implemented inside @validator
, because it apparently doesn't work with __root__
models.
dp3/common/datatype.py
def determine_value_validator(self, str_type: str):\n\"\"\"Determines value validator (inner `data_type`)\n This is not implemented inside `@validator`, because it apparently doesn't work with\n `__root__` models.\n \"\"\"\ndata_type = None\nif type(str_type) is not str:\nraise TypeError(f\"Data type {str_type} is not string\")\nif str_type in primitive_data_types:\n# Primitive type\ndata_type = primitive_data_types[str_type]\nelif re.match(re_array, str_type):\n# Array\nelement_type = str_type.split(\"<\")[1].split(\">\")[0]\nif element_type not in primitive_data_types:\nraise TypeError(f\"Data type {element_type} is not supported as an array element\")\ndata_type = list[primitive_data_types[element_type]]\nelif re.match(re_set, str_type):\n# Set\nelement_type = str_type.split(\"<\")[1].split(\">\")[0]\nif element_type not in primitive_data_types:\nraise TypeError(f\"Data type {element_type} is not supported as an set element\")\ndata_type = list[primitive_data_types[element_type]] # set is not supported by MongoDB\nelif m := re.match(re_link, str_type):\n# Link\netype, data = m.group(\"etype\"), m.group(\"data\")\nself._link_to = etype\nself._is_link = True\nself._link_data = bool(data)\nif etype and data:\nvalue_type = DataType(__root__=data)\ndata_type = create_model(\nf\"Link<{data}>\", __base__=Link, data=(value_type._data_type, ...)\n)\nelse:\ndata_type = Link\nelif re.match(re_dict, str_type):\n# Dict\ndict_spec = {}\nkey_str = str_type.split(\"<\")[1].split(\">\")[0]\nkey_spec = dict(item.split(\":\") for item in key_str.split(\",\"))\n# For each dict key\nfor k, v in key_spec.items():\nif v not in primitive_data_types:\nraise TypeError(f\"Data type {v} of key {k} is not supported as a dict field\")\n# Optional subattribute\nk_optional = k[-1] == \"?\"\nif k_optional:\n# Remove question mark from key\nk = k[:-1]\n# Set (type, default value) for the key\ndict_spec[k] = (primitive_data_types[v], None if k_optional else ...)\n# Create model for this 
dict\ndata_type = create_model(f\"{str_type}__inner\", **dict_spec)\nelif m := re.match(re_category, str_type):\n# Category\ncategory_type, category_values = m.group(\"type\"), m.group(\"vals\")\ncategory_type = DataType(__root__=category_type)\ncategory_values = [\ncategory_type._data_type(value.strip()) for value in category_values.split(\",\")\n]\ndata_type = Enum(f\"Category<{category_type}>\", {val: val for val in category_values})\nelse:\nraise TypeError(f\"Data type '{str_type}' is not supported\")\n# Set data type\nself._data_type = data_type\n
"},{"location":"reference/common/datatype/#dp3.common.datatype.DataType.get_linked_entity","title":"get_linked_entity","text":"get_linked_entity() -> id\n
Returns linked entity id. Raises ValueError if DataType is not a link.
Source code indp3/common/datatype.py
def get_linked_entity(self) -> id:\n\"\"\"Returns linked entity id. Raises ValueError if DataType is not a link.\"\"\"\ntry:\nreturn self._link_to\nexcept AttributeError:\nraise ValueError(f\"DataType '{self}' is not a link.\") from None\n
"},{"location":"reference/common/datatype/#dp3.common.datatype.DataType.link_has_data","title":"link_has_data","text":"link_has_data() -> bool\n
Whether link has data. Raises ValueError if DataType is not a link.
Source code indp3/common/datatype.py
def link_has_data(self) -> bool:\n\"\"\"Whether link has data. Raises ValueError if DataType is not a link.\"\"\"\ntry:\nreturn self._link_data\nexcept AttributeError:\nraise ValueError(f\"DataType '{self}' is not a link.\") from None\n
"},{"location":"reference/common/entityspec/","title":"entityspec","text":""},{"location":"reference/common/entityspec/#dp3.common.entityspec","title":"dp3.common.entityspec","text":""},{"location":"reference/common/entityspec/#dp3.common.entityspec.EntitySpec","title":"EntitySpec","text":"EntitySpec(id: str, spec: dict[str, Union[str, bool]])\n
Bases: BaseModel
Entity specification
This class represents specification of an entity type (e.g. ip, asn, ...)
Source code indp3/common/entityspec.py
def __init__(self, id: str, spec: dict[str, Union[str, bool]]):\nsuper().__init__(id=id, name=spec.get(\"name\"), snapshot=spec.get(\"snapshot\"))\n
"},{"location":"reference/common/scheduler/","title":"scheduler","text":""},{"location":"reference/common/scheduler/#dp3.common.scheduler","title":"dp3.common.scheduler","text":"Allows modules to register functions (callables) to be run at specified times or intervals (like cron does).
Based on APScheduler package
"},{"location":"reference/common/scheduler/#dp3.common.scheduler.Scheduler","title":"Scheduler","text":"Scheduler() -> None\n
Allows modules to register functions (callables) to be run at specified times or intervals (like cron does).
Source code indp3/common/scheduler.py
def __init__(self) -> None:\nself.log = logging.getLogger(\"Scheduler\")\n# self.log.setLevel(\"DEBUG\")\nlogging.getLogger(\"apscheduler.scheduler\").setLevel(\"WARNING\")\nlogging.getLogger(\"apscheduler.executors.default\").setLevel(\"WARNING\")\nself.sched = BackgroundScheduler(timezone=\"UTC\")\nself.last_job_id = 0\n
"},{"location":"reference/common/scheduler/#dp3.common.scheduler.Scheduler.register","title":"register","text":"register(func: Callable, func_args: Union[list, tuple] = None, func_kwargs: dict = None, year: Union[int, str] = None, month: Union[int, str] = None, day: Union[int, str] = None, week: Union[int, str] = None, day_of_week: Union[int, str] = None, hour: Union[int, str] = None, minute: Union[int, str] = None, second: Union[int, str] = None, timezone: str = 'UTC') -> int\n
Register a function to be run at specified times.
Pass cron-like specification of when the function should be called, see docs of apscheduler.triggers.cron for details.
Parameters:
Name Type Description Defaultfunc
Callable
function or method to be called
requiredfunc_args
Union[list, tuple]
list of positional arguments to call func with
None
func_kwargs
dict
dict of keyword arguments to call func with
None
year
Union[int, str]
4-digit year
None
month
Union[int, str]
month (1-12)
None
day
Union[int, str]
day of month (1-31)
None
week
Union[int, str]
ISO week (1-53)
None
day_of_week
Union[int, str]
number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)
None
hour
Union[int, str]
hour (0-23)
None
minute
Union[int, str]
minute (0-59)
None
second
Union[int, str]
second (0-59)
None
timezone
str
Timezone for time specification (default is UTC).
'UTC'
Returns:
Type Descriptionint
job ID
Source code in dp3/common/scheduler.py
def register(\nself,\nfunc: Callable,\nfunc_args: Union[list, tuple] = None,\nfunc_kwargs: dict = None,\nyear: Union[int, str] = None,\nmonth: Union[int, str] = None,\nday: Union[int, str] = None,\nweek: Union[int, str] = None,\nday_of_week: Union[int, str] = None,\nhour: Union[int, str] = None,\nminute: Union[int, str] = None,\nsecond: Union[int, str] = None,\ntimezone: str = \"UTC\",\n) -> int:\n\"\"\"\n Register a function to be run at specified times.\n Pass cron-like specification of when the function should be called,\n see [docs](https://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html)\n of apscheduler.triggers.cron for details.\n Args:\n func: function or method to be called\n func_args: list of positional arguments to call func with\n func_kwargs: dict of keyword arguments to call func with\n year: 4-digit year\n month: month (1-12)\n day: day of month (1-31)\n week: ISO week (1-53)\n day_of_week: number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)\n hour: hour (0-23)\n minute: minute (0-59)\n second: second (0-59)\n timezone: Timezone for time specification (default is UTC).\n Returns:\n job ID\n \"\"\"\nself.last_job_id += 1\ntrigger = CronTrigger(\nyear, month, day, week, day_of_week, hour, minute, second, timezone=timezone\n)\nself.sched.add_job(\nfunc,\ntrigger,\nfunc_args,\nfunc_kwargs,\ncoalesce=True,\nmax_instances=1,\nid=str(self.last_job_id),\n)\nself.log.debug(f\"Registered function {func.__qualname__} to be called at {trigger}\")\nreturn self.last_job_id\n
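For illustration, a module might register a callback to run every five minutes; the `scheduler` and `process_queue` names in the comment below are hypothetical, and the executable part only shows which minutes a cron-style `*/5` minute field matches.

```python
# Hypothetical usage (scheduler and process_queue are illustrative names):
#
#   job_id = scheduler.register(process_queue, minute="*/5")
#
# A cron-style "*/5" in the minute field matches every minute divisible by 5:
matching_minutes = [m for m in range(60) if m % 5 == 0]
print(matching_minutes)  # 12 firing minutes per hour, starting at 0
```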
"},{"location":"reference/common/scheduler/#dp3.common.scheduler.Scheduler.pause_job","title":"pause_job","text":"pause_job(id)\n
Pause job with given ID
Source code in dp3/common/scheduler.py
def pause_job(self, id):\n\"\"\"Pause job with given ID\"\"\"\nself.sched.pause_job(str(id))\n
"},{"location":"reference/common/scheduler/#dp3.common.scheduler.Scheduler.resume_job","title":"resume_job","text":"resume_job(id)\n
Resume previously paused job with given ID
Source code in dp3/common/scheduler.py
def resume_job(self, id):\n\"\"\"Resume previously paused job with given ID\"\"\"\nself.sched.resume_job(str(id))\n
"},{"location":"reference/common/task/","title":"task","text":""},{"location":"reference/common/task/#dp3.common.task","title":"dp3.common.task","text":""},{"location":"reference/common/task/#dp3.common.task.Task","title":"Task","text":" Bases: BaseModel
, ABC
A generic task type class.
An abstraction for the task queue classes to depend upon.
"},{"location":"reference/common/task/#dp3.common.task.Task.routing_key","title":"routing_keyabstractmethod
","text":"routing_key() -> str\n
Returns:
Type Descriptionstr
A string to be used as a routing key between workers.
Source code in dp3/common/task.py
@abstractmethod\ndef routing_key(self) -> str:\n\"\"\"\n Returns:\n A string to be used as a routing key between workers.\n \"\"\"\n
"},{"location":"reference/common/task/#dp3.common.task.Task.as_message","title":"as_message abstractmethod
","text":"as_message() -> str\n
Returns:
Type Descriptionstr
A string representation of the object.
Source code in dp3/common/task.py
@abstractmethod\ndef as_message(self) -> str:\n\"\"\"\n Returns:\n A string representation of the object.\n \"\"\"\n
"},{"location":"reference/common/task/#dp3.common.task.DataPointTask","title":"DataPointTask","text":" Bases: Task
DataPointTask
Contains a single task to be pushed to TaskQueue and processed.
Attributes:
Name Type Descriptionetype
str
Entity type
eid
str
Entity id / key
data_points
list[DataPointBase]
List of DataPoints to process
tags
list[Any]
List of tags
ttl_token
Optional[datetime]
...
"},{"location":"reference/common/task/#dp3.common.task.Snapshot","title":"Snapshot","text":" Bases: Task
Snapshot
Contains a list of entities, the meaning of which depends on the type
. If type
is \"task\", then the list contains linked entities for which a snapshot should be created. Otherwise type
is \"linked_entities\", indicating which entities must be skipped in a parallelized creation of unlinked entities.
Attributes:
Name Type Descriptionentities
list[tuple[str, str]]
List of (entity_type, entity_id)
time
datetime
timestamp for snapshot creation
"},{"location":"reference/common/utils/","title":"utils","text":""},{"location":"reference/common/utils/#dp3.common.utils","title":"dp3.common.utils","text":"auxiliary/utility functions and classes
"},{"location":"reference/common/utils/#dp3.common.utils.parse_rfc_time","title":"parse_rfc_time","text":"parse_rfc_time(time_str)\n
Parse time in RFC 3339 format and return it as naive datetime in UTC.
Timezone specification is optional (UTC is assumed when none is specified).
Source code in dp3/common/utils.py
def parse_rfc_time(time_str):\n\"\"\"\n Parse time in RFC 3339 format and return it as naive datetime in UTC.\n Timezone specification is optional (UTC is assumed when none is specified).\n \"\"\"\nres = timestamp_re.match(time_str)\nif res is not None:\nyear, month, day, hour, minute, second = (int(n or 0) for n in res.group(*range(1, 7)))\nus_str = (res.group(7) or \"0\")[:6].ljust(6, \"0\")\nus = int(us_str)\nzonestr = res.group(8)\nzoneoffset = 0 if zonestr in (None, \"z\", \"Z\") else int(zonestr[:3]) * 60 + int(zonestr[4:6])\nzonediff = datetime.timedelta(minutes=zoneoffset)\nreturn datetime.datetime(year, month, day, hour, minute, second, us) - zonediff\nelse:\nraise ValueError(\"Wrong timestamp format\")\n
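The documented contract (optional timezone, naive UTC output) can be approximated with the standard library alone; this sketch is not dp3's regex-based implementation, just an equivalent-behavior illustration.

```python
from datetime import datetime, timezone

def parse_rfc_time_sketch(time_str: str) -> datetime:
    """Approximate stdlib equivalent of the documented behavior."""
    # Python < 3.11 fromisoformat() rejects a trailing "Z"/"z", so normalize it
    normalized = time_str.replace("z", "+00:00").replace("Z", "+00:00")
    dt = datetime.fromisoformat(normalized)
    if dt.tzinfo is None:
        return dt  # no timezone given: UTC is assumed, value is already naive
    return dt.astimezone(timezone.utc).replace(tzinfo=None)

print(parse_rfc_time_sketch("2023-07-01T12:00:00+02:00"))  # 2023-07-01 10:00:00
```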
"},{"location":"reference/common/utils/#dp3.common.utils.parse_time_duration","title":"parse_time_duration","text":"parse_time_duration(duration_string: Union[str, int, datetime.timedelta]) -> datetime.timedelta\n
Parse duration in format <num><s/m/h/d> (or just "0").
Return datetime.timedelta
Source code in dp3/common/utils.py
def parse_time_duration(duration_string: Union[str, int, datetime.timedelta]) -> datetime.timedelta:\n\"\"\"\n Parse duration in format <num><s/m/h/d> (or just \"0\").\n Return datetime.timedelta\n \"\"\"\n# if it's already timedelta, just return it unchanged\nif isinstance(duration_string, datetime.timedelta):\nreturn duration_string\n# if number is passed, consider it number of seconds\nif isinstance(duration_string, (int, float)):\nreturn datetime.timedelta(seconds=duration_string)\nd = 0\nh = 0\nm = 0\ns = 0\nif duration_string == \"0\":\npass\nelif duration_string[-1] == \"d\":\nd = int(duration_string[:-1])\nelif duration_string[-1] == \"h\":\nh = int(duration_string[:-1])\nelif duration_string[-1] == \"m\":\nm = int(duration_string[:-1])\nelif duration_string[-1] == \"s\":\ns = int(duration_string[:-1])\nelse:\nraise ValueError(\"Invalid time duration string\")\nreturn datetime.timedelta(days=d, hours=h, minutes=m, seconds=s)\n
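The accepted inputs behave as in this condensed, standalone restatement of the rules shown above (a sketch, not an import of dp3 itself):

```python
from datetime import timedelta
from typing import Union

_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}

def parse_duration_sketch(value: Union[str, int, float, timedelta]) -> timedelta:
    if isinstance(value, timedelta):
        return value                     # already a timedelta: pass through
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)  # bare number means seconds
    if value == "0":
        return timedelta()
    if not value or value[-1] not in _UNITS:
        raise ValueError("Invalid time duration string")
    return timedelta(**{_UNITS[value[-1]]: int(value[:-1])})

print(parse_duration_sketch("2h"), parse_duration_sketch("30s"))  # 2:00:00 0:00:30
```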
"},{"location":"reference/common/utils/#dp3.common.utils.conv_to_json","title":"conv_to_json","text":"conv_to_json(obj)\n
Convert special types to JSON (use as \"default\" param of json.dumps)
Supported types/objects:
- datetime
- timedelta
Source code in dp3/common/utils.py
def conv_to_json(obj):\n\"\"\"Convert special types to JSON (use as \"default\" param of json.dumps)\n Supported types/objects:\n - datetime\n - timedelta\n \"\"\"\nif isinstance(obj, datetime.datetime):\nif obj.tzinfo:\nraise NotImplementedError(\n\"Can't serialize timezone-aware datetime object \"\n\"(DP3 policy is to use naive datetimes in UTC everywhere)\"\n)\nreturn {\"$datetime\": obj.strftime(\"%Y-%m-%dT%H:%M:%S.%f\")}\nif isinstance(obj, datetime.timedelta):\nreturn {\"$timedelta\": f\"{obj.days},{obj.seconds},{obj.microseconds}\"}\nraise TypeError(\"%r is not JSON serializable\" % obj)\n
"},{"location":"reference/common/utils/#dp3.common.utils.conv_from_json","title":"conv_from_json","text":"conv_from_json(dct)\n
Convert special JSON keys created by conv_to_json back to Python objects (use as \"object_hook\" param of json.loads)
Supported types/objects:
- datetime
- timedelta
Source code in dp3/common/utils.py
def conv_from_json(dct):\n\"\"\"Convert special JSON keys created by conv_to_json back to Python objects\n (use as \"object_hook\" param of json.loads)\n Supported types/objects:\n - datetime\n - timedelta\n \"\"\"\nif \"$datetime\" in dct:\nval = dct[\"$datetime\"]\nreturn datetime.datetime.strptime(val, \"%Y-%m-%dT%H:%M:%S.%f\")\nif \"$timedelta\" in dct:\ndays, seconds, microseconds = dct[\"$timedelta\"].split(\",\")\nreturn datetime.timedelta(int(days), int(seconds), int(microseconds))\nreturn dct\n
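Together the two hooks give a lossless round trip through `json.dumps`/`json.loads`. The sketch below uses local copies of the converters shown above (the timezone-aware guard is omitted for brevity):

```python
import json
import datetime

def to_json(obj):
    # condensed copy of conv_to_json above (timezone-aware guard omitted)
    if isinstance(obj, datetime.datetime):
        return {"$datetime": obj.strftime("%Y-%m-%dT%H:%M:%S.%f")}
    if isinstance(obj, datetime.timedelta):
        return {"$timedelta": f"{obj.days},{obj.seconds},{obj.microseconds}"}
    raise TypeError(f"{obj!r} is not JSON serializable")

def from_json(dct):
    # condensed copy of conv_from_json above
    if "$datetime" in dct:
        return datetime.datetime.strptime(dct["$datetime"], "%Y-%m-%dT%H:%M:%S.%f")
    if "$timedelta" in dct:
        days, seconds, us = dct["$timedelta"].split(",")
        return datetime.timedelta(int(days), int(seconds), int(us))
    return dct

original = {"t": datetime.datetime(2023, 1, 1, 12, 0), "d": datetime.timedelta(hours=1)}
encoded = json.dumps(original, default=to_json)
decoded = json.loads(encoded, object_hook=from_json)
print(decoded == original)  # True
```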
"},{"location":"reference/common/utils/#dp3.common.utils.get_func_name","title":"get_func_name","text":"get_func_name(func_or_method)\n
Get name of function or method as pretty string.
Source code in dp3/common/utils.py
def get_func_name(func_or_method):\n\"\"\"Get name of function or method as pretty string.\"\"\"\ntry:\nfname = func_or_method.__func__.__qualname__\nexcept AttributeError:\nfname = func_or_method.__name__\nreturn func_or_method.__module__ + \".\" + fname\n
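For example, with a local copy of the helper (the exact module prefix depends on where the code runs):

```python
def get_func_name_copy(func_or_method):
    # local copy of get_func_name above
    try:
        fname = func_or_method.__func__.__qualname__
    except AttributeError:
        fname = func_or_method.__name__
    return func_or_method.__module__ + "." + fname

class Worker:
    def run(self):
        pass

print(get_func_name_copy(Worker().run))        # e.g. "__main__.Worker.run"
print(get_func_name_copy(get_func_name_copy))  # e.g. "__main__.get_func_name_copy"
```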
"},{"location":"reference/database/","title":"database","text":""},{"location":"reference/database/#dp3.database","title":"dp3.database","text":"A wrapper responsible for communication with the database server.
"},{"location":"reference/database/database/","title":"database","text":""},{"location":"reference/database/database/#dp3.database.database","title":"dp3.database.database","text":""},{"location":"reference/database/database/#dp3.database.database.MongoHostConfig","title":"MongoHostConfig","text":" Bases: BaseModel
MongoDB host.
"},{"location":"reference/database/database/#dp3.database.database.MongoStandaloneConfig","title":"MongoStandaloneConfig","text":" Bases: BaseModel
MongoDB standalone configuration.
"},{"location":"reference/database/database/#dp3.database.database.MongoReplicaConfig","title":"MongoReplicaConfig","text":" Bases: BaseModel
MongoDB replica set configuration.
"},{"location":"reference/database/database/#dp3.database.database.MongoConfig","title":"MongoConfig","text":" Bases: BaseModel
Database configuration.
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase","title":"EntityDatabase","text":"EntityDatabase(db_conf: HierarchicalDict, model_spec: ModelSpec) -> None\n
MongoDB database wrapper responsible for whole communication with database server. Initializes database schema based on database configuration.
db_conf - configuration of database connection (content of database.yml)
model_spec - ModelSpec object, configuration of data model (entities and attributes)
Source code in dp3/database/database.py
def __init__(\nself,\ndb_conf: HierarchicalDict,\nmodel_spec: ModelSpec,\n) -> None:\nself.log = logging.getLogger(\"EntityDatabase\")\nconfig = MongoConfig.parse_obj(db_conf)\nself.log.info(\"Connecting to database...\")\nfor attempt, delay in enumerate(RECONNECT_DELAYS):\ntry:\nself._db = self.connect(config)\n# Check if connected\nself._db.admin.command(\"ping\")\nexcept pymongo.errors.ConnectionFailure as e:\nif attempt + 1 == len(RECONNECT_DELAYS):\nraise DatabaseError(\n\"Cannot connect to database with specified connection arguments.\"\n) from e\nelse:\nself.log.error(\n\"Cannot connect to database (attempt %d, retrying in %ds).\",\nattempt + 1,\ndelay,\n)\ntime.sleep(delay)\nself._db_schema_config = model_spec\n# Init and switch to correct database\nself._db = self._db[config.db_name]\nself._init_database_schema(config.db_name)\nself.log.info(\"Database successfully initialized!\")\n
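The constructor's reconnect loop follows a standard retry-with-delays pattern. Isolated as a sketch (the actual RECONNECT_DELAYS values are not shown above, so the simulated connection and delay list here are illustrative):

```python
import time

def connect_with_retries(connect, delays, sleep=time.sleep):
    """Retry `connect` with increasing delays; re-raise after the final attempt."""
    for attempt, delay in enumerate(delays):
        try:
            return connect()
        except ConnectionError:
            if attempt + 1 == len(delays):
                raise  # last attempt failed: propagate, as EntityDatabase does
            sleep(delay)

# Simulated connection that fails twice, then succeeds:
attempts = {"count": 0}
def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("server not ready")
    return "connected"

print(connect_with_retries(flaky_connect, delays=[1, 2, 5], sleep=lambda d: None))
```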
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.insert_datapoints","title":"insert_datapoints","text":"insert_datapoints(etype: str, eid: str, dps: list[DataPointBase], new_entity: bool = False) -> None\n
Inserts datapoint to raw data collection and updates master record.
Raises DatabaseError when insert or update fails.
Source code in dp3/database/database.py
def insert_datapoints(\nself, etype: str, eid: str, dps: list[DataPointBase], new_entity: bool = False\n) -> None:\n\"\"\"Inserts datapoint to raw data collection and updates master record.\n Raises DatabaseError when insert or update fails.\n \"\"\"\nif len(dps) == 0:\nreturn\netype = dps[0].etype\n# Check `etype`\nself._assert_etype_exists(etype)\n# Insert raw datapoints\nraw_col = self._raw_col_name(etype)\ndps_dicts = [dp.dict(exclude={\"attr_type\"}) for dp in dps]\ntry:\nself._db[raw_col].insert_many(dps_dicts)\nself.log.debug(f\"Inserted datapoints to raw collection:\\n{dps}\")\nexcept Exception as e:\nraise DatabaseError(f\"Insert of datapoints failed: {e}\\n{dps}\") from e\n# Update master document\nmaster_changes = {\"$push\": {}, \"$set\": {}}\nfor dp in dps:\nattr_spec = self._db_schema_config.attr(etype, dp.attr)\nv = dp.v.dict() if isinstance(dp.v, BaseModel) else dp.v\n# Rewrite value of plain attribute\nif attr_spec.t == AttrType.PLAIN:\nmaster_changes[\"$set\"][dp.attr] = {\"v\": v, \"ts_last_update\": datetime.now()}\n# Push new data of observation\nif attr_spec.t == AttrType.OBSERVATIONS:\nif dp.attr in master_changes[\"$push\"]:\n# Support multiple datapoints being pushed in the same request\nif \"$each\" not in master_changes[\"$push\"][dp.attr]:\nsaved_dp = master_changes[\"$push\"][dp.attr]\nmaster_changes[\"$push\"][dp.attr] = {\"$each\": [saved_dp]}\nmaster_changes[\"$push\"][dp.attr][\"$each\"].append(\n{\"t1\": dp.t1, \"t2\": dp.t2, \"v\": v, \"c\": dp.c}\n)\nelse:\n# Otherwise just push one datapoint\nmaster_changes[\"$push\"][dp.attr] = {\"t1\": dp.t1, \"t2\": dp.t2, \"v\": v, \"c\": dp.c}\n# Push new data of timeseries\nif attr_spec.t == AttrType.TIMESERIES:\nif dp.attr in master_changes[\"$push\"]:\n# Support multiple datapoints being pushed in the same request\nif \"$each\" not in master_changes[\"$push\"][dp.attr]:\nsaved_dp = master_changes[\"$push\"][dp.attr]\nmaster_changes[\"$push\"][dp.attr] = {\"$each\": [saved_dp]}\nmaster_changes[\"$push\"][dp.attr][\"$each\"].append(\n{\"t1\": dp.t1, \"t2\": dp.t2, \"v\": v}\n)\nelse:\n# Otherwise just push one datapoint\nmaster_changes[\"$push\"][dp.attr] = {\"t1\": dp.t1, \"t2\": dp.t2, \"v\": v}\nif new_entity:\nmaster_changes[\"$set\"][\"#hash\"] = HASH(f\"{etype}:{eid}\")\nmaster_col = self._master_col_name(etype)\ntry:\nself._db[master_col].update_one({\"_id\": eid}, master_changes, upsert=True)\nself.log.debug(f\"Updated master record of {etype} {eid}: {master_changes}\")\nexcept Exception as e:\nraise DatabaseError(f\"Update of master record failed: {e}\\n{dps}\") from e\n
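The `$push` merging used above can be isolated into a small rule: the first datapoint for an attribute is stored directly, and later datapoints from the same batch are folded into a MongoDB `$each` list. A standalone sketch of that rule:

```python
def merge_push(push: dict, attr: str, doc: dict) -> None:
    """Fold repeated pushes for one attribute into a {"$each": [...]} list."""
    if attr in push:
        if "$each" not in push[attr]:
            push[attr] = {"$each": [push[attr]]}  # wrap the first stored datapoint
        push[attr]["$each"].append(doc)
    else:
        push[attr] = doc  # single datapoint so far: push it as-is

push = {}
merge_push(push, "open_ports", {"t1": 1, "t2": 2, "v": 80})   # stored directly
merge_push(push, "open_ports", {"t1": 2, "t2": 3, "v": 443})  # folded into $each
print(len(push["open_ports"]["$each"]))  # 2
```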
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.update_master_records","title":"update_master_records","text":"update_master_records(etype: str, eids: list[str], records: list[dict]) -> None\n
Replace master record of etype
:eid
with the provided record
.
Raises DatabaseError when update fails.
Source code in dp3/database/database.py
def update_master_records(self, etype: str, eids: list[str], records: list[dict]) -> None:\n\"\"\"Replace master record of `etype`:`eid` with the provided `record`.\n Raises DatabaseError when update fails.\n \"\"\"\nmaster_col = self._master_col_name(etype)\ntry:\nself._db[master_col].bulk_write(\n[\nReplaceOne({\"_id\": eid}, record, upsert=True)\nfor eid, record in zip(eids, records)\n]\n)\nself.log.debug(\"Updated master records of %s: %s.\", eids, eids)\nexcept Exception as e:\nraise DatabaseError(f\"Update of master records failed: {e}\\n{records}\") from e\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.delete_old_dps","title":"delete_old_dps","text":"delete_old_dps(etype: str, attr_name: str, t_old: datetime) -> None\n
Delete old datapoints from master collection.
Periodically called for all etype
s from HistoryManager.
Source code in dp3/database/database.py
def delete_old_dps(self, etype: str, attr_name: str, t_old: datetime) -> None:\n\"\"\"Delete old datapoints from master collection.\n Periodically called for all `etype`s from HistoryManager.\n \"\"\"\nmaster_col = self._master_col_name(etype)\ntry:\nself._db[master_col].update_many({}, {\"$pull\": {attr_name: {\"t2\": {\"$lt\": t_old}}}})\nexcept Exception as e:\nraise DatabaseError(f\"Delete of old datapoints failed: {e}\") from e\n
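The `$pull` filter removes exactly those array elements whose `t2` is older than the cutoff; the same predicate expressed in plain Python (with illustrative data):

```python
from datetime import datetime

t_old = datetime(2023, 1, 1)
history = [
    {"t1": datetime(2022, 12, 1), "t2": datetime(2022, 12, 2), "v": 80},  # expired
    {"t1": datetime(2023, 2, 1), "t2": datetime(2023, 2, 2), "v": 443},   # kept
]
# Equivalent of $pull with {"t2": {"$lt": t_old}}: drop elements where t2 < t_old
kept = [dp for dp in history if not dp["t2"] < t_old]
print([dp["v"] for dp in kept])  # [443]
```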
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_master_record","title":"get_master_record","text":"get_master_record(etype: str, eid: str, **kwargs: str) -> dict\n
Get current master record for etype/eid.
If doesn't exist, returns {}.
Source code in dp3/database/database.py
def get_master_record(self, etype: str, eid: str, **kwargs) -> dict:\n\"\"\"Get current master record for etype/eid.\n If doesn't exist, returns {}.\n \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nmaster_col = self._master_col_name(etype)\nreturn self._db[master_col].find_one({\"_id\": eid}, **kwargs) or {}\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.ekey_exists","title":"ekey_exists","text":"ekey_exists(etype: str, eid: str) -> bool\n
Checks whether master record for etype/eid exists
Source code in dp3/database/database.py
def ekey_exists(self, etype: str, eid: str) -> bool:\n\"\"\"Checks whether master record for etype/eid exists\"\"\"\nreturn bool(self.get_master_record(etype, eid))\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_master_records","title":"get_master_records","text":"get_master_records(etype: str, **kwargs: str) -> pymongo.cursor.Cursor\n
Get cursor to current master records of etype.
Source code in dp3/database/database.py
def get_master_records(self, etype: str, **kwargs) -> pymongo.cursor.Cursor:\n\"\"\"Get cursor to current master records of etype.\"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nmaster_col = self._master_col_name(etype)\nreturn self._db[master_col].find({}, **kwargs)\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_worker_master_records","title":"get_worker_master_records","text":"get_worker_master_records(worker_index: int, worker_cnt: int, etype: str, **kwargs: str) -> pymongo.cursor.Cursor\n
Get cursor to current master records of etype.
Source code in dp3/database/database.py
def get_worker_master_records(\nself, worker_index: int, worker_cnt: int, etype: str, **kwargs\n) -> pymongo.cursor.Cursor:\n\"\"\"Get cursor to current master records of etype.\"\"\"\nif etype not in self._db_schema_config.entities:\nraise DatabaseError(f\"Entity '{etype}' does not exist\")\nmaster_col = self._master_col_name(etype)\nreturn self._db[master_col].find({\"#hash\": {\"$mod\": [worker_cnt, worker_index]}}, **kwargs)\n
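The `{"#hash": {"$mod": [worker_cnt, worker_index]}}` filter shards entities across workers by hash modulo. A quick sketch of the partitioning property, using integer stand-ins for the stored hashes:

```python
worker_cnt = 4
hashes = list(range(100))  # stand-in for the stored "#hash" values

# Worker i selects records with hash % worker_cnt == i, mirroring $mod
partitions = [[h for h in hashes if h % worker_cnt == i] for i in range(worker_cnt)]

# Every record lands in exactly one worker's partition:
print(sum(len(p) for p in partitions) == len(hashes))  # True
```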
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_latest_snapshot","title":"get_latest_snapshot","text":"get_latest_snapshot(etype: str, eid: str) -> dict\n
Get latest snapshot of given etype/eid.
If doesn't exist, returns {}.
Source code in dp3/database/database.py
def get_latest_snapshot(self, etype: str, eid: str) -> dict:\n\"\"\"Get latest snapshot of given etype/eid.\n If doesn't exist, returns {}.\n \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nsnapshot_col = self._snapshots_col_name(etype)\nreturn self._db[snapshot_col].find_one({\"eid\": eid}, sort=[(\"_id\", -1)]) or {}\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_latest_snapshots","title":"get_latest_snapshots","text":"get_latest_snapshots(etype: str) -> pymongo.cursor.Cursor\n
Get latest snapshots of given etype
.
This method is useful for displaying data on web.
Source code in dp3/database/database.py
def get_latest_snapshots(self, etype: str) -> pymongo.cursor.Cursor:\n\"\"\"Get latest snapshots of given `etype`.\n This method is useful for displaying data on web.\n \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nsnapshot_col = self._snapshots_col_name(etype)\nlatest_snapshot = self._db[snapshot_col].find_one({}, sort=[(\"_id\", -1)])\nif latest_snapshot is None:\nreturn self._db[snapshot_col].find()\nlatest_snapshot_date = latest_snapshot[\"_time_created\"]\nreturn self._db[snapshot_col].find({\"_time_created\": latest_snapshot_date})\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_snapshots","title":"get_snapshots","text":"get_snapshots(etype: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None) -> pymongo.cursor.Cursor\n
Get all (or filtered) snapshots of given eid
.
This method is useful for displaying eid
's history on web.
Parameters:
Name Type Description Defaultetype
str
entity type
requiredeid
str
id of entity, to which data-points correspond
requiredt1
Optional[datetime]
left value of time interval (inclusive)
None
t2
Optional[datetime]
right value of time interval (inclusive)
None
Source code in dp3/database/database.py
def get_snapshots(\nself, etype: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None\n) -> pymongo.cursor.Cursor:\n\"\"\"Get all (or filtered) snapshots of given `eid`.\n This method is useful for displaying `eid`'s history on web.\n Args:\n etype: entity type\n eid: id of entity, to which data-points correspond\n t1: left value of time interval (inclusive)\n t2: right value of time interval (inclusive)\n \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nsnapshot_col = self._snapshots_col_name(etype)\nquery = {\"eid\": eid, \"_time_created\": {}}\n# Filter by date\nif t1:\nquery[\"_time_created\"][\"$gte\"] = t1\nif t2:\nquery[\"_time_created\"][\"$lte\"] = t2\n# Unset if empty\nif not query[\"_time_created\"]:\ndel query[\"_time_created\"]\nreturn self._db[snapshot_col].find(query).sort([(\"_time_created\", pymongo.ASCENDING)])\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_value_or_history","title":"get_value_or_history","text":"get_value_or_history(etype: str, attr_name: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None) -> dict\n
Gets current value and/or history of attribute for given eid
.
Depends on attribute type:
- plain: just (current) value
- observations: (current) value and history stored in master record (optionally filtered)
- timeseries: just history stored in master record (optionally filtered)
Returns dict with two keys: current_value
and history
(list of values).
Source code in dp3/database/database.py
def get_value_or_history(\nself,\netype: str,\nattr_name: str,\neid: str,\nt1: Optional[datetime] = None,\nt2: Optional[datetime] = None,\n) -> dict:\n\"\"\"Gets current value and/or history of attribute for given `eid`.\n Depends on attribute type:\n - plain: just (current) value\n - observations: (current) value and history stored in master record (optionally filtered)\n - timeseries: just history stored in master record (optionally filtered)\n Returns dict with two keys: `current_value` and `history` (list of values).\n \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nattr_spec = self._db_schema_config.attr(etype, attr_name)\nresult = {\"current_value\": None, \"history\": []}\n# Add current value to the result\nif attr_spec.t == AttrType.PLAIN:\nresult[\"current_value\"] = (\nself.get_master_record(etype, eid).get(attr_name, {}).get(\"v\", None)\n)\nelif attr_spec.t == AttrType.OBSERVATIONS:\nresult[\"current_value\"] = self.get_latest_snapshot(etype, eid).get(attr_name, None)\n# Add history\nif attr_spec.t == AttrType.OBSERVATIONS:\nresult[\"history\"] = self.get_observation_history(etype, attr_name, eid, t1, t2)\nelif attr_spec.t == AttrType.TIMESERIES:\nresult[\"history\"] = self.get_timeseries_history(etype, attr_name, eid, t1, t2)\nreturn result\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.estimate_count_eids","title":"estimate_count_eids","text":"estimate_count_eids(etype: str) -> int\n
Estimates count of eid
s in given etype
Source code in dp3/database/database.py
def estimate_count_eids(self, etype: str) -> int:\n\"\"\"Estimates count of `eid`s in given `etype`\"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nmaster_col = self._master_col_name(etype)\nreturn self._db[master_col].estimated_document_count({})\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.save_snapshot","title":"save_snapshot","text":"save_snapshot(etype: str, snapshot: dict, time: datetime)\n
Saves snapshot to specified entity of current master document.
Source code in dp3/database/database.py
def save_snapshot(self, etype: str, snapshot: dict, time: datetime):\n\"\"\"Saves snapshot to specified entity of current master document.\"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nsnapshot[\"_time_created\"] = time\nsnapshot_col = self._snapshots_col_name(etype)\ntry:\nself._db[snapshot_col].insert_one(snapshot)\nself.log.debug(f\"Inserted snapshot: {snapshot}\")\nexcept Exception as e:\nraise DatabaseError(f\"Insert of snapshot failed: {e}\\n{snapshot}\") from e\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.save_snapshots","title":"save_snapshots","text":"save_snapshots(etype: str, snapshots: list[dict], time: datetime)\n
Saves a list of snapshots of current master documents.
All snapshots must belong to same entity type.
Source code in dp3/database/database.py
def save_snapshots(self, etype: str, snapshots: list[dict], time: datetime):\n\"\"\"\n Saves a list of snapshots of current master documents.\n All snapshots must belong to same entity type.\n \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nfor snapshot in snapshots:\nsnapshot[\"_time_created\"] = time\nsnapshot_col = self._snapshots_col_name(etype)\ntry:\nself._db[snapshot_col].insert_many(snapshots)\nself.log.debug(f\"Inserted snapshots: {snapshots}\")\nexcept Exception as e:\nraise DatabaseError(f\"Insert of snapshots failed: {e}\\n{snapshots}\") from e\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.save_metadata","title":"save_metadata","text":"save_metadata(time: datetime, metadata: dict)\n
Saves a metadata document (stamped with the calling module's name and timestamps) into the #metadata collection.
Source code in dp3/database/database.py
def save_metadata(self, time: datetime, metadata: dict):\n\"\"\"Saves snapshot to specified entity of current master document.\"\"\"\nmodule = get_caller_id()\nmetadata[\"_id\"] = module + time.strftime(\"%Y-%m-%dT%H:%M:%S.%fZ\")[:-4]\nmetadata[\"#module\"] = module\nmetadata[\"#time_created\"] = time\nmetadata[\"#last_update\"] = datetime.now()\ntry:\nself._db[\"#metadata\"].insert_one(metadata)\nself.log.debug(\"Inserted metadata %s: %s\", metadata[\"_id\"], metadata)\nexcept Exception as e:\nraise DatabaseError(f\"Insert of metadata failed: {e}\\n{metadata}\") from e\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_observation_history","title":"get_observation_history","text":"get_observation_history(etype: str, attr_name: str, eid: str, t1: datetime = None, t2: datetime = None, sort: int = None) -> list[dict]\n
Get full (or filtered) history of observation attribute.
This method is useful for displaying eid
's history on web. Also used to feed data into get_timeseries_history()
.
Parameters:
Name Type Description Defaultetype
str
entity type
requiredattr_name
str
name of attribute
requiredeid
str
id of entity, to which data-points correspond
requiredt1
datetime
left value of time interval (inclusive)
None
t2
datetime
right value of time interval (inclusive)
None
sort
int
sort by timestamps - 1: ascending order by t1, 2: descending order by t2, None: don't sort
None
Returns:
Type Descriptionlist[dict]
list of dicts (reduced datapoints)
Source code in dp3/database/database.py
def get_observation_history(\nself,\netype: str,\nattr_name: str,\neid: str,\nt1: datetime = None,\nt2: datetime = None,\nsort: int = None,\n) -> list[dict]:\n\"\"\"Get full (or filtered) history of observation attribute.\n This method is useful for displaying `eid`'s history on web.\n Also used to feed data into `get_timeseries_history()`.\n Args:\n etype: entity type\n attr_name: name of attribute\n eid: id of entity, to which data-points correspond\n t1: left value of time interval (inclusive)\n t2: right value of time interval (inclusive)\n sort: sort by timestamps - 0: ascending order by t1, 1: descending order by t2,\n None: don't sort\n Returns:\n list of dicts (reduced datapoints)\n \"\"\"\nt1 = datetime.fromtimestamp(0) if t1 is None else t1\nt2 = datetime.now() if t2 is None else t2\n# Get attribute history\nmr = self.get_master_record(etype, eid)\nattr_history = mr.get(attr_name, [])\n# Filter\nattr_history_filtered = [row for row in attr_history if row[\"t1\"] <= t2 and row[\"t2\"] >= t1]\n# Sort\nif sort == 1:\nattr_history_filtered.sort(key=lambda row: row[\"t1\"])\nelif sort == 2:\nattr_history_filtered.sort(key=lambda row: row[\"t2\"], reverse=True)\nreturn attr_history_filtered\n
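The time filter above keeps every stored datapoint whose [t1, t2] interval overlaps the query window, i.e. `row["t1"] <= t2 and row["t2"] >= t1`. For example (with illustrative data):

```python
from datetime import datetime

t1, t2 = datetime(2023, 1, 10), datetime(2023, 1, 20)
attr_history = [
    {"t1": datetime(2023, 1, 1), "t2": datetime(2023, 1, 5), "v": "a"},    # before window
    {"t1": datetime(2023, 1, 8), "t2": datetime(2023, 1, 12), "v": "b"},   # overlaps start
    {"t1": datetime(2023, 1, 25), "t2": datetime(2023, 1, 30), "v": "c"},  # after window
]
filtered = [row for row in attr_history if row["t1"] <= t2 and row["t2"] >= t1]
print([row["v"] for row in filtered])  # ['b']
```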
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_timeseries_history","title":"get_timeseries_history","text":"get_timeseries_history(etype: str, attr_name: str, eid: str, t1: datetime = None, t2: datetime = None, sort: int = None) -> list[dict]\n
Get full (or filtered) history of timeseries attribute. Outputs them in format:
[\n {\n \"t1\": ...,\n \"t2\": ...,\n \"v\": {\n \"series1\": ...,\n \"series2\": ...\n }\n },\n ...\n ]\n
This method is useful for displaying eid
's history on web. Parameters:
Name Type Description Defaultetype
str
entity type
requiredattr_name
str
name of attribute
requiredeid
str
id of entity, to which data-points correspond
requiredt1
datetime
left value of time interval (inclusive)
None
t2
datetime
right value of time interval (inclusive)
None
sort
int
sort by timestamps - 1
: ascending order by t1
, 2
: descending order by t2
, None
: don't sort
None
Returns:
Type Descriptionlist[dict]
list of dicts (reduced datapoints) - each represents just one point in time
Source code in dp3/database/database.py
def get_timeseries_history(\nself,\netype: str,\nattr_name: str,\neid: str,\nt1: datetime = None,\nt2: datetime = None,\nsort: int = None,\n) -> list[dict]:\n\"\"\"Get full (or filtered) history of timeseries attribute.\n Outputs them in format:\n ```\n [\n {\n \"t1\": ...,\n \"t2\": ...,\n \"v\": {\n \"series1\": ...,\n \"series2\": ...\n }\n },\n ...\n ]\n ```\n This method is useful for displaying `eid`'s history on web.\n Args:\n etype: entity type\n attr_name: name of attribute\n eid: id of entity, to which data-points correspond\n t1: left value of time interval (inclusive)\n t2: right value of time interval (inclusive)\n sort: sort by timestamps - `0`: ascending order by `t1`, `1`: descending order by `t2`,\n `None`: don't sort\n Returns:\n list of dicts (reduced datapoints) - each represents just one point at time\n \"\"\"\nt1 = datetime.fromtimestamp(0) if t1 is None else t1\nt2 = datetime.now() if t2 is None else t2\nattr_history = self.get_observation_history(etype, attr_name, eid, t1, t2, sort)\nif not attr_history:\nreturn []\nattr_history_split = self._split_timeseries_dps(etype, attr_name, attr_history)\n# Filter out rows outside [t1, t2] interval\nattr_history_filtered = [\nrow for row in attr_history_split if row[\"t1\"] <= t2 and row[\"t2\"] >= t1\n]\nreturn attr_history_filtered\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.delete_old_snapshots","title":"delete_old_snapshots","text":"delete_old_snapshots(etype: str, t_old: datetime)\n
Delete old snapshots.
Periodically called for all etypes from HistoryManager.
Source code in dp3/database/database.py
def delete_old_snapshots(self, etype: str, t_old: datetime):\n\"\"\"Delete old snapshots.\n Periodically called for all `etype`s from HistoryManager.\n \"\"\"\nsnapshot_col_name = self._snapshots_col_name(etype)\ntry:\nreturn self._db[snapshot_col_name].delete_many({\"_time_created\": {\"$lt\": t_old}})\nexcept Exception as e:\nraise DatabaseError(f\"Delete of old snapshots failed: {e}\") from e\n
"},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_module_cache","title":"get_module_cache","text":"get_module_cache()\n
Return a persistent cache collection for given module name.
Source code in dp3/database/database.py
def get_module_cache(self):\n\"\"\"Return a persistent cache collection for given module name.\"\"\"\nmodule = get_caller_id()\nself.log.debug(\"Module %s is accessing its cache collection\", module)\nreturn self._db[f\"#cache#{module}\"]\n
"},{"location":"reference/database/database/#dp3.database.database.get_caller_id","title":"get_caller_id","text":"get_caller_id()\n
Returns the name of the caller method's class, or function name if caller is not a method.
Source code in dp3/database/database.py
def get_caller_id():\n\"\"\"Returns the name of the caller method's class, or function name if caller is not a method.\"\"\"\ncaller = inspect.stack()[2]\nif module := caller.frame.f_locals.get(\"self\"):\nreturn module.__class__.__qualname__\nreturn caller.function\n
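The caller-identification trick above can be exercised in isolation. A sketch with hypothetical names, using a stack depth of 1 (the DP³ helper uses depth 2 because it is reached through an extra wrapper frame):

```python
import inspect


def caller_id() -> str:
    # Walk one frame up: 0 = this function, 1 = whoever called it.
    # If the caller is a bound method, its frame locals contain "self".
    caller = inspect.stack()[1]
    if instance := caller.frame.f_locals.get("self"):
        return instance.__class__.__qualname__
    return caller.function


class Worker:
    def who(self) -> str:
        return caller_id()


name = Worker().who()
```

Calling `caller_id()` from a plain function would instead return that function's name.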
"},{"location":"reference/history_management/","title":"history_management","text":""},{"location":"reference/history_management/#dp3.history_management","title":"dp3.history_management","text":"Module responsible for managing history saved in the database, currently used to clean old data.
"},{"location":"reference/history_management/history_manager/","title":"history_manager","text":""},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager","title":"dp3.history_management.history_manager","text":""},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.DatetimeEncoder","title":"DatetimeEncoder","text":" Bases: JSONEncoder
JSONEncoder to encode datetime using the standard ADiCT format string.
"},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.HistoryManager","title":"HistoryManager","text":"HistoryManager(db: EntityDatabase, platform_config: PlatformConfig, registrar: CallbackRegistrar) -> None\n
Source code in dp3/history_management/history_manager.py
def __init__(\nself, db: EntityDatabase, platform_config: PlatformConfig, registrar: CallbackRegistrar\n) -> None:\nself.log = logging.getLogger(\"HistoryManager\")\nself.db = db\nself.model_spec = platform_config.model_spec\nself.worker_index = platform_config.process_index\nself.num_workers = platform_config.num_processes\nself.config = platform_config.config.get(\"history_manager\")\n# Schedule master document aggregation\nregistrar.scheduler_register(self.aggregate_master_docs, minute=\"*/10\")\nif platform_config.process_index != 0:\nself.log.debug(\n\"History management will be disabled in this worker to avoid race conditions.\"\n)\nreturn\n# Schedule datapoints cleaning\ndatapoint_cleaning_period = self.config[\"datapoint_cleaning\"][\"tick_rate\"]\nregistrar.scheduler_register(self.delete_old_dps, minute=f\"*/{datapoint_cleaning_period}\")\nsnapshot_cleaning_cron = self.config[\"snapshot_cleaning\"][\"cron_schedule\"]\nself.keep_snapshot_delta = timedelta(days=self.config[\"snapshot_cleaning\"][\"days_to_keep\"])\nregistrar.scheduler_register(self.delete_old_snapshots, **snapshot_cleaning_cron)\n# Schedule datapoint archivation\nself.keep_raw_delta = timedelta(days=self.config[\"datapoint_archivation\"][\"days_to_keep\"])\nself.log_dir = self._ensure_log_dir(self.config[\"datapoint_archivation\"][\"archive_dir\"])\nregistrar.scheduler_register(self.archive_old_dps, minute=0, hour=2) # Every day at 2 AM\n
"},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.HistoryManager.delete_old_dps","title":"delete_old_dps","text":"delete_old_dps()\n
Deletes old data points from master collection.
Source code in dp3/history_management/history_manager.py
def delete_old_dps(self):\n\"\"\"Deletes old data points from master collection.\"\"\"\nself.log.debug(\"Deleting old records ...\")\nfor etype_attr, attr_conf in self.model_spec.attributes.items():\netype, attr_name = etype_attr\nmax_age = None\nif attr_conf.t == AttrType.OBSERVATIONS:\nmax_age = attr_conf.history_params.max_age\nelif attr_conf.t == AttrType.TIMESERIES:\nmax_age = attr_conf.timeseries_params.max_age\nif not max_age:\ncontinue\nt_old = datetime.utcnow() - max_age\ntry:\nself.db.delete_old_dps(etype, attr_name, t_old)\nexcept DatabaseError as e:\nself.log.error(e)\n
"},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.HistoryManager.delete_old_snapshots","title":"delete_old_snapshots","text":"delete_old_snapshots()\n
Deletes old snapshots.
Source code indp3/history_management/history_manager.py
def delete_old_snapshots(self):\n\"\"\"Deletes old snapshots.\"\"\"\nt_old = datetime.now() - self.keep_snapshot_delta\nself.log.debug(\"Deleting all snapshots before %s\", t_old)\ndeleted_total = 0\nfor etype in self.model_spec.entities:\ntry:\nresult = self.db.delete_old_snapshots(etype, t_old)\ndeleted_total += result.deleted_count\nexcept DatabaseError as e:\nself.log.exception(e)\nself.log.debug(\"Deleted %s snapshots in total.\", deleted_total)\n
"},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.HistoryManager.archive_old_dps","title":"archive_old_dps","text":"archive_old_dps()\n
Archives old data points from raw collection.
Updates already saved archive files, if present.
Source code in dp3/history_management/history_manager.py
def archive_old_dps(self):\n\"\"\"\n Archives old data points from raw collection.\n Updates already saved archive files, if present.\n \"\"\"\nt_old = datetime.utcnow() - self.keep_raw_delta\nt_old = t_old.replace(hour=0, minute=0, second=0, microsecond=0)\nself.log.debug(\"Archiving all records before %s ...\", t_old)\nmax_date, min_date, total_dps = self._get_raw_dps_summary(t_old)\nif total_dps == 0:\nself.log.debug(\"Found no datapoints to archive.\")\nreturn\nself.log.debug(\n\"Found %s datapoints to archive in the range %s - %s\", total_dps, min_date, max_date\n)\nn_days = (max_date - min_date).days + 1\nfor date, next_date in [\n(min_date + timedelta(days=n), min_date + timedelta(days=n + 1)) for n in range(n_days)\n]:\ndate_string = date.strftime(\"%Y%m%d\")\nday_datapoints = 0\ndate_logfile = self.log_dir / f\"dp-log-{date_string}.json\"\nwith open(date_logfile, \"w\", encoding=\"utf-8\") as logfile:\nfirst = True\nfor etype in self.model_spec.entities:\nresult_cursor = self.db.get_raw(etype, after=date, before=next_date)\nfor dp in result_cursor:\nif first:\nlogfile.write(\nf\"[\\n{json.dumps(self._reformat_dp(dp), cls=DatetimeEncoder)}\"\n)\nfirst = False\nelse:\nlogfile.write(\nf\",\\n{json.dumps(self._reformat_dp(dp), cls=DatetimeEncoder)}\"\n)\nday_datapoints += 1\nlogfile.write(\"\\n]\")\nself.log.debug(\n\"%s: Archived %s datapoints to %s\", date_string, day_datapoints, date_logfile\n)\ncompress_file(date_logfile)\nos.remove(date_logfile)\nself.log.debug(\"%s: Saved archive was compressed\", date_string)\nif not day_datapoints:\ncontinue\ndeleted_count = 0\nfor etype in self.model_spec.entities:\ndeleted_res = self.db.delete_old_raw_dps(etype, next_date)\ndeleted_count += deleted_res.deleted_count\nself.log.debug(\"%s: Deleted %s datapoints\", date_string, deleted_count)\n
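The archiver above streams datapoints into a JSON array by hand: "[" is written with the first element and "," separators afterwards, so the whole day never has to fit in memory. A minimal sketch of that pattern against an in-memory buffer (illustrative, not the DP³ implementation; like the archiver, it assumes at least one item, since the real code returns early when there is nothing to archive):

```python
import io
import json


def write_json_array(items, out) -> int:
    # Stream items as a JSON array: "[" with the first element,
    # "," before every following one, "]" at the end.
    first = True
    count = 0
    for item in items:
        out.write(f"[\n{json.dumps(item)}" if first else f",\n{json.dumps(item)}")
        first = False
        count += 1
    out.write("\n]")
    return count


buf = io.StringIO()
n = write_json_array([{"v": 1}, {"v": 2}], buf)
parsed = json.loads(buf.getvalue())
```

The buffer round-trips through json.loads, confirming the hand-built array is valid JSON.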
"},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.aggregate_dp_history_on_equal","title":"aggregate_dp_history_on_equal","text":"aggregate_dp_history_on_equal(history: list[dict], spec: ObservationsHistoryParams)\n
Merge datapoints in the history with equal values and overlapping time validity.
Averages the confidence.
Source code in dp3/history_management/history_manager.py
def aggregate_dp_history_on_equal(history: list[dict], spec: ObservationsHistoryParams):\n\"\"\"\n Merge datapoints in the history with equal values and overlapping time validity.\n Averages the confidence.\n \"\"\"\nhistory = sorted(history, key=lambda x: x[\"t1\"])\naggregated_history = []\ncurrent_dp = None\nmerged_cnt = 0\npre = spec.pre_validity\npost = spec.post_validity\nfor dp in history:\nif not current_dp:\ncurrent_dp = dp\nmerged_cnt += 1\ncontinue\nif current_dp[\"v\"] == dp[\"v\"] and current_dp[\"t2\"] + post >= dp[\"t1\"] - pre:\ncurrent_dp[\"t2\"] = max(dp[\"t2\"], current_dp[\"t2\"])\ncurrent_dp[\"c\"] += dp[\"c\"]\nmerged_cnt += 1\nelse:\naggregated_history.append(current_dp)\ncurrent_dp[\"c\"] /= merged_cnt\nmerged_cnt = 1\ncurrent_dp = dp\nif current_dp:\ncurrent_dp[\"c\"] /= merged_cnt\naggregated_history.append(current_dp)\nreturn aggregated_history\n
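The merging rule above can be checked standalone: points with equal values whose validity intervals, padded by pre/post validity, touch are folded together, and the confidence of the merged point is the average of its members. A simplified re-implementation with plain timedeltas (hypothetical names, mirroring the logic rather than importing dp3):

```python
from datetime import datetime, timedelta


def merge_equal(history: list[dict], pre: timedelta, post: timedelta) -> list[dict]:
    # Sort by start time, then fold neighbouring points with the same value
    # whose padded intervals overlap; confidence is averaged over merged points.
    history = sorted(history, key=lambda x: x["t1"])
    out: list[dict] = []
    cur = None
    cnt = 0
    for dp in history:
        if cur and cur["v"] == dp["v"] and cur["t2"] + post >= dp["t1"] - pre:
            cur["t2"] = max(cur["t2"], dp["t2"])
            cur["c"] += dp["c"]
            cnt += 1
        else:
            if cur:
                cur["c"] /= cnt
                out.append(cur)
            cur, cnt = dict(dp), 1
    if cur:
        cur["c"] /= cnt
        out.append(cur)
    return out


t = datetime(2023, 1, 1)
h = [
    {"t1": t, "t2": t + timedelta(hours=1), "v": "a", "c": 1.0},
    {"t1": t + timedelta(hours=1), "t2": t + timedelta(hours=2), "v": "a", "c": 0.5},
    {"t1": t + timedelta(hours=5), "t2": t + timedelta(hours=6), "v": "a", "c": 1.0},
]
merged = merge_equal(h, pre=timedelta(0), post=timedelta(0))
```

The first two points touch and share the value "a", so they merge into one point with averaged confidence; the third is too far away and stays separate.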
"},{"location":"reference/history_management/telemetry/","title":"telemetry","text":""},{"location":"reference/history_management/telemetry/#dp3.history_management.telemetry","title":"dp3.history_management.telemetry","text":""},{"location":"reference/snapshots/","title":"snapshots","text":""},{"location":"reference/snapshots/#dp3.snapshots","title":"dp3.snapshots","text":"SnapShooter, a module responsible for snapshot creation and running configured data correlation and fusion hooks, and Snapshot Hooks, which manage the registered hooks and their dependencies on one another.
"},{"location":"reference/snapshots/snapshooter/","title":"snapshooter","text":""},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter","title":"dp3.snapshots.snapshooter","text":"Module managing creation of snapshots, enabling data correlation and saving snapshots to DB.
Snapshots are created periodically (user configurable period)
When a snapshot is created, several things need to happen:
observations or plain datapoints, which will be saved to db and forwarded in processing
SnapShooter(db: EntityDatabase, task_queue_writer: TaskQueueWriter, task_executor: TaskExecutor, platform_config: PlatformConfig, scheduler: Scheduler) -> None\n
Class responsible for creating entity snapshots.
Source code in dp3/snapshots/snapshooter.py
def __init__(\nself,\ndb: EntityDatabase,\ntask_queue_writer: TaskQueueWriter,\ntask_executor: TaskExecutor,\nplatform_config: PlatformConfig,\nscheduler: Scheduler,\n) -> None:\nself.log = logging.getLogger(\"SnapShooter\")\nself.db = db\nself.task_queue_writer = task_queue_writer\nself.model_spec = platform_config.model_spec\nself.entity_relation_attrs = defaultdict(dict)\nfor (entity, attr), _ in self.model_spec.relations.items():\nself.entity_relation_attrs[entity][attr] = True\nfor entity in self.model_spec.entities:\nself.entity_relation_attrs[entity][\"_id\"] = True\nself.worker_index = platform_config.process_index\nself.worker_cnt = platform_config.num_processes\nself.config = SnapShooterConfig.parse_obj(platform_config.config.get(\"snapshots\"))\nself._timeseries_hooks = SnapshotTimeseriesHookContainer(self.log, self.model_spec)\nself._correlation_hooks = SnapshotCorrelationHookContainer(self.log, self.model_spec)\nqueue = f\"{platform_config.app_name}-worker-{platform_config.process_index}-snapshots\"\nself.snapshot_queue_reader = TaskQueueReader(\ncallback=self.process_snapshot_task,\nparse_task=Snapshot.parse_raw,\napp_name=platform_config.app_name,\nworker_index=platform_config.process_index,\nrabbit_config=platform_config.config.get(\"processing_core.msg_broker\", {}),\nqueue=queue,\npriority_queue=queue,\nparent_logger=self.log,\n)\nself.snapshot_entities = [\nentity for entity, spec in self.model_spec.entities.items() if spec.snapshot\n]\nself.log.info(\"Snapshots will be created for entities: %s\", self.snapshot_entities)\n# Register snapshot cache\nfor (entity, attr), spec in self.model_spec.relations.items():\nif spec.t == AttrType.PLAIN:\ntask_executor.register_attr_hook(\n\"on_new_plain\", self.add_to_link_cache, entity, attr\n)\nelif spec.t == AttrType.OBSERVATIONS:\ntask_executor.register_attr_hook(\n\"on_new_observation\", self.add_to_link_cache, entity, attr\n)\nif platform_config.process_index != 0:\nself.log.debug(\n\"Snapshot task 
creation will be disabled in this worker to avoid race conditions.\"\n)\nself.snapshot_queue_writer = None\nreturn\nself.snapshot_queue_writer = TaskQueueWriter(\nplatform_config.app_name,\nplatform_config.num_processes,\nplatform_config.config.get(\"processing_core.msg_broker\"),\nf\"{platform_config.app_name}-main-snapshot-exchange\",\nparent_logger=self.log,\n)\n# Schedule snapshot period\nsnapshot_period = self.config.creation_rate\nscheduler.register(self.make_snapshots, minute=f\"*/{snapshot_period}\")\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.start","title":"start","text":"start()\n
Connect to RabbitMQ and start consuming from TaskQueue.
Source code in dp3/snapshots/snapshooter.py
def start(self):\n\"\"\"Connect to RabbitMQ and start consuming from TaskQueue.\"\"\"\nself.log.info(\"Connecting to RabbitMQ\")\nself.snapshot_queue_reader.connect()\nself.snapshot_queue_reader.check() # check presence of needed queues\nif self.snapshot_queue_writer is not None:\nself.snapshot_queue_writer.connect()\nself.snapshot_queue_writer.check() # check presence of needed exchanges\nself.snapshot_queue_reader.start()\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.stop","title":"stop","text":"stop()\n
Stop consuming from TaskQueue, disconnect from RabbitMQ.
Source code in dp3/snapshots/snapshooter.py
def stop(self):\n\"\"\"Stop consuming from TaskQueue, disconnect from RabbitMQ.\"\"\"\nself.snapshot_queue_reader.stop()\nif self.snapshot_queue_writer is not None:\nself.snapshot_queue_writer.disconnect()\nself.snapshot_queue_reader.disconnect()\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.register_timeseries_hook","title":"register_timeseries_hook","text":"register_timeseries_hook(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)\n
Registers passed timeseries hook to be called during snapshot creation.
Binds hook to specified entity_type
and attr_type
(though same hook can be bound multiple times).
Parameters:
Name Type Description Defaulthook
Callable[[str, str, list[dict]], list[DataPointTask]]
hook
callable should expect entity_type, attr_type and attribute history as arguments and return a list of DataPointTask
objects.
entity_type
str
specifies entity type
requiredattr_type
str
specifies attribute type
requiredRaises:
Type DescriptionValueError
If entity_type and attr_type do not specify a valid timeseries attribute, a ValueError is raised.
Source code indp3/snapshots/snapshooter.py
def register_timeseries_hook(\nself,\nhook: Callable[[str, str, list[dict]], list[DataPointTask]],\nentity_type: str,\nattr_type: str,\n):\n\"\"\"\n Registers passed timeseries hook to be called during snapshot creation.\n Binds hook to specified `entity_type` and `attr_type` (though same hook can be bound\n multiple times).\n Args:\n hook: `hook` callable should expect entity_type, attr_type and attribute\n history as arguments and return a list of `DataPointTask` objects.\n entity_type: specifies entity type\n attr_type: specifies attribute type\n Raises:\n ValueError: If entity_type and attr_type do not specify a valid timeseries attribute,\n a ValueError is raised.\n \"\"\"\nself._timeseries_hooks.register(hook, entity_type, attr_type)\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.register_correlation_hook","title":"register_correlation_hook","text":"register_correlation_hook(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]])\n
Registers passed hook to be called during snapshot creation.
Binds hook to specified entity_type (though same hook can be bound multiple times).
entity_type and attribute specifications are validated, ValueError is raised on failure.
Parameters:
hook (Callable[[str, dict], None], required): callable should expect entity type as str and its current values, including linked entities, as dict
entity_type (str, required): specifies entity type
depends_on (list[list[str]], required): each item should specify an attribute that is depended on, in the form of a path from the specified entity_type to individual attributes (even on linked entities)
may_change (list[list[str]], required): each item should specify an attribute that hook may change; specification format is identical to depends_on
Raises:
ValueError: On failure of specification validation.
Source code in dp3/snapshots/snapshooter.py
def register_correlation_hook(\nself,\nhook: Callable[[str, dict], None],\nentity_type: str,\ndepends_on: list[list[str]],\nmay_change: list[list[str]],\n):\n\"\"\"\n Registers passed hook to be called during snapshot creation.\n Binds hook to specified entity_type (though same hook can be bound multiple times).\n `entity_type` and attribute specifications are validated, `ValueError` is raised on failure.\n Args:\n hook: `hook` callable should expect entity type as str\n and its current values, including linked entities, as dict\n entity_type: specifies entity type\n depends_on: each item should specify an attribute that is depended on\n in the form of a path from the specified entity_type to individual attributes\n (even on linked entities).\n may_change: each item should specify an attribute that `hook` may change.\n specification format is identical to `depends_on`.\n Raises:\n ValueError: On failure of specification validation.\n \"\"\"\nself._correlation_hooks.register(hook, entity_type, depends_on, may_change)\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.add_to_link_cache","title":"add_to_link_cache","text":"add_to_link_cache(eid: str, dp: DataPointBase)\n
Adds the given entity, eid pair to the cache of all linked entities.
Source code in dp3/snapshots/snapshooter.py
def add_to_link_cache(self, eid: str, dp: DataPointBase):\n\"\"\"Adds the given entity,eid pair to the cache of all linked entities.\"\"\"\ncache = self.db.get_module_cache()\netype_to = self.model_spec.relations[dp.etype, dp.attr].relation_to\nto_insert = [\n{\n\"_id\": f\"{dp.etype}#{eid}\",\n\"etype\": dp.etype,\n\"eid\": eid,\n\"expire_at\": datetime.now() + timedelta(days=2),\n},\n{\n\"_id\": f\"{etype_to}#{dp.v.eid}\",\n\"etype\": etype_to,\n\"eid\": dp.v.eid,\n\"expire_at\": datetime.now() + timedelta(days=2),\n},\n]\nres = cache.bulk_write([ReplaceOne({\"_id\": x[\"_id\"]}, x, upsert=True) for x in to_insert])\nself.log.debug(\"Cached %s linked entities: %s\", len(to_insert), res.bulk_api_result)\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.make_snapshots","title":"make_snapshots","text":"make_snapshots()\n
Creates snapshots for all entities currently active in database.
Source code in dp3/snapshots/snapshooter.py
def make_snapshots(self):\n\"\"\"Creates snapshots for all entities currently active in database.\"\"\"\ntime = datetime.now()\n# distribute list of possibly linked entities to all workers\ncached = self.get_cached_link_entity_ids()\nself.log.debug(\"Broadcasting %s cached linked entities\", len(cached))\nself.snapshot_queue_writer.broadcast_task(\ntask=Snapshot(entities=cached, time=time, type=SnapshotMessageType.linked_entities)\n)\n# Load links only for a reduced set of entities\nself.log.debug(\"Loading linked entities.\")\nself.db.save_metadata(time, {\"task_creation_start\": time, \"entities\": 0, \"components\": 0})\ntimes = {}\ncounts = {\"entities\": 0, \"components\": 0}\ntry:\nlinked_entities = self.get_linked_entities(time, cached)\ntimes[\"components_loaded\"] = datetime.now()\nfor linked_entities_component in linked_entities:\ncounts[\"entities\"] += len(linked_entities_component)\ncounts[\"components\"] += 1\nself.snapshot_queue_writer.put_task(\ntask=Snapshot(\nentities=linked_entities_component, time=time, type=SnapshotMessageType.task\n)\n)\nexcept pymongo.errors.CursorNotFound as err:\nself.log.exception(err)\nfinally:\ntimes[\"task_creation_end\"] = datetime.now()\nself.db.update_metadata(\ntime,\nmetadata=times,\nincrease=counts,\n)\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.get_linked_entities","title":"get_linked_entities","text":"get_linked_entities(time: datetime, cached_linked_entities: list[tuple[str, str]])\n
Get weakly connected components from entity graph.
Source code in dp3/snapshots/snapshooter.py
def get_linked_entities(self, time: datetime, cached_linked_entities: list[tuple[str, str]]):\n\"\"\"Get weakly connected components from entity graph.\"\"\"\nvisited_entities = set()\nentity_to_component = {}\nlinked_components = []\nfor etype, eid in cached_linked_entities:\nmaster_record = self.db.get_master_record(\netype, eid, projection=self.entity_relation_attrs[etype]\n) or {\"_id\": eid}\nif (etype, master_record[\"_id\"]) not in visited_entities:\n# Get entities linked by current entity\ncurrent_values = self.get_values_at_time(etype, master_record, time)\nlinked_entities = self.load_linked_entity_ids(etype, current_values, time)\n# Set linked as visited\nvisited_entities.update(linked_entities)\n# Update component\nhave_component = linked_entities & set(entity_to_component.keys())\nif have_component:\nfor entity in have_component:\ncomponent = entity_to_component[entity]\ncomponent.update(linked_entities)\nentity_to_component.update(\n{entity: component for entity in linked_entities}\n)\nbreak\nelse:\nentity_to_component.update(\n{entity: linked_entities for entity in linked_entities}\n)\nlinked_components.append(linked_entities)\nreturn linked_components\n
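Grouping linked entities into weakly connected components, as above, is a standard graph traversal. A generic sketch using a simplified BFS over an adjacency map (illustrative names; the DP³ method additionally loads neighbours lazily from master records):

```python
from collections import deque


def components(nodes, edges):
    # edges: dict node -> iterable of neighbour nodes (undirected closure).
    # Returns one set per weakly connected component.
    seen = set()
    out = []
    for n in nodes:
        if n in seen:
            continue
        comp, queue = set(), deque([n])
        while queue:
            cur = queue.popleft()
            if cur in comp:
                continue
            comp.add(cur)
            queue.extend(edges.get(cur, ()))
        seen |= comp
        out.append(comp)
    return out


edges = {"a": ["b"], "b": ["a"], "c": []}
comps = components(["a", "b", "c"], edges)
```

Entities "a" and "b" link to each other and land in one component; "c" forms its own.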
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.process_snapshot_task","title":"process_snapshot_task","text":"process_snapshot_task(msg_id, task: Snapshot)\n
Acknowledges the received message and makes a snapshot according to the task.
This function should not be called directly, but set as callback for TaskQueueReader.
Source code in dp3/snapshots/snapshooter.py
def process_snapshot_task(self, msg_id, task: Snapshot):\n\"\"\"\n Acknowledges the received message and makes a snapshot according to the `task`.\n This function should not be called directly, but set as callback for TaskQueueReader.\n \"\"\"\nself.snapshot_queue_reader.ack(msg_id)\nif task.type == SnapshotMessageType.task:\nself.make_snapshot(task)\nelif task.type == SnapshotMessageType.linked_entities:\nself.make_snapshots_by_hash(task)\nelse:\nraise ValueError(\"Unknown SnapshotMessageType.\")\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.make_snapshots_by_hash","title":"make_snapshots_by_hash","text":"make_snapshots_by_hash(task: Snapshot)\n
Make snapshots for all entities with routing key belonging to this worker.
Source code in dp3/snapshots/snapshooter.py
def make_snapshots_by_hash(self, task: Snapshot):\n\"\"\"\n Make snapshots for all entities with routing key belonging to this worker.\n \"\"\"\nself.log.debug(\"Creating snapshots for worker portion by hash.\")\nhave_links = set(task.entities)\nentity_cnt = 0\nfor etype in self.snapshot_entities:\nrecords_cursor = self.db.get_worker_master_records(\nself.worker_index,\nself.worker_cnt,\netype,\nno_cursor_timeout=True,\n)\ntry:\nsnapshots = []\nfor master_record in records_cursor:\nif (etype, master_record[\"_id\"]) in have_links:\ncontinue\nentity_cnt += 1\nsnapshots.append(self.make_linkless_snapshot(etype, master_record, task.time))\nif len(snapshots) >= DB_SEND_CHUNK:\nself.db.save_snapshots(etype, snapshots, task.time)\nsnapshots.clear()\nif snapshots:\nself.db.save_snapshots(etype, snapshots, task.time)\nsnapshots.clear()\nfinally:\nrecords_cursor.close()\nself.db.update_metadata(\ntask.time,\nmetadata={},\nincrease={\"entities\": entity_cnt, \"components\": entity_cnt},\n)\nself.log.debug(\"Worker snapshot creation done.\")\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.make_linkless_snapshot","title":"make_linkless_snapshot","text":"make_linkless_snapshot(entity_type: str, master_record: dict, time: datetime)\n
Make a snapshot for given entity master_record and time.
Runs timeseries and correlation hooks. The resulting snapshot is saved into DB.
Source code in dp3/snapshots/snapshooter.py
def make_linkless_snapshot(self, entity_type: str, master_record: dict, time: datetime):\n\"\"\"\n Make a snapshot for given entity `master_record` and `time`.\n Runs timeseries and correlation hooks.\n The resulting snapshot is saved into DB.\n \"\"\"\nself.run_timeseries_processing(entity_type, master_record)\nvalues = self.get_values_at_time(entity_type, master_record, time)\nentity_values = {(entity_type, master_record[\"_id\"]): values}\nself._correlation_hooks.run(entity_values)\nassert len(entity_values) == 1, \"Expected a single entity.\"\nfor record in entity_values.values():\nreturn record\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.make_snapshot","title":"make_snapshot","text":"make_snapshot(task: Snapshot)\n
Make a snapshot for entities and time specified by task.
Runs timeseries and correlation hooks. The resulting snapshots are saved into DB.
Source code in dp3/snapshots/snapshooter.py
def make_snapshot(self, task: Snapshot):\n\"\"\"\n Make a snapshot for entities and time specified by `task`.\n Runs timeseries and correlation hooks.\n The resulting snapshots are saved into DB.\n \"\"\"\nentity_values = {}\nfor entity_type, entity_id in task.entities:\nrecord = self.db.get_master_record(entity_type, entity_id) or {\"_id\": entity_id}\nself.run_timeseries_processing(entity_type, record)\nvalues = self.get_values_at_time(entity_type, record, task.time)\nentity_values[entity_type, entity_id] = values\nself.link_loaded_entities(entity_values)\nself._correlation_hooks.run(entity_values)\n# unlink entities again\nfor rtype_rid, record in entity_values.items():\nrtype, rid = rtype_rid\nfor attr, value in record.items():\nif (rtype, attr) not in self.model_spec.relations:\ncontinue\nif self.model_spec.relations[rtype, attr].multi_value:\nrecord[attr] = [\n{k: v for k, v in link_dict.items() if k != \"record\"} for link_dict in value\n]\nelse:\nrecord[attr] = {k: v for k, v in value.items() if k != \"record\"}\nfor rtype_rid, record in entity_values.items():\nself.db.save_snapshot(rtype_rid[0], record, task.time)\n
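After the correlation hooks run, linked records are detached again by dropping the embedded "record" key from each link value, as in the unlink loop above. A standalone sketch of that step (illustrative data):

```python
def unlink(value, multi_value: bool):
    # Drop the embedded "record" from a link value, keeping eid and any
    # other link metadata; multi-value links are lists of such dicts.
    if multi_value:
        return [{k: v for k, v in link.items() if k != "record"} for link in value]
    return {k: v for k, v in value.items() if k != "record"}


single = {"eid": "b1", "record": {"_id": "b1", "attr": 1}}
multi = [{"eid": "b1", "record": {}}, {"eid": "b2", "record": {}}]
single_out = unlink(single, False)
multi_out = unlink(multi, True)
```

Only the heavy embedded records are removed; the link targets (eids) survive into the saved snapshot.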
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.run_timeseries_processing","title":"run_timeseries_processing","text":"run_timeseries_processing(entity_type, master_record)\n
observations or plain datapoints, which will be saved to db and forwarded in processing
Source code in dp3/snapshots/snapshooter.py
def run_timeseries_processing(self, entity_type, master_record):\n\"\"\"\n - all registered timeseries processing modules must be called\n - this should result in `observations` or `plain` datapoints, which will be saved to db\n and forwarded in processing\n \"\"\"\ntasks = []\nfor attr, attr_spec in self.model_spec.entity_attributes[entity_type].items():\nif attr_spec.t == AttrType.TIMESERIES and attr in master_record:\nnew_tasks = self._timeseries_hooks.run(entity_type, attr, master_record[attr])\ntasks.extend(new_tasks)\nself.extend_master_record(entity_type, master_record, tasks)\nfor task in tasks:\nself.task_queue_writer.put_task(task)\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.extend_master_record","title":"extend_master_record staticmethod
","text":"extend_master_record(etype, master_record, new_tasks: list[DataPointTask])\n
Update existing master record with datapoints from new tasks
Source code in dp3/snapshots/snapshooter.py
@staticmethod\ndef extend_master_record(etype, master_record, new_tasks: list[DataPointTask]):\n\"\"\"Update existing master record with datapoints from new tasks\"\"\"\nfor task in new_tasks:\nfor datapoint in task.data_points:\nif datapoint.etype != etype:\ncontinue\ndp_dict = datapoint.dict(include={\"v\", \"t1\", \"t2\", \"c\"})\nif datapoint.attr in master_record:\nmaster_record[datapoint.attr].append(dp_dict)\nelse:\nmaster_record[datapoint.attr] = [dp_dict]\n
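The append-or-create update above can be sketched standalone (illustrative names; datapoints reduced to plain dicts):

```python
def extend(master: dict, attr: str, dp: dict) -> None:
    # Append the datapoint to the attribute's history list,
    # creating the list on first use.
    if attr in master:
        master[attr].append(dp)
    else:
        master[attr] = [dp]


rec = {}
extend(rec, "bytes", {"v": 1})
extend(rec, "bytes", {"v": 2})
```

The first call creates the history list, the second appends to it.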
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.load_linked_entity_ids","title":"load_linked_entity_ids","text":"load_linked_entity_ids(entity_type: str, current_values: dict, time: datetime)\n
Loads the subgraph of entities linked to the current entity, returns a list of their types and ids.
Source code in dp3/snapshots/snapshooter.py
def load_linked_entity_ids(self, entity_type: str, current_values: dict, time: datetime):\n\"\"\"\n Loads the subgraph of entities linked to the current entity,\n returns a list of their types and ids.\n \"\"\"\nloaded_entity_ids = {(entity_type, current_values[\"eid\"])}\nlinked_entity_ids_to_process = (\nself.get_linked_entity_ids(entity_type, current_values) - loaded_entity_ids\n)\nwhile linked_entity_ids_to_process:\nentity_identifiers = linked_entity_ids_to_process.pop()\nlinked_etype, linked_eid = entity_identifiers\nrelevant_attributes = self.entity_relation_attrs[linked_etype]\nrecord = self.db.get_master_record(\nlinked_etype, linked_eid, projection=relevant_attributes\n) or {\"_id\": linked_eid}\nlinked_values = self.get_values_at_time(linked_etype, record, time)\nlinked_entity_ids_to_process.update(\nself.get_linked_entity_ids(entity_type, linked_values) - set(loaded_entity_ids)\n)\nloaded_entity_ids.add((linked_etype, linked_eid))\nreturn loaded_entity_ids\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.get_linked_entity_ids","title":"get_linked_entity_ids","text":"get_linked_entity_ids(entity_type: str, current_values: dict) -> set[tuple[str, str]]\n
Returns a set of tuples (entity_type, entity_id) identifying entities linked by current_values.
Source code in dp3/snapshots/snapshooter.py
def get_linked_entity_ids(self, entity_type: str, current_values: dict) -> set[tuple[str, str]]:\n\"\"\"\n Returns a set of tuples (entity_type, entity_id) identifying entities linked by\n `current_values`.\n \"\"\"\nrelated_entity_ids = set()\nfor attr, val in current_values.items():\nif (entity_type, attr) not in self.model_spec.relations:\ncontinue\nattr_spec = self.model_spec.relations[entity_type, attr]\nif attr_spec.multi_value:\nrelated_entity_ids.update((attr_spec.relation_to, v[\"eid\"]) for v in val)\nelse:\nrelated_entity_ids.add((attr_spec.relation_to, val[\"eid\"]))\nreturn related_entity_ids\n
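The link collection above can be mirrored with a plain relation map. A sketch, where relations maps (entity_type, attr) to the target type and a multi-value flag (hypothetical data model, not the DP³ ModelSpec):

```python
def linked_ids(entity_type, values, relations):
    # relations: {(etype, attr): (relation_to, multi_value)}
    # Collect (target_type, eid) for every link attribute in values.
    out = set()
    for attr, val in values.items():
        if (entity_type, attr) not in relations:
            continue
        to, multi = relations[entity_type, attr]
        if multi:
            out.update((to, v["eid"]) for v in val)
        else:
            out.add((to, val["eid"]))
    return out


relations = {("ip", "asn"): ("asn", False), ("ip", "domains"): ("domain", True)}
vals = {
    "asn": {"eid": "64512"},
    "domains": [{"eid": "a.example"}, {"eid": "b.example"}],
    "other": 1,
}
found = linked_ids("ip", vals, relations)
```

Non-relation attributes ("other") are skipped; single- and multi-value links both contribute (type, eid) tuples.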
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.get_value_at_time","title":"get_value_at_time","text":"get_value_at_time(attr_spec: AttrSpecObservations, attr_history: AttrSpecObservations, time: datetime) -> tuple[Any, float]\n
Get current value of an attribute from its history. Assumes multi_value = False.
Source code in dp3/snapshots/snapshooter.py
def get_value_at_time(\nself, attr_spec: AttrSpecObservations, attr_history, time: datetime\n) -> tuple[Any, float]:\n\"\"\"Get current value of an attribute from its history. Assumes `multi_value = False`.\"\"\"\nreturn max(\n(\n(point[\"v\"], self.extrapolate_confidence(point, time, attr_spec.history_params))\nfor point in attr_history\n),\nkey=lambda val_conf: val_conf[1],\ndefault=(None, 0.0),\n)\n
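Picking the single best current value, as above, is a max over extrapolated confidences with a (None, 0.0) fallback for empty history. A sketch using precomputed (value, confidence) pairs (illustrative):

```python
def best_value(points: list[tuple[object, float]]) -> tuple[object, float]:
    # points: (value, confidence) pairs; empty history yields (None, 0.0).
    return max(points, key=lambda vc: vc[1], default=(None, 0.0))


picked = best_value([("a", 0.3), ("b", 0.9)])
empty = best_value([])
```

The default argument of max is what makes an entity with no usable history degrade gracefully to a null value.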
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.get_multi_value_at_time","title":"get_multi_value_at_time","text":"get_multi_value_at_time(attr_spec: AttrSpecObservations, attr_history: AttrSpecObservations, time: datetime) -> tuple[list, list[float]]\n
Get current value of a multi_value attribute from its history.
Source code indp3/snapshots/snapshooter.py
def get_multi_value_at_time(\nself, attr_spec: AttrSpecObservations, attr_history, time: datetime\n) -> tuple[list, list[float]]:\n\"\"\"Get current value of a multi_value attribute from its history.\"\"\"\nif attr_spec.data_type.hashable:\nvalues_with_confidence = defaultdict(float)\nfor point in attr_history:\nvalue = point[\"v\"]\nconfidence = self.extrapolate_confidence(point, time, attr_spec.history_params)\nif confidence > 0.0 and values_with_confidence[value] < confidence:\nvalues_with_confidence[value] = confidence\nreturn list(values_with_confidence.keys()), list(values_with_confidence.values())\nelse:\nvalues = []\nconfidence_list = []\nfor point in attr_history:\nvalue = point[\"v\"]\nconfidence = self.extrapolate_confidence(point, time, attr_spec.history_params)\nif value in values:\ni = values.index(value)\nif confidence_list[i] < confidence:\nconfidence_list[i] = confidence\nelif confidence > 0.0:\nvalues.append(value)\nconfidence_list.append(confidence)\nreturn values, confidence_list\n
"},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.extrapolate_confidence","title":"extrapolate_confidence staticmethod
","text":"extrapolate_confidence(datapoint: dict, time: datetime, history_params: ObservationsHistoryParams) -> float\n
Get the confidence value at given time.
Source code indp3/snapshots/snapshooter.py
@staticmethod\ndef extrapolate_confidence(\ndatapoint: dict, time: datetime, history_params: ObservationsHistoryParams\n) -> float:\n\"\"\"Get the confidence value at given time.\"\"\"\nt1 = datapoint[\"t1\"]\nt2 = datapoint[\"t2\"]\nbase_confidence = datapoint[\"c\"]\nif time < t1:\nif time <= t1 - history_params.pre_validity:\nreturn 0.0\nreturn base_confidence * (1 - (t1 - time) / history_params.pre_validity)\nif time <= t2:\nreturn base_confidence # completely inside the (strict) interval\nif time >= t2 + history_params.post_validity:\nreturn 0.0\nreturn base_confidence * (1 - (time - t2) / history_params.post_validity)\n
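The decay logic above can be checked with a minimal standalone sketch. The dict keys `t1`, `t2`, `c` mirror the datapoint shape used by `extrapolate_confidence`; the concrete times and validity windows are made up for illustration:

```python
from datetime import datetime, timedelta

def extrapolate_confidence(point, time, pre_validity, post_validity):
    """Linear confidence decay outside the observation interval [t1, t2]."""
    t1, t2, base = point["t1"], point["t2"], point["c"]
    if time < t1:
        if time <= t1 - pre_validity:
            return 0.0
        # confidence ramps up linearly over the pre-validity window
        return base * (1 - (t1 - time) / pre_validity)
    if time <= t2:
        return base  # completely inside the (strict) interval
    if time >= t2 + post_validity:
        return 0.0
    # confidence ramps down linearly over the post-validity window
    return base * (1 - (time - t2) / post_validity)

t1 = datetime(2023, 1, 1, 12, 0)
t2 = datetime(2023, 1, 1, 13, 0)
point = {"t1": t1, "t2": t2, "c": 1.0}
pre = post = timedelta(hours=2)

print(extrapolate_confidence(point, t2 + timedelta(hours=1), pre, post))  # 0.5
```

One hour past `t2` with a two-hour post-validity window yields half the base confidence; inside the interval the base confidence is returned unchanged.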
"},{"location":"reference/snapshots/snapshot_hooks/","title":"snapshot_hooks","text":""},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks","title":"dp3.snapshots.snapshot_hooks","text":"Module managing registered hooks and their dependencies on one another.
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer","title":"SnapshotTimeseriesHookContainer","text":"SnapshotTimeseriesHookContainer(log: logging.Logger, model_spec: ModelSpec)\n
Container for timeseries analysis hooks
Source code indp3/snapshots/snapshot_hooks.py
def __init__(self, log: logging.Logger, model_spec: ModelSpec):\nself.log = log.getChild(\"TimeseriesHooks\")\nself.model_spec = model_spec\nself._hooks = defaultdict(list)\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.register","title":"register","text":"register(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)\n
Registers passed timeseries hook to be called during snapshot creation.
Binds hook to specified entity_type and attr_type (though same hook can be bound multiple times). If entity_type and attr_type do not specify a valid timeseries attribute, a ValueError is raised.
Parameters:
Name Type Description Defaulthook
Callable[[str, str, list[dict]], list[DataPointTask]]
hook
callable should expect entity_type, attr_type and attribute history as arguments and return a list of Task
objects.
entity_type
str
specifies entity type
requiredattr_type
str
specifies attribute type
required Source code indp3/snapshots/snapshot_hooks.py
def register(\nself,\nhook: Callable[[str, str, list[dict]], list[DataPointTask]],\nentity_type: str,\nattr_type: str,\n):\n\"\"\"\n Registers passed timeseries hook to be called during snapshot creation.\n Binds hook to specified entity_type and attr_type (though same hook can be bound\n multiple times).\n If entity_type and attr_type do not specify a valid timeseries attribute,\n a ValueError is raised.\n Args:\n hook: `hook` callable should expect entity_type, attr_type and attribute\n history as arguments and return a list of `Task` objects.\n entity_type: specifies entity type\n attr_type: specifies attribute type\n \"\"\"\nif (entity_type, attr_type) not in self.model_spec.attributes:\nraise ValueError(f\"Attribute '{attr_type}' of entity '{entity_type}' does not exist.\")\nspec = self.model_spec.attributes[entity_type, attr_type]\nif spec.t != AttrType.TIMESERIES:\nraise ValueError(f\"'{entity_type}.{attr_type}' is not a timeseries, but '{spec.t}'\")\nself._hooks[entity_type, attr_type].append(hook)\nself.log.debug(f\"Added hook: '{hook.__qualname__}'\")\n
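A hypothetical hook matching the expected callable signature might look as follows. The `ip`/`flows` attribute and the threshold are invented for illustration, and a real hook would build `DataPointTask` objects rather than return an empty list:

```python
# Hypothetical timeseries hook: flags an "ip" entity whose "flows" history
# exceeds a threshold. Signature matches
# Callable[[str, str, list[dict]], list[DataPointTask]].
def flows_spike_hook(entity_type: str, attr_type: str, attr_history: list[dict]) -> list:
    total = sum(point["v"] for point in attr_history)
    if total > 1000:
        print(f"{entity_type}: flow spike detected ({total})")
    return []  # a real hook would return DataPointTask objects here

# Registration would then be (sketch, given a container instance):
# container.register(flows_spike_hook, "ip", "flows")
flows_spike_hook("ip", "flows", [{"v": 600}, {"v": 700}])
```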
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.run","title":"run","text":"run(entity_type: str, attr_type: str, attr_history: list[dict]) -> list[DataPointTask]\n
Runs registered hooks.
Source code indp3/snapshots/snapshot_hooks.py
def run(\nself, entity_type: str, attr_type: str, attr_history: list[dict]\n) -> list[DataPointTask]:\n\"\"\"Runs registered hooks.\"\"\"\ntasks = []\nfor hook in self._hooks[entity_type, attr_type]:\ntry:\nnew_tasks = hook(entity_type, attr_type, attr_history)\ntasks.extend(new_tasks)\nexcept Exception as e:\nself.log.error(f\"Error during running hook {hook}: {e}\")\nreturn tasks\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotCorrelationHookContainer","title":"SnapshotCorrelationHookContainer","text":"SnapshotCorrelationHookContainer(log: logging.Logger, model_spec: ModelSpec)\n
Container for data fusion and correlation hooks.
Source code indp3/snapshots/snapshot_hooks.py
def __init__(self, log: logging.Logger, model_spec: ModelSpec):\nself.log = log.getChild(\"CorrelationHooks\")\nself.model_spec = model_spec\nself._hooks: defaultdict[str, list[tuple[str, Callable]]] = defaultdict(list)\nself._dependency_graph = DependencyGraph(self.log)\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotCorrelationHookContainer.register","title":"register","text":"register(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]]) -> str\n
Registers passed hook to be called during snapshot creation.
Binds hook to specified entity_type (though same hook can be bound multiple times).
The entity_type and attribute specifications are validated, and a ValueError is raised on failure.
Parameters:
Name Type Description Defaulthook
Callable[[str, dict], None]
hook
callable should expect entity type as str and its current values, including linked entities, as dict
entity_type
str
specifies entity type
requireddepends_on
list[list[str]]
each item should specify an attribute that is depended on in the form of a path from the specified entity_type to individual attributes (even on linked entities).
requiredmay_change
list[list[str]]
each item should specify an attribute that hook
may change. specification format is identical to depends_on
.
Returns:
Type Descriptionstr
Generated hook id.
Source code indp3/snapshots/snapshot_hooks.py
def register(\nself,\nhook: Callable[[str, dict], None],\nentity_type: str,\ndepends_on: list[list[str]],\nmay_change: list[list[str]],\n) -> str:\n\"\"\"\n Registers passed hook to be called during snapshot creation.\n Binds hook to specified entity_type (though same hook can be bound multiple times).\n If entity_type and attribute specifications are validated\n and ValueError is raised on failure.\n Args:\n hook: `hook` callable should expect entity type as str\n and its current values, including linked entities, as dict\n entity_type: specifies entity type\n depends_on: each item should specify an attribute that is depended on\n in the form of a path from the specified entity_type to individual attributes\n (even on linked entities).\n may_change: each item should specify an attribute that `hook` may change.\n specification format is identical to `depends_on`.\n Returns:\n Generated hook id.\n \"\"\"\nif entity_type not in self.model_spec.entities:\nraise ValueError(f\"Entity '{entity_type}' does not exist.\")\nself._validate_attr_paths(entity_type, depends_on)\nself._validate_attr_paths(entity_type, may_change)\ndepends_on = self._expand_path_backlinks(entity_type, depends_on)\nmay_change = self._expand_path_backlinks(entity_type, may_change)\ndepends_on = self._embed_base_entity(entity_type, depends_on)\nmay_change = self._embed_base_entity(entity_type, may_change)\nhook_id = (\nf\"{hook.__qualname__}(\"\nf\"{entity_type}, [{','.join(depends_on)}], [{','.join(may_change)}]\"\nf\")\"\n)\nself._dependency_graph.add_hook_dependency(hook_id, depends_on, may_change)\nself._hooks[entity_type].append((hook_id, hook))\nself._restore_hook_order(self._hooks[entity_type])\nself.log.debug(f\"Added hook: '{hook_id}'\")\nreturn hook_id\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotCorrelationHookContainer.run","title":"run","text":"run(entities: dict)\n
Runs registered hooks.
Source code indp3/snapshots/snapshot_hooks.py
def run(self, entities: dict):\n\"\"\"Runs registered hooks.\"\"\"\nentity_types = {etype for etype, _ in entities}\nhook_subset = [\n(hook_id, hook, etype) for etype in entity_types for hook_id, hook in self._hooks[etype]\n]\ntopological_order = self._dependency_graph.topological_order\nhook_subset.sort(key=lambda x: topological_order.index(x[0]))\nentities_by_etype = {\netype_eid[0]: {etype_eid[1]: entity} for etype_eid, entity in entities.items()\n}\nfor hook_id, hook, etype in hook_subset:\nfor eid, entity_values in entities_by_etype[etype].items():\nself.log.debug(\"Running hook %s on entity %s\", hook_id, eid)\nhook(etype, entity_values)\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.GraphVertex","title":"GraphVertex dataclass
","text":"Vertex in a graph of dependencies
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph","title":"DependencyGraph","text":"DependencyGraph(log)\n
Class representing a graph of dependencies between correlation hooks.
Source code indp3/snapshots/snapshot_hooks.py
def __init__(self, log):\nself.log = log.getChild(\"DependencyGraph\")\n# dictionary of adjacency lists for each edge\nself._vertices = defaultdict(GraphVertex)\nself.topological_order = []\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph.add_hook_dependency","title":"add_hook_dependency","text":"add_hook_dependency(hook_id: str, depends_on: list[str], may_change: list[str])\n
Add hook to dependency graph and recalculate if any cycles are created.
Source code indp3/snapshots/snapshot_hooks.py
def add_hook_dependency(self, hook_id: str, depends_on: list[str], may_change: list[str]):\n\"\"\"Add hook to dependency graph and recalculate if any cycles are created.\"\"\"\nif hook_id in self._vertices:\nraise ValueError(f\"Hook id '{hook_id}' already present in the vertices.\")\nfor path in depends_on:\nself.add_edge(path, hook_id)\nfor path in may_change:\nself.add_edge(hook_id, path)\nself._vertices[hook_id].type = \"hook\"\ntry:\nself.topological_sort()\nexcept ValueError as err:\nraise ValueError(f\"Hook {hook_id} introduces a circular dependency.\") from err\nself.check_multiple_writes()\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph.add_edge","title":"add_edge","text":"add_edge(id_from: Hashable, id_to: Hashable)\n
Add oriented edge between specified vertices.
Source code indp3/snapshots/snapshot_hooks.py
def add_edge(self, id_from: Hashable, id_to: Hashable):\n\"\"\"Add oriented edge between specified vertices.\"\"\"\nself._vertices[id_from].adj.append(id_to)\n# Ensure vertex with 'id_to' exists to avoid iteration errors later.\n_ = self._vertices[id_to]\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph.calculate_in_degrees","title":"calculate_in_degrees","text":"calculate_in_degrees()\n
Calculate number of incoming edges for each vertex. Time complexity O(V + E).
Source code indp3/snapshots/snapshot_hooks.py
def calculate_in_degrees(self):\n\"\"\"Calculate number of incoming edges for each vertex. Time complexity O(V + E).\"\"\"\nfor vertex_node in self._vertices.values():\nvertex_node.in_degree = 0\nfor vertex_node in self._vertices.values():\nfor adjacent_name in vertex_node.adj:\nself._vertices[adjacent_name].in_degree += 1\n
"},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph.topological_sort","title":"topological_sort","text":"topological_sort()\n
Implementation of Kahn's algorithm for topological sorting. Raises ValueError if there is a cycle in the graph.
See https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm
Source code indp3/snapshots/snapshot_hooks.py
def topological_sort(self):\n\"\"\"\n Implementation of Kahn's algorithm for topological sorting.\n Raises ValueError if there is a cycle in the graph.\n See https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm\n \"\"\"\nself.calculate_in_degrees()\nqueue = [(node_id, node) for node_id, node in self._vertices.items() if node.in_degree == 0]\ntopological_order = []\nprocessed_vertices_cnt = 0\nwhile queue:\ncurr_node_id, curr_node = queue.pop(0)\ntopological_order.append(curr_node_id)\n# Decrease neighbouring nodes' in-degree by 1\nfor neighbor in curr_node.adj:\nneighbor_node = self._vertices[neighbor]\nneighbor_node.in_degree -= 1\n# If in-degree becomes zero, add it to queue\nif neighbor_node.in_degree == 0:\nqueue.append((neighbor, neighbor_node))\nprocessed_vertices_cnt += 1\nif processed_vertices_cnt != len(self._vertices):\nraise ValueError(\"Dependency graph contains a cycle.\")\nelse:\nself.topological_order = topological_order\nreturn topological_order\n
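The same algorithm can be sketched in isolation over a plain adjacency mapping (node names are illustrative — attribute paths and hook ids stand in for the real vertex ids):

```python
from collections import defaultdict

def topological_sort(edges: dict) -> list:
    """Kahn's algorithm over {node: [successors]}; raises ValueError on a cycle."""
    adj = defaultdict(list)
    in_degree = defaultdict(int)
    for src, dsts in edges.items():
        adj[src].extend(dsts)
        in_degree.setdefault(src, 0)
        for dst in dsts:
            in_degree[dst] += 1
    # start with all vertices that have no incoming edges
    queue = [node for node, deg in in_degree.items() if deg == 0]
    order = []
    while queue:
        node = queue.pop(0)
        order.append(node)
        for nxt in adj[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(in_degree):
        raise ValueError("Dependency graph contains a cycle.")
    return order

# "attr_a" feeds hook1, which may change "attr_b", which feeds hook2
print(topological_sort({"attr_a": ["hook1"], "hook1": ["attr_b"], "attr_b": ["hook2"]}))
```

Adding an edge from `hook2` back to `attr_a` would leave no zero-in-degree vertex and raise the cycle error, which is exactly how `add_hook_dependency` detects circular dependencies.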
"},{"location":"reference/task_processing/","title":"task_processing","text":""},{"location":"reference/task_processing/#dp3.task_processing","title":"dp3.task_processing","text":"Module responsible for task distribution, processing and running configured hooks. Task distribution is possible due to the task queue.
"},{"location":"reference/task_processing/task_distributor/","title":"task_distributor","text":""},{"location":"reference/task_processing/task_distributor/#dp3.task_processing.task_distributor","title":"dp3.task_processing.task_distributor","text":""},{"location":"reference/task_processing/task_distributor/#dp3.task_processing.task_distributor.TaskDistributor","title":"TaskDistributor","text":"TaskDistributor(task_executor: TaskExecutor, platform_config: PlatformConfig, registrar: CallbackRegistrar, daemon_stop_lock: threading.Lock) -> None\n
TaskDistributor uses task queues to distribute tasks between all running processes.
Tasks are assigned to worker processes based on hash of entity key, so each entity is always processed by the same worker. Therefore, all requests modifying a particular entity are done sequentially and no locking is necessary.
Tasks that are assigned to the current process are passed to task_executor
for execution.
Parameters:
Name Type Description Defaultplatform_config
PlatformConfig
Platform config
requiredtask_executor
TaskExecutor
Instance of TaskExecutor
requiredregistrar
CallbackRegistrar
Interface for callback registration
requireddaemon_stop_lock
threading.Lock
Lock used to control when the program stops. (see dp3.worker)
required Source code indp3/task_processing/task_distributor.py
def __init__(\nself,\ntask_executor: TaskExecutor,\nplatform_config: PlatformConfig,\nregistrar: CallbackRegistrar,\ndaemon_stop_lock: threading.Lock,\n) -> None:\nassert (\n0 <= platform_config.process_index < platform_config.num_processes\n), \"process index must be smaller than number of processes\"\nself.log = logging.getLogger(\"TaskDistributor\")\nself.process_index = platform_config.process_index\nself.num_processes = platform_config.num_processes\nself.model_spec = platform_config.model_spec\nself.daemon_stop_lock = daemon_stop_lock\nself.rabbit_params = platform_config.config.get(\"processing_core.msg_broker\", {})\nself.entity_types = list(\nplatform_config.config.get(\"db_entities\").keys()\n) # List of configured entity types\nself.running = False\n# List of worker threads for processing the update requests\nself._worker_threads = []\nself.num_threads = platform_config.config.get(\"processing_core.worker_threads\", 8)\n# Internal queues for each worker\nself._queues = [queue.Queue(10) for _ in range(self.num_threads)]\n# Connections to main task queue\n# Reader - reads tasks from a pair of queues (one pair per process)\n# and distributes them to worker threads\nself._task_queue_reader = TaskQueueReader(\ncallback=self._distribute_task,\nparse_task=lambda body: DataPointTask(model_spec=self.model_spec, **json.loads(body)),\napp_name=platform_config.app_name,\nworker_index=self.process_index,\nrabbit_config=self.rabbit_params,\n)\n# Writer - allows modules to write new tasks\nself._task_queue_writer = TaskQueueWriter(\nplatform_config.app_name, self.num_processes, self.rabbit_params\n)\nself.task_executor = task_executor\n# Object to store thread-local data (e.g. worker-thread index)\n# (each thread sees different object contents)\nself._current_thread_data = threading.local()\n# Number of restarts of threads by watchdog\nself._watchdog_restarts = 0\n# Register watchdog to scheduler\nregistrar.scheduler_register(self._watchdog, second=\"*/30\")\n
"},{"location":"reference/task_processing/task_distributor/#dp3.task_processing.task_distributor.TaskDistributor.start","title":"start","text":"start() -> None\n
Run the worker threads and start consuming from TaskQueue.
Source code indp3/task_processing/task_distributor.py
def start(self) -> None:\n\"\"\"Run the worker threads and start consuming from TaskQueue.\"\"\"\nself.log.info(\"Connecting to RabbitMQ\")\nself._task_queue_reader.connect()\nself._task_queue_reader.check() # check presence of needed queues\nself._task_queue_writer.connect()\nself._task_queue_writer.check() # check presence of needed exchanges\nself.log.info(f\"Starting {self.num_threads} worker threads\")\nself.running = True\nself._worker_threads = [\nthreading.Thread(\ntarget=self._worker_func, args=(i,), name=f\"Worker-{self.process_index}-{i}\"\n)\nfor i in range(self.num_threads)\n]\nfor worker in self._worker_threads:\nworker.start()\nself.log.info(\"Starting consuming tasks from main queue\")\nself._task_queue_reader.start()\n
"},{"location":"reference/task_processing/task_distributor/#dp3.task_processing.task_distributor.TaskDistributor.stop","title":"stop","text":"stop() -> None\n
Stop the worker threads.
Source code indp3/task_processing/task_distributor.py
def stop(self) -> None:\n\"\"\"Stop the worker threads.\"\"\"\nself.log.info(\"Waiting for worker threads to finish their current tasks ...\")\n# Thread for printing debug messages about worker status\nthreading.Thread(target=self._dbg_worker_status_print, daemon=True).start()\n# Stop receiving new tasks from global queue\nself._task_queue_reader.stop()\n# Signalize stop to worker threads\nself.running = False\n# Wait until all workers stopped\nfor worker in self._worker_threads:\nworker.join()\nself._task_queue_reader.disconnect()\nself._task_queue_writer.disconnect()\n# Cleanup\nself._worker_threads = []\n
"},{"location":"reference/task_processing/task_executor/","title":"task_executor","text":""},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor","title":"dp3.task_processing.task_executor","text":""},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor","title":"TaskExecutor","text":"TaskExecutor(db: EntityDatabase, platform_config: PlatformConfig) -> None\n
TaskExecutor manages updates of entity records, which are being read from task queue (via parent TaskDistributor
)
Parameters:
Name Type Description Defaultdb
EntityDatabase
Instance of EntityDatabase
requiredplatform_config
PlatformConfig
Current platform configuration.
required Source code indp3/task_processing/task_executor.py
def __init__(\nself,\ndb: EntityDatabase,\nplatform_config: PlatformConfig,\n) -> None:\n# initialize task distribution\nself.log = logging.getLogger(\"TaskExecutor\")\n# Get list of configured entity types\nself.entity_types = list(platform_config.model_spec.entities.keys())\nself.log.debug(f\"Configured entity types: {self.entity_types}\")\nself.model_spec = platform_config.model_spec\nself.db = db\n# EventCountLogger\n# - count number of events across multiple processes using shared counters in Redis\necl = EventCountLogger(\nplatform_config.config.get(\"event_logging.groups\"),\nplatform_config.config.get(\"event_logging.redis\"),\n)\nself.elog = ecl.get_group(\"te\") or DummyEventGroup()\nself.elog_by_src = ecl.get_group(\"tasks_by_src\") or DummyEventGroup()\n# Print warning if some event group is not configured\nnot_configured_groups = []\nif isinstance(self.elog, DummyEventGroup):\nnot_configured_groups.append(\"te\")\nif isinstance(self.elog_by_src, DummyEventGroup):\nnot_configured_groups.append(\"tasks_by_src\")\nif not_configured_groups:\nself.log.warning(\n\"EventCountLogger: No configuration for event group(s) \"\nf\"'{','.join(not_configured_groups)}' found, \"\n\"such events will not be logged (check event_logging.yml)\"\n)\n# Hooks\nself._task_generic_hooks = TaskGenericHooksContainer(self.log)\nself._task_entity_hooks = {}\nself._task_attr_hooks = {}\nfor entity in self.model_spec.entities:\nself._task_entity_hooks[entity] = TaskEntityHooksContainer(entity, self.log)\nfor entity, attr in self.model_spec.attributes:\nattr_type = self.model_spec.attributes[entity, attr].t\nself._task_attr_hooks[entity, attr] = TaskAttrHooksContainer(\nentity, attr, attr_type, self.log\n)\n
"},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor.register_task_hook","title":"register_task_hook","text":"register_task_hook(hook_type: str, hook: Callable)\n
Registers one of available task hooks
See: TaskGenericHooksContainer
in task_hooks.py
dp3/task_processing/task_executor.py
def register_task_hook(self, hook_type: str, hook: Callable):\n\"\"\"Registers one of available task hooks\n See: [`TaskGenericHooksContainer`][dp3.task_processing.task_hooks.TaskGenericHooksContainer]\n in `task_hooks.py`\n \"\"\"\nself._task_generic_hooks.register(hook_type, hook)\n
"},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor.register_entity_hook","title":"register_entity_hook","text":"register_entity_hook(hook_type: str, hook: Callable, entity: str)\n
Registers one of available task entity hooks
See: TaskEntityHooksContainer
in task_hooks.py
dp3/task_processing/task_executor.py
def register_entity_hook(self, hook_type: str, hook: Callable, entity: str):\n\"\"\"Registers one of available task entity hooks\n See: [`TaskEntityHooksContainer`][dp3.task_processing.task_hooks.TaskEntityHooksContainer]\n in `task_hooks.py`\n \"\"\"\nself._task_entity_hooks[entity].register(hook_type, hook)\n
"},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor.register_attr_hook","title":"register_attr_hook","text":"register_attr_hook(hook_type: str, hook: Callable, entity: str, attr: str)\n
Registers one of available task attribute hooks
See: TaskAttrHooksContainer
in task_hooks.py
dp3/task_processing/task_executor.py
def register_attr_hook(self, hook_type: str, hook: Callable, entity: str, attr: str):\n\"\"\"Registers one of available task attribute hooks\n See: [`TaskAttrHooksContainer`][dp3.task_processing.task_hooks.TaskAttrHooksContainer]\n in `task_hooks.py`\n \"\"\"\nself._task_attr_hooks[entity, attr].register(hook_type, hook)\n
"},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor.process_task","title":"process_task","text":"process_task(task: DataPointTask) -> tuple[bool, list[DataPointTask]]\n
Main processing function - push datapoint values, running all registered hooks.
Parameters:
Name Type Description Defaulttask
DataPointTask
Task object to process.
requiredReturns:
Type Descriptionbool
True if a new record was created, False otherwise,
list[DataPointTask]
and a list of new tasks created by hooks
Source code indp3/task_processing/task_executor.py
def process_task(self, task: DataPointTask) -> tuple[bool, list[DataPointTask]]:\n\"\"\"\n Main processing function - push datapoint values, running all registered hooks.\n Args:\n task: Task object to process.\n Returns:\n True if a new record was created, False otherwise,\n and a list of new tasks created by hooks\n \"\"\"\nself.log.debug(f\"Received new task {task.etype}/{task.eid}, starting processing!\")\nnew_tasks = []\n# Run on_task_start hook\nself._task_generic_hooks.run_on_start(task)\n# Check existence of etype\nif task.etype not in self.entity_types:\nself.log.error(f\"Task {task.etype}/{task.eid}: Unknown entity type!\")\nself.elog.log(\"task_processing_error\")\nreturn False, new_tasks\n# Check existence of eid\ntry:\nekey_exists = self.db.ekey_exists(task.etype, task.eid)\nexcept DatabaseError as e:\nself.log.error(f\"Task {task.etype}/{task.eid}: DB error: {e}\")\nself.elog.log(\"task_processing_error\")\nreturn False, new_tasks\nnew_entity = not ekey_exists\nif new_entity:\n# Run allow_entity_creation hook\nif not self._task_entity_hooks[task.etype].run_allow_creation(task.eid, task):\nself.log.debug(\nf\"Task {task.etype}/{task.eid}: hooks decided not to create new eid record\"\n)\nreturn False, new_tasks\n# Run on_entity_creation hook\nnew_tasks += self._task_entity_hooks[task.etype].run_on_creation(task.eid, task)\n# Insert into database\ntry:\nself.db.insert_datapoints(task.etype, task.eid, task.data_points, new_entity=new_entity)\nself.log.debug(f\"Task {task.etype}/{task.eid}: All changes written to DB\")\nexcept DatabaseError as e:\nself.log.error(f\"Task {task.etype}/{task.eid}: DB error: {e}\")\nself.elog.log(\"task_processing_error\")\nreturn False, new_tasks\n# Run attribute hooks\nfor dp in task.data_points:\nnew_tasks += self._task_attr_hooks[dp.etype, dp.attr].run_on_new(dp.eid, dp)\n# Log the processed task\nself.elog.log(\"task_processed\")\nfor dp in task.data_points:\nif dp.src:\nself.elog_by_src.log(dp.src)\nif new_entity:\nself.elog.log(\"record_created\")\nself.log.debug(f\"Secondary modules created {len(new_tasks)} new tasks.\")\nreturn new_entity, new_tasks\n
"},{"location":"reference/task_processing/task_hooks/","title":"task_hooks","text":""},{"location":"reference/task_processing/task_hooks/#dp3.task_processing.task_hooks","title":"dp3.task_processing.task_hooks","text":""},{"location":"reference/task_processing/task_hooks/#dp3.task_processing.task_hooks.TaskGenericHooksContainer","title":"TaskGenericHooksContainer","text":"TaskGenericHooksContainer(log: logging.Logger)\n
Container for generic hooks
Possible hooks:
on_task_start
: receives Task, no return value requirementsdp3/task_processing/task_hooks.py
def __init__(self, log: logging.Logger):\nself.log = log.getChild(\"genericHooks\")\nself._on_start = []\n
"},{"location":"reference/task_processing/task_hooks/#dp3.task_processing.task_hooks.TaskEntityHooksContainer","title":"TaskEntityHooksContainer","text":"TaskEntityHooksContainer(entity: str, log: logging.Logger)\n
Container for entity hooks
Possible hooks:
allow_entity_creation
: receives eid and Task, may prevent entity record creation (by returning False)on_entity_creation
: receives eid and Task, may return list of DataPointTasksdp3/task_processing/task_hooks.py
def __init__(self, entity: str, log: logging.Logger):\nself.entity = entity\nself.log = log.getChild(f\"entityHooks.{entity}\")\nself._allow_creation = []\nself._on_creation = []\n
"},{"location":"reference/task_processing/task_hooks/#dp3.task_processing.task_hooks.TaskAttrHooksContainer","title":"TaskAttrHooksContainer","text":"TaskAttrHooksContainer(entity: str, attr: str, attr_type: AttrType, log: logging.Logger)\n
Container for attribute hooks
Possible hooks:
on_new_plain
, on_new_observation
, on_new_ts_chunk
: receives eid and DataPointBase, may return a list of DataPointTasksdp3/task_processing/task_hooks.py
def __init__(self, entity: str, attr: str, attr_type: AttrType, log: logging.Logger):\nself.entity = entity\nself.attr = attr\nself.log = log.getChild(f\"attributeHooks.{entity}.{attr}\")\nif attr_type == AttrType.PLAIN:\nself.on_new_hook_type = \"on_new_plain\"\nelif attr_type == AttrType.OBSERVATIONS:\nself.on_new_hook_type = \"on_new_observation\"\nelif attr_type == AttrType.TIMESERIES:\nself.on_new_hook_type = \"on_new_ts_chunk\"\nelse:\nraise ValueError(f\"Invalid attribute type '{attr_type}'\")\nself._on_new = []\n
"},{"location":"reference/task_processing/task_queue/","title":"task_queue","text":""},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue","title":"dp3.task_processing.task_queue","text":"Functions to work with the main task queue (RabbitMQ)
There are two queues for each worker process: - \"normal\" queue for tasks added by other components, this has a limit of 100 tasks. - \"priority\" one for tasks added by workers themselves, this has no limit since workers mustn't be stopped by waiting for the queue.
These queues are presented as a single one by this wrapper. The TaskQueueReader first looks into the \"priority\" queue and only if there is no task waiting, it reads the normal one.
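The priority-first read order can be sketched with two local queues standing in for the RabbitMQ ones (the queue bound of 100 matches the description above; class and task names are illustrative):

```python
import queue

class DualQueueReader:
    """Sketch of the priority-first read order described above."""

    def __init__(self):
        self.normal = queue.Queue(maxsize=100)  # bounded: external producers may block
        self.priority = queue.Queue()           # unbounded: workers must never block

    def get_task(self):
        # Drain the priority queue first; fall back to the normal queue.
        try:
            return self.priority.get_nowait()
        except queue.Empty:
            return self.normal.get_nowait()

reader = DualQueueReader()
reader.normal.put("task-from-api")
reader.priority.put("task-from-worker")
print(reader.get_task())  # task-from-worker
```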
Tasks are distributed to worker processes (and threads) by hash of the entity which is to be modified. The destination queue is decided by the message source, so each source must know how many worker processes are there.
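A sketch of the hash-based routing idea: the same entity always maps to the same process index, so updates to one entity are serialized. The use of `sha256` here is an assumption for illustration; the actual hash used by DP3 may differ:

```python
import hashlib

def routing_index(num_processes: int, etype: str, eid: str) -> int:
    """Pick a destination worker process for an entity (illustrative hash)."""
    digest = hashlib.sha256(f"{etype}/{eid}".encode()).digest()
    # Take the first 4 bytes as an integer and reduce modulo the process count,
    # so every source agrees on the destination for a given entity.
    return int.from_bytes(digest[:4], "big") % num_processes

print(routing_index(4, "ip", "192.168.1.1"))
```

Because the mapping is deterministic, every message source that knows the worker count routes datapoints for the same entity to the same process, which is why no locking is needed.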
Exchange and queues must be declared externally!
Related configuration keys and their defaults: (should be part of global DP3 config files)
rabbitmq:\n host: localhost\n port: 5672\n virtual_host: /\n username: guest\n password: guest\n\nworker_processes: 1\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.RobustAMQPConnection","title":"RobustAMQPConnection","text":"RobustAMQPConnection(rabbit_config: dict = None) -> None\n
Common TaskQueue wrapper, handles connection to RabbitMQ server with automatic reconnection. TaskQueueWriter and TaskQueueReader are derived from this.
Parameters:
Name Type Description Defaultrabbit_config
dict
RabbitMQ connection parameters, dict with following keys (all optional): host, port, virtual_host, username, password
None
Source code in dp3/task_processing/task_queue.py
def __init__(self, rabbit_config: dict = None) -> None:\nrabbit_config = {} if rabbit_config is None else rabbit_config\nself.log = logging.getLogger(\"RobustAMQPConnection\")\nself.conn_params = {\n\"hostname\": rabbit_config.get(\"host\", \"localhost\"),\n\"port\": int(rabbit_config.get(\"port\", 5672)),\n\"virtual_host\": rabbit_config.get(\"virtual_host\", \"/\"),\n\"username\": rabbit_config.get(\"username\", \"guest\"),\n\"password\": rabbit_config.get(\"password\", \"guest\"),\n}\nself.connection = None\nself.channel = None\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.RobustAMQPConnection.connect","title":"connect","text":"connect() -> None\n
Create a connection (or reconnect after error).
If connection can't be established, try it again indefinitely.
Source code indp3/task_processing/task_queue.py
def connect(self) -> None:\n\"\"\"Create a connection (or reconnect after error).\n If connection can't be established, try it again indefinitely.\n \"\"\"\nif self.connection:\nself.connection.close()\nattempts = 0\nwhile True:\nattempts += 1\ntry:\nself.connection = amqpstorm.Connection(**self.conn_params)\nself.log.debug(\n\"AMQP connection created, server: \"\n\"'{hostname}:{port}/{virtual_host}'\".format_map(self.conn_params)\n)\nif attempts > 1:\n# This was a repeated attempt, print success message with ERROR level\nself.log.error(\"... it's OK now, we're successfully connected!\")\nself.channel = self.connection.channel()\nself.channel.confirm_deliveries()\nself.channel.basic.qos(PREFETCH_COUNT)\nbreak\nexcept amqpstorm.AMQPError as e:\nsleep_time = RECONNECT_DELAYS[min(attempts, len(RECONNECT_DELAYS)) - 1]\nself.log.error(\nf\"RabbitMQ connection error (will try to reconnect in {sleep_time}s): {e}\"\n)\ntime.sleep(sleep_time)\nexcept KeyboardInterrupt:\nbreak\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueWriter","title":"TaskQueueWriter","text":"TaskQueueWriter(app_name: str, workers: int = 1, rabbit_config: dict = None, exchange: str = None, priority_exchange: str = None, parent_logger: logging.Logger = None) -> None\n
Bases: RobustAMQPConnection
Writes tasks into main Task Queue
Parameters:
Name Type Description Default
app_name
str
DP3 application name (used as prefix for RMQ queues and exchanges)
required
workers
int
Number of worker processes in the system
1
rabbit_config
dict
RabbitMQ connection parameters, dict with following keys (all optional): host, port, virtual_host, username, password
None
exchange
str
Name of the exchange to write tasks to (default: \"<app-name>-main-task-exchange\")
None
priority_exchange
str
Name of the exchange to write priority tasks to (default: \"<app-name>-priority-task-exchange\")
None
parent_logger
logging.Logger
Logger to inherit prefix from.
None
Source code in dp3/task_processing/task_queue.py
def __init__(\nself,\napp_name: str,\nworkers: int = 1,\nrabbit_config: dict = None,\nexchange: str = None,\npriority_exchange: str = None,\nparent_logger: logging.Logger = None,\n) -> None:\nrabbit_config = {} if rabbit_config is None else rabbit_config\nassert isinstance(workers, int) and workers >= 1, \"count of workers must be positive number\"\nassert isinstance(exchange, str) or exchange is None, \"exchange argument has to be string!\"\nassert (\nisinstance(priority_exchange, str) or priority_exchange is None\n), \"priority_exchange has to be string\"\nsuper().__init__(rabbit_config)\nif parent_logger is not None:\nself.log = parent_logger.getChild(\"TaskQueueWriter\")\nelse:\nself.log = logging.getLogger(\"TaskQueueWriter\")\nif exchange is None:\nexchange = DEFAULT_EXCHANGE.format(app_name)\nif priority_exchange is None:\npriority_exchange = DEFAULT_PRIORITY_EXCHANGE.format(app_name)\nself.workers = workers\nself.exchange = exchange\nself.exchange_pri = priority_exchange\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueWriter.check","title":"check","text":"check() -> bool\n
Check that needed exchanges are declared, return True or raise RuntimeError.
If needed exchanges are not declared, reconnect and try again. (max 5 times)
Source code in dp3/task_processing/task_queue.py
def check(self) -> bool:\n\"\"\"\n Check that needed exchanges are declared, return True or raise RuntimeError.\n If needed exchanges are not declared, reconnect and try again. (max 5 times)\n \"\"\"\nfor attempt, sleep_time in enumerate(RECONNECT_DELAYS):\nif self.check_exchange_existence(self.exchange) and self.check_exchange_existence(\nself.exchange_pri\n):\nreturn True\nself.log.warning(\n\"RabbitMQ exchange configuration doesn't match (attempt %d of %d, retrying in %ds)\",\nattempt + 1,\nlen(RECONNECT_DELAYS),\nsleep_time,\n)\ntime.sleep(sleep_time)\nself.disconnect()\nself.connect()\nif not self.check_exchange_existence(self.exchange):\nraise ExchangeNotDeclared(self.exchange)\nif not self.check_exchange_existence(self.exchange_pri):\nraise ExchangeNotDeclared(self.exchange_pri)\nreturn True\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueWriter.broadcast_task","title":"broadcast_task","text":"broadcast_task(task: Task, priority: bool = False) -> None\n
Broadcast task to all workers
Parameters:
Name Type Description Default
task
Task
prepared task
required
priority
bool
if true, the task is placed into priority queue (should only be used internally by workers)
False
Source code in dp3/task_processing/task_queue.py
def broadcast_task(self, task: Task, priority: bool = False) -> None:\n\"\"\"\n Broadcast task to all workers\n Args:\n task: prepared task\n priority: if true, the task is placed into priority queue\n (should only be used internally by workers)\n \"\"\"\nif not self.channel:\nself.connect()\nself.log.debug(f\"Received new broadcast task: {task}\")\nbody = task.as_message()\nexchange = self.exchange_pri if priority else self.exchange\nfor routing_key in range(self.workers):\nself._send_message(routing_key, exchange, body)\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueWriter.put_task","title":"put_task","text":"put_task(task: Task, priority: bool = False) -> None\n
Put task (update_request) to the queue of corresponding worker
Parameters:
Name Type Description Default
task
Task
prepared task
required
priority
bool
if true, the task is placed into priority queue (should only be used internally by workers)
False
Source code in dp3/task_processing/task_queue.py
def put_task(self, task: Task, priority: bool = False) -> None:\n\"\"\"\n Put task (update_request) to the queue of corresponding worker\n Args:\n task: prepared task\n priority: if true, the task is placed into priority queue\n (should only be used internally by workers)\n \"\"\"\nif not self.channel:\nself.connect()\nself.log.debug(f\"Received new task: {task}\")\n# Prepare routing key\nbody = task.as_message()\nkey = task.routing_key()\nrouting_key = HASH(key) % self.workers # index of the worker to send the task to\nexchange = self.exchange_pri if priority else self.exchange\nself._send_message(routing_key, exchange, body)\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader","title":"TaskQueueReader","text":"TaskQueueReader(callback: Callable, parse_task: Callable[[str], Task], app_name: str, worker_index: int = 0, rabbit_config: dict = None, queue: str = None, priority_queue: str = None, parent_logger: logging.Logger = None) -> None\n
Bases: RobustAMQPConnection
TaskQueueReader consumes messages from two RabbitMQ queues (normal and priority one for given worker) and passes them to the given callback function.
Tasks from the priority queue are passed before the normal ones.
Each received message must be acknowledged by calling .ack(msg_tag).
Parameters:
Name Type Description Default
callback
Callable
Function called when a message is received, prototype: func(tag, Task)
required
parse_task
Callable[[str], Task]
Function called to parse message body into a task, prototype: func(body) -> Task
required
app_name
str
DP3 application name (used as prefix for RMQ queues and exchanges)
required
worker_index
int
index of this worker (filled into DEFAULT_QUEUE string using .format() method)
0
rabbit_config
dict
RabbitMQ connection parameters, dict with following keys (all optional): host, port, virtual_host, username, password
None
queue
str
Name of RabbitMQ queue to read from (default: \"<app-name>-worker-<index>\")
None
priority_queue
str
Name of RabbitMQ queue to read from (priority messages) (default: \"<app-name>-worker-<index>-pri\")
None
parent_logger
logging.Logger
Logger to inherit prefix from.
None
Source code in dp3/task_processing/task_queue.py
def __init__(\nself,\ncallback: Callable,\nparse_task: Callable[[str], Task],\napp_name: str,\nworker_index: int = 0,\nrabbit_config: dict = None,\nqueue: str = None,\npriority_queue: str = None,\nparent_logger: logging.Logger = None,\n) -> None:\nrabbit_config = {} if rabbit_config is None else rabbit_config\nassert callable(callback), \"callback must be callable object\"\nassert (\nisinstance(worker_index, int) and worker_index >= 0\n), \"worker_index must be positive number\"\nassert isinstance(queue, str) or queue is None, \"queue must be string\"\nassert (\nisinstance(priority_queue, str) or priority_queue is None\n), \"priority_queue must be string\"\nsuper().__init__(rabbit_config)\nif parent_logger is not None:\nself.log = parent_logger.getChild(\"TaskQueueReader\")\nelse:\nself.log = logging.getLogger(\"TaskQueueReader\")\nself.callback = callback\nself.parse_task = parse_task\nif queue is None:\nqueue = DEFAULT_QUEUE.format(app_name, worker_index)\nif priority_queue is None:\npriority_queue = DEFAULT_PRIORITY_QUEUE.format(app_name, worker_index)\nself.queue_name = queue\nself.priority_queue_name = priority_queue\nself.running = False\nself._consuming_thread = None\nself._processing_thread = None\n# Receive messages into 2 temporary queues\n# (max length should be equal to prefetch_count set in RabbitMQReader)\nself.cache = collections.deque()\nself.cache_pri = collections.deque()\nself.cache_full = threading.Event() # signalize there's something in the cache\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader.start","title":"start","text":"start() -> None\n
Start receiving tasks.
Source code in dp3/task_processing/task_queue.py
def start(self) -> None:\n\"\"\"Start receiving tasks.\"\"\"\nif self.running:\nraise RuntimeError(\"Already running\")\nif not self.connection:\nself.connect()\nself.log.info(\"Starting TaskQueueReader\")\n# Start thread for message consuming from server\nself._consuming_thread = threading.Thread(None, self._consuming_thread_func)\nself._consuming_thread.start()\n# Start thread for message processing and passing to user's callback\nself.running = True\nself._processing_thread = threading.Thread(None, self._msg_processing_thread_func)\nself._processing_thread.start()\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader.stop","title":"stop","text":"stop() -> None\n
Stop receiving tasks.
Source code in dp3/task_processing/task_que.py
def stop(self) -> None:\n\"\"\"Stop receiving tasks.\"\"\"\nif not self.running:\nraise RuntimeError(\"Not running\")\nself._stop_consuming_thread()\nself._stop_processing_thread()\nself.log.info(\"TaskQueueReader stopped\")\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader.check","title":"check","text":"check() -> bool\n
Check that needed queues are declared, return True or raise RuntimeError.
If needed queues are not declared, reconnect and try again. (max 5 times)
Source code in dp3/task_processing/task_queue.py
def check(self) -> bool:\n\"\"\"\n Check that needed queues are declared, return True or raise RuntimeError.\n If needed queues are not declared, reconnect and try again. (max 5 times)\n \"\"\"\nfor attempt, sleep_time in enumerate(RECONNECT_DELAYS):\nif self.check_queue_existence(self.queue_name) and self.check_queue_existence(\nself.priority_queue_name\n):\nreturn True\nself.log.warning(\n\"RabbitMQ queue configuration doesn't match (attempt %d of %d, retrying in %ds)\",\nattempt + 1,\nlen(RECONNECT_DELAYS),\nsleep_time,\n)\ntime.sleep(sleep_time)\nself.disconnect()\nself.connect()\nif not self.check_queue_existence(self.queue_name):\nraise QueueNotDeclared(self.queue_name)\nif not self.check_queue_existence(self.priority_queue_name):\nraise QueueNotDeclared(self.priority_queue_name)\nreturn True\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader.ack","title":"ack","text":"ack(msg_tag: Any)\n
Acknowledge processing of the message/task
Parameters:
Name Type Description Default
msg_tag
Any
Message tag received as the first param of the callback function.
required
Source code in dp3/task_processing/task_queue.py
def ack(self, msg_tag: Any):\n\"\"\"Acknowledge processing of the message/task\n Args:\n msg_tag: Message tag received as the first param of the callback function.\n \"\"\"\nself.channel.basic.ack(delivery_tag=msg_tag)\n
"},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.HASH","title":"HASH","text":"HASH(key: str) -> int\n
Hash function used to distribute tasks to worker processes.
Parameters:
Name Type Description Default
key
str
to be hashed
required
Returns:
Type Description
int
last 4 hex digits (2 bytes) of the MD5 hash
Source code in dp3/task_processing/task_queue.py
def HASH(key: str) -> int:\n\"\"\"Hash function used to distribute tasks to worker processes.\n Args:\n key: to be hashed\n Returns:\n last 4 bytes of MD5\n \"\"\"\nreturn int(hashlib.md5(key.encode(\"utf8\")).hexdigest()[-4:], 16)\n
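As a standalone illustration of how this hash drives worker selection (put_task computes `HASH(key) % workers` to pick a worker queue), the following sketch runs without RabbitMQ. Note that `hexdigest()[-4:]` yields the last four hex digits, i.e. two bytes, of the digest; the worker count and routing key below are hypothetical.

```python
import hashlib

def HASH(key: str) -> int:
    # Integer value of the last four hex digits (two bytes) of the MD5 digest
    return int(hashlib.md5(key.encode("utf8")).hexdigest()[-4:], 16)

# put_task picks the worker queue index as HASH(key) % workers:
workers = 4                 # hypothetical worker count
key = "ip/192.0.2.1"        # hypothetical routing key
print(HASH(key) % workers)  # a stable index in range(workers)

# md5("abc") = 900150983cd24fb0d6963f7d28e17f72 -> last 4 hex digits 0x7f72
print(HASH("abc"))  # 32626
```

Because the mapping depends only on the key, all datapoints for the same entity are routed to the same worker, which serializes updates per entity.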
"}]}
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
new file mode 100644
index 00000000..741ec82d
--- /dev/null
+++ b/sitemap.xml
@@ -0,0 +1,303 @@
+
+