diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..e69de29b diff --git a/404.html b/404.html new file mode 100644 index 00000000..5115c37f --- /dev/null +++ b/404.html @@ -0,0 +1,1479 @@ + + + + + + + + + + + + + + + + + + + + DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ +

404 - Not found

+ +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/api/index.html b/api/index.html new file mode 100644 index 00000000..914c37d4 --- /dev/null +++ b/api/index.html @@ -0,0 +1,2395 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + API - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

API

+

DP³ has an HTTP API which you can use to post datapoints and to read data stored in DP³. As the API is built with FastAPI, interactive documentation is also available at the /docs endpoint.

+

There are several API endpoints:

+ +
+

Index

+

Health check.

+

Request

+

GET /

+

Response

+

200 OK:

+

{ + "detail": "It works!" +}

+
+

Insert datapoints

+

Request

+

POST /datapoints

+

All data are written to DP³ in the form of datapoints. A datapoint sets the value of a given attribute of a given entity.

+

It is a JSON-encoded object with the set of keys defined in the table below. The presence of some keys depends on the primary type of the attribute (plain/observations/timeseries).

+

The payload of this endpoint is a JSON array of datapoints. For example:

+
[
+   { DATAPOINT1 },
+   { DATAPOINT2 }
+]
+
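A minimal sketch of sending such a payload from Python follows. It uses the requests library and the plain datapoint example shown further below; the base URL is an assumption and must be adjusted to wherever your API instance listens.

import requests  # assumed third-party HTTP client

API_URL = "http://localhost:5000"  # assumption: address of your DP3 API instance

datapoints = [
    {
        "type": "ip",
        "id": "192.168.0.1",
        "attr": "note",
        "v": "My home router",
        "src": "web_gui",
    },
]

# POST the JSON array of datapoints to the /datapoints endpoint.
response = requests.post(f"{API_URL}/datapoints", json=datapoints)
response.raise_for_status()  # raises on a 400 validation error
print(response.text)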
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
KeyDescriptionData-typeRequired?PlainObservationsTimeseries
typeEntity typestringmandatory✔✔✔
idEntity identificationstringmandatory✔✔✔
attrAttribute namestringmandatory✔✔✔
vThe value to set, depends on attr. type and data-type, see below--mandatory✔✔✔
t1Start time of the observation intervalstring (RFC 3339 format)mandatory--✔✔
t2End time of the observation intervalstring (RFC 3339 format)optional, default=t1--✔✔
cConfidencefloat (0.0-1.0)optional, default=1.0--✔✔
srcIdentification of the information sourcestringoptional, default=""✔✔✔
+

Further details depend on the particular type of the attribute.

+

Examples of datapoints

+
Plain
+
{
+  "type": "ip",
+  "id": "192.168.0.1",
+  "attr": "note",
+  "v": "My home router",
+  "src": "web_gui"
+}
+
+
Observations
+
{
+  "type": "ip",
+  "id": "192.168.0.1",
+  "attr": "open_ports",
+  "v": [22, 80, 443],
+  "t1": "2022-08-01T12:00:00",
+  "t2": "2022-08-01T12:10:00",
+  "src": "open_ports_module"
+}
+
+
Timeseries
+

regular:

+
{
+  ...
+  "t1": "2022-08-01T12:00:00",
+  "t2": "2022-08-01T12:20:00", // assuming time_step = 5 min
+  "v": {
+    "a": [1, 3, 0, 2]
+  }
+}
+
+

irregular: timestamps must always be present

+
{
+  ...
+  "t1": "2022-08-01T12:00:00",
+  "t2": "2022-08-01T12:05:00",
+  "v": {
+    "time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10", "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
+    "x": [0.5, 0.8, 1.2, 0.7],
+    "y": [-1, 3, 0, 0]
+  }
+}
+
+

irregular_interval:

+
{
+  ...
+  "t1": "2022-08-01T12:00:00",
+  "t2": "2022-08-01T12:05:00",
+  "v": {
+    "time_first": ["2022-08-01T12:00:00", "2022-08-01T12:01:10", "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
+    "time_last": ["2022-08-01T12:01:00", "2022-08-01T12:01:15", "2022-08-01T12:03:00", "2022-08-01T12:03:40"],
+    "x": [0.5, 0.8, 1.2, 0.7],
+    "y": [-1, 3, 0, 0]
+  }
+}
+
+
Relations
+

Relations can be represented using both plain attributes and observations; the difference is only in the time specification. Two examples using observations:

+

no data - link<mac>: just the eid is sent

+
{
+  "type": "ip",
+  "id": "192.168.0.1",
+  "attr": "mac_addrs",
+  "v": "AA:AA:AA:AA:AA",
+  "t1": "2022-08-01T12:00:00",
+  "t2": "2022-08-01T12:10:00"
+}
+
+

with additional data - link<ip, int>: The eid and the data are sent as a dictionary.

+
{
+  "type": "ip",
+  "id": "192.168.0.1",
+  "attr": "ip_dep",
+  "v": {"eid": "192.168.0.2", "data": 22},
+  "t1": "2022-08-01T12:00:00",
+  "t2": "2022-08-01T12:10:00"
+}
+
+

Response

+

200 OK:

+
Success
+
+

400 Bad request:

+

Returns some validation error message, for example:

+
1 validation error for DataPointObservations_some_field
+v -> some_embedded_dict_field
+  field required (type=value_error.missing)
+
+
+

List entities

+

Lists the latest snapshots of all eids present in the database under the given entity type.

+

Only the latest snapshot of each eid is included.

+

Uses pagination.

+

Request

+

GET /entity/<entity_type>

+

Optional query parameters:

+
    +
  • skip: how many entities to skip (default: 0)
  • +
  • limit: how many entities to return (default: 20)
  • +
+

Response

+
{
+  "time_created": "2023-07-04T12:10:38.827Z",
+  "data": [
+    {}
+  ]
+}
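A sketch of paging through all latest snapshots from Python, using the skip and limit parameters described above (the entity type ip and the API address are assumptions):

import requests  # assumed third-party HTTP client

API_URL = "http://localhost:5000"  # assumption: address of your DP3 API instance
ENTITY_TYPE = "ip"                 # assumption: an entity type from your configuration

def iter_latest_snapshots(limit=20):
    """Yield the latest snapshot of every eid, fetched page by page."""
    skip = 0
    while True:
        resp = requests.get(
            f"{API_URL}/entity/{ENTITY_TYPE}",
            params={"skip": skip, "limit": limit},
        )
        resp.raise_for_status()
        page = resp.json()["data"]
        if not page:
            return
        yield from page
        skip += limit

for snapshot in iter_latest_snapshots():
    print(snapshot)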
+
+
+

Get Eid data

+

Gets data of the given entity's eid.

+

Contains all snapshots and the master record. Snapshots are ordered by ascending creation time.

+

Request

+

GET /entity/<entity_type>/<entity_id>

+

Optional query parameters:

+
    +
  • date_from: date-time string
  • +
  • date_to: date-time string
  • +
+

Response

+
{
+  "empty": true,
+  "master_record": {},
+  "snapshots": [
+    {}
+  ]
+}
+
+
+

Get attr value

+

Get attribute value

+

The value is one of:

+
    +
  • current value: in case of plain attribute
  • +
  • current value and history: in case of observation attribute
  • +
  • history: in case of timeseries attribute
  • +
+

Request

+

GET /entity/<entity_type>/<entity_id>/get/<attr_id>

+

Optional query parameters:

+
    +
  • date_from: date-time string
  • +
  • date_to: date-time string
  • +
+

Response

+
{
+  "attr_type": 1,
+  "current_value": "string",
+  "history": []
+}
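For illustration, a sketch of fetching an attribute's value and history for a given time window from Python (the entity type, eid, attribute name and API address are assumptions reusing earlier examples):

import requests  # assumed third-party HTTP client

API_URL = "http://localhost:5000"  # assumption: address of your DP3 API instance

resp = requests.get(
    f"{API_URL}/entity/ip/192.168.0.1/get/open_ports",
    params={"date_from": "2022-08-01T00:00:00", "date_to": "2022-08-02T00:00:00"},
)
resp.raise_for_status()
body = resp.json()
print(body["current_value"])
print(len(body["history"]), "history entries")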
+
+
+

Set attr value

+

Set the current value of an attribute.

+

Internally, this just creates a datapoint for the specified attribute and value.

+

This endpoint is meant for editable plain attributes -- for direct user edits in the DP³ web UI.

+

Request

+

POST /entity/<entity_type>/<entity_id>/set/<attr_id>

+

Required request body:

+
{
+  "value": "string"
+}
+
+

Response

+
{
+  "detail": "OK"
+}
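A small sketch of setting an editable plain attribute from Python (the bus entity type and its editable label attribute come from the DB entities example in this documentation; the eid and the API address are assumptions):

import requests  # assumed third-party HTTP client

API_URL = "http://localhost:5000"  # assumption: address of your DP3 API instance

resp = requests.post(
    f"{API_URL}/entity/bus/ABC123/set/label",  # "ABC123" is a hypothetical eid
    json={"value": "Night line 52"},
)
resp.raise_for_status()
print(resp.json())  # {"detail": "OK"} on success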
+
+
+

Entities

+

List entities

+

Returns a dictionary containing all configured entities -- their simplified configuration and current state information.

+

Request

+

GET /entities

+

Response

+
{
+  "<entity_id>": {
+    "id": "<entity_id>",
+    "name": "<entity_spec.name>",
+    "attribs": "<MODEL_SPEC.attribs(e_id)>",
+    "eid_estimate_count": "<DB.estimate_count_eids(e_id)>"
+  },
+  ...
+}
+
+
+

Control

+

Execute Action - sends the given action to the execution queue.

+

You can see the enabled actions in /config/control.yml. The available actions are:

+
    +
  • make_snapshots - Makes an out-of-order snapshot of all entities
  • +
+

Request

+

GET /control/<action>

+

Response

+
{
+  "detail": "OK"
+}
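For example, a one-off snapshot can be triggered from Python like this (the API address is an assumption, and make_snapshots must be enabled in /config/control.yml):

import requests  # assumed third-party HTTP client

resp = requests.get("http://localhost:5000/control/make_snapshots")  # address is an assumption
resp.raise_for_status()
print(resp.json())  # {"detail": "OK"}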
+
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/architecture/index.html b/architecture/index.html new file mode 100644 index 00000000..b0330d8e --- /dev/null +++ b/architecture/index.html @@ -0,0 +1,1679 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Architecture - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Architecture

+

DP³ is a generic platform for data processing. It's currently used in systems for managing network devices at CESNET, but during development we focused on making DP³ as universal as possible.

+

This page describes the high-level architecture of DP³ and the individual components.

+

Data-points

+

The base unit of data that DP³ uses is called a data-point, which looks like this:

+
{
+  "type": "ip", // (1)!
+  "id": "192.168.0.1", // (2)!
+  "attr": "open_ports", // (3)!
+  "v": [22, 80, 443], // (4)!
+  "t1": "2022-08-01T12:00:00", // (5)!
+  "t2": "2022-08-01T12:10:00",
+  "src": "open_ports_module" // (6)!
+}
+
+
    +
  1. A data-point's value belongs to a specific (user-defined) entity type, declared by the type field.
  2. The exact entity is identified by its entity id in the id field.
  3. Each entity has multiple defined attributes; the attr field specifies the attribute of the data-point.
  4. Finally, the data-point's value is sent in the v field.
  5. The data-point's validity interval is defined using the t1 and t2 fields.
  6. To easily determine the data source of this data-point, you can optionally provide an identifier using the src field.
+

This is an example of an observations data-point (note that it has a validity interval); to learn more about the different types of data-points, please see the API documentation.

+

Platform Architecture

+
+

DP3 architecture +

+
DP³ architecture
+
+

The DP³ architecture, as shown in the figure above, consists of several components; the components provided by DP³ are shown in blue:

+
    +
  • The HTTP API (built with FastAPI) validates incoming data-points and sends them for processing to the task distribution queues. It also provides access to the database for the web interface and scripts.
  • +
  • The task distribution is done using RabbitMQ queues, which distribute tasks between workers.
  • +
  • The main code of the platform runs in parallel worker processes. Each worker process contains a processing core, which performs all updates and communicates with core modules and application-specific secondary modules when appropriate.
  • +
  • Both the HTTP API and worker processes use the database API to access the entity database, + currently implemented in MongoDB.
  • +
+

The application-specific components, shown in yellow-orange, are as follows:

+
    +
  • The entity configuration via yml files determines the entities and their attributes, + together with the specifics of platform behavior on these entities. + For details of entity configuration, please see the database entities configuration page.
  • +
  • +

    The distinction between primary and secondary modules is that primary modules send data-points into the system using the HTTP API, while secondary modules react to the data already present in the system, e.g. by altering the data-flow in an application-specific manner, deriving additional data based on incoming data-points, or performing data correlation on entity snapshots. For primary module implementation, the API documentation may be useful; also feel free to check out the dummy_sender script in /scripts/dummy_sender.py. Comprehensive secondary module API documentation is under construction; for the time being, refer to the CallbackRegistrar code reference or check out the test modules in /modules/ or /tests/modules/.

    +
  • +
  • +

    The final component is the web interface, which is ultimately application-specific. A generic web interface, or a set of generic components, is a planned part of DP³ but has yet to be implemented. The API provides a variety of endpoints which should enable you to create any view of the data you may require.

    +
  • +
+

Data flow

+

This section describes the data flow within the platform.

+
+

DP3 Data flow +

+
DP³ Data flow
+
+

The above figure shows a zoomed-in view of the worker process from the architecture figure. Incoming Tasks, which carry data-points from the API, are passed to secondary-module callbacks configured to run on new data points or around entity creation. These modules may create additional data-points or perform any other action. When all registered callbacks are processed, the resulting data is written to two collections: the data-point (DP) history collection, where the raw data-points are stored until archival, and the profile history collection, where a document is stored for each entity id with the relevant history. You can find these collections in the database under the names {entity}#raw and {entity}#master.

+

DP³ periodically creates new profile snapshots, triggered by the Scheduler. Snapshots take the profile history and compute the current value of the profile, reducing each attribute's history to a single value. The snapshot creation frequency is configurable. Snapshots are created on a per-entity basis, but all linked entities are processed at the same time. This means that when snapshots are created, the registered snapshot callbacks can access any linked entities for their data correlation needs. After all the correlation callbacks are called, the snapshot is written to the profile snapshot collection, from which it can be accessed via the API. The collection is accessible under the name {entity}#snapshots.
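As an illustration, a sketch of peeking at these collections directly with pymongo; the connection details follow the example on the Database configuration page and the bus entity comes from the DB entities example, both of which are assumptions about your particular deployment:

from pymongo import MongoClient  # assumed third-party dependency

# Assumption: credentials and address match your config/database.yml.
client = MongoClient("mongodb://dp3_user:dp3_password@127.0.0.1:27017/")
db = client["dp3_database"]

entity = "bus"  # assumption: an entity type from your configuration
for suffix in ("raw", "master", "snapshots"):
    collection = db[f"{entity}#{suffix}"]
    print(collection.name, collection.estimated_document_count(), "documents")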

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/assets/_mkdocstrings.css b/assets/_mkdocstrings.css new file mode 100644 index 00000000..049a254b --- /dev/null +++ b/assets/_mkdocstrings.css @@ -0,0 +1,64 @@ + +/* Avoid breaking parameter names, etc. in table cells. */ +.doc-contents td code { + word-break: normal !important; +} + +/* No line break before first paragraph of descriptions. */ +.doc-md-description, +.doc-md-description>p:first-child { + display: inline; +} + +/* Max width for docstring sections tables. */ +.doc .md-typeset__table, +.doc .md-typeset__table table { + display: table !important; + width: 100%; +} + +.doc .md-typeset__table tr { + display: table-row; +} + +/* Defaults in Spacy table style. */ +.doc-param-default { + float: right; +} + +/* Keep headings consistent. */ +h1.doc-heading, +h2.doc-heading, +h3.doc-heading, +h4.doc-heading, +h5.doc-heading, +h6.doc-heading { + font-weight: 400; + line-height: 1.5; + color: inherit; + text-transform: none; +} + +h1.doc-heading { + font-size: 1.6rem; +} + +h2.doc-heading { + font-size: 1.2rem; +} + +h3.doc-heading { + font-size: 1.15rem; +} + +h4.doc-heading { + font-size: 1.10rem; +} + +h5.doc-heading { + font-size: 1.05rem; +} + +h6.doc-heading { + font-size: 1rem; +} \ No newline at end of file diff --git a/assets/images/favicon.png b/assets/images/favicon.png new file mode 100644 index 00000000..1cf13b9f Binary files /dev/null and b/assets/images/favicon.png differ diff --git a/assets/javascripts/bundle.220ee61c.min.js b/assets/javascripts/bundle.220ee61c.min.js new file mode 100644 index 00000000..116072a1 --- /dev/null +++ b/assets/javascripts/bundle.220ee61c.min.js @@ -0,0 +1,29 @@ +"use strict";(()=>{var Ci=Object.create;var gr=Object.defineProperty;var Ri=Object.getOwnPropertyDescriptor;var ki=Object.getOwnPropertyNames,Ht=Object.getOwnPropertySymbols,Hi=Object.getPrototypeOf,yr=Object.prototype.hasOwnProperty,nn=Object.prototype.propertyIsEnumerable;var rn=(e,t,r)=>t in e?gr(e,t,{enumerable:!0,configurable:!0,writable:!0,value:r}):e[t]=r,P=(e,t)=>{for(var r in t||(t={}))yr.call(t,r)&&rn(e,r,t[r]);if(Ht)for(var r of Ht(t))nn.call(t,r)&&rn(e,r,t[r]);return e};var on=(e,t)=>{var r={};for(var n in e)yr.call(e,n)&&t.indexOf(n)<0&&(r[n]=e[n]);if(e!=null&&Ht)for(var n of Ht(e))t.indexOf(n)<0&&nn.call(e,n)&&(r[n]=e[n]);return r};var Pt=(e,t)=>()=>(t||e((t={exports:{}}).exports,t),t.exports);var Pi=(e,t,r,n)=>{if(t&&typeof t=="object"||typeof t=="function")for(let o of ki(t))!yr.call(e,o)&&o!==r&&gr(e,o,{get:()=>t[o],enumerable:!(n=Ri(t,o))||n.enumerable});return e};var yt=(e,t,r)=>(r=e!=null?Ci(Hi(e)):{},Pi(t||!e||!e.__esModule?gr(r,"default",{value:e,enumerable:!0}):r,e));var sn=Pt((xr,an)=>{(function(e,t){typeof xr=="object"&&typeof an!="undefined"?t():typeof define=="function"&&define.amd?define(t):t()})(xr,function(){"use strict";function e(r){var n=!0,o=!1,i=null,s={text:!0,search:!0,url:!0,tel:!0,email:!0,password:!0,number:!0,date:!0,month:!0,week:!0,time:!0,datetime:!0,"datetime-local":!0};function a(O){return!!(O&&O!==document&&O.nodeName!=="HTML"&&O.nodeName!=="BODY"&&"classList"in O&&"contains"in O.classList)}function f(O){var Qe=O.type,De=O.tagName;return!!(De==="INPUT"&&s[Qe]&&!O.readOnly||De==="TEXTAREA"&&!O.readOnly||O.isContentEditable)}function c(O){O.classList.contains("focus-visible")||(O.classList.add("focus-visible"),O.setAttribute("data-focus-visible-added",""))}function 
u(O){O.hasAttribute("data-focus-visible-added")&&(O.classList.remove("focus-visible"),O.removeAttribute("data-focus-visible-added"))}function p(O){O.metaKey||O.altKey||O.ctrlKey||(a(r.activeElement)&&c(r.activeElement),n=!0)}function m(O){n=!1}function d(O){a(O.target)&&(n||f(O.target))&&c(O.target)}function h(O){a(O.target)&&(O.target.classList.contains("focus-visible")||O.target.hasAttribute("data-focus-visible-added"))&&(o=!0,window.clearTimeout(i),i=window.setTimeout(function(){o=!1},100),u(O.target))}function v(O){document.visibilityState==="hidden"&&(o&&(n=!0),Y())}function Y(){document.addEventListener("mousemove",N),document.addEventListener("mousedown",N),document.addEventListener("mouseup",N),document.addEventListener("pointermove",N),document.addEventListener("pointerdown",N),document.addEventListener("pointerup",N),document.addEventListener("touchmove",N),document.addEventListener("touchstart",N),document.addEventListener("touchend",N)}function B(){document.removeEventListener("mousemove",N),document.removeEventListener("mousedown",N),document.removeEventListener("mouseup",N),document.removeEventListener("pointermove",N),document.removeEventListener("pointerdown",N),document.removeEventListener("pointerup",N),document.removeEventListener("touchmove",N),document.removeEventListener("touchstart",N),document.removeEventListener("touchend",N)}function N(O){O.target.nodeName&&O.target.nodeName.toLowerCase()==="html"||(n=!1,B())}document.addEventListener("keydown",p,!0),document.addEventListener("mousedown",m,!0),document.addEventListener("pointerdown",m,!0),document.addEventListener("touchstart",m,!0),document.addEventListener("visibilitychange",v,!0),Y(),r.addEventListener("focus",d,!0),r.addEventListener("blur",h,!0),r.nodeType===Node.DOCUMENT_FRAGMENT_NODE&&r.host?r.host.setAttribute("data-js-focus-visible",""):r.nodeType===Node.DOCUMENT_NODE&&(document.documentElement.classList.add("js-focus-visible"),document.documentElement.setAttribute("data-js-focus-visible",""))}if(typeof window!="undefined"&&typeof document!="undefined"){window.applyFocusVisiblePolyfill=e;var t;try{t=new CustomEvent("focus-visible-polyfill-ready")}catch(r){t=document.createEvent("CustomEvent"),t.initCustomEvent("focus-visible-polyfill-ready",!1,!1,{})}window.dispatchEvent(t)}typeof document!="undefined"&&e(document)})});var cn=Pt(Er=>{(function(e){var t=function(){try{return!!Symbol.iterator}catch(c){return!1}},r=t(),n=function(c){var u={next:function(){var p=c.shift();return{done:p===void 0,value:p}}};return r&&(u[Symbol.iterator]=function(){return u}),u},o=function(c){return encodeURIComponent(c).replace(/%20/g,"+")},i=function(c){return decodeURIComponent(String(c).replace(/\+/g," "))},s=function(){var c=function(p){Object.defineProperty(this,"_entries",{writable:!0,value:{}});var m=typeof p;if(m!=="undefined")if(m==="string")p!==""&&this._fromString(p);else if(p instanceof c){var d=this;p.forEach(function(B,N){d.append(N,B)})}else if(p!==null&&m==="object")if(Object.prototype.toString.call(p)==="[object Array]")for(var h=0;hd[0]?1:0}),c._entries&&(c._entries={});for(var p=0;p1?i(d[1]):"")}})})(typeof global!="undefined"?global:typeof window!="undefined"?window:typeof self!="undefined"?self:Er);(function(e){var t=function(){try{var o=new e.URL("b","http://a");return o.pathname="c d",o.href==="http://a/c%20d"&&o.searchParams}catch(i){return!1}},r=function(){var o=e.URL,i=function(f,c){typeof f!="string"&&(f=String(f)),c&&typeof c!="string"&&(c=String(c));var u=document,p;if(c&&(e.location===void 
0||c!==e.location.href)){c=c.toLowerCase(),u=document.implementation.createHTMLDocument(""),p=u.createElement("base"),p.href=c,u.head.appendChild(p);try{if(p.href.indexOf(c)!==0)throw new Error(p.href)}catch(O){throw new Error("URL unable to set base "+c+" due to "+O)}}var m=u.createElement("a");m.href=f,p&&(u.body.appendChild(m),m.href=m.href);var d=u.createElement("input");if(d.type="url",d.value=f,m.protocol===":"||!/:/.test(m.href)||!d.checkValidity()&&!c)throw new TypeError("Invalid URL");Object.defineProperty(this,"_anchorElement",{value:m});var h=new e.URLSearchParams(this.search),v=!0,Y=!0,B=this;["append","delete","set"].forEach(function(O){var Qe=h[O];h[O]=function(){Qe.apply(h,arguments),v&&(Y=!1,B.search=h.toString(),Y=!0)}}),Object.defineProperty(this,"searchParams",{value:h,enumerable:!0});var N=void 0;Object.defineProperty(this,"_updateSearchParams",{enumerable:!1,configurable:!1,writable:!1,value:function(){this.search!==N&&(N=this.search,Y&&(v=!1,this.searchParams._fromString(this.search),v=!0))}})},s=i.prototype,a=function(f){Object.defineProperty(s,f,{get:function(){return this._anchorElement[f]},set:function(c){this._anchorElement[f]=c},enumerable:!0})};["hash","host","hostname","port","protocol"].forEach(function(f){a(f)}),Object.defineProperty(s,"search",{get:function(){return this._anchorElement.search},set:function(f){this._anchorElement.search=f,this._updateSearchParams()},enumerable:!0}),Object.defineProperties(s,{toString:{get:function(){var f=this;return function(){return f.href}}},href:{get:function(){return this._anchorElement.href.replace(/\?$/,"")},set:function(f){this._anchorElement.href=f,this._updateSearchParams()},enumerable:!0},pathname:{get:function(){return this._anchorElement.pathname.replace(/(^\/?)/,"/")},set:function(f){this._anchorElement.pathname=f},enumerable:!0},origin:{get:function(){var f={"http:":80,"https:":443,"ftp:":21}[this._anchorElement.protocol],c=this._anchorElement.port!=f&&this._anchorElement.port!=="";return this._anchorElement.protocol+"//"+this._anchorElement.hostname+(c?":"+this._anchorElement.port:"")},enumerable:!0},password:{get:function(){return""},set:function(f){},enumerable:!0},username:{get:function(){return""},set:function(f){},enumerable:!0}}),i.createObjectURL=function(f){return o.createObjectURL.apply(o,arguments)},i.revokeObjectURL=function(f){return o.revokeObjectURL.apply(o,arguments)},e.URL=i};if(t()||r(),e.location!==void 0&&!("origin"in e.location)){var n=function(){return e.location.protocol+"//"+e.location.hostname+(e.location.port?":"+e.location.port:"")};try{Object.defineProperty(e.location,"origin",{get:n,enumerable:!0})}catch(o){setInterval(function(){e.location.origin=n()},100)}}})(typeof global!="undefined"?global:typeof window!="undefined"?window:typeof self!="undefined"?self:Er)});var qr=Pt((Mt,Nr)=>{/*! 
+ * clipboard.js v2.0.11 + * https://clipboardjs.com/ + * + * Licensed MIT © Zeno Rocha + */(function(t,r){typeof Mt=="object"&&typeof Nr=="object"?Nr.exports=r():typeof define=="function"&&define.amd?define([],r):typeof Mt=="object"?Mt.ClipboardJS=r():t.ClipboardJS=r()})(Mt,function(){return function(){var e={686:function(n,o,i){"use strict";i.d(o,{default:function(){return Ai}});var s=i(279),a=i.n(s),f=i(370),c=i.n(f),u=i(817),p=i.n(u);function m(j){try{return document.execCommand(j)}catch(T){return!1}}var d=function(T){var E=p()(T);return m("cut"),E},h=d;function v(j){var T=document.documentElement.getAttribute("dir")==="rtl",E=document.createElement("textarea");E.style.fontSize="12pt",E.style.border="0",E.style.padding="0",E.style.margin="0",E.style.position="absolute",E.style[T?"right":"left"]="-9999px";var H=window.pageYOffset||document.documentElement.scrollTop;return E.style.top="".concat(H,"px"),E.setAttribute("readonly",""),E.value=j,E}var Y=function(T,E){var H=v(T);E.container.appendChild(H);var I=p()(H);return m("copy"),H.remove(),I},B=function(T){var E=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body},H="";return typeof T=="string"?H=Y(T,E):T instanceof HTMLInputElement&&!["text","search","url","tel","password"].includes(T==null?void 0:T.type)?H=Y(T.value,E):(H=p()(T),m("copy")),H},N=B;function O(j){"@babel/helpers - typeof";return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?O=function(E){return typeof E}:O=function(E){return E&&typeof Symbol=="function"&&E.constructor===Symbol&&E!==Symbol.prototype?"symbol":typeof E},O(j)}var Qe=function(){var T=arguments.length>0&&arguments[0]!==void 0?arguments[0]:{},E=T.action,H=E===void 0?"copy":E,I=T.container,q=T.target,Me=T.text;if(H!=="copy"&&H!=="cut")throw new Error('Invalid "action" value, use either "copy" or "cut"');if(q!==void 0)if(q&&O(q)==="object"&&q.nodeType===1){if(H==="copy"&&q.hasAttribute("disabled"))throw new Error('Invalid "target" attribute. Please use "readonly" instead of "disabled" attribute');if(H==="cut"&&(q.hasAttribute("readonly")||q.hasAttribute("disabled")))throw new Error(`Invalid "target" attribute. 
You can't cut text from elements with "readonly" or "disabled" attributes`)}else throw new Error('Invalid "target" value, use a valid Element');if(Me)return N(Me,{container:I});if(q)return H==="cut"?h(q):N(q,{container:I})},De=Qe;function $e(j){"@babel/helpers - typeof";return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?$e=function(E){return typeof E}:$e=function(E){return E&&typeof Symbol=="function"&&E.constructor===Symbol&&E!==Symbol.prototype?"symbol":typeof E},$e(j)}function Ei(j,T){if(!(j instanceof T))throw new TypeError("Cannot call a class as a function")}function tn(j,T){for(var E=0;E0&&arguments[0]!==void 0?arguments[0]:{};this.action=typeof I.action=="function"?I.action:this.defaultAction,this.target=typeof I.target=="function"?I.target:this.defaultTarget,this.text=typeof I.text=="function"?I.text:this.defaultText,this.container=$e(I.container)==="object"?I.container:document.body}},{key:"listenClick",value:function(I){var q=this;this.listener=c()(I,"click",function(Me){return q.onClick(Me)})}},{key:"onClick",value:function(I){var q=I.delegateTarget||I.currentTarget,Me=this.action(q)||"copy",kt=De({action:Me,container:this.container,target:this.target(q),text:this.text(q)});this.emit(kt?"success":"error",{action:Me,text:kt,trigger:q,clearSelection:function(){q&&q.focus(),window.getSelection().removeAllRanges()}})}},{key:"defaultAction",value:function(I){return vr("action",I)}},{key:"defaultTarget",value:function(I){var q=vr("target",I);if(q)return document.querySelector(q)}},{key:"defaultText",value:function(I){return vr("text",I)}},{key:"destroy",value:function(){this.listener.destroy()}}],[{key:"copy",value:function(I){var q=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body};return N(I,q)}},{key:"cut",value:function(I){return h(I)}},{key:"isSupported",value:function(){var I=arguments.length>0&&arguments[0]!==void 0?arguments[0]:["copy","cut"],q=typeof I=="string"?[I]:I,Me=!!document.queryCommandSupported;return q.forEach(function(kt){Me=Me&&!!document.queryCommandSupported(kt)}),Me}}]),E}(a()),Ai=Li},828:function(n){var o=9;if(typeof Element!="undefined"&&!Element.prototype.matches){var i=Element.prototype;i.matches=i.matchesSelector||i.mozMatchesSelector||i.msMatchesSelector||i.oMatchesSelector||i.webkitMatchesSelector}function s(a,f){for(;a&&a.nodeType!==o;){if(typeof a.matches=="function"&&a.matches(f))return a;a=a.parentNode}}n.exports=s},438:function(n,o,i){var s=i(828);function a(u,p,m,d,h){var v=c.apply(this,arguments);return u.addEventListener(m,v,h),{destroy:function(){u.removeEventListener(m,v,h)}}}function f(u,p,m,d,h){return typeof u.addEventListener=="function"?a.apply(null,arguments):typeof m=="function"?a.bind(null,document).apply(null,arguments):(typeof u=="string"&&(u=document.querySelectorAll(u)),Array.prototype.map.call(u,function(v){return a(v,p,m,d,h)}))}function c(u,p,m,d){return function(h){h.delegateTarget=s(h.target,p),h.delegateTarget&&d.call(u,h)}}n.exports=f},879:function(n,o){o.node=function(i){return i!==void 0&&i instanceof HTMLElement&&i.nodeType===1},o.nodeList=function(i){var s=Object.prototype.toString.call(i);return i!==void 0&&(s==="[object NodeList]"||s==="[object HTMLCollection]")&&"length"in i&&(i.length===0||o.node(i[0]))},o.string=function(i){return typeof i=="string"||i instanceof String},o.fn=function(i){var s=Object.prototype.toString.call(i);return s==="[object Function]"}},370:function(n,o,i){var s=i(879),a=i(438);function f(m,d,h){if(!m&&!d&&!h)throw new Error("Missing required 
arguments");if(!s.string(d))throw new TypeError("Second argument must be a String");if(!s.fn(h))throw new TypeError("Third argument must be a Function");if(s.node(m))return c(m,d,h);if(s.nodeList(m))return u(m,d,h);if(s.string(m))return p(m,d,h);throw new TypeError("First argument must be a String, HTMLElement, HTMLCollection, or NodeList")}function c(m,d,h){return m.addEventListener(d,h),{destroy:function(){m.removeEventListener(d,h)}}}function u(m,d,h){return Array.prototype.forEach.call(m,function(v){v.addEventListener(d,h)}),{destroy:function(){Array.prototype.forEach.call(m,function(v){v.removeEventListener(d,h)})}}}function p(m,d,h){return a(document.body,m,d,h)}n.exports=f},817:function(n){function o(i){var s;if(i.nodeName==="SELECT")i.focus(),s=i.value;else if(i.nodeName==="INPUT"||i.nodeName==="TEXTAREA"){var a=i.hasAttribute("readonly");a||i.setAttribute("readonly",""),i.select(),i.setSelectionRange(0,i.value.length),a||i.removeAttribute("readonly"),s=i.value}else{i.hasAttribute("contenteditable")&&i.focus();var f=window.getSelection(),c=document.createRange();c.selectNodeContents(i),f.removeAllRanges(),f.addRange(c),s=f.toString()}return s}n.exports=o},279:function(n){function o(){}o.prototype={on:function(i,s,a){var f=this.e||(this.e={});return(f[i]||(f[i]=[])).push({fn:s,ctx:a}),this},once:function(i,s,a){var f=this;function c(){f.off(i,c),s.apply(a,arguments)}return c._=s,this.on(i,c,a)},emit:function(i){var s=[].slice.call(arguments,1),a=((this.e||(this.e={}))[i]||[]).slice(),f=0,c=a.length;for(f;f{"use strict";/*! + * escape-html + * Copyright(c) 2012-2013 TJ Holowaychuk + * Copyright(c) 2015 Andreas Lubbe + * Copyright(c) 2015 Tiancheng "Timothy" Gu + * MIT Licensed + */var rs=/["'&<>]/;Yo.exports=ns;function ns(e){var t=""+e,r=rs.exec(t);if(!r)return t;var n,o="",i=0,s=0;for(i=r.index;i0&&i[i.length-1])&&(c[0]===6||c[0]===2)){r=0;continue}if(c[0]===3&&(!i||c[1]>i[0]&&c[1]=e.length&&(e=void 0),{value:e&&e[n++],done:!e}}};throw new TypeError(t?"Object is not iterable.":"Symbol.iterator is not defined.")}function W(e,t){var r=typeof Symbol=="function"&&e[Symbol.iterator];if(!r)return e;var n=r.call(e),o,i=[],s;try{for(;(t===void 0||t-- >0)&&!(o=n.next()).done;)i.push(o.value)}catch(a){s={error:a}}finally{try{o&&!o.done&&(r=n.return)&&r.call(n)}finally{if(s)throw s.error}}return i}function D(e,t,r){if(r||arguments.length===2)for(var n=0,o=t.length,i;n1||a(m,d)})})}function a(m,d){try{f(n[m](d))}catch(h){p(i[0][3],h)}}function f(m){m.value instanceof et?Promise.resolve(m.value.v).then(c,u):p(i[0][2],m)}function c(m){a("next",m)}function u(m){a("throw",m)}function p(m,d){m(d),i.shift(),i.length&&a(i[0][0],i[0][1])}}function pn(e){if(!Symbol.asyncIterator)throw new TypeError("Symbol.asyncIterator is not defined.");var t=e[Symbol.asyncIterator],r;return t?t.call(e):(e=typeof Ee=="function"?Ee(e):e[Symbol.iterator](),r={},n("next"),n("throw"),n("return"),r[Symbol.asyncIterator]=function(){return this},r);function n(i){r[i]=e[i]&&function(s){return new Promise(function(a,f){s=e[i](s),o(a,f,s.done,s.value)})}}function o(i,s,a,f){Promise.resolve(f).then(function(c){i({value:c,done:a})},s)}}function C(e){return typeof e=="function"}function at(e){var t=function(n){Error.call(n),n.stack=new Error().stack},r=e(t);return r.prototype=Object.create(Error.prototype),r.prototype.constructor=r,r}var It=at(function(e){return function(r){e(this),this.message=r?r.length+` errors occurred during unsubscription: +`+r.map(function(n,o){return o+1+") "+n.toString()}).join(` + 
`):"",this.name="UnsubscriptionError",this.errors=r}});function Ve(e,t){if(e){var r=e.indexOf(t);0<=r&&e.splice(r,1)}}var Ie=function(){function e(t){this.initialTeardown=t,this.closed=!1,this._parentage=null,this._finalizers=null}return e.prototype.unsubscribe=function(){var t,r,n,o,i;if(!this.closed){this.closed=!0;var s=this._parentage;if(s)if(this._parentage=null,Array.isArray(s))try{for(var a=Ee(s),f=a.next();!f.done;f=a.next()){var c=f.value;c.remove(this)}}catch(v){t={error:v}}finally{try{f&&!f.done&&(r=a.return)&&r.call(a)}finally{if(t)throw t.error}}else s.remove(this);var u=this.initialTeardown;if(C(u))try{u()}catch(v){i=v instanceof It?v.errors:[v]}var p=this._finalizers;if(p){this._finalizers=null;try{for(var m=Ee(p),d=m.next();!d.done;d=m.next()){var h=d.value;try{ln(h)}catch(v){i=i!=null?i:[],v instanceof It?i=D(D([],W(i)),W(v.errors)):i.push(v)}}}catch(v){n={error:v}}finally{try{d&&!d.done&&(o=m.return)&&o.call(m)}finally{if(n)throw n.error}}}if(i)throw new It(i)}},e.prototype.add=function(t){var r;if(t&&t!==this)if(this.closed)ln(t);else{if(t instanceof e){if(t.closed||t._hasParent(this))return;t._addParent(this)}(this._finalizers=(r=this._finalizers)!==null&&r!==void 0?r:[]).push(t)}},e.prototype._hasParent=function(t){var r=this._parentage;return r===t||Array.isArray(r)&&r.includes(t)},e.prototype._addParent=function(t){var r=this._parentage;this._parentage=Array.isArray(r)?(r.push(t),r):r?[r,t]:t},e.prototype._removeParent=function(t){var r=this._parentage;r===t?this._parentage=null:Array.isArray(r)&&Ve(r,t)},e.prototype.remove=function(t){var r=this._finalizers;r&&Ve(r,t),t instanceof e&&t._removeParent(this)},e.EMPTY=function(){var t=new e;return t.closed=!0,t}(),e}();var Sr=Ie.EMPTY;function jt(e){return e instanceof Ie||e&&"closed"in e&&C(e.remove)&&C(e.add)&&C(e.unsubscribe)}function ln(e){C(e)?e():e.unsubscribe()}var Le={onUnhandledError:null,onStoppedNotification:null,Promise:void 0,useDeprecatedSynchronousErrorHandling:!1,useDeprecatedNextContext:!1};var st={setTimeout:function(e,t){for(var r=[],n=2;n0},enumerable:!1,configurable:!0}),t.prototype._trySubscribe=function(r){return this._throwIfClosed(),e.prototype._trySubscribe.call(this,r)},t.prototype._subscribe=function(r){return this._throwIfClosed(),this._checkFinalizedStatuses(r),this._innerSubscribe(r)},t.prototype._innerSubscribe=function(r){var n=this,o=this,i=o.hasError,s=o.isStopped,a=o.observers;return i||s?Sr:(this.currentObservers=null,a.push(r),new Ie(function(){n.currentObservers=null,Ve(a,r)}))},t.prototype._checkFinalizedStatuses=function(r){var n=this,o=n.hasError,i=n.thrownError,s=n.isStopped;o?r.error(i):s&&r.complete()},t.prototype.asObservable=function(){var r=new F;return r.source=this,r},t.create=function(r,n){return new xn(r,n)},t}(F);var xn=function(e){ie(t,e);function t(r,n){var o=e.call(this)||this;return o.destination=r,o.source=n,o}return t.prototype.next=function(r){var n,o;(o=(n=this.destination)===null||n===void 0?void 0:n.next)===null||o===void 0||o.call(n,r)},t.prototype.error=function(r){var n,o;(o=(n=this.destination)===null||n===void 0?void 0:n.error)===null||o===void 0||o.call(n,r)},t.prototype.complete=function(){var r,n;(n=(r=this.destination)===null||r===void 0?void 0:r.complete)===null||n===void 0||n.call(r)},t.prototype._subscribe=function(r){var n,o;return(o=(n=this.source)===null||n===void 0?void 0:n.subscribe(r))!==null&&o!==void 0?o:Sr},t}(x);var Et={now:function(){return(Et.delegate||Date).now()},delegate:void 0};var wt=function(e){ie(t,e);function t(r,n,o){r===void 
0&&(r=1/0),n===void 0&&(n=1/0),o===void 0&&(o=Et);var i=e.call(this)||this;return i._bufferSize=r,i._windowTime=n,i._timestampProvider=o,i._buffer=[],i._infiniteTimeWindow=!0,i._infiniteTimeWindow=n===1/0,i._bufferSize=Math.max(1,r),i._windowTime=Math.max(1,n),i}return t.prototype.next=function(r){var n=this,o=n.isStopped,i=n._buffer,s=n._infiniteTimeWindow,a=n._timestampProvider,f=n._windowTime;o||(i.push(r),!s&&i.push(a.now()+f)),this._trimBuffer(),e.prototype.next.call(this,r)},t.prototype._subscribe=function(r){this._throwIfClosed(),this._trimBuffer();for(var n=this._innerSubscribe(r),o=this,i=o._infiniteTimeWindow,s=o._buffer,a=s.slice(),f=0;f0?e.prototype.requestAsyncId.call(this,r,n,o):(r.actions.push(this),r._scheduled||(r._scheduled=ut.requestAnimationFrame(function(){return r.flush(void 0)})))},t.prototype.recycleAsyncId=function(r,n,o){var i;if(o===void 0&&(o=0),o!=null?o>0:this.delay>0)return e.prototype.recycleAsyncId.call(this,r,n,o);var s=r.actions;n!=null&&((i=s[s.length-1])===null||i===void 0?void 0:i.id)!==n&&(ut.cancelAnimationFrame(n),r._scheduled=void 0)},t}(Wt);var Sn=function(e){ie(t,e);function t(){return e!==null&&e.apply(this,arguments)||this}return t.prototype.flush=function(r){this._active=!0;var n=this._scheduled;this._scheduled=void 0;var o=this.actions,i;r=r||o.shift();do if(i=r.execute(r.state,r.delay))break;while((r=o[0])&&r.id===n&&o.shift());if(this._active=!1,i){for(;(r=o[0])&&r.id===n&&o.shift();)r.unsubscribe();throw i}},t}(Dt);var Oe=new Sn(wn);var M=new F(function(e){return e.complete()});function Vt(e){return e&&C(e.schedule)}function Cr(e){return e[e.length-1]}function Ye(e){return C(Cr(e))?e.pop():void 0}function Te(e){return Vt(Cr(e))?e.pop():void 0}function zt(e,t){return typeof Cr(e)=="number"?e.pop():t}var pt=function(e){return e&&typeof e.length=="number"&&typeof e!="function"};function Nt(e){return C(e==null?void 0:e.then)}function qt(e){return C(e[ft])}function Kt(e){return Symbol.asyncIterator&&C(e==null?void 0:e[Symbol.asyncIterator])}function Qt(e){return new TypeError("You provided "+(e!==null&&typeof e=="object"?"an invalid object":"'"+e+"'")+" where a stream was expected. 
You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.")}function zi(){return typeof Symbol!="function"||!Symbol.iterator?"@@iterator":Symbol.iterator}var Yt=zi();function Gt(e){return C(e==null?void 0:e[Yt])}function Bt(e){return un(this,arguments,function(){var r,n,o,i;return $t(this,function(s){switch(s.label){case 0:r=e.getReader(),s.label=1;case 1:s.trys.push([1,,9,10]),s.label=2;case 2:return[4,et(r.read())];case 3:return n=s.sent(),o=n.value,i=n.done,i?[4,et(void 0)]:[3,5];case 4:return[2,s.sent()];case 5:return[4,et(o)];case 6:return[4,s.sent()];case 7:return s.sent(),[3,2];case 8:return[3,10];case 9:return r.releaseLock(),[7];case 10:return[2]}})})}function Jt(e){return C(e==null?void 0:e.getReader)}function U(e){if(e instanceof F)return e;if(e!=null){if(qt(e))return Ni(e);if(pt(e))return qi(e);if(Nt(e))return Ki(e);if(Kt(e))return On(e);if(Gt(e))return Qi(e);if(Jt(e))return Yi(e)}throw Qt(e)}function Ni(e){return new F(function(t){var r=e[ft]();if(C(r.subscribe))return r.subscribe(t);throw new TypeError("Provided object does not correctly implement Symbol.observable")})}function qi(e){return new F(function(t){for(var r=0;r=2;return function(n){return n.pipe(e?A(function(o,i){return e(o,i,n)}):de,ge(1),r?He(t):Dn(function(){return new Zt}))}}function Vn(){for(var e=[],t=0;t=2,!0))}function pe(e){e===void 0&&(e={});var t=e.connector,r=t===void 0?function(){return new x}:t,n=e.resetOnError,o=n===void 0?!0:n,i=e.resetOnComplete,s=i===void 0?!0:i,a=e.resetOnRefCountZero,f=a===void 0?!0:a;return function(c){var u,p,m,d=0,h=!1,v=!1,Y=function(){p==null||p.unsubscribe(),p=void 0},B=function(){Y(),u=m=void 0,h=v=!1},N=function(){var O=u;B(),O==null||O.unsubscribe()};return y(function(O,Qe){d++,!v&&!h&&Y();var De=m=m!=null?m:r();Qe.add(function(){d--,d===0&&!v&&!h&&(p=$r(N,f))}),De.subscribe(Qe),!u&&d>0&&(u=new rt({next:function($e){return De.next($e)},error:function($e){v=!0,Y(),p=$r(B,o,$e),De.error($e)},complete:function(){h=!0,Y(),p=$r(B,s),De.complete()}}),U(O).subscribe(u))})(c)}}function $r(e,t){for(var r=[],n=2;ne.next(document)),e}function K(e,t=document){return Array.from(t.querySelectorAll(e))}function z(e,t=document){let r=ce(e,t);if(typeof r=="undefined")throw new ReferenceError(`Missing element: expected "${e}" to be present`);return r}function ce(e,t=document){return t.querySelector(e)||void 0}function _e(){return document.activeElement instanceof HTMLElement&&document.activeElement||void 0}function tr(e){return L(b(document.body,"focusin"),b(document.body,"focusout")).pipe(ke(1),l(()=>{let t=_e();return typeof t!="undefined"?e.contains(t):!1}),V(e===_e()),J())}function Xe(e){return{x:e.offsetLeft,y:e.offsetTop}}function Kn(e){return L(b(window,"load"),b(window,"resize")).pipe(Ce(0,Oe),l(()=>Xe(e)),V(Xe(e)))}function rr(e){return{x:e.scrollLeft,y:e.scrollTop}}function dt(e){return L(b(e,"scroll"),b(window,"resize")).pipe(Ce(0,Oe),l(()=>rr(e)),V(rr(e)))}var Yn=function(){if(typeof Map!="undefined")return Map;function e(t,r){var n=-1;return t.some(function(o,i){return o[0]===r?(n=i,!0):!1}),n}return function(){function t(){this.__entries__=[]}return Object.defineProperty(t.prototype,"size",{get:function(){return this.__entries__.length},enumerable:!0,configurable:!0}),t.prototype.get=function(r){var n=e(this.__entries__,r),o=this.__entries__[n];return o&&o[1]},t.prototype.set=function(r,n){var o=e(this.__entries__,r);~o?this.__entries__[o][1]=n:this.__entries__.push([r,n])},t.prototype.delete=function(r){var 
n=this.__entries__,o=e(n,r);~o&&n.splice(o,1)},t.prototype.has=function(r){return!!~e(this.__entries__,r)},t.prototype.clear=function(){this.__entries__.splice(0)},t.prototype.forEach=function(r,n){n===void 0&&(n=null);for(var o=0,i=this.__entries__;o0},e.prototype.connect_=function(){!Wr||this.connected_||(document.addEventListener("transitionend",this.onTransitionEnd_),window.addEventListener("resize",this.refresh),va?(this.mutationsObserver_=new MutationObserver(this.refresh),this.mutationsObserver_.observe(document,{attributes:!0,childList:!0,characterData:!0,subtree:!0})):(document.addEventListener("DOMSubtreeModified",this.refresh),this.mutationEventsAdded_=!0),this.connected_=!0)},e.prototype.disconnect_=function(){!Wr||!this.connected_||(document.removeEventListener("transitionend",this.onTransitionEnd_),window.removeEventListener("resize",this.refresh),this.mutationsObserver_&&this.mutationsObserver_.disconnect(),this.mutationEventsAdded_&&document.removeEventListener("DOMSubtreeModified",this.refresh),this.mutationsObserver_=null,this.mutationEventsAdded_=!1,this.connected_=!1)},e.prototype.onTransitionEnd_=function(t){var r=t.propertyName,n=r===void 0?"":r,o=ba.some(function(i){return!!~n.indexOf(i)});o&&this.refresh()},e.getInstance=function(){return this.instance_||(this.instance_=new e),this.instance_},e.instance_=null,e}(),Gn=function(e,t){for(var r=0,n=Object.keys(t);r0},e}(),Jn=typeof WeakMap!="undefined"?new WeakMap:new Yn,Xn=function(){function e(t){if(!(this instanceof e))throw new TypeError("Cannot call a class as a function.");if(!arguments.length)throw new TypeError("1 argument required, but only 0 present.");var r=ga.getInstance(),n=new La(t,r,this);Jn.set(this,n)}return e}();["observe","unobserve","disconnect"].forEach(function(e){Xn.prototype[e]=function(){var t;return(t=Jn.get(this))[e].apply(t,arguments)}});var Aa=function(){return typeof nr.ResizeObserver!="undefined"?nr.ResizeObserver:Xn}(),Zn=Aa;var eo=new x,Ca=$(()=>k(new Zn(e=>{for(let t of e)eo.next(t)}))).pipe(g(e=>L(ze,k(e)).pipe(R(()=>e.disconnect()))),X(1));function he(e){return{width:e.offsetWidth,height:e.offsetHeight}}function ye(e){return Ca.pipe(S(t=>t.observe(e)),g(t=>eo.pipe(A(({target:r})=>r===e),R(()=>t.unobserve(e)),l(()=>he(e)))),V(he(e)))}function bt(e){return{width:e.scrollWidth,height:e.scrollHeight}}function ar(e){let t=e.parentElement;for(;t&&(e.scrollWidth<=t.scrollWidth&&e.scrollHeight<=t.scrollHeight);)t=(e=t).parentElement;return t?e:void 0}var to=new x,Ra=$(()=>k(new IntersectionObserver(e=>{for(let t of e)to.next(t)},{threshold:0}))).pipe(g(e=>L(ze,k(e)).pipe(R(()=>e.disconnect()))),X(1));function sr(e){return Ra.pipe(S(t=>t.observe(e)),g(t=>to.pipe(A(({target:r})=>r===e),R(()=>t.unobserve(e)),l(({isIntersecting:r})=>r))))}function ro(e,t=16){return dt(e).pipe(l(({y:r})=>{let n=he(e),o=bt(e);return r>=o.height-n.height-t}),J())}var cr={drawer:z("[data-md-toggle=drawer]"),search:z("[data-md-toggle=search]")};function no(e){return cr[e].checked}function Ke(e,t){cr[e].checked!==t&&cr[e].click()}function Ue(e){let t=cr[e];return b(t,"change").pipe(l(()=>t.checked),V(t.checked))}function ka(e,t){switch(e.constructor){case HTMLInputElement:return e.type==="radio"?/^Arrow/.test(t):!0;case HTMLSelectElement:case HTMLTextAreaElement:return!0;default:return e.isContentEditable}}function Ha(){return L(b(window,"compositionstart").pipe(l(()=>!0)),b(window,"compositionend").pipe(l(()=>!1))).pipe(V(!1))}function oo(){let 
e=b(window,"keydown").pipe(A(t=>!(t.metaKey||t.ctrlKey)),l(t=>({mode:no("search")?"search":"global",type:t.key,claim(){t.preventDefault(),t.stopPropagation()}})),A(({mode:t,type:r})=>{if(t==="global"){let n=_e();if(typeof n!="undefined")return!ka(n,r)}return!0}),pe());return Ha().pipe(g(t=>t?M:e))}function le(){return new URL(location.href)}function ot(e){location.href=e.href}function io(){return new x}function ao(e,t){if(typeof t=="string"||typeof t=="number")e.innerHTML+=t.toString();else if(t instanceof Node)e.appendChild(t);else if(Array.isArray(t))for(let r of t)ao(e,r)}function _(e,t,...r){let n=document.createElement(e);if(t)for(let o of Object.keys(t))typeof t[o]!="undefined"&&(typeof t[o]!="boolean"?n.setAttribute(o,t[o]):n.setAttribute(o,""));for(let o of r)ao(n,o);return n}function fr(e){if(e>999){let t=+((e-950)%1e3>99);return`${((e+1e-6)/1e3).toFixed(t)}k`}else return e.toString()}function so(){return location.hash.substring(1)}function Dr(e){let t=_("a",{href:e});t.addEventListener("click",r=>r.stopPropagation()),t.click()}function Pa(e){return L(b(window,"hashchange"),e).pipe(l(so),V(so()),A(t=>t.length>0),X(1))}function co(e){return Pa(e).pipe(l(t=>ce(`[id="${t}"]`)),A(t=>typeof t!="undefined"))}function Vr(e){let t=matchMedia(e);return er(r=>t.addListener(()=>r(t.matches))).pipe(V(t.matches))}function fo(){let e=matchMedia("print");return L(b(window,"beforeprint").pipe(l(()=>!0)),b(window,"afterprint").pipe(l(()=>!1))).pipe(V(e.matches))}function zr(e,t){return e.pipe(g(r=>r?t():M))}function ur(e,t={credentials:"same-origin"}){return ue(fetch(`${e}`,t)).pipe(fe(()=>M),g(r=>r.status!==200?Ot(()=>new Error(r.statusText)):k(r)))}function We(e,t){return ur(e,t).pipe(g(r=>r.json()),X(1))}function uo(e,t){let r=new DOMParser;return ur(e,t).pipe(g(n=>n.text()),l(n=>r.parseFromString(n,"text/xml")),X(1))}function pr(e){let t=_("script",{src:e});return $(()=>(document.head.appendChild(t),L(b(t,"load"),b(t,"error").pipe(g(()=>Ot(()=>new ReferenceError(`Invalid script: ${e}`))))).pipe(l(()=>{}),R(()=>document.head.removeChild(t)),ge(1))))}function po(){return{x:Math.max(0,scrollX),y:Math.max(0,scrollY)}}function lo(){return L(b(window,"scroll",{passive:!0}),b(window,"resize",{passive:!0})).pipe(l(po),V(po()))}function mo(){return{width:innerWidth,height:innerHeight}}function ho(){return b(window,"resize",{passive:!0}).pipe(l(mo),V(mo()))}function bo(){return G([lo(),ho()]).pipe(l(([e,t])=>({offset:e,size:t})),X(1))}function lr(e,{viewport$:t,header$:r}){let n=t.pipe(ee("size")),o=G([n,r]).pipe(l(()=>Xe(e)));return G([r,t,o]).pipe(l(([{height:i},{offset:s,size:a},{x:f,y:c}])=>({offset:{x:s.x-f,y:s.y-c+i},size:a})))}(()=>{function e(n,o){parent.postMessage(n,o||"*")}function t(...n){return n.reduce((o,i)=>o.then(()=>new Promise(s=>{let a=document.createElement("script");a.src=i,a.onload=s,document.body.appendChild(a)})),Promise.resolve())}var r=class extends EventTarget{constructor(n){super(),this.url=n,this.m=i=>{i.source===this.w&&(this.dispatchEvent(new MessageEvent("message",{data:i.data})),this.onmessage&&this.onmessage(i))},this.e=(i,s,a,f,c)=>{if(s===`${this.url}`){let u=new ErrorEvent("error",{message:i,filename:s,lineno:a,colno:f,error:c});this.dispatchEvent(u),this.onerror&&this.onerror(u)}};let o=document.createElement("iframe");o.hidden=!0,document.body.appendChild(this.iframe=o),this.w.document.open(),this.w.document.write(` + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Database

+

The file database.yml mainly specifies MongoDB connection details and credentials.

+

It looks like this:

+
connection:
+  username: "dp3_user"
+  password: "dp3_password"
+  address: "127.0.0.1"
+  port: 27017
+  db_name: "dp3_database"
+
+

Connection

+

Connection details contain:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ParameterData-typeDefault valueDescription
usernamestringdp3Username for connection to DB. Escaped using urllib.parse.quote_plus.
passwordstringdp3Password for connection to DB. Escaped using urllib.parse.quote_plus.
addressstringlocalhostIP address or hostname for connection to DB.
portint27017Listening port of DB.
db_namestringdp3Database name to be utilized by DP³.
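To illustrate the escaping, a small sketch of how such a connection URI can be assembled with urllib.parse.quote_plus (the exact URI DP³ builds internally may differ; the password value here is purely illustrative):

from urllib.parse import quote_plus

username = "dp3_user"
password = "dp3 p@ssword/with:specials"  # illustrative value with characters that need escaping
address = "127.0.0.1"
port = 27017
db_name = "dp3_database"

# quote_plus percent-escapes the credentials so they are safe to embed in the URI.
uri = f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{address}:{port}/{db_name}"
print(uri)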
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/configuration/db_entities/index.html b/configuration/db_entities/index.html new file mode 100644 index 00000000..42bb40ea --- /dev/null +++ b/configuration/db_entities/index.html @@ -0,0 +1,2106 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Database entities - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

DB entities

+

Files in the db_entities folder describe entities and their attributes. You can think of an entity as a class from object-oriented programming.

+

Below is a YAML file (e.g. db_entities/bus.yml) corresponding to the bus tracking system example from the Data model chapter.

+
entity:
+  id: bus
+  name: Bus
+attribs:
+  # Attribute `label`
+  label:
+    name: Label
+    description: Custom label for the bus.
+    type: plain
+    data_type: string
+    editable: true
+
+  # Attribute `location`
+  location:
+    name: Location
+    description: Location of the bus in a particular time. Value are GPS \
+      coordinates (array of latitude and longitude).
+    type: observations
+    data_type: array<float>
+    history_params:
+      pre_validity: 1m
+      post_validity: 1m
+      max_age: 30d
+
+  # Attribute `speed`
+  speed:
+    name: Speed
+    description: Speed of the bus in a particular time. In km/h.
+    type: observations
+    data_type: float
+    history_params:
+      pre_validity: 1m
+      post_validity: 1m
+      max_age: 30d
+
+  # Attribute `passengers_in_out`
+  passengers_in_out:
+    name: Passengers in/out
+    description: Number of passengers getting in or out of the bus. Distinguished by the doors used (front, middle, back). Regularly sampled every 10 minutes.
+    type: timeseries
+    timeseries_type: regular
+    timeseries_params:
+      max_age: 14d
+      time_step: 10m
+    series:
+      front_in:
+        data_type: int
+      front_out:
+        data_type: int
+      middle_in:
+        data_type: int
+      middle_out:
+        data_type: int
+      back_in:
+        data_type: int
+      back_out:
+        data_type: int
+
+  # Attribute `driver` to link the driver of the bus at a given time.
+  driver:
+    name: Driver
+    description: Driver of the bus at a given time.
+    type: observations
+    data_type: link<driver>
+    history_params:
+      pre_validity: 1m
+      post_validity: 1m
+      max_age: 30d
+
+

Entity

+

Entity is described simply by:

+ + + + + + + + + + + + + + + + + + + + + + + +
ParameterData-typeDefault valueDescription
idstring (identifier)(mandatory)Short string identifying the entity type, its machine name (must match regex [a-zA-Z_][a-zA-Z0-9_-]*). Lower-case only is recommended.
namestring(mandatory)Entity name for humans. May contain any symbols.
+

Attributes

+

Each attribute is specified by the following set of parameters:

+

Base

+

These apply to all types of attributes (plain, observations and timeseries).

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ParameterData-typeDefault valueDescription
idstring (identifier)(mandatory)Short string identifying the attribute, its machine name (must match this regex [a-zA-Z_][a-zA-Z0-9_-]*). Lower-case only is recommended.
typestring(mandatory)Type of attribute. Can be either plain, observations or timeseries.
namestring(mandatory)Attribute name for humans. May contain any symbols.
descriptionstring""Longer description of the attribute, if needed.
color#xxxxxxnullColor to use in GUI (useful mostly for tag values), not used currently.
+

Plain-specific parameters

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ParameterData-typeDefault valueDescription
data_typestring(mandatory)Data type of attribute value, see Supported data types.
categoriesarray of stringsnullList of categories if data_type=category and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example).
editableboolfalseWhether value of this attribute is editable via web interface.
+

Observations-specific parameters

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ParameterData-typeDefault valueDescription
data_typestring(mandatory)Data type of attribute value, see Supported data types.
categoriesarray of stringsnullList of categories if data_type=category and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example).
editableboolfalseWhether value of this attribute is editable via web interface.
confidenceboolfalseWhether a confidence value should be stored along with data value or not.
multi_valueboolfalseWhether multiple values can be set at the same time.
history_paramsobject, see below(mandatory)History and time aggregation parameters. A subobject with fields described in the table below.
history_force_graphboolfalseBy default, if the data type of the attribute is array, its history is shown in the web interface as a table. This option forces a tag-like graph with the comma-joined values of that array as tags.
+

History params

+

Description of history_params subobject (see table above).

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ParameterData-typeDefault valueDescription
max_age<int><s/m/h/d> (e.g. 30s, 12h, 7d)nullHow many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed).
max_itemsint (> 0)nullHow many data-points/intervals to store (oldest ones are removed when limit is exceeded). Currently not implemented.
expire_time<int><s/m/h/d> or inf (infinity)infinityHow long after the end time (t2) is the last value considered valid (i.e. is used as "current value"). Zero (0) means to strictly follow t1, t2. Zero can be specified without a unit (s/m/h/d). Currently not implemented.
pre_validity<int><s/m/h/d> (e.g. 30s, 12h, 7d)0sMax time before t1 for which the data-point's value is still considered to be the "current value" if there's no other data-point closer in time.
post_validity<int><s/m/h/d> (e.g. 30s, 12h, 7d)0sMax time after t2 for which the data-point's value is still considered to be the "current value" if there's no other data-point closer in time.
+
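For example, with pre_validity: 1m and post_validity: 1m (as in the bus example above), a data-point with t1 = 12:00 and t2 = 12:10 is treated as the "current value" from 11:59 until 12:11, unless another data-point lies closer in time.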

Note: At least one of max_age and max_items SHOULD be defined, otherwise the amount of stored data can grow unbounded.

+

Timeseries-specific parameters

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ParameterData-typeDefault valueDescription
timeseries_typestring(mandatory)One of: regular, irregular or irregular_intervals. See chapter Data model for explanation.
seriesobject of objects, see below(mandatory)Configuration of series of data represented by this timeseries attribute.
timeseries_paramsobject, see belowOther timeseries parameters. A subobject with fields described by the table below.
+

Series

+

Description of series subobject (see table above).

+

Key for series object is id - short string identifying the series (e.g. bytes, temperature, parcels).

+ + + + + + + + + + + + + + + + + +
Parameter | Data-type | Default value | Description
type | string | (mandatory) | Data type of series. Only int and float are allowed (also time, but that's used internally, see below).
+

Time series (axis) is added implicitly by DP³ and this behaviour is specific to selected timeseries_type:

+
    +
  • regular:
    +"time": { "data_type": "time" }
  • +
  • irregular:
    +"time": { "data_type": "time" }
  • +
  • irregular_intervals:
    +"time_first": { "data_type": "time" }, "time_last": { "data_type": "time" }
  • +
+

Timeseries params

+

Description of timeseries_params subobject (see table above).

Parameter | Data-type | Default value | Description
max_age | <int><s/m/h/d> (e.g. 30s, 12h, 7d) | null | How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed).
time_step | <int><s/m/h/d> (e.g. 30s, 12h, 7d) | (mandatory) for regular timeseries, null otherwise | "Sampling rate in time" of this attribute. For example, with time_step = 10m we expect data-point at 12:00, 12:10, 12:20, 12:30,... Only relevant for regular timeseries.
+

Note: max_age SHOULD be defined, otherwise the amount of stored data can grow unbounded.
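As a concrete illustration of the series and timeseries_params sub-objects, a regular timeseries attribute might be sketched as follows. The attribute and series names are made up (loosely based on the bus example from the Data model chapter) and the file layout is only indicative.

```yaml
# Hypothetical excerpt from a db_entities file
passengers_front:
  name: Passengers at the front door
  type: timeseries
  timeseries_type: regular
  series:
    front_in:
      type: int
    front_out:
      type: int
  timeseries_params:
    max_age: 365d    # keep one year of history
    time_step: 10m   # one data-point expected every 10 minutes
```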

+

Supported data types

+

List of supported values for parameter data_type:

+
    +
  • tag: set/not_set (When the attribute is set, its value is always assumed to be true, the "v" field doesn't have to be stored.)
  • +
  • binary: true/false/not_set (Attribute value is true or false, or the attribute is not set at all.)
  • +
  • category<data_type; category1, category2, ...>: Categorical values. Use only when a fixed set of values should be allowed, which should be specified in the second part of the type definition. The first part of the type definition describes the data_type of the category.
  • +
  • string
  • +
  • int: 32-bit signed integer (range from -2147483648 to +2147483647)
  • +
  • int64: 64-bit signed integer (use when the range of normal int is not sufficient)
  • +
  • float
  • +
  • time: Timestamp in YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [±]HH[:]MM] format or timestamp since 1.1.1970 in seconds or milliseconds.
  • +
  • ip4: IPv4 address (passed as dotted-decimal string)
  • +
  • ip6: IPv6 address (passed as string in short or full format)
  • +
  • mac: MAC address (passed as string)
  • +
  • link<entity_type>: Link to a record of the specified type, e.g. link<ip>
  • +
  • link<entity_type,data_type>: Link to a record of the specified type, carrying additional data, e.g. link<ip,int>
  • +
  • array<data_type>: An array of values of specified data type (which must be one of the types above), e.g. array<int>
  • +
  • set<data_type>: Same as array, but values can't repeat and order is irrelevant.
  • +
  • dict<keys>: Dictionary (object) containing multiple values as subkeys. keys should contain a comma-separated list of key names and types separated by colon, e.g. dict<port:int,protocol:string,tag?:string>. By default, all fields are mandatory (i.e. a data-point missing some subkey will be refused), to mark a field as optional, put ? after its name. Only the following data types can be used here: binary,category,string,int,float,time,ip4,ip6,mac. Multi-level dicts are not supported.
  • +
  • json: Any JSON object can be stored, all processing is handled by user's code. This is here for special cases which can't be mapped to any data type above.
  • +
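To make the composite types more tangible, here are a few data_type values as they might appear in attribute definitions. The attribute names on the left are made up; only the right-hand side follows the syntax described above.

```yaml
note:        string
open_ports:  set<int>
risk:        category<string; low, medium, high>
dns_server:  link<ip>
service:     dict<port:int,protocol:string,tag?:string>   # "tag" is optional
```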
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/configuration/event_logging/index.html b/configuration/event_logging/index.html new file mode 100644 index 00000000..cac5496a --- /dev/null +++ b/configuration/event_logging/index.html @@ -0,0 +1,1581 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Event logging - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Event logging

+

Event logging is done using Redis and allows counting arbitrary events across multiple processes (using shared counters in Redis) and over various time intervals.

+

More information can be found in the GitHub repository of EventCountLogger.

+

Configuration file event_logging.yml looks like this:

+
redis:
+  host: localhost
+  port: 6379
+  db: 1
+
+groups:
+  # Main events of Task execution
+  te:
+    events:
+      - task_processed
+      - task_processing_error
+    intervals: [ "5m", "2h" ] # (1)!
+    sync-interval: 1 # (2)!
+  # Number of processed tasks by their "src" attribute
+  tasks_by_src:
+    events: [ ]
+    auto_declare_events: true
+    intervals: [ "5s", "5m" ]
+    sync-interval: 1
+
+
    +
  1. Two intervals - 5 min and 2 hours for longer-term history in Munin/Icinga
  2. +
  3. Cache counts locally, push to Redis every second
  4. +
+

Redis

+

This section describes Redis connection details:

Parameter | Data-type | Default value | Description
host | string | localhost | IP address or hostname for connection to Redis.
port | int | 6379 | Listening port of Redis.
db | int | 0 | Index of Redis DB used for the counters (it shouldn't be used for anything else).
+

Groups

+

The default configuration of groups enables logging of task execution events in the te group, namely task_processed and task_processing_error.

+

To learn more about the group configuration for EventCountLogger, +please refer to the official documentation.
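For instance, a module that wants to count its own events could add another group to event_logging.yml, following the same structure as the defaults above (the group and event names below are hypothetical):

```yaml
groups:
  my_module:
    events:
      - lookup_ok
      - lookup_failed
    intervals: [ "5m", "2h" ]
    sync-interval: 1
```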

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/configuration/history_manager/index.html b/configuration/history_manager/index.html new file mode 100644 index 00000000..892a634f --- /dev/null +++ b/configuration/history_manager/index.html @@ -0,0 +1,1521 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + History manager - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

History manager

+

The history manager is responsible for deleting old records from master records in the database.

+

Configuration file history_manager.yml is very simple:

+
datapoint_cleaning:
+  tick_rate: 10
+
+

The tick_rate parameter sets how often (in minutes) DP³ checks the master records of observations and timeseries attributes for data that is too old and removes it. To control what is considered "too old", see the max_age parameter in the Database entities configuration.
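A quick sketch of how the two settings interact (the values are arbitrary):

```yaml
# history_manager.yml
datapoint_cleaning:
  tick_rate: 10        # the cleaning check runs every 10 minutes

# ... and in some attribute's definition in db_entities/ (hypothetical attribute):
# history_params:
#   max_age: 30d       # data-points older than 30 days are removed by those checks
```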

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/configuration/index.html b/configuration/index.html new file mode 100644 index 00000000..e5106bd5 --- /dev/null +++ b/configuration/index.html @@ -0,0 +1,1574 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + General - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Configuration

+

DP³ configuration folder consists of these files and folders:

+
db_entities/
+modules/
+common.yml
+database.yml
+event_logging.yml
+history_manager.yml
+processing_core.yml
+snapshots.yml
+
+

Their meaning and usage are explained in the following chapters.

+

Example configuration

+

An example configuration is included in the config/ folder of the DP³ repository.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/configuration/modules/index.html b/configuration/modules/index.html new file mode 100644 index 00000000..358b7178 --- /dev/null +++ b/configuration/modules/index.html @@ -0,0 +1,1523 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Modules - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Modules

+

Folder modules/ optionally contains any module-specific configuration.

+

This configuration doesn't have to follow any required format (except being YAML files).

+

In secondary modules, you can access the configuration:

+
from dp3 import g
+
+print(g.config["modules"]["MODULE_NAME"])
+
+

Here, the MODULE_NAME corresponds to MODULE_NAME.yml file in modules/ folder.
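For example, assuming a hypothetical file modules/my_awesome_module.yml with the following content:

```yaml
api_url: "https://example.org/api"   # made-up options
request_timeout: 5
```

the expression g.config["modules"]["my_awesome_module"]["request_timeout"] would then evaluate to 5.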

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/configuration/processing_core/index.html b/configuration/processing_core/index.html new file mode 100644 index 00000000..221fc82d --- /dev/null +++ b/configuration/processing_core/index.html @@ -0,0 +1,1697 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Processing core - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Processing core

+

Processing core's configuration in processing_core.yml file looks like this:

+
msg_broker:
+  host: localhost
+  port: 5672
+  virtual_host: /
+  username: dp3_user
+  password: dp3_password
+worker_processes: 2
+worker_threads: 16
+modules_dir: "../dp3_modules"
+enabled_modules:
+  - "module_one"
+  - "module_two"
+
+

Message broker

+

The message broker section describes connection details for the RabbitMQ (or compatible) broker.

Parameter | Data-type | Default value | Description
host | string | localhost | IP address or hostname for connection to broker.
port | int | 5672 | Listening port of broker.
virtual_host | string | / | Virtual host for connection to broker.
username | string | guest | Username for connection to broker.
password | string | guest | Password for connection to broker.
+

Worker processes

+

Number of worker processes. This has to be at least 1.

+

When changing the number of worker processes, the following procedure must be followed:

+
    +
  1. stop all inputs writing to task queue (e.g. API)
  2. +
  3. when all queues are empty, stop all workers
  4. +
  5. reconfigure queues in RabbitMQ using script found in /scripts/rmq_reconfigure.sh
  6. +
  7. change the settings here and in init scripts for worker processes (e.g. supervisor)
  8. +
  9. reload workers (e.g. using supervisorctl) and start all inputs again
  10. +
+

Worker threads

+

Number of worker threads per process.

+

This may be higher than the number of CPUs, because the threads are not primarily intended to utilize the computational power of multiple CPUs (which Python cannot do well anyway due to the GIL), but to mask long I/O operations (e.g. queries to external services over the network).

+

Modules directory

+

Path to directory with plug-in (secondary) modules.

+

A relative path is evaluated relative to the location of this configuration file.

+

Enabled modules

+

List of plug-in modules which should be enabled in processing pipeline.

+

The module's filename without the .py extension must be used!
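In other words, a module stored in a hypothetical file my_awesome_module.py inside modules_dir is enabled like this:

```yaml
modules_dir: "../dp3_modules"
enabled_modules:
  - "my_awesome_module"   # note: no .py extension
```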

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/configuration/snapshots/index.html b/configuration/snapshots/index.html new file mode 100644 index 00000000..ae9b690d --- /dev/null +++ b/configuration/snapshots/index.html @@ -0,0 +1,1519 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Snapshots - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Snapshots

+

Snapshots configuration is straightforward. Currently, it only sets creation_rate, the period in minutes for creating new snapshots (30 minutes by default).

+

File snapshots.yml looks like this:

+
creation_rate: 30
+
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/data_model/index.html b/data_model/index.html new file mode 100644 index 00000000..94b195f6 --- /dev/null +++ b/data_model/index.html @@ -0,0 +1,1774 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Data model - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

DP³ data model

+

The basic elements of the DP³ data model are entities (or objects); each entity record (object instance) has a set of attributes. Each attribute has some value (associated with a particular entity), a timestamp (history of previous values can be stored) and optionally a confidence value.

+

Entities may be mutually connected. See Relationships below.

+

Exemplary system

+

In this chapter, we will illustrate the details on an example system. Imagine you are developing a data model for a bus tracking system. You have to store these data:

+
    +
  • label: Custom label for the bus set by administrator in web interface.
  • +
  • location: Location of the bus at a particular time. The value is GPS coordinates (an array of latitude and longitude).
  • +
  • speed: Speed of the bus at a particular time.
  • +
  • passengers getting in and out: Number of passengers getting in or out of the bus. Distinguished by the doors used (front, middle, back). The bus control unit sends counter values every 10 minutes.
  • +
+

Also, a map displaying the current position of all buses is required.

+

(In case you are interested, configuration of database entities for this system +is available in DB entities chapter.)

+

To make everything clear and more readable, all example references below are typeset as quotes.

+

Types of attributes

+

There are 3 types of attributes:

+

Plain

+

Common attributes with only one value of some data type. +There's no history stored, but timestamp of last change is available.

+

Very useful for:

+
    +
  • +

    data from external source, when you only need to have current value

    +
  • +
  • +

    notes and other manually entered information

    +
  • +
+
+

This is exactly what we need for label in our bus tracking system. An administrator labels a particular bus in the web interface and we use this label until it's changed, most importantly to display the label next to a marker on the map. No history is needed and the value has 100% confidence.

+
+

Observations

+

Attributes with history of values at some time or interval of time. +Consequently, we can derive value at any time (most often not now) from these values.

+

Each value may have associated confidence.

+

These attributes may be single or multi value (multiple current values in one point in time).

+

Very useful for data where both current value and history is needed.

+
+

In our example, location is a great use-case for the observations type. We need to track the position of the bus in time and store the history. The current location is very important. Let's suppose we also need to do oversampling by predicting where the bus is now, even though we received the last data-point 2 minutes ago. This is all possible (predictions using custom secondary modules).

+

The same applies to speed. It can also be derived from location.
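For a rough idea, a single location data-point sent to the API might look like the sketch below. The entity type, id and source are invented; the exact data-point format is described in the API chapter.

```json
{
  "type": "bus",
  "id": "bus_1234",
  "attr": "location",
  "v": [50.0755, 14.4378],
  "t1": "2022-08-01T12:00:00",
  "t2": "2022-08-01T12:00:00",
  "src": "gps_tracker"
}
```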

+
+

Timeseries

+

One or more numeric values for a particular time.

+

In this attribute type: history > current value. +In fact, no explicit current value is provided.

+

Very useful for:

+
    +
  • +

    any kind of history-based analysis

    +
  • +
  • +

    logging of events/changes

    +
  • +
+

May be:

+
    +
  • +

    regular: sampling is regular
    + Example: datapoint is created every x minutes

    +
  • +
  • +

    irregular: sampling is irregular
    + Example: datapoint is created when some event occurs

    +
  • +
  • +

    irregular intervals: sampling is irregular and includes two timestamps (from when till when the provided data were gathered)
    + Example: Some event triggers a 5 minute monitoring routine. When this routine finishes, it creates a datapoint containing all the data from the past 5 minutes.

    +
  • +
+
+

Timeseries are very useful for passengers getting in and out (from our example). As we need to count two directions (in/out) for three doors (front/middle/back), we create 6 series (e.g. front_in, front_out, ..., back_out). Counter data-points are received in 10 minute intervals, so a regular timeseries is the best fit for this use-case. Every 10 minutes we receive values for all 6 series and store them. The current value is not important as these data are only useful for passenger flow analysis throughout a whole month/year/...

+
+

Relationships

+

Relationships between entities can be represented with or without history. They are realized using the link attribute type. Depending on whether the history is important, they can be configured as either the above-mentioned plain data or observations.

+

Relationships can contain additional data, if that fits the modelling needs of your use case.

+

Very useful for:

+
    +
  • any kind of relationship between entities
  • +
  • linking dynamic entities to entities with static data
  • +
+
+

As our example so far contains only one entity, we currently have no need for relationships. +However, if we wanted to track the different bus drivers driving individual buses, +relationships would come in quite handy. +The bus driver is a separate entity, and can drive multiple buses during the day. +The current bus driver will be represented as an observation link between the bus and the driver, +as can be seen in the resulting configuration.
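A sketch of how such a link could be declared as an attribute of the bus entity, assuming a driver entity type exists. The names, values and file layout are illustrative; the DB entities chapter describes the authoritative format.

```yaml
driver:
  name: Current driver
  type: observations
  data_type: link<driver>
  history_params:
    max_age: 90d
    post_validity: 30m
```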

+
+

Continue to ...

+

Now that you have an understanding of the data model and the types of attributes, +you might want to check out the details of DB configuration, +where you will find the parameters for each attribute type +and the data types supported by the platform.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/extending/index.html b/extending/index.html new file mode 100644 index 00000000..a70f6c03 --- /dev/null +++ b/extending/index.html @@ -0,0 +1,1799 @@ + + + + + + + + + + + + + + + + + + + + + + + + Extending Documentation - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Extending Documentation

+

This page provides the basic info on where to start with writing documentation. +If you feel lost at any point, please check out the documentation of MkDocs +and Material for MkDocs, with which this documentation is built.

+

Project layout

+
mkdocs.yml            # The configuration file.
+docs/
+    index.md          # The documentation homepage.
+    gen_ref_pages.py  # Script for generating the code reference.
+    ...               # Other markdown pages, images and other files.
+
+

The docs/ folder contains all source Markdown files for the documentation.

+

You can find all documentation settings in mkdocs.yml. See the nav section for mapping of the left navigation tab and the Markdown files.

+

Local instance & commands

+

To see the changes made to the documentation page locally, a local instance of mkdocs is required. +You can install all the required packages using:

+
pip install -r requirements.doc.txt
+
+

After installing, you can use the following mkdocs commands:

+
    +
  • mkdocs serve - Start the live-reloading docs server.
  • +
  • mkdocs build - Build the documentation site.
  • +
  • mkdocs -h - Print help message and exit.
  • +
+

Text formatting and other features

+

As the entire documentation is written in Markdown, all base Markdown syntax is supported. This means headings, bold text, italics, inline code, tables and many more.

+

This set of options can be further extended, if you ever find the need. See the possibilities in the Material theme reference.

+
+Some of the enabled extensions +
    +
  • This is an example of a collapsable admonition with a custom title.
  • +
  • Admonitions are one of the enabled markdown extensions, an another example would be the TODO checklist syntax:
      +
    • Unchecked item
    • +
    • Checked item
    • +
    +
  • +
  • See the markdown_extensions section in mkdocs.yml for all enabled extensions.
  • +
+
+ +

To reference an anchor within a page, such as a heading, use a Markdown link to the specific anchor, for example: Commands. +If you're not sure which identifier to use, you can look at a heading's anchor by clicking the heading in your Web browser, either in the text itself, or in the table of contents. +If the URL is https://example.com/some/page/#anchor-name then you know that this item is possible to link to with [<displayed text>](#anchor-name). (Tip taken from mkdocstrings)

+

To make a reference to another page within the documentation, use the path to the Markdown source file, followed by the desired anchor. For example, this link was created as [link](index.md#repository-structure).

+

When making references to the generated Code Reference, there are two options. Links can be made either using the standard Markdown syntax, where some reverse-engineering of the generated files is required, or, with the support of mkdocstrings, using the [example][full.path.to.object] syntax. A real link like this can be for example this one to the Platform Model Specification.

+

Code reference generation

+

Code reference is generated using mkdocstrings and the Automatic code reference pages recipe from their documentation. +The generation of pages is done using the docs/gen_ref_pages.py script. The script is a slight modification of what is recommended within the mentioned recipe.
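Each generated page essentially contains a single mkdocstrings identifier, so a hypothetical reference/common/config.md produced by the script would hold roughly:

```markdown
# ::: dp3.common.config
```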

+

Mkdocstrings itself enables generating code documentation from its docstrings using a path.to.object syntax. +Here is an example of documentation for dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.register method:

+ + +
+ + + +

+ register + + +

+
register(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)
+
+ +
+ +

Registers passed timeseries hook to be called during snapshot creation.

+

Binds hook to specified entity_type and attr_type (though same hook can be bound +multiple times). +If entity_type and attr_type do not specify a valid timeseries attribute, +a ValueError is raised.

+ +

Parameters:

Name | Type | Description | Default
hook | Callable[[str, str, list[dict]], list[DataPointTask]] | hook callable should expect entity_type, attr_type and attribute history as arguments and return a list of Task objects. | required
entity_type | str | specifies entity type | required
attr_type | str | specifies attribute type | required
+ +
+ +

There are additional options that can be specified, which affect the way the documentation is presented. For more on these options, see here.

+

Even if you create a duplicate code reference description, the mkdocstring-style link still leads to the code reference, as you can see here.

+

Deployment

+

The documentation is updated and deployed automatically with each push to selected branches thanks to the configured GitHub Action, which can be found in: .github/workflows/deploy.yml.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/gen_ref_pages.py b/gen_ref_pages.py new file mode 100644 index 00000000..d08a371e --- /dev/null +++ b/gen_ref_pages.py @@ -0,0 +1,41 @@ +"""Generate the code reference pages.""" + +from pathlib import Path + +import mkdocs_gen_files + +nav = mkdocs_gen_files.Nav() + +EXCLUDE = [ + ".core_modules", + "template", +] + +EXCLUDE = [f.format(p) for p in EXCLUDE for f in ["dp3/{}/*", "dp3/{}/**/*"]] + +for path in sorted(Path("dp3").rglob("*.py")): + if any(path.match(excluded) for excluded in EXCLUDE): + continue + + module_path = path.with_suffix("") + doc_path = path.relative_to("dp3").with_suffix(".md") + full_doc_path = Path("reference", doc_path) + parts = list(module_path.parts) + + if parts[-1] == "__init__": + parts = parts[:-1] + doc_path = doc_path.with_name("index.md") + full_doc_path = full_doc_path.with_name("index.md") + elif parts[-1] == "__main__": + continue + + nav[parts] = doc_path.as_posix() + + with mkdocs_gen_files.open(full_doc_path, "w") as fd: + identifier = ".".join(parts) + print(f"# ::: {identifier}", file=fd) + + mkdocs_gen_files.set_edit_path(full_doc_path, path) + +with mkdocs_gen_files.open("reference/SUMMARY.md", "w") as nav_file: + nav_file.writelines(nav.build_literate_nav()) diff --git a/img/architecture.png b/img/architecture.png new file mode 100644 index 00000000..e59fce86 Binary files /dev/null and b/img/architecture.png differ diff --git a/img/architecture.svg b/img/architecture.svg new file mode 100644 index 00000000..df066596 --- /dev/null +++ b/img/architecture.svg @@ -0,0 +1,592 @@ + + + +Existing softwareDP3 components Application-specific componentsYMLEntity configurationData-pointsPrimary modulesDatabaseAPIData access(web, API)TaskdistributionProcessingcoreCoremodulesSecondarymodulesParallel worker processes diff --git a/img/dataflow.svg b/img/dataflow.svg new file mode 100644 index 00000000..3853832a --- /dev/null +++ b/img/dataflow.svg @@ -0,0 +1,1202 @@ + + + +TasksDPsUpdatesDPsTasksProfilesProcessingcoreCore & secondarymodulesWeb & APIaccessProfile History + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +DP History + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +ArchivationCreateSnapshotsProfile Snapshots + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Scheduler + + + diff --git a/index.html b/index.html new file mode 100644 index 00000000..2a2d9cff --- /dev/null +++ b/index.html @@ -0,0 +1,1575 @@ + + + + + + + + + + + + + + + + + + + + + + + + DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Dynamic Profile Processing Platform (DP³)

+

DP³ is a platform that helps to keep a database of information (attributes) about individual entities (designed for IP addresses and other network identifiers, but it may be anything) when the data constantly changes in time.

+

DP³ doesn't do much by itself, it must be supplemented by application-specific modules providing +and processing data.

+

This is a basis of CESNET's "Asset Discovery Classification and Tagging" (ADiCT) project, +focused on discovery and classification of network devices, +but the platform itself is general and should be usable for any kind of data.

+

For an introduction to how it works, please check out the architecture, data-model and database config pages.

+

Then you should be able to create a DP³ app using the provided setup utility as described in the install page and start tinkering!

+

Repository structure

+
    +
  • dp3 - Python package containing code of the processing core and the API
  • +
  • config - default/example configuration
  • +
  • install - deployment configuration
  • +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/install/index.html b/install/index.html new file mode 100644 index 00000000..ebe54866 --- /dev/null +++ b/install/index.html @@ -0,0 +1,1882 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Install - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+ +
+ + + +
+
+ + + + + + + +

Installing DP³ platform

+

When talking about installing the DP³ platform, a distinction must be made between installing +for platform development, installing for application development (i.e. platform usage) +and installing for application and platform deployment. +We will cover all three cases separately.

+

Installing for application development

+

Pre-requisites: Python 3.9 or higher, pip (with virtualenv installed), git, Docker and Docker Compose.

+

Create a virtualenv and install the DP³ platform using:

+
python3 -m venv venv  # (1)!
+source venv/bin/activate  # (2)!
+python -m pip install --upgrade pip  # (3)!
+pip install git+https://github.com/CESNET/dp3.git@new_dp3#egg=dp3
+
+
    +
  1. We recommend using virtual environment. If you are not familiar with it, please read + this first. + Note for Windows: If python3 does not work, try py -3 or python instead.
  2. +
  3. Windows: venv/Scripts/activate.bat
  4. +
  5. We require pip>=21.0.1 for the pyproject.toml support. + If your pip is up-to-date, you can skip this step.
  6. +
+

Creating a DP³ application

+

To create a new DP³ application we will use the included dp3-setup utility. Run:

+
dp3-setup <application_directory> <your_application_name> 
+
+

So for example, to create an application called my_app in the current directory, run:

+
dp3-setup . my_app
+
+

This produces the following directory structure: +

 📂 .
+ ├── 📠config  # (1)! 
+ │   ├── 📄 api.yml
+ │   ├── 📄 control.yml
+ │   ├── 📄 database.yml
+ │   ├── 📠db_entities # (2)!
+ │   ├── 📄 event_logging.yml
+ │   ├── 📄 history_manager.yml
+ │   ├── 📠modules # (3)!
+ │   ├── 📄 processing_core.yml
+ │   ├── 📄 README.md
+ │   └── 📄 snapshots.yml
+ ├── 📠docker # (4)!
+ │   ├── 📠python
+ │   └── 📠rabbitmq
+ ├── 📄 docker-compose.app.yml
+ ├── 📄 docker-compose.yml
+ ├── 📠modules # (5)!
+ │   └── 📄 test_module.py
+ ├── 📄 README.md # (6)!
+ └── 📄 requirements.txt
+

+
    +
  1. The config directory contains the configuration files for the DP³ platform. For more details, + please check out the configuration documentation.
  2. +
  3. The config/db_entities directory contains the database entities of the application. + This defines the data model of your application. + For more details, you may want to check out the data model and the + DB entities documentation.
  4. +
  5. The config/modules directory is where you can place the configuration specific to your modules.
  6. +
  7. The docker directory contains the Dockerfiles for the RabbitMQ and python images, + tailored to your application.
  8. +
  9. The modules directory contains the modules of your application. To get started, + a single module called test_module is included. + For more details, please check out the Modules page.
  10. +
  11. The README.md file contains some instructions to get started. + Edit it to your liking.
  12. +
+

Running the Application

+

To run the application, we first need to setup the other services the platform depends on, +such as the MongoDB database, the RabbitMQ message distribution and the Redis database. +This can be done using the supplied docker-compose.yml file. Simply run:

+
docker compose up -d --build  # (1)!
+
+
    +
  1. The -d flag runs the services in the background, so you can continue working in the same terminal. + The --build flag forces Docker to rebuild the images, so you can be sure you are running the latest version. + If you want to run the services in the foreground, omit the -d flag.
  2. +
+
+Docker Compose basics +

The state of running containers can be checked using:

+
docker compose ps
+
+

which will display the state of running processes. The logs of the services can be displayed using:

+
docker compose logs
+
+

which will display the logs of all services, or:

+
docker compose logs <service name>
+
+

which will display only the logs of the given service. + (In this case, the services are rabbitmq, mongo, mongo_express, and redis)

+

We can now focus on running the platform and developing or testing. After you are done, simply run:

+
docker compose down
+
+

which will stop and remove all containers, networks and volumes created by docker compose up.

+
+

There are two main ways to run the application itself. The first is a little more hands-on and allows easier debugging. There are two main kinds of processes in the application: the API and the worker processes.

+

To run the API, simply run:

+
APP_NAME=my_app CONF_DIR=config api
+
+

The starting configuration sets only a single worker process, which you can run using:

+
worker my_app config 0     
+
+

The second way is to use the docker-compose.app.yml file, which runs the API and the worker processes +in separate containers. To run the API, simply run:

+
docker compose -f docker-compose.app.yml up -d --build
+
+

Either way, to test that everything is running properly, you can run: +

curl -X 'GET' 'http://localhost:5000/' \
+     -H 'Accept: application/json' 
+

+

Which should return a JSON response with the following content: +

{
+   "detail": "It works!"
+}
+

+

You are now ready to start developing your application!

+

Installing for platform development

+

Pre-requisites: Python 3.9 or higher, pip (with virtualenv installed), git, Docker and Docker Compose.

+

Pull the repository and install using:

+
git clone --branch new_dp3 git@github.com:CESNET/dp3.git dp3 
+cd dp3
+python3 -m venv venv  # (1)!
+source venv/bin/activate  # (2)!
+python -m pip install --upgrade pip  # (3)!
+pip install --editable ".[dev]" # (4)!
+pre-commit install  # (5)!
+
+
    +
  1. We recommend using virtual environment. If you are not familiar with it, please read + this first. + Note for Windows: If python3 does not work, try py -3 or python instead.
  2. +
  3. Windows: venv/Scripts/activate.bat
  4. +
  5. We require pip>=21.0.1 for the pyproject.toml support. + If your pip is up-to-date, you can skip this step.
  6. +
  7. Install using editable mode to allow for changes in the code to be reflected in the installed package. + Also, install the development dependencies, such as pre-commit and mkdocs.
  8. +
  9. Install pre-commit hooks to automatically format and lint the code before committing.
  10. +
+

With the dependencies, the pre-commit package is installed. +You can verify the installation using pre-commit --version. +Pre-commit is used to automatically unify code formatting and perform code linting. +The hooks configured in .pre-commit-config.yaml should now run automatically on every commit.

+

In case you want to make sure, you can run pre-commit run --all-files to see it in action.

+

Running the dependencies and the platform

+

The DP³ platform is now installed and ready for development. +To run it, we first need to set up the other services the platform depends on, +such as the MongoDB database, the RabbitMQ message distribution and the Redis database. +This can be done using the supplied docker-compose.yml file. Simply run:

+
docker compose up -d --build  # (1)!
+
+
    +
  1. The -d flag runs the services in the background, so you can continue working in the same terminal. + The --build flag forces Docker to rebuild the images, so you can be sure you are running the latest version. + If you want to run the services in the foreground, omit the -d flag.
  2. +
+
+On Docker Compose +

Docker Compose can be installed as a standalone (older v1) or as a plugin (v2), +the only difference is when executing the command:

+
+

Note that Compose standalone uses the dash compose syntax instead of the current standard syntax (space compose). For example: type docker-compose up when using Compose standalone, instead of docker compose up.

+
+

This documentation uses the v2 syntax, so if you have the standalone version installed, adjust accordingly.

+
+

After the first compose up command, the images for RabbitMQ, MongoDB and Redis will be downloaded, +their images will be built according to the configuration and all three services will be started. +On subsequent runs, Docker will use the cache, so if the configuration does not change, the download +and build steps will not be repeated.

+

The configuration is taken implicitly from the docker-compose.yml file in the current directory. +The docker-compose.yml configuration contains the configuration for the services, +as well as a testing setup of the DP³ platform itself. +The full configuration is in tests/test_config. +The setup includes one worker process and one API process to handle requests. +The API process is exposed on port 5000, so you can send requests to it using curl or from your browser:

+

curl -X 'GET' 'http://localhost:5000/' \
+     -H 'Accept: application/json' 
+
+
curl -X 'POST' 'http://localhost:5000/datapoints' \
+     -H 'Content-Type: application/json' \
+     --data '[{"type": "test_entity_type", "id": "abc", "attr": "test_attr_int", "v": 123, "t1": "2023-07-01T12:00:00", "t2": "2023-07-01T13:00:00"}]'
+

+
+Docker Compose basics +

The state of running containers can be checked using:

+
docker compose ps
+
+

which will display the state of running processes. The logs of the services can be displayed using:

+
docker compose logs
+
+

which will display the logs of all services, or:

+
docker compose logs <service name>
+
+

which will display only the logs of the given service. + (In this case, the services are rabbitmq, mongo, redis, receiver_api and worker)

+

We can now focus on running the platform and developing or testing. After you are done, simply run:

+
docker compose down
+
+

which will stop and remove all containers, networks and volumes created by docker compose up.

+
+

Testing

+

With the testing platform setup running, we can now run tests. +Tests are run using the unittest framework and can be run using:

+
python -m unittest discover \
+       -s tests/test_common \
+       -v
+CONF_DIR=tests/test_config \
+python -m unittest discover \
+       -s tests/test_api \
+       -v
+
+

Documentation

+

For extending of this documentation, please refer to the Extending page.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/modules/index.html b/modules/index.html new file mode 100644 index 00000000..4f8ebb01 --- /dev/null +++ b/modules/index.html @@ -0,0 +1,1966 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Modules - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Modules

+

DP³ enables its users to create custom modules to perform application-specific data analysis. Modules are loaded using a plugin-like architecture and can influence the data flow from the very first moment of handling a data-point push request.

+

As described in the Architecture page, DP³ uses a categorization of modules +into primary and secondary modules. +The distinction between primary and secondary modules is such that primary modules +send data-points into the system using the HTTP API, while secondary modules react +to the data present in the system, e.g.: altering the data-flow in an application-specific manner, +deriving additional data based on incoming data-points or performing data correlation on entity snapshots.

+

This page covers the DP³ API for secondary modules, +for primary module implementation, the API documentation may be useful, +also feel free to check out the dummy_sender script in /scripts/dummy_sender.py.

+

Creating a new Module

+

First, make a directory that will contain all modules of the application. +For example, let's assume that the directory will be called /modules/.

+

As mentioned in the Processing core configuration page, +the modules directory must be specified in the modules_dir configuration option. +Let's create the main module file now - assuming the module will be called my_awesome_module, +create a file /modules/my_awesome_module.py.

+

Finally, to make the processing core load the module, add the module name to the enabled_modules +configuration option, e.g.:

+
Enabling the module in processing_core.yml
modules_dir: "/modules/"
+enabled_modules:
+  - "my_awesome_module"
+
+

Here is a basic skeleton for the module file:

+
import logging
+
+from dp3.common.base_module import BaseModule
+from dp3.common.config import PlatformConfig
+from dp3.common.callback_registrar import CallbackRegistrar
+
+
+class MyAwesomeModule(BaseModule):
+    def __init__(self,
+        _platform_config: PlatformConfig, 
+        _module_config: dict, 
+        _registrar: CallbackRegistrar
+    ):
+        self.log = logging.getLogger("MyAwesomeModule")
+
+

All modules must subclass the BaseModule class. +If a class does not subclass the BaseModule class, +it will not be loaded and activated by the main DP³ worker. +The declaration of BaseModule is as follows:

+
class BaseModule(ABC):
+
+    @abstractmethod
+    def __init__(
+        self, 
+        platform_config: PlatformConfig, 
+        module_config: dict, 
+        registrar: CallbackRegistrar
+    ):
+        pass
+
+

At initialization, each module receives a PlatformConfig, a module_config dictionary and a CallbackRegistrar. For the module to do anything, it must read the provided configuration from platform_config and module_config and register callbacks to perform data analysis using the registrar object. Let's go through them one at a time.

+

Configuration

+

PlatformConfig contains the entire DP³ platform configuration, which includes the application name, worker counts, which worker process the module is running in, and a ModelSpec which contains the entity specification.

+

If you want to create configuration specific to the module itself, create a .yml configuration file +named as the module itself inside the modules/ folder, +as described in the modules configuration page. +This configuration will be then loaded into the module_config dictionary for convenience.
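For example, the skeleton above could read a hypothetical option from its my_awesome_module.yml and keep the entity specification around for later use (a minimal sketch, assuming the ModelSpec is exposed as platform_config.model_spec and the option name is made up):

```python
import logging

from dp3.common.base_module import BaseModule
from dp3.common.callback_registrar import CallbackRegistrar
from dp3.common.config import PlatformConfig


class MyAwesomeModule(BaseModule):
    def __init__(
        self,
        _platform_config: PlatformConfig,
        _module_config: dict,
        _registrar: CallbackRegistrar,
    ):
        self.log = logging.getLogger("MyAwesomeModule")
        # "required_confidence" is a made-up option name with a module-chosen default
        self.required_confidence = _module_config.get("required_confidence", 0.5)
        # entity/attribute specification (assumed attribute name)
        self.model_spec = _platform_config.model_spec
```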

+

Callbacks

+

The registrar: CallbackRegistrar object +provides the API to register callbacks to be called during the data processing.

+

CRON Trigger Periodic Callbacks

+

For callbacks that need to be called periodically, +the scheduler_register +is used. +The specific times the callback will be called are defined using the CRON schedule expressions. +Here is a simplified example from the HistoryManager module:

+
registrar.scheduler_register(
+    self.delete_old_dps, minute="*/10"  # (1)!
+)
+registrar.scheduler_register(
+    self.archive_old_dps, minute=0, hour=2  # (2)!
+)  
+
+
    +
  1. At every 10th minute.
  2. +
  3. Every day at 2 AM.
  4. +
+

By default, the callback will receive no arguments, but you can pass static arguments for every call +using the func_args and func_kwargs keyword arguments. +The function return value will always be ignored.
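A minimal sketch of passing such static arguments (the callback and its arguments are made up):

```python
def dump_stats(label: str, verbose: bool = False):
    ...  # collect and log some statistics

registrar.scheduler_register(
    dump_stats,
    func_args=["hourly"],
    func_kwargs={"verbose": True},
    minute=0,  # run at the start of every hour
)
```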

+

The complete documentation can be found at the +scheduler_register page. +As DP³ utilizes the APScheduler package internally +to realize this functionality, specifically the CronTrigger, feel free to check their documentation for more details.

+

Callbacks within processing

+

There are a number of possible places to register callback functions during data-point processing.

+

Task on_task_start hook

+

A hook will be called on task processing start. +The callback is registered using the +register_task_hook method. +Required signature is Callable[[DataPointTask], Any], as the return value is ignored. +It may be useful for implementing custom statistics.

+
def task_hook(task: DataPointTask):
+    print(task.etype)
+
+registrar.register_task_hook("on_task_start", task_hook)
+
+

Entity allow_entity_creation hook

+

Receives eid and Task, may prevent entity record creation (by returning False). +The callback is registered using the +register_entity_hook method. +Required signature is Callable[[str, DataPointTask], bool].

+
def entity_creation(eid: str, task: DataPointTask) -> bool:
+    return eid.startswith("1")
+
+registrar.register_entity_hook(
+    "allow_entity_creation", entity_creation, "test_entity_type"
+)
+
+

Entity on_entity_creation hook

+

Receives eid and Task, may return new DataPointTasks.

+

The callback is registered using the +register_entity_hook method. +Required signature is Callable[[str, DataPointTask], list[DataPointTask]].

+
def processing_function(eid: str, task: DataPointTask) -> list[DataPointTask]:
+    output = does_work(task)
+    return [DataPointTask(
+        model_spec=task.model_spec,
+        etype="mac",
+        eid=eid,
+        data_points=[{
+            "etype": "test_entity_type",
+            "eid": eid,
+            "attr": "derived_on_creation",
+            "src": "secondary/derived_on_creation",
+            "v": output
+        }]
+    )]
+
+registrar.register_entity_hook(
+    "on_entity_creation", processing_function, "test_entity_type"
+)
+
+

Attribute hooks

+

There are register points for all attribute types: +on_new_plain, on_new_observation, on_new_ts_chunk.

+

Callbacks are registered using the register_attr_hook method. The callback always receives eid, attribute and Task, and may return new DataPointTasks. The required signature is Callable[[str, DataPointBase], list[DataPointTask]].

+
def attr_hook(eid: str, dp: DataPointBase) -> list[DataPointTask]:
+    ...
+    return []
+
+registrar.register_attr_hook(
+    "on_new_observation", attr_hook, "test_entity_type", "test_attr_type",
+)
+
+

Timeseries hook

+

Timeseries hooks are run before snapshot creation, and allow processing the accumulated timeseries data into observations / plain attributes to be accessed in snapshots.

+

Callbacks are registered using the +register_timeseries_hook method. +The expected callback signature is Callable[[str, str, list[dict]], list[DataPointTask]], +as the callback should expect entity_type, attr_type and attribute history as arguments +and return a list of DataPointTask objects.

+
def timeseries_hook(
+        entity_type: str, attr_type: str, attr_history: list[dict]
+) -> list[DataPointTask]:
+    ...
+    return []
+
+
+registrar.register_timeseries_hook(
+    timeseries_hook, "test_entity_type", "test_attr_type",
+)
+
+

Correlation callbacks

+

Correlation callbacks are called during snapshot creation, and allow performing analysis on the data of the snapshot.

+

The register_correlation_hook +method expects a callable with the following signature: +Callable[[str, dict], None], where the first argument is the entity type, and the second is a dict +containing the current values of the entity and its linked entities.

+

As correlation hooks can depend on each other, the hook inputs and outputs must be specified +using the depends_on and may_change arguments. Both arguments are lists of lists of strings, +where each list of strings is a path from the specified entity type to individual attributes (even on linked entities). +For example, if the entity type is test_entity_type, and the hook depends on the attribute test_attr_type1, +the path is simply [["test_attr_type1"]]. If the hook depends on the attribute test_attr_type1 +of an entity linked using test_attr_link, the path will be [["test_attr_link", "test_attr_type1"]].

+
def correlation_hook(entity_type: str, values: dict):
+    ...
+
+registrar.register_correlation_hook(
+    correlation_hook, "test_entity_type", [["test_attr_type1"]], [["test_attr_type2"]]
+)
+
+

The order of running callbacks is determined automatically, based on the dependencies. +If there is a cycle in the dependencies, a ValueError will be raised at registration. +Also, if the provided dependency / output paths are invalid, a ValueError will be raised.

+

Running module code in a separate thread

+

The module is free to run its own code in separate threads or processes. To synchronize such code with the platform, use the start() and stop() methods of the BaseModule class. The start() method is called after the platform is initialized, and the stop() method is called before the platform is shut down.

+
class MyModule(BaseModule):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self._thread = None
+        self._stop_event = threading.Event()
+        self.log = logging.getLogger("MyModule")
+
+    def start(self):
+        self._thread = threading.Thread(target=self._run, daemon=True)
+        self._thread.start()
+
+    def stop(self):
+        self._stop_event.set()
+        self._thread.join()
+
+    def _run(self):
+        while not self._stop_event.is_set():
+            self.log.info("Hello world!")
+            time.sleep(1)
+
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/objects.inv b/objects.inv new file mode 100644 index 00000000..c3217f16 --- /dev/null +++ b/objects.inv @@ -0,0 +1,15 @@ +# Sphinx inventory version 2 +# Project: DP3 +# Version: 0.0.0 +# The remainder of this file is compressed using zlib. +xÚÅ›Msœ8†ïû+\µ{]R[¹í-k»âC\q<®ìQ¥A= 1 , ¯ýï·%¾g!@àK2õóê[Ý °üs 3šË#W²ùEŽœ?Ë`Wýù§ AÄ ïðú5Ï3€(– +ÄUþþ÷¡ÈBóìê¯+…ð©!ê“?ýþÇÕŸ¿±ü³¶L9+èÙÕ·ƒÿ¸x.ùJ•7O  §Œ”Åå`ÍlSl˜³ Íãšá^±†;PVßîÄökF‹Q]ä²aòìG.öeI+&¸6ÿÝf¯&TÊ™<–“„GÑà öHMá1Xpóð­¡ŽTo:4 Šn½/N¥"ø7‰8g« +ì©'>d*VïD€Ìy&A/8H¤Ë ^¶œ$ÜšË;E8Œ¸?ÅÛ˜}Ã-okͪèÖš_”?iRÀwq‡MæâýÃjà[øIÂi²VEí }‘(Nð>a8T9ÇnËÌÎw_TÖöV«gÑC]µ‘þ´òf,z·9 +w@u¼>BøüXÝr¨û<­]† åê:ðR€T8±cœ+89n…àb©œà…²OéªÄE3}Ê+Áóª¤À„x…ÐröÛV‚·\ݵ+ Ú A¨gi‰£5s`&¸AVH1“¾¸(Í3›Šo&ÅÝ•¼ÖÛ«²ôOœ+‡a×ņ¸Mëm„˜‘wªÒ/ÆU(T»ÕKÐvÅ0¸Œ0öÝàÿ°.ºh/xÑÅ÷k10÷/˜àÖ—'4¢ Åjx: P¬ÑŸ¶9ÈBž¦|¨Ê›§…=[e¡Ýª.5hhoåé=88&c‚ƒà)‘j8ÞpF~ßãt}5g¬\¹*h*VµM <í<wæ_]¹Ã_!C\è‰v­¾iA,ÑwMh}\êr~À5ýS?à‡¤Êexh}wfzB¶sÈpöúÛSôÕÌ]ûÖÒ–»Œ¨,Çåm$øß7¼ÁÞ™Ž ¤¢BõÕ,.Ï—`Cš${Š_™¬TØ{ò¼¼2¸®.=vE»×“F Ñ£Ñíd4ëY¹Ö#ŠÊg“ÓÝ\¹r·?DÛx¸¢¬šíícôC.šÅWl‰ês’Rî™w1 8<ƨv‡Ê¾ø&¡t¸4ÚÆiÄ"g6_Ø Zå?ñØYØÞ{K¨¹™ ôÔ‹ôºÈ™ ”‘v:Ì°ẊY’"­m?oÑ7ÖNjˆ­W\ng¬;lü`ufaä_æzÈ‘f,á‹œ ®³r¤ºnΛ¹ì^jvxD›bÀ6•ª=ëØN¢ßÚ/²ëbû%·ž¶.F@&ÂCc66È.H–…qçJ§£tº• ã ë+“»År™–ªÒ +ö9«K š›aMO¸cØ´4ΠÌôµÍwiª£†NU&qö l,_;­¹¸YI{nuŒ[Vk<}Ô–³ :§ºu”,°&"°W§)6 v]Ô`]&‘×0e5§Æ¿øÞ/V€,Ò…Üú¸ +]â¢YðTÙö½ÝÔd—ã,"Ï0ºnFH¸Jp«—4‚ù öؘ߬ú}¤éæØÉHƹlˆ3Là ‡ÐÄ_c0NÒ +A­Çpèê¼ê'Ï¿¤ŽÉ8/%é=[Û’ŒNî)½Íî+_âÂ@Õ·Ï œ-e¿ãRÙâ™éÄBϘ&<¿ÜGÈŒ-ýB½ÑÊ#í¦3<^Å-Q)Cv’R“¢“ºŽFãèXž0Âòu$ô +íµdÀc‡À÷µb=™ò)ãVjúA)†A²sÖm¡³^ƒÖW(#Œ¼Ží‹jþ'´ÄS[ï!/2eÍc‰Ž¤¯°îè÷VlD +ŠZ£«¥ãÎÛÜƪC¯µº‰÷¥:ûÿúë¦ô”HH1~ñ#¢©:åûd<ë\©ú7Û Ã“uçm“KÂzb8EDÇ< çÍmr6H{Ö¬^ð¸oÛ¹±¤« ³Y%Æ×ѺU1^7ïE"}`±¼Þ²n”ðR”Ÿ•¬Q …ÝŽÛ¾xw_`I îØJûYѹIý‹«ÁU~öU’.;‚2¹‡]‹=_TK©#Ow<à-Ï{Ó'? ö§èüHz±$eL'[LêØ~h.–J鳋¶Xç4Ån{÷t±Xý0¯ùÐú4Ïs’ý»Î÷WÖÓ™t¹QûV\hEÖ]ÕÕ ÆY´ž$¼)Șc–b±šù`´÷œ‰Ø‚;ïkn¹2B¦Êž¿ö"•‰Š·ĉ"hΓ2.Ï1Ó\ozÕçÊ®~Äé×ͳ¿¬vô0¼éé5>ÿ#îqÙëöˆÞ¦C‚>Y_"¾N/Å%pü BÁ›÷½wb¤¼•µùÆÅÒwkn¬Ñe—dE°…VH“°0;Vœa31J›âsÍ×EÿŸ'<Òï!ɧ„gšæ­ß¾#pa<)5h^þÍôë¡ñ¾¨^!qàš¹ ˜ÇÚ7}Áói½žÞH¬·ª°% +\¨[~õ9møjG´iÏmGÇqÔ–ÈLyÙ}Eq—÷ÝW”åÝ¿x/ZÃÄ%²6ïî"×¾v ¦!Õ‡gÚ #>µ2Á¿‘˜þ:jU©— +˜24ÆÀ<ò}!Õ—ûXù šoÔk¿LH¿j£N›Ïîšzp~è_ÿŠXM•ÙýÃjÏb{ázH¥š±-ÌWÍ‹ô²µ‡­Ô˜ã„,›êxÌ×Úf>Vbt=©»/»»Ùìÿx7ŸË \ No newline at end of file diff --git a/reference/SUMMARY/index.html b/reference/SUMMARY/index.html new file mode 100644 index 00000000..acc1e566 --- /dev/null +++ b/reference/SUMMARY/index.html @@ -0,0 +1,1555 @@ + + + + + + + + + + + + + + + + + + + + SUMMARY - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + + + + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/index.html b/reference/api/index.html new file mode 100644 index 00000000..d78fa129 --- /dev/null +++ b/reference/api/index.html @@ -0,0 +1,1539 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + api - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api + + +

+ +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/internal/config/index.html b/reference/api/internal/config/index.html new file mode 100644 index 00000000..faea9795 --- /dev/null +++ b/reference/api/internal/config/index.html @@ -0,0 +1,1637 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + config - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.internal.config + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ ConfigEnv + + +

+ + +
+

+ Bases: BaseModel

+ + +

Configuration environment variables container

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/internal/dp_logger/index.html b/reference/api/internal/dp_logger/index.html new file mode 100644 index 00000000..ead6246f --- /dev/null +++ b/reference/api/internal/dp_logger/index.html @@ -0,0 +1,1880 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + dp_logger - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.internal.dp_logger + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ DPLogger + + +

+
DPLogger(config: dict)
+
+ +
+ + +

Datapoint logger

+

Logs good/bad datapoints into file for further analysis. +They are logged in JSON format. +Bad datapoints are logged together with their error message.

+

Logging may be disabled in api.yml configuration file:

+
# ...
+datapoint_logger:
+  good_log: false
+  bad_log: false
+# ...
+
+ + +
+ Source code in dp3/api/internal/dp_logger.py +
29
+30
+31
+32
+33
+34
+35
+36
+37
+38
def __init__(self, config: dict):
+    if not config:
+        config = {}
+
+    good_log_file = config.get("good_log", False)
+    bad_log_file = config.get("bad_log", False)
+
+    # Setup loggers
+    self._good_logger = self.setup_logger("GOOD", good_log_file)
+    self._bad_logger = self.setup_logger("BAD", bad_log_file)
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ setup_logger + + +

+
setup_logger(name: str, log_file: str)
+
+ +
+ +

Creates new logger instance with log_file as target

+ +
+ Source code in dp3/api/internal/dp_logger.py +
40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
def setup_logger(self, name: str, log_file: str):
+    """Creates new logger instance with `log_file` as target"""
+    # Create log handler
+    if log_file:
+        parent_path = pathlib.Path(log_file).parent
+        if not parent_path.exists():
+            raise FileNotFoundError(
+                f"The directory {parent_path} does not exist,"
+                " check the configured path or create the directory."
+            )
+        log_handler = logging.FileHandler(log_file)
+        log_handler.setFormatter(self.LOG_FORMATTER)
+    else:
+        log_handler = logging.NullHandler()
+
+    # Get logger instance
+    logger = logging.getLogger(name)
+    logger.addHandler(log_handler)
+    logger.setLevel(logging.INFO)
+
+    return logger
+
+
+
+ +
+ +
+ + + +

+ log_good + + +

+
log_good(dps: list[DataPointBase], src: str = UNKNOWN_SRC_MSG)
+
+ +
+ +

Logs good datapoints

+

Datapoints are logged one-by-one in processed form. Source should be the IP address of the incoming request.

+ +
+ Source code in dp3/api/internal/dp_logger.py +
62
+63
+64
+65
+66
+67
+68
+69
def log_good(self, dps: list[DataPointBase], src: str = UNKNOWN_SRC_MSG):
+    """Logs good datapoints
+
+    Datapoints are logged one-by-one in processed form.
+    Source should be IP address of incoming request.
+    """
+    for dp in dps:
+        self._good_logger.info(dp.json(), extra={"src": src})
+
+
+
+ +
+ +
+ + + +

+ log_bad + + +

+
log_bad(request_body: str, validation_error_msg: str, src: str = UNKNOWN_SRC_MSG)
+
+ +
+ +

Logs bad datapoints including the validation error message

+

Whole request body is logged at once (a JSON string is expected). Source should be the IP address of the incoming request.

+ +
+ Source code in dp3/api/internal/dp_logger.py +
71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
def log_bad(self, request_body: str, validation_error_msg: str, src: str = UNKNOWN_SRC_MSG):
+    """Logs bad datapoints including the validation error message
+
+    Whole request body is logged at once (JSON string is expected).
+    Source should be IP address of incoming request.
+    """
+    # Remove newlines from request body
+    request_body = request_body.replace("\n", " ")
+
+    # Prepend error message with tabs
+    validation_error_msg = validation_error_msg.replace("\n", "\n\t")
+
+    self._bad_logger.info(f"{request_body}\n\t{validation_error_msg}", extra={"src": src})
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/internal/entity_response_models/index.html b/reference/api/internal/entity_response_models/index.html new file mode 100644 index 00000000..6e54f85b --- /dev/null +++ b/reference/api/internal/entity_response_models/index.html @@ -0,0 +1,1864 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + entity_response_models - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.internal.entity_response_models + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ EntityState + + +

+ + +
+

+ Bases: BaseModel

+ + +

Entity specification and current state

+

Merges (some) data from DP3's EntitySpec and state information from the database. Provides an estimated count of master records in the database.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ EntityEidList + + +

+ + +
+

+ Bases: BaseModel

+ + +

List of entity eids and their data based on latest snapshot

+

Includes timestamp of latest snapshot creation.

+

Data does not include the history of observations attributes and timeseries.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ EntityEidData + + +

+ + +
+

+ Bases: BaseModel

+ + +

Data of entity eid

+

Includes all snapshots and master record.

+

empty signals whether this eid contains any data.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ EntityEidAttrValueOrHistory + + +

+ + +
+

+ Bases: BaseModel

+ + +

Value and/or history of entity attribute for given eid

+

Depends on attribute type:
- plain: just (current) value
- observations: (current) value and history stored in master record (optionally filtered)
- timeseries: just history stored in master record (optionally filtered)

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ EntityEidAttrValue + + +

+ + +
+

+ Bases: BaseModel

+ + +

Value of entity attribute for given eid

+

The value is fetched from master record.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/internal/helpers/index.html b/reference/api/internal/helpers/index.html new file mode 100644 index 00000000..773881f0 --- /dev/null +++ b/reference/api/internal/helpers/index.html @@ -0,0 +1,1670 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + helpers - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.internal.helpers + + +

+ +
+ + + +
+ + + + + + + + + +
+ + + +

+ api_to_dp3_datapoint + + +

+
api_to_dp3_datapoint(api_dp_values: dict) -> DataPointBase
+
+ +
+ +

Converts API datapoint values to DP3 datapoint

+

If the etype-attr pair doesn't exist in the DP3 config, raises ValueError. If the values are not valid, raises pydantic's ValidationError.

+ +
+ Source code in dp3/api/internal/helpers.py +
 5
+ 6
+ 7
+ 8
+ 9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+25
+26
+27
+28
def api_to_dp3_datapoint(api_dp_values: dict) -> DataPointBase:
+    """Converts API datapoint values to DP3 datapoint
+
+    If etype-attr pair doesn't exist in DP3 config, raises `ValueError`.
+    If values are not valid, raises pydantic's ValidationError.
+    """
+    etype = api_dp_values["type"]
+    attr = api_dp_values["attr"]
+
+    # Convert to DP3 datapoint format
+    dp3_dp_values = api_dp_values
+    dp3_dp_values["etype"] = etype
+    dp3_dp_values["eid"] = api_dp_values["id"]
+
+    # Get attribute-specific model
+    try:
+        model = MODEL_SPEC.attr(etype, attr).dp_model
+    except KeyError as e:
+        raise ValueError(f"Combination of type '{etype}' and attr '{attr}' doesn't exist") from e
+
+    # Parse using the model
+    # This may raise pydantic's ValidationError, but that's intentional (to get
+    # a JSON-serializable trace as a response from API).
+    return model.parse_obj(dp3_dp_values)
+
+
+
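A usage sketch (the entity type "device" and attribute "label" below are hypothetical and must be defined in your DP3 model for the conversion to succeed):

from pydantic import ValidationError

from dp3.api.internal.helpers import api_to_dp3_datapoint

# Hypothetical datapoint as received by the API
api_dp = {
    "type": "device",
    "id": "printer-42",
    "attr": "label",
    "v": "3rd floor printer",
    "src": "inventory_script",
}

try:
    dp3_dp = api_to_dp3_datapoint(api_dp)  # instance of an attribute-specific DataPoint model
except ValueError:
    pass  # unknown type/attr combination
except ValidationError:
    pass  # value failed attribute-specific validation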
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/internal/index.html b/reference/api/internal/index.html new file mode 100644 index 00000000..f8762f56 --- /dev/null +++ b/reference/api/internal/index.html @@ -0,0 +1,1541 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + internal - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.internal + + +

+ +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/internal/models/index.html b/reference/api/internal/models/index.html new file mode 100644 index 00000000..7330d68f --- /dev/null +++ b/reference/api/internal/models/index.html @@ -0,0 +1,1644 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + models - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.internal.models + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ DataPoint + + +

+ + +
+

+ Bases: BaseModel

+ + +

Data-point for API

+

Contains a single raw data value received on the API. This is a generic class for plain, observations and timeseries datapoints.

+

Provides front line of validation for this data value.

+

This differs slightly from DP3's DataPoint in the naming of attributes, due to historical reasons.

+

After validation of this schema, the datapoint is validated using an attribute-specific validator to ensure full compliance.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/internal/response_models/index.html b/reference/api/internal/response_models/index.html new file mode 100644 index 00000000..0b3aef43 --- /dev/null +++ b/reference/api/internal/response_models/index.html @@ -0,0 +1,1753 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + response_models - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.internal.response_models + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ HealthCheckResponse + + +

+ + +
+

+ Bases: BaseModel

+ + +

Healthcheck endpoint response

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ SuccessResponse + + +

+ + +
+

+ Bases: BaseModel

+ + +

Generic success response

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ RequestValidationError + + +

+
RequestValidationError(loc, msg)
+
+ +
+

+ Bases: HTTPException

+ + +

HTTP exception wrapper to simplify path and query validation

+ + +
+ Source code in dp3/api/internal/response_models.py +
def __init__(self, loc, msg):
+    super().__init__(422, [{"loc": loc, "msg": msg, "type": "value_error"}])
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/main/index.html b/reference/api/main/index.html new file mode 100644 index 00000000..755f781d --- /dev/null +++ b/reference/api/main/index.html @@ -0,0 +1,1549 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + main - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.main + + +

+ +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/routers/control/index.html b/reference/api/routers/control/index.html new file mode 100644 index 00000000..99380b13 --- /dev/null +++ b/reference/api/routers/control/index.html @@ -0,0 +1,1634 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + control - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.routers.control + + +

+ +
+ + + +
+ + + + + + + + + +
+ + + +

+ execute_action + + + + async + + +

+
execute_action(action: ControlAction) -> SuccessResponse
+
+ +
+ +

Sends the given action into execution queue.

+ +
+ Source code in dp3/api/routers/control.py +
10
+11
+12
+13
+14
@router.get("/{action}")
+async def execute_action(action: ControlAction) -> SuccessResponse:
+    """Sends the given action into execution queue."""
+    CONTROL_WRITER.put_task(ControlMessage(action=action))
+    return SuccessResponse(detail="Action sent.")
+
+
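A client-side sketch (the base URL, the /control prefix and the action name are assumptions -- the action must be one of the allowed_actions configured for the platform):

import requests

# Hypothetical deployment URL; the action must match a configured ControlAction value
resp = requests.get("http://localhost:5000/control/make_snapshots")
print(resp.status_code, resp.json())  # 200 {"detail": "Action sent."} on success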
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/routers/entity/index.html b/reference/api/routers/entity/index.html new file mode 100644 index 00000000..83019171 --- /dev/null +++ b/reference/api/routers/entity/index.html @@ -0,0 +1,2048 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + entity - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.routers.entity + + +

+ +
+ + + +
+ + + + + + + + + +
+ + + +

+ check_entity + + + + async + + +

+
check_entity(entity: str)
+
+ +
+ +

Middleware to check entity existence

+ +
+ Source code in dp3/api/routers/entity.py +
23
+24
+25
+26
+27
async def check_entity(entity: str):
+    """Middleware to check entity existence"""
+    if entity not in MODEL_SPEC.entities:
+        raise RequestValidationError(["path", "entity"], f"Entity '{entity}' doesn't exist")
+    return entity
+
+
+
+ +
+ +
+ + + +

+ list_entity_eids + + + + async + + +

+
list_entity_eids(entity: str, skip: NonNegativeInt = 0, limit: PositiveInt = 20) -> EntityEidList
+
+ +
+ +

List latest snapshots of all ids present in database under entity.

+

Contains only latest snapshot.

+

Uses pagination.

+ +
+ Source code in dp3/api/routers/entity.py +
33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
@router.get("/{entity}")
+async def list_entity_eids(
+    entity: str, skip: NonNegativeInt = 0, limit: PositiveInt = 20
+) -> EntityEidList:
+    """List latest snapshots of all `id`s present in database under `entity`.
+
+    Contains only latest snapshot.
+
+    Uses pagination.
+    """
+    cursor = DB.get_latest_snapshots(entity).skip(skip).limit(limit)
+
+    time_created = None
+
+    # Remove _id field
+    result = list(cursor)
+    for r in result:
+        time_created = r["_time_created"]
+        del r["_time_created"]
+        del r["_id"]
+
+    return EntityEidList(time_created=time_created, data=result)
+
+
+
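A client-side sketch of paging through the results (the base URL and the entity name "ip" are hypothetical; the prefix under which this router is mounted depends on the application):

import requests

url = "http://localhost:5000/entity/ip"  # hypothetical
first_page = requests.get(url, params={"skip": 0, "limit": 20}).json()
second_page = requests.get(url, params={"skip": 20, "limit": 20}).json()

# Each response contains "time_created" of the latest snapshot batch
# and "data" with one record per eid.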
+ +
+ +
+ + + +

+ get_eid_data + + + + async + + +

+
get_eid_data(entity: str, eid: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None) -> EntityEidData
+
+ +
+ +

Get data of entity's eid.

+

Contains all snapshots and the master record. Snapshots are ordered by ascending creation time.

+ +
+ Source code in dp3/api/routers/entity.py +
57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
+90
+91
@router.get("/{entity}/{eid}")
+async def get_eid_data(
+    entity: str, eid: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None
+) -> EntityEidData:
+    """Get data of `entity`'s `eid`.
+
+    Contains all snapshots and master record.
+    Snapshots are ordered by ascending creation time.
+    """
+    # Get master record
+    # TODO: This is probably not the most efficient way. Maybe gather only
+    # plain data from master record and then call `get_timeseries_history`
+    # for timeseries.
+    master_record = DB.get_master_record(entity, eid)
+    if "_id" in master_record:
+        del master_record["_id"]
+    if "#hash" in master_record:
+        del master_record["#hash"]
+
+    # Get filtered timeseries data
+    for attr in master_record:
+        if MODEL_SPEC.attr(entity, attr).t == AttrType.TIMESERIES:
+            master_record[attr] = DB.get_timeseries_history(
+                entity, attr, eid, t1=date_from, t2=date_to
+            )
+
+    # Get snapshots
+    snapshots = list(DB.get_snapshots(entity, eid, t1=date_from, t2=date_to))
+    for s in snapshots:
+        del s["_id"]
+
+    # Whether this eid contains any data
+    empty = not master_record and len(snapshots) == 0
+
+    return EntityEidData(empty=empty, master_record=master_record, snapshots=snapshots)
+
+
+
+ +
+ +
+ + + +

+ get_eid_attr_value + + + + async + + +

+
get_eid_attr_value(entity: str, eid: str, attr: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None) -> EntityEidAttrValueOrHistory
+
+ +
+ +

Get attribute value

+

Value is either of:
- current value: in case of plain attribute
- current value and history: in case of observation attribute
- history: in case of timeseries attribute

+ +
+ Source code in dp3/api/routers/entity.py +
@router.get("/{entity}/{eid}/get/{attr}")
+async def get_eid_attr_value(
+    entity: str,
+    eid: str,
+    attr: str,
+    date_from: Optional[datetime] = None,
+    date_to: Optional[datetime] = None,
+) -> EntityEidAttrValueOrHistory:
+    """Get attribute value
+
+    Value is either of:
+    - current value: in case of plain attribute
+    - current value and history: in case of observation attribute
+    - history: in case of timeseries attribute
+    """
+    # Check if attribute exists
+    if attr not in MODEL_SPEC.attribs(entity):
+        raise RequestValidationError(["path", "attr"], f"Attribute '{attr}' doesn't exist")
+
+    value_or_history = DB.get_value_or_history(entity, attr, eid, t1=date_from, t2=date_to)
+
+    return EntityEidAttrValueOrHistory(
+        attr_type=MODEL_SPEC.attr(entity, attr).t, **value_or_history
+    )
+
+
+
+ +
+ +
+ + + +

+ set_eid_attr_value + + + + async + + +

+
set_eid_attr_value(entity: str, eid: str, attr: str, body: EntityEidAttrValue, request: Request) -> SuccessResponse
+
+ +
+ +

Set current value of attribute

+

Internally, this just creates a datapoint for the specified attribute and value.

+

This endpoint is meant for editable plain attributes -- for direct user edit on DP3 web UI.

+ +
+ Source code in dp3/api/routers/entity.py +
@router.post("/{entity}/{eid}/set/{attr}")
+async def set_eid_attr_value(
+    entity: str, eid: str, attr: str, body: EntityEidAttrValue, request: Request
+) -> SuccessResponse:
+    """Set current value of attribute
+
+    Internally just creates datapoint for specified attribute and value.
+
+    This endpoint is meant for `editable` plain attributes -- for direct user edit on DP3 web UI.
+    """
+    # Check if attribute exists
+    if attr not in MODEL_SPEC.attribs(entity):
+        raise RequestValidationError(["path", "attr"], f"Attribute '{attr}' doesn't exist")
+
+    # Construct datapoint
+    try:
+        dp = DataPoint(
+            type=entity,
+            id=eid,
+            attr=attr,
+            v=body.value,
+            t1=datetime.now(),
+            src=f"{request.client.host} via API",
+        )
+        dp3_dp = api_to_dp3_datapoint(dp.dict())
+    except ValidationError as e:
+        raise RequestValidationError(["body", "value"], e.errors()[0]["msg"]) from e
+
+    # This shouldn't fail
+    task = DataPointTask(model_spec=MODEL_SPEC, etype=entity, eid=eid, data_points=[dp3_dp])
+
+    # Push tasks to task queue
+    TASK_WRITER.put_task(task, False)
+
+    # Datapoints from this endpoint are intentionally not logged using `DPLogger`.
+    # If for some reason, in the future, they need to be, just copy code from data ingestion
+    # endpoint.
+
+    return SuccessResponse()
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/routers/index.html b/reference/api/routers/index.html new file mode 100644 index 00000000..1030b31c --- /dev/null +++ b/reference/api/routers/index.html @@ -0,0 +1,1541 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + routers - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.routers + + +

+ +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/api/routers/root/index.html b/reference/api/routers/root/index.html new file mode 100644 index 00000000..265e1ade --- /dev/null +++ b/reference/api/routers/root/index.html @@ -0,0 +1,1830 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + root - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.api.routers.root + + +

+ +
+ + + +
+ + + + + + + + + +
+ + + +

+ health_check + + + + async + + +

+
health_check() -> HealthCheckResponse
+
+ +
+ +

Health check

+

Returns simple 'It works!' response.

+ +
+ Source code in dp3/api/routers/root.py +
21
+22
+23
+24
+25
+26
+27
@router.get("/", tags=["Health"])
+async def health_check() -> HealthCheckResponse:
+    """Health check
+
+    Returns simple 'It works!' response.
+    """
+    return HealthCheckResponse()
+
+
+
+ +
+ +
+ + + +

+ insert_datapoints + + + + async + + +

+
insert_datapoints(dps: list[DataPoint], request: Request) -> SuccessResponse
+
+ +
+ +

Insert datapoints

+

Validates the datapoints and pushes them into the task queue, so they are processed by one of the DP3 workers.

+ +
+ Source code in dp3/api/routers/root.py +
30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
@router.post(DATAPOINTS_INGESTION_URL_PATH, tags=["Data ingestion"])
+async def insert_datapoints(dps: list[DataPoint], request: Request) -> SuccessResponse:
+    """Insert datapoints
+
+    Validates and pushes datapoints into task queue, so they are processed by one of DP3 workers.
+    """
+    # Convert to DP3 datapoints
+    # This should not fail as all datapoints are already validated
+    dp3_dps = [api_to_dp3_datapoint(dp.dict()) for dp in dps]
+
+    # Group datapoints by etype-eid
+    tasks_dps = defaultdict(list)
+    for dp in dp3_dps:
+        key = (dp.etype, dp.eid)
+        tasks_dps[key].append(dp)
+
+    # Create tasks
+    tasks = []
+    for k in tasks_dps:
+        etype, eid = k
+
+        # This shouldn't fail either
+        tasks.append(
+            DataPointTask(model_spec=MODEL_SPEC, etype=etype, eid=eid, data_points=tasks_dps[k])
+        )
+
+    # Push tasks to task queue
+    for task in tasks:
+        TASK_WRITER.put_task(task, False)
+
+    # Log datapoints
+    DP_LOGGER.log_good(dp3_dps, src=request.client.host)
+
+    return SuccessResponse()
+
+
+
+ +
+ +
+ + + +

+ list_entities + + + + async + + +

+
list_entities() -> dict[str, EntityState]
+
+ +
+ +

List entities

+

Returns a dictionary containing all configured entities -- their simplified configuration and current state information.

+ +
+ Source code in dp3/api/routers/root.py +
66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
@router.get("/entities", tags=["Entity"])
+async def list_entities() -> dict[str, EntityState]:
+    """List entities
+
+    Returns dictionary containing all entities configured -- their simplified configuration
+    and current state information.
+    """
+    entities = {}
+
+    for e_id in MODEL_SPEC.entities:
+        entity_spec = MODEL_SPEC.entity(e_id)
+        entities[e_id] = {
+            "id": e_id,
+            "name": entity_spec.name,
+            "attribs": MODEL_SPEC.attribs(e_id),
+            "eid_estimate_count": DB.estimate_count_eids(e_id),
+        }
+
+    return entities
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/bin/api/index.html b/reference/bin/api/index.html new file mode 100644 index 00000000..2d38c48e --- /dev/null +++ b/reference/bin/api/index.html @@ -0,0 +1,1551 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + api - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.bin.api + + +

+ +
+ +

Run the DP3 API using uvicorn.

+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/bin/index.html b/reference/bin/index.html new file mode 100644 index 00000000..8927872e --- /dev/null +++ b/reference/bin/index.html @@ -0,0 +1,1539 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + bin - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.bin + + +

+ +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/bin/setup/index.html b/reference/bin/setup/index.html new file mode 100644 index 00000000..7e682b1d --- /dev/null +++ b/reference/bin/setup/index.html @@ -0,0 +1,1650 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + setup - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.bin.setup + + +

+ +
+ +

DP3 Setup Script for creating a DP3 application.

+ + + +
+ + + + + + + + + +
+ + + +

+ replace_template + + +

+
replace_template(directory: Path, template: str, replace_with: str)
+
+ +
+ +

Replace all occurrences of template with the given text.

+ +
+ Source code in dp3/bin/setup.py +
 7
+ 8
+ 9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+20
+21
def replace_template(directory: Path, template: str, replace_with: str):
+    """Replace all occurrences of `template` with the given text."""
+    for file in directory.rglob("*"):
+        if file.is_file():
+            try:
+                with file.open("r+") as f:
+                    contents = f.read()
+                    contents = contents.replace(template, replace_with)
+                    f.seek(0)
+                    f.write(contents)
+                    f.truncate()
+            except UnicodeDecodeError:
+                pass
+            except PermissionError:
+                pass
+
+
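A usage sketch (the target directory and the placeholder string are illustrative; the real setup script uses its own template markers):

from pathlib import Path

from dp3.bin.setup import replace_template

# Rewrite every readable text file under ./my_app, replacing the placeholder
replace_template(Path("my_app"), "__APP_NAME__", "my_app")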
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/bin/worker/index.html b/reference/bin/worker/index.html new file mode 100644 index 00000000..bd34f95a --- /dev/null +++ b/reference/bin/worker/index.html @@ -0,0 +1,1549 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + worker - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.bin.worker + + +

+ +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/attrspec/index.html b/reference/common/attrspec/index.html new file mode 100644 index 00000000..8ed1df4d --- /dev/null +++ b/reference/common/attrspec/index.html @@ -0,0 +1,2393 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + attrspec - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.attrspec + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ AttrType + + +

+ + +
+

+ Bases: Flag

+ + +

Enum of attribute types

+

PLAIN = 1
OBSERVATIONS = 2
TIMESERIES = 4

+ + + + + +
+ + + + + + + + + +
+ + + +

+ from_str + + + + classmethod + + +

+
from_str(type_str: str)
+
+ +
+ +

Convert string representation like "plain" to AttrType.

+ +
+ Source code in dp3/common/attrspec.py +
58
+59
+60
+61
+62
+63
+64
+65
+66
@classmethod
+def from_str(cls, type_str: str):
+    """
+    Convert string representation like "plain" to AttrType.
+    """
+    try:
+        return cls(cls[type_str.upper()])
+    except Exception as e:
+        raise AttrTypeError(f"Invalid attribute type '{type_str}'") from e
+
+
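For example:

from dp3.common.attrspec import AttrType

assert AttrType.from_str("plain") == AttrType.PLAIN
assert AttrType.from_str("Observations") == AttrType.OBSERVATIONS  # lookup is case-insensitive
# AttrType.from_str("unknown")  -> raises AttrTypeError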
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+ ObservationsHistoryParams + + +

+ + +
+

+ Bases: BaseModel

+ + +

History parameters field of observations attribute

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ TimeseriesTSParams + + +

+ + +
+

+ Bases: BaseModel

+ + +

Timeseries parameters field of timeseries attribute

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ TimeseriesSeries + + +

+ + +
+

+ Bases: BaseModel

+ + +

Series of timeseries attribute

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ AttrSpecGeneric + + +

+ + +
+

+ Bases: BaseModel

+ + +

Base of attribute specification

+

Parent of other AttrSpec classes.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ AttrSpecClassic + + +

+ + +
+

+ Bases: AttrSpecGeneric

+ + +

Parent of non-timeseries AttrSpec classes.

+ + + + + +
+ + + + + + + +
+ + + +

+ is_relation + + + + property + + +

+
is_relation: bool
+
+ +
+ +

Returns whether specified attribute is a link.

+
+ +
+ +
+ + + +

+ relation_to + + + + property + + +

+
relation_to: str
+
+ +
+ +

Returns linked entity id. Raises ValueError if attribute is not a link.

+
+ +
+ + + + + +
+ +
+ +
+ +
+ + + +

+ AttrSpecPlain + + +

+
AttrSpecPlain(**data)
+
+ +
+

+ Bases: AttrSpecClassic

+ + +

Plain attribute specification

+ + +
+ Source code in dp3/common/attrspec.py +
def __init__(self, **data):
+    super().__init__(**data)
+
+    self._dp_model = create_model(
+        f"DataPointPlain_{self.id}",
+        __base__=DataPointPlainBase,
+        v=(self.data_type.data_type, ...),
+    )
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ AttrSpecObservations + + +

+
AttrSpecObservations(**data)
+
+ +
+

+ Bases: AttrSpecClassic

+ + +

Observations attribute specification

+ + +
+ Source code in dp3/common/attrspec.py +
def __init__(self, **data):
+    super().__init__(**data)
+
+    value_validator = self.data_type.data_type
+
+    self._dp_model = create_model(
+        f"DataPointObservations_{self.id}",
+        __base__=DataPointObservationsBase,
+        v=(value_validator, ...),
+    )
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ AttrSpecTimeseries + + +

+
AttrSpecTimeseries(**data)
+
+ +
+

+ Bases: AttrSpecGeneric

+ + +

Timeseries attribute specification

+ + +
+ Source code in dp3/common/attrspec.py +
def __init__(self, **data):
+    super().__init__(**data)
+
+    # Typing of `v` field
+    dp_value_typing = {}
+    for s in self.series:
+        data_type = self.series[s].data_type.data_type
+        dp_value_typing[s] = ((list[data_type]), ...)
+
+    # Validators
+    dp_validators = {
+        "v_validator": dp_ts_v_validator,
+    }
+
+    # Add root validator
+    if self.timeseries_type == "regular":
+        dp_validators["root_validator"] = dp_ts_root_validator_regular_wrapper(
+            self.timeseries_params.time_step
+        )
+    elif self.timeseries_type == "irregular":
+        dp_validators["root_validator"] = dp_ts_root_validator_irregular
+    elif self.timeseries_type == "irregular_intervals":
+        dp_validators["root_validator"] = dp_ts_root_validator_irregular_intervals
+
+    self._dp_model = create_model(
+        f"DataPointTimeseries_{self.id}",
+        __base__=DataPointTimeseriesBase,
+        __validators__=dp_validators,
+        v=(create_model(f"DataPointTimeseriesValue_{self.id}", **dp_value_typing), ...),
+    )
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + +
+ + + +

+ AttrSpec + + +

+
AttrSpec(id: str, spec: dict[str, Any]) -> AttrSpecType
+
+ +
+ +

Factory for AttrSpec classes

+ +
+ Source code in dp3/common/attrspec.py +
def AttrSpec(id: str, spec: dict[str, Any]) -> AttrSpecType:
+    """Factory for `AttrSpec` classes"""
+
+    attr_type = AttrType.from_str(spec.get("type"))
+    subclasses = {
+        AttrType.PLAIN: AttrSpecPlain,
+        AttrType.OBSERVATIONS: AttrSpecObservations,
+        AttrType.TIMESERIES: AttrSpecTimeseries,
+    }
+    return subclasses[attr_type](id=id, **spec)
+
+
+
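A usage sketch (the attribute id and the spec keys "name" and "data_type" are assumptions based on the usual entity configuration format):

from dp3.common.attrspec import AttrSpec, AttrSpecPlain

spec = AttrSpec("note", {"type": "plain", "name": "Note", "data_type": "string"})
assert isinstance(spec, AttrSpecPlain)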
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/base_attrs/index.html b/reference/common/base_attrs/index.html new file mode 100644 index 00000000..54988581 --- /dev/null +++ b/reference/common/base_attrs/index.html @@ -0,0 +1,1549 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + base_attrs - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.base_attrs + + +

+ +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/base_module/index.html b/reference/common/base_module/index.html new file mode 100644 index 00000000..26402411 --- /dev/null +++ b/reference/common/base_module/index.html @@ -0,0 +1,1843 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + base_module - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.base_module + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ BaseModule + + +

+
BaseModule(platform_config: PlatformConfig, module_config: dict, registrar: CallbackRegistrar)
+
+ +
+

+ Bases: ABC

+ + +

Abstract class for platform modules. Every module must inherit this abstract class to be loaded automatically!

+ + +

Initialize the module and register callbacks.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
platform_config + PlatformConfig + +
+

Platform configuration class

+
+
+ required +
module_config + dict + +
+

Configuration of the module, +equivalent of platform_config.config.get("modules.<module_name>")

+
+
+ required +
registrar + CallbackRegistrar + +
+

A callback / hook registration interface

+
+
+ required +
+ +
+ Source code in dp3/common/base_module.py +
13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
@abstractmethod
+def __init__(
+    self, platform_config: PlatformConfig, module_config: dict, registrar: CallbackRegistrar
+):
+    """Initialize the module and register callbacks.
+    Args:
+        platform_config: Platform configuration class
+        module_config: Configuration of the module,
+            equivalent of `platform_config.config.get("modules.<module_name>")`
+        registrar: A callback / hook registration interface
+    """
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ start + + +

+
start() -> None
+
+ +
+ +

Run the module - used to run own thread if needed.

+

Called after initialization, may be used to create and run a separate +thread if needed by the module. Do nothing unless overridden.

+ +
+ Source code in dp3/common/base_module.py +
25
+26
+27
+28
+29
+30
+31
+32
def start(self) -> None:
+    """
+    Run the module - used to run own thread if needed.
+
+    Called after initialization, may be used to create and run a separate
+    thread if needed by the module. Do nothing unless overridden.
+    """
+    return None
+
+
+
+ +
+ +
+ + + +

+ stop + + +

+
stop() -> None
+
+ +
+ +

Stop the module - used to stop own thread.

+

Called before program exit, may be used to finalize and stop the +separate thread if it is used. Do nothing unless overridden.

+ +
+ Source code in dp3/common/base_module.py +
34
+35
+36
+37
+38
+39
+40
+41
def stop(self) -> None:
+    """
+    Stop the module - used to stop own thread.
+
+    Called before program exit, may be used to finalize and stop the
+    separate thread if it is used. Do nothing unless overridden.
+    """
+    return None
+
+
+
+ +
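A minimal sketch of a custom module (the module option and the periodic job are illustrative):

from dp3.common.base_module import BaseModule
from dp3.common.callback_registrar import CallbackRegistrar
from dp3.common.config import PlatformConfig


class MyModule(BaseModule):
    def __init__(
        self, platform_config: PlatformConfig, module_config: dict, registrar: CallbackRegistrar
    ):
        self.interval = module_config.get("interval", 10)  # hypothetical module option
        # Run a periodic job every 10 minutes via the scheduler
        registrar.scheduler_register(self.periodic_job, minute="*/10")

    def periodic_job(self):
        ...  # module-specific work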
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/callback_registrar/index.html b/reference/common/callback_registrar/index.html new file mode 100644 index 00000000..f1425cab --- /dev/null +++ b/reference/common/callback_registrar/index.html @@ -0,0 +1,2527 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + callback_registrar - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.callback_registrar + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ CallbackRegistrar + + +

+
CallbackRegistrar(scheduler: Scheduler, task_executor: TaskExecutor, snap_shooter: SnapShooter)
+
+ +
+ + +

Interface for callback registration.

+ + +
+ Source code in dp3/common/callback_registrar.py +
12
+13
+14
+15
+16
+17
def __init__(
+    self, scheduler: Scheduler, task_executor: TaskExecutor, snap_shooter: SnapShooter
+):
+    self._scheduler = scheduler
+    self._task_executor = task_executor
+    self._snap_shooter = snap_shooter
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ scheduler_register + + +

+
scheduler_register(func: Callable, *, func_args: Union[list, tuple] = None, func_kwargs: dict = None, year: Union[int, str] = None, month: Union[int, str] = None, day: Union[int, str] = None, week: Union[int, str] = None, day_of_week: Union[int, str] = None, hour: Union[int, str] = None, minute: Union[int, str] = None, second: Union[int, str] = None, timezone: str = 'UTC') -> int
+
+ +
+ +

Register a function to be run at specified times.

+

Pass a cron-like specification of when the function should be called; see the docs of apscheduler.triggers.cron for details.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
func + Callable + +
+

function or method to be called

+
+
+ required +
func_args + Union[list, tuple] + +
+

list of positional arguments to call func with

+
+
+ None +
func_kwargs + dict + +
+

dict of keyword arguments to call func with

+
+
+ None +
year + Union[int, str] + +
+

4-digit year

+
+
+ None +
month + Union[int, str] + +
+

month (1-12)

+
+
+ None +
day + Union[int, str] + +
+

day of month (1-31)

+
+
+ None +
week + Union[int, str] + +
+

ISO week (1-53)

+
+
+ None +
day_of_week + Union[int, str] + +
+

number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)

+
+
+ None +
hour + Union[int, str] + +
+

hour (0-23)

+
+
+ None +
minute + Union[int, str] + +
+

minute (0-59)

+
+
+ None +
second + Union[int, str] + +
+

second (0-59)

+
+
+ None +
timezone + str + +
+

Timezone for time specification (default is UTC).

+
+
+ 'UTC' +
+ +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ int + +
+

job ID

+
+
+ +
+ Source code in dp3/common/callback_registrar.py +
19
+20
+21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
def scheduler_register(
+    self,
+    func: Callable,
+    *,
+    func_args: Union[list, tuple] = None,
+    func_kwargs: dict = None,
+    year: Union[int, str] = None,
+    month: Union[int, str] = None,
+    day: Union[int, str] = None,
+    week: Union[int, str] = None,
+    day_of_week: Union[int, str] = None,
+    hour: Union[int, str] = None,
+    minute: Union[int, str] = None,
+    second: Union[int, str] = None,
+    timezone: str = "UTC",
+) -> int:
+    """
+    Register a function to be run at specified times.
+
+    Pass cron-like specification of when the function should be called,
+    see [docs](https://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html)
+    of apscheduler.triggers.cron for details.
+    `
+    Args:
+        func: function or method to be called
+        func_args: list of positional arguments to call func with
+        func_kwargs: dict of keyword arguments to call func with
+        year: 4-digit year
+        month: month (1-12)
+        day: day of month (1-31)
+        week: ISO week (1-53)
+        day_of_week: number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)
+        hour: hour (0-23)
+        minute: minute (0-59)
+        second: second (0-59)
+        timezone: Timezone for time specification (default is UTC).
+    Returns:
+         job ID
+    """
+    return self._scheduler.register(
+        func,
+        func_args=func_args,
+        func_kwargs=func_kwargs,
+        year=year,
+        month=month,
+        day=day,
+        week=week,
+        day_of_week=day_of_week,
+        hour=hour,
+        minute=minute,
+        second=second,
+        timezone=timezone,
+    )
+
+
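For example, with registrar being the CallbackRegistrar instance passed to a module (the callbacks below are illustrative):

def cleanup():
    ...  # illustrative job

def report(period: str):
    ...  # illustrative job

# Run cleanup() every day at 03:30 UTC
job_id = registrar.scheduler_register(cleanup, hour=3, minute=30)

# Run report(period="weekly") every Monday at 06:00
registrar.scheduler_register(
    report, func_kwargs={"period": "weekly"}, day_of_week="mon", hour=6, minute=0
)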
+
+ +
+ +
+ + + +

+ register_task_hook + + +

+
register_task_hook(hook_type: str, hook: Callable)
+
+ +
+ +

Registers one of available task hooks

+

See: TaskGenericHooksContainer +in task_hooks.py

+ +
+ Source code in dp3/common/callback_registrar.py +
73
+74
+75
+76
+77
+78
+79
def register_task_hook(self, hook_type: str, hook: Callable):
+    """Registers one of available task hooks
+
+    See: [`TaskGenericHooksContainer`][dp3.task_processing.task_hooks.TaskGenericHooksContainer]
+    in `task_hooks.py`
+    """
+    self._task_executor.register_task_hook(hook_type, hook)
+
+
+
+ +
+ +
+ + + +

+ register_entity_hook + + +

+
register_entity_hook(hook_type: str, hook: Callable, entity: str)
+
+ +
+ +

Registers one of available task entity hooks

+

See: TaskEntityHooksContainer +in task_hooks.py

+ +
+ Source code in dp3/common/callback_registrar.py +
81
+82
+83
+84
+85
+86
+87
def register_entity_hook(self, hook_type: str, hook: Callable, entity: str):
+    """Registers one of available task entity hooks
+
+    See: [`TaskEntityHooksContainer`][dp3.task_processing.task_hooks.TaskEntityHooksContainer]
+    in `task_hooks.py`
+    """
+    self._task_executor.register_entity_hook(hook_type, hook, entity)
+
+
+
+ +
+ +
+ + + +

+ register_attr_hook + + +

+
register_attr_hook(hook_type: str, hook: Callable, entity: str, attr: str)
+
+ +
+ +

Registers one of available task attribute hooks

+

See: TaskAttrHooksContainer +in task_hooks.py

+ +
+ Source code in dp3/common/callback_registrar.py +
89
+90
+91
+92
+93
+94
+95
def register_attr_hook(self, hook_type: str, hook: Callable, entity: str, attr: str):
+    """Registers one of available task attribute hooks
+
+    See: [`TaskAttrHooksContainer`][dp3.task_processing.task_hooks.TaskAttrHooksContainer]
+    in `task_hooks.py`
+    """
+    self._task_executor.register_attr_hook(hook_type, hook, entity, attr)
+
+
+
+ +
+ +
+ + + +

+ register_timeseries_hook + + +

+
register_timeseries_hook(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)
+
+ +
+ +

Registers passed timeseries hook to be called during snapshot creation.

+

Binds hook to specified entity_type and attr_type (though same hook can be bound +multiple times).

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
hook + Callable[[str, str, list[dict]], list[DataPointTask]] + +
+

hook callable should expect entity_type, attr_type and attribute +history as arguments and return a list of DataPointTask objects.

+
+
+ required +
entity_type + str + +
+

specifies entity type

+
+
+ required +
attr_type + str + +
+

specifies attribute type

+
+
+ required +
+ +

Raises:

+ + + + + + + + + + + + + +
TypeDescription
+ ValueError + +
+

If entity_type and attr_type do not specify a valid timeseries attribute, +a ValueError is raised.

+
+
+ +
+ Source code in dp3/common/callback_registrar.py +
def register_timeseries_hook(
+    self,
+    hook: Callable[[str, str, list[dict]], list[DataPointTask]],
+    entity_type: str,
+    attr_type: str,
+):
+    """
+    Registers passed timeseries hook to be called during snapshot creation.
+
+    Binds hook to specified `entity_type` and `attr_type` (though same hook can be bound
+    multiple times).
+
+    Args:
+        hook: `hook` callable should expect entity_type, attr_type and attribute
+            history as arguments and return a list of `DataPointTask` objects.
+        entity_type: specifies entity type
+        attr_type: specifies attribute type
+
+    Raises:
+        ValueError: If entity_type and attr_type do not specify a valid timeseries attribute,
+            a ValueError is raised.
+    """
+    self._snap_shooter.register_timeseries_hook(hook, entity_type, attr_type)
+
+
+
+ +
+ +
+ + + +

+ register_correlation_hook + + +

+
register_correlation_hook(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]])
+
+ +
+ +

Registers passed hook to be called during snapshot creation.

+

Binds hook to specified entity_type (though same hook can be bound multiple times).

+

entity_type and attribute specifications are validated, ValueError is raised on failure.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
hook + Callable[[str, dict], None] + +
+

hook callable should expect entity type as str +and its current values, including linked entities, as dict

+
+
+ required +
entity_type + str + +
+

specifies entity type

+
+
+ required +
depends_on + list[list[str]] + +
+

each item should specify an attribute that is depended on +in the form of a path from the specified entity_type to individual attributes +(even on linked entities).

+
+
+ required +
may_change + list[list[str]] + +
+

each item should specify an attribute that hook may change. +specification format is identical to depends_on.

+
+
+ required +
+ +

Raises:

+ + + + + + + + + + + + + +
TypeDescription
+ ValueError + +
+

On failure of specification validation.

+
+
+ +
+ Source code in dp3/common/callback_registrar.py +
def register_correlation_hook(
+    self,
+    hook: Callable[[str, dict], None],
+    entity_type: str,
+    depends_on: list[list[str]],
+    may_change: list[list[str]],
+):
+    """
+    Registers passed hook to be called during snapshot creation.
+
+    Binds hook to specified entity_type (though same hook can be bound multiple times).
+
+    `entity_type` and attribute specifications are validated, `ValueError` is raised on failure.
+
+    Args:
+        hook: `hook` callable should expect entity type as str
+            and its current values, including linked entities, as dict
+        entity_type: specifies entity type
+        depends_on: each item should specify an attribute that is depended on
+            in the form of a path from the specified entity_type to individual attributes
+            (even on linked entities).
+        may_change: each item should specify an attribute that `hook` may change.
+            specification format is identical to `depends_on`.
+
+    Raises:
+        ValueError: On failure of specification validation.
+    """
+    self._snap_shooter.register_correlation_hook(hook, entity_type, depends_on, may_change)
+
+
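A sketch of registering a correlation hook (the entity "ip" and its attributes open_ports and risk are hypothetical; registrar is the CallbackRegistrar passed to a module):

def estimate_risk(entity_type: str, values: dict):
    # `values` holds the entity's current attribute values, including linked entities
    values["risk"] = 1.0 if values.get("open_ports") else 0.1

registrar.register_correlation_hook(
    estimate_risk,
    entity_type="ip",
    depends_on=[["open_ports"]],
    may_change=[["risk"]],
)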
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/config/index.html b/reference/common/config/index.html new file mode 100644 index 00000000..80efcdce --- /dev/null +++ b/reference/common/config/index.html @@ -0,0 +1,2490 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + config - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.config + + +

+ +
+ +

Platform config file reader and config model.

+ + + +
+ + + + + + + + +
+ + + +

+ HierarchicalDict + + +

+ + +
+

+ Bases: dict

+ + +

Extension of built-in dict that simplifies working with a nested hierarchy of dicts.

+ + + + + +
+ + + + + + + + + +
+ + + +

+ get + + +

+
get(key, default = NoDefault)
+
+ +
+ +

Key may be a path (in dot notation) into a hierarchy of dicts. For example + dictionary.get('abc.x.y') +is equivalent to + dictionary['abc']['x']['y'].

+

Returns self[key], or default if the key is not found.

+ +
+ Source code in dp3/common/config.py +
31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
def get(self, key, default=NoDefault):
+    """
+    Key may be a path (in dot notation) into a hierarchy of dicts. For example
+      `dictionary.get('abc.x.y')`
+    is equivalent to
+      `dictionary['abc']['x']['y']`.
+
+    :returns: `self[key]` or `default` if key is not found.
+    """
+    d = self
+    try:
+        while "." in key:
+            first_key, key = key.split(".", 1)
+            d = d[first_key]
+        return d[key]
+    except (KeyError, TypeError):
+        pass  # not found - continue below
+    if default is NoDefault:
+        raise MissingConfigError("Mandatory configuration element is missing: " + key)
+    else:
+        return default
+
+
+
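For example:

from dp3.common.config import HierarchicalDict

d = HierarchicalDict({"db": {"connection": {"host": "localhost", "port": 27017}}})

d.get("db.connection.port")        # 27017
d.get("db.connection.user", None)  # None -- key missing, default is returned
# d.get("db.connection.user")      -> raises MissingConfigError (key missing, no default)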
+ +
+ +
+ + + +

+ update + + +

+
update(other, **kwargs)
+
+ +
+ +

Update HierarchicalDict with other dictionary and merge common keys.

+

If there is a key in both current and the other dictionary and values of +both keys are dictionaries, they are merged together.

+

Example: +

HierarchicalDict({'a': {'b': 1, 'c': 2}}).update({'a': {'b': 10, 'd': 3}})
+->
+HierarchicalDict({'a': {'b': 10, 'c': 2, 'd': 3}})
+
+Changes the dictionary directly, returns None.

+ +
+ Source code in dp3/common/config.py +
53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
def update(self, other, **kwargs):
+    """
+    Update `HierarchicalDict` with other dictionary and merge common keys.
+
+    If there is a key in both current and the other dictionary and values of
+    both keys are dictionaries, they are merged together.
+
+    Example:
+    ```
+    HierarchicalDict({'a': {'b': 1, 'c': 2}}).update({'a': {'b': 10, 'd': 3}})
+    ->
+    HierarchicalDict({'a': {'b': 10, 'c': 2, 'd': 3}})
+    ```
+    Changes the dictionary directly, returns `None`.
+    """
+    other = dict(other)
+    for key in other:
+        if key in self:
+            if isinstance(self[key], dict) and isinstance(other[key], dict):
+                # The key is present in both dicts and both key values are dicts -> merge them
+                HierarchicalDict.update(self[key], other[key])
+            else:
+                # One of the key values is not a dict -> overwrite the value
+                # in self by the one from other (like normal "update" does)
+                self[key] = other[key]
+        else:
+            # key is not present in self -> set it to value from other
+            self[key] = other[key]
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+ EntitySpecDict + + +

+ + +
+

+ Bases: BaseModel

+ + +

Class representing full specification of an entity.

+ +

Attributes:

+ + + + + + + + + + + + + + + + + + + + +
NameTypeDescription
entity + EntitySpec + +
+

Specification and settings of entity itself.

+
+
attribs + dict[str, AttrSpecType] + +
+

A mapping of attribute id -> AttrSpec

+
+
+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ ModelSpec + + +

+
ModelSpec(config: HierarchicalDict)
+
+ +
+

+ Bases: BaseModel

+ + +

Class representing the platform's current entity and attribute specification.

+ +

Attributes:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescription
config + dict[str, EntitySpecDict] + +
+

Legacy config format, exactly mirrors the config files.

+
+
entities + dict[str, EntitySpec] + +
+

Mapping of entity id -> EntitySpec

+
+
attributes + dict[tuple[str, str], AttrSpecType] + +
+

Mapping of (entity id, attribute id) -> AttrSpec

+
+
entity_attributes + dict[str, dict[str, AttrSpecType]] + +
+

Mapping of entity id -> attribute id -> AttrSpec

+
+
relations + dict[tuple[str, str], AttrSpecType] + +
+

Mapping of (entity id, attribute id) -> AttrSpec +only contains attributes which are relations.

+
+
+ + +

Provided configuration must be a dict of following structure: +

{
+    <entity type>: {
+        'entity': {
+            entity specification
+        },
+        'attribs': {
+            <attr id>: {
+                attribute specification
+            },
+            other attributes
+        }
+    },
+    other entity types
+}
+

+ +

Raises:

+ + + + + + + + + + + + + +
TypeDescription
+ ValueError + +
+

if the specification is invalid.

+
+
+ +
+ Source code in dp3/common/config.py +
def __init__(self, config: HierarchicalDict):
+    """
+    Provided configuration must be a dict of following structure:
+    ```
+    {
+        <entity type>: {
+            'entity': {
+                entity specification
+            },
+            'attribs': {
+                <attr id>: {
+                    attribute specification
+                },
+                other attributes
+            }
+        },
+        other entity types
+    }
+    ```
+    Raises:
+        ValueError: if the specification is invalid.
+    """
+    super().__init__(
+        config=config, entities={}, attributes={}, entity_attributes={}, relations={}
+    )
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ PlatformConfig + + +

+ + +
+

+ Bases: BaseModel

+ + +

An aggregation of configuration available to modules.

+ +

Attributes:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescription
app_name + str + +
+

Name of the application, used when naming various structures of the platform

+
+
config_base_path + str + +
+

Path to directory containing platform config

+
+
config + HierarchicalDict + +
+

A dictionary that contains the platform config

+
+
model_spec + ModelSpec + +
+

Specification of the platform's model (entities and attributes)

+
+
num_processes + PositiveInt + +
+

Number of worker processes

+
+
process_index + NonNegativeInt + +
+

Index of current process

+
+
+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ + +
+ + + +

+ read_config + + +

+
read_config(filepath: str) -> HierarchicalDict
+
+ +
+ +

Read configuration file and return config as a dict-like object.

+

The configuration file should contain valid YAML. Comments may be included as lines starting with # (optionally preceded by whitespace).

+

This function reads the file and converts it to a HierarchicalDict. +The only difference from built-in dict is its get method, which allows +hierarchical keys (e.g. abc.x.y). +See doc of get method for more information.

+ +
+ Source code in dp3/common/config.py +
83
+84
+85
+86
+87
+88
+89
+90
+91
+92
+93
+94
+95
+96
+97
def read_config(filepath: str) -> HierarchicalDict:
+    """
+    Read configuration file and return config as a dict-like object.
+
+    The configuration file should contain a valid YAML
+    - Comments may be included as lines starting with `#` (optionally preceded
+      by whitespaces).
+
+    This function reads the file and converts it to a `HierarchicalDict`.
+    The only difference from built-in `dict` is its `get` method, which allows
+    hierarchical keys (e.g. `abc.x.y`).
+    See [doc of get method][dp3.common.config.HierarchicalDict.get] for more information.
+    """
+    with open(filepath) as file_content:
+        return HierarchicalDict(yaml.safe_load(file_content))
+
+
+
+ +
+ +
+ + + +

+ read_config_dir + + +

+
read_config_dir(dir_path: str, recursive: bool = False) -> HierarchicalDict
+
+ +
+ +

Same as read_config, but it loads a whole configuration directory of YAML files; only files ending with ".yml" are loaded. Each loaded configuration is placed under a key named after its configuration file.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
dir_path + str + +
+

Path to read config from.

+
+
+ required +
recursive + bool + +
+

If recursive is set, then the configuration directory will be read +recursively (including configuration files inside directories).

+
+
+ False +
+ +
+ Source code in dp3/common/config.py +
def read_config_dir(dir_path: str, recursive: bool = False) -> HierarchicalDict:
+    """
+    Same as [read_config][dp3.common.config.read_config],
+    but it loads whole configuration directory of YAML files,
+    so only files ending with ".yml" are loaded.
+    Each loaded configuration is located under key named after configuration filename.
+
+    Args:
+        dir_path: Path to read config from.
+        recursive: If `recursive` is set, then the configuration directory will be read
+            recursively (including configuration files inside directories).
+    """
+    all_files_paths = os.listdir(dir_path)
+    config = HierarchicalDict()
+    for config_filename in all_files_paths:
+        config_full_path = os.path.join(dir_path, config_filename)
+        if os.path.isdir(config_full_path) and recursive:
+            loaded_config = read_config_dir(config_full_path, recursive)
+        elif os.path.isfile(config_full_path) and config_filename.endswith(".yml"):
+            try:
+                loaded_config = read_config(config_full_path)
+            except TypeError:
+                # configuration file is empty
+                continue
+            # remove '.yml' suffix of filename
+            config_filename = config_filename[:-4]
+        else:
+            continue
+        # place configuration files into another dictionary level named by config dictionary name
+        config[config_filename] = loaded_config
+    return config
+
+
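A usage sketch (the directory layout and keys are hypothetical): with files db.yml and api.yml in the directory, their contents end up under the "db" and "api" keys respectively.

from dp3.common.config import read_config_dir

config = read_config_dir("/etc/my_app/config", recursive=True)  # hypothetical path
db_host = config.get("db.connection.host", "localhost")  # value from db.yml, with a default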
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/control/index.html b/reference/common/control/index.html new file mode 100644 index 00000000..414a0c8b --- /dev/null +++ b/reference/common/control/index.html @@ -0,0 +1,1936 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + control - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.control + + +

+ +
+ +

Module enabling remote control of the platform's internal events.

+ + + +
+ + + + + + + + +
+ + + +

+ Control + + +

+
Control(platform_config: PlatformConfig) -> None
+
+ +
+ + +

Class enabling remote control of the platform's internal events.

+ + +
+ Source code in dp3/common/control.py +
36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
def __init__(
+    self,
+    platform_config: PlatformConfig,
+) -> None:
+    self.log = logging.getLogger("Control")
+    self.action_handlers: dict[ControlAction, Callable] = {}
+    self.enabled = False
+
+    if platform_config.process_index != 0:
+        self.log.debug("Control will be disabled in this worker to avoid race conditions.")
+        return
+
+    self.enabled = True
+    self.config = ControlConfig.parse_obj(platform_config.config.get("control"))
+    self.allowed_actions = set(self.config.allowed_actions)
+    self.log.debug("Allowed actions: %s", self.allowed_actions)
+
+    queue = f"{platform_config.app_name}-control"
+    self.control_queue = TaskQueueReader(
+        callback=self.process_control_task,
+        parse_task=ControlMessage.parse_raw,
+        app_name=platform_config.app_name,
+        worker_index=platform_config.process_index,
+        rabbit_config=platform_config.config.get("processing_core.msg_broker", {}),
+        queue=queue,
+        priority_queue=queue,
+        parent_logger=self.log,
+    )
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ start + + +

+
start()
+
+ +
+ +

Connect to RabbitMQ and start consuming from TaskQueue.

+ +
+ Source code in dp3/common/control.py +
65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
def start(self):
+    """Connect to RabbitMQ and start consuming from TaskQueue."""
+    if not self.enabled:
+        return
+
+    unconfigured_handlers = self.allowed_actions - set(self.action_handlers)
+    if unconfigured_handlers:
+        raise ValueError(
+            f"The following configured actions are missing handlers: {unconfigured_handlers}"
+        )
+
+    self.log.info("Connecting to RabbitMQ")
+    self.control_queue.connect()
+    self.control_queue.check()  # check presence of needed queues
+    self.control_queue.start()
+
+    self.log.debug("Configured handlers: %s", self.action_handlers)
+
+
+
+ +
+ +
+ + + +

+ stop + + +

+
stop()
+
+ +
+ +

Stop consuming from TaskQueue, disconnect from RabbitMQ.

+ +
+ Source code in dp3/common/control.py +
83
+84
+85
+86
+87
+88
+89
def stop(self):
+    """Stop consuming from TaskQueue, disconnect from RabbitMQ."""
+    if not self.enabled:
+        return
+
+    self.control_queue.stop()
+    self.control_queue.disconnect()
+
+
+
+ +
+ +
+ + + +

+ set_action_handler + + +

+
set_action_handler(action: ControlAction, handler: Callable)
+
+ +
+ +

Sets the handler for the given action

+ +
+ Source code in dp3/common/control.py +
91
+92
+93
+94
def set_action_handler(self, action: ControlAction, handler: Callable):
+    """Sets the handler for the given action"""
+    self.log.debug("Setting handler for action %s: %s", action, handler)
+    self.action_handlers[action] = handler
+
+
+
+ +
+ +
+ + + +

+ process_control_task + + +

+
process_control_task(msg_id, task: ControlMessage)
+
+ +
+ +

Acknowledges the received message and executes an action according to the task.

+

This function should not be called directly, but set as callback for TaskQueueReader.

+ +
+ Source code in dp3/common/control.py +
def process_control_task(self, msg_id, task: ControlMessage):
+    """
+    Acknowledges the received message and executes an action according to the `task`.
+
+    This function should not be called directly, but set as callback for TaskQueueReader.
+    """
+    self.control_queue.ack(msg_id)
+    if task.action in self.allowed_actions:
+        self.log.info("Executing action: %s", task.action)
+        self.action_handlers[task.action]()
+    else:
+        self.log.error("Action not allowed: %s", task.action)
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/datapoint/index.html b/reference/common/datapoint/index.html new file mode 100644 index 00000000..45c761ee --- /dev/null +++ b/reference/common/datapoint/index.html @@ -0,0 +1,2012 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + datapoint - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.datapoint + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ DataPointBase + + +

+ + +
+

+ Bases: BaseModel

+ + +

Data-point

+

Contains single raw data value received on API. +This is just base class - plain, observation or timeseries datapoints inherit from this class +(see below).

+

Provides front line of validation for this data value.

+

Internal usage: inside Task, created by TaskExecutor

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ DataPointPlainBase + + +

+ + +
+

+ Bases: DataPointBase

+ + +

Plain attribute data-point

+

Contains single raw data value received on API for plain attribute.

+

In case of plain data-point, it's not really a data-point, but we use +the same naming for simplicity.

+ + + +
+ +
+ +
+ + + +

+ DataPointObservationsBase + + +

+ + +
+

+ Bases: DataPointBase

+ + +

Observations attribute data-point

+

Contains single raw data value received on API for observations attribute.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ DataPointTimeseriesBase + + +

+ + +
+

+ Bases: DataPointBase

+ + +

Timeseries attribute data-point

+

Contains single raw data value received on API for timeseries attribute.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ + +
+ + + +

+ is_list_ordered + + +

+
is_list_ordered(to_check: list)
+
+ +
+ +

Checks if list is ordered (not decreasing anywhere)

+ +
+ Source code in dp3/common/datapoint.py +
69
+70
+71
def is_list_ordered(to_check: list):
+    """Checks if list is ordered (not decreasing anywhere)"""
+    return all(to_check[i] <= to_check[i + 1] for i in range(len(to_check) - 1))
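For example, a quick illustration of the non-decreasing check:

from dp3.common.datapoint import is_list_ordered

is_list_ordered([1, 2, 2, 5])  # True  - never decreases
is_list_ordered([3, 1, 2])     # False - decreases at the start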
+
+
+
+ +
+ +
+ + + +

+ dp_ts_root_validator_irregular + + +

+
dp_ts_root_validator_irregular(cls, values)
+
+ +
+ +

Validates or sets t2 of irregular timeseries datapoint

+ +
+ Source code in dp3/common/datapoint.py +
@root_validator
+def dp_ts_root_validator_irregular(cls, values):
+    """Validates or sets t2 of irregular timeseries datapoint"""
+    if "v" in values:
+        first_time = values["v"].time[0]
+        last_time = values["v"].time[-1]
+
+        # Check t1 <= first_time
+        if "t1" in values:
+            assert (
+                values["t1"] <= first_time
+            ), f"'t1' is above first item in 'time' series ({first_time})"
+
+        # Check last_time <= t2
+        if "t2" in values and values["t2"]:
+            assert (
+                values["t2"] >= last_time
+            ), f"'t2' is below last item in 'time' series ({last_time})"
+        else:
+            values["t2"] = last_time
+
+        # time must be ordered
+        assert is_list_ordered(values["v"].time), "'time' series is not ordered"
+
+    return values
+
+
+
+ +
+ +
+ + + +

+ dp_ts_root_validator_irregular_intervals + + +

+
dp_ts_root_validator_irregular_intervals(cls, values)
+
+ +
+ +

Validates or sets t2 of irregular intervals timeseries datapoint

+ +
+ Source code in dp3/common/datapoint.py +
@root_validator
+def dp_ts_root_validator_irregular_intervals(cls, values):
+    """Validates or sets t2 of irregular intervals timeseries datapoint"""
+    if "v" in values:
+        first_time = values["v"].time_first[0]
+        last_time = values["v"].time_last[-1]
+
+        # Check t1 <= first_time
+        if "t1" in values:
+            assert (
+                values["t1"] <= first_time
+            ), f"'t1' is above first item in 'time_first' series ({first_time})"
+
+        # Check last_time <= t2
+        if "t2" in values and values["t2"]:
+            assert (
+                values["t2"] >= last_time
+            ), f"'t2' is below last item in 'time_last' series ({last_time})"
+        else:
+            values["t2"] = last_time
+
+        # Check time_first[i] <= time_last[i]
+        assert all(
+            t[0] <= t[1] for t in zip(values["v"].time_first, values["v"].time_last)
+        ), "'time_first[i] <= time_last[i]' isn't true for all 'i'"
+
+    return values
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/datatype/index.html b/reference/common/datatype/index.html new file mode 100644 index 00000000..9c7b97f8 --- /dev/null +++ b/reference/common/datatype/index.html @@ -0,0 +1,2066 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + datatype - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.datatype + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ DataType + + +

+
DataType(**data)
+
+ +
+

+ Bases: BaseModel

+ + +

Data type container

+

Represents one of primitive data types:

+
    +
  • tag
  • +
  • binary
  • +
  • string
  • +
  • int
  • +
  • int64
  • +
  • float
  • +
  • ipv4
  • +
  • ipv6
  • +
  • mac
  • +
  • time
  • +
  • special
  • +
  • json
  • +
+

or composite data type:

+
    +
  • link
  • +
  • array
  • +
  • set
  • +
  • dict
  • +
  • category
  • +
+ +

Attributes:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescription
data_type + str + +
+

type for incoming value validation

+
+
hashable + bool + +
+

whether contained data is hashable

+
+
is_link + bool + +
+

whether this data type is link

+
+
link_to + str + +
+

if is_link is True, the entity type that the link points to

+
+
+ + +
+ Source code in dp3/common/datatype.py +
79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
+90
+91
+92
+93
def __init__(self, **data):
+    super().__init__(**data)
+
+    str_type = data["__root__"]
+
+    self._hashable = not (
+        "dict" in str_type
+        or "set" in str_type
+        or "array" in str_type
+        or "special" in str_type
+        or "json" in str_type
+        or "link" in str_type
+    )
+
+    self.determine_value_validator(str_type)
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ determine_value_validator + + +

+
determine_value_validator(str_type: str)
+
+ +
+ +

Determines value validator (inner data_type)

+

This is not implemented inside @validator, because it apparently doesn't work with +__root__ models.

+ +
+ Source code in dp3/common/datatype.py +
def determine_value_validator(self, str_type: str):
+    """Determines value validator (inner `data_type`)
+
+    This is not implemented inside `@validator`, because it apparently doesn't work with
+    `__root__` models.
+    """
+    data_type = None
+
+    if type(str_type) is not str:
+        raise TypeError(f"Data type {str_type} is not string")
+
+    if str_type in primitive_data_types:
+        # Primitive type
+        data_type = primitive_data_types[str_type]
+
+    elif re.match(re_array, str_type):
+        # Array
+        element_type = str_type.split("<")[1].split(">")[0]
+        if element_type not in primitive_data_types:
+            raise TypeError(f"Data type {element_type} is not supported as an array element")
+        data_type = list[primitive_data_types[element_type]]
+
+    elif re.match(re_set, str_type):
+        # Set
+        element_type = str_type.split("<")[1].split(">")[0]
+        if element_type not in primitive_data_types:
+            raise TypeError(f"Data type {element_type} is not supported as an set element")
+        data_type = list[primitive_data_types[element_type]]  # set is not supported by MongoDB
+
+    elif m := re.match(re_link, str_type):
+        # Link
+        etype, data = m.group("etype"), m.group("data")
+        self._link_to = etype
+        self._is_link = True
+        self._link_data = bool(data)
+
+        if etype and data:
+            value_type = DataType(__root__=data)
+            data_type = create_model(
+                f"Link<{data}>", __base__=Link, data=(value_type._data_type, ...)
+            )
+        else:
+            data_type = Link
+
+    elif re.match(re_dict, str_type):
+        # Dict
+        dict_spec = {}
+
+        key_str = str_type.split("<")[1].split(">")[0]
+        key_spec = dict(item.split(":") for item in key_str.split(","))
+
+        # For each dict key
+        for k, v in key_spec.items():
+            if v not in primitive_data_types:
+                raise TypeError(f"Data type {v} of key {k} is not supported as a dict field")
+
+            # Optional subattribute
+            k_optional = k[-1] == "?"
+
+            if k_optional:
+                # Remove question mark from key
+                k = k[:-1]
+
+            # Set (type, default value) for the key
+            dict_spec[k] = (primitive_data_types[v], None if k_optional else ...)
+
+        # Create model for this dict
+        data_type = create_model(f"{str_type}__inner", **dict_spec)
+
+    elif m := re.match(re_category, str_type):
+        # Category
+        category_type, category_values = m.group("type"), m.group("vals")
+
+        category_type = DataType(__root__=category_type)
+        category_values = [
+            category_type._data_type(value.strip()) for value in category_values.split(",")
+        ]
+
+        data_type = Enum(f"Category<{category_type}>", {val: val for val in category_values})
+    else:
+        raise TypeError(f"Data type '{str_type}' is not supported")
+
+    # Set data type
+    self._data_type = data_type
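A minimal sketch of constructing DataType instances from type strings. The concrete type strings are illustrative; the composite syntax is inferred from the parser above and should be checked against the entity configuration documentation:

from dp3.common.datatype import DataType

ip_type = DataType(__root__="ipv4")           # primitive type
ports_type = DataType(__root__="array<int>")  # composite type with a primitive element type

# Other composite types (set<...>, dict<...>, link<...>, category<...>) use
# analogous string forms and are resolved by determine_value_validator above.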
+
+
+
+ +
+ +
+ + + +

+ get_linked_entity + + +

+
get_linked_entity() -> id
+
+ +
+ +

Returns linked entity id. Raises ValueError if DataType is not a link.

+ +
+ Source code in dp3/common/datatype.py +
def get_linked_entity(self) -> id:
+    """Returns linked entity id. Raises ValueError if DataType is not a link."""
+    try:
+        return self._link_to
+    except AttributeError:
+        raise ValueError(f"DataType '{self}' is not a link.") from None
+
+
+
+ +
+ +
+ + + + +
link_has_data() -> bool
+
+ +
+ +

Whether link has data. Raises ValueError if DataType is not a link.

+ +
+ Source code in dp3/common/datatype.py +
def link_has_data(self) -> bool:
+    """Whether link has data. Raises ValueError if DataType is not a link."""
+    try:
+        return self._link_data
+    except AttributeError:
+        raise ValueError(f"DataType '{self}' is not a link.") from None
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/entityspec/index.html b/reference/common/entityspec/index.html new file mode 100644 index 00000000..76fd84b3 --- /dev/null +++ b/reference/common/entityspec/index.html @@ -0,0 +1,1644 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + entityspec - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.entityspec + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ EntitySpec + + +

+
EntitySpec(id: str, spec: dict[str, Union[str, bool]])
+
+ +
+

+ Bases: BaseModel

+ + +

Entity specification

+

This class represents specification of an entity type (e.g. ip, asn, ...)

+ + +
+ Source code in dp3/common/entityspec.py +
def __init__(self, id: str, spec: dict[str, Union[str, bool]]):
+    super().__init__(id=id, name=spec.get("name"), snapshot=spec.get("snapshot"))
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/index.html b/reference/common/index.html new file mode 100644 index 00000000..6a14752b --- /dev/null +++ b/reference/common/index.html @@ -0,0 +1,1551 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + common - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common + + +

+ +
+ +

Common modules which are used throughout the platform.

+
    +
  • Config, EntitySpec and +AttrSpec - Models for reading, validating and representing +platform configuration of entities and their attributes. +base_attrs and datatype are also used + within this context.
  • +
  • Scheduler - Allows modules to run callbacks at specified times
  • +
  • Task - Model for a single task processed by the platform
  • +
  • Utils - Auxiliary utility functions
  • +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/scheduler/index.html b/reference/common/scheduler/index.html new file mode 100644 index 00000000..fc6ad980 --- /dev/null +++ b/reference/common/scheduler/index.html @@ -0,0 +1,2104 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + scheduler - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.scheduler + + +

+ +
+ +

Allows modules to register functions (callables) to be run at +specified times or intervals (like cron does).

+

Based on APScheduler package

+ + + +
+ + + + + + + + +
+ + + +

+ Scheduler + + +

+
Scheduler() -> None
+
+ +
+ + +

Allows modules to register functions (callables) to be run +at specified times or intervals (like cron does).

+ + +
+ Source code in dp3/common/scheduler.py +
21
+22
+23
+24
+25
+26
+27
def __init__(self) -> None:
+    self.log = logging.getLogger("Scheduler")
+    # self.log.setLevel("DEBUG")
+    logging.getLogger("apscheduler.scheduler").setLevel("WARNING")
+    logging.getLogger("apscheduler.executors.default").setLevel("WARNING")
+    self.sched = BackgroundScheduler(timezone="UTC")
+    self.last_job_id = 0
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ register + + +

+
register(func: Callable, func_args: Union[list, tuple] = None, func_kwargs: dict = None, year: Union[int, str] = None, month: Union[int, str] = None, day: Union[int, str] = None, week: Union[int, str] = None, day_of_week: Union[int, str] = None, hour: Union[int, str] = None, minute: Union[int, str] = None, second: Union[int, str] = None, timezone: str = 'UTC') -> int
+
+ +
+ +

Register a function to be run at specified times.

+

Pass cron-like specification of when the function should be called, +see docs +of apscheduler.triggers.cron for details.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
func + Callable + +
+

function or method to be called

+
+
+ required +
func_args + Union[list, tuple] + +
+

list of positional arguments to call func with

+
+
+ None +
func_kwargs + dict + +
+

dict of keyword arguments to call func with

+
+
+ None +
year + Union[int, str] + +
+

4-digit year

+
+
+ None +
month + Union[int, str] + +
+

month (1-12)

+
+
+ None +
day + Union[int, str] + +
+

day of month (1-31)

+
+
+ None +
week + Union[int, str] + +
+

ISO week (1-53)

+
+
+ None +
day_of_week + Union[int, str] + +
+

number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)

+
+
+ None +
hour + Union[int, str] + +
+

hour (0-23)

+
+
+ None +
minute + Union[int, str] + +
+

minute (0-59)

+
+
+ None +
second + Union[int, str] + +
+

second (0-59)

+
+
+ None +
timezone + str + +
+

Timezone for time specification (default is UTC).

+
+
+ 'UTC' +
+ +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ int + +
+

job ID

+
+
+ +
+ Source code in dp3/common/scheduler.py +
37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
def register(
+    self,
+    func: Callable,
+    func_args: Union[list, tuple] = None,
+    func_kwargs: dict = None,
+    year: Union[int, str] = None,
+    month: Union[int, str] = None,
+    day: Union[int, str] = None,
+    week: Union[int, str] = None,
+    day_of_week: Union[int, str] = None,
+    hour: Union[int, str] = None,
+    minute: Union[int, str] = None,
+    second: Union[int, str] = None,
+    timezone: str = "UTC",
+) -> int:
+    """
+    Register a function to be run at specified times.
+
+    Pass cron-like specification of when the function should be called,
+    see [docs](https://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html)
+    of apscheduler.triggers.cron for details.
+
+    Args:
+        func: function or method to be called
+        func_args: list of positional arguments to call func with
+        func_kwargs: dict of keyword arguments to call func with
+        year: 4-digit year
+        month: month (1-12)
+        day: day of month (1-31)
+        week: ISO week (1-53)
+        day_of_week: number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)
+        hour: hour (0-23)
+        minute: minute (0-59)
+        second: second (0-59)
+        timezone: Timezone for time specification (default is UTC).
+    Returns:
+         job ID
+    """
+    self.last_job_id += 1
+    trigger = CronTrigger(
+        year, month, day, week, day_of_week, hour, minute, second, timezone=timezone
+    )
+    self.sched.add_job(
+        func,
+        trigger,
+        func_args,
+        func_kwargs,
+        coalesce=True,
+        max_instances=1,
+        id=str(self.last_job_id),
+    )
+    self.log.debug(f"Registered function {func.__qualname__} to be called at {trigger}")
+    return self.last_job_id
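A minimal sketch of registering a periodic callback (the callback is hypothetical; in the platform the Scheduler instance is normally created and started by the worker process):

from dp3.common.scheduler import Scheduler

def cleanup():
    print("periodic cleanup")

scheduler = Scheduler()
# Run `cleanup` every 10 minutes (cron-like specification)
job_id = scheduler.register(cleanup, minute="*/10")
# The returned job_id can later be passed to pause_job(job_id) / resume_job(job_id)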
+
+
+
+ +
+ +
+ + + +

+ pause_job + + +

+
pause_job(id)
+
+ +
+ +

Pause job with given ID

+ +
+ Source code in dp3/common/scheduler.py +
91
+92
+93
def pause_job(self, id):
+    """Pause job with given ID"""
+    self.sched.pause_job(str(id))
+
+
+
+ +
+ +
+ + + +

+ resume_job + + +

+
resume_job(id)
+
+ +
+ +

Resume previously paused job with given ID

+ +
+ Source code in dp3/common/scheduler.py +
95
+96
+97
def resume_job(self, id):
+    """Resume previously paused job with given ID"""
+    self.sched.resume_job(str(id))
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/task/index.html b/reference/common/task/index.html new file mode 100644 index 00000000..6664c8a4 --- /dev/null +++ b/reference/common/task/index.html @@ -0,0 +1,2012 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + task - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.task + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ Task + + +

+ + +
+

+ Bases: BaseModel, ABC

+ + +

A generic task type class.

+

An abstraction for the task queue classes to depend upon.

+ + + + + +
+ + + + + + + + + +
+ + + +

+ routing_key + + + + abstractmethod + + +

+
routing_key() -> str
+
+ +
+ + +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ str + +
+

A string to be used as a routing key between workers.

+
+
+ +
+ Source code in dp3/common/task.py +
20
+21
+22
+23
+24
+25
@abstractmethod
+def routing_key(self) -> str:
+    """
+    Returns:
+        A string to be used as a routing key between workers.
+    """
+
+
+
+ +
+ +
+ + + +

+ as_message + + + + abstractmethod + + +

+
as_message() -> str
+
+ +
+ + +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ str + +
+

A string representation of the object.

+
+
+ +
+ Source code in dp3/common/task.py +
27
+28
+29
+30
+31
+32
@abstractmethod
+def as_message(self) -> str:
+    """
+    Returns:
+        A string representation of the object.
+    """
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+ DataPointTask + + +

+ + +
+

+ Bases: Task

+ + +

DataPointTask

+

Contains single task to be pushed to TaskQueue and processed.

+ +

Attributes:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescription
etype + str + +
+

Entity type

+
+
eid + str + +
+

Entity id / key

+
+
data_points + list[DataPointBase] + +
+

List of DataPoints to process

+
+
tags + list[Any] + +
+

List of tags

+
+
ttl_token + Optional[datetime] + +
+

...

+
+
+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ Snapshot + + +

+ + +
+

+ Bases: Task

+ + +

Snapshot

+

Contains a list of entities, the meaning of which depends on the type. +If type is "task", then the list contains linked entities for which a snapshot +should be created. Otherwise type is "linked_entities", indicating which entities +must be skipped in a parallelized creation of unlinked entities.

+ +

Attributes:

+ + + + + + + + + + + + + + + + + + + + +
NameTypeDescription
entities + list[tuple[str, str]] + +
+

List of (entity_type, entity_id)

+
+
time + datetime + +
+

timestamp for snapshot creation

+
+
+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/common/utils/index.html b/reference/common/utils/index.html new file mode 100644 index 00000000..d819bc94 --- /dev/null +++ b/reference/common/utils/index.html @@ -0,0 +1,1953 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + utils - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.common.utils + + +

+ +
+ +

auxiliary/utility functions and classes

+ + + +
+ + + + + + + + + +
+ + + +

+ parse_rfc_time + + +

+
parse_rfc_time(time_str)
+
+ +
+ +

Parse time in RFC 3339 format and return it as naive datetime in UTC.

+

Timezone specification is optional (UTC is assumed when none is specified).

+ +
+ Source code in dp3/common/utils.py +
32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
def parse_rfc_time(time_str):
+    """
+    Parse time in RFC 3339 format and return it as naive datetime in UTC.
+
+    Timezone specification is optional (UTC is assumed when none is specified).
+    """
+    res = timestamp_re.match(time_str)
+    if res is not None:
+        year, month, day, hour, minute, second = (int(n or 0) for n in res.group(*range(1, 7)))
+        us_str = (res.group(7) or "0")[:6].ljust(6, "0")
+        us = int(us_str)
+        zonestr = res.group(8)
+        zoneoffset = 0 if zonestr in (None, "z", "Z") else int(zonestr[:3]) * 60 + int(zonestr[4:6])
+        zonediff = datetime.timedelta(minutes=zoneoffset)
+        return datetime.datetime(year, month, day, hour, minute, second, us) - zonediff
+    else:
+        raise ValueError("Wrong timestamp format")
+
+
+
+ +
+ +
+ + + +

+ parse_time_duration + + +

+
parse_time_duration(duration_string: Union[str, int, datetime.timedelta]) -> datetime.timedelta
+
+ +
+ +

Parse duration in format <num><s/m/h/d> (or just "0").

+

Return datetime.timedelta

+ +
+ Source code in dp3/common/utils.py +
51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
def parse_time_duration(duration_string: Union[str, int, datetime.timedelta]) -> datetime.timedelta:
+    """
+    Parse duration in format <num><s/m/h/d> (or just "0").
+
+    Return datetime.timedelta
+    """
+    # if it's already timedelta, just return it unchanged
+    if isinstance(duration_string, datetime.timedelta):
+        return duration_string
+    # if number is passed, consider it number of seconds
+    if isinstance(duration_string, (int, float)):
+        return datetime.timedelta(seconds=duration_string)
+
+    d = 0
+    h = 0
+    m = 0
+    s = 0
+
+    if duration_string == "0":
+        pass
+    elif duration_string[-1] == "d":
+        d = int(duration_string[:-1])
+    elif duration_string[-1] == "h":
+        h = int(duration_string[:-1])
+    elif duration_string[-1] == "m":
+        m = int(duration_string[:-1])
+    elif duration_string[-1] == "s":
+        s = int(duration_string[:-1])
+    else:
+        raise ValueError("Invalid time duration string")
+
+    return datetime.timedelta(days=d, hours=h, minutes=m, seconds=s)
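For example:

from dp3.common.utils import parse_time_duration

parse_time_duration("30s")  # -> timedelta(seconds=30)
parse_time_duration("2h")   # -> timedelta(hours=2)
parse_time_duration("1d")   # -> timedelta(days=1)
parse_time_duration(90)     # plain numbers are interpreted as seconds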
+
+
+
+ +
+ +
+ + + +

+ conv_to_json + + +

+
conv_to_json(obj)
+
+ +
+ +

Convert special types to JSON (use as "default" param of json.dumps)

+

Supported types/objects: +- datetime +- timedelta

+ +
+ Source code in dp3/common/utils.py +
def conv_to_json(obj):
+    """Convert special types to JSON (use as "default" param of json.dumps)
+
+    Supported types/objects:
+    - datetime
+    - timedelta
+    """
+    if isinstance(obj, datetime.datetime):
+        if obj.tzinfo:
+            raise NotImplementedError(
+                "Can't serialize timezone-aware datetime object "
+                "(DP3 policy is to use naive datetimes in UTC everywhere)"
+            )
+        return {"$datetime": obj.strftime("%Y-%m-%dT%H:%M:%S.%f")}
+    if isinstance(obj, datetime.timedelta):
+        return {"$timedelta": f"{obj.days},{obj.seconds},{obj.microseconds}"}
+    raise TypeError("%r is not JSON serializable" % obj)
+
+
+
+ +
+ +
+ + + +

+ conv_from_json + + +

+
conv_from_json(dct)
+
+ +
+ +

Convert special JSON keys created by conv_to_json back to Python objects +(use as "object_hook" param of json.loads)

+

Supported types/objects: +- datetime +- timedelta

+ +
+ Source code in dp3/common/utils.py +
def conv_from_json(dct):
+    """Convert special JSON keys created by conv_to_json back to Python objects
+    (use as "object_hook" param of json.loads)
+
+    Supported types/objects:
+    - datetime
+    - timedelta
+    """
+    if "$datetime" in dct:
+        val = dct["$datetime"]
+        return datetime.datetime.strptime(val, "%Y-%m-%dT%H:%M:%S.%f")
+    if "$timedelta" in dct:
+        days, seconds, microseconds = dct["$timedelta"].split(",")
+        return datetime.timedelta(int(days), int(seconds), int(microseconds))
+    return dct
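A small round-trip sketch showing the two helpers used together with the standard json module:

import datetime
import json

from dp3.common.utils import conv_from_json, conv_to_json

obj = {
    "created": datetime.datetime(2022, 8, 1, 12, 0),
    "ttl": datetime.timedelta(hours=2),
}
encoded = json.dumps(obj, default=conv_to_json)
decoded = json.loads(encoded, object_hook=conv_from_json)
assert decoded == obj  # datetimes and timedeltas survive the round trip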
+
+
+
+ +
+ +
+ + + +

+ get_func_name + + +

+
get_func_name(func_or_method)
+
+ +
+ +

Get name of function or method as pretty string.

+ +
+ Source code in dp3/common/utils.py +
def get_func_name(func_or_method):
+    """Get name of function or method as pretty string."""
+    try:
+        fname = func_or_method.__func__.__qualname__
+    except AttributeError:
+        fname = func_or_method.__name__
+    return func_or_method.__module__ + "." + fname
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/database/database/index.html b/reference/database/database/index.html new file mode 100644 index 00000000..6708b357 --- /dev/null +++ b/reference/database/database/index.html @@ -0,0 +1,3792 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + database - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +
+ + + +

+ dp3.database.database + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ MongoHostConfig + + +

+ + +
+

+ Bases: BaseModel

+ + +

MongoDB host.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ MongoStandaloneConfig + + +

+ + +
+

+ Bases: BaseModel

+ + +

MongoDB standalone configuration.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ MongoReplicaConfig + + +

+ + +
+

+ Bases: BaseModel

+ + +

MongoDB replica set configuration.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ MongoConfig + + +

+ + +
+

+ Bases: BaseModel

+ + +

Database configuration.

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ EntityDatabase + + +

+
EntityDatabase(db_conf: HierarchicalDict, model_spec: ModelSpec) -> None
+
+ +
+ + +

MongoDB database wrapper responsible for whole communication with database server. +Initializes database schema based on database configuration.

+

db_conf - configuration of database connection (content of database.yml) +model_spec - ModelSpec object, configuration of data model (entities and attributes)

+ + +
+ Source code in dp3/database/database.py +
def __init__(
+    self,
+    db_conf: HierarchicalDict,
+    model_spec: ModelSpec,
+) -> None:
+    self.log = logging.getLogger("EntityDatabase")
+
+    config = MongoConfig.parse_obj(db_conf)
+
+    self.log.info("Connecting to database...")
+    for attempt, delay in enumerate(RECONNECT_DELAYS):
+        try:
+            self._db = self.connect(config)
+            # Check if connected
+            self._db.admin.command("ping")
+        except pymongo.errors.ConnectionFailure as e:
+            if attempt + 1 == len(RECONNECT_DELAYS):
+                raise DatabaseError(
+                    "Cannot connect to database with specified connection arguments."
+                ) from e
+            else:
+                self.log.error(
+                    "Cannot connect to database (attempt %d, retrying in %ds).",
+                    attempt + 1,
+                    delay,
+                )
+                time.sleep(delay)
+
+    self._db_schema_config = model_spec
+
+    # Init and switch to correct database
+    self._db = self._db[config.db_name]
+    self._init_database_schema(config.db_name)
+
+    self.log.info("Database successfully initialized!")
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ insert_datapoints + + +

+
insert_datapoints(etype: str, eid: str, dps: list[DataPointBase], new_entity: bool = False) -> None
+
+ +
+ +

Inserts datapoints to raw data collection and updates the master record.

+

Raises DatabaseError when insert or update fails.

+ +
+ Source code in dp3/database/database.py +
def insert_datapoints(
+    self, etype: str, eid: str, dps: list[DataPointBase], new_entity: bool = False
+) -> None:
+    """Inserts datapoint to raw data collection and updates master record.
+
+    Raises DatabaseError when insert or update fails.
+    """
+    if len(dps) == 0:
+        return
+
+    etype = dps[0].etype
+
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    # Insert raw datapoints
+    raw_col = self._raw_col_name(etype)
+    dps_dicts = [dp.dict(exclude={"attr_type"}) for dp in dps]
+    try:
+        self._db[raw_col].insert_many(dps_dicts)
+        self.log.debug(f"Inserted datapoints to raw collection:\n{dps}")
+    except Exception as e:
+        raise DatabaseError(f"Insert of datapoints failed: {e}\n{dps}") from e
+
+    # Update master document
+    master_changes = {"$push": {}, "$set": {}}
+    for dp in dps:
+        attr_spec = self._db_schema_config.attr(etype, dp.attr)
+
+        v = dp.v.dict() if isinstance(dp.v, BaseModel) else dp.v
+
+        # Rewrite value of plain attribute
+        if attr_spec.t == AttrType.PLAIN:
+            master_changes["$set"][dp.attr] = {"v": v, "ts_last_update": datetime.now()}
+
+        # Push new data of observation
+        if attr_spec.t == AttrType.OBSERVATIONS:
+            if dp.attr in master_changes["$push"]:
+                # Support multiple datapoints being pushed in the same request
+                if "$each" not in master_changes["$push"][dp.attr]:
+                    saved_dp = master_changes["$push"][dp.attr]
+                    master_changes["$push"][dp.attr] = {"$each": [saved_dp]}
+                master_changes["$push"][dp.attr]["$each"].append(
+                    {"t1": dp.t1, "t2": dp.t2, "v": v, "c": dp.c}
+                )
+            else:
+                # Otherwise just push one datapoint
+                master_changes["$push"][dp.attr] = {"t1": dp.t1, "t2": dp.t2, "v": v, "c": dp.c}
+
+        # Push new data of timeseries
+        if attr_spec.t == AttrType.TIMESERIES:
+            if dp.attr in master_changes["$push"]:
+                # Support multiple datapoints being pushed in the same request
+                if "$each" not in master_changes["$push"][dp.attr]:
+                    saved_dp = master_changes["$push"][dp.attr]
+                    master_changes["$push"][dp.attr] = {"$each": [saved_dp]}
+                master_changes["$push"][dp.attr]["$each"].append(
+                    {"t1": dp.t1, "t2": dp.t2, "v": v}
+                )
+            else:
+                # Otherwise just push one datapoint
+                master_changes["$push"][dp.attr] = {"t1": dp.t1, "t2": dp.t2, "v": v}
+
+    if new_entity:
+        master_changes["$set"]["#hash"] = HASH(f"{etype}:{eid}")
+
+    master_col = self._master_col_name(etype)
+    try:
+        self._db[master_col].update_one({"_id": eid}, master_changes, upsert=True)
+        self.log.debug(f"Updated master record of {etype} {eid}: {master_changes}")
+    except Exception as e:
+        raise DatabaseError(f"Update of master record failed: {e}\n{dps}") from e
+
+
+
+ +
+ +
+ + + +

+ update_master_records + + +

+
update_master_records(etype: str, eids: list[str], records: list[dict]) -> None
+
+ +
+ +

Replace master records of etype for the given eids with the provided records.

+

Raises DatabaseError when update fails.

+ +
+ Source code in dp3/database/database.py +
def update_master_records(self, etype: str, eids: list[str], records: list[dict]) -> None:
+    """Replace master record of `etype`:`eid` with the provided `record`.
+
+    Raises DatabaseError when update fails.
+    """
+    master_col = self._master_col_name(etype)
+    try:
+        self._db[master_col].bulk_write(
+            [
+                ReplaceOne({"_id": eid}, record, upsert=True)
+                for eid, record in zip(eids, records)
+            ]
+        )
+        self.log.debug("Updated master records of %s: %s.", eids, eids)
+    except Exception as e:
+        raise DatabaseError(f"Update of master records failed: {e}\n{records}") from e
+
+
+
+ +
+ +
+ + + +

+ delete_old_dps + + +

+
delete_old_dps(etype: str, attr_name: str, t_old: datetime) -> None
+
+ +
+ +

Delete old datapoints from master collection.

+

Periodically called for all etypes from HistoryManager.

+ +
+ Source code in dp3/database/database.py +
def delete_old_dps(self, etype: str, attr_name: str, t_old: datetime) -> None:
+    """Delete old datapoints from master collection.
+
+    Periodically called for all `etype`s from HistoryManager.
+    """
+    master_col = self._master_col_name(etype)
+    try:
+        self._db[master_col].update_many({}, {"$pull": {attr_name: {"t2": {"$lt": t_old}}}})
+    except Exception as e:
+        raise DatabaseError(f"Delete of old datapoints failed: {e}") from e
+
+
+
+ +
+ +
+ + + +

+ get_master_record + + +

+
get_master_record(etype: str, eid: str, **kwargs: str) -> dict
+
+ +
+ +

Get current master record for etype/eid.

+

If doesn't exist, returns {}.

+ +
+ Source code in dp3/database/database.py +
def get_master_record(self, etype: str, eid: str, **kwargs) -> dict:
+    """Get current master record for etype/eid.
+
+    If doesn't exist, returns {}.
+    """
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    master_col = self._master_col_name(etype)
+    return self._db[master_col].find_one({"_id": eid}, **kwargs) or {}
+
+
+
+ +
+ +
+ + + +

+ ekey_exists + + +

+
ekey_exists(etype: str, eid: str) -> bool
+
+ +
+ +

Checks whether master record for etype/eid exists

+ +
+ Source code in dp3/database/database.py +
def ekey_exists(self, etype: str, eid: str) -> bool:
+    """Checks whether master record for etype/eid exists"""
+    return bool(self.get_master_record(etype, eid))
+
+
+
+ +
+ +
+ + + +

+ get_master_records + + +

+
get_master_records(etype: str, **kwargs: str) -> pymongo.cursor.Cursor
+
+ +
+ +

Get cursor to current master records of etype.

+ +
+ Source code in dp3/database/database.py +
def get_master_records(self, etype: str, **kwargs) -> pymongo.cursor.Cursor:
+    """Get cursor to current master records of etype."""
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    master_col = self._master_col_name(etype)
+    return self._db[master_col].find({}, **kwargs)
+
+
+
+ +
+ +
+ + + +

+ get_worker_master_records + + +

+
get_worker_master_records(worker_index: int, worker_cnt: int, etype: str, **kwargs: str) -> pymongo.cursor.Cursor
+
+ +
+ +

Get cursor to master records of etype that are assigned to this worker (records whose #hash modulo worker_cnt equals worker_index).

+ +
+ Source code in dp3/database/database.py +
def get_worker_master_records(
+    self, worker_index: int, worker_cnt: int, etype: str, **kwargs
+) -> pymongo.cursor.Cursor:
+    """Get cursor to current master records of etype."""
+    if etype not in self._db_schema_config.entities:
+        raise DatabaseError(f"Entity '{etype}' does not exist")
+
+    master_col = self._master_col_name(etype)
+    return self._db[master_col].find({"#hash": {"$mod": [worker_cnt, worker_index]}}, **kwargs)
+
+
+
+ +
+ +
+ + + +

+ get_latest_snapshot + + +

+
get_latest_snapshot(etype: str, eid: str) -> dict
+
+ +
+ +

Get latest snapshot of given etype/eid.

+

If doesn't exist, returns {}.

+ +
+ Source code in dp3/database/database.py +
def get_latest_snapshot(self, etype: str, eid: str) -> dict:
+    """Get latest snapshot of given etype/eid.
+
+    If doesn't exist, returns {}.
+    """
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    snapshot_col = self._snapshots_col_name(etype)
+    return self._db[snapshot_col].find_one({"eid": eid}, sort=[("_id", -1)]) or {}
+
+
+
+ +
+ +
+ + + +

+ get_latest_snapshots + + +

+
get_latest_snapshots(etype: str) -> pymongo.cursor.Cursor
+
+ +
+ +

Get latest snapshots of given etype.

+

This method is useful for displaying data on web.

+ +
+ Source code in dp3/database/database.py +
def get_latest_snapshots(self, etype: str) -> pymongo.cursor.Cursor:
+    """Get latest snapshots of given `etype`.
+
+    This method is useful for displaying data on web.
+    """
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    snapshot_col = self._snapshots_col_name(etype)
+    latest_snapshot = self._db[snapshot_col].find_one({}, sort=[("_id", -1)])
+    if latest_snapshot is None:
+        return self._db[snapshot_col].find()
+
+    latest_snapshot_date = latest_snapshot["_time_created"]
+    return self._db[snapshot_col].find({"_time_created": latest_snapshot_date})
+
+
+
+ +
+ +
+ + + +

+ get_snapshots + + +

+
get_snapshots(etype: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None) -> pymongo.cursor.Cursor
+
+ +
+ +

Get all (or filtered) snapshots of given eid.

+

This method is useful for displaying eid's history on web.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
etype + str + +
+

entity type

+
+
+ required +
eid + str + +
+

id of entity, to which data-points correspond

+
+
+ required +
t1 + Optional[datetime] + +
+

left value of time interval (inclusive)

+
+
+ None +
t2 + Optional[datetime] + +
+

right value of time interval (inclusive)

+
+
+ None +
+ +
+ Source code in dp3/database/database.py +
def get_snapshots(
+    self, etype: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None
+) -> pymongo.cursor.Cursor:
+    """Get all (or filtered) snapshots of given `eid`.
+
+    This method is useful for displaying `eid`'s history on web.
+
+    Args:
+        etype: entity type
+        eid: id of entity, to which data-points correspond
+        t1: left value of time interval (inclusive)
+        t2: right value of time interval (inclusive)
+    """
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    snapshot_col = self._snapshots_col_name(etype)
+    query = {"eid": eid, "_time_created": {}}
+
+    # Filter by date
+    if t1:
+        query["_time_created"]["$gte"] = t1
+    if t2:
+        query["_time_created"]["$lte"] = t2
+
+    # Unset if empty
+    if not query["_time_created"]:
+        del query["_time_created"]
+
+    return self._db[snapshot_col].find(query).sort([("_time_created", pymongo.ASCENDING)])
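A minimal usage sketch (assumes `db` is a connected EntityDatabase and that an "ip" entity exists in the model; the eid and attribute name are illustrative):

from datetime import datetime

cursor = db.get_snapshots(
    "ip",
    "192.168.0.1",
    t1=datetime(2022, 8, 1),
    t2=datetime(2022, 8, 2),
)
for snapshot in cursor:
    print(snapshot["_time_created"], snapshot.get("open_ports"))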
+
+
+
+ +
+ +
+ + + +

+ get_value_or_history + + +

+
get_value_or_history(etype: str, attr_name: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None) -> dict
+
+ +
+ +

Gets current value and/or history of attribute for given eid.

+

Depends on attribute type: +- plain: just (current) value +- observations: (current) value and history stored in master record (optionally filtered) +- timeseries: just history stored in master record (optionally filtered)

+

Returns dict with two keys: current_value and history (list of values).

+ +
+ Source code in dp3/database/database.py +
def get_value_or_history(
+    self,
+    etype: str,
+    attr_name: str,
+    eid: str,
+    t1: Optional[datetime] = None,
+    t2: Optional[datetime] = None,
+) -> dict:
+    """Gets current value and/or history of attribute for given `eid`.
+
+    Depends on attribute type:
+    - plain: just (current) value
+    - observations: (current) value and history stored in master record (optionally filtered)
+    - timeseries: just history stored in master record (optionally filtered)
+
+    Returns dict with two keys: `current_value` and `history` (list of values).
+    """
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    attr_spec = self._db_schema_config.attr(etype, attr_name)
+
+    result = {"current_value": None, "history": []}
+
+    # Add current value to the result
+    if attr_spec.t == AttrType.PLAIN:
+        result["current_value"] = (
+            self.get_master_record(etype, eid).get(attr_name, {}).get("v", None)
+        )
+    elif attr_spec.t == AttrType.OBSERVATIONS:
+        result["current_value"] = self.get_latest_snapshot(etype, eid).get(attr_name, None)
+
+    # Add history
+    if attr_spec.t == AttrType.OBSERVATIONS:
+        result["history"] = self.get_observation_history(etype, attr_name, eid, t1, t2)
+    elif attr_spec.t == AttrType.TIMESERIES:
+        result["history"] = self.get_timeseries_history(etype, attr_name, eid, t1, t2)
+
+    return result
+
+
+
+ +
+ +
+ + + +

+ estimate_count_eids + + +

+
estimate_count_eids(etype: str) -> int
+
+ +
+ +

Estimates count of eids in given etype

+ +
+ Source code in dp3/database/database.py +
def estimate_count_eids(self, etype: str) -> int:
+    """Estimates count of `eid`s in given `etype`"""
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    master_col = self._master_col_name(etype)
+    return self._db[master_col].estimated_document_count()
+
+
+
+ +
+ +
+ + + +

+ save_snapshot + + +

+
save_snapshot(etype: str, snapshot: dict, time: datetime)
+
+ +
+ +

Saves snapshot to specified entity of current master document.

+ +
+ Source code in dp3/database/database.py +
def save_snapshot(self, etype: str, snapshot: dict, time: datetime):
+    """Saves snapshot to specified entity of current master document."""
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    snapshot["_time_created"] = time
+
+    snapshot_col = self._snapshots_col_name(etype)
+    try:
+        self._db[snapshot_col].insert_one(snapshot)
+        self.log.debug(f"Inserted snapshot: {snapshot}")
+    except Exception as e:
+        raise DatabaseError(f"Insert of snapshot failed: {e}\n{snapshot}") from e
+
+
+
+ +
+ +
+ + + +

+ save_snapshots + + +

+
save_snapshots(etype: str, snapshots: list[dict], time: datetime)
+
+ +
+ +

Saves a list of snapshots of current master documents.

+

All snapshots must belong to same entity type.

+ +
+ Source code in dp3/database/database.py +
def save_snapshots(self, etype: str, snapshots: list[dict], time: datetime):
+    """
+    Saves a list of snapshots of current master documents.
+
+    All snapshots must belong to same entity type.
+    """
+    # Check `etype`
+    self._assert_etype_exists(etype)
+
+    for snapshot in snapshots:
+        snapshot["_time_created"] = time
+
+    snapshot_col = self._snapshots_col_name(etype)
+    try:
+        self._db[snapshot_col].insert_many(snapshots)
+        self.log.debug(f"Inserted snapshots: {snapshots}")
+    except Exception as e:
+        raise DatabaseError(f"Insert of snapshots failed: {e}\n{snapshots}") from e
+
+
+
+ +
+ +
+ + + +

+ save_metadata + + +

+
save_metadata(time: datetime, metadata: dict)
+
+ +
+ +

Saves metadata document of the calling module into the #metadata collection.

+ +
+ Source code in dp3/database/database.py +
def save_metadata(self, time: datetime, metadata: dict):
+    """Saves snapshot to specified entity of current master document."""
+    module = get_caller_id()
+    metadata["_id"] = module + time.strftime("%Y-%m-%dT%H:%M:%S.%fZ")[:-4]
+    metadata["#module"] = module
+    metadata["#time_created"] = time
+    metadata["#last_update"] = datetime.now()
+    try:
+        self._db["#metadata"].insert_one(metadata)
+        self.log.debug("Inserted metadata %s: %s", metadata["_id"], metadata)
+    except Exception as e:
+        raise DatabaseError(f"Insert of metadata failed: {e}\n{metadata}") from e
+
+
+
+ +
+ +
+ + + +

+ get_observation_history + + +

+
get_observation_history(etype: str, attr_name: str, eid: str, t1: datetime = None, t2: datetime = None, sort: int = None) -> list[dict]
+
+ +
+ +

Get full (or filtered) history of observation attribute.

+

This method is useful for displaying eid's history on web. +Also used to feed data into get_timeseries_history().

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
etype + str + +
+

entity type

+
+
+ required +
attr_name + str + +
+

name of attribute

+
+
+ required +
eid + str + +
+

id of entity, to which data-points correspond

+
+
+ required +
t1 + datetime + +
+

left value of time interval (inclusive)

+
+
+ None +
t2 + datetime + +
+

right value of time interval (inclusive)

+
+
+ None +
sort + int + +
+

sort by timestamps - 1: ascending order by t1, 2: descending order by t2, +None: don't sort

+
+
+ None +
+ +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ list[dict] + +
+

list of dicts (reduced datapoints)

+
+
+ +
+ Source code in dp3/database/database.py +
def get_observation_history(
+    self,
+    etype: str,
+    attr_name: str,
+    eid: str,
+    t1: datetime = None,
+    t2: datetime = None,
+    sort: int = None,
+) -> list[dict]:
+    """Get full (or filtered) history of observation attribute.
+
+    This method is useful for displaying `eid`'s history on web.
+    Also used to feed data into `get_timeseries_history()`.
+
+    Args:
+        etype: entity type
+        attr_name: name of attribute
+        eid: id of entity, to which data-points correspond
+        t1: left value of time interval (inclusive)
+        t2: right value of time interval (inclusive)
+        sort: sort by timestamps - 1: ascending order by t1, 2: descending order by t2,
+            None: don't sort
+    Returns:
+        list of dicts (reduced datapoints)
+    """
+    t1 = datetime.fromtimestamp(0) if t1 is None else t1
+    t2 = datetime.now() if t2 is None else t2
+
+    # Get attribute history
+    mr = self.get_master_record(etype, eid)
+    attr_history = mr.get(attr_name, [])
+
+    # Filter
+    attr_history_filtered = [row for row in attr_history if row["t1"] <= t2 and row["t2"] >= t1]
+
+    # Sort
+    if sort == 1:
+        attr_history_filtered.sort(key=lambda row: row["t1"])
+    elif sort == 2:
+        attr_history_filtered.sort(key=lambda row: row["t2"], reverse=True)
+
+    return attr_history_filtered
+
+
+
+ +
+ +
+ + + +

+ get_timeseries_history + + +

+
get_timeseries_history(etype: str, attr_name: str, eid: str, t1: datetime = None, t2: datetime = None, sort: int = None) -> list[dict]
+
+ +
+ +

Get full (or filtered) history of timeseries attribute. +Outputs them in format: +

    [
+        {
+            "t1": ...,
+            "t2": ...,
+            "v": {
+                "series1": ...,
+                "series2": ...
+            }
+        },
+        ...
+    ]
+
+This method is useful for displaying eid's history on web.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
etype + str + +
+

entity type

+
+
+ required +
attr_name + str + +
+

name of attribute

+
+
+ required +
eid + str + +
+

id of entity, to which data-points correspond

+
+
+ required +
t1 + datetime + +
+

left value of time interval (inclusive)

+
+
+ None +
t2 + datetime + +
+

right value of time interval (inclusive)

+
+
+ None +
sort + int + +
+

sort by timestamps - 1: ascending order by t1, 2: descending order by t2, +None: don't sort

+
+
+ None +
+ +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ list[dict] + +
+

list of dicts (reduced datapoints) - each represents just one point at time

+
+
+ +
+ Source code in dp3/database/database.py +
def get_timeseries_history(
+    self,
+    etype: str,
+    attr_name: str,
+    eid: str,
+    t1: datetime = None,
+    t2: datetime = None,
+    sort: int = None,
+) -> list[dict]:
+    """Get full (or filtered) history of timeseries attribute.
+    Outputs them in format:
+    ```
+        [
+            {
+                "t1": ...,
+                "t2": ...,
+                "v": {
+                    "series1": ...,
+                    "series2": ...
+                }
+            },
+            ...
+        ]
+    ```
+    This method is useful for displaying `eid`'s history on web.
+
+    Args:
+        etype: entity type
+        attr_name: name of attribute
+        eid: id of entity, to which data-points correspond
+        t1: left value of time interval (inclusive)
+        t2: right value of time interval (inclusive)
+        sort: sort by timestamps - `1`: ascending order by `t1`, `2`: descending order by `t2`,
+            `None`: don't sort
+    Returns:
+         list of dicts (reduced datapoints) - each represents just one point at time
+    """
+    t1 = datetime.fromtimestamp(0) if t1 is None else t1
+    t2 = datetime.now() if t2 is None else t2
+
+    attr_history = self.get_observation_history(etype, attr_name, eid, t1, t2, sort)
+    if not attr_history:
+        return []
+
+    attr_history_split = self._split_timeseries_dps(etype, attr_name, attr_history)
+
+    # Filter out rows outside [t1, t2] interval
+    attr_history_filtered = [
+        row for row in attr_history_split if row["t1"] <= t2 and row["t2"] >= t1
+    ]
+
+    return attr_history_filtered
+
+
+
+ +
+ +
+ + + +

+ delete_old_snapshots + + +

+
delete_old_snapshots(etype: str, t_old: datetime)
+
+ +
+ +

Delete old snapshots.

+

Periodically called for all etypes from HistoryManager.

+ +
+ Source code in dp3/database/database.py +
def delete_old_snapshots(self, etype: str, t_old: datetime):
+    """Delete old snapshots.
+
+    Periodically called for all `etype`s from HistoryManager.
+    """
+    snapshot_col_name = self._snapshots_col_name(etype)
+    try:
+        return self._db[snapshot_col_name].delete_many({"_time_created": {"$lt": t_old}})
+    except Exception as e:
+        raise DatabaseError(f"Delete of olds snapshots failed: {e}") from e
+
+
+
+ +
+ +
+ + + +

+ get_module_cache + + +

+
get_module_cache()
+
+ +
+ +

Return a persistent cache collection for given module name.

+ +
+ Source code in dp3/database/database.py +
def get_module_cache(self):
+    """Return a persistent cache collection for given module name."""
+    module = get_caller_id()
+    self.log.debug("Module %s is accessing its cache collection", module)
+    return self._db[f"#cache#{module}"]
+
+
+
+ +
+ + + +
+ +
+ +
+ + +
+ + + +

+ get_caller_id + + +

+
get_caller_id()
+
+ +
+ +

Returns the name of the caller method's class, or function name if caller is not a method.

+ +
+ Source code in dp3/database/database.py +
61
+62
+63
+64
+65
+66
def get_caller_id():
+    """Returns the name of the caller method's class, or function name if caller is not a method."""
+    caller = inspect.stack()[2]
+    if module := caller.frame.f_locals.get("self"):
+        return module.__class__.__qualname__
+    return caller.function
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/database/index.html b/reference/database/index.html new file mode 100644 index 00000000..a523939c --- /dev/null +++ b/reference/database/index.html @@ -0,0 +1,1541 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + database - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.database + + +

+ +
+ +

A wrapper responsible for communication with the database server.

+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/history_management/history_manager/index.html b/reference/history_management/history_manager/index.html new file mode 100644 index 00000000..07abd220 --- /dev/null +++ b/reference/history_management/history_manager/index.html @@ -0,0 +1,2167 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + history_manager - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.history_management.history_manager + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ DatetimeEncoder + + +

+ + +
+

+ Bases: JSONEncoder

+ + +

JSONEncoder to encode datetime using the standard ADiCT format string.
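The exact format string is not shown on this page; a minimal sketch of such an encoder, assuming an ISO-8601-like format, could look like:

```
import json
from datetime import datetime

class DatetimeEncoder(json.JSONEncoder):
    """Sketch only: serialize datetime objects with a fixed (assumed) format string."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.strftime("%Y-%m-%dT%H:%M:%S.%f")  # assumed format
        return super().default(o)

json.dumps({"t1": datetime(2022, 8, 1, 12, 0)}, cls=DatetimeEncoder)
```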

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ HistoryManager + + +

+
HistoryManager(db: EntityDatabase, platform_config: PlatformConfig, registrar: CallbackRegistrar) -> None
+
+ +
+ + + +
+ Source code in dp3/history_management/history_manager.py +
28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
def __init__(
+    self, db: EntityDatabase, platform_config: PlatformConfig, registrar: CallbackRegistrar
+) -> None:
+    self.log = logging.getLogger("HistoryManager")
+
+    self.db = db
+    self.model_spec = platform_config.model_spec
+    self.worker_index = platform_config.process_index
+    self.num_workers = platform_config.num_processes
+    self.config = platform_config.config.get("history_manager")
+
+    # Schedule master document aggregation
+    registrar.scheduler_register(self.aggregate_master_docs, minute="*/10")
+
+    if platform_config.process_index != 0:
+        self.log.debug(
+            "History management will be disabled in this worker to avoid race conditions."
+        )
+        return
+
+    # Schedule datapoints cleaning
+    datapoint_cleaning_period = self.config["datapoint_cleaning"]["tick_rate"]
+    registrar.scheduler_register(self.delete_old_dps, minute=f"*/{datapoint_cleaning_period}")
+
+    snapshot_cleaning_cron = self.config["snapshot_cleaning"]["cron_schedule"]
+    self.keep_snapshot_delta = timedelta(days=self.config["snapshot_cleaning"]["days_to_keep"])
+    registrar.scheduler_register(self.delete_old_snapshots, **snapshot_cleaning_cron)
+
+    # Schedule datapoint archivation
+    self.keep_raw_delta = timedelta(days=self.config["datapoint_archivation"]["days_to_keep"])
+    self.log_dir = self._ensure_log_dir(self.config["datapoint_archivation"]["archive_dir"])
+    registrar.scheduler_register(self.archive_old_dps, minute=0, hour=2)  # Every day at 2 AM
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ delete_old_dps + + +

+
delete_old_dps()
+
+ +
+ +

Deletes old data points from master collection.

+ +
+ Source code in dp3/history_management/history_manager.py +
61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
def delete_old_dps(self):
+    """Deletes old data points from master collection."""
+    self.log.debug("Deleting old records ...")
+
+    for etype_attr, attr_conf in self.model_spec.attributes.items():
+        etype, attr_name = etype_attr
+        max_age = None
+
+        if attr_conf.t == AttrType.OBSERVATIONS:
+            max_age = attr_conf.history_params.max_age
+        elif attr_conf.t == AttrType.TIMESERIES:
+            max_age = attr_conf.timeseries_params.max_age
+
+        if not max_age:
+            continue
+
+        t_old = datetime.utcnow() - max_age
+
+        try:
+            self.db.delete_old_dps(etype, attr_name, t_old)
+        except DatabaseError as e:
+            self.log.error(e)
+
+
+
+ +
+ +
+ + + +

+ delete_old_snapshots + + +

+
delete_old_snapshots()
+
+ +
+ +

Deletes old snapshots.

+ +
+ Source code in dp3/history_management/history_manager.py +
84
+85
+86
+87
+88
+89
+90
+91
+92
+93
+94
+95
+96
def delete_old_snapshots(self):
+    """Deletes old snapshots."""
+    t_old = datetime.now() - self.keep_snapshot_delta
+    self.log.debug("Deleting all snapshots before %s", t_old)
+
+    deleted_total = 0
+    for etype in self.model_spec.entities:
+        try:
+            result = self.db.delete_old_snapshots(etype, t_old)
+            deleted_total += result.deleted_count
+        except DatabaseError as e:
+            self.log.exception(e)
+    self.log.debug("Deleted %s snapshots in total.", deleted_total)
+
+
+
+ +
+ +
+ + + +

+ archive_old_dps + + +

+
archive_old_dps()
+
+ +
+ +

Archives old data points from raw collection.

+

Updates already saved archive files, if present.

+ +
+ Source code in dp3/history_management/history_manager.py +
def archive_old_dps(self):
+    """
+    Archives old data points from raw collection.
+
+    Updates already saved archive files, if present.
+    """
+
+    t_old = datetime.utcnow() - self.keep_raw_delta
+    t_old = t_old.replace(hour=0, minute=0, second=0, microsecond=0)
+    self.log.debug("Archiving all records before %s ...", t_old)
+
+    max_date, min_date, total_dps = self._get_raw_dps_summary(t_old)
+    if total_dps == 0:
+        self.log.debug("Found no datapoints to archive.")
+        return
+    self.log.debug(
+        "Found %s datapoints to archive in the range %s - %s", total_dps, min_date, max_date
+    )
+
+    n_days = (max_date - min_date).days + 1
+    for date, next_date in [
+        (min_date + timedelta(days=n), min_date + timedelta(days=n + 1)) for n in range(n_days)
+    ]:
+        date_string = date.strftime("%Y%m%d")
+        day_datapoints = 0
+        date_logfile = self.log_dir / f"dp-log-{date_string}.json"
+
+        with open(date_logfile, "w", encoding="utf-8") as logfile:
+            first = True
+            for etype in self.model_spec.entities:
+                result_cursor = self.db.get_raw(etype, after=date, before=next_date)
+                for dp in result_cursor:
+                    if first:
+                        logfile.write(
+                            f"[\n{json.dumps(self._reformat_dp(dp), cls=DatetimeEncoder)}"
+                        )
+                        first = False
+                    else:
+                        logfile.write(
+                            f",\n{json.dumps(self._reformat_dp(dp), cls=DatetimeEncoder)}"
+                        )
+                    day_datapoints += 1
+            logfile.write("\n]")
+        self.log.debug(
+            "%s: Archived %s datapoints to %s", date_string, day_datapoints, date_logfile
+        )
+        compress_file(date_logfile)
+        os.remove(date_logfile)
+        self.log.debug("%s: Saved archive was compressed", date_string)
+
+        if not day_datapoints:
+            continue
+
+        deleted_count = 0
+        for etype in self.model_spec.entities:
+            deleted_res = self.db.delete_old_raw_dps(etype, next_date)
+            deleted_count += deleted_res.deleted_count
+        self.log.debug("%s: Deleted %s datapoints", date_string, deleted_count)
+
+
+
+ +
+ + + +
+ +
+ +
+ + +
+ + + +

+ aggregate_dp_history_on_equal + + +

+
aggregate_dp_history_on_equal(history: list[dict], spec: ObservationsHistoryParams)
+
+ +
+ +

Merge datapoints in the history with equal values and overlapping time validity.

+

Averages the confidence.

+ +
+ Source code in dp3/history_management/history_manager.py +
def aggregate_dp_history_on_equal(history: list[dict], spec: ObservationsHistoryParams):
+    """
+    Merge datapoints in the history with equal values and overlapping time validity.
+
+    Averages the confidence.
+    """
+    history = sorted(history, key=lambda x: x["t1"])
+    aggregated_history = []
+    current_dp = None
+    merged_cnt = 0
+    pre = spec.pre_validity
+    post = spec.post_validity
+
+    for dp in history:
+        if not current_dp:
+            current_dp = dp
+            merged_cnt += 1
+            continue
+
+        if current_dp["v"] == dp["v"] and current_dp["t2"] + post >= dp["t1"] - pre:
+            current_dp["t2"] = max(dp["t2"], current_dp["t2"])
+            current_dp["c"] += dp["c"]
+            merged_cnt += 1
+        else:
+            aggregated_history.append(current_dp)
+            current_dp["c"] /= merged_cnt
+
+            merged_cnt = 1
+            current_dp = dp
+    if current_dp:
+        current_dp["c"] /= merged_cnt
+        aggregated_history.append(current_dp)
+    return aggregated_history
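A small worked example (hypothetical values; a `SimpleNamespace` stands in for `ObservationsHistoryParams`): two overlapping datapoints with the same value collapse into one interval and their confidences are averaged.

```
from datetime import datetime, timedelta
from types import SimpleNamespace

from dp3.history_management.history_manager import aggregate_dp_history_on_equal

spec = SimpleNamespace(pre_validity=timedelta(minutes=5), post_validity=timedelta(minutes=5))
history = [
    {"v": [80], "t1": datetime(2022, 8, 1, 12, 0), "t2": datetime(2022, 8, 1, 12, 10), "c": 1.0},
    {"v": [80], "t1": datetime(2022, 8, 1, 12, 12), "t2": datetime(2022, 8, 1, 12, 20), "c": 0.5},
]
merged = aggregate_dp_history_on_equal(history, spec)
# -> one datapoint: t1 = 12:00, t2 = 12:20, c = (1.0 + 0.5) / 2 = 0.75
```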
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/history_management/index.html b/reference/history_management/index.html new file mode 100644 index 00000000..1372ad54 --- /dev/null +++ b/reference/history_management/index.html @@ -0,0 +1,1541 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + history_management - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.history_management + + +

+ +
+ +

Module responsible for managing history saved in database, currently to clean old data.

+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/history_management/telemetry/index.html b/reference/history_management/telemetry/index.html new file mode 100644 index 00000000..e165cf99 --- /dev/null +++ b/reference/history_management/telemetry/index.html @@ -0,0 +1,1549 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + telemetry - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.history_management.telemetry + + +

+ +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/index.html b/reference/index.html new file mode 100644 index 00000000..da7a9fe4 --- /dev/null +++ b/reference/index.html @@ -0,0 +1,1594 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + dp3 - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3 + + +

+ +
+ +

Dynamic Profile Processing Platform (DP³)

+

Platform directory structure:

+
    +
  • +

    Worker - The main worker process.

    +
  • +
  • +

    Common - Common modules which are used throughout the platform.

    +
      +
    • Config, EntitySpec and +AttrSpec - Models for reading, validation and representing +platform configuration of entities and their attributes. +base_attrs and datatype are also used + within this context.
    • +
    • Scheduler - Allows modules to run callbacks at specified times
    • +
    • Task - Model for a single task processed by the platform
    • +
    • Utils - Auxiliary utility functions
    • +
    +
  • +
  • +

    Database.EntityDatabase - A wrapper responsible for communication +with the database server.

    +
  • +
  • +

    HistoryManagement.HistoryManager - Module responsible +for managing history saved in database, currently to clean old data.

    +
  • +
  • +

    Snapshots - SnapShooter, a module responsible +for snapshot creation and running configured data correlation and fusion hooks, +and Snapshot Hooks, which manage the registered hooks and their +dependencies on one another.

    +
  • +
  • +

    TaskProcessing - Module responsible for task +distribution, +processing and running configured +hooks. Task distribution is possible due to the +task queue.

    +
  • +
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/snapshots/index.html b/reference/snapshots/index.html new file mode 100644 index 00000000..f213a41c --- /dev/null +++ b/reference/snapshots/index.html @@ -0,0 +1,1544 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + snapshots - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.snapshots + + +

+ +
+ +

SnapShooter, a module responsible +for snapshot creation and running configured data correlation and fusion hooks, +and Snapshot Hooks, which manage the registered hooks and their +dependencies on one another.

+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/snapshots/snapshooter/index.html b/reference/snapshots/snapshooter/index.html new file mode 100644 index 00000000..81bf9c21 --- /dev/null +++ b/reference/snapshots/snapshooter/index.html @@ -0,0 +1,3441 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + snapshooter - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +
+ + + +

+ dp3.snapshots.snapshooter + + +

+ +
+ +

Module managing creation of snapshots, enabling data correlation and saving snapshots to DB.

+
    +
  • +

    Snapshots are created periodically (user configurable period)

    +
  • +
  • +

    When a snapshot is created, several things need to happen:

    +
      +
    • all registered timeseries processing modules must be called
    • +
    • this should result in observations or plain datapoints, which will be saved to db + and forwarded in processing
    • +
    • current value must be computed for all observations
    • +
    • load relevant section of observation's history and perform configured history analysis. + Result = plain values
    • +
    • load plain attributes saved in master collection
    • +
    • A record of described plain data makes a profile
    • +
    • Profile is additionally extended by related entities
    • +
    • Callbacks for data correlation and fusion should happen here
    • +
    • Save the complete results into database as snapshots
    • +
    +
  • +
+ + + +
+ + + + + + + + +
+ + + +

+ SnapShooter + + +

+
SnapShooter(db: EntityDatabase, task_queue_writer: TaskQueueWriter, task_executor: TaskExecutor, platform_config: PlatformConfig, scheduler: Scheduler) -> None
+
+ +
+ + +

Class responsible for creating entity snapshots.

+ + +
+ Source code in dp3/snapshots/snapshooter.py +
def __init__(
+    self,
+    db: EntityDatabase,
+    task_queue_writer: TaskQueueWriter,
+    task_executor: TaskExecutor,
+    platform_config: PlatformConfig,
+    scheduler: Scheduler,
+) -> None:
+    self.log = logging.getLogger("SnapShooter")
+
+    self.db = db
+    self.task_queue_writer = task_queue_writer
+    self.model_spec = platform_config.model_spec
+    self.entity_relation_attrs = defaultdict(dict)
+    for (entity, attr), _ in self.model_spec.relations.items():
+        self.entity_relation_attrs[entity][attr] = True
+    for entity in self.model_spec.entities:
+        self.entity_relation_attrs[entity]["_id"] = True
+
+    self.worker_index = platform_config.process_index
+    self.worker_cnt = platform_config.num_processes
+    self.config = SnapShooterConfig.parse_obj(platform_config.config.get("snapshots"))
+
+    self._timeseries_hooks = SnapshotTimeseriesHookContainer(self.log, self.model_spec)
+    self._correlation_hooks = SnapshotCorrelationHookContainer(self.log, self.model_spec)
+
+    queue = f"{platform_config.app_name}-worker-{platform_config.process_index}-snapshots"
+    self.snapshot_queue_reader = TaskQueueReader(
+        callback=self.process_snapshot_task,
+        parse_task=Snapshot.parse_raw,
+        app_name=platform_config.app_name,
+        worker_index=platform_config.process_index,
+        rabbit_config=platform_config.config.get("processing_core.msg_broker", {}),
+        queue=queue,
+        priority_queue=queue,
+        parent_logger=self.log,
+    )
+
+    self.snapshot_entities = [
+        entity for entity, spec in self.model_spec.entities.items() if spec.snapshot
+    ]
+    self.log.info("Snapshots will be created for entities: %s", self.snapshot_entities)
+
+    # Register snapshot cache
+    for (entity, attr), spec in self.model_spec.relations.items():
+        if spec.t == AttrType.PLAIN:
+            task_executor.register_attr_hook(
+                "on_new_plain", self.add_to_link_cache, entity, attr
+            )
+        elif spec.t == AttrType.OBSERVATIONS:
+            task_executor.register_attr_hook(
+                "on_new_observation", self.add_to_link_cache, entity, attr
+            )
+
+    if platform_config.process_index != 0:
+        self.log.debug(
+            "Snapshot task creation will be disabled in this worker to avoid race conditions."
+        )
+        self.snapshot_queue_writer = None
+        return
+
+    self.snapshot_queue_writer = TaskQueueWriter(
+        platform_config.app_name,
+        platform_config.num_processes,
+        platform_config.config.get("processing_core.msg_broker"),
+        f"{platform_config.app_name}-main-snapshot-exchange",
+        parent_logger=self.log,
+    )
+
+    # Schedule snapshot period
+    snapshot_period = self.config.creation_rate
+    scheduler.register(self.make_snapshots, minute=f"*/{snapshot_period}")
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ start + + +

+
start()
+
+ +
+ +

Connect to RabbitMQ and start consuming from TaskQueue.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def start(self):
+    """Connect to RabbitMQ and start consuming from TaskQueue."""
+    self.log.info("Connecting to RabbitMQ")
+    self.snapshot_queue_reader.connect()
+    self.snapshot_queue_reader.check()  # check presence of needed queues
+    if self.snapshot_queue_writer is not None:
+        self.snapshot_queue_writer.connect()
+        self.snapshot_queue_writer.check()  # check presence of needed exchanges
+
+    self.snapshot_queue_reader.start()
+
+
+
+ +
+ +
+ + + +

+ stop + + +

+
stop()
+
+ +
+ +

Stop consuming from TaskQueue, disconnect from RabbitMQ.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def stop(self):
+    """Stop consuming from TaskQueue, disconnect from RabbitMQ."""
+    self.snapshot_queue_reader.stop()
+
+    if self.snapshot_queue_writer is not None:
+        self.snapshot_queue_writer.disconnect()
+    self.snapshot_queue_reader.disconnect()
+
+
+
+ +
+ +
+ + + +

+ register_timeseries_hook + + +

+
register_timeseries_hook(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)
+
+ +
+ +

Registers passed timeseries hook to be called during snapshot creation.

+

Binds hook to specified entity_type and attr_type (though same hook can be bound +multiple times).

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
hook + Callable[[str, str, list[dict]], list[DataPointTask]] + +
+

hook callable should expect entity_type, attr_type and attribute +history as arguments and return a list of DataPointTask objects.

+
+
+ required +
entity_type + str + +
+

specifies entity type

+
+
+ required +
attr_type + str + +
+

specifies attribute type

+
+
+ required +
+ +

Raises:

+ + + + + + + + + + + + + +
TypeDescription
+ ValueError + +
+

If entity_type and attr_type do not specify a valid timeseries attribute, +a ValueError is raised.

+
+
+ +
+ Source code in dp3/snapshots/snapshooter.py +
def register_timeseries_hook(
+    self,
+    hook: Callable[[str, str, list[dict]], list[DataPointTask]],
+    entity_type: str,
+    attr_type: str,
+):
+    """
+    Registers passed timeseries hook to be called during snapshot creation.
+
+    Binds hook to specified `entity_type` and `attr_type` (though same hook can be bound
+    multiple times).
+
+    Args:
+        hook: `hook` callable should expect entity_type, attr_type and attribute
+            history as arguments and return a list of `DataPointTask` objects.
+        entity_type: specifies entity type
+        attr_type: specifies attribute type
+
+    Raises:
+        ValueError: If entity_type and attr_type do not specify a valid timeseries attribute,
+            a ValueError is raised.
+    """
+    self._timeseries_hooks.register(hook, entity_type, attr_type)
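A hedged registration sketch (hypothetical entity and attribute names; assumes direct access to a `SnapShooter` instance, and that the hook may return an empty list when it has nothing to emit):

```
def my_timeseries_hook(entity_type: str, attr_type: str, attr_history: list) -> list:
    # Hypothetical analysis: only report how many timeseries rows were seen,
    # emit no new DataPointTask objects.
    print(f"{entity_type}.{attr_type}: {len(attr_history)} rows")
    return []

snapshooter.register_timeseries_hook(my_timeseries_hook, "ip", "bw_in")
```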
+
+
+
+ +
+ +
+ + + +

+ register_correlation_hook + + +

+
register_correlation_hook(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]])
+
+ +
+ +

Registers passed hook to be called during snapshot creation.

+

Binds hook to specified entity_type (though same hook can be bound multiple times).

+

entity_type and attribute specifications are validated; ValueError is raised on failure.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
hook + Callable[[str, dict], None] + +
+

hook callable should expect entity type as str +and its current values, including linked entities, as dict

+
+
+ required +
entity_type + str + +
+

specifies entity type

+
+
+ required +
depends_on + list[list[str]] + +
+

each item should specify an attribute that is depended on +in the form of a path from the specified entity_type to individual attributes +(even on linked entities).

+
+
+ required +
may_change + list[list[str]] + +
+

each item should specify an attribute that hook may change. +specification format is identical to depends_on.

+
+
+ required +
+ +

Raises:

+ + + + + + + + + + + + + +
TypeDescription
+ ValueError + +
+

On failure of specification validation.

+
+
+ +
+ Source code in dp3/snapshots/snapshooter.py +
def register_correlation_hook(
+    self,
+    hook: Callable[[str, dict], None],
+    entity_type: str,
+    depends_on: list[list[str]],
+    may_change: list[list[str]],
+):
+    """
+    Registers passed hook to be called during snapshot creation.
+
+    Binds hook to specified entity_type (though same hook can be bound multiple times).
+
+    `entity_type` and attribute specifications are validated; `ValueError` is raised on failure.
+
+    Args:
+        hook: `hook` callable should expect entity type as str
+            and its current values, including linked entities, as dict
+        entity_type: specifies entity type
+        depends_on: each item should specify an attribute that is depended on
+            in the form of a path from the specified entity_type to individual attributes
+            (even on linked entities).
+        may_change: each item should specify an attribute that `hook` may change.
+            specification format is identical to `depends_on`.
+
+    Raises:
+        ValueError: On failure of specification validation.
+    """
+    self._correlation_hooks.register(hook, entity_type, depends_on, may_change)
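A hedged sketch of a correlation hook (hypothetical `ip` attributes `open_ports` and `is_server`; assumes direct access to a `SnapShooter` instance):

```
def flag_servers(entity_type: str, values: dict) -> None:
    # Hypothetical rule: mark the profile as a server if port 80 or 443 is open.
    ports = values.get("open_ports") or []
    values["is_server"] = bool({80, 443} & set(ports))

snapshooter.register_correlation_hook(
    flag_servers,
    entity_type="ip",
    depends_on=[["open_ports"]],
    may_change=[["is_server"]],
)
```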
+
+
+
+ +
+ +
+ + + + +
add_to_link_cache(eid: str, dp: DataPointBase)
+
+ +
+ +

Adds the given (entity, eid) pair to the cache of all linked entities.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def add_to_link_cache(self, eid: str, dp: DataPointBase):
+    """Adds the given entity,eid pair to the cache of all linked entitites."""
+    cache = self.db.get_module_cache()
+    etype_to = self.model_spec.relations[dp.etype, dp.attr].relation_to
+    to_insert = [
+        {
+            "_id": f"{dp.etype}#{eid}",
+            "etype": dp.etype,
+            "eid": eid,
+            "expire_at": datetime.now() + timedelta(days=2),
+        },
+        {
+            "_id": f"{etype_to}#{dp.v.eid}",
+            "etype": etype_to,
+            "eid": dp.v.eid,
+            "expire_at": datetime.now() + timedelta(days=2),
+        },
+    ]
+    res = cache.bulk_write([ReplaceOne({"_id": x["_id"]}, x, upsert=True) for x in to_insert])
+    self.log.debug("Cached %s linked entities: %s", len(to_insert), res.bulk_api_result)
+
+
+
+ +
+ +
+ + + +

+ make_snapshots + + +

+
make_snapshots()
+
+ +
+ +

Creates snapshots for all entities currently active in database.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def make_snapshots(self):
+    """Creates snapshots for all entities currently active in database."""
+    time = datetime.now()
+
+    # distribute list of possibly linked entities to all workers
+    cached = self.get_cached_link_entity_ids()
+    self.log.debug("Broadcasting %s cached linked entities", len(cached))
+    self.snapshot_queue_writer.broadcast_task(
+        task=Snapshot(entities=cached, time=time, type=SnapshotMessageType.linked_entities)
+    )
+
+    # Load links only for a reduced set of entities
+    self.log.debug("Loading linked entities.")
+    self.db.save_metadata(time, {"task_creation_start": time, "entities": 0, "components": 0})
+    times = {}
+    counts = {"entities": 0, "components": 0}
+    try:
+        linked_entities = self.get_linked_entities(time, cached)
+        times["components_loaded"] = datetime.now()
+
+        for linked_entities_component in linked_entities:
+            counts["entities"] += len(linked_entities_component)
+            counts["components"] += 1
+
+            self.snapshot_queue_writer.put_task(
+                task=Snapshot(
+                    entities=linked_entities_component, time=time, type=SnapshotMessageType.task
+                )
+            )
+    except pymongo.errors.CursorNotFound as err:
+        self.log.exception(err)
+    finally:
+        times["task_creation_end"] = datetime.now()
+        self.db.update_metadata(
+            time,
+            metadata=times,
+            increase=counts,
+        )
+
+
+
+ +
+ +
+ + + +

+ get_linked_entities + + +

+
get_linked_entities(time: datetime, cached_linked_entities: list[tuple[str, str]])
+
+ +
+ +

Get weakly connected components from entity graph.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def get_linked_entities(self, time: datetime, cached_linked_entities: list[tuple[str, str]]):
+    """Get weakly connected components from entity graph."""
+    visited_entities = set()
+    entity_to_component = {}
+    linked_components = []
+    for etype, eid in cached_linked_entities:
+        master_record = self.db.get_master_record(
+            etype, eid, projection=self.entity_relation_attrs[etype]
+        ) or {"_id": eid}
+        if (etype, master_record["_id"]) not in visited_entities:
+            # Get entities linked by current entity
+            current_values = self.get_values_at_time(etype, master_record, time)
+            linked_entities = self.load_linked_entity_ids(etype, current_values, time)
+
+            # Set linked as visited
+            visited_entities.update(linked_entities)
+
+            # Update component
+            have_component = linked_entities & set(entity_to_component.keys())
+            if have_component:
+                for entity in have_component:
+                    component = entity_to_component[entity]
+                    component.update(linked_entities)
+                    entity_to_component.update(
+                        {entity: component for entity in linked_entities}
+                    )
+                    break
+            else:
+                entity_to_component.update(
+                    {entity: linked_entities for entity in linked_entities}
+                )
+                linked_components.append(linked_entities)
+    return linked_components
+
+
+
+ +
+ +
+ + + +

+ process_snapshot_task + + +

+
process_snapshot_task(msg_id, task: Snapshot)
+
+ +
+ +

Acknowledges the received message and makes a snapshot according to the task.

+

This function should not be called directly, but set as callback for TaskQueueReader.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def process_snapshot_task(self, msg_id, task: Snapshot):
+    """
+    Acknowledges the received message and makes a snapshot according to the `task`.
+
+    This function should not be called directly, but set as callback for TaskQueueReader.
+    """
+    self.snapshot_queue_reader.ack(msg_id)
+    if task.type == SnapshotMessageType.task:
+        self.make_snapshot(task)
+    elif task.type == SnapshotMessageType.linked_entities:
+        self.make_snapshots_by_hash(task)
+    else:
+        raise ValueError("Unknown SnapshotMessageType.")
+
+
+
+ +
+ +
+ + + +

+ make_snapshots_by_hash + + +

+
make_snapshots_by_hash(task: Snapshot)
+
+ +
+ +

Make snapshots for all entities with routing key belonging to this worker.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def make_snapshots_by_hash(self, task: Snapshot):
+    """
+    Make snapshots for all entities with routing key belonging to this worker.
+    """
+    self.log.debug("Creating snapshots for worker portion by hash.")
+    have_links = set(task.entities)
+    entity_cnt = 0
+    for etype in self.snapshot_entities:
+        records_cursor = self.db.get_worker_master_records(
+            self.worker_index,
+            self.worker_cnt,
+            etype,
+            no_cursor_timeout=True,
+        )
+        try:
+            snapshots = []
+            for master_record in records_cursor:
+                if (etype, master_record["_id"]) in have_links:
+                    continue
+                entity_cnt += 1
+                snapshots.append(self.make_linkless_snapshot(etype, master_record, task.time))
+
+                if len(snapshots) >= DB_SEND_CHUNK:
+                    self.db.save_snapshots(etype, snapshots, task.time)
+                    snapshots.clear()
+
+            if snapshots:
+                self.db.save_snapshots(etype, snapshots, task.time)
+                snapshots.clear()
+        finally:
+            records_cursor.close()
+    self.db.update_metadata(
+        task.time,
+        metadata={},
+        increase={"entities": entity_cnt, "components": entity_cnt},
+    )
+    self.log.debug("Worker snapshot creation done.")
+
+
+
+ +
+ +
+ + + +

+ make_linkless_snapshot + + +

+
make_linkless_snapshot(entity_type: str, master_record: dict, time: datetime)
+
+ +
+ +

Make a snapshot for given entity master_record and time.

+

Runs timeseries and correlation hooks. +The resulting snapshot is saved into DB.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def make_linkless_snapshot(self, entity_type: str, master_record: dict, time: datetime):
+    """
+    Make a snapshot for given entity `master_record` and `time`.
+
+    Runs timeseries and correlation hooks.
+    The resulting snapshot is saved into DB.
+    """
+    self.run_timeseries_processing(entity_type, master_record)
+    values = self.get_values_at_time(entity_type, master_record, time)
+    entity_values = {(entity_type, master_record["_id"]): values}
+
+    self._correlation_hooks.run(entity_values)
+
+    assert len(entity_values) == 1, "Expected a single entity."
+    for record in entity_values.values():
+        return record
+
+
+
+ +
+ +
+ + + +

+ make_snapshot + + +

+
make_snapshot(task: Snapshot)
+
+ +
+ +

Make a snapshot for entities and time specified by task.

+

Runs timeseries and correlation hooks. +The resulting snapshots are saved into DB.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def make_snapshot(self, task: Snapshot):
+    """
+    Make a snapshot for entities and time specified by `task`.
+
+    Runs timeseries and correlation hooks.
+    The resulting snapshots are saved into DB.
+    """
+    entity_values = {}
+    for entity_type, entity_id in task.entities:
+        record = self.db.get_master_record(entity_type, entity_id) or {"_id": entity_id}
+        self.run_timeseries_processing(entity_type, record)
+        values = self.get_values_at_time(entity_type, record, task.time)
+        entity_values[entity_type, entity_id] = values
+
+    self.link_loaded_entities(entity_values)
+    self._correlation_hooks.run(entity_values)
+
+    # unlink entities again
+    for rtype_rid, record in entity_values.items():
+        rtype, rid = rtype_rid
+        for attr, value in record.items():
+            if (rtype, attr) not in self.model_spec.relations:
+                continue
+            if self.model_spec.relations[rtype, attr].multi_value:
+                record[attr] = [
+                    {k: v for k, v in link_dict.items() if k != "record"} for link_dict in value
+                ]
+            else:
+                record[attr] = {k: v for k, v in value.items() if k != "record"}
+
+    for rtype_rid, record in entity_values.items():
+        self.db.save_snapshot(rtype_rid[0], record, task.time)
+
+
+
+ +
+ +
+ + + +

+ run_timeseries_processing + + +

+
run_timeseries_processing(entity_type, master_record)
+
+ +
+ +
    +
  • all registered timeseries processing modules must be called
  • +
  • this should result in observations or plain datapoints, which will be saved to db + and forwarded in processing
  • +
+ +
+ Source code in dp3/snapshots/snapshooter.py +
def run_timeseries_processing(self, entity_type, master_record):
+    """
+    - all registered timeseries processing modules must be called
+      - this should result in `observations` or `plain` datapoints, which will be saved to db
+        and forwarded in processing
+    """
+    tasks = []
+    for attr, attr_spec in self.model_spec.entity_attributes[entity_type].items():
+        if attr_spec.t == AttrType.TIMESERIES and attr in master_record:
+            new_tasks = self._timeseries_hooks.run(entity_type, attr, master_record[attr])
+            tasks.extend(new_tasks)
+
+    self.extend_master_record(entity_type, master_record, tasks)
+    for task in tasks:
+        self.task_queue_writer.put_task(task)
+
+
+
+ +
+ +
+ + + +

+ extend_master_record + + + + staticmethod + + +

+
extend_master_record(etype, master_record, new_tasks: list[DataPointTask])
+
+ +
+ +

Update existing master record with datapoints from new tasks

+ +
+ Source code in dp3/snapshots/snapshooter.py +
@staticmethod
+def extend_master_record(etype, master_record, new_tasks: list[DataPointTask]):
+    """Update existing master record with datapoints from new tasks"""
+    for task in new_tasks:
+        for datapoint in task.data_points:
+            if datapoint.etype != etype:
+                continue
+            dp_dict = datapoint.dict(include={"v", "t1", "t2", "c"})
+            if datapoint.attr in master_record:
+                master_record[datapoint.attr].append(dp_dict)
+            else:
+                master_record[datapoint.attr] = [dp_dict]
+
+
+
+ +
+ +
+ + + +

+ load_linked_entity_ids + + +

+
load_linked_entity_ids(entity_type: str, current_values: dict, time: datetime)
+
+ +
+ +

Loads the subgraph of entities linked to the current entity, +returns a list of their types and ids.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def load_linked_entity_ids(self, entity_type: str, current_values: dict, time: datetime):
+    """
+    Loads the subgraph of entities linked to the current entity,
+    returns a list of their types and ids.
+    """
+    loaded_entity_ids = {(entity_type, current_values["eid"])}
+    linked_entity_ids_to_process = (
+        self.get_linked_entity_ids(entity_type, current_values) - loaded_entity_ids
+    )
+
+    while linked_entity_ids_to_process:
+        entity_identifiers = linked_entity_ids_to_process.pop()
+        linked_etype, linked_eid = entity_identifiers
+        relevant_attributes = self.entity_relation_attrs[linked_etype]
+        record = self.db.get_master_record(
+            linked_etype, linked_eid, projection=relevant_attributes
+        ) or {"_id": linked_eid}
+        linked_values = self.get_values_at_time(linked_etype, record, time)
+
+        linked_entity_ids_to_process.update(
+            self.get_linked_entity_ids(entity_type, linked_values) - set(loaded_entity_ids)
+        )
+        loaded_entity_ids.add((linked_etype, linked_eid))
+
+    return loaded_entity_ids
+
+
+
+ +
+ +
+ + + +

+ get_linked_entity_ids + + +

+
get_linked_entity_ids(entity_type: str, current_values: dict) -> set[tuple[str, str]]
+
+ +
+ +

Returns a set of tuples (entity_type, entity_id) identifying entities linked by +current_values.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def get_linked_entity_ids(self, entity_type: str, current_values: dict) -> set[tuple[str, str]]:
+    """
+    Returns a set of tuples (entity_type, entity_id) identifying entities linked by
+    `current_values`.
+    """
+    related_entity_ids = set()
+    for attr, val in current_values.items():
+        if (entity_type, attr) not in self.model_spec.relations:
+            continue
+        attr_spec = self.model_spec.relations[entity_type, attr]
+        if attr_spec.multi_value:
+            related_entity_ids.update((attr_spec.relation_to, v["eid"]) for v in val)
+        else:
+            related_entity_ids.add((attr_spec.relation_to, val["eid"]))
+    return related_entity_ids
+
+
+
+ +
+ +
+ + + +

+ get_value_at_time + + +

+
get_value_at_time(attr_spec: AttrSpecObservations, attr_history: AttrSpecObservations, time: datetime) -> tuple[Any, float]
+
+ +
+ +

Get current value of an attribute from its history. Assumes multi_value = False.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def get_value_at_time(
+    self, attr_spec: AttrSpecObservations, attr_history, time: datetime
+) -> tuple[Any, float]:
+    """Get current value of an attribute from its history. Assumes `multi_value = False`."""
+    return max(
+        (
+            (point["v"], self.extrapolate_confidence(point, time, attr_spec.history_params))
+            for point in attr_history
+        ),
+        key=lambda val_conf: val_conf[1],
+        default=(None, 0.0),
+    )
+
+
+
+ +
+ +
+ + + +

+ get_multi_value_at_time + + +

+
get_multi_value_at_time(attr_spec: AttrSpecObservations, attr_history: AttrSpecObservations, time: datetime) -> tuple[list, list[float]]
+
+ +
+ +

Get current value of a multi_value attribute from its history.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
def get_multi_value_at_time(
+    self, attr_spec: AttrSpecObservations, attr_history, time: datetime
+) -> tuple[list, list[float]]:
+    """Get current value of a multi_value attribute from its history."""
+    if attr_spec.data_type.hashable:
+        values_with_confidence = defaultdict(float)
+        for point in attr_history:
+            value = point["v"]
+            confidence = self.extrapolate_confidence(point, time, attr_spec.history_params)
+            if confidence > 0.0 and values_with_confidence[value] < confidence:
+                values_with_confidence[value] = confidence
+        return list(values_with_confidence.keys()), list(values_with_confidence.values())
+    else:
+        values = []
+        confidence_list = []
+        for point in attr_history:
+            value = point["v"]
+            confidence = self.extrapolate_confidence(point, time, attr_spec.history_params)
+            if value in values:
+                i = values.index(value)
+                if confidence_list[i] < confidence:
+                    confidence_list[i] = confidence
+            elif confidence > 0.0:
+                values.append(value)
+                confidence_list.append(confidence)
+        return values, confidence_list
+
+
+
+ +
+ +
+ + + +

+ extrapolate_confidence + + + + staticmethod + + +

+
extrapolate_confidence(datapoint: dict, time: datetime, history_params: ObservationsHistoryParams) -> float
+
+ +
+ +

Get the confidence value at given time.

+ +
+ Source code in dp3/snapshots/snapshooter.py +
@staticmethod
+def extrapolate_confidence(
+    datapoint: dict, time: datetime, history_params: ObservationsHistoryParams
+) -> float:
+    """Get the confidence value at given time."""
+    t1 = datapoint["t1"]
+    t2 = datapoint["t2"]
+    base_confidence = datapoint["c"]
+
+    if time < t1:
+        if time <= t1 - history_params.pre_validity:
+            return 0.0
+        return base_confidence * (1 - (t1 - time) / history_params.pre_validity)
+    if time <= t2:
+        return base_confidence  # completely inside the (strict) interval
+    if time >= t2 + history_params.post_validity:
+        return 0.0
+    return base_confidence * (1 - (time - t2) / history_params.post_validity)
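A short worked example (hypothetical datapoint; a `SimpleNamespace` stands in for `ObservationsHistoryParams`): the confidence equals the base value inside `[t1, t2]` and decays linearly to zero over `post_validity`.

```
from datetime import datetime, timedelta
from types import SimpleNamespace

params = SimpleNamespace(pre_validity=timedelta(minutes=10), post_validity=timedelta(minutes=10))
dp = {"t1": datetime(2022, 8, 1, 12, 0), "t2": datetime(2022, 8, 1, 12, 30), "c": 0.8}

SnapShooter.extrapolate_confidence(dp, datetime(2022, 8, 1, 12, 15), params)  # 0.8 (inside interval)
SnapShooter.extrapolate_confidence(dp, datetime(2022, 8, 1, 12, 35), params)  # 0.8 * (1 - 5/10) = 0.4
SnapShooter.extrapolate_confidence(dp, datetime(2022, 8, 1, 12, 45), params)  # 0.0 (past post_validity)
```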
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/snapshots/snapshot_hooks/index.html b/reference/snapshots/snapshot_hooks/index.html new file mode 100644 index 00000000..9cecccbd --- /dev/null +++ b/reference/snapshots/snapshot_hooks/index.html @@ -0,0 +1,2663 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + snapshot_hooks - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.snapshots.snapshot_hooks + + +

+ +
+ +

Module managing registered hooks and their dependencies on one another.

+ + + +
+ + + + + + + + +
+ + + +

+ SnapshotTimeseriesHookContainer + + +

+
SnapshotTimeseriesHookContainer(log: logging.Logger, model_spec: ModelSpec)
+
+ +
+ + +

Container for timeseries analysis hooks

+ + +
+ Source code in dp3/snapshots/snapshot_hooks.py +
24
+25
+26
+27
+28
def __init__(self, log: logging.Logger, model_spec: ModelSpec):
+    self.log = log.getChild("TimeseriesHooks")
+    self.model_spec = model_spec
+
+    self._hooks = defaultdict(list)
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ register + + +

+
register(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)
+
+ +
+ +

Registers passed timeseries hook to be called during snapshot creation.

+

Binds hook to specified entity_type and attr_type (though same hook can be bound +multiple times). +If entity_type and attr_type do not specify a valid timeseries attribute, +a ValueError is raised.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
hook + Callable[[str, str, list[dict]], list[DataPointTask]] + +
+

hook callable should expect entity_type, attr_type and attribute +history as arguments and return a list of Task objects.

+
+
+ required +
entity_type + str + +
+

specifies entity type

+
+
+ required +
attr_type + str + +
+

specifies attribute type

+
+
+ required +
+ +
+ Source code in dp3/snapshots/snapshot_hooks.py +
30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
def register(
+    self,
+    hook: Callable[[str, str, list[dict]], list[DataPointTask]],
+    entity_type: str,
+    attr_type: str,
+):
+    """
+    Registers passed timeseries hook to be called during snapshot creation.
+
+    Binds hook to specified entity_type and attr_type (though same hook can be bound
+    multiple times).
+    If entity_type and attr_type do not specify a valid timeseries attribute,
+    a ValueError is raised.
+    Args:
+        hook: `hook` callable should expect entity_type, attr_type and attribute
+            history as arguments and return a list of `Task` objects.
+        entity_type: specifies entity type
+        attr_type: specifies attribute type
+    """
+    if (entity_type, attr_type) not in self.model_spec.attributes:
+        raise ValueError(f"Attribute '{attr_type}' of entity '{entity_type}' does not exist.")
+    spec = self.model_spec.attributes[entity_type, attr_type]
+    if spec.t != AttrType.TIMESERIES:
+        raise ValueError(f"'{entity_type}.{attr_type}' is not a timeseries, but '{spec.t}'")
+    self._hooks[entity_type, attr_type].append(hook)
+    self.log.debug(f"Added hook: '{hook.__qualname__}'")
+
+
+
+ +
+ +
+ + + +

+ run + + +

+
run(entity_type: str, attr_type: str, attr_history: list[dict]) -> list[DataPointTask]
+
+ +
+ +

Runs registered hooks.

+ +
+ Source code in dp3/snapshots/snapshot_hooks.py +
57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
def run(
+    self, entity_type: str, attr_type: str, attr_history: list[dict]
+) -> list[DataPointTask]:
+    """Runs registered hooks."""
+    tasks = []
+    for hook in self._hooks[entity_type, attr_type]:
+        try:
+            new_tasks = hook(entity_type, attr_type, attr_history)
+            tasks.extend(new_tasks)
+        except Exception as e:
+            self.log.error(f"Error during running hook {hook}: {e}")
+    return tasks
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+ SnapshotCorrelationHookContainer + + +

+
SnapshotCorrelationHookContainer(log: logging.Logger, model_spec: ModelSpec)
+
+ +
+ + +

Container for data fusion and correlation hooks.

+ + +
+ Source code in dp3/snapshots/snapshot_hooks.py +
74
+75
+76
+77
+78
+79
+80
def __init__(self, log: logging.Logger, model_spec: ModelSpec):
+    self.log = log.getChild("CorrelationHooks")
+    self.model_spec = model_spec
+
+    self._hooks: defaultdict[str, list[tuple[str, Callable]]] = defaultdict(list)
+
+    self._dependency_graph = DependencyGraph(self.log)
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ register + + +

+
register(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]]) -> str
+
+ +
+ +

Registers passed hook to be called during snapshot creation.

+

Binds hook to specified entity_type (though same hook can be bound multiple times).

+

entity_type and attribute specifications are validated; +a ValueError is raised on failure.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
hook + Callable[[str, dict], None] + +
+

hook callable should expect entity type as str +and its current values, including linked entities, as dict

+
+
+ required +
entity_type + str + +
+

specifies entity type

+
+
+ required +
depends_on + list[list[str]] + +
+

each item should specify an attribute that is depended on +in the form of a path from the specified entity_type to individual attributes +(even on linked entities).

+
+
+ required +
may_change + list[list[str]] + +
+

each item should specify an attribute that hook may change. +specification format is identical to depends_on.

+
+
+ required +
+ +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ str + +
+

Generated hook id.

+
+
+ +
+ Source code in dp3/snapshots/snapshot_hooks.py +
def register(
+    self,
+    hook: Callable[[str, dict], None],
+    entity_type: str,
+    depends_on: list[list[str]],
+    may_change: list[list[str]],
+) -> str:
+    """
+    Registers passed hook to be called during snapshot creation.
+
+    Binds hook to specified entity_type (though same hook can be bound multiple times).
+
+    entity_type and attribute specifications are validated;
+    a ValueError is raised on failure.
+    Args:
+        hook: `hook` callable should expect entity type as str
+            and its current values, including linked entities, as dict
+        entity_type: specifies entity type
+        depends_on: each item should specify an attribute that is depended on
+            in the form of a path from the specified entity_type to individual attributes
+            (even on linked entities).
+        may_change: each item should specify an attribute that `hook` may change.
+            specification format is identical to `depends_on`.
+    Returns:
+        Generated hook id.
+    """
+
+    if entity_type not in self.model_spec.entities:
+        raise ValueError(f"Entity '{entity_type}' does not exist.")
+
+    self._validate_attr_paths(entity_type, depends_on)
+    self._validate_attr_paths(entity_type, may_change)
+
+    depends_on = self._expand_path_backlinks(entity_type, depends_on)
+    may_change = self._expand_path_backlinks(entity_type, may_change)
+
+    depends_on = self._embed_base_entity(entity_type, depends_on)
+    may_change = self._embed_base_entity(entity_type, may_change)
+
+    hook_id = (
+        f"{hook.__qualname__}("
+        f"{entity_type}, [{','.join(depends_on)}], [{','.join(may_change)}]"
+        f")"
+    )
+    self._dependency_graph.add_hook_dependency(hook_id, depends_on, may_change)
+
+    self._hooks[entity_type].append((hook_id, hook))
+    self._restore_hook_order(self._hooks[entity_type])
+
+    self.log.debug(f"Added hook: '{hook_id}'")
+    return hook_id
+
+
+
+ +
+ +
+ + + +

+ run + + +

+
run(entities: dict)
+
+ +
+ +

Runs registered hooks.

+ +
+ Source code in dp3/snapshots/snapshot_hooks.py +
def run(self, entities: dict):
+    """Runs registered hooks."""
+    entity_types = {etype for etype, _ in entities}
+    hook_subset = [
+        (hook_id, hook, etype) for etype in entity_types for hook_id, hook in self._hooks[etype]
+    ]
+    topological_order = self._dependency_graph.topological_order
+    hook_subset.sort(key=lambda x: topological_order.index(x[0]))
+    entities_by_etype = {
+        etype_eid[0]: {etype_eid[1]: entity} for etype_eid, entity in entities.items()
+    }
+
+    for hook_id, hook, etype in hook_subset:
+        for eid, entity_values in entities_by_etype[etype].items():
+            self.log.debug("Running hook %s on entity %s", hook_id, eid)
+            hook(etype, entity_values)
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+ GraphVertex + + + + dataclass + + +

+ + +
+ + +

Vertex in a graph of dependencies

+ + + + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ DependencyGraph + + +

+
DependencyGraph(log)
+
+ +
+ + +

Class representing a graph of dependencies between correlation hooks.

+ + +
+ Source code in dp3/snapshots/snapshot_hooks.py +
def __init__(self, log):
+    self.log = log.getChild("DependencyGraph")
+
+    # dictionary of adjacency lists for each edge
+    self._vertices = defaultdict(GraphVertex)
+    self.topological_order = []
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ add_hook_dependency + + +

+
add_hook_dependency(hook_id: str, depends_on: list[str], may_change: list[str])
+
+ +
+ +

Add hook to dependency graph and recalculate if any cycles are created.

+ +
+ Source code in dp3/snapshots/snapshot_hooks.py +
def add_hook_dependency(self, hook_id: str, depends_on: list[str], may_change: list[str]):
+    """Add hook to dependency graph and recalculate if any cycles are created."""
+    if hook_id in self._vertices:
+        raise ValueError(f"Hook id '{hook_id}' already present in the vertices.")
+    for path in depends_on:
+        self.add_edge(path, hook_id)
+    for path in may_change:
+        self.add_edge(hook_id, path)
+    self._vertices[hook_id].type = "hook"
+    try:
+        self.topological_sort()
+    except ValueError as err:
+        raise ValueError(f"Hook {hook_id} introduces a circular dependency.") from err
+    self.check_multiple_writes()
+
+
+
+ +
+ +
+ + + +

+ add_edge + + +

+
add_edge(id_from: Hashable, id_to: Hashable)
+
+ +
+ +

Add oriented edge between specified vertices.

+ +
+ Source code in dp3/snapshots/snapshot_hooks.py +
def add_edge(self, id_from: Hashable, id_to: Hashable):
+    """Add oriented edge between specified vertices."""
+    self._vertices[id_from].adj.append(id_to)
+    # Ensure vertex with 'id_to' exists to avoid iteration errors later.
+    _ = self._vertices[id_to]
+
+
+
+ +
+ +
+ + + +

+ calculate_in_degrees + + +

+
calculate_in_degrees()
+
+ +
+ +

Calculate number of incoming edges for each vertex. Time complexity O(V + E).

+ +
+ Source code in dp3/snapshots/snapshot_hooks.py +
def calculate_in_degrees(self):
+    """Calculate number of incoming edges for each vertex. Time complexity O(V + E)."""
+    for vertex_node in self._vertices.values():
+        vertex_node.in_degree = 0
+
+    for vertex_node in self._vertices.values():
+        for adjacent_name in vertex_node.adj:
+            self._vertices[adjacent_name].in_degree += 1
+
+
+
+ +
+ +
+ + + +

+ topological_sort + + +

+
topological_sort()
+
+ +
+ +

Implementation of Kahn's algorithm for topological sorting. +Raises ValueError if there is a cycle in the graph.

+

See https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm

+ +
+ Source code in dp3/snapshots/snapshot_hooks.py +
def topological_sort(self):
+    """
+    Implementation of Kahn's algorithm for topological sorting.
+    Raises ValueError if there is a cycle in the graph.
+
+    See https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm
+    """
+    self.calculate_in_degrees()
+    queue = [(node_id, node) for node_id, node in self._vertices.items() if node.in_degree == 0]
+    topological_order = []
+    processed_vertices_cnt = 0
+
+    while queue:
+        curr_node_id, curr_node = queue.pop(0)
+        topological_order.append(curr_node_id)
+
+        # Decrease neighbouring nodes' in-degree by 1
+        for neighbor in curr_node.adj:
+            neighbor_node = self._vertices[neighbor]
+            neighbor_node.in_degree -= 1
+            # If in-degree becomes zero, add it to queue
+            if neighbor_node.in_degree == 0:
+                queue.append((neighbor, neighbor_node))
+
+        processed_vertices_cnt += 1
+
+    if processed_vertices_cnt != len(self._vertices):
+        raise ValueError("Dependency graph contains a cycle.")
+    else:
+        self.topological_order = topological_order
+        return topological_order
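For illustration, a small standalone sketch (hypothetical vertex names; a hook vertex is connected from the attributes it depends on and points to the attributes it may change):

```
import logging

graph = DependencyGraph(logging.getLogger("example"))
graph.add_hook_dependency("hook_a", depends_on=["ip/open_ports"], may_change=["ip/is_server"])
graph.add_hook_dependency("hook_b", depends_on=["ip/is_server"], may_change=["ip/label"])
print(graph.topological_order)
# e.g. ['ip/open_ports', 'hook_a', 'ip/is_server', 'hook_b', 'ip/label']
```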
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/task_processing/index.html b/reference/task_processing/index.html new file mode 100644 index 00000000..ab2fc7fa --- /dev/null +++ b/reference/task_processing/index.html @@ -0,0 +1,1545 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + task_processing - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.task_processing + + +

+ +
+ +

Module responsible for task +distribution, +processing and running configured +hooks. Task distribution is possible due to the +task queue.

+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/task_processing/task_distributor/index.html b/reference/task_processing/task_distributor/index.html new file mode 100644 index 00000000..b8ed5fe4 --- /dev/null +++ b/reference/task_processing/task_distributor/index.html @@ -0,0 +1,1994 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + task_distributor - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.task_processing.task_distributor + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ TaskDistributor + + +

+
TaskDistributor(task_executor: TaskExecutor, platform_config: PlatformConfig, registrar: CallbackRegistrar, daemon_stop_lock: threading.Lock) -> None
+
+ +
+ + +

TaskDistributor uses task queues to distribute tasks between all running processes.

+

Tasks are assigned to worker processes based on hash of entity key, so each +entity is always processed by the same worker. Therefore, all requests +modifying a particular entity are done sequentially and no locking is +necessary.

+

Tasks that are assigned to the current process are passed to task_executor for execution.
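The hashing itself is not shown on this page; conceptually it is a deterministic mapping of the entity key to a worker index, as in this illustrative sketch (not the actual dp3 implementation):

```
import hashlib

def worker_for(etype: str, eid: str, num_processes: int) -> int:
    """Illustrative only: deterministically map an entity key to a worker index."""
    digest = hashlib.sha1(f"{etype}:{eid}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_processes
```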

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
platform_config + PlatformConfig + +
+

Platform config

+
+
+ required +
task_executor + TaskExecutor + +
+

Instance of TaskExecutor

+
+
+ required +
registrar + CallbackRegistrar + +
+

Interface for callback registration

+
+
+ required +
daemon_stop_lock + threading.Lock + +
+

Lock used to control when the program stops. (see dp3.worker)

+
+
+ required +
+ + +
+ Source code in dp3/task_processing/task_distributor.py +
34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
def __init__(
+    self,
+    task_executor: TaskExecutor,
+    platform_config: PlatformConfig,
+    registrar: CallbackRegistrar,
+    daemon_stop_lock: threading.Lock,
+) -> None:
+    assert (
+        0 <= platform_config.process_index < platform_config.num_processes
+    ), "process index must be smaller than number of processes"
+
+    self.log = logging.getLogger("TaskDistributor")
+
+    self.process_index = platform_config.process_index
+    self.num_processes = platform_config.num_processes
+    self.model_spec = platform_config.model_spec
+    self.daemon_stop_lock = daemon_stop_lock
+
+    self.rabbit_params = platform_config.config.get("processing_core.msg_broker", {})
+
+    self.entity_types = list(
+        platform_config.config.get("db_entities").keys()
+    )  # List of configured entity types
+
+    self.running = False
+
+    # List of worker threads for processing the update requests
+    self._worker_threads = []
+    self.num_threads = platform_config.config.get("processing_core.worker_threads", 8)
+
+    # Internal queues for each worker
+    self._queues = [queue.Queue(10) for _ in range(self.num_threads)]
+
+    # Connections to main task queue
+    # Reader - reads tasks from a pair of queues (one pair per process)
+    # and distributes them to worker threads
+    self._task_queue_reader = TaskQueueReader(
+        callback=self._distribute_task,
+        parse_task=lambda body: DataPointTask(model_spec=self.model_spec, **json.loads(body)),
+        app_name=platform_config.app_name,
+        worker_index=self.process_index,
+        rabbit_config=self.rabbit_params,
+    )
+    # Writer - allows modules to write new tasks
+    self._task_queue_writer = TaskQueueWriter(
+        platform_config.app_name, self.num_processes, self.rabbit_params
+    )
+    self.task_executor = task_executor
+    # Object to store thread-local data (e.g. worker-thread index)
+    # (each thread sees different object contents)
+    self._current_thread_data = threading.local()
+
+    # Number of restarts of threads by watchdog
+    self._watchdog_restarts = 0
+    # Register watchdog to scheduler
+    registrar.scheduler_register(self._watchdog, second="*/30")
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ start + + +

+
start() -> None
+
+ +
+ +

Run the worker threads and start consuming from TaskQueue.

+ +
+ Source code in dp3/task_processing/task_distributor.py +
def start(self) -> None:
+    """Run the worker threads and start consuming from TaskQueue."""
+    self.log.info("Connecting to RabbitMQ")
+    self._task_queue_reader.connect()
+    self._task_queue_reader.check()  # check presence of needed queues
+    self._task_queue_writer.connect()
+    self._task_queue_writer.check()  # check presence of needed exchanges
+
+    self.log.info(f"Starting {self.num_threads} worker threads")
+    self.running = True
+    self._worker_threads = [
+        threading.Thread(
+            target=self._worker_func, args=(i,), name=f"Worker-{self.process_index}-{i}"
+        )
+        for i in range(self.num_threads)
+    ]
+    for worker in self._worker_threads:
+        worker.start()
+
+    self.log.info("Starting consuming tasks from main queue")
+    self._task_queue_reader.start()
+
+
+
+ +
+ +
+ + + +

+ stop + + +

+
stop() -> None
+
+ +
+ +

Stop the worker threads.

+ +
+ Source code in dp3/task_processing/task_distributor.py +
def stop(self) -> None:
+    """Stop the worker threads."""
+    self.log.info("Waiting for worker threads to finish their current tasks ...")
+    # Thread for printing debug messages about worker status
+    threading.Thread(target=self._dbg_worker_status_print, daemon=True).start()
+
+    # Stop receiving new tasks from global queue
+    self._task_queue_reader.stop()
+
+    # Signalize stop to worker threads
+    self.running = False
+
+    # Wait until all workers stopped
+    for worker in self._worker_threads:
+        worker.join()
+
+    self._task_queue_reader.disconnect()
+    self._task_queue_writer.disconnect()
+
+    # Cleanup
+    self._worker_threads = []
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/task_processing/task_executor/index.html b/reference/task_processing/task_executor/index.html new file mode 100644 index 00000000..f5a2fa21 --- /dev/null +++ b/reference/task_processing/task_executor/index.html @@ -0,0 +1,2183 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + task_executor - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.task_processing.task_executor + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ TaskExecutor + + +

+
TaskExecutor(db: EntityDatabase, platform_config: PlatformConfig) -> None
+
+ +
+ + +

TaskExecutor manages updates of entity records, +which are being read from task queue (via parent +TaskDistributor)

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
db + EntityDatabase + +
+

Instance of EntityDatabase

+
+
+ required +
platform_config + PlatformConfig + +
+

Current platform configuration.

+
+
+ required +
+ + +
+ Source code in dp3/task_processing/task_executor.py +
28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
def __init__(
+    self,
+    db: EntityDatabase,
+    platform_config: PlatformConfig,
+) -> None:
+    # initialize task distribution
+
+    self.log = logging.getLogger("TaskExecutor")
+
+    # Get list of configured entity types
+    self.entity_types = list(platform_config.model_spec.entities.keys())
+    self.log.debug(f"Configured entity types: {self.entity_types}")
+
+    self.model_spec = platform_config.model_spec
+    self.db = db
+
+    # EventCountLogger
+    # - count number of events across multiple processes using shared counters in Redis
+    ecl = EventCountLogger(
+        platform_config.config.get("event_logging.groups"),
+        platform_config.config.get("event_logging.redis"),
+    )
+    self.elog = ecl.get_group("te") or DummyEventGroup()
+    self.elog_by_src = ecl.get_group("tasks_by_src") or DummyEventGroup()
+    # Print warning if some event group is not configured
+    not_configured_groups = []
+    if isinstance(self.elog, DummyEventGroup):
+        not_configured_groups.append("te")
+    if isinstance(self.elog_by_src, DummyEventGroup):
+        not_configured_groups.append("tasks_by_src")
+    if not_configured_groups:
+        self.log.warning(
+            "EventCountLogger: No configuration for event group(s) "
+            f"'{','.join(not_configured_groups)}' found, "
+            "such events will not be logged (check event_logging.yml)"
+        )
+
+    # Hooks
+    self._task_generic_hooks = TaskGenericHooksContainer(self.log)
+    self._task_entity_hooks = {}
+    self._task_attr_hooks = {}
+
+    for entity in self.model_spec.entities:
+        self._task_entity_hooks[entity] = TaskEntityHooksContainer(entity, self.log)
+
+    for entity, attr in self.model_spec.attributes:
+        attr_type = self.model_spec.attributes[entity, attr].t
+        self._task_attr_hooks[entity, attr] = TaskAttrHooksContainer(
+            entity, attr, attr_type, self.log
+        )
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ register_task_hook + + +

+
register_task_hook(hook_type: str, hook: Callable)
+
+ +
+ +

Registers one of available task hooks

+

See: TaskGenericHooksContainer +in task_hooks.py

+ +
+ Source code in dp3/task_processing/task_executor.py +
79
+80
+81
+82
+83
+84
+85
def register_task_hook(self, hook_type: str, hook: Callable):
+    """Registers one of available task hooks
+
+    See: [`TaskGenericHooksContainer`][dp3.task_processing.task_hooks.TaskGenericHooksContainer]
+    in `task_hooks.py`
+    """
+    self._task_generic_hooks.register(hook_type, hook)
+
+
+
+ +
+ +
+ + + +

+ register_entity_hook + + +

+
register_entity_hook(hook_type: str, hook: Callable, entity: str)
+
+ +
+ +

Registers one of available task entity hooks

+

See: TaskEntityHooksContainer +in task_hooks.py

+ +
+ Source code in dp3/task_processing/task_executor.py +
87
+88
+89
+90
+91
+92
+93
def register_entity_hook(self, hook_type: str, hook: Callable, entity: str):
+    """Registers one of available task entity hooks
+
+    See: [`TaskEntityHooksContainer`][dp3.task_processing.task_hooks.TaskEntityHooksContainer]
+    in `task_hooks.py`
+    """
+    self._task_entity_hooks[entity].register(hook_type, hook)
+
+
+
+ +
+ +
+ + + +

+ register_attr_hook + + +

+
register_attr_hook(hook_type: str, hook: Callable, entity: str, attr: str)
+
+ +
+ +

Registers one of available task attribute hooks

+

See: TaskAttrHooksContainer +in task_hooks.py

+ +
+ Source code in dp3/task_processing/task_executor.py +
def register_attr_hook(self, hook_type: str, hook: Callable, entity: str, attr: str):
+    """Registers one of available task attribute hooks
+
+    See: [`TaskAttrHooksContainer`][dp3.task_processing.task_hooks.TaskAttrHooksContainer]
+    in `task_hooks.py`
+    """
+    self._task_attr_hooks[entity, attr].register(hook_type, hook)
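Given a TaskExecutor instance (for example the one created in dp3.worker), registering an attribute hook might look like the sketch below; the entity and attribute names are placeholders and the hook body is illustrative only. Per TaskAttrHooksContainer, an on_new_observation hook receives the eid and the data point and may return a list of new DataPointTask objects.

```python
# Hedged sketch -- "ip" / "open_ports" are placeholder entity/attribute names.
def on_open_ports(eid, dp):
    # dp is the newly received observation data point; derive follow-up tasks here.
    return []  # optionally return a list of DataPointTask objects

task_executor.register_attr_hook("on_new_observation", on_open_ports, "ip", "open_ports")
```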
+
+
+
+ +
+ +
+ + + +

+ process_task + + +

+
process_task(task: DataPointTask) -> tuple[bool, list[DataPointTask]]
+
+ +
+ +

Main processing function - push datapoint values, running all registered hooks.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
task + DataPointTask + +
+

Task object to process.

+
+
+ required +
+ +

Returns:

+ + + + + + + + + + + + + + + + + +
TypeDescription
+ bool + +
+

True if a new record was created, False otherwise,

+
+
+ list[DataPointTask] + +
+

and a list of new tasks created by hooks

+
+
+ +
+ Source code in dp3/task_processing/task_executor.py +
def process_task(self, task: DataPointTask) -> tuple[bool, list[DataPointTask]]:
+    """
+    Main processing function - push datapoint values, running all registered hooks.
+
+    Args:
+        task: Task object to process.
+    Returns:
+        True if a new record was created, False otherwise,
+        and a list of new tasks created by hooks
+    """
+    self.log.debug(f"Received new task {task.etype}/{task.eid}, starting processing!")
+
+    new_tasks = []
+
+    # Run on_task_start hook
+    self._task_generic_hooks.run_on_start(task)
+
+    # Check existence of etype
+    if task.etype not in self.entity_types:
+        self.log.error(f"Task {task.etype}/{task.eid}: Unknown entity type!")
+        self.elog.log("task_processing_error")
+        return False, new_tasks
+
+    # Check existence of eid
+    try:
+        ekey_exists = self.db.ekey_exists(task.etype, task.eid)
+    except DatabaseError as e:
+        self.log.error(f"Task {task.etype}/{task.eid}: DB error: {e}")
+        self.elog.log("task_processing_error")
+        return False, new_tasks
+
+    new_entity = not ekey_exists
+    if new_entity:
+        # Run allow_entity_creation hook
+        if not self._task_entity_hooks[task.etype].run_allow_creation(task.eid, task):
+            self.log.debug(
+                f"Task {task.etype}/{task.eid}: hooks decided not to create new eid record"
+            )
+            return False, new_tasks
+
+        # Run on_entity_creation hook
+        new_tasks += self._task_entity_hooks[task.etype].run_on_creation(task.eid, task)
+
+    # Insert into database
+    try:
+        self.db.insert_datapoints(task.etype, task.eid, task.data_points, new_entity=new_entity)
+        self.log.debug(f"Task {task.etype}/{task.eid}: All changes written to DB")
+    except DatabaseError as e:
+        self.log.error(f"Task {task.etype}/{task.eid}: DB error: {e}")
+        self.elog.log("task_processing_error")
+        return False, new_tasks
+
+    # Run attribute hooks
+    for dp in task.data_points:
+        new_tasks += self._task_attr_hooks[dp.etype, dp.attr].run_on_new(dp.eid, dp)
+
+    # Log the processed task
+    self.elog.log("task_processed")
+    for dp in task.data_points:
+        if dp.src:
+            self.elog_by_src.log(dp.src)
+    if new_entity:
+        self.elog.log("record_created")
+
+    self.log.debug(f"Secondary modules created {len(new_tasks)} new tasks.")
+
+    return new_entity, new_tasks
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/task_processing/task_hooks/index.html b/reference/task_processing/task_hooks/index.html new file mode 100644 index 00000000..fdf98de4 --- /dev/null +++ b/reference/task_processing/task_hooks/index.html @@ -0,0 +1,1814 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + task_hooks - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.task_processing.task_hooks + + +

+ +
+ + + +
+ + + + + + + + +
+ + + +

+ TaskGenericHooksContainer + + +

+
TaskGenericHooksContainer(log: logging.Logger)
+
+ +
+ + +

Container for generic hooks

+

Possible hooks:

+
    +
  • on_task_start: receives Task, no return value requirements
  • +
+ + +
+ Source code in dp3/task_processing/task_hooks.py +
17
+18
+19
+20
def __init__(self, log: logging.Logger):
+    self.log = log.getChild("genericHooks")
+
+    self._on_start = []
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ TaskEntityHooksContainer + + +

+
TaskEntityHooksContainer(entity: str, log: logging.Logger)
+
+ +
+ + +

Container for entity hooks

+

Possible hooks:

+
    +
  • allow_entity_creation: receives eid and Task, may prevent entity record creation (by + returning False)
  • +
  • on_entity_creation: receives eid and Task, may return list of DataPointTasks
  • +
+ + +
+ Source code in dp3/task_processing/task_hooks.py +
49
+50
+51
+52
+53
+54
def __init__(self, entity: str, log: logging.Logger):
+    self.entity = entity
+    self.log = log.getChild(f"entityHooks.{entity}")
+
+    self._allow_creation = []
+    self._on_creation = []
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ +
+ + + +

+ TaskAttrHooksContainer + + +

+
TaskAttrHooksContainer(entity: str, attr: str, attr_type: AttrType, log: logging.Logger)
+
+ +
+ + +

Container for attribute hooks

+

Possible hooks:

+
    +
  • on_new_plain, on_new_observation, on_new_ts_chunk: + receives eid and DataPointBase, may return a list of DataPointTasks
  • +
+ + +
+ Source code in dp3/task_processing/task_hooks.py +
def __init__(self, entity: str, attr: str, attr_type: AttrType, log: logging.Logger):
+    self.entity = entity
+    self.attr = attr
+    self.log = log.getChild(f"attributeHooks.{entity}.{attr}")
+
+    if attr_type == AttrType.PLAIN:
+        self.on_new_hook_type = "on_new_plain"
+    elif attr_type == AttrType.OBSERVATIONS:
+        self.on_new_hook_type = "on_new_observation"
+    elif attr_type == AttrType.TIMESERIES:
+        self.on_new_hook_type = "on_new_ts_chunk"
+    else:
+        raise ValueError(f"Invalid attribute type '{attr_type}'")
+
+    self._on_new = []
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +
+ + + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/task_processing/task_queue/index.html b/reference/task_processing/task_queue/index.html new file mode 100644 index 00000000..b7214bfc --- /dev/null +++ b/reference/task_processing/task_queue/index.html @@ -0,0 +1,3086 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + task_queue - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.task_processing.task_queue + + +

+ +
+ +

Functions to work with the main task queue (RabbitMQ)

+

There are two queues for each worker process: +- "normal" queue for tasks added by other components, this has a limit of 100 + tasks. +- "priority" one for tasks added by workers themselves, this has no limit since + workers mustn't be stopped by waiting for the queue.

+

These queues are presented as a single one by this wrapper. +The TaskQueueReader first looks into the "priority" queue and only if there +is no task waiting, it reads the normal one.

+

Tasks are distributed to worker processes (and threads) by hash of the entity +which is to be modified. The destination queue is decided by the message source, +so each source must know how many worker processes are there.

+

Exchange and queues must be declared externally!

+

Related configuration keys and their defaults: +(should be part of global DP3 config files) +

rabbitmq:
+  host: localhost
+  port: 5672
+  virtual_host: /
+  username: guest
+  password: guest
+
+worker_processes: 1
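As an illustration of the writer side described above, the sketch below creates a TaskQueueWriter with the documented constructor arguments and publishes one task. It is only a sketch under assumptions: "my_app", the worker count and the connection parameters are placeholders (in DP3 they normally come from the platform configuration, e.g. processing_core.msg_broker), and `task` stands for any prepared object implementing the Task interface (as_message(), routing_key()), such as a DataPointTask.

```python
from dp3.task_processing.task_queue import TaskQueueWriter

# Placeholder connection parameters -- normally taken from the platform config.
rabbit_cfg = {"host": "localhost", "port": 5672, "username": "guest", "password": "guest"}

writer = TaskQueueWriter("my_app", workers=2, rabbit_config=rabbit_cfg)
writer.connect()
writer.check()  # verify that the exchanges were declared externally

# `task` is assumed to be a prepared Task (e.g. a DataPointTask), built elsewhere.
writer.put_task(task)        # routed to one worker by hash of the entity key
writer.broadcast_task(task)  # or delivered to every worker
writer.disconnect()
```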
+

+ + + +
+ + + + + + + + +
+ + + +

+ RobustAMQPConnection + + +

+
RobustAMQPConnection(rabbit_config: dict = None) -> None
+
+ +
+ + +

Common TaskQueue wrapper, handles connection to RabbitMQ server with automatic reconnection. +TaskQueueWriter and TaskQueueReader are derived from this.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
rabbit_config + dict + +
+

RabbitMQ connection parameters, dict with following keys (all optional): +host, port, virtual_host, username, password

+
+
+ None +
+ + +
+ Source code in dp3/task_processing/task_queue.py +
def __init__(self, rabbit_config: dict = None) -> None:
+    rabbit_config = {} if rabbit_config is None else rabbit_config
+    self.log = logging.getLogger("RobustAMQPConnection")
+    self.conn_params = {
+        "hostname": rabbit_config.get("host", "localhost"),
+        "port": int(rabbit_config.get("port", 5672)),
+        "virtual_host": rabbit_config.get("virtual_host", "/"),
+        "username": rabbit_config.get("username", "guest"),
+        "password": rabbit_config.get("password", "guest"),
+    }
+    self.connection = None
+    self.channel = None
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ connect + + +

+
connect() -> None
+
+ +
+ +

Create a connection (or reconnect after error).

+

If connection can't be established, try it again indefinitely.

+ +
+ Source code in dp3/task_processing/task_queue.py +
def connect(self) -> None:
+    """Create a connection (or reconnect after error).
+
+    If connection can't be established, try it again indefinitely.
+    """
+    if self.connection:
+        self.connection.close()
+    attempts = 0
+    while True:
+        attempts += 1
+        try:
+            self.connection = amqpstorm.Connection(**self.conn_params)
+            self.log.debug(
+                "AMQP connection created, server: "
+                "'{hostname}:{port}/{virtual_host}'".format_map(self.conn_params)
+            )
+            if attempts > 1:
+                # This was a repeated attempt, print success message with ERROR level
+                self.log.error("... it's OK now, we're successfully connected!")
+
+            self.channel = self.connection.channel()
+            self.channel.confirm_deliveries()
+            self.channel.basic.qos(PREFETCH_COUNT)
+            break
+        except amqpstorm.AMQPError as e:
+            sleep_time = RECONNECT_DELAYS[min(attempts, len(RECONNECT_DELAYS)) - 1]
+            self.log.error(
+                f"RabbitMQ connection error (will try to reconnect in {sleep_time}s): {e}"
+            )
+            time.sleep(sleep_time)
+        except KeyboardInterrupt:
+            break
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+ TaskQueueWriter + + +

+
TaskQueueWriter(app_name: str, workers: int = 1, rabbit_config: dict = None, exchange: str = None, priority_exchange: str = None, parent_logger: logging.Logger = None) -> None
+
+ +
+

+ Bases: RobustAMQPConnection

+ + +

Writes tasks into main Task Queue

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
app_name + str + +
+

DP3 application name (used as prefix for RMQ queues and exchanges)

+
+
+ required +
workers + int + +
+

Number of worker processes in the system

+
+
+ 1 +
rabbit_config + dict + +
+

RabbitMQ connection parameters, dict with following keys (all optional): +host, port, virtual_host, username, password

+
+
+ None +
exchange + str + +
+

Name of the exchange to write tasks to +(default: "<app-name>-main-task-exchange")

+
+
+ None +
priority_exchange + str + +
+

Name of the exchange to write priority tasks to +(default: "<app-name>-priority-task-exchange")

+
+
+ None +
parent_logger + logging.Logger + +
+

Logger to inherit prefix from.

+
+
+ None +
+ + +
+ Source code in dp3/task_processing/task_queue.py +
def __init__(
+    self,
+    app_name: str,
+    workers: int = 1,
+    rabbit_config: dict = None,
+    exchange: str = None,
+    priority_exchange: str = None,
+    parent_logger: logging.Logger = None,
+) -> None:
+    rabbit_config = {} if rabbit_config is None else rabbit_config
+    assert isinstance(workers, int) and workers >= 1, "count of workers must be positive number"
+    assert isinstance(exchange, str) or exchange is None, "exchange argument has to be string!"
+    assert (
+        isinstance(priority_exchange, str) or priority_exchange is None
+    ), "priority_exchange has to be string"
+
+    super().__init__(rabbit_config)
+
+    if parent_logger is not None:
+        self.log = parent_logger.getChild("TaskQueueWriter")
+    else:
+        self.log = logging.getLogger("TaskQueueWriter")
+
+    if exchange is None:
+        exchange = DEFAULT_EXCHANGE.format(app_name)
+    if priority_exchange is None:
+        priority_exchange = DEFAULT_PRIORITY_EXCHANGE.format(app_name)
+
+    self.workers = workers
+    self.exchange = exchange
+    self.exchange_pri = priority_exchange
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ check + + +

+
check() -> bool
+
+ +
+ +

Check that needed exchanges are declared, return True or raise RuntimeError.

+

If needed exchanges are not declared, reconnect and try again. (max 5 times)

+ +
+ Source code in dp3/task_processing/task_queue.py +
def check(self) -> bool:
+    """
+    Check that needed exchanges are declared, return True or raise RuntimeError.
+
+    If needed exchanges are not declared, reconnect and try again. (max 5 times)
+    """
+    for attempt, sleep_time in enumerate(RECONNECT_DELAYS):
+        if self.check_exchange_existence(self.exchange) and self.check_exchange_existence(
+            self.exchange_pri
+        ):
+            return True
+        self.log.warning(
+            "RabbitMQ exchange configuration doesn't match (attempt %d of %d, retrying in %ds)",
+            attempt + 1,
+            len(RECONNECT_DELAYS),
+            sleep_time,
+        )
+        time.sleep(sleep_time)
+        self.disconnect()
+        self.connect()
+    if not self.check_exchange_existence(self.exchange):
+        raise ExchangeNotDeclared(self.exchange)
+    if not self.check_exchange_existence(self.exchange_pri):
+        raise ExchangeNotDeclared(self.exchange_pri)
+    return True
+
+
+
+ +
+ +
+ + + +

+ broadcast_task + + +

+
broadcast_task(task: Task, priority: bool = False) -> None
+
+ +
+ +

Broadcast task to all workers

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
task + Task + +
+

prepared task

+
+
+ required +
priority + bool + +
+

if true, the task is placed into priority queue +(should only be used internally by workers)

+
+
+ False +
+ +
+ Source code in dp3/task_processing/task_queue.py +
def broadcast_task(self, task: Task, priority: bool = False) -> None:
+    """
+    Broadcast task to all workers
+
+    Args:
+        task: prepared task
+        priority: if true, the task is placed into priority queue
+            (should only be used internally by workers)
+    """
+    if not self.channel:
+        self.connect()
+
+    self.log.debug(f"Received new broadcast task: {task}")
+
+    body = task.as_message()
+    exchange = self.exchange_pri if priority else self.exchange
+
+    for routing_key in range(self.workers):
+        self._send_message(routing_key, exchange, body)
+
+
+
+ +
+ +
+ + + +

+ put_task + + +

+
put_task(task: Task, priority: bool = False) -> None
+
+ +
+ +

Put task (update_request) to the queue of corresponding worker

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
task + Task + +
+

prepared task

+
+
+ required +
priority + bool + +
+

if true, the task is placed into priority queue +(should only be used internally by workers)

+
+
+ False +
+ +
+ Source code in dp3/task_processing/task_queue.py +
def put_task(self, task: Task, priority: bool = False) -> None:
+    """
+    Put task (update_request) to the queue of corresponding worker
+
+    Args:
+        task: prepared task
+        priority: if true, the task is placed into priority queue
+            (should only be used internally by workers)
+    """
+    if not self.channel:
+        self.connect()
+
+    self.log.debug(f"Received new task: {task}")
+
+    # Prepare routing key
+    body = task.as_message()
+    key = task.routing_key()
+    routing_key = HASH(key) % self.workers  # index of the worker to send the task to
+
+    exchange = self.exchange_pri if priority else self.exchange
+    self._send_message(routing_key, exchange, body)
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+ TaskQueueReader + + +

+
TaskQueueReader(callback: Callable, parse_task: Callable[[str], Task], app_name: str, worker_index: int = 0, rabbit_config: dict = None, queue: str = None, priority_queue: str = None, parent_logger: logging.Logger = None) -> None
+
+ +
+

+ Bases: RobustAMQPConnection

+ + +

TaskQueueReader consumes messages from two RabbitMQ queues +(normal and priority one for given worker) +and passes them to the given callback function.

+

Tasks from the priority queue are passed before the normal ones.

+

Each received message must be acknowledged by calling .ack(msg_tag).

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
callback + Callable + +
+

Function called when a message is received, prototype: func(tag, Task)

+
+
+ required +
parse_task + Callable[[str], Task] + +
+

Function called to parse message body into a task, prototype: func(body) -> Task

+
+
+ required +
app_name + str + +
+

DP3 application name (used as prefix for RMQ queues and exchanges)

+
+
+ required +
worker_index + int + +
+

index of this worker +(filled into DEFAULT_QUEUE string using .format() method)

+
+
+ 0 +
rabbit_config + dict + +
+

RabbitMQ connection parameters, dict with following keys +(all optional): host, port, virtual_host, username, password

+
+
+ None +
queue + str + +
+

Name of RabbitMQ queue to read from (default: "<app-name>-worker-<index>")

+
+
+ None +
priority_queue + str + +
+

Name of RabbitMQ queue to read from (priority messages) +(default: "<app-name>-worker-<index>-pri")

+
+
+ None +
parent_logger + logging.Logger + +
+

Logger to inherit prefix from.

+
+
+ None +
+ + +
+ Source code in dp3/task_processing/task_queue.py +
def __init__(
+    self,
+    callback: Callable,
+    parse_task: Callable[[str], Task],
+    app_name: str,
+    worker_index: int = 0,
+    rabbit_config: dict = None,
+    queue: str = None,
+    priority_queue: str = None,
+    parent_logger: logging.Logger = None,
+) -> None:
+    rabbit_config = {} if rabbit_config is None else rabbit_config
+    assert callable(callback), "callback must be callable object"
+    assert (
+        isinstance(worker_index, int) and worker_index >= 0
+    ), "worker_index must be positive number"
+    assert isinstance(queue, str) or queue is None, "queue must be string"
+    assert (
+        isinstance(priority_queue, str) or priority_queue is None
+    ), "priority_queue must be string"
+
+    super().__init__(rabbit_config)
+
+    if parent_logger is not None:
+        self.log = parent_logger.getChild("TaskQueueReader")
+    else:
+        self.log = logging.getLogger("TaskQueueReader")
+
+    self.callback = callback
+    self.parse_task = parse_task
+
+    if queue is None:
+        queue = DEFAULT_QUEUE.format(app_name, worker_index)
+    if priority_queue is None:
+        priority_queue = DEFAULT_PRIORITY_QUEUE.format(app_name, worker_index)
+    self.queue_name = queue
+    self.priority_queue_name = priority_queue
+
+    self.running = False
+
+    self._consuming_thread = None
+    self._processing_thread = None
+
+    # Receive messages into 2 temporary queues
+    # (max length should be equal to prefetch_count set in RabbitMQReader)
+    self.cache = collections.deque()
+    self.cache_pri = collections.deque()
+    self.cache_full = threading.Event()  # signalize there's something in the cache
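To illustrate the acknowledgement contract mentioned above, here is a hedged sketch of wiring a callback to a TaskQueueReader. In DP3 itself this wiring is done by TaskDistributor; "my_app" and the parse function are placeholders.

```python
from dp3.task_processing.task_queue import TaskQueueReader

def parse(body: str):
    ...  # placeholder: turn the message body into a Task (e.g. a DataPointTask)

def on_task(msg_tag, task):
    ...  # process the task, then acknowledge it:
    reader.ack(msg_tag)  # every received message must be acknowledged

reader = TaskQueueReader(callback=on_task, parse_task=parse, app_name="my_app", worker_index=0)
reader.connect()
reader.check()  # verify that the worker queues were declared externally
reader.start()  # the callback is then invoked for each received task
# ... later:
reader.stop()
reader.disconnect()
```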
+
+
+ + + +
+ + + + + + + + + +
+ + + +

+ start + + +

+
start() -> None
+
+ +
+ +

Start receiving tasks.

+ +
+ Source code in dp3/task_processing/task_queue.py +
def start(self) -> None:
+    """Start receiving tasks."""
+    if self.running:
+        raise RuntimeError("Already running")
+
+    if not self.connection:
+        self.connect()
+
+    self.log.info("Starting TaskQueueReader")
+
+    # Start thread for message consuming from server
+    self._consuming_thread = threading.Thread(None, self._consuming_thread_func)
+    self._consuming_thread.start()
+
+    # Start thread for message processing and passing to user's callback
+    self.running = True
+    self._processing_thread = threading.Thread(None, self._msg_processing_thread_func)
+    self._processing_thread.start()
+
+
+
+ +
+ +
+ + + +

+ stop + + +

+
stop() -> None
+
+ +
+ +

Stop receiving tasks.

+ +
+ Source code in dp3/task_processing/task_queue.py +
def stop(self) -> None:
+    """Stop receiving tasks."""
+    if not self.running:
+        raise RuntimeError("Not running")
+
+    self._stop_consuming_thread()
+    self._stop_processing_thread()
+    self.log.info("TaskQueueReader stopped")
+
+
+
+ +
+ +
+ + + +

+ check + + +

+
check() -> bool
+
+ +
+ +

Check that needed queues are declared, return True or raise RuntimeError.

+

If needed queues are not declared, reconnect and try again. (max 5 times)

+ +
+ Source code in dp3/task_processing/task_queue.py +
def check(self) -> bool:
+    """
+    Check that needed queues are declared, return True or raise RuntimeError.
+
+    If needed queues are not declared, reconnect and try again. (max 5 times)
+    """
+
+    for attempt, sleep_time in enumerate(RECONNECT_DELAYS):
+        if self.check_queue_existence(self.queue_name) and self.check_queue_existence(
+            self.priority_queue_name
+        ):
+            return True
+        self.log.warning(
+            "RabbitMQ queue configuration doesn't match (attempt %d of %d, retrying in %ds)",
+            attempt + 1,
+            len(RECONNECT_DELAYS),
+            sleep_time,
+        )
+        time.sleep(sleep_time)
+        self.disconnect()
+        self.connect()
+    if not self.check_queue_existence(self.queue_name):
+        raise QueueNotDeclared(self.queue_name)
+    if not self.check_queue_existence(self.priority_queue_name):
+        raise QueueNotDeclared(self.priority_queue_name)
+    return True
+
+
+
+ +
+ +
+ + + +

+ ack + + +

+
ack(msg_tag: Any)
+
+ +
+ +

Acknowledge processing of the message/task

+ +

Parameters:

+ + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
msg_tag + Any + +
+

Message tag received as the first param of the callback function.

+
+
+ required +
+ +
+ Source code in dp3/task_processing/task_queue.py +
def ack(self, msg_tag: Any):
+    """Acknowledge processing of the message/task
+    Args:
+        msg_tag: Message tag received as the first param of the callback function.
+    """
+    self.channel.basic.ack(delivery_tag=msg_tag)
+
+
+
+ +
+ + + +
+ +
+ +
+ + +
+ + + +

+ HASH + + +

+
HASH(key: str) -> int
+
+ +
+ +

Hash function used to distribute tasks to worker processes.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
key + str + +
+

to be hashed

+
+
+ required +
+ +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ int + +
+

last 4 hex digits (16 bits) of the MD5 digest

+
+
+ +
+ Source code in dp3/task_processing/task_queue.py +
56
+57
+58
+59
+60
+61
+62
+63
def HASH(key: str) -> int:
+    """Hash function used to distribute tasks to worker processes.
+    Args:
+        key: to be hashed
+    Returns:
+        last 4 bytes of MD5
+    """
+    return int(hashlib.md5(key.encode("utf8")).hexdigest()[-4:], 16)
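Because put_task() selects the destination worker as HASH(key) % workers (see above), the key-to-worker mapping can be reproduced outside the platform, e.g. when debugging which worker handles a given entity. A small sketch; the routing-key format shown is an assumption, DP3 derives the real key from the task's routing_key():

```python
import hashlib

def worker_index(key: str, workers: int) -> int:
    # Same computation as HASH(key) % workers used by TaskQueueWriter.put_task.
    return int(hashlib.md5(key.encode("utf8")).hexdigest()[-4:], 16) % workers

print(worker_index("ip/192.168.0.1", 4))  # hypothetical routing key
```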
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/reference/worker/index.html b/reference/worker/index.html new file mode 100644 index 00000000..617efea6 --- /dev/null +++ b/reference/worker/index.html @@ -0,0 +1,2172 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + worker - DP3 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +
+ + + +

+ dp3.worker + + +

+ +
+ +

Code of the main worker process.

+

Don't run directly. Import and run the main() function.

+ + + +
+ + + + + + + + + +
+ + + +

+ load_modules + + +

+
load_modules(modules_dir: str, enabled_modules: dict, log: logging.Logger, registrar: CallbackRegistrar, platform_config: PlatformConfig) -> list
+
+ +
+ +

Load plug-in modules

+

Import Python modules with names in 'enabled_modules' from 'modules_dir' directory +and return all found classes derived from BaseModule class.

+ +
+ Source code in dp3/worker.py +
36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
+90
+91
+92
+93
def load_modules(
+    modules_dir: str,
+    enabled_modules: dict,
+    log: logging.Logger,
+    registrar: CallbackRegistrar,
+    platform_config: PlatformConfig,
+) -> list:
+    """Load plug-in modules
+
+    Import Python modules with names in 'enabled_modules' from 'modules_dir' directory
+    and return all found classes derived from BaseModule class.
+    """
+    # Get list of all modules available in given folder
+    # [:-3] is for removing '.py' suffix from module filenames
+    available_modules = []
+    for item in os.scandir(modules_dir):
+        # A module can be a Python file or a Python package
+        # (i.e. a directory with "__init__.py" file)
+        if item.is_file() and item.name.endswith(".py"):
+            available_modules.append(item.name[:-3])  # name without .py
+        if item.is_dir() and "__init__.py" in os.listdir(os.path.join(modules_dir, item.name)):
+            available_modules.append(item.name)
+
+    log.debug(f"Available modules: {', '.join(available_modules)}")
+    log.debug(f"Enabled modules: {', '.join(enabled_modules)}")
+
+    # Check if all desired modules are in modules folder
+    missing_modules = set(enabled_modules) - set(available_modules)
+    if missing_modules:
+        log.fatal(
+            "Some of desired modules are not available (not in modules folder), "
+            f"specifically: {missing_modules}"
+        )
+        sys.exit(2)
+
+    # Do imports of desired modules from 'modules' folder
+    # (rewrite sys.path to modules_dir, import all modules and rewrite it back)
+    log.debug("Importing modules ...")
+    sys.path.insert(0, modules_dir)
+    imported_modules: list[tuple[str, str, type[BaseModule]]] = [
+        (module_name, name, obj)
+        for module_name in enabled_modules
+        for name, obj in inspect.getmembers(import_module(module_name))
+        if inspect.isclass(obj) and BaseModule in obj.__bases__
+    ]
+    del sys.path[0]
+
+    # Final list will contain main classes from all desired modules,
+    # which has BaseModule as parent
+    modules_main_objects = []
+    for module_name, _, obj in imported_modules:
+        # Append instance of module class (obj is class --> obj() is instance)
+        # --> call init, which registers handler
+        module_config = platform_config.config.get(f"modules.{module_name}", {})
+        modules_main_objects.append(obj(platform_config, module_config, registrar))
+        log.info(f"Module loaded: {module_name}:{obj.__name__}")
+
+    return modules_main_objects
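Since load_modules() instantiates every enabled class derived from BaseModule as obj(platform_config, module_config, registrar) and the worker later calls start()/stop() on it, a custom module can be sketched roughly as below. This is only an outline under assumptions: the BaseModule import path and constructor contract should be checked against the code reference, and scheduler_register() is used here the same way TaskDistributor uses it.

```python
import logging

from dp3.common.base_module import BaseModule  # import path is an assumption


class MyModule(BaseModule):
    def __init__(self, platform_config, module_config, registrar):
        # Signature follows the instantiation performed by load_modules();
        # depending on BaseModule, a super().__init__(...) call may be required.
        self.log = logging.getLogger("MyModule")
        self.model_spec = platform_config.model_spec
        self.config = module_config
        registrar.scheduler_register(self.periodic_job, second="*/30")

    def periodic_job(self):
        self.log.info("periodic job ran")

    def start(self):
        pass  # only needed if the module runs its own threads

    def stop(self):
        pass
```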
+
+
+
+ +
+ +
+ + + +

+ main + + +

+
main(app_name: str, config_dir: str, process_index: int, verbose: bool) -> None
+
+ +
+ +

Run worker process.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
app_name + str + +
+

Name of the application to distinguish it from other DP3-based apps. For example, it's used as a prefix for RabbitMQ queue names.

+
+
+ required +
config_dir + str + +
+

Path to directory containing configuration files.

+
+
+ required +
process_index + int + +
+

Index of this worker process. For each application +there must be N processes running simultaneously, each started with a +unique index (from 0 to N-1). N is read from configuration +('worker_processes' in 'processing_core.yml').

+
+
+ required +
verbose + bool + +
+

More verbose output (set log level to DEBUG).

+
+
+ required +
+ +
+ Source code in dp3/worker.py +
def main(app_name: str, config_dir: str, process_index: int, verbose: bool) -> None:
+    """
+    Run worker process.
+    Args:
+        app_name: Name of the application to distinct it from other DP3-based apps.
+            For example, it's used as a prefix for RabbitMQ queue names.
+        config_dir: Path to directory containing configuration files.
+        process_index: Index of this worker process. For each application
+            there must be N processes running simultaneously, each started with a
+            unique index (from 0 to N-1). N is read from configuration
+            ('worker_processes' in 'processing_core.yml').
+        verbose: More verbose output (set log level to DEBUG).
+    """
+    ##############################################
+    # Initialize logging mechanism
+    LOGFORMAT = "%(asctime)-15s,%(threadName)s,%(name)s,[%(levelname)s] %(message)s"
+    LOGDATEFORMAT = "%Y-%m-%dT%H:%M:%S"
+
+    logging.basicConfig(
+        level=logging.DEBUG if verbose else logging.INFO, format=LOGFORMAT, datefmt=LOGDATEFORMAT
+    )
+    log = logging.getLogger()
+
+    # Disable INFO and DEBUG messages from some libraries
+    logging.getLogger("requests").setLevel(logging.WARNING)
+    logging.getLogger("urllib3").setLevel(logging.WARNING)
+    logging.getLogger("amqpstorm").setLevel(logging.WARNING)
+
+    ##############################################
+    # Load configuration
+    config_base_path = os.path.abspath(config_dir)
+    log.debug(f"Loading config directory {config_base_path}")
+
+    # Whole configuration should be loaded
+    config = read_config_dir(config_base_path, recursive=True)
+    try:
+        model_spec = ModelSpec(config.get("db_entities"))
+    except ValidationError as e:
+        log.fatal("Invalid model specification: %s", e)
+        sys.exit(2)
+
+    # Print whole attribute specification
+    log.debug(model_spec)
+
+    num_processes = config.get("processing_core.worker_processes")
+
+    platform_config = PlatformConfig(
+        app_name=app_name,
+        config_base_path=config_base_path,
+        config=config,
+        model_spec=model_spec,
+        process_index=process_index,
+        num_processes=num_processes,
+    )
+    ##############################################
+    # Create instances of core components
+    log.info(f"***** {app_name} worker {process_index} of {num_processes} start *****")
+
+    db = EntityDatabase(config.get("database"), model_spec)
+
+    global_scheduler = scheduler.Scheduler()
+    task_executor = TaskExecutor(db, platform_config)
+    snap_shooter = SnapShooter(
+        db,
+        TaskQueueWriter(app_name, num_processes, config.get("processing_core.msg_broker")),
+        task_executor,
+        platform_config,
+        global_scheduler,
+    )
+    registrar = CallbackRegistrar(global_scheduler, task_executor, snap_shooter)
+
+    HistoryManager(db, platform_config, registrar)
+    Telemetry(db, platform_config, registrar)
+
+    # Lock used to control when the program stops.
+    daemon_stop_lock = threading.Lock()
+    daemon_stop_lock.acquire()
+
+    # Signal handler releasing the lock on SIGINT or SIGTERM
+    def sigint_handler(signum, frame):
+        log.debug(
+            "Signal {} received, stopping worker".format(
+                {signal.SIGINT: "SIGINT", signal.SIGTERM: "SIGTERM"}.get(signum, signum)
+            )
+        )
+        daemon_stop_lock.release()
+
+    signal.signal(signal.SIGINT, sigint_handler)
+    signal.signal(signal.SIGTERM, sigint_handler)
+    signal.signal(signal.SIGABRT, sigint_handler)
+
+    task_distributor = TaskDistributor(task_executor, platform_config, registrar, daemon_stop_lock)
+
+    control = Control(platform_config)
+    control.set_action_handler(ControlAction.make_snapshots, snap_shooter.make_snapshots)
+
+    ##############################################
+    # Load all plug-in modules
+
+    os.path.dirname(__file__)
+    custom_modules_dir = config.get("processing_core.modules_dir")
+    custom_modules_dir = os.path.abspath(os.path.join(config_base_path, custom_modules_dir))
+
+    module_list = load_modules(
+        custom_modules_dir,
+        config.get("processing_core.enabled_modules"),
+        log,
+        registrar,
+        platform_config,
+    )
+
+    ################################################
+    # Initialization completed, run ...
+
+    # Run update manager thread
+    log.info("***** Initialization completed, starting all modules *****")
+
+    # Run modules that have their own threads (TODO: there are no such modules, should be kept?)
+    # (if they don't, the start() should do nothing)
+    for module in module_list:
+        module.start()
+
+    # start TaskDistributor (which starts TaskExecutors in several worker threads)
+    task_distributor.start()
+
+    # Run scheduler
+    global_scheduler.start()
+
+    # Run SnapShooter
+    snap_shooter.start()
+
+    control.start()
+
+    # Wait until someone wants to stop the program by releasing this Lock.
+    # It may be a user by pressing Ctrl-C or some program module.
+    # (try to acquire the lock again,
+    # effectively waiting until it's released by signal handler or another thread)
+    if os.name == "nt":
+        # This is needed on Windows in order to catch Ctrl-C, which doesn't break the waiting.
+        while not daemon_stop_lock.acquire(timeout=1):
+            pass
+    else:
+        daemon_stop_lock.acquire()
+
+    ################################################
+    # Finalization & cleanup
+    # Set signal handlers back to their defaults,
+    # so the second Ctrl-C closes the program immediately
+    signal.signal(signal.SIGINT, signal.SIG_DFL)
+    signal.signal(signal.SIGTERM, signal.SIG_DFL)
+    signal.signal(signal.SIGABRT, signal.SIG_DFL)
+
+    log.info("Stopping running components ...")
+    control.stop()
+    snap_shooter.stop()
+    global_scheduler.stop()
+    task_distributor.stop()
+    for module in module_list:
+        module.stop()
+
+    log.info("***** Finished, main thread exiting. *****")
+    logging.shutdown()
+
+
+
+ +
+ + + +
+ +
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/search/search_index.json b/search/search_index.json new file mode 100644 index 00000000..ea1ef1de --- /dev/null +++ b/search/search_index.json @@ -0,0 +1 @@ +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Dynamic Profile Processing Platform (DP\u00b3)","text":"

DP\u00b3 is a platform that helps to keep a database of information (attributes) about individual entities (designed for IP addresses and other network identifiers, but may be anything), when the data constantly changes in time.

DP\u00b3 doesn't do much by itself, it must be supplemented by application-specific modules providing and processing data.

This is a basis of CESNET's \"Asset Discovery Classification and Tagging\" (ADiCT) project, focused on discovery and classification of network devices, but the platform itself is general and should be usable for any kind of data.

For an introduction to how it works, please check out the architecture, data-model and database config pages.

Then you should be able to create a DP\u00b3 app using the provided setup utility as described in the install page and start tinkering!

"},{"location":"#repository-structure","title":"Repository structure","text":"
  • dp3 - Python package containing code of the processing core and the API
  • config - default/example configuration
  • install - deployment configuration
"},{"location":"api/","title":"API","text":"

DP\u00b3 has an HTTP API which you can use to post datapoints and to read data stored in DP\u00b3. As the API is made using FastAPI, there is also interactive documentation available at the /docs endpoint.

There are several API endpoints:

  • GET /: check if API is running (just returns It works! message)
  • POST /datapoints: insert datapoints into DP\u00b3
  • GET /entity/<entity_type>: list current snapshots of all entities of given type
  • GET /entity/<entity_type>/<entity_id>: get data of entity with given entity id
  • GET /entity/<entity_type>/<entity_id>/get/<attr_id>: get attribute value
  • GET /entity/<entity_type>/<entity_id>/set/<attr_id>: set attribute value
  • GET /entities: list entity configuration
  • GET /control/<action>: send a pre-defined action into execution queue.
"},{"location":"api/#index","title":"Index","text":"

Health check.

"},{"location":"api/#request","title":"Request","text":"

GET /

"},{"location":"api/#response","title":"Response","text":"

200 OK:

{ \"detail\": \"It works!\" }

"},{"location":"api/#insert-datapoints","title":"Insert datapoints","text":""},{"location":"api/#request_1","title":"Request","text":"

POST /datapoints

All data are written to DP\u00b3 in the form of datapoints. A datapoint sets a value of a given attribute of a given entity.

It is a JSON-encoded object with the set of keys defined in the table below. Presence of some keys depends on the primary type of the attribute (plain/observations/timeseries).

Payload to this endpoint is JSON array of datapoints. For example:

[\n{ DATAPOINT1 },\n{ DATAPOINT2 }\n]\n
Key Description Data-type Required? Plain Observations Timeseries type Entity type string mandatory \u2714 \u2714 \u2714 id Entity identification string mandatory \u2714 \u2714 \u2714 attr Attribute name string mandatory \u2714 \u2714 \u2714 v The value to set, depends on attr. type and data-type, see below -- mandatory \u2714 \u2714 \u2714 t1 Start time of the observation interval string (RFC 3339 format) mandatory -- \u2714 \u2714 t2 End time of the observation interval string (RFC 3339 format) optional, default=t1 -- \u2714 \u2714 c Confidence float (0.0-1.0) optional, default=1.0 -- \u2714 \u2714 src Identification of the information source string optional, default=\"\" \u2714 \u2714 \u2714

More details depend on the particular type of the attribute.

"},{"location":"api/#examples-of-datapoints","title":"Examples of datapoints","text":""},{"location":"api/#plain","title":"Plain","text":"
{\n\"type\": \"ip\",\n\"id\": \"192.168.0.1\",\n\"attr\": \"note\",\n\"v\": \"My home router\",\n\"src\": \"web_gui\"\n}\n
"},{"location":"api/#observations","title":"Observations","text":"
{\n\"type\": \"ip\",\n\"id\": \"192.168.0.1\",\n\"attr\": \"open_ports\",\n\"v\": [22, 80, 443],\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:10:00\",\n\"src\": \"open_ports_module\"\n}\n
"},{"location":"api/#timeseries","title":"Timeseries","text":"

regular:

{\n...\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:20:00\", // assuming time_step = 5 min\n\"v\": {\n\"a\": [1, 3, 0, 2]\n}\n}\n

irregular: timestamps must always be present

{\n...\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:05:00\",\n\"v\": {\n\"time\": [\"2022-08-01T12:00:00\", \"2022-08-01T12:01:10\", \"2022-08-01T12:01:15\", \"2022-08-01T12:03:30\"],\n\"x\": [0.5, 0.8, 1.2, 0.7],\n\"y\": [-1, 3, 0, 0]\n}\n}\n

irregular_interval:

{\n...\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:05:00\",\n\"v\": {\n\"time_first\": [\"2022-08-01T12:00:00\", \"2022-08-01T12:01:10\", \"2022-08-01T12:01:15\", \"2022-08-01T12:03:30\"],\n\"time_last\": [\"2022-08-01T12:01:00\", \"2022-08-01T12:01:15\", \"2022-08-01T12:03:00\", \"2022-08-01T12:03:40\"],\n\"x\": [0.5, 0.8, 1.2, 0.7],\n\"y\": [-1, 3, 0, 0]\n}\n}\n
"},{"location":"api/#relations","title":"Relations","text":"

Can be represented using both plain attributes and observations. The difference will be only in time specification. Two examples using observations:

no data - link<mac>: just the eid is sent

{\n\"type\": \"ip\",\n\"id\": \"192.168.0.1\",\n\"attr\": \"mac_addrs\",\n\"v\": \"AA:AA:AA:AA:AA\",\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:10:00\"\n}\n

with additional data - link<ip, int>: The eid and the data are sent as a dictionary.

{\n\"type\": \"ip\",\n\"id\": \"192.168.0.1\",\n\"attr\": \"ip_dep\",\n\"v\": {\"eid\": \"192.168.0.2\", \"data\": 22},\n\"t1\": \"2022-08-01T12:00:00\",\n\"t2\": \"2022-08-01T12:10:00\"\n}\n
"},{"location":"api/#response_1","title":"Response","text":"

200 OK:

Success\n

400 Bad request:

Returns some validation error message, for example:

1 validation error for DataPointObservations_some_field\nv -> some_embedded_dict_field\n  field required (type=value_error.missing)\n
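Putting the above together, a primary module or script can push datapoints with a plain HTTP POST. A minimal sketch using the requests library; the base URL is deployment-specific and only a placeholder here:

```python
import requests

API = "http://localhost:5000"  # placeholder -- use your DP3 API address

datapoints = [{
    "type": "ip",
    "id": "192.168.0.1",
    "attr": "open_ports",
    "v": [22, 80, 443],
    "t1": "2022-08-01T12:00:00",
    "t2": "2022-08-01T12:10:00",
    "src": "open_ports_module",
}]

r = requests.post(f"{API}/datapoints", json=datapoints)
r.raise_for_status()  # 200 OK ("Success") or 400 with a validation error message
```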
"},{"location":"api/#list-entities","title":"List entities","text":"

List latest snapshots of all ids present in database under entity.

Contains only latest snapshot.

Uses pagination.

"},{"location":"api/#request_2","title":"Request","text":"

GET /entity/<entity_type>

Optional query parameters:

  • skip: how many entities to skip (default: 0)
  • limit: how many entities to return (default: 20)
"},{"location":"api/#response_2","title":"Response","text":"
{\n\"time_created\": \"2023-07-04T12:10:38.827Z\",\n\"data\": [\n{}\n]\n}\n
"},{"location":"api/#get-eid-data","title":"Get Eid data","text":"

Get data of entity's eid.

Contains all snapshots and master record. Snapshots are ordered by ascending creation time.

"},{"location":"api/#request_3","title":"Request","text":"

GET /entity/<entity_type>/<entity_id>

Optional query parameters:

  • date_from: date-time string
  • date_to: date-time string
"},{"location":"api/#response_3","title":"Response","text":"
{\n\"empty\": true,\n\"master_record\": {},\n\"snapshots\": [\n{}\n]\n}\n
"},{"location":"api/#get-attr-value","title":"Get attr value","text":"

Get attribute value

Value is either of:

  • current value: in case of plain attribute
  • current value and history: in case of observation attribute
  • history: in case of timeseries attribute
"},{"location":"api/#request_4","title":"Request","text":"

GET /entity/<entity_type>/<entity_id>/get/<attr_id>

Optional query parameters:

  • date_from: date-time string
  • date_to: date-time string
"},{"location":"api/#response_4","title":"Response","text":"
{\n\"attr_type\": 1,\n\"current_value\": \"string\",\n\"history\": []\n}\n
"},{"location":"api/#set-attr-value","title":"Set attr value","text":"

Set current value of attribute

Internally just creates datapoint for specified attribute and value.

This endpoint is meant for editable plain attributes -- for direct user edit on DP3 web UI.

"},{"location":"api/#request_5","title":"Request","text":"

POST /entity/<entity_type>/<entity_id>/set/<attr_id>

Required request body:

{\n\"value\": \"string\"\n}\n
"},{"location":"api/#response_5","title":"Response","text":"
{\n\"detail\": \"OK\"\n}\n
"},{"location":"api/#entities","title":"Entities","text":"

List entities

Returns dictionary containing all entities configured -- their simplified configuration and current state information.

"},{"location":"api/#request_6","title":"Request","text":"

GET /entities

"},{"location":"api/#response_6","title":"Response","text":"
{\n\"<entity_id>\": {\n\"id\": \"<entity_id>\",\n\"name\": \"<entity_spec.name>\",\n\"attribs\": \"<MODEL_SPEC.attribs(e_id)>\",\n\"eid_estimate_count\": \"<DB.estimate_count_eids(e_id)>\"\n},\n...\n}\n
"},{"location":"api/#control","title":"Control","text":"

Execute Action - Sends the given action into execution queue.

You can see the enabled actions in /config/control.yml. Available are:

  • make_snapshots - Makes an out-of-order snapshot of all entities
"},{"location":"api/#request_7","title":"Request","text":"

GET /control/<action>

"},{"location":"api/#response_7","title":"Response","text":"
{\n\"detail\": \"OK\"\n}\n
"},{"location":"architecture/","title":"Architecture","text":"

DP\u00b3 is a generic platform for data processing. It's currently used in systems for management of network devices in CESNET, but during development we focused on making DP\u00b3 as universal as possible.

This page describes the high-level architecture of DP\u00b3 and the individual components.

"},{"location":"architecture/#data-points","title":"Data-points","text":"

The base unit of data that DP\u00b3 uses is called a data-point, which looks like this:

{\n\"type\": \"ip\", // (1)!\n\"id\": \"192.168.0.1\", // (2)!\n\"attr\": \"open_ports\", // (3)!\n\"v\": [22, 80, 443], // (4)!\n\"t1\": \"2022-08-01T12:00:00\", // (5)!\n\"t2\": \"2022-08-01T12:10:00\",\n\"src\": \"open_ports_module\" // (6)!\n}\n
  1. A data-point's value belongs to a specific (user-defined) entity type, declared by the type.
  2. The exact entity is identified by its entity id in id.
  3. Each entity has multiple defined attributes; the attr field specifies the attribute of the data-point.
  4. Finally, the data-point's value is sent in the v field.
  5. Data-point validity interval is defined using the t1 and t2 fields.
  6. To easily determine the data source of this data-point, you can optionally provide an identifier using the src field.

This is an example of an observations data-point (given it has a validity interval). To learn more about the different types of data-points, please see the API documentation.

"},{"location":"architecture/#platform-architecture","title":"Platform Architecture","text":"DP\u00b3 architecture

The DP\u00b3 architecture as shown in the figure above consists of several components, where the DP\u00b3 provided components are shown in blue:

  • The HTTP API (built with Fast API) validates incoming data-points and sends them for processing to the task distribution queues. It also provides access to the database for web or scripts.
  • The task distribution is done using RabbitMQ queues, which distribute tasks between workers.
  • The main code of the platform runs in parallel worker processes. In the worker processes is a processing core, which performs all updates and communicates with core modules and application-specific secondary modules when appropriate.
  • Both the HTTP API and worker processes use the database API to access the entity database, currently implemented in MongoDB.

The application-specific components, shown in yellow-orange, are as follows:

  • The entity configuration via yml files determines the entities and their attributes, together with the specifics of platform behavior on these entities. For details of entity configuration, please see the database entities configuration page.
  • The distinction between primary and secondary modules is such that primary modules send data-points into the system using the HTTP API, while secondary modules react to the data present in the system, e.g.: altering the data-flow in an application-specific manner, deriving additional data based on incoming data-points or performing data correlation on entity snapshots. For primary module implementation, the API documentation may be useful; also feel free to check out the dummy_sender script in /scripts/dummy_sender.py. A comprehensive secondary module API documentation is under construction; for the time being, refer to the CallbackRegistrar code reference or check out the test modules in /modules/ or /tests/modules/.

  • The final component is the web interface, which is ultimately application-specific. A generic web interface, or a set of generic components, is a planned part of DP\u00b3 but is yet to be implemented. The API provides a variety of endpoints that should enable you to create any view of the data you may require.

"},{"location":"architecture/#data-flow","title":"Data flow","text":"

This section describes the data flow within the platform.

DP\u00b3 Data flow

The above figure shows a zoomed-in view of the worker process from the architecture figure. Incoming Tasks, which carry data-points from the API, are passed to secondary module callbacks configured for new data-points or around entity creation. These modules may create additional data-points or perform any other action. When all registered callbacks are processed, the resulting data is written to two collections: the data-point (DP) history collection, where the raw data-points are stored until archival, and the profile history collection, where a document with the relevant history is stored for each entity id. You can find these collections in the database under the names {entity}#raw and {entity}#master.

DP\u00b3 periodically creates new profile snapshots, triggered by the Scheduler. Snapshots take the profile history and compute the current value of the profile, reducing each attribute's history to a single value. The snapshot creation frequency is configurable. Snapshots are created on a per-entity basis, but all linked entities are processed at the same time. This means that when snapshots are created, the registered snapshot callbacks can access any linked entities for their data correlation needs. After all the correlation callbacks are called, the snapshot is written to the profile snapshot collection, where it can be accessed via the API. The collection is accessible under the name {entity}#snapshots.
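For orientation, these collections can be inspected directly with pymongo. This is only a minimal sketch, not part of the platform API; the connection URI mirrors the example database.yml from the Configuration chapter and the entity name bus is illustrative:

from pymongo import MongoClient

# Adjust the URI (and auth database) to your deployment; values follow the example database.yml.
client = MongoClient("mongodb://dp3_user:dp3_password@127.0.0.1:27017/")
db = client["dp3_database"]

master = db["bus#master"]        # one document per entity id, holding the profile history
snapshots = db["bus#snapshots"]  # periodically computed profile snapshots

print(master.count_documents({}))
print(snapshots.find_one())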

"},{"location":"data_model/","title":"DP\u00b3 data model","text":"

The basic elements of the DP\u00b3 data model are entities (or objects); each entity record (object instance) has a set of attributes. Each attribute has a value (associated with a particular entity), a timestamp (history of previous values can be stored) and optionally a confidence value.

Entities may be mutually connected. See Relationships below.

"},{"location":"data_model/#exemplary-system","title":"Exemplary system","text":"

In this chapter, we will illustrate the details on an exemplary system. Imagine you are developing a data model for a bus tracking system. You have to store these data:

  • label: Custom label for the bus, set by an administrator in the web interface.
  • location: Location of the bus at a particular time. The value is GPS coordinates (an array of latitude and longitude).
  • speed: Speed of the bus at a particular time.
  • passengers getting in and out: Number of passengers getting in or out of the bus. Distinguished by the doors used (front, middle, back). The bus control unit sends counter values every 10 minutes.

Also, a map displaying the current position of all buses is required.

(In case you are interested, the configuration of database entities for this system is available in the DB entities chapter.)

To make everything clear and more readable, all example references below are typeset as quotes.

"},{"location":"data_model/#types-of-attributes","title":"Types of attributes","text":"

There are 3 types of attributes:

"},{"location":"data_model/#plain","title":"Plain","text":"

Common attributes with only one value of some data type. No history is stored, but the timestamp of the last change is available.

Very useful for:

  • data from an external source, when you only need the current value

  • notes and other manually entered information

This is exactly what we need for label in our bus tracking system. The administrator labels a particular bus in the web interface and we use this label until it is changed - in particular, we display the label next to a marker on the map. No history is needed and the value has 100% confidence.

"},{"location":"data_model/#observations","title":"Observations","text":"

Attributes with a history of values at some time or interval of time. Consequently, we can derive the value at any time (most often not now) from these values.

Each value may have associated confidence.

These attributes may be single- or multi-value (multiple current values at one point in time).

Very useful for data where both the current value and history are needed.

In our example, location is a great use-case for the observations type. We need to track the position of the bus in time and store the history. The current location is very important. Let's suppose we also need to do oversampling by predicting where the bus is now, even though we received the last data-point 2 minutes ago. This is all possible (predictions using custom secondary modules).

The same applies to speed. It can also be derived from location.
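A location observation for a bus could be pushed to the platform like this (a minimal sketch using the POST /datapoints endpoint described in the API chapter; the URL, eid and src values are illustrative):

import requests

datapoint = {
    "type": "bus",
    "id": "bus-001",
    "attr": "location",
    "v": [49.2276, 16.5967],      # [latitude, longitude]
    "t1": "2022-08-01T12:00:00",
    "t2": "2022-08-01T12:00:00",
    "src": "gps_tracker",
}

# The payload is a JSON array of data-points.
response = requests.post("http://localhost:5000/datapoints", json=[datapoint])
response.raise_for_status()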

"},{"location":"data_model/#timeseries","title":"Timeseries","text":"

One or more numeric values for a particular time.

In this attribute type: history > current value. In fact, no explicit current value is provided.

Very useful for:

  • any kind of history-based analysis

  • logging of events/changes

May be:

  • regular: sampling is regular. Example: a datapoint is created every x minutes.

  • irregular: sampling is irregular. Example: a datapoint is created when some event occurs.

  • irregular intervals: sampling is irregular and each data-point includes two timestamps (from when until when the data were gathered). Example: Some event triggers a 5-minute monitoring routine. When this routine finishes, it creates a datapoint containing all the data from the past 5 minutes.

Timeseries are very useful for passengers getting in and out (from our example). As we need to count two directions (in/out) for three doors (front/middle/back), we create 6 series (e.g. front_in, front_out, ..., back_out). Counter data-points are received in 10-minute intervals, so a regular timeseries is the best fit for this use-case. Every 10 minutes we receive values for all 6 series and store them. The current value is not important, as these data are only useful for passenger flow analysis throughout a whole month/year/...
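For illustration, a single data-point covering one 10-minute window of this regular timeseries could look like this (a sketch; the eid, values and src are illustrative, the series names match the DB entities example):

datapoint = {
    "type": "bus",
    "id": "bus-001",
    "attr": "passengers_in_out",
    "t1": "2022-08-01T12:00:00",
    "t2": "2022-08-01T12:10:00",  # one time_step (10 minutes), so one value per series
    "v": {
        "front_in": [4], "front_out": [2],
        "middle_in": [1], "middle_out": [3],
        "back_in": [0], "back_out": [5],
    },
    "src": "bus_control_unit",
}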

"},{"location":"data_model/#relationships","title":"Relationships","text":"

Relationships between entities can be represented with or without history. They are realized using the link attribute type. Depending on whether the history is important, they can be configured as either the aforementioned plain attributes or observations.

Relationships can contain additional data, if that fits the modelling needs of your use case.

Very useful for:

  • any kind of relationship between entities
  • linking dynamic entities to entities with static data

As our example so far contains only one entity, we currently have no need for relationships. However, if we wanted to track the different bus drivers driving individual buses, relationships would come in quite handy. The bus driver is a separate entity, and can drive multiple buses during the day. The current bus driver will be represented as an observation link between the bus and the driver, as can be seen in the resulting configuration.

"},{"location":"data_model/#continue-to","title":"Continue to ...","text":"

Now that you have an understanding of the data model and the types of attributes, you might want to check out the details of DB configuration, where you will find the parameters for each attribute type and the data types supported by the platform.

"},{"location":"extending/","title":"Extending Documentation","text":"

This page provides the basic info on where to start with writing documentation. If you feel lost at any point, please check out the documentation of MkDocs and Material for MkDocs, with which this documentation is built.

"},{"location":"extending/#project-layout","title":"Project layout","text":"
mkdocs.yml            # The configuration file.\ndocs/\n    index.md          # The documentation homepage.\n    gen_ref_pages.py  # Script for generating the code reference.\n    ...               # Other markdown pages, images and other files.\n

The docs/ folder contains all source Markdown files for the documentation.

You can find all documentation settings in mkdocs.yml. See the nav section for mapping of the left navigation tab and the Markdown files.

"},{"location":"extending/#local-instance-commands","title":"Local instance & commands","text":"

To see the changes made to the documentation page locally, a local instance of mkdocs is required. You can install all the required packages using:

pip install -r requirements.doc.txt\n

After installing, you can use the following mkdocs commands:

  • mkdocs serve - Start the live-reloading docs server.
  • mkdocs build - Build the documentation site.
  • mkdocs -h - Print help message and exit.
"},{"location":"extending/#text-formatting-and-other-features","title":"Text formatting and other features","text":"

As the entire documentation is written in Markdown, all base Markdown syntax is supported. This means headings, bold text, italics, inline code, tables and many more.

This set of options can be further extended, if you ever find the need. See the possibilities in the Material theme reference.

Some of the enabled extensions
  • This is an example of a collapsible admonition with a custom title.
  • Admonitions are one of the enabled markdown extensions; another example would be the TODO checklist syntax:
    • Unchecked item
    • Checked item
  • See the markdown_extensions section in mkdocs.yml for all enabled extensions.
"},{"location":"extending/#links-and-references","title":"Links and references","text":"

To reference an anchor within a page, such as a heading, use a Markdown link to the specific anchor, for example: Commands. If you're not sure which identifier to use, you can look at a heading's anchor by clicking the heading in your Web browser, either in the text itself, or in the table of contents. If the URL is https://example.com/some/page/#anchor-name then you know that this item can be linked to with [<displayed text>](#anchor-name). (Tip taken from mkdocstrings)

To make a reference to another page within the documentation, use the path to the Markdown source file, followed by the desired anchor. For example, this link was created as [link](index.md#repository-structure).

When making references to the generated Code Reference, there are two options. Links can be made either using the standard Markdown syntax, where some reverse-engineering of the generated files is required, or, with the support of mkdocstrings, using the [example][full.path.to.object] syntax. A real link like this can be for example this one to the Platform Model Specification.

"},{"location":"extending/#code-reference-generation","title":"Code reference generation","text":"

Code reference is generated using mkdocstrings and the Automatic code reference pages recipe from their documentation. The generation of pages is done using the docs/gen_ref_pages.py script. The script is a slight modification of what is recommended within the mentioned recipe.

Mkdocstrings itself enables generating code documentation from its docstrings using a path.to.object syntax. Here is an example of documentation for dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.register method:

There are additional options that can be specified, which affect the way the documentation is presented. For more on these options, see here.

Even if you create a duplicate code reference description, the mkdocstring-style link still leads to the code reference, as you can see here.

"},{"location":"extending/#dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.register","title":"register","text":"
register(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)\n

Registers passed timeseries hook to be called during snapshot creation.

Binds hook to specified entity_type and attr_type (though same hook can be bound multiple times). If entity_type and attr_type do not specify a valid timeseries attribute, a ValueError is raised.

Parameters:

Name | Type | Description | Default
hook | Callable[[str, str, list[dict]], list[DataPointTask]] | The hook callable should expect entity_type, attr_type and attribute history as arguments and return a list of Task objects. | required
entity_type | str | specifies entity type | required
attr_type | str | specifies attribute type | required
"},{"location":"extending/#deployment","title":"Deployment","text":"

The documentation is updated and deployed automatically with each push to selected branches thanks to the configured GitHub Action, which can be found in: .github/workflows/deploy.yml.

"},{"location":"install/","title":"Installing DP\u00b3 platform","text":"

When talking about installing the DP\u00b3 platform, a distinction must be made between installing for platform development, installing for application development (i.e. platform usage) and installing for application and platform deployment. We will cover all three cases separately.

"},{"location":"install/#installing-for-application-development","title":"Installing for application development","text":"

Pre-requisites: Python 3.9 or higher, pip (with virtualenv installed), git, Docker and Docker Compose.

Create a virtualenv and install the DP\u00b3 platform using:

python3 -m venv venv  # (1)!\nsource venv/bin/activate  # (2)!\npython -m pip install --upgrade pip  # (3)!\npip install git+https://github.com/CESNET/dp3.git@new_dp3#egg=dp3\n
  1. We recommend using virtual environment. If you are not familiar with it, please read this first. Note for Windows: If python3 does not work, try py -3 or python instead.
  2. Windows: venv/Scripts/activate.bat
  3. We require pip>=21.0.1 for the pyproject.toml support. If your pip is up-to-date, you can skip this step.
"},{"location":"install/#creating-a-dp3-application","title":"Creating a DP\u00b3 application","text":"

To create a new DP\u00b3 application we will use the included dp3-setup utility. Run:

dp3-setup <application_directory> <your_application_name> 

So for example, to create an application called my_app in the current directory, run:

dp3-setup . my_app\n

This produces the following directory structure:

 \ud83d\udcc2 .\n \u251c\u2500\u2500 \ud83d\udcc1 config  # (1)! \n\u2502   \u251c\u2500\u2500 \ud83d\udcc4 api.yml\n \u2502   \u251c\u2500\u2500 \ud83d\udcc4 control.yml\n \u2502   \u251c\u2500\u2500 \ud83d\udcc4 database.yml\n \u2502   \u251c\u2500\u2500 \ud83d\udcc1 db_entities # (2)!\n\u2502   \u251c\u2500\u2500 \ud83d\udcc4 event_logging.yml\n \u2502   \u251c\u2500\u2500 \ud83d\udcc4 history_manager.yml\n \u2502   \u251c\u2500\u2500 \ud83d\udcc1 modules # (3)!\n\u2502   \u251c\u2500\u2500 \ud83d\udcc4 processing_core.yml\n \u2502   \u251c\u2500\u2500 \ud83d\udcc4 README.md\n \u2502   \u2514\u2500\u2500 \ud83d\udcc4 snapshots.yml\n \u251c\u2500\u2500 \ud83d\udcc1 docker # (4)!\n\u2502   \u251c\u2500\u2500 \ud83d\udcc1 python\n \u2502   \u2514\u2500\u2500 \ud83d\udcc1 rabbitmq\n \u251c\u2500\u2500 \ud83d\udcc4 docker-compose.app.yml\n \u251c\u2500\u2500 \ud83d\udcc4 docker-compose.yml\n \u251c\u2500\u2500 \ud83d\udcc1 modules # (5)!\n\u2502   \u2514\u2500\u2500 \ud83d\udcc4 test_module.py\n \u251c\u2500\u2500 \ud83d\udcc4 README.md # (6)!\n\u2514\u2500\u2500 \ud83d\udcc4 requirements.txt\n

  1. The config directory contains the configuration files for the DP\u00b3 platform. For more details, please check out the configuration documentation.
  2. The config/db_entities directory contains the database entities of the application. This defines the data model of your application. For more details, you may want to check out the data model and the DB entities documentation.
  3. The config/modules directory is where you can place the configuration specific to your modules.
  4. The docker directory contains the Dockerfiles for the RabbitMQ and python images, tailored to your application.
  5. The modules directory contains the modules of your application. To get started, a single module called test_module is included. For more details, please check out the Modules page.
  6. The README.md file contains some instructions to get started. Edit it to your liking.
"},{"location":"install/#running-the-application","title":"Running the Application","text":"

To run the application, we first need to set up the other services the platform depends on, such as the MongoDB database, the RabbitMQ message distribution and the Redis database. This can be done using the supplied docker-compose.yml file. Simply run:

docker compose up -d --build  # (1)!\n
  1. The -d flag runs the services in the background, so you can continue working in the same terminal. The --build flag forces Docker to rebuild the images, so you can be sure you are running the latest version. If you want to run the services in the foreground, omit the -d flag.
Docker Compose basics

The state of running containers can be checked using:

docker compose ps\n

which will display the state of running processes. The logs of the services can be displayed using:

docker compose logs\n

which will display the logs of all services, or:

docker compose logs <service name>\n

which will display only the logs of the given service. (In this case, the services are rabbitmq, mongo, mongo_express, and redis)

We can now focus on running the platform and developing or testing. After you are done, simply run:

docker compose down\n

which will stop and remove all containers, networks and volumes created by docker compose up.

There are two main ways to run the application itself. The first is a little more hands-on and allows easier debugging. There are two main kinds of processes in the application: the API and the worker processes.

To run the API, simply run:

APP_NAME=my_app CONF_DIR=config api\n

The starting configuration sets only a single worker process, which you can run using:

worker my_app config 0     

The second way is to use the docker-compose.app.yml file, which runs the API and the worker processes in separate containers. To run the API, simply run:

docker compose -f docker-compose.app.yml up -d --build\n

Either way, to test that everything is running properly, you can run:

curl -X 'GET' 'http://localhost:5000/' \\\n-H 'Accept: application/json' 

This should return a JSON response with the following content:

{\n\"detail\": \"It works!\"\n}\n

You are now ready to start developing your application!

"},{"location":"install/#installing-for-platform-development","title":"Installing for platform development","text":"

Pre-requisites: Python 3.9 or higher, pip (with virtualenv installed), git, Docker and Docker Compose.

Pull the repository and install using:

git clone --branch new_dp3 git@github.com:CESNET/dp3.git dp3 cd dp3\npython3 -m venv venv  # (1)!\nsource venv/bin/activate  # (2)!\npython -m pip install --upgrade pip  # (3)!\npip install --editable \".[dev]\" # (4)!\npre-commit install  # (5)!\n
  1. We recommend using virtual environment. If you are not familiar with it, please read this first. Note for Windows: If python3 does not work, try py -3 or python instead.
  2. Windows: venv/Scripts/activate.bat
  3. We require pip>=21.0.1 for the pyproject.toml support. If your pip is up-to-date, you can skip this step.
  4. Install using editable mode to allow for changes in the code to be reflected in the installed package. Also, install the development dependencies, such as pre-commit and mkdocs.
  5. Install pre-commit hooks to automatically format and lint the code before committing.

With the dependencies, the pre-commit package is installed. You can verify the installation using pre-commit --version. Pre-commit is used to automatically unify code formatting and perform code linting. The hooks configured in .pre-commit-config.yaml should now run automatically on every commit.

In case you want to make sure, you can run pre-commit run --all-files to see it in action.

"},{"location":"install/#running-the-dependencies-and-the-platform","title":"Running the dependencies and the platform","text":"

The DP\u00b3 platform is now installed and ready for development. To run it, we first need to set up the other services the platform depends on, such as the MongoDB database, the RabbitMQ message distribution and the Redis database. This can be done using the supplied docker-compose.yml file. Simply run:

docker compose up -d --build  # (1)!\n
  1. The -d flag runs the services in the background, so you can continue working in the same terminal. The --build flag forces Docker to rebuild the images, so you can be sure you are running the latest version. If you want to run the services in the foreground, omit the -d flag.
On Docker Compose

Docker Compose can be installed as a standalone (older v1) or as a plugin (v2); the only difference is in how the command is executed:

Note that Compose standalone uses the dashed docker-compose syntax instead of the current standard syntax (docker compose). For example: type docker-compose up when using Compose standalone, instead of docker compose up.

This documentation uses the v2 syntax, so if you have the standalone version installed, adjust accordingly.

After the first compose up command, the images for RabbitMQ, MongoDB and Redis will be downloaded and built according to the configuration, and all three services will be started. On subsequent runs, Docker will use the cache, so if the configuration does not change, the download and build steps will not be repeated.

The configuration is taken implicitly from the docker-compose.yml file in the current directory. The docker-compose.yml configuration contains the configuration for the services, as well as a testing setup of the DP\u00b3 platform itself. The full configuration is in tests/test_config. The setup includes one worker process and one API process to handle requests. The API process is exposed on port 5000, so you can send requests to it using curl or from your browser:

curl -X 'GET' 'http://localhost:5000/' \\\n-H 'Accept: application/json' 
curl -X 'POST' 'http://localhost:5000/datapoints' \\\n-H 'Content-Type: application/json' \\\n--data '[{\"type\": \"test_entity_type\", \"id\": \"abc\", \"attr\": \"test_attr_int\", \"v\": 123, \"t1\": \"2023-07-01T12:00:00\", \"t2\": \"2023-07-01T13:00:00\"}]'\n

Docker Compose basics

The state of running containers can be checked using:

docker compose ps\n

which will display the state of running processes. The logs of the services can be displayed using:

docker compose logs\n

which will display the logs of all services, or:

docker compose logs <service name>\n

which will display only the logs of the given service. (In this case, the services are rabbitmq, mongo, redis, receiver_api and worker)

We can now focus on running the platform and developing or testing. After you are done, simply run:

docker compose down\n

which will stop and remove all containers, networks and volumes created by docker compose up.

"},{"location":"install/#testing","title":"Testing","text":"

With the testing platform setup running, we can now run tests. Tests use the unittest framework and can be run with:

python -m unittest discover \\\n-s tests/test_common \\\n-v\nCONF_DIR=tests/test_config \\\npython -m unittest discover \\\n-s tests/test_api \\\n-v\n
"},{"location":"install/#documentation","title":"Documentation","text":"

To extend this documentation, please refer to the Extending page.

"},{"location":"modules/","title":"Modules","text":"

DP\u00b3 enables its users to create custom modules to perform application-specific data analysis. Modules are loaded using a plugin-like architecture and can influence the data flow from the very first moment of handling a data-point push request.

As described in the Architecture page, DP\u00b3 uses a categorization of modules into primary and secondary modules. The distinction between primary and secondary modules is such that primary modules send data-points into the system using the HTTP API, while secondary modules react to the data present in the system, e.g.: altering the data-flow in an application-specific manner, deriving additional data based on incoming data-points or performing data correlation on entity snapshots.

This page covers the DP\u00b3 API for secondary modules. For primary module implementation, the API documentation may be useful; also feel free to check out the dummy_sender script in /scripts/dummy_sender.py.

"},{"location":"modules/#creating-a-new-module","title":"Creating a new Module","text":"

First, make a directory that will contain all modules of the application. For example, let's assume that the directory will be called /modules/.

As mentioned in the Processing core configuration page, the modules directory must be specified in the modules_dir configuration option. Let's create the main module file now - assuming the module will be called my_awesome_module, create a file /modules/my_awesome_module.py.

Finally, to make the processing core load the module, add the module name to the enabled_modules configuration option, e.g.:

Enabling the module in processing_core.yml
modules_dir: \"/modules/\"\nenabled_modules:\n- \"my_awesome_module\"\n

Here is a basic skeleton for the module file:

import logging\nfrom dp3.common.base_module import BaseModule\nfrom dp3.common.config import PlatformConfig\nfrom dp3.common.callback_registrar import CallbackRegistrar\nclass MyAwesomeModule(BaseModule):\ndef __init__(self,\n_platform_config: PlatformConfig, \n_module_config: dict, \n_registrar: CallbackRegistrar\n):\nself.log = logging.getLogger(\"MyAwesomeModule\")\n

All modules must subclass the BaseModule class. If a class does not subclass the BaseModule class, it will not be loaded and activated by the main DP\u00b3 worker. The declaration of BaseModule is as follows:

class BaseModule(ABC):\n@abstractmethod\ndef __init__(\nself, \nplatform_config: PlatformConfig, \nmodule_config: dict, \nregistrar: CallbackRegistrar\n):\npass\n

At initialization, each module receives a PlatformConfig, a module_config dictionary and a CallbackRegistrar. For the module to do anything, it must read the provided configuration from platform_config and module_config and register callbacks to perform data analysis using the registrar object. Let's go through them one at a time.

"},{"location":"modules/#configuration","title":"Configuration","text":"

PlatformConfig contains the entire DP\u00b3 platform configuration, which includes the application name, worker counts, which worker process the module is running in, and a ModelSpec which contains the entity specification.

If you want to create configuration specific to the module itself, create a .yml configuration file named after the module inside the modules/ folder, as described in the modules configuration page. This configuration will then be loaded into the module_config dictionary for convenience.
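A minimal sketch of a module constructor using both objects (the threshold option is a hypothetical key from modules/my_awesome_module.yml, not part of DP\u00b3 itself):

import logging

from dp3.common.base_module import BaseModule
from dp3.common.callback_registrar import CallbackRegistrar
from dp3.common.config import PlatformConfig


class MyAwesomeModule(BaseModule):
    def __init__(self, platform_config: PlatformConfig, module_config: dict,
                 registrar: CallbackRegistrar):
        self.log = logging.getLogger("MyAwesomeModule")
        self.model_spec = platform_config.model_spec          # entity/attribute specification
        self.threshold = module_config.get("threshold", 10)   # hypothetical module option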

"},{"location":"modules/#callbacks","title":"Callbacks","text":"

The registrar: CallbackRegistrar object provides the API to register callbacks to be called during the data processing.

"},{"location":"modules/#cron-trigger-periodic-callbacks","title":"CRON Trigger Periodic Callbacks","text":"

For callbacks that need to be called periodically, the scheduler_register method is used. The specific times the callback will be called are defined using the CRON schedule expressions. Here is a simplified example from the HistoryManager module:

registrar.scheduler_register(\nself.delete_old_dps, minute=\"*/10\"  # (1)!\n)\nregistrar.scheduler_register(\nself.archive_old_dps, minute=0, hour=2  # (2)!\n)  \n
  1. At every 10th minute.
  2. Every day at 2 AM.

By default, the callback will receive no arguments, but you can pass static arguments for every call using the func_args and func_kwargs keyword arguments. The function return value will always be ignored.

The complete documentation can be found at the scheduler_register page. As DP\u00b3 utilizes the APScheduler package internally to realize this functionality, specifically the CronTrigger, feel free to check their documentation for more details.
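A sketch of passing static arguments to a periodic callback (the cleanup method and its parameter are illustrative; func_kwargs is forwarded to every call as described above):

registrar.scheduler_register(
    self.cleanup,                          # illustrative callback method
    func_kwargs={"older_than_days": 30},   # static arguments passed on every call
    minute=0, hour=3,                      # every day at 3 AM
)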

"},{"location":"modules/#callbacks-within-processing","title":"Callbacks within processing","text":"

There are a number of possible places to register callback functions during data-point processing.

"},{"location":"modules/#task-on_task_start-hook","title":"Task on_task_start hook","text":"

A hook will be called on task processing start. The callback is registered using the register_task_hook method. Required signature is Callable[[DataPointTask], Any], as the return value is ignored. It may be useful for implementing custom statistics.

def task_hook(task: DataPointTask):\nprint(task.etype)\nregistrar.register_task_hook(\"on_task_start\", task_hook)\n
"},{"location":"modules/#entity-allow_entity_creation-hook","title":"Entity allow_entity_creation hook","text":"

Receives eid and Task, may prevent entity record creation (by returning False). The callback is registered using the register_entity_hook method. Required signature is Callable[[str, DataPointTask], bool].

def entity_creation(eid: str, task: DataPointTask) -> bool:\nreturn eid.startswith(\"1\")\nregistrar.register_entity_hook(\n\"allow_entity_creation\", entity_creation, \"test_entity_type\"\n)\n
"},{"location":"modules/#entity-on_entity_creation-hook","title":"Entity on_entity_creation hook","text":"

Receives eid and Task, may return new DataPointTasks.

The callback is registered using the register_entity_hook method. Required signature is Callable[[str, DataPointTask], list[DataPointTask]].

def processing_function(eid: str, task: DataPointTask) -> list[DataPointTask]:\noutput = does_work(task)\nreturn [DataPointTask(\nmodel_spec=task.model_spec,\netype=\"mac\",\neid=eid,\ndata_points=[{\n\"etype\": \"test_enitity_type\",\n\"eid\": eid,\n\"attr\": \"derived_on_creation\",\n\"src\": \"secondary/derived_on_creation\",\n\"v\": output\n}]\n)]\nregistrar.register_entity_hook(\n\"on_entity_creation\", processing_function, \"test_entity_type\"\n)\n
"},{"location":"modules/#attribute-hooks","title":"Attribute hooks","text":"

There are registration points for all attribute types: on_new_plain, on_new_observation, on_new_ts_chunk.

Callbacks are registered using the register_attr_hook method. The callback always receives eid, attribute and Task, and may return new DataPointTasks. The required signature is Callable[[str, DataPointBase], list[DataPointTask]].

def attr_hook(eid: str, dp: DataPointBase) -> list[DataPointTask]:\n...\nreturn []\nregistrar.register_attr_hook(\n\"on_new_observation\", attr_hook, \"test_entity_type\", \"test_attr_type\",\n)\n
"},{"location":"modules/#timeseries-hook","title":"Timeseries hook","text":"

Timeseries hooks are run before snapshot creation and allow processing of the accumulated timeseries data into observations / plain attributes, so that it can be accessed in snapshots.

Callbacks are registered using the register_timeseries_hook method. The expected callback signature is Callable[[str, str, list[dict]], list[DataPointTask]], as the callback should expect entity_type, attr_type and attribute history as arguments and return a list of DataPointTask objects.

def timeseries_hook(\nentity_type: str, attr_type: str, attr_history: list[dict]\n) -> list[DataPointTask]:\n...\nreturn []\nregistrar.register_timeseries_hook(\ntimeseries_hook, \"test_entity_type\", \"test_attr_type\",\n)\n
"},{"location":"modules/#correlation-callbacks","title":"Correlation callbacks","text":"

Correlation callbacks are called during snapshot creation and allow performing analysis on the data of the snapshot.

The register_correlation_hook method expects a callable with the following signature: Callable[[str, dict], None], where the first argument is the entity type, and the second is a dict containing the current values of the entity and its linked entities.

As correlation hooks can depend on each other, the hook inputs and outputs must be specified using the depends_on and may_change arguments. Both arguments are lists of lists of strings, where each list of strings is a path from the specified entity type to individual attributes (even on linked entities). For example, if the entity type is test_entity_type, and the hook depends on the attribute test_attr_type1, the path is simply [[\"test_attr_type1\"]]. If the hook depends on the attribute test_attr_type1 of an entity linked using test_attr_link, the path will be [[\"test_attr_link\", \"test_attr_type1\"]].

def correlation_hook(entity_type: str, values: dict):\n...\nregistrar.register_correlation_hook(\ncorrelation_hook, \"test_entity_type\", [[\"test_attr_type1\"]], [[\"test_attr_type2\"]]\n)\n

The order of running callbacks is determined automatically, based on the dependencies. If there is a cycle in the dependencies, a ValueError will be raised at registration. Also, if the provided dependency / output paths are invalid, a ValueError will be raised.
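A sketch of a hook reading an attribute of a linked entity, using the illustrative names from this section (the exact nesting of linked-entity values inside the values dict is an assumption here):

def linked_correlation_hook(entity_type: str, values: dict):
    # values holds the current values of the entity and its linked entities
    linked = values.get("test_attr_link") or {}
    values["test_attr_type2"] = linked.get("test_attr_type1")

registrar.register_correlation_hook(
    linked_correlation_hook,
    "test_entity_type",
    depends_on=[["test_attr_link", "test_attr_type1"]],
    may_change=[["test_attr_type2"]],
)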

"},{"location":"modules/#running-module-code-in-a-separate-thread","title":"Running module code in a separate thread","text":"

The module is free to run its own code in separate threads or processes. To synchronize such code with the platform, use the start() and stop() methods of the BaseModule class. The start() method is called after the platform is initialized, and the stop() method is called before the platform is shut down.

class MyModule(BaseModule):\ndef __init__(self, *args, **kwargs):\nsuper().__init__(*args, **kwargs)\nself._thread = None\nself._stop_event = threading.Event()\nself.log = logging.getLogger(\"MyModule\")\ndef start(self):\nself._thread = threading.Thread(target=self._run, daemon=True)\nself._thread.start()\ndef stop(self):\nself._stop_event.set()\nself._thread.join()\ndef _run(self):\nwhile not self._stop_event.is_set():\nself.log.info(\"Hello world!\")\ntime.sleep(1)\n
"},{"location":"configuration/","title":"Configuration","text":"

The DP\u00b3 configuration folder consists of these files and folders:

db_entities/\nmodules/\ncommon.yml\ndatabase.yml\nevent_logging.yml\nhistory_manager.yml\nprocessing_core.yml\nsnapshots.yml\n

Their meaning and usage are explained in the following chapters.

"},{"location":"configuration/#example-configuration","title":"Example configuration","text":"

An example configuration is included in the config/ folder of the DP\u00b3 repository.

"},{"location":"configuration/database/","title":"Database","text":"

The file database.yml mainly specifies MongoDB database connection details and credentials.

It looks like this:

connection:\nusername: \"dp3_user\"\npassword: \"dp3_password\"\naddress: \"127.0.0.1\"\nport: 27017\ndb_name: \"dp3_database\"\n
"},{"location":"configuration/database/#connection","title":"Connection","text":"

Connection details contain:

Parameter | Data-type | Default value | Description
username | string | dp3 | Username for connection to DB. Escaped using urllib.parse.quote_plus.
password | string | dp3 | Password for connection to DB. Escaped using urllib.parse.quote_plus.
address | string | localhost | IP address or hostname for connection to DB.
port | int | 27017 | Listening port of DB.
db_name | string | dp3 | Database name to be utilized by DP\u00b3.
"},{"location":"configuration/db_entities/","title":"DB entities","text":"

Files in the db_entities folder describe entities and their attributes. You can think of an entity as a class from object-oriented programming.

Below is a YAML file (e.g. db_entities/bus.yml) corresponding to the bus tracking system example from the Data model chapter.

entity:\nid: bus\nname: Bus\nattribs:\n# Attribute `label`\nlabel:\nname: Label\ndescription: Custom label for the bus.\ntype: plain\ndata_type: string\neditable: true\n# Attribute `location`\nlocation:\nname: Location\ndescription: Location of the bus in a particular time. Value are GPS \\\ncoordinates (array of latitude and longitude).\ntype: observations\ndata_type: array<float>\nhistory_params:\npre_validity: 1m\npost_validity: 1m\nmax_age: 30d\n# Attribute `speed`\nspeed:\nname: Speed\ndescription: Speed of the bus in a particular time. In km/h.\ntype: observations\ndata_type: float\nhistory_params:\npre_validity: 1m\npost_validity: 1m\nmax_age: 30d\n# Attribute `passengers_in_out`\npassengers_in_out:\nname: Passengers in/out\ndescription: Number of passengers getting in or out of the bus. Distinguished by the doors used (front, middle, back). Regularly sampled every 10 minutes.\ntype: timeseries\ntimeseries_type: regular\ntimeseries_params:\nmax_age: 14d\ntime_step: 10m\nseries:\nfront_in:\ndata_type: int\nfront_out:\ndata_type: int\nmiddle_in:\ndata_type: int\nmiddle_out:\ndata_type: int\nback_in:\ndata_type: int\nback_out:\ndata_type: int\n# Attribute `driver` to link the driver of the bus at a given time.\ndriver:\nname: Driver\ndescription: Driver of the bus at a given time.\ntype: observations\ndata_type: link<driver>\nhistory_params:\npre_validity: 1m\npost_validity: 1m\nmax_age: 30d\n
"},{"location":"configuration/db_entities/#entity","title":"Entity","text":"

An entity is described simply by:

Parameter | Data-type | Default value | Description
id | string (identifier) | (mandatory) | Short string identifying the entity type, its machine name (must match regex [a-zA-Z_][a-zA-Z0-9_-]*). Lower-case only is recommended.
name | string | (mandatory) | Attribute name for humans. May contain any symbols.
"},{"location":"configuration/db_entities/#attributes","title":"Attributes","text":"

Each attribute is specified by the following set of parameters:

"},{"location":"configuration/db_entities/#base","title":"Base","text":"

These apply to all types of attributes (plain, observations and timeseries).

Parameter | Data-type | Default value | Description
id | string (identifier) | (mandatory) | Short string identifying the attribute, its machine name (must match this regex [a-zA-Z_][a-zA-Z0-9_-]*). Lower-case only is recommended.
type | string | (mandatory) | Type of attribute. Can be either plain, observations or timeseries.
name | string | (mandatory) | Attribute name for humans. May contain any symbols.
description | string | \"\" | Longer description of the attribute, if needed.
color | #xxxxxx | null | Color to use in GUI (useful mostly for tag values), not used currently.
"},{"location":"configuration/db_entities/#plain-specific-parameters","title":"Plain-specific parameters","text":"
Parameter | Data-type | Default value | Description
data_type | string | (mandatory) | Data type of attribute value, see Supported data types.
categories | array of strings | null | List of categories if data_type=category and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example).
editable | bool | false | Whether the value of this attribute is editable via web interface.
"},{"location":"configuration/db_entities/#observations-specific-parameters","title":"Observations-specific parameters","text":"
Parameter | Data-type | Default value | Description
data_type | string | (mandatory) | Data type of attribute value, see Supported data types.
categories | array of strings | null | List of categories if data_type=category and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example).
editable | bool | false | Whether the value of this attribute is editable via web interface.
confidence | bool | false | Whether a confidence value should be stored along with the data value or not.
multi_value | bool | false | Whether multiple values can be set at the same time.
history_params | object, see below | (mandatory) | History and time aggregation parameters. A subobject with fields described in the table below.
history_force_graph | bool | false | By default, if the data type of the attribute is array, its history is shown on the web interface as a table. This option can force a tag-like graph with comma-joined values of that array as tags.
"},{"location":"configuration/db_entities/#history-params","title":"History params","text":"

Description of history_params subobject (see table above).

Parameter | Data-type | Default value | Description
max_age | <int><s/m/h/d> (e.g. 30s, 12h, 7d) | null | How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed).
max_items | int (> 0) | null | How many data-points/intervals to store (oldest ones are removed when limit is exceeded). Currently not implemented.
expire_time | <int><s/m/h/d> or inf (infinity) | infinity | How long after the end time (t2) is the last value considered valid (i.e. is used as \"current value\"). Zero (0) means to strictly follow t1, t2. Zero can be specified without a unit (s/m/h/d). Currently not implemented.
pre_validity | <int><s/m/h/d> (e.g. 30s, 12h, 7d) | 0s | Max time before t1 for which the data-point's value is still considered to be the \"current value\" if there's no other data-point closer in time.
post_validity | <int><s/m/h/d> (e.g. 30s, 12h, 7d) | 0s | Max time after t2 for which the data-point's value is still considered to be the \"current value\" if there's no other data-point closer in time.

Note: At least one of max_age and max_items SHOULD be defined, otherwise the amount of stored data can grow unbounded.
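To make the validity parameters concrete, here is a small sketch of the window in which a data-point still counts as the "current value", using the bus example's pre_validity = post_validity = 1m:

from datetime import datetime, timedelta

t1 = datetime.fromisoformat("2022-08-01T12:00:00")
t2 = datetime.fromisoformat("2022-08-01T12:10:00")
pre_validity = post_validity = timedelta(minutes=1)

current_from = t1 - pre_validity    # 2022-08-01 11:59:00
current_until = t2 + post_validity  # 2022-08-01 12:11:00
print(current_from, current_until)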

"},{"location":"configuration/db_entities/#timeseries-specific-parameters","title":"Timeseries-specific parameters","text":"Parameter Data-type Default value Description timeseries_type string (mandatory) One of: regular, irregular or irregular_intervals. See chapter Data model for explanation. series object of objects, see below (mandatory) Configuration of series of data represented by this timeseries attribute. timeseries_params object, see below Other timeseries parameters. A subobject with fields described by the table below."},{"location":"configuration/db_entities/#series","title":"Series","text":"

Description of series subobject (see table above).

The key for the series object is id - a short string identifying the series (e.g. bytes, temperature, parcels).

Parameter | Data-type | Default value | Description
type | string | (mandatory) | Data type of series. Only int and float are allowed (also time, but that's used internally, see below).

A time series (axis) is added implicitly by DP\u00b3 and this behaviour is specific to the selected timeseries_type:

  • regular: \"time\": { \"data_type\": \"time\" }
  • irregular: \"time\": { \"data_type\": \"time\" }
  • irregular_intervals: \"time_first\": { \"data_type\": \"time\" }, \"time_last\": { \"data_type\": \"time\" }
"},{"location":"configuration/db_entities/#timeseries-params","title":"Timeseries params","text":"

Description of timeseries_params subobject (see table above).

Parameter | Data-type | Default value | Description
max_age | <int><s/m/h/d> (e.g. 30s, 12h, 7d) | null | How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed).
time_step | <int><s/m/h/d> (e.g. 30s, 12h, 7d) | (mandatory) for regular timeseries, null otherwise | \"Sampling rate in time\" of this attribute. For example, with time_step = 10m we expect data-point at 12:00, 12:10, 12:20, 12:30,... Only relevant for regular timeseries.

Note: max_age SHOULD be defined, otherwise the amount of stored data can grow unbounded.

"},{"location":"configuration/db_entities/#supported-data-types","title":"Supported data types","text":"

List of supported values for the data_type parameter (a few illustrative example values follow the list):

  • tag: set/not_set (When the attribute is set, its value is always assumed to be true, the \"v\" field doesn't have to be stored.)
  • binary: true/false/not_set (Attribute value is true or false, or the attribute is not set at all.)
  • category<data_type; category1, category2, ...>: Categorical values. Use only when a fixed set of values should be allowed, which should be specified in the second part of the type definition. The first part of the type definition describes the data_type of the category.
  • string
  • int: 32-bit signed integer (range from -2147483648 to +2147483647)
  • int64: 64-bit signed integer (use when the range of normal int is not sufficient)
  • float
  • time: Timestamp in YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [\u00b1]HH[:]MM] format or timestamp since 1.1.1970 in seconds or milliseconds.
  • ip4: IPv4 address (passed as dotted-decimal string)
  • ip6: IPv6 address (passed as string in short or full format)
  • mac: MAC address (passed as string)
  • link<entity_type>: Link to a record of the specified type, e.g. link<ip>
  • link<entity_type,data_type>: Link to a record of the specified type, carrying additional data, e.g. link<ip,int>
  • array<data_type>: An array of values of specified data type (which must be one of the types above), e.g. array<int>
  • set<data_type>: Same as array, but values can't repeat and order is irrelevant.
  • dict<keys>: Dictionary (object) containing multiple values as subkeys. keys should contain a comma-separated list of key names and types separated by colon, e.g. dict<port:int,protocol:string,tag?:string>. By default, all fields are mandatory (i.e. a data-point missing some subkey will be refused), to mark a field as optional, put ? after its name. Only the following data types can be used here: binary,category,string,int,float,time,ip4,ip6,mac. Multi-level dicts are not supported.
  • json: Any JSON object can be stored, all processing is handled by user's code. This is here for special cases which can't be mapped to any data type above.
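For orientation, here are a few illustrative Python values as they could appear in a data-point's v field for some of these data types (the attribute names are hypothetical):

example_values = {
    "open_port_count": 3,                         # int
    "router_address": "192.168.0.1",              # ip4 (dotted-decimal string)
    "open_ports": [22, 80, 443],                  # array<int>
    "last_seen": "2022-08-01T12:00:00",           # time
    "service": {"port": 443, "protocol": "tcp"},  # dict<port:int,protocol:string,tag?:string>
}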
"},{"location":"configuration/event_logging/","title":"Event logging","text":"

Event logging is done using Redis and allows counting arbitrary events across multiple processes (using shared counters in Redis) over various time intervals.

More information can be found in the GitHub repository of EventCountLogger.

Configuration file event_logging.yml looks like this:

redis:\nhost: localhost\nport: 6379\ndb: 1\ngroups:\n# Main events of Task execution\nte:\nevents:\n- task_processed\n- task_processing_error\nintervals: [ \"5m\", \"2h\" ] # (1)!\nsync-interval: 1 # (2)!\n# Number of processed tasks by their \"src\" attribute\ntasks_by_src:\nevents: [ ]\nauto_declare_events: true\nintervals: [ \"5s\", \"5m\" ]\nsync-interval: 1\n
  1. Two intervals - 5 min and 2 hours for longer-term history in Munin/Icinga
  2. Cache counts locally, push to Redis every second
"},{"location":"configuration/event_logging/#redis","title":"Redis","text":"

This section describes Redis connection details:

Parameter | Data-type | Default value | Description
host | string | localhost | IP address or hostname for connection to Redis.
port | int | 6379 | Listening port of Redis.
db | int | 0 | Index of Redis DB used for the counters (it shouldn't be used for anything else).
"},{"location":"configuration/event_logging/#groups","title":"Groups","text":"

The default groups configuration enables logging of events in task execution, namely task_processed and task_processing_error.

To learn more about the group configuration for EventCountLogger, please refer to the official documentation.

"},{"location":"configuration/history_manager/","title":"History manager","text":"

The history manager is responsible for deleting old data from master records in the database.

Configuration file history_manager.yml is very simple:

datapoint_cleaning:\ntick_rate: 10\n

The parameter tick_rate sets how often (in minutes) DP\u00b3 should check whether any data in the master records of observations and timeseries attributes is too old, and remove anything that is. To control what is considered \"too old\", see the max_age parameter in the Database entities configuration.

"},{"location":"configuration/modules/","title":"Modules","text":"

The modules/ folder optionally contains any module-specific configuration.

This configuration doesn't have to follow any required format (except being YAML files).

In secondary modules, you can access the configuration:

from dp3 import g\nprint(g.config[\"modules\"][\"MODULE_NAME\"])\n

Here, the MODULE_NAME corresponds to MODULE_NAME.yml file in modules/ folder.

"},{"location":"configuration/processing_core/","title":"Processing core","text":"

The processing core's configuration in the processing_core.yml file looks like this:

msg_broker:\nhost: localhost\nport: 5672\nvirtual_host: /\nusername: dp3_user\npassword: dp3_password\nworker_processes: 2\nworker_threads: 16\nmodules_dir: \"../dp3_modules\"\nenabled_modules:\n- \"module_one\"\n- \"module_two\"\n
"},{"location":"configuration/processing_core/#message-broker","title":"Message broker","text":"

The message broker section describes connection details for the RabbitMQ (or compatible) broker.

Parameter | Data-type | Default value | Description
host | string | localhost | IP address or hostname for connection to broker.
port | int | 5672 | Listening port of broker.
virtual_host | string | / | Virtual host for connection to broker.
username | string | guest | Username for connection to broker.
password | string | guest | Password for connection to broker.
"},{"location":"configuration/processing_core/#worker-processes","title":"Worker processes","text":"

Number of worker processes. This has to be at least 1.

When changing the number of worker processes, the following procedure must be followed:

  1. stop all inputs writing to task queue (e.g. API)
  2. when all queues are empty, stop all workers
  3. reconfigure queues in RabbitMQ using script found in /scripts/rmq_reconfigure.sh
  4. change the settings here and in init scripts for worker processes (e.g. supervisor)
  5. reload workers (e.g. using supervisorctl) and start all inputs again
"},{"location":"configuration/processing_core/#worker-threads","title":"Worker threads","text":"

Number of worker threads per process.

This may be higher than the number of CPUs, because it is not primarily intended to utilize the computational power of multiple CPUs (which Python cannot do well anyway due to the GIL), but to mask long I/O operations (e.g. queries to external services via the network).

"},{"location":"configuration/processing_core/#modules-directory","title":"Modules directory","text":"

Path to directory with plug-in (secondary) modules.

A relative path is evaluated relative to the location of this configuration file.

"},{"location":"configuration/processing_core/#enabled-modules","title":"Enabled modules","text":"

List of plug-in modules which should be enabled in the processing pipeline.

The module filename without the .py extension must be used!

"},{"location":"configuration/snapshots/","title":"Snapshots","text":"

The snapshots configuration is straightforward. Currently, it only sets creation_rate - the period in minutes for creating new snapshots (30 minutes by default).

File snapshots.yml looks like this:

creation_rate: 30\n
"},{"location":"reference/","title":"dp3","text":""},{"location":"reference/#dp3","title":"dp3","text":""},{"location":"reference/#dp3--dynamic-profile-processing-platform-dp3","title":"Dynamic Profile Processing Platform (DP\u00b3)","text":"

Platform directory structure:

  • Worker - The main worker process.

  • Common - Common modules which are used throughout the platform.

    • Config, EntitySpec and AttrSpec - Models for reading, validation and representing platform configuration of entities and their attributes. base_attrs and datatype are also used within this context.
    • Scheduler - Allows modules to run callbacks at specified times
    • Task - Model for a single task processed by the platform
    • Utils - Auxiliary utility functions
  • Database.EntityDatabase - A wrapper responsible for communication with the database server.

  • HistoryManagement.HistoryManager - Module responsible for managing history saved in database, currently to clean old data.

  • Snapshots - SnapShooter, a module responsible for snapshot creation and running configured data correlation and fusion hooks, and Snapshot Hooks, which manage the registered hooks and their dependencies on one another.

  • TaskProcessing - Module responsible for task distribution, processing and running configured hooks. Task distribution is possible due to the task queue.

"},{"location":"reference/SUMMARY/","title":"SUMMARY","text":"
  • dp3
    • api
      • internal
        • config
        • dp_logger
        • entity_response_models
        • helpers
        • models
        • response_models
      • main
      • routers
        • control
        • entity
        • root
    • bin
      • api
      • setup
      • worker
    • common
      • attrspec
      • base_attrs
      • base_module
      • callback_registrar
      • config
      • control
      • datapoint
      • datatype
      • entityspec
      • scheduler
      • task
      • utils
    • database
      • database
    • history_management
      • history_manager
      • telemetry
    • snapshots
      • snapshooter
      • snapshot_hooks
    • task_processing
      • task_distributor
      • task_executor
      • task_hooks
      • task_queue
    • worker
"},{"location":"reference/worker/","title":"worker","text":""},{"location":"reference/worker/#dp3.worker","title":"dp3.worker","text":"

Code of the main worker process.

Don't run directly. Import and run the main() function.

"},{"location":"reference/worker/#dp3.worker.load_modules","title":"load_modules","text":"
load_modules(modules_dir: str, enabled_modules: dict, log: logging.Logger, registrar: CallbackRegistrar, platform_config: PlatformConfig) -> list\n

Load plug-in modules

Import Python modules with names in 'enabled_modules' from 'modules_dir' directory and return all found classes derived from BaseModule class.

Source code in dp3/worker.py
def load_modules(\nmodules_dir: str,\nenabled_modules: dict,\nlog: logging.Logger,\nregistrar: CallbackRegistrar,\nplatform_config: PlatformConfig,\n) -> list:\n\"\"\"Load plug-in modules\n    Import Python modules with names in 'enabled_modules' from 'modules_dir' directory\n    and return all found classes derived from BaseModule class.\n    \"\"\"\n# Get list of all modules available in given folder\n# [:-3] is for removing '.py' suffix from module filenames\navailable_modules = []\nfor item in os.scandir(modules_dir):\n# A module can be a Python file or a Python package\n# (i.e. a directory with \"__init__.py\" file)\nif item.is_file() and item.name.endswith(\".py\"):\navailable_modules.append(item.name[:-3])  # name without .py\nif item.is_dir() and \"__init__.py\" in os.listdir(os.path.join(modules_dir, item.name)):\navailable_modules.append(item.name)\nlog.debug(f\"Available modules: {', '.join(available_modules)}\")\nlog.debug(f\"Enabled modules: {', '.join(enabled_modules)}\")\n# Check if all desired modules are in modules folder\nmissing_modules = set(enabled_modules) - set(available_modules)\nif missing_modules:\nlog.fatal(\n\"Some of desired modules are not available (not in modules folder), \"\nf\"specifically: {missing_modules}\"\n)\nsys.exit(2)\n# Do imports of desired modules from 'modules' folder\n# (rewrite sys.path to modules_dir, import all modules and rewrite it back)\nlog.debug(\"Importing modules ...\")\nsys.path.insert(0, modules_dir)\nimported_modules: list[tuple[str, str, type[BaseModule]]] = [\n(module_name, name, obj)\nfor module_name in enabled_modules\nfor name, obj in inspect.getmembers(import_module(module_name))\nif inspect.isclass(obj) and BaseModule in obj.__bases__\n]\ndel sys.path[0]\n# Final list will contain main classes from all desired modules,\n# which has BaseModule as parent\nmodules_main_objects = []\nfor module_name, _, obj in imported_modules:\n# Append instance of module class (obj is class --> obj() is instance)\n# --> call init, which registers handler\nmodule_config = platform_config.config.get(f\"modules.{module_name}\", {})\nmodules_main_objects.append(obj(platform_config, module_config, registrar))\nlog.info(f\"Module loaded: {module_name}:{obj.__name__}\")\nreturn modules_main_objects\n
"},{"location":"reference/worker/#dp3.worker.main","title":"main","text":"
main(app_name: str, config_dir: str, process_index: int, verbose: bool) -> None\n

Run worker process.

Parameters:

Name Type Description Default app_name str

Name of the application to distinct it from other DP3-based apps. For example, it's used as a prefix for RabbitMQ queue names.

required config_dir str

Path to directory containing configuration files.

required process_index int

Index of this worker process. For each application there must be N processes running simultaneously, each started with a unique index (from 0 to N-1). N is read from configuration ('worker_processes' in 'processing_core.yml').

required verbose bool

More verbose output (set log level to DEBUG).

required Source code in dp3/worker.py
def main(app_name: str, config_dir: str, process_index: int, verbose: bool) -> None:\n\"\"\"\n    Run worker process.\n    Args:\n        app_name: Name of the application to distinct it from other DP3-based apps.\n            For example, it's used as a prefix for RabbitMQ queue names.\n        config_dir: Path to directory containing configuration files.\n        process_index: Index of this worker process. For each application\n            there must be N processes running simultaneously, each started with a\n            unique index (from 0 to N-1). N is read from configuration\n            ('worker_processes' in 'processing_core.yml').\n        verbose: More verbose output (set log level to DEBUG).\n    \"\"\"\n##############################################\n# Initialize logging mechanism\nLOGFORMAT = \"%(asctime)-15s,%(threadName)s,%(name)s,[%(levelname)s] %(message)s\"\nLOGDATEFORMAT = \"%Y-%m-%dT%H:%M:%S\"\nlogging.basicConfig(\nlevel=logging.DEBUG if verbose else logging.INFO, format=LOGFORMAT, datefmt=LOGDATEFORMAT\n)\nlog = logging.getLogger()\n# Disable INFO and DEBUG messages from some libraries\nlogging.getLogger(\"requests\").setLevel(logging.WARNING)\nlogging.getLogger(\"urllib3\").setLevel(logging.WARNING)\nlogging.getLogger(\"amqpstorm\").setLevel(logging.WARNING)\n##############################################\n# Load configuration\nconfig_base_path = os.path.abspath(config_dir)\nlog.debug(f\"Loading config directory {config_base_path}\")\n# Whole configuration should be loaded\nconfig = read_config_dir(config_base_path, recursive=True)\ntry:\nmodel_spec = ModelSpec(config.get(\"db_entities\"))\nexcept ValidationError as e:\nlog.fatal(\"Invalid model specification: %s\", e)\nsys.exit(2)\n# Print whole attribute specification\nlog.debug(model_spec)\nnum_processes = config.get(\"processing_core.worker_processes\")\nplatform_config = PlatformConfig(\napp_name=app_name,\nconfig_base_path=config_base_path,\nconfig=config,\nmodel_spec=model_spec,\nprocess_index=process_index,\nnum_processes=num_processes,\n)\n##############################################\n# Create instances of core components\nlog.info(f\"***** {app_name} worker {process_index} of {num_processes} start *****\")\ndb = EntityDatabase(config.get(\"database\"), model_spec)\nglobal_scheduler = scheduler.Scheduler()\ntask_executor = TaskExecutor(db, platform_config)\nsnap_shooter = SnapShooter(\ndb,\nTaskQueueWriter(app_name, num_processes, config.get(\"processing_core.msg_broker\")),\ntask_executor,\nplatform_config,\nglobal_scheduler,\n)\nregistrar = CallbackRegistrar(global_scheduler, task_executor, snap_shooter)\nHistoryManager(db, platform_config, registrar)\nTelemetry(db, platform_config, registrar)\n# Lock used to control when the program stops.\ndaemon_stop_lock = threading.Lock()\ndaemon_stop_lock.acquire()\n# Signal handler releasing the lock on SIGINT or SIGTERM\ndef sigint_handler(signum, frame):\nlog.debug(\n\"Signal {} received, stopping worker\".format(\n{signal.SIGINT: \"SIGINT\", signal.SIGTERM: \"SIGTERM\"}.get(signum, signum)\n)\n)\ndaemon_stop_lock.release()\nsignal.signal(signal.SIGINT, sigint_handler)\nsignal.signal(signal.SIGTERM, sigint_handler)\nsignal.signal(signal.SIGABRT, sigint_handler)\ntask_distributor = TaskDistributor(task_executor, platform_config, registrar, daemon_stop_lock)\ncontrol = Control(platform_config)\ncontrol.set_action_handler(ControlAction.make_snapshots, snap_shooter.make_snapshots)\n##############################################\n# Load all plug-in 
modules\nos.path.dirname(__file__)\ncustom_modules_dir = config.get(\"processing_core.modules_dir\")\ncustom_modules_dir = os.path.abspath(os.path.join(config_base_path, custom_modules_dir))\nmodule_list = load_modules(\ncustom_modules_dir,\nconfig.get(\"processing_core.enabled_modules\"),\nlog,\nregistrar,\nplatform_config,\n)\n################################################\n# Initialization completed, run ...\n# Run update manager thread\nlog.info(\"***** Initialization completed, starting all modules *****\")\n# Run modules that have their own threads (TODO: there are no such modules, should be kept?)\n# (if they don't, the start() should do nothing)\nfor module in module_list:\nmodule.start()\n# start TaskDistributor (which starts TaskExecutors in several worker threads)\ntask_distributor.start()\n# Run scheduler\nglobal_scheduler.start()\n# Run SnapShooter\nsnap_shooter.start()\ncontrol.start()\n# Wait until someone wants to stop the program by releasing this Lock.\n# It may be a user by pressing Ctrl-C or some program module.\n# (try to acquire the lock again,\n# effectively waiting until it's released by signal handler or another thread)\nif os.name == \"nt\":\n# This is needed on Windows in order to catch Ctrl-C, which doesn't break the waiting.\nwhile not daemon_stop_lock.acquire(timeout=1):\npass\nelse:\ndaemon_stop_lock.acquire()\n################################################\n# Finalization & cleanup\n# Set signal handlers back to their defaults,\n# so the second Ctrl-C closes the program immediately\nsignal.signal(signal.SIGINT, signal.SIG_DFL)\nsignal.signal(signal.SIGTERM, signal.SIG_DFL)\nsignal.signal(signal.SIGABRT, signal.SIG_DFL)\nlog.info(\"Stopping running components ...\")\ncontrol.stop()\nsnap_shooter.stop()\nglobal_scheduler.stop()\ntask_distributor.stop()\nfor module in module_list:\nmodule.stop()\nlog.info(\"***** Finished, main thread exiting. *****\")\nlogging.shutdown()\n
"},{"location":"reference/api/","title":"api","text":""},{"location":"reference/api/#dp3.api","title":"dp3.api","text":""},{"location":"reference/api/main/","title":"main","text":""},{"location":"reference/api/main/#dp3.api.main","title":"dp3.api.main","text":""},{"location":"reference/api/internal/","title":"internal","text":""},{"location":"reference/api/internal/#dp3.api.internal","title":"dp3.api.internal","text":""},{"location":"reference/api/internal/config/","title":"config","text":""},{"location":"reference/api/internal/config/#dp3.api.internal.config","title":"dp3.api.internal.config","text":""},{"location":"reference/api/internal/config/#dp3.api.internal.config.ConfigEnv","title":"ConfigEnv","text":"

Bases: BaseModel

Configuration environment variables container

"},{"location":"reference/api/internal/dp_logger/","title":"dp_logger","text":""},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger","title":"dp3.api.internal.dp_logger","text":""},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger.DPLogger","title":"DPLogger","text":"
DPLogger(config: dict)\n

Datapoint logger

Logs good/bad datapoints into a file for further analysis. They are logged in JSON format. Bad datapoints are logged together with their error message.

Logging may be disabled in api.yml configuration file:

# ...\ndatapoint_logger:\n  good_log: false\n  bad_log: false\n# ...\n
Source code in dp3/api/internal/dp_logger.py
def __init__(self, config: dict):\nif not config:\nconfig = {}\ngood_log_file = config.get(\"good_log\", False)\nbad_log_file = config.get(\"bad_log\", False)\n# Setup loggers\nself._good_logger = self.setup_logger(\"GOOD\", good_log_file)\nself._bad_logger = self.setup_logger(\"BAD\", bad_log_file)\n
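A minimal usage sketch (the log file paths are hypothetical; any falsy value disables the corresponding log, as in __init__ above):

from dp3.api.internal.dp_logger import DPLogger

# Hypothetical paths; the parent directories must already exist,
# otherwise setup_logger() raises FileNotFoundError.
dp_logger = DPLogger({
    "good_log": "/var/log/dp3/good_datapoints.log",
    "bad_log": "/var/log/dp3/bad_datapoints.log",
})

# An empty config attaches NullHandlers, i.e. datapoint logging is disabled.
disabled_logger = DPLogger({})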
"},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger.DPLogger.setup_logger","title":"setup_logger","text":"
setup_logger(name: str, log_file: str)\n

Creates new logger instance with log_file as target

Source code in dp3/api/internal/dp_logger.py
def setup_logger(self, name: str, log_file: str):\n\"\"\"Creates new logger instance with `log_file` as target\"\"\"\n# Create log handler\nif log_file:\nparent_path = pathlib.Path(log_file).parent\nif not parent_path.exists():\nraise FileNotFoundError(\nf\"The directory {parent_path} does not exist,\"\n\" check the configured path or create the directory.\"\n)\nlog_handler = logging.FileHandler(log_file)\nlog_handler.setFormatter(self.LOG_FORMATTER)\nelse:\nlog_handler = logging.NullHandler()\n# Get logger instance\nlogger = logging.getLogger(name)\nlogger.addHandler(log_handler)\nlogger.setLevel(logging.INFO)\nreturn logger\n
"},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger.DPLogger.log_good","title":"log_good","text":"
log_good(dps: list[DataPointBase], src: str = UNKNOWN_SRC_MSG)\n

Logs good datapoints

Datapoints are logged one-by-one in processed form. Source should be the IP address of the incoming request.

Source code in dp3/api/internal/dp_logger.py
def log_good(self, dps: list[DataPointBase], src: str = UNKNOWN_SRC_MSG):\n\"\"\"Logs good datapoints\n    Datapoints are logged one-by-one in processed form.\n    Source should be IP address of incoming request.\n    \"\"\"\nfor dp in dps:\nself._good_logger.info(dp.json(), extra={\"src\": src})\n
"},{"location":"reference/api/internal/dp_logger/#dp3.api.internal.dp_logger.DPLogger.log_bad","title":"log_bad","text":"
log_bad(request_body: str, validation_error_msg: str, src: str = UNKNOWN_SRC_MSG)\n

Logs bad datapoints including the validation error message

The whole request body is logged at once (a JSON string is expected). Source should be the IP address of the incoming request.

Source code in dp3/api/internal/dp_logger.py
def log_bad(self, request_body: str, validation_error_msg: str, src: str = UNKNOWN_SRC_MSG):\n\"\"\"Logs bad datapoints including the validation error message\n    Whole request body is logged at once (JSON string is expected).\n    Source should be IP address of incoming request.\n    \"\"\"\n# Remove newlines from request body\nrequest_body = request_body.replace(\"\\n\", \" \")\n# Prepend error message with tabs\nvalidation_error_msg = validation_error_msg.replace(\"\\n\", \"\\n\\t\")\nself._bad_logger.info(f\"{request_body}\\n\\t{validation_error_msg}\", extra={\"src\": src})\n
"},{"location":"reference/api/internal/entity_response_models/","title":"entity_response_models","text":""},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models","title":"dp3.api.internal.entity_response_models","text":""},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models.EntityState","title":"EntityState","text":"

Bases: BaseModel

Entity specification and current state

Merges (some) data from DP3's EntitySpec with state information from the database. Provides an estimated count of master records in the database.

"},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models.EntityEidList","title":"EntityEidList","text":"

Bases: BaseModel

List of entity eids and their data based on latest snapshot

Includes timestamp of latest snapshot creation.

Data does not include history of observations attributes and timeseries.

"},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models.EntityEidData","title":"EntityEidData","text":"

Bases: BaseModel

Data of entity eid

Includes all snapshots and master record.

empty signals whether this eid contains any data.

"},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models.EntityEidAttrValueOrHistory","title":"EntityEidAttrValueOrHistory","text":"

Bases: BaseModel

Value and/or history of entity attribute for given eid

Depends on attribute type:
  • plain: just (current) value
  • observations: (current) value and history stored in master record (optionally filtered)
  • timeseries: just history stored in master record (optionally filtered)

"},{"location":"reference/api/internal/entity_response_models/#dp3.api.internal.entity_response_models.EntityEidAttrValue","title":"EntityEidAttrValue","text":"

Bases: BaseModel

Value of entity attribute for given eid

The value is fetched from master record.

"},{"location":"reference/api/internal/helpers/","title":"helpers","text":""},{"location":"reference/api/internal/helpers/#dp3.api.internal.helpers","title":"dp3.api.internal.helpers","text":""},{"location":"reference/api/internal/helpers/#dp3.api.internal.helpers.api_to_dp3_datapoint","title":"api_to_dp3_datapoint","text":"
api_to_dp3_datapoint(api_dp_values: dict) -> DataPointBase\n

Converts API datapoint values to DP3 datapoint

If etype-attr pair doesn't exist in DP3 config, raises ValueError. If values are not valid, raises pydantic's ValidationError.

Source code in dp3/api/internal/helpers.py
def api_to_dp3_datapoint(api_dp_values: dict) -> DataPointBase:\n\"\"\"Converts API datapoint values to DP3 datapoint\n    If etype-attr pair doesn't exist in DP3 config, raises `ValueError`.\n    If values are not valid, raises pydantic's ValidationError.\n    \"\"\"\netype = api_dp_values[\"type\"]\nattr = api_dp_values[\"attr\"]\n# Convert to DP3 datapoint format\ndp3_dp_values = api_dp_values\ndp3_dp_values[\"etype\"] = etype\ndp3_dp_values[\"eid\"] = api_dp_values[\"id\"]\n# Get attribute-specific model\ntry:\nmodel = MODEL_SPEC.attr(etype, attr).dp_model\nexcept KeyError as e:\nraise ValueError(f\"Combination of type '{etype}' and attr '{attr}' doesn't exist\") from e\n# Parse using the model\n# This may raise pydantic's ValidationError, but that's intentional (to get\n# a JSON-serializable trace as a response from API).\nreturn model.parse_obj(dp3_dp_values)\n
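A usage sketch, assuming the loaded model spec defines an entity type "host" with a plain string attribute "label" (both names are hypothetical):

from dp3.api.internal.helpers import api_to_dp3_datapoint

api_dp = {
    "type": "host",        # hypothetical entity type
    "id": "host123",
    "attr": "label",       # hypothetical plain attribute
    "v": "my first host",
    "src": "manual",
}

try:
    dp3_dp = api_to_dp3_datapoint(api_dp)  # instance of a DataPointBase subclass
except ValueError:
    pass  # unknown type/attr combination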
"},{"location":"reference/api/internal/models/","title":"models","text":""},{"location":"reference/api/internal/models/#dp3.api.internal.models","title":"dp3.api.internal.models","text":""},{"location":"reference/api/internal/models/#dp3.api.internal.models.DataPoint","title":"DataPoint","text":"

Bases: BaseModel

Data-point for API

Contains a single raw data value received on the API. This is a generic class for plain, observations and timeseries datapoints.

Provides front line of validation for this data value.

This differs slightly from DP3's DataPoint in the naming of attributes, for historical reasons.

After validation against this schema, the datapoint is validated using an attribute-specific validator to ensure full compliance.

"},{"location":"reference/api/internal/response_models/","title":"response_models","text":""},{"location":"reference/api/internal/response_models/#dp3.api.internal.response_models","title":"dp3.api.internal.response_models","text":""},{"location":"reference/api/internal/response_models/#dp3.api.internal.response_models.HealthCheckResponse","title":"HealthCheckResponse","text":"

Bases: BaseModel

Healthcheck endpoint response

"},{"location":"reference/api/internal/response_models/#dp3.api.internal.response_models.SuccessResponse","title":"SuccessResponse","text":"

Bases: BaseModel

Generic success response

"},{"location":"reference/api/internal/response_models/#dp3.api.internal.response_models.RequestValidationError","title":"RequestValidationError","text":"
RequestValidationError(loc, msg)\n

Bases: HTTPException

HTTP exception wrapper to simplify path and query validation

Source code in dp3/api/internal/response_models.py
def __init__(self, loc, msg):\nsuper().__init__(422, [{\"loc\": loc, \"msg\": msg, \"type\": \"value_error\"}])\n
"},{"location":"reference/api/routers/","title":"routers","text":""},{"location":"reference/api/routers/#dp3.api.routers","title":"dp3.api.routers","text":""},{"location":"reference/api/routers/control/","title":"control","text":""},{"location":"reference/api/routers/control/#dp3.api.routers.control","title":"dp3.api.routers.control","text":""},{"location":"reference/api/routers/control/#dp3.api.routers.control.execute_action","title":"execute_action async","text":"
execute_action(action: ControlAction) -> SuccessResponse\n

Sends the given action into execution queue.

Source code in dp3/api/routers/control.py
@router.get(\"/{action}\")\nasync def execute_action(action: ControlAction) -> SuccessResponse:\n\"\"\"Sends the given action into execution queue.\"\"\"\nCONTROL_WRITER.put_task(ControlMessage(action=action))\nreturn SuccessResponse(detail=\"Action sent.\")\n
"},{"location":"reference/api/routers/entity/","title":"entity","text":""},{"location":"reference/api/routers/entity/#dp3.api.routers.entity","title":"dp3.api.routers.entity","text":""},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.check_entity","title":"check_entity async","text":"
check_entity(entity: str)\n

Middleware to check entity existence

Source code in dp3/api/routers/entity.py
async def check_entity(entity: str):\n\"\"\"Middleware to check entity existence\"\"\"\nif entity not in MODEL_SPEC.entities:\nraise RequestValidationError([\"path\", \"entity\"], f\"Entity '{entity}' doesn't exist\")\nreturn entity\n
"},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.list_entity_eids","title":"list_entity_eids async","text":"
list_entity_eids(entity: str, skip: NonNegativeInt = 0, limit: PositiveInt = 20) -> EntityEidList\n

List latest snapshots of all ids present in database under entity.

Contains only latest snapshot.

Uses pagination.

Source code in dp3/api/routers/entity.py
@router.get(\"/{entity}\")\nasync def list_entity_eids(\nentity: str, skip: NonNegativeInt = 0, limit: PositiveInt = 20\n) -> EntityEidList:\n\"\"\"List latest snapshots of all `id`s present in database under `entity`.\n    Contains only latest snapshot.\n    Uses pagination.\n    \"\"\"\ncursor = DB.get_latest_snapshots(entity).skip(skip).limit(limit)\ntime_created = None\n# Remove _id field\nresult = list(cursor)\nfor r in result:\ntime_created = r[\"_time_created\"]\ndel r[\"_time_created\"]\ndel r[\"_id\"]\nreturn EntityEidList(time_created=time_created, data=result)\n
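For illustration, a client-side sketch of the pagination; the base URL and the /entity prefix are assumptions, not taken from this reference:

import requests

BASE_URL = "http://localhost:5000"  # assumption: where the DP3 API is served

# assumption: the entity router is mounted under /entity
page = requests.get(
    f"{BASE_URL}/entity/host",  # "host" is a hypothetical entity type
    params={"skip": 0, "limit": 20},
).json()
print(page["time_created"], len(page["data"]))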
"},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.get_eid_data","title":"get_eid_data async","text":"
get_eid_data(entity: str, eid: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None) -> EntityEidData\n

Get data of entity's eid.

Contains all snapshots and master record. Snapshots are ordered by ascending creation time.

Source code in dp3/api/routers/entity.py
@router.get(\"/{entity}/{eid}\")\nasync def get_eid_data(\nentity: str, eid: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None\n) -> EntityEidData:\n\"\"\"Get data of `entity`'s `eid`.\n    Contains all snapshots and master record.\n    Snapshots are ordered by ascending creation time.\n    \"\"\"\n# Get master record\n# TODO: This is probably not the most efficient way. Maybe gather only\n# plain data from master record and then call `get_timeseries_history`\n# for timeseries.\nmaster_record = DB.get_master_record(entity, eid)\nif \"_id\" in master_record:\ndel master_record[\"_id\"]\nif \"#hash\" in master_record:\ndel master_record[\"#hash\"]\n# Get filtered timeseries data\nfor attr in master_record:\nif MODEL_SPEC.attr(entity, attr).t == AttrType.TIMESERIES:\nmaster_record[attr] = DB.get_timeseries_history(\nentity, attr, eid, t1=date_from, t2=date_to\n)\n# Get snapshots\nsnapshots = list(DB.get_snapshots(entity, eid, t1=date_from, t2=date_to))\nfor s in snapshots:\ndel s[\"_id\"]\n# Whether this eid contains any data\nempty = not master_record and len(snapshots) == 0\nreturn EntityEidData(empty=empty, master_record=master_record, snapshots=snapshots)\n
"},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.get_eid_attr_value","title":"get_eid_attr_value async","text":"
get_eid_attr_value(entity: str, eid: str, attr: str, date_from: Optional[datetime] = None, date_to: Optional[datetime] = None) -> EntityEidAttrValueOrHistory\n

Get attribute value

Value is either of:
  • current value: in case of plain attribute
  • current value and history: in case of observation attribute
  • history: in case of timeseries attribute

Source code in dp3/api/routers/entity.py
@router.get(\"/{entity}/{eid}/get/{attr}\")\nasync def get_eid_attr_value(\nentity: str,\neid: str,\nattr: str,\ndate_from: Optional[datetime] = None,\ndate_to: Optional[datetime] = None,\n) -> EntityEidAttrValueOrHistory:\n\"\"\"Get attribute value\n    Value is either of:\n    - current value: in case of plain attribute\n    - current value and history: in case of observation attribute\n    - history: in case of timeseries attribute\n    \"\"\"\n# Check if attribute exists\nif attr not in MODEL_SPEC.attribs(entity):\nraise RequestValidationError([\"path\", \"attr\"], f\"Attribute '{attr}' doesn't exist\")\nvalue_or_history = DB.get_value_or_history(entity, attr, eid, t1=date_from, t2=date_to)\nreturn EntityEidAttrValueOrHistory(\nattr_type=MODEL_SPEC.attr(entity, attr).t, **value_or_history\n)\n
"},{"location":"reference/api/routers/entity/#dp3.api.routers.entity.set_eid_attr_value","title":"set_eid_attr_value async","text":"
set_eid_attr_value(entity: str, eid: str, attr: str, body: EntityEidAttrValue, request: Request) -> SuccessResponse\n

Set current value of attribute

Internally, this just creates a datapoint for the specified attribute and value.

This endpoint is meant for editable plain attributes -- for direct user edits in the DP3 web UI.

Source code in dp3/api/routers/entity.py
@router.post(\"/{entity}/{eid}/set/{attr}\")\nasync def set_eid_attr_value(\nentity: str, eid: str, attr: str, body: EntityEidAttrValue, request: Request\n) -> SuccessResponse:\n\"\"\"Set current value of attribute\n    Internally just creates datapoint for specified attribute and value.\n    This endpoint is meant for `editable` plain attributes -- for direct user edit on DP3 web UI.\n    \"\"\"\n# Check if attribute exists\nif attr not in MODEL_SPEC.attribs(entity):\nraise RequestValidationError([\"path\", \"attr\"], f\"Attribute '{attr}' doesn't exist\")\n# Construct datapoint\ntry:\ndp = DataPoint(\ntype=entity,\nid=eid,\nattr=attr,\nv=body.value,\nt1=datetime.now(),\nsrc=f\"{request.client.host} via API\",\n)\ndp3_dp = api_to_dp3_datapoint(dp.dict())\nexcept ValidationError as e:\nraise RequestValidationError([\"body\", \"value\"], e.errors()[0][\"msg\"]) from e\n# This shouldn't fail\ntask = DataPointTask(model_spec=MODEL_SPEC, etype=entity, eid=eid, data_points=[dp3_dp])\n# Push tasks to task queue\nTASK_WRITER.put_task(task, False)\n# Datapoints from this endpoint are intentionally not logged using `DPLogger`.\n# If for some reason, in the future, they need to be, just copy code from data ingestion\n# endpoint.\nreturn SuccessResponse()\n
"},{"location":"reference/api/routers/root/","title":"root","text":""},{"location":"reference/api/routers/root/#dp3.api.routers.root","title":"dp3.api.routers.root","text":""},{"location":"reference/api/routers/root/#dp3.api.routers.root.health_check","title":"health_check async","text":"
health_check() -> HealthCheckResponse\n

Health check

Returns simple 'It works!' response.

Source code in dp3/api/routers/root.py
@router.get(\"/\", tags=[\"Health\"])\nasync def health_check() -> HealthCheckResponse:\n\"\"\"Health check\n    Returns simple 'It works!' response.\n    \"\"\"\nreturn HealthCheckResponse()\n
"},{"location":"reference/api/routers/root/#dp3.api.routers.root.insert_datapoints","title":"insert_datapoints async","text":"
insert_datapoints(dps: list[DataPoint], request: Request) -> SuccessResponse\n

Insert datapoints

Validates and pushes datapoints into the task queue, so they are processed by one of the DP3 workers.

Source code in dp3/api/routers/root.py
@router.post(DATAPOINTS_INGESTION_URL_PATH, tags=[\"Data ingestion\"])\nasync def insert_datapoints(dps: list[DataPoint], request: Request) -> SuccessResponse:\n\"\"\"Insert datapoints\n    Validates and pushes datapoints into task queue, so they are processed by one of DP3 workers.\n    \"\"\"\n# Convert to DP3 datapoints\n# This should not fail as all datapoints are already validated\ndp3_dps = [api_to_dp3_datapoint(dp.dict()) for dp in dps]\n# Group datapoints by etype-eid\ntasks_dps = defaultdict(list)\nfor dp in dp3_dps:\nkey = (dp.etype, dp.eid)\ntasks_dps[key].append(dp)\n# Create tasks\ntasks = []\nfor k in tasks_dps:\netype, eid = k\n# This shouldn't fail either\ntasks.append(\nDataPointTask(model_spec=MODEL_SPEC, etype=etype, eid=eid, data_points=tasks_dps[k])\n)\n# Push tasks to task queue\nfor task in tasks:\nTASK_WRITER.put_task(task, False)\n# Log datapoints\nDP_LOGGER.log_good(dp3_dps, src=request.client.host)\nreturn SuccessResponse()\n
"},{"location":"reference/api/routers/root/#dp3.api.routers.root.list_entities","title":"list_entities async","text":"
list_entities() -> dict[str, EntityState]\n

List entities

Returns a dictionary containing all configured entities -- their simplified configuration and current state information.

Source code in dp3/api/routers/root.py
@router.get(\"/entities\", tags=[\"Entity\"])\nasync def list_entities() -> dict[str, EntityState]:\n\"\"\"List entities\n    Returns dictionary containing all entities configured -- their simplified configuration\n    and current state information.\n    \"\"\"\nentities = {}\nfor e_id in MODEL_SPEC.entities:\nentity_spec = MODEL_SPEC.entity(e_id)\nentities[e_id] = {\n\"id\": e_id,\n\"name\": entity_spec.name,\n\"attribs\": MODEL_SPEC.attribs(e_id),\n\"eid_estimate_count\": DB.estimate_count_eids(e_id),\n}\nreturn entities\n
"},{"location":"reference/bin/","title":"bin","text":""},{"location":"reference/bin/#dp3.bin","title":"dp3.bin","text":""},{"location":"reference/bin/api/","title":"api","text":""},{"location":"reference/bin/api/#dp3.bin.api","title":"dp3.bin.api","text":"

Run the DP3 API using uvicorn.

"},{"location":"reference/bin/setup/","title":"setup","text":""},{"location":"reference/bin/setup/#dp3.bin.setup","title":"dp3.bin.setup","text":"

DP3 Setup Script for creating a DP3 application.

"},{"location":"reference/bin/setup/#dp3.bin.setup.replace_template","title":"replace_template","text":"
replace_template(directory: Path, template: str, replace_with: str)\n

Replace all occurrences of template with the given text.

Source code in dp3/bin/setup.py
def replace_template(directory: Path, template: str, replace_with: str):\n\"\"\"Replace all occurrences of `template` with the given text.\"\"\"\nfor file in directory.rglob(\"*\"):\nif file.is_file():\ntry:\nwith file.open(\"r+\") as f:\ncontents = f.read()\ncontents = contents.replace(template, replace_with)\nf.seek(0)\nf.write(contents)\nf.truncate()\nexcept UnicodeDecodeError:\npass\nexcept PermissionError:\npass\n
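A usage sketch (the placeholder string and target directory are hypothetical):

from pathlib import Path

from dp3.bin.setup import replace_template

# Rewrite every occurrence of a placeholder in a generated app skeleton.
replace_template(Path("./my_dp3_app"), "__APP_NAME__", "my_dp3_app")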
"},{"location":"reference/bin/worker/","title":"worker","text":""},{"location":"reference/bin/worker/#dp3.bin.worker","title":"dp3.bin.worker","text":""},{"location":"reference/common/","title":"common","text":""},{"location":"reference/common/#dp3.common","title":"dp3.common","text":"

Common modules which are used throughout the platform.

  • Config, EntitySpec and AttrSpec - Models for reading, validating and representing the platform configuration of entities and their attributes. base_attrs and datatype are also used in this context.
  • Scheduler - Allows modules to run callbacks at specified times
  • Task - Model for a single task processed by the platform
  • Utils - Auxiliary utility functions
"},{"location":"reference/common/attrspec/","title":"attrspec","text":""},{"location":"reference/common/attrspec/#dp3.common.attrspec","title":"dp3.common.attrspec","text":""},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrType","title":"AttrType","text":"

Bases: Flag

Enum of attribute types

PLAIN = 1
OBSERVATIONS = 2
TIMESERIES = 4

"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrType.from_str","title":"from_str classmethod","text":"
from_str(type_str: str)\n

Convert string representation like \"plain\" to AttrType.

Source code in dp3/common/attrspec.py
@classmethod\ndef from_str(cls, type_str: str):\n\"\"\"\n    Convert string representation like \"plain\" to AttrType.\n    \"\"\"\ntry:\nreturn cls(cls[type_str.upper()])\nexcept Exception as e:\nraise AttrTypeError(f\"Invalid attribute type '{type_str}'\") from e\n
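Usage sketch:

from dp3.common.attrspec import AttrType

assert AttrType.from_str("plain") is AttrType.PLAIN

# Being a Flag enum, members can be combined and tested with bitwise operators:
history_types = AttrType.OBSERVATIONS | AttrType.TIMESERIES
assert AttrType.OBSERVATIONS in history_types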
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.ObservationsHistoryParams","title":"ObservationsHistoryParams","text":"

Bases: BaseModel

History parameters field of observations attribute

"},{"location":"reference/common/attrspec/#dp3.common.attrspec.TimeseriesTSParams","title":"TimeseriesTSParams","text":"

Bases: BaseModel

Timeseries parameters field of timeseries attribute

"},{"location":"reference/common/attrspec/#dp3.common.attrspec.TimeseriesSeries","title":"TimeseriesSeries","text":"

Bases: BaseModel

Series of timeseries attribute

"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecGeneric","title":"AttrSpecGeneric","text":"

Bases: BaseModel

Base of attribute specification

Parent of other AttrSpec classes.

"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecClassic","title":"AttrSpecClassic","text":"

Bases: AttrSpecGeneric

Parent of non-timeseries AttrSpec classes.

"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecClassic.is_relation","title":"is_relation property","text":"
is_relation: bool\n

Returns whether specified attribute is a link.

"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecClassic.relation_to","title":"relation_to property","text":"
relation_to: str\n

Returns linked entity id. Raises ValueError if attribute is not a link.

"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecPlain","title":"AttrSpecPlain","text":"
AttrSpecPlain(**data)\n

Bases: AttrSpecClassic

Plain attribute specification

Source code in dp3/common/attrspec.py
def __init__(self, **data):\nsuper().__init__(**data)\nself._dp_model = create_model(\nf\"DataPointPlain_{self.id}\",\n__base__=DataPointPlainBase,\nv=(self.data_type.data_type, ...),\n)\n
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecObservations","title":"AttrSpecObservations","text":"
AttrSpecObservations(**data)\n

Bases: AttrSpecClassic

Observations attribute specification

Source code in dp3/common/attrspec.py
def __init__(self, **data):\nsuper().__init__(**data)\nvalue_validator = self.data_type.data_type\nself._dp_model = create_model(\nf\"DataPointObservations_{self.id}\",\n__base__=DataPointObservationsBase,\nv=(value_validator, ...),\n)\n
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpecTimeseries","title":"AttrSpecTimeseries","text":"
AttrSpecTimeseries(**data)\n

Bases: AttrSpecGeneric

Timeseries attribute specification

Source code in dp3/common/attrspec.py
def __init__(self, **data):\nsuper().__init__(**data)\n# Typing of `v` field\ndp_value_typing = {}\nfor s in self.series:\ndata_type = self.series[s].data_type.data_type\ndp_value_typing[s] = ((list[data_type]), ...)\n# Validators\ndp_validators = {\n\"v_validator\": dp_ts_v_validator,\n}\n# Add root validator\nif self.timeseries_type == \"regular\":\ndp_validators[\"root_validator\"] = dp_ts_root_validator_regular_wrapper(\nself.timeseries_params.time_step\n)\nelif self.timeseries_type == \"irregular\":\ndp_validators[\"root_validator\"] = dp_ts_root_validator_irregular\nelif self.timeseries_type == \"irregular_intervals\":\ndp_validators[\"root_validator\"] = dp_ts_root_validator_irregular_intervals\nself._dp_model = create_model(\nf\"DataPointTimeseries_{self.id}\",\n__base__=DataPointTimeseriesBase,\n__validators__=dp_validators,\nv=(create_model(f\"DataPointTimeseriesValue_{self.id}\", **dp_value_typing), ...),\n)\n
"},{"location":"reference/common/attrspec/#dp3.common.attrspec.AttrSpec","title":"AttrSpec","text":"
AttrSpec(id: str, spec: dict[str, Any]) -> AttrSpecType\n

Factory for AttrSpec classes

Source code in dp3/common/attrspec.py
def AttrSpec(id: str, spec: dict[str, Any]) -> AttrSpecType:\n\"\"\"Factory for `AttrSpec` classes\"\"\"\nattr_type = AttrType.from_str(spec.get(\"type\"))\nsubclasses = {\nAttrType.PLAIN: AttrSpecPlain,\nAttrType.OBSERVATIONS: AttrSpecObservations,\nAttrType.TIMESERIES: AttrSpecTimeseries,\n}\nreturn subclasses[attr_type](id=id, **spec)\n
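A construction sketch; the "type" and "data_type" keys follow the models above, while any other required keys (such as a human-readable name) depend on the full AttrSpec models, which are only partially shown here:

from dp3.common.attrspec import AttrSpec

# Hypothetical plain string attribute specification; "name" is an assumed key.
note_spec = AttrSpec("note", {"type": "plain", "name": "Note", "data_type": "string"})
# type(note_spec) is AttrSpecPlain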
"},{"location":"reference/common/base_attrs/","title":"base_attrs","text":""},{"location":"reference/common/base_attrs/#dp3.common.base_attrs","title":"dp3.common.base_attrs","text":""},{"location":"reference/common/base_module/","title":"base_module","text":""},{"location":"reference/common/base_module/#dp3.common.base_module","title":"dp3.common.base_module","text":""},{"location":"reference/common/base_module/#dp3.common.base_module.BaseModule","title":"BaseModule","text":"
BaseModule(platform_config: PlatformConfig, module_config: dict, registrar: CallbackRegistrar)\n

Bases: ABC

Abstract class for platform modules. Every module must inherit this abstract class to be loaded automatically!

Initialize the module and register callbacks.

Parameters:

  • platform_config (PlatformConfig): Platform configuration class (required)
  • module_config (dict): Configuration of the module, equivalent of platform_config.config.get(\"modules.<module_name>\") (required)
  • registrar (CallbackRegistrar): A callback / hook registration interface (required)

Source code in dp3/common/base_module.py
@abstractmethod\ndef __init__(\nself, platform_config: PlatformConfig, module_config: dict, registrar: CallbackRegistrar\n):\n\"\"\"Initialize the module and register callbacks.\n    Args:\n        platform_config: Platform configuration class\n        module_config: Configuration of the module,\n            equivalent of `platform_config.config.get(\"modules.<module_name>\")`\n        registrar: A callback / hook registration interface\n    \"\"\"\n
"},{"location":"reference/common/base_module/#dp3.common.base_module.BaseModule.start","title":"start","text":"
start() -> None\n

Run the module - used to run own thread if needed.

Called after initialization, may be used to create and run a separate thread if needed by the module. Do nothing unless overridden.

Source code in dp3/common/base_module.py
def start(self) -> None:\n\"\"\"\n    Run the module - used to run own thread if needed.\n    Called after initialization, may be used to create and run a separate\n    thread if needed by the module. Do nothing unless overridden.\n    \"\"\"\nreturn None\n
"},{"location":"reference/common/base_module/#dp3.common.base_module.BaseModule.stop","title":"stop","text":"
stop() -> None\n

Stop the module - used to stop own thread.

Called before program exit, may be used to finalize and stop the separate thread if it is used. Do nothing unless overridden.

Source code in dp3/common/base_module.py
def stop(self) -> None:\n\"\"\"\n    Stop the module - used to stop own thread.\n    Called before program exit, may be used to finalize and stop the\n    separate thread if it is used. Do nothing unless overridden.\n    \"\"\"\nreturn None\n
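A minimal module sketch under the interface above (the module option and the periodic job are hypothetical):

from dp3.common.base_module import BaseModule
from dp3.common.callback_registrar import CallbackRegistrar
from dp3.common.config import PlatformConfig


class MyModule(BaseModule):
    def __init__(
        self, platform_config: PlatformConfig, module_config: dict, registrar: CallbackRegistrar
    ):
        # Hypothetical module option.
        self.interval = module_config.get("interval", 60)
        # Register a periodic callback via the scheduler (see CallbackRegistrar below).
        registrar.scheduler_register(self.periodic_job, minute="*")

    def periodic_job(self):
        ...  # hypothetical periodic work

    # start()/stop() only need to be overridden if the module runs its own thread.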
"},{"location":"reference/common/callback_registrar/","title":"callback_registrar","text":""},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar","title":"dp3.common.callback_registrar","text":""},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar","title":"CallbackRegistrar","text":"
CallbackRegistrar(scheduler: Scheduler, task_executor: TaskExecutor, snap_shooter: SnapShooter)\n

Interface for callback registration.

Source code in dp3/common/callback_registrar.py
def __init__(\nself, scheduler: Scheduler, task_executor: TaskExecutor, snap_shooter: SnapShooter\n):\nself._scheduler = scheduler\nself._task_executor = task_executor\nself._snap_shooter = snap_shooter\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.scheduler_register","title":"scheduler_register","text":"
scheduler_register(func: Callable, *, func_args: Union[list, tuple] = None, func_kwargs: dict = None, year: Union[int, str] = None, month: Union[int, str] = None, day: Union[int, str] = None, week: Union[int, str] = None, day_of_week: Union[int, str] = None, hour: Union[int, str] = None, minute: Union[int, str] = None, second: Union[int, str] = None, timezone: str = 'UTC') -> int\n

Register a function to be run at specified times.

Pass a cron-like specification of when the function should be called; see the docs of apscheduler.triggers.cron for details.

Parameters:

  • func (Callable): function or method to be called (required)
  • func_args (Union[list, tuple]): list of positional arguments to call func with (default: None)
  • func_kwargs (dict): dict of keyword arguments to call func with (default: None)
  • year (Union[int, str]): 4-digit year (default: None)
  • month (Union[int, str]): month (1-12) (default: None)
  • day (Union[int, str]): day of month (1-31) (default: None)
  • week (Union[int, str]): ISO week (1-53) (default: None)
  • day_of_week (Union[int, str]): number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun) (default: None)
  • hour (Union[int, str]): hour (0-23) (default: None)
  • minute (Union[int, str]): minute (0-59) (default: None)
  • second (Union[int, str]): second (0-59) (default: None)
  • timezone (str): Timezone for time specification (default: 'UTC')

Returns:

  • int: job ID

Source code in dp3/common/callback_registrar.py
def scheduler_register(\nself,\nfunc: Callable,\n*,\nfunc_args: Union[list, tuple] = None,\nfunc_kwargs: dict = None,\nyear: Union[int, str] = None,\nmonth: Union[int, str] = None,\nday: Union[int, str] = None,\nweek: Union[int, str] = None,\nday_of_week: Union[int, str] = None,\nhour: Union[int, str] = None,\nminute: Union[int, str] = None,\nsecond: Union[int, str] = None,\ntimezone: str = \"UTC\",\n) -> int:\n\"\"\"\n    Register a function to be run at specified times.\n    Pass cron-like specification of when the function should be called,\n    see [docs](https://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html)\n    of apscheduler.triggers.cron for details.\n    `\n    Args:\n        func: function or method to be called\n        func_args: list of positional arguments to call func with\n        func_kwargs: dict of keyword arguments to call func with\n        year: 4-digit year\n        month: month (1-12)\n        day: day of month (1-31)\n        week: ISO week (1-53)\n        day_of_week: number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)\n        hour: hour (0-23)\n        minute: minute (0-59)\n        second: second (0-59)\n        timezone: Timezone for time specification (default is UTC).\n    Returns:\n         job ID\n    \"\"\"\nreturn self._scheduler.register(\nfunc,\nfunc_args=func_args,\nfunc_kwargs=func_kwargs,\nyear=year,\nmonth=month,\nday=day,\nweek=week,\nday_of_week=day_of_week,\nhour=hour,\nminute=minute,\nsecond=second,\ntimezone=timezone,\n)\n
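For example, a sketch registering a callable to run every 10 minutes (the callable itself is hypothetical; registrar is the CallbackRegistrar instance passed to the module):

def recompute_stats():
    ...  # hypothetical periodic job

job_id = registrar.scheduler_register(recompute_stats, minute="*/10")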
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_task_hook","title":"register_task_hook","text":"
register_task_hook(hook_type: str, hook: Callable)\n

Registers one of available task hooks

See: TaskGenericHooksContainer in task_hooks.py

Source code in dp3/common/callback_registrar.py
def register_task_hook(self, hook_type: str, hook: Callable):\n\"\"\"Registers one of available task hooks\n    See: [`TaskGenericHooksContainer`][dp3.task_processing.task_hooks.TaskGenericHooksContainer]\n    in `task_hooks.py`\n    \"\"\"\nself._task_executor.register_task_hook(hook_type, hook)\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_entity_hook","title":"register_entity_hook","text":"
register_entity_hook(hook_type: str, hook: Callable, entity: str)\n

Registers one of available task entity hooks

See: TaskEntityHooksContainer in task_hooks.py

Source code in dp3/common/callback_registrar.py
def register_entity_hook(self, hook_type: str, hook: Callable, entity: str):\n\"\"\"Registers one of available task entity hooks\n    See: [`TaskEntityHooksContainer`][dp3.task_processing.task_hooks.TaskEntityHooksContainer]\n    in `task_hooks.py`\n    \"\"\"\nself._task_executor.register_entity_hook(hook_type, hook, entity)\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_attr_hook","title":"register_attr_hook","text":"
register_attr_hook(hook_type: str, hook: Callable, entity: str, attr: str)\n

Registers one of available task attribute hooks

See: TaskAttrHooksContainer in task_hooks.py

Source code in dp3/common/callback_registrar.py
def register_attr_hook(self, hook_type: str, hook: Callable, entity: str, attr: str):\n\"\"\"Registers one of available task attribute hooks\n    See: [`TaskAttrHooksContainer`][dp3.task_processing.task_hooks.TaskAttrHooksContainer]\n    in `task_hooks.py`\n    \"\"\"\nself._task_executor.register_attr_hook(hook_type, hook, entity, attr)\n
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_timeseries_hook","title":"register_timeseries_hook","text":"
register_timeseries_hook(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)\n

Registers passed timeseries hook to be called during snapshot creation.

Binds hook to specified entity_type and attr_type (though same hook can be bound multiple times).

Parameters:

  • hook (Callable[[str, str, list[dict]], list[DataPointTask]]): hook callable should expect entity_type, attr_type and attribute history as arguments and return a list of DataPointTask objects (required)
  • entity_type (str): specifies entity type (required)
  • attr_type (str): specifies attribute type (required)

Raises:

  • ValueError: If entity_type and attr_type do not specify a valid timeseries attribute, a ValueError is raised.

Source code in dp3/common/callback_registrar.py
def register_timeseries_hook(\nself,\nhook: Callable[[str, str, list[dict]], list[DataPointTask]],\nentity_type: str,\nattr_type: str,\n):\n\"\"\"\n    Registers passed timeseries hook to be called during snapshot creation.\n    Binds hook to specified `entity_type` and `attr_type` (though same hook can be bound\n    multiple times).\n    Args:\n        hook: `hook` callable should expect entity_type, attr_type and attribute\n            history as arguments and return a list of `DataPointTask` objects.\n        entity_type: specifies entity type\n        attr_type: specifies attribute type\n    Raises:\n        ValueError: If entity_type and attr_type do not specify a valid timeseries attribute,\n            a ValueError is raised.\n    \"\"\"\nself._snap_shooter.register_timeseries_hook(hook, entity_type, attr_type)\n
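A sketch of a conforming hook; the entity and attribute names are hypothetical and must refer to an existing timeseries attribute, otherwise ValueError is raised (registrar is the CallbackRegistrar passed to the module):

from dp3.common.task import DataPointTask  # assumed module of DataPointTask

def ts_hook(entity_type: str, attr_type: str, history: list[dict]) -> list[DataPointTask]:
    # Inspect the attribute history and optionally emit follow-up tasks.
    return []

registrar.register_timeseries_hook(ts_hook, "host", "traffic")  # hypothetical names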
"},{"location":"reference/common/callback_registrar/#dp3.common.callback_registrar.CallbackRegistrar.register_correlation_hook","title":"register_correlation_hook","text":"
register_correlation_hook(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]])\n

Registers passed hook to be called during snapshot creation.

Binds hook to specified entity_type (though same hook can be bound multiple times).

entity_type and attribute specifications are validated, ValueError is raised on failure.

Parameters:

  • hook (Callable[[str, dict], None]): hook callable should expect the entity type as str and its current values, including linked entities, as dict (required)
  • entity_type (str): specifies entity type (required)
  • depends_on (list[list[str]]): each item should specify an attribute that is depended on, in the form of a path from the specified entity_type to individual attributes (even on linked entities) (required)
  • may_change (list[list[str]]): each item should specify an attribute that the hook may change; the specification format is identical to depends_on (required)

Raises:

  • ValueError: On failure of specification validation.

Source code in dp3/common/callback_registrar.py
def register_correlation_hook(\nself,\nhook: Callable[[str, dict], None],\nentity_type: str,\ndepends_on: list[list[str]],\nmay_change: list[list[str]],\n):\n\"\"\"\n    Registers passed hook to be called during snapshot creation.\n    Binds hook to specified entity_type (though same hook can be bound multiple times).\n    `entity_type` and attribute specifications are validated, `ValueError` is raised on failure.\n    Args:\n        hook: `hook` callable should expect entity type as str\n            and its current values, including linked entities, as dict\n        entity_type: specifies entity type\n        depends_on: each item should specify an attribute that is depended on\n            in the form of a path from the specified entity_type to individual attributes\n            (even on linked entities).\n        may_change: each item should specify an attribute that `hook` may change.\n            specification format is identical to `depends_on`.\n    Raises:\n        ValueError: On failure of specification validation.\n    \"\"\"\nself._snap_shooter.register_correlation_hook(hook, entity_type, depends_on, may_change)\n
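A sketch of a correlation hook; all entity and attribute names are hypothetical, and writing into the passed dict is only one possible way a hook may apply its changes (registrar is the CallbackRegistrar passed to the module):

def compute_risk(entity_type: str, values: dict):
    # Reads "num_services", may change "risk" (both hypothetical attributes).
    values["risk"] = 1.0 if values.get("num_services", 0) > 10 else 0.0

registrar.register_correlation_hook(
    compute_risk,
    entity_type="host",             # hypothetical entity type
    depends_on=[["num_services"]],
    may_change=[["risk"]],
)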
"},{"location":"reference/common/config/","title":"config","text":""},{"location":"reference/common/config/#dp3.common.config","title":"dp3.common.config","text":"

Platform config file reader and config model.

"},{"location":"reference/common/config/#dp3.common.config.HierarchicalDict","title":"HierarchicalDict","text":"

Bases: dict

Extension of built-in dict that simplifies working with a nested hierarchy of dicts.

"},{"location":"reference/common/config/#dp3.common.config.HierarchicalDict.get","title":"get","text":"
get(key, default = NoDefault)\n

Key may be a path (in dot notation) into a hierarchy of dicts. For example dictionary.get('abc.x.y') is equivalent to dictionary['abc']['x']['y'].

:returns: self[key] or default if key is not found.

Source code in dp3/common/config.py
def get(self, key, default=NoDefault):\n\"\"\"\n    Key may be a path (in dot notation) into a hierarchy of dicts. For example\n      `dictionary.get('abc.x.y')`\n    is equivalent to\n      `dictionary['abc']['x']['y']`.\n    :returns: `self[key]` or `default` if key is not found.\n    \"\"\"\nd = self\ntry:\nwhile \".\" in key:\nfirst_key, key = key.split(\".\", 1)\nd = d[first_key]\nreturn d[key]\nexcept (KeyError, TypeError):\npass  # not found - continue below\nif default is NoDefault:\nraise MissingConfigError(\"Mandatory configuration element is missing: \" + key)\nelse:\nreturn default\n
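For example:

from dp3.common.config import HierarchicalDict

cfg = HierarchicalDict({"processing_core": {"worker_processes": 2}})

cfg.get("processing_core.worker_processes")        # -> 2
cfg.get("processing_core.missing_key", default=1)  # -> 1
cfg.get("processing_core.missing_key")             # raises MissingConfigError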
"},{"location":"reference/common/config/#dp3.common.config.HierarchicalDict.update","title":"update","text":"
update(other, **kwargs)\n

Update HierarchicalDict with other dictionary and merge common keys.

If there is a key in both current and the other dictionary and values of both keys are dictionaries, they are merged together.

Example:

HierarchicalDict({'a': {'b': 1, 'c': 2}}).update({'a': {'b': 10, 'd': 3}})\n->\nHierarchicalDict({'a': {'b': 10, 'c': 2, 'd': 3}})\n
Changes the dictionary directly, returns None.

Source code in dp3/common/config.py
def update(self, other, **kwargs):\n\"\"\"\n    Update `HierarchicalDict` with other dictionary and merge common keys.\n    If there is a key in both current and the other dictionary and values of\n    both keys are dictionaries, they are merged together.\n    Example:\n    ```\n    HierarchicalDict({'a': {'b': 1, 'c': 2}}).update({'a': {'b': 10, 'd': 3}})\n    ->\n    HierarchicalDict({'a': {'b': 10, 'c': 2, 'd': 3}})\n    ```\n    Changes the dictionary directly, returns `None`.\n    \"\"\"\nother = dict(other)\nfor key in other:\nif key in self:\nif isinstance(self[key], dict) and isinstance(other[key], dict):\n# The key is present in both dicts and both key values are dicts -> merge them\nHierarchicalDict.update(self[key], other[key])\nelse:\n# One of the key values is not a dict -> overwrite the value\n# in self by the one from other (like normal \"update\" does)\nself[key] = other[key]\nelse:\n# key is not present in self -> set it to value from other\nself[key] = other[key]\n
"},{"location":"reference/common/config/#dp3.common.config.EntitySpecDict","title":"EntitySpecDict","text":"

Bases: BaseModel

Class representing full specification of an entity.

Attributes:

  • entity (EntitySpec): Specification and settings of entity itself.
  • attribs (dict[str, AttrSpecType]): A mapping of attribute id -> AttrSpec

"},{"location":"reference/common/config/#dp3.common.config.ModelSpec","title":"ModelSpec","text":"
ModelSpec(config: HierarchicalDict)\n

Bases: BaseModel

Class representing the platform's current entity and attribute specification.

Attributes:

  • config (dict[str, EntitySpecDict]): Legacy config format, exactly mirrors the config files.
  • entities (dict[str, EntitySpec]): Mapping of entity id -> EntitySpec
  • attributes (dict[tuple[str, str], AttrSpecType]): Mapping of (entity id, attribute id) -> AttrSpec
  • entity_attributes (dict[str, dict[str, AttrSpecType]]): Mapping of entity id -> attribute id -> AttrSpec
  • relations (dict[tuple[str, str], AttrSpecType]): Mapping of (entity id, attribute id) -> AttrSpec; only contains attributes which are relations.

Provided configuration must be a dict of following structure:

{\n    <entity type>: {\n        'entity': {\n            entity specification\n        },\n        'attribs': {\n            <attr id>: {\n                attribute specification\n            },\n            other attributes\n        }\n    },\n    other entity types\n}\n

Raises:

  • ValueError: if the specification is invalid.

Source code in dp3/common/config.py
def __init__(self, config: HierarchicalDict):\n\"\"\"\n    Provided configuration must be a dict of following structure:\n    ```\n    {\n        <entity type>: {\n            'entity': {\n                entity specification\n            },\n            'attribs': {\n                <attr id>: {\n                    attribute specification\n                },\n                other attributes\n            }\n        },\n        other entity types\n    }\n    ```\n    Raises:\n        ValueError: if the specification is invalid.\n    \"\"\"\nsuper().__init__(\nconfig=config, entities={}, attributes={}, entity_attributes={}, relations={}\n)\n
"},{"location":"reference/common/config/#dp3.common.config.PlatformConfig","title":"PlatformConfig","text":"

Bases: BaseModel

An aggregation of configuration available to modules.

Attributes:

  • app_name (str): Name of the application, used when naming various structures of the platform
  • config_base_path (str): Path to directory containing platform config
  • config (HierarchicalDict): A dictionary that contains the platform config
  • model_spec (ModelSpec): Specification of the platform's model (entities and attributes)
  • num_processes (PositiveInt): Number of worker processes
  • process_index (NonNegativeInt): Index of current process

"},{"location":"reference/common/config/#dp3.common.config.read_config","title":"read_config","text":"
read_config(filepath: str) -> HierarchicalDict\n

Read configuration file and return config as a dict-like object.

The configuration file should contain valid YAML. Comments may be included as lines starting with # (optionally preceded by whitespace).

This function reads the file and converts it to a HierarchicalDict. The only difference from the built-in dict is its get method, which allows hierarchical keys (e.g. abc.x.y). See the doc of the get method for more information.

Source code in dp3/common/config.py
def read_config(filepath: str) -> HierarchicalDict:\n\"\"\"\n    Read configuration file and return config as a dict-like object.\n    The configuration file should contain a valid YAML\n    - Comments may be included as lines starting with `#` (optionally preceded\n      by whitespaces).\n    This function reads the file and converts it to a `HierarchicalDict`.\n    The only difference from built-in `dict` is its `get` method, which allows\n    hierarchical keys (e.g. `abc.x.y`).\n    See [doc of get method][dp3.common.config.HierarchicalDict.get] for more information.\n    \"\"\"\nwith open(filepath) as file_content:\nreturn HierarchicalDict(yaml.safe_load(file_content))\n
"},{"location":"reference/common/config/#dp3.common.config.read_config_dir","title":"read_config_dir","text":"
read_config_dir(dir_path: str, recursive: bool = False) -> HierarchicalDict\n

Same as read_config, but it loads a whole configuration directory of YAML files; only files ending with \".yml\" are loaded. Each loaded configuration is located under a key named after the configuration filename.

Parameters:

  • dir_path (str): Path to read config from (required)
  • recursive (bool): If recursive is set, then the configuration directory will be read recursively (including configuration files inside directories) (default: False)

Source code in dp3/common/config.py
def read_config_dir(dir_path: str, recursive: bool = False) -> HierarchicalDict:\n\"\"\"\n    Same as [read_config][dp3.common.config.read_config],\n    but it loads whole configuration directory of YAML files,\n    so only files ending with \".yml\" are loaded.\n    Each loaded configuration is located under key named after configuration filename.\n    Args:\n        dir_path: Path to read config from.\n        recursive: If `recursive` is set, then the configuration directory will be read\n            recursively (including configuration files inside directories).\n    \"\"\"\nall_files_paths = os.listdir(dir_path)\nconfig = HierarchicalDict()\nfor config_filename in all_files_paths:\nconfig_full_path = os.path.join(dir_path, config_filename)\nif os.path.isdir(config_full_path) and recursive:\nloaded_config = read_config_dir(config_full_path, recursive)\nelif os.path.isfile(config_full_path) and config_filename.endswith(\".yml\"):\ntry:\nloaded_config = read_config(config_full_path)\nexcept TypeError:\n# configuration file is empty\ncontinue\n# remove '.yml' suffix of filename\nconfig_filename = config_filename[:-4]\nelse:\ncontinue\n# place configuration files into another dictionary level named by config dictionary name\nconfig[config_filename] = loaded_config\nreturn config\n
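A usage sketch mirroring how the worker loads its configuration (the directory path is hypothetical):

from dp3.common.config import read_config_dir

config = read_config_dir("/etc/my_dp3_app/config", recursive=True)

# Each YAML file is available under a key named after the file (without ".yml"):
db_entities = config.get("db_entities")
num_workers = config.get("processing_core.worker_processes")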
"},{"location":"reference/common/control/","title":"control","text":""},{"location":"reference/common/control/#dp3.common.control","title":"dp3.common.control","text":"

Module enabling remote control of the platform's internal events.

"},{"location":"reference/common/control/#dp3.common.control.Control","title":"Control","text":"
Control(platform_config: PlatformConfig) -> None\n

Class enabling remote control of the platform's internal events.

Source code in dp3/common/control.py
def __init__(\nself,\nplatform_config: PlatformConfig,\n) -> None:\nself.log = logging.getLogger(\"Control\")\nself.action_handlers: dict[ControlAction, Callable] = {}\nself.enabled = False\nif platform_config.process_index != 0:\nself.log.debug(\"Control will be disabled in this worker to avoid race conditions.\")\nreturn\nself.enabled = True\nself.config = ControlConfig.parse_obj(platform_config.config.get(\"control\"))\nself.allowed_actions = set(self.config.allowed_actions)\nself.log.debug(\"Allowed actions: %s\", self.allowed_actions)\nqueue = f\"{platform_config.app_name}-control\"\nself.control_queue = TaskQueueReader(\ncallback=self.process_control_task,\nparse_task=ControlMessage.parse_raw,\napp_name=platform_config.app_name,\nworker_index=platform_config.process_index,\nrabbit_config=platform_config.config.get(\"processing_core.msg_broker\", {}),\nqueue=queue,\npriority_queue=queue,\nparent_logger=self.log,\n)\n
"},{"location":"reference/common/control/#dp3.common.control.Control.start","title":"start","text":"
start()\n

Connect to RabbitMQ and start consuming from TaskQueue.

Source code in dp3/common/control.py
def start(self):\n\"\"\"Connect to RabbitMQ and start consuming from TaskQueue.\"\"\"\nif not self.enabled:\nreturn\nunconfigured_handlers = self.allowed_actions - set(self.action_handlers)\nif unconfigured_handlers:\nraise ValueError(\nf\"The following configured actions are missing handlers: {unconfigured_handlers}\"\n)\nself.log.info(\"Connecting to RabbitMQ\")\nself.control_queue.connect()\nself.control_queue.check()  # check presence of needed queues\nself.control_queue.start()\nself.log.debug(\"Configured handlers: %s\", self.action_handlers)\n
"},{"location":"reference/common/control/#dp3.common.control.Control.stop","title":"stop","text":"
stop()\n

Stop consuming from TaskQueue, disconnect from RabbitMQ.

Source code in dp3/common/control.py
def stop(self):\n\"\"\"Stop consuming from TaskQueue, disconnect from RabbitMQ.\"\"\"\nif not self.enabled:\nreturn\nself.control_queue.stop()\nself.control_queue.disconnect()\n
"},{"location":"reference/common/control/#dp3.common.control.Control.set_action_handler","title":"set_action_handler","text":"
set_action_handler(action: ControlAction, handler: Callable)\n

Sets the handler for the given action

Source code in dp3/common/control.py
def set_action_handler(self, action: ControlAction, handler: Callable):\n\"\"\"Sets the handler for the given action\"\"\"\nself.log.debug(\"Setting handler for action %s: %s\", action, handler)\nself.action_handlers[action] = handler\n
"},{"location":"reference/common/control/#dp3.common.control.Control.process_control_task","title":"process_control_task","text":"
process_control_task(msg_id, task: ControlMessage)\n

Acknowledges the received message and executes an action according to the task.

This function should not be called directly, but set as callback for TaskQueueReader.

Source code in dp3/common/control.py
def process_control_task(self, msg_id, task: ControlMessage):\n\"\"\"\n    Acknowledges the received message and executes an action according to the `task`.\n    This function should not be called directly, but set as callback for TaskQueueReader.\n    \"\"\"\nself.control_queue.ack(msg_id)\nif task.action in self.allowed_actions:\nself.log.info(\"Executing action: %s\", task.action)\nself.action_handlers[task.action]()\nelse:\nself.log.error(\"Action not allowed: %s\", task.action)\n
"},{"location":"reference/common/datapoint/","title":"datapoint","text":""},{"location":"reference/common/datapoint/#dp3.common.datapoint","title":"dp3.common.datapoint","text":""},{"location":"reference/common/datapoint/#dp3.common.datapoint.DataPointBase","title":"DataPointBase","text":"

Bases: BaseModel

Data-point

Contains a single raw data value received on the API. This is just a base class - plain, observations and timeseries datapoints inherit from this class (see below).

Provides front line of validation for this data value.

Internal usage: inside Task, created by TaskExecutor

"},{"location":"reference/common/datapoint/#dp3.common.datapoint.DataPointPlainBase","title":"DataPointPlainBase","text":"

Bases: DataPointBase

Plain attribute data-point

Contains a single raw data value received on the API for a plain attribute.

In case of plain data-point, it's not really a data-point, but we use the same naming for simplicity.

"},{"location":"reference/common/datapoint/#dp3.common.datapoint.DataPointObservationsBase","title":"DataPointObservationsBase","text":"

Bases: DataPointBase

Observations attribute data-point

Contains a single raw data value received on the API for an observations attribute.

"},{"location":"reference/common/datapoint/#dp3.common.datapoint.DataPointTimeseriesBase","title":"DataPointTimeseriesBase","text":"

Bases: DataPointBase

Timeseries attribute data-point

Contains a single raw data value received on the API for a timeseries attribute.

"},{"location":"reference/common/datapoint/#dp3.common.datapoint.is_list_ordered","title":"is_list_ordered","text":"
is_list_ordered(to_check: list)\n

Checks if list is ordered (not decreasing anywhere)

Source code in dp3/common/datapoint.py
def is_list_ordered(to_check: list):\n\"\"\"Checks if list is ordered (not decreasing anywhere)\"\"\"\nreturn all(to_check[i] <= to_check[i + 1] for i in range(len(to_check) - 1))\n
"},{"location":"reference/common/datapoint/#dp3.common.datapoint.dp_ts_root_validator_irregular","title":"dp_ts_root_validator_irregular","text":"
dp_ts_root_validator_irregular(cls, values)\n

Validates or sets t2 of irregular timeseries datapoint

Source code in dp3/common/datapoint.py
@root_validator\ndef dp_ts_root_validator_irregular(cls, values):\n\"\"\"Validates or sets t2 of irregular timeseries datapoint\"\"\"\nif \"v\" in values:\nfirst_time = values[\"v\"].time[0]\nlast_time = values[\"v\"].time[-1]\n# Check t1 <= first_time\nif \"t1\" in values:\nassert (\nvalues[\"t1\"] <= first_time\n), f\"'t1' is above first item in 'time' series ({first_time})\"\n# Check last_time <= t2\nif \"t2\" in values and values[\"t2\"]:\nassert (\nvalues[\"t2\"] >= last_time\n), f\"'t2' is below last item in 'time' series ({last_time})\"\nelse:\nvalues[\"t2\"] = last_time\n# time must be ordered\nassert is_list_ordered(values[\"v\"].time), \"'time' series is not ordered\"\nreturn values\n
"},{"location":"reference/common/datapoint/#dp3.common.datapoint.dp_ts_root_validator_irregular_intervals","title":"dp_ts_root_validator_irregular_intervals","text":"
dp_ts_root_validator_irregular_intervals(cls, values)\n

Validates or sets t2 of irregular intervals timeseries datapoint

Source code in dp3/common/datapoint.py
@root_validator\ndef dp_ts_root_validator_irregular_intervals(cls, values):\n\"\"\"Validates or sets t2 of irregular intervals timeseries datapoint\"\"\"\nif \"v\" in values:\nfirst_time = values[\"v\"].time_first[0]\nlast_time = values[\"v\"].time_last[-1]\n# Check t1 <= first_time\nif \"t1\" in values:\nassert (\nvalues[\"t1\"] <= first_time\n), f\"'t1' is above first item in 'time_first' series ({first_time})\"\n# Check last_time <= t2\nif \"t2\" in values and values[\"t2\"]:\nassert (\nvalues[\"t2\"] >= last_time\n), f\"'t2' is below last item in 'time_last' series ({last_time})\"\nelse:\nvalues[\"t2\"] = last_time\n# Check time_first[i] <= time_last[i]\nassert all(\nt[0] <= t[1] for t in zip(values[\"v\"].time_first, values[\"v\"].time_last)\n), \"'time_first[i] <= time_last[i]' isn't true for all 'i'\"\nreturn values\n
"},{"location":"reference/common/datatype/","title":"datatype","text":""},{"location":"reference/common/datatype/#dp3.common.datatype","title":"dp3.common.datatype","text":""},{"location":"reference/common/datatype/#dp3.common.datatype.DataType","title":"DataType","text":"
DataType(**data)\n

Bases: BaseModel

Data type container

Represents one of primitive data types:

  • tag
  • binary
  • string
  • int
  • int64
  • float
  • ipv4
  • ipv6
  • mac
  • time
  • special
  • json

or composite data type:

  • link
  • array
  • set
  • dict
  • category

    Attributes:

    - data_type (str): type for incoming value validation
    - hashable (bool): whether contained data is hashable
    - is_link (bool): whether this data type is a link
    - link_to (str): if is_link is True, the linked target

    Source code in dp3/common/datatype.py
    def __init__(self, **data):\nsuper().__init__(**data)\nstr_type = data[\"__root__\"]\nself._hashable = not (\n\"dict\" in str_type\nor \"set\" in str_type\nor \"array\" in str_type\nor \"special\" in str_type\nor \"json\" in str_type\nor \"link\" in str_type\n)\nself.determine_value_validator(str_type)\n
    "},{"location":"reference/common/datatype/#dp3.common.datatype.DataType.determine_value_validator","title":"determine_value_validator","text":"
    determine_value_validator(str_type: str)\n

    Determines value validator (inner data_type)

    This is not implemented inside @validator, because it apparently doesn't work with __root__ models.

    Source code in dp3/common/datatype.py
    def determine_value_validator(self, str_type: str):\n\"\"\"Determines value validator (inner `data_type`)\n    This is not implemented inside `@validator`, because it apparently doesn't work with\n    `__root__` models.\n    \"\"\"\ndata_type = None\nif type(str_type) is not str:\nraise TypeError(f\"Data type {str_type} is not string\")\nif str_type in primitive_data_types:\n# Primitive type\ndata_type = primitive_data_types[str_type]\nelif re.match(re_array, str_type):\n# Array\nelement_type = str_type.split(\"<\")[1].split(\">\")[0]\nif element_type not in primitive_data_types:\nraise TypeError(f\"Data type {element_type} is not supported as an array element\")\ndata_type = list[primitive_data_types[element_type]]\nelif re.match(re_set, str_type):\n# Set\nelement_type = str_type.split(\"<\")[1].split(\">\")[0]\nif element_type not in primitive_data_types:\nraise TypeError(f\"Data type {element_type} is not supported as an set element\")\ndata_type = list[primitive_data_types[element_type]]  # set is not supported by MongoDB\nelif m := re.match(re_link, str_type):\n# Link\netype, data = m.group(\"etype\"), m.group(\"data\")\nself._link_to = etype\nself._is_link = True\nself._link_data = bool(data)\nif etype and data:\nvalue_type = DataType(__root__=data)\ndata_type = create_model(\nf\"Link<{data}>\", __base__=Link, data=(value_type._data_type, ...)\n)\nelse:\ndata_type = Link\nelif re.match(re_dict, str_type):\n# Dict\ndict_spec = {}\nkey_str = str_type.split(\"<\")[1].split(\">\")[0]\nkey_spec = dict(item.split(\":\") for item in key_str.split(\",\"))\n# For each dict key\nfor k, v in key_spec.items():\nif v not in primitive_data_types:\nraise TypeError(f\"Data type {v} of key {k} is not supported as a dict field\")\n# Optional subattribute\nk_optional = k[-1] == \"?\"\nif k_optional:\n# Remove question mark from key\nk = k[:-1]\n# Set (type, default value) for the key\ndict_spec[k] = (primitive_data_types[v], None if k_optional else ...)\n# Create model for this dict\ndata_type = create_model(f\"{str_type}__inner\", **dict_spec)\nelif m := re.match(re_category, str_type):\n# Category\ncategory_type, category_values = m.group(\"type\"), m.group(\"vals\")\ncategory_type = DataType(__root__=category_type)\ncategory_values = [\ncategory_type._data_type(value.strip()) for value in category_values.split(\",\")\n]\ndata_type = Enum(f\"Category<{category_type}>\", {val: val for val in category_values})\nelse:\nraise TypeError(f\"Data type '{str_type}' is not supported\")\n# Set data type\nself._data_type = data_type\n
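    For illustration, a minimal sketch of type strings accepted by the parsing code above. The string formats are taken directly from `determine_value_validator`; constructing a `DataType` via the `__root__` keyword mirrors the constructor shown earlier, but the example is illustrative only.

```python
from dp3.common.datatype import DataType

ip_list = DataType(__root__="array<ipv4>")                   # array of a primitive type
ports = DataType(__root__="set<int>")                        # stored as a list (MongoDB has no set type)
service = DataType(__root__="dict<name:string,port?:int>")   # 'port' is optional because of the '?'
```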
    "},{"location":"reference/common/datatype/#dp3.common.datatype.DataType.get_linked_entity","title":"get_linked_entity","text":"
    get_linked_entity() -> id\n

    Returns linked entity id. Raises ValueError if DataType is not a link.

    Source code in dp3/common/datatype.py
    def get_linked_entity(self) -> id:\n\"\"\"Returns linked entity id. Raises ValueError if DataType is not a link.\"\"\"\ntry:\nreturn self._link_to\nexcept AttributeError:\nraise ValueError(f\"DataType '{self}' is not a link.\") from None\n
    "},{"location":"reference/common/datatype/#dp3.common.datatype.DataType.link_has_data","title":"link_has_data","text":"
    link_has_data() -> bool\n

    Whether link has data. Raises ValueError if DataType is not a link.

    Source code in dp3/common/datatype.py
    def link_has_data(self) -> bool:\n\"\"\"Whether link has data. Raises ValueError if DataType is not a link.\"\"\"\ntry:\nreturn self._link_data\nexcept AttributeError:\nraise ValueError(f\"DataType '{self}' is not a link.\") from None\n
    "},{"location":"reference/common/entityspec/","title":"entityspec","text":""},{"location":"reference/common/entityspec/#dp3.common.entityspec","title":"dp3.common.entityspec","text":""},{"location":"reference/common/entityspec/#dp3.common.entityspec.EntitySpec","title":"EntitySpec","text":"
    EntitySpec(id: str, spec: dict[str, Union[str, bool]])\n

    Bases: BaseModel

    Entity specification

    This class represents the specification of an entity type (e.g. ip, asn, ...)

    Source code in dp3/common/entityspec.py
    def __init__(self, id: str, spec: dict[str, Union[str, bool]]):\nsuper().__init__(id=id, name=spec.get(\"name\"), snapshot=spec.get(\"snapshot\"))\n
    "},{"location":"reference/common/scheduler/","title":"scheduler","text":""},{"location":"reference/common/scheduler/#dp3.common.scheduler","title":"dp3.common.scheduler","text":"

    Allows modules to register functions (callables) to be run at specified times or intervals (like cron does).

    Based on the APScheduler package.

    "},{"location":"reference/common/scheduler/#dp3.common.scheduler.Scheduler","title":"Scheduler","text":"
    Scheduler() -> None\n

    Allows modules to register functions (callables) to be run at specified times or intervals (like cron does).

    Source code in dp3/common/scheduler.py
    def __init__(self) -> None:\nself.log = logging.getLogger(\"Scheduler\")\n# self.log.setLevel(\"DEBUG\")\nlogging.getLogger(\"apscheduler.scheduler\").setLevel(\"WARNING\")\nlogging.getLogger(\"apscheduler.executors.default\").setLevel(\"WARNING\")\nself.sched = BackgroundScheduler(timezone=\"UTC\")\nself.last_job_id = 0\n
    "},{"location":"reference/common/scheduler/#dp3.common.scheduler.Scheduler.register","title":"register","text":"
    register(func: Callable, func_args: Union[list, tuple] = None, func_kwargs: dict = None, year: Union[int, str] = None, month: Union[int, str] = None, day: Union[int, str] = None, week: Union[int, str] = None, day_of_week: Union[int, str] = None, hour: Union[int, str] = None, minute: Union[int, str] = None, second: Union[int, str] = None, timezone: str = 'UTC') -> int\n

    Register a function to be run at specified times.

    Pass a cron-like specification of when the function should be called; see the docs of apscheduler.triggers.cron for details.

    Parameters:

    - func (Callable): function or method to be called (required)
    - func_args (Union[list, tuple]): list of positional arguments to call func with (default: None)
    - func_kwargs (dict): dict of keyword arguments to call func with (default: None)
    - year (Union[int, str]): 4-digit year (default: None)
    - month (Union[int, str]): month (1-12) (default: None)
    - day (Union[int, str]): day of month (1-31) (default: None)
    - week (Union[int, str]): ISO week (1-53) (default: None)
    - day_of_week (Union[int, str]): number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun) (default: None)
    - hour (Union[int, str]): hour (0-23) (default: None)
    - minute (Union[int, str]): minute (0-59) (default: None)
    - second (Union[int, str]): second (0-59) (default: None)
    - timezone (str): timezone for time specification (default: 'UTC')

    Returns:

    - int: job ID

    Source code in dp3/common/scheduler.py
    def register(\nself,\nfunc: Callable,\nfunc_args: Union[list, tuple] = None,\nfunc_kwargs: dict = None,\nyear: Union[int, str] = None,\nmonth: Union[int, str] = None,\nday: Union[int, str] = None,\nweek: Union[int, str] = None,\nday_of_week: Union[int, str] = None,\nhour: Union[int, str] = None,\nminute: Union[int, str] = None,\nsecond: Union[int, str] = None,\ntimezone: str = \"UTC\",\n) -> int:\n\"\"\"\n    Register a function to be run at specified times.\n    Pass cron-like specification of when the function should be called,\n    see [docs](https://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html)\n    of apscheduler.triggers.cron for details.\n    Args:\n        func: function or method to be called\n        func_args: list of positional arguments to call func with\n        func_kwargs: dict of keyword arguments to call func with\n        year: 4-digit year\n        month: month (1-12)\n        day: day of month (1-31)\n        week: ISO week (1-53)\n        day_of_week: number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)\n        hour: hour (0-23)\n        minute: minute (0-59)\n        second: second (0-59)\n        timezone: Timezone for time specification (default is UTC).\n    Returns:\n         job ID\n    \"\"\"\nself.last_job_id += 1\ntrigger = CronTrigger(\nyear, month, day, week, day_of_week, hour, minute, second, timezone=timezone\n)\nself.sched.add_job(\nfunc,\ntrigger,\nfunc_args,\nfunc_kwargs,\ncoalesce=True,\nmax_instances=1,\nid=str(self.last_job_id),\n)\nself.log.debug(f\"Registered function {func.__qualname__} to be called at {trigger}\")\nreturn self.last_job_id\n
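    For illustration, a minimal usage sketch based on the signature above; `refresh_cache` is a made-up callable and the scheduler itself is assumed to be started elsewhere by the platform.

```python
from dp3.common.scheduler import Scheduler

def refresh_cache():
    print("refreshing cache ...")

sched = Scheduler()
# Run refresh_cache every 10 minutes (cron-like "*/10" in the minute field).
job_id = sched.register(refresh_cache, minute="*/10")

# Jobs can later be paused and resumed using the returned ID.
sched.pause_job(job_id)
sched.resume_job(job_id)
```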
    "},{"location":"reference/common/scheduler/#dp3.common.scheduler.Scheduler.pause_job","title":"pause_job","text":"
    pause_job(id)\n

    Pause job with given ID

    Source code in dp3/common/scheduler.py
    def pause_job(self, id):\n\"\"\"Pause job with given ID\"\"\"\nself.sched.pause_job(str(id))\n
    "},{"location":"reference/common/scheduler/#dp3.common.scheduler.Scheduler.resume_job","title":"resume_job","text":"
    resume_job(id)\n

    Resume previously paused job with given ID

    Source code in dp3/common/scheduler.py
    def resume_job(self, id):\n\"\"\"Resume previously paused job with given ID\"\"\"\nself.sched.resume_job(str(id))\n
    "},{"location":"reference/common/task/","title":"task","text":""},{"location":"reference/common/task/#dp3.common.task","title":"dp3.common.task","text":""},{"location":"reference/common/task/#dp3.common.task.Task","title":"Task","text":"

    Bases: BaseModel, ABC

    A generic task type class.

    An abstraction for the task queue classes to depend upon.

    "},{"location":"reference/common/task/#dp3.common.task.Task.routing_key","title":"routing_key abstractmethod","text":"
    routing_key() -> str\n

    Returns:

    - str: A string to be used as a routing key between workers.

    Source code in dp3/common/task.py
    @abstractmethod\ndef routing_key(self) -> str:\n\"\"\"\n    Returns:\n        A string to be used as a routing key between workers.\n    \"\"\"\n
    "},{"location":"reference/common/task/#dp3.common.task.Task.as_message","title":"as_message abstractmethod","text":"
    as_message() -> str\n

    Returns:

    - str: A string representation of the object.

    Source code in dp3/common/task.py
    @abstractmethod\ndef as_message(self) -> str:\n\"\"\"\n    Returns:\n        A string representation of the object.\n    \"\"\"\n
    "},{"location":"reference/common/task/#dp3.common.task.DataPointTask","title":"DataPointTask","text":"

    Bases: Task

    DataPointTask

    Contains single task to be pushed to TaskQueue and processed.

    Attributes:

    - etype (str): Entity type
    - eid (str): Entity id / key
    - data_points (list[DataPointBase]): List of DataPoints to process
    - tags (list[Any]): List of tags
    - ttl_token (Optional[datetime]): ...

    "},{"location":"reference/common/task/#dp3.common.task.Snapshot","title":"Snapshot","text":"

    Bases: Task

    Snapshot

    Contains a list of entities whose meaning depends on the type. If type is "task", the list contains linked entities for which a snapshot should be created. Otherwise the type is "linked_entities", indicating which entities must be skipped in a parallelized creation of unlinked entities.

    Attributes:

    - entities (list[tuple[str, str]]): List of (entity_type, entity_id)
    - time (datetime): timestamp for snapshot creation

    "},{"location":"reference/common/utils/","title":"utils","text":""},{"location":"reference/common/utils/#dp3.common.utils","title":"dp3.common.utils","text":"

    Auxiliary/utility functions and classes.

    "},{"location":"reference/common/utils/#dp3.common.utils.parse_rfc_time","title":"parse_rfc_time","text":"
    parse_rfc_time(time_str)\n

    Parse time in RFC 3339 format and return it as naive datetime in UTC.

    Timezone specification is optional (UTC is assumed when none is specified).

    Source code in dp3/common/utils.py
    def parse_rfc_time(time_str):\n\"\"\"\n    Parse time in RFC 3339 format and return it as naive datetime in UTC.\n    Timezone specification is optional (UTC is assumed when none is specified).\n    \"\"\"\nres = timestamp_re.match(time_str)\nif res is not None:\nyear, month, day, hour, minute, second = (int(n or 0) for n in res.group(*range(1, 7)))\nus_str = (res.group(7) or \"0\")[:6].ljust(6, \"0\")\nus = int(us_str)\nzonestr = res.group(8)\nzoneoffset = 0 if zonestr in (None, \"z\", \"Z\") else int(zonestr[:3]) * 60 + int(zonestr[4:6])\nzonediff = datetime.timedelta(minutes=zoneoffset)\nreturn datetime.datetime(year, month, day, hour, minute, second, us) - zonediff\nelse:\nraise ValueError(\"Wrong timestamp format\")\n
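    For example (the results follow directly from the code above):

```python
from dp3.common.utils import parse_rfc_time

# A timezone offset is converted away; the result is always a naive UTC datetime.
parse_rfc_time("2022-08-01T12:00:00+02:00")  # -> datetime(2022, 8, 1, 10, 0)
parse_rfc_time("2022-08-01T12:00:00Z")       # -> datetime(2022, 8, 1, 12, 0)
```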
    "},{"location":"reference/common/utils/#dp3.common.utils.parse_time_duration","title":"parse_time_duration","text":"
    parse_time_duration(duration_string: Union[str, int, datetime.timedelta]) -> datetime.timedelta\n

    Parse duration in the format `<num><s/m/h/d>` (or just "0").

    Returns datetime.timedelta.

    Source code in dp3/common/utils.py
    def parse_time_duration(duration_string: Union[str, int, datetime.timedelta]) -> datetime.timedelta:\n\"\"\"\n    Parse duration in format <num><s/m/h/d> (or just \"0\").\n    Return datetime.timedelta\n    \"\"\"\n# if it's already timedelta, just return it unchanged\nif isinstance(duration_string, datetime.timedelta):\nreturn duration_string\n# if number is passed, consider it number of seconds\nif isinstance(duration_string, (int, float)):\nreturn datetime.timedelta(seconds=duration_string)\nd = 0\nh = 0\nm = 0\ns = 0\nif duration_string == \"0\":\npass\nelif duration_string[-1] == \"d\":\nd = int(duration_string[:-1])\nelif duration_string[-1] == \"h\":\nh = int(duration_string[:-1])\nelif duration_string[-1] == \"m\":\nm = int(duration_string[:-1])\nelif duration_string[-1] == \"s\":\ns = int(duration_string[:-1])\nelse:\nraise ValueError(\"Invalid time duration string\")\nreturn datetime.timedelta(days=d, hours=h, minutes=m, seconds=s)\n
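    A few examples matching the parsing code above:

```python
from dp3.common.utils import parse_time_duration

parse_time_duration("30s")  # -> timedelta(seconds=30)
parse_time_duration("2h")   # -> timedelta(hours=2)
parse_time_duration("7d")   # -> timedelta(days=7)
parse_time_duration(90)     # a plain number is taken as seconds -> timedelta(seconds=90)
```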
    "},{"location":"reference/common/utils/#dp3.common.utils.conv_to_json","title":"conv_to_json","text":"
    conv_to_json(obj)\n

    Convert special types to JSON (use as \"default\" param of json.dumps)

    Supported types/objects:

    - datetime
    - timedelta

    Source code in dp3/common/utils.py
    def conv_to_json(obj):\n\"\"\"Convert special types to JSON (use as \"default\" param of json.dumps)\n    Supported types/objects:\n    - datetime\n    - timedelta\n    \"\"\"\nif isinstance(obj, datetime.datetime):\nif obj.tzinfo:\nraise NotImplementedError(\n\"Can't serialize timezone-aware datetime object \"\n\"(DP3 policy is to use naive datetimes in UTC everywhere)\"\n)\nreturn {\"$datetime\": obj.strftime(\"%Y-%m-%dT%H:%M:%S.%f\")}\nif isinstance(obj, datetime.timedelta):\nreturn {\"$timedelta\": f\"{obj.days},{obj.seconds},{obj.microseconds}\"}\nraise TypeError(\"%r is not JSON serializable\" % obj)\n
    "},{"location":"reference/common/utils/#dp3.common.utils.conv_from_json","title":"conv_from_json","text":"
    conv_from_json(dct)\n

    Convert special JSON keys created by conv_to_json back to Python objects (use as \"object_hook\" param of json.loads)

    Supported types/objects:

    - datetime
    - timedelta

    Source code in dp3/common/utils.py
    def conv_from_json(dct):\n\"\"\"Convert special JSON keys created by conv_to_json back to Python objects\n    (use as \"object_hook\" param of json.loads)\n    Supported types/objects:\n    - datetime\n    - timedelta\n    \"\"\"\nif \"$datetime\" in dct:\nval = dct[\"$datetime\"]\nreturn datetime.datetime.strptime(val, \"%Y-%m-%dT%H:%M:%S.%f\")\nif \"$timedelta\" in dct:\ndays, seconds, microseconds = dct[\"$timedelta\"].split(\",\")\nreturn datetime.timedelta(int(days), int(seconds), int(microseconds))\nreturn dct\n
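    The two helpers are designed as a round-trip pair; a small sketch of their intended use with `json.dumps`/`json.loads`:

```python
import datetime
import json

from dp3.common.utils import conv_from_json, conv_to_json

payload = {
    "created": datetime.datetime(2022, 8, 1, 12, 0),  # naive UTC, as DP3 policy requires
    "validity": datetime.timedelta(minutes=5),
}
encoded = json.dumps(payload, default=conv_to_json)
decoded = json.loads(encoded, object_hook=conv_from_json)
assert decoded == payload  # datetimes and timedeltas survive the round trip
```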
    "},{"location":"reference/common/utils/#dp3.common.utils.get_func_name","title":"get_func_name","text":"
    get_func_name(func_or_method)\n

    Get name of function or method as pretty string.

    Source code in dp3/common/utils.py
    def get_func_name(func_or_method):\n\"\"\"Get name of function or method as pretty string.\"\"\"\ntry:\nfname = func_or_method.__func__.__qualname__\nexcept AttributeError:\nfname = func_or_method.__name__\nreturn func_or_method.__module__ + \".\" + fname\n
    "},{"location":"reference/database/","title":"database","text":""},{"location":"reference/database/#dp3.database","title":"dp3.database","text":"

    A wrapper responsible for communication with the database server.

    "},{"location":"reference/database/database/","title":"database","text":""},{"location":"reference/database/database/#dp3.database.database","title":"dp3.database.database","text":""},{"location":"reference/database/database/#dp3.database.database.MongoHostConfig","title":"MongoHostConfig","text":"

    Bases: BaseModel

    MongoDB host.

    "},{"location":"reference/database/database/#dp3.database.database.MongoStandaloneConfig","title":"MongoStandaloneConfig","text":"

    Bases: BaseModel

    MongoDB standalone configuration.

    "},{"location":"reference/database/database/#dp3.database.database.MongoReplicaConfig","title":"MongoReplicaConfig","text":"

    Bases: BaseModel

    MongoDB replica set configuration.

    "},{"location":"reference/database/database/#dp3.database.database.MongoConfig","title":"MongoConfig","text":"

    Bases: BaseModel

    Database configuration.

    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase","title":"EntityDatabase","text":"
    EntityDatabase(db_conf: HierarchicalDict, model_spec: ModelSpec) -> None\n

    MongoDB database wrapper responsible for all communication with the database server. Initializes the database schema based on the database configuration.

    - db_conf: configuration of the database connection (content of database.yml)
    - model_spec: ModelSpec object, configuration of the data model (entities and attributes)

    Source code in dp3/database/database.py
    def __init__(\nself,\ndb_conf: HierarchicalDict,\nmodel_spec: ModelSpec,\n) -> None:\nself.log = logging.getLogger(\"EntityDatabase\")\nconfig = MongoConfig.parse_obj(db_conf)\nself.log.info(\"Connecting to database...\")\nfor attempt, delay in enumerate(RECONNECT_DELAYS):\ntry:\nself._db = self.connect(config)\n# Check if connected\nself._db.admin.command(\"ping\")\nexcept pymongo.errors.ConnectionFailure as e:\nif attempt + 1 == len(RECONNECT_DELAYS):\nraise DatabaseError(\n\"Cannot connect to database with specified connection arguments.\"\n) from e\nelse:\nself.log.error(\n\"Cannot connect to database (attempt %d, retrying in %ds).\",\nattempt + 1,\ndelay,\n)\ntime.sleep(delay)\nself._db_schema_config = model_spec\n# Init and switch to correct database\nself._db = self._db[config.db_name]\nself._init_database_schema(config.db_name)\nself.log.info(\"Database successfully initialized!\")\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.insert_datapoints","title":"insert_datapoints","text":"
    insert_datapoints(etype: str, eid: str, dps: list[DataPointBase], new_entity: bool = False) -> None\n

    Inserts datapoints into the raw data collection and updates the master record.

    Raises DatabaseError when insert or update fails.

    Source code in dp3/database/database.py
    def insert_datapoints(\nself, etype: str, eid: str, dps: list[DataPointBase], new_entity: bool = False\n) -> None:\n\"\"\"Inserts datapoint to raw data collection and updates master record.\n    Raises DatabaseError when insert or update fails.\n    \"\"\"\nif len(dps) == 0:\nreturn\netype = dps[0].etype\n# Check `etype`\nself._assert_etype_exists(etype)\n# Insert raw datapoints\nraw_col = self._raw_col_name(etype)\ndps_dicts = [dp.dict(exclude={\"attr_type\"}) for dp in dps]\ntry:\nself._db[raw_col].insert_many(dps_dicts)\nself.log.debug(f\"Inserted datapoints to raw collection:\\n{dps}\")\nexcept Exception as e:\nraise DatabaseError(f\"Insert of datapoints failed: {e}\\n{dps}\") from e\n# Update master document\nmaster_changes = {\"$push\": {}, \"$set\": {}}\nfor dp in dps:\nattr_spec = self._db_schema_config.attr(etype, dp.attr)\nv = dp.v.dict() if isinstance(dp.v, BaseModel) else dp.v\n# Rewrite value of plain attribute\nif attr_spec.t == AttrType.PLAIN:\nmaster_changes[\"$set\"][dp.attr] = {\"v\": v, \"ts_last_update\": datetime.now()}\n# Push new data of observation\nif attr_spec.t == AttrType.OBSERVATIONS:\nif dp.attr in master_changes[\"$push\"]:\n# Support multiple datapoints being pushed in the same request\nif \"$each\" not in master_changes[\"$push\"][dp.attr]:\nsaved_dp = master_changes[\"$push\"][dp.attr]\nmaster_changes[\"$push\"][dp.attr] = {\"$each\": [saved_dp]}\nmaster_changes[\"$push\"][dp.attr][\"$each\"].append(\n{\"t1\": dp.t1, \"t2\": dp.t2, \"v\": v, \"c\": dp.c}\n)\nelse:\n# Otherwise just push one datapoint\nmaster_changes[\"$push\"][dp.attr] = {\"t1\": dp.t1, \"t2\": dp.t2, \"v\": v, \"c\": dp.c}\n# Push new data of timeseries\nif attr_spec.t == AttrType.TIMESERIES:\nif dp.attr in master_changes[\"$push\"]:\n# Support multiple datapoints being pushed in the same request\nif \"$each\" not in master_changes[\"$push\"][dp.attr]:\nsaved_dp = master_changes[\"$push\"][dp.attr]\nmaster_changes[\"$push\"][dp.attr] = {\"$each\": [saved_dp]}\nmaster_changes[\"$push\"][dp.attr][\"$each\"].append(\n{\"t1\": dp.t1, \"t2\": dp.t2, \"v\": v}\n)\nelse:\n# Otherwise just push one datapoint\nmaster_changes[\"$push\"][dp.attr] = {\"t1\": dp.t1, \"t2\": dp.t2, \"v\": v}\nif new_entity:\nmaster_changes[\"$set\"][\"#hash\"] = HASH(f\"{etype}:{eid}\")\nmaster_col = self._master_col_name(etype)\ntry:\nself._db[master_col].update_one({\"_id\": eid}, master_changes, upsert=True)\nself.log.debug(f\"Updated master record of {etype} {eid}: {master_changes}\")\nexcept Exception as e:\nraise DatabaseError(f\"Update of master record failed: {e}\\n{dps}\") from e\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.update_master_records","title":"update_master_records","text":"
    update_master_records(etype: str, eids: list[str], records: list[dict]) -> None\n

    Replace master records of the given etype eids with the provided records.

    Raises DatabaseError when update fails.

    Source code in dp3/database/database.py
    def update_master_records(self, etype: str, eids: list[str], records: list[dict]) -> None:\n\"\"\"Replace master record of `etype`:`eid` with the provided `record`.\n    Raises DatabaseError when update fails.\n    \"\"\"\nmaster_col = self._master_col_name(etype)\ntry:\nself._db[master_col].bulk_write(\n[\nReplaceOne({\"_id\": eid}, record, upsert=True)\nfor eid, record in zip(eids, records)\n]\n)\nself.log.debug(\"Updated master records of %s: %s.\", eids, eids)\nexcept Exception as e:\nraise DatabaseError(f\"Update of master records failed: {e}\\n{records}\") from e\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.delete_old_dps","title":"delete_old_dps","text":"
    delete_old_dps(etype: str, attr_name: str, t_old: datetime) -> None\n

    Delete old datapoints from master collection.

    Periodically called for all etypes from HistoryManager.

    Source code in dp3/database/database.py
    def delete_old_dps(self, etype: str, attr_name: str, t_old: datetime) -> None:\n\"\"\"Delete old datapoints from master collection.\n    Periodically called for all `etype`s from HistoryManager.\n    \"\"\"\nmaster_col = self._master_col_name(etype)\ntry:\nself._db[master_col].update_many({}, {\"$pull\": {attr_name: {\"t2\": {\"$lt\": t_old}}}})\nexcept Exception as e:\nraise DatabaseError(f\"Delete of old datapoints failed: {e}\") from e\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_master_record","title":"get_master_record","text":"
    get_master_record(etype: str, eid: str, **kwargs: str) -> dict\n

    Get current master record for etype/eid.

    If it doesn't exist, returns {}.

    Source code in dp3/database/database.py
    def get_master_record(self, etype: str, eid: str, **kwargs) -> dict:\n\"\"\"Get current master record for etype/eid.\n    If doesn't exist, returns {}.\n    \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nmaster_col = self._master_col_name(etype)\nreturn self._db[master_col].find_one({\"_id\": eid}, **kwargs) or {}\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.ekey_exists","title":"ekey_exists","text":"
    ekey_exists(etype: str, eid: str) -> bool\n

    Checks whether master record for etype/eid exists

    Source code in dp3/database/database.py
    def ekey_exists(self, etype: str, eid: str) -> bool:\n\"\"\"Checks whether master record for etype/eid exists\"\"\"\nreturn bool(self.get_master_record(etype, eid))\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_master_records","title":"get_master_records","text":"
    get_master_records(etype: str, **kwargs: str) -> pymongo.cursor.Cursor\n

    Get cursor to current master records of etype.

    Source code in dp3/database/database.py
    def get_master_records(self, etype: str, **kwargs) -> pymongo.cursor.Cursor:\n\"\"\"Get cursor to current master records of etype.\"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nmaster_col = self._master_col_name(etype)\nreturn self._db[master_col].find({}, **kwargs)\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_worker_master_records","title":"get_worker_master_records","text":"
    get_worker_master_records(worker_index: int, worker_cnt: int, etype: str, **kwargs: str) -> pymongo.cursor.Cursor\n

    Get cursor to current master records of etype that are assigned to the given worker (sharded by the master record #hash modulo worker_cnt).

    Source code in dp3/database/database.py
    def get_worker_master_records(\nself, worker_index: int, worker_cnt: int, etype: str, **kwargs\n) -> pymongo.cursor.Cursor:\n\"\"\"Get cursor to current master records of etype.\"\"\"\nif etype not in self._db_schema_config.entities:\nraise DatabaseError(f\"Entity '{etype}' does not exist\")\nmaster_col = self._master_col_name(etype)\nreturn self._db[master_col].find({\"#hash\": {\"$mod\": [worker_cnt, worker_index]}}, **kwargs)\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_latest_snapshot","title":"get_latest_snapshot","text":"
    get_latest_snapshot(etype: str, eid: str) -> dict\n

    Get latest snapshot of given etype/eid.

    If it doesn't exist, returns {}.

    Source code in dp3/database/database.py
    def get_latest_snapshot(self, etype: str, eid: str) -> dict:\n\"\"\"Get latest snapshot of given etype/eid.\n    If doesn't exist, returns {}.\n    \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nsnapshot_col = self._snapshots_col_name(etype)\nreturn self._db[snapshot_col].find_one({\"eid\": eid}, sort=[(\"_id\", -1)]) or {}\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_latest_snapshots","title":"get_latest_snapshots","text":"
    get_latest_snapshots(etype: str) -> pymongo.cursor.Cursor\n

    Get latest snapshots of given etype.

    This method is useful for displaying data on the web.

    Source code in dp3/database/database.py
    def get_latest_snapshots(self, etype: str) -> pymongo.cursor.Cursor:\n\"\"\"Get latest snapshots of given `etype`.\n    This method is useful for displaying data on web.\n    \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nsnapshot_col = self._snapshots_col_name(etype)\nlatest_snapshot = self._db[snapshot_col].find_one({}, sort=[(\"_id\", -1)])\nif latest_snapshot is None:\nreturn self._db[snapshot_col].find()\nlatest_snapshot_date = latest_snapshot[\"_time_created\"]\nreturn self._db[snapshot_col].find({\"_time_created\": latest_snapshot_date})\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_snapshots","title":"get_snapshots","text":"
    get_snapshots(etype: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None) -> pymongo.cursor.Cursor\n

    Get all (or filtered) snapshots of given eid.

    This method is useful for displaying eid's history on the web.

    Parameters:

    - etype (str): entity type (required)
    - eid (str): id of entity, to which data-points correspond (required)
    - t1 (Optional[datetime]): left value of time interval (inclusive) (default: None)
    - t2 (Optional[datetime]): right value of time interval (inclusive) (default: None)

    Source code in dp3/database/database.py
    def get_snapshots(\nself, etype: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None\n) -> pymongo.cursor.Cursor:\n\"\"\"Get all (or filtered) snapshots of given `eid`.\n    This method is useful for displaying `eid`'s history on web.\n    Args:\n        etype: entity type\n        eid: id of entity, to which data-points correspond\n        t1: left value of time interval (inclusive)\n        t2: right value of time interval (inclusive)\n    \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nsnapshot_col = self._snapshots_col_name(etype)\nquery = {\"eid\": eid, \"_time_created\": {}}\n# Filter by date\nif t1:\nquery[\"_time_created\"][\"$gte\"] = t1\nif t2:\nquery[\"_time_created\"][\"$lte\"] = t2\n# Unset if empty\nif not query[\"_time_created\"]:\ndel query[\"_time_created\"]\nreturn self._db[snapshot_col].find(query).sort([(\"_time_created\", pymongo.ASCENDING)])\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_value_or_history","title":"get_value_or_history","text":"
    get_value_or_history(etype: str, attr_name: str, eid: str, t1: Optional[datetime] = None, t2: Optional[datetime] = None) -> dict\n

    Gets current value and/or history of attribute for given eid.

    Depends on attribute type:

    - plain: just the (current) value
    - observations: (current) value and history stored in master record (optionally filtered)
    - timeseries: just history stored in master record (optionally filtered)

    Returns dict with two keys: current_value and history (list of values).

    Source code in dp3/database/database.py
    def get_value_or_history(\nself,\netype: str,\nattr_name: str,\neid: str,\nt1: Optional[datetime] = None,\nt2: Optional[datetime] = None,\n) -> dict:\n\"\"\"Gets current value and/or history of attribute for given `eid`.\n    Depends on attribute type:\n    - plain: just (current) value\n    - observations: (current) value and history stored in master record (optionally filtered)\n    - timeseries: just history stored in master record (optionally filtered)\n    Returns dict with two keys: `current_value` and `history` (list of values).\n    \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nattr_spec = self._db_schema_config.attr(etype, attr_name)\nresult = {\"current_value\": None, \"history\": []}\n# Add current value to the result\nif attr_spec.t == AttrType.PLAIN:\nresult[\"current_value\"] = (\nself.get_master_record(etype, eid).get(attr_name, {}).get(\"v\", None)\n)\nelif attr_spec.t == AttrType.OBSERVATIONS:\nresult[\"current_value\"] = self.get_latest_snapshot(etype, eid).get(attr_name, None)\n# Add history\nif attr_spec.t == AttrType.OBSERVATIONS:\nresult[\"history\"] = self.get_observation_history(etype, attr_name, eid, t1, t2)\nelif attr_spec.t == AttrType.TIMESERIES:\nresult[\"history\"] = self.get_timeseries_history(etype, attr_name, eid, t1, t2)\nreturn result\n
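    A usage sketch of the returned structure; `db` is assumed to be a connected EntityDatabase, and "device"/"hostname" are illustrative entity/attribute names that would have to exist in the model specification.

```python
from datetime import datetime

result = db.get_value_or_history("device", "hostname", "abc123")
print(result["current_value"])   # latest value (plain/observations attributes)
print(result["history"])         # list of historic values (observations/timeseries)

# Optionally restrict the history to a time interval:
result = db.get_value_or_history(
    "device", "hostname", "abc123",
    t1=datetime(2022, 8, 1), t2=datetime(2022, 8, 2),
)
```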
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.estimate_count_eids","title":"estimate_count_eids","text":"
    estimate_count_eids(etype: str) -> int\n

    Estimates count of eids in given etype

    Source code in dp3/database/database.py
    def estimate_count_eids(self, etype: str) -> int:\n\"\"\"Estimates count of `eid`s in given `etype`\"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nmaster_col = self._master_col_name(etype)\nreturn self._db[master_col].estimated_document_count({})\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.save_snapshot","title":"save_snapshot","text":"
    save_snapshot(etype: str, snapshot: dict, time: datetime)\n

    Saves a snapshot of the current master document of the specified entity.

    Source code in dp3/database/database.py
    def save_snapshot(self, etype: str, snapshot: dict, time: datetime):\n\"\"\"Saves snapshot to specified entity of current master document.\"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nsnapshot[\"_time_created\"] = time\nsnapshot_col = self._snapshots_col_name(etype)\ntry:\nself._db[snapshot_col].insert_one(snapshot)\nself.log.debug(f\"Inserted snapshot: {snapshot}\")\nexcept Exception as e:\nraise DatabaseError(f\"Insert of snapshot failed: {e}\\n{snapshot}\") from e\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.save_snapshots","title":"save_snapshots","text":"
    save_snapshots(etype: str, snapshots: list[dict], time: datetime)\n

    Saves a list of snapshots of current master documents.

    All snapshots must belong to the same entity type.

    Source code in dp3/database/database.py
    def save_snapshots(self, etype: str, snapshots: list[dict], time: datetime):\n\"\"\"\n    Saves a list of snapshots of current master documents.\n    All snapshots must belong to same entity type.\n    \"\"\"\n# Check `etype`\nself._assert_etype_exists(etype)\nfor snapshot in snapshots:\nsnapshot[\"_time_created\"] = time\nsnapshot_col = self._snapshots_col_name(etype)\ntry:\nself._db[snapshot_col].insert_many(snapshots)\nself.log.debug(f\"Inserted snapshots: {snapshots}\")\nexcept Exception as e:\nraise DatabaseError(f\"Insert of snapshots failed: {e}\\n{snapshots}\") from e\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.save_metadata","title":"save_metadata","text":"
    save_metadata(time: datetime, metadata: dict)\n

    Saves metadata of the calling module (identified by module name and time) into the #metadata collection.

    Source code in dp3/database/database.py
    def save_metadata(self, time: datetime, metadata: dict):\n\"\"\"Saves snapshot to specified entity of current master document.\"\"\"\nmodule = get_caller_id()\nmetadata[\"_id\"] = module + time.strftime(\"%Y-%m-%dT%H:%M:%S.%fZ\")[:-4]\nmetadata[\"#module\"] = module\nmetadata[\"#time_created\"] = time\nmetadata[\"#last_update\"] = datetime.now()\ntry:\nself._db[\"#metadata\"].insert_one(metadata)\nself.log.debug(\"Inserted metadata %s: %s\", metadata[\"_id\"], metadata)\nexcept Exception as e:\nraise DatabaseError(f\"Insert of metadata failed: {e}\\n{metadata}\") from e\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_observation_history","title":"get_observation_history","text":"
    get_observation_history(etype: str, attr_name: str, eid: str, t1: datetime = None, t2: datetime = None, sort: int = None) -> list[dict]\n

    Get full (or filtered) history of observation attribute.

    This method is useful for displaying eid's history on the web. It is also used to feed data into get_timeseries_history().

    Parameters:

    - etype (str): entity type (required)
    - attr_name (str): name of attribute (required)
    - eid (str): id of entity, to which data-points correspond (required)
    - t1 (datetime): left value of time interval (inclusive) (default: None)
    - t2 (datetime): right value of time interval (inclusive) (default: None)
    - sort (int): sort by timestamps - 1: ascending order by t1, 2: descending order by t2, None: don't sort (default: None)

    Returns:

    - list[dict]: list of dicts (reduced datapoints)

    Source code in dp3/database/database.py
    def get_observation_history(\nself,\netype: str,\nattr_name: str,\neid: str,\nt1: datetime = None,\nt2: datetime = None,\nsort: int = None,\n) -> list[dict]:\n\"\"\"Get full (or filtered) history of observation attribute.\n    This method is useful for displaying `eid`'s history on web.\n    Also used to feed data into `get_timeseries_history()`.\n    Args:\n        etype: entity type\n        attr_name: name of attribute\n        eid: id of entity, to which data-points correspond\n        t1: left value of time interval (inclusive)\n        t2: right value of time interval (inclusive)\n        sort: sort by timestamps - 0: ascending order by t1, 1: descending order by t2,\n            None: don't sort\n    Returns:\n        list of dicts (reduced datapoints)\n    \"\"\"\nt1 = datetime.fromtimestamp(0) if t1 is None else t1\nt2 = datetime.now() if t2 is None else t2\n# Get attribute history\nmr = self.get_master_record(etype, eid)\nattr_history = mr.get(attr_name, [])\n# Filter\nattr_history_filtered = [row for row in attr_history if row[\"t1\"] <= t2 and row[\"t2\"] >= t1]\n# Sort\nif sort == 1:\nattr_history_filtered.sort(key=lambda row: row[\"t1\"])\nelif sort == 2:\nattr_history_filtered.sort(key=lambda row: row[\"t2\"], reverse=True)\nreturn attr_history_filtered\n
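    A short illustrative call; `db` is assumed to be a connected EntityDatabase and "device"/"hostname" are made-up names.

```python
# sort=1 -> ascending by t1, sort=2 -> descending by t2 (see the source above)
history = db.get_observation_history("device", "hostname", "abc123", sort=1)
for row in history:
    print(row["t1"], row["t2"], row["v"], row["c"])
```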
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_timeseries_history","title":"get_timeseries_history","text":"
    get_timeseries_history(etype: str, attr_name: str, eid: str, t1: datetime = None, t2: datetime = None, sort: int = None) -> list[dict]\n

    Get full (or filtered) history of a timeseries attribute. Outputs it in the format:

        [\n        {\n            \"t1\": ...,\n            \"t2\": ...,\n            \"v\": {\n                \"series1\": ...,\n                \"series2\": ...\n            }\n        },\n        ...\n    ]\n
    This method is useful for displaying eid's history on the web.

    Parameters:

    - etype (str): entity type (required)
    - attr_name (str): name of attribute (required)
    - eid (str): id of entity, to which data-points correspond (required)
    - t1 (datetime): left value of time interval (inclusive) (default: None)
    - t2 (datetime): right value of time interval (inclusive) (default: None)
    - sort (int): sort by timestamps - 1: ascending order by t1, 2: descending order by t2, None: don't sort (default: None)

    Returns:

    - list[dict]: list of dicts (reduced datapoints) - each represents just one point in time

    Source code in dp3/database/database.py
    def get_timeseries_history(\nself,\netype: str,\nattr_name: str,\neid: str,\nt1: datetime = None,\nt2: datetime = None,\nsort: int = None,\n) -> list[dict]:\n\"\"\"Get full (or filtered) history of timeseries attribute.\n    Outputs them in format:\n    ```\n        [\n            {\n                \"t1\": ...,\n                \"t2\": ...,\n                \"v\": {\n                    \"series1\": ...,\n                    \"series2\": ...\n                }\n            },\n            ...\n        ]\n    ```\n    This method is useful for displaying `eid`'s history on web.\n    Args:\n        etype: entity type\n        attr_name: name of attribute\n        eid: id of entity, to which data-points correspond\n        t1: left value of time interval (inclusive)\n        t2: right value of time interval (inclusive)\n        sort: sort by timestamps - `0`: ascending order by `t1`, `1`: descending order by `t2`,\n            `None`: don't sort\n    Returns:\n         list of dicts (reduced datapoints) - each represents just one point at time\n    \"\"\"\nt1 = datetime.fromtimestamp(0) if t1 is None else t1\nt2 = datetime.now() if t2 is None else t2\nattr_history = self.get_observation_history(etype, attr_name, eid, t1, t2, sort)\nif not attr_history:\nreturn []\nattr_history_split = self._split_timeseries_dps(etype, attr_name, attr_history)\n# Filter out rows outside [t1, t2] interval\nattr_history_filtered = [\nrow for row in attr_history_split if row[\"t1\"] <= t2 and row[\"t2\"] >= t1\n]\nreturn attr_history_filtered\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.delete_old_snapshots","title":"delete_old_snapshots","text":"
    delete_old_snapshots(etype: str, t_old: datetime)\n

    Delete old snapshots.

    Periodically called for all etypes from HistoryManager.

    Source code in dp3/database/database.py
    def delete_old_snapshots(self, etype: str, t_old: datetime):\n\"\"\"Delete old snapshots.\n    Periodically called for all `etype`s from HistoryManager.\n    \"\"\"\nsnapshot_col_name = self._snapshots_col_name(etype)\ntry:\nreturn self._db[snapshot_col_name].delete_many({\"_time_created\": {\"$lt\": t_old}})\nexcept Exception as e:\nraise DatabaseError(f\"Delete of olds snapshots failed: {e}\") from e\n
    "},{"location":"reference/database/database/#dp3.database.database.EntityDatabase.get_module_cache","title":"get_module_cache","text":"
    get_module_cache()\n

    Return a persistent cache collection for the calling module.

    Source code in dp3/database/database.py
    def get_module_cache(self):\n\"\"\"Return a persistent cache collection for given module name.\"\"\"\nmodule = get_caller_id()\nself.log.debug(\"Module %s is accessing its cache collection\", module)\nreturn self._db[f\"#cache#{module}\"]\n
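    A usage sketch, assuming it is called from inside a module's method (so the collection name is derived from the calling class) and that `self.db` is the module's EntityDatabase instance; the stored document structure is entirely up to the module.

```python
from datetime import datetime

cache = self.db.get_module_cache()  # returns a regular pymongo collection
cache.update_one({"_id": "last_run"}, {"$set": {"ts": datetime.now()}}, upsert=True)
state = cache.find_one({"_id": "last_run"})
```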
    "},{"location":"reference/database/database/#dp3.database.database.get_caller_id","title":"get_caller_id","text":"
    get_caller_id()\n

    Returns the name of the caller method's class, or function name if caller is not a method.

    Source code in dp3/database/database.py
    def get_caller_id():\n\"\"\"Returns the name of the caller method's class, or function name if caller is not a method.\"\"\"\ncaller = inspect.stack()[2]\nif module := caller.frame.f_locals.get(\"self\"):\nreturn module.__class__.__qualname__\nreturn caller.function\n
    "},{"location":"reference/history_management/","title":"history_management","text":""},{"location":"reference/history_management/#dp3.history_management","title":"dp3.history_management","text":"

    Module responsible for managing history saved in the database; currently it cleans old data.

    "},{"location":"reference/history_management/history_manager/","title":"history_manager","text":""},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager","title":"dp3.history_management.history_manager","text":""},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.DatetimeEncoder","title":"DatetimeEncoder","text":"

    Bases: JSONEncoder

    JSONEncoder to encode datetime using the standard ADiCT format string.

    "},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.HistoryManager","title":"HistoryManager","text":"
    HistoryManager(db: EntityDatabase, platform_config: PlatformConfig, registrar: CallbackRegistrar) -> None\n
    Source code in dp3/history_management/history_manager.py
    def __init__(\nself, db: EntityDatabase, platform_config: PlatformConfig, registrar: CallbackRegistrar\n) -> None:\nself.log = logging.getLogger(\"HistoryManager\")\nself.db = db\nself.model_spec = platform_config.model_spec\nself.worker_index = platform_config.process_index\nself.num_workers = platform_config.num_processes\nself.config = platform_config.config.get(\"history_manager\")\n# Schedule master document aggregation\nregistrar.scheduler_register(self.aggregate_master_docs, minute=\"*/10\")\nif platform_config.process_index != 0:\nself.log.debug(\n\"History management will be disabled in this worker to avoid race conditions.\"\n)\nreturn\n# Schedule datapoints cleaning\ndatapoint_cleaning_period = self.config[\"datapoint_cleaning\"][\"tick_rate\"]\nregistrar.scheduler_register(self.delete_old_dps, minute=f\"*/{datapoint_cleaning_period}\")\nsnapshot_cleaning_cron = self.config[\"snapshot_cleaning\"][\"cron_schedule\"]\nself.keep_snapshot_delta = timedelta(days=self.config[\"snapshot_cleaning\"][\"days_to_keep\"])\nregistrar.scheduler_register(self.delete_old_snapshots, **snapshot_cleaning_cron)\n# Schedule datapoint archivation\nself.keep_raw_delta = timedelta(days=self.config[\"datapoint_archivation\"][\"days_to_keep\"])\nself.log_dir = self._ensure_log_dir(self.config[\"datapoint_archivation\"][\"archive_dir\"])\nregistrar.scheduler_register(self.archive_old_dps, minute=0, hour=2)  # Every day at 2 AM\n
    "},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.HistoryManager.delete_old_dps","title":"delete_old_dps","text":"
    delete_old_dps()\n

    Deletes old data points from master collection.

    Source code in dp3/history_management/history_manager.py
    def delete_old_dps(self):\n\"\"\"Deletes old data points from master collection.\"\"\"\nself.log.debug(\"Deleting old records ...\")\nfor etype_attr, attr_conf in self.model_spec.attributes.items():\netype, attr_name = etype_attr\nmax_age = None\nif attr_conf.t == AttrType.OBSERVATIONS:\nmax_age = attr_conf.history_params.max_age\nelif attr_conf.t == AttrType.TIMESERIES:\nmax_age = attr_conf.timeseries_params.max_age\nif not max_age:\ncontinue\nt_old = datetime.utcnow() - max_age\ntry:\nself.db.delete_old_dps(etype, attr_name, t_old)\nexcept DatabaseError as e:\nself.log.error(e)\n
    "},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.HistoryManager.delete_old_snapshots","title":"delete_old_snapshots","text":"
    delete_old_snapshots()\n

    Deletes old snapshots.

    Source code in dp3/history_management/history_manager.py
    def delete_old_snapshots(self):\n\"\"\"Deletes old snapshots.\"\"\"\nt_old = datetime.now() - self.keep_snapshot_delta\nself.log.debug(\"Deleting all snapshots before %s\", t_old)\ndeleted_total = 0\nfor etype in self.model_spec.entities:\ntry:\nresult = self.db.delete_old_snapshots(etype, t_old)\ndeleted_total += result.deleted_count\nexcept DatabaseError as e:\nself.log.exception(e)\nself.log.debug(\"Deleted %s snapshots in total.\", deleted_total)\n
    "},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.HistoryManager.archive_old_dps","title":"archive_old_dps","text":"
    archive_old_dps()\n

    Archives old data points from raw collection.

    Updates already saved archive files, if present.

    Source code in dp3/history_management/history_manager.py
    def archive_old_dps(self):\n\"\"\"\n    Archives old data points from raw collection.\n    Updates already saved archive files, if present.\n    \"\"\"\nt_old = datetime.utcnow() - self.keep_raw_delta\nt_old = t_old.replace(hour=0, minute=0, second=0, microsecond=0)\nself.log.debug(\"Archiving all records before %s ...\", t_old)\nmax_date, min_date, total_dps = self._get_raw_dps_summary(t_old)\nif total_dps == 0:\nself.log.debug(\"Found no datapoints to archive.\")\nreturn\nself.log.debug(\n\"Found %s datapoints to archive in the range %s - %s\", total_dps, min_date, max_date\n)\nn_days = (max_date - min_date).days + 1\nfor date, next_date in [\n(min_date + timedelta(days=n), min_date + timedelta(days=n + 1)) for n in range(n_days)\n]:\ndate_string = date.strftime(\"%Y%m%d\")\nday_datapoints = 0\ndate_logfile = self.log_dir / f\"dp-log-{date_string}.json\"\nwith open(date_logfile, \"w\", encoding=\"utf-8\") as logfile:\nfirst = True\nfor etype in self.model_spec.entities:\nresult_cursor = self.db.get_raw(etype, after=date, before=next_date)\nfor dp in result_cursor:\nif first:\nlogfile.write(\nf\"[\\n{json.dumps(self._reformat_dp(dp), cls=DatetimeEncoder)}\"\n)\nfirst = False\nelse:\nlogfile.write(\nf\",\\n{json.dumps(self._reformat_dp(dp), cls=DatetimeEncoder)}\"\n)\nday_datapoints += 1\nlogfile.write(\"\\n]\")\nself.log.debug(\n\"%s: Archived %s datapoints to %s\", date_string, day_datapoints, date_logfile\n)\ncompress_file(date_logfile)\nos.remove(date_logfile)\nself.log.debug(\"%s: Saved archive was compressed\", date_string)\nif not day_datapoints:\ncontinue\ndeleted_count = 0\nfor etype in self.model_spec.entities:\ndeleted_res = self.db.delete_old_raw_dps(etype, next_date)\ndeleted_count += deleted_res.deleted_count\nself.log.debug(\"%s: Deleted %s datapoints\", date_string, deleted_count)\n
    "},{"location":"reference/history_management/history_manager/#dp3.history_management.history_manager.aggregate_dp_history_on_equal","title":"aggregate_dp_history_on_equal","text":"
    aggregate_dp_history_on_equal(history: list[dict], spec: ObservationsHistoryParams)\n

    Merge datapoints in the history with equal values and overlapping time validity.

    Averages the confidence.

    Source code in dp3/history_management/history_manager.py
    def aggregate_dp_history_on_equal(history: list[dict], spec: ObservationsHistoryParams):\n\"\"\"\n    Merge datapoints in the history with equal values and overlapping time validity.\n    Avergages the confidence.\n    \"\"\"\nhistory = sorted(history, key=lambda x: x[\"t1\"])\naggregated_history = []\ncurrent_dp = None\nmerged_cnt = 0\npre = spec.pre_validity\npost = spec.post_validity\nfor dp in history:\nif not current_dp:\ncurrent_dp = dp\nmerged_cnt += 1\ncontinue\nif current_dp[\"v\"] == dp[\"v\"] and current_dp[\"t2\"] + post >= dp[\"t1\"] - pre:\ncurrent_dp[\"t2\"] = max(dp[\"t2\"], current_dp[\"t2\"])\ncurrent_dp[\"c\"] += dp[\"c\"]\nmerged_cnt += 1\nelse:\naggregated_history.append(current_dp)\ncurrent_dp[\"c\"] /= merged_cnt\nmerged_cnt = 1\ncurrent_dp = dp\nif current_dp:\ncurrent_dp[\"c\"] /= merged_cnt\naggregated_history.append(current_dp)\nreturn aggregated_history\n
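    An illustrative sketch of the merging behaviour described above. The construction of the `ObservationsHistoryParams` spec is not shown (only its pre_validity/post_validity fields are referenced by the code), so `spec` is assumed to extend each datapoint's validity by a couple of minutes.

```python
from datetime import datetime, timedelta

t0 = datetime(2022, 8, 1, 12, 0)
history = [
    {"t1": t0,                         "t2": t0 + timedelta(minutes=10), "v": "x", "c": 1.0},
    {"t1": t0 + timedelta(minutes=12), "t2": t0 + timedelta(minutes=20), "v": "x", "c": 0.5},
]
# With pre/post validity covering the 2-minute gap, both rows have the same value and
# overlapping validity, so they merge into a single datapoint spanning 12:00-12:20
# with the confidence averaged to 0.75.
aggregated = aggregate_dp_history_on_equal(history, spec)  # `spec` construction not shown
```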
    "},{"location":"reference/history_management/telemetry/","title":"telemetry","text":""},{"location":"reference/history_management/telemetry/#dp3.history_management.telemetry","title":"dp3.history_management.telemetry","text":""},{"location":"reference/snapshots/","title":"snapshots","text":""},{"location":"reference/snapshots/#dp3.snapshots","title":"dp3.snapshots","text":"

    SnapShooter, a module responsible for snapshot creation and running configured data correlation and fusion hooks, and Snapshot Hooks, which manage the registered hooks and their dependencies on one another.

    "},{"location":"reference/snapshots/snapshooter/","title":"snapshooter","text":""},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter","title":"dp3.snapshots.snapshooter","text":"

    Module managing creation of snapshots, enabling data correlation and saving snapshots to DB.

    • Snapshots are created periodically (user configurable period)

    • When a snapshot is created, several things need to happen:

      • all registered timeseries processing modules must be called
      • this should result in observations or plain datapoints, which will be saved to the DB and forwarded in processing
      • current value must be computed for all observations
      • load the relevant section of each observation's history and perform the configured history analysis; the result is plain values
      • load plain attributes saved in the master collection
      • A record of the described plain data makes a profile
      • The profile is additionally extended by related entities
      • Callbacks for data correlation and fusion should happen here
      • Save the complete results into database as snapshots
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter","title":"SnapShooter","text":"
    SnapShooter(db: EntityDatabase, task_queue_writer: TaskQueueWriter, task_executor: TaskExecutor, platform_config: PlatformConfig, scheduler: Scheduler) -> None\n

    Class responsible for creating entity snapshots.

    Source code in dp3/snapshots/snapshooter.py
    def __init__(\nself,\ndb: EntityDatabase,\ntask_queue_writer: TaskQueueWriter,\ntask_executor: TaskExecutor,\nplatform_config: PlatformConfig,\nscheduler: Scheduler,\n) -> None:\nself.log = logging.getLogger(\"SnapShooter\")\nself.db = db\nself.task_queue_writer = task_queue_writer\nself.model_spec = platform_config.model_spec\nself.entity_relation_attrs = defaultdict(dict)\nfor (entity, attr), _ in self.model_spec.relations.items():\nself.entity_relation_attrs[entity][attr] = True\nfor entity in self.model_spec.entities:\nself.entity_relation_attrs[entity][\"_id\"] = True\nself.worker_index = platform_config.process_index\nself.worker_cnt = platform_config.num_processes\nself.config = SnapShooterConfig.parse_obj(platform_config.config.get(\"snapshots\"))\nself._timeseries_hooks = SnapshotTimeseriesHookContainer(self.log, self.model_spec)\nself._correlation_hooks = SnapshotCorrelationHookContainer(self.log, self.model_spec)\nqueue = f\"{platform_config.app_name}-worker-{platform_config.process_index}-snapshots\"\nself.snapshot_queue_reader = TaskQueueReader(\ncallback=self.process_snapshot_task,\nparse_task=Snapshot.parse_raw,\napp_name=platform_config.app_name,\nworker_index=platform_config.process_index,\nrabbit_config=platform_config.config.get(\"processing_core.msg_broker\", {}),\nqueue=queue,\npriority_queue=queue,\nparent_logger=self.log,\n)\nself.snapshot_entities = [\nentity for entity, spec in self.model_spec.entities.items() if spec.snapshot\n]\nself.log.info(\"Snapshots will be created for entities: %s\", self.snapshot_entities)\n# Register snapshot cache\nfor (entity, attr), spec in self.model_spec.relations.items():\nif spec.t == AttrType.PLAIN:\ntask_executor.register_attr_hook(\n\"on_new_plain\", self.add_to_link_cache, entity, attr\n)\nelif spec.t == AttrType.OBSERVATIONS:\ntask_executor.register_attr_hook(\n\"on_new_observation\", self.add_to_link_cache, entity, attr\n)\nif platform_config.process_index != 0:\nself.log.debug(\n\"Snapshot task creation will be disabled in this worker to avoid race conditions.\"\n)\nself.snapshot_queue_writer = None\nreturn\nself.snapshot_queue_writer = TaskQueueWriter(\nplatform_config.app_name,\nplatform_config.num_processes,\nplatform_config.config.get(\"processing_core.msg_broker\"),\nf\"{platform_config.app_name}-main-snapshot-exchange\",\nparent_logger=self.log,\n)\n# Schedule snapshot period\nsnapshot_period = self.config.creation_rate\nscheduler.register(self.make_snapshots, minute=f\"*/{snapshot_period}\")\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.start","title":"start","text":"
    start()\n

    Connect to RabbitMQ and start consuming from TaskQueue.

    Source code in dp3/snapshots/snapshooter.py
    def start(self):\n\"\"\"Connect to RabbitMQ and start consuming from TaskQueue.\"\"\"\nself.log.info(\"Connecting to RabbitMQ\")\nself.snapshot_queue_reader.connect()\nself.snapshot_queue_reader.check()  # check presence of needed queues\nif self.snapshot_queue_writer is not None:\nself.snapshot_queue_writer.connect()\nself.snapshot_queue_writer.check()  # check presence of needed exchanges\nself.snapshot_queue_reader.start()\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.stop","title":"stop","text":"
    stop()\n

    Stop consuming from TaskQueue, disconnect from RabbitMQ.

    Source code in dp3/snapshots/snapshooter.py
    def stop(self):\n\"\"\"Stop consuming from TaskQueue, disconnect from RabbitMQ.\"\"\"\nself.snapshot_queue_reader.stop()\nif self.snapshot_queue_writer is not None:\nself.snapshot_queue_writer.disconnect()\nself.snapshot_queue_reader.disconnect()\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.register_timeseries_hook","title":"register_timeseries_hook","text":"
    register_timeseries_hook(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)\n

    Registers passed timeseries hook to be called during snapshot creation.

    Binds hook to specified entity_type and attr_type (though same hook can be bound multiple times).

    Parameters:

    Name Type Description Default hook Callable[[str, str, list[dict]], list[DataPointTask]]

    hook callable should expect entity_type, attr_type and attribute history as arguments and return a list of DataPointTask objects.

    required entity_type str

    specifies entity type

    required attr_type str

    specifies attribute type

    required

    Raises:

    Type Description ValueError

    If entity_type and attr_type do not specify a valid timeseries attribute, a ValueError is raised.

    Source code in dp3/snapshots/snapshooter.py
    def register_timeseries_hook(\nself,\nhook: Callable[[str, str, list[dict]], list[DataPointTask]],\nentity_type: str,\nattr_type: str,\n):\n\"\"\"\n    Registers passed timeseries hook to be called during snapshot creation.\n    Binds hook to specified `entity_type` and `attr_type` (though same hook can be bound\n    multiple times).\n    Args:\n        hook: `hook` callable should expect entity_type, attr_type and attribute\n            history as arguments and return a list of `DataPointTask` objects.\n        entity_type: specifies entity type\n        attr_type: specifies attribute type\n    Raises:\n        ValueError: If entity_type and attr_type do not specify a valid timeseries attribute,\n            a ValueError is raised.\n    \"\"\"\nself._timeseries_hooks.register(hook, entity_type, attr_type)\n
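
    A minimal registration sketch (the SnapShooter instance snap_shooter, the entity type 'ip' and the timeseries attribute 'flows' are placeholders and must exist in your model):

    def flows_hook(entity_type: str, attr_type: str, attr_history: list[dict]) -> list:
        # Inspect the stored timeseries history; returning an empty list means
        # no follow-up DataPointTask objects are produced.
        return []

    snap_shooter.register_timeseries_hook(flows_hook, 'ip', 'flows')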
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.register_correlation_hook","title":"register_correlation_hook","text":"
    register_correlation_hook(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]])\n

    Registers passed hook to be called during snapshot creation.

    Binds hook to specified entity_type (though same hook can be bound multiple times).

    entity_type and attribute specifications are validated, ValueError is raised on failure.

    Parameters:

    Name Type Description Default hook Callable[[str, dict], None]

    hook callable should expect entity type as str and its current values, including linked entities, as dict

    required entity_type str

    specifies entity type

    required depends_on list[list[str]]

    each item should specify an attribute that is depended on in the form of a path from the specified entity_type to individual attributes (even on linked entities).

    required may_change list[list[str]]

    each item should specify an attribute that hook may change. specification format is identical to depends_on.

    required

    Raises:

    Type Description ValueError

    On failure of specification validation.

    Source code in dp3/snapshots/snapshooter.py
    def register_correlation_hook(\nself,\nhook: Callable[[str, dict], None],\nentity_type: str,\ndepends_on: list[list[str]],\nmay_change: list[list[str]],\n):\n\"\"\"\n    Registers passed hook to be called during snapshot creation.\n    Binds hook to specified entity_type (though same hook can be bound multiple times).\n    `entity_type` and attribute specifications are validated, `ValueError` is raised on failure.\n    Args:\n        hook: `hook` callable should expect entity type as str\n            and its current values, including linked entities, as dict\n        entity_type: specifies entity type\n        depends_on: each item should specify an attribute that is depended on\n            in the form of a path from the specified entity_type to individual attributes\n            (even on linked entities).\n        may_change: each item should specify an attribute that `hook` may change.\n            specification format is identical to `depends_on`.\n    Raises:\n        ValueError: On failure of specification validation.\n    \"\"\"\nself._correlation_hooks.register(hook, entity_type, depends_on, may_change)\n
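
    A minimal registration sketch (assumes a SnapShooter instance snap_shooter; the entity type 'ip' and the attributes 'open_ports' and 'risk_score' are hypothetical and must be defined in your model):

    def compute_risk(entity_type: str, values: dict) -> None:
        # `values` holds the entity's current values, including linked entities;
        # the hook writes its result back into the same dict.
        values['risk_score'] = min(1.0, len(values.get('open_ports', [])) / 10)

    snap_shooter.register_correlation_hook(
        compute_risk,
        entity_type='ip',
        depends_on=[['open_ports']],
        may_change=[['risk_score']],
    )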
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.add_to_link_cache","title":"add_to_link_cache","text":"
    add_to_link_cache(eid: str, dp: DataPointBase)\n

    Adds the given (entity, eid) pair to the cache of all linked entities.

    Source code in dp3/snapshots/snapshooter.py
    def add_to_link_cache(self, eid: str, dp: DataPointBase):\n\"\"\"Adds the given entity,eid pair to the cache of all linked entitites.\"\"\"\ncache = self.db.get_module_cache()\netype_to = self.model_spec.relations[dp.etype, dp.attr].relation_to\nto_insert = [\n{\n\"_id\": f\"{dp.etype}#{eid}\",\n\"etype\": dp.etype,\n\"eid\": eid,\n\"expire_at\": datetime.now() + timedelta(days=2),\n},\n{\n\"_id\": f\"{etype_to}#{dp.v.eid}\",\n\"etype\": etype_to,\n\"eid\": dp.v.eid,\n\"expire_at\": datetime.now() + timedelta(days=2),\n},\n]\nres = cache.bulk_write([ReplaceOne({\"_id\": x[\"_id\"]}, x, upsert=True) for x in to_insert])\nself.log.debug(\"Cached %s linked entities: %s\", len(to_insert), res.bulk_api_result)\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.make_snapshots","title":"make_snapshots","text":"
    make_snapshots()\n

    Creates snapshots for all entities currently active in database.

    Source code in dp3/snapshots/snapshooter.py
    def make_snapshots(self):\n\"\"\"Creates snapshots for all entities currently active in database.\"\"\"\ntime = datetime.now()\n# distribute list of possibly linked entities to all workers\ncached = self.get_cached_link_entity_ids()\nself.log.debug(\"Broadcasting %s cached linked entities\", len(cached))\nself.snapshot_queue_writer.broadcast_task(\ntask=Snapshot(entities=cached, time=time, type=SnapshotMessageType.linked_entities)\n)\n# Load links only for a reduced set of entities\nself.log.debug(\"Loading linked entities.\")\nself.db.save_metadata(time, {\"task_creation_start\": time, \"entities\": 0, \"components\": 0})\ntimes = {}\ncounts = {\"entities\": 0, \"components\": 0}\ntry:\nlinked_entities = self.get_linked_entities(time, cached)\ntimes[\"components_loaded\"] = datetime.now()\nfor linked_entities_component in linked_entities:\ncounts[\"entities\"] += len(linked_entities_component)\ncounts[\"components\"] += 1\nself.snapshot_queue_writer.put_task(\ntask=Snapshot(\nentities=linked_entities_component, time=time, type=SnapshotMessageType.task\n)\n)\nexcept pymongo.errors.CursorNotFound as err:\nself.log.exception(err)\nfinally:\ntimes[\"task_creation_end\"] = datetime.now()\nself.db.update_metadata(\ntime,\nmetadata=times,\nincrease=counts,\n)\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.get_linked_entities","title":"get_linked_entities","text":"
    get_linked_entities(time: datetime, cached_linked_entities: list[tuple[str, str]])\n

    Get weakly connected components from entity graph.

    Source code in dp3/snapshots/snapshooter.py
    def get_linked_entities(self, time: datetime, cached_linked_entities: list[tuple[str, str]]):\n\"\"\"Get weakly connected components from entity graph.\"\"\"\nvisited_entities = set()\nentity_to_component = {}\nlinked_components = []\nfor etype, eid in cached_linked_entities:\nmaster_record = self.db.get_master_record(\netype, eid, projection=self.entity_relation_attrs[etype]\n) or {\"_id\": eid}\nif (etype, master_record[\"_id\"]) not in visited_entities:\n# Get entities linked by current entity\ncurrent_values = self.get_values_at_time(etype, master_record, time)\nlinked_entities = self.load_linked_entity_ids(etype, current_values, time)\n# Set linked as visited\nvisited_entities.update(linked_entities)\n# Update component\nhave_component = linked_entities & set(entity_to_component.keys())\nif have_component:\nfor entity in have_component:\ncomponent = entity_to_component[entity]\ncomponent.update(linked_entities)\nentity_to_component.update(\n{entity: component for entity in linked_entities}\n)\nbreak\nelse:\nentity_to_component.update(\n{entity: linked_entities for entity in linked_entities}\n)\nlinked_components.append(linked_entities)\nreturn linked_components\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.process_snapshot_task","title":"process_snapshot_task","text":"
    process_snapshot_task(msg_id, task: Snapshot)\n

    Acknowledges the received message and makes a snapshot according to the task.

    This function should not be called directly, but set as callback for TaskQueueReader.

    Source code in dp3/snapshots/snapshooter.py
    def process_snapshot_task(self, msg_id, task: Snapshot):\n\"\"\"\n    Acknowledges the received message and makes a snapshot according to the `task`.\n    This function should not be called directly, but set as callback for TaskQueueReader.\n    \"\"\"\nself.snapshot_queue_reader.ack(msg_id)\nif task.type == SnapshotMessageType.task:\nself.make_snapshot(task)\nelif task.type == SnapshotMessageType.linked_entities:\nself.make_snapshots_by_hash(task)\nelse:\nraise ValueError(\"Unknown SnapshotMessageType.\")\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.make_snapshots_by_hash","title":"make_snapshots_by_hash","text":"
    make_snapshots_by_hash(task: Snapshot)\n

    Make snapshots for all entities with routing key belonging to this worker.

    Source code in dp3/snapshots/snapshooter.py
    def make_snapshots_by_hash(self, task: Snapshot):\n\"\"\"\n    Make snapshots for all entities with routing key belonging to this worker.\n    \"\"\"\nself.log.debug(\"Creating snapshots for worker portion by hash.\")\nhave_links = set(task.entities)\nentity_cnt = 0\nfor etype in self.snapshot_entities:\nrecords_cursor = self.db.get_worker_master_records(\nself.worker_index,\nself.worker_cnt,\netype,\nno_cursor_timeout=True,\n)\ntry:\nsnapshots = []\nfor master_record in records_cursor:\nif (etype, master_record[\"_id\"]) in have_links:\ncontinue\nentity_cnt += 1\nsnapshots.append(self.make_linkless_snapshot(etype, master_record, task.time))\nif len(snapshots) >= DB_SEND_CHUNK:\nself.db.save_snapshots(etype, snapshots, task.time)\nsnapshots.clear()\nif snapshots:\nself.db.save_snapshots(etype, snapshots, task.time)\nsnapshots.clear()\nfinally:\nrecords_cursor.close()\nself.db.update_metadata(\ntask.time,\nmetadata={},\nincrease={\"entities\": entity_cnt, \"components\": entity_cnt},\n)\nself.log.debug(\"Worker snapshot creation done.\")\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.make_linkless_snapshot","title":"make_linkless_snapshot","text":"
    make_linkless_snapshot(entity_type: str, master_record: dict, time: datetime)\n

    Make a snapshot for given entity master_record and time.

    Runs timeseries and correlation hooks. The resulting snapshot is saved into DB.

    Source code in dp3/snapshots/snapshooter.py
    def make_linkless_snapshot(self, entity_type: str, master_record: dict, time: datetime):\n\"\"\"\n    Make a snapshot for given entity `master_record` and `time`.\n    Runs timeseries and correlation hooks.\n    The resulting snapshot is saved into DB.\n    \"\"\"\nself.run_timeseries_processing(entity_type, master_record)\nvalues = self.get_values_at_time(entity_type, master_record, time)\nentity_values = {(entity_type, master_record[\"_id\"]): values}\nself._correlation_hooks.run(entity_values)\nassert len(entity_values) == 1, \"Expected a single entity.\"\nfor record in entity_values.values():\nreturn record\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.make_snapshot","title":"make_snapshot","text":"
    make_snapshot(task: Snapshot)\n

    Make a snapshot for entities and time specified by task.

    Runs timeseries and correlation hooks. The resulting snapshots are saved into DB.

    Source code in dp3/snapshots/snapshooter.py
    def make_snapshot(self, task: Snapshot):\n\"\"\"\n    Make a snapshot for entities and time specified by `task`.\n    Runs timeseries and correlation hooks.\n    The resulting snapshots are saved into DB.\n    \"\"\"\nentity_values = {}\nfor entity_type, entity_id in task.entities:\nrecord = self.db.get_master_record(entity_type, entity_id) or {\"_id\": entity_id}\nself.run_timeseries_processing(entity_type, record)\nvalues = self.get_values_at_time(entity_type, record, task.time)\nentity_values[entity_type, entity_id] = values\nself.link_loaded_entities(entity_values)\nself._correlation_hooks.run(entity_values)\n# unlink entities again\nfor rtype_rid, record in entity_values.items():\nrtype, rid = rtype_rid\nfor attr, value in record.items():\nif (rtype, attr) not in self.model_spec.relations:\ncontinue\nif self.model_spec.relations[rtype, attr].multi_value:\nrecord[attr] = [\n{k: v for k, v in link_dict.items() if k != \"record\"} for link_dict in value\n]\nelse:\nrecord[attr] = {k: v for k, v in value.items() if k != \"record\"}\nfor rtype_rid, record in entity_values.items():\nself.db.save_snapshot(rtype_rid[0], record, task.time)\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.run_timeseries_processing","title":"run_timeseries_processing","text":"
    run_timeseries_processing(entity_type, master_record)\n
    • all registered timeseries processing modules must be called
    • this should result in observations or plain datapoints, which will be saved to db and forwarded in processing
    Source code in dp3/snapshots/snapshooter.py
    def run_timeseries_processing(self, entity_type, master_record):\n\"\"\"\n    - all registered timeseries processing modules must be called\n      - this should result in `observations` or `plain` datapoints, which will be saved to db\n        and forwarded in processing\n    \"\"\"\ntasks = []\nfor attr, attr_spec in self.model_spec.entity_attributes[entity_type].items():\nif attr_spec.t == AttrType.TIMESERIES and attr in master_record:\nnew_tasks = self._timeseries_hooks.run(entity_type, attr, master_record[attr])\ntasks.extend(new_tasks)\nself.extend_master_record(entity_type, master_record, tasks)\nfor task in tasks:\nself.task_queue_writer.put_task(task)\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.extend_master_record","title":"extend_master_record staticmethod","text":"
    extend_master_record(etype, master_record, new_tasks: list[DataPointTask])\n

    Update existing master record with datapoints from new tasks

    Source code in dp3/snapshots/snapshooter.py
    @staticmethod\ndef extend_master_record(etype, master_record, new_tasks: list[DataPointTask]):\n\"\"\"Update existing master record with datapoints from new tasks\"\"\"\nfor task in new_tasks:\nfor datapoint in task.data_points:\nif datapoint.etype != etype:\ncontinue\ndp_dict = datapoint.dict(include={\"v\", \"t1\", \"t2\", \"c\"})\nif datapoint.attr in master_record:\nmaster_record[datapoint.attr].append(dp_dict)\nelse:\nmaster_record[datapoint.attr] = [dp_dict]\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.load_linked_entity_ids","title":"load_linked_entity_ids","text":"
    load_linked_entity_ids(entity_type: str, current_values: dict, time: datetime)\n

    Loads the subgraph of entities linked to the current entity, returns a set of their (type, id) tuples.

    Source code in dp3/snapshots/snapshooter.py
    def load_linked_entity_ids(self, entity_type: str, current_values: dict, time: datetime):\n\"\"\"\n    Loads the subgraph of entities linked to the current entity,\n    returns a list of their types and ids.\n    \"\"\"\nloaded_entity_ids = {(entity_type, current_values[\"eid\"])}\nlinked_entity_ids_to_process = (\nself.get_linked_entity_ids(entity_type, current_values) - loaded_entity_ids\n)\nwhile linked_entity_ids_to_process:\nentity_identifiers = linked_entity_ids_to_process.pop()\nlinked_etype, linked_eid = entity_identifiers\nrelevant_attributes = self.entity_relation_attrs[linked_etype]\nrecord = self.db.get_master_record(\nlinked_etype, linked_eid, projection=relevant_attributes\n) or {\"_id\": linked_eid}\nlinked_values = self.get_values_at_time(linked_etype, record, time)\nlinked_entity_ids_to_process.update(\nself.get_linked_entity_ids(entity_type, linked_values) - set(loaded_entity_ids)\n)\nloaded_entity_ids.add((linked_etype, linked_eid))\nreturn loaded_entity_ids\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.get_linked_entity_ids","title":"get_linked_entity_ids","text":"
    get_linked_entity_ids(entity_type: str, current_values: dict) -> set[tuple[str, str]]\n

    Returns a set of tuples (entity_type, entity_id) identifying entities linked by current_values.

    Source code in dp3/snapshots/snapshooter.py
    def get_linked_entity_ids(self, entity_type: str, current_values: dict) -> set[tuple[str, str]]:\n\"\"\"\n    Returns a set of tuples (entity_type, entity_id) identifying entities linked by\n    `current_values`.\n    \"\"\"\nrelated_entity_ids = set()\nfor attr, val in current_values.items():\nif (entity_type, attr) not in self.model_spec.relations:\ncontinue\nattr_spec = self.model_spec.relations[entity_type, attr]\nif attr_spec.multi_value:\nrelated_entity_ids.update((attr_spec.relation_to, v[\"eid\"]) for v in val)\nelse:\nrelated_entity_ids.add((attr_spec.relation_to, val[\"eid\"]))\nreturn related_entity_ids\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.get_value_at_time","title":"get_value_at_time","text":"
    get_value_at_time(attr_spec: AttrSpecObservations, attr_history, time: datetime) -> tuple[Any, float]\n

    Get current value of an attribute from its history. Assumes multi_value = False.

    Source code in dp3/snapshots/snapshooter.py
    def get_value_at_time(\nself, attr_spec: AttrSpecObservations, attr_history, time: datetime\n) -> tuple[Any, float]:\n\"\"\"Get current value of an attribute from its history. Assumes `multi_value = False`.\"\"\"\nreturn max(\n(\n(point[\"v\"], self.extrapolate_confidence(point, time, attr_spec.history_params))\nfor point in attr_history\n),\nkey=lambda val_conf: val_conf[1],\ndefault=(None, 0.0),\n)\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.get_multi_value_at_time","title":"get_multi_value_at_time","text":"
    get_multi_value_at_time(attr_spec: AttrSpecObservations, attr_history, time: datetime) -> tuple[list, list[float]]\n

    Get current value of a multi_value attribute from its history.

    Source code in dp3/snapshots/snapshooter.py
    def get_multi_value_at_time(\nself, attr_spec: AttrSpecObservations, attr_history, time: datetime\n) -> tuple[list, list[float]]:\n\"\"\"Get current value of a multi_value attribute from its history.\"\"\"\nif attr_spec.data_type.hashable:\nvalues_with_confidence = defaultdict(float)\nfor point in attr_history:\nvalue = point[\"v\"]\nconfidence = self.extrapolate_confidence(point, time, attr_spec.history_params)\nif confidence > 0.0 and values_with_confidence[value] < confidence:\nvalues_with_confidence[value] = confidence\nreturn list(values_with_confidence.keys()), list(values_with_confidence.values())\nelse:\nvalues = []\nconfidence_list = []\nfor point in attr_history:\nvalue = point[\"v\"]\nconfidence = self.extrapolate_confidence(point, time, attr_spec.history_params)\nif value in values:\ni = values.index(value)\nif confidence_list[i] < confidence:\nconfidence_list[i] = confidence\nelif confidence > 0.0:\nvalues.append(value)\nconfidence_list.append(confidence)\nreturn values, confidence_list\n
    "},{"location":"reference/snapshots/snapshooter/#dp3.snapshots.snapshooter.SnapShooter.extrapolate_confidence","title":"extrapolate_confidence staticmethod","text":"
    extrapolate_confidence(datapoint: dict, time: datetime, history_params: ObservationsHistoryParams) -> float\n

    Get the confidence value at given time.

    Source code in dp3/snapshots/snapshooter.py
    @staticmethod\ndef extrapolate_confidence(\ndatapoint: dict, time: datetime, history_params: ObservationsHistoryParams\n) -> float:\n\"\"\"Get the confidence value at given time.\"\"\"\nt1 = datapoint[\"t1\"]\nt2 = datapoint[\"t2\"]\nbase_confidence = datapoint[\"c\"]\nif time < t1:\nif time <= t1 - history_params.pre_validity:\nreturn 0.0\nreturn base_confidence * (1 - (t1 - time) / history_params.pre_validity)\nif time <= t2:\nreturn base_confidence  # completely inside the (strict) interval\nif time >= t2 + history_params.post_validity:\nreturn 0.0\nreturn base_confidence * (1 - (time - t2) / history_params.post_validity)\n
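
    In other words, the confidence equals the datapoint's base confidence c inside [t1, t2] and decays linearly to zero over pre_validity before t1 and over post_validity after t2:

    • time <= t1 - pre_validity: 0.0
    • t1 - pre_validity < time < t1: c * (1 - (t1 - time) / pre_validity)
    • t1 <= time <= t2: c
    • t2 < time < t2 + post_validity: c * (1 - (time - t2) / post_validity)
    • time >= t2 + post_validity: 0.0

    For example (made-up numbers): with c = 0.8, post_validity = 2 h and time = t2 + 30 min, the extrapolated confidence is 0.8 * (1 - 0.5 / 2) = 0.6.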
    "},{"location":"reference/snapshots/snapshot_hooks/","title":"snapshot_hooks","text":""},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks","title":"dp3.snapshots.snapshot_hooks","text":"

    Module managing registered hooks and their dependencies on one another.

    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer","title":"SnapshotTimeseriesHookContainer","text":"
    SnapshotTimeseriesHookContainer(log: logging.Logger, model_spec: ModelSpec)\n

    Container for timeseries analysis hooks

    Source code in dp3/snapshots/snapshot_hooks.py
    def __init__(self, log: logging.Logger, model_spec: ModelSpec):\nself.log = log.getChild(\"TimeseriesHooks\")\nself.model_spec = model_spec\nself._hooks = defaultdict(list)\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.register","title":"register","text":"
    register(hook: Callable[[str, str, list[dict]], list[DataPointTask]], entity_type: str, attr_type: str)\n

    Registers passed timeseries hook to be called during snapshot creation.

    Binds hook to specified entity_type and attr_type (though same hook can be bound multiple times). If entity_type and attr_type do not specify a valid timeseries attribute, a ValueError is raised.

    Parameters:

    Name Type Description Default hook Callable[[str, str, list[dict]], list[DataPointTask]]

    hook callable should expect entity_type, attr_type and attribute history as arguments and return a list of Task objects.

    required entity_type str

    specifies entity type

    required attr_type str

    specifies attribute type

    required Source code in dp3/snapshots/snapshot_hooks.py
    def register(\nself,\nhook: Callable[[str, str, list[dict]], list[DataPointTask]],\nentity_type: str,\nattr_type: str,\n):\n\"\"\"\n    Registers passed timeseries hook to be called during snapshot creation.\n    Binds hook to specified entity_type and attr_type (though same hook can be bound\n    multiple times).\n    If entity_type and attr_type do not specify a valid timeseries attribute,\n    a ValueError is raised.\n    Args:\n        hook: `hook` callable should expect entity_type, attr_type and attribute\n            history as arguments and return a list of `Task` objects.\n        entity_type: specifies entity type\n        attr_type: specifies attribute type\n    \"\"\"\nif (entity_type, attr_type) not in self.model_spec.attributes:\nraise ValueError(f\"Attribute '{attr_type}' of entity '{entity_type}' does not exist.\")\nspec = self.model_spec.attributes[entity_type, attr_type]\nif spec.t != AttrType.TIMESERIES:\nraise ValueError(f\"'{entity_type}.{attr_type}' is not a timeseries, but '{spec.t}'\")\nself._hooks[entity_type, attr_type].append(hook)\nself.log.debug(f\"Added hook: '{hook.__qualname__}'\")\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotTimeseriesHookContainer.run","title":"run","text":"
    run(entity_type: str, attr_type: str, attr_history: list[dict]) -> list[DataPointTask]\n

    Runs registered hooks.

    Source code in dp3/snapshots/snapshot_hooks.py
    def run(\nself, entity_type: str, attr_type: str, attr_history: list[dict]\n) -> list[DataPointTask]:\n\"\"\"Runs registered hooks.\"\"\"\ntasks = []\nfor hook in self._hooks[entity_type, attr_type]:\ntry:\nnew_tasks = hook(entity_type, attr_type, attr_history)\ntasks.extend(new_tasks)\nexcept Exception as e:\nself.log.error(f\"Error during running hook {hook}: {e}\")\nreturn tasks\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotCorrelationHookContainer","title":"SnapshotCorrelationHookContainer","text":"
    SnapshotCorrelationHookContainer(log: logging.Logger, model_spec: ModelSpec)\n

    Container for data fusion and correlation hooks.

    Source code in dp3/snapshots/snapshot_hooks.py
    def __init__(self, log: logging.Logger, model_spec: ModelSpec):\nself.log = log.getChild(\"CorrelationHooks\")\nself.model_spec = model_spec\nself._hooks: defaultdict[str, list[tuple[str, Callable]]] = defaultdict(list)\nself._dependency_graph = DependencyGraph(self.log)\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotCorrelationHookContainer.register","title":"register","text":"
    register(hook: Callable[[str, dict], None], entity_type: str, depends_on: list[list[str]], may_change: list[list[str]]) -> str\n

    Registers passed hook to be called during snapshot creation.

    Binds hook to specified entity_type (though same hook can be bound multiple times).

    entity_type and attribute specifications are validated; ValueError is raised on failure.

    Parameters:

    Name Type Description Default hook Callable[[str, dict], None]

    hook callable should expect entity type as str and its current values, including linked entities, as dict

    required entity_type str

    specifies entity type

    required depends_on list[list[str]]

    each item should specify an attribute that is depended on in the form of a path from the specified entity_type to individual attributes (even on linked entities).

    required may_change list[list[str]]

    each item should specify an attribute that hook may change. specification format is identical to depends_on.

    required

    Returns:

    Type Description str

    Generated hook id.

    Source code in dp3/snapshots/snapshot_hooks.py
    def register(\nself,\nhook: Callable[[str, dict], None],\nentity_type: str,\ndepends_on: list[list[str]],\nmay_change: list[list[str]],\n) -> str:\n\"\"\"\n    Registers passed hook to be called during snapshot creation.\n    Binds hook to specified entity_type (though same hook can be bound multiple times).\n    If entity_type and attribute specifications are validated\n    and ValueError is raised on failure.\n    Args:\n        hook: `hook` callable should expect entity type as str\n            and its current values, including linked entities, as dict\n        entity_type: specifies entity type\n        depends_on: each item should specify an attribute that is depended on\n            in the form of a path from the specified entity_type to individual attributes\n            (even on linked entities).\n        may_change: each item should specify an attribute that `hook` may change.\n            specification format is identical to `depends_on`.\n    Returns:\n        Generated hook id.\n    \"\"\"\nif entity_type not in self.model_spec.entities:\nraise ValueError(f\"Entity '{entity_type}' does not exist.\")\nself._validate_attr_paths(entity_type, depends_on)\nself._validate_attr_paths(entity_type, may_change)\ndepends_on = self._expand_path_backlinks(entity_type, depends_on)\nmay_change = self._expand_path_backlinks(entity_type, may_change)\ndepends_on = self._embed_base_entity(entity_type, depends_on)\nmay_change = self._embed_base_entity(entity_type, may_change)\nhook_id = (\nf\"{hook.__qualname__}(\"\nf\"{entity_type}, [{','.join(depends_on)}], [{','.join(may_change)}]\"\nf\")\"\n)\nself._dependency_graph.add_hook_dependency(hook_id, depends_on, may_change)\nself._hooks[entity_type].append((hook_id, hook))\nself._restore_hook_order(self._hooks[entity_type])\nself.log.debug(f\"Added hook: '{hook_id}'\")\nreturn hook_id\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.SnapshotCorrelationHookContainer.run","title":"run","text":"
    run(entities: dict)\n

    Runs registered hooks.

    Source code in dp3/snapshots/snapshot_hooks.py
    def run(self, entities: dict):\n\"\"\"Runs registered hooks.\"\"\"\nentity_types = {etype for etype, _ in entities}\nhook_subset = [\n(hook_id, hook, etype) for etype in entity_types for hook_id, hook in self._hooks[etype]\n]\ntopological_order = self._dependency_graph.topological_order\nhook_subset.sort(key=lambda x: topological_order.index(x[0]))\nentities_by_etype = {\netype_eid[0]: {etype_eid[1]: entity} for etype_eid, entity in entities.items()\n}\nfor hook_id, hook, etype in hook_subset:\nfor eid, entity_values in entities_by_etype[etype].items():\nself.log.debug(\"Running hook %s on entity %s\", hook_id, eid)\nhook(etype, entity_values)\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.GraphVertex","title":"GraphVertex dataclass","text":"

    Vertex in a graph of dependencies

    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph","title":"DependencyGraph","text":"
    DependencyGraph(log)\n

    Class representing a graph of dependencies between correlation hooks.

    Source code in dp3/snapshots/snapshot_hooks.py
    def __init__(self, log):\nself.log = log.getChild(\"DependencyGraph\")\n# dictionary of adjacency lists for each edge\nself._vertices = defaultdict(GraphVertex)\nself.topological_order = []\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph.add_hook_dependency","title":"add_hook_dependency","text":"
    add_hook_dependency(hook_id: str, depends_on: list[str], may_change: list[str])\n

    Add hook to dependency graph and recalculate if any cycles are created.

    Source code in dp3/snapshots/snapshot_hooks.py
    def add_hook_dependency(self, hook_id: str, depends_on: list[str], may_change: list[str]):\n\"\"\"Add hook to dependency graph and recalculate if any cycles are created.\"\"\"\nif hook_id in self._vertices:\nraise ValueError(f\"Hook id '{hook_id}' already present in the vertices.\")\nfor path in depends_on:\nself.add_edge(path, hook_id)\nfor path in may_change:\nself.add_edge(hook_id, path)\nself._vertices[hook_id].type = \"hook\"\ntry:\nself.topological_sort()\nexcept ValueError as err:\nraise ValueError(f\"Hook {hook_id} introduces a circular dependency.\") from err\nself.check_multiple_writes()\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph.add_edge","title":"add_edge","text":"
    add_edge(id_from: Hashable, id_to: Hashable)\n

    Add oriented edge between specified vertices.

    Source code in dp3/snapshots/snapshot_hooks.py
    def add_edge(self, id_from: Hashable, id_to: Hashable):\n\"\"\"Add oriented edge between specified vertices.\"\"\"\nself._vertices[id_from].adj.append(id_to)\n# Ensure vertex with 'id_to' exists to avoid iteration errors later.\n_ = self._vertices[id_to]\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph.calculate_in_degrees","title":"calculate_in_degrees","text":"
    calculate_in_degrees()\n

    Calculate number of incoming edges for each vertex. Time complexity O(V + E).

    Source code in dp3/snapshots/snapshot_hooks.py
    def calculate_in_degrees(self):\n\"\"\"Calculate number of incoming edges for each vertex. Time complexity O(V + E).\"\"\"\nfor vertex_node in self._vertices.values():\nvertex_node.in_degree = 0\nfor vertex_node in self._vertices.values():\nfor adjacent_name in vertex_node.adj:\nself._vertices[adjacent_name].in_degree += 1\n
    "},{"location":"reference/snapshots/snapshot_hooks/#dp3.snapshots.snapshot_hooks.DependencyGraph.topological_sort","title":"topological_sort","text":"
    topological_sort()\n

    Implementation of Kahn's algorithm for topological sorting. Raises ValueError if there is a cycle in the graph.

    See https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm

    Source code in dp3/snapshots/snapshot_hooks.py
    def topological_sort(self):\n\"\"\"\n    Implementation of Kahn's algorithm for topological sorting.\n    Raises ValueError if there is a cycle in the graph.\n    See https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm\n    \"\"\"\nself.calculate_in_degrees()\nqueue = [(node_id, node) for node_id, node in self._vertices.items() if node.in_degree == 0]\ntopological_order = []\nprocessed_vertices_cnt = 0\nwhile queue:\ncurr_node_id, curr_node = queue.pop(0)\ntopological_order.append(curr_node_id)\n# Decrease neighbouring nodes' in-degree by 1\nfor neighbor in curr_node.adj:\nneighbor_node = self._vertices[neighbor]\nneighbor_node.in_degree -= 1\n# If in-degree becomes zero, add it to queue\nif neighbor_node.in_degree == 0:\nqueue.append((neighbor, neighbor_node))\nprocessed_vertices_cnt += 1\nif processed_vertices_cnt != len(self._vertices):\nraise ValueError(\"Dependency graph contains a cycle.\")\nelse:\nself.topological_order = topological_order\nreturn topological_order\n
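
    A small illustration (vertex ids are arbitrary strings here; in DP3 they are hook ids and embedded attribute paths produced by the hook container):

    import logging

    graph = DependencyGraph(logging.getLogger('example'))
    graph.add_edge('ip.open_ports', 'compute_risk')  # hook depends on an attribute
    graph.add_edge('compute_risk', 'ip.risk_score')  # hook may change an attribute
    graph.add_edge('ip.risk_score', 'classify')
    print(graph.topological_sort())
    # ['ip.open_ports', 'compute_risk', 'ip.risk_score', 'classify']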
    "},{"location":"reference/task_processing/","title":"task_processing","text":""},{"location":"reference/task_processing/#dp3.task_processing","title":"dp3.task_processing","text":"

    Module responsible for task distribution, processing and running configured hooks. Task distribution is handled via the task queue.

    "},{"location":"reference/task_processing/task_distributor/","title":"task_distributor","text":""},{"location":"reference/task_processing/task_distributor/#dp3.task_processing.task_distributor","title":"dp3.task_processing.task_distributor","text":""},{"location":"reference/task_processing/task_distributor/#dp3.task_processing.task_distributor.TaskDistributor","title":"TaskDistributor","text":"
    TaskDistributor(task_executor: TaskExecutor, platform_config: PlatformConfig, registrar: CallbackRegistrar, daemon_stop_lock: threading.Lock) -> None\n

    TaskDistributor uses task queues to distribute tasks between all running processes.

    Tasks are assigned to worker processes based on hash of entity key, so each entity is always processed by the same worker. Therefore, all requests modifying a particular entity are done sequentially and no locking is necessary.

    Tasks that are assigned to the current process are passed to task_executor for execution.

    Parameters:

    Name Type Description Default platform_config PlatformConfig

    Platform config

    required task_executor TaskExecutor

    Instance of TaskExecutor

    required registrar CallbackRegistrar

    Interface for callback registration

    required daemon_stop_lock threading.Lock

    Lock used to control when the program stops. (see dp3.worker)

    required Source code in dp3/task_processing/task_distributor.py
    def __init__(\nself,\ntask_executor: TaskExecutor,\nplatform_config: PlatformConfig,\nregistrar: CallbackRegistrar,\ndaemon_stop_lock: threading.Lock,\n) -> None:\nassert (\n0 <= platform_config.process_index < platform_config.num_processes\n), \"process index must be smaller than number of processes\"\nself.log = logging.getLogger(\"TaskDistributor\")\nself.process_index = platform_config.process_index\nself.num_processes = platform_config.num_processes\nself.model_spec = platform_config.model_spec\nself.daemon_stop_lock = daemon_stop_lock\nself.rabbit_params = platform_config.config.get(\"processing_core.msg_broker\", {})\nself.entity_types = list(\nplatform_config.config.get(\"db_entities\").keys()\n)  # List of configured entity types\nself.running = False\n# List of worker threads for processing the update requests\nself._worker_threads = []\nself.num_threads = platform_config.config.get(\"processing_core.worker_threads\", 8)\n# Internal queues for each worker\nself._queues = [queue.Queue(10) for _ in range(self.num_threads)]\n# Connections to main task queue\n# Reader - reads tasks from a pair of queues (one pair per process)\n# and distributes them to worker threads\nself._task_queue_reader = TaskQueueReader(\ncallback=self._distribute_task,\nparse_task=lambda body: DataPointTask(model_spec=self.model_spec, **json.loads(body)),\napp_name=platform_config.app_name,\nworker_index=self.process_index,\nrabbit_config=self.rabbit_params,\n)\n# Writer - allows modules to write new tasks\nself._task_queue_writer = TaskQueueWriter(\nplatform_config.app_name, self.num_processes, self.rabbit_params\n)\nself.task_executor = task_executor\n# Object to store thread-local data (e.g. worker-thread index)\n# (each thread sees different object contents)\nself._current_thread_data = threading.local()\n# Number of restarts of threads by watchdog\nself._watchdog_restarts = 0\n# Register watchdog to scheduler\nregistrar.scheduler_register(self._watchdog, second=\"*/30\")\n
    "},{"location":"reference/task_processing/task_distributor/#dp3.task_processing.task_distributor.TaskDistributor.start","title":"start","text":"
    start() -> None\n

    Run the worker threads and start consuming from TaskQueue.

    Source code in dp3/task_processing/task_distributor.py
    def start(self) -> None:\n\"\"\"Run the worker threads and start consuming from TaskQueue.\"\"\"\nself.log.info(\"Connecting to RabbitMQ\")\nself._task_queue_reader.connect()\nself._task_queue_reader.check()  # check presence of needed queues\nself._task_queue_writer.connect()\nself._task_queue_writer.check()  # check presence of needed exchanges\nself.log.info(f\"Starting {self.num_threads} worker threads\")\nself.running = True\nself._worker_threads = [\nthreading.Thread(\ntarget=self._worker_func, args=(i,), name=f\"Worker-{self.process_index}-{i}\"\n)\nfor i in range(self.num_threads)\n]\nfor worker in self._worker_threads:\nworker.start()\nself.log.info(\"Starting consuming tasks from main queue\")\nself._task_queue_reader.start()\n
    "},{"location":"reference/task_processing/task_distributor/#dp3.task_processing.task_distributor.TaskDistributor.stop","title":"stop","text":"
    stop() -> None\n

    Stop the worker threads.

    Source code in dp3/task_processing/task_distributor.py
    def stop(self) -> None:\n\"\"\"Stop the worker threads.\"\"\"\nself.log.info(\"Waiting for worker threads to finish their current tasks ...\")\n# Thread for printing debug messages about worker status\nthreading.Thread(target=self._dbg_worker_status_print, daemon=True).start()\n# Stop receiving new tasks from global queue\nself._task_queue_reader.stop()\n# Signalize stop to worker threads\nself.running = False\n# Wait until all workers stopped\nfor worker in self._worker_threads:\nworker.join()\nself._task_queue_reader.disconnect()\nself._task_queue_writer.disconnect()\n# Cleanup\nself._worker_threads = []\n
    "},{"location":"reference/task_processing/task_executor/","title":"task_executor","text":""},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor","title":"dp3.task_processing.task_executor","text":""},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor","title":"TaskExecutor","text":"
    TaskExecutor(db: EntityDatabase, platform_config: PlatformConfig) -> None\n

    TaskExecutor manages updates of entity records based on tasks read from the task queue (via the parent TaskDistributor).

    Parameters:

    Name Type Description Default db EntityDatabase

    Instance of EntityDatabase

    required platform_config PlatformConfig

    Current platform configuration.

    required Source code in dp3/task_processing/task_executor.py
    def __init__(\nself,\ndb: EntityDatabase,\nplatform_config: PlatformConfig,\n) -> None:\n# initialize task distribution\nself.log = logging.getLogger(\"TaskExecutor\")\n# Get list of configured entity types\nself.entity_types = list(platform_config.model_spec.entities.keys())\nself.log.debug(f\"Configured entity types: {self.entity_types}\")\nself.model_spec = platform_config.model_spec\nself.db = db\n# EventCountLogger\n# - count number of events across multiple processes using shared counters in Redis\necl = EventCountLogger(\nplatform_config.config.get(\"event_logging.groups\"),\nplatform_config.config.get(\"event_logging.redis\"),\n)\nself.elog = ecl.get_group(\"te\") or DummyEventGroup()\nself.elog_by_src = ecl.get_group(\"tasks_by_src\") or DummyEventGroup()\n# Print warning if some event group is not configured\nnot_configured_groups = []\nif isinstance(self.elog, DummyEventGroup):\nnot_configured_groups.append(\"te\")\nif isinstance(self.elog_by_src, DummyEventGroup):\nnot_configured_groups.append(\"tasks_by_src\")\nif not_configured_groups:\nself.log.warning(\n\"EventCountLogger: No configuration for event group(s) \"\nf\"'{','.join(not_configured_groups)}' found, \"\n\"such events will not be logged (check event_logging.yml)\"\n)\n# Hooks\nself._task_generic_hooks = TaskGenericHooksContainer(self.log)\nself._task_entity_hooks = {}\nself._task_attr_hooks = {}\nfor entity in self.model_spec.entities:\nself._task_entity_hooks[entity] = TaskEntityHooksContainer(entity, self.log)\nfor entity, attr in self.model_spec.attributes:\nattr_type = self.model_spec.attributes[entity, attr].t\nself._task_attr_hooks[entity, attr] = TaskAttrHooksContainer(\nentity, attr, attr_type, self.log\n)\n
    "},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor.register_task_hook","title":"register_task_hook","text":"
    register_task_hook(hook_type: str, hook: Callable)\n

    Registers one of available task hooks

    See: TaskGenericHooksContainer in task_hooks.py

    Source code in dp3/task_processing/task_executor.py
    def register_task_hook(self, hook_type: str, hook: Callable):\n\"\"\"Registers one of available task hooks\n    See: [`TaskGenericHooksContainer`][dp3.task_processing.task_hooks.TaskGenericHooksContainer]\n    in `task_hooks.py`\n    \"\"\"\nself._task_generic_hooks.register(hook_type, hook)\n
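
    A minimal sketch (assumes a TaskExecutor instance named executor):

    def log_task_start(task) -> None:
        # on_task_start hooks receive the Task and have no return value requirements
        print(f'Processing task for {task.etype}/{task.eid}')

    executor.register_task_hook('on_task_start', log_task_start)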
    "},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor.register_entity_hook","title":"register_entity_hook","text":"
    register_entity_hook(hook_type: str, hook: Callable, entity: str)\n

    Registers one of available task entity hooks

    See: TaskEntityHooksContainer in task_hooks.py

    Source code in dp3/task_processing/task_executor.py
    def register_entity_hook(self, hook_type: str, hook: Callable, entity: str):\n\"\"\"Registers one of available task entity hooks\n    See: [`TaskEntityHooksContainer`][dp3.task_processing.task_hooks.TaskEntityHooksContainer]\n    in `task_hooks.py`\n    \"\"\"\nself._task_entity_hooks[entity].register(hook_type, hook)\n
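
    A minimal sketch (assumes a TaskExecutor instance named executor and an entity type 'ip' defined in the model):

    def allow_ip_creation(eid: str, task) -> bool:
        # allow_entity_creation hooks may prevent creation of a new entity record
        # by returning False.
        return not eid.startswith('10.')

    executor.register_entity_hook('allow_entity_creation', allow_ip_creation, 'ip')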
    "},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor.register_attr_hook","title":"register_attr_hook","text":"
    register_attr_hook(hook_type: str, hook: Callable, entity: str, attr: str)\n

    Registers one of available task attribute hooks

    See: TaskAttrHooksContainer in task_hooks.py

    Source code in dp3/task_processing/task_executor.py
    def register_attr_hook(self, hook_type: str, hook: Callable, entity: str, attr: str):\n\"\"\"Registers one of available task attribute hooks\n    See: [`TaskAttrHooksContainer`][dp3.task_processing.task_hooks.TaskAttrHooksContainer]\n    in `task_hooks.py`\n    \"\"\"\nself._task_attr_hooks[entity, attr].register(hook_type, hook)\n
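
    A minimal sketch (assumes a TaskExecutor instance named executor and an observations attribute 'open_ports' on the entity type 'ip'):

    def on_new_open_ports(eid: str, dp) -> list:
        # on_new_observation hooks receive the eid and the datapoint and may return
        # a list of follow-up DataPointTask objects; an empty list means no follow-ups.
        return []

    executor.register_attr_hook('on_new_observation', on_new_open_ports, 'ip', 'open_ports')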
    "},{"location":"reference/task_processing/task_executor/#dp3.task_processing.task_executor.TaskExecutor.process_task","title":"process_task","text":"
    process_task(task: DataPointTask) -> tuple[bool, list[DataPointTask]]\n

    Main processing function - push datapoint values, running all registered hooks.

    Parameters:

    Name Type Description Default task DataPointTask

    Task object to process.

    required

    Returns:

    Type Description bool

    True if a new record was created, False otherwise,

    list[DataPointTask]

    and a list of new tasks created by hooks

    Source code in dp3/task_processing/task_executor.py
    def process_task(self, task: DataPointTask) -> tuple[bool, list[DataPointTask]]:\n\"\"\"\n    Main processing function - push datapoint values, running all registered hooks.\n    Args:\n        task: Task object to process.\n    Returns:\n        True if a new record was created, False otherwise,\n        and a list of new tasks created by hooks\n    \"\"\"\nself.log.debug(f\"Received new task {task.etype}/{task.eid}, starting processing!\")\nnew_tasks = []\n# Run on_task_start hook\nself._task_generic_hooks.run_on_start(task)\n# Check existence of etype\nif task.etype not in self.entity_types:\nself.log.error(f\"Task {task.etype}/{task.eid}: Unknown entity type!\")\nself.elog.log(\"task_processing_error\")\nreturn False, new_tasks\n# Check existence of eid\ntry:\nekey_exists = self.db.ekey_exists(task.etype, task.eid)\nexcept DatabaseError as e:\nself.log.error(f\"Task {task.etype}/{task.eid}: DB error: {e}\")\nself.elog.log(\"task_processing_error\")\nreturn False, new_tasks\nnew_entity = not ekey_exists\nif new_entity:\n# Run allow_entity_creation hook\nif not self._task_entity_hooks[task.etype].run_allow_creation(task.eid, task):\nself.log.debug(\nf\"Task {task.etype}/{task.eid}: hooks decided not to create new eid record\"\n)\nreturn False, new_tasks\n# Run on_entity_creation hook\nnew_tasks += self._task_entity_hooks[task.etype].run_on_creation(task.eid, task)\n# Insert into database\ntry:\nself.db.insert_datapoints(task.etype, task.eid, task.data_points, new_entity=new_entity)\nself.log.debug(f\"Task {task.etype}/{task.eid}: All changes written to DB\")\nexcept DatabaseError as e:\nself.log.error(f\"Task {task.etype}/{task.eid}: DB error: {e}\")\nself.elog.log(\"task_processing_error\")\nreturn False, new_tasks\n# Run attribute hooks\nfor dp in task.data_points:\nnew_tasks += self._task_attr_hooks[dp.etype, dp.attr].run_on_new(dp.eid, dp)\n# Log the processed task\nself.elog.log(\"task_processed\")\nfor dp in task.data_points:\nif dp.src:\nself.elog_by_src.log(dp.src)\nif new_entity:\nself.elog.log(\"record_created\")\nself.log.debug(f\"Secondary modules created {len(new_tasks)} new tasks.\")\nreturn new_entity, new_tasks\n
    "},{"location":"reference/task_processing/task_hooks/","title":"task_hooks","text":""},{"location":"reference/task_processing/task_hooks/#dp3.task_processing.task_hooks","title":"dp3.task_processing.task_hooks","text":""},{"location":"reference/task_processing/task_hooks/#dp3.task_processing.task_hooks.TaskGenericHooksContainer","title":"TaskGenericHooksContainer","text":"
    TaskGenericHooksContainer(log: logging.Logger)\n

    Container for generic hooks

    Possible hooks:

    • on_task_start: receives Task, no return value requirements
    Source code in dp3/task_processing/task_hooks.py
    def __init__(self, log: logging.Logger):\nself.log = log.getChild(\"genericHooks\")\nself._on_start = []\n
    "},{"location":"reference/task_processing/task_hooks/#dp3.task_processing.task_hooks.TaskEntityHooksContainer","title":"TaskEntityHooksContainer","text":"
    TaskEntityHooksContainer(entity: str, log: logging.Logger)\n

    Container for entity hooks

    Possible hooks:

    • allow_entity_creation: receives eid and Task, may prevent entity record creation (by returning False)
    • on_entity_creation: receives eid and Task, may return list of DataPointTasks
    Source code in dp3/task_processing/task_hooks.py
    def __init__(self, entity: str, log: logging.Logger):\nself.entity = entity\nself.log = log.getChild(f\"entityHooks.{entity}\")\nself._allow_creation = []\nself._on_creation = []\n
    "},{"location":"reference/task_processing/task_hooks/#dp3.task_processing.task_hooks.TaskAttrHooksContainer","title":"TaskAttrHooksContainer","text":"
    TaskAttrHooksContainer(entity: str, attr: str, attr_type: AttrType, log: logging.Logger)\n

    Container for attribute hooks

    Possible hooks:

    • on_new_plain, on_new_observation, on_new_ts_chunk: receives eid and DataPointBase, may return a list of DataPointTasks
    Source code in dp3/task_processing/task_hooks.py
    def __init__(self, entity: str, attr: str, attr_type: AttrType, log: logging.Logger):\nself.entity = entity\nself.attr = attr\nself.log = log.getChild(f\"attributeHooks.{entity}.{attr}\")\nif attr_type == AttrType.PLAIN:\nself.on_new_hook_type = \"on_new_plain\"\nelif attr_type == AttrType.OBSERVATIONS:\nself.on_new_hook_type = \"on_new_observation\"\nelif attr_type == AttrType.TIMESERIES:\nself.on_new_hook_type = \"on_new_ts_chunk\"\nelse:\nraise ValueError(f\"Invalid attribute type '{attr_type}'\")\nself._on_new = []\n
    "},{"location":"reference/task_processing/task_queue/","title":"task_queue","text":""},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue","title":"dp3.task_processing.task_queue","text":"

    Functions to work with the main task queue (RabbitMQ)

    There are two queues for each worker process:

    • a \"normal\" queue for tasks added by other components, which has a limit of 100 tasks
    • a \"priority\" queue for tasks added by workers themselves, which has no limit, since workers mustn't be stopped by waiting for the queue

    These queues are presented as a single one by this wrapper. The TaskQueueReader first looks into the \"priority\" queue and only if there is no task waiting, it reads the normal one.

    Tasks are distributed to worker processes (and threads) by a hash of the entity which is to be modified. The destination queue is decided by the message source, so each source must know how many worker processes there are.
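
    The sender's routing decision can be sketched as follows (HASH is the hash function used internally by this module; the names are illustrative):

    worker_index = HASH(task.routing_key()) % num_worker_processes
    # the message is then published with worker_index as the routing key,
    # so it ends up in the queue of that worker process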

    Exchange and queues must be declared externally!

    Related configuration keys and their defaults: (should be part of global DP3 config files)

    rabbitmq:\n  host: localhost\n  port: 5672\n  virtual_host: /\n  username: guest\n  password: guest\n\nworker_processes: 1\n

    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.RobustAMQPConnection","title":"RobustAMQPConnection","text":"
    RobustAMQPConnection(rabbit_config: dict = None) -> None\n

    Common TaskQueue wrapper, handles connection to RabbitMQ server with automatic reconnection. TaskQueueWriter and TaskQueueReader are derived from this.

    Parameters:

    Name Type Description Default rabbit_config dict

    RabbitMQ connection parameters, dict with following keys (all optional): host, port, virtual_host, username, password

    None Source code in dp3/task_processing/task_queue.py
    def __init__(self, rabbit_config: dict = None) -> None:\nrabbit_config = {} if rabbit_config is None else rabbit_config\nself.log = logging.getLogger(\"RobustAMQPConnection\")\nself.conn_params = {\n\"hostname\": rabbit_config.get(\"host\", \"localhost\"),\n\"port\": int(rabbit_config.get(\"port\", 5672)),\n\"virtual_host\": rabbit_config.get(\"virtual_host\", \"/\"),\n\"username\": rabbit_config.get(\"username\", \"guest\"),\n\"password\": rabbit_config.get(\"password\", \"guest\"),\n}\nself.connection = None\nself.channel = None\n
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.RobustAMQPConnection.connect","title":"connect","text":"
    connect() -> None\n

    Create a connection (or reconnect after error).

    If connection can't be established, try it again indefinitely.

    Source code in dp3/task_processing/task_queue.py
    def connect(self) -> None:\n\"\"\"Create a connection (or reconnect after error).\n    If connection can't be established, try it again indefinitely.\n    \"\"\"\nif self.connection:\nself.connection.close()\nattempts = 0\nwhile True:\nattempts += 1\ntry:\nself.connection = amqpstorm.Connection(**self.conn_params)\nself.log.debug(\n\"AMQP connection created, server: \"\n\"'{hostname}:{port}/{virtual_host}'\".format_map(self.conn_params)\n)\nif attempts > 1:\n# This was a repeated attempt, print success message with ERROR level\nself.log.error(\"... it's OK now, we're successfully connected!\")\nself.channel = self.connection.channel()\nself.channel.confirm_deliveries()\nself.channel.basic.qos(PREFETCH_COUNT)\nbreak\nexcept amqpstorm.AMQPError as e:\nsleep_time = RECONNECT_DELAYS[min(attempts, len(RECONNECT_DELAYS)) - 1]\nself.log.error(\nf\"RabbitMQ connection error (will try to reconnect in {sleep_time}s): {e}\"\n)\ntime.sleep(sleep_time)\nexcept KeyboardInterrupt:\nbreak\n
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueWriter","title":"TaskQueueWriter","text":"
    TaskQueueWriter(app_name: str, workers: int = 1, rabbit_config: dict = None, exchange: str = None, priority_exchange: str = None, parent_logger: logging.Logger = None) -> None\n

    Bases: RobustAMQPConnection

    Writes tasks into main Task Queue

    Parameters:

    Name Type Description Default app_name str

    DP3 application name (used as prefix for RMQ queues and exchanges)

    required workers int

    Number of worker processes in the system

    1 rabbit_config dict

    RabbitMQ connection parameters, dict with following keys (all optional): host, port, virtual_host, username, password

    None exchange str

    Name of the exchange to write tasks to (default: \"<app-name>-main-task-exchange\")

    None priority_exchange str

    Name of the exchange to write priority tasks to (default: \"<app-name>-priority-task-exchange\")

    None parent_logger logging.Logger

    Logger to inherit prefix from.

    None Source code in dp3/task_processing/task_queue.py
    def __init__(\nself,\napp_name: str,\nworkers: int = 1,\nrabbit_config: dict = None,\nexchange: str = None,\npriority_exchange: str = None,\nparent_logger: logging.Logger = None,\n) -> None:\nrabbit_config = {} if rabbit_config is None else rabbit_config\nassert isinstance(workers, int) and workers >= 1, \"count of workers must be positive number\"\nassert isinstance(exchange, str) or exchange is None, \"exchange argument has to be string!\"\nassert (\nisinstance(priority_exchange, str) or priority_exchange is None\n), \"priority_exchange has to be string\"\nsuper().__init__(rabbit_config)\nif parent_logger is not None:\nself.log = parent_logger.getChild(\"TaskQueueWriter\")\nelse:\nself.log = logging.getLogger(\"TaskQueueWriter\")\nif exchange is None:\nexchange = DEFAULT_EXCHANGE.format(app_name)\nif priority_exchange is None:\npriority_exchange = DEFAULT_PRIORITY_EXCHANGE.format(app_name)\nself.workers = workers\nself.exchange = exchange\nself.exchange_pri = priority_exchange\n
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueWriter.check","title":"check","text":"
    check() -> bool\n

    Check that needed exchanges are declared, return True or raise RuntimeError.

    If needed exchanges are not declared, reconnect and try again. (max 5 times)

    Source code in dp3/task_processing/task_queue.py
    def check(self) -> bool:\n\"\"\"\n    Check that needed exchanges are declared, return True or raise RuntimeError.\n    If needed exchanges are not declared, reconnect and try again. (max 5 times)\n    \"\"\"\nfor attempt, sleep_time in enumerate(RECONNECT_DELAYS):\nif self.check_exchange_existence(self.exchange) and self.check_exchange_existence(\nself.exchange_pri\n):\nreturn True\nself.log.warning(\n\"RabbitMQ exchange configuration doesn't match (attempt %d of %d, retrying in %ds)\",\nattempt + 1,\nlen(RECONNECT_DELAYS),\nsleep_time,\n)\ntime.sleep(sleep_time)\nself.disconnect()\nself.connect()\nif not self.check_exchange_existence(self.exchange):\nraise ExchangeNotDeclared(self.exchange)\nif not self.check_exchange_existence(self.exchange_pri):\nraise ExchangeNotDeclared(self.exchange_pri)\nreturn True\n
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueWriter.broadcast_task","title":"broadcast_task","text":"
    broadcast_task(task: Task, priority: bool = False) -> None\n

    Broadcast task to all workers

    Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| task | Task | prepared task | required |
| priority | bool | if true, the task is placed into priority queue (should only be used internally by workers) | False |

Source code in dp3/task_processing/task_queue.py
def broadcast_task(self, task: Task, priority: bool = False) -> None:
    """
    Broadcast task to all workers

    Args:
        task: prepared task
        priority: if true, the task is placed into priority queue
            (should only be used internally by workers)
    """
    if not self.channel:
        self.connect()
    self.log.debug(f"Received new broadcast task: {task}")
    body = task.as_message()
    exchange = self.exchange_pri if priority else self.exchange
    for routing_key in range(self.workers):
        self._send_message(routing_key, exchange, body)
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueWriter.put_task","title":"put_task","text":"
    put_task(task: Task, priority: bool = False) -> None\n

Put a task (update_request) into the queue of the corresponding worker

    Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| task | Task | prepared task | required |
| priority | bool | if true, the task is placed into priority queue (should only be used internally by workers) | False |

Source code in dp3/task_processing/task_queue.py
def put_task(self, task: Task, priority: bool = False) -> None:
    """
    Put task (update_request) to the queue of corresponding worker

    Args:
        task: prepared task
        priority: if true, the task is placed into priority queue
            (should only be used internally by workers)
    """
    if not self.channel:
        self.connect()
    self.log.debug(f"Received new task: {task}")
    # Prepare routing key
    body = task.as_message()
    key = task.routing_key()
    routing_key = HASH(key) % self.workers  # index of the worker to send the task to
    exchange = self.exchange_pri if priority else self.exchange
    self._send_message(routing_key, exchange, body)
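To illustrate just the routing step: the worker index is derived by hashing the task's routing key and taking the remainder modulo the number of workers, so tasks for the same entity always land on the same worker. A self-contained sketch of that calculation follows; the key shown is illustrative (the real key format comes from Task.routing_key()) and the hash is the same construction as the HASH helper documented further below.

```python
import hashlib

def worker_hash(key: str) -> int:
    # integer built from the last 4 hex digits of the MD5 digest (0-65535)
    return int(hashlib.md5(key.encode("utf8")).hexdigest()[-4:], 16)

workers = 4
key = "ip/192.168.0.1"                 # illustrative routing key
worker_index = worker_hash(key) % workers
print(worker_index)                    # same key -> same worker, every time
```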
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader","title":"TaskQueueReader","text":"
    TaskQueueReader(callback: Callable, parse_task: Callable[[str], Task], app_name: str, worker_index: int = 0, rabbit_config: dict = None, queue: str = None, priority_queue: str = None, parent_logger: logging.Logger = None) -> None\n

    Bases: RobustAMQPConnection

TaskQueueReader consumes messages from two RabbitMQ queues (a normal one and a priority one for the given worker) and passes them to the given callback function.

    Tasks from the priority queue are passed before the normal ones.

    Each received message must be acknowledged by calling .ack(msg_tag).

    Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| callback | Callable | Function called when a message is received, prototype: func(tag, Task) | required |
| parse_task | Callable[[str], Task] | Function called to parse message body into a task, prototype: func(body) -> Task | required |
| app_name | str | DP3 application name (used as prefix for RMQ queues and exchanges) | required |
| worker_index | int | Index of this worker (filled into DEFAULT_QUEUE string using .format() method) | 0 |
| rabbit_config | dict | RabbitMQ connection parameters, dict with the following keys (all optional): host, port, virtual_host, username, password | None |
| queue | str | Name of RabbitMQ queue to read from (default: "<app-name>-worker-<index>") | None |
| priority_queue | str | Name of RabbitMQ queue to read from (priority messages) (default: "<app-name>-worker-<index>-pri") | None |
| parent_logger | logging.Logger | Logger to inherit prefix from. | None |

Source code in dp3/task_processing/task_queue.py
def __init__(
    self,
    callback: Callable,
    parse_task: Callable[[str], Task],
    app_name: str,
    worker_index: int = 0,
    rabbit_config: dict = None,
    queue: str = None,
    priority_queue: str = None,
    parent_logger: logging.Logger = None,
) -> None:
    rabbit_config = {} if rabbit_config is None else rabbit_config
    assert callable(callback), "callback must be callable object"
    assert (
        isinstance(worker_index, int) and worker_index >= 0
    ), "worker_index must be positive number"
    assert isinstance(queue, str) or queue is None, "queue must be string"
    assert (
        isinstance(priority_queue, str) or priority_queue is None
    ), "priority_queue must be string"
    super().__init__(rabbit_config)
    if parent_logger is not None:
        self.log = parent_logger.getChild("TaskQueueReader")
    else:
        self.log = logging.getLogger("TaskQueueReader")
    self.callback = callback
    self.parse_task = parse_task
    if queue is None:
        queue = DEFAULT_QUEUE.format(app_name, worker_index)
    if priority_queue is None:
        priority_queue = DEFAULT_PRIORITY_QUEUE.format(app_name, worker_index)
    self.queue_name = queue
    self.priority_queue_name = priority_queue
    self.running = False
    self._consuming_thread = None
    self._processing_thread = None
    # Receive messages into 2 temporary queues
    # (max length should be equal to prefetch_count set in RabbitMQReader)
    self.cache = collections.deque()
    self.cache_pri = collections.deque()
    self.cache_full = threading.Event()  # signalize there's something in the cache
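A minimal consumer sketch follows. The callback signature (tag, task) and the ack() call are taken from the documentation above; the parse_task body, application name and connection parameters are placeholders for whatever the application provides.

```python
from dp3.task_processing.task_queue import TaskQueueReader

def parse_task(body: str):
    ...  # placeholder: convert the message body into a Task (application-specific)

def on_task(tag, task):
    # process the task, then acknowledge it so RabbitMQ can discard the message
    print("received:", task)
    reader.ack(tag)

reader = TaskQueueReader(
    callback=on_task,
    parse_task=parse_task,
    app_name="my_app",                   # placeholder application name
    worker_index=0,
    rabbit_config={"host": "localhost"}, # placeholder connection parameters
)
reader.connect()
reader.check()   # verify that the worker queues are declared
reader.start()   # spawns the consuming and processing threads
...              # run until shutdown is requested
reader.stop()
```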
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader.start","title":"start","text":"
    start() -> None\n

    Start receiving tasks.

    Source code in dp3/task_processing/task_queue.py
def start(self) -> None:
    """Start receiving tasks."""
    if self.running:
        raise RuntimeError("Already running")
    if not self.connection:
        self.connect()
    self.log.info("Starting TaskQueueReader")
    # Start thread for message consuming from server
    self._consuming_thread = threading.Thread(None, self._consuming_thread_func)
    self._consuming_thread.start()
    # Start thread for message processing and passing to user's callback
    self.running = True
    self._processing_thread = threading.Thread(None, self._msg_processing_thread_func)
    self._processing_thread.start()
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader.stop","title":"stop","text":"
    stop() -> None\n

    Stop receiving tasks.

    Source code in dp3/task_processing/task_queue.py
def stop(self) -> None:
    """Stop receiving tasks."""
    if not self.running:
        raise RuntimeError("Not running")
    self._stop_consuming_thread()
    self._stop_processing_thread()
    self.log.info("TaskQueueReader stopped")
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader.check","title":"check","text":"
    check() -> bool\n

Check that the needed queues are declared; return True on success or raise RuntimeError.

If the queues are not declared yet, reconnect and try again (at most 5 times).

    Source code in dp3/task_processing/task_queue.py
def check(self) -> bool:
    """
    Check that needed queues are declared, return True or raise RuntimeError.

    If needed queues are not declared, reconnect and try again. (max 5 times)
    """
    for attempt, sleep_time in enumerate(RECONNECT_DELAYS):
        if self.check_queue_existence(self.queue_name) and self.check_queue_existence(
            self.priority_queue_name
        ):
            return True
        self.log.warning(
            "RabbitMQ queue configuration doesn't match (attempt %d of %d, retrying in %ds)",
            attempt + 1,
            len(RECONNECT_DELAYS),
            sleep_time,
        )
        time.sleep(sleep_time)
        self.disconnect()
        self.connect()
    if not self.check_queue_existence(self.queue_name):
        raise QueueNotDeclared(self.queue_name)
    if not self.check_queue_existence(self.priority_queue_name):
        raise QueueNotDeclared(self.priority_queue_name)
    return True
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.TaskQueueReader.ack","title":"ack","text":"
    ack(msg_tag: Any)\n

    Acknowledge processing of the message/task

    Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| msg_tag | Any | Message tag received as the first param of the callback function. | required |

Source code in dp3/task_processing/task_queue.py
def ack(self, msg_tag: Any):
    """Acknowledge processing of the message/task

    Args:
        msg_tag: Message tag received as the first param of the callback function.
    """
    self.channel.basic.ack(delivery_tag=msg_tag)
    "},{"location":"reference/task_processing/task_queue/#dp3.task_processing.task_queue.HASH","title":"HASH","text":"
    HASH(key: str) -> int\n

    Hash function used to distribute tasks to worker processes.

    Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| key | str | string to be hashed | required |

Returns:

| Type | Description |
| ---- | ----------- |
| int | integer built from the last 4 hex digits of the MD5 hash |

    Source code in dp3/task_processing/task_queue.py
def HASH(key: str) -> int:
    """Hash function used to distribute tasks to worker processes.

    Args:
        key: to be hashed

    Returns:
        last 4 bytes of MD5
    """
    return int(hashlib.md5(key.encode("utf8")).hexdigest()[-4:], 16)
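As a quick sanity check of the value range: the function returns an integer built from the last 4 hex digits of the MD5 digest, i.e. a 16-bit value in 0-65535, which the writer then reduces modulo the number of workers. A small sketch, assuming the module is importable as shown and using an illustrative key:

```python
from dp3.task_processing.task_queue import HASH

h = HASH("ip/192.168.0.1")  # illustrative routing key
assert 0 <= h <= 0xFFFF     # 4 hex digits -> 16-bit value
print(h % 3)                # e.g. distribute among 3 workers
```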
    "}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 00000000..741ec82d --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,303 @@ + + + + https://cesnet.github.io/dp3/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/api/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/architecture/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/data_model/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/extending/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/install/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/modules/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/configuration/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/configuration/database/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/configuration/db_entities/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/configuration/event_logging/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/configuration/history_manager/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/configuration/modules/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/configuration/processing_core/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/configuration/snapshots/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/worker/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/main/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/internal/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/internal/config/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/internal/dp_logger/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/internal/entity_response_models/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/internal/helpers/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/internal/models/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/internal/response_models/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/routers/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/routers/control/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/routers/entity/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/api/routers/root/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/bin/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/bin/api/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/bin/setup/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/bin/worker/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/attrspec/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/base_attrs/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/base_module/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/callback_registrar/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/config/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/control/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/datapoint/ + 2023-07-14 + daily + + + 
https://cesnet.github.io/dp3/reference/common/datatype/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/entityspec/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/scheduler/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/task/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/common/utils/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/database/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/database/database/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/history_management/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/history_management/history_manager/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/history_management/telemetry/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/snapshots/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/snapshots/snapshooter/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/snapshots/snapshot_hooks/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/task_processing/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/task_processing/task_distributor/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/task_processing/task_executor/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/task_processing/task_hooks/ + 2023-07-14 + daily + + + https://cesnet.github.io/dp3/reference/task_processing/task_queue/ + 2023-07-14 + daily + + \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz new file mode 100644 index 00000000..f10210f0 Binary files /dev/null and b/sitemap.xml.gz differ diff --git a/stylesheets/slate.css b/stylesheets/slate.css new file mode 100644 index 00000000..d064112a --- /dev/null +++ b/stylesheets/slate.css @@ -0,0 +1,4 @@ +[data-md-color-scheme="slate"] { + --md-default-bg-color: #252632; + --md-code-bg-color: #1A1B23; +}