Coding a Driver for DaTtSs
The driver plays a very important role in DaTtSs: it pre-aggregates data to avoid overloading your network and to leverage HTTP reliability. Periodically, every push-period (5s by default), a driver sends the pre-aggregated values it has computed. These pre-aggregated values are called partial-aggregates. They are sent to DaTtSs through the Aggregation API on the PUT /agg endpoint (see DaTtSs API Documentation), where they are indexed and made accessible later on through the DaTtSs website or any other client based on DaTtSs' Access API.
This page describes how a driver works, which interface it should comply with, and how it is expected to behave, so that you can write your own driver and make DaTtSs available for your favorite language or environment. [Note: A driver creator and active maintainer is of course given free and unlimited access to DaTtSs! A simple way for us to thank you for your work :)]
If you make a driver and feel proud of it, we want to hear about it! Feel free to make a pull-request on dattss-sdk so that it can be featured there! Feel free to draw inspiration from the drivers already featured there, and don't hesitate to reach out through email [email protected] or IRC at #dattss on irc.freenode.org.
First of all a driver should provide a configuration interface so that users can globally configure the following variables:
# DEFAULT CONFIGURATION
DATTSS_AUTH_KEY=DUMMY // Global AUTH_KEY to use (by default DUMMY)
DATTSS_PUSH_PERIOD=5 // Default push period (in s)
DATTSS_PERCENTILE=0.1 // Default percentile value
DATTSS_SERVER_HOST=agg.dattss.com // Default DaTtSs server
DATTSS_SERVER_PORT=80 // Default DaTtSs port
The specific interface you provide for configuration probably depends heavily on the environment / language you are working with. Nevertheless, it is not acceptable to ask users to edit your code to set their AUTH_KEY or servers. The configuration interface must be programmatic: constructor arguments, shared or specific config files, and environment variables are all viable solutions.
As an example, the NodeJS driver reads these variables from the environment as well as from the command-line arguments of the process it is bundled in.
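One possible way to layer these configuration sources is sketched below. The `loadConfig` helper and its precedence order (defaults, then environment, then explicit overrides) are assumptions made for illustration, not the behavior of any existing driver.

```javascript
// Default values taken from the configuration table above.
const DEFAULTS = {
  DATTSS_AUTH_KEY: 'DUMMY',
  DATTSS_PUSH_PERIOD: 5,               // seconds
  DATTSS_PERCENTILE: 0.1,
  DATTSS_SERVER_HOST: 'agg.dattss.com',
  DATTSS_SERVER_PORT: 80
};

// Hypothetical helper: later sources win (defaults < env < overrides).
function loadConfig(env, overrides) {
  env = env || {};
  overrides = overrides || {};
  const config = {};
  for (const key of Object.keys(DEFAULTS)) {
    let raw = DEFAULTS[key];
    if (env[key] !== undefined) raw = env[key];
    if (overrides[key] !== undefined) raw = overrides[key];
    // Environment values are strings: coerce to the type of the default.
    config[key] = typeof DEFAULTS[key] === 'number' ? Number(raw) : String(raw);
  }
  return config;
}

// Usage: environment first, then constructor-style overrides.
const config = loadConfig(process.env, { DATTSS_PUSH_PERIOD: 10 });
```

With this layout the user never edits driver code: every variable can be set from the environment or passed in by the host program.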
DaTtSs statistics are organised by processes. A process is a logical group of statistics that are displayed together in DaTtSs and accessible as a group through the Access API. A driver should provide a process method taking a process name as its single mandatory argument. The process method should also accept two optional arguments: auth and pct. The auth argument lets the user override the globally configured AUTH_KEY, and the pct argument lets the user override the globally configured percentile value.
/* types exposed by the driver */
class dattss // the driver type
class process // the process type
/* `process` method signature */
process dattss.process(
string name // the compulsory process name
[,string auth] // the optional `AUTH_KEY` for that process
[,float pct] // the optional percentile value to use
)
The process method should follow a singleton pattern, meaning that for a given {name, auth} tuple, the same process object should always be returned. This is important as it prevents the user from generating several process objects with the same AUTH_KEY and name that would eventually interfere with each other.
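The singleton lookup could be sketched as follows. The `Dattss` and `Process` classes are hypothetical stand-ins for the driver and process types, and keeping the `pct` of the first call for a given tuple is an assumption, since this page does not say what should happen when a later call passes a different value.

```javascript
// Hypothetical stand-in for the driver's process type.
class Process {
  constructor(name, auth, pct) {
    this.name = name;
    this.auth = auth;
    this.pct = pct;
  }
}

// Hypothetical stand-in for the driver type.
class Dattss {
  constructor(config) {
    this.config = config;        // globally configured values
    this.processes = new Map();  // registry keyed by {name, auth}
  }

  process(name, auth, pct) {
    if (typeof name !== 'string' || name.length === 0) {
      throw new Error('process name is required');
    }
    const key = auth === undefined ? this.config.DATTSS_AUTH_KEY : auth;
    // JSON-encode the tuple so that distinct tuples never collide.
    const id = JSON.stringify([name, key]);
    if (!this.processes.has(id)) {
      const p = pct === undefined ? this.config.DATTSS_PERCENTILE : pct;
      this.processes.set(id, new Process(name, key, p));
    }
    return this.processes.get(id);
  }
}

const dattss = new Dattss({ DATTSS_AUTH_KEY: 'DUMMY', DATTSS_PERCENTILE: 0.1 });
const web = dattss.process('web');
```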
The process object should expose these three public methods, whose behavior is described in the next sections:
/* `agg` method signature */
void process.agg(
string stat, // the stat name
string value // the stat value string
)
/* `start` method signature */
void process.start()
/* `stop` method signature */
void process.stop()
DaTtSs supports the capture of three types of statistics: counters, gauges and timers. Each type serves a specific purpose and is best used in specific situations. Additionally, DaTtSs displays each of these aggregated statistics differently. A statistic is captured through its value string, which must have the following format:
(-?[0-9]+)(c|g|ms)(\!?)
The first regular expression capture is the value captured, the second is the type of the statistic, and the last one is whether that statistic should be emphasized in the DaTtSs UI. The following three paragraphs describe each statistic type in greater detail and how it is rendered on DaTtSs:
A counter is used to capture an incrementally varying value (or an accumulator) that generally keeps growing, such as a number of IOs, a number of requests served, a number of writes to disk or database, etc. The value increment is captured using the c type character:
/* example capture of a counter increment [javascript] */
dts.agg('write', '1c');
dts.agg('query', '2c!'); // increments by 2
- current value display: DaTtSs displays the current value of a counter as well as the 1mn moving average of its increment per second (by how much the counter grew per second over the last minute)
- plot display: When plotting counters, DaTtSs plots the sum of the increments received over the resolution period of the plot (as traffic is plotted on web analytics platforms)
A gauge is used to snapshot a value that generally does not keep growing and/or does not evolve incrementally, such as the process rss memory, a cache size, a number of open connections, etc. The value itself is captured using the g type character:
/* example capture of a gauge value [javascript] */
dts.agg('cache', '459g');
- current value display: DaTtSs displays the last value of a gauge as well as its 1mn moving-average
- plot display: When plotting gauges, DaTtSs plots the average value of the gauge over the resolution period of the plot
A timer is used to capture the time it takes to perform an action, such as a disk I/O, a call to a third-party system or API, a computation, etc... The time the action took is captured using the ms
type characters:
/* example capture of a timer [javascript] */
dts.agg('query', '45ms'); // that was quite fast
dts.agg('hdd_read', '7ms!');
- current value display: DaTtSs displays the 1mn moving average of the received timer values as well as the 1mn moving maximum and minimum values
- plot display: When plotting timers, DaTtSs plots the average value of the timers received, the maximum, the minimum, the 10% (configurable) percentile and the 90% percentile over the resolution period of the plot
A driver should provide a single method, agg, for data capture. As shown in the signature below, the agg method takes two arguments, stat and value. The stat argument is the name of the statistic being captured, while value is a value string. The agg method should verify that the value string has an acceptable format.
/* `agg` method signature */
void process.agg(
string stat, // the stat name
string value // the stat value string
)
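The validation step can be sketched with the regular expression given earlier on this page. The `parseValue` helper is hypothetical, and anchoring the expression with `^` and `$` so that surrounding characters are rejected is an assumption about what "acceptable format" means.

```javascript
// Value string format from this page, anchored on both ends.
const VALUE_RE = /^(-?[0-9]+)(c|g|ms)(!?)$/;

// Hypothetical helper: returns the parsed pieces, or null when the
// value string does not match the expected format.
function parseValue(value) {
  const m = VALUE_RE.exec(String(value));
  if (!m) return null;
  return {
    value: parseInt(m[1], 10),  // captured integer value
    type: m[2],                 // 'c', 'g' or 'ms'
    emphasis: m[3] === '!'      // trailing '!' requests emphasis
  };
}

const parsed = parseValue('2c!');
```

An `agg` implementation would simply drop the call when `parseValue` returns null.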
After receiving and validating a statistic value, a driver should generate a tuple containing the value, the date the value was captured (in ms since epoch) and whether it is emphasized.
{ value: 123,
date: 1343059274178,
emphasis: false }
That tuple should be immediately pushed to an accumulator array for the given statistic name and type, for later partial-aggregate computation.
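A minimal sketch of that accumulation step is shown below, assuming the value string has already been validated and split into its parts. The `record` helper, the `Map`-based layout and the optional `now` argument (handy for tests) are illustrative choices, not part of the driver interface.

```javascript
// accumulator[type] maps a statistic name to its array of captured tuples.
const accumulator = { c: new Map(), g: new Map(), ms: new Map() };

// Hypothetical helper called by `agg` once the value string is validated.
function record(type, name, value, emphasis, now) {
  const byName = accumulator[type];
  if (!byName.has(name)) byName.set(name, []);
  const tuple = {
    value: value,
    date: now === undefined ? Date.now() : now,  // ms since epoch
    emphasis: emphasis
  };
  byName.get(name).push(tuple);
  return tuple;
}

record('ms', 'view', 123, false, 1343059274178);
```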
The entire DaTtSs infrastructure is based on the concept of partial-aggregates. A partial-aggregate is a set of aggregated values relative to a statistic that represents that statistic over an arbitrary period of time and that is itself linearly summable to represent larger periods of time. A partial-aggregate is generated every push-period (5s) by a DaTtSs driver. These partial-aggregates are pushed to DaTtSs servers, where they are re-aggregated and stored for later reuse. A partial-aggregate must have the following JSON representation:
/* PARTIAL := */ {
typ: 'ms', // the statistic type 'c'|'g'|'ms'
nam: 'view', // the statistic name
pct: 0.1, // the percentage used for percentile calculation
sum: 123149, // the sum of all received values during the push-period
cnt: 9874, // the number of values received during the push-period
max: 123, // the maximal value received during the push-period
min: 4, // the minimal value received during the push-period
lst: 15, // the last value received during the push-period
fst: 12, // the first value received during the push-period
bot: 11, // the (pct)-th percentile
top: 15, // the (1-pct)-th percentile
emp: false // should the stat be visually emphasized
};
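To illustrate what "linearly summable" can mean in practice, the sketch below combines two partial-aggregates of the same statistic covering consecutive periods. The `mergePartials` helper is hypothetical: this page does not describe how DaTtSs servers actually re-aggregate, and in particular it does not say how `bot` and `top` are recombined, so the percentile fields are left out.

```javascript
// Hypothetical helper: `a` covers the earlier period, `b` the later one.
function mergePartials(a, b) {
  if (a.typ !== b.typ || a.nam !== b.nam) {
    throw new Error('cannot merge partial-aggregates of different statistics');
  }
  return {
    typ: a.typ,
    nam: a.nam,
    sum: a.sum + b.sum,            // sums add up
    cnt: a.cnt + b.cnt,            // counts add up
    max: Math.max(a.max, b.max),
    min: Math.min(a.min, b.min),
    fst: a.fst,                    // first value of the earlier period
    lst: b.lst,                    // last value of the later period
    emp: a.emp || b.emp
    // pct, bot and top are omitted: recombination is not specified here
  };
}

const merged = mergePartials(
  { typ: 'ms', nam: 'view', sum: 30, cnt: 3, max: 15, min: 5, fst: 10, lst: 5, emp: false },
  { typ: 'ms', nam: 'view', sum: 12, cnt: 2, max: 8, min: 4, fst: 8, lst: 4, emp: true }
);
```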
A partial-aggregate is fairly easy to generate given the accumulator array built in the previous section for each statistic name and type. The pct value is given by the current configuration of the driver, while the typ and nam values are naturally given by the statistic name and type being computed.
PCT := DRIVER_CONFIG(DATTSS_PERCENTILE)
TYP := TYPE
NAM := NAME
The sum, cnt, min, max, fst, lst and emp values are calculated by looping over the accumulated values and applying the following aggregation rules.
/* initialization */
SUM := 0
CNT := 0
MAX := null
MIN := null
FST := null
LST := null
EMP := false
/* accumulation */
for ACC in ACCUMULATOR[TYPE][NAME]:
  SUM := SUM + ACC.value
  CNT := CNT + 1
  /* explicit null checks so that a legitimate value of 0 is not mistaken for "unset" */
  MAX := (MAX == null || ACC.value > MAX) ? ACC.value : MAX
  MIN := (MIN == null || ACC.value < MIN) ? ACC.value : MIN
  FST := (FST == null) ? ACC.value : FST
  LST := ACC.value
  EMP := EMP || ACC.emphasis
Finally, the bot and top values are calculated as percentiles by ordering the array of accumulated values and extracting the corresponding values.
SORT_BY_VALUE(ACCUMULATOR[TYPE][NAME])
LEN := LENGTH(ACCUMULATOR[TYPE][NAME])
BIDX := MAX(MIN(CEIL(PCT * LEN), LEN-1), 0)
TIDX := MAX(MIN(ROUND((1.0 - PCT) * LEN), LEN-1), 0)
BOT := ACCUMULATOR[TYPE][NAME][BIDX].value
TOP := ACCUMULATOR[TYPE][NAME][TIDX].value
This algorithm is not valid if no value is present in the accumulator array. In that case the statistic is considered inactive and nothing should be done by the driver.
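The aggregation rules and percentile extraction above can be put together as in the sketch below. The `computePartial` function is a hypothetical helper that follows the pseudocode on this page, with two deliberate choices of its own: it returns null for an inactive statistic, and it sorts a copy of the values so that the arrival order used for `fst` and `lst` is left untouched.

```javascript
// values: array of { value, date, emphasis } tuples in arrival order.
function computePartial(typ, nam, values, pct) {
  if (values.length === 0) return null;  // inactive statistic: nothing to do
  let sum = 0, max = null, min = null, emp = false;
  for (const acc of values) {
    sum += acc.value;
    if (max === null || acc.value > max) max = acc.value;
    if (min === null || acc.value < min) min = acc.value;
    emp = emp || acc.emphasis;
  }
  const fst = values[0].value;
  const lst = values[values.length - 1].value;
  // Numeric sort on a copy; the default sort would compare as strings.
  const sorted = values.map((acc) => acc.value).sort((x, y) => x - y);
  const len = sorted.length;
  // Index computation as given in the pseudocode above.
  const bidx = Math.max(Math.min(Math.ceil(pct * len), len - 1), 0);
  const tidx = Math.max(Math.min(Math.round((1.0 - pct) * len), len - 1), 0);
  return {
    typ, nam, pct, sum, cnt: len, max, min, lst, fst,
    bot: sorted[bidx], top: sorted[tidx], emp
  };
}

const sample = [4, 0, 8, 2].map((v) => ({ value: v, date: 0, emphasis: false }));
const partial = computePartial('ms', 'view', sample, 0.25);
```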
Every push-period (5s), a driver should compute the current partial-aggregate (JSON format) from the content of the accumulator for each active statistic name and type, and push it to the DaTtSs servers through the PUT /agg endpoint. Please refer to DaTtSs API Documentation for a complete description of the Aggregation API. Once the partial-aggregates are pushed, all the accumulator arrays should be emptied.
- If an error occurs during the computation of the partial-aggregate or the commit to DaTtSs servers, the driver is expected to fail silently.
- A driver is expected not to commit anything if a statistic has been inactive during the last push-period.
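One push cycle could be sketched as below. The `flush` function and its `compute` and `send` callbacks are hypothetical: `send` stands for whatever performs the PUT /agg request, whose exact payload is described in the DaTtSs API Documentation rather than here. Emptying the accumulators even after a failure is an assumption made to keep memory bounded.

```javascript
// accumulators: Map from a statistic key to its array of captured tuples.
// compute(key, values): builds the partial-aggregate for one statistic.
// send(partials): hypothetical transport performing the PUT /agg request.
function flush(accumulators, compute, send) {
  const partials = [];
  try {
    for (const [key, values] of accumulators) {
      if (values.length === 0) continue;  // inactive: commit nothing
      partials.push(compute(key, values));
    }
    if (partials.length > 0) {
      const result = send(partials);
      // If the transport is asynchronous, swallow its rejection as well.
      if (result && typeof result.catch === 'function') result.catch(() => {});
    }
  } catch (err) {
    // Fail silently, as required of a driver.
  } finally {
    // Assumption: accumulators are emptied whether or not the push succeeded.
    for (const values of accumulators.values()) values.length = 0;
  }
  return partials.length;
}
```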
Since the driver periodically performs an aggregation calculation and a push to the server, it also provides a start and stop interface so that the client software can disable the driver when needed.
/* `start` method signature */
void process.start()
/* `stop` method signature */
void process.stop()
When a driver is stopped, all periodic activity should be disabled and calls to the agg interface should be ignored. Starting the driver again should return it to its normal state, restarting the periodic partial aggregation and push to the server.
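The start and stop behavior can be sketched with a plain interval timer. The `SketchProcess` class is hypothetical, and so are its choices: it begins in the stopped state, it keeps values captured before `stop()`, and it calls `unref()` on the timer so that a running driver does not keep a NodeJS process alive on its own.

```javascript
// Hypothetical process object showing only the start/stop mechanics.
class SketchProcess {
  constructor(periodSeconds, flush) {
    this.periodMs = periodSeconds * 1000;
    this.flush = flush;    // called with the values captured in the period
    this.timer = null;     // null means "stopped"
    this.pending = [];
  }

  start() {
    if (this.timer !== null) return;  // already running
    this.timer = setInterval(() => {
      try {
        this.flush(this.pending.splice(0));
      } catch (err) {
        // fail silently
      }
    }, this.periodMs);
    if (typeof this.timer.unref === 'function') this.timer.unref();
  }

  stop() {
    if (this.timer === null) return;  // already stopped
    clearInterval(this.timer);
    this.timer = null;
  }

  agg(stat, value) {
    if (this.timer === null) return;  // ignored while stopped
    this.pending.push({ stat: stat, value: value });
  }
}
```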
You should have all the information you need to code your own driver or relay server for DaTtSs! Please do not hesitate to come chat on #dattss on freenode or reach us by email at [email protected]!