Coding a Driver for DaTtSs

The driver plays a very important role in DaTtSs as it pre-aggregates data to avoid overloading your network and leverages HTTP reliability. A driver periodically sends the pre-aggregated values it has received, every push-period (by default 5s). These pre-aggregated values are called partial-aggregates. They are sent to DaTtSs through the Aggregation API on the PUT /agg endpoint (see DaTtSs API Documentation), where they are indexed and accessible later on through the DaTtSs website or any other client based on DaTtSs' Access API.

This page describes how a driver works, which interface it should comply to and how it is expected to behave, so that you can write your own driver and make DaTtSs available for your favorite language or environment. [Note: A driver creator and active maintainer is of course given free and unlimited access to DaTtSs! A simple way for us to thank you for your work :)]

If you make a driver and feel proud about it, we want to hear about it! Feel free to make a pull-request on dattss-sdk so that it can be featured there! Feel free to draw inspiration from the drivers already featured there, and don't hesitate to reach out by email at [email protected] or on IRC at #dattss on irc.freenode.org.

Driver Configuration

First of all a driver should provide a configuration interface so that users can globally configure the following variables:

# DEFAULT CONFIGURATION

DATTSS_AUTH_KEY=DUMMY               // Global AUTH_KEY to use (by default DUMMY)
DATTSS_PUSH_PERIOD=5                // Default push period (in s)
DATTSS_PERCENTILE=0.1               // Default percentile value

DATTSS_SERVER_HOST=agg.dattss.com   // Default DaTtSs server 
DATTSS_SERVER_PORT=80               // Default DaTtSs port 

The specific interface you provide for configuration is probably highly dependent on the environment / language you are working with. Nevertheless, it is not acceptable to ask the user to edit your code to set their AUTH_KEY or servers. The configuration interface must be programmatic: constructor arguments, shared or specific config files, and environment variables are all viable solutions.

As an example, the NodeJS driver reads these variables from the environment as well as from the command-line arguments of the process it is bundled in.
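
As a further illustration, a JavaScript driver could read these variables from the environment and fall back on the defaults listed above. This is only a sketch; the config variable name is an assumption, not part of any existing driver:

/* sketch: reading the global configuration from the environment [javascript] */
var config = {
  auth_key:    process.env['DATTSS_AUTH_KEY'] || 'DUMMY',
  push_period: parseInt(process.env['DATTSS_PUSH_PERIOD'], 10) || 5,   // in s
  percentile:  parseFloat(process.env['DATTSS_PERCENTILE']) || 0.1,
  server_host: process.env['DATTSS_SERVER_HOST'] || 'agg.dattss.com',
  server_port: parseInt(process.env['DATTSS_SERVER_PORT'], 10) || 80
};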

The Process Singleton

DaTtSs statistics are organised by processes. A process is a logical group of statistics that are displayed together in DaTtSs and accessible as a group through the Access API. A driver should provide a process method taking a process name as its single non-optional argument. The process method should accept 2 additional optional arguments: auth and pct. The auth argument lets the user override the globally configured AUTH_KEY, and the pct argument the globally configured percentile value.

/* types exposed by the driver */

class dattss    // the driver type
class process   // the process type

/* `process` method signature */

process dattss.process(
                  string name         // the compulsory process name
                [,string auth]        // the optional `AUTH_KEY` for that process
                [,float pct]          // the optional percentile value to use
             )

The process method should follow a singleton pattern, meaning that for a given {name, auth} tuple, the same process object should always be returned. This is important as it prevents the user from generating a number of process objects with the same AUTH_KEY and name that would eventually interfere with each other.
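
As a sketch only, the singleton pattern could look as follows in JavaScript. The cache object, the Process constructor and the config variable are hypothetical names carried over from the previous sketch, not part of any existing driver:

/* sketch: `process` singleton keyed on {name, auth} [javascript] */
var cache = {};

dattss.prototype.process = function(name, auth, pct) {
  auth = auth || config.auth_key;            // fall back on the global AUTH_KEY
  pct = pct || config.percentile;            // fall back on the global percentile
  var key = auth + ':' + name;
  if (!cache[key]) {
    cache[key] = new Process(name, auth, pct);   // hypothetical process constructor
  }
  return cache[key];                         // same object for a given {name, auth}
};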

The process object should expose these three public methods, whose behavior is described in the next sections:

/* `agg` method signature */

void process.agg(
               string stat,           // the stat name
               string value           // the stat value string
            )


/* `start` method signature */

void process.start()


/* `stop` method signature */

void process.stop()

Statistic Capture Interfaces

DaTtSs supports the capture of three types of statistics: counters, gauges and timers. Each type serves a specific purpose and is best used in specific situations. Additionally, DaTtSs displays each of these aggregated statistics differently. A statistic is captured through its value string, which must have the following format:

(-?[0-9]+)(c|g|ms)(\!?)

The first regular expression capture is the captured value, the second is the type of the statistic and the last one is whether that statistic should be emphasized in the DaTtSs UI. The three following paragraphs describe each statistic in greater detail and how it is rendered on DaTtSs:

Counters

A counter is used to capture an incrementally varying value (or an accumulator) that generally keeps growing, such as a number of IOs, a number of requests served, a number of writes to disk or database, etc... The value increment is captured using the c type character:

/* example capture of a counter increment [javascript] */
dts.agg('write', '1c');
dts.agg('query', '2c!');   // increments by 2
  • current value display: DaTtSs displays the current value of a counter as well as the increment/s 1mn moving-average (by how much the counter grew per second in the last minute)
  • plot display: When plotting counters, DaTtSs plots the sum of the increments received over the resolution period of the plot (as traffic is plotted on web analytics platforms)

Gauges

A gauge is used to snapshot a value that generally does not keep growing and/or does not evolve incrementally, such as the process rss memory, a cache size, a number of open connections, etc... The value itself is captured using the g type character:

/* example capture of a gauge value [javascript] */
dts.agg('cache', '459g');
  • current value display: DaTtSs displays the last value of a gauge as well as its 1mn moving-average
  • plot display: When plotting gauges, DaTtSs plots the average value of the gauge over the resolution period of the plot

Timers

A timer is used to capture the time it takes to perform an action, such as a disk I/O, a call to a third-party system or API, a computation, etc... The time the action took is captured using the ms type characters:

/* example capture of a timer [javascript] */
dts.agg('query', '45ms');     // that was quite fast
dts.agg('hdd_read', '7ms!');
  • current value display: DaTtSs displays the 1mn moving average of the received timer values as well as the 1mn moving maximum and minimum values
  • plot display: When plotting timers, DaTtSs plots the average value of the timers received, the maximum, the minimum, the 10% (configurable) percentile and the 90% percentile over the resolution period of the plot

Driver Interface

A driver should provide a unique method agg for data capture. As shown in the signature below, the agg method takes two arguments, stat and value. The stat argument is the name of the statistic being captured while value is a value string. The agg method should verify that the value string has an acceptable format.

/* `agg` method signature */

void process.agg(
               string stat,           // the stat name
               string value           // the stat value string
            )

After receiving and validating a statistic value, a driver should generate a tuple containing the value, the date the value was captured (in ms since epoch) and whether it is emphasized.

{ value: 123,
  date: 1343059274178,
  emphasis: false }

That tuple should be immediately pushed to an accumulator array for the given statistic name and type, for later partial-aggregate computation.
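
To make this concrete, here is a sketch of the agg method in JavaScript. The Process type and the this.acc accumulator layout ({type: {name: [tuples]}}) are assumptions for illustration only:

/* sketch: `agg` validation and accumulation [javascript] */
Process.prototype.agg = function(stat, value) {
  if (this.stopped) return;                          // ignore captures while stopped
  var m = /^(-?[0-9]+)(c|g|ms)(\!?)$/.exec(value);
  if (!m) return;                                    // silently drop malformed value strings
  var type = m[2];
  this.acc[type] = this.acc[type] || {};
  this.acc[type][stat] = this.acc[type][stat] || [];
  this.acc[type][stat].push({
    value: parseInt(m[1], 10),                       // the captured value
    date: Date.now(),                                // capture date in ms since epoch
    emphasis: (m[3] === '!')                         // whether the stat is emphasized
  });
};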

Partial Aggregates Generation

The entire DaTtSs infrastructure is based on the concept of partial-aggregates. A partial-aggregate is a set of aggregated values relative to a statistic, representing that statistic over an arbitrary period of time, and linearly summable to represent larger periods of time. A partial-aggregate is generated every push-period (5s) by a DaTtSs driver. These partial-aggregates are pushed to DaTtSs servers and re-aggregated and stored there for later reuse. A partial-aggregate must have the following JSON representation:

/* PARTIAL := */ { 
  typ: 'ms',         // the statistic type 'c'|'g'|'ms'
  nam: 'view',       // the statistic name
  pct: 0.1,          // the percentage used for percentile calculation
  sum: 123149,       // the sum of all received values during the push-period
  cnt: 9874,         // the number of values received during the push-period
  max: 123,          // the maximal value received during the push-period
  min: 4,            // the minimal value received during the push-period 
  lst: 15,           // the last value received during the push-period
  fst: 12,           // the first value received during the push-period
  bot: 11,           // the (pct)-th percentile
  top: 15,           // the (1-pct)-th percentile
  emp: false         // should the stat be visually emphasized
};

A partial-aggregate is fairly easy to generate given the accumulator array generated in the previous section for each statistic name and type. The pct value is given by the current configuration of the driver while the typ and nam values are naturally given by the statistic name and type being computed.

PCT := DRIVER_CONFIG(DATTSS_PERCENTILE)
TYP := TYPE
NAM := NAME

The sum, cnt, min, max, fst, lst, emp values are calculated by looping on the accumulated values and applying the following aggregation rules.

/* initialization */

SUM := 0
CNT := 0
MAX := null
MIN := null
FST := null
LST := null
EMP := false

/* accumulation */

for ACC in ACCUMULATOR[TYPE][NAME]:

  SUM := SUM + ACC.value
  CNT := CNT + 1;
  MAX := (MAX || ACC.value) > ACC.value ? MAX : ACC.value
  MIN := (MIN || ACC.value) < ACC.value ? MIN : ACC.value
  FST := FST || ACC.value
  LST := ACC.value
  EMP := EMP || ACC.emphasis

Finally, the bot and top values are calculated as a percentile by ordering the array of accumulated values and extracting the corresponding values.

SORT_BY_VALUE(ACCUMULATOR[TYPE][NAME])
LEN := LENGTH(ACCUMULATOR[TYPE][NAME])
BIDX := MAX(MIN(CEIL(PCT * LEN), LEN-1), 0)
TIDX := MAX(MIN(ROUND((1.0 - PCT) * LEN), LEN-1), 0)

BOT := ACCUMULATOR[TYPE][NAME][BIDX].value
TOP := ACCUMULATOR[TYPE][NAME][TIDX].value

This algorithm is not valid if no value is present in the accumulator array. In that case the statistic is considered inactive and nothing should be done by the driver.
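
As an illustration, the aggregation rules above translate roughly to the following JavaScript sketch (the compute_partial helper name and the accumulator layout are assumptions):

/* sketch: partial-aggregate computation for one {type, name} [javascript] */
function compute_partial(type, name, acc, pct) {
  if (acc.length === 0) return null;                 // inactive statistic: nothing to do
  var partial = { typ: type, nam: name, pct: pct,
                  sum: 0, cnt: 0, max: null, min: null,
                  fst: null, lst: null, emp: false };
  acc.forEach(function(a) {
    partial.sum += a.value;
    partial.cnt += 1;
    if (partial.max === null || a.value > partial.max) partial.max = a.value;
    if (partial.min === null || a.value < partial.min) partial.min = a.value;
    if (partial.fst === null) partial.fst = a.value;
    partial.lst = a.value;
    partial.emp = partial.emp || a.emphasis;
  });
  var sorted = acc.slice().sort(function(a, b) { return a.value - b.value; });
  var len = sorted.length;
  var bidx = Math.max(Math.min(Math.ceil(pct * len), len - 1), 0);
  var tidx = Math.max(Math.min(Math.round((1.0 - pct) * len), len - 1), 0);
  partial.bot = sorted[bidx].value;                  // the (pct)-th percentile
  partial.top = sorted[tidx].value;                  // the (1-pct)-th percentile
  return partial;
}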

Periodical Commit

Every push-period (5s), a driver should compute the current partial-aggregate (JSON format) from the content of the accumulator for each active statistic name and type and push it to the DaTtSs servers through the PUT /agg endpoint. Please refer to DaTtSs API Documentation for a complete description of the Aggregation API. Once the partial-aggregates are pushed, all the accumulator arrays should be emptied.

Expected Behaviour

  • If an error occurs during the computation of the partial-aggregate or the commit to the DaTtSs servers, the driver is expected to fail silently.
  • A driver is expected not to commit anything if a statistic has been inactive during the last push-period.
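
Putting the previous sketches together, the commit step could look roughly like the following in JavaScript. The request format shown here (JSON body, AUTH_KEY passed as a query parameter) is only an assumption; refer to the DaTtSs API Documentation for the actual PUT /agg contract:

/* sketch: periodic commit of the partial-aggregates [javascript] */
var http = require('http');

Process.prototype.commit = function() {
  var that = this;
  var partials = [];
  ['c', 'g', 'ms'].forEach(function(type) {
    Object.keys(that.acc[type] || {}).forEach(function(name) {
      var p = compute_partial(type, name, that.acc[type][name], that.pct);
      if (p) partials.push(p);                       // skip inactive statistics
    });
    that.acc[type] = {};                             // empty the accumulator arrays
  });
  if (partials.length === 0) return;                 // nothing was active this push-period

  var body = JSON.stringify({ name: this.name, partials: partials });   // assumed body layout
  var req = http.request({
    host: config.server_host,
    port: config.server_port,
    method: 'PUT',
    path: '/agg?auth_key=' + encodeURIComponent(this.auth),             // assumed auth scheme
    headers: { 'Content-Type': 'application/json',
               'Content-Length': Buffer.byteLength(body) }
  });
  req.on('error', function() { /* fail silently */ });
  req.end(body);
};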

Start/Stop

Since the driver periodically performs an aggregation calculation and pushes to the server, it should also provide a start and a stop interface so that the client software can disable the driver when needed.

/* `start` method signature */

void process.start()


/* `stop` method signature */

void process.stop()

When a driver is stopped, all periodical activity should be disabled and calls to the agg interface should be ignored. Starting the driver again should return the driver to its normal state, restarting the periodical partial aggregation and push to the server.
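
A minimal sketch of start/stop in JavaScript, assuming the hypothetical Process type, commit method and config variable from the previous sketches:

/* sketch: start/stop of the periodical activity [javascript] */
Process.prototype.start = function() {
  if (this.itv) return;                              // already started
  this.stopped = false;
  var that = this;
  this.itv = setInterval(function() {
    try { that.commit(); } catch (err) { /* fail silently */ }
  }, config.push_period * 1000);
};

Process.prototype.stop = function() {
  if (this.itv) {
    clearInterval(this.itv);
    this.itv = null;
  }
  this.stopped = true;                               // further calls to `agg` are ignored
};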

That's all folks

You should have all the information you need to code your own driver or relay server for DaTtSs! Please do not hesitate to come chat on #dattss on freenode or reach us by email at [email protected]!