Skip to content

Latest commit

 

History

History
243 lines (189 loc) · 7.77 KB

PARSING_DATA.md

File metadata and controls

243 lines (189 loc) · 7.77 KB

Parsing Data

Telegraf has the ability to take data in a variety of formats. Telegraf requires configuration from the user in order to correctly parse, store, and send the original data. Telegraf does not take the raw data and maintain it internally.

Telegraf uses an internal metric representation consisting of the metric name, tags, fields and a timestamp, very similar to line protocol. This means that data needs to be broken up into a metric name, tags, fields, and a timestamp. While none of these options are required, they are available to the user and might be necessary to ensure the data is represented correctly.

Parsers

The first step is to determine which parser to use. Look at the list of parsers and find one that will work with the user's data. This is generally straightforward as the data-type will only have one parser that is actually applicable to the data.

JSON parsers

There is an exception when it comes to JSON data. Instead of a single parser, there are three different parsers capable of reading JSON data:

  • json: This parser is great for flat JSON data. If the JSON is more complex and for example, has other objects or nested arrays, then do not use this and look at the other two options.
  • json_v2: The v2 parser was created out of the need to parse JSON objects. It can take on more advanced cases, at the cost of additional configuration.
  • xpath_json: The xpath parser is the most capable of the three options. While the xpath name may imply XML data, it can parse a variety of data types using XPath expressions.

Tags and fields

The next step is to look at the data and determine how the data needs to be split up between tags and fields. Tags are generally strings or values that a user will want to search on. While fields are the raw data values, numeric types, etc. Generally, data is considered to be a field unless otherwise specified as a tag.

Timestamp

To parse a timestamp, at the very least the users needs to specify which field has the timestamp and what the format of the timestamp is. The format can either be a predefined Unix timestamp or parsed using a custom format based on Go reference time.

For Unix timestamps Telegraf understands the following settings:

Timestamp Timestamp Format
1709572232 unix
1709572232123 unix_ms
1709572232123456 unix_us
1709572232123456789 unix_ns

There are some named formats available as well:

Timestamp Named Format
Mon Jan _2 15:04:05 2006 ANSIC
Mon Jan _2 15:04:05 MST 2006 UnixDate
Mon Jan 02 15:04:05 -0700 2006 RubyDate
02 Jan 06 15:04 MST RFC822
02 Jan 06 15:04 -0700 RFC822Z
Monday, 02-Jan-06 15:04:05 MST RFC850
Mon, 02 Jan 2006 15:04:05 MST RFC1123
Mon, 02 Jan 2006 15:04:05 -0700 RFC1123Z
2006-01-02T15:04:05Z07:00 RFC3339
2006-01-02T15:04:05.999999999Z07:00 RFC3339Nano
Jan _2 15:04:05 Stamp
Jan _2 15:04:05.000 StampMilli
Jan _2 15:04:05.000000 StampMicro
Jan _2 15:04:05.000000000 StampNano

If the timestamp does not conform to any of the above, then the user can specify a custom timestamp format, in which the user must provide the timestamp in Go reference time notation. Here are a few example timestamps and their Go reference time equivalent:

Timestamp Go reference time
2024-03-04T17:10:32 2006-01-02T15:04:05
04 Mar 24 10:10 -0700 02 Jan 06 15:04 -0700
2024-03-04T10:10:32Z07:00 2006-01-02T15:04:05Z07:00
2024-03-04 17:10:32.123+00 2006-01-02 15:04:05.999+00
2024-03-04T10:10:32.123456Z 2006-01-02T15:04:05.000000Z
2024-03-04T10:10:32.123456Z 2006-01-02T15:04:05.999999999Z

Note for fractional second values, the user can use either a 9 or 0. Using a 0 forces a certain length, but using 9s do not.

Please note, that timezone abbreviations are ambiguous! For example MST, can stand for either Mountain Standard Time (UTC-07) or Malaysia Standard Time (UTC+08). As such, avoid abbreviated timezones if possible.

Unix timestamps use UTC, there is no concept of a timezone for a Unix timestamp.

Examples

Below are a few basic examples to get users started.

CSV

Given the following data:

node,temp,humidity,alarm,time
node1,32.3,23,false,2023-03-06T16:52:23Z
node2,22.6,44,false,2023-03-06T16:52:23Z
node3,17.9,56,true,2023-03-06T16:52:23Z

Here is corresponding parser configuration and result:

[[inputs.file]]
files = ["test.csv"]
data_format = "csv"

csv_header_row_count = 1
csv_column_names = ["node","temp","humidity","alarm","time"]
csv_tag_columns = ["node"]
csv_timestamp_column = "time"
csv_timestamp_format = "2006-01-02T15:04:05Z"
file,node=node1 temp=32.3,humidity=23i,alarm=false 1678121543000000000
file,node=node2 temp=22.6,humidity=44i,alarm=false 1678121543000000000
file,node=node3 temp=17.9,humidity=56i,alarm=true 1678121543000000000

JSON flat data

Given the following data:

{ "node": "node", "temp": 32.3, "humidity": 23, "alarm": false, "time": "1709572232123456789"}

Here is corresponding parser configuration:

[[inputs.file]]
files = ["test.json"]
precision = "1ns"
data_format = "json"

tag_keys = ["node"]
json_time_key = "time"
json_time_format = "unix_ns"
file,node=node temp=32.3,humidity=23 1709572232123456789

JSON Objects

Given the following data:

{
    "metrics": [
        { "node": "node1", "temp": 32.3, "humidity": 23, "alarm": "false", "time": "1678121543"},
        { "node": "node2", "temp": 22.6, "humidity": 44, "alarm": "false", "time": "1678121543"},
        { "node": "node3", "temp": 17.9, "humidity": 56, "alarm": "true", "time": "1678121543"}
    ]
}

Here is corresponding parser configuration:

[[inputs.file]]
files = ["test.json"]
data_format = "json_v2"

[[inputs.file.json_v2]]
[[inputs.file.json_v2.object]]
  path = "metrics"
  timestamp_key = "time"
  timestamp_format = "unix"
  [[inputs.file.json_v2.object.tag]]
    path = "#.node"
  [[inputs.file.json_v2.object.field]]
    path = "#.temp"
    type = "float"
  [[inputs.file.json_v2.object.field]]
    path = "#.humidity"
    type = "int"
  [[inputs.file.json_v2.object.field]]
    path = "#.alarm"
    type = "bool"
file,node=node1 temp=32.3,humidity=23i,alarm=false 1678121543000000000
file,node=node2 temp=22.6,humidity=44i,alarm=false 1678121543000000000
file,node=node3 temp=17.9,humidity=56i,alarm=true 1678121543000000000

JSON Line Protocol

Given the following data:

{
  "fields": {"temp": 32.3, "humidity": 23, "alarm": false},
  "name": "measurement",
  "tags": {"node": "node1"},
  "time": "2024-03-04T10:10:32.123456Z"
}

Here is corresponding parser configuration:

[[inputs.file]]
files = ["test.json"]
precision = "1us"
data_format = "xpath_json"

[[inputs.file.xpath]]
  metric_name = "/name"
  field_selection = "fields/*"
  tag_selection = "tags/*"
  timestamp = "/time"
  timestamp_format = "2006-01-02T15:04:05.999999999Z"
measurement,node=node1 alarm="false",humidity="23",temp="32.3" 1709547032123456000