NOTES

Opening a Database

All database operations are performed on an open database, which is represented by a database handle.

Here’s an example:

rc = tsdb_open(path,
               &handler,
               &num_values_per_entry,
               rrd_slot_time_duration,
               read_only)

path is the path to the database.

handler is the reference to the database. It can be used in subsequent calls.

num_values_per_entry specifies the number of values per entry when the database is created. If the database already exists, it’s set to the number used when the database was created. Once set, it cannot be changed.

rrd_slot_time_duration specifies the number of seconds per time slot. E.g. a value of 60 means that each slot in the database covers 60 seconds.
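
A fuller sketch of the open call (the handle type tsdb_handler and the exact integer types are assumptions; check tsdb_api.h for the real declarations):

tsdb_handler handler;
u_int16_t num_values_per_entry = 1;     /* one value per key per slot */
u_int32_t rrd_slot_time_duration = 60;  /* 60-second slots */
u_int8_t read_only = 0;

int rc = tsdb_open("metrics.tsdb",
                   &handler,
                   &num_values_per_entry,
                   rrd_slot_time_duration,
                   read_only);
if (rc != 0) {
  /* handle open/create failure (0 assumed to mean success) */
}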

Setting the Current Epoch

Use tsdb_goto_epoch to set the current epoch. This is used to read time series per key using tsdb_get.

If load_on_demand is set to 1, the operation will always succeed. If set to 0, the operation will fail if the epoch doesn’t exist.

This operation has the substantial side effect of loading all of the time series for that specified epoch into memory. As time series are stored in one or more compressed fragments, this operation is not cheap.

The advantage is that all of the time series for the epoch are available via a quick mapping of key to index offset.
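
A rough read sketch, assuming an already-open handler. The tsdb_goto_epoch arguments mirror the four-argument form used in the examples later in these notes, and the tsdb_get signature isn't spelled out here, so the out-pointer style is an assumption:

u_int32_t *value;   /* assumed out-parameter type */

if (tsdb_goto_epoch(&handler, 1400000000, 0, 0) == 0) {  /* flag meanings per the goto_epoch discussion below */
  if (tsdb_get(&handler, "test-1", &value) == 0) {
    /* *value is the stored value for "test-1" at this epoch */
  }
}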

Setting Values

There are three operations involved in setting time series values (sketched after this list):

  • Open a database (tsdb_open)
  • Go to an epoch (tsdb_goto_epoch)
  • Set the value (tsdb_set)
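
Putting the three steps together (same caveats as above about exact types and flag meanings; tsdb_close is assumed to exist as the matching teardown call):

tsdb_handler handler;
u_int16_t num_values_per_entry = 1;

if (tsdb_open("metrics.tsdb", &handler, &num_values_per_entry, 60, 0) == 0) {
  if (tsdb_goto_epoch(&handler, 1400000000, 0, 0) == 0) {
    tsdb_set(&handler, "test-1", 100);   /* value passed as in the examples below */
  }
  tsdb_close(&handler);                  /* assumed close call */
}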

Indexes

Keys are associated with indexes.

The index for a key depends on the epoch. tsdb supports multiple indexes per key – I suspect because at some point the chunk, which is a fixed size, will fill up (all available indexes will be used), and to hand out a new index tsdb has to create a new map of indexes. The new map starts at the current epoch and remains in effect until all of the available indexes are used.

What is the point of rsv-INDEX?

An index mapping is represented by a “map-INDEX” entry in the db.

The mapping consists of

  • epoch_start, which is set to the current chunk epoch
  • epoch_end, which is set to 0, indicating that the mapping is not bounded
  • the index value

When a new index is created, it’s also reserved by way of a “rsv-INDEX” entry in the db.
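
Based on the description above, a map-INDEX entry presumably carries something like the following (field and type names are guesses, not the actual struct in tsdb_api.h):

typedef struct {
  u_int32_t epoch_start;  /* chunk epoch when the mapping took effect */
  u_int32_t epoch_end;    /* 0 = unbounded */
  u_int32_t hash_idx;     /* the index assigned to the key */
} key_index_mapping;      /* hypothetical name */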

Indexes aren’t reused – they keep growing as needed for each key. So they must be “rem’d” (reduced modulo) to the range of the value array. What then of chunks?

As more keys are needed for an epoch, chunks are added.

From what I can tell (or rather suspect), as time goes by and keys change, we’ll have chunks whose sole purpose is to maintain a consistent index offset.

While there’s support for mapping different indexes to different epoch ranges, it doesn’t appear that it’s even possible to reset/change an index for a key.

The more I look, the more it looks like the whole mapping scheme is misguided:

  • At no point is the map expanded to a new epoch range
  • If a key comes along that has an index that’s beyond the current epoch’s range, a new chunk is added – values are always accommodated by making room rather than creating a new mapping

The scheme looks like it could deteriorate over time – indexes that land past the previous chunks will cause the chunk size for new epochs to grow. There ought to be a way to reuse indexes for new epochs to avoid this. So e.g. the first 10K indexes could be phased out by another 10K indexes, and new epochs would still use only 1 chunk.

How would this work?

  • You’d assign indexes up to num_values (10K)
  • At num_values + 1, the index would reset to 1
  • You’d mark the epoch end for all existing indexes as the current epoch

This would reset the index counter. Old indexes would be preserved because they’re tied to the old epoch range. New indexes would be assigned based on the current index counter and be associated with a new mapping entry.

But rather than specify explicit start and stop epochs in each map, it’d be better to associate the map with a range generation. E.g. an index map would look like this:

typedef struct {
  u_int8_t generation;
  u_int32_t hash_idx;
} tsdb_hash_mapping;

The generation would be associated with a range.

There’d be a current generation, which would be used for new maps.

Map entries would be added in LIFO order, so that the most recent maps occur first in the array.

The generation ranges could be stored in an array, and referenced by the generation value. This would make bounds checking very fast.

There’d be no reason to use idx-INDEX entries – you’d rely strictly on the index counter for the current map generation.
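
A rough sketch of how the generation ranges and lookup could work (everything here is hypothetical, building on the tsdb_hash_mapping struct above; u_int types from <sys/types.h>):

typedef struct {
  u_int32_t epoch_start;
  u_int32_t epoch_end;    /* 0 = generation still open */
} generation_range;

static generation_range ranges[256];   /* indexed by the u_int8_t generation */
static u_int8_t current_generation;    /* used when assigning new maps */

/* Maps are stored most-recent-first (LIFO); find the index whose
   generation's epoch range covers the requested epoch. */
static int index_for_epoch(const tsdb_hash_mapping *maps, int num_maps,
                           u_int32_t epoch, u_int32_t *idx_out) {
  for (int i = 0; i < num_maps; i++) {
    const generation_range *r = &ranges[maps[i].generation];
    if (epoch >= r->epoch_start && (r->epoch_end == 0 || epoch <= r->epoch_end)) {
      *idx_out = maps[i].hash_idx;
      return 0;
    }
  }
  return -1;   /* no mapping covers this epoch */
}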

Purging Old Data

I’m slightly inclined to not support data deletions (it’s not supported now anyway).

I think it’d be easier to push the burden onto multi-DB queries.

What’s the cost of querying two or more databases to collect and consolidate values? Using the API as it is today, it would appear that there’s not much cost – values are retrieved one key at a time, each at a current epoch.

Using multiple DBs could even improve performance, as operations could be run in parallel.

But what if we wanted to purge old values?

One super easy method is to simply delete the epoch entries. This would clear out the majority of the values.

That would leave index cleanup – the map-INDEX entries would continue to expand, and there’d be no easy way to reclaim old indexes (i.e. indexes that are defunct because no values are associated with them), nor any way to reset the index counter.

However, for use cases where indexes don’t grow beyond the 10K limit (or whatever), dropping old epochs is a very easy way to reclaim space.

Questions

load_page_on_demand

When using load_page_on_demand=1 in tsdb_goto_epoch, map_raw_get always fails. For the time being, using load_page_on_demand=0.

Missing Values

The default_unknown_value for a handler is used as the response in tsdb_get as expected, but tsdb_get returns -1, suggesting that the get failed. Is -1 explicitly different from a general error, indicating that the default value was used? It seems to me that the default value might not be needed if we can infer that -1 means “not set”.

Multiple Values

What are some use cases for storing more than one value per key per slot? E.g. I could see this for double precision values.

Open API Inconsistencies

The open API is a bit odd I think:

  • On create num_values_per_entry and rrd_slot_time_duration are both sensibly used to initialize the database
  • On open, num_values_per_entry is ignored as input and is instead written to as an output variable; rrd_slot_time_duration, however, is neither read as input nor written to as output

I’d expect something like this (the first option is sketched after these lists):

  • Both args are used as input/output
  • There’s some indication as to whether or not the db was created

Or

  • Both args are only input and are ignored on open
  • User needs to read from the handler to get the real values
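
For illustration only, the first option could look like this (the created out-flag is hypothetical, not part of the current API):

int tsdb_open(const char *path,
              tsdb_handler *handler,
              u_int16_t *num_values_per_entry,    /* in: requested, out: actual */
              u_int32_t *rrd_slot_time_duration,  /* in: requested, out: actual */
              u_int8_t read_only,
              u_int8_t *created);                 /* out: 1 if the db was created */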

Current Vision

Same performance characteristics (storage, retrieval, space) as the current tsdb, but with tag-based queries.

Bugs

test-simple fails sometimes, related to tracing

Steps to reproduce:

  • Force a chunk growth with test-simple

The test typically fails with:

main Line 75 Expected 1000 but was 0

Ways to make the test pass:

  • Set log level below warnings
  • Comment out the trace msg
  • Trace something before the test is run (e.g. at the top of test_simple:main)
  • Comment out this line in tsdb_trace.c:

    strftime(theDate, 32, "%d/%b/%Y %H:%M:%S", localtime(&theTime));

This very odd behavior feels like a race condition, but on what I have no idea.

Next Steps

Clean up source

Source code could use some reformatting:

  • 79 max line length
  • Improved white spacing
  • Misc cleanup

traceEvent to trace_info, etc

Misc refactor

db_set -> db_put

lowerCamelCase to standard

consistent names

  • hash/hash_index -> index
  • name/hash_name -> key

Shorter names

  • num_xxx -> xxx
  • xxx_mode -> xxx
  • alive_and_kicking -> open

Refactor map-XXX

Rename to key-XXX

Store only the index

Drop index map

Refactor rsv-XXX

This is pointless. We can use the next counter authoritatively.

Refactor normalize_epoch

Should be a void.

Possibly simplify goto_epoch options

#  Exists  create  ondemand  Create  Load  Ret  Writable  Scrap
1  no      0       0         no      n/a   -1   no
3  no      1       0         yes     yes    0   yes
5  no      0       1         no      no     0   no
7  no      1       1         yes     no     0   yes
2  yes     0       0         n/a     yes    0   yes       y
4  yes     1       0         n/a     yes    0   yes       y
6  yes     0       1         n/a     no     0   yes       y
8  yes     1       1         n/a     no     0   yes       y
  1. Fail if the epoch isn’t there. This is useful for checking if an epoch exists. Could be handled with fail_if_missing=1 flag.
  2. Actively load the epoch if it’s there.

    if epoch found:
        load current epoch
    else:
        if error_on_missing flag:
            return error
        else:
            set current epoch as missing; create/load on demand on first write
    return success

Proposed API:

tsdb_goto_epoch(db, epoch, fail_on_missing, growable)

get_key_offset -> prepare_read_write

Lookup by index

Provide an API for looking up epoch values using the epoch array offset directly, rather than dereferencing the key each time:

tsdb_get_by_index(db, index)

This would require a corresponding function for getting a key’s array offset:

tsdb_get_key_index(db, key)

We’d use this for range queries of a particular key by looking up the index first, then using it for each epoch.
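
A sketch of such a range query (both functions are proposals, not existing API; the out-parameters are added here purely for illustration):

u_int32_t key_index;
u_int32_t start = 1400000000, end = start + 3600;

if (tsdb_get_key_index(&handler, "test-1", &key_index) == 0) {       /* proposed */
  for (u_int32_t epoch = start; epoch <= end; epoch += 60) {
    if (tsdb_goto_epoch(&handler, epoch, 0, 0) == 0) {
      u_int32_t *value;
      if (tsdb_get_by_index(&handler, key_index, &value) == 0) {     /* proposed */
        /* consume *value for this epoch */
      }
    }
  }
}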

More direct index support

Return index used with tsdb_set

We might as well return this value, as it’s convenient. It can be used for followup index related operations, e.g. adding tags via index.

add_tag_for_index

We’ll have the index when we set a value (see previous bullet) and can followup by adding applicable tags for it.

If we maintained an index of uncompressed tags, this update operation would be O(1) as it’s a bit set operation on the array index.

If we’re lazy and reload the tag array on each operation, it’s still O(1) for the insert, but the disk/db load cost (probably) overwhelms that savings.

Consider: tag array cache

There will be a number of common tags that should be cached, saving the compress/decompress cycle and taking advantage of the O(1) cost of setting an index for the tag.

This looks like a simple starting point for a LRU cache we could use for uncompressed tag arrays:

http://www.geeksforgeeks.org/implement-lru-cache/

Shore up read-only mode

Need some tests for read only mode.

Test growable = false and new keys

Exit on impossible conditions

A failed malloc, e.g., should cause an immediate exit.
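
E.g. a minimal checked-allocation helper (hypothetical name, not in the current source):

#include <stdio.h>
#include <stdlib.h>

/* Abort immediately if allocation fails. */
static void *xmalloc(size_t size) {
  void *p = malloc(size);
  if (p == NULL) {
    fprintf(stderr, "out of memory (%zu bytes)\n", size);
    exit(EXIT_FAILURE);
  }
  return p;
}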

Fix tracing

I’m not seeing value from the tracing facility so far. If it continues to be more distraction than help, nuke it.

Tags

Support a new index type: a tag.

The function would look like this:

tsdb_add_key_tag(db, key, tag)

E.g.

tsdb_add_key_tag(db, "test-1", "metric=test")
tsdb_add_key_tag(db, "test-2", "server=2")
tsdb_add_key_tag(db, "test-3", "size=small")

For each tag, ensure a tag-XXX entry in the database.

Each tag-XXX db value would be an array of unsorted, unique key indexes.

Each tag-XXX element would be an array. Each element in the array would be a 0 or 1. 1 indicates that the tag applies to the epoch values at the corresponding index. E.g. if position 0 of the tag array contained a 1, it would indicate that the tag applied to epoch values at position 0.

For example, we’ll start by adding a value to an empty database:

tsdb_goto_epoch(db, 60, 0, 0)
tsdb_set(db, "test-1", 100)

We have in our db the following:

key-test-1: 0
60-0: [ 100 ]

And another key:

tsdb_set(db, "test-2", 200)

Our db:

key-test-1: 0
key-test-2: 1
1-0: [ 100, 200 ]

Let’s tag a key:

tsdb_add_key_tag(db, "test-1", "metric=test")

Our db has a new value:

tag-metric=test: [ 1 ]

This indicates that the epoch values stored at index 0 (i.e. key test-1) have the tag “metric=test”.

We’ll do the same for test-2:

tsdb_add_key_tag(db, "test-2", "metric=test")

Now our tag element looks like this:

tag-metric=test: [ 1, 1 ]

Some more tags:

tsdb_add_key_tag(db, "test-1", "server=1")
tsdb_add_key_tag(db, "test-1", "size=small")
tsdb_add_key_tag(db, "test-2", "server=2")
tsdb_add_key_tag(db, "test-2", "size=large")

Here’s the full listing of all tag elements:

tag-metric=test: [ 1, 1 ]
tag-size=small: [ 1, 0 ]
tag-size=large: [ 0, 1 ]
tag-server=1: [ 1, 0 ]
tag-server=2: [ 0, 1 ]

Here’s a query for metric=test & size=small:

“intersection of metric=test and size=small”

This is how we’d answer the query:

  • Load the arrays for the tags metric=test and size=small
  • Apply a logical AND operation for each array cell, up to the length of the shorter array
  • Return values for epoch positions for each resulting 1

E.g.

tag-metric=test: [ 1, 1 ]
tag-size=small: [ 1, 0 ]
AND: [ 1, 0 ]
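
A minimal sketch of the AND step, assuming the tag arrays are plain byte arrays of 0/1 flags (how they’re actually stored is still open):

/* ANDs two tag arrays into out, up to the length of the shorter array.
   Returns the number of cells written; cells set to 1 identify the key
   indexes whose epoch values match the query. */
int tag_and(const u_int8_t *a, int len_a,
            const u_int8_t *b, int len_b,
            u_int8_t *out) {
  int len = len_a < len_b ? len_a : len_b;
  for (int i = 0; i < len; i++) {
    out[i] = a[i] && b[i];
  }
  return len;
}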

We can implement a similar function for the union using OR, e.g.

“union of metric=test and size=small”

tag-metric=test: [ 1, 1 ]
tag-size=small: [ 1, 0 ]
OR: [ 1, 1 ]

More

We can implement NOT operations, e.g.

tsdb_get_by_tag_not_intersection(db, ["metric=cpu", "server_size=small"])
tsdb_get_by_tag_not_union(db, ["metric=cpu", "server_size=small"])

These would be implemented by applying a NOT to the result.

The most flexible scheme would be a simple query language.

Here’s a union of 2 tags. Each tag is preceded by its byte length.

tsdb_query(db, [ "u" 2 10 "metric=cpu" 17 "server_size=small" ])

Let’s say I want all of the cpu metrics for servers that aren’t small:

tsdb_query(db, [ "i" 2 "t" 10 "metric=cpu" "n" "t" 17 "server_size=small" ])

Here’s the key we’re using so far:

i = intersection
u = union
n = not
t = tag

So we’d implement this query in these steps:

  1. Get the epoch indexes for the tag metric=cpu

    tag-metric=cpu: [ 1, 1 ]

  2. Get the epoch indexes for the tag server_size=small

    tag-server_size=small: [ 1, 0 ]

  3. Apply NOT to the result from step 2

    NOT tag-server_size=small: [ 0, 1 ]

  4. Apply AND to the results from steps 1 and 3

    tag-metric=cpu: [ 1, 1 ]
    NOT tag-server_size=small: [ 0, 1 ]
    AND: [ 0, 1 ]

We could obviously use bit arrays, which would reduce our space requirements for tagging by 32x, at the cost of some minor code complexity.
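
For example, packing the flags into 32-bit words (a sketch; the 32x figure assumes each flag currently occupies a 32-bit slot):

/* Set and test a tag flag for a given key index in a packed bit array. */
void tag_bit_set(u_int32_t *bits, u_int32_t index) {
  bits[index / 32] |= (1u << (index % 32));
}

int tag_bit_test(const u_int32_t *bits, u_int32_t index) {
  return (bits[index / 32] >> (index % 32)) & 1u;
}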

We could minimize the storage and IO requirements for these tag indexes further using compression, as is done currently with epoch values.

A possible performance problem with tags is that they have to be read fully from the database, updated, and then stored again. If a tag array didn’t have to be read from the database, it could be updated in constant time.

However, before we worry too much about performance, we should consider the caching that occurs in the database. It may be pointless to cache tag arrays if they’re already being cached by the database. As long as we’re not compressing the array, it might be the case that we have direct access to the database’s value in memory.