NOTES

Opening a Database

All database operations are performed on an open database, which is represented by a database handle.

Here’s an example:

rc = tsdb_open(path,
               &handler,
               &num_values_per_entry,
               rrd_slot_time_duration,
               read_only)

path is the path to the database.

handler is the reference to the database. It can be used in subsequent calls.

num_values_per_entry specifies the number of values per entry when the database is created. If the database already exists, it’s set to the number used when the database was created. Once set, it cannot be changed.

rrd_slot_time_duration specifies the number of seconds per time slot. E.g. a value of 60 means that each slot in the database covers 60 seconds.
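
A fuller sketch of the open call (the handle type tsdb_handler and the exact integer types are assumptions; check tsdb_api.h for the real declarations):

tsdb_handler handler;
u_int16_t num_values_per_entry = 1;     /* one value per key per slot */
u_int32_t rrd_slot_time_duration = 60;  /* 60-second slots */
u_int8_t read_only = 0;

int rc = tsdb_open("metrics.tsdb",
                   &handler,
                   &num_values_per_entry,
                   rrd_slot_time_duration,
                   read_only);
if (rc != 0) {
  /* handle open/create failure (0 assumed to mean success) */
}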

Setting the Current Epoch

Use tsdb_goto_epoch to set the current epoch. This is used to read time series per key using tsdb_get.

If load_on_demand is set to 1, the operation will always succeed. If set to 0, the operation will fail if the epoch doesn’t exist.

This operation has the substantial side effect of loading all of the time series for that specified epoch into memory. As time series are stored in one or more compressed fragments, this operation is not cheap.

The advantage is that all of the time series for the epoch are available via a quick mapping of key to index offset.
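
A rough read sketch, assuming an already-open handler. The tsdb_goto_epoch arguments mirror the four-argument form used in the examples later in these notes, and the tsdb_get signature isn't spelled out here, so the out-pointer style is an assumption:

u_int32_t *value;   /* assumed out-parameter type */

if (tsdb_goto_epoch(&handler, 1400000000, 0, 0) == 0) {  /* flag meanings per the goto_epoch discussion below */
  if (tsdb_get(&handler, "test-1", &value) == 0) {
    /* *value is the stored value for "test-1" at this epoch */
  }
}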

Setting Values

There are three operations involved in setting time series values (sketched after this list):

  • Open a database (tsdb_open)
  • Go to an epoch (tsdb_goto_epoch)
  • Set the value (tsdb_set)
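
Putting the three steps together (same caveats as above about exact types and flag meanings; tsdb_close is assumed to exist as the matching teardown call):

tsdb_handler handler;
u_int16_t num_values_per_entry = 1;

if (tsdb_open("metrics.tsdb", &handler, &num_values_per_entry, 60, 0) == 0) {
  if (tsdb_goto_epoch(&handler, 1400000000, 0, 0) == 0) {
    tsdb_set(&handler, "test-1", 100);   /* value passed as in the examples below */
  }
  tsdb_close(&handler);                  /* assumed close call */
}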

Indexes

Keys are associated with indexes.

The index for a key depends on the epoch. tsdb supports multiple indexes per key – I suspect because at some point the chunk, which is a fixed size, will fill up (all available indexes will be used), and to hand out a new index tsdb has to create a new map of indexes. The new map starts at the current epoch and remains in effect until all of the available indexes are used.

What is the point of rsv-INDEX?

An index mapping is represented by a “map-INDEX” entry in the db.

The mapping consists of

  • epoch_start, which is set to the current chunk epoch
  • epoch_end, which is set to 0, indicating that the mapping is not bounded
  • the index value

When a new index is created, it’s also reserved by way of a “rsv-INDEX” entry in the db.
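
Based on the description above, a map-INDEX entry presumably carries something like the following (field and type names are guesses, not the actual struct in tsdb_api.h):

typedef struct {
  u_int32_t epoch_start;  /* chunk epoch when the mapping took effect */
  u_int32_t epoch_end;    /* 0 = unbounded */
  u_int32_t hash_idx;     /* the index assigned to the key */
} key_index_mapping;      /* hypothetical name */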

Indexes aren’t reused – they keep growing as needed for each key. So they must be “rem’d” (reduced modulo) to the range of the value array. What then of chunks?

As more keys are needed for an epoch, chunks are added.

From what I can tell (or rather suspect), as time goes by and keys change, we’ll have chunks whose sole purpose is to maintain a consistent index offset.

While there’s support for mapping different indexes to different epoch ranges, it doesn’t appear that it’s even possible to reset/change an index for a key.

The more I look, the more it looks like the whole mapping scheme is misguided:

  • At no point is the map expanded to a new epoch range
  • If a key comes along that has an index that’s beyond the current epoch’s range, a new chunk is added – values are always accommodated by making room rather than creating a new mapping

The scheme looks like it could deteriorate over time – indexes that land past the previous chunks will cause the chunk size for new epochs to grow. There ought to be a way to reuse indexes for new epochs to avoid this. So e.g. the first 10K indexes could be phased out by another 10K indexes, and new epochs would still use only 1 chunk.

How would this work?

  • You’d assign indexes up to num_values (10K)
  • At num_values + 1, the index would reset to 1
  • You’d mark the epoch end for all existing indexes as the current epoch

This would reset the index counter. Old indexes would be preserved because they’re tied to the old epoch range. New indexes would be assigned based on the current index counter and be associated with a new mapping entry.

But rather than specify explicit start and stop epochs in each map, it’d be better to associate the map with a range generation. E.g. an index map would look like this:

typedef struct {
  u_int8_t generation;
  u_int32_t hash_idx;
} tsdb_hash_mapping;

The generation would be associated with a range.

There’d be a current generation, which would be used for new maps.

Map entries would be added in LIFO order, so that the most recent maps occur first in the array.

The generation ranges could be stored in an array, and referenced by the generation value. This would make bounds checking very fast.

There’d be no reason to use idx-INDEX entries – you’d rely strictly on the index counter for the current map generation.
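
A rough sketch of how the generation ranges and lookup could work (everything here is hypothetical, building on the tsdb_hash_mapping struct above; u_int types from <sys/types.h>):

typedef struct {
  u_int32_t epoch_start;
  u_int32_t epoch_end;    /* 0 = generation still open */
} generation_range;

static generation_range ranges[256];   /* indexed by the u_int8_t generation */
static u_int8_t current_generation;    /* used when assigning new maps */

/* Maps are stored most-recent-first (LIFO); find the index whose
   generation's epoch range covers the requested epoch. */
static int index_for_epoch(const tsdb_hash_mapping *maps, int num_maps,
                           u_int32_t epoch, u_int32_t *idx_out) {
  for (int i = 0; i < num_maps; i++) {
    const generation_range *r = &ranges[maps[i].generation];
    if (epoch >= r->epoch_start && (r->epoch_end == 0 || epoch <= r->epoch_end)) {
      *idx_out = maps[i].hash_idx;
      return 0;
    }
  }
  return -1;   /* no mapping covers this epoch */
}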

Purging Old Data

I’m slightly inclined to not support data deletions (it’s not supported now anyway).

I think it’d be easier to push the burden onto multi-DB queries.

What’s the cost of querying two or more databases to collect and consolidate values? Using the API as it is today, it would appear that there’s not much cost – values are retrieved one key at a time, each at a current epoch.

Using multiple DBs could even improve performance, as operations could be run in parallel.

But what if we wanted to purge old values?

One super easy method is to simply delete the epoch entries. This would clear out the majority of the values.

That would leave index cleanup – the map-INDEX entries would continue to expand, and there’d be no easy way to reclaim old indexes (i.e. indexes that are defunct because no values are associated with them), nor any way to reset the index counter.

However, for use cases where indexes don’t grow beyond the 10K limit (or whatever), dropping old epochs is a very easy way to reclaim space.

Questions

load_page_on_demand

When using load_page_on_demand=1 in tsdb_goto_epoch, map_raw_get always fails. For the time being, using load_page_on_demand=0.

Missing Values

The default_unknown_value for a handler is used as the response in tsdb_get as expected, but tsdb_get returns -1, suggesting that the get failed. Is -1 explicitly different from a general error, indicating that the default value was used? It seems to me that the default value might not be needed if we can infer that -1 means “not set”.

Multiple Values

What are some use cases for storing more than one value per key per slot? E.g. I could see this for double precision values.

Open API Inconsistencies

The open API is a bit odd I think:

  • On create num_values_per_entry and rrd_slot_time_duration are both sensibly used to initialize the database
  • On open, num_values_per_entry is ignored as input and is instead written to as an output variable; rrd_slot_time_duration, however, is neither read as input nor written to as output

I’d expect something like this (the first option is sketched after these lists):

  • Both args are used as input/output
  • There’s some indication as to whether or not the db was created

Or

  • Both args are only input and are ignored on open
  • User needs to read from the handler to get the real values
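
For illustration only, the first option could look like this (the created out-flag is hypothetical, not part of the current API):

int tsdb_open(const char *path,
              tsdb_handler *handler,
              u_int16_t *num_values_per_entry,    /* in: requested, out: actual */
              u_int32_t *rrd_slot_time_duration,  /* in: requested, out: actual */
              u_int8_t read_only,
              u_int8_t *created);                 /* out: 1 if the db was created */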

Current Vision

Same performance characteristics (storage, retrieval, space) as the current tsdb, but with tag-based queries.

Bugs

test-simple fails sometimes, related to tracing

Steps to reproduce:

  • Force a chunk growth with test-simple

The test typically fails with:

main Line 75 Expected 1000 but was 0

Ways to make the test pass:

  • Set log level below warnings
  • Comment out the trace msg
  • Trace something before the test is run (e.g. at the top of test_simple:main)
  • Comment out this line in tsdb_trace.c:

    strftime(theDate, 32, "%d/%b/%Y %H:%M:%S", localtime(&theTime));

This very odd behavior feels like a race condition, but on what I have no idea.

Next Steps

Clean up source

Source code could use some reformatting:

  • 79 max line length
  • Improved white spacing
  • Misc cleanup

traceEvent to trace_info, etc

Misc refactor

db_set -> db_put

lowerCamelCase to standard

consistent names

  • hash/hash_index -> index
  • name/hash_name -> key

Shorter names

  • num_xxx -> xxx
  • xxx_mode -> xxx
  • alive_and_kicking -> open

Refactor map-XXX

Rename to key-XXX

Store only the index

Drop index map

Refactor rsv-XXX

This is pointless. We can use the next counter authoritatively.

Refactor normalize_epoch

Should be a void.

Possibly simplify goto_epoch options

#  Exists  create  ondemand  Create  Load  Ret  Writable  Scrap
1  no      0       0         no      n/a   -1   no
3  no      1       0         yes     yes    0   yes
5  no      0       1         no      no     0   no
7  no      1       1         yes     no     0   yes
2  yes     0       0         n/a     yes    0   yes       y
4  yes     1       0         n/a     yes    0   yes       y
6  yes     0       1         n/a     no     0   yes       y
8  yes     1       1         n/a     no     0   yes       y
  1. Fail if the epoch isn’t there. This is useful for checking if an epoch exists. Could be handled with fail_if_missing=1 flag.
  2. Actively load the epoch if it’s there.

    if epoch found:
        load current epoch
    else:
        if error_on_missing flag:
            return error
        else:
            set current epoch as missing; create/load on demand on first write
    return success

Proposed API:

tsdb_goto_epoch(db, epoch, fail_on_missing, growable)

get_key_offset -> prepare_read_write

Lookup by index

Provide an API for looking up epoch values using the epoch array offset directly, rather than dereferencing the key each time:

tsdb_get_by_index(db, index)

This would require a corresponding function for getting a key’s array offset:

tsdb_get_key_index(db, key)

We’d use this for range queries of a particular key by looking up the index first, then using it for each epoch.
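
A sketch of such a range query (both functions are proposals, not existing API; the out-parameters are added here purely for illustration):

u_int32_t key_index;
u_int32_t start = 1400000000, end = start + 3600;

if (tsdb_get_key_index(&handler, "test-1", &key_index) == 0) {       /* proposed */
  for (u_int32_t epoch = start; epoch <= end; epoch += 60) {
    if (tsdb_goto_epoch(&handler, epoch, 0, 0) == 0) {
      u_int32_t *value;
      if (tsdb_get_by_index(&handler, key_index, &value) == 0) {     /* proposed */
        /* consume *value for this epoch */
      }
    }
  }
}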

More direct index support

Return index used with tsdb_set

We might as well return this value, as it’s convenient. It can be used for followup index related operations, e.g. adding tags via index.

add_tag_for_index

We’ll have the index when we set a value (see previous bullet) and can followup by adding applicable tags for it.

If we maintained an index of uncompressed tags, this update operation would be O(1) as it’s a bit set operation on the array index.

If we’re lazy and reload the tag array on each operation, it’s still O(1) for the insert, but the disk/db load cost (probably) overwhelms that savings.

Consider: tag array cache

There will be a number of common tags that should be cached, saving the compress/decompress cycle and taking advantage of the O(1) cost of setting an index for the tag.

This looks like a simple starting point for a LRU cache we could use for uncompressed tag arrays:

http://www.geeksforgeeks.org/implement-lru-cache/

Shore up read-only mode

Need some tests for read only mode.

Test growable = false and new keys

Exit on impossible conditions

A failed malloc, e.g., should cause an immediate exit.
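
E.g. a minimal checked-allocation helper (hypothetical name, not in the current source):

#include <stdio.h>
#include <stdlib.h>

/* Abort immediately if allocation fails. */
static void *xmalloc(size_t size) {
  void *p = malloc(size);
  if (p == NULL) {
    fprintf(stderr, "out of memory (%zu bytes)\n", size);
    exit(EXIT_FAILURE);
  }
  return p;
}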

Fix tracing

I’m not seeing value from the tracing facility so far. If it continues to be more distraction than help, nuke it.

Tags

Support a new index type: a tag.

The function would look like this:

tsdb_add_key_tag(db, key, tag)

E.g.

tsdb_add_key_tag(db, "test-1", "metric=test")
tsdb_add_key_tag(db, "test-2", "server=2")
tsdb_add_key_tag(db, "test-3", "size=small")

For each tag, ensure a tag-XXX entry in the database.

Each tag-XXX db value would be an array of unsorted, unique key indexes.

Each tag-XXX element would be an array. Each element in the array would be a 0 or 1. 1 indicates that the tag applies to the epoch values at the corresponding index. E.g. if position 0 of the tag array contained a 1, it would indicate that the tag applied to epoch values at position 0.

For example, we’ll start by adding a value to an empty database:

tsdb_goto_epoch(db, 60, 0, 0)
tsdb_set(db, "test-1", 100)

We have in our db the following:

key-test-1: 0
60-0: [ 100 ]

And another key:

tsdb_set(db, "test-2", 200)

Our db:

key-test-1: 0
key-test-2: 1
1-0: [ 100, 200 ]

Let’s tag a key:

tsdb_add_key_tag(db, "test-1", "metric=test")

Our db has a new value:

tag-metric=test: [ 1 ]

This indicates that the epoch values stored at index 0 (i.e. key test-1) have the tag “metric=test”.

We’ll do the same for test-2:

tsdb_add_key_tag(db, "test-2", "metric=test")

Now our tag element looks like this:

tag-metric=test: [ 1, 1 ]

Some more tags:

tsdb_add_key_tag(db, "test-1", "server=1")
tsdb_add_key_tag(db, "test-1", "size=small")
tsdb_add_key_tag(db, "test-2", "server=2")
tsdb_add_key_tag(db, "test-2", "size=large")

Here’s the full listing of all tag elements:

tag-metric=test: [ 1, 1 ]
tag-size=small: [ 1, 0 ]
tag-size=large: [ 0, 1 ]
tag-server=1: [ 1, 0 ]
tag-server=2: [ 0, 1 ]

Here’s a query for metric=test & size=small:

“intersection of metric=test and size=small”

This is how we’d answer the query:

  • Load the arrays for the tags metric=test and size=small
  • Apply a logical AND operation for each array cell, up to the length of the shorter array
  • Return values for epoch positions for each resulting 1

E.g.

tag-metric=test: [ 1, 1 ]
tag-size=small: [ 1, 0 ]
AND: [ 1, 0 ]
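
A minimal sketch of the AND step, assuming the tag arrays are plain byte arrays of 0/1 flags (how they’re actually stored is still open):

/* ANDs two tag arrays into out, up to the length of the shorter array.
   Returns the number of cells written; cells set to 1 identify the key
   indexes whose epoch values match the query. */
int tag_and(const u_int8_t *a, int len_a,
            const u_int8_t *b, int len_b,
            u_int8_t *out) {
  int len = len_a < len_b ? len_a : len_b;
  for (int i = 0; i < len; i++) {
    out[i] = a[i] && b[i];
  }
  return len;
}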

We can implement a similar function for the union using OR, e.g.

“union of metric=test and size=small”

tag-metric=test: [ 1, 1 ]
tag-size=small: [ 1, 0 ]
OR: [ 1, 1 ]

More

We can implement NOT operations, e.g.

tsdb_get_by_tag_not_intersection(db, ["metric=cpu", "server_size=small"])
tsdb_get_by_tag_not_union(db, ["metric=cpu", "server_size=small"])

These would be implemented by applying a NOT to the result.

The most flexible scheme would be a simple query language.

Here’s a union of 2 tags. Each tag is preceded by its byte length.

tsdb_query(db, [ "u" 2 10 "metric=cpu" 17 "server_size=small" ])

Let’s say I want all of the cpu metrics for servers that aren’t small:

tsdb_query(db, [ "i" 2 "t" 10 "metric=cpu" "n" "t" 17 "server_size=small" ])

Here’s the key we’re using so far:

i = intersection
u = union
n = not
t = tag

So we’d implement this query in these steps:

  1. Get the epoch indexes for the tag metric=cpu

    tag-metric=cpu: [ 1, 1 ]

  2. Get the epoch indexes for the tag server_size=small

    tag-server_size=small: [ 1, 0 ]

  3. Apply NOT to the result from step 2

    NOT tag-server_size=small: [ 0, 1 ]

  4. Apply AND to the results from steps 1 and 3

    tag-metric=cpu: [ 1, 1 ]
    NOT tag-server_size=small: [ 0, 1 ]
    AND: [ 0, 1 ]

We could obviously use bit arrays, which would reduce our space requirements for tagging by 32x, at the cost of some minor code complexity.
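
For example, packing the flags into 32-bit words (a sketch; the 32x figure assumes each flag currently occupies a 32-bit slot):

/* Set and test a tag flag for a given key index in a packed bit array. */
void tag_bit_set(u_int32_t *bits, u_int32_t index) {
  bits[index / 32] |= (1u << (index % 32));
}

int tag_bit_test(const u_int32_t *bits, u_int32_t index) {
  return (bits[index / 32] >> (index % 32)) & 1u;
}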

We could minimize the storage and IO requirements for these tag indexes further using compression, as is done currently with epoch values.

A possible performance problem with tags is that they have to be read fully from the database, updated, and then stored again. If a tag array didn’t have to be read from the database, it could be updated in constant time.

However, before we worry too much about performance, we should consider the caching that occurs in the database. It may be pointless to cache tag arrays if they’re already being cached by the database. As long as we’re not compressing the array, it might be the case that we have direct access to the database’s value in memory.