Data Restoration
The best kind of replication is one that requires no manual intervention, like ClickHouse's ReplicatedMergeTree. High-availability systems can be constructed that tolerate single-node failures. Sooner or later, maintainers of larger-scale setups also have to consider whole-cluster failures, and those call for some degree of manual data restoration.
At the scale of my time-series deployment, several million messages/sec with several aggregations, I have found it useful to keep a small restoration procedure separate from a big one, as described below.
A quick replacement of a small gap in the data, on the order of tens to hundreds of millions of rows (minutes of my time-series stream). Such a gap may be caused by an awkward node restart or a network glitch. Use ClickHouse Copier or simply a manual DELETE-SELECT-INSERT. I might do:
- ALTER TABLE x DELETE WHERE t BETWEEN t0 AND t1
- SELECT FROM x WHERE t BETWEEN t0 AND t1 and INSERT INTO x, with native protocol network transport between the two using netcat or rsync
- repeat for each table, rounding the (t0, t1) interval appropriately
The rounding is necessary for those of my tables that are time-resolution aggregates.
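A minimal sketch of that patch, assuming a table x with a time column t, hourly aggregation resolution, and hosts src.example / dst.example; every name and bound here is a placeholder, not the exact commands from my setup:

    # round the interval to the table's resolution first, e.g. whole hours for an hourly rollup
    T0="2024-01-01 10:00:00"
    T1="2024-01-01 12:00:00"

    # drop the damaged rows on the destination (an asynchronous mutation;
    # watch system.mutations until it finishes before re-inserting)
    clickhouse-client --host dst.example --query \
      "ALTER TABLE x DELETE WHERE t BETWEEN '$T0' AND '$T1'"

    # stream the same interval from a healthy source and re-insert it,
    # keeping the data in the Native format end to end
    clickhouse-client --host src.example --query \
      "SELECT * FROM x WHERE t BETWEEN '$T0' AND '$T1' FORMAT Native" \
      | clickhouse-client --host dst.example --query "INSERT INTO x FORMAT Native"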
This is an easy (a bunch of SQL), flexible (one can transform the schema in the SELECT statement as desired, plus do arbitrary resharding) and safe (everything goes through the command-line client) procedure. It is slowed down by having to serialise, deserialise and re-merge the inserted data. I might get a million rows/sec on my cluster with magnetic disk drives.
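To illustrate the flexibility point, here is a hypothetical SELECT that both reshapes the data and picks out only the rows destined for one shard of the target cluster; the columns device_id and bytes and the shard count of 4 are made up for the example:

    SELECT
        toStartOfMinute(t) AS t,            -- coarsen time resolution on the way in
        device_id,
        sum(bytes) AS bytes                 -- re-aggregate while copying
    FROM x
    WHERE t BETWEEN '2024-01-01 10:00:00' AND '2024-01-01 12:00:00'
      AND cityHash64(device_id) % 4 = 2     -- only rows for destination shard 2
    GROUP BY t, device_id
    FORMAT Native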
Repair of incidents lasting up to several days. Copy and attach entire merged parts. What I do is (see script example):
- SELECT path FROM system.parts WHERE table = 'x' AND partition BETWEEN t0 AND t1 to list parts in partitions overlapping the given interval, rounded to the table's partition size
- copy those parts from their filesystem location to the destination cluster's table detached directory (I use netcat over TCP)
- ALTER TABLE x DELETE WHERE t BETWEEN t0 AND t1
- ALTER TABLE x ATTACH PART on all copied parts
- repeat for each table
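A rough sketch of those steps, not the actual script, assuming a table x in the default database, the standard /var/lib/clickhouse data path, daily partitions, and a netcat listener already running on the destination; every path, host and port is illustrative:

    # 1. on the source node: list active parts in the affected (day-sized) partitions
    clickhouse-client --query "
      SELECT path FROM system.parts
      WHERE database = 'default' AND table = 'x' AND active
        AND partition BETWEEN '2024-01-01' AND '2024-01-03'" > parts.txt

    # 2. ship the part directories into the destination table's detached/ directory;
    #    on the destination node first run (netcat flag syntax varies by flavour):
    #      cd /var/lib/clickhouse/data/default/x/detached && nc -l 9999 | tar xf -
    tar cf - -C /var/lib/clickhouse/data/default/x \
        $(xargs -n1 basename < parts.txt) | nc dst.example 9999

    # 3. clear the damaged interval on the destination, then attach every copied part
    clickhouse-client --host dst.example --query \
      "ALTER TABLE x DELETE WHERE t BETWEEN '2024-01-01 00:00:00' AND '2024-01-03 23:59:59'"
    while read -r p; do
      clickhouse-client --host dst.example --query "ALTER TABLE x ATTACH PART '$(basename "$p")'"
    done < parts.txt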
Rounding to the partition size accommodates parts that have merged fully, i.e. a single part covering a whole partition.
This procedure is faster than SELECT-INSERT patching; I might get a couple of million rows/sec on my cluster. A decently formulated benchmark would be useful here, and I might add one at some point.
The main disadvantage is the potential for messing something up through direct filesystem access. The schema of the source and destination tables must be identical, with no SQL tweaks possible. Finally, you cannot reshuffle many shards into one, since part names collide.