Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clickhouse-replicated dialect #809

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

chapsuk
Copy link
Contributor

@chapsuk chapsuk commented Aug 20, 2024

This PR introduces a new clickhouse-replicated dialect, and here’s why it’s needed:

  • The ReplicatedMergeTree engine is required.
  • The ON CLUSTER clause is a must for CREATE TABLE queries.
  • You need to set insert_quorum=2 for inserts and select_sequential_consistency=1 for selects.

These differences between a single-node ClickHouse instance and a replicated cluster are why we’re adding this new dialect.

Added the environment variable GOOSE_CLICKHOUSE_CLUSTER_NAME (default: {cluster}) allows to set Clickhouse cluster name

var _ Querier = (*ClickhouseReplicated)(nil)

func (c *ClickhouseReplicated) CreateTable(tableName string) string {
q := `CREATE TABLE IF NOT EXISTS %s ON CLUSTER '{cluster}' (
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to make the macro {cluster} configurable, since it is not standard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @funvit! Thanks for the feedback 👍

I understand the concern about the {cluster} macro not being a standard. While it might not be universally recognized, it is widely used when deploying clusters via the Altinity ClickHouse Operator, and it’s also mentioned in the official ClickHouse documentation.

I believe using the {cluster} macro is a good practice because it simplifies cluster management and cluster queries.

Regarding making it configurable, the reason I didn’t pursue that route is due to the current limitations in the implementation of dialects, which don’t support additional settings. To avoid significant changes, something I’d prefer to minimize when adding a new dialect, the most viable approach is to pass the cluster name via an environment variable, like CLICKHOUSE_REPLICATED_CLUSTER_NAME, defaulting to {cluster}.

What do you think - should we keep using the macro as is, or switch to an environment variable? Or maybe there's a better solution I haven't thought of?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this macro is standard or recommended, but in tests it would be easier to replace it with some custom value.

For example, now I had to implement Store for an older version of goose, and there I just made a Cluster field for ClickhouseReplicated.

This simplified the tests (using testcontainers), because it was no longer necessary to support the configuration of macros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation, it makes sense!

I'm added extra env var GOOSE_CLICKHOUSE_CLUSTER_NAME which allows redefine cluster name and added clickhouse-replicated dialect tests. PTAL

@chapsuk chapsuk force-pushed the clickhouse-replicated branch 2 times, most recently from ee7131e to 8a0af1a Compare September 21, 2024 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants