
DDL recovery for high-level indexes (CREATE, DROP and RENAME) #3520

Open
wants to merge 83 commits into base: bb-11.6-MDEV-32887-vector

Conversation

@svoj (Contributor) commented Sep 13, 2024

  • [] The Jira issue number for this PR is: MDEV-______

Description

Various fixes to make DDL recovery work with high-level indexes.

Release Notes

None.

How can this PR be tested?

mtr

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.
  • This is a bug fix to a new feature and the PR is based against the feature tree.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

vuvova and others added 30 commits September 11, 2024 16:38
as it can never be null (only "" or "disabled")
Bounded_queue<> pretended to be a typesafe C++ wrapper
on top of pure C queues.h.

But it wasn't: it was tightly bound to filesort and only useful there.

* implement Queue<> - a typesafe C++ wrapper on top of QUEUE
* move Bounded_queue to filesort.cc, remove pointless "generalizations",
  change it to use Queue
* remove bounded_queue.h
* change subselect_rowid_merge_engine to use Queue, not QUEUE
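A minimal sketch of what such a typesafe wrapper looks like in spirit; the
C-level queue_t below is a hypothetical stand-in, not the actual queues.h API:

  #include <cstddef>
  #include <cstdlib>

  // Hypothetical C-level priority queue, standing in for QUEUE from queues.h.
  struct queue_t
  {
    void **elements;
    size_t used, capacity;
  };

  // Typesafe C++ wrapper in the spirit of Queue<>: callers deal only in T*,
  // the untyped void* plumbing stays in one place.
  template <typename T>
  class Queue
  {
  public:
    explicit Queue(size_t capacity)
    {
      q.elements= static_cast<void **>(malloc(capacity * sizeof(void *)));
      q.used= 0;
      q.capacity= capacity;
    }
    ~Queue() { free(q.elements); }
    bool push(T *element)
    {
      if (q.used == q.capacity)
        return false;                  // a bounded queue would replace top()
      q.elements[q.used++]= element;   // a real QUEUE would also sift up here
      return true;
    }
    T *top() const
    { return q.used ? static_cast<T *>(q.elements[0]) : nullptr; }
    size_t elements() const { return q.used; }
  private:
    queue_t q;                         // the underlying untyped queue
  };

Bounded_queue in filesort.cc can then be a thin layer on top of such a typed
queue instead of duplicating the void*-casting logic.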
The information about the index algorithm was stored in two
places, inconsistently split between them.

BTREE index could have key->algorithm == HA_KEY_ALG_BTREE if the user
explicitly specified USING BTREE, or HA_KEY_ALG_UNDEF if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set; the HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage, to keep frms backward compatible).

As a side effect, ALTER TABLE now detects FULLTEXT index renames correctly.
needed to get partitioning and information about
secondary objects
…EM VERSIONING"

This partially reverts 43623f0

Engines have to set ::position() after ::write_row(), otherwise
the server won't be able to refer to the row just inserted.
This is important for high-level indexes.

The heap part isn't reverted, so heap doesn't support high-level indexes.
To fix this, it'll need info->lastpos in addition to info->current_ptr.
create templates

  thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)

and the same for thd->calloc(). By default the type is char,
so old usage of thd->alloc(size) works too.
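A rough sketch of how such member templates can look, using a placeholder
allocator instead of the real THD/MEM_ROOT machinery:

  #include <cstddef>
  #include <cstdlib>
  #include <cstring>

  struct Allocator                    // stand-in for THD's memroot allocation
  {
    void *alloc_bytes(size_t size) { return malloc(size); }   // placeholder

    // thd->alloc<X>(n) instead of (X*)thd->alloc(sizeof(X)*n); the default
    // template argument keeps old thd->alloc(size) callers working unchanged.
    template <typename T= char>
    T *alloc(size_t count= 1)
    { return static_cast<T *>(alloc_bytes(count * sizeof(T))); }

    template <typename T= char>
    T *calloc(size_t count= 1)
    {
      T *p= alloc<T>(count);
      if (p)
        memset(p, 0, count * sizeof(T));
      return p;
    }
  };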
let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.

In particular, internal tmp tables created in the slave thread
are perfectly thread specific
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitations:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

Added support for so-called "high-level indexes": they are not visible
to the storage engine and are implemented on the SQL level. For every such
index in a table, say, t1, the server implicitly creates a second
table named like t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no .frm, is not accessible directly,
doesn't go into the table cache, and needs no MDLs.
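A hypothetical helper composing the hidden table name from the base table
name and the index number, following the t1#i#05 pattern quoted above (the
real naming and escaping rules in the server may differ):

  #include <cstdio>
  #include <string>

  std::string hlindex_table_name(const std::string &table, unsigned index_no)
  {
    char suffix[16];
    // two-digit index number, as in "t1#i#05"
    snprintf(suffix, sizeof(suffix), "#i#%02u", index_no);
    return table + suffix;
  }

  // hlindex_table_name("t1", 5) == "t1#i#05"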

MDEV-33406 basic optimizer support for k-NN searches

For a query like SELECT ... ORDER BY func() the optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
also add const to methods in List<> and Hash_set<>
while we're at it
This commit includes the work done in collaboration with Hugo Wen from
Amazon:

    MDEV-33408 Alter HNSW graph storage and fix memory leak

    This commit changes the way HNSW graph information is stored in the
    second table. Instead of storing connections as separate records, it now
    stores neighbors for each node, leading to significant performance
    improvements and storage savings.

    Compared with the previous approach, the insert speed is 5 times faster,
    search speed improves by 23%, and storage usage is reduced by 73%, based
    on ann-benchmark tests with the random-xs-20-euclidean and
    random-s-100-euclidean datasets.

    Additionally, in the previous code vector objects were not released after
    use, resulting in excessive memory consumption (over 20GB for building
    the index with 90,000 records) and preventing tests with large datasets.
    Now vectors are released appropriately during the insert and
    search functions. Note there are still some vectors that need to be
    cleaned up after search query completion; this needs to be addressed in a
    future commit.

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

As well as the commit:

    Introduce session variables to manage HNSW index parameters

    Three variables:

    hnsw_max_connection_per_layer
    hnsw_ef_constructor
    hnsw_ef_search

    ann-benchmark tool is also updated to support these variables in commit
    HugoWenTD/ann-benchmarks@e09784e for branch
    https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

Co-authored-by: Hugo Wen <[email protected]>
* sysvars should be REQUIRED_ARG
* fix a mix of US and UK spelling (use US)
* use consistent naming
* work if VEC_DISTANCE arguments are in the swapped order (const, col)
* work if VEC_DISTANCE argument is NULL/invalid or wrong length
* abort INSERT if the value is invalid or wrong length
* store the "number of neighbors" in a blob in endianness-independent way
* use field->store(longlong, bool) not field->store(double)
* a lot more error checking everywhere
* cleanup after errors
* simplify calling conventions, remove reinterpret_cast's
* todo/XXX comments
* whitespaces
* use float consistently

memory management is still totally PoC quality
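For the "number of neighbors" item in the list above, a minimal sketch of
storing the count in a blob in an endianness-independent way; the server
presumably uses its int2store/uint2korr-style macros, this only shows the idea:

  #include <cstdint>

  // Store the count low byte first, regardless of the host CPU's byte order,
  // so the blob reads back the same on any architecture.
  inline void store_neighbor_count(unsigned char *blob, uint16_t n)
  {
    blob[0]= static_cast<unsigned char>(n & 0xff);
    blob[1]= static_cast<unsigned char>(n >> 8);
  }

  inline uint16_t read_neighbor_count(const unsigned char *blob)
  {
    return static_cast<uint16_t>(blob[0] | (blob[1] << 8));
  }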

Initial HNSW implementation
instead of pointers to FVectorRef's (which are stored elsewhere)
let's return one big array of all refs. Freeing this array will
free the complete result set.
move everything into a query-local memroot which is freed at the end
svoj and others added 22 commits September 12, 2024 19:06
This patch fixes only the TRUNCATE by recreate variant; there seems to be no
reasonable engine that uses the TRUNCATE by handler method for testing.

Reset index_cinfo so that mi_create is not confused by garbage passed via
index_file_name and sets the MY_DELETE_OLD flag.

Review question: can we add a test case to make sure the VECTOR index is
indeed empty?
Rename high-level indexes along with a table.
1. randomize all vectors via multiplication by a random orthogonal
   matrix
   * to generate the matrix fill the square matrix with normally
     distributed random values and create an orthogonal matrix with
     the QR decomposition
   * the rnd generator is seeded with the number of dimensions,
     so the matrix will always be the same for a given table
   * multiplication by an orthogonal matrix is a "rotation", so
     does not change distances or angles
2. when calculating the distance, first calculate a "subdistance",
   the distance between projections to the first subdist_part
   coordinates (=192, best by test, if it's larger it's less efficient,
   if it's smaller the error rate is too high)
3. calculate the full distance only if "subdistance" isn't confidently
   higher (above subdist_margin) than the distance we're comparing with
   * it might look like it would make sense to do a second projection
     at, say, subdist_part*2, and so on - but in practice one check
     is enough, the projected distance converges quickly and if it
     isn't confidently higher at subdist_part, it won't be later either

This optimization introduces a constant overhead per insert/search
operation - an input/query vector has to be multiplied by the matrix.
And the optimization saves on every distance calculation. Thus it is only
beneficial when the number of distance calculations (which grows with M
and with the table size) is high enough to outweigh the constant
overhead. Let's use the MIN_ROWS table option to estimate the number of rows
in the table. use_subdist_heuristic() is optimal for mnist and
fashion-mnist (784 dimensions, 60k rows) and variations of gist (960
dimensions, 200k, 400k, 600k, 800k, 1000k rows).
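A self-contained sketch of the heuristic described above: the orthogonal
matrix is the Q factor of a QR decomposition of a Gaussian random matrix
(computed here with modified Gram-Schmidt), and distances are squared
Euclidean. subdist_part and subdist_margin follow the commit message; the
margin value and the exact comparison are assumptions.

  #include <algorithm>
  #include <cmath>
  #include <random>
  #include <vector>

  using Matrix= std::vector<std::vector<float>>;

  // Build an n x n random orthogonal matrix, seeded with the dimension count
  // so the same table always gets the same rotation.
  Matrix random_orthogonal(size_t n)
  {
    std::mt19937 rnd(static_cast<unsigned>(n));
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    Matrix q(n, std::vector<float>(n));
    for (auto &row : q)
      for (auto &v : row)
        v= gauss(rnd);
    for (size_t i= 0; i < n; i++)             // modified Gram-Schmidt
    {
      for (size_t j= 0; j < i; j++)
      {
        float dot= 0;
        for (size_t k= 0; k < n; k++)
          dot+= q[i][k] * q[j][k];
        for (size_t k= 0; k < n; k++)
          q[i][k]-= dot * q[j][k];
      }
      float norm= 0;
      for (float v : q[i])
        norm+= v * v;
      norm= std::sqrt(norm);
      for (float &v : q[i])
        v/= norm;
    }
    return q;
  }

  // Rotation preserves distances and angles, so search results don't change.
  std::vector<float> rotate(const Matrix &q, const std::vector<float> &v)
  {
    std::vector<float> r(v.size(), 0.0f);
    for (size_t i= 0; i < v.size(); i++)
      for (size_t k= 0; k < v.size(); k++)
        r[i]+= q[i][k] * v[k];
    return r;
  }

  // Two-stage distance: sum the first subdist_part coordinates; finish the
  // sum only if that partial distance is not already confidently above the
  // best (squared) distance we're comparing with.
  float distance_with_subdist(const std::vector<float> &a,
                              const std::vector<float> &b,
                              float best_so_far,
                              size_t subdist_part= 192,
                              float subdist_margin= 1.1f) // margin is assumed
  {
    float d= 0;
    size_t part= std::min(subdist_part, a.size());
    for (size_t i= 0; i < part; i++)
      d+= (a[i] - b[i]) * (a[i] - b[i]);
    if (d > best_so_far * subdist_margin)
      return d;                               // confidently worse: stop early
    for (size_t i= part; i < a.size(); i++)
      d+= (a[i] - b[i]) * (a[i] - b[i]);
    return d;
  }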
remove unused methods, reorder methods, add comments
into a separate transaction_participant structure

handlerton inherits it, so handlerton itself doesn't change.
But entities that only need to participate in a transaction,
like the binlog or the online alter log, use a transaction_participant
and no longer need to pretend to be a full-blown but invisible
storage engine that doesn't support create table.
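A rough sketch of the split described above; only the names
transaction_participant and handlerton come from the commit message, the
members shown are hypothetical placeholders:

  class THD;  // server connection context (forward declaration for the sketch)

  struct transaction_participant
  {
    int (*commit)(THD *thd, bool all);     // take part in commit
    int (*rollback)(THD *thd, bool all);   // take part in rollback
  };

  // A full storage engine still exposes everything, including DDL entry points.
  struct handlerton : public transaction_participant
  {
    int (*create_table)(THD *thd, const char *name);  // placeholder member
  };

  // The binlog or the online alter log can now register a bare
  // transaction_participant instead of posing as an invisible storage engine.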
and create a parent Item_func_vec_distance_common class
it's measurably faster even in items
Fixes for ALTER TABLE ... ADD/DROP COLUMN, ALGORITHM=COPY.

Let quick_rm_table() remove high-level indexes along with original table.

Avoid locking uninitialized LOCK_share for INTERNAL_TMP_TABLEs.

Don't enable bulk insert when altering a table containing a vector index.
InnoDB can't handle the situation when bulk insert is enabled for one table
but disabled for another. We can't do bulk insert on a vector index as it
currently does table updates.
Disable non-copy ALTER algorithms when a VECTOR index is affected. Engines
are not supposed to handle high-level indexes anyway.

Also fixed misbehaving IF [NOT] EXISTS variants.
quick_rm_table() expects the .frm to exist when it removes high-level indexes.
For cases like ALTER TABLE t1 RENAME TO t2, ENGINE=other_engine the .frm was
removed earlier.

Another option would be removing high-level indexes explicitly before the
first quick_rm_table() and skipping high-level indexes for subsequent
quick_rm_table(NO_FRM_RENAME).

But the suggested order may also help with DDL log recovery: if we crash
before high-level indexes are removed, the .frm is going to exist.
Replaced obscure FRM_ONLY, NO_FRM_RENAME, NO_HA_TABLE, NO_PAR_TABLE with
straightforward explicit flags:

QRMT_FRM - [re]moves .frm
QRMT_PAR - [re]moves .par
QRMT_HANDLER - calls ha_delete_table()/ha_rename_table() and [re]moves
               high-level indexes
QRMT_DEFAULT - same as QRMT_FRM | QRMT_HANDLER, which is regular table
               drop/rename.
also: renames, s/const/constexpr/ for consistency
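A minimal sketch of the flag scheme described above; the numeric values and
the quick_rm_table() signature are illustrative, not copied from the server:

  using qrmt_flags= unsigned;

  constexpr qrmt_flags QRMT_FRM=     1;  // [re]move the .frm file
  constexpr qrmt_flags QRMT_PAR=     2;  // [re]move the .par file
  constexpr qrmt_flags QRMT_HANDLER= 4;  // ha_delete_table()/ha_rename_table()
                                         // plus high-level indexes
  constexpr qrmt_flags QRMT_DEFAULT= QRMT_FRM | QRMT_HANDLER; // regular
                                                              // drop/rename

  // hypothetical call sites:
  //   quick_rm_table(thd, db, name, QRMT_DEFAULT);  // ordinary DROP TABLE
  //   quick_rm_table(thd, db, name, QRMT_FRM);      // remove only the .frm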
Covers CREATE, DROP and RENAME.
... until a few bugs that cause server crash are fixed.
@svoj requested a review from vuvova on September 13, 2024 16:03
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Sergey Vojtovich seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

file->ha_rename_table(idx_from, idx_to);
}
}

Member


I'm not sure this is a good approach. It feels like you put too much inside one DDL_RENAME_PHASE_TABLE entry. There can be crashes between renames of individual vector indexes (presuming there will be many some day).

Perhaps, DDL_RENAME_PHASE_TABLE should be one ha_rename_table only? That is, a table with hlindexes would generate one DDL_RENAME_PHASE_TABLE entry per index (and one for the main table, of course)?

Member


Same for DROP, etc.

Contributor Author

@svoj Sep 14, 2024


There can be crashes between renames of individual vector indexes (presuming there will be many some day).

Yes, sure. And my code handles it in line with what the original code has been doing, that is, it ignores non_existing_table_errors(), e.g. if handler files were moved and the .frm was not. Or, as you suggested above, a crash between renames of different high-level indexes.

... a table with hlindexes would generate one DDL_RENAME_PHASE_TABLE entry per index (and one for the main table, of course)?

  1. PHASE is just an enum member of an ACTION; it can't have stuff like a table name or whatever attached.
  2. ACTION is a more logical (or high-level) entity, which may do many things like binlogging, handling stat tables, triggers, handler files and of course the .frm.

Here are the options:

  1. We could add special phases to handle high-level indexes, but there's not going to be much difference from my solution, and it will introduce compatibility issues. Though it is very true that older versions won't be able to handle high-level indexes properly even with this patch.
  2. We can't re-use existing actions like DDL_LOG_RENAME_TABLE_ACTION; as I mentioned before, they're way too sophisticated for high-level indexes.
  3. In theory we could try re-using special actions like DDL_LOG_RENAME_ACTION for high-level indexes. That is, issue one DDL_LOG_RENAME_TABLE_ACTION and then a DDL_LOG_RENAME_ACTION for each high-level index. It may work and has no compatibility concerns, but it is a question whether we're going to be able to integrate it with e.g. ALTER.

Your thoughts?

file->ha_rename_table(idx_from, idx_to);
}
}

Member


Same for DROP, etc.

mysql-test/main/AAA.test (conversation resolved)
@vuvova force-pushed the bb-11.6-MDEV-32887-vector branch 2 times, most recently from fd141e4 to 5d724ba on September 21, 2024 20:12
Labels: None yet

5 participants