
DDL recovery for high-level indexes (CREATE, DROP and RENAME) #3520

Open
wants to merge 83 commits into base: bb-11.6-MDEV-32887-vector

Conversation

@svoj (Contributor) commented Sep 13, 2024

  • [] The Jira issue number for this PR is: MDEV-______

Description

Various fixes to make DDL recovery work with high-level indexes.

Release Notes

None.

How can this PR be tested?

mtr

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.
  • This is a bug fix to a new feature and the PR is based against the feature tree.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

vuvova and others added 30 commits September 11, 2024 16:38
as it can never be null (only "" or "disabled")
Bounded_queue<> pretended to be a typesafe C++ wrapper
on top of pure C queues.h.

But it wasn't: it was tightly bound to filesort and only useful there.

* implement Queue<> - a typesafe C++ wrapper on top of QUEUE
* move Bounded_queue to filesort.cc, remove pointless "generalizations",
  change it to use Queue
* remove bounded_queue.h
* change subselect_rowid_merge_engine to use Queue, not QUEUE
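A minimal sketch of what such a typesafe wrapper looks like in spirit; the
C-level queue_t below is a hypothetical stand-in, not the actual queues.h API:

  #include <cstddef>
  #include <cstdlib>

  // Hypothetical C-level priority queue, standing in for QUEUE from queues.h.
  struct queue_t
  {
    void **elements;
    size_t used, capacity;
  };

  // Typesafe C++ wrapper in the spirit of Queue<>: callers deal only in T*,
  // the untyped void* plumbing stays in one place.
  template <typename T>
  class Queue
  {
  public:
    explicit Queue(size_t capacity)
    {
      q.elements= static_cast<void **>(malloc(capacity * sizeof(void *)));
      q.used= 0;
      q.capacity= capacity;
    }
    ~Queue() { free(q.elements); }
    bool push(T *element)
    {
      if (q.used == q.capacity)
        return false;                  // a bounded queue would replace top()
      q.elements[q.used++]= element;   // a real QUEUE would also sift up here
      return true;
    }
    T *top() const
    { return q.used ? static_cast<T *>(q.elements[0]) : nullptr; }
    size_t elements() const { return q.used; }
  private:
    queue_t q;                         // the underlying untyped queue
  };

Bounded_queue in filesort.cc can then be a thin layer on top of such a typed
queue instead of duplicating the void*-casting logic.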
The information about the index algorithm was stored in two
places, inconsistently split between them.

BTREE index could have key->algorithm == HA_KEY_ALG_BTREE if the user
explicitly specified USING BTREE, or HA_KEY_ALG_UNDEF if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set; the HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage, to keep frms backward compatible).

As a side effect, ALTER TABLE now detects FULLTEXT index renames correctly.
needed to get partitioning and information about
secondary objects
…EM VERSIONING"

This partially reverts 43623f0

Engines have to set ::position() after ::write_row(), otherwise
the server won't be able to refer to the row just inserted.
This is important for high-level indexes.

The heap part isn't reverted, so heap doesn't support high-level indexes.
To fix this, it'll need info->lastpos in addition to info->current_ptr.
create templates

  thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)

and the same for thd->calloc(). By default the type is char,
so old usage of thd->alloc(size) works too.
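A rough sketch of how such member templates can look, using a placeholder
allocator instead of the real THD/MEM_ROOT machinery:

  #include <cstddef>
  #include <cstdlib>
  #include <cstring>

  struct Allocator                    // stand-in for THD's memroot allocation
  {
    void *alloc_bytes(size_t size) { return malloc(size); }   // placeholder

    // thd->alloc<X>(n) instead of (X*)thd->alloc(sizeof(X)*n); the default
    // template argument keeps old thd->alloc(size) callers working unchanged.
    template <typename T= char>
    T *alloc(size_t count= 1)
    { return static_cast<T *>(alloc_bytes(count * sizeof(T))); }

    template <typename T= char>
    T *calloc(size_t count= 1)
    {
      T *p= alloc<T>(count);
      if (p)
        memset(p, 0, count * sizeof(T));
      return p;
    }
  };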
let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.

In particular, internal tmp tables created in the slave thread
are perfectly thread specific
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitations:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

Added support for so-called "high-level indexes": they are not visible
to the storage engine and are implemented on the SQL level. For every such
index in a table, say, t1, the server implicitly creates a second
table named like t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no .frm, is not accessible directly,
doesn't go into the table cache, and needs no MDLs.
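A hypothetical helper composing the hidden table name from the base table
name and the index number, following the t1#i#05 pattern quoted above (the
real naming and escaping rules in the server may differ):

  #include <cstdio>
  #include <string>

  std::string hlindex_table_name(const std::string &table, unsigned index_no)
  {
    char suffix[16];
    // two-digit index number, as in "t1#i#05"
    snprintf(suffix, sizeof(suffix), "#i#%02u", index_no);
    return table + suffix;
  }

  // hlindex_table_name("t1", 5) == "t1#i#05"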

MDEV-33406 basic optimizer support for k-NN searches

For a query like SELECT ... ORDER BY func() the optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
also add const to methods in List<> and Hash_set<>
while we're at it
This commit includes the work done in collaboration with Hugo Wen from
Amazon:

    MDEV-33408 Alter HNSW graph storage and fix memory leak

    This commit changes the way HNSW graph information is stored in the
    second table. Instead of storing connections as separate records, it now
    stores neighbors for each node, leading to significant performance
    improvements and storage savings.

    Compared with the previous approach, the insert speed is 5 times faster,
    search speed improves by 23%, and storage usage is reduced by 73%, based
    on ann-benchmark tests with the random-xs-20-euclidean and
    random-s-100-euclidean datasets.

    Additionally, in the previous code vector objects were not released after
    use, resulting in excessive memory consumption (over 20GB for building
    the index with 90,000 records) and preventing tests with large datasets.
    Now vectors are released appropriately during the insert and
    search functions. Note there are still some vectors that need to be
    cleaned up after search query completion; this needs to be addressed in a
    future commit.

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

As well as the commit:

    Introduce session variables to manage HNSW index parameters

    Three variables:

    hnsw_max_connection_per_layer
    hnsw_ef_constructor
    hnsw_ef_search

    ann-benchmark tool is also updated to support these variables in commit
    HugoWenTD/ann-benchmarks@e09784e for branch
    https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

Co-authored-by: Hugo Wen <[email protected]>
* sysvars should be REQUIRED_ARG
* fix a mix of US and UK spelling (use US)
* use consistent naming
* work if VEC_DISTANCE arguments are in the swapped order (const, col)
* work if VEC_DISTANCE argument is NULL/invalid or wrong length
* abort INSERT if the value is invalid or wrong length
* store the "number of neighbors" in a blob in endianness-independent way
* use field->store(longlong, bool) not field->store(double)
* a lot more error checking everywhere
* cleanup after errors
* simplify calling conventions, remove reinterpret_cast's
* todo/XXX comments
* whitespaces
* use float consistently

memory management is still totally PoC quality
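For the "number of neighbors" item in the list above, a minimal sketch of
storing the count in a blob in an endianness-independent way; the server
presumably uses its int2store/uint2korr-style macros, this only shows the idea:

  #include <cstdint>

  // Store the count low byte first, regardless of the host CPU's byte order,
  // so the blob reads back the same on any architecture.
  inline void store_neighbor_count(unsigned char *blob, uint16_t n)
  {
    blob[0]= static_cast<unsigned char>(n & 0xff);
    blob[1]= static_cast<unsigned char>(n >> 8);
  }

  inline uint16_t read_neighbor_count(const unsigned char *blob)
  {
    return static_cast<uint16_t>(blob[0] | (blob[1] << 8));
  }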

Initial HNSW implementation
instead of pointers to FVectorRef's (which are stored elsewhere)
let's return one big array of all refs. Freeing this array will
free the complete result set.
move everything into a query-local memroot which is freed at the end
svoj and others added 22 commits September 12, 2024 19:06
This patch fixes only the TRUNCATE by recreate variant; there seems to be no
reasonable engine that uses the TRUNCATE by handler method for testing.

Reset index_cinfo so that mi_create is not confused by garbage passed via
index_file_name and sets the MY_DELETE_OLD flag.

Review question: can we add a test case to make sure the VECTOR index is
indeed empty?
Rename high-level indexes along with a table.
1. randomize all vectors via multiplication by a random orthogonal
   matrix
   * to generate the matrix fill the square matrix with normally
     distributed random values and create an orthogonal matrix with
     the QR decomposition
   * the rnd generator is seeded with the number of dimensions,
     so the matrix will always be the same for a given table
   * multiplication by an orthogonal matrix is a "rotation", so
     does not change distances or angles
2. when calculating the distance, first calculate a "subdistance",
   the distance between projections to the first subdist_part
   coordinates (=192, best by test, if it's larger it's less efficient,
   if it's smaller the error rate is too high)
3. calculate the full distance only if "subdistance" isn't confidently
   higher (above subdist_margin) than the distance we're comparing with
   * it might look like it would make sense to do a second projection
     at, say, subdist_part*2, and so on - but in practice one check
     is enough, the projected distance converges quickly and if it
     isn't confidently higher at subdist_part, it won't be later either

This optimization introduces a constant overhead per insert/search
operation - an input/query vector has to be multiplied by the matrix.
And the optimization saves on every distance calculation. Thus it is only
beneficial when the number of distance calculations (which grows with M
and with the table size) is high enough to outweigh the constant
overhead. Let's use the MIN_ROWS table option to estimate the number of rows
in the table. use_subdist_heuristic() is optimal for mnist and
fashion-mnist (784 dimensions, 60k rows) and variations of gist (960
dimensions, 200k, 400k, 600k, 800k, 1000k rows).
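A self-contained sketch of the heuristic described above: the orthogonal
matrix is the Q factor of a QR decomposition of a Gaussian random matrix
(computed here with modified Gram-Schmidt), and distances are squared
Euclidean. subdist_part and subdist_margin follow the commit message; the
margin value and the exact comparison are assumptions.

  #include <algorithm>
  #include <cmath>
  #include <random>
  #include <vector>

  using Matrix= std::vector<std::vector<float>>;

  // Build an n x n random orthogonal matrix, seeded with the dimension count
  // so the same table always gets the same rotation.
  Matrix random_orthogonal(size_t n)
  {
    std::mt19937 rnd(static_cast<unsigned>(n));
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    Matrix q(n, std::vector<float>(n));
    for (auto &row : q)
      for (auto &v : row)
        v= gauss(rnd);
    for (size_t i= 0; i < n; i++)             // modified Gram-Schmidt
    {
      for (size_t j= 0; j < i; j++)
      {
        float dot= 0;
        for (size_t k= 0; k < n; k++)
          dot+= q[i][k] * q[j][k];
        for (size_t k= 0; k < n; k++)
          q[i][k]-= dot * q[j][k];
      }
      float norm= 0;
      for (float v : q[i])
        norm+= v * v;
      norm= std::sqrt(norm);
      for (float &v : q[i])
        v/= norm;
    }
    return q;
  }

  // Rotation preserves distances and angles, so search results don't change.
  std::vector<float> rotate(const Matrix &q, const std::vector<float> &v)
  {
    std::vector<float> r(v.size(), 0.0f);
    for (size_t i= 0; i < v.size(); i++)
      for (size_t k= 0; k < v.size(); k++)
        r[i]+= q[i][k] * v[k];
    return r;
  }

  // Two-stage distance: sum the first subdist_part coordinates; finish the
  // sum only if that partial distance is not already confidently above the
  // best (squared) distance we're comparing with.
  float distance_with_subdist(const std::vector<float> &a,
                              const std::vector<float> &b,
                              float best_so_far,
                              size_t subdist_part= 192,
                              float subdist_margin= 1.1f) // margin is assumed
  {
    float d= 0;
    size_t part= std::min(subdist_part, a.size());
    for (size_t i= 0; i < part; i++)
      d+= (a[i] - b[i]) * (a[i] - b[i]);
    if (d > best_so_far * subdist_margin)
      return d;                               // confidently worse: stop early
    for (size_t i= part; i < a.size(); i++)
      d+= (a[i] - b[i]) * (a[i] - b[i]);
    return d;
  }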
remove unused methods, reorder methods, add comments
into a separate transaction_participant structure

handlerton inherits it, so handlerton itself doesn't change.
But entities that only need to participate in a transaction,
like the binlog or the online alter log, use a transaction_participant
and no longer need to pretend to be a full-blown but invisible
storage engine that doesn't support create table.
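A rough sketch of the split described above; only the names
transaction_participant and handlerton come from the commit message, the
members shown are hypothetical placeholders:

  class THD;  // server connection context (forward declaration for the sketch)

  struct transaction_participant
  {
    int (*commit)(THD *thd, bool all);     // take part in commit
    int (*rollback)(THD *thd, bool all);   // take part in rollback
  };

  // A full storage engine still exposes everything, including DDL entry points.
  struct handlerton : public transaction_participant
  {
    int (*create_table)(THD *thd, const char *name);  // placeholder member
  };

  // The binlog or the online alter log can now register a bare
  // transaction_participant instead of posing as an invisible storage engine.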
and create a parent Item_func_vec_distance_common class
it's measurably faster even in items
Fixes for ALTER TABLE ... ADD/DROP COLUMN, ALGORITHM=COPY.

Let quick_rm_table() remove high-level indexes along with original table.

Avoid locking uninitialized LOCK_share for INTERNAL_TMP_TABLEs.

Don't enable bulk insert when altering a table containing a vector index.
InnoDB can't handle the situation when bulk insert is enabled for one table
but disabled for another. We can't do bulk insert on a vector index as it
currently does table updates.
Disable non-copy ALTER algorithms when a VECTOR index is affected. Engines
are not supposed to handle high-level indexes anyway.

Also fixed misbehaving IF [NOT] EXISTS variants.
quick_rm_table() expects the .frm to exist when it removes high-level indexes.
For cases like ALTER TABLE t1 RENAME TO t2, ENGINE=other_engine the .frm was
removed earlier.

Another option would be removing high-level indexes explicitly before the
first quick_rm_table() and skipping high-level indexes for subsequent
quick_rm_table(NO_FRM_RENAME).

But the suggested order may also help with DDL log recovery: if we crash
before high-level indexes are removed, the .frm is going to exist.
Replaced obscure FRM_ONLY, NO_FRM_RENAME, NO_HA_TABLE, NO_PAR_TABLE with
straightforward explicit flags:

QRMT_FRM - [re]moves .frm
QRMT_PAR - [re]moves .par
QRMT_HANDLER - calls ha_delete_table()/ha_rename_table() and [re]moves
               high-level indexes
QRMT_DEFAULT - same as QRMT_FRM | QRMT_HANDLER, which is regular table
               drop/rename.
also: renames, s/const/constexpr/ for consistency
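A minimal sketch of the flag scheme described above; the numeric values and
the quick_rm_table() signature are illustrative, not copied from the server:

  using qrmt_flags= unsigned;

  constexpr qrmt_flags QRMT_FRM=     1;  // [re]move the .frm file
  constexpr qrmt_flags QRMT_PAR=     2;  // [re]move the .par file
  constexpr qrmt_flags QRMT_HANDLER= 4;  // ha_delete_table()/ha_rename_table()
                                         // plus high-level indexes
  constexpr qrmt_flags QRMT_DEFAULT= QRMT_FRM | QRMT_HANDLER; // regular
                                                              // drop/rename

  // hypothetical call sites:
  //   quick_rm_table(thd, db, name, QRMT_DEFAULT);  // ordinary DROP TABLE
  //   quick_rm_table(thd, db, name, QRMT_FRM);      // remove only the .frm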
Covers CREATE, DROP and RENAME.
... until a few bugs that cause server crash are fixed.
@svoj requested a review from vuvova on September 13, 2024 16:03
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Sergey Vojtovich seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

file->ha_rename_table(idx_from, idx_to);
}
}

Member


I'm not sure this is a good approach. It feels like you put too much inside one DDL_RENAME_PHASE_TABLE entry. There can be crashes between renames of individual vector indexes (presuming there will be many some day).

Perhaps, DDL_RENAME_PHASE_TABLE should be one ha_rename_table only? That is, a table with hlindexes would generate one DDL_RENAME_PHASE_TABLE entry per index (and one for the main table, of course)?

Member


Same for DROP, etc.

Contributor Author

@svoj Sep 14, 2024


There can be crashes between renames of individual vector indexes (presuming there will be many some day).

Yes, sure. And my code handles it in line with what the original code has been doing, that is, it ignores non_existing_table_errors(), e.g. if handler files were moved and the .frm was not. Or, as you suggested above, a crash between renames of different high-level indexes.

... a table with hlindexes would generate one DDL_RENAME_PHASE_TABLE entry per index (and one for the main table, of course)?

  1. PHASE is just an enum member of an ACTION; it can't have stuff like a table name or whatever attached.
  2. ACTION is a more logical (or high-level) entity, which may do many things like binlogging, handling stat tables, triggers, handler files and of course the .frm.

Here are the options:

  1. We could add special phases to handle high-level indexes, but there's not going to be much difference from my solution, and it will introduce compatibility issues. Though it is very true that older versions won't be able to handle high-level indexes properly even with this patch.
  2. We can't re-use existing actions like DDL_LOG_RENAME_TABLE_ACTION; as I mentioned before, they're way too sophisticated for high-level indexes.
  3. In theory we could try re-using special actions like DDL_LOG_RENAME_ACTION for high-level indexes. That is, issue one DDL_LOG_RENAME_TABLE_ACTION and then a DDL_LOG_RENAME_ACTION for each high-level index. It may work and has no compatibility concerns, but it is a question whether we're going to be able to integrate it with e.g. ALTER.

Your thoughts?

file->ha_rename_table(idx_from, idx_to);
}
}

Member


Same for DROP, etc.

mysql-test/main/AAA.test (conversation resolved)
@vuvova force-pushed the bb-11.6-MDEV-32887-vector branch 2 times, most recently from fd141e4 to 5d724ba on September 21, 2024 20:12
Labels: None yet

5 participants