Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pick up index dimension on first insert for an empty table #251

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

therealdarkknight
Copy link
Contributor

Addresses issue 198 using the approach documented in the discussion of PR 209.

When creating an index with no dimension specified on an empty table, we set the dimension in the index header to 0. Later, during the first insert, we infer the dimension from the inserted first row and then update it in the index header.

… 0 and then later update it after inferring dimension from first insert
Copy link
Collaborator

@var77 var77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Do you think adding this code

isNull = true;
while(isNull) {
    // Get the indexed column out of the row and return it's dimensions
    datum = heap_getattr(tuple, indexCol, RelationGetDescr(heap), &isNull);
    if(!isNull) {
        array = DatumGetArrayTypePCopy(datum);
        n_items = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array));
        break;
    }
    tuple = heap_getnext(scan, ForwardScanDirection);
    if(tuple == NULL) {
        heap_endscan(scan);
        return 0;
    }
}

In line 246 makes sense?

Because now this will fail to infer the dimensions:

create table test_emp (v real[]);
insert into test_emp (v) values (NULL),( '{1,1,1}');
create index on test_emp using hnsw(v);

cc: @Ngalstyan4

@therealdarkknight
Copy link
Contributor Author

Looks good. Do you think adding this code

isNull = true;
while(isNull) {
    // Get the indexed column out of the row and return it's dimensions
    datum = heap_getattr(tuple, indexCol, RelationGetDescr(heap), &isNull);
    if(!isNull) {
        array = DatumGetArrayTypePCopy(datum);
        n_items = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array));
        break;
    }
    tuple = heap_getnext(scan, ForwardScanDirection);
    if(tuple == NULL) {
        heap_endscan(scan);
        return 0;
    }
}

In line 246 makes sense?

Because now this will fail to infer the dimensions:

create table test_emp (v real[]);
insert into test_emp (v) values (NULL),( '{1,1,1}');
create index on test_emp using hnsw(v);

cc: @Ngalstyan4

Yes, great point! I agree with this change. As far as I know, this behavior was present prior to this PR/issue (I just realized I also brought it up earlier, see number 2 in my first PR for this ).

I have some other ideas on how the way we do dimension inference can be refactored-- see number 3 in my first PR above. Since this edit is relevant to index creation on non-empty tables, (and this issue/PR is concerned with index creation on empty tables) I can submit a separate PR with this change and also refactor those functions there.

What do you think?

@var77
Copy link
Collaborator

var77 commented Dec 22, 2023

Yep that will work! Actually maybe we won't need that as per discussion with Narek we decided that the tradeoff of UX doesn't worth the complexity added (counting edge cases as well)

Copy link
Contributor

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small ask, then should be good to go!

begin;
-- Our index then infers the dimension from the first inserted row
INSERT INTO small_world5 (id, v) VALUES
('000', '{1,0,0,0,1}'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you insert a NULL value here first, before inserting anything else, to trigger this case ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants