Implement Connection.predefine_table_schema (#422) #749

shmygol · 2020-01-30T08:11:29Z

Implement Connection.predefine_table_schema, which allows manually populating the cached table schemas (Connection._tables) and helps avoid DescribeTable requests, which take some time.

The method is integrated into TableConnection, which is used by Model. This means models no longer send DescribeTable requests before querying data, but Model.describe_table still sends a real request every time it's called.
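As a rough illustration of the caching behaviour described above, here is a minimal sketch. It is not the code from this PR: `CachingConnection`, `fake_describe_table` and the `requested` list are hypothetical stand-ins, and only the method names `predefine_table_schema` and `get_meta_table` and the `_tables` attribute are taken from the discussion.

```python
from typing import Any, Callable, Dict

Schema = Dict[str, Any]


class CachingConnection:
    """Toy stand-in for Connection; not PynamoDB's actual implementation."""

    def __init__(self, describe_table: Callable[[str], Schema]) -> None:
        # describe_table is a hypothetical callable that performs the real request.
        self._describe_table = describe_table
        self._tables: Dict[str, Schema] = {}

    def predefine_table_schema(self, table_name: str, schema: Schema) -> None:
        # Seed the cache so that no DescribeTable request is needed later.
        self._tables[table_name] = schema

    def get_meta_table(self, table_name: str) -> Schema:
        # Fall back to the (slow) request only on a cache miss.
        if table_name not in self._tables:
            self._tables[table_name] = self._describe_table(table_name)
        return self._tables[table_name]


requested = []  # records every simulated DescribeTable request


def fake_describe_table(table_name: str) -> Schema:
    requested.append(table_name)
    return {"TableName": table_name}


connection = CachingConnection(fake_describe_table)
connection.predefine_table_schema("Thread", {"TableName": "Thread"})
connection.get_meta_table("Thread")  # served from the predefined schema
connection.get_meta_table("Forum")   # cache miss: one simulated request
connection.get_meta_table("Forum")   # cached by the first lookup
print(requested)
```

In this sketch only the table without a predefined schema triggers the simulated request, and it does so once.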

Provide existing model schema to a table connection in model to reduce DescribeTable requests

Avoid using unittest.TestCase.subTest, which is supported only since version 3.4

shmygol · 2020-01-30T08:11:38Z

I didn't do any refactoring, to avoid an unnecessarily big pull request, but I'd suggest that Model._get_schema and Model._get_indexes return some keys in camel case instead of underscore-separated, to avoid transforming the data over and over again. Example:

        self.test_predefined_schema = {
            "attribute_definitions": [
                {
                    "AttributeName": "ForumName",
                    "AttributeType": "S"
                },
                {
                    "AttributeName": "LastPostDateTime",
                    "AttributeType": "S"
                },
                {
                    "AttributeName": "Subject",
                    "AttributeType": "S"
                }
            ],
            "key_schema": [
                {
                    "AttributeName": "ForumName",
                    "KeyType": "HASH"
                },
                {
                    "AttributeName": "Subject",
                    "KeyType": "RANGE"
                }
            ],
            "global_secondary_indexes": [
                {
                    "IndexName": "LastPostIndex",
                    "KeySchema": [
                        {
                            "AttributeName": "ForumName",
                            "KeyType": "HASH"
                        },
                        {
                            "AttributeName": "LastPostDateTime",
                            "KeyType": "RANGE"
                        }
                    ],
                    "Projection": {
                        "ProjectionType": "KEYS_ONLY"
                    }
                }
            ],
            "local_secondary_indexes": [
                {
                    "IndexName": "LastPostIndex",
                    "KeySchema": [
                        {
                            "AttributeName": "ForumName",
                            "KeyType": "HASH"
                        },
                        {
                            "AttributeName": "LastPostDateTime",
                            "KeyType": "RANGE"
                        }
                    ],
                    "projection": {
                        "ProjectionType": "KEYS_ONLY"
                    }
                }
            ],
        }

I could do it if there are no objections.
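For context, the repeated transformation mentioned above might look something like the following sketch. It assumes the getters currently return underscore-separated keys such as `attribute_name` (an assumption inferred from the comment, not shown in this thread); `to_camel_case` and `camelize_keys` are hypothetical helpers, not PynamoDB functions.

```python
from typing import Any


def to_camel_case(name: str) -> str:
    # "attribute_name" -> "AttributeName"; names already in camel case are left unchanged.
    return "".join(part[:1].upper() + part[1:] for part in name.split("_"))


def camelize_keys(value: Any) -> Any:
    # Recursively rename dict keys; values other than dicts and lists are returned as-is.
    if isinstance(value, dict):
        return {to_camel_case(key): camelize_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [camelize_keys(item) for item in value]
    return value


# Assumed shape of an underscore-keyed entry (hypothetical input).
key_schema = [{"attribute_name": "ForumName", "key_type": "HASH"}]
print(camelize_keys(key_schema))
```

Returning the camel-case keys directly from the getters would make a conversion step like this unnecessary.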

shmygol · 2020-03-08T20:11:57Z

Does anybody have an idea when the PR is going to be reviewed? Thanks!

ikonst · 2020-04-04T20:16:40Z

Would it be possible to make that DescribeTable call unnecessary? Our model definition already describes the table's keys and indices so we can just use them.

Of course DescribeTable gets that same information from the source of truth, but we don't actually take advantage of that to call out discrepancies between the model definition and the table definition: instead, in those cases PynamoDB fails in confusing ways (e.g. if the table defines a key with a different underlying data type vs. our model definition).

I remember @jpinner-lyft was working on removing that DescribeTable call, but I don't think he ever completed that effort.

shmygol · 2020-04-04T21:10:32Z

That's the point of the PR. Connection doesn't call DescribeTable if the schema is provided, which is already the case for models.

ikonst · 2020-04-06T15:52:10Z

Ah, I missed the fact that you automatically produce a predefined schema in Model, nice.

One thought, though, is that the only reason TableConnection._tables exists is for TableConnection.get_meta_table. Perhaps the concept of storing table schemas could be removed from TableConnection entirely? After all, if in all practical cases it'll be echoing back to the Model class the predefined schema that the Model class supplied it with, perhaps there's no need for this complexity?

shmygol · 2020-04-15T17:13:04Z

> Ah, I missed the fact that you automatically produce a predefined schema in Model, nice.
>
> One thought, though, is that the only reason TableConnection._tables exists is for TableConnection.get_meta_table. Perhaps the concept of storing table schemas could be removed from TableConnection entirely? After all, if in all practical cases it'll be echoing back to the Model class the predefined schema that the Model class supplied it with, perhaps there's no need for this complexity?

@ikonst do you mean we should avoid caching schemas on the class and store the table schema in an instance variable instead (we still have to store the schema somewhere)? I was afraid of degrading performance for clients who, for some reason, use TableConnection without a model.

But I find your suggestion reasonable and can do it. Here's how I see it: TableConnection.predefine_table_schema becomes an instance method and populates an instance variable, TableConnection._table. If the table schema is not predefined, it's fetched lazily from DynamoDB in TableConnection.get_meta_table, but only for this instance.

Does it make sense to you?
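The proposal above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `LazyTableConnection`, `fake_describe_table` and the `calls` list are hypothetical; only `predefine_table_schema`, `get_meta_table` and `_table` come from the comment.

```python
from typing import Any, Callable, Dict, Optional

Schema = Dict[str, Any]


class LazyTableConnection:
    """Toy model of the proposal; names mirror the comment, behaviour is assumed."""

    def __init__(self, table_name: str, describe_table: Callable[[str], Schema]) -> None:
        self.table_name = table_name
        self._describe_table = describe_table  # hypothetical request function
        self._table: Optional[Schema] = None   # per-instance schema, initially unknown

    def predefine_table_schema(self, schema: Schema) -> None:
        # Instance method: the schema is stored on this connection only.
        self._table = schema

    def get_meta_table(self) -> Schema:
        # Fetch lazily, and only for this instance, when nothing was predefined.
        if self._table is None:
            self._table = self._describe_table(self.table_name)
        return self._table


calls = []  # records every simulated DescribeTable request


def fake_describe_table(table_name: str) -> Schema:
    calls.append(table_name)
    return {"TableName": table_name}


first = LazyTableConnection("Thread", fake_describe_table)
second = LazyTableConnection("Thread", fake_describe_table)
first.get_meta_table()
first.get_meta_table()   # reuses the schema fetched by this instance
second.get_meta_table()  # a separate instance fetches again

predefined = LazyTableConnection("Thread", fake_describe_table)
predefined.predefine_table_schema({"TableName": "Thread"})
predefined.get_meta_table()  # no request at all
print(len(calls))
```

The trade-off in this sketch is that instances no longer share a cache, so every connection without a predefined schema pays for its own request.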

ikonst · 2020-04-17T03:17:16Z

(Sorry, been very busy lately, will have a look over the weekend)

ikonst · 2020-04-22T02:24:16Z

> @ikonst do you mean we should avoid caching schemas on the class and store the table schema in an instance variable instead (we still have to store the schema somewhere)? I was afraid of degrading performance for clients who, for some reason, use TableConnection without a model.

It's hard for me to imagine TableConnection is used by anyone but the library itself, though with enough users there's someone for every obscure use case :)

> If the table schema is not predefined, it's fetched lazily from DynamoDB in TableConnection.get_meta_table, but only for this instance.

What I'm questioning is why the connection/model needs to be aware of the server's table schema. While it's the source of truth, in many other places we act based on the local schema. The schemas can be out of sync -- for example, the server thinks the HK is a number (N), while locally it's a UnicodeAttribute (S). Do we use the server's schema or the local schema when serializing and deserializing? From the last time I read the code, it's a mix of both, which is probably the most confusing and error-prone approach :/

The library defines a local schema that's more extensive than the server-side schema -- we define non-key attributes while DynamoDB defines the keys only, and we have some complex data types on top of DynamoDB's built-in ones, so the server's schema is not sufficient to serialize/deserialize a model; thus, we need the local schema. Because of this, I'd rather see PynamoDB use only the local schema in all serialization/deserialization, in which case DescribeTable should be completely optional in normal operation.

I think there's a good function where DescribeTable would be useful, and that's an (imaginary) "Model.validate_schema" method that would ensure that the local schema overlaps with the server's one (e.g. that we don't mistype indexes, which can create some really confusing error messages... from experience).
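A check of that kind might look something like the sketch below. `validate_schema` here is a hypothetical free function, not an existing PynamoDB API; it assumes both schemas are given as dicts with the `AttributeDefinitions` and `KeySchema` fields that a DescribeTable response carries under `Table`, and it only compares the table's own keys, not indexes.

```python
from typing import Any, Dict, List

Schema = Dict[str, Any]


def validate_schema(local: Schema, server: Schema) -> List[str]:
    """Return human-readable discrepancies between two table schemas."""
    problems = []
    local_types = {d["AttributeName"]: d["AttributeType"] for d in local["AttributeDefinitions"]}
    server_types = {d["AttributeName"]: d["AttributeType"] for d in server["AttributeDefinitions"]}
    local_keys = {k["KeyType"]: k["AttributeName"] for k in local["KeySchema"]}
    server_keys = {k["KeyType"]: k["AttributeName"] for k in server["KeySchema"]}

    for key_type in ("HASH", "RANGE"):
        local_name = local_keys.get(key_type)
        server_name = server_keys.get(key_type)
        if local_name != server_name:
            # Covers a missing key on either side as well as a different attribute name.
            problems.append(f"{key_type} key: local {local_name!r}, server {server_name!r}")
        elif local_name is not None and local_types.get(local_name) != server_types.get(local_name):
            problems.append(
                f"{key_type} key {local_name!r}: local type {local_types.get(local_name)!r}, "
                f"server type {server_types.get(local_name)!r}"
            )
    return problems


# The mismatch mentioned earlier: the hash key is S locally but N on the server.
local_schema = {
    "AttributeDefinitions": [{"AttributeName": "ForumName", "AttributeType": "S"}],
    "KeySchema": [{"AttributeName": "ForumName", "KeyType": "HASH"}],
}
server_schema = {
    "AttributeDefinitions": [{"AttributeName": "ForumName", "AttributeType": "N"}],
    "KeySchema": [{"AttributeName": "ForumName", "KeyType": "HASH"}],
}
print(validate_schema(local_schema, server_schema))
```

Returning a list of discrepancies rather than raising on the first one is a design choice of this sketch; it lets a caller report every mismatch at once.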

shmygol added 4 commits January 29, 2020 14:05

Implement Connection.predefine_table_schema (pynamodb#422)

2326ce0

Fix attribute naming confusion in a unit test (pynamodb#422)

9461cc4

Use model schema instead of DescribeTable API (pynamodb#422)

bf7cbff

Provide existing model schema to a table connection in model to reduce DescribeTable requests

Avoid using sub tests in the unit tests (pynamodb#422)

83e1dbf

Avoid using unittest.TestCase.subTest, which is supported only since version 3.4

Merge upstream/master

0eb07c9
