Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Connection.predefine_table_schema (#422) #749

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

shmygol
Copy link

@shmygol shmygol commented Jan 30, 2020

Implement Connection.predefine_table_schema, which provides manual populating of cached table schemes (Connection._tables) and helps avoid to request DescribeTable which takes some time.

The method is integrated in TableConnection, which is used in model. It means all models don't send DescribeTable requests before querying the data any more, but Model.describe_table sends the real request every time it's called.

Provide existing model schema to a table connection in model to reduce
DescribeTable requests
Avoid using unittest.TestCase.subTest,
which is supported only since version 3.4
@shmygol
Copy link
Author

shmygol commented Jan 30, 2020

I didn't do any refactoring to avoid unnecessary big pull requests, but I'd suggest Model._get_schema and Model._get_indexes to return the some keys in camel case instead of underscore to avoid transforming the data over and over again. Example:

        self.test_predefined_schema = {
            "attribute_definitions": [
                {
                    "AttributeName": "ForumName",
                    "AttributeType": "S"
                },
                {
                    "AttributeName": "LastPostDateTime",
                    "AttributeType": "S"
                },
                {
                    "AttributeName": "Subject",
                    "AttributeType": "S"
                }
            ],
            "key_schema": [
                {
                    "AttributeName": "ForumName",
                    "KeyType": "HASH"
                },
                {
                    "AttributeName": "Subject",
                    "KeyType": "RANGE"
                }
            ],
            "global_secondary_indexes": [
                {
                    "IndexName": "LastPostIndex",
                    "KeySchema": [
                        {
                            "AttributeName": "ForumName",
                            "KeyType": "HASH"
                        },
                        {
                            "AttributeName": "LastPostDateTime",
                            "KeyType": "RANGE"
                        }
                    ],
                    "Projection": {
                        "ProjectionType": "KEYS_ONLY"
                    }
                }
            ],
            "local_secondary_indexes": [
                {
                    "IndexName": "LastPostIndex",
                    "KeySchema": [
                        {
                            "AttributeName": "ForumName",
                            "KeyType": "HASH"
                        },
                        {
                            "AttributeName": "LastPostDateTime",
                            "KeyType": "RANGE"
                        }
                    ],
                    "projection": {
                        "ProjectionType": "KEYS_ONLY"
                    }
                }
            ],
        }

I could do it if there are no objections.

@shmygol
Copy link
Author

shmygol commented Mar 8, 2020

Do anybody has an idea when the PR is going to be reviewed? Thanks!

@ikonst
Copy link
Contributor

ikonst commented Apr 4, 2020

Would it be possible to make that DescribeTable call unnecessary? Our model definition already describes the table's keys and indices so we can just use them.

Of course DescribeTable gets that same information from the source of truth, but we don't actually take advantage of that to call out discrepancies between the model definition and the table definition: instead, in those cases PynamoDB fails in confusing ways (e.g. if the table defines a key with a different underlying data type vs. our model definition).

I remember @jpinner-lyft was working on removing that DescribeTable call, but I don't think he ever completed that effort.

@shmygol
Copy link
Author

shmygol commented Apr 4, 2020 via email

@ikonst
Copy link
Contributor

ikonst commented Apr 6, 2020

Ah, I missed the fact that you automatically produce a predefined schema in Model, nice.

One thought, though, is that the only reason TableConnection._tables exists is for TableConnection.get_meta_table. Perhaps the concept of storing table schemas could be removed from TableConnection entirely? After all, if in all practical cases it'll be echoing back to the Model class the predefined schema that the Model class supplied it with, perhaps there's no need for this complexity?

@shmygol
Copy link
Author

shmygol commented Apr 15, 2020

Ah, I missed the fact that you automatically produce a predefined schema in Model, nice.

One thought, though, is that the only reason TableConnection._tables exists is for TableConnection.get_meta_table. Perhaps the concept of storing table schemas could be removed from TableConnection entirely? After all, if in all practical cases it'll be echoing back to the Model class the predefined schema that the Model class supplied it with, perhaps there's no need for this complexity?

@ikonst do you mean avoid caching schemas in a class and store the table schema in an instance variable instead (we still have to store the schema somewhere)? I was afraid to introduce decline of performance for clients who use TableConnection without a model for some reason.

But I find your suggestion reasonable and can do it. Here how I see it: TableConnection.predefine_table_schema becomes an instance method and populate an instance variable TableConnection._table. If table schema is not predefined it's fetched lazily in TableConnection.get_meta_table from DynamoDB, but only for this instance.

Does it make sense to you?

@ikonst
Copy link
Contributor

ikonst commented Apr 17, 2020

(Sorry, been very busy lately, will have a look over the weekend)

@ikonst
Copy link
Contributor

ikonst commented Apr 22, 2020

@ikonst do you mean avoid caching schemas in a class and store the table schema in an instance variable instead (we still have to store the schema somewhere)? I was afraid to introduce decline of performance for clients who use TableConnection without a model for some reason.

It's hard for me to imagine TableConnection is used by anyone but the library itself, though with enough users there's someone for every obscure use case :)

If table schema is not predefined it's fetched lazily in TableConnection.get_meta_table from DynamoDB, but only for this instance.

What I'm questioning is why the connection/model needs to be aware of the server's table schema. While it's the source of truth, in many other places we act based on the local schema. The schemas can be out of sync -- for example, server thinks HK is a number (N), locally it's a UnicodeAttribute (S). Do we use server's schema or local schema when serializing and deserializing? From last time I read the code, it's a mix of both, which is probably the most confusing and error prone approach :/

The library defines a local schema that's more extensive than the server-side schema -- we define non-key attributes while DynamoDB defines the keys only, and we have some complex data types on top of DynamoDB's built-in ones, so the server's schema is not sufficient to serialize/deserialize a model; thus, we need the local schema. Because of this, I'd rather see PynamoDB use only the local schema in all serialization/deserialization, in which case DescribeTable should be completely optional in normal operation.

I think there's a good function where DescribeTable would be useful, and that's an (imaginary) "Model.validate_schema" method that would ensure that the local schema overlaps with the server's one (e.g. that we don't mistype indexes, which can create some really confusing error messages... from experience).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants