Sharing Connection in Model Class to Reduce Initial Operation Time and Memory Usage #1245

Open
lymmurrain opened this issue Jun 4, 2024 · 0 comments

Problem Statement

Currently, each table class inheriting from Model uses its own TableConnection instance, which has its own low-level API Connection. I have 20 table classes inheriting from Model. When using these with Lambda, I found that each table using its own Connection leads to two problems:

  1. Initial Connection Time: Each table requires instantiating and using a Connection for the first time, which takes about 0.6 seconds. If an interface calls multiple tables for data aggregation during the first call, it results in significant waiting time.
  2. Memory Usage: Memory usage increases with the number of table classes used. Issue #876 ("memory usage growing as we add models") raises the same problem. I ran a test in which 20 table classes were used in one invocation, resulting in a memory usage of 216 MB. When all table classes shared a single connection, memory usage was 78 MB. This means each additional connection uses about 7 MB of memory. I provide the test code below so that others can check whether my testing method led to incorrect conclusions.
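
The per-connection figure can be reproduced from the two measurements above. This is only back-of-the-envelope arithmetic; it assumes the whole difference in Lambda's "Max Memory Used" comes from the 19 extra connections and nothing else.

```python
# Numbers taken from the two Lambda runs described above.
memory_separate_mb = 216   # 20 models, each with its own Connection
memory_shared_mb = 78      # 20 models sharing one Connection
table_count = 20

# Sharing removes all but one connection, so 19 connections are saved.
extra_connections = table_count - 1
per_connection_mb = (memory_separate_mb - memory_shared_mb) / extra_connections

print(f"{per_connection_mb:.1f} MB per additional connection")
```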

My primary execution environment is AWS Lambda, where the cost is related to both memory usage and execution time. Therefore, these two issues nearly double the execution cost of my Lambda functions.

I am considering whether all table classes can share a single low-level API Connection to address these issues.

Solution

My approach focuses on the __init__ method of TableConnection. As each table class has its own TableConnection instance, and each TableConnection has its own low-level API Connection, we can specify a shared Connection during TableConnection initialization.

The original TableConnection initialization is as follows:

class TableConnection:

    def __init__(
        self,
        table_name: str,
        region: Optional[str] = None,
        host: Optional[str] = None,
        connect_timeout_seconds: Optional[float] = None,
        read_timeout_seconds: Optional[float] = None,
        max_retry_attempts: Optional[int] = None,
        base_backoff_ms: Optional[int] = None,
        max_pool_connections: Optional[int] = None,
        extra_headers: Optional[Mapping[str, str]] = None,
        aws_access_key_id: Optional[str] = None,
        aws_secret_access_key: Optional[str] = None,
        aws_session_token: Optional[str] = None,
        *,
        meta_table: Optional[MetaTable] = None,
    ) -> None:
        self.table_name = table_name
        self.connection = Connection(region=region,
                                     host=host,
                                     connect_timeout_seconds=connect_timeout_seconds,
                                     read_timeout_seconds=read_timeout_seconds,
                                     max_retry_attempts=max_retry_attempts,
                                     base_backoff_ms=base_backoff_ms,
                                     max_pool_connections=max_pool_connections,
                                     extra_headers=extra_headers,
                                     aws_access_key_id=aws_access_key_id,
                                     aws_secret_access_key=aws_secret_access_key,
                                     aws_session_token=aws_session_token)

        if meta_table is not None:
            self.connection.add_meta_table(meta_table)

I currently use a patch to solve this problem. The code is as follows:

def _patch_pynamodb_connection(region):
    # Call this function once, at module level, before any model is used
    from pynamodb.connection import TableConnection, Connection
    from typing import Any, Mapping, Optional
    connection = Connection(region)
    def patch_init(
            self,
            table_name: str,
            region: Optional[str] = None,
            host: Optional[str] = None,
            connect_timeout_seconds: Optional[float] = None,
            read_timeout_seconds: Optional[float] = None,
            max_retry_attempts: Optional[int] = None,
            base_backoff_ms: Optional[int] = None,
            max_pool_connections: Optional[int] = None,
            extra_headers: Optional[Mapping[str, str]] = None,
            aws_access_key_id: Optional[str] = None,
            aws_secret_access_key: Optional[str] = None,
            aws_session_token: Optional[str] = None,
            *,
            meta_table: Optional[Any] = None,
    ) -> None:
        self.table_name = table_name
        self.connection = connection

        if meta_table is not None:
            self.connection.add_meta_table(meta_table)
    TableConnection.__init__ = patch_init

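Call this function once at module level, before any model touches DynamoDB. Note that the patched __init__ ignores every per-table argument (region, host, credentials, timeouts), so all tables use the settings of the one shared Connection.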
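
The same patching idea can be written without copying the full signature, by accepting and discarding the unused arguments. The sketch below is self-contained: `Connection` and `TableConnection` here are simplified stand-ins written for this example, not the pynamodb classes, and `patch_shared_connection` is a hypothetical name.

```python
from typing import Any, Optional


class Connection:
    """Stand-in for the low-level connection; counts how many are created."""
    created = 0

    def __init__(self, region: Optional[str] = None) -> None:
        Connection.created += 1
        self.region = region
        self.meta_tables = []

    def add_meta_table(self, meta_table: Any) -> None:
        self.meta_tables.append(meta_table)


class TableConnection:
    """Stand-in for the table-level wrapper: one Connection per table."""

    def __init__(self, table_name: str, region: Optional[str] = None,
                 *, meta_table: Any = None) -> None:
        self.table_name = table_name
        self.connection = Connection(region=region)
        if meta_table is not None:
            self.connection.add_meta_table(meta_table)


def patch_shared_connection(region: str) -> Connection:
    """Replace TableConnection.__init__ so every table reuses one Connection."""
    shared = Connection(region=region)

    def patched_init(self, table_name: str, *args: Any,
                     meta_table: Any = None, **kwargs: Any) -> None:
        # Per-table settings in args/kwargs are deliberately ignored,
        # which is also what the full-signature patch does.
        self.table_name = table_name
        self.connection = shared
        if meta_table is not None:
            self.connection.add_meta_table(meta_table)

    TableConnection.__init__ = patched_init
    return shared


shared = patch_shared_connection("us-east-1")
# The region passed per table no longer has any effect after patching.
tables = [TableConnection(f"table_{i}", region="eu-west-1") for i in range(1, 21)]
print(Connection.created, all(t.connection is shared for t in tables))
```

Swallowing `*args` and `**kwargs` keeps the patch short, but it also makes the silent loss of per-table settings easier to overlook, so it probably only suits projects where every table lives in the same region and account.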

However, I recommend directly supporting a connection attribute in the Meta class of pynamodb table classes. The logic is as follows:

specific_connection = Connection(region)

class Example(Model):

    class Meta:
        table_name = 'Example'
        region = region
        ...
        connection = specific_connection

# /pynamodb/connection/table.py
class TableConnection:
    def __init__(
        self,
        table_name: str,
        region: Optional[str] = None,
        host: Optional[str] = None,
        connect_timeout_seconds: Optional[float] = None,
        read_timeout_seconds: Optional[float] = None,
        max_retry_attempts: Optional[int] = None,
        base_backoff_ms: Optional[int] = None,
        max_pool_connections: Optional[int] = None,
        extra_headers: Optional[Mapping[str, str]] = None,
        aws_access_key_id: Optional[str] = None,
        aws_secret_access_key: Optional[str] = None,
        aws_session_token: Optional[str] = None,
        connection: Optional[Connection] = None, # Specify connection, use default if not provided
        *,
        meta_table: Optional[MetaTable] = None,
    ) -> None:
        self.table_name = table_name
        if connection is not None:
            self.connection = connection
        else:
            self.connection = Connection(region=region,
                                         host=host,
                                         connect_timeout_seconds=connect_timeout_seconds,
                                         read_timeout_seconds=read_timeout_seconds,
                                         max_retry_attempts=max_retry_attempts,
                                         base_backoff_ms=base_backoff_ms,
                                         max_pool_connections=max_pool_connections,
                                         extra_headers=extra_headers,
                                         aws_access_key_id=aws_access_key_id,
                                         aws_secret_access_key=aws_secret_access_key,
                                         aws_session_token=aws_session_token)

        if meta_table is not None:
            self.connection.add_meta_table(meta_table)

Additionally, the MetaProtocol should include a connection attribute, and the Model class's _get_connection class method should pass cls.Meta.connection when instantiating TableConnection.
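
A rough sketch of how the lookup could work end to end. Everything below is a simplified stand-in written for illustration (the real `Model._get_connection` and `TableConnection` do more), and the model names are hypothetical. One deliberate difference from the signature proposed above: `connection` is keyword-only here, on the assumption that this avoids shifting any existing positional arguments.

```python
from typing import Optional


class Connection:
    """Stand-in for the low-level connection."""

    def __init__(self, region: Optional[str] = None) -> None:
        self.region = region


class TableConnection:
    """Stand-in accepting the proposed optional `connection` argument."""

    def __init__(self, table_name: str, region: Optional[str] = None,
                 *, connection: Optional[Connection] = None) -> None:
        self.table_name = table_name
        # Reuse the supplied connection, otherwise create a private one.
        self.connection = (
            connection if connection is not None else Connection(region=region)
        )


class Model:
    """Stand-in base class; only the connection lookup is modelled."""
    _connection: Optional[TableConnection] = None

    @classmethod
    def _get_connection(cls) -> TableConnection:
        if cls._connection is None:
            meta = cls.Meta
            cls._connection = TableConnection(
                meta.table_name,
                region=getattr(meta, "region", None),
                # Models without Meta.connection keep today's behaviour.
                connection=getattr(meta, "connection", None),
            )
        return cls._connection


shared_connection = Connection(region="us-east-1")


class Orders(Model):
    class Meta:
        table_name = "Orders"
        region = "us-east-1"
        connection = shared_connection


class Users(Model):
    class Meta:
        table_name = "Users"
        region = "us-east-1"
        connection = shared_connection


class Legacy(Model):
    class Meta:
        table_name = "Legacy"
        region = "us-east-1"


print(Orders._get_connection().connection is Users._get_connection().connection)
print(Legacy._get_connection().connection is shared_connection)
```

Because sharing is opt-in per model, a table that needs a different region or credentials can simply omit `connection` or point at a second shared instance.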

Testing

The test program runs on AWS Lambda.

The test code is as follows:

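import time

def test_connection():
    # get_all_models() (not shown here) returns the Model subclasses under test
    all_custom_model = list(get_all_models())
    result = {}
    sum_time = 0
    for one_model in all_custom_model:
        # Time the first call on each model, which includes connection setup
        t1 = time.time()
        one_model.exists()
        t2 = time.time()
        result[one_model.Meta.table_name] = t2 - t1
        sum_time += t2 - t1
    result['sum_time'] = sum_time
    return result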

The following is the output without the patch, where each model creates its own Connection:

START RequestId: 8dc9e79d-e2ed-4a4f-9759-a9476b911f74 Version: $LATEST
[INFO] - 2024-06-03 07:49:59,182 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:49:59,701 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:00,202 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:00,580 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:00,961 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:01,321 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:02,099 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:02,462 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:02,842 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:03,202 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:03,600 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:04,582 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:04,960 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:05,342 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:05,721 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:06,940 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:07,302 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:07,681 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:08,060 : Found credentials in environment variables.
[INFO] - 2024-06-03 07:50:08,422 : Found credentials in environment variables.
{'table_1': 0.5074574947357178, 'table_2': 0.4994988441467285, 'table_3': 0.38129639625549316, 'table_4': 0.377758264541626, 'table_5': 0.3798940181732178, 'table_6': 0.76141357421875, 'table_7': 0.35941481590270996, 'table_8': 0.39995408058166504, 'table_9': 0.36011552810668945, 'table_10': 0.3620445728302002, 'table_11': 1.0177617073059082, 'table_12': 0.360119104385376, 'table_13': 0.4000821113586426, 'table_14': 0.3615438938140869, 'table_15': 1.2000653743743896, 'table_16': 0.3983919620513916, 'table_17': 0.3790755271911621, 'table_18': 0.3429124355316162, 'table_19': 0.3988931179046631, 'table_20': 1.4980473518371582, 'sum_time': 10.745740175247192, 'table_num': 20}
cost time: 10.746082544326782
END RequestId: 8dc9e79d-e2ed-4a4f-9759-a9476b911f74
REPORT RequestId: 8dc9e79d-e2ed-4a4f-9759-a9476b911f74    Duration: 10766.41 ms    Billed Duration: 10767 ms    Memory Size: 300 MB    Max Memory Used: 216 MB    Init Duration: 450.61 ms

The output after calling _patch_pynamodb_connection:

START RequestId: d2a05b75-8e5d-4790-ac50-7dcd94f1015a Version: $LATEST
[INFO] - 2024-06-03 07:52:50,273 : Found credentials in environment variables.
{'table_1': 0.555250883102417, 'table_2': 0.0002193450927734375, 'table_3': 8.821487426757812e-05, 'table_4': 0.00013756752014160156, 'table_5': 8.320808410644531e-05, 'table_6': 0.00011301040649414062, 'table_7': 7.700920104980469e-05, 'table_8': 0.00010442733764648438, 'table_9': 7.915496826171875e-05, 'table_10': 0.00011444091796875, 'table_11': 7.534027099609375e-05, 'table_12': 0.00010156631469726562, 'table_13': 0.00011038780212402344, 'table_14': 7.677078247070312e-05, 'table_15': 7.176399230957031e-05, 'table_16': 7.677078247070312e-05, 'table_17': 7.224082946777344e-05, 'table_18': 0.00010085105895996094, 'table_19': 0.00012922286987304688, 'table_20': 7.486343383789062e-05, 'sum_time': 0.557157039642334, 'table_num': 20}
cost time: 0.5574638843536377
END RequestId: d2a05b75-8e5d-4790-ac50-7dcd94f1015a
REPORT RequestId: d2a05b75-8e5d-4790-ac50-7dcd94f1015a    Duration: 559.39 ms    Billed Duration: 560 ms    Memory Size: 300 MB    Max Memory Used: 78 MB    Init Duration: 517.43 ms

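The difference is significant: total time for the 20 exists() calls dropped from about 10.75 seconds to about 0.56 seconds, and Max Memory Used dropped from 216 MB to 78 MB. With the shared connection, only the first table pays the connection setup cost.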
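
For reference, the relative savings can be derived from the two REPORT lines above. This is plain arithmetic on a single pair of cold-start runs, so it should be read as indicative rather than as a benchmark.

```python
# Values copied from the two Lambda REPORT lines above.
duration_separate_ms = 10766.41   # one Connection per model
duration_shared_ms = 559.39       # one shared Connection
memory_separate_mb = 216
memory_shared_mb = 78

speedup = duration_separate_ms / duration_shared_ms
time_saved_pct = 100 * (1 - duration_shared_ms / duration_separate_ms)
memory_saved_pct = 100 * (1 - memory_shared_mb / memory_separate_mb)

print(f"speedup: {speedup:.1f}x")
print(f"duration reduced by {time_saved_pct:.1f}%")
print(f"max memory reduced by {memory_saved_pct:.1f}%")
```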

However, I am not entirely sure if this approach might cause some logical issues, which is why I am raising this issue.
