Fix core utils #1967

ternaus · 2024-10-05T00:43:21Z

Fixes #1958

Summary by Sourcery

Fix numerical label handling in the LabelEncoder class, refactor label processing logic, enhance utility functions for better type handling, and update tests to cover new scenarios. Update pre-commit configuration to the latest Ruff version.

Bug Fixes:

Fix the handling of numerical labels in the LabelEncoder class to correctly process and transform numerical data without encoding.

Enhancements:

Refactor the label processing logic by introducing helper methods for validation, encoding, and decoding of label fields.
Improve the to_tuple function to handle various input types and apply optional low bounds or biases, with added overloads for type safety.
Enhance the create_symmetric_range function with overloads for better type handling of integer and float inputs.
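
To illustrate what overloads buy here, the following is a minimal sketch rather than the code from `albumentations/core/pydantic.py`. It assumes that a scalar expands to the range `(-value, value)` and that a two-element tuple is passed through unchanged; the body and the exact annotations are assumptions made for illustration.

```python
from __future__ import annotations

from typing import overload


# Overloads let a type checker infer an int range for int input
# and a float range for float input.
@overload
def create_symmetric_range(value: int | tuple[int, int]) -> tuple[int, int]: ...
@overload
def create_symmetric_range(value: float | tuple[float, float]) -> tuple[float, float]: ...


def create_symmetric_range(value):
    # Assumed behavior: a scalar becomes a range symmetric around zero.
    if isinstance(value, (int, float)):
        return (-value, value)
    # Assumed behavior: a pair is returned as a plain tuple.
    low, high = value
    return (low, high)
```

With these overloads a checker such as mypy can report `tuple[int, int]` for `create_symmetric_range(3)` and `tuple[float, float]` for `create_symmetric_range(0.5)`, instead of a single union type for both calls.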

Tests:

Add tests for the LabelEncoder to ensure correct handling of numpy arrays and 2D arrays, verifying the shape and content of encoded and decoded labels.

Chores:

Update the pre-commit configuration to use Ruff version v0.6.9.

sourcery-ai · 2024-10-05T00:43:26Z

Reviewer's Guide by Sourcery

This pull request implements several improvements and bug fixes to the core utils of the Albumentations library, focusing on enhancing the LabelEncoder class and the to_tuple function. The changes aim to improve type handling, add support for numerical labels, and refactor code for better maintainability.

Updated class diagram for LabelEncoder

classDiagram
    class LabelEncoder {
        - classes_: dict[str | int | float, int]
        - inverse_classes_: dict[int, str | int | float]
        - num_classes: int
        - is_numerical: bool
        + fit(y: Sequence[Any] | np.ndarray) LabelEncoder
        + transform(y: Sequence[Any] | np.ndarray) np.ndarray
        + fit_transform(y: Sequence[Any] | np.ndarray) np.ndarray
        + inverse_transform(y: Sequence[Any] | np.ndarray) np.ndarray
    }
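
The diagram above can be read as the following minimal sketch. It is not the implementation in `albumentations/core/utils.py`: the class name `SketchLabelEncoder`, the sort order of classes, and the rule used to decide that labels are numerical are all assumptions. It only shows the idea stated in the summary, that numerical labels pass through unencoded while other labels are mapped to integer codes, with the input shape preserved.

```python
from __future__ import annotations

from numbers import Real
from typing import Any, Sequence

import numpy as np


class SketchLabelEncoder:
    """Hypothetical stand-in mirroring the attributes in the class diagram."""

    def __init__(self) -> None:
        self.classes_: dict[Any, int] = {}
        self.inverse_classes_: dict[int, Any] = {}
        self.num_classes = 0
        self.is_numerical = True

    def fit(self, y: Sequence[Any] | np.ndarray) -> "SketchLabelEncoder":
        labels = y.ravel().tolist() if isinstance(y, np.ndarray) else list(y)
        # Assumption: labels count as numerical only if every one is a real number.
        self.is_numerical = all(isinstance(label, Real) for label in labels)
        if self.is_numerical:
            return self
        # Sort by string form so mixed label types still get a stable order.
        unique = sorted(set(labels), key=str)
        self.classes_ = {label: code for code, label in enumerate(unique)}
        self.inverse_classes_ = {code: label for label, code in self.classes_.items()}
        self.num_classes = len(unique)
        return self

    def transform(self, y: Sequence[Any] | np.ndarray) -> np.ndarray:
        if self.is_numerical:
            return np.asarray(y)  # numerical labels are not encoded
        arr = np.asarray(y, dtype=object)
        codes = np.array([self.classes_[label] for label in arr.ravel()], dtype=np.int64)
        return codes.reshape(arr.shape)

    def fit_transform(self, y: Sequence[Any] | np.ndarray) -> np.ndarray:
        return self.fit(y).transform(y)

    def inverse_transform(self, y: Sequence[Any] | np.ndarray) -> np.ndarray:
        if self.is_numerical:
            return np.asarray(y)
        arr = np.asarray(y)
        labels = np.array([self.inverse_classes_[int(code)] for code in arr.ravel()], dtype=object)
        return labels.reshape(arr.shape)
```

A round trip on a 2D array, which is the kind of case the new tests are described as covering, would look like `codes = SketchLabelEncoder().fit_transform(np.array([["cat", "dog"], ["dog", "bird"]]))`; under the assumptions above, `codes` keeps the `(2, 2)` shape.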

Updated class diagram for DataProcessor

classDiagram
    class DataProcessor {
        - data_fields: list[str]
        - label_encoders: dict[str, dict[str, LabelEncoder]]
        - is_sequence_input: dict[str, bool]
        - is_numerical_label: dict[str, dict[str, bool]]
        + add_label_fields_to_data(data: dict[str, Any]) dict[str, Any]
        + remove_label_fields_from_data(data: dict[str, Any]) dict[str, Any]
        + _process_label_fields(data: dict[str, Any], data_name: str) np.ndarray
        + _validate_label_field_length(data: dict[str, Any], data_name: str, label_field: str) void
        + _encode_label_field(data: dict[str, Any], data_name: str, label_field: str) np.ndarray
        + _handle_empty_data_array(data: dict[str, Any]) void
        + _remove_label_fields(data: dict[str, Any], data_name: str) void
        + _decode_label_field(data_name: str, label_field: str, encoded_labels: np.ndarray) np.ndarray
    }
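
As a rough illustration of how the validation and encoding helpers in this diagram might cooperate, here is a sketch using free functions rather than the actual `DataProcessor` methods. The function names, the error message, and the choice to append labels as an extra column are assumptions; it also assumes non-empty data and labels that are already numerical.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def validate_label_field_length(data: dict[str, Any], data_name: str, label_field: str) -> None:
    # Every row of the target data needs exactly one label.
    if len(data[data_name]) != len(data[label_field]):
        raise ValueError(
            f"The lengths of '{data_name}' and '{label_field}' do not match: "
            f"{len(data[data_name])} vs {len(data[label_field])}."
        )


def append_label_column(data: dict[str, Any], data_name: str, label_field: str) -> np.ndarray:
    # Hypothetical helper: validate first, then attach labels as the last column.
    validate_label_field_length(data, data_name, label_field)
    rows = np.asarray(data[data_name], dtype=np.float32)
    labels = np.asarray(data[label_field], dtype=np.float32).reshape(-1, 1)
    return np.hstack([rows, labels])
```

For example, two boxes with four coordinates each and two class ids would produce an array of shape `(2, 5)`, while a mismatch in lengths raises `ValueError` before any array is built.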

Updated class diagram for to_tuple function

classDiagram
    class to_tuple {
        + validate_args(low: ScaleType | None, bias: ScalarType | None) void
        + process_sequence(param: Sequence[ScalarType]) tuple[ScalarType, ScalarType]
        + process_scalar(param: ScalarType, low: ScalarType | None) tuple[ScalarType, ScalarType]
        + apply_bias(min_val: ScalarType, max_val: ScalarType, bias: ScalarType) tuple[ScalarType, ScalarType]
        + ensure_int_output(min_val: ScalarType, max_val: ScalarType, param: ScalarType) tuple[int, int] | tuple[float, float]
        + to_tuple(param: ScaleType, low: ScaleType | None = None, bias: ScalarType | None = None) tuple[int, int] | tuple[float, float]
    }
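
The decomposition in this diagram could be sketched as below. The helper names follow the diagram, but the bodies are assumptions about the intended behavior, not the library source: `low` and `bias` are treated as mutually exclusive, a scalar without `low` expands to `(-param, param)`, and `low` is restricted to a scalar for brevity. The top-level function is named `to_tuple_sketch` to keep it distinct from the real `to_tuple`.

```python
from __future__ import annotations

from typing import Sequence, Union

Scalar = Union[int, float]


def validate_args(low: Scalar | None, bias: Scalar | None) -> None:
    if low is not None and bias is not None:
        raise ValueError("Arguments 'low' and 'bias' cannot be used together.")


def process_sequence(param: Sequence[Scalar]) -> tuple[Scalar, Scalar]:
    if len(param) != 2:
        raise ValueError("Sequence must contain exactly 2 elements.")
    return min(param), max(param)


def process_scalar(param: Scalar, low: Scalar | None) -> tuple[Scalar, Scalar]:
    if low is None:
        return -param, param  # symmetric range around zero
    return (low, param) if low < param else (param, low)


def apply_bias(min_val: Scalar, max_val: Scalar, bias: Scalar) -> tuple[Scalar, Scalar]:
    return bias + min_val, bias + max_val


def ensure_int_output(min_val: Scalar, max_val: Scalar, param: Scalar):
    # Keep integer bounds for integer input, otherwise return floats.
    if isinstance(param, int):
        return int(min_val), int(max_val)
    return float(min_val), float(max_val)


def to_tuple_sketch(param, low: Scalar | None = None, bias: Scalar | None = None):
    validate_args(low, bias)
    if isinstance(param, (int, float)):
        min_val, max_val = process_scalar(param, low)
        reference = param
    elif isinstance(param, Sequence):
        min_val, max_val = process_sequence(param)
        reference = min_val
    else:
        raise ValueError("Argument 'param' must be a scalar or a sequence of 2 elements.")
    if bias is not None:
        return apply_bias(min_val, max_val, bias)
    return ensure_int_output(min_val, max_val, reference)
```

Splitting the work this way keeps each rule (argument validation, ordering, bias, output type) testable on its own, which matches the stated goal of smaller, more focused functions.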

File-Level Changes

Change	Details	Files
Enhance LabelEncoder to handle numerical labels	Add is_numerical flag to determine if labels are numerical; modify fit method to handle numerical labels; update transform and inverse_transform methods to process numerical labels; add support for numpy array inputs	`albumentations/core/utils.py`, `tests/test_core_utils.py`
Refactor and improve to_tuple function	Add type hints and overloads for better type checking; improve error handling and input validation; split functionality into smaller, more focused functions; enhance documentation with examples and more detailed explanations	`albumentations/core/utils.py`, `albumentations/core/pydantic.py`
Update type definitions and imports	Add new type definitions for ScaleFloatType and ScaleIntType; update imports to include new types and functions; add overloads for create_symmetric_range function	`albumentations/core/types.py`, `albumentations/core/pydantic.py`
Enhance test coverage	Add new test cases for LabelEncoder with numpy arrays; include tests for 2D array inputs; update existing tests to cover new functionality	`tests/test_core_utils.py`
Update dependencies	Update ruff-pre-commit from v0.6.8 to v0.6.9	`.pre-commit-config.yaml`

sourcery-ai

Hey @ternaus - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟡 Testing: 4 issues found
🟢 Complexity: all looks good
🟢 Documentation: all looks good

tests/test_core_utils.py

albumentations/core/utils.py

ternaus added 6 commits October 3, 2024 19:56

Empty-Commit

6905689

Cleanup

77d52a5

Updated to_tuple

2a1bbb4

Fix in LabelEncoder

4b0a805

Do not encode numerical labels

404f4d3

Fix in encode labels

1ce519c

sourcery-ai bot reviewed Oct 5, 2024

ternaus added 2 commits October 6, 2024 13:43

Fix for mixed labels

a1070fe

Cleanup

ab38b6c

ternaus merged commit b8648ff into main Oct 6, 2024
17 checks passed

ternaus deleted the fix_core_utils branch October 6, 2024 21:08

ternaus mentioned this pull request Oct 6, 2024

TypeError with v1.4.15 and newer for masks augmentation #1958

Closed
