FEAT-#7308: Interoperability between query compilers #7376

Merged
merged 34 commits into from
Sep 2, 2024


arunjose696
Collaborator

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Interoperability between DataFrames using different query compilers #7308
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

arunjose696 and others added 28 commits August 26, 2024 16:37
Signed-off-by: arunjose696 <[email protected]>
…rialized_dtypes to query compiler layer as in the code in multiple places the methods of private _modin_frame were used
Signed-off-by: Igoshev, Iaroslav <[email protected]>
Co-authored-by: Iaroslav Igoshev <[email protected]>
Signed-off-by: arunjose696 <[email protected]>
Co-authored-by: Iaroslav Igoshev <[email protected]>
Signed-off-by: arunjose696 <[email protected]>
try:
    operation()
# `except` for non callable attributes
except TypeError:

CodeQL notice (code scanning, test): Empty except
'except' clause does nothing but pass and there is no explanatory comment.
modin_series_without_index, _ = create_test_series(
    np.arange(col_len), data_frame_mode=data_frame_mode_pair[1]
)
modin_df @ modin_series_without_index

CodeQL notice (code scanning, test): Statement has no effect
This statement has no effect.
arunjose696 changed the title from "FEAT-#7308: Fix inserting datelike values into a DataFrame" to "FEAT-#7308: Interoperability between query compilers" on Aug 26, 2024
Signed-off-by: arunjose696 <[email protected]>
Comment on lines -2996 to -2998
assert (
    isinstance(new_query_compiler, type(self._query_compiler))
    or type(new_query_compiler) in self._query_compiler.__class__.__bases__
Collaborator

Why?

Collaborator Author

new_query_compiler can also be an instance of a different query compiler (NativeQueryCompiler or PandasQueryCompiler).

For example, during operations like insert, the constructor gets called directly in _create_or_update_from_compiler.

So it is normal for the constructor to return a different kind of query compiler when the user changes the query compiler mode between creating the DataFrame and the insert operation.
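
To make the point above concrete, here is a minimal sketch of why the removed assertion rejects a sibling query compiler. The classes `BaseQC`, `PandasQC` and `NativeQC` and the helper `old_check` are hypothetical stand-ins written for this illustration, not Modin's actual classes; the sketch only assumes that the two compilers share a common base class.

```python
class BaseQC:
    """Hypothetical stand-in for a common base query compiler."""


class PandasQC(BaseQC):
    """Hypothetical stand-in for one concrete query compiler."""


class NativeQC(BaseQC):
    """Hypothetical stand-in for a second, sibling query compiler."""


def old_check(current, new):
    # Same condition as the removed assertion: `new` must be an instance of
    # the current compiler's class, or its exact type must be a direct base
    # of the current compiler's class.
    return isinstance(new, type(current)) or type(new) in type(current).__bases__


current = PandasQC()

# A compiler of the same kind passes the check.
same_kind_ok = old_check(current, PandasQC())

# A sibling compiler is neither an instance of PandasQC nor one of its bases,
# so the check fails even though both derive from BaseQC.
sibling_ok = old_check(current, NativeQC())

print(same_kind_ok, sibling_ok)
```

Under these assumptions, a frame created in one mode and updated with a compiler produced in another mode would trip the assertion, which matches the reason given for dropping it.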

Comment on lines 1217 to 1219
if data_frame_mode:
    actual_data_frame_mode = NativeDataframeMode().get()
    NativeDataframeMode().put(data_frame_mode)
Collaborator

@arunjose696 I would recommend doing this at a higher level, using fixture modify_config.

Collaborator Author

It would not be possible to do this in a fixture, because the purpose is to set two different NativeDataframeMode values for two different DataFrames within a single test. We don't want to set NativeDataframeMode in the fixture, since these DataFrames won't be created in the fixture.

Collaborator

In that case let's create a create_test_series_in_defined_mode function in modin/tests/pandas/native_df_mode/utils.py. Something like:

def create_test_series_in_defined_mode(vals, sort=False, backend=None, df_mode=None, **kwargs):
    # Create the series while NativeDataframeMode is temporarily set to df_mode.
    with context(NativeDataframeMode=df_mode):
        return create_test_series(vals, sort=sort, backend=backend, **kwargs)

And similar for the dataframes.

The main motivation is not to introduce functionality where it is rarely used. Also, given that there is a separate folder with tests, it seems better to try to localize the necessary changes.

arunjose696 and others added 2 commits August 28, 2024 10:55
Co-authored-by: Iaroslav Igoshev <[email protected]>
Co-authored-by: Anatoly Myachev <[email protected]>
modin_df2, pandas_df2 = create_test_dfs(data, backend=backend, df_mode=df_mode2)
md_kwargs, pd_kwargs = {}, {}

def execute_callable(fn, inplace=False, md_kwargs={}, pd_kwargs={}):

CodeQL notice (code scanning, test): Explicit returns mixed with implicit (fall through) returns
Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
arunjose696 force-pushed the arun-sqc-interop branch 2 times, most recently from 94e8e48 to af0a7ba, August 28, 2024 10:10
Signed-off-by: arunjose696 <[email protected]>
from modin.core.storage_formats.base.query_compiler import BaseQueryCompiler
from modin.core.storage_formats.pandas.query_compiler_caster import QueryCompilerCaster

CodeQL notice (code scanning): Cyclic import
Import of module modin.core.storage_formats.pandas.query_compiler_caster begins an import cycle.

from pandas.core.indexes.frozen import FrozenList

from modin.core.storage_formats.base.query_compiler import BaseQueryCompiler

CodeQL notice (code scanning): Cyclic import
Import of module modin.core.storage_formats.base.query_compiler begins an import cycle.
arunjose696 force-pushed the arun-sqc-interop branch 3 times, most recently from 04b7f60 to 03d137c, August 29, 2024 07:15
Collaborator

YarShev left a comment

@arunjose696, please take a look at the failing tests.

arunjose696 force-pushed the arun-sqc-interop branch 2 times, most recently from 136bf1e to f488872, September 2, 2024 08:03
return staticmethod(apply_argument_cast(obj.__func__))

@functools.wraps(obj)
def cast_args(*args: Tuple, **kwargs: Dict) -> Any:
Collaborator

@arunjose696 doesn't this function break type hints?

Collaborator Author

arunjose696, Sep 2, 2024

It is not clear to me how this would break type hints, since this function is similar to the run_and_log function used in Modin logging, which also wraps the QC layer.

Collaborator

> I am not clear how this would break type hints, as this function is similar to run_and_log function used in modin logging which would also be wrapping the qc layer

It should be visible in IDE. Do you see problems with it or not?

Collaborator Author

[image]
I can see it in the IDE.

Collaborator

Thanks for checking. The "| Any" part seems redundant; someone will need to sort this out in the future.

YarShev merged commit cf5d638 into modin-project:main on Sep 2, 2024
39 checks passed
Successfully merging this pull request may close these issues.

Interoperability between DataFrames using different query compilers
3 participants