Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast paths of common methods assume no builtin subclassing #4490

Open
ChayimFriedman2 opened this issue Aug 25, 2024 · 2 comments
Open

Fast paths of common methods assume no builtin subclassing #4490

ChayimFriedman2 opened this issue Aug 25, 2024 · 2 comments
Labels

Comments

@ChayimFriedman2
Copy link
Contributor

Bug Description

Many implementations of common protocols for builtin Python types, perform a direct access to the relevant fields in the Python object descriptor, without calling into Python C APIs.

This is good for performance, but it actually has a problem for correctness: if those builtins are subclassed, the subclass can be passed as Bound<Builtin> (or other smart pointers), but the fields won't be updated - a call to the method implementing the protocol in Python will be needed (if it is overridden), meaning that calls to PyO3 functions will return an incorrect result.

Steps to Reproduce

Here's an example to demonstrate the problem with PyTuple and len(), but the problem exists with more methods and in more types:

use pyo3::prelude::*;
use pyo3::types::{IntoPyDict, PyTuple};

fn main() {
    Python::with_gil(|py| {
        let from_code = PyModule::from_code(
            py,
            cr#"
class my_tuple(tuple):
    def __len__(self): return 42
"#,
            c"my_tuple.py",
            c"my_tuple",
        );
        let m = from_code.unwrap();
        let my_tuple = m.getattr("my_tuple").unwrap();
        let instance = my_tuple.call1(((1, 2, 3),)).unwrap();
        let python_result = py
            .eval(
                c"len(instance)",
                None,
                Some(&[("instance", &instance)].into_py_dict(py)),
            )
            .unwrap()
            .extract::<usize>()
            .unwrap();
        assert_eq!(python_result, 42);
        let py_any_result = instance.len().unwrap();
        assert_eq!(py_any_result, 42);
        let tuple = instance.downcast::<PyTuple>().unwrap();
        let py_tuple_result = tuple.len();
        assert_eq!(py_tuple_result, 3); // Oops!
    });
}

Not only the result is incorrect, it is also inconsistent with the result we get from PyO3 itself for PyAny.

Backtrace

No response

Your operating system and version

Windows 11

Your Python version (python --version)

3.12.0

Your Rust version (rustc --version)

rustc 1.80.1 (3f5fd8dd4 2024-08-06)

Your PyO3 version

Tip of git (but also reproduces with latest published version)

How did you install python? Did you use a virtualenv?

The official website.

Additional Info

Fixing this bug without harming performance is difficult and may prove impossible; I am filling this bug so we will be aware at least.

If we do decide it's worth fixing, a potential path will be to differentiate in the type system between Bound<PyClass> and Bound<Exact<PyClass>>. Making the latter deref to the former, downcast() return the former but downcast_exact() return the latter, only specialize the latter, and tell people that care about micro-performance to use the latter (if they don't care about subclasses).

@birkenfeld
Copy link
Member

As a first reaction, this is the same as working with C extensions that use the APIs that PyO3 uses. So at the very least it's not a PyO3 specific bug/incompatibility. And most advanced Python programmers subclassing builtin types are already sensitized to issues arising there (another is that using e.g. a builtin method on a str subclass yields a regular str).

@davidhewitt
Copy link
Member

I am inclined to agree with @birkenfeld though at the same time I would love for us to not have this surprise factor. The idea of Bound<Exact<PyDict>> could work, maybe another option is to have a Bound<PyDict, Exact> second generic argument which can default to Lax. We could probably even do this without too much backwards-compatibility worry, because Bound<Exact<PyDict>> and Bound<PyDict> could both implement PyDictMethods (with different API calls under the hood.)

There seems to me to be a little bit of overlap with #3226 and the idea of splitting strict / lax function argument inputs too, perhaps.

I'm not entirely keen to complicate the API in this manner though perhaps this will be the right step in the long run. I think the GIL Refs migration and upcoming introduction of IntoPyObject is already creating a lot of churn, so I'd be glad to let things settle down a bit before we make any moves too quickly here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants