Make it easier to create python stub files #3268

chadrik · 2024-09-03T01:11:09Z

This PR makes it easier to create python stub files, which are used for IDE completion and static type analysis. The direct benefactors of this change are a very niche group, but we're a group that support the community by providing tools that make developing with OpenUSD a more productive and pleasurable experience.

I'm using these changes to create stubs which are published to pypi here. You can peruse the stub files here. For the past year or two I've had to maintain my own fork of OpenUSD to make this possible.

In order to sell you on this change, I'm going to demonstrate that the function signatures in the stub files that I'm creating are superior to those included with docstrings that ship with OpenUSD, and I'd like to plant the idea that this PR is the first step towards an eventual integration of stub generation into the native OpenUSD build process.

Background

As someone who wants very accurate python stubs, I've faced a handful of challenges:

Boost does not have native support for generating stubs. In 2019 I created a ticket to add stub generation support to Boost, and while it's gotten some +1's there is still zero movement. There seems to be little hope of side-stepping this issue unless OpenUSD plans to move to pybind11, which apparently does have stubgen support.
Using the C++ Doxygen documentation alone does not provide enough information to generate highly accurate stubs. The actual number of python overloads often differs from the number of C++ overloads. A cause of this and numerous other hard-to-solve idiosyncracies is that there are many manual adjustments made in the boost wrappers within this project.
Boost does have the ability to generate docstrings with python function signatures, but they are incomplete: many types are simply refered to as object. However, it does give us the actual number of python overloads, and is reliable about certain result types like list and tuple, and this extra information can help fill the gap between live inspection of python objects and the rich information in the C++ docs.

Comparison between python signatures: yours (native) vs mine (cg-stubs)

Overview

As part of its build process, OpenUSD generates docstrings in side-car python files named __DOC.py, which are loaded at import time. In order to make use of these docstrings within your IDE, it must be configured to be able to import the OpenUSD python modules to inspect them.

By comparison, pyi stub files are a lightweight approach to load not only function and class documentation into your IDE but also type information, which is informative, and is used to drive code completion. Additionally, stub files can be used by static type analysis tools like mypy to validate your code. Providing access to stub files within your IDE is typically as simple as running pip install types-usd within your project's virtualenv.

Quality

This is the docstring that OpenUSD generates for the method Usd.ClipsAPI.GetClipAssetPaths:

GetClipAssetPaths(assetPaths, clipSet) -> bool

assetPaths : VtArray[AssetPath]
clipSet : str


List of asset paths to the clips in the clip set named C{clipSet}.


This list is unordered, but elements in this list are referred to by
index in other clip-related fields.


----------------------------------------------------------------------
GetClipAssetPaths(assetPaths) -> bool

assetPaths : VtArray[AssetPath]


This is an overloaded member function, provided for convenience. It
differs from the above function only in what argument(s) it accepts.
This function operates on the default clip set.



UsdClipsAPISetNames

These are my stubs generated for the same methods:

class ClipsAPI(APISchemaBase):
    ...
    @overload
    def GetClipAssetPaths(self, clipSet: str | pxr.Ar.ResolvedPath) -> list[pxr.Sdf.AssetPath]:
        """
        List of asset paths to the clips in the clip set named C{clipSet}.


        This list is unordered, but elements in this list are referred to by
        index in other clip-related fields.
        """
    @overload
    def GetClipAssetPaths(self) -> list[pxr.Sdf.AssetPath]:
        """
        This is an overloaded member function, provided for convenience. It
        differs from the above function only in what argument(s) it accepts.
        This function operates on the default clip set.



        UsdClipsAPISetNames
        """

You'll notice that the docstrings present in yours and mine are equivalent (and share the same whitespace idiosyncracies) because I'm using the same doxygenlib modules to process docstrings.

The primary difference is in the overload signatures. Here's a more direct side-by-side comparsion:

First overload

# yours
def GetClipAssetPaths(assetPaths: VtArray[AssetPath], clipSet: str) -> bool: ...
# mine
def GetClipAssetPaths(self, clipSet: str | pxr.Ar.ResolvedPath) -> list[pxr.Sdf.AssetPath]: ...

Second overload

# yours
def GetClipAssetPaths(assetPaths: VtArray[AssetPath]) -> bool: ...
# mine
def GetClipAssetPaths(self) -> list[pxr.Sdf.AssetPath]: ...

tl;dr The native signatures are wrong in many ways. Below is a table explaining the sources of these differences:

OpenUSD	cg-stubs
number of overloads may not be accurate	number of overloads always matches those in python
doesn't handle pointers that become results	inspects C++ signatures for pointer arguments and converts them to python result when the the boost signatures indicate a match
only converts `std::vector` and `std::sequence`	also converts `std::set`, `std::unordered_set`, `std::function`, `std::map`, `std::unordered_map`, `std::optional`
doesn't handle implicitly convertible types like `Ar.ResolvedPath`	scans headers for `typedef` and `using` statements to substitute these aliases for their actual types
intended for documentation reference	intended for documentation reference, IDE code completion, and static type analysis
does not include `self` or `cls` args	adds `self` and `cls` args for methods
not validated	validated against actual python code
only visits functions which have matches in Doxygen	processes every function in every mdoule

Ok, I'm convinced. What next?

Here's my proposed 3-step plan:

Step 1

Merge this PR.

Step 2

I will contribute type annotations to several open source project which I can use to further validate my stubs, and continue to broadcast the existence of these stubs far and wide to drive adoption. I've already created a PR for the great USD Cookbook repo. Ideally, if Pixar is open to it, I would like to contribute type annotations to OpenUSD itself, for example in Usdviewq. To ease into this, I have a followup to this PR to add annotations to doxygenlib.

Step 3

If we are all in agreement, then the final step would be to move the pyi stub generation process into OpenUSD itself. I'm a pretty busy guy and moving the project into OpenUSD would ensure that it undergoes more regular releases than I can manage. I will be around to help out with maintenance of the generator.

Q & A

What has changed under the hood?

There are two main changes in this PR, both limited to the build process:

OpenUSD python bindings are built with the boost option to include boost-generated signatures.
doxygenlib is updated to hide these signatures from users.

Tf.PreparePythonModule overrides function __doc__ attributes with values from __DOC.py. This behavior has remained unchanged. What has changed is the functions that are included in __DOC.py files during the build process. Prior to this change, cdWriterDocstring omitted entries in __DOC.py for functions which had a docstring provided by Boost, because an existing docstring indicated that it had been manually authored in the C++ wrapper.

After this PR, all functions have docstrings generated by Boost (because of the new Boost signatures), so we have to do a little extra work to determine if the Boost docstrings are just the auto-generated python signatures, or custom docstrings from the wrappers. Now when cdWriterDocstring processes an existing docstring, it removes the Boost signatures and if there's anything left it writes it to __DOC.py.

The upshot is that the additional docstrings make the python lib directory slightly larger, 25MB, or about 0.6% to a full OpenUSD build:

before: 4141 MB
after:  4166 MB

What are the differences for the user?

As mentioned, I've taken great pains to preserve the original behavior as much as possible, however, there are some edge cases.

Functions for which cdWriterDocstring found a C++ match in doxygen will remain the same, because they will recieve an entry in __DOC.py. However, there are functions which cdWriterDocstring never visits and which therefore do not receive an override in __DOC.py. These functions will now have docstrings generated by Boost.

For example, before this PR, the following function had no docstring:

$ python3.9 -c "import pxr.Sdf as Sdf;print(Sdf.ListEditorProxy_SdfNameKeyPolicy.ContainsItemEdit.__doc__)"
None

After this PR, it has a docstring generated by Boost:

$ python3.9 -c "import pxr.Sdf as Sdf;print(Sdf.ListEditorProxy_SdfNameKeyPolicy.ContainsItemEdit.__doc__)"
ContainsItemEdit( (ListEditorProxy_SdfNameKeyPolicy)arg1, (object)item [, (bool)onlyAddOrExplicit=False]) -> bool

Why enable the boost signatures by default?

If we follow through with the 3-step plan above, building with signatures will need to be the default, because we will not want to build one version of OpenUSD with signatures enabled to create the stubs, and then rebuild with signatures disabled.

Why not provide an option to enable the Boost signatures?

I'm open to doing this, but adding an option to enable the Boost signatures adds complexity to the doxygenlib code because it needs to produce parity in output with both scenarios.

I know this is a lot to read, but I thought I'd front load all of the context necessary to make a decision on this PR. Take your time and ask as many questions as you like.

Thanks!

…t signatures are present

jesschimein · 2024-09-03T15:55:44Z

Filed as internal issue #USD-10058

jesschimein · 2024-09-03T15:55:45Z

/AzurePipelines run

azure-pipelines · 2024-09-03T15:55:56Z

Azure Pipelines successfully started running 1 pipeline(s).

spiffmon · 2024-10-23T22:36:26Z

Hi @chadrik - sorry for the delay; we finally had a chance to talk this over, and we like the idea of at least your step #1 at this point. We're going to do a little more testing for build-time impact, and we're pretty sure the new pxr_boost should support this, though if you had a chance to rebase against current dev (or 24.11) to verify, that would be awesome also!

chadrik added 3 commits September 2, 2024 18:02

Inject boost-python signatures into function docstrings.

6ac61dd

doxygenlib: ensure that generated docstrings remain correct when boos…

7ee47d9

…t signatures are present

Add option to disable loading docstrings

a473d76

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make it easier to create python stub files #3268

Make it easier to create python stub files #3268

chadrik commented Sep 3, 2024 •

edited

Loading

jesschimein commented Sep 3, 2024

jesschimein commented Sep 3, 2024

azure-pipelines bot commented Sep 3, 2024

spiffmon commented Oct 23, 2024

Make it easier to create python stub files #3268

Are you sure you want to change the base?

Make it easier to create python stub files #3268

Conversation

chadrik commented Sep 3, 2024 • edited Loading

Background

Comparison between python signatures: yours (native) vs mine (cg-stubs)

Overview

Quality

Ok, I'm convinced. What next?

Step 1

Step 2

Step 3

Q & A

What has changed under the hood?

What are the differences for the user?

Why enable the boost signatures by default?

Why not provide an option to enable the Boost signatures?

jesschimein commented Sep 3, 2024

jesschimein commented Sep 3, 2024

azure-pipelines bot commented Sep 3, 2024

spiffmon commented Oct 23, 2024

chadrik commented Sep 3, 2024 •

edited

Loading