Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Libonx 2024 win private #113

Draft
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

hmaarrfk
Copy link
Contributor

I'm trying a slightly different patch for linking to windows. I think it might actually also be useful for linux maybe..

This patch might be upstreamable

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

if(ONNX_USE_LITE_PROTO)
if(TARGET protobuf::libprotobuf-lite)
- target_link_libraries(onnx_proto PUBLIC protobuf::libprotobuf-lite PRIVATE ${protobuf_ABSL_USED_TARGETS})
+ target_link_libraries(onnx_proto PRIVATE protobuf::libprotobuf-lite PRIVATE ${protobuf_ABSL_USED_TARGETS})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-vetinari do you understand why the error:

ld: illegal thread local variable reference to regular symbol __ZN6google8protobuf8internal15ThreadSafeArena13thread_cache_E for architecture x86_64

might appear on OSX after I added this patch linking with PRIVATE instead of PUBLIC

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First guess is that the symbol appears somewhere in the ABI of onnx (or even the API); plus making protobuf private hides the symbol from the linker so it fails when it's forced to search for it.

In other words, if the protobuf usage were truly private to onnx_proto, this should work. Either it explicitly isn't private, or it's a bug in the privateness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so maybe link onnx to proton if as well.

@hmaarrfk
Copy link
Contributor Author

TODO: create patches for updated cmake file and summary file that use the new find_python stuff

CMakeLists.txt
summary.cmake.txt

- message(STATUS " Python includes : ${PYTHON_INCLUDE_DIR}")
+ message(STATUS " Python version : ${Python_VERSION}")
+ message(STATUS " Python executable : ${Python_EXECUTABLE}")
+ message(STATUS " Python includes : ${Python_INCLUDE_DIR}")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ message(STATUS " Python includes : ${Python_INCLUDE_DIR}")
+ message(STATUS " Python includes : ${Python_INCLUDE_DIRS}")

@hmaarrfk
Copy link
Contributor Author

So close and yet so far on windows….

@h-vetinari
Copy link
Member

So close and yet so far on windows….

You get

ImportError: DLL load failed while importing onnx_cpp2py_export: The specified module could not be found.

if a symbol cannot be found by the linker - unfortunately without much context.

Based on

  -- Installing: D:/bld/onnx-split_1710114664126/_h_env/Library/lib/onnx.lib
  -- Installing: D:/bld/onnx-split_1710114664126/_h_env/Library/lib/onnx_proto.lib

it seems you're getting static builds. I wouldn't be surprised if that doesn't work somehow; normally missing symbols should show up during the build phase already, but with the switch to private (why?) and using protobuf-lite (why?)1, you have two steps that are reducing the amount of exposed symbols, and thus increasing the surface for the kind of import error you're seeing.

Footnotes

  1. libprotobuf-lite is a subset of libprotobuf; they're marked as conflicting in CMake metadata; we probably shouldn't ship both in the same library in conda-forge, but haven't untangled this. Using -lite with our current packages is pointless, because we'll always get a run-export on the full libprotobuf anyway.

recipe/bld_onnx.bat Outdated Show resolved Hide resolved
@hmaarrfk
Copy link
Contributor Author

but with the switch to private (why?)

Don't you want to link privately so that you don't export protobuf's symbols onto onnx-proto?

using protobuf-lite (why?)

mistake, or hold over.

I'll look at this again next weekend.

@h-vetinari
Copy link
Member

Don't you want to link privately so that you don't export protobuf's symbols onto onnx-proto?

Sure, private is cleaner, but if a consumer of onnx ends up using symbols from libonnx (which is hard-required by onnx), the fallout is likely very limited already. So I don't really care all that much about that it should "morally" be private, at least once that distinction becomes troublesome. For grpc we had the incentive to avoid some extra run-exports at least, but here it all seems to be happening within the same stack of builds.

@h-vetinari
Copy link
Member

OK so I see now that this is about protobuf symbols in onnx_proto, and not libonnx symbols in onnx, sorry. However, given that the source code seems to have explicitly chose PUBLIC linkage for protobuf (as abseil is linked privately on the same line), there's a real possibility that protobuf symbols are used in the API of onnx.

If that's the case you won't ever be able to get around linking protobuf. Also, since we're now forced to do full rebuilds for every libprotobuf version anyway (down to the patch level...), there's no real distinction here anyway. Everything gets built on a consistent libprotobuf version anyway, so who cares if an onnx-consumer links it too... 🤷

@hmaarrfk
Copy link
Contributor Author

If that's the case you won't ever be able to get around linking protobuf.

linking to protobuf is one thing. But having the symbols exposed through two dlls is problematic.

@hmaarrfk
Copy link
Contributor Author

dlls is problematic

at least it is on linux.

@h-vetinari
Copy link
Member

linking to protobuf is one thing. But having the symbols exposed through two dlls is problematic.

PUBLIC linkage is not the same as exposing the symbols. AFAIU it just means anything in the library linkage surface needs to be available when linking said library.

@hmaarrfk
Copy link
Contributor Author

PUBLIC linkage is not the same as exposing the symbols. AFAIU it just means anything in the library linkage surface needs

So this is equvalent for linux, but not for windows?

@h-vetinari
Copy link
Member

I can't comment as confidently on the linux side, but even there I'd be surprised if - for libA linking to libB - B's symbols get re-exported (which usually takes explicit steps). But the publicly-linked libB will of course be necessary on the linker (and include) paths for consuming libA, and libB's symbols will be searched if used (and if they weren't, why link to A to B in the first place...). But I don't see why they should be duplicated in libA.

In any case, I recommend rereading this

.azure-pipelines/azure-pipelines-linux.yml Outdated Show resolved Hide resolved
Comment on lines +21 to +22
outputs:
- name: libonnx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't remember ever seeing you voluntarily turn something into a multi-output recipe 😜

But looks like a nice improvement to me in general 💪

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh troubleshooting is terrible. don't get me wrong.....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants