-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40343: [C++] Move S3FileSystem to the registry #41559
base: main
Are you sure you want to change the base?
Conversation
589e7d7
to
9cc73fc
Compare
7c7a277
to
970eaa4
Compare
970eaa4
to
976a912
Compare
976a912
to
39e2e0e
Compare
@@ -18,6 +18,7 @@ | |||
#include "arrow/adapters/orc/util.h" | |||
|
|||
#include <cmath> | |||
#include <sstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used in GetOrcMajorVersion()
, so it should be #included here by IWYU. I found it because compilation failed at one point (compilation can spuriously succeed when standard headers include more definitions than the standard guarantees; for example, inclusion of only <string>
or <iostream>
might fully define std::stringstream
rather than only forward declaring it)
39e2e0e
to
f8f8f2c
Compare
f8f8f2c
to
130be67
Compare
target_link_libraries(arrow_s3fs PRIVATE ${AWSSDK_LINK_LIBRARIES} arrow_shared) | ||
set_source_files_properties(filesystem/s3fs.cc filesystem/s3fs_module.cc | ||
PROPERTIES SKIP_PRECOMPILE_HEADERS ON | ||
SKIP_UNITY_BUILD_INCLUSION ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we remove the linking of the main arrow library against AWSSDK_LINK_LIBRARIES
when ARROW_S3_MODULE
is ON?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to require the main arrow library to load the module. After the module can reproduce the full functionality of the S3FileSystem (S3ProxyOptions, for example) we can deprecate building s3 in the main library. That's not in scope for now, though
EXTRA_LINK_LIBS | ||
Boost::filesystem | ||
Boost::system) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@assignUser does the auto-formatter for CMake has the ability to indent these?
SOURCES
s3fs_module_test.cc
s3_test_util.cc
EXTRA_LABELS
filesystem
DEFINITIONS
ARROW_S3_LIBPATH="$<TARGET_FILE:arrow_s3fs>"
EXTRA_LINK_LIBS
Boost::filesystem
Boost::system)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we could, but it'd involve declaring each function like add_arrow_library
to additional_commands
https://cmake-format.readthedocs.io/en/latest/configuration.html#configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed that ping. But yes, that is possible as Ben says, I have done something similar for velox custom functions. But currently I am a little concerned that cmake-format seems to be unmaintained https://github.com/cheshirekow/cmake_format/issues/340
uri = "s3://" + options().GetAccessKey() + ":" + options().GetSecretKey() + "@" + | ||
uri.substr("file:///"s.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be in URIs? Has this always been the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been in place since S3 URIs were supported, #6403
The URI is intended to be a fully self contained initializer for a filesystem, so if the filesystem requires secrets for initialization then the URI must contain them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still recommended practice? It seems Azure recommends not doing this to prevent accidental exposure of credentials in logs etc. https://lists.apache.org/thread/mhwr2lvtxvjcqos12k7hr4cqkdofrxxo
I don't know anything about S3 auth though so 🤷 (in any case probably something for a follow up)
@pitrou do you want to merge this one? |
Rationale for this change
Moving S3 to a separate module will allow separate build the module for use by builds of arrow which did not include S3 funcionality.
What changes are included in this PR?
Are these changes tested?
A test for just the modularization is added which unregisters built in factories. This way we're definitely testing loading new factories from module libraries.
Are there any user-facing changes?
No