Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-40343: [C++] Move S3FileSystem to the registry #41559

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented May 6, 2024

Rationale for this change

Moving S3 to a separate module will allow separate build the module for use by builds of arrow which did not include S3 funcionality.

What changes are included in this PR?

  • the factory for S3 filesystems is added to the registry
  • an internal function is added which unregisters functions for testing purposes

Are these changes tested?

A test for just the modularization is added which unregisters built in factories. This way we're definitely testing loading new factories from module libraries.

Are there any user-facing changes?

No

@bkietz bkietz force-pushed the 40343-s3fs-into-registry branch 2 times, most recently from 7c7a277 to 970eaa4 Compare May 23, 2024 21:25
@bkietz bkietz marked this pull request as ready for review May 23, 2024 21:25
@bkietz bkietz requested a review from wgtmac as a code owner May 23, 2024 21:25
@@ -18,6 +18,7 @@
#include "arrow/adapters/orc/util.h"

#include <cmath>
#include <sstream>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used in GetOrcMajorVersion(), so it should be #included here by IWYU. I found it because compilation failed at one point (compilation can spuriously succeed when standard headers include more definitions than the standard guarantees; for example, inclusion of only <string> or <iostream> might fully define std::stringstream rather than only forward declaring it)

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Sep 18, 2024
cpp/src/arrow/CMakeLists.txt Outdated Show resolved Hide resolved
target_link_libraries(arrow_s3fs PRIVATE ${AWSSDK_LINK_LIBRARIES} arrow_shared)
set_source_files_properties(filesystem/s3fs.cc filesystem/s3fs_module.cc
PROPERTIES SKIP_PRECOMPILE_HEADERS ON
SKIP_UNITY_BUILD_INCLUSION ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we remove the linking of the main arrow library against AWSSDK_LINK_LIBRARIES when ARROW_S3_MODULE is ON?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to require the main arrow library to load the module. After the module can reproduce the full functionality of the S3FileSystem (S3ProxyOptions, for example) we can deprecate building s3 in the main library. That's not in scope for now, though

Comment on lines +134 to +136
EXTRA_LINK_LIBS
Boost::filesystem
Boost::system)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@assignUser does the auto-formatter for CMake has the ability to indent these?

                   SOURCES
                     s3fs_module_test.cc
                     s3_test_util.cc
                   EXTRA_LABELS
                     filesystem
                   DEFINITIONS
                     ARROW_S3_LIBPATH="$<TARGET_FILE:arrow_s3fs>"
                   EXTRA_LINK_LIBS
                     Boost::filesystem
                     Boost::system)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we could, but it'd involve declaring each function like add_arrow_library to additional_commands https://cmake-format.readthedocs.io/en/latest/configuration.html#configuration

Copy link
Member

@assignUser assignUser Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed that ping. But yes, that is possible as Ben says, I have done something similar for velox custom functions. But currently I am a little concerned that cmake-format seems to be unmaintained https://github.com/cheshirekow/cmake_format/issues/340

cpp/src/arrow/filesystem/s3fs.cc Show resolved Hide resolved
Comment on lines +3043 to +3044
uri = "s3://" + options().GetAccessKey() + ":" + options().GetSecretKey() + "@" +
uri.substr("file:///"s.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be in URIs? Has this always been the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been in place since S3 URIs were supported, #6403

The URI is intended to be a fully self contained initializer for a filesystem, so if the filesystem requires secrets for initialization then the URI must contain them

Copy link
Member

@assignUser assignUser Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still recommended practice? It seems Azure recommends not doing this to prevent accidental exposure of credentials in logs etc. https://lists.apache.org/thread/mhwr2lvtxvjcqos12k7hr4cqkdofrxxo

I don't know anything about S3 auth though so 🤷 (in any case probably something for a follow up)

@github-actions github-actions bot added awaiting merge Awaiting merge awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting merge Awaiting merge labels Sep 20, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 26, 2024
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Sep 26, 2024
@felipecrv
Copy link
Contributor

@pitrou do you want to merge this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants