-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-37938: [Swift] Add initial C data interface implementation #41342
Conversation
|
461cf90
to
453cf4f
Compare
The go test for the C data interface are currently not included in CI testing. I plan to update this in a future PR. |
It’s great to add those interfaces to add Swift support. Thanks! |
ci/scripts/swift_test.sh
Outdated
@@ -40,4 +40,4 @@ popd | |||
source_dir=${1}/swift/ArrowFlight | |||
pushd ${source_dir} | |||
swift test | |||
popd | |||
popd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary change
739c452
to
c269261
Compare
Np, thank you! |
@dbtsai Are you familiar with Swift? If so, could you also review this carefully? |
swift/Arrow/Package.swift
Outdated
name: "ArrowC", // your C/C++ library's name | ||
path: "Sources/ArrowC" // your path to the C/C++ library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments look like from some examples and maybe unrelated here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I will remove.
cArray.n_children = 0 | ||
cArray.children = nil | ||
cArray.dictionary = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is this for TODO later? Otherwise I think n_children
should not be zero always? And also for children
and dictionary
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Swift version of Arrow currently doesn't support having children (Struct type) or dictionaries. I will add a comment mentioning this so it can be updated when those constructs are added.
let arraySchema = data!.pointee | ||
let exportId = Int(bitPattern: arraySchema.private_data) | ||
guard ArrowCExporter.exportedData[exportId] != nil else { | ||
fatalError("Export schema not found with id \(exportId)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does fatalError
actually do something like C exit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the behavior is the same. Description from Swift docs for fatalError: Unconditionally prints a given message and stops execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add this file to swiftlint
targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added ArrowFlight and CDataWGo Package files as well, thanks!
@kou I have made the requested changes. Please review again. |
@abandy Could you please rebase from git main, if not recently done, so that we get cleaner CI results? |
} | ||
} | ||
|
||
private static var exportedData = [Int: ExportData]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this container cleaned up at shutdown? I'm wondering what happens if a release callback is called after this container is cleaned up. Presumably the release callback would crash because of the fatalError
?
This is not necessarily a problem, but I'm curious :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the memory should be released during shutdown. The container should be valid in the process that loaded the library until the process has been shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my question was more along the lines of: what happens if the release callback is called after finalization of Swift internals has started? Granted, we might not be able to do much about it, but it's probably good to keep this in mind (and/or add a comment about it).
self.arrowType = arrowType | ||
// keeping the name str to ensure the cstring buffer remains valid | ||
self.name = name | ||
self.arrowTypeName = (try arrowType.cDataFormatId as NSString).utf8String! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only ok because arrowType.cDataFormatId
always returns a static string, right?
Otherwise you would also have to store arrowType.cDataFormatId
in ExportSchema
so that its buffer remains valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I suppose that's not very future-proof, then? Some types such as timestamp, union... have dynamic format strings which embody the type's parameters. You may want to add a comment to remind readers about this?
(or perhaps simply arrange to keep the arrowType.cDataFormatId
result alive in the ExportSchema
class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a PR ready for Struct types and plan to follow up with Unions afterward. I was going to address this issue when I added support for these types to the C data interface in a future PR but I will make this change now so it is one less thing that will be an issue at that time. Thank you!
return .failure(.invalid("Children currently not supported")) | ||
} else if cSchema.dictionary != nil { | ||
return .failure(.invalid("Dictinoary types currently not supported")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also call ArrowCImporter.release(cSchema)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, agreed, I will update.
case .success(let holder): | ||
return .success(ImportArrayHolder(holder, cArrayPtr: cArrayPtr)) | ||
case .failure(let err): | ||
return .failure(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also call ArrowCImporter.release
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, smaller PRs are easy to review/merge than one large PR.
If you want to proceed quickly as much as possible, could you use the smaller PRs approach next time like you will do for struct type?
For example, we might be able to choose the following approach for this case:
- Implement
ArrowSchema
export for 1 simple type such asbool
. - Implement
ArrowSchema
import for 1. - Implement
ArrowSchema
export/import for one more simple type such asint8
. - ...
FYI: MATLAB uses the small PRs approach: https://github.com/apache/arrow/issues?q=is%3Aissue+matlab+%22C+data+interface%22
swift/.swiftlint.yml
Outdated
@@ -16,10 +16,15 @@ | |||
# under the License. | |||
|
|||
included: | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kou, thank you example on the smaller project. I will definitely follow that format for the struct type change! Thank you very much! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I added some minor comments but I'll merge this. If we need to work on these comments, let's work on them as follow-up tasks.
#define ARROW_FLAG_NULLABLE 2 | ||
#define ARROW_FLAG_MAP_KEYS_SORTED 4 | ||
|
||
#include <stdio.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
#define ARROW_FLAG_MAP_KEYS_SORTED 4 | ||
|
||
#include <stdio.h> | ||
#include <stdlib.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this to ArrowCData.c
?
|
||
#include <stdio.h> | ||
#include <stdlib.h> | ||
#include <stdint.h> // Must have this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <stdint.h> // Must have this! | |
#include <stdint.h> // For int64_t |
require ( | ||
github.com/andybalholm/brotli v1.0.5 // indirect | ||
github.com/apache/arrow/go/v12 v12.0.1 // indirect | ||
github.com/apache/arrow/go/v16 v16.0.0-20240203105949-22f2cfd1e1eb // indirect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really indirect dependency?
### Rationale for this change Follow up changes requested in the #41342 ### What changes are included in this PR? Update includes header file and go dependencies changes related to the C Data interface changes. Authored-by: Alva Bandy <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 4d524eb. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Continuation for PR: #39091
This add an initial implementation of the C Data interface for swift. During development it was found that null count was not being properly maintained on the arrow buffers and this change is included as well. Also some minor refactoring was done to existing sources to enable this feature.
This has been tested from Swift calling into C to import data but not from Swift to C exporting data. Test is currently ongoing.