Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-41803: [MATLAB] Add C Data Interface format import/export functionality for arrow.tabular.RecordBatch #41817

Merged
merged 16 commits into from
May 28, 2024

Conversation

sgilmore10
Copy link
Member

@sgilmore10 sgilmore10 commented May 24, 2024

Rationale for this change

This pull requests adds two new APIs for importing and exporting arrow.tabular.RecordBatch instances using the C Data Interface format.

Example:

>> T = table((1:3)', ["A"; "B"; "C"]);
>> expected = arrow.recordBatch(T)

expected = 

  Arrow RecordBatch with 3 rows and 2 columns:

    Schema:

        Var1: Float64 | Var2: String

    First Row:

        1 | "A"

>> cArray = arrow.c.Array();
>> cSchema = arrow.c.Schema();

% Export the RecordBatch to C Data Interface Format
>> expected.export(cArray.Address, cSchema.Address);

% Import the RecordBatch from C Data Interface Format
>> actual = arrow.tabular.RecordBatch.import(cArray, cSchema)

actual = 

  Arrow RecordBatch with 3 rows and 2 columns:

    Schema:

        Var1: Float64 | Var2: String

    First Row:

        1 | "A"

% The RecordBatch is the same after round-tripping to the C Data Interface format
>> isequal(actual, expected)

ans =

  logical

   1

What changes are included in this PR?

  1. Added a new method arrow.tabular.RecordBatch.export for exporting RecordBatch objects to the C Data Interface format.
  2. Added a new static method arrow.tabular.RecordBatch.import for importing RecordBatch objects from the C Data Interface format.
  3. Added a new internal class arrow.c.internal.RecordBatchImporter for importing RecordBatch objects from the C Data Interface format.

Are these changes tested?

Yes.

  1. Added a new test file matlab/test/arrow/c/tRoundtripRecordBatch.m which has basic round-trip tests for importing and exporting RecordBatch objects.

Are there any user-facing changes?

Yes.

  1. Two new user-facing methods were added to arrow.tabular.RecordBatch. The first is arrow.tabular.RecordBatch.export(cArrowArrayAddress, cArrowSchemaAddress). The second is arrow.tabular.RecordBatch.import(cArray, cSchema). These APIs can be used to export/import RecordBatch objects using the C Data Interface format.

Future Directions

  1. Add integration tests for sharing data between MATLAB/mlarrow and Python/pyarrow running in the same process using the MATLAB interface to Python.
  2. Add support for the Arrow C stream interface format.

Notes

  1. Thanks to @kevingurney for the help with this feature!

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

We need RecordBatchImporter because we can't define custom constructor (something like static libmexclass::proxyMakeResult arrow::matlab::tabular::proxy::RecordBatch::import(....)), right?

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels May 25, 2024
@sgilmore10
Copy link
Member Author

+1

We need RecordBatchImporter because we can't define custom constructor (something like static libmexclass::proxyMakeResult arrow::matlab::tabular::proxy::RecordBatch::import(....)), right?

Yes, that's correct. @kevingurney and I have talked about extending mathworks/libmexclass to support either static proxy methods or free-floating functions at some point. We just haven't been able to get around to adding that functionality yet.

@sgilmore10 sgilmore10 merged commit fe2d926 into apache:main May 28, 2024
10 checks passed
@sgilmore10 sgilmore10 deleted the GH-41803 branch May 28, 2024 13:37
@sgilmore10 sgilmore10 removed the awaiting merge Awaiting merge label May 28, 2024
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit fe2d926.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 13 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants