Geoparquet filtering and simplification #895

Merged
merged 25 commits into from
Oct 14, 2024

Conversation

bchapuis
Member

@bchapuis bchapuis commented Oct 10, 2024

  • Adds bbox-based filtering to the geoparquet reader, so that only the relevant records are loaded into the database.
  • Simplifies the object model to minimize the creation of wrappers and arrays to hold values.
  • Removes unused classes and methods related to the geoparquet writer.
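
To illustrate the bbox-based filtering described above, here is a minimal sketch of two-level pruning: files whose bounding box misses the query are skipped entirely, and the remaining records are then tested one by one. All names here (`BboxFilterSketch`, `Bbox`, `Row`, `DataFile`, `read`) are hypothetical and chosen for illustration only; this is not the code of the geoparquet reader in this pull request.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical types for illustration only; not the reader's actual object model.
final class BboxFilterSketch {

  /** Axis-aligned bounding box. */
  record Bbox(double minX, double minY, double maxX, double maxY) {
    /** True when the two boxes overlap or touch. */
    boolean intersects(Bbox o) {
      return minX <= o.maxX && maxX >= o.minX && minY <= o.maxY && maxY >= o.minY;
    }
  }

  /** A record together with the bbox of its geometry. */
  record Row(String id, Bbox bbox) {}

  /** A file together with the bbox covering all of its rows. */
  record DataFile(String name, Bbox bbox, List<Row> rows) {}

  /** Returns the ids of the rows whose bbox intersects the query bbox. */
  static List<String> read(List<DataFile> files, Bbox query) {
    return files.stream()
        .filter(f -> f.bbox().intersects(query)) // file-level pruning
        .flatMap(f -> f.rows().stream())         // rows of the surviving files only
        .filter(r -> r.bbox().intersects(query)) // record-level filtering
        .map(Row::id)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<DataFile> files = List.of(
        new DataFile("a.parquet", new Bbox(0, 0, 5, 5), List.of(
            new Row("r1", new Bbox(1, 1, 2, 2)),
            new Row("r2", new Bbox(4, 4, 5, 5)))),
        new DataFile("b.parquet", new Bbox(100, 100, 110, 110), List.of(
            new Row("r3", new Bbox(101, 101, 102, 102)))));
    System.out.println(read(files, new Bbox(0, 0, 3, 3))); // prints [r1]
  }
}
```

In this sketch the rows of `b.parquet` are never visited, which is the point of checking the file-level bbox first.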

}

private FileInfo getFileInfo(FileStatus fileStatus) {
try {
return buildFileInfo(fileStatus);
ParquetMetadata parquetMetadata =
ParquetFileReader.readFooter(configuration, fileStatus.getPath());

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking ParquetFileReader.readFooter should be avoided because it has been deprecated.
@bchapuis bchapuis force-pushed the geoparquet-filtering-and-simplification branch from 07f53a3 to 930cc0f Compare October 11, 2024 22:08
return getEnvelopeValues(parquetSchema.getFieldIndex(fieldName));
}

public String toString() {

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides Object.toString; it is advisable to add an Override annotation.
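
As a small illustration of the notice above, the sketch below shows a `toString` method carrying the `@Override` annotation. The class name `FileInfoSketch` and its field are hypothetical and are not taken from the pull request.

```java
// Hypothetical class for illustration only.
final class FileInfoSketch {
  private final String path;

  FileInfoSketch(String path) {
    this.path = path;
  }

  // With @Override, the compiler rejects the method if it does not
  // actually override Object.toString (for example after a typo in the name).
  @Override
  public String toString() {
    return "FileInfoSketch{path=" + path + "}";
  }
}
```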
@bchapuis bchapuis force-pushed the geoparquet-filtering-and-simplification branch 2 times, most recently from 2703d41 to b4a5f1a Compare October 12, 2024 12:13
@bchapuis bchapuis force-pushed the geoparquet-filtering-and-simplification branch from b4a5f1a to ffbc74c Compare October 12, 2024 12:16
@bchapuis bchapuis force-pushed the geoparquet-filtering-and-simplification branch from c602218 to 1f9cc17 Compare October 12, 2024 22:43
@bchapuis bchapuis requested a review from sebr72 October 13, 2024 07:42
@bchapuis
Member Author

@sebr72 I hope you are doing well. I refactored the GeoParquetGroup so that we don't need wrapper objects and instantiate arrays only when necessary. I also adapted the spliterator so that we can filter the files and the records based on their respective bbox. Please let me know what you think about these changes.

@bchapuis bchapuis force-pushed the geoparquet-filtering-and-simplification branch from e00983f to 5ea6c90 Compare October 13, 2024 09:44
@bchapuis bchapuis force-pushed the geoparquet-filtering-and-simplification branch from 5ea6c90 to 985ef2c Compare October 13, 2024 09:46
@bchapuis bchapuis force-pushed the geoparquet-filtering-and-simplification branch from 42eacb7 to cdb9f6a Compare October 14, 2024 19:24

sonarcloud bot commented Oct 14, 2024

@bchapuis bchapuis merged commit ba6350b into main Oct 14, 2024
8 checks passed
@bchapuis bchapuis deleted the geoparquet-filtering-and-simplification branch October 14, 2024 20:07