Introduce File Extension Filtering Logic for GCP Storage and AWS S3 Connectors (#64)

This PR introduces new filtering logic to the GCP Storage Source and AWS S3 Source connectors, allowing users to include or exclude files based on their extensions during the source file search process. This enhancement provides finer control over which files are processed, improving the flexibility and efficiency of data ingestion.

GCP Storage Source Connector

New Properties:

* connect.gcpstorage.source.extension.excludes
  Description: A comma-separated list of file extensions to exclude from the source file search. If this property is not configured, all files are considered.
  Default: null (No filtering is enabled by default; all files are considered)
* connect.gcpstorage.source.extension.includes
  Description: A comma-separated list of file extensions to include in the source file search. If this property is not configured, all files are considered.
  Default: null (All extensions are included by default)

AWS S3 Source Connector

New Properties:

* connect.s3.source.extension.excludes
  Description: A comma-separated list of file extensions to exclude from the source file search. If this property is not configured, all files are considered.
  Default: null (No filtering is enabled by default; all files are considered)
* connect.s3.source.extension.includes
  Description: A comma-separated list of file extensions to include in the source file search. If this property is not configured, all files are considered.
  Default: null (All extensions are included by default)

How It Works

* Include Filtering: If the source.extension.includes property is set, only files with extensions listed in this property will be considered for processing.
* Exclude Filtering: If the source.extension.excludes property is set, files with extensions listed in this property will be ignored during processing.
* Combined Use: When both properties are set, the connector will only include files that match the includes property and do not match the excludes property.

Use Cases:

* Inclusion: Users can specify certain file types to process (e.g., .csv, .json), ensuring that only relevant files are ingested.
* Exclusion: Users can exclude files with extensions that should not be processed (e.g., temporary files like .tmp or backup files like .bak).

* Source extension filters: part 1
* Wiring in
* Addressing review comments
* Making documentation more specific
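
The include/exclude rules described in the commit message can be illustrated with a small sketch. The property names are the S3 ones listed above; everything else (`ExtensionConfigSketch`, `parseExtensions`, `accepts`, and the normalisation that lowercases each entry and prepends a dot) is a hypothetical illustration and may not match how the connector actually parses its configuration.

```scala
object ExtensionConfigSketch {

  // Property names as given in the commit message (S3 variant).
  val IncludesKey = "connect.s3.source.extension.includes"
  val ExcludesKey = "connect.s3.source.extension.excludes"

  // Hypothetical helper: "csv, .JSON" => Set(".csv", ".json"); null => empty set.
  // The trimming, lowercasing and dot-prefixing are assumptions of this sketch.
  def parseExtensions(raw: String): Set[String] =
    Option(raw).toList
      .flatMap(_.split(",").toList)
      .map(_.trim.toLowerCase)
      .filter(_.nonEmpty)
      .map(ext => if (ext.startsWith(".")) ext else "." + ext)
      .toSet

  // Combined rule: a file passes if it matches the includes (or no includes
  // are configured) and does not match any exclude.
  def accepts(fileName: String, includes: Set[String], excludes: Set[String]): Boolean = {
    val name     = fileName.toLowerCase
    val included = includes.isEmpty || includes.exists(ext => name.endsWith(ext))
    val excluded = excludes.exists(ext => name.endsWith(ext))
    included && !excluded
  }

  def main(args: Array[String]): Unit = {
    val props = Map(
      IncludesKey -> "csv, json",
      ExcludesKey -> "tmp"
    )
    val includes = parseExtensions(props.getOrElse(IncludesKey, null))
    val excludes = parseExtensions(props.getOrElse(ExcludesKey, null))

    println(accepts("in/orders.CSV", includes, excludes))     // true
    println(accepts("in/orders.csv.tmp", includes, excludes)) // false
    println(accepts("in/readme.txt", includes, excludes))     // false
  }
}
```

An unset property maps to an empty set here, which corresponds to the documented default of null (no filtering).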
1 parent e902f7f
commit e9df8d2
Showing 21 changed files with 460 additions and 43 deletions.
15 changes: 0 additions & 15 deletions
...la/io/lenses/streamreactor/connect/cloud/common/source/config/S3SourceBucketOptions.scala
This file was deleted.
63 changes: 63 additions & 0 deletions
...src/main/scala/io/lenses/streamreactor/connect/cloud/common/storage/ExtensionFilter.scala
@@ -0,0 +1,63 @@
/*
 * Copyright 2017-2024 Lenses.io Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.lenses.streamreactor.connect.cloud.common.storage

/**
 * A class used to filter files based on their extensions.
 * It allows to include or exclude files with certain extensions.
 *
 * @constructor create a new ExtensionFilter with allowed and excluded extensions.
 * @param allowedExtensions the set of extensions that are allowed.
 * @param excludedExtensions the set of extensions that are excluded.
 */
class ExtensionFilter(
  val allowedExtensions: Set[String],
  val excludedExtensions: Set[String],
) {

  /**
   * Filters the metadata of a file based on its extension.
   *
   * @param metadata the metadata of the file to be filtered.
   * @return true if the file passes the filter, false otherwise.
   */
  def filter[MD <: FileMetadata](metadata: MD): Boolean =
    ExtensionFilter.performFilterLogic(metadata.file.toLowerCase, allowedExtensions, excludedExtensions)

  /**
   * Filters a file based on its name.
   *
   * @param fileName the name of the file to be filtered.
   * @return true if the file passes the filter, false otherwise.
   */
  def filter(fileName: String): Boolean =
    ExtensionFilter.performFilterLogic(fileName.toLowerCase, allowedExtensions, excludedExtensions)

}

object ExtensionFilter {

  def performFilterLogic(
    fileName: String,
    allowedExtensions: Set[String],
    excludedExtensions: Set[String],
  ): Boolean = {
    val allowedContainsEx = allowedExtensions.exists(ext => fileName.endsWith(ext))
    val excludedNotContainsEx = excludedExtensions.forall(ext => !fileName.endsWith(ext))
    (allowedExtensions.isEmpty || allowedContainsEx) && excludedNotContainsEx
  }

}
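
As a rough illustration of how the filter in this file behaves, the sketch below inlines a copy of `performFilterLogic` so that it runs on its own; the wrapper object `ExtensionFilterUsage` and its `filter` helper are hypothetical stand-ins for the class above. It highlights two consequences of the logic as written: only the file name is lowercased, and matching is a plain suffix test. Whether the configured extensions are normalised before reaching this class is not visible in this excerpt.

```scala
object ExtensionFilterUsage {

  // Copy of ExtensionFilter.performFilterLogic from the diff, inlined so the
  // sketch is self-contained.
  def performFilterLogic(
    fileName: String,
    allowedExtensions: Set[String],
    excludedExtensions: Set[String]
  ): Boolean = {
    val allowedContainsEx     = allowedExtensions.exists(ext => fileName.endsWith(ext))
    val excludedNotContainsEx = excludedExtensions.forall(ext => !fileName.endsWith(ext))
    (allowedExtensions.isEmpty || allowedContainsEx) && excludedNotContainsEx
  }

  // Mirrors ExtensionFilter.filter(fileName): lowercases the file name only.
  def filter(fileName: String, allowed: Set[String], excluded: Set[String]): Boolean =
    performFilterLogic(fileName.toLowerCase, allowed, excluded)

  def main(args: Array[String]): Unit = {
    // Upper-case file names match lower-case configured extensions.
    println(filter("report.CSV", Set(".csv"), Set.empty)) // true
    // Upper-case configured extensions do not match, since they are not lowercased here.
    println(filter("report.csv", Set(".CSV"), Set.empty)) // false
    // Without a leading dot, any name ending in the letters matches.
    println(filter("datacsv", Set("csv"), Set.empty)) // true
    // Excludes reject the file regardless of the (empty) allowed set.
    println(filter("archive.bak", Set.empty, Set(".bak"))) // false
  }
}
```

If the extension sets passed to the constructor were already lowercased and dot-prefixed upstream, the second and third cases would not arise in practice.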