Roman/s3 minio all cloud support (#1606)
### Description
Exposes the endpoint URL as an access kwarg when using the S3 filesystem library via the fsspec abstraction. This allows any non-AWS data provider that supports the S3 protocol (e.g. MinIO) to be used with the S3 connector.

Closes out #950

---------

Co-authored-by: ryannikolaidis <[email protected]>
Co-authored-by: rbiseck3 <[email protected]>
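The change described in this commit message can be sketched as follows. This is a minimal illustration, not the connector's actual code: `build_access_kwargs` and `open_s3_filesystem` are hypothetical helper names, and the sketch assumes an installed s3fs version whose `S3FileSystem` accepts `endpoint_url`, `key`, `secret`, and `anon` keyword arguments when reached through `fsspec.filesystem`.

```python
from typing import Any, Dict, Optional


def build_access_kwargs(
    endpoint_url: Optional[str] = None,
    key: Optional[str] = None,
    secret: Optional[str] = None,
    anonymous: bool = False,
) -> Dict[str, Any]:
    """Collect keyword arguments for an S3-compatible filesystem (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"anon": anonymous}
    if key is not None:
        kwargs["key"] = key
    if secret is not None:
        kwargs["secret"] = secret
    if endpoint_url is not None:
        # Only set when provided, so the default AWS endpoint is used otherwise.
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def open_s3_filesystem(**access_kwargs: Any):
    """Open the filesystem through fsspec (assumes fsspec and s3fs are installed)."""
    import fsspec  # imported lazily so build_access_kwargs has no dependencies

    return fsspec.filesystem("s3", **access_kwargs)
```

Against a local MinIO server, one might call `open_s3_filesystem(**build_access_kwargs(endpoint_url="http://localhost:9000", key="minioadmin", secret="minioadmin"))` and then list `utic-dev-tech-fixtures`; omitting `endpoint_url` leaves the regular AWS behaviour untouched.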
1 parent 1fb4642, commit 8821689
Showing 11 changed files with 161 additions and 6 deletions.
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

SCRIPT_DIR=$(dirname "$(realpath "$0")")

secret_key=minioadmin
access_key=minioadmin
region=us-east-2
endpoint_url=http://localhost:9000
bucket_name=utic-dev-tech-fixtures

function upload(){
  echo "Uploading test content to new bucket in minio"
  AWS_REGION=$region AWS_SECRET_ACCESS_KEY=$secret_key AWS_ACCESS_KEY_ID=$access_key \
    aws --output json --endpoint-url $endpoint_url s3api create-bucket --bucket $bucket_name | jq
  AWS_REGION=$region AWS_SECRET_ACCESS_KEY=$secret_key AWS_ACCESS_KEY_ID=$access_key \
    aws --endpoint-url $endpoint_url s3 cp "$SCRIPT_DIR"/wiki_movie_plots_small.csv s3://$bucket_name/
}

# Create Minio single server
docker compose version
docker compose -f "$SCRIPT_DIR"/docker-compose.yaml up --wait
docker compose -f "$SCRIPT_DIR"/docker-compose.yaml ps

echo "Cluster is live."
upload
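For readers who would rather drive the same two steps from Python, the sketch below builds the AWS CLI invocations that `upload` runs. It is an illustration only: `minio_env` and `upload_commands` are hypothetical names, the values are meant to mirror the script's variables, and nothing is executed until the argument lists are handed to `subprocess.run`.

```python
import os
from typing import Dict, List


def minio_env(access_key: str, secret_key: str, region: str) -> Dict[str, str]:
    """Return a copy of the environment with the credentials the AWS CLI reads."""
    env = dict(os.environ)
    env.update(
        {
            "AWS_ACCESS_KEY_ID": access_key,
            "AWS_SECRET_ACCESS_KEY": secret_key,
            "AWS_REGION": region,
        }
    )
    return env


def upload_commands(endpoint_url: str, bucket: str, local_file: str) -> List[List[str]]:
    """Build the create-bucket and copy commands as argument lists."""
    return [
        # Same order of options as the shell script's first call.
        ["aws", "--output", "json", "--endpoint-url", endpoint_url,
         "s3api", "create-bucket", "--bucket", bucket],
        ["aws", "--endpoint-url", endpoint_url,
         "s3", "cp", local_file, f"s3://{bucket}/"],
    ]
```

Each command could then be run with `subprocess.run(cmd, env=minio_env("minioadmin", "minioadmin", "us-east-2"), check=True)`; passing argument lists rather than a shell string sidesteps the quoting of `$endpoint_url` and `$bucket_name`.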
@@ -0,0 +1,13 @@
services:
  minio:
    image: quay.io/minio/minio
    container_name: minio-test
    ports:
      - 9000:9000
      - 9001:9001
    command: server --console-address ":9001" /data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 5s
      timeout: 20s
      retries: 3
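The liveness URL used by this health check can also be polled from outside the container, for example by a test harness that does not rely on `docker compose up --wait`. The following is a minimal sketch under that assumption: `http_probe` and `wait_until_healthy` are hypothetical names, the default retry count and interval simply echo the values in the compose file, and Docker's own health-check bookkeeping differs in its details.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all count as "not ready".
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 3,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `retries` times, sleeping `interval` seconds between attempts."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

A caller might use `wait_until_healthy(lambda: http_probe("http://localhost:9000/minio/health/live"))` before uploading fixtures. The probe is injected as a callable so the retry loop can be exercised without a running server.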
test_unstructured_ingest/expected-structured-output/s3-minio/wiki_movie_plots_small.csv.json: 19 additions and 0 deletions (large diff not rendered)
@@ -0,0 +1,46 @@
#!/usr/bin/env bash

set -e


SCRIPT_DIR=$(dirname "$(realpath "$0")")
cd "$SCRIPT_DIR"/.. || exit 1
OUTPUT_FOLDER_NAME=s3-minio
OUTPUT_DIR=$SCRIPT_DIR/structured-output/$OUTPUT_FOLDER_NAME
DOWNLOAD_DIR=$SCRIPT_DIR/download/$OUTPUT_FOLDER_NAME
max_processes=${MAX_PROCESSES:=$(python3 -c "import os; print(os.cpu_count())")}
secret_key=minioadmin
access_key=minioadmin

# shellcheck disable=SC1091
source "$SCRIPT_DIR"/cleanup.sh

function cleanup() {
  # Kill the container so the script can be repeatedly run using the same ports
  echo "Stopping Minio Docker container"
  docker-compose -f scripts/minio-test-helpers/docker-compose.yaml down --remove-orphans -v

  cleanup_dir "$OUTPUT_DIR"
}

trap cleanup EXIT

# shellcheck source=/dev/null
scripts/minio-test-helpers/create-and-check-minio.sh
wait

AWS_SECRET_ACCESS_KEY=$secret_key AWS_ACCESS_KEY_ID=$access_key PYTHONPATH=. ./unstructured/ingest/main.py \
  s3 \
  --num-processes "$max_processes" \
  --download-dir "$DOWNLOAD_DIR" \
  --metadata-exclude coordinates,filename,file_directory,metadata.data_source.date_processed,metadata.data_source.date_modified,metadata.last_modified,metadata.detection_class_prob,metadata.parent_id,metadata.category_depth \
  --strategy hi_res \
  --preserve-downloads \
  --reprocess \
  --output-dir "$OUTPUT_DIR" \
  --verbose \
  --remote-url s3://utic-dev-tech-fixtures/ \
  --endpoint-url http://localhost:9000


"$SCRIPT_DIR"/check-diff-expected-output.sh $OUTPUT_FOLDER_NAME
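The `max_processes=${MAX_PROCESSES:=...}` line in the MinIO ingest test script uses the shell's assign-default expansion: when `MAX_PROCESSES` is unset or empty, it falls back to the CPU count reported by Python. A rough Python equivalent is sketched below; `resolve_max_processes` is a hypothetical name, and the fallback to `1` when the CPU count cannot be determined is an assumption that the shell version does not make.

```python
import os
from typing import Mapping, Optional


def resolve_max_processes(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return MAX_PROCESSES if set and non-empty, otherwise the CPU count."""
    env = os.environ if environ is None else environ
    value = env.get("MAX_PROCESSES", "")
    if value:
        return int(value)
    # os.cpu_count() may return None; falling back to 1 is an assumption of this sketch.
    return os.cpu_count() or 1
```

For example, `resolve_max_processes({"MAX_PROCESSES": "4"})` yields `4`, while an empty mapping yields the machine's CPU count.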
@@ -1 +1 @@
-__version__ = "0.10.19-dev7"  # pragma: no cover
+__version__ = "0.10.19-dev8"  # pragma: no cover