Add Lambda function for the Amazon Security Lake integration (#189)

* Migrate from #147

* Update amazon-security-lake integration

- Improved documentation.
- Python code has been moved to `wazuh-indexer/integrations/amazon-security-lake/src`.
- Development environment now uses OpenSearch 2.12.0.
- The `wazuh.integration.security.lake` container now displays logs, by watching logstash's log file.
- [**NEEDS FIX**] As a temporary solution, the `INDEXER_USERNAME` and `INDEXER_PASSWORD` values have been added as environment variables to the `wazuh.integration.security.lake` container. These values should be set at Dockerfile level, but that isn't working, probably due to a permission denied error on invocation of the `setup.sh` script.
- [**NEEDS FIX**] As a temporary solution, the output file of the `indexer-to-file` pipeline has been moved to `/var/log/logstash/indexer-to-file`. The previous path, `/usr/share/logstash/pipeline/indexer-to-file.json`, results in permission denied.
- [**NEEDS FIX**] As a temporary solution, the input.opensearch.query has been replaced with `match_all`, as the previous one does not return any data, probably due to the use of time filters (`gt: now-1m`).
- Standard output enabled for `/usr/share/logstash/pipeline/indexer-to-file.json`.
- [**NEEDS FIX**] ECS compatibility disabled: `echo "pipeline.ecs_compatibility: disabled" >> /etc/logstash/logstash.yml` -- to be included automatically
- Python3 environment path added to the `indexer-to-integrator` pipeline.

* Disable ECS compatibility (auto)

- Adds pipeline.ecs_compatibility: disabled at Dockerfile level.
- Removes `INDEXER_USERNAME` and `INDEXER_PASSWORD` as environment variables on the `wazuh.integration.security.lake` container.

* Add @timestamp field to sample alerts

* Fix Logstash pipelines

* Add working indexer-to-s3 pipeline

* Add working Python script up to S3 upload

* Add latest changes

* Remove duplicated line

* Add working environment with minimal AWS lambda function

* Mount src folder to Lambda's workdir

* Add first functional lambda function

Tested on local environment, using S3 Ninja and a Lambda container

* Working state

* Add documentation

* Improve code

* Improve code

* Clean up

* Add instructions to build a deployment package

* Make zip file lighter

* Use default name for aws_region

* Add destination bucket validation

* Add env var validation and full destination S3 path

* Add AWS_ENDPOINT environment variable

* Rename AWS_DEFAULT_REGION

* Remove unused env vars

* Remove unused file and improve documentation a bit.

* Makefile improvements

* Use dummy env variables

---------

Signed-off-by: Álex Ruiz <[email protected]>
wazuh · Aug 20, 2024 · c616df5
1 parent 073a304
commit c616df5
Showing 24 changed files with 481 additions and 329 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,5 +1,11 @@
 # build files
 artifacts/
+*.deb
+*.rpm
+*.zip
+*.tar.gz
+
+integrations/amazon-security-lake/package
 
 .java
 .m2

diff --git a/docker/dev/dev.yml b/docker/dev/dev.yml
@@ -5,7 +5,7 @@ services:
     image: wi-dev:${VERSION}
     container_name: wi-dev_${VERSION}
     build:
-      context: ./../..
+      context: ${REPO_PATH}
       dockerfile: ${REPO_PATH}/docker/dev/images/Dockerfile
     ports:
       # OpenSearch REST API

diff --git a/integrations/README.md b/integrations/README.md
@@ -1,58 +1,93 @@
 ## Wazuh indexer integrations
 
-This folder contains integrations with third-party XDR, SIEM and cybersecurity software. 
+This folder contains integrations with third-party XDR, SIEM and cybersecurity software.
 The goal is to transport Wazuh's analysis to the platform that suits your needs.
 
 ### Amazon Security Lake
 
-Amazon Security Lake automatically centralizes security data from AWS environments, SaaS providers, 
-on premises, and cloud sources into a purpose-built data lake stored in your account. With Security Lake, 
-you can get a more complete understanding of your security data across your entire organization. You can 
-also improve the protection of your workloads, applications, and data. Security Lake has adopted the 
-Open Cybersecurity Schema Framework (OCSF), an open standard. With OCSF support, the service normalizes 
+Amazon Security Lake automatically centralizes security data from AWS environments, SaaS providers,
+on premises, and cloud sources into a purpose-built data lake stored in your account. With Security Lake,
+you can get a more complete understanding of your security data across your entire organization. You can
+also improve the protection of your workloads, applications, and data. Security Lake has adopted the
+Open Cybersecurity Schema Framework (OCSF), an open standard. With OCSF support, the service normalizes
 and combines security data from AWS and a broad range of enterprise security data sources.
 
-##### Usage
+#### Development guide
 
 A demo of the integration can be started using the content of this folder and Docker.
 
 ```console
 docker compose -f ./docker/amazon-security-lake.yml up -d
 ```
 
-This docker compose project will bring a *wazuh-indexer* node, a *wazuh-dashboard* node, 
-a *logstash* node and our event generator. On the one hand, the event generator will push events 
-constantly to the indexer, on the `wazuh-alerts-4.x-sample` index by default (refer to the [events 
-generator](./tools/events-generator/README.md) documentation for customization options). 
-On the other hand, logstash will constantly query for new data and deliver it to the integration 
-Python program, also present in that node. Finally, the integration module will prepare and send the 
-data to the Amazon Security Lake's S3 bucket.
+This docker compose project will bring a _wazuh-indexer_ node, a _wazuh-dashboard_ node,
+a _logstash_ node, our event generator and an AWS Lambda Python container. On the one hand, the event generator will push events
+constantly to the indexer, to the `wazuh-alerts-4.x-sample` index by default (refer to the [events
+generator](./tools/events-generator/README.md) documentation for customization options).
+On the other hand, logstash will constantly query for new data and deliver it to the output configured in the
+pipeline, which can be one of `indexer-to-s3` or `indexer-to-file`.
+
+The `indexer-to-s3` pipeline is the method used by the integration. This pipeline delivers
+the data to an S3 bucket, from which the data is processed using a Lambda function, to finally
+be sent to the Amazon Security Lake bucket in Parquet format.
+
 <!-- TODO continue with S3 credentials setup -->
 
 Attach a terminal to the container and start the integration by starting logstash, as follows:
 
 ```console
-/usr/share/logstash/bin/logstash -f /usr/share/logstash/pipeline/indexer-to-integrator.conf --path.settings /etc/logstash
+/usr/share/logstash/bin/logstash -f /usr/share/logstash/pipeline/indexer-to-s3.conf --path.settings /etc/logstash
 ```
 
-Unprocessed data can be sent to a file or to an S3 bucket.
-```console
-/usr/share/logstash/bin/logstash -f /usr/share/logstash/pipeline/indexer-to-file.conf --path.settings /etc/logstash
-/usr/share/logstash/bin/logstash -f /usr/share/logstash/pipeline/indexer-to-s3.conf --path.settings /etc/logstash
+After 5 minutes, the first batch of data will show up in http://localhost:9444/ui/wazuh-indexer-aux-bucket.
+You'll need to invoke the Lambda function manually, selecting the log file to process.
+
+```bash
+export AUX_BUCKET=wazuh-indexer-aux-bucket
+
+bash amazon-security-lake/src/invoke-lambda.sh <file>
 ```
 
-All three pipelines are configured to fetch the latest data from the *wazuh-indexer* every minute. In
-the case of `indexer-to-file`, the data is written at the same pace, whereas `indexer-to-s3`, data 
-is uploaded every 5 minutes.
+Processed data will be uploaded to http://localhost:9444/ui/wazuh-indexer-amazon-security-lake-bucket. Click on any file to download it,
+and check its content using `parquet-tools`. Make sure to set up the virtual environment first, using [requirements.txt](./amazon-security-lake/).
 
-For development or debugging purposes, you may want to enable hot-reload, test or debug on these files, 
+```bash
+parquet-tools show <parquet-file>
+```
+
+Bucket names can be configured by editing the [amazon-security-lake.yml](./docker/amazon-security-lake.yml) file.
+
+For development or debugging purposes, you may want to enable hot-reload, test or debug on these files,
 by using the `--config.reload.automatic`, `--config.test_and_exit` or `--debug` flags, respectively.
 
 For production usage, follow the instructions in our documentation page about this matter.
 (_when-its-done_)
 
 As a last note, we would like to point out that we also use this Docker environment for development.
 
+#### Deployment guide
+
+- Create one S3 bucket to store the raw events, for example: `wazuh-security-lake-integration`
+- Create a new AWS Lambda function
+  - Create an IAM role with access to the S3 bucket created above.
+  - Select Python 3.12 as the runtime
+  - Configure the runtime to have 512 MB of memory and 30 seconds timeout
+  - Configure an S3 trigger so every created object in the bucket with `.txt` extension invokes the Lambda.
+  - Run `make` to generate a zip deployment package, or create it manually as per the [AWS Lambda documentation](https://docs.aws.amazon.com/lambda/latest/dg/python-package.html#python-package-create-dependencies).
+  - Upload the zip package to the bucket. Then, upload it to the Lambda from the S3 as per these instructions: https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-package.html#gettingstarted-package-zip
+- Create a Custom Source within Security Lake for the Wazuh Parquet files as per the following guide: https://docs.aws.amazon.com/security-lake/latest/userguide/custom-sources.html
+- Set the **AWS account ID** for the Custom Source **AWS account with permission to write data**.
+
+<!-- TODO Configure AWS Lambda Environment Variables /-->
+<!-- TODO Install and configure Logstash /-->
+
+The instructions in this section are based on the following AWS tutorials and documentation.
+
+- [Tutorial: Using an Amazon S3 trigger to create thumbnail images](https://docs.aws.amazon.com/lambda/latest/dg/with-s3-tutorial.html)
+- [Tutorial: Using an Amazon S3 trigger to invoke a Lambda function](https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html)
+- [Working with .zip file archives for Python Lambda functions](https://docs.aws.amazon.com/lambda/latest/dg/python-package.html)
+- [Best practices for working with AWS Lambda functions](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html)
+
 ### Other integrations
 
 TBD
diff --git a/integrations/amazon-security-lake/Makefile b/integrations/amazon-security-lake/Makefile
@@ -0,0 +1,28 @@
+
+ZIP_NAME = wazuh_to_amazon_security_lake
+TARGET = package
+SRC = src
+
+# Main target
+.ONESHELL:
+$(ZIP_NAME).zip: $(TARGET) $(SRC)/lambda_function.py $(SRC)/wazuh_ocsf_converter.py
+	@cd $(TARGET)
+	@zip -r ../$(ZIP_NAME).zip .
+	@cd ../$(SRC)
+	@zip ../$@ lambda_function.py wazuh_ocsf_converter.py
+	@zip ../$@ models -r
+
+$(TARGET):
+	docker run -v `pwd`:/src -w /src \
+		python:3.12 \
+		pip install \
+		--platform manylinux2014_x86_64 \
+		--target=$(TARGET) \
+		--implementation cp \
+		--python-version 3.12 \
+		--only-binary=:all: --upgrade \
+		-r requirements.aws.txt
+
+clean:
+	@rm -rf $(TARGET)
+	@py3clean .
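
The main target above adds `lambda_function.py` and `wazuh_ocsf_converter.py` to the root of the zip, next to the dependencies installed into `package`. A quick way to confirm that a built archive has that layout is sketched below. This is not part of the commit: `missing_from_package` is a hypothetical helper, and the list of required files is an assumption taken from the Makefile's prerequisites.

```python
import zipfile

# Files the Makefile adds at the archive root (assumed from its prerequisites).
REQUIRED = ("lambda_function.py", "wazuh_ocsf_converter.py")


def missing_from_package(zip_path: str, required=REQUIRED) -> list[str]:
    """Return the required files that are not at the root of the zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
    return [name for name in required if name not in names]
```

Calling `missing_from_package("wazuh_to_amazon_security_lake.zip")` after running `make` should return an empty list if the package was assembled as the Makefile intends.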
diff --git a/integrations/amazon-security-lake/aws-lambda.dockerfile b/integrations/amazon-security-lake/aws-lambda.dockerfile
@@ -0,0 +1,17 @@
+# docker build --platform linux/amd64 --no-cache -f aws-lambda.dockerfile -t docker-image:test .
+# docker run --platform linux/amd64 -p 9000:8080 docker-image:test
+
+# FROM public.ecr.aws/lambda/python:3.9
+FROM amazon/aws-lambda-python:3.12
+
+# Copy requirements.txt
+COPY requirements.aws.txt ${LAMBDA_TASK_ROOT}
+
+# Install the specified packages
+RUN pip install -r requirements.aws.txt
+
+# Copy function code
+COPY src ${LAMBDA_TASK_ROOT}
+
+# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
+CMD [ "lambda_function.lambda_handler" ]
diff --git a/integrations/amazon-security-lake/invoke-lambda.sh b/integrations/amazon-security-lake/invoke-lambda.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+
+export AUX_BUCKET=wazuh-indexer-aux-bucket
+
+curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
+  "Records": [
+    {
+      "eventVersion": "2.0",
+      "eventSource": "aws:s3",
+      "awsRegion": "us-east-1",
+      "eventTime": "1970-01-01T00:00:00.000Z",
+      "eventName": "ObjectCreated:Put",
+      "userIdentity": {
+        "principalId": "AIDAJDPLRKLG7UEXAMPLE"
+      },
+      "requestParameters":{
+        "sourceIPAddress":"127.0.0.1"
+      },
+      "responseElements":{
+        "x-amz-request-id":"C3D13FE58DE4C810",
+        "x-amz-id-2":"FMyUVURIY8/IgAtTv8xRjskZQpcIZ9KG4V5Wp6S7S/JRWeUWerMUE5JgHvANOjpD"
+      },
+      "s3": {
+        "s3SchemaVersion": "1.0",
+        "configurationId": "testConfigRule",
+        "bucket": {
+          "name": "'"${AUX_BUCKET}"'",
+          "ownerIdentity": {
+            "principalId":"A3NL1KOZZKExample"
+          },
+          "arn": "'"arn:aws:s3:::${AUX_BUCKET}"'"
+        },
+        "object": {
+          "key": "'"${1}"'",
+          "size": 1024,
+          "eTag":"d41d8cd98f00b204e9800998ecf8427e",
+          "versionId":"096fKKXTRTtl3on89fVO.nfljtsv6qko"
+        }
+      }
+    }
+  ]
+}'
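
For reference, the same kind of test event can be assembled and inspected in Python. This is a minimal sketch, not code from this commit: `build_s3_event` and `extract_objects` are hypothetical helper names, the payload is trimmed to the fields a handler typically reads, and the object key in the usage example is made up.

```python
import json
from urllib.parse import unquote_plus


def build_s3_event(bucket: str, key: str) -> dict:
    """Build a trimmed-down S3 'ObjectCreated:Put' event, similar to the one posted by invoke-lambda.sh."""
    return {
        "Records": [
            {
                "eventVersion": "2.0",
                "eventSource": "aws:s3",
                "awsRegion": "us-east-1",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "s3SchemaVersion": "1.0",
                    "bucket": {"name": bucket, "arn": f"arn:aws:s3:::{bucket}"},
                    "object": {"key": key, "size": 1024},
                },
            }
        ]
    }


def extract_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Real S3 notifications URL-encode object keys; decode them before use.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


if __name__ == "__main__":
    # Hypothetical key, for illustration only.
    event = build_s3_event("wazuh-indexer-aux-bucket", "20240820/sample.txt")
    print(json.dumps(event, indent=2))
    print(extract_objects(event))
```

Decoding the key with `unquote_plus` has no effect on the hand-written payload above, but it matters for events delivered by S3 itself, where spaces and special characters in keys arrive encoded.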
diff --git a/integrations/amazon-security-lake/logstash/pipeline/indexer-to-integrator.conf b/integrations/amazon-security-lake/logstash/pipeline/indexer-to-integrator.conf
diff --git a/integrations/amazon-security-lake/logstash/pipeline/indexer-to-s3.conf b/integrations/amazon-security-lake/logstash/pipeline/indexer-to-s3.conf
@@ -10,12 +10,12 @@ input {
             "query": {
                "range": {
                   "@timestamp": {
-                     "gt": "now-1m"
+                     "gt": "now-5m"
                   }
                }
             }
       }'
-      schedule => "5/* * * * *"
+      schedule => "*/5 * * * *"
    }
 }
 
@@ -26,15 +26,15 @@ output {
    }
    s3 {
       id => "output.s3"
-      access_key_id => "${AWS_KEY}"
-      secret_access_key => "${AWS_SECRET}"
+      access_key_id => "${AWS_ACCESS_KEY_ID}"
+      secret_access_key => "${AWS_SECRET_ACCESS_KEY}"
       region => "${AWS_REGION}"
-      endpoint => "http://s3.ninja:9000"
-      bucket => "${AWS_BUCKET}"
-      codec => "json"
+      endpoint => "${AWS_ENDPOINT}"
+      bucket => "${AUX_BUCKET}"
+      codec => "json_lines"
       retry_count => 0
       validate_credentials_on_root_bucket => false
-      prefix => "%{+YYYY}/%{+MM}/%{+dd}"
+      prefix => "%{+YYYY}%{+MM}%{+dd}"
       server_side_encryption => true
       server_side_encryption_algorithm => "AES256"
       additional_settings => {

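
With `codec => "json_lines"`, each event is written as one JSON document per line, so a consumer can parse the uploaded object line by line. The sketch below illustrates that parsing step, assuming the object body has already been read into a string. `parse_json_lines` is a hypothetical helper and the sample events are made up; neither is taken from this commit's Lambda code.

```python
import json


def parse_json_lines(body: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    events = []
    for line in body.splitlines():
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events


if __name__ == "__main__":
    # Made-up sample events, one JSON document per line.
    sample = (
        '{"@timestamp": "2024-08-20T00:00:00Z", "rule": {"level": 3}}\n'
        '{"@timestamp": "2024-08-20T00:00:05Z", "rule": {"level": 7}}\n'
    )
    for event in parse_json_lines(sample):
        print(event["@timestamp"], event["rule"]["level"])
```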
diff --git a/integrations/amazon-security-lake/requirements.aws.txt b/integrations/amazon-security-lake/requirements.aws.txt
@@ -0,0 +1,2 @@
+pyarrow>=10.0.1
+pydantic>=2.6.1
diff --git a/integrations/amazon-security-lake/requirements.txt b/integrations/amazon-security-lake/requirements.txt
@@ -1,4 +1,4 @@
 pyarrow>=10.0.1
 parquet-tools>=0.2.15
-pydantic==2.6.1
+pydantic>=2.6.1
 boto3==1.34.46