feat: add salesforce connector (#1168)
potter-potter authored Sep 2, 2023
1 parent 1a0b737 commit b710baf
Showing 29 changed files with 1,191 additions and 6 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -282,6 +282,9 @@ jobs:
MS_TENANT_ID: ${{ secrets.MS_TENANT_ID }}
MS_USER_EMAIL: ${{ secrets.MS_USER_EMAIL }}
MS_USER_PNAME: ${{ secrets.MS_USER_PNAME }}
SALESFORCE_USERNAME: ${{secrets.SALESFORCE_USERNAME}}
SALESFORCE_CONSUMER_KEY: ${{secrets.SALESFORCE_CONSUMER_KEY}}
SALESFORCE_PRIVATE_KEY: ${{secrets.SALESFORCE_PRIVATE_KEY}}
SHAREPOINT_CLIENT_ID: ${{secrets.SHAREPOINT_CLIENT_ID}}
SHAREPOINT_CRED: ${{secrets.SHAREPOINT_CRED}}
SHAREPOINT_SITE: ${{secrets.SHAREPOINT_SITE}}
@@ -313,6 +316,7 @@ jobs:
make install-ingest-gitlab
make install-ingest-onedrive
make install-ingest-outlook
make install-ingest-salesforce
make install-ingest-slack
make install-ingest-wikipedia
make install-ingest-notion
4 changes: 4 additions & 0 deletions .github/workflows/ingest-test-fixtures-update-pr.yml
@@ -73,6 +73,9 @@ jobs:
MS_TENANT_ID: ${{ secrets.MS_TENANT_ID }}
MS_USER_EMAIL: ${{ secrets.MS_USER_EMAIL }}
MS_USER_PNAME: ${{ secrets.MS_USER_PNAME }}
SALESFORCE_USERNAME: ${{secrets.SALESFORCE_USERNAME}}
SALESFORCE_CONSUMER_KEY: ${{secrets.SALESFORCE_CONSUMER_KEY}}
SALESFORCE_PRIVATE_KEY: ${{secrets.SALESFORCE_PRIVATE_KEY}}
SHAREPOINT_CLIENT_ID: ${{secrets.SHAREPOINT_CLIENT_ID}}
SHAREPOINT_CRED: ${{secrets.SHAREPOINT_CRED}}
SHAREPOINT_SITE: ${{secrets.SHAREPOINT_SITE}}
@@ -104,6 +107,7 @@ jobs:
make install-ingest-gitlab
make install-ingest-onedrive
make install-ingest-outlook
make install-ingest-salesforce
make install-ingest-slack
make install-ingest-wikipedia
make install-ingest-notion
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,13 @@
## 0.10.12-dev3

### Enhancements

### Features

* Add Salesforce Connector to be able to pull Account, Case, Campaign, EmailMessage, Lead

### Fixes

## 0.10.12-dev2

### Enhancements
4 changes: 4 additions & 0 deletions Makefile
@@ -188,6 +188,10 @@ install-ingest-local:
install-ingest-notion:
python3 -m pip install -r requirements/ingest-notion.txt

.PHONY: install-ingest-salesforce
install-ingest-salesforce:
python3 -m pip install -r requirements/ingest-salesforce.txt

.PHONY: install-unstructured-inference
install-unstructured-inference:
python3 -m pip install -r requirements/local-inference.txt
1 change: 1 addition & 0 deletions docs/source/upstream_connectors.rst
@@ -27,6 +27,7 @@ in our community `Slack. <https://join.slack.com/t/unstructuredw-kbe4326/shared_
upstream_connectors/outlook
upstream_connectors/reddit
upstream_connectors/s3
upstream_connectors/salesforce
upstream_connectors/sharepoint
upstream_connectors/slack
upstream_connectors/wikipedia
129 changes: 129 additions & 0 deletions docs/source/upstream_connectors/salesforce.rst
@@ -0,0 +1,129 @@
Salesforce
==========
Connect Salesforce to your preprocessing pipeline, and batch process Salesforce data using ``unstructured-ingest`` to store structured outputs locally on your filesystem.

First you'll need to install the Salesforce dependencies as shown here.

.. code:: shell

  pip install "unstructured[salesforce]"

Run Locally
-----------

.. tabs::

   .. tab:: Shell

      .. code:: shell

        unstructured-ingest \
          salesforce \
          --salesforce-username "$SALESFORCE_USERNAME" \
          --salesforce-consumer-key "$SALESFORCE_CONSUMER_KEY" \
          --salesforce-private-key-path "$SALESFORCE_PRIVATE_KEY_PATH" \
          --salesforce-categories "EmailMessage,Account,Lead,Case,Campaign" \
          --structured-output-dir salesforce-output \
          --num-processes 2 \
          --recursive \
          --verbose

   .. tab:: Python

      .. code:: python

        import os
        import subprocess

        # subprocess does not expand "$VAR" in list arguments, so the
        # credentials are read from the environment explicitly.
        command = [
            "unstructured-ingest",
            "salesforce",
            "--salesforce-username", os.environ["SALESFORCE_USERNAME"],
            "--salesforce-consumer-key", os.environ["SALESFORCE_CONSUMER_KEY"],
            "--salesforce-private-key-path", os.environ["SALESFORCE_PRIVATE_KEY_PATH"],
            "--salesforce-categories", "EmailMessage,Account,Lead,Case,Campaign",
            "--structured-output-dir", "salesforce-output",
            "--num-processes", "2",
            "--recursive",
            "--verbose",
        ]

        # Run the command, capturing both stdout and stderr
        process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = process.communicate()

        # Print output
        if process.returncode == 0:
            print("Command executed successfully. Output:")
            print(output.decode())
        else:
            print("Command failed. Error:")
            print(error.decode())

Run via the API
---------------

You can also use upstream connectors with the ``unstructured`` API. For this you'll need to use the ``--partition-by-api`` flag and pass in your API key with ``--api-key``.

.. tabs::

   .. tab:: Shell

      .. code:: shell

        unstructured-ingest \
          salesforce \
          --salesforce-username "$SALESFORCE_USERNAME" \
          --salesforce-consumer-key "$SALESFORCE_CONSUMER_KEY" \
          --salesforce-private-key-path "$SALESFORCE_PRIVATE_KEY_PATH" \
          --salesforce-categories "EmailMessage,Account,Lead,Case,Campaign" \
          --structured-output-dir salesforce-output \
          --num-processes 2 \
          --recursive \
          --verbose \
          --partition-by-api \
          --api-key "<UNSTRUCTURED-API-KEY>"

   .. tab:: Python

      .. code:: python

        import os
        import subprocess

        # subprocess does not expand "$VAR" in list arguments, so the
        # credentials are read from the environment explicitly.
        command = [
            "unstructured-ingest",
            "salesforce",
            "--salesforce-username", os.environ["SALESFORCE_USERNAME"],
            "--salesforce-consumer-key", os.environ["SALESFORCE_CONSUMER_KEY"],
            "--salesforce-private-key-path", os.environ["SALESFORCE_PRIVATE_KEY_PATH"],
            "--salesforce-categories", "EmailMessage,Account,Lead,Case,Campaign",
            "--structured-output-dir", "salesforce-output",
            "--num-processes", "2",
            "--recursive",
            "--verbose",
            "--partition-by-api",
            "--api-key", "<UNSTRUCTURED-API-KEY>",
        ]

        # Run the command, capturing both stdout and stderr
        process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = process.communicate()

        # Print output
        if process.returncode == 0:
            print("Command executed successfully. Output:")
            print(output.decode())
        else:
            print("Command failed. Error:")
            print(error.decode())

Additionally, you will need to pass the ``--partition-endpoint`` flag if you're running the API locally. You can find more information about the ``unstructured`` API `here <https://github.com/Unstructured-IO/unstructured-api>`_.

For a full list of the options the CLI accepts, check ``unstructured-ingest salesforce --help``.

NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
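
Once a run finishes, the structured outputs can be inspected from Python. The following is a minimal sketch, assuming the connector writes JSON files under the ``--structured-output-dir`` directory and that each file holds a list of element dictionaries. The helper name ``load_structured_outputs`` is hypothetical and not part of ``unstructured``.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_structured_outputs(output_dir: str) -> Dict[str, List[dict]]:
    """Map each JSON file under output_dir to its parsed list of elements."""
    results: Dict[str, List[dict]] = {}
    for path in sorted(Path(output_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: each output file is a JSON list of elements; skip anything else.
        if isinstance(data, list):
            results[str(path.relative_to(output_dir))] = data
    return results
```

Calling ``load_structured_outputs("salesforce-output")`` after the commands above would return a dictionary keyed by relative file path, which makes it easy to count elements per record or spot empty outputs.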
4 changes: 2 additions & 2 deletions examples/ingest/box/ingest.sh
@@ -12,7 +12,7 @@
# Maybe check 'Make api calls as the as-user header'
# REAUTHORIZE app after making any of the above changes

-# box_app_config is the path to a json file, available in the App Settings section of your Box App
+# box-app-config is the path to a json file, available in the App Settings section of your Box App
# More info to set up the app:
# https://developer.box.com/guides/authentication/jwt/jwt-setup/
# and set up the app config.json file here:
@@ -24,7 +24,7 @@ cd "$SCRIPT_DIR"/../../.. || exit 1

PYTHONPATH=. ./unstructured/ingest/main.py \
box \
- --box_app_config "$BOX_APP_CONFIG_PATH" \
+ --box-app-config "$BOX_APP_CONFIG_PATH" \
--remote-url box://utic-test-ingest-fixtures \
--structured-output-dir box-output \
--num-processes 2 \
29 changes: 29 additions & 0 deletions examples/ingest/salesforce/ingest.sh
@@ -0,0 +1,29 @@
#!/usr/bin/env bash

# Processes multiple files in a nested folder structure from Salesforce
# through Unstructured's library in 2 processes.

# Available categories are: Account, Case, Campaign, EmailMessage, Lead

# Structured outputs are stored in salesforce-output/

# Using JWT authorization
# https://developer.salesforce.com/docs/atlas.en-us.sfdx_dev.meta/sfdx_dev/sfdx_dev_auth_key_and_cert.htm
# https://developer.salesforce.com/docs/atlas.en-us.sfdx_dev.meta/sfdx_dev/sfdx_dev_auth_connected_app.htm

# private-key-path is the path to the key file

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$SCRIPT_DIR"/../../.. || exit 1


PYTHONPATH=. ./unstructured/ingest/main.py \
salesforce \
--username "$SALESFORCE_USERNAME" \
--consumer-key "$SALESFORCE_CONSUMER_KEY" \
--private-key-path "$SALESFORCE_PRIVATE_KEY_PATH" \
--categories "EmailMessage,Account,Lead,Case,Campaign" \
--structured-output-dir salesforce-output \
--preserve-downloads \
--reprocess \
--verbose
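
The script above authenticates with the JWT flow described in the linked Salesforce guides. As a rough illustration of what the three credentials are for, here is a sketch that assumes the login goes through ``simple-salesforce`` (the dependency added in ``requirements/ingest-salesforce.in``) using its ``username``, ``consumer_key``, and ``privatekey_file`` keyword arguments. The helper names and the category check are hypothetical and are not the connector's actual code.

```python
import os
from typing import Dict, List, Mapping

# Categories listed as available in the script comments above.
ALLOWED_CATEGORIES = {"Account", "Case", "Campaign", "EmailMessage", "Lead"}


def parse_categories(raw: str) -> List[str]:
    """Split a comma-separated category string and reject unknown names."""
    categories = [c.strip() for c in raw.split(",") if c.strip()]
    unknown = [c for c in categories if c not in ALLOWED_CATEGORIES]
    if unknown:
        raise ValueError(f"Unsupported categories: {unknown}")
    return categories


def jwt_login_kwargs(env: Mapping[str, str]) -> Dict[str, str]:
    """Build simple-salesforce JWT login arguments from environment variables."""
    return {
        "username": env["SALESFORCE_USERNAME"],
        "consumer_key": env["SALESFORCE_CONSUMER_KEY"],
        "privatekey_file": env["SALESFORCE_PRIVATE_KEY_PATH"],
    }


def connect(env: Mapping[str, str] = os.environ):
    """Open a Salesforce session (needs simple-salesforce and network access)."""
    from simple_salesforce import Salesforce  # third-party, imported lazily

    return Salesforce(**jwt_login_kwargs(env))
```

Keeping the argument building separate from ``connect`` means the credential handling and category validation can be checked without contacting Salesforce.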
3 changes: 3 additions & 0 deletions requirements/ingest-salesforce.in
@@ -0,0 +1,3 @@
-c constraints.in
-c base.txt
simple-salesforce
63 changes: 63 additions & 0 deletions requirements/ingest-salesforce.txt
@@ -0,0 +1,63 @@
#
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
# pip-compile requirements/ingest-salesforce.in
#
attrs==23.1.0
# via zeep
certifi==2023.7.22
# via
# -c requirements/base.txt
# -c requirements/constraints.in
# requests
cffi==1.15.1
# via cryptography
charset-normalizer==3.2.0
# via
# -c requirements/base.txt
# requests
cryptography==41.0.3
# via simple-salesforce
idna==3.4
# via
# -c requirements/base.txt
# requests
isodate==0.6.1
# via zeep
lxml==4.9.3
# via
# -c requirements/base.txt
# zeep
platformdirs==3.10.0
# via zeep
pycparser==2.21
# via cffi
pyjwt==2.8.0
# via simple-salesforce
pytz==2023.3
# via zeep
requests==2.31.0
# via
# -c requirements/base.txt
# requests-file
# requests-toolbelt
# simple-salesforce
# zeep
requests-file==1.5.1
# via zeep
requests-toolbelt==1.0.0
# via zeep
simple-salesforce==1.12.4
# via -r requirements/ingest-salesforce.in
six==1.16.0
# via
# isodate
# requests-file
urllib3==1.26.16
# via
# -c requirements/base.txt
# -c requirements/constraints.in
# requests
zeep==4.2.1
# via simple-salesforce
1 change: 1 addition & 0 deletions setup.py
@@ -148,6 +148,7 @@ def load_requirements(file_list: Optional[Union[str, List[str]]] = None) -> List
"airtable": load_requirements("requirements/ingest-airtable.in"),
"sharepoint": load_requirements("requirements/ingest-sharepoint.in"),
"delta-table": load_requirements("requirements/ingest-delta-table.in"),
"salesforce": load_requirements("requirements/ingest-salesforce.in"),
# Legacy extra requirements
"huggingface": load_requirements("requirements/huggingface.in"),
"local-inference": all_doc_reqs,