Commit
update wikipedia doc
rbiseck3 committed Sep 18, 2023
1 parent eb8ce89 commit 035ef6a
Showing 1 changed file with 33 additions and 46 deletions.
79 changes: 33 additions & 46 deletions docs/source/source_connectors/wikipedia.rst
@@ -28,28 +28,21 @@ Run Locally

 .. code:: python

-  import subprocess
-
-  command = [
-      "unstructured-ingest",
-      "wikipedia",
-      "--page-title", "Open Source Software",
-      "--output-dir", "dropbox-output",
-      "--num-processes", "2",
-      "--verbose",
-  ]
-
-  # Run the command
-  process = subprocess.Popen(command, stdout=subprocess.PIPE)
-  output, error = process.communicate()
-
-  # Print output
-  if process.returncode == 0:
-      print('Command executed successfully. Output:')
-      print(output.decode())
-  else:
-      print('Command failed. Error:')
-      print(error.decode())
+  from unstructured.ingest.runner.wikipedia import wikipedia
+  from unstructured.ingest.interfaces import ReadConfig, PartitionConfig
+
+  if __name__ == "__main__":
+      wikipedia(
+          verbose=True,
+          read_config=ReadConfig(),
+          partition_config=PartitionConfig(
+              output_dir="wikipedia-ingest-output",
+              num_processes=2
+          ),
+          page_title="Open Source Software",
+          auto_suggest=False,
+      )

 Run via the API
 ---------------
@@ -75,30 +68,24 @@ You can also use upstream connectors with the ``unstructured`` API. For this you

 .. code:: python

-  import subprocess
-
-  command = [
-      "unstructured-ingest",
-      "wikipedia",
-      "--page-title", "Open Source Software",
-      "--output-dir", "dropbox-output",
-      "--num-processes", "2",
-      "--verbose",
-      "--partition-by-api",
-      "--api-key", "<UNSTRUCTURED-API-KEY>",
-  ]
-
-  # Run the command
-  process = subprocess.Popen(command, stdout=subprocess.PIPE)
-  output, error = process.communicate()
-
-  # Print output
-  if process.returncode == 0:
-      print('Command executed successfully. Output:')
-      print(output.decode())
-  else:
-      print('Command failed. Error:')
-      print(error.decode())
+  import os
+  from unstructured.ingest.interfaces import PartitionConfig, ReadConfig
+  from unstructured.ingest.runner.wikipedia import wikipedia
+
+  if __name__ == "__main__":
+      wikipedia(
+          verbose=True,
+          read_config=ReadConfig(),
+          partition_config=PartitionConfig(
+              output_dir="wikipedia-ingest-output",
+              num_processes=2,
+              partition_by_api=True,
+              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
+          ),
+          page_title="Open Source Software",
+          auto_suggest=False,
+      )

 Additionally, you will need to pass the ``--partition-endpoint`` if you're running the API locally. You can find more information about the ``unstructured`` API `here <https://github.com/Unstructured-IO/unstructured-api>`_.
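
Note on the updated API example: ``os.getenv("UNSTRUCTURED_API_KEY")`` returns ``None`` when the variable is unset, so a missing key would only surface later, inside the ingest run. The following is a minimal sketch of failing early instead. ``require_env`` is a hypothetical helper written for illustration and is not part of ``unstructured``; it uses only the standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper (not part of unstructured): raises RuntimeError
    when the variable is unset or empty, instead of returning None.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

Under that assumption, the example could pass ``api_key=require_env("UNSTRUCTURED_API_KEY")`` in place of the ``os.getenv`` call, so the script stops with a clear message before any pages are fetched.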