This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

[Feature Request]: Pass args and kwargs when calling partition or partition_via_api on Unstructured loader #949

Open
IgnacioPascale opened this issue Feb 13, 2024 · 0 comments
Labels
enhancement New feature or request triage

Comments

IgnacioPascale commented Feb 13, 2024

Feature Description

Accept args and kwargs in load_data() on the Unstructured base loader, and forward them when calling partition() or partition_via_api().

This would make it possible to control the (far too many) kwargs that the partition library accepts.

Reason

Over the last week, I tried to use several of the features partition offers through this loader. To give a few examples:

  • For .docx, I intended to use include_page_breaks, which defaults to True in their docx.py but to False in their "auto" partition function, which is the one the loader calls.

  • For .pdf, I intended to use features such as infer_table_structure or strategy (to set hi_res). I also intended to use infer_table_structure for .pptx.

Because I cannot control the kwargs passed to partition, I cannot extract data the way I intend, and I am forced to subclass the loader and override its behavior for a very simple change.

Value of Feature

As explained above, users could take advantage of the many features unstructured offers, such as infer_table_structure, strategy, and include_page_breaks, simply by passing args and kwargs through to partition() or partition_via_api().
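
A minimal sketch of the requested behavior, under stated assumptions: KwargsForwardingReader and fake_partition are hypothetical names invented for illustration, not the loader's actual code. fake_partition stands in for unstructured's partition() so the example runs without the library installed. It only shows the forwarding pattern: load_data() collects extra keyword arguments and hands them to the partition callable unchanged.

```python
from typing import Any, Callable, List


def fake_partition(filename: str, **kwargs: Any) -> List[str]:
    """Hypothetical stand-in for unstructured's partition().

    It does no parsing; it just echoes the options it received so the
    forwarding is visible.
    """
    options = ", ".join(f"{key}={value!r}" for key, value in sorted(kwargs.items()))
    return [f"{filename} [{options}]"]


class KwargsForwardingReader:
    """Hypothetical loader that forwards extra kwargs to the partition callable."""

    def __init__(self, partition_fn: Callable[..., List[Any]] = fake_partition) -> None:
        # Assumption: the partition callable is injectable, so the local or
        # API-based variant could be swapped in without changing load_data().
        self._partition_fn = partition_fn

    def load_data(self, file: str, **partition_kwargs: Any) -> List[str]:
        # Everything the caller passes beyond `file` goes straight through.
        elements = self._partition_fn(filename=file, **partition_kwargs)
        return [str(element) for element in elements]


reader = KwargsForwardingReader()
docs = reader.load_data("report.pdf", strategy="hi_res", infer_table_structure=True)
print(docs[0])
```

With this shape, per-file-type options such as include_page_breaks for .docx would be set at the call site, with no subclassing needed.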

@IgnacioPascale IgnacioPascale added enhancement New feature or request triage labels Feb 13, 2024
@IgnacioPascale IgnacioPascale changed the title [Feature Request]: [Feature Request]: Pass args and kwargs when calling partition or partition_via_api on Unstructured loader Feb 13, 2024