-
Notifications
You must be signed in to change notification settings - Fork 743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat/add sources from unstructured inference #1538
feat/add sources from unstructured inference #1538
Conversation
…fixtures update (#1544) This pull request includes updated ingest test fixtures. Please review and merge if appropriate. Co-authored-by: benjats07 <[email protected]>
…update (#1562) This pull request includes updated ingest test fixtures. Please review and merge if appropriate. Co-authored-by: benjats07 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some quick feedback. this is mainly for debugging purposes, right?
- let's avoid overloading the word "source" which is has a meaning for data connectors.
- assuming this "breadcrumb" exists for debugging purposes, it should not exist in any element json output by default
…update (#1566) This pull request includes updated ingest test fixtures. Please review and merge if appropriate. Co-authored-by: benjats07 <[email protected]>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
if UNSTRUCTURED_INCLUDE_DEBUG_METADATA: | ||
assert {element.metadata.detection_origin for element in result} == {origin} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use mocker to set the constant to true so we are certain this is tested; I understand that in the ci we set the env variable so this is tested in CI but this if statement here can cause confusion for local testing vs. ci; and potentially have code silently fail if ci for some reason dropped the env
...tructured_ingest/expected-structured-output/notion/c47a4566-4c7a-488b-ac2a-1292ee507fcb.json
Outdated
Show resolved
Hide resolved
detection_origin: Optional[str] = None | ||
|
||
def __setattr__(self, key, value): | ||
if not UNSTRUCTURED_INCLUDE_DEBUG_METADATA and key == "detection_origin": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here and below where we hard wire "detection_origin" maybe we can use a parameter to hold those debug_metadata_keys
for checks so it is easier to change them later (just change one place)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, but at this point, I prefer not doing that as this is the only parameter with that behavior, I can add it if we begin to add variables just during debugging
…update (#1655) This pull request includes updated ingest test fixtures. Please review and merge if appropriate. Co-authored-by: benjats07 <[email protected]>
This PR adds support for
source
property fromunstructured_inference
, allowing the user to be able to see the origin of the data underdetection_origin
field environment variable UNSTRUCTURED_INCLUDE_DEBUG_METADATA=trueIn order to try this feature you can use this code:
And will print 'yolox' as source for all the elements