feat: added YoloX types (#1284)
This PR adds element types so that additional YoloX output classes can be mapped to them. For example, the YoloX `List-item` class now maps to the `ListItem` element type; previously it would have been `UncategorizedText`.
benjats07 authored Sep 3, 2023
1 parent a475b44 commit f2af953
Showing 2 changed files with 19 additions and 10 deletions.
13 changes: 3 additions & 10 deletions CHANGELOG.md
@@ -2,20 +2,13 @@

### Enhancements

### Features

* Add Salesforce Connector to be able to pull Account, Case, Campaign, EmailMessage, Lead

### Fixes

## 0.10.12-dev2

### Enhancements

* Removed PIL pin as issue has been resolved upstream
* YoloX element types added

### Features

* Add Salesforce Connector to be able to pull Account, Case, Campaign, EmailMessage, Lead

### Fixes

* Update version-sync to prevent duplicate release versions
16 changes: 16 additions & 0 deletions unstructured/documents/elements.py
@@ -381,6 +381,14 @@ def to_dict(self) -> dict:
        return out


class Formula(Element):
    "An element containing formulas in a document"

    category = "Formula"

    pass


class Text(Element):
    """Base element for capturing free text from within document."""

@@ -553,4 +561,12 @@ class Footer(Text):
    "Table": Table,
    "Header": Header,
    "Footer": Footer,
    "Caption": FigureCaption,
    "Footnote": Footer,
    "Formula": Formula,
    "List-item": ListItem,
    "Page-footer": Footer,
    "Page-header": Header,  # Title?
    "Picture": Image,
    "Section-header": Header,
}
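
The effect of the new map entries can be illustrated with a minimal, self-contained sketch. It is not the library's code: the stand-in classes, the `LABEL_TO_ELEMENT` name, and the `element_for_label` helper are hypothetical, and the fallback to `UncategorizedText` is assumed from the commit description above.

```python
from typing import Dict, Type


class Element:
    """Hypothetical stand-in for the library's base element class."""

    category = "UncategorizedText"

    def __init__(self, text: str = "") -> None:
        self.text = text


class UncategorizedText(Element):
    category = "UncategorizedText"


class ListItem(Element):
    category = "ListItem"


class Formula(Element):
    category = "Formula"


# Hypothetical name; mirrors two of the label-to-class entries added in the diff.
LABEL_TO_ELEMENT: Dict[str, Type[Element]] = {
    "List-item": ListItem,
    "Formula": Formula,
}


def element_for_label(label: str, text: str) -> Element:
    """Build an element for a detector label, falling back when it is unmapped."""
    # Assumed behavior: unmapped labels become UncategorizedText.
    element_class = LABEL_TO_ELEMENT.get(label, UncategorizedText)
    return element_class(text)


if __name__ == "__main__":
    for label in ("List-item", "Formula", "Unknown-label"):
        print(label, "->", element_for_label(label, "sample").category)
```

Keeping the mapping in a plain dictionary means that supporting another detector class is a one-line change, which matches the shape of the additions in this commit.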
