enhancement: implement full-page OCR (#1133)
* Implements full-page OCR as supported in unstructured-inference==0.5.10.
christinestraub authored Aug 16, 2023
1 parent be093d2 commit 0a23139
Showing 12 changed files with 105 additions and 105 deletions.
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,10 @@
## 0.10.1-dev0
## 0.10.1-dev1

### Enhancements
* Bump unstructured-inference==0.5.10:
- implement full-page OCR

### Features

### Fixes
* Fix dead links in repository README (Quick Start > Install for local development, and Learn more > Batch Processing)
2 changes: 2 additions & 0 deletions requirements/constraints.in
@@ -25,3 +25,5 @@ Pillow<10.0.0
# NOTE(alan) Pinned to avoid error that occurs with 2.4.3:
# AttributeError: 'ResourcePath' object has no attribute 'collection'
Office365-REST-Python-Client<2.4.3
# NOTE(christine) Pinned to set the `unstructured-inference` version
unstructured-inference==0.5.10
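
The pin above can be checked against the local environment. The following is a minimal sketch, not part of this commit: `parse_pin` and `installed_matches` are hypothetical helper names, and the regex handles only simple `name==version` lines, not the full requirements syntax.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_pin(line: str):
    """Return (name, version) for a simple `name==version` line, else None."""
    match = re.fullmatch(
        r"\s*([A-Za-z0-9._-]+)\s*==\s*([A-Za-z0-9.+!_-]+)\s*", line
    )
    return (match.group(1), match.group(2)) if match else None


def installed_matches(name: str, expected: str) -> bool:
    """True only if `name` is installed at exactly the `expected` version."""
    try:
        return version(name) == expected
    except PackageNotFoundError:
        return False


# The pin added to requirements/constraints.in in this commit.
pin = parse_pin("unstructured-inference==0.5.10")
if pin is not None:
    name, expected = pin
    print(name, expected, installed_matches(name, expected))
```

Lines that use other specifiers, such as `Pillow<10.0.0`, return `None` from `parse_pin` and are skipped by this sketch.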
2 changes: 1 addition & 1 deletion requirements/extra-pdf-image.in
@@ -6,4 +6,4 @@ pdfminer.six
# NOTE(robinson) - See this issue here
# https://github.com/facebookresearch/detectron2/issues/5010
Pillow<10
unstructured-inference==0.5.9
unstructured-inference
6 changes: 4 additions & 2 deletions requirements/extra-pdf-image.txt
@@ -205,8 +205,10 @@ typing-extensions==4.7.1
# torch
tzdata==2023.3
# via pandas
unstructured-inference==0.5.9
# via -r requirements/extra-pdf-image.in
unstructured-inference==0.5.10
# via
# -c requirements/constraints.in
# -r requirements/extra-pdf-image.in
urllib3==1.26.16
# via
# -c requirements/base.txt
@@ -1,17 +1,17 @@
[
{
"type": "Title",
"element_id": "0c4e18d78e721c8179f3946b75b17d15",
"element_id": "88591a76b54e47215c0827ae8838ec13",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Instructions for Form 3115 (Rev. November 1987) Annlicatinn far Chance in Accounting Mathond"
"text": "Instructions for Form 3115 (Rev. November 1987)"
},
{
"type": "NarrativeText",
"element_id": "41f3d9c83b2b4679195c9796134fd8f5",
"element_id": "766cf1d1243ef2cdbb0db5ad32d7f9c9",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -21,7 +21,7 @@
},
{
"type": "ListItem",
"element_id": "97968e4ba14bd2d082a70ec61ef2d9b1",
"element_id": "36a565493a214d3f7e7f24794c1dc7f4",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -111,7 +111,7 @@
},
{
"type": "ListItem",
"element_id": "f0d2beb7f43493694a91137e8e65b5f3",
"element_id": "59bc2945a7f606bd5078bac3bc1199d4",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -121,7 +121,7 @@
},
{
"type": "ListItem",
"element_id": "13f2a282f705590fbe7b6ce15b08862a",
"element_id": "5157d731aa6a97c9b166799db2295bce",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -141,7 +141,7 @@
},
{
"type": "ListItem",
"element_id": "9820f79275e683f5afe3f2f1283de4ca",
"element_id": "34b66452ca63c465c69d849e4acf6d46",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -161,7 +161,7 @@
},
{
"type": "ListItem",
"element_id": "a98378f4a88db65dff42b7d8bd75be92",
"element_id": "b0fa5aaff0cee8574822dd8ac6537c06",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -181,7 +181,7 @@
},
{
"type": "ListItem",
"element_id": "3cb57c50002187a715e1c5048e643c65",
"element_id": "13f155c0754434406190f3cf49c82c3c",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -201,33 +201,33 @@
},
{
"type": "ListItem",
"element_id": "beeb50db70ce1aa76813cce98e46bd56",
"element_id": "178d6933ed193747b1c4aa1c048e7f94",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "for these changes. Tb od Db bee Cl"
"text": "for these changes."
},
{
"type": "NarrativeText",
"element_id": "640a100da1a3bee6f1f134c51a2c8648",
"element_id": "7685df2334a5f6c8c8099dea61a8f1b4",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Long-term contracts.—If you are required to change your method of accounting for long-term contracts under section 460, see Notice 87-61 (9/21/87), 1987-38 IRB 40, for the notification procedures that must be followed"
"text": "Long-term contracts.—If you are required to change your method of accounting for long-term contracts under section 460, see Notice 87-61 (9/21/87), 1987-38 IRB 40, for the notification procedures that must be followed."
},
{
"type": "Title",
"element_id": "a232d246e22a4f6bb8dcab62cffb2567",
"element_id": "61ed58fa51293f429f87e8cf1896c9e4",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Paperwork Reduction Act Notice We ack for thic infarenatinn te marry mye the."
"text": "Paperwork Reduction Act Notice"
},
{
"type": "Title",
@@ -241,37 +241,27 @@
},
{
"type": "ListItem",
"element_id": "58f1649a32eda8b8c513e51a209666a6",
"element_id": "5f8051f8010896bab02aaf784c04ae02",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Signature Individuals.—An individual desiring the change should sign the application. Ifthe application pertains to a husband and wife filing a joint Income tax return, the names of both should appear in the heading and both should sign Partnerships.—The form should be signed with the partnership name followed by the signature of one of the general partners and the words “General Partner.” Corporations, cooperatives, and insurance companies.—The form should show the name of the corporation, cooperative, or insurance Company and the signature of the president, vice president, treasurer, assistant treasurer, or chief accounting officer (such as tax officer) authorized tosign, and his or her official title. Receivers, trustees, or assignees must sign any application they are required to file, For a subsidiary corporation filing a consolidated return with its parent, the form should be signed by an officer of the parent corporation, Fiduciaries.—The-form should show the name of the estate or trust and be signed by the fiduciary, personal representative, executor, executrix, administrator, administratrx, etc’, having legal authority to'sign, and his or her ttle. Preparer other than partner, officer, etc.—The signature of the individual preparing the application should appear in the space provided on page"
},
{
"type": "ListItem",
"element_id": "586e989b479e4362ebe28a6954c1427b",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "If the individual or firm is also authorized to"
"text": "Individuals.—An individual desiring the change should sign the application. Ifthe application pertains to a husband and wife filing a joint Income tax return, the names of both should appear in the heading and both should sign Partnerships.—The form should be signed with the partnership name followed by the signature of one of the general partners and the words “General Partner.” Corporations, cooperatives, and insurance companies.—The form should show the name of the corporation, cooperative, or insurance Company and the signature of the president, vice president, treasurer, assistant treasurer, or chief accounting officer (such as tax officer) authorized tosign, and his or her official title. Receivers, trustees, or assignees must sign any application they are required to file, For a subsidiary corporation filing a consolidated return with its parent, the form should be signed by an officer of the parent corporation, Fiduciaries.—The-form should show the name of the estate or trust and be signed by the fiduciary, personal representative, executor, executrix, administrator, administratrx, etc’, having legal authority to'sign, and his or her ttle. Preparer other than partner, officer, etc.—The signature of the individual preparing the application should appear in the space provided on page"
},
{
"type": "NarrativeText",
"element_id": "446ccb7d96fea659d50aef8a6dd670df",
"element_id": "4660422c06dddc914ab634c5e4045dec",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "We ask for this information to carry out the Internal Revenue laws of the United States. We need it to ensure that taxpayers are complying with these laws an¢ to allow us to figure and collect the right amount of tax. You are required to give us this information,"
"text": "We ask for this information to carry out the Internal Revenue laws of the United States. We need it to ensure that taxpayers are complying with these laws an¢ to allow us to figure and collect the nght amount of tax. You are required to give us this information."
},
{
"type": "Title",
"element_id": "226fa83297914d5195e002508d61fb1d",
"element_id": "a1547a4ed1611eee44b15e99120fb978",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -281,77 +271,77 @@
},
{
"type": "Title",
"element_id": "f0e951e5bcb4a6070fa6672b37822348",
"element_id": "68a3289177b49b285e133a5267eb355f",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Purpose of Form Cin bce Secon te cece cget."
"text": "Purpose of Form"
},
{
"type": "NarrativeText",
"element_id": "5e5451e052baf894b2bdad4132f6cd2f",
"element_id": "f9b8e17da7a31507773f78959378e09c",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "ee File this form to request a change in your accounting method, including the accounting treatment of any item. if you are requesting 2 change in accounting period, use Form 1128, Application for Change in Accounting Period. For more information, see Publication 538, Accounting Periods and Methods,"
"text": "File this form to request a change in your accounting method, including the accounting treatment of any item. if you are requesting 2 change in accounting period, use Form 1128, Application for Change in Accounting Period. For more information, see Publication 538, Accounting Periods and Methods,"
},
{
"type": "NarrativeText",
"element_id": "cc1701e3ce9347e344b3df80d426bd21",
"element_id": "b3859f2f29884b1d3ba0892e52859a99",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Seti aes When filing Form 3115, taxpayers are reminded to determine if IRS has published a ruling or procedure dealing with the specific type of change since November 1987 (the current. revision date of Form 3115)"
"text": "When filing Form 3115, taxpayers are reminded to determine if IRS has published a ruling or procedure dealing with the specific type of change since November 1987 (the current. revision date of Form 3115)"
},
{
"type": "NarrativeText",
"element_id": "b81dc18d0f8666f9bf7400a00657dc72",
"element_id": "e5a95dc10d4071983b70898a21f11175",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "POMS SANE OPFOR DA 29). Generally, applicants must complete Section ‘A. In addition, complete the appropriate sections (B:1 through H) for which a change is desired. You must give alll relevant facts, including a"
"text": "Generally, applicants must complete Section ‘A. In addition, complete the appropriate sections (B:1 through H) for which a change is desired."
},
{
"type": "Title",
"element_id": "c7502aa5b000d6446f3eca882518a260",
"element_id": "5756fb398995bb6518a87637f24f426e",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Time and Place for Filing amarall, ammlimeete maet file snete"
"text": "Time and Place for Filing"
},
{
"type": "NarrativeText",
"element_id": "8b35e7c212710b1099b675ce9394fb47",
"element_id": "25f830e7c39c115c9937eb9d11cfb1f2",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Se NB ON State whether you desire a conference in the National Office if the Service proposes to disapprove your application."
"text": "State whether you desire a conference in the National Office if the Service proposes to disapprove your application"
},
{
"type": "Title",
"element_id": "0a16a0fea889be77576c0fd88575554a",
"element_id": "8b06cd6e2bf7fc15130d5d9ed7e66283",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Affiliated Groups Tavmayare that ara mam)"
"text": "Affiliated Groups"
},
{
"type": "Title",
"element_id": "68b58298cabd9069c975b192a7183139",
"element_id": "242a9dba10a04654d4adef9c58ff96f6",
"metadata": {
"data_source": {},
"filetype": "image/png",
@@ -361,62 +351,62 @@
},
{
"type": "Title",
"element_id": "6a8881a6e87021b2362243f7df3e4b1d",
"element_id": "11c98a9cbd6a200fbc5b93fed15007ac",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Uniform capitalization rules and limitation on cash method.—If you are required to char"
"text": "Uniform capitalization rules and limitation on"
},
{
"type": "Title",
"element_id": "8daeb8b48fb666f1dd54e2af283d0c22",
"element_id": "58703de56debc34a1d68e6ed6f8fd067",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Specific Instructions Section A Neem Ea mama 1 !Taeahle inemes"
"text": "Specific Instructions Section A"
},
{
"type": "Title",
"element_id": "09203a0c6955f64ca8eb52cd6ea47034",
"element_id": "a4316c02df07840f1beb56609cb09735",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Late Applications Me coup armlimatinm te ler"
"text": "Late Applications"
},
{
"type": "NarrativeText",
"element_id": "962e3f0ceb1f0b1b08a1c19adde8d962",
"element_id": "39458f370b98a606db29ac6dee975e07",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "lethal elaine bela Disregard the instructions under Time and Place for Filing and Late Applications. instead, attach Form 3115 to your income tax return for the year of change; do not file it separately. Also include on a separate statement accompanying the Form 3115 the period over which the section 481(2) adjustment will be taken into account and the basis for that conclusion. Identify the"
"text": "Disregard the instructions under Time and Place for Filing and Late Applications. instead, attach Form 3115 to your income tax return for the year of change; do not file it separately. Also include on a separate statement accompanying the Form 3115 the period over which the section 481(2) adjustment will be taken into account and"
},
{
"type": "Title",
"element_id": "bfe98eb672d95c15a11ed3e618928b4e",
"element_id": "025a65465b6fd9635316e92633b24c7e",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "Identifying Number Ndiuidesale Am omptisoehesal"
"text": "Identifying Number"
},
{
"type": "NarrativeText",
"element_id": "87f8128b03a72c616ee1a1bb91e11c56",
"element_id": "9240bfa889b87dc2fb3fa746ca4eeeb4",
"metadata": {
"data_source": {},
"filetype": "image/png",
"page_number": 1
},
"text": "—e—e—— eee Others.-—The employer identification number of an applicant other than an individual should be entered in this block,"
"text": "Others.-—The employer identification number of an applicant other than an individual should be entered in this block,"
}
]
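
Many of the `text` changes in the fixture above drop stray tokens from the start or end of an element, while others alter the text in some other way. The sketch below classifies a few old/new pairs taken from this diff. It is an illustration only: `classify_change` is a hypothetical helper, not a function in `unstructured`, and the labels are an informal reading of the diff rather than a description of how full-page OCR works internally.

```python
def classify_change(old: str, new: str) -> str:
    """Describe how the new fixture text relates to the old one."""
    if old == new:
        return "unchanged"
    if old.startswith(new):
        return "trailing text removed"
    if old.endswith(new):
        return "leading text removed"
    return "rewritten"


# Old/new `text` values copied from the fixture diff above.
when_filing = (
    "When filing Form 3115, taxpayers are reminded to determine if IRS has "
    "published a ruling or procedure dealing with the specific type of change "
    "since November 1987 (the current. revision date of Form 3115)"
)
conference = (
    "State whether you desire a conference in the National Office if the "
    "Service proposes to disapprove your application"
)
pairs = [
    ("Purpose of Form Cin bce Secon te cece cget.", "Purpose of Form"),
    ("for these changes. Tb od Db bee Cl", "for these changes."),
    ("Seti aes " + when_filing, when_filing),
    # Leading tokens removed and the final period dropped.
    ("Se NB ON " + conference + ".", conference),
]

labels = [classify_change(old, new) for old, new in pairs]
for (old, new), label in zip(pairs, labels):
    print(f"{label}: {new[:40]!r}")
```

Pairs labelled "rewritten" differ in more than a clean prefix or suffix, so they deserve a manual look when reviewing a regenerated fixture.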