Not Pulling in All the Text #18

Open
mda1125 opened this issue Feb 8, 2023 · 3 comments

mda1125 commented Feb 8, 2023

Using Adobe Acrobat DC on macOS
Saved the PDF as txt
The script runs without any errors

INFO:root:Total lines: 50061
INFO:root:Written Rows: 16

When I open the output in Excel it's nicely formatted, but there are only 16 rows. Not sure why it's stopping short.

Text file looks good

Excel file looks great but incomplete
CSV file starts out fine and then just dumps raw text, as if the lines are broken

Using the CIS Microsoft Windows Server Benchmark 1.4.0.pdf

mda1125 commented Feb 8, 2023

I did download the older version that matches what's listed here

CIS_Microsoft_Windows_Server_2016_RTM_Release_1607_Benchmark_v1.1.0.txt

INFO:root:Total lines: 32672
INFO:root:Written Rows: 39

The Total lines and Written Rows didn't match what's listed here for that benchmark. I figured a newer version of the CIS PDF might have changed something, but it doesn't seem to work with the original archived one either.

mda1125 commented Feb 17, 2023

It's because of the numbered sections (1.1 and so on). If they have any WRAPPING on them from the Adobe export, they get skipped. There's no option in Adobe Acrobat DC to select UTF-8, and it might be something CIS did with the PDF anyway.

The fix for me has been to go through and unwrap the lines for the CIS # sections so that they at least get imported. The descriptions, rationale and such seem to come over, and you can work some magic with Excel or Google Sheets.

But if there is a wrap on the CIS # line, then it can get skipped entirely.
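
The manual unwrapping described above could be automated along these lines. This is only a rough sketch, not the project's own parser: the function name `unwrap_headings` is made up for illustration, and the two patterns are assumptions (that a CIS # line starts with a dotted number like `1.1.1` and that a complete one ends with a marker such as `(Scored)` or `(Automated)`). Check both against your own export before relying on it.

```python
import re

# Assumption: a CIS # line starts with a dotted section number, e.g. "1.1.1 ".
HEADING_START = re.compile(r"^\d+(?:\.\d+)+\s")
# Assumption: a complete CIS # line ends with one of these status markers.
HEADING_END = re.compile(r"\((?:Automated|Manual|Scored|Not Scored)\)\s*$")
MAX_JOIN = 3  # stop looking after this many continuation lines


def unwrap_headings(lines):
    """Rejoin CIS # lines that the PDF-to-text export wrapped."""
    out = []
    i = 0
    while i < len(lines):
        line = lines[i].rstrip("\n")
        consumed = 1
        if HEADING_START.match(line) and not HEADING_END.search(line):
            joined = line
            for j in range(i + 1, min(i + 1 + MAX_JOIN, len(lines))):
                nxt = lines[j].strip()
                if not nxt or HEADING_START.match(nxt):
                    break  # blank line or a new heading: not a continuation
                joined = joined.rstrip() + " " + nxt
                if HEADING_END.search(joined):
                    # Found the closing marker: keep the joined line.
                    line = joined
                    consumed = j - i + 1
                    break
        out.append(line)
        i += consumed
    return out
```

Run the exported text through it (read the lines, call `unwrap_headings`, write them back joined with newlines) before handing the file to the script. Numbered lines that never reach a closing marker, such as plain section titles, are left as they were.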

@justin376802

Can confirm this issue also occurs with CIS Google Chrome Benchmark v3.0.0.pdf. Here is the text export of that file with the wrap-around removed:
CIS_Google_Chrome_Benchmark_v3.0.0_no_wrap_around.txt

fragtastic self-assigned this on Aug 3, 2024