This repository has been archived by the owner on Jun 5, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
Flight data for Malaysian Airlines Flight MH370
License
cfinch/MH370_data
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Data for Flight MH370 Aircraft Location Data ---------------------- To my knowledge, precise aircraft location data has not been released. If this data is available, please add it! Satellite Data -------------- This data was extracted from the PDF file provided by the government of Malaysia to the public at the URL shown in Reference 1. The process used to extract the data is described below. KNOWN ERRORS The OCR software (tesseract) had some problems recognizing periods and colons in the time stamps for the in-flight data. For some reason, the pre-flight data and the Appendices did not have this issue. NOTES I split the first column (Time) into separate columns for Date and Time. Note that the date format is day/month/year. EXTRACTION PROCESS Convert a multi-page PDF to a series of images: gs -r300 -sDEVICE=pngmono -dTextAlphaBits=4 -o Pages/mh370_p%02d.png mh370.pdf To crop out headers and page numbers, first find the bounding box [2]: gs -q -dBATCH -dNOPAUSE -sDEVICE=bbox -dLastPage=1 mh370.pdf 2>&1 | grep %%BoundingBox Result in points (72p = 1 inch): %%BoundingBox: 42 51 768 544 lower left corner: (42, 51) upper right corner: (768, 544) Need to multiply size by 720/72 because of 720dpi output Produce a test PNG by cropping table header and page number: gs -dNOPAUSE -dBATCH -sDEVICE=pnggray -r720 -dFirstPage=4 -dLastPage=4 -sOutputFile=test.png -g7690x3940 -c "<</Install {0 -75 translate}>> setpagedevice" -f mh370.pdf gs -dNOPAUSE -dBATCH -sDEVICE=pnggray -r720 -dFirstPage=21 -dLastPage=21 -sOutputFile=test.png -g7690x4100 -c "<</Install {0 -75 translate}>> setpagedevice" -f mh370.pdf Cropping for pages 4-20: gs -dNOPAUSE -dBATCH -sDEVICE=pngmono -r720 -sOutputFile=Pages/mh370_p%02d.png -g7690x3940 -dFirstPage=1 -dLastPage=20 -c "<</Install {0 -75 translate}>> setpagedevice" -f mh370.pdf Cropping for pages 21-41: gs -dNOPAUSE -dBATCH -sDEVICE=pngmono -r720 -sOutputFile=Pages/mh370_p%02d.png -g7690x4100 -c "<</Install {0 -75 translate}>> setpagedevice" -f mh370.pdf OCR to convert image to text: tesseract mh370_p04.png text_p04 -psm 6 REFERENCES [1] http://www.dca.gov.my/mainpage/MH370%20Data%20Communication%20Logs.pdf [2] http://stackoverflow.com/questions/12484353/how-to-crop-a-section-of-a-pdf-file-to-png-using-ghostscript [3] https://code.google.com/p/tesseract-ocr/
About
Flight data for Malaysian Airlines Flight MH370
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published