macSubtitleOCR

Overview

macSubtitleOCR converts a file containing a PGS subtitle stream to SubRip (.srt) subtitles using OCR. The currently supported input file types are .mkv and .sup. It uses the built-in OCR engine in macOS to perform text recognition, which works well in practice. For more information on accuracy, see Accuracy below.
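PGS timestamps are stored as ticks of a 90 kHz clock, while SubRip cues use HH:MM:SS,mmm timestamps, so part of the conversion is translating one into the other. The following is a minimal sketch of that step under that assumption. It is not taken from the macSubtitleOCR sources, and the function names `milliseconds(fromPTS:)`, `zeroPadded(_:width:)`, `srtTimestamp(_:)` and `srtCue(index:startPTS:endPTS:text:)` are hypothetical.

```swift
/// Hypothetical helper: converts a PGS timestamp (ticks of a 90 kHz clock)
/// to whole milliseconds. Sub-millisecond remainders are truncated.
func milliseconds(fromPTS ticks: UInt64) -> UInt64 {
    return ticks / 90
}

/// Left-pads a number with zeros to the given width.
func zeroPadded(_ value: UInt64, width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm.
func srtTimestamp(_ ms: UInt64) -> String {
    let hh = zeroPadded(ms / 3_600_000, width: 2)
    let mm = zeroPadded((ms % 3_600_000) / 60_000, width: 2)
    let ss = zeroPadded((ms % 60_000) / 1_000, width: 2)
    let mmm = zeroPadded(ms % 1_000, width: 3)
    return "\(hh):\(mm):\(ss),\(mmm)"
}

/// Hypothetical helper: builds one SRT cue (index line, timing line,
/// recognized text). Cues in an .srt file are separated by a blank line.
func srtCue(index: Int, startPTS: UInt64, endPTS: UInt64, text: String) -> String {
    let start = srtTimestamp(milliseconds(fromPTS: startPTS))
    let end = srtTimestamp(milliseconds(fromPTS: endPTS))
    return "\(index)\n\(start) --> \(end)\n\(text)\n"
}
```

For example, a subtitle shown from tick 90,000 to tick 225,000 gets the timing line `00:00:01,000 --> 00:00:02,500`.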

Options

Ability to export images in the subtitle stream to compare and manually refine OCR output
Ability to use language recognition in the macOS OCR engine to improve OCR accuracy
Ability to export raw JSON output from OCR engine for inspection

Building

Important

This project requires Swift 6 to work properly!

To get started with macSubtitleOCR, clone the repository and then build the project with Swift.

git clone https://github.com/ecdye/macSubtitleOCR
cd macSubtitleOCR
swift build

The completed build should be available in the .build/debug directory.

Testing

Tests compare the output to a known-good output. Because different machines produce slightly different OCR output, the tests target a match of at least 90%.

swift test
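The README does not say how the 90% match is measured. One common way to express such a threshold is a normalized Levenshtein similarity; the sketch below assumes that metric purely for illustration. `levenshtein`, `similarity` and `meetsTarget` are hypothetical helpers, not the project's actual test code.

```swift
/// Levenshtein edit distance over Characters, using a rolling
/// two-row dynamic-programming table.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var previous = Array(0...t.count)
    var current = [Int](repeating: 0, count: t.count + 1)
    for i in 1...s.count {
        current[0] = i
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1,        // deletion
                             current[j - 1] + 1,     // insertion
                             previous[j - 1] + cost) // substitution
        }
        swap(&previous, &current)
    }
    // After the final swap, `previous` holds the last computed row.
    return previous[t.count]
}

/// Similarity in [0, 1]: one minus the edit distance divided by the longer length.
func similarity(_ a: String, _ b: String) -> Double {
    let longest = max(a.count, b.count)
    guard longest > 0 else { return 1.0 }
    return 1.0 - Double(levenshtein(a, b)) / Double(longest)
}

/// Hypothetical check against the 90% target (assumed metric, see above).
func meetsTarget(expected: String, actual: String, threshold: Double = 0.9) -> Bool {
    return similarity(expected, actual) >= threshold
}
```

Normalizing by the longer string keeps the score in [0, 1], so a single misread character in a 20-character line still scores 0.95 and passes.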

Accuracy

In simple tests against the Tesseract OCR engine, the macOS OCR engine has been significantly more accurate. The improvement is especially noticeable with words like 'I', particularly when italicized. The binary image comparison method used in projects like SubtitleEdit may be slightly more accurate, but that depends on the use case.

TODO (not necessarily in order)

Implement complete testing and formal linting / style guidelines
Implement an option to not output the .sup file when parsing from .mkv files (i.e., perform the operation completely in memory)
Add additional test cases
Implement the ability to read .sub VobSub files and VobSub streams from .mkv files

Reference

https://blog.thescorpius.com/index.php/2017/07/15/presentation-graphic-stream-sup-files-bluray-subtitle-format/
https://www.matroska.org/technical/elements.html
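As described in the PGS reference above, each segment in a .sup file begins with a 13-byte big-endian header: the magic bytes "PG" (0x50 0x47), a 4-byte presentation timestamp, a 4-byte decoding timestamp, a 1-byte segment type and a 2-byte payload size. The sketch below walks those headers. It is illustrative only; `PGSSegmentType`, `PGSSegmentHeader`, `readBE`, `parseSegmentHeader` and `segmentKinds` are hypothetical names, not taken from the macSubtitleOCR sources.

```swift
/// Segment types listed in the PGS format reference.
enum PGSSegmentType: UInt8 {
    case palette = 0x14                 // PDS
    case object = 0x15                  // ODS
    case presentationComposition = 0x16 // PCS
    case window = 0x17                  // WDS
    case end = 0x80                     // END
}

struct PGSSegmentHeader {
    let pts: UInt32   // presentation timestamp, 90 kHz ticks
    let dts: UInt32   // decoding timestamp, 90 kHz ticks
    let kind: PGSSegmentType
    let size: UInt16  // length of the payload that follows the header
}

/// Reads `count` bytes starting at `offset` as a big-endian unsigned integer.
func readBE(_ bytes: [UInt8], _ offset: Int, _ count: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in offset..<(offset + count) {
        value = (value << 8) | UInt32(bytes[i])
    }
    return value
}

/// Parses the 13-byte header at `offset`, or returns nil if it is truncated,
/// lacks the "PG" magic, or has an unknown segment type.
func parseSegmentHeader(_ bytes: [UInt8], at offset: Int = 0) -> PGSSegmentHeader? {
    guard offset >= 0, bytes.count - offset >= 13,
          bytes[offset] == 0x50, bytes[offset + 1] == 0x47,
          let kind = PGSSegmentType(rawValue: bytes[offset + 10]) else { return nil }
    return PGSSegmentHeader(
        pts: readBE(bytes, offset + 2, 4),
        dts: readBE(bytes, offset + 6, 4),
        kind: kind,
        size: UInt16(readBE(bytes, offset + 11, 2)))
}

/// Walks consecutive segments, skipping each payload, until parsing fails.
func segmentKinds(in bytes: [UInt8]) -> [PGSSegmentType] {
    var kinds: [PGSSegmentType] = []
    var offset = 0
    while let header = parseSegmentHeader(bytes, at: offset) {
        kinds.append(header.kind)
        offset += 13 + Int(header.size)
    }
    return kinds
}
```

A display set in a real stream typically contains several such segments (for example PCS, WDS, PDS and ODS) and is terminated by an END segment.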
