Printer MF3228 encoded data [CCITT G4/CARPS] #23

Closed
redjoe opened this issue Nov 27, 2022 · 7 comments
Labels
wontfix This will not be worked on


@redjoe

redjoe commented Nov 27, 2022

Please help me understand the encoded data sent to the MF3228 printer.

White pages encoded:
ff ff ff … 00 08 80 A4 300dpi, length 426 bytes
ff ff ff … 00 08 80 A4 600dpi, length 846 bytes
ff ff ff … 00 08 80 A5 300dpi, length 298 bytes

duplicate \xff bytes omitted

Black fill pages:
64 05 74 a0 f8 ff … ff 01 10 00 01 A4 300dpi, length 854 bytes
64 05 7c 40 01 93 ff … ff 1f 00 01 10 A4 600dpi, length 1701 bytes
64 05 da 60 F5 ff … ff 01 10 00 01 A5 300dpi, length 598 bytes

Left vertical black line. Origin top left corner.
All samples with format A4, 300 dpi. Width 2360px, height 3384px
Offset of the printable area relative to the left side (the origin) is 5.165mm.
Pixel width is approximately 0.085mm.
I printed the samples with Inkscape.

example data for width 5.165mm, 1px
64 d5 ff … ff 0f 80 00 08 data length 1274 bytes

| hex bytes | binary | pixels | width, mm | end bytes |
|---|---|---|---|---|
| 64 d5 | 01100100 11010101 | 1px | 5.165 | 0f 80 00 08 |
| 64 fd | 01100100 11111101 | 2px | 5.25 | 07 40 00 04 |
| 64 ed | 01100100 11101101 | 3px | 5.335 | 07 40 00 04 |
| 64 f5 | 01100100 11110101 | 4px | 5.42 | 0f 80 00 08 |
| 64 e5 | 01100100 11100101 | 5px | 5.505 | 1f 00 01 10 |
| 64 a5 | 01100100 10100101 | 6px | 5.59 | 1f 00 01 10 |
| 64 c5 | 01100100 11000101 | 7px | 5.675 | 3f 00 02 20 |
| 64 45 | 01100100 01000101 | 8px | 5.76 | 7f 00 04 40 |
| 64 45 ef | 01100100 01000101 11101111 | 9px | 5.843 | 7f 00 04 40 |
| 64 85 fc | 01100100 10000101 11111100 | 10px | 5.93 | 00 08 80 |
| 64 85 fe | 01100100 10000101 11111110 | 11px | 6.015 | 00 08 80 |
| 64 85 ff | 01100100 10000101 11111111 | 12px | 6.1 | 00 08 80 |
| 64 05 66 b0 fe | 01100100 00000101 01100110 10110000 11111110 | - | 35mm | 01 10 00 01 |
| 64 05 16 20 f9 | 01100100 00000101 00010110 00100000 11111001 | - | 42mm | 01 10 00 01 |
| 64 05 36 60 f2 | 01100100 00000101 00110110 01100000 11110010 | - | 52.2mm | 02 20 00 02 |
| 64 05 6e 30 f3 | 01100100 00000101 01101110 00110000 11110011 | - | 105mm | 03 20 00 02 |
| 64 05 04 fa | 01100100 00000101 00000100 11111010 | - | 157.5mm | 03 20 00 02 |
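
The "width, mm" values for the 1px to 12px samples follow from the stated 5.165mm offset and the approximate 0.085mm pixel width. A minimal sketch of that relationship, assuming the offset applies to a 1px line and each extra pixel adds one pixel width (the function and constant names are made up for illustration):

```python
# Values taken from the measurements in this comment (A4, 300 dpi).
LEFT_OFFSET_MM = 5.165   # value listed for a 1 px line
PIXEL_PITCH_MM = 0.085   # approximate pixel width (25.4 mm / 300 dpi is about 0.0847)

def line_width_mm(pixels: int) -> float:
    """Return the expected 'width, mm' value for a line that is `pixels` wide."""
    if pixels < 1:
        raise ValueError("pixels must be >= 1")
    return round(LEFT_OFFSET_MM + (pixels - 1) * PIXEL_PITCH_MM, 3)

for n in (1, 2, 8, 12):
    print(f"{n}px -> {line_width_mm(n)} mm")
```

Under this assumption the 9px sample would come out at 5.845, while the value listed for it is 5.843.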

[image: left_vertical_black_line]

@redjoe redjoe changed the title printer MF3228 encoded data Printer MF3228 encoded data Nov 27, 2022
@mounaiban
Owner

mounaiban commented Dec 4, 2022

MF3228 reportedly uses CARPS, not CAPT.

The carps-cups driver would likely be the place to start. The carps.txt file documents the compression format on known devices, hopefully the MF3228 won't be too different.

UPDATE: I just realised there is an existing issue on the carps-cups repo that concerns MF3228 support. I'll link it so that others can find the really useful info you posted here: ondrej-zary/carps-cups#15

I'm afraid I won't be able to help you much at this point, as I am not yet familiar with some of the techniques used in the compression. This issue has been closed as support for this device is beyond the scope of captdriver.

@mounaiban
Owner

Maybe I understand what's going on with the white pages.
I'm just going to assume that the uncompressed image is encoded like Netpbm P4 (one bit per pixel, eight pixels per byte), since the MF3228 presumably only does mono printing.

The white pages appear to be RLE-compressed:

2360 * 3384px / A4 300 dpi (120px W, 123px H crop)

Encoded in 295B * 3384 == 998280B
Assuming that 00 is some indicator for RLE mode and 08 80 (2176) is a repeat count, the compressed version is encoded in
423B * 2176 == 920448B which is somewhat close. There might be something else going on...
The 3384 line count might be the result of rounding to the next lower multiple of 8 or 4.
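
The arithmetic behind this guess can be reproduced with a short Python sketch. It only restates the numbers above; reading 08 80 as a big-endian repeat count is the hypothesis being tested here, not a confirmed part of the format:

```python
WIDTH_PX, HEIGHT_PX = 2360, 3384       # A4 at 300 dpi, from the samples above

bytes_per_row = WIDTH_PX // 8           # 1 bit per pixel, 8 pixels per byte
raw_size = bytes_per_row * HEIGHT_PX    # size of the uncompressed bitmap

# Hypothetical: treat the trailing bytes 08 80 as a big-endian repeat count.
repeat_count = int.from_bytes(bytes.fromhex("0880"), "big")
# 426-byte white page minus the 3 trailing bytes (00 08 80) leaves 423 bytes.
hypothetical_size = 423 * repeat_count

print(bytes_per_row, raw_size)          # 295 998280
print(repeat_count, hypothetical_size)  # 2176 920448
```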

A4 600 dpi (120px crop in both sides)

The data size is twice as large as the compressed A4 300dpi white page 😁

A5 300 dpi

A5 is 5.8x8.3 in == 1470x2490px raw
1350 * 2370 with 120px crop
1344 * 2370 with rounding to byte size == 168B * 2370 == 398160B
295B * 2370 == 641920B, close to 320960B * 2 (did you mean A5 600 dpi?)

Black pages and Black Stripe

Black pages seem to have very different starting bytes for different image sizes. Both black and white pixels are referenced with \xff. Could there be some kind of dictionary in use? Could it be LZ77 (which can look like RLE in some cases)?

The black stripe pages seem to suggest that some kind of dictionary encoding is in use, which LZ family encoders are based on.

Earlier this year, I wrote a Python script sample_blots.py that generates a bunch of patterns for studying RLE compression. I hope it helps here too...

Just be careful not to overwrite files accidentally with the script, as its overwrite detection is a little lacking ⚠️

@redjoe
Author

redjoe commented Dec 6, 2022

The data is compressed with CCITT Group 4.

@mounaiban
Owner

I repeated your black page and white page experiments with a hand-coded SVG and an rsvg-convert-GhostScript pipeline, and got similar but different results:

Black Page SVG ⚫

<?xml version='1.0' encoding='UTF-8' standalone='no' ?>
<svg width='210mm' height='297mm' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'>
<desc>Just a blank, black A4 page</desc>
<rect x='0' y='0' width='210mm' height='297mm' stroke='black' fill='black' />
</svg>

White Page SVG ⚪

<?xml version='1.0' encoding='UTF-8' standalone='no' ?>
<svg width='210mm' height='297mm' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'>
<desc>Just a blank, white A4 page</desc>
<rect x='0' y='0' width='210mm' height='297mm' stroke='white' fill='white' />
</svg>

My pipeline:

rsvg-convert -f pdf -o $PDF_FILE $SVG_FILE
gs -dSAFER -dNOPAUSE -dNOPROMPT -r600 -SDEVICE=faxg4 -o $IMAGE_FILE $PDF_FILE

Try rsvg-convert -x 0.801 -y 0.801 if the resulting PDF has a larger page size than expected (version 2.40.2 needs this fix)

Results:
A4 600dpi Black Page: 26 a0 3e 03 81 af ff ... ff fe 0a (1762 bytes)
A4 600dpi White Page: ff .. ff 80 0a (879 bytes)

@redjoe
Author

redjoe commented Dec 13, 2022

I repeated your steps and added the page size option -g4720x6768.
gs -dSAFER -dNOPAUSE -dNOPROMPT -r600 -SDEVICE=faxg4 -g4720x6768 -o $IMAGE_FILE $PDF_FILE
For the 600dpi black page I got 26 a0 3e 02 80 c9 ff ... ff f8.
Comparing this with the output of the Canon driver, 64 05 7c 40 01 93 ff ..., shows that the bit numbering is different.

26 a0 3e 02 80 c9 -> 00100110 10100000 00111110 00000010 10000000 11001001
64 05 7c 40 01 93 -> 01100100 00000101 01111100 01000000 00000001 10010011
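
The two byte sequences above are the same bits with the order reversed inside each byte. A minimal Python sketch to check this (`reverse_bits` is a helper name made up for this example):

```python
def reverse_bits(data: bytes) -> bytes:
    """Reverse the bit order within each byte (MSB-first <-> LSB-first)."""
    return bytes(int(f"{b:08b}"[::-1], 2) for b in data)

gs_output = bytes.fromhex("26a03e0280c9")      # GhostScript faxg4, first 6 bytes
canon_output = bytes.fromhex("64057c400193")   # Canon driver, first 6 bytes

print(reverse_bits(gs_output).hex(" "))        # 64 05 7c 40 01 93
print(reverse_bits(gs_output) == canon_output) # True
```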

Decomposing the GhostScript result 26 a0 3e 02 80 c9, which is in MSB2LSB bit order:

|Horizontal Mode Coding
|--
|  |a0a1, distance = 0 (White codes)
|  |--------
|  |        |a1a2, coding length 2560 (Black codes)
|  |        |------------
|  |        |            |a1a2, coding length 2112 (Black codes)
|  |        |            |-------------
|  |        |            |             |a1a2, coding length 48 (Black codes)
|  |        |            |             |------------
00100110 10100000 00111110 00000010 10000000 11001001

2560 + 2112 + 48 = 4720px.
The run length for each code word can be looked up at https://www.itu.int/rec/T-REC-T.6-198811-I/en or in libtiff/t4.h.
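
The decomposition above can be reproduced with a small Python sketch. It is not a full G4 decoder: the tables hold only the handful of code words that occur in these two examples (copied from the run-length tables, as I understand them), and it reads only the first horizontal-mode element of the first line. All names are made up for this example.

```python
# Subset of the run-length code words, just enough for the examples here.
WHITE_CODES = {"00110101": 0}   # white terminating code for a run of 0
BLACK_CODES = {
    "000000011111": 2560,       # make-up code
    "000000011100": 2368,       # make-up code
    "000000010100": 2112,       # make-up code
    "000001100100": 48,         # terminating code
    "000011010111": 39,         # terminating code
}
MAKEUP_LENGTHS = {2560, 2368, 2112}
HORIZONTAL_MODE = "001"

def read_run(bits, pos, table):
    """Add up make-up codes until a terminating code; return (run, new_pos)."""
    total = 0
    while True:
        for code, length in table.items():
            if bits.startswith(code, pos):
                pos += len(code)
                total += length
                break
        else:
            raise ValueError(f"code word at bit {pos} is not in the subset table")
        if length not in MAKEUP_LENGTHS:
            return total, pos

def decode_first_horizontal(data: bytes):
    """Return (white_run, black_run) of the leading horizontal-mode element."""
    bits = "".join(f"{b:08b}" for b in data)   # MSB-first bit string
    if not bits.startswith(HORIZONTAL_MODE):
        raise ValueError("data does not start with the horizontal mode code")
    white, pos = read_run(bits, len(HORIZONTAL_MODE), WHITE_CODES)
    black, pos = read_run(bits, pos, BLACK_CODES)
    return white, black

print(decode_first_horizontal(bytes.fromhex("26a03e0280c9")))  # (0, 4720)
```

For data in LSB-first order, such as the Canon driver output, the bits of each byte would have to be reversed before decoding.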

For your example A4 600dpi Black Page, 26 a0 3e 03 81 af, I got a width of 2560 + 2368 + 39 = 4967px.

|Horizontal Mode Coding
|--
|  |a0a1, distance = 0 (White codes)
|  |--------
|  |        |a1a2, coding length 2560 (Black codes)
|  |        |------------
|  |        |            |a1a2, coding length 2368 (Black codes)
|  |        |            |-------------
|  |        |            |             |a1a2, coding length 39 (Black codes)
|  |        |            |             |------------
00100110 10100000 00111110 00000011 10000001 10101111

The end of the file from GhostScript is different: I didn't see the end-of-facsimile block (EOFB). The format of the EOFB is 0000 0000 0001 0000 0000 0001. Maybe this is explained by the description of faxg4:

Group 4 fax, with EOLs but no header or EOD.

A4 600dpi Black Page: 26 a0 3e 03 81 af ff ... ff f0 where width 4967px
A4 600dpi Black Page: 26 a0 3e 03 81 af ff ... ff f8 where width 4720px
A4 600dpi Black Page: 64 05 7c 40 01 93 ff … ff 1f 00 01 10 canon driver with LSB2MSB bit order
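
A quick way to check for the end-of-facsimile block in the trailing bytes is to expand them into a bit string in the right bit order and search for the 24-bit pattern. A minimal Python sketch, assuming the Canon data is LSB-first and the GhostScript data is MSB-first as found above (`to_bitstring` and `has_eofb` are names made up for this example):

```python
EOFB = "000000000001" * 2   # 0000 0000 0001 0000 0000 0001

def to_bitstring(data: bytes, lsb_first: bool) -> str:
    """Expand bytes into a string of bits in transmission order."""
    bits = (f"{b:08b}" for b in data)
    return "".join(b[::-1] if lsb_first else b for b in bits)

def has_eofb(tail: bytes, lsb_first: bool) -> bool:
    """Return True if the end-of-facsimile block occurs in the given bytes."""
    return EOFB in to_bitstring(tail, lsb_first)

# Trailing bytes of samples posted in this issue.
print(has_eofb(bytes.fromhex("1f000110"), lsb_first=True))   # Canon, black A4 600dpi
print(has_eofb(bytes.fromhex("000880"), lsb_first=True))     # Canon, white pages
print(has_eofb(bytes.fromhex("fff8"), lsb_first=False))      # GhostScript, -g4720x6768
```

With these inputs the two Canon tails contain the pattern and the GhostScript tail does not, which matches the observation that the faxg4 output has no EOFB.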

@mounaiban
Owner

Just in case it matters, I was using GhostScript 9.26 from late 2018, but I doubt that makes much of a difference, unlike JPEG or other lossy compression codecs.

Looks like using the GS encoder as-is won't work, but it looks to me that the changes required won't be too difficult to implement. Or maybe I might have missed some option that enables LSB-first mode? (it would make things so easy if there was such a thing!)

@redjoe
Author

redjoe commented Jan 4, 2023

I found the parameter -dFillOrder=2, which stores the pixels starting from the lower-order bits of each byte.

@mounaiban mounaiban added the wontfix This will not be worked on label Jun 19, 2023
@mounaiban mounaiban changed the title Printer MF3228 encoded data Printer MF3228 encoded data [CCITT G4/CARPS] Mar 25, 2024