Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid codes being created during decode: debug_asserts for invariants #31

Open
HeroicKatora opened this issue Apr 20, 2022 · 0 comments

Comments

@HeroicKatora
Copy link
Member

HeroicKatora commented Apr 20, 2022

These codes are inserted into the table, but can't be used or referenced in the code text. They are just 'waste'.

I've thrown together a patch for to put in debug_asserts for the actual invariants that the code is working under:

Patch file
From ac575ce26fb081883092536e0fcbf00c2af59cc2 Mon Sep 17 00:00:00 2001
From: Andreas Molzer <[email protected]>
Date: Tue, 19 Apr 2022 21:29:13 +0200
Subject: [PATCH] Add debug assertions on internal invariants

---
 src/decode.rs | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/decode.rs b/src/decode.rs
index 283e31f..49f3bfd 100644
--- a/src/decode.rs
+++ b/src/decode.rs
@@ -711,7 +711,7 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
             Some(tup) => {
                 status = Ok(LzwStatus::Ok);
                 code_link = Some(tup)
-            },
+            }
         };
 
         // Track an empty `burst` (see below) means we made no progress.
@@ -827,6 +827,7 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
                 // the case of requiring an allocation (which can't occur in practice).
                 let new_link = self.table.derive(&link, cha, code);
                 self.next_code += 1;
+                debug_assert!(self.next_code as usize <= MAX_ENTRIES);
                 code = burst;
                 link = new_link;
             }
@@ -918,6 +919,8 @@ impl<C: CodeBuffer> Stateful for DecodeState<C> {
                 }
 
                 self.next_code += 1;
+                debug_assert!(self.next_code as usize <= MAX_ENTRIES);
+
                 new_link = link;
             } else {
                 // It's actually quite likely that the next code will be a reset but just in case.
@@ -1203,6 +1206,13 @@ impl Table {
     }
 
     fn derive(&mut self, from: &Link, byte: u8, prev: Code) -> Link {
+        debug_assert!(
+            self.inner.len() < MAX_ENTRIES,
+            "Invalid code would be created {:?} {} {:?}",
+            from.prev,
+            byte,
+            prev
+        );
         let link = from.derive(byte, prev);
         let depth = self.depths[usize::from(prev)] + 1;
         self.inner.push(link.clone());
-- 
2.35.1

The trace of running decoding with those suggest that the comparison itself relies on an incorrect assumption. Since it uses == it relies on self.next_code <= self.code_buffer.max_code() but that doesn't hold. When we reach 12-bits then the code buffer does not get larger and max_code() remains at 4095. At the same time next_code will advance to 4096, and never beyond in the sequential code path, a code that will never be created and thus works correctly with the rest of the logic.

But when that is the exact moment that we enter a burst, as is the case with the provided file, then it will advance next_code beyond that and not notice that the maximum code has been reached. An easy fix would be to adjust the condition:

if potential_code >= self.code_buffer.max_code() - Code::from(self.is_tiff) {

I'll measure if that leads to too much of a performance loss due to executing less of the simple code reconstruction.

Originally posted by @HeroicKatora in #30 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant