ERROR: UID 11544 has defects preventing it from being processed! #165

poidl · 2023-10-20T15:14:57Z

General information

system/distribution (with version): Arch Linux
offlineimap version (offlineimap -V): offlineimap v8.0.0, imaplib2 v3.06, Python v3.11.5 19 Sep 2023
Python version: Python v3.11.5

Logs, error

Oct 20 09:47:34 myhostname offlineimap[3647]: Copy message UID 11544 (4224/5482) Remote:INBOX -> Local:INBOX
Oct 20 09:47:34 myhostname offlineimap[3647]: UID 11544 has defects: [StartBoundaryNotFoundDefect(), MultipartInvariantViolationDefect()]
Oct 20 09:47:34 myhostname offlineimap[3647]: ERROR: UID 11544 (<[email protected]>) has defects preventing it from being processed!
Oct 20 09:47:34 myhostname offlineimap[3647]:   UnicodeEncodeError: 'ascii' codec can't encode characters in position 102-104: ordinal not in range(128)

Steps to reproduce the error

Construct an email with the header

Content-Type: multipart/alternative; boundary="yikVu2khOYniio2Jx"

but no start or end boundary line in the body. For example, take any text/plain email and change its Content-Type header to the one above.

Then put any non-ASCII (multi-byte UTF-8) character into the email body.
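The steps above can be reproduced with Python's `email` package. This is a minimal sketch: it assumes the stock `email.policy.default` parser rather than offlineimap3's own parser configuration, and the addresses and subject are made up. Parsing reports the same two defects as the log. Because no line starting with `--yikVu2khOYniio2Jx` is ever found, the whole body is kept as a plain string payload of the "multipart" container instead of being split into MIME parts.

```python
from email import policy
from email.parser import BytesParser

# Headers announce multipart/alternative, but the body never contains a
# "--yikVu2khOYniio2Jx" boundary line. Addresses are made-up examples.
HEADERS = (
    b'From: forum@sender.example.com\r\n'
    b'To: me@example.org\r\n'
    b'Subject: Watch forum notification\r\n'
    b'Content-Type: multipart/alternative; boundary="yikVu2khOYniio2Jx"\r\n'
    b'\r\n'
)
# Body with one non-ASCII character (U+20AC, three bytes in UTF-8).
BODY = 'Notification text with a non-ASCII character: \u20ac\r\n'.encode('utf-8')

msg = BytesParser(policy=policy.default).parsebytes(HEADERS + BODY)
print([type(d).__name__ for d in msg.defects])
# ['StartBoundaryNotFoundDefect', 'MultipartInvariantViolationDefect']
print(msg.is_multipart())  # False: no MIME parts were found
```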

Questions

I'm a beginner trying to archive my email, and one specific type of email from a mailing list causes the errors logged above. I tried to find one of the troublesome emails by following this guide:

https://www.offlineimap.org/server/imap/error/2016/01/27/error-no-such-number.html
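This does not reproduce the linked guide's exact procedure. As an alternative sketch, Python's standard `imaplib` can fetch the raw message by the UID from the log (11544) with `UID FETCH ... (BODY.PEEK[])`, which retrieves the full message without setting the `\Seen` flag. `fetch_raw_by_uid` is a hypothetical helper, and connection setup (host, credentials) is left to the caller.

```python
import imaplib


def fetch_raw_by_uid(conn: imaplib.IMAP4, mailbox: str, uid: int) -> bytes:
    """Hypothetical helper: return the unparsed RFC 822 bytes of one message."""
    # Read-only select so nothing in the mailbox is modified.
    conn.select(mailbox, readonly=True)
    # BODY.PEEK[] returns the whole message without marking it as read.
    typ, data = conn.uid('FETCH', str(uid), '(BODY.PEEK[])')
    if typ != 'OK' or not data or not isinstance(data[0], tuple):
        raise LookupError(f'UID {uid} not found in {mailbox}')
    # data[0] is roughly (b'<seq> (UID <uid> BODY[] {<size>}', b'<raw message>')
    return data[0][1]
```

For example, after `conn = imaplib.IMAP4_SSL('imap.example.com')` and `conn.login(user, password)` (placeholder host and credentials), `fetch_raw_by_uid(conn, 'INBOX', 11544)` returns bytes that can be written to a file and inspected, independently of any mail client.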

After identifying the email, I found it in KMail, opened the raw message, and noticed the following headers:

X-Virus-Scanned: amavisd-new at redacted.example.com
X-Amavis-Alert: BAD HEADER SECTION, MIME error: error: unexpected end of preamble

Then I opened

https://github.com/OfflineIMAP/offlineimap3/blob/master/offlineimap/folder/IMAP.py

and changed the following section:

        if len(ndata1.defects) > 0:
            # We don't automatically apply fixes as to attempt to preserve the original message
            self.ui.warn("UID {} has defects: {}".format(uids, ndata1.defects))
            if any(isinstance(defect, NoBoundaryInMultipartDefect) for defect in ndata1.defects):
                # (Hopefully) Rare defect from a broken client where multipart boundary is
                # not properly quoted.  Attempt to solve by fixing the boundary and parsing
                self.ui.warn(" ... applying multipart boundary fix.")
                ndata1 = self.parser['8bit-RFC'].parsebytes(self._quote_boundary_fix(data[0][1]))

to

        if len(ndata1.defects) > 0:
            # We don't automatically apply fixes as to attempt to preserve the original message
            self.ui.warn("UID {} has defects: {}".format(uids, ndata1.defects))
            if any(isinstance(defect, NoBoundaryInMultipartDefect) for defect in ndata1.defects):
                # (Hopefully) Rare defect from a broken client where multipart boundary is
                # not properly quoted.  Attempt to solve by fixing the boundary and parsing
                self.ui.warn(" ... applying multipart boundary fix.")
                ndata1 = self.parser['8bit-RFC'].parsebytes(self._quote_boundary_fix(data[0][1]))
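            # myid is assumed to hold the message's Message-ID, e.g. '<id@sender.example.com>';
            # MultipartInvariantViolationDefect must be imported from email.errors.
            if myid.split('@')[1] == 'sender.example.com>':
                if any(isinstance(defect, MultipartInvariantViolationDefect) for defect in ndata1.defects):
                    # Relabel the body as plain text (the MIME type is 'text/plain', not 'plain/text')
                    ndata1.replace_header("content-type", 'text/plain')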

which resulted in offlineimap3 downloading the message. My lines may very well be nonsense; I only started reading about email today. The point of the modification was to download the actual message to see what was going on.
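The same idea can be sketched as a standalone function. This is a sketch under several assumptions: it drops the sender-domain check, uses Python's stock `email.policy.default` instead of offlineimap3's parser setup, and `downgrade_bogus_multipart` is a hypothetical name. When the parser reports a `MultipartInvariantViolationDefect` (a multipart Content-Type whose payload is just a string), it relabels the message as `text/plain` so the generator writes the body bytes back out instead of treating them as a broken multipart.

```python
from email import policy
from email.errors import MultipartInvariantViolationDefect
from email.parser import BytesParser


def downgrade_bogus_multipart(raw: bytes) -> bytes:
    """Hypothetical helper: if a message claims to be multipart but contains no
    MIME parts, relabel it as text/plain and re-serialize it."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    if any(isinstance(d, MultipartInvariantViolationDefect) for d in msg.defects):
        # 'text/plain' is the registered MIME type ('plain/text' is not).
        msg.replace_header('Content-Type', 'text/plain')
    # With the default 8bit policy, BytesGenerator writes a text part's original
    # 8-bit body bytes back out (line endings follow the policy's linesep).
    return msg.as_bytes()


RAW = (
    b'From: forum@sender.example.com\r\n'
    b'Content-Type: multipart/alternative; boundary="yikVu2khOYniio2Jx"\r\n'
    b'\r\n'
) + 'Body with a non-ASCII character: \u20ac\r\n'.encode('utf-8')

fixed = downgrade_bogus_multipart(RAW)
```

Relabeling changes a header in the stored copy, which is at odds with the "preserve the original message" comment in the offlineimap3 code, so it is probably only suitable as an opt-in workaround.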

Then I thought: offlineimap3 should indeed throw an error here if the email isn't constructed correctly.

But then I had some doubts, and now I'm not sure what to think anymore:

First, note that the error logged above only appears if there is a non-ASCII character in the (intended) body. If all characters are ASCII, the message is downloaded and stored fine. If I understand correctly, that's because even though the "body" is interpreted as metadata/headers, no exception is thrown as long as it can be handled as ASCII. But isn't the email still corrupt? For example, I tried to search for it with notmuch, specifically using the body: search term, and it could not find anything. Perhaps notmuch parses the email again, separately from offlineimap3, and also interprets the (intended) body as metadata. When I tried searching with KMail, again filtering on the body, it worked.
Second, for archiving purposes, I'd like to download pretty much everything, even poorly constructed emails.
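Running both variants through Python's `email` package suggests where the ASCII/non-ASCII difference comes from. This sketch assumes the stock `email.policy.default` (offlineimap3 configures its own policies). With no start boundary, the whole body becomes a string payload of the "multipart" container. When flattening, `Message.get_payload()` appears to decode that payload's raw 8-bit bytes using the message's charset parameter (absent, so ASCII) with error replacement. `BytesGenerator` then cannot encode the resulting U+FFFD characters as ASCII, which would match the `UnicodeEncodeError` in the log. An all-ASCII body has nothing to replace, so it round-trips silently even though it is just as malformed. `make_raw`, `serializes_cleanly`, and `archive_bytes` are hypothetical helpers. The last one sketches an archiving-friendly fallback that keeps the server's original bytes when re-serialization fails.

```python
from email import policy
from email.parser import BytesParser


def make_raw(body: str) -> bytes:
    """Hypothetical helper: the malformed message from this issue with a given body."""
    headers = (
        b'From: forum@sender.example.com\r\n'
        b'Content-Type: multipart/alternative; boundary="yikVu2khOYniio2Jx"\r\n'
        b'\r\n'
    )
    return headers + body.encode('utf-8')


def serializes_cleanly(raw: bytes) -> bool:
    """True if the parsed message can be flattened back to bytes."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    try:
        msg.as_bytes()
    except UnicodeEncodeError:
        return False
    return True


def archive_bytes(raw: bytes) -> bytes:
    """Hypothetical archiving fallback: keep the server's bytes untouched
    when the defective message cannot be re-serialized."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    try:
        return msg.as_bytes()
    except UnicodeEncodeError:
        return raw


ascii_raw = make_raw('All-ASCII body\r\n')
utf8_raw = make_raw('Body with a non-ASCII character: \u20ac\r\n')
print(serializes_cleanly(ascii_raw))  # True
print(serializes_cleanly(utf8_raw))   # False
```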

The forum software that sent the email is phpBB, but only specific types of emails (of type "watch forum") have the weird Content-Type; the others are fine with

Content-Type: text/plain; charset="us-ascii"

Edit (15:17 UTC, Friday, 20 October 2023): I notified the forum administrator, and they said they are checking whether a bug has been filed in phpBB regarding this problem.

Edit (15:56:48 UTC, Friday, 20 October 2023):
Related:
#160
#107


thekix · 2023-11-22T02:01:11Z

Hi,

after applying #107, is this problem solved?

Best regards,
kix

poidl · 2023-11-22T08:03:47Z

Sorry, I can't test it any longer. Feel free to close it; I'll open a new issue if the bug appears again.

Thanks!

poidl mentioned this issue on Oct 20, 2023 in "offlineimap3 produces error due to defect in Spam email" #160 (open)

thekix closed this as completed Nov 22, 2023
