eml_parser is a Python module that parses EML files and returns information found in the e-mail, along with information computed from it.
Extracted and generated information includes, but is not limited to:
- attachments
    - hashes
    - names
- from, to, cc
- received servers path
- subject
- list of URLs parsed from the text content of the mail (including HTML body/attachments)
Please feel free to send me your comments / pull requests.
For the changelog, please see CHANGELOG.md.
To install with the optional filemagic extra:
    pip install eml_parser[filemagic]
To install without it:
    pip install eml_parser
Make sure libmagic is installed; otherwise eml_parser will not work.
It has been reported (in #60) that parsing fails in some particular cases, apparently because of a bug in the email module of the Python standard library. At least Python versions <=3.7.4 are affected.
Python versions >=3.7.11 are not affected. If you get KeyError exceptions during header field parsing, consider upgrading to a more recent version of Python.
Please open an issue if the error persists after upgrading.
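The version guidance above can be turned into a quick check before parsing. This is only a sketch: `header_bug_status` is a hypothetical helper, not part of eml_parser, and the two version bounds are copied literally from the note above. Releases between them are reported as "unknown" because the note says nothing about them.

```python
import sys

# Bounds taken from the note above (assumption: compared as plain version tuples).
LAST_KNOWN_AFFECTED = (3, 7, 4)
FIRST_KNOWN_FIXED = (3, 7, 11)


def header_bug_status(version=None):
    """Classify a Python version as 'affected', 'not affected' or 'unknown'."""
    if version is None:
        # Default to the interpreter that is running this code.
        version = sys.version_info[:3]
    version = tuple(version[:3])
    if version <= LAST_KNOWN_AFFECTED:
        return "affected"
    if version >= FIRST_KNOWN_FIXED:
        return "not affected"
    # The note gives no information about releases between the two bounds.
    return "unknown"


if __name__ == "__main__":
    print("Header parsing bug status for this interpreter:", header_bug_status())
```

If the result is "affected" and KeyError exceptions appear during header parsing, upgrading Python is the first thing to try.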
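import datetime
import json

import eml_parser


def json_serial(obj):
    # json.dumps cannot serialize datetime objects, so convert them to ISO 8601 strings.
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    # Any other unsupported type falls through and is serialized as null.


# Read the raw e-mail as bytes.
with open('sample.eml', 'rb') as fhdl:
    raw_email = fhdl.read()

ep = eml_parser.EmlParser()
parsed_eml = ep.decode_email_bytes(raw_email)

print(json.dumps(parsed_eml, default=json_serial))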
For a minimal EML file, this produces output similar to the following:
{
  "body": [
    {
      "content_header": {
        "content-language": [
          "en-US"
        ]
      },
      "hash": "6c9f343bdb040e764843325fc5673b0f43a021bac9064075d285190d6509222d"
    }
  ],
  "header": {
    "received_src": null,
    "from": "[email protected]",
    "to": [
      "[email protected]"
    ],
    "subject": "Sample EML",
    "received_foremail": [
      "[email protected]"
    ],
    "date": "2013-04-26T11:15:47+00:00",
    "header": {
      "content-language": [
        "en-US"
      ],
      "received": [
        "from localhost\tby mta.example.com (Postfix) with ESMTPS id 6388F684168\tfor <[email protected]>; Fri, 26 Apr 2013 13:15:55 +0200"
      ],
      "to": [
        "[email protected]"
      ],
      "subject": [
        "Sample EML"
      ],
      "date": [
        "Fri, 26 Apr 2013 11:15:47 +0000"
      ],
      "message-id": [
        "<[email protected]>"
      ],
      "from": [
        "John Doe <[email protected]>"
      ]
    },
    "received_domain": [
      "mta.example.com"
    ],
    "received": [
      {
        "with": "esmtps id 6388f684168",
        "for": [
          "[email protected]"
        ],
        "by": [
          "mta.example.com"
        ],
        "date": "2013-04-26T13:15:55+02:00",
        "src": "from localhost by mta.example.com (postfix) with esmtps id 6388f684168 for <[email protected]>; fri, 26 apr 2013 13:15:55 +0200"
      }
    ]
  }
}
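Once the result is available as a dictionary, individual fields can be read with ordinary dictionary access. The sketch below works on a trimmed copy of the sample output above, so it runs without eml_parser installed. `summarize` is a hypothetical helper written for illustration, not part of the library, and it assumes the key layout shown in the sample. Dates appear as strings here because the sample is JSON; in the dictionary returned by the parser they are datetime objects, which is why the usage example needs `json_serial`.

```python
# Trimmed copy of the sample output above (only a few keys kept).
parsed_eml = {
    "body": [
        {"hash": "6c9f343bdb040e764843325fc5673b0f43a021bac9064075d285190d6509222d"}
    ],
    "header": {
        "subject": "Sample EML",
        "date": "2013-04-26T11:15:47+00:00",
        "received_domain": ["mta.example.com"],
        "received": [
            {"by": ["mta.example.com"], "date": "2013-04-26T13:15:55+02:00"}
        ],
    },
}


def summarize(parsed):
    """Collect a few commonly used fields, tolerating missing keys."""
    header = parsed.get("header", {})
    return {
        "subject": header.get("subject"),
        "date": header.get("date"),
        "relay_domains": header.get("received_domain", []),
        # One (server, date) pair per "by" entry of each received hop.
        "hops": [
            (server, hop.get("date"))
            for hop in header.get("received", [])
            for server in hop.get("by", [])
        ],
        "body_hashes": [
            part["hash"] for part in parsed.get("body", []) if "hash" in part
        ],
    }


if __name__ == "__main__":
    for key, value in summarize(parsed_eml).items():
        print(f"{key}: {value}")
```

Using `.get()` with defaults keeps the helper from raising KeyError when a message lacks a header such as Received.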