Trying to save emails as HTML and PDF: encoding problems, I keep getting �, Â and \u2020
I'm trying to write a program that downloads my emails and saves them as PDF, but I've run into an encoding problem. I'm using the email and imaplib modules.

When I write each part to a file with part.get_payload(decode=True), the resulting HTML file contains literal \u2013 sequences and � characters. Writing the raw email as HTML works and shows no �, but the file then also contains the message headers, and as soon as I strip the headers the � characters come back. Changing the encoding to ISO-8859-1 removes the �, but I still get \u2020 and \u2013.

Removing the following line from the HTML fixed the problem, but only until I converted the file to PDF:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:asp="remove"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"></meta><meta name="format-detection" content="telephone=no, date=no, address=no, email=no, url=no"></meta><style type="text/css">

After the conversion, Â and â started appearing in the PDF. This is the code I wrote:
What I have tried:
<pre lang="Python">
import email
import imaplib
import mimetypes
import os

import pdfkit

m = imaplib.IMAP4_SSL('imap.mail.yahoo.com')
m.login('xxxx@yahoo.com', 'xxxxx')
m.select('IL', readonly=True)
resp, data = m.search(None, '(SINCE "01-Jul-2019" BEFORE "29-Oct-2020" SUBJECT "Your order")')
messages = data[0].split()
for item in messages:
    typ, data = m.fetch(item, '(RFC822)')
    # decode the raw message bytes to str, then parse the str
    raw_email = data[0][1].decode("utf-8")
    email_message = email.message_from_string(raw_email)
    to_ = email_message['To']
    from_ = email_message['From']
    subject_ = email_message['Subject']
    date_ = email_message['date']
    counter = 1
    for part in email_message.walk():
        # multipart containers have no payload of their own
        if part.get_content_maintype() == "multipart":
            continue
        filename = part.get_filename()
        content_type = part.get_content_type()
        if not filename:
            ext = mimetypes.guess_extension(content_type)
            if not ext:
                ext = '.bin'
            filename = 'msg-part-%08d%s' % (counter, ext)
        counter += 1
        # one folder per message: emails/<date>/<subject>
        save_path = os.path.join(os.getcwd(), "emails", date_, subject_)
        if not os.path.exists(save_path):
            print(save_path)
            os.makedirs(save_path)
        with open(os.path.join(save_path, filename), 'wb') as fp:
            fp.write(part.get_payload(decode=True))
pdfkit.from_file('msg-part-00000001.htm', 'test.pdf')
</pre>
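One plausible cause of the literal \u2013 sequences and the � characters, assuming the HTML part is sent with Content-Transfer-Encoding: 8bit: the code above decodes the raw bytes to a str and parses it with email.message_from_string. When a part's payload is a str containing non-ASCII characters, get_payload(decode=True) converts it back to bytes with the raw-unicode-escape codec, so U+2013 becomes the six ASCII characters \u2013, and Latin-1-range characters such as é become single bytes that are not valid UTF-8 (rendered as � under charset=UTF-8). The sketch below uses a made-up message to compare that with email.message_from_bytes, which keeps the part's original bytes.

```python
import email

# Hypothetical single-part HTML message sent with 8bit transfer encoding;
# the body contains an e-acute (U+00E9) and an en dash (U+2013).
raw_bytes = (
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Transfer-Encoding: 8bit\r\n"
    "\r\n"
    "<p>Caf\u00e9 \u2013 order</p>\r\n"
).encode("utf-8")

# What the question's code does: decode the bytes to str, then parse the str.
# For a str payload with non-ASCII text, get_payload(decode=True) falls back
# to the raw-unicode-escape codec when turning it back into bytes.
from_str = email.message_from_string(raw_bytes.decode("utf-8"))
payload_from_str = from_str.get_payload(decode=True)

# Parsing the original bytes keeps the part's bytes unchanged.
from_bytes = email.message_from_bytes(raw_bytes)
payload_from_bytes = from_bytes.get_payload(decode=True)

print(payload_from_str)    # literal text \u2013 plus a lone 0xE9 byte (invalid UTF-8)
print(payload_from_bytes)  # the original UTF-8 bytes for both characters
```

In the question's loop this would correspond to replacing the decode and message_from_string lines with email.message_from_bytes(data[0][1]).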
Gerry Schmitz
That's punctuation; check your character set.
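Following the advice to check the character set, here is a minimal sketch of a charset-aware way to save an HTML part, assuming the message is parsed with email.message_from_bytes: decode the payload with the charset declared in the part's Content-Type header (falling back to UTF-8, which is an assumption), then write the file as UTF-8 so it agrees with a charset=UTF-8 meta tag like the one in the question. The helper names html_part_to_text and save_html_part and the sample message are hypothetical.

```python
import email
import tempfile
from email.message import Message
from pathlib import Path


def html_part_to_text(part: Message) -> str:
    """Decode a non-multipart text/* part using the charset it declares."""
    payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
    if payload is None:  # multipart containers have no payload of their own
        return ""
    # Assumption: treat parts without a charset parameter as UTF-8.
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # charset name that Python does not recognise
        return payload.decode("utf-8", errors="replace")


def save_html_part(part: Message, path: Path) -> None:
    """Write the decoded HTML as UTF-8 to match a charset=UTF-8 meta tag."""
    path.write_text(html_part_to_text(part), encoding="utf-8")


# Hypothetical sample: a windows-1252 HTML part sent as quoted-printable;
# =96 is the windows-1252 byte for an en dash (U+2013).
raw = (
    b"Content-Type: text/html; charset=windows-1252\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<p>Order =96 shipped</p>\r\n"
)
msg = email.message_from_bytes(raw)  # parse the bytes, not a decoded str
text = html_part_to_text(msg)

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "msg-part-00000001.htm"
    save_html_part(msg, out)
    saved_bytes = out.read_bytes()  # the en dash is now stored as UTF-8

print(saved_bytes)
```

For the PDF step, pdfkit passes its options dict through to wkhtmltopdf, whose --encoding option sets the default input encoding, e.g. pdfkit.from_file('msg-part-00000001.htm', 'test.pdf', options={'encoding': 'UTF-8'}). This likely matters once the meta charset line has been removed: UTF-8 bytes read as Latin-1 or Windows-1252 turn a non-breaking space (C2 A0) into "Â" plus a space and an en dash (E2 80 93) into "â" followed by other junk, which would explain the Â and â in the PDF.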