formataddr() and unicode

I often see code like this:

message["To"] = formataddr((name, email))

This looks like it should work, especially since the docstring of formataddr() says that it will return a string value suitable for a To or Cc header. However, while it works most of the time, it fails if name is a unicode string containing non-ASCII characters. It may look OK if you simply read message["To"], but as soon as you convert the message or header to a byte string, you will see the problem.

>>> from email.Message import Message
>>> from email.Utils import formataddr
>>> msg = Message()
>>> msg["To"] = formataddr((u"Björn", "bjorn@tillenius.me"))
>>> msg["To"]
u'Bj\xf6rn <bjorn@tillenius.me>'
>>> msg.as_string()
'To: =?utf-8?b?QmrDtnJuIDxiam9ybkB0aWxsZW5pdXMubWU+?=\n\n'

Most code that reads the To address in the example will fail, since there’s no visible e-mail address in there: the whole value, address included, has been encoded as a single encoded word. The header should look like this, i.e. only the name itself should be encoded:

To: =?utf-8?b?QmrDtnJu?= <bjorn@tillenius.me>
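
To make the difference between the two headers concrete, here is a minimal sketch that decodes the encoded words in each value. It is written for Python 3 (the session above is Python 2), and `encoded_text` is a hypothetical helper name used only for this illustration.

```python
from email.header import decode_header

# The two header values from the text above.
broken = "=?utf-8?b?QmrDtnJuIDxiam9ybkB0aWxsZW5pdXMubWU+?="
wanted = "=?utf-8?b?QmrDtnJu?= <bjorn@tillenius.me>"


def encoded_text(header_value):
    """Return the decoded text of the encoded words in a header value.

    Hypothetical helper for illustration: parts that were not encoded
    (charset is None) are skipped, so the result shows exactly what was
    hidden inside the =?...?= words.
    """
    parts = []
    for data, charset in decode_header(header_value):
        if charset is not None:
            parts.append(data.decode(charset))
    return "".join(parts)


print(encoded_text(broken))  # the name and the address were both encoded
print(encoded_text(wanted))  # only the name was encoded
```

In the broken value the address only shows up after decoding, which is why code that looks for an address in the raw header finds nothing.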

I wish Python would handle this better. I usually end up writing a helper function like this for projects I work on:

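from email.Header import Header
from email.Utils import formataddr

def format_address(name, email):
    email = str(email)
    if not name:
        return email
    # Encode only the name; the address itself stays readable.
    name = str(Header(name))
    return formataddr((name, email))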
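
The helper above targets Python 2. As a rough sketch of the same idea on Python 3.3 or later, where formataddr() accepts a charset argument and encodes a non-ASCII name itself, something like the following should be enough. The `charset` parameter on the helper is an addition for this sketch and is not part of the original function.

```python
from email.utils import formataddr


def format_address(name, email, charset="utf-8"):
    """Sketch of a Python 3 variant of the helper (assumes Python >= 3.3)."""
    email = str(email)
    if not name:
        return email
    # formataddr() leaves an ASCII name as-is and encodes a non-ASCII
    # name on its own, keeping the address outside the encoded word.
    return formataddr((name, email), charset)


result = format_address("Björn", "bjorn@tillenius.me")
print(result)
```

The address has to be plain ASCII here; formataddr() raises a UnicodeError otherwise.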

Comments

Hey, Björn, long time no see!

Here’s my version from way back when: http://mg.pov.lt/blog/unicode-emails-in-python.html

Looks like it could be simplified somewhat for the modern age. I remember trying to encode in Latin-1 before falling back to UTF-8. You probably won’t find an email client that doesn’t support UTF-8 in 2011.

Björn Tillenius 18.02.2011 8:16

Hi Marius!

I seem to remember that parseaddr() had some problems dealing with certain unicode strings. But I can’t reproduce it, so either I’m remembering incorrectly, or it has been fixed. It’s been a long time since I ran into this the first time, that’s why I wrote it down this time, to make it easier to remember.
