===================
Porting to Python 3
===================

Django 1.5 is the first version of Django to support Python 3. The same code
runs both on Python 2 (≥ 2.6.5) and Python 3 (≥ 3.2), thanks to the six_
compatibility layer.

.. _six: http://pythonhosted.org/six/

This document is primarily targeted at authors of pluggable applications
who want to support both Python 2 and 3. It also describes guidelines that
apply to Django's code.

Philosophy
==========

This document assumes that you are familiar with the changes between Python 2
and Python 3. If you aren't, read `Python's official porting guide`_ first.
Refreshing your knowledge of unicode handling on Python 2 and 3 will help; the
`Pragmatic Unicode`_ presentation is a good resource.

Django uses the *Python 2/3 Compatible Source* strategy. Of course, you're
free to choose another strategy for your own code, especially if you don't need
to stay compatible with Python 2. But authors of pluggable applications are
encouraged to use the same porting strategy as Django itself.

Writing compatible code is much easier if you target Python ≥ 2.6. Django 1.5
introduces compatibility tools such as :mod:`django.utils.six`, which is a
customized version of the :mod:`six module <six>`. For convenience,
forwards-compatible aliases were introduced in Django 1.4.2. If your
application takes advantage of these tools, it will require Django ≥ 1.4.2.

Obviously, writing compatible source code adds some overhead, and that can
cause frustration. Django's developers have found that attempting to write
Python 3 code that's compatible with Python 2 is much more rewarding than the
opposite. Not only does that make your code more future-proof, but Python 3's
advantages (like the saner string handling) start shining quickly. Dealing
with Python 2 becomes a backwards compatibility requirement, and we as
developers are used to dealing with such constraints.

Porting tools provided by Django are inspired by this philosophy, and it's
reflected throughout this guide.

.. _Python's official porting guide: https://docs.python.org/3/howto/pyporting.html
.. _Pragmatic Unicode: http://nedbatchelder.com/text/unipain.html

Porting tips
============

Unicode literals
----------------

This step consists of:

- Adding ``from __future__ import unicode_literals`` at the top of your Python
  modules -- it's best to put it in each and every module, otherwise you'll
  keep checking the top of your files to see which mode is in effect;
- Removing the ``u`` prefix before unicode strings;
- Adding a ``b`` prefix before bytestrings.

Performing these changes systematically guarantees backwards compatibility.

However, Django applications generally don't need bytestrings, since Django
only exposes unicode interfaces to the programmer. Python 3 discourages using
bytestrings, except for binary data or byte-oriented interfaces. Python 2
makes bytestrings and unicode strings effectively interchangeable, as long as
they only contain ASCII data. Take advantage of this to use unicode strings
wherever possible and avoid the ``b`` prefixes.

.. note::

    Python 2's ``u`` prefix is a syntax error in Python 3.2, but it is
    allowed again as of Python 3.3 thanks to :pep:`414`. Thus, this
    transformation is optional if you target Python ≥ 3.3. It's still
    recommended, per the "write Python 3 code" philosophy.

String handling
---------------

Python 2's `unicode`_ type was renamed :class:`str` in Python 3,
``str()`` was renamed :class:`bytes`, and `basestring`_ disappeared.
six_ provides :ref:`tools <string-handling-with-six>` to deal with these
changes.

Django also contains several string related classes and functions in the
:mod:`django.utils.encoding` and :mod:`django.utils.safestring` modules. Their
names used the words ``str``, which doesn't mean the same thing in Python 2
and Python 3, and ``unicode``, which doesn't exist in Python 3. In order to
avoid ambiguity and confusion these concepts were renamed ``bytes`` and
``text``.

Here are the name changes in :mod:`django.utils.encoding`:

==================  ==================
Old name            New name
==================  ==================
``smart_str``       ``smart_bytes``
``smart_unicode``   ``smart_text``
``force_unicode``   ``force_text``
==================  ==================

For backwards compatibility, the old names still work on Python 2. Under
Python 3, ``smart_str`` is an alias for ``smart_text``.

For forwards compatibility, the new names work as of Django 1.4.2.

.. note::

    :mod:`django.utils.encoding` was deeply refactored in Django 1.5 to
    provide a more consistent API. Check its documentation for more
    information.

:mod:`django.utils.safestring` is mostly used via the
:func:`~django.utils.safestring.mark_safe` and
:func:`~django.utils.safestring.mark_for_escaping` functions, which didn't
change. In case you're using the internals, here are the name changes:

==================  ==================
Old name            New name
==================  ==================
``EscapeString``    ``EscapeBytes``
``EscapeUnicode``   ``EscapeText``
``SafeString``      ``SafeBytes``
``SafeUnicode``     ``SafeText``
==================  ==================

For backwards compatibility, the old names still work on Python 2. Under
Python 3, ``EscapeString`` and ``SafeString`` are aliases for ``EscapeText``
and ``SafeText`` respectively.

For forwards compatibility, the new names work as of Django 1.4.2.

:meth:`~object.__str__` and ` __unicode__()`_ methods
-----------------------------------------------------

In Python 2, the object model specifies :meth:`~object.__str__` and
` __unicode__()`_ methods. If these methods exist, they must return
``str`` (bytes) and ``unicode`` (text) respectively.

The ``print`` statement and the :class:`str` built-in call
:meth:`~object.__str__` to determine the human-readable representation of an
object. The ``unicode`` built-in calls ` __unicode__()`_ if it
exists, and otherwise falls back to :meth:`~object.__str__` and decodes the
result with the system encoding. Conversely, the
:class:`~django.db.models.Model` base class automatically derives
:meth:`~object.__str__` from ` __unicode__()`_ by encoding to UTF-8.

In Python 3, there's simply :meth:`~object.__str__`, which must return ``str``
(text).

(It is also possible to define :meth:`~object.__bytes__`, but Django applications
have little use for that method, because they hardly ever deal with ``bytes``.)

Django provides a simple way to define :meth:`~object.__str__` and
` __unicode__()`_ methods that work on Python 2 and 3: define a
:meth:`~object.__str__` method that returns text and apply the
:func:`~django.utils.encoding.python_2_unicode_compatible` decorator.

On Python 3, the decorator is a no-op. On Python 2, it defines appropriate
` __unicode__()`_ and :meth:`~object.__str__` methods (replacing the
original :meth:`~object.__str__` method in the process). Here's an example::

    from __future__ import unicode_literals
    from django.utils.encoding import python_2_unicode_compatible

    @python_2_unicode_compatible
    class MyClass(object):
        def __str__(self):
            return "Instance of my class"

This technique is the best match for Django's porting philosophy.

For forwards compatibility, this decorator is available as of Django 1.4.2.
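
To make the decorator's effect more concrete, here is a simplified sketch of
how a decorator of this kind could work. It is not Django's implementation:
``unicode_compatible`` and ``Article`` are hypothetical names, and the
``PY2`` check is a local stand-in for :data:`six.PY2`::

    import sys

    PY2 = sys.version_info[0] == 2  # local stand-in for six.PY2


    def unicode_compatible(cls):
        """Hypothetical stand-in for python_2_unicode_compatible."""
        if PY2:
            # Keep the text-returning method as __unicode__ and derive a
            # bytes-returning __str__ from it by encoding to UTF-8.
            cls.__unicode__ = cls.__str__
            cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
        return cls


    @unicode_compatible
    class Article(object):
        def __init__(self, title):
            self.title = title

        def __str__(self):
            return "Article: %s" % self.title


    label = str(Article("Porting notes"))

On Python 3 the class is returned unchanged, so ``str()`` simply calls the
``__str__`` method written in the class body.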

Finally, note that :meth:`~object.__repr__` must return a ``str`` on all
versions of Python.

:class:`dict` and :class:`dict`-like classes
--------------------------------------------

:meth:`dict.keys`, :meth:`dict.items` and :meth:`dict.values` return lists in
Python 2 and iterators in Python 3. :class:`~django.http.QueryDict` and the
:class:`dict`-like classes defined in ``django.utils.datastructures``
behave likewise in Python 3.

six_ provides compatibility functions to work around this change:
:func:`~six.iterkeys`, :func:`~six.iteritems`, and :func:`~six.itervalues`.
It also contains an undocumented ``iterlists`` function that works well for
``django.utils.datastructures.MultiValueDict`` and its subclasses.
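
The idea behind these helpers can be sketched without six. The ``iteritems``
function below is a local, hypothetical stand-in written for illustration; in
real code you would import the function of the same name from six::

    import sys

    PY2 = sys.version_info[0] == 2  # local stand-in for six.PY2


    def iteritems(mapping):
        """Return an iterator over (key, value) pairs on Python 2 and 3."""
        if PY2:
            return mapping.iteritems()
        return iter(mapping.items())


    ports = {"http": 80, "https": 443}
    pairs = sorted(iteritems(ports))

Either way, the caller gets an iterator and never builds an intermediate list
on Python 2.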

:class:`~django.http.HttpRequest` and :class:`~django.http.HttpResponse` objects
--------------------------------------------------------------------------------

According to :pep:`3333`:

- headers are always ``str`` objects,
- input and output streams are always ``bytes`` objects.

Specifically, :attr:`HttpResponse.content <django.http.HttpResponse.content>`
contains ``bytes``, which may become an issue if you compare it with a
``str`` in your tests. The preferred solution is to rely on
:meth:`~django.test.SimpleTestCase.assertContains` and
:meth:`~django.test.SimpleTestCase.assertNotContains`. These methods accept a
response and a unicode string as arguments.
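
When those assertion methods don't fit, the comparison can be made explicit.
The following sketch uses a plain ``bytes`` value, ``body``, as a hypothetical
stand-in for a response's content and assumes it is UTF-8 encoded::

    body = b"Hello, world"          # hypothetical response body (bytes)

    # Option 1: decode the bytes, then compare text with text.
    text = body.decode("utf-8")
    matches_text = (text == "Hello, world")

    # Option 2: compare bytes with a bytes literal.
    matches_bytes = (body == b"Hello, world")

Comparing ``body`` directly with ``"Hello, world"`` would be false on
Python 3, because ``bytes`` and ``str`` never compare equal there.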

Coding guidelines
=================

The following guidelines are enforced in Django's source code. They're also
recommended for third-party applications that follow the same porting strategy.

Syntax requirements
-------------------

Unicode
~~~~~~~

In Python 3, all strings are considered Unicode by default. The ``unicode``
type from Python 2 is called ``str`` in Python 3, and ``str`` becomes
``bytes``.

You mustn't use the ``u`` prefix before a unicode string literal because it's
a syntax error in Python 3.2. You must prefix byte strings with ``b``.

In order to enable the same behavior in Python 2, every module must import
``unicode_literals`` from ``__future__``::

    from __future__ import unicode_literals

    my_string = "This is a unicode literal"
    my_bytestring = b"This is a bytestring"

If you need a byte string literal under Python 2 and a unicode string literal
under Python 3, use the :class:`str` builtin::

    str('my string')

In Python 3, there aren't any automatic conversions between ``str`` and
``bytes``, and the :mod:`codecs` module became more strict. :meth:`str.encode`
always returns ``bytes``, and ``bytes.decode`` always returns ``str``. As a
consequence, the following pattern is sometimes necessary::

    value = value.encode('ascii', 'ignore').decode('ascii')

Be cautious if you have to `index bytestrings`_.

.. _index bytestrings: https://docs.python.org/3/howto/pyporting.html#text-versus-binary-data
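
A short illustration of the indexing difference, using a throwaway value
named ``data``::

    data = b"abc"

    first_item = data[0]        # an integer on Python 3, a 1-byte str on Python 2
    first_slice = data[0:1]     # a bytestring of length 1 on both versions

Slicing is therefore the safer choice when the result must remain a
bytestring on both versions.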

Exceptions
~~~~~~~~~~

When you capture exceptions, use the ``as`` keyword::

    try:
        ...
    except MyException as exc:
        ...

This older syntax was removed in Python 3::

    try:
        ...
    except MyException, exc:    # Don't do that!
        ...

The syntax to reraise an exception with a different traceback also changed.
Use :func:`six.reraise`.
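
As an illustration, the sketch below raises a different exception type while
keeping the traceback of the original error. ``ConfigError`` and
``parse_port`` are hypothetical names, and the fallback definition of
``reraise`` is only an assumption so that the example also runs on Python 3
when six isn't installed::

    import sys

    try:
        from six import reraise
    except ImportError:
        # Assumed fallback for environments without six; Python 3 only.
        def reraise(tp, value, tb=None):
            raise value.with_traceback(tb)


    class ConfigError(Exception):
        """Hypothetical application-level error."""


    def parse_port(raw):
        try:
            return int(raw)
        except ValueError:
            # Raise ConfigError, but keep the traceback of the original failure.
            reraise(ConfigError, ConfigError("invalid port: %r" % raw),
                    sys.exc_info()[2])

Calling ``parse_port("http")`` raises ``ConfigError`` with a traceback that
still points at the failing ``int()`` call.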

Magic methods
-------------

Use the patterns below to handle magic methods renamed in Python 3.

Iterators
~~~~~~~~~

::

    class MyIterator(six.Iterator):
        def __iter__(self):
            return self             # implement some logic here

        def __next__(self):
            raise StopIteration     # implement some logic here
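
For a concrete instance of this pattern, here is a sketch that doesn't depend
on six. It aliases ``next`` to ``__next__`` by hand, which has roughly the
effect that inheriting from ``six.Iterator`` has on Python 2. ``Countdown`` is
a hypothetical class::

    class Countdown(object):
        """Hypothetical iterator that counts down from ``start`` to 1."""

        def __init__(self, start):
            self.current = start

        def __iter__(self):
            return self

        def __next__(self):
            if self.current <= 0:
                raise StopIteration
            value = self.current
            self.current -= 1
            return value

        next = __next__             # Python 2 compatibility


    remaining = list(Countdown(3))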

Boolean evaluation
~~~~~~~~~~~~~~~~~~

::

    class MyBoolean(object):

        def __bool__(self):
            return True             # implement some logic here

        def __nonzero__(self):      # Python 2 compatibility
            return type(self).__bool__(self)

Division
~~~~~~~~

::

    class MyDivisible(object):

        def __truediv__(self, other):
            return self / other     # implement some logic here

        def __div__(self, other):   # Python 2 compatibility
            return type(self).__truediv__(self, other)

        def __itruediv__(self, other):
            return self // other    # implement some logic here

        def __idiv__(self, other):  # Python 2 compatibility
            return type(self).__itruediv__(self, other)

Special methods are looked up on the class and not on the instance to reflect
the behavior of the Python interpreter.
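
A filled-in version of the division pattern might look like the sketch below.
``Ratio`` is a hypothetical class that wraps a float and is divided by plain
numbers::

    class Ratio(object):
        """Hypothetical value type wrapping a float."""

        def __init__(self, value):
            self.value = float(value)

        def __truediv__(self, other):
            return Ratio(self.value / other)

        def __div__(self, other):       # Python 2 compatibility
            return type(self).__truediv__(self, other)

        def __itruediv__(self, other):
            self.value /= other
            return self

        def __idiv__(self, other):      # Python 2 compatibility
            return type(self).__itruediv__(self, other)


    half = Ratio(3) / 2
    ratio = Ratio(8)
    ratio /= 4

Here the real logic operates on ``self.value`` rather than on ``self``, so the
operators don't call themselves recursively.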

.. module: django.utils.six

Writing compatible code with six
--------------------------------

six_ is the canonical compatibility library for supporting Python 2 and 3 in
a single codebase. Read its documentation!

A :mod:`customized version of six <django.utils.six>` is bundled with Django
as of version 1.4.2. You can import it as ``django.utils.six``.

Here are the most common changes required to write compatible code.

.. _string-handling-with-six:

String handling
~~~~~~~~~~~~~~~

The ``basestring`` and ``unicode`` types were removed in Python 3, and the
meaning of ``str`` changed. To test these types, use the following idioms::

    isinstance(myvalue, six.string_types)       # replacement for basestring
    isinstance(myvalue, six.text_type)          # replacement for unicode
    isinstance(myvalue, bytes)                  # replacement for str

Python ≥ 2.6 provides ``bytes`` as an alias for ``str``, so you don't need
:data:`six.binary_type`.
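
The sketch below shows these checks in a small helper. To keep it runnable
without six, ``string_types`` and ``text_type`` are defined locally as
stand-ins that mirror the six names; ``describe`` is a hypothetical
function::

    import sys

    if sys.version_info[0] == 2:
        string_types = (basestring,)    # noqa: only defined on Python 2
        text_type = unicode             # noqa: only defined on Python 2
    else:
        string_types = (str,)
        text_type = str


    def describe(value):
        """Classify a value as text, bytes, or something else."""
        if isinstance(value, text_type):
            return "text"
        if isinstance(value, bytes):
            return "bytes"
        return "other"


    kinds = [describe(b"abc"), describe(b"abc".decode("ascii")), describe(42)]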

``long``
~~~~~~~~

The ``long`` type no longer exists in Python 3. ``1L`` is a syntax error. Use
:data:`six.integer_types` to check if a value is an integer or a long::

    isinstance(myvalue, six.integer_types)      # replacement for (int, long)
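
A minimal usage sketch follows. ``is_integer`` is a hypothetical helper, and
the ``ImportError`` fallback is an assumption that only holds on Python 3,
where ``int`` is the sole integer type::

    try:
        from six import integer_types
    except ImportError:
        integer_types = (int,)      # assumed fallback, valid on Python 3 only


    def is_integer(value):
        """Return True for int (and for long on Python 2)."""
        return isinstance(value, integer_types)


    checks = [is_integer(7), is_integer(10 ** 30), is_integer(1.5)]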

``xrange``
~~~~~~~~~~

If you use ``xrange`` on Python 2, import ``six.moves.range`` and use that
instead. You can also import ``six.moves.xrange`` (it's equivalent to
``six.moves.range``) but the first technique allows you to simply drop the
import when dropping support for Python 2.
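
In practice this might look like the sketch below. The ``ImportError``
fallback is an assumption for Python 3 environments without six, where the
built-in ``range`` is already lazy::

    try:
        from six.moves import range    # lazy on Python 2 as well
    except ImportError:
        pass                            # Python 3 without six: use the builtin

    squares = [n * n for n in range(5)]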

Moved modules
~~~~~~~~~~~~~

Some modules were renamed in Python 3. The ``django.utils.six.moves``
module (based on the :mod:`six.moves module <six.moves>`) provides a
compatible location to import them.
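
For example, ``urlparse`` lives in different modules on Python 2 and
Python 3. The sketch below imports it from the standalone six package, with
an assumed fallback to the Python 3 standard library location when six isn't
installed; with Django's bundled copy, the import would start with
``django.utils.six.moves`` instead::

    try:
        from six.moves.urllib.parse import urlparse
    except ImportError:
        from urllib.parse import urlparse   # Python 3 standard library location

    parts = urlparse("http://example.com/docs/?page=2")
    host, path, query = parts.netloc, parts.path, parts.query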

PY2
~~~

If you need different code in Python 2 and Python 3, check :data:`six.PY2`::

    if six.PY2:
        # compatibility code for Python 2
        pass

This is a last-resort solution for when :mod:`six` doesn't provide an
appropriate function.

.. module:: django.utils.six

Django customized version of six
--------------------------------

The version of six bundled with Django (``django.utils.six``) includes a few
customizations for internal use only.

.. _unicode: https://docs.python.org/2/library/functions.html#unicode
.. _ __unicode__(): https://docs.python.org/2/reference/datamodel.html#object.__unicode__
.. _basestring: https://docs.python.org/2/library/functions.html#basestring