buffer and memoryview

Sun 10 February 2013 | tags: python, buffer, memoryview

Overview

buffer and memoryview are built-in types that give direct access to an object's byte-oriented data without copying it first.

In Python 2.7 and earlier, calling buffer directly on any native type yields a read-only buffer; writable buffers are available only through third-party modules such as numpy. memoryview (2.7 and later) exposes a writable buffer for mutable types and a read-only buffer for immutable types. In either case, the given object must implement, in C, the corresponding buffer interface for buffer, for memoryview, or for both (which is possible in 2.7, as we'll see below).
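As a minimal sketch of the writable versus read-only distinction, the snippet below checks the readonly attribute of two views. It assumes Python 3, and the variable names are only illustrative:

```python
# Python 3 sketch: a memoryview mirrors the mutability of the object it wraps.
mutable_view = memoryview(bytearray(b"foo"))   # bytearray is a mutable type
immutable_view = memoryview(b"foo")            # bytes is an immutable type

print(mutable_view.readonly)     # False
print(immutable_view.readonly)   # True
```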

Usage

buffer and memoryview are worth a look if you're slicing a large object that implements one of the two required APIs and you want to avoid copying. To illustrate the benefit of using buffer or memoryview over direct slices:

In [1]: from pympler.asizeof import asizeof
In [2]: ba = bytearray(1 for _ in range(100000))

In [3]: asizeof(ba)
Out[3]: 103104

In [4]: asizeof(ba[:-1])
Out[4]: 100048

In [5]: asizeof(memoryview(ba)[:-1])
Out[5]: 176

What this allows us to do is access a subset of a large object without copying it first. If you were unaware, slicing a sequence such as a str or bytearray (my_val[a:b]) creates a new object and copies the selected data into it.
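One way to see the difference between a copy and a view is to mutate the original after slicing. This is a small Python 3 sketch with made-up data, not a benchmark:

```python
data = bytearray(b"hello world")

copied = data[:5]              # slicing a bytearray allocates a new bytearray
view = memoryview(data)[:5]    # slicing a memoryview references the same memory

data[0] = ord("J")             # mutate the original in place

print(bytes(copied))   # b'hello' (the copy is unaffected)
print(bytes(view))     # b'Jello' (the view sees the change)
```

The view costs a small fixed amount of memory regardless of how many bytes it spans, which is what the asizeof figures above reflect.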

buffer and memoryview data can be accessed through slicing (memoryview requires an additional step):

# 2.7
In [1]: buffer('foo')[0:1]
Out[1]: 'f'
In [2]: memoryview('foo')[0:1].tobytes()
Out[2]: 'f'

# 3.4
In [1]: memoryview(bytes('foo', 'ascii'))[:1].tobytes()
Out[1]: b'f'

Writable buffers can be obtained through numpy:

# 2.7
In [13]: import numpy as np
In [14]: np.getbuffer('foo')[0:1]
Out[14]: 'f'

Any way you slice it (sadly, pun intended), writable buffers can only be obtained for objects that are defined as mutable. In other words, the following is illegal as a string is an immutable type:

# 2.7 with numpy (illegal)
In [16]: np.getbuffer('foo')[0] = 'b'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-6f1ae079dbe3> in <module>()
----> 1 np.getbuffer('foo')[0] = 'b'

TypeError: buffer is read-only

# 2.7 memoryview (illegal)
In [40]: s = 'foo'
In [41]: memoryview(s)[0] = 'b'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-41-f1f0e31eb18c> in <module>()
----> 1 memoryview(s)[0] = 'b'

TypeError: cannot modify read-only memory
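For contrast, here is a sketch of the legal case in Python 3, where the underlying object is a mutable bytearray. The names buf and view are only illustrative:

```python
buf = bytearray(b"foo")
view = memoryview(buf)

view[0] = ord("b")     # in Python 3, single elements are integers
view[1:3] = b"ar"      # slice assignment must keep the length unchanged

print(buf)   # bytearray(b'bar')

# The same assignment on an immutable bytes object is rejected.
try:
    memoryview(b"foo")[0] = ord("b")
    rejected = False
except TypeError:
    rejected = True
print(rejected)   # True
```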

Historical Context

buffer has been available since Python 1.6.

PEP 3118 was introduced in 2006, fixing perceived problems with the old buffer API.

PEP 3137, authored by GvR in 2007, advocates removing the old buffer object because memoryview makes it redundant:

"The old type named buffer is so similar to the new type memoryview, introduce by PEP 3118, that it is redundant. The rest of this PEP doesn't discuss the functionality of memoryview; it is just mentioned here to justify getting rid of the old buffer type. (An earlier version of this PEP proposed buffer as the new name for PyBytes; in the end this name was deemed to confusing given the many other uses of the word buffer.)"

memoryview was implemented for Python 3.0 and buffer was removed.

The memoryview object was backported to 2.7 and the memoryview C API was backported to 2.6. That's where some of the ugliness begins (and, thankfully, ends).

Ugliness of 2.7

Both buffer and memoryview (objects and API) are present and usable in 2.7. This violates PEP 20:

"There should be one-- and preferably only one --obvious way to do it."

That leaves two objects, admittedly with different semantics, that offer nearly identical functionality. Confused? Given the remark in PEP 3137 citing redundancy and recommending the removal of buffer in 3.0, it seems that others were as well. Let's try to clear this up for 2.7:

The following table displays native types that implement the buffer and/or memoryview interface (y = yes, n = no) and what will be exposed by numpy.getbuffer (r = read-only, r+w = readable and writable):

Object        buffer  memoryview  numpy.getbuffer
mmap.mmap     y       n           r+w
array.array   y       n           r+w
bytearray     y       y           r+w
str           y       y           r
unicode       y       n           r

memoryview in 3.4

The following objects implement the memoryview buffer interface:

Object        Mutability
mmap.mmap     r+w
array.array   r+w
bytearray     r+w
bytes         r
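The two less common entries in the table can be exercised with a short Python 3 sketch. It assumes an anonymous 16-byte mapping (fileno -1) rather than a real file, and it uses memoryview as a context manager so that each view is released before the mapping is closed:

```python
import array
import mmap

# array.array: the view keeps the array's item format.
numbers = array.array("i", [1, 2, 3])
with memoryview(numbers) as view:
    view[0] = 10                  # writes through to the array
    item_format = view.format     # 'i'

# mmap.mmap: anonymous, zero-filled mapping used purely for illustration.
mapping = mmap.mmap(-1, 16)
with memoryview(mapping) as view:
    view[0:4] = b"abcd"           # writes through to the mapping
head = mapping[:4]
mapping.close()

print(numbers[0], item_format, head)   # 10 i b'abcd'
```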

Performance Cost

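For practical purposes the performance difference is minimal, but it is measurable: in 2.7, taking a small slice of a memoryview is about 1.6x slower than slicing the str directly, whereas in 3.4 the two are much closer (roughly 16% apart). The timings below are total seconds for timeit's default of 1,000,000 runs: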

# 2.7
In [2]: from timeit import timeit

In [3]: timeit('mv[0:4]', setup='mv = memoryview("s" * 100000)')
Out[3]: 0.14322400093078613

In [4]: timeit('s[0:4]', setup='s = "s" * 100000')
Out[4]: 0.09086894989013672

# 3.4
In [5]: timeit('mv[0:4]', setup='mv = memoryview(bytes("s" * 100000, "ascii"))')
Out[5]: 0.13585162698291242

In [6]: timeit('s[0:4]', setup='s = bytes("s" * 100000, "ascii")')
Out[6]: 0.11710237199440598
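The timings above use a four-byte slice, where the copy itself is cheap. As a rough complement, the following Python 3 sketch times a slice that covers almost the whole object. The payload size and repeat count are arbitrary choices for illustration, and the numbers will vary by machine, so no expected output is shown:

```python
from timeit import timeit

SIZE = 1000000   # hypothetical payload size, chosen only for illustration
setup = "data = bytes({0}); mv = memoryview(data)".format(SIZE)

# Slice off all but the last byte: bytes copies the data, memoryview does not.
copy_time = timeit("data[:-1]", setup=setup, number=1000)
view_time = timeit("mv[:-1]", setup=setup, number=1000)

print("bytes slice:      {0:.6f}s".format(copy_time))
print("memoryview slice: {0:.6f}s".format(view_time))
```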

Conclusion

If you're doing memory-intensive processing in Python and you're not using buffer or memoryview, you may want to. If you're on 2.7, the recommended path is to use memoryview, as it's forward-compatible with 3. If you have compatibility issues (e.g. with array.array), use buffer for those and memoryview for others.
