Commit 745e7be

Add documentation
1 parent 926170a commit 745e7be

3 files changed

+213
-59
lines changed

Doc/library/pickle.rst

Lines changed: 211 additions & 57 deletions
Original file line number | Diff line number | Diff line change
@@ -195,34 +195,29 @@ The :mod:`pickle` module provides the following constants:
195195
The :mod:`pickle` module provides the following functions to make the pickling
196196
process more convenient:
197197

198-
.. function:: dump(obj, file, protocol=None, \*, fix_imports=True)
198+
.. function:: dump(obj, file, protocol=None, \*, fix_imports=True, buffer_callback=None)
199199

200200
Write a pickled representation of *obj* to the open :term:`file object` *file*.
201201
This is equivalent to ``Pickler(file, protocol).dump(obj)``.
202202

203-
The optional *protocol* argument, an integer, tells the pickler to use
204-
the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
205-
If not specified, the default is :data:`DEFAULT_PROTOCOL`. If a negative
206-
number is specified, :data:`HIGHEST_PROTOCOL` is selected.
207-
208-
The *file* argument must have a write() method that accepts a single bytes
209-
argument. It can thus be an on-disk file opened for binary writing, an
210-
:class:`io.BytesIO` instance, or any other custom object that meets this
211-
interface.
203+
Arguments *file*, *protocol*, *fix_imports* and *buffer_callback* have
204+
the same meaning as in :class:`Pickler`.
212205

213-
If *fix_imports* is true and *protocol* is less than 3, pickle will try to
214-
map the new Python 3 names to the old module names used in Python 2, so
215-
that the pickle data stream is readable with Python 2.
206+
.. versionchanged:: 3.8
207+
The *buffer_callback* argument was added.
216208

217-
.. function:: dumps(obj, protocol=None, \*, fix_imports=True)
209+
.. function:: dumps(obj, protocol=None, \*, fix_imports=True, buffer_callback=None)
218210

219211
Return the pickled representation of the object as a :class:`bytes` object,
220212
instead of writing it to a file.
221213

222-
Arguments *protocol* and *fix_imports* have the same meaning as in
223-
:func:`dump`.
214+
Arguments *protocol*, *fix_imports* and *buffer_callback* have the same
215+
meaning as in :class:`Pickler`.
216+
217+
.. versionchanged:: 3.8
218+
The *buffer_callback* argument was added.
224219

225-
.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
220+
.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
226221

227222
Read a pickled object representation from the open :term:`file object`
228223
*file* and return the reconstituted object hierarchy specified therein.
@@ -232,24 +227,13 @@ process more convenient:
232227
protocol argument is needed. Bytes past the pickled object's
233228
representation are ignored.
234229

235-
The argument *file* must have two methods, a read() method that takes an
236-
integer argument, and a readline() method that requires no arguments. Both
237-
methods should return bytes. Thus *file* can be an on-disk file opened for
238-
binary reading, an :class:`io.BytesIO` object, or any other custom object
239-
that meets this interface.
240-
241-
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
242-
which are used to control compatibility support for pickle stream generated
243-
by Python 2. If *fix_imports* is true, pickle will try to map the old
244-
Python 2 names to the new names used in Python 3. The *encoding* and
245-
*errors* tell pickle how to decode 8-bit string instances pickled by Python
246-
2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
247-
be 'bytes' to read these 8-bit string instances as bytes objects.
248-
Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
249-
instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
250-
:class:`~datetime.time` pickled by Python 2.
230+
Arguments *file*, *fix_imports*, *encoding*, *errors* and *buffers*
231+
have the same meaning as in :class:`Unpickler`.
232+
233+
.. versionchanged:: 3.8
234+
The *buffers* argument was added.
251235

252-
.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")
236+
.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
253237

254238
Read a pickled object hierarchy from a :class:`bytes` object and return the
255239
reconstituted object hierarchy specified therein.
@@ -258,16 +242,11 @@ process more convenient:
258242
protocol argument is needed. Bytes past the pickled object's
259243
representation are ignored.
260244

261-
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
262-
which are used to control compatibility support for pickle stream generated
263-
by Python 2. If *fix_imports* is true, pickle will try to map the old
264-
Python 2 names to the new names used in Python 3. The *encoding* and
265-
*errors* tell pickle how to decode 8-bit string instances pickled by Python
266-
2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
267-
be 'bytes' to read these 8-bit string instances as bytes objects.
268-
Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
269-
instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
270-
:class:`~datetime.time` pickled by Python 2.
245+
Arguments *fix_imports*, *encoding*, *errors* and *buffers*
246+
have the same meaning as in :class:`Unpickler`.
247+
248+
.. versionchanged:: 3.8
249+
The *buffers* argument was added.
271250

272251

273252
The :mod:`pickle` module defines three exceptions:
@@ -295,10 +274,10 @@ The :mod:`pickle` module defines three exceptions:
295274
IndexError.
296275

297276

298-
The :mod:`pickle` module exports two classes, :class:`Pickler` and
299-
:class:`Unpickler`:
277+
The :mod:`pickle` module exports three classes, :class:`Pickler`,
278+
:class:`Unpickler` and :class:`PickleBuffer`:
300279

301-
.. class:: Pickler(file, protocol=None, \*, fix_imports=True)
280+
.. class:: Pickler(file, protocol=None, \*, fix_imports=True, buffer_callback=None)
302281

303282
This takes a binary file for writing a pickle data stream.
304283

@@ -316,6 +295,17 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
316295
map the new Python 3 names to the old module names used in Python 2, so
317296
that the pickle data stream is readable with Python 2.
318297

298+
If *buffer_callback* is None (the default), buffer views are
299+
serialized into *file* as part of the pickle stream.
300+
301+
If *buffer_callback* is not None, then it can be called any number
302+
of times with a buffer view. If the callback returns a false value
303+
(such as None), the given buffer is out-of-band; otherwise the
304+
buffer is serialized in-band, i.e. inside the pickle stream.
305+
306+
.. versionchanged:: 3.8
307+
The *buffer_callback* argument was added.
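
The effect of the callback's return value can be checked with a small experiment. The following is only a sketch, not part of the commit: the ``pickled_size`` helper and the 1024-byte payload are illustrative assumptions, and it compares the size of the pickle stream with no callback, with a callback returning a false value, and with a callback returning a true value.

```python
import io
import pickle

# Illustrative payload; PickleBuffer marks it as eligible for out-of-band transfer.
payload = bytearray(b"x" * 1024)
obj = {"name": "demo", "data": pickle.PickleBuffer(payload)}


def pickled_size(buffer_callback=None):
    """Hypothetical helper: pickle obj with protocol 5, return the stream size."""
    stream = io.BytesIO()
    pickle.Pickler(stream, protocol=5,
                   buffer_callback=buffer_callback).dump(obj)
    return len(stream.getvalue())


# No callback: the buffer contents are written into the pickle stream.
in_band_size = pickled_size()

# list.append returns None (a false value), so the buffer is kept out-of-band
# and only collected in the list.
out_of_band = []
oob_size = pickled_size(out_of_band.append)

# A callback returning a true value keeps the buffer in-band.
kept_size = pickled_size(lambda buf: True)

print(in_band_size, oob_size, kept_size)
```

Under these assumptions, only the second stream should be smaller than the payload itself, since it carries a marker instead of the data.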
308+
319309
.. method:: dump(obj)
320310

321311
Write a pickled representation of *obj* to the open file object given in
@@ -379,26 +369,43 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
379369
Use :func:`pickletools.optimize` if you need more compact pickles.
380370

381371

382-
.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
372+
.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
383373

384374
This takes a binary file for reading a pickle data stream.
385375

386376
The protocol version of the pickle is detected automatically, so no
387377
protocol argument is needed.
388378

389-
The argument *file* must have two methods, a read() method that takes an
390-
integer argument, and a readline() method that requires no arguments. Both
391-
methods should return bytes. Thus *file* can be an on-disk file object
379+
The argument *file* must have three methods, a read() method that takes an
380+
integer argument, a readinto() method that takes a buffer argument
381+
and a readline() method that requires no arguments, as in the
382+
:class:`io.BufferedIOBase` interface. Thus *file* can be an on-disk file
392383
opened for binary reading, an :class:`io.BytesIO` object, or any other
393384
custom object that meets this interface.
394385

395-
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
396-
which are used to control compatibility support for pickle stream generated
397-
by Python 2. If *fix_imports* is true, pickle will try to map the old
398-
Python 2 names to the new names used in Python 3. The *encoding* and
399-
*errors* tell pickle how to decode 8-bit string instances pickled by Python
400-
2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
386+
The optional arguments *fix_imports*, *encoding* and *errors* are used
387+
to control compatibility support for pickle streams generated by Python 2.
388+
If *fix_imports* is true, pickle will try to map the old Python 2 names
389+
to the new names used in Python 3. The *encoding* and *errors* tell
390+
pickle how to decode 8-bit string instances pickled by Python 2;
391+
these default to 'ASCII' and 'strict', respectively. The *encoding* can
401392
be 'bytes' to read these 8-bit string instances as bytes objects.
393+
Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
394+
instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
395+
:class:`~datetime.time` pickled by Python 2.
396+
397+
If *buffers* is None (the default), then all data necessary for
398+
deserialization must be contained in the pickle stream. This means
399+
that the *buffer_callback* argument was None when a :class:`Pickler`
400+
was instantiated (or when :func:`dump` or :func:`dumps` was called).
401+
402+
If *buffers* is not None, it should be an iterable of buffer-enabled
403+
objects that is consumed each time the pickle stream references
404+
an out-of-band buffer view. Such buffers have been given in order
405+
to the *buffer_callback* of a Pickler object.
406+
407+
.. versionchanged:: 3.8
408+
The *buffers* argument was added.
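
As a rough illustration of the *buffers* argument (a sketch with made-up variable names, not taken from the commit), the snippet below pickles one buffer out-of-band, then unpickles the stream once with the collected buffers and once without them.

```python
import io
import pickle

payload = bytearray(b"hello world")

# Collect the out-of-band buffer while pickling.
buffers = []
stream = pickle.dumps(pickle.PickleBuffer(payload), protocol=5,
                      buffer_callback=buffers.append)

# Supplying the buffers, in the same order, lets the stream be loaded.
restored = pickle.Unpickler(io.BytesIO(stream), buffers=buffers).load()
print(bytes(memoryview(restored)))

# Without *buffers*, a stream that references out-of-band data cannot be loaded.
missing_error = None
try:
    pickle.loads(stream)
except pickle.UnpicklingError as exc:
    missing_error = exc
print(type(missing_error).__name__)
```

The restored object is whatever the *buffers* iterable yields, so the example reads it through :class:`memoryview` rather than assuming a concrete type.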
402409

403410
.. method:: load()
404411

@@ -428,6 +435,34 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
428435
:ref:`pickle-restrict` for details.
429436

430437

438+
.. class:: PickleBuffer(buffer)
439+
440+
A wrapper for a potentially out-of-band buffer. *buffer* must be a
441+
:ref:`buffer-providing <bufferobjects>` object, such as a
442+
:term:`bytes-like object` or an N-dimensional array.
443+
444+
:class:`PickleBuffer` is itself a buffer provider, therefore it is
445+
possible to pass it to other APIs expecting a buffer-providing object,
446+
such as :class:`memoryview`.
447+
448+
:class:`PickleBuffer` objects can only be serialized using pickle
449+
protocol 5 or higher. They are eligible for
450+
:ref:`out-of-band serialization <pickle-oob>`.
451+
452+
.. versionadded:: 3.8
453+
454+
.. method:: raw()
455+
456+
Return a :class:`memoryview` of the memory area underlying this buffer.
457+
The returned object is a one-dimensional, C-contiguous memoryview
458+
with format ``B`` (unsigned bytes). :exc:`BufferError` is raised if
459+
the buffer is neither C- nor Fortran-contiguous.
460+
461+
.. method:: release()
462+
463+
Release the underlying buffer exposed by the PickleBuffer object.
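
A short sketch of ``raw()`` and ``release()`` in use follows. The variable names are illustrative, and the behaviour after release (a :exc:`ValueError` when requesting a new view) reflects the CPython implementation as understood here rather than anything stated in this diff.

```python
import pickle

data = bytearray(b"abcdef")
pb = pickle.PickleBuffer(data)

# raw() exposes the underlying memory as a flat view of unsigned bytes.
raw = pb.raw()
fmt, ndim = raw.format, raw.ndim

# The view shares memory with the original object, so writes are visible there.
raw[0] = ord("z")

# Release the view first, then the PickleBuffer itself.
raw.release()
pb.release()

# Assumption: a released PickleBuffer refuses to export its buffer again.
try:
    memoryview(pb)
    released_error = None
except ValueError as exc:
    released_error = exc

print(fmt, ndim, data, type(released_error).__name__)
```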
464+
465+
431466
.. _pickle-picklable:
432467

433468
What can be pickled and unpickled?
@@ -863,6 +898,125 @@ a given class::
863898
assert unpickled_class.my_attribute == 1
864899

865900

901+
.. _pickle-oob:
902+
903+
Out-of-band Buffers
904+
-------------------
905+
906+
.. versionadded:: 3.8
907+
908+
In some contexts, the :mod:`pickle` module is used to transfer massive amounts
909+
of data. Therefore, it can be important to minimize the number of memory
910+
copies, to preserve performance and limit resource consumption. However, normal
911+
operation of the :mod:`pickle` module, as it transforms a graph-like structure
912+
of objects into a sequential stream of bytes, intrinsically involves copying
913+
data to and from the pickle stream.
914+
915+
This constraint can be eschewed if both the *provider* (the implementation
916+
of the object types to be transferred) and the *consumer* (the implementation
917+
of the communications system) support the out-of-band transfer facilities
918+
provided by pickle protocol 5 and higher.
919+
920+
Provider API
921+
^^^^^^^^^^^^
922+
923+
The large data objects to be pickled must implement a :meth:`__reduce_ex__`
924+
method specialized for protocol 5 and higher, which returns a
925+
:class:`PickleBuffer` instance (instead of e.g. a :class:`bytes` object)
926+
for any large data.
927+
928+
A :class:`PickleBuffer` object *signals* that the underlying buffer is
929+
eligible for out-of-band data transfer. Those objects remain compatible
930+
with normal usage of the :mod:`pickle` module. However, consumers can also
931+
opt in to tell :mod:`pickle` that they will handle those buffers by
932+
themselves.
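
A provider does not have to subclass a buffer type. The sketch below uses a hypothetical ``Blob`` container (not part of this commit) whose :meth:`__reduce_ex__` returns a :class:`PickleBuffer` for protocol 5 and falls back to a :class:`bytes` copy for older protocols. It assumes the class lives at module level so that pickle can find it by name.

```python
import pickle


class Blob:
    """Hypothetical container around any buffer-providing object."""

    def __init__(self, data):
        self.data = data

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Signal that the data is eligible for out-of-band transfer.
            return Blob, (pickle.PickleBuffer(self.data),)
        # Older protocols cannot serialize PickleBuffer: copy into bytes.
        return Blob, (bytes(self.data),)


blob = Blob(bytearray(b"payload"))

# Normal (in-band) usage keeps working with either protocol.
copy_v5 = pickle.loads(pickle.dumps(blob, protocol=5))
copy_v4 = pickle.loads(pickle.dumps(blob, protocol=4))

print(bytes(copy_v5.data), bytes(copy_v4.data))
```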
933+
934+
Consumer API
935+
^^^^^^^^^^^^
936+
937+
A communications system can enable custom handling of the :class:`PickleBuffer`
938+
objects generated when serializing an object graph.
939+
940+
On the sending side, it needs to pass a *buffer_callback* argument to
941+
:class:`Pickler` (or to the :func:`dump` or :func:`dumps` function), which
942+
will be called with each :class:`PickleBuffer` generated while pickling
943+
the object graph. Buffers accumulated by the *buffer_callback* will not
944+
see their data copied into the pickle stream; only a cheap marker will be
945+
inserted.
946+
947+
On the receiving side, it needs to pass a *buffers* argument to
948+
:class:`Unpickler` (or to the :func:`load` or :func:`loads` function),
949+
which is an iterable of the buffers which were passed to *buffer_callback*.
950+
That iterable should produce buffers in the same order as they were passed
951+
to *buffer_callback*. Those buffers will provide the data expected by the
952+
reconstructors of the objects whose pickling produced the original
953+
:class:`PickleBuffer` objects.
954+
955+
Between the sending side and the receiving side, the communications system
956+
is free to implement its own transfer mechanisms for out-of-band buffers.
957+
Potential optimizations include the use of shared memory or datatype-dependent
958+
compression.
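
The two sides can be sketched as a pair of functions. This is a toy model under stated assumptions: ``send`` and ``receive`` are hypothetical names, and the "transport" simply copies each buffer into a :class:`bytes` frame, which a real system would replace with shared memory, a socket write, or similar.

```python
import pickle


def send(obj):
    """Hypothetical sender: return the pickle stream and out-of-band frames."""
    buffers = []
    stream = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    # Stand-in for a real transfer mechanism: copy each buffer to bytes.
    frames = [bytes(buf.raw()) for buf in buffers]
    return stream, frames


def receive(stream, frames):
    """Hypothetical receiver: frames must arrive in the order they were sent."""
    return pickle.loads(stream, buffers=frames)


message = {"id": 7, "blob": pickle.PickleBuffer(bytearray(b"\x00" * 4096))}
stream, frames = send(message)
result = receive(stream, frames)

# The stream holds only a marker for the blob, not its 4096 bytes.
print(len(stream), len(frames), len(memoryview(result["blob"])))
```

Keeping the frames separate from the stream is what lets the communications layer choose how each large buffer travels.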
959+
960+
Example
961+
^^^^^^^
962+
963+
Here is a trivial example where we implement a :class:`bytearray` subclass
964+
able to participate in out-of-band buffer pickling::
965+
966+
class ZeroCopyByteArray(bytearray):
967+
968+
    def __reduce_ex__(self, protocol):
969+
        if protocol >= 5:
970+
            return type(self)._reconstruct, (PickleBuffer(self),), None
971+
        else:
972+
            # PickleBuffer is forbidden with pickle protocols <= 4.
973+
            return type(self)._reconstruct, (bytearray(self),)
974+
975+
    @classmethod
976+
    def _reconstruct(cls, obj):
977+
        with memoryview(obj) as m:
978+
            # Get a handle over the original buffer object
979+
            obj = m.obj
980+
            if type(obj) is cls:
981+
                # Original buffer object is a ZeroCopyByteArray, return it
982+
                # as-is.
983+
                return obj
984+
            else:
985+
                return cls(obj)
986+
987+
We see that the reconstructor (the ``_reconstruct`` class method) returns
988+
the buffer's providing object if it has the right type. This is an easy way
989+
to simulate zero-copy behaviour on this toy example.
990+
991+
On the consumer side, we can pickle those objects the usual way, which
992+
when unserialized will give us a copy of the original object::
993+
994+
b = ZeroCopyByteArray(b"abc")
995+
data = pickle.dumps(b, protocol=5)
996+
new_b = pickle.loads(data)
997+
print(b == new_b) # True
998+
print(b is new_b) # False: a copy was made
999+
1000+
But if we pass a *buffer_callback* and then give back the accumulated
1001+
buffers when unserializing, we are able to get back the original object::
1002+
1003+
b = ZeroCopyByteArray(b"abc")
1004+
buffers = []
1005+
data = pickle.dumps(b, protocol=5, buffer_callback=buffers.append)
1006+
new_b = pickle.loads(data, buffers=buffers)
1007+
print(b == new_b) # True
1008+
print(b is new_b) # True: no copy was made
1009+
1010+
This example is limited by the fact that :class:`bytearray` allocates its
1011+
own memory: you cannot create a :class:`bytearray` instance that is backed
1012+
by another object's memory. However, third-party datatypes such as NumPy
1013+
arrays do not have this limitation, and allow use of zero-copy pickling
1014+
(or making as few copies as possible) when transferring between distinct
1015+
processes or systems.
1016+
1017+
.. seealso:: :pep:`574` -- Pickle protocol 5 with out-of-band data
1018+
1019+
8661020
.. _pickle-restrict:
8671021

8681022
Restricting Globals

Lib/test/test_pickle.py

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -271,7 +271,7 @@ class SizeofTests(unittest.TestCase):
271271
check_sizeof = support.check_sizeof
272272

273273
def test_pickler(self):
274-
basesize = support.calcobjsize('6P2n3i2n3i2P')
274+
basesize = support.calcobjsize('7P2n3i2n3i2P')
275275
p = _pickle.Pickler(io.BytesIO())
276276
self.assertEqual(object.__sizeof__(p), basesize)
277277
MT_size = struct.calcsize('3nP0n')

Objects/picklebufobject.c

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -206,7 +206,7 @@ static PyMethodDef picklebuf_methods[] = {
206206
PyTypeObject PyPickleBuffer_Type = {
207207
PyVarObject_HEAD_INIT(NULL, 0)
208208
.tp_name = "pickle.PickleBuffer",
209-
.tp_doc = "Out-of-band buffer",
209+
.tp_doc = "Wrapper for potentially out-of-band buffers",
210210
.tp_basicsize = sizeof(PyPickleBufferObject),
211211
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,
212212
.tp_new = picklebuf_new,
