.. include:: header.txt

==============
 Introduction
==============

Threads, processes and the GIL
==============================

To run more than one piece of code at the same time on the same
computer, one can use either multiple processes or multiple threads.

Although a program can be made up of multiple processes, these
processes are in effect completely independent of one another:
different processes cannot cooperate unless some means of
communication is set up between them (such as sockets or pipes).  If a
lot of data must be transferred between processes, this can be
inefficient, because the data has to be copied from one process to the
other.
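
As a rough illustration, the sketch below starts a second Python
interpreter and talks to it through its standard input and output
pipes (pipes stand in for sockets here purely for brevity).  Nothing is
shared between the two processes, so the list has to be serialized,
here as JSON, copied into the child, and the answer copied back.  The
helper name ``sum_in_child`` is hypothetical::

   import json
   import subprocess
   import sys

   # Program run by the child interpreter: read a JSON list from stdin
   # and write its sum back to stdout as JSON.
   CHILD_CODE = ("import json, sys; "
                 "json.dump(sum(json.load(sys.stdin)), sys.stdout)")

   def sum_in_child(numbers):
       # The child shares no memory with this process, so the data is
       # serialized, copied through a pipe, and the result copied back.
       proc = subprocess.run(
           [sys.executable, "-c", CHILD_CODE],
           input=json.dumps(numbers),
           capture_output=True,
           text=True,
           check=True,
       )
       return json.loads(proc.stdout)

   print(sum_in_child(list(range(1000))))   # 499500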

On the other hand, multiple threads within a single process are
intimately connected: they share their data, but they can interfere
badly with one another if their access to that data is not
coordinated.
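
A minimal sketch of both halves of that statement, using the standard
``threading`` module: four threads update one shared counter directly,
and a ``threading.Lock`` keeps their read-modify-write steps from
interleaving and losing updates.  The names ``counter`` and
``add_many`` are illustrative only::

   import threading

   counter = 0                      # one variable, seen by every thread
   counter_lock = threading.Lock()

   def add_many(n):
       global counter
       for _ in range(n):
           # "counter += 1" reads, adds and writes back; holding the lock
           # stops another thread from running in between those steps.
           with counter_lock:
               counter += 1

   threads = [threading.Thread(target=add_many, args=(10000,))
              for _ in range(4)]
   for t in threads:
       t.start()
   for t in threads:
       t.join()

   print(counter)   # 40000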

CPython has a Global Interpreter Lock (GIL), which in many ways makes
threading easier than it is in most languages by making sure that only
one thread can manipulate the interpreter's Python objects at a time.
As a result, it is often safe to let multiple threads access shared
variables without the explicit locking that a language such as C would
require.  Compound operations such as ``x += 1`` are not atomic,
however, so they still need a lock when several threads perform them on
the same variable.
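
For example, in CPython a single ``list.append()`` call cannot be
interrupted half-way by another thread, so the sketch below lets four
threads append to one list without any explicit lock.  This relies on
the behaviour of CPython's GIL rather than on a guarantee of the Python
language itself, and the function name ``worker`` is illustrative::

   import threading

   results = []   # shared list; no lock is used below

   def worker(start):
       for i in range(start, start + 1000):
           # Each append is a single C-level operation performed while the
           # thread holds the GIL, so concurrent appends are not lost.
           results.append(i)

   threads = [threading.Thread(target=worker, args=(k * 1000,))
              for k in range(4)]
   for t in threads:
       t.start()
   for t in threads:
       t.join()

   print(len(results))   # 4000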

One downside of the GIL is that on multi-processor (or multi-core)
systems a Python program can only execute Python code on one processor
at a time, leaving the other(s) idle.  This problem can be overcome by
using multiple processes, since each process has its own interpreter
and its own GIL.
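
The sketch below splits a CPU-bound sum of squares across several
processes, each with its own interpreter and its own GIL, so the pieces
can run on different cores at the same time.  It uses the
standard-library ``multiprocessing`` module (which grew out of
``processing``) and assumes a POSIX system where the ``'fork'`` start
method is available; the function names are hypothetical::

   import multiprocessing

   def sum_of_squares(lo, hi, out):
       # Runs in a child process with its own interpreter and GIL.
       out.put(sum(i * i for i in range(lo, hi)))

   def parallel_sum_of_squares(n, workers=4):
       ctx = multiprocessing.get_context("fork")   # POSIX-only assumption
       out = ctx.Queue()
       step = max(1, -(-n // workers))             # ceiling division
       procs = [ctx.Process(target=sum_of_squares,
                            args=(lo, min(lo + step, n), out))
                for lo in range(0, n, step)]
       for p in procs:
           p.start()
       # Collect one partial sum per process before joining them.
       total = sum(out.get() for _ in procs)
       for p in procs:
           p.join()
       return total

   print(parallel_sum_of_squares(100000))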

Python gives little direct support for writing programs that use
multiple processes.  The ``processing`` package allows one to write
multi-process programs using much the same API that one uses for
writing threaded programs.


Forking and spawning
====================

There are two ways of creating a new process in Python:

* The current process can *fork* a new child process by using the
  ``os.fork()`` function.  This effectively creates an identical copy
  of the current process, which can then go off and perform some task
  set by the parent process.  As a result, the child process inherits
  *copies* of all the variables that the parent process had.

  However, ``os.fork()`` is not available on every platform: in
  particular Windows does not support it.

* The current process can spawn a completely new Python interpreter by
  using ``os.spawnv()`` or some similar function.

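  Getting this new interpreter into a fit state to perform the task
  set for it by its parent process is, however, a bit of a challenge:
  it shares nothing with the parent, so it must be told what code to
  run and be sent any data it needs.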
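
The copying behaviour of ``os.fork()`` can be seen in a few lines.  In
this POSIX-only sketch (the function name ``fork_demo`` is
hypothetical) the child changes its own copy of a variable and reports
the new value back through its exit status, while the parent's
variable is unaffected::

   import os

   def fork_demo():
       value = 10
       pid = os.fork()              # not available on Windows
       if pid == 0:
           # Child: this modifies the child's private copy only.
           value += 1
           # Pass the result back as the exit status and leave at once,
           # without running any of the parent's cleanup code.
           os._exit(value)
       # Parent: wait for the child and decode its exit status.
       _, status = os.waitpid(pid, 0)
       return value, os.WEXITSTATUS(status)

   print(fork_demo())   # (10, 11)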

The ``processing`` package uses ``os.fork()`` if it is available,
since it makes life a lot simpler.  Forking is also more efficient,
both in memory usage and in the time needed to create the new process.
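
The same policy can be expressed with the standard-library
``multiprocessing`` module, which reports the start methods the
platform supports.  A sketch, with the hypothetical helper name
``preferred_context``, that prefers forking and otherwise falls back to
spawning a fresh interpreter::

   import multiprocessing

   def preferred_context():
       # 'fork' is offered only on POSIX systems; 'spawn' is available
       # everywhere, including Windows.
       methods = multiprocessing.get_all_start_methods()
       method = "fork" if "fork" in methods else "spawn"
       return multiprocessing.get_context(method)

   ctx = preferred_context()
   print(ctx.get_start_method())   # fork on Linux, spawn on Windows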


Example
=======

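To demonstrate how similar multi-threaded and multi-process programs
can be, we give a simple program using the ``threading`` module,
followed by what it looks like after being converted to use the
``processing`` package.  In the comments, ``# !`` marks a line that
differs between the two versions, ``# -`` a line that is removed and
``# +`` a line that is added.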

*Using threads*::

   from threading import Thread                         # !
   from Queue import Queue                              # -

   def f(q):
       for i in range(10):
           q.put(i*i)
       q.put('STOP')

   if __name__ == '__main__':
       queue = Queue(maxsize=3)                         # !

       t = Thread(target=f, args=[queue])
       t.start()

       result = None
       while result != 'STOP':
           result = queue.get()
           print result

       t.join()

*Using processes*::

   from processing import Process as Thread, Manager    # !

   def f(q):
       for i in range(10):
           q.put(i*i)
       q.put('STOP')

   if __name__ == '__main__':
       manager = Manager()                              # +
       queue = manager.Queue(maxsize=3)                 # !

       t = Thread(target=f, args=[queue])
       t.start()

       result = None
       while result != 'STOP':
           result = queue.get()
           print result

       t.join()

Not every threaded program can be converted to use processes so
trivially.  See `Programming guidelines
<programming-guidelines.html>`_ for details of the restrictions and
the idioms to follow.

More examples can be seen in `test_processing.py
<../test/test_processing.py>`_.
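
In current versions of Python the ``processing`` package lives on as
the standard-library ``multiprocessing`` module.  The following is a
sketch of the same program written against ``multiprocessing``; it
assumes a Linux-like system where the ``'fork'`` start method is
available (elsewhere one would use the default context and rely on the
``__main__`` guard), and it collects the results in a list instead of
printing them as they arrive::

   import multiprocessing

   def f(q):
       for i in range(10):
           q.put(i * i)
       q.put('STOP')

   def main():
       ctx = multiprocessing.get_context('fork')   # POSIX-only assumption
       with ctx.Manager() as manager:              # shuts the manager down on exit
           queue = manager.Queue(maxsize=3)
           p = ctx.Process(target=f, args=[queue])
           p.start()

           results = []
           result = queue.get()
           while result != 'STOP':
               results.append(result)
               result = queue.get()

           p.join()
       return results

   if __name__ == '__main__':
       print(main())   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]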

.. admonition:: Switching between processes and threads

   If a program is written using ``processing`` then it can be made to
   use threads instead of processes by using ``processing.dummy`` in
   place of ``processing``.  So one would, for instance, change the
   import line ::

       from processing import ...

   to ::

       from processing.dummy import ...
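
The standard library offers the same switch: ``multiprocessing.dummy``
replicates the ``multiprocessing`` API on top of threads.  A small
sketch using a pool (the helper name ``word_lengths`` is hypothetical);
changing the import to ``from multiprocessing import Pool`` would run
the same calls in worker processes, subject to the usual ``__main__``
guard on platforms that spawn rather than fork::

   from multiprocessing.dummy import Pool   # a pool of worker *threads*

   def word_lengths(words):
       # Pool.map hands the calls out to the workers and returns the
       # results in the same order as the input.
       with Pool(4) as pool:
           return pool.map(len, words)

   print(word_lengths(["fork", "spawn", "thread"]))   # [4, 5, 6]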


Speed
=====

As one would expect, communication between processes is slower than
communication between threads.  When the C extensions are used,
Windows (which uses named pipes) and Linux (which uses Unix domain
sockets) are roughly equal in speed.

For example, on a 2.5 GHz Pentium 4, a value can be retrieved from a
shared dictionary 18,000-20,000 times/sec, compared to around 5 million
times/sec for a normal dictionary.
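
Figures like these can be reproduced roughly on one's own machine.  The
sketch below uses ``multiprocessing`` (the standard-library successor
of ``processing``), with a manager-backed dictionary standing in for
the shared dictionary, and assumes the ``'fork'`` start method; the
absolute numbers will depend heavily on the hardware and Python
version, and ``lookup_rates`` is a hypothetical name::

   import multiprocessing
   import timeit

   def lookup_rates(n=2000):
       plain = {"key": 42}
       ctx = multiprocessing.get_context("fork")   # POSIX-only assumption
       with ctx.Manager() as manager:
           # Each lookup on the proxy is a round trip to the manager process.
           shared = manager.dict(plain)
           shared_secs = timeit.timeit(lambda: shared["key"], number=n)
       plain_secs = timeit.timeit(lambda: plain["key"], number=n)
       return n / plain_secs, n / shared_secs     # lookups per second

   plain_rate, shared_rate = lookup_rates()
   print("plain dict:  %.0f lookups/sec" % plain_rate)
   print("shared dict: %.0f lookups/sec" % shared_rate)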

With one process sending objects to another process through a shared
queue, I get around 7,500 objects/sec.  Doing the same with a normal
``Queue`` and threads, I get 18,000-50,000 objects/sec.  (For some
reason, on Linux it is usually around 18,000 but sometimes around
45,000, whereas on Windows it is consistently 40,000-50,000.)
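
A comparable measurement for queues might look like the following
sketch, which times the same producer once in a thread with
``queue.Queue`` and once in a child process with a ``multiprocessing``
queue (again assuming the ``'fork'`` start method).  The timing
includes starting the worker, so it somewhat understates the
steady-state rate, and the results will vary from machine to machine;
the function names are hypothetical::

   import multiprocessing
   import queue
   import threading
   import time

   STOP = None   # sentinel marking the end of the stream

   def producer(q, n):
       for i in range(n):
           q.put(i)
       q.put(STOP)

   def objects_per_second(worker_cls, q, n=5000):
       start = time.perf_counter()
       worker = worker_cls(target=producer, args=(q, n))
       worker.start()
       received = 0
       while q.get() is not STOP:   # consume in the calling thread
           received += 1
       worker.join()
       elapsed = time.perf_counter() - start
       return received / elapsed

   ctx = multiprocessing.get_context("fork")   # POSIX-only assumption
   thread_rate = objects_per_second(threading.Thread, queue.Queue())
   process_rate = objects_per_second(ctx.Process, ctx.Queue())
   print("threads:   %.0f objects/sec" % thread_rate)
   print("processes: %.0f objects/sec" % process_rate)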

For communication between two processes, one can use ``Listener`` and
``Client`` from the `connection <connection-ref.html>`_ sub-package as
a much faster and simpler alternative to a ``Queue``.
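
In the standard library these classes live on as
``multiprocessing.connection.Listener`` and
``multiprocessing.connection.Client``.  The sketch below has a child
process connect to a listener on the loopback interface and send a few
objects; the authentication key, the port choice (0, meaning any free
port), the helper names and the ``'fork'`` start method are all
assumptions made for the example::

   import multiprocessing
   from multiprocessing.connection import Client, Listener

   AUTHKEY = b"example-key"   # hypothetical shared secret

   def client(address):
       # Runs in the child: connect, send some objects, then a sentinel.
       with Client(address, authkey=AUTHKEY) as conn:
           for i in range(5):
               conn.send(i * i)
           conn.send("STOP")

   def serve_once():
       ctx = multiprocessing.get_context("fork")   # POSIX-only assumption
       # Port 0 asks the OS for any free port; listener.address reports it.
       with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
           p = ctx.Process(target=client, args=(listener.address,))
           p.start()
           received = []
           with listener.accept() as conn:
               msg = conn.recv()
               while msg != "STOP":
                   received.append(msg)
                   msg = conn.recv()
           p.join()
       return received

   print(serve_once())   # [0, 1, 4, 9, 16]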

.. _Prev: index.html
.. _Up: index.html
.. _Next: processing-ref.html

