In this post, I introduce my favorite way to run cells in a Jupyter notebook in parallel.
1. Initialize the cluster from the command line, or from Python with `subprocess.Popen` (in the example below, I create a cluster with 2 workers):

        from subprocess import Popen

        p = Popen(['ipcluster', 'start', '-n', '2'])

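The snippet above starts the cluster but leaves shutting it down to you. Here is a minimal sketch of one way to stop a process launched with `Popen`. `stop_process` is a hypothetical helper name, and the demo uses a sleeping Python process as a stand-in so that it runs without a cluster. I am assuming that `ipcluster` shuts down its workers when it receives the terminate signal; running `ipcluster stop` from a shell is the other option.

```python
import subprocess
import sys


def stop_process(p, timeout=10):
    """Ask a Popen process to exit; kill it if it does not exit in time.

    Hypothetical helper, not part of ipyparallel.
    """
    if p.poll() is None:          # still running
        p.terminate()             # polite request (SIGTERM on POSIX)
        try:
            p.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            p.kill()              # force it if the polite request is ignored
            p.wait()
    return p.returncode


# Stand-in for Popen(['ipcluster', 'start', '-n', '2']).
demo = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])
stop_process(demo)
print(demo.poll() is not None)
```

Calling `stop_process(p)` on the handle from step 1 at the end of a session would follow the same pattern.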
2. Then set up the client programmatically. I usually make each worker non-blocking so that I can keep experimenting in other cells even while the workers are running heavy tasks:
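
        from ipyparallel import Client

        rc = Client()
        dview = rc[:]  # view of all workers
        print("%d workers are running" % len(rc))

        # Worker 0: non-blocking, magics registered with suffix 0 (%%px0, %pxresult0)
        e0 = rc[0]
        e0.block = False
        e0.activate('0')

        # Worker 1: same setup with suffix 1
        e1 = rc[1]
        e1.block = False
        e1.activate('1')
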
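With more than two workers, the per-worker setup above gets repetitive. A minimal sketch of the same steps in a loop follows. `activate_all` is a hypothetical helper name; it relies only on the calls already used above (`rc[i]`, `.block`, `.activate`) plus `rc.ids`, the client's list of engine ids.

```python
def activate_all(rc):
    """Make every worker non-blocking and register its numbered magics.

    `rc` is expected to be an ipyparallel Client. Hypothetical helper.
    """
    views = {}
    for engine_id in rc.ids:
        view = rc[engine_id]             # view of a single worker
        view.block = False               # cells sent to it return immediately
        view.activate(str(engine_id))    # same call as e0.activate('0') above
        views[engine_id] = view
    return views


# Usage in the notebook, once the cluster is up:
#   from ipyparallel import Client
#   views = activate_all(Client())
```

Returning the views in a dict keyed by engine id keeps a handle on each worker, like `e0` and `e1` above.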
3. Then, to run a cell on a worker, put the cell magic `%%px0` on the first line of the cell. The `0` designates the first worker; you can replace it with the number of any worker you activated in step 2. Here is an example that runs on the second worker:

        %%px1
        # An example of asynchronous parallelism
        # (assumes `df` has already been pushed to this worker)
        print(len(df))
        import time
        c = 0
        print(time.time())
        while c < 3:
            time.sleep(3)
            c += 1
            print(time.time())

4. You can think of each cell starting with `%%px[num]` as an independent workspace. You therefore need to explicitly import any modules, and define or push any data objects, that you want to use inside the parallel cells. In the example above, I must write `import time` in the cell in order to use the `time` module there. The alternative is to import modules and push data programmatically:

        # Push data to all workers. The objects must be passed as a dict.
        dview.push({'churn': churn, 'data_end_date': data_end_date, 'CHURN_INT': CHURN_INT})

        # Import any modules you want on all workers.
        with dview.sync_imports():
            import sys

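Each worker is a separate Python process, which is why names defined in the notebook are not visible there until you push or import them. The sketch below illustrates the same effect with a plain child process from the standard library instead of an ipyparallel worker, so it runs without a cluster. `run_in_fresh_process` is a hypothetical helper.

```python
import subprocess
import sys


def run_in_fresh_process(code):
    """Run `code` in a new Python process and report how it went.

    Hypothetical helper standing in for a worker's separate namespace.
    """
    proc = subprocess.run(
        [sys.executable, '-c', code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


import time  # imported here, in the parent process only

# The child never imported `time`, so this fails with a NameError.
code_missing, _, err = run_in_fresh_process('print(time.time() > 0)')
print(code_missing != 0, 'NameError' in err)

# Importing inside the child's own code fixes it.
code_ok, out, _ = run_in_fresh_process('import time; print(time.time() > 0)')
print(code_ok, out)
```

The second call succeeds for the same reason that `import time` inside a `%%px` cell does: the import happens in the process that actually runs the code.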
5. Finally, to get the results, use `%pxresult0` (again, you can replace `0` with the number of another worker):

        %pxresult0

Note that `%pxresult0` blocks if the result is not ready yet. If you want to keep experimenting in other cells, don’t run `%pxresult0` too early.
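This is the same trade-off you meet with any asynchronous result. As an analogy only, the sketch below uses the standard library's `concurrent.futures` rather than ipyparallel: checking `done()` returns immediately, while `result()` waits, much like running `%pxresult0` before the worker has finished. `slow_task` is a made-up stand-in for a heavy cell.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(seconds):
    """Made-up stand-in for a heavy cell running on a worker."""
    time.sleep(seconds)
    return 'finished'


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_task, 1.0)

    # Non-blocking check: returns right away, so other work can continue.
    ready_at_once = future.done()

    # Blocking call: waits here until the task has produced its result.
    start = time.time()
    result = future.result()
    waited = time.time() - start

print(ready_at_once, result, waited > 0.5)
```

The habit carries over: check whether the work is done, or simply leave it alone for a while, before asking for the result.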
Reference:
http://minrk.github.io/drop/nbconverttest.html
https://ipython.org/ipython-doc/3/parallel/magics.html (for old ipython notebook)
http://ipyparallel.readthedocs.io/en/latest/ (for jupyter)