Tutorial
========

This tutorial will walk you through the main features of Psyrun. Depending
on your usage, you might want to read up on the details in the :ref:`guide`.

This tutorial assumes that `NumPy <https://numpy.org/>`_ has been imported
with::

    import numpy as np

But NumPy is not a strict requirement for using Psyrun.


Parameter space exploration
---------------------------

Assume we have a function *objective* that we want to evaluate for
different parameters::

    def objective(a, b, c):
        return {'result': a * b + c}

In standard Python this would require nesting a bunch of for-loops like
so::

    results = []
    for a in np.arange(1, 5):
        for b in np.linspace(0, 1, 10):
            for c in [1., 1.5, 10., 10.5]:
                results.append(objective(a, b, c)['result'])

For a complex function with a lot of parameters, this can require quite
deep nesting! Psyrun allows you to do this more conveniently by defining a
parameter space with the `Param` class::

    from psyrun import map_pspace, Param

    pspace = (Param(a=np.arange(1, 5))
              * Param(b=np.linspace(0, 1, 10))
              * Param(c=[1., 1.5, 10., 10.5]))
    results = map_pspace(objective, pspace)

The multiplication operator ``*`` is defined as the Cartesian product on
`Param` instances. Code similar to the example above could also be written
with the Python *itertools* module, but the `Param` class provides a number
of other useful operations, like concatenation and difference, explained in
more detail in :ref:`guide-pspace`. It is also the basis for the
parallelization and serial farming features.


Parallelization
---------------

It is easy to evaluate multiple parameter assignments in parallel with
Psyrun::

    from psyrun import map_pspace_parallel

    results = map_pspace_parallel(objective, pspace)

This parallelization is based on `joblib
<https://joblib.readthedocs.io/>`_, which by default uses the
*multiprocessing* module to spawn multiple Python processes. This requires,
however, that the *objective* function can be imported from a module, i.e.
this does not work if it is only defined in an interactive interpreter
session. More details can be found in the :ref:`guide-mapping` section.


Tasks
-----

Tasks are actually the main feature of Psyrun. To see what makes them
useful, it is easiest to define a task and then see what we can do with it.

The Psyrun ``psy`` command looks for tasks in the *psy-tasks* directory
relative to the current directory by default. Each task is defined in a
Python file named ``task_<name>.py``. For example, we could define a task
*example* with a few lines in a file ``psy-tasks/task_example.py``::

    import numpy as np
    from psyrun import Param

    pspace = (Param(a=np.arange(1, 5))
              * Param(b=np.linspace(0, 1, 10))
              * Param(c=[1., 1.5, 10., 10.5]))

    def execute(a, b, c):
        return {'result': a * b + c}

Note that ``pspace`` and ``execute`` are names with a special meaning in
this task file. The ``pspace`` variable defines the parameter space
explored in the task, and ``execute`` is the function to be invoked with
each parameter assignment. It has to return a dictionary, which allows
returning multiple, named values.

We can now run this task by invoking ``psy run example`` on the command
line (or just ``psy run`` to run all defined tasks, not just *example*).
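For example, invoked from the directory that contains *psy-tasks* (a shell
session sketch; Psyrun's actual console output is not shown here)::

    $ psy run example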
This will create a directory *psy-work/example* with a bunch of files
supporting the task execution and, most importantly, the file
``psy-work/example/result.pkl``, a Python `pickle file
<https://docs.python.org/3/library/pickle.html>`_ with the results::

    import pickle
    with open('psy-work/example/result.pkl', 'rb') as f:
        print(pickle.load(f))
    # prints:
    # {'b': [0.66666666666666663, 0.44444444444444442, ...],
    #  'a': [1, 2, 2, 2, 2, 4, 4, 1, 1, 2, 2, 2, 2, 3, 3, 1, 2, 2, ...],
    #  'c': [1.5, 1.0, 1.5, 1.0, 1.5, 1.0, 1.5, 10.5, 1.0, 1.0, 1.5, ...],
    #  'result': [2.1666666666666665, 1.8888888888888888, ...]}

If you execute ``psy run`` again, it will automatically detect whether the
results are still up-to-date and only rerun a task if its results need to
be updated.

One advantage of using the ``psy run`` command is that partial results will
be written to disk in *psy-work/example/out*. This means that if certain
parameter assignments fail with an exception, not everything is lost. The
individual files in the *out* directory can be merged into a result file
with ``psy merge psy-work/example/out partial-result.pkl``. To get
information on which results are missing, use the ``psy status -v example``
command.

Sometimes it is desirable to add the results of additional parameter
assignments to the existing results. This can be done by editing the task
file and then using ``psy run --continue example`` to instruct Psyrun to
preserve the existing results and add the new parameter assignments.

Psyrun uses pickle files by default because they support the widest range
of data types. Unfortunately, they are not the most efficient format.
Psyrun also allows you to use NumPy NPZ files or `HDF5
<https://www.hdfgroup.org/>`_ instead. See :ref:`guide-stores` for details.


Serial farming
--------------

If you have access to a high performance computing (HPC) cluster, you can
use Psyrun for serial farming. That means you run a large number of serial
jobs, i.e. jobs that have no interdependencies and can be run in any order,
on the cluster. To do so, you have to set the *scheduler* and
*scheduler_args* variables in your task file to the appropriate values (it
is also a good idea to set *max_jobs* and *min_items*). More details can be
found in :ref:`guide-task-files`. Psyrun comes with support for `Sharcnet
<https://www.sharcnet.ca/>`_'s ``sqsub`` scheduler. If your HPC cluster
uses a different scheduler, you will have to write some code to tell Psyrun
how to interface with that scheduler.

It can be useful to test a task first by running a single parameter
assignment with the :ref:`cmd-test` command.
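To make the configuration above concrete, here is a minimal sketch of a
task file set up for serial farming. The scheduler import path, the
``Sqsub`` constructor call, and the ``scheduler_args`` keys are assumptions
for illustration only; consult :ref:`guide-task-files` for the settings
that apply to your cluster::

    import numpy as np
    from psyrun import Param
    # Assumed import path and class name for the sqsub scheduler; check
    # the Psyrun documentation for the scheduler matching your cluster.
    from psyrun.scheduler import Sqsub

    pspace = Param(a=np.arange(1, 5)) * Param(b=np.linspace(0, 1, 10))

    # Scheduler configuration; the constructor arguments and the key
    # names below are assumptions for this sketch.
    scheduler = Sqsub()
    scheduler_args = {'timelimit': '1h', 'memory': '512M'}

    max_jobs = 10   # cap on the number of jobs submitted at once (assumed meaning)
    min_items = 4   # minimum parameter assignments per job (assumed meaning)

    def execute(a, b):
        return {'result': a * b}

With such settings in place, ``psy run`` should submit jobs through the
scheduler instead of executing all parameter assignments locally.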