Tutorial
This tutorial walks you through the main features of Psyrun. Depending on your use case, you may want to consult the User Guide for more detail.
In this tutorial it is assumed that NumPy has been imported with:
import numpy as np
NumPy is not strictly required to use Psyrun, however.
Parameter space exploration
Assume we have a function objective that we want to evaluate for different parameters:
def objective(a, b, c):
    return {'result': a * b + c}
In plain Python this would require nesting several for-loops, like so:
results = []
for a in np.arange(1, 5):
    for b in np.linspace(0, 1, 10):
        for c in [1., 1.5, 10., 10.5]:
            results.append(objective(a, b, c)['result'])
For a complex function with many parameters, the nesting can get quite deep!
Psyrun allows you to do this more conveniently by defining a parameter space
with the Param class:
from psyrun import map_pspace, Param
pspace = (Param(a=np.arange(1, 5))
          * Param(b=np.linspace(0, 1, 10))
          * Param(c=[1., 1.5, 10., 10.5]))
results = map_pspace(objective, pspace)
The multiplication operator * is defined as the Cartesian product on Param
instances. Code similar to the example above could also be written with the
Python itertools module, but the Param class provides a number of other useful
operations, such as concatenation and difference, which are explained in more
detail in Constructing parameter spaces. It is also the basis for the
parallelization and serial farming features.
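For comparison, the itertools variant mentioned above might look roughly like the following. This is a minimal sketch that does not use Psyrun at all. The axes dictionary is a name introduced here for illustration, and plain Python sequences stand in for the NumPy arrays so that the snippet needs only the standard library.

```python
import itertools


def objective(a, b, c):
    return {'result': a * b + c}


# Illustrative stand-in for the parameter space above (not a Psyrun object).
axes = {
    'a': range(1, 5),
    # Ten evenly spaced values from 0 to 1, like np.linspace(0, 1, 10)
    # up to floating-point rounding.
    'b': [i / 9 for i in range(10)],
    'c': [1., 1.5, 10., 10.5],
}

# itertools.product yields one tuple per point of the Cartesian product;
# zipping it with the axis names gives keyword arguments for objective.
results = [
    objective(**dict(zip(axes, values)))['result']
    for values in itertools.product(*axes.values())
]
```

This removes the nesting, but it only covers the Cartesian product; the other Param operations would have to be written by hand.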
Parallelization
It is easy to evaluate multiple parameter assignments in parallel with Psyrun:
from psyrun import map_pspace_parallel
results = map_pspace_parallel(objective, pspace)
This parallelization is based on joblib, which by default uses the multiprocessing module to spawn multiple Python processes. This requires, however, that the objective function can be imported from a module; it does not work if the function is only defined in an interactive interpreter session. More details can be found in the Evaluating functions on parameter spaces section.
Tasks
Tasks are the main feature of Psyrun. To see what makes them useful, it is
easiest to define a task and then see what we can do with it. By default, the
Psyrun psy command looks for tasks in the psy-tasks directory relative to the
current directory. Each task is defined in a Python file named task_<name>.py.
For example, we could define a task example with a few lines in a file
psy-tasks/task_example.py:
import numpy as np
from psyrun import Param
pspace = (Param(a=np.arange(1, 5))
          * Param(b=np.linspace(0, 1, 10))
          * Param(c=[1., 1.5, 10., 10.5]))
def execute(a, b, c):
    return {'result': a * b + c}
Note that pspace and execute are names with a special meaning in this task
file. The pspace variable defines the parameter space explored in the task, and
execute is the function to be invoked with each parameter assignment. It has to
return a dictionary, which allows it to return multiple named values.
We can now run this task by invoking psy run example on the command line (or
just psy run to run all defined tasks and not just example). This will create a
directory psy-work/example with a bunch of files supporting the task execution
and, most importantly, the file psy-work/example/result.pkl, a Python pickle
file with the results:
import pickle
with open('psy-work/example/result.pkl', 'rb') as f:
    print(pickle.load(f))
# prints:
# {'b': [0.66666666666666663, 0.44444444444444442, ...],
# 'a': [1, 2, 2, 2, 2, 4, 4, 1, 1, 2, 2, 2, 2, 3, 3, 1, 2, 2, ...],
# 'c': [1.5, 1.0, 1.5, 1.0, 1.5, 1.0, 1.5, 10.5, 1.0, 1.0, 1.5, ...],
# 'result': [2.1666666666666665, 1.8888888888888888, ...]}
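The output above is a dictionary of parallel lists, one entry per parameter assignment. If that layout holds, the columns can be regrouped into one record per assignment with plain Python. The following is only a sketch: data is a hand-written, hypothetical miniature of such a dictionary, not real Psyrun output, and rows and best are names introduced here.

```python
# Hypothetical miniature of the dict-of-lists layout printed above;
# the values are made up for illustration and are not real Psyrun output.
data = {
    'a': [1, 2, 4],
    'b': [0.5, 0.5, 0.0],
    'c': [1.0, 1.5, 10.5],
    'result': [1.5, 2.5, 10.5],
}

# Transpose the columns into one dict per parameter assignment.
rows = [dict(zip(data, values)) for values in zip(*data.values())]

# Pick the assignment with the largest result.
best = max(rows, key=lambda row: row['result'])
```

In practice data would be the object returned by pickle.load; the transposition assumes that all lists have the same length.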
If you execute psy run again, it will automatically detect whether the results
are still up to date and only rerun a task if it needs to be updated.
One advantage of using the psy run command is that partial results are written
to disk in psy-work/example/out. This means that if certain parameter
assignments fail with an exception, not everything is lost. The individual
files in the out directory can be merged into a result file with
psy merge psy-work/example/out partial-result.pkl. To get information on which
results are missing, use the psy status -v example command.
Sometimes it is desirable to add the results of additional parameter
assignments to the existing result. This can be done by editing the task file
and then using psy run --continue example to instruct Psyrun to preserve the
existing results and add the new parameter assignments.
Psyrun uses pickle files by default because they support the most data types. Unfortunately, they are not the most efficient. Psyrun can use NumPy NPZ files or HDF5 instead. See Data stores for details.
Serial farming
If you have access to a high performance computing (HPC) cluster, you can use Psyrun for serial farming. That means you run a large number of serial jobs on the cluster, i.e. jobs that have no interdependency and can be run in any order. To do so, you have to set the scheduler and scheduler_args variables in your task file to the appropriate values (it is also a good idea to set max_jobs and min_items). More details can be found in Writing task-files.
Psyrun comes with support for Sharcnet’s sqsub scheduler. If your HPC cluster uses a different scheduler, you will have to write some code to tell Psyrun how to interface with that scheduler.
It can be useful to test a task first by running a single parameter assignment with the test command.