Documentation: Faq
Frequently Asked Questions
- Missing Feature
- "PipelineStep_*" missing
- Invalid value for ExecField executable
- KeyError 'mapfile'
- IndexError: list index out of range
- Out of Memory
- Random Error with {{ <variable_name> }}
- Running prefactor on multiple nodes
- My pipeline crashes and I cannot find the problem here
- Missing "h5imp_cal_losoto.h5"
- "DEPRECATED" and log4cplus errors
- Not-so-frequently Asked Questions
There is a feature that you would like to have, but it isn't implemented in prefactor (yet).
The reason for this is that you didn't implement it! Everybody can contribute to prefactor: get a GitHub account, fork the prefactor repository, implement your feature, and issue a pull request. All this can be done without any special permissions from anyone. But it is usually a good idea to either start a new issue on the prefactor issues list or use an existing one to let everyone know what you are working on.
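If you have never done this before, the rough shape of the workflow is as follows (the repository URL points at your own fork, so it is a placeholder here):
# one-time: fork the prefactor repository on GitHub, then
git clone https://github.com/<your-username>/prefactor.git
cd prefactor
git checkout -b my-new-feature
# ... implement and commit your changes ...
git push origin my-new-feature
# finally, open a pull request against the upstream prefactor repository on GitHub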
Your pipeline run fails like this:
2016-02-04 13:33:56 ERROR genericpipeline: *******************************************
2016-02-04 13:33:56 ERROR genericpipeline: Failed pipeline run: Pre-Facet-Cal
2016-02-04 13:33:56 ERROR genericpipeline: Detailed exception information:
2016-02-04 13:33:56 ERROR genericpipeline: <type 'exceptions.ImportError'>
2016-02-04 13:33:56 ERROR genericpipeline: No module named PipelineStep_createMapfile
2016-02-04 13:33:56 ERROR genericpipeline: *******************************************
(The exact name of the missing module varies.) You are probably missing one of the entries in the recipe_directories setting in your pipeline.cfg, or one of those entries doesn't work. Make sure both entries point to the correct directories, and that the missing module can be found in the plugins subdirectory of one of those two entries. Check the full documentation at http://www.astron.nl/citt/prefactor for a description of how to set up the pipeline.cfg.
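As an illustration only (the paths are placeholders for your own installation and prefactor checkout), the relevant setting in the pipeline.cfg could look roughly like this:
recipe_directories = [%(pythonpath)s/lofarpipe/recipes, /path/to/prefactor]
The second entry has to point at the directory that contains the plugins subdirectory with the PipelineStep_*.py modules.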
Your pipeline run fails like this:
2016-04-25 15:53:23 ERROR genericpipeline: *******************************************
2016-04-25 15:53:23 ERROR genericpipeline: Failed pipeline run: Initial-Subtract
2016-04-25 15:53:23 ERROR genericpipeline: Detailed exception information:
2016-04-25 15:53:23 ERROR genericpipeline: <type 'exceptions.TypeError'>
2016-04-25 15:53:23 ERROR genericpipeline: /homea/htb00/htb001/prefactor/bin/InitSubtract_sort_and_compute.py is an invalid value for ExecField executable
2016-04-25 15:53:23 ERROR genericpipeline: *******************************************
The given path points to a file that either doesn't exist or does not have the execute flag set on the file system ("chmod +x"). Usually this affects executables that are defined in the pipeline parset, so make sure that the variables in the pipeline parset point to the right files, and check that the execute flag is set.
Your pipeline run fails like this:
2016-02-07 14:48:58 ERROR genericpipeline: *******************************************
2016-02-07 14:48:58 ERROR genericpipeline: Failed pipeline run: Pre-Facet-Cal
2016-02-07 14:48:58 ERROR genericpipeline: Detailed exception information:
2016-02-07 14:48:58 ERROR genericpipeline: <type 'exceptions.KeyError'>
2016-02-07 14:48:58 ERROR genericpipeline: 'mapfile'
2016-02-07 14:48:58 ERROR genericpipeline: *******************************************
That happens when one step didn't generate a mapfile. Usually that means that the pipeline was looking for its input data, but couldn't find any files that match. Please check your *_input_path and *_input_pattern settings in the parset file!
(Note: ls -d *_input_path/*_input_pattern should find your data. But the only special character allowed in the pattern is "*"! So not everything that works with ls will work with the pipeline, but if it doesn't work with ls then it also will not work with the pipeline.)
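As an illustration, in a Pre-Facet-Calibration parset the relevant lines might look like this (the variable names differ between parsets, and the path and pattern here are placeholders):
! cal_input_path = /data/scratch/myobs
! cal_input_pattern = L123456*_SB*_uv.MS
With these values, ls -d /data/scratch/myobs/L123456*_SB*_uv.MS should list exactly the MeasurementSets you want to process.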
Your pipeline run fails as follows:
File "/opt/lofar/lib/python2.7/site-packages/lofarpipe/cuisine/WSRTrecipe.py", line 132, in run
status = self.go()
File "/opt/lofar/lib/python2.7/site-packages/lofarpipe/recipes/master/executable_args.py", line 357, in go
arglist_copy[ind] = arglist_copy[ind].replace(name, value[i])
IndexError: list index out of range
This happens when the lengths of the input mapfiles for the step do not match. Please check that they all have the same number of entries (single-entry mapfiles can be expanded using the expandMapfile plugin).
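For reference, a mapfile is a plain-text file containing a Python-style list of dictionaries, one per data product; the hosts and paths below are placeholders:
[{'host': 'node01', 'file': '/data/scratch/L123456_SB000_uv.MS', 'skip': False},
 {'host': 'node01', 'file': '/data/scratch/L123456_SB001_uv.MS', 'skip': False}]
All mapfiles that are fed into the same step need the same number of such entries (or have to be expanded to that length first).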
Your pipeline runs out of memory: either you find error messages about that in the log file, or you can see (with top or a similar tool) your machine running out of memory.
With the parameters num_proc_per_node, num_proc_per_node_limit, and max_dppp_threads you can control how many processes are started in parallel and how many threads DPPP may use. Those values affect how much memory per node the pipeline needs in total. I usually start a test run of the pipeline and check with top how much memory the processes need; then I can adjust the parameters accordingly.
And of course it is possible that one of the processes has a memory leak and will eat up memory over time. In that case, report it as an issue on the GitHub issue page.
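In a prefactor parset these parameters are set like any other variable; the numbers below are only illustrative and have to be tuned to your hardware:
! num_proc_per_node = 20
! num_proc_per_node_limit = 4
! max_dppp_threads = 10
As a rough rule of thumb, the number of parallel processes times the memory a single process needs (as seen with top during a test run) should stay below the RAM of one node.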
Your pipeline run fails with a random error, and somewhere in the log the string {{ <variable_name> }} shows up.
E.g.: ERROR genericpipeline: {{ msss_find_data_script }} is an invalid value for ExecField executable
or:
ERROR genericpipeline.executable_args: Remote process python /homea/htb00/htb003/lofar_jureca_2-15/lib/python2.7/site-packages/lofarpipe/recipes/nodes/python_plugin.py ['{{ field_name }}', [...]
You probably removed the "!" from the lines where the variables are defined. These are not comment markers: the "!" tells the pipeline engine that the line is a variable definition.
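For illustration, using the variable from the first example error above (the path is a placeholder), a correct definition line in the parset looks like this:
! msss_find_data_script = /path/to/prefactor/scripts/msss_find_data.py
Without the leading "!" the line is not treated as a variable definition, so every later occurrence of {{ msss_find_data_script }} stays unsubstituted and ends up literally in the error message.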
You have a nice, fast multi-node cluster and want to run your pipelines in parallel across several nodes.
First: the clusterdesc file has nothing to do with it.
Second: after having a look at the code for the job distribution in the pipeline framework, I decided that I don't get paid enough to fix it properly, so I only patched it together so that it works after a fashion.
You need to set the method in the [remote] section of the pipeline.cfg to one that works on your system. Currently there are two methods supported by the genericpipeline (and thus prefactor):
- slurm_srun: uses the srun command to run commands on the nodes of the job reservation. (It runs "srun hostname" to figure out which nodes are available and "srun -N 1 --cpu_bind=map_cpu:none -w <hostname> <command>" to run <command> on host <hostname>.)
- pbs_ssh: parses the PBS_NODEFILE to figure out which nodes it may use and uses ssh to start the jobs on the nodes.
- There is also a mode ssh_generic, which takes the list of nodes from an environment variable and starts the jobs with ssh, but this mode hasn't made it into the release of the LOFAR software yet.
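As an illustration, on a SLURM cluster the relevant part of the pipeline.cfg would then contain something like the lines below (any further keys in that section depend on your LOFAR installation, so check the pipeline.cfg that ships with it):
[remote]
method = slurm_srun
On a PBS cluster you would set method = pbs_ssh instead.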
Your problem is not listed here.
If you cannot find the problem on your own, then put the log file somewhere where it is accessible to the supporters and open a ticket on the GitHub issue page. Please make sure that the pipeline was run with debug output switched on, and please don't e-mail logfiles to the authors: they tend to clog the inbox.
The log might contain deprecation messages and log4cplus errors. These can be ignored. Examples of these errors:
ERROR node.lc02.calibrate_stand_alone.SB099_uv.MS.ndppp_prep_cal: support.utilities.spawn_process is DEPRECATED. Please use support.subprocessgroup.SubProcessGroup
WARNING node.lc02.calibrate_stand_alone.SB092_uv.MS.ndppp_prep_cal: /data/software/LOFAR/bin/makesourcedb stderr: log4cplus:WARN Property configuration file "makesourcedb.log_prop" not found.
(Which still may be of interest to someone sometime.)
Your pipeline runs slowly. All NDPPP-/BBS-/whatever processes use only very little CPU time, and only one core of the node is busy.
On clusters like CEP-3 the OpenBLAS library is built with threading affinity. This means that by default the different processes all try to use the same core(s). The "use LofIm" and "use Lofar" scripts set an environment variable that disables this threading affinity, but if the pipeline.cfg file does not have the [remote] section included, then this environment variable is not forwarded to the processes that are started by the pipeline.
So please set the [remote] section in your pipeline.cfg.
Your pipeline fails with <class 'lofarpipe.support.lofarexceptions.PipelineRecipeFailed'> and when digging through the logfile you find:
struct.error: unpack requires a string argument of length 4
Check your maxproc limit (with the command ulimit -a on bash, or limit on (t)csh). If it is less than 10000, useful pipeline runs are next to impossible; common values are around 50000. If needed, ask your local sysadmin for help increasing the maxproc limit. Explanation: the pipeline framework starts many threads that don't really do much, and on Linux machines each thread counts towards the maxproc limit. So if that limit is too low then parts of the pipeline cannot get started.
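On bash the relevant value shows up as "max user processes"; a minimal check-and-raise, with the target value only a suggestion, looks like this:
ulimit -a | grep processes   # show the current "max user processes" limit
ulimit -u 50000              # raise the soft limit for this shell (cannot exceed the hard limit)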
Your pipeline fails with Too many open files somewhere in the logfile.
Check your limit for the number of open files that you are allowed to have: on bash with the command ulimit -a (-> open files), or on (t)csh with limit (-> descriptors). If possible, you can try to increase the value to the hard limit (check with ulimit -Ha or limit -h). Values lower than 4096 are problematic, but even that might be too low for some runs. If needed, ask your local sysadmin for help increasing the limit.
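On bash, for example:
ulimit -n       # current soft limit for open files
ulimit -Hn      # hard limit
ulimit -n 4096  # raise the soft limit for this shell (cannot exceed the hard limit)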