Documentation: Faq

David Rafferty edited this page Jun 7, 2019 · 28 revisions

prefactor FAQ

Frequently Asked Questions

Missing Feature

There is a feature that you would like to have, but that isn't implemented in prefactor (yet).

The reason for this is that you didn't implement it! Everybody can contribute to prefactor: get a GitHub account, fork the prefactor repository, implement your feature, and issue a pull request. All this can be done without any special permissions from anyone. But it is usually a good idea to either start a new issue on the prefactor issues list or use an existing one to let everyone know what you are working on.

"PipelineStep_*" missing

Your pipeline run fails like this:

2016-02-04 13:33:56 ERROR   genericpipeline: *******************************************  
2016-02-04 13:33:56 ERROR   genericpipeline: Failed pipeline run: Pre-Facet-Cal  
2016-02-04 13:33:56 ERROR   genericpipeline: Detailed exception information:  
2016-02-04 13:33:56 ERROR   genericpipeline: <type 'exceptions.ImportError'>  
2016-02-04 13:33:56 ERROR   genericpipeline: No module named PipelineStep_createMapfile  
2016-02-04 13:33:56 ERROR   genericpipeline: *******************************************  

(The exact name of the missing module varies.) You are probably missing one of the entries in the recipe_directories setting in your pipeline.cfg, or one of those entries doesn't work. Make sure both entries point to the correct directories, and that the missing module can be found in the plugins subdirectory of one of those two entries. Check the full documentation at http://www.astron.nl/citt/prefactor for a description of how to set up the pipeline.cfg.
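
The check described above can be done by hand, but a minimal sketch in Python may help. It assumes that you copy the directories from recipe_directories into a list yourself (it does not parse pipeline.cfg), and the function name find_plugin is hypothetical, not part of prefactor or the pipeline framework.

```python
import os


def find_plugin(recipe_directories, module_name):
    """Return the recipe directories whose plugins/ subdirectory holds the module.

    recipe_directories: list of directories, copied by hand from pipeline.cfg.
    module_name: name from the ImportError, e.g. "PipelineStep_createMapfile".
    """
    hits = []
    for directory in recipe_directories:
        candidate = os.path.join(directory, "plugins", module_name + ".py")
        if os.path.isfile(candidate):
            hits.append(directory)
    return hits
```

An empty result means that none of the listed directories contains the module in its plugins subdirectory, which matches the error shown above.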

Invalid value for ExecField executable

Your pipeline run fails like this:

2016-04-25 15:53:23 ERROR   genericpipeline: *******************************************
2016-04-25 15:53:23 ERROR   genericpipeline: Failed pipeline run: Initial-Subtract
2016-04-25 15:53:23 ERROR   genericpipeline: Detailed exception information:
2016-04-25 15:53:23 ERROR   genericpipeline: <type 'exceptions.TypeError'>
2016-04-25 15:53:23 ERROR   genericpipeline: /homea/htb00/htb001/prefactor/bin/InitSubtract_sort_and_compute.py is an invalid value for ExecField executable
2016-04-25 15:53:23 ERROR   genericpipeline: *******************************************

The given path points to a file that either doesn't exist or that does not have the execute flag set on the file system ("chmod +x"). Usually this affects executables that are defined in the pipeline parset. So make sure that the variables in the pipeline parset point to the right files, and check if the execute flag is set.
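
Both conditions can be tested quickly from Python. The following is a small sketch using only the standard library; the function name check_executable is made up for this example.

```python
import os


def check_executable(path):
    """Return a short diagnosis for a path used as an executable in the parset."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        # The file exists but the execute flag is not set ("chmod +x" fixes this).
        return "not executable"
    return "ok"
```

Call it with the path from the error message, e.g. check_executable("/homea/htb00/htb001/prefactor/bin/InitSubtract_sort_and_compute.py").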

KeyError 'mapfile'

Your pipeline run fails like this:

2016-02-07 14:48:58 ERROR   genericpipeline: *******************************************  
2016-02-07 14:48:58 ERROR   genericpipeline: Failed pipeline run: Pre-Facet-Cal  
2016-02-07 14:48:58 ERROR   genericpipeline: Detailed exception information:  
2016-02-07 14:48:58 ERROR   genericpipeline: <type 'exceptions.KeyError'>  
2016-02-07 14:48:58 ERROR   genericpipeline: 'mapfile'  
2016-02-07 14:48:58 ERROR   genericpipeline: *******************************************  

That happens when one step didn't generate a mapfile. Usually that means that the pipeline was looking for its input data, but couldn't find any files that match. Please check your *_input_path and *_input_pattern in the parset file!
(Note: ls -d *_input_path/*_input_pattern should find your data. But the only special character allowed in the pattern is "*"! So not everything that works with ls will work with the pipeline, but if it doesn't work with ls then it also will not work with the pipeline.)
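
The same test can be sketched in Python. This is only an approximation of what the pipeline does: it uses the standard glob module and rejects the shell wildcards other than "*" up front, since those would work with ls but not with the pipeline. The function name find_input is hypothetical.

```python
import glob
import os


def find_input(input_path, input_pattern):
    """Return the sorted matches for input_path/input_pattern.

    Raises ValueError if the pattern uses a special character other than "*".
    """
    for char in "?[]{}":
        if char in input_pattern:
            raise ValueError("only '*' is allowed in the pattern, found %r" % char)
    return sorted(glob.glob(os.path.join(input_path, input_pattern)))
```

If this returns an empty list for the values from your parset, the pipeline will most likely not find any data either.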

IndexError: list index out of range

Your pipeline run fails as follows:

  File "/opt/lofar/lib/python2.7/site-packages/lofarpipe/cuisine/WSRTrecipe.py", line 132, in run
    status = self.go()
  File "/opt/lofar/lib/python2.7/site-packages/lofarpipe/recipes/master/executable_args.py", line 357, in go
    arglist_copy[ind] = arglist_copy[ind].replace(name, value[i])
IndexError: list index out of range

This happens when the lengths of the input mapfiles for the step do not match. Please check that they all have the same number of entries (single-entry mapfiles can be expanded using the expandMapfile plugin).
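
A quick way to compare the lengths is sketched below. It rests on an assumption that is not stated on this page: that a mapfile is a text file holding a Python-literal list with one dictionary per entry (with keys such as host, file and skip). If your mapfiles look different, the sketch does not apply. The function names are hypothetical.

```python
import ast


def mapfile_lengths(mapfile_paths):
    """Return {path: number of entries} for each mapfile.

    Assumes each mapfile contains a Python-literal list (one item per entry).
    """
    lengths = {}
    for path in mapfile_paths:
        with open(path) as handle:
            entries = ast.literal_eval(handle.read())
        lengths[path] = len(entries)
    return lengths


def lengths_match(mapfile_paths):
    """Return True if all given mapfiles have the same number of entries."""
    return len(set(mapfile_lengths(mapfile_paths).values())) <= 1
```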

Out of Memory

Your pipeline runs out of memory: you either find error messages about that in the log file, or you can see (e.g. with top) your machine running out of memory.

With the parameters num_proc_per_node, num_proc_per_node_limit, and max_dppp_threads you can control how many processes are started in parallel and how many threads DPPP may use. Those values affect how much memory per node the pipeline needs in total. I usually start a test run of the pipeline and check with top how much memory the processes need, then I can adjust the parameters accordingly.
And of course it is possible that one of the processes has a memory leak and eats up memory over time. In this case, report that as an issue on the GitHub issue page.
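
The adjustment after such a test run is simple arithmetic. The sketch below is only a rule of thumb: the function name, the headroom factor of 0.8 and the example numbers are assumptions for illustration, not values recommended by prefactor.

```python
def max_parallel_processes(total_memory_gb, memory_per_process_gb, headroom=0.8):
    """Estimate how many processes fit into the memory of one node.

    headroom is the fraction of the node memory that the pipeline may use
    (an assumed safety margin, adjust to taste).
    """
    if memory_per_process_gb <= 0:
        raise ValueError("memory_per_process_gb must be positive")
    usable = total_memory_gb * headroom
    # Always allow at least one process, even if it will be tight.
    return max(1, int(usable // memory_per_process_gb))
```

For a node with 64 GB where top shows about 6 GB per process, max_parallel_processes(64, 6) gives 8, which would be a starting point for num_proc_per_node.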

Random Error with {{ <variable_name> }}

Your pipeline run fails with a random error and somewhere in the log the string {{ <variable_name> }} shows up.
For example:

ERROR   genericpipeline: {{ msss_find_data_script }} is an invalid value for ExecField executable

or:

ERROR   genericpipeline.executable_args: Remote process python /homea/htb00/htb003/lofar_jureca_2-15/lib/python2.7/site-packages/lofarpipe/recipes/nodes/python_plugin.py ['{{ field_name }}', [...]

You probably removed the "!" from the lines where the variables are defined. These are not comment markers; they tell the pipeline engine that the line is a variable definition.
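
To find out which variables were not replaced, you can search the log for the {{ ... }} pattern. A minimal sketch with a regular expression follows; the function name is made up, and it assumes that variable names consist of letters, digits and underscores.

```python
import re

# Matches "{{ name }}" and captures the variable name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def unreplaced_variables(log_text):
    """Return the sorted names of all {{ variable }} placeholders left in the log."""
    return sorted(set(PLACEHOLDER.findall(log_text)))
```

Every name it reports should have a definition line starting with "!" in the pipeline parset.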

Running prefactor on multiple nodes

You have a nice, fast multi-node cluster and want to run your pipelines in parallel across several nodes.

First: the clusterdesc file has nothing to do with it.
Second: after having a look at the code for the job distribution in the pipeline framework I decided that I didn't get paid enough to fix that and decided to only patch it together to work after some fashion.
In the [remote] section of the pipeline.cfg you need to set the method to one that works on your system. Currently there are two methods supported by the genericpipeline (and thus prefactor):

  • slurm_srun, which uses the srun command to run commands on the nodes of the job reservation. (It runs "srun hostname" to figure out which nodes are available and "srun -N 1 --cpu_bind=map_cpu:none -w <hostname> <command>" to run command on host hostname.)
  • pbs_ssh, which parses the PBS_NODEFILE to figure out which nodes it may use and uses ssh to start the jobs on the nodes.
  • There is also a mode ssh_generic, which takes the list of nodes from an environment variable and starts the jobs with ssh, but this mode hasn't made it into the release of the LOFAR software yet.
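
To see which nodes the pbs_ssh method has to work with, you can read the node file yourself. The sketch below is not the code of the pipeline framework; it only relies on the usual PBS convention that the environment variable PBS_NODEFILE holds the path of a file with one host name per line (often repeated once per core). The function name is hypothetical.

```python
import os


def pbs_nodes(nodefile=None):
    """Return the unique host names from the PBS node file, in order of appearance."""
    if nodefile is None:
        # Set by PBS inside a job; raises KeyError outside of one.
        nodefile = os.environ["PBS_NODEFILE"]
    nodes = []
    with open(nodefile) as handle:
        for line in handle:
            host = line.strip()
            if host and host not in nodes:
                nodes.append(host)
    return nodes
```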

My pipeline crashes and I cannot find the problem here

Your problem is not listed here.
If you cannot find the problem on your own, then put the log file somewhere where it is accessible to the supporters and open a ticket on the GitHub issue page. Please make sure that the pipeline was run with debug output switched on, and please don't e-mail log files to the authors: they tend to clog the inbox.

"DEPRECATED" and log4cplus errors

The log might contain deprecation messages and log4cplus warnings. These can be ignored. Examples:

ERROR   node.lc02.calibrate_stand_alone.SB099_uv.MS.ndppp_prep_cal: support.utilities.spawn_process is DEPRECATED. Please use support.subprocessgroup.SubProcessGroup

WARNING node.lc02.calibrate_stand_alone.SB092_uv.MS.ndppp_prep_cal: /data/software/LOFAR/bin/makesourcedb stderr: log4cplus:WARN Property configuration file "makesourcedb.log_prop" not found.
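
When reading a long log it can help to hide these lines. A small sketch, assuming that the two marker strings taken from the examples above are enough to recognise them (the names IGNORABLE and relevant_lines are made up):

```python
# Marker strings copied from the two example messages above.
IGNORABLE = ("is DEPRECATED", "log4cplus:WARN")


def relevant_lines(log_lines):
    """Return the log lines that do not contain one of the ignorable markers."""
    return [line for line in log_lines
            if not any(marker in line for marker in IGNORABLE)]
```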

Not-so-frequently Asked Questions

(Which still may be of interest to someone sometime.)

BLAS Core affinity

Your pipeline runs slowly. All NDPPP-/BBS-/whatever- processes use only little CPU time and only one core of the node is busy.
On clusters like CEP-3 the OpenBLAS library is built with threading affinity. This means that by default the different processes all try to use the same core(s). The "use LofIm" and "use Lofar" scripts set an environment variable that disables this threading affinity, but if the pipeline.cfg file does not have the [remote] section included, then this environment variable is not forwarded to the processes that are started by the pipeline.
So please set the [remote] section in your pipeline.cfg.

Low maxproc value

Your pipeline fails with <class 'lofarpipe.support.lofarexceptions.PipelineRecipeFailed'> and when digging through the logfile you find:

struct.error: unpack requires a string argument of length 4

Check your maxproc limit with the command ulimit -a (on bash) or limit (on (t)csh). If it is less than 10000, then useful pipeline runs are next to impossible; common values are around 50000. If needed, ask your local sysadmin for help increasing the maxproc limit. Explanation: the pipeline framework starts many threads that don't really do much, and on Linux machines each thread counts towards the maxproc limit. So if that limit is too low, parts of the pipeline cannot get started.
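
The limit can also be read from Python with the standard resource module, which is handy if you want to check it in the same environment in which the pipeline is started. This is a sketch for Linux; the function names are made up, and the default of 10000 is simply the threshold mentioned above.

```python
import resource


def maxproc_limits():
    """Return the (soft, hard) limit on the number of processes/threads."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def maxproc_is_sufficient(soft_limit, minimum=10000):
    """Return True if the soft limit is unlimited or at least `minimum`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum
```

For example, maxproc_is_sufficient(maxproc_limits()[0]) tells you whether the current soft limit reaches that threshold.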

Too many open files

Your pipeline fails with "Too many open files" somewhere in the logfile.

Check your limit for the number of open files that you are allowed to have: on bash with the command ulimit -a (-> open files) or on (t)csh limit (-> descriptors). If possible, you can try to increase the value to the hard limit (check with ulimit -Ha or limit -h). Values lower than 4096 are problematic, but even that might be too low for some runs. If needed, ask your local sysadmin for help increasing the limit.
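
A process may raise its own soft limit up to the hard limit without special permissions. The following sketch does that from Python with the standard resource module. Note that this only affects the Python process that calls it and the processes it starts, not your shell, and that the function names are hypothetical.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on the number of open files."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_open_file_limit():
    """Raise the soft open-files limit to the hard limit and return the new limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

If the hard limit itself is too low, only the sysadmin can help.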
