When I use the LSF queue system to run dpgen on the login node of our server cluster, I get "RuntimeError: Meet errors will handle unexpected submission state." right after submitting the command, and the message suggests checking the remote_root. However, there is no error information in the work dir. In the dp task directory the jobs are still running and train.log looks fine, and I can see the jobs in the queue system. I don't know what's wrong. Can you give me some hints? My machine.json and the error message are attached.
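Since the jobs are visible in the queue while dpgen reports an unexpected state, it may help to record exactly what LSF reports for each job ID. The sketch below parses the default tabular `bjobs` output (header `JOBID USER STAT QUEUE ...`, with STAT in the third column) and flags any state outside a set of common LSF states. The helper names and the "known states" set are assumptions for illustration only. They are not dpdispatcher's internal logic, and dpgen may classify states differently.

```python
import subprocess

# Common LSF job states. Treating anything else as "unexpected" is an
# assumption made for this sketch, not dpdispatcher's actual rule.
KNOWN_LSF_STATES = {"PEND", "RUN", "DONE", "EXIT", "PSUSP", "USUSP", "SSUSP"}


def parse_bjobs_output(output: str) -> dict[str, str]:
    """Map job ID -> STAT from the default tabular `bjobs` output.

    Assumes the header line starts with JOBID and that STAT is the third
    whitespace-separated column (JOBID USER STAT QUEUE ...).
    """
    states: dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header, and lines with too few columns.
        if len(fields) < 3 or fields[0] == "JOBID":
            continue
        # Skip anything that does not start with a numeric job ID.
        if not fields[0].isdigit():
            continue
        states[fields[0]] = fields[2]
    return states


def query_lsf_states(job_ids: list[str]) -> dict[str, str]:
    """Run `bjobs` for the given IDs (requires LSF commands on PATH)."""
    result = subprocess.run(
        ["bjobs", *job_ids], capture_output=True, text=True, check=False
    )
    # IDs that bjobs cannot find do not produce a table row,
    # so they simply will not appear in the returned mapping.
    return parse_bjobs_output(result.stdout)


def unexpected(states: dict[str, str]) -> dict[str, str]:
    """Return only the jobs whose state is outside KNOWN_LSF_STATES."""
    return {jid: st for jid, st in states.items() if st not in KNOWN_LSF_STATES}
```

For example, on the login node, `unexpected(query_lsf_states(["<job_id>"]))` returns an empty dict if the job is simply pending or running, which would suggest the problem is in how the state is being read rather than in the job itself.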
machine.json:
{
    "api_version": "1.0",
    "_deepmd_version": "2.1.0",
    "train":
    {
        "command": "dp",
        "machine": {
            "batch_type": "LSF",
            "context_type": "local",
            "local_root": "./",
            "remote_root": "/public/home/dmeng/DPGEN/0316testlsf/tmp"
        },
        "resources":
        {
            "number_node": 1,
            "cpu_per_node": 8,
            "gpu_per_node": 0,
            "queue_name": "normal",
            "group_size": 2,
            "_batch_type": "LSF",
            "_kwargs": {},
            "source_list": ["/public/home/dmeng/anaconda3/bin/activate deepmd"]
        }
    },
    "model_devi":
    {
        "command": "lmp -i input.lammps -v restart 0",
        "machine": {
            "batch_type": "LSF",
            "context_type": "local",
            "local_root": "./",
            "remote_root": "/public/home/dmeng/DPGEN/0316testlsf/tmp"
        }
    },
    "fp":
    {
        "command": "ulimit -s unlimited && mpirun -n 8 /public/home/dmeng/softwares/vasp.5.4/bin/vasp_std",
        "machine": {
            "batch_type": "LSF",
            "context_type": "local",
            "local_root": "./",
            "remote_root": "/public/home/dmeng/DPGEN/0316testlsf/tmp"
        },
        "resources": {
            "number_node": 1,
            "cpu_per_node": 8,
            "gpu_per_node": 0,
            "queue_name": "normal",
            "group_size": 50,
            "_batch_type": "LSF",
            "_kwargs": {},
            "source_list": ["/public/softwares/intel/oneapi/setvars.sh"]
        }
    }
}
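The model_devi block in the config above has no "resources" entry, which may just be a copy-paste truncation. A quick way to rule out config problems is to parse the real file and check that every section has the same shape. The sketch below assumes, based only on the layout of this machine.json, that each of `train`, `model_devi` and `fp` should carry `command`, `machine` and `resources`, with `batch_type` set to `LSF` and a non-empty `remote_root`. The function names are hypothetical, and this is not dpgen's own validation.

```python
import json

# Sections used in this machine.json; the required keys below are an
# assumption based on this issue's layout, not dpgen's official schema.
SECTIONS = ("train", "model_devi", "fp")
REQUIRED_KEYS = ("command", "machine", "resources")


def check_machine_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a parsed machine.json dict."""
    problems: list[str] = []
    for name in SECTIONS:
        section = cfg.get(name)
        if not isinstance(section, dict):
            problems.append(f"missing section: {name}")
            continue
        for key in REQUIRED_KEYS:
            if key not in section:
                problems.append(f"{name}: missing '{key}'")
        machine = section.get("machine")
        # Only inspect machine fields when the block actually exists.
        if isinstance(machine, dict):
            if machine.get("batch_type") != "LSF":
                problems.append(f"{name}: batch_type is not LSF")
            if not machine.get("remote_root"):
                problems.append(f"{name}: remote_root not set")
    return problems


def load_and_check(path: str) -> list[str]:
    """Parse the file first; a JSONDecodeError reports the line and column."""
    try:
        with open(path) as fh:
            cfg = json.load(fh)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    return check_machine_config(cfg)
```

Running `load_and_check("machine.json")` in the dpgen work directory should return an empty list if the file is well-formed and every section is complete.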