PIO error messages in hurricane forward runs #482

Open
xylar opened this issue Dec 19, 2022 · 10 comments
Labels: bug, ocean

xylar commented Dec 19, 2022

I'm seeing the following error message in the MPAS-Ocean log files from the hurricane sandy/forward step:

ERROR: MPAS IO Error: Bad return value from PIO

xylar added the bug label Dec 19, 2022

xylar commented Dec 19, 2022

@sbrus89, this may not be fatal (the model seems to run okay), but it's disconcerting and should probably be tracked down.

xylar commented Dec 19, 2022

Example output is at:

/lcrc/group/e3sm/ac.xylar/compass_1.2/chrysalis/test_20221219/baseline/hurricane/ocean/hurricane/DEQU120at30cr10rr2/sandy/forward/log.ocean.0000.err

xylar commented Dec 19, 2022

In the more detailed error log, I'm seeing:

PIO: ERROR: Defining variable  (ndims = 1) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)
PIO: ERROR: Defining variable  (ndims = 1) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)
PIO: ERROR: Defining variable  (ndims = 2) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)

See

/lcrc/group/e3sm/ac.xylar/compass_1.2/chrysalis/test_20221219/baseline/hurricane/case_outputs/ocean_hurricane_DEQU120at30cr10rr2_sandy.log

xylar commented Dec 19, 2022

I'm afraid I don't see what illegal character(s) this might be.
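
One thing that may be worth checking, offered as a reading of the log rather than a diagnosis: in the messages above, nothing appears between `Defining variable` and `(ndims = ...)`, which would be consistent with an empty or whitespace-only name being passed to PIO. The sketch below pulls the name field out of such lines so it can be inspected with `repr`. The regular expression and the function name `parse_defvar_errors` are assumptions based only on the three lines quoted here, not on SCORPIO's source.

```python
import re

# Pattern inferred from the three log lines quoted in this issue (an assumption,
# not taken from SCORPIO itself). The lazy group captures whatever sits between
# "Defining variable " and " (ndims = N)", which may be the empty string.
PIO_DEFVAR = re.compile(
    r"PIO: ERROR: Defining variable (?P<name>.*?) \(ndims = (?P<ndims>\d+)\) "
    r"in file (?P<file>\S+)"
)


def parse_defvar_errors(lines):
    """Return (name, ndims, filename) for each 'Defining variable' error line."""
    results = []
    for line in lines:
        match = PIO_DEFVAR.search(line)
        if match:
            results.append(
                (match.group("name"), int(match.group("ndims")), match.group("file"))
            )
    return results
```

Applied to the first line quoted above, this returns `('', 1, 'pointwiseStats.nc')`, i.e. an empty name field, which might be where the "illegal characters" complaint comes from.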

sbrus89 commented Dec 19, 2022

It seems like this only occurs on the initial write to the pointwiseStats.nc file:

 Constituent P1
   Frequency 0.725229500000000E-04
   Amplitude 0.468480000000000E-01
   LoveNumbers 0.706000000000000
   NodalAmplitude 1.00000000000000
   Astronomical argument 0.00000000000000
   NodalPhase 1.23913185109541
   Type 1 
 
  -- Reducing field latCell with nElements = 182
  -- Reducing field lonCell with nElements = 182
  -- Reducing field ssh with nElements = 182
ERROR: MPAS IO Error: Bad return value from PIO
ERROR: MPAS IO Error: Bad return value from PIO
ERROR: MPAS IO Error: Bad return value from PIO
 ... Updating 1d real field windSpeedU in stream 
 ... found 1d real named windSpeedU
 ... done updating field
 ... Updating 1d real field windSpeedV in stream 
 ... found 1d real named windSpeedV
 ... done updating field
 ... Updating 1d real field atmosPressure in stream 
 ... found 1d real named atmosPressure
 ... done updating field
 Doing timestep 2012-10-10_00:00:25
 Verifying that cells are not dry... 
 Minimum thickness is 2267.73554688385.
 Done verifying that cells are wet.
 Doing timestep 2012-10-10_00:00:50
 Verifying that cells are not dry... 
 Minimum thickness is 2267.73554688386.
 Done verifying that cells are wet.

Subsequent writes don't have this:

 Verifying that cells are not dry...
 Minimum thickness is 2267.73554331533.
 Done verifying that cells are wet.
  -- Reducing field latCell with nElements = 182
  -- Reducing field lonCell with nElements = 182
  -- Reducing field ssh with nElements = 182
 Doing timestep 2012-10-10_00:30:25
 Verifying that cells are not dry...
 Minimum thickness is 2267.73554304084.
 Done verifying that cells are wet.

xylar commented Dec 19, 2022

In the analysis step, I see:

  * step: analysis

compass calling: compass.ocean.tests.hurricane.analysis.Analysis.run()
  in /gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py

      Failed
Exception raised while running the steps of the test case
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 145, in run_tests
    _run_test(test_case)
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 394, in _run_test
    _run_step(test_case, step, test_case.new_step_log_file)
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 437, in _run_step
    step.run()
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py", line 191, in run
    data[run] = self.read_pointstats(self.pointstats_file[run])
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py", line 103, in read_pointstats
    pointstats_nc.variables['lonCellPointStats'][:])
KeyError: 'lonCellPointStats'

I assume this is related. What do you think?

xylar commented Dec 19, 2022

An ncdump on the file shows that it doesn't have any useful variables in it:

$ ncdump -h pointwiseStats.nc 
netcdf pointwiseStats {
dimensions:
	nPoints = 182 ;
	StrLen = 64 ;
	Time = UNLIMITED ; // (1153 currently)
variables:
	int pointCellGlobalID(nPoints) ;
		pointCellGlobalID:long_name = "List of global cell IDs in point set." ;
	char xtime(Time, StrLen) ;
		xtime:long_name = "model time, with format \'YYYY-MM-DD_HH:MM:SS\'" ;

// global attributes:
		:model_name = "mpas" ;
...
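
A small guard in the analysis step could report every missing variable at once instead of failing on the first `KeyError`. This is only a sketch: `missing_variables` is a hypothetical helper, not part of compass, and of the names in `REQUIRED` only `lonCellPointStats` appears in the traceback; the other two are guesses based on the fields reduced in the forward log. It works on any mapping, such as the `variables` attribute indexed in the traceback.

```python
# Only 'lonCellPointStats' is confirmed by the traceback; the other two names
# are assumed by analogy with the latCell and ssh fields in the forward log.
REQUIRED = ("latCellPointStats", "lonCellPointStats", "sshPointStats")


def missing_variables(variables, required=REQUIRED):
    """Return the names in `required` that are not keys of `variables`."""
    return [name for name in required if name not in variables]


# Stand-in for the two variables listed in the ncdump header above.
header_vars = {"pointCellGlobalID": None, "xtime": None}
missing = missing_variables(header_vars)
if missing:
    print("pointwiseStats.nc is missing:", ", ".join(missing))
```

With the header shown above, all three names would be reported as missing, which matches the observation that the file has no useful variables in it.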

xylar commented Dec 19, 2022

The run timed out and I had to rerun. Any chance that has something to do with it?

xylar commented Dec 19, 2022

Two potentially relevant things have changed recently. First, I switched the default MPI on Chrysalis to OpenMPI. Second, I built new Spack environments for compass 1.2.0.alpha3. I think the former is more likely than the latter to be the reason the problem has just emerged, even though there aren't any obviously relevant changes to the test case. Nothing changed regarding SCORPIO in the latest Spack build that I can think of, so I don't see why that would be relevant.

It would be easy to build and run with Intel-MPI instead (as I did in my previous tests of hurricane). I'll try that now but it's getting late.

xylar commented Dec 20, 2022

@sbrus89, no luck with Intel-MPI, so that's not the reason. I'm pretty lost as to what could have changed here to cause this problem.

xylar added the ocean label Jan 12, 2023