[ecco-support] [EXTERNAL] Cost function calculation problems when changing number of cores in ECCOv4-r4

Zhang, Hong (US 398K) hong.zhang at jpl.nasa.gov
Mon Dec 28 10:49:30 EST 2020



> On Dec 28, 2020, at 7:41 AM, Dan Jones <dcjones.work at gmail.com> wrote:
> 
> Hello,
> 
> I am attempting to carry out some scaling tests on an HPC platform using ECCOv4-r4. As part of the scaling exercises, I change the number of cores using the procedure below. 
> 
> If I want to use 360 cores, I use the following procedure. Before compiling, I change the following parameters in SIZE.h:
> 
> sNx = 15
> sNy = 15
> nPx = 360
> nPy = 1
> 
> Based on my experience with ECCOv4-r3, I believe this should give me an executable that uses 360 cores. At run-time, I uncomment the lines following #15x15 nprocs=360 and comment out the lines for the nprocs=96 case. I also have a configuration that uses nprocs=192. In all three cases, the code compiles and runs.
> 
> However, I noticed that the cost function changes dramatically based on the number of cores, and it sometimes returns NaN. In both forward and adjoint mode, I get the following values for "fc" in the file named costfunction0129:
> 
> 96 cores, fc = 6733184.16
> 192 cores, fc = NaN
> 360 cores, fc = 5883940.61
> 
> What am I missing in terms of calculating the cost function and changing the number of cores? Shouldn't fc be the same regardless of the number of cores used?
Hi Dan,
“fc” will change because the “profilesfiles” (the “*.nc” files) are prescribed on a 30x30 grid.
As for the NaN, could it also be related to the profiles, or is it some other issue?

cheers
Hong
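
For anyone repeating this kind of scaling test, the comparison of "fc" across core counts can be scripted. The following is only a minimal sketch in Python. It uses the three values reported in the quoted message, and the names reported_fc and compare_fc are hypothetical helpers, not part of ECCO or MITgcm. It flags NaN runs and gives the relative difference of each run from a chosen reference run.

```python
import math

# fc values exactly as reported in the quoted message, keyed by core count.
reported_fc = {96: 6733184.16, 192: float("nan"), 360: 5883940.61}


def compare_fc(fc_by_cores, reference_cores):
    """Map each core count to its relative fc difference from the reference run.

    Runs whose fc is NaN map to None so they are easy to spot.
    (Hypothetical helper for illustration only.)
    """
    ref = fc_by_cores[reference_cores]
    if math.isnan(ref):
        raise ValueError("reference fc is NaN; pick another reference run")
    result = {}
    for cores, fc in sorted(fc_by_cores.items()):
        result[cores] = None if math.isnan(fc) else (fc - ref) / ref
    return result


if __name__ == "__main__":
    for cores, rel in compare_fc(reported_fc, reference_cores=96).items():
        if rel is None:
            print(f"{cores:4d} cores: fc is NaN")
        else:
            print(f"{cores:4d} cores: relative difference {rel:+.4%}")
```

Taking the 96-core run as the reference is an arbitrary choice made here because it corresponds to the 30x30 case mentioned in the reply.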



