Memory - overhead problem

Hey all,

These are examples of the processing IDs which failed multiple times:

  • j-03fb0ef385fd4e0abd809c740416db7c
  • j-fd9d5d47599541999e73456928f4e353

The local error message: **JobFailedException**: Batch job 'j-03fb0ef385fd4e0abd809c740416db7c' didn't finish successfully. Status: error (after 3:11:54).

and the web editor shows the error information: Your batch job failed because workers used too much Python memory. The same task was attempted multiple times. Consider increasing executor-memoryOverhead or contact the developers to investigate.

Thus, I believe this is only a memory problem. This workflow was tested successfully using synchronous download for a smaller spatial and temporal extent. Could you please confirm that this is only related to a memory problem?
It is worth mentioning that I ran the exact same batch job (j-ef42883d631f4784812fc9116b5e6a86) without one extra step and it finished perfectly. The extra step that was added at the end of the failing workflow is: cube_threshold = s2_cube.mask(s2_cube.apply(lambda x: lt(x, 0.75))). Is this process problematic?

Hi Andrea,

there’s indeed a more efficient approach to filter on values within the same datacube. The idea is to use the openEO ‘if’ (if-else) process with a callback on the bands of the datacube. The snippet below tries to illustrate it:

from openeo.processes import if_
s2_cube_a = s2_cube.apply_dimension(dimension="bands", process=lambda x: if_(x < 0.75, x))
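Note that when if_ gets no reject argument it returns null where the condition is false, so values of 0.75 and above simply end up as no-data, without needing a separate mask step.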

The other option is to configure memory, as your job is indeed reaching a level of complexity where the default settings may not be sufficient. The example below shows how to do that. Your job was failing on too little “executor-memory”, so that would be the setting to increase gradually. Of course, increasing memory also increases the cost of the job.

job_options = {
        "executor-memory": "3G",
        "executor-memoryOverhead": "4G",
        "executor-cores": "2"
}
cube.execute_batch(out_format="GTiff", job_options=job_options)

Thanks Jeroen, it works fine with ‘if-else’.

Hey again,
I can see the same memory problem in other jobs, for example the calculation of NDVI and NDWI for a couple of scenes. Is there a more efficient approach to calculate this? The current approach:

s2_cube = append_index(s2_cube,"NDWI") ## index 8
s2_cube = append_index(s2_cube,"NDVI") ## index 9
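For context, append_index roughly does the following (a hypothetical sketch using standard openEO band math from the Python client; the actual helper I use may differ in details):

def append_index(cube, index_name):
    ## Compute a normalized-difference index from two bands of the cube
    if index_name == "NDVI":
        nir, red = cube.band("B08"), cube.band("B04")
        index = (nir - red) / (nir + red)
    elif index_name == "NDWI":
        green, nir = cube.band("B03"), cube.band("B08")
        index = (green - nir) / (green + nir)
    else:
        raise ValueError(index_name)
    ## Label the result as a band and merge it back into the original cube
    return cube.merge_cubes(index.add_dimension("bands", index_name, type="bands"))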

You can check jobs based on these IDs:

  • j-f134e2de601942cba7ace33ceccb3689
  • j-ad7c818280b34b5b9b5e7fb7c1372e8d

Hi Andrea,

it’s already quite efficient. One trick we can add is to change the data type to something smaller. I’m guessing you now get float32 output, and we can scale this for instance to shorts. If your current range is between 0 and 1, you could do:
cube.linear_scale_range(0.0, 1.0, 0, 10000)

It is however not guaranteed to work, so simply increasing memory is also an option. You can try the settings below to add 1GB to each worker:
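For example, something along these lines (the exact values here are an assumption; the idea is to go roughly 1GB above what your job currently gets):

job_options = {
        "executor-memory": "3G",
        "executor-memoryOverhead": "3G",
        "executor-cores": "2"
}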

Hey Jeroen,

Could you please advise me whether the job_options are applied correctly in send_job:

job_options = {
        "executor-memory": "3G",
        "executor-memoryOverhead": "3G",
        "executor-cores": "2"}

s2_cube_save = s2_cube_swf.save_result(format='netCDF') #GTiff #netCDF
my_job = s2_cube_save.send_job(title="s2_cube", job_options=job_options)
results = my_job.start_and_wait().get_results()
results.download_files("s2_cube") 

Hi Andrea,
yes, this looks correct. You can increase the memory values if jobs still give a memory-related error.
If you have a job id, I can always check deeper whether the settings are really picked up.
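You can get the id from the job object right after creating it, for example:

print(my_job.job_id)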

I have tried to scale it to a bigger spatial extent, however it does not work:

start_date           = '2021-06-01'
spatial_extent  = {'west': -74, 'east': -73, 'south': 4, 'north': 5, 'crs': 'epsg:4326'} #colombia

## Get the Sentinel-2 data for a 3 month window.
start_date_dt_object = datetime.strptime(start_date, '%Y-%m-%d')
end_date             = (start_date_dt_object + relativedelta(months = +1)).date() ## End date, 1 month later (1st July 2021)
start_date_exclusion = (start_date_dt_object + relativedelta(months = -2)).date() ## exclusion date, to give a 3 month window.

bands                = ['B02', 'B03', 'B04', 'B08', 'CLP', 'SCL' , 'sunAzimuthAngles', 'sunZenithAngles'] 

s2_cube_scale = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent  = spatial_extent,
    temporal_extent = [start_date_exclusion, end_date],
    bands           = bands)

job_options = {
        "executor-memory": "5G",
        "executor-memoryOverhead": "5G",
        "executor-cores": "3"}

s2_cube_scale_save = s2_cube_scale.save_result(format='netCDF') #GTiff #netCDF
my_job = s2_cube_scale_save.send_job(title="s2_cube_scale", job_options=job_options)
results = my_job.start_and_wait().get_results()
results.download_files("s2_cube_scale") 

This was the error message:

Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 1727, in get_log_entries
    with (self.get_job_output_dir(job_id) / "log").open('r') as f:
  File "/usr/lib64/python3.8/pathlib.py", line 1221, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib64/python3.8/pathlib.py", line 1077, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/data/projects/OpenEO/j-1bf5a22fd5674494b1e295c3a9654d16/log'

I have also tried with a slightly smaller area and more memory, but without success. Can you advise me what should be improved?

start_date           = '2021-06-01'
spatial_extent  = {'west': -74, 'east': -73.5, 'south': 4.5, 'north': 5, 'crs': 'epsg:4326'} #colombia

## Get the Sentinel-2 data for a 3 month window.
start_date_dt_object = datetime.strptime(start_date, '%Y-%m-%d')
end_date             = (start_date_dt_object + relativedelta(months = +1)).date() ## End date, 1 month later (1st July 2021)
start_date_exclusion = (start_date_dt_object + relativedelta(months = -2)).date() ## exclusion date, to give a 3 month window.

bands                = ['B02', 'B03', 'B04', 'B08', 'CLP', 'SCL' , 'sunAzimuthAngles', 'sunZenithAngles'] 

s2_cube_scale = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent  = spatial_extent,
    temporal_extent = [start_date_exclusion, end_date],
    bands           = bands)

job_options = {
        "executor-memory": "100G",
        "executor-memoryOverhead": "100G",
        "executor-cores": "4"}

s2_cube_scale_save = s2_cube_scale.save_result(format='netCDF') #GTiff #netCDF
my_job = s2_cube_scale_save.send_job(title="s2_cube_scale", job_options=job_options)
results = my_job.start_and_wait().get_results()
results.download_files("s2_cube_scale")

I have also tried with s2_cube_scale = s2_cube_scale.linear_scale_range(0.0, 1.0, 0, 10000), but still without success.

Printing the logs gives one entry (id 'error', level 'error') with this message:

Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 1727, in get_log_entries
    with (self.get_job_output_dir(job_id) / "log").open('r') as f:
  File "/usr/lib64/python3.8/pathlib.py", line 1221, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib64/python3.8/pathlib.py", line 1077, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/data/projects/OpenEO/j-3c44cd000ba648a8b6350a0765d8e48c/log'