Error in particular case of aggregate_temporal_period + aggregate_spatial

Hi,

I suddenly ran into an error when using aggregate_temporal_period followed by aggregate_spatial. The error only occurs in one specific case. Below is a simplified script that explores 4 “test cases”, each with a different sequence of preprocessing steps. One of them fails consistently.

import openeo
import datetime

connection = openeo.connect("openeo-dev.vito.be").authenticate_oidc()

URL = "https://raw.githubusercontent.com/MargotVerhulst/raw/main/Plotbuffer10_8point_N5_random.json"

# Load datacube
cube = connection.load_collection(
    collection_id="TERRASCOPE_S2_TOC_V2",
    temporal_extent=[datetime.datetime(2018, 1, 1), datetime.datetime(2019, 1, 1)],
    bands=["B02", "B03", "B04", "SCL"])

# Test 1: no cloud masking, spatial aggregation
cube_raw = cube.filter_bands(["B02", "B03", "B04"])  # Remove SCL band
cube_raw_agg = cube_raw.aggregate_spatial(geometries=URL, reducer="mean")

# Test 2: no cloud masking, temporal resampling, spatial aggregation
cube_dek = cube.aggregate_temporal_period(period="dekad", reducer="median")
cube_dek_agg = cube_dek.aggregate_spatial(geometries=URL, reducer="mean")

# Test 3: cloud masking, spatial aggregation
SCL = cube.band("SCL")
mask_scl = ~ ((SCL == 4) | (SCL == 5))
cube_mask1 = cube.mask(mask_scl)
cube_mask1 = cube_mask1.filter_bands(["B02", "B03", "B04"])  # Remove SCL band
cube_mask1_agg = cube_mask1.aggregate_spatial(geometries=URL, reducer="mean")

# Test 4: cloud masking, temporal resampling, spatial aggregation
cube_mask1_dek = cube_mask1.aggregate_temporal_period(period="dekad", reducer="median")
cube_mask1_dek_agg = cube_mask1_dek.aggregate_spatial(geometries=URL, reducer="mean")

# Batch jobs
res1 = cube_raw_agg.save_result(format="JSON")
job1 = res1.send_job(title="testscript_test1")
job1.start_job()

res2 = cube_dek_agg.save_result(format="JSON")
job2 = res2.send_job(title="testscript_test2")
job2.start_job()

res3 = cube_mask1_agg.save_result(format="JSON")
job3 = res3.send_job(title="testscript_test3")
job3.start_job()

res4 = cube_mask1_dek_agg.save_result(format="JSON")
job4 = res4.send_job(title="testscript_test4")
job4.start_job()

id job1: j-1cda64f0beaa4653b012696b26f4e753 → finishes
id job2: j-7d9ab30aabfe4752bd620204bdb8a7c0 → finishes
id job3: j-70e0c772b34c4a49b2019e12bde3c875 → finishes
id job4: j-77bf374100fc4094b369e06ab56fef2f → error

The error log of job4 mentions “ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2”, which I also saw when trying it as a synchronous job.

Any ideas on why this error arises in that particular case?

Thanks in advance.

Hi Margot,

The error was indeed caused by this specific combination: aggregate_temporal_period produces a period that contains only NoData values, and that result is then passed to aggregate_spatial.
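
To illustrate what "a period with only NoData values" means here, the following is a minimal standalone sketch in plain Python. It is not the back-end implementation and does not use the openEO client. The helper names (dekad_start, median_per_dekad) and the sample values are hypothetical, and NaN is assumed to stand in for NoData, such as pixels removed by the cloud mask. A dekad is taken to be days 1-10, 11-20, or 21 to the end of the month.

```python
import datetime
import math
from collections import defaultdict
from statistics import median


def dekad_start(day: datetime.date) -> datetime.date:
    """Return the first day of the dekad (days 1-10, 11-20, 21-end) containing `day`."""
    first = 1 if day.day <= 10 else 11 if day.day <= 20 else 21
    return day.replace(day=first)


def median_per_dekad(observations):
    """Group (date, value) pairs by dekad and take the median of the valid values.

    NaN stands in for NoData. A dekad without any valid value yields NaN.
    """
    groups = defaultdict(list)
    for day, value in observations:
        groups[dekad_start(day)].append(value)
    result = {}
    for start, values in sorted(groups.items()):
        valid = [v for v in values if not math.isnan(v)]
        result[start] = median(valid) if valid else math.nan
    return result


nodata = math.nan
# Hypothetical time series for one pixel: both observations in the
# second dekad of January were masked out as clouds.
observations = [
    (datetime.date(2018, 1, 3), 0.12),
    (datetime.date(2018, 1, 8), 0.10),
    (datetime.date(2018, 1, 13), nodata),
    (datetime.date(2018, 1, 18), nodata),
    (datetime.date(2018, 1, 23), 0.15),
]
dekads = median_per_dekad(observations)

# Dekads that end up with no valid data at all after temporal aggregation.
empty_dekads = [start for start, value in dekads.items() if math.isnan(value)]
print(empty_dekads)
```

In this sketch the dekad starting on 2018-01-11 has no valid observation left after masking. That is the kind of period that triggered the problem once aggregate_spatial was applied afterwards.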

I was able to fix this issue and push it to the development environment:

openeo-dev.vito.be

Could you try running the failed batch job again using this environment?

Hi Jeroen,

I just ran everything again and all jobs finished successfully, so the issue seems to be fixed.
I’m also going to check it again in my original script where I first got the error.
Thank you very much!