Denser time series of LAI

I found something!

It’s the use of the filter_bands() process to select the band that leads to the bug. This snippet works as expected, when you directly select the band:

import openeo

# connect to the openEO back-end
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")

# Load data cube from TERRASCOPE
LAI = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2015-07-01", "2022-08-15"],
                               bands="LAI_10M")

# temporal aggregation
LAI_month = LAI.aggregate_temporal_period(period="month", reducer="mean")

# linear interpolation
LAI_month_interpolate = LAI_month.apply_dimension(process="array_interpolate_linear", dimension="t")
res = LAI_month_interpolate.save_result(format="netCDF")
job = res.create_job(title="LAI_maps_Vesdre_interpolatelight")
job.start_job()
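For intuition on what these two steps do, here is a minimal pandas sketch (purely illustrative, with made-up numbers; not the back-end implementation) of monthly mean aggregation followed by linear interpolation along the time dimension:

```python
import numpy as np
import pandas as pd

# Irregular per-scene LAI values for one pixel (hypothetical numbers)
obs = pd.Series(
    [1.2, np.nan, 2.5, 3.1],
    index=pd.to_datetime(["2022-01-05", "2022-02-10", "2022-04-02", "2022-06-20"]),
)

# Step 1: like aggregate_temporal_period(period="month", reducer="mean")
monthly = obs.resample("MS").mean()

# Step 2: like array_interpolate_linear along the "t" dimension
filled = monthly.interpolate(method="time")

print(filled)
```

Months with no (valid) observations come out of step 1 as gaps, which step 2 fills linearly from the neighbouring months.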

But if you select the same band, for the same time period, using filter_bands() on a datacube, you get an error (job id if needed: j-6bc83ac5189f4673834e1fedf39ca090):

import openeo

# connect to the openEO back-end
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")

# Load data cube from TERRASCOPE
datacube = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2015-07-01", "2022-08-15"],
                               bands=["LAI_10M", "SCENECLASSIFICATION_20M"])

LAI = datacube.filter_bands(["LAI_10M"])
# temporal aggregation
LAI_month = LAI.aggregate_temporal_period(period="month", reducer="mean")

# linear interpolation
LAI_month_interpolate = LAI_month.apply_dimension(process="array_interpolate_linear", dimension="t")
res = LAI_month_interpolate.save_result(format="netCDF")
job = res.create_job(title="LAI_maps_Vesdre_interpolatelight")
job.start_job()

I hope this helps

Hmm, that’s interesting.

Another difference is that you also include the “SCENECLASSIFICATION_20M” band in the second snippet.

Does it also fail if you exclude that band from the start? So something like this:

LAI = con.load_collection(...
                              bands=["LAI_10M"])

LAI = LAI.filter_bands(["LAI_10M"])

Just tested this snippet… Successfully!

LAI = con.load_collection("TERRASCOPE_S2_LAI_V2",
                              spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                              temporal_extent=["2015-07-01", "2022-08-15"],
                              bands=["LAI_10M"])

LAI = LAI.filter_bands(["LAI_10M"])

But this failed:

# Load data cube from TERRASCOPE
datacube = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2015-07-01", "2022-08-15"],
                               bands=["LAI_10M", "SCENECLASSIFICATION_20M"])

LAI = datacube.filter_bands(["LAI_10M"])

Hi there,

Can I do something to move things forward? Should I try to do this task in another way? At first glance, generating interpolated and masked VI time series seems like a typical openEO task?

Hi Adrien,

You’re right, we were a bit slow to look into this, as I was not available for support in the past weeks. Apologies for that.

In any case, your job is running into memory problems due to the long time series. The job with one band does work, probably simply because one band requires less memory.

The best solution for now is to increase memory, which is also described here:

Can you try that? I’m also looking a bit further at the job itself; sometimes there are other ways to reduce memory usage.

best regards,
Jeroen

Hi Adrien,
just want to confirm that it works for me with these settings:


job_options = {
    "executor-memory": "6G",
    "executor-memoryOverhead": "2G",
    "executor-cores": "2"
}
LAI_month_interpolate.execute_batch("lai_adrien.nc", job_options=job_options)

You may also want to try to filter out clouds, which can also reduce memory usage.
For instance, this cloud filter process is quite aggressive, but reduces memory as well:
LAI = LAI.process("mask_scl_dilation", data=datacube, scl_band_name="SCENECLASSIFICATION_20M")
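As a rough per-pixel illustration of what SCL-based cloud masking does (the class numbers follow the Sen2Cor scene classification convention; the dilation step of mask_scl_dilation is not reproduced here, and the array values are hypothetical):

```python
import numpy as np

# Hypothetical 2x3 LAI tile and matching scene-classification band
lai = np.array([[1.5, 2.0, 0.8],
                [3.2, 2.7, 1.1]])
scl = np.array([[4, 9, 5],
                [3, 4, 8]])  # 4/5 = vegetation/bare soil; 3/8/9 = shadow/clouds

# Mask out cloud-contaminated pixels (Sen2Cor classes 3, 8, 9, 10)
cloudy = np.isin(scl, [3, 8, 9, 10])
lai_masked = np.where(cloudy, np.nan, lai)
print(lai_masked)
```

Masked pixels become no-data, so the downstream monthly mean and interpolation operate only on clear observations, which is also why this can shrink the data volume per partition.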

For these larger jobs (in space or time), it is however always possible that you need to increase memory a bit.

Jeroen,

No need to apologize! You guys are doing a great job with openEO :slight_smile:
My message was just in case I could do something on my own.

Not sure I follow you on this. We shared a lot of snippets there :slight_smile:

Could you please share the last one you tested?

This snippet failed (job-id: j-d12273406d07409c97256289e29c85ad)

(also when I launched on the dev server)

import openeo

# connect to the openEO back-end
con = openeo.connect("https://openeo.vito.be/").authenticate_oidc(provider_id="egi")

# Load data cube from TERRASCOPE_S2_LAI_V2 collection.
datacube = con.load_collection("TERRASCOPE_S2_LAI_V2",
                              spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                              temporal_extent=["2015-07-01", "2022-08-15"],
                              bands=["LAI_10M","SCENECLASSIFICATION_20M"])

LAI = datacube.filter_bands(["LAI_10M"])

LAI_masked = LAI.process("mask_scl_dilation", data=datacube, scl_band_name="SCENECLASSIFICATION_20M")

job_options = {
    "executor-memory": "6G",
    "executor-memoryOverhead": "2G",
    "executor-cores": "2"
}

# temporal aggregation
LAI_masked_month = LAI_masked.aggregate_temporal_period(period="month", reducer="mean")

# interpolation attempt
LAI_masked_month_interpolate = LAI_masked_month.apply_dimension(process="array_interpolate_linear", dimension="t")

# saving results
res_month = LAI_masked_month_interpolate.save_result(format="netCDF")
job_month = res_month.create_job(title="LAI_masked_month", job_options=job_options)
job_month.start_job()

Hi Adrien,

The version I’m running is slightly different, because mask_scl_dilation needs to happen before the filter_bands. However, when I do that, I run into another error that I now have to investigate. (I attached the stack trace below for my own reference.)

connection = openeo.connect("openeo.cloud").authenticate_oidc()
LAI = connection.load_collection("TERRASCOPE_S2_LAI_V2",
                                 spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                                 temporal_extent=["2015-07-01", "2022-08-15"],
                                 bands=["LAI_10M", "SCENECLASSIFICATION_20M"])

LAI_masked = LAI.process("mask_scl_dilation", data=LAI, scl_band_name="SCENECLASSIFICATION_20M")

LAI_masked = LAI_masked.filter_bands(["LAI_10M"])
LAI_month = LAI_masked.aggregate_temporal_period(period="month", reducer="mean")

# linear interpolation
LAI_month_interpolate = LAI_month.apply_dimension(process="array_interpolate_linear", dimension="t")

job_options = {
    "executor-memory": "6G",
    "executor-memoryOverhead": "2G",
    "executor-cores": "2"
}
LAI_month_interpolate.execute_batch("lai_adrien.nc", job_options=job_options)
java.lang.AssertionError: assertion failed: Band 3 cell type does not match, uint8ud0 != uint8ud127
	at scala.Predef$.assert(Predef.scala:223)
	at geotrellis.raster.ArrayMultibandTile.<init>(ArrayMultibandTile.scala:100)
	at geotrellis.raster.ArrayMultibandTile$.apply(ArrayMultibandTile.scala:46)
	at geotrellis.raster.MultibandTile$.apply(MultibandTile.scala:37)
	at org.openeo.geotrellis.OpenEOProcesses$.org$openeo$geotrellis$OpenEOProcesses$$timeseriesForBand(OpenEOProcesses.scala:45)
	at org.openeo.geotrellis.OpenEOProcesses.$anonfun$applyTimeDimension$7(OpenEOProcesses.scala:130)
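If I read the assertion correctly, the two bands carry different no-data conventions (uint8 with no-data value 0 vs no-data value 127), so GeoTrellis refuses to stack them into one multiband tile. A hypothetical NumPy analogue (not the back-end code) of harmonizing the no-data sentinel before stacking:

```python
import numpy as np

# Two uint8 bands with different no-data sentinels (hypothetical values)
band_a = np.array([10, 0, 30], dtype=np.uint8)   # no-data = 0
band_b = np.array([5, 127, 40], dtype=np.uint8)  # no-data = 127

# Harmonize: map band_b's sentinel onto band_a's convention, then stack
band_b_fixed = np.where(band_b == 127, np.uint8(0), band_b)
stacked = np.stack([band_a, band_b_fixed])
```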