Denser time series of LAI

I found something!

It’s the use of the filter_bands() process to select the band that leads to the bug. This snippet works as expected, when you directly select the band:

import openeo

# connect to the openEO back-end
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")

# Load data cube from TERRASCOPE
LAI = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2015-07-01", "2022-08-15"],
                               bands="LAI_10M")

# temporal aggregation
LAI_month = LAI.aggregate_temporal_period(period="month", reducer="mean")

# linear interpolation
LAI_month_interpolate = LAI_month.apply_dimension(process="array_interpolate_linear", dimension="t")
res = LAI_month_interpolate.save_result(format="netCDF")
job = res.create_job(title="LAI_maps_Vesdre_interpolatelight")
job.start_job()
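For intuition on what these two steps do, here is a minimal pandas sketch (purely illustrative, with made-up numbers; not the back-end implementation) of monthly mean aggregation followed by linear interpolation along the time dimension:

```python
import numpy as np
import pandas as pd

# Irregular per-scene LAI values for one pixel (hypothetical numbers)
obs = pd.Series(
    [1.2, np.nan, 2.5, 3.1],
    index=pd.to_datetime(["2022-01-05", "2022-02-10", "2022-04-02", "2022-06-20"]),
)

# Step 1: like aggregate_temporal_period(period="month", reducer="mean")
monthly = obs.resample("MS").mean()

# Step 2: like array_interpolate_linear along the "t" dimension
filled = monthly.interpolate(method="time")

print(filled)
```

Months with no (valid) observations come out of step 1 as gaps, which step 2 fills linearly from the neighbouring months.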

But if you select the same band, for the same time period, using filter_bands() on a datacube, you get an error (job id if needed: j-6bc83ac5189f4673834e1fedf39ca090):

import openeo

# connect to the openEO back-end
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")

# Load data cube from TERRASCOPE
datacube = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2015-07-01", "2022-08-15"],
                               bands=["LAI_10M", "SCENECLASSIFICATION_20M"])

LAI = datacube.filter_bands(["LAI_10M"])
# temporal aggregation
LAI_month = LAI.aggregate_temporal_period(period="month", reducer="mean")

# linear interpolation
LAI_month_interpolate = LAI_month.apply_dimension(process="array_interpolate_linear", dimension="t")
res = LAI_month_interpolate.save_result(format="netCDF")
job = res.create_job(title="LAI_maps_Vesdre_interpolatelight")
job.start_job()

I hope this helps

Hmm, that’s interesting.

Another difference is that you also include the “SCENECLASSIFICATION_20M” band in the second snippet.

Does it also fail if you exclude that band from the start? So something like this:

LAI = con.load_collection(...
                              bands=["LAI_10M"])

LAI = LAI.filter_bands(["LAI_10M"])

Just tested this snippet… Successfully!

LAI = con.load_collection("TERRASCOPE_S2_LAI_V2",
                              spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                              temporal_extent=["2015-07-01", "2022-08-15"],
                              bands=["LAI_10M"])

LAI = LAI.filter_bands(["LAI_10M"])

But this failed:

# Load data cube from TERRASCOPE
datacube = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2015-07-01", "2022-08-15"],
                               bands=["LAI_10M", "SCENECLASSIFICATION_20M"])

LAI = datacube.filter_bands(["LAI_10M"])

Hi there,

Can I do something to move things forward? Should I try to do this task in another way? At first glance, generating interpolated and masked VI time series seems like a typical openEO task?

Hi Adrien,

You’re right, we were a bit slow to look into this, as I was not available for support in the past weeks. Apologies for that.

In any case, your job is running into memory problems due to the long time series. The job with one band does work, probably simply because one band requires less memory.

The best solution for now is to increase memory, which is also described here:

Can you try that? I’m also looking a bit further at the job itself; sometimes there are other ways to reduce memory usage.

best regards,
Jeroen

Hi Adrien,
just want to confirm that it works for me with these settings:


job_options = {
    "executor-memory": "6G",
    "executor-memoryOverhead": "2G",
    "executor-cores": "2"
}
LAI_month_interpolate.execute_batch("lai_adrien.nc", job_options=job_options)

You may also want to try to filter out clouds, which can also reduce memory usage.
For instance, this cloud filter process is quite aggressive, but reduces memory as well:
LAI = LAI.process("mask_scl_dilation", data=datacube, scl_band_name="SCENECLASSIFICATION_20M")
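As a rough per-pixel illustration of what SCL-based cloud masking does (the class numbers follow the Sen2Cor scene classification convention; the dilation step of mask_scl_dilation is not reproduced here, and the array values are hypothetical):

```python
import numpy as np

# Hypothetical 2x3 LAI tile and matching scene-classification band
lai = np.array([[1.5, 2.0, 0.8],
                [3.2, 2.7, 1.1]])
scl = np.array([[4, 9, 5],
                [3, 4, 8]])  # 4/5 = vegetation/bare soil; 3/8/9 = shadow/clouds

# Mask out cloud-contaminated pixels (Sen2Cor classes 3, 8, 9, 10)
cloudy = np.isin(scl, [3, 8, 9, 10])
lai_masked = np.where(cloudy, np.nan, lai)
print(lai_masked)
```

Masked pixels become no-data, so the downstream monthly mean and interpolation operate only on clear observations, which is also why this can shrink the data volume per partition.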

For these larger jobs (in space or time), it is however always possible that you need to increase memory a bit.

Jeroen,

No need to apologize! You guys are doing a great job with openEO :slight_smile:
My message was just in case I could do something on my own.

Not sure I follow you on this. We shared a lot of snippets there :slight_smile:

Could you please share the last one you tested?

This snippet failed (job-id: j-d12273406d07409c97256289e29c85ad)

(also when I launched on the dev server)

import openeo

# connect to the openEO back-end
con = openeo.connect("https://openeo.vito.be/").authenticate_oidc(provider_id="egi")

# Load data cube from TERRASCOPE_S2_LAI_V2 collection.
datacube = con.load_collection("TERRASCOPE_S2_LAI_V2",
                              spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                              temporal_extent=["2015-07-01", "2022-08-15"],
                              bands=["LAI_10M","SCENECLASSIFICATION_20M"])

LAI = datacube.filter_bands(["LAI_10M"])

LAI_masked = LAI.process("mask_scl_dilation", data=datacube, scl_band_name="SCENECLASSIFICATION_20M")

job_options = {
    "executor-memory": "6G",
    "executor-memoryOverhead": "2G",
    "executor-cores": "2"
}

# temporal aggregation
LAI_masked_month = LAI_masked.aggregate_temporal_period(period="month", reducer="mean")

# interpolation attempt
LAI_masked_month_interpolate = LAI_masked_month.apply_dimension(process="array_interpolate_linear", dimension="t")

# saving results
res_month = LAI_masked_month_interpolate.save_result(format="netCDF")
job_month = res_month.create_job(title="LAI_masked_month", job_options=job_options)
job_month.start_job()

Hi Adrien,

The version I’m running is slightly different, because mask_scl_dilation needs to happen before the filter_bands. However, when I do that, I run into another error that I now have to investigate. (I attached the stack trace below for my own reference.)

connection = openeo.connect("openeo.cloud").authenticate_oidc()
LAI = connection.load_collection("TERRASCOPE_S2_LAI_V2",
                                 spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                                 temporal_extent=["2015-07-01", "2022-08-15"],
                                 bands=["LAI_10M", "SCENECLASSIFICATION_20M"])

LAI_masked = LAI.process("mask_scl_dilation", data=LAI, scl_band_name="SCENECLASSIFICATION_20M")

LAI_masked = LAI_masked.filter_bands(["LAI_10M"])
LAI_month = LAI_masked.aggregate_temporal_period(period="month", reducer="mean")

# linear interpolation
LAI_month_interpolate = LAI_month.apply_dimension(process="array_interpolate_linear", dimension="t")

job_options = {
    "executor-memory": "6G",
    "executor-memoryOverhead": "2G",
    "executor-cores": "2"
}
LAI_month_interpolate.execute_batch("lai_adrien.nc", job_options=job_options)
java.lang.AssertionError: assertion failed: Band 3 cell type does not match, uint8ud0 != uint8ud127
	at scala.Predef$.assert(Predef.scala:223)
	at geotrellis.raster.ArrayMultibandTile.<init>(ArrayMultibandTile.scala:100)
	at geotrellis.raster.ArrayMultibandTile$.apply(ArrayMultibandTile.scala:46)
	at geotrellis.raster.MultibandTile$.apply(MultibandTile.scala:37)
	at org.openeo.geotrellis.OpenEOProcesses$.org$openeo$geotrellis$OpenEOProcesses$$timeseriesForBand(OpenEOProcesses.scala:45)
	at org.openeo.geotrellis.OpenEOProcesses.$anonfun$applyTimeDimension$7(OpenEOProcesses.scala:130)
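If I read the assertion correctly, the two bands carry different no-data conventions (uint8 with no-data value 0 vs no-data value 127), so GeoTrellis refuses to stack them into one multiband tile. A hypothetical NumPy analogue (not the back-end code) of harmonizing the no-data sentinel before stacking:

```python
import numpy as np

# Two uint8 bands with different no-data sentinels (hypothetical values)
band_a = np.array([10, 0, 30], dtype=np.uint8)   # no-data = 0
band_b = np.array([5, 127, 40], dtype=np.uint8)  # no-data = 127

# Harmonize: map band_b's sentinel onto band_a's convention, then stack
band_b_fixed = np.where(band_b == 127, np.uint8(0), band_b)
stacked = np.stack([band_a, band_b_fixed])
```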