Denser time series of LAI

Hi

I would like to produce LAI maps for small areas (< 100 km²) at the highest spatio-temporal resolution possible … and with no missing data.

For the moment, I am building monthly mean LAI maps like this:

# temporal aggregation with openEO
library(openeo)

connect(host="https://openeo.vito.be/")
p <- processes()
login()

# create variables for loading collection
AOI <- list(west = 5.6131, south = 50.43213, east = 6.255395, north = 50.67681)
time_period <- c("2015-01-01", "2022-08-01")
collections_dispo <- list_collections()

# load datacube
LAI_HIGH_Res <- p$load_collection(
  id = collections_dispo$TERRASCOPE_S2_LAI_V2$id,
  spatial_extent = AOI,
  temporal_extent = time_period,
  bands=c("LAI_10M","SCENECLASSIFICATION_20M"))

# temporal aggregation
datacube_LAI_month <- p$aggregate_temporal_period(
  LAI_HIGH_Res,
  reducer = function(data, context) { p$mean(data) },
  period = "month"
)

But with this approach, I still have missing values, and even wrong values due to clouds or poor-quality estimates. I have two questions.

  1. How can I mask low-quality LAI values?

  2. How can I produce maps without missing values? For this second concern, I’m thinking about using the lower-resolution Copernicus LAI product, but I would like your advice on how to do it with openEO.

# Could the 300 m LAI product be a good data source to fill the gaps?
LAI_LOW_Res <- p$load_collection(
  id = collections_dispo$CGLS_LAI300_V1_GLOBAL$id,
  spatial_extent = AOI,
  temporal_extent = time_period,
  bands="LAI")

Hi Adrien,

Using the SCENECLASSIFICATION_20M layer you could apply cloud masking and remove bad data. I currently don’t have the bandwidth to produce an example snippet for your use case, but this (Python) notebook contains an example of cloud masking of NDVI data: openeo-python-client/openeo-terrascope-webinar.ipynb at master · Open-EO/openeo-python-client · GitHub

I see you are experimenting with monthly aggregates. This should work after you have applied cloud masking, so that only valid data is aggregated.
On a monthly temporal resolution you still somewhat risk having no valid observations in a certain month. If you also want some output here, you could opt for temporal interpolation of the time series, which will fill the gaps in months without any observation. You can use the array_interpolate_linear process for that. I don’t have an example of that around, but maybe @michele.claus has some pointers?
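The notebook linked above covers the openEO side; as a backend-free sketch of what scene-classification masking does conceptually, here is a small numpy example. Treating classes 4, 5 and 6 as the "keep" classes is an assumption for illustration (borrowed from the masking code later in this thread); check the collection's metadata for the actual class table.

```python
import numpy as np

# Toy 1-D pixel series: one scene-classification value and one LAI value per pixel.
# Treating classes 4/5/6 as usable and everything else as cloud/bad is an
# assumption for illustration; consult the collection metadata for real codes.
scl = np.array([4, 5, 8, 6, 9])
lai = np.array([1.2, 2.5, 7.0, 3.1, 6.5])

keep = np.isin(scl, [4, 5, 6])            # True where the pixel is usable
lai_masked = np.where(keep, lai, np.nan)  # flagged pixels become no-data
```

In openEO terms this corresponds to building a boolean mask from the SCL band and applying it with mask(), so that only valid observations enter the monthly aggregation.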

Thanks!

I’m not sure how to properly access the array (along the time dimension). Do you have material to share (Python or R, I don’t mind)? I tried something, but it is not working (in Python):

import openeo

# load data from openEO
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")

# Load data cube from the TERRASCOPE_S2_LAI_V2 collection.
LAI = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2019-01-01", "2021-12-31"],
                               bands="LAI_10M")
# temporal aggregation
LAI_masked_month = LAI_masked.aggregate_temporal_period(period =  "month" , reducer="mean")

# interpolation attempt
LAI_masked_month_interpolate = LAI_masked_month.apply_dimension(dimension='t').array_interpolate_linear()

Please try this instead of your last line

LAI_masked_month_interpolate = LAI_masked_month.apply_dimension(process = "array_interpolate_linear", dimension = "t")

Your snippet is also missing how LAI_masked is constructed, but I guess you left it out as an exercise to the reader to keep it short?

You’re right… but I cannot edit the code (slow mode). This last snippet is working as expected.

Thanks for your help!

# interpolate VI time series
import openeo

# load data from openEO
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")

# Load data cube from the TERRASCOPE_S2_LAI_V2 collection.
LAI = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2015-07-01", "2022-08-15"],
                               bands="LAI_10M")

# temporal aggregation
LAI_month = LAI.aggregate_temporal_period(period =  "month" , reducer="mean")

# linear interpolation
LAI_month_interpolate = LAI_month.apply_dimension(process = "array_interpolate_linear", dimension = "t")
res = LAI_month_interpolate.save_result(format="netCDF")
job = res.create_job(title = "LAI_maps_Vesdre_interpolatelight")
job.start_job()

I guess this is related to this thread, but I cannot manage to generate an interpolated time series from masked values… I guess this is due to the modification of datacube properties when masking. The error I get when launching the second job is:

error processing batch job Traceback (most recent call last): File “batch_job.py”, line 328, in main run_driver() File “batch_job.py”, line 301, in run_driver run_job( File “/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/utils.py”, line 43, in memory_logging_wrapper return function(*args, **kwargs) File “batch_job.py”, line 397, in run_job assets_metadata = result.write_assets(str(output_file)) File “/opt/venv/lib64/python3.8/site-packages/openeo_driver/save_result.py”, line 110, in write_assets return self.cube.write_assets(filename=directory, format=self.format, format_options=self.options) File “/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py”, line 1691, in write_assets asset_paths = self._get_jvm().org.openeo.geotrellis.netcdf.NetCDFRDDWriter.writeRasters( File “/opt/spark3_2_0/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py”, line 1309, in call return_value = get_return_value( File “/opt/spark3_2_0/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py”, line 326, in get_return_value raise Py4JJavaError( py4j.protocol.Py4JJavaError: An error occurred while calling z:org.openeo.geotrellis.netcdf.NetCDFRDDWriter.writeRasters. 
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 58 in stage 20.0 failed 4 times, most recent failure: Lost task 58.3 in stage 20.0 (TID 1515) (epod027.vgt.vito.be executor 109): ExecutorLostFailure (executor 109 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 135598 ms Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2403) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2352) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2351) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2351) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1109) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1109) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1109) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2591) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2533) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:898) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2214) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2235) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2254) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2279) at 
org.apache.spark.rdd.RDD.count(RDD.scala:1253) at org.openeo.geotrellis.netcdf.NetCDFRDDWriter$.cacheAndRepartition(NetCDFRDDWriter.scala:211) at org.openeo.geotrellis.netcdf.NetCDFRDDWriter$.saveSingleNetCDFGeneric(NetCDFRDDWriter.scala:123) at org.openeo.geotrellis.netcdf.NetCDFRDDWriter$.saveSingleNetCDFGeneric(NetCDFRDDWriter.scala:105) at org.openeo.geotrellis.netcdf.NetCDFRDDWriter$.writeRasters(NetCDFRDDWriter.scala:77) at org.openeo.geotrellis.netcdf.NetCDFRDDWriter.writeRasters(NetCDFRDDWriter.scala) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.base/java.lang.Thread.run(Thread.java:829)

Here is my code, working as expected when not masking…

# load data from openEO
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")

# Load data cube from TERRASCOPE_S2_LAI_V2 collection.
datacube = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2022-01-01", "2022-08-15"],
                               bands=["LAI_10M","SCENECLASSIFICATION_20M"])

# get classification band
SCL = datacube.band("SCENECLASSIFICATION_20M")
LAI = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2022-01-01", "2022-08-15"],
                               bands=["LAI_10M"])

# masking
mask = ~ ((SCL == 4) | (SCL == 5) | (SCL == 6))
LAI_masked = LAI.mask(mask)

# temporal aggregation
LAI_masked_month = LAI_masked.aggregate_temporal_period(period =  "month" , reducer="mean")
LAI_month = LAI.aggregate_temporal_period(period =  "month" , reducer="mean")

# interpolation attempts
LAI_masked_month_interpolate = LAI_masked_month.apply_dimension(process = "array_interpolate_linear", dimension = "t")
LAI_month_interpolate = LAI_month.apply_dimension(process = "array_interpolate_linear", dimension = "t")

# saving results
res = LAI_month_interpolate.save_result(format="netCDF")
job = res.create_job(title = "LAI_unmasked_monthly2")
job.start_job()

res2 = LAI_masked_month_interpolate.save_result(format="netCDF")
job2 = res2.create_job(title = "LAI_masked_monthly2")
job2.start_job()

I wanted to say related to this thread: Filter_bands() VS band() - #6 by stefaan.lippens

I guess the masking process changes the datacube parameters?

I’m a bit short on time to have a closer look, but at first sight your code looks ok and should work.

I would however recommend avoiding these two separate load_collection calls and just doing a single one, e.g.:

datacube = con.load_collection(
    "TERRASCOPE_S2_LAI_V2",
    spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
    temporal_extent=["2022-01-01", "2022-08-15"],
    bands=["LAI_10M","SCENECLASSIFICATION_20M"]
)
SCL = datacube.band("SCENECLASSIFICATION_20M")
LAI = datacube.filter_bands(["LAI_10M"])

I’m afraid however that this will not fix the issue.

Do you also get the error if you only do the aggregate_temporal_period but not the array_interpolate_linear operation?

Nope… It didn’t work! I’m quite lost :thinking:

At the moment, when trying to reproduce your problem, I get this error:

OpenEoApiError: [500] Internal: Server error: 
TypeError('Cannot compare tz-naive and tz-aware timestamps') (ref: r-094595b16c...
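For what it’s worth, that TypeError is plain pandas behaviour and can be reproduced without any openEO code; a minimal sketch of where it comes from:

```python
import pandas as pd

naive = pd.Timestamp("2022-01-01")            # no timezone attached
aware = pd.Timestamp("2022-01-01", tz="UTC")  # timezone-aware

try:
    naive < aware
    message = "no error"
except TypeError as err:
    message = str(err)  # "Cannot compare tz-naive and tz-aware timestamps"
```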

Are you also seeing that, or do you still get that large error dump like this?

error processing batch job Traceback (most recent call last): File “batch_job.py”, line 328, in main run_driver() File “batch_job.py”, line 301, in run_driver run_job( File “/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/utils.py”, line 43, in memory_logging_wrapper return function(*args, **kwargs) File “batch_job.py”, line 397, in run_job assets_

Thanks for the followup!

It looks like there is much more info in the logs (but not the same as yours):

In the meantime I fixed the Cannot compare tz-naive and tz-aware timestamps already (under aggregate_temporal_period: Cannot compare tz-naive and tz-aware timestamps · Issue #138 · Open-EO/openeo-python-driver · GitHub). I’m not sure when this will be available on dev or production (CI pipeline is failing at the moment)

Can I do something on my side about this? Is this a bug to be fixed? I don’t understand what is still buggy, or whether I did something wrong in my code.

At the moment I consider this a bug on our side, so there is not much you can do.

However, I just quickly reran a test script based on your use case (against our development instance at openeo-dev.vito.be ) and it didn’t fail like before.
So could you try again using "openeo-dev.vito.be" as connection URL and report the outcome?

yep, it seems to work now

I’ve played with this use case: openEO forum 452 use case · GitHub

See bottom of that notebook:

  • what you call LAI_masked_month has 80% non-nodata pixels
  • the interpolated version, LAI_masked_month_interpolate, goes to 88% valid pixels

Note that we don’t reach 100% because array_interpolate_linear only interpolates the parts of the time series that have valid data both before and after temporally. It does not extrapolate into missing data at the start or end of the time series.
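To make that concrete, here is a small numpy sketch that mimics (but is not) the backend implementation of array_interpolate_linear: interior no-data gaps get linearly filled, while leading and trailing no-data values are left untouched.

```python
import numpy as np

def interpolate_interior(values):
    """Linearly fill NaNs that have valid data on both sides;
    NaNs at the start or end of the series stay NaN (no extrapolation)."""
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    if valid.sum() < 2:
        return values  # need at least two valid points to interpolate
    idx = np.arange(len(values))
    first, last = idx[valid][0], idx[valid][-1]
    out = values.copy()
    interior = (idx > first) & (idx < last) & ~valid
    out[interior] = np.interp(idx[interior], idx[valid], values[valid])
    return out

series = [np.nan, 1.0, np.nan, 3.0, np.nan]
filled = interpolate_interior(series)
# the interior gap becomes 2.0; the NaNs at both ends remain NaN
```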

Thank you so much for the follow-up… I’m sorry, but this snippet is still throwing errors.

import openeo

# load data from openEO
con = openeo.connect("https://openeo-dev.vito.be").authenticate_oidc(provider_id="egi")

# Load data cube from TERRASCOPE_S2_LAI_V2 collection.
datacube = con.load_collection("TERRASCOPE_S2_LAI_V2",
                               spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
                               temporal_extent=["2015-01-01", "2022-08-15"],
                               bands=["LAI_10M","SCENECLASSIFICATION_20M"])

# get classification band
SCL = datacube.band("SCENECLASSIFICATION_20M")
LAI = datacube.filter_bands(["LAI_10M"])

# masking
mask = ~ ((SCL == 4) | (SCL == 5) | (SCL == 6))
LAI_masked = LAI.mask(mask)

# temporal aggregation
LAI_masked_month = LAI_masked.aggregate_temporal_period(period =  "month" , reducer="mean")
LAI_month = LAI.aggregate_temporal_period(period =  "month" , reducer="mean")

# interpolation attempts
LAI_masked_month_interpolate = LAI_masked_month.apply_dimension(process = "array_interpolate_linear", dimension = "t")
LAI_month_interpolate = LAI_month.apply_dimension(process = "array_interpolate_linear", dimension = "t")

# saving results
res = LAI_month_interpolate.save_result(format="netCDF")
job = res.create_job(title = "LAI_unmasked_monthly2_DEV")
job.start_job()

res2 = LAI_masked_month_interpolate.save_result(format="netCDF")
job2 = res2.create_job(title = "LAI_masked_monthly2_DEV")
job2.start_job()

Do you still have the job-id of that failing job?

yes :slight_smile:

j-daafa35bfa2f4eba8b7b9f654d61dc78

Hmm, your snippet even fails when trying to save_result on the LAI directly (so without masking or aggregation).
It is a pretty large time window, but with monthly aggregation, that should still be reasonable.
Maybe it’s a particular date in the whole range that causes issues. Needs some more investigation.