Filter multiple dates with filter_labels

I am trying to extract from a datacube only specific dates.
The filter_temporal process accepts only one time range, so it’s not suitable on its own. (A workaround would be to chain multiple filter_temporal calls and combine the results with merge_cubes, but I guess that is not very efficient.)
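For reference, the chained workaround mentioned above could look roughly like the sketch below. The helper names one_day_intervals and keep_dates are made up for illustration; the only openEO client calls assumed are DataCube.filter_temporal(extent=...) and DataCube.merge_cubes(...). It relies on filter_temporal treating the interval end as exclusive, and it expects at least one date.

```python
from datetime import datetime, timedelta
from functools import reduce


def one_day_intervals(timestamps):
    """Turn ISO timestamps into [start, end) intervals that each cover one day."""
    intervals = []
    for ts in timestamps:
        # Only the calendar date (first 10 characters) is used.
        start = datetime.strptime(ts[:10], "%Y-%m-%d")
        end = start + timedelta(days=1)
        intervals.append([start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")])
    return intervals


def keep_dates(cube, timestamps):
    """Apply one filter_temporal per date, then merge the single-date cubes.

    `cube` is expected to behave like an openeo DataCube (hypothetical helper,
    not part of the openeo client). Requires a non-empty `timestamps` list.
    """
    parts = [cube.filter_temporal(extent=iv) for iv in one_day_intervals(timestamps)]
    return reduce(lambda merged, part: merged.merge_cubes(part), parts)
```

With the cube from the script below, this would be called as keep_dates(cube, dates_to_keep). Each extra date adds another filter_temporal and merge_cubes node to the process graph, which is why it scales poorly for long date lists.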

So, from the Parcel Delineation notebook, I found out that it’s possible to do that by combining filter_labels, date_shift and date_between.

Still, after several trials, I’m not able to get only the dates I want to keep. What am I doing wrong?

import openeo
import xarray as xr
from numpy import datetime_as_string
from openeo import processes as eop

bounding_box_32632_10x10 = dict(
    west=680000, east=680100, south=5151500, north=5151600, crs="EPSG:32632"
)

temporal_interval = ["2022-06-01", "2022-07-01"]

conn = openeo.connect("openeo.cloud").authenticate_oidc()

cube = conn.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=bounding_box_32632_10x10,
    temporal_extent=temporal_interval,
    bands=["B04"],
)
cube.download("sample.nc")

ds = xr.open_dataset("sample.nc")
timesteps = [datetime_as_string(t, unit="s", timezone="UTC") for t in ds.t.values]
print(timesteps)

# Keep the second and fourth dates
dates_to_keep = [timesteps[1], timesteps[3]]

## Create a condition that checks if a date is one of the best timesteps: from https://github.com/Open-EO/openeo-community-examples/blob/815ab0cf4662a1b2be0881f55a9d4896467ed224/python/ParcelDelineation/Parcel%20delineation.ipynb
condition = lambda x : eop.any(
    [
        eop.date_between(
            x = x,
            min = timestep,
            max = eop.date_shift(date=timestep, value=1, unit='day')) 
        for timestep in dates_to_keep
    ]
)

## Filter the dates using the condition
cube_reduced = cube.filter_labels(
    condition = condition,
    dimension = "t"
)
cube_reduced.download("sample_dates5.nc")
print(xr.open_dataset("sample_dates5.nc").t.values)
Authenticated using refresh token.
['2022-06-02T00:00:00Z', '2022-06-05T00:00:00Z', '2022-06-07T00:00:00Z', '2022-06-10T00:00:00Z', '2022-06-12T00:00:00Z', '2022-06-15T00:00:00Z', '2022-06-17T00:00:00Z', '2022-06-20T00:00:00Z', '2022-06-22T00:00:00Z', '2022-06-25T00:00:00Z', '2022-06-27T00:00:00Z', '2022-06-30T00:00:00Z']
['2022-06-02T00:00:00.000000000' '2022-06-05T00:00:00.000000000'
 '2022-06-07T00:00:00.000000000' '2022-06-10T00:00:00.000000000'
 '2022-06-12T00:00:00.000000000' '2022-06-15T00:00:00.000000000'
 '2022-06-17T00:00:00.000000000' '2022-06-20T00:00:00.000000000'
 '2022-06-22T00:00:00.000000000' '2022-06-25T00:00:00.000000000'
 '2022-06-27T00:00:00.000000000' '2022-06-30T00:00:00.000000000']

Hi,

This is an unfortunate case where Terrascope has to fall back to Sentinel Hub to fetch SENTINEL2_L2A data. In that case, we don’t support filter_labels yet.

This case will give a clearer error once this ticket is deployed: https://github.com/Open-EO/openeo-geopyspark-driver/issues/749

A workaround could be to switch to the CDSE backend for Sentinel2: openeo.connect("https://openeo.dataspace.copernicus.eu").authenticate_oidc()

You can also set max_cloud_cover, which avoids the Sentinel Hub fallback:

cube = conn.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=bounding_box_32632_10x10,
    temporal_extent=temporal_interval,
    bands=["B04"],
    max_cloud_cover=80,  # Avoid Sentinel Hub fallback
)

@emile.sonneveld thanks for the feedback. We are not filtering out cloud-covered dates, so the last suggestion is not relevant in our case.

Additionally, we also can’t use CDSE, since @valentina.premier needs a specific collection which is not available there :sweat_smile:

Ah, maybe another workaround:

cube = conn.load_collection(
    "TERRASCOPE_S2_TOC_V2",  # Force use Terrascope
    spatial_extent=bounding_box_32632_10x10,
    temporal_extent=temporal_interval,
    bands=["B04"],
)

If that does not do the trick, we can check it next week. I understood that @valentina.premier will pass by the VITO offices.
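To check quickly whether a given workaround really applied the filter, the returned time labels can be compared with the requested ones. The sketch below is only an illustration: unexpected_dates is a made-up helper, and it compares calendar days by looking at the first 10 characters of each timestamp string. The sample values are taken from the output posted above.

```python
def unexpected_dates(returned, wanted):
    """Return the returned timestamps whose calendar day is not in `wanted`."""
    wanted_days = {str(w)[:10] for w in wanted}
    return [str(r) for r in returned if str(r)[:10] not in wanted_days]


# Requested: the second and fourth dates from the original listing.
wanted = ["2022-06-05T00:00:00Z", "2022-06-10T00:00:00Z"]
# Example of a result that still contains a date that should have been dropped.
returned = [
    "2022-06-02T00:00:00.000000000",
    "2022-06-05T00:00:00.000000000",
    "2022-06-10T00:00:00.000000000",
]

leftover = unexpected_dates(returned, wanted)
print(leftover)  # ['2022-06-02T00:00:00.000000000']
```

An empty list means only the requested dates came back. The t.values array of the downloaded file should also work as the returned argument, since each value is converted with str() first.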