Sentinel-2 data: how to filter by tile or recognize provenance?

michele.claus · 2 October 2023 16:14

Hi all,

In my current use case I’m debugging this simple workflow (notebook using CDSE):
https://github.com/EO-College/cubes-and-clouds/blob/main/lectures/3.1_data_processing/exercises/31_data_processing_cdse.ipynb

I also tried to run the same workflow on openEO Platform with the SENTINEL2_L2A_SENTINELHUB collection, but it used an excessive amount of credits; Sentinel Hub is going to look into this (cc @daniel.thiex).

Finally, I created (almost) the same workflow using load_stac and client-side processing. I'm only missing filter_spatial/mask_polygon, which are not yet available in openeo-processes-dask. Notebook available here:
https://github.com/EO-College/cubes-and-clouds/blob/main/lectures/3.1_data_processing/exercises/31_data_processing_stac.ipynb

The final plot in the linked CDSE notebook looks wrong to us. This might be due to the kind of data that is returned when requesting our bbox.

Our AOI lies between two S2 tiles.

I guess that, when the spatial extent in load_collection is set to the bounds of this polygon, the back-end returns data from both tiles. Without knowing which tile each time step comes from, it's difficult for us to evaluate the data.

After this introduction, the questions:

Is it possible to filter the data by tile ID, similarly to how we filter on the Sentinel-1 orbit or the Sentinel-2 cloud cover?
I guess not, since I tried and got this error:

OpenEoApiError: [500] Internal: Failed to process synchronously on backend vito: OpenEoApiError('[500] Internal: Server error: org.openeo.geotrellissentinelhub.SentinelHubException: Sentinel Hub returned an error
response: HTTP/1.1 400 Bad Request with body: {"code":400,"description":"Querying is not supported on property 'tileId'. Possible properties are 'eo:cloud_cover'."}
request: POST https://services.sentinel-hub.com/api/v1/catalog/1.0.0/search with (possibly abbreviated) body:
{
 "datetime": "2018-02-10T00:00:00Z/2018-02-12T23:59:59.999999999Z",
 "collections": ["sentinel-2-l2a"],
 "filter": "tileId = '32TPS'",
 "intersects": {"type":"Polygon","coordinates":[[[11.020833333333357,46.65359937879776],[11.020833333333357,46.954166666666694],[11.366666666666694,46.95416666666669],[11.366666666666694,46.653599378797765],[11.020833333333357,46.65359937879776]]]},
 "limit": 100,
 "next": null
} (ref: r-2129874c3a5c41d08cde14cd734b98ee)') (ref: r-4ec0c9702e844e3a906f7fcedd9a4381)

Is it possible to tell which tile the data comes from, so that it can be filtered later on?
If load_stac worked properly, I would like to try using the same data as I use locally, but currently this doesn't seem to be possible: https://helpcenter.dataspace.copernicus.eu/hc/en-gb/community/posts/13919325025181--openEO-load-stac-process-not-working-with-public-catalogs?page=1#community_comment_13930291736349

Thanks in advance for the help!
Michele

jeroen.dries · 2 October 2023 17:06

Hi Michele,
on CDSE SENTINEL2_L2A, you can try filtering on the ‘tileId’ property. I also needed it recently and it happens to work there because their catalog has it.

Did you also check the ‘derived-from’ links in the batch job metadata? These should actually include references to the products that were included in the result, which seems to be what you are after.
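If the 'derived-from' hrefs contain the standard Sentinel-2 product names, the tile of each input product can be read straight from the name, since its Txxxxx field is the MGRS tile. A minimal sketch, assuming that naming; the links below are made up. With the openeo Python client, the real list should be reachable via job.get_results().get_metadata()["links"], though the exact href format on CDSE is an assumption worth checking.

```python
import re

# Sentinel-2 product names follow the pattern
#   MMM_MSIXXX_YYYYMMDDTHHMMSS_Nxxyy_ROOO_Txxxxx_<discriminator>.SAFE
# where the "Txxxxx" field is the MGRS tile ID (e.g. T32TPS).
TILE_PATTERN = re.compile(r"_T(\d{2}[A-Z]{3})_")


def tiles_from_links(links):
    """Map each 'derived-from' href to the tile ID found in it (or None)."""
    tiles = {}
    for link in links:
        if link.get("rel") != "derived-from":
            continue  # skip 'self', 'license', etc.
        href = link.get("href", "")
        match = TILE_PATTERN.search(href)
        tiles[href] = match.group(1) if match else None
    return tiles


# Hypothetical links, shaped like the "links" list of the job metadata
links = [
    {"rel": "derived-from",
     "href": "S2A_MSIL2A_20180211T101131_N0206_R022_T32TPS_20180211T122016.SAFE"},
    {"rel": "derived-from",
     "href": "S2A_MSIL2A_20180211T101131_N0206_R022_T32TPT_20180211T122016.SAFE"},
    {"rel": "self", "href": "https://example.org/jobs/123/results"},
]
for href, tile in tiles_from_links(links).items():
    print(tile, href)
```

Grouping the hrefs by tile this way would show whether a given date mixes products from both tiles.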

best regards,
Jeroen

jeroen.dries · 3 October 2023 05:05

By the way, another general recommendation: inspect the raw bands and the cloud masks instead of the computed index, as this may help locate the problem faster.
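One way to follow this advice is to request the SCL (scene classification) band along with the raw bands and check, per time step, how much of the AOI is flagged as cloud. A minimal numpy sketch on a small made-up array; treating classes 3, 8, 9 and 10 as "cloud" is an assumption you may want to adjust.

```python
import numpy as np

# SCL values treated as cloud-related (assumption):
# 3 = cloud shadows, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus.
CLOUD_CLASSES = [3, 8, 9, 10]
NO_DATA = 0


def cloud_fraction_per_timestep(scl):
    """Fraction of valid pixels flagged as cloud, per time step.

    scl: integer array of shape (time, y, x) with SCL values.
    Pixels with SCL == 0 (no data) are excluded from the denominator;
    time steps without any valid pixel yield NaN.
    """
    scl = np.asarray(scl)
    valid = scl != NO_DATA
    cloudy = np.isin(scl, CLOUD_CLASSES)
    n_valid = valid.sum(axis=(1, 2))
    n_cloudy = cloudy.sum(axis=(1, 2))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(n_valid > 0, n_cloudy / n_valid, np.nan)


# Made-up SCL values for three time steps on a 2x2 grid
scl = np.array([
    [[4, 4], [8, 9]],   # 2 of 4 valid pixels cloudy
    [[0, 0], [0, 0]],   # no data at all
    [[0, 4], [10, 5]],  # 1 of 3 valid pixels cloudy
])
print(cloud_fraction_per_timestep(scl))
```

In practice the array could come from the downloaded netCDF (for example via xarray), with the time dimension first.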

michele.claus · 3 October 2023 07:33

Thanks for the suggestion. The filter works for a small temporal extent, but as soon as I extend it to the desired range, the job status is set to error, with no errors in the logs.
Code to reproduce it:

import openeo
from openeo.processes import eq

# Connect to the CDSE openEO back-end and log in
conn = openeo.connect('https://openeo.dataspace.copernicus.eu/').authenticate_oidc()

collection      = 'SENTINEL2_L2A'
spatial_extent  = {'west': 11.020833333333357,
                   'east': 11.366666666666694,
                   'south': 46.653599378797765,
                   'north': 46.954166666666694,
                   'crs': 4326}
# Works:
# temporal_extent = ['2018-02-10', '2018-03-01']
# Fails (job ends in error, no errors in the logs):
temporal_extent = ["2018-02-01", "2018-06-30"]
bands           = ['B03']
# Keep only products from MGRS tile 32TPS ('tileId' is exposed by the CDSE catalog)
properties = {"tileId": lambda x: eq(x, "32TPS")}

s2 = conn.load_collection(collection,
                          spatial_extent=spatial_extent,
                          bands=bands,
                          temporal_extent=temporal_extent,
                          properties=properties)
# Run as a batch job and store the result as netCDF
s2_nc = s2.save_result(format="netCDF")
job = s2_nc.create_job(title="test_data_filter_tileid")
job.start_job()
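Since the short range works and the long one fails, one way to narrow down the problem (a debugging workaround, not a confirmed fix) could be to split the temporal extent into monthly pieces and start one job per piece. The helper below is a hypothetical sketch; it keeps each end date excluded, as openEO does for temporal_extent.

```python
from datetime import date


def monthly_chunks(start, end):
    """Split the half-open interval [start, end) into calendar-month pieces.

    start, end: ISO dates ("YYYY-MM-DD"). Returns [chunk_start, chunk_end]
    pairs whose end is excluded, like openEO's temporal_extent.
    """
    current = date.fromisoformat(start)
    stop = date.fromisoformat(end)
    chunks = []
    while current < stop:
        # First day of the following month
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        chunk_end = min(next_month, stop)
        chunks.append([current.isoformat(), chunk_end.isoformat()])
        current = chunk_end
    return chunks


chunks = monthly_chunks("2018-02-01", "2018-06-30")
for chunk in chunks:
    print(chunk)
```

Each pair could then be passed as temporal_extent to the load_collection call above, with one create_job per chunk, to see which month makes the job fail.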

wolvie1986 · 20 February 2024 09:23

Interesting discussion. Starting from the load_stac function, I'm trying to extract exactly one whole tile with a specific ID.

Is it possible to do this with load_stac?
How should I set spatial_extent so that it covers the entire tile?

michele.claus · 19 April 2024 08:59

Dear @wolvie1986, did you try running my example with spatial_extent = None? You could also reduce the temporal extent to cover only a week.

Edit:
Sorry, I see that you mentioned load_stac. In that case it really depends on which STAC Collection you are using and whether it has the metadata needed to filter by tile ID.
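If the STAC Collection does carry a tile property, one possible approach is to take the union of the bboxes of all items of that tile and use it as spatial_extent. A minimal sketch under that assumption: the property name is catalog-specific ('tileId' is the name mentioned above for CDSE), and the items below are hypothetical with made-up coordinates.

```python
def extent_for_tile(items, tile_id, tile_property):
    """Union of the bboxes of all STAC items whose tile property equals tile_id.

    items: STAC items as plain dicts (GeoJSON Features with "bbox").
    tile_property: catalog-specific name of the tile ID property.
    Returns an openEO-style spatial_extent in WGS84 (EPSG:4326), the CRS
    of STAC bboxes. Bboxes crossing the antimeridian are not handled.
    """
    west = south = float("inf")
    east = north = float("-inf")
    found = False
    for item in items:
        if item.get("properties", {}).get(tile_property) != tile_id:
            continue
        bbox = item["bbox"]
        if len(bbox) == 6:  # 3D bbox: [w, s, zmin, e, n, zmax]
            w, s, _, e, n, _ = bbox
        else:               # 2D bbox: [w, s, e, n]
            w, s, e, n = bbox
        west, south = min(west, w), min(south, s)
        east, north = max(east, e), max(north, n)
        found = True
    if not found:
        raise ValueError(f"no item with {tile_property} == {tile_id!r}")
    return {"west": west, "south": south, "east": east, "north": north,
            "crs": 4326}


# Hypothetical items with made-up coordinates
items = [
    {"bbox": [11.0, 46.0, 11.8, 47.0], "properties": {"tileId": "32TPS"}},
    {"bbox": [11.3, 46.0, 12.0, 47.0], "properties": {"tileId": "32TPS"}},
    {"bbox": [11.0, 46.9, 12.0, 47.9], "properties": {"tileId": "32TPT"}},
]
print(extent_for_tile(items, "32TPS", "tileId"))
```

The union is used because a single acquisition may cover only part of a tile. Neighbouring tiles overlap, so this extent would still pick up items of other tiles unless it is combined with a filter on the tile property, where the catalog supports one.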