Filter less cloudy images

If we create a datacube, how can we query this composite to find:

  • the dates of scenes with less than 50% cloud cover?
  • the dates of all scenes?

Is it possible to filter the above query before calling results.download_file("results.nc"), so that we can see the results before downloading all of them?

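import openeo

connection = openeo.connect("https://openeo.cloud").authenticate_oidc()

# bbox = [west, south, east, north] in degrees (example: Bolzano, same area as in the answer below)
bbox = [11.304588, 46.465311, 11.377716, 46.516839]
datacube = connection.load_collection(
    'SENTINEL2_L1C',
    spatial_extent  = {'west':bbox[0],'east':bbox[2],'south':bbox[1],'north':bbox[3]},
    temporal_extent = ("2022-01-01", "2022-01-31"),
    bands           = ["B02","B03","B04"],
)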

Thank you for this really interesting question. The SENTINEL2_L1C collection is a Level-1 dataset, on which you can directly apply only the processes atmospheric_correction and ard_surface_reflectance. Once you have applied one of these two processes, you can also use all other available processes (including a number of filter processes). By default, both processes erase clouds from the scenes. If you do not want to erase the clouds, set options = {"erase_clouds": False} in atmospheric_correction or cloud_detection_options = {"erase_clouds": False} in ard_surface_reflectance. You could then compare a scene where the clouds were erased with the same scene where they were not. Unfortunately, it is not possible to filter by cloud cover before applying atmospheric_correction or ard_surface_reflectance.
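
To make the two variants concrete, here is a minimal sketch that writes them as raw openEO process graphs (plain Python dicts): load SENTINEL2_L1C, run atmospheric_correction with erase_clouds set to False or True, and save as netCDF. The node names ("load", "atmcorr", "save") and the helper name l1c_graph are hypothetical; the Bolzano box is taken from the example further down.

```python
import json


def l1c_graph(bbox, dates, erase_clouds):
    """Hypothetical helper: openEO process graph (as a dict) that loads
    SENTINEL2_L1C, applies atmospheric_correction and saves the result.

    bbox is assumed to be (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    return {
        "process_graph": {
            "load": {
                "process_id": "load_collection",
                "arguments": {
                    "id": "SENTINEL2_L1C",
                    "spatial_extent": {"west": west, "south": south,
                                       "east": east, "north": north},
                    "temporal_extent": list(dates),
                    "bands": ["B02", "B03", "B04"],
                },
            },
            "atmcorr": {
                "process_id": "atmospheric_correction",
                "arguments": {
                    "data": {"from_node": "load"},
                    # False keeps the clouds; by default they are erased
                    "options": {"erase_clouds": erase_clouds},
                },
            },
            "save": {
                "process_id": "save_result",
                "arguments": {"data": {"from_node": "atmcorr"}, "format": "netCDF"},
                "result": True,
            },
        }
    }


bbox = (11.304588, 46.465311, 11.377716, 46.516839)  # Bolzano
with_clouds = l1c_graph(bbox, ("2022-01-01", "2022-01-31"), erase_clouds=False)
cloud_free = l1c_graph(bbox, ("2022-01-01", "2022-01-31"), erase_clouds=True)
print(json.dumps(with_clouds["process_graph"]["atmcorr"], indent=2))
```

Each dict should be accepted as a process graph by connection.create_job(...) in the openeo Python client, giving you the two jobs to compare.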

The dates of the scenes are included in the filenames and the metadata.
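
Since there is one file per timestamp, the list of dates can be recovered from the file names. A minimal sketch, assuming each name contains an ISO date such as "openEO_2022-01-05Z.nc" (this naming scheme is hypothetical; adjust the pattern to the files you actually receive):

```python
import re
from datetime import date

# Hypothetical naming scheme: an ISO date (YYYY-MM-DD) somewhere in each file name
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def scene_dates(filenames):
    """Return the sorted, de-duplicated dates found in the given file names."""
    found = set()
    for name in filenames:
        match = DATE_PATTERN.search(name)
        if match:  # files without a date (e.g. job metadata) are skipped
            year, month, day = (int(g) for g in match.groups())
            found.add(date(year, month, day))
    return sorted(found)


files = ["openEO_2022-01-15Z.nc", "openEO_2022-01-05Z.nc", "job-results.json"]
print(scene_dates(files))
```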

Filtering the query before downloading the results is possible, for example with the filter_bbox process, but for this collection it can only be applied after atmospheric_correction or ard_surface_reflectance. If you want to work with a collection to which atmospheric correction has already been applied, you could use the "boa_sentinel_2" collection.
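
The ordering constraint can be sketched as a raw process graph in which filter_bbox reads from the atmospheric_correction node rather than from load_collection. This is a minimal sketch: the node names, the helper names and the illustrative extents are assumptions.

```python
# Hypothetical graph: load -> atmospheric_correction -> filter_bbox -> save_result
def l1c_filtered_graph(load_extent, filter_extent):
    return {
        "load": {"process_id": "load_collection",
                 "arguments": {"id": "SENTINEL2_L1C", "spatial_extent": load_extent,
                               "temporal_extent": ["2022-01-01", "2022-01-31"]}},
        # for L1C data, filter processes come after the correction step
        "atmcorr": {"process_id": "atmospheric_correction",
                    "arguments": {"data": {"from_node": "load"}}},
        "filter": {"process_id": "filter_bbox",
                   "arguments": {"data": {"from_node": "atmcorr"}, "extent": filter_extent}},
        "save": {"process_id": "save_result",
                 "arguments": {"data": {"from_node": "filter"}, "format": "netCDF"},
                 "result": True},
    }


def upstream(graph, node):
    """Follow the "data" arguments from `node` back to the first node."""
    chain = [node]
    while "data" in graph[node]["arguments"]:
        node = graph[node]["arguments"]["data"]["from_node"]
        chain.append(node)
    return list(reversed(chain))


graph = l1c_filtered_graph(
    {"west": 11.30, "south": 46.46, "east": 11.38, "north": 46.52},  # illustrative
    {"west": 11.32, "south": 46.48, "east": 11.36, "north": 46.50},  # illustrative
)
print(" -> ".join(upstream(graph, "save")))  # load -> atmcorr -> filter -> save
```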

Thanks Vallentina!
If I use atmospheric_correction or ard_surface_reflectance on the S2 collection for an entire year, how can scenes with less than 50% cloud cover be selected?

Yes, seeing the metadata for one scene is fine, but how can I get the dates for a stack of X scenes, i.e., the entire S2 collection covering the month of data loaded in the datacube script above?

The difficulty with selecting scenes with few clouds is that you need to start two jobs, one that erases the clouds and one that does not (as described above), and the filtering can only be done after both jobs have finished. If you plan to run these two jobs for an entire year, this will take quite some time. A faster option is to compare just two scenes (one with clouds, one without). You can find an example of such a comparison in the notebook "OpenEO Platform - OpticalARD-options.ipynb" in the openEOPlatform/SRR1_notebooks repository on GitHub (the figures are close to the bottom, under "Step 4 - Open the new created files").
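
Once both jobs have finished, the comparison per date could look like the following sketch. It assumes that erased cloud pixels become NaN in the cloud-erased output, that both outputs share the same pixel grid, and that you have already read one band per date into nested lists (for example from the netCDF files). These assumptions and the function names are hypothetical, not part of the back-end.

```python
import math


def cloud_fraction(raw, erased):
    """Share of valid pixels in `raw` that are NaN in `erased` (assumed to be cloud)."""
    valid = cloudy = 0
    for raw_row, erased_row in zip(raw, erased):
        for r, e in zip(raw_row, erased_row):
            if math.isnan(r):
                continue  # no data even with clouds kept: ignore the pixel
            valid += 1
            if math.isnan(e):
                cloudy += 1
    return cloudy / valid if valid else 1.0  # no valid pixels: treat as fully cloudy


def less_cloudy_dates(raw_by_date, erased_by_date, max_fraction=0.5):
    """Dates whose cloud fraction is below `max_fraction` (0.5 = 50 %)."""
    return sorted(
        d for d, raw in raw_by_date.items()
        if cloud_fraction(raw, erased_by_date[d]) < max_fraction
    )


nan = float("nan")
raw = {"2022-01-05": [[1.0, 2.0], [3.0, 4.0]], "2022-01-15": [[1.0, 2.0], [3.0, 4.0]]}
erased = {"2022-01-05": [[1.0, nan], [3.0, 4.0]], "2022-01-15": [[nan, nan], [nan, 4.0]]}
print(less_cloudy_dates(raw, erased))  # ['2022-01-05']
```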

Because the files are stored per timestamp, there is currently no option to list all dates in a single file. So the best option really is to look at the metadata. Thank you for pointing that out!
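
If the result metadata of a job (in the openeo Python client, job.get_results().get_metadata() returns it as a dict) exposes one STAC Item per timestamp, the dates can be collected from the standard properties.datetime field. Whether a given back-end provides per-timestamp items is an assumption; this sketch only shows the extraction step on example dicts.

```python
from datetime import datetime


def dates_from_stac_items(items):
    """Collect the sorted acquisition dates from a list of STAC Item dicts."""
    dates = set()
    for item in items:
        stamp = item["properties"].get("datetime")
        if stamp:  # STAC allows datetime = null when a start/end range is given instead
            dates.add(datetime.fromisoformat(stamp.replace("Z", "+00:00")).date())
    return sorted(dates)


items = [
    {"type": "Feature", "id": "b", "properties": {"datetime": "2022-01-15T10:20:00Z"}},
    {"type": "Feature", "id": "a", "properties": {"datetime": "2022-01-05T10:20:00Z"}},
]
print(dates_from_stac_items(items))
```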

Hi @sulova.andrea. Here is an example of how to use the cloud cover as an additional property for filtering the data (I'm filtering out 15 February, when the city of Bolzano was covered by clouds).

This example works with pre-processed L2A data, so if you can work with that directly instead of re-applying the atmospheric correction, this would be enough.

import openeo
from openeo.processes import lte

# openEO Platform back-end
openeoHost = "https://openeo.cloud"
conn = openeo.connect(openeoHost).authenticate_oidc("egi")

# Load pre-processed Sentinel-2 L2A data over Bolzano and keep only the
# scenes whose "eo:cloud_cover" metadata property is <= 50
im = conn.load_collection(
        "SENTINEL2_L2A_SENTINELHUB",
        spatial_extent={'west': 11.304588, 'east': 11.377716, 'south': 46.465311, 'north': 46.516839},
        temporal_extent=["2022-02-14", "2022-02-18"],
        properties={
            "eo:cloud_cover": lambda x: lte(x, 50)
        })

# Save as netCDF and run the processing as a batch job
im_nc = im.save_result(format='netCDF')
job = conn.create_job(im_nc, title="CLOUD_FILTER_TEST_BOZEN2")
job_id = job.job_id
if job_id:
    print("Batch job created with id: ", job_id)
    job.start_job()
else:
    print("Error! Job ID is None")