Cloud masking: Sentinel2_L2A_SENTINELHUB with mask and CLM band

I am running a simple job on the VITO dev backend doing cloudmasking and ndvi calculation. The job finishes without error (status = finished), but the result is not findable.
This is what I get when I click on download:

// 20221021163525
// https://openeo-dev.vito.be/openeo/1.1/jobs/j-e60e5c4107394a009b24286aecf8bf8f/results/assets/NmRlNWRiZjY5MjVkNjI5YzQ4N2EzMDhlZGQ5N2Q2ZWFiOTBhZTBjY2NkZTY2MmMzYTEyOTE1ZjRmMTlmMDE5MUBlZ2kuZXU%3D/8724ebed4e8724470922402000e5db8b/openEO.nc?expires=1666967709

{
  "code": "NotFound",
  "id": "r-0d158c807a3b4969994445bd511cf76e",
  "message": "404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again."
}

Here’s the PG in case it’s needed:

{
  "process_graph": {
    "load_collection_IYXPY4136P": {
      "arguments": {
        "bands": [
          "B04",
          "B08"
        ],
        "id": "SENTINEL2_L2A_SENTINELHUB",
        "spatial_extent": {
          "east": 11.53504,
          "north": 46.4459,
          "south": 46.4369,
          "west": 11.52191
        },
        "temporal_extent": [
          "2016-01-01",
          "2021-01-01"
        ]
      },
      "process_id": "load_collection"
    },
    "load_collection_KORGF6766Z": {
      "arguments": {
        "bands": [
          "CLM"
        ],
        "id": "SENTINEL2_L2A_SENTINELHUB",
        "spatial_extent": {
          "east": 11.53504,
          "north": 46.4459,
          "south": 46.4369,
          "west": 11.52191
        },
        "temporal_extent": [
          "2016-01-01",
          "2021-01-01"
        ]
      },
      "process_id": "load_collection"
    },
    "mask_ZLVSM9863H": {
      "arguments": {
        "data": {
          "from_node": "load_collection_IYXPY4136P"
        },
        "mask": {
          "from_node": "load_collection_KORGF6766Z"
        }
      },
      "process_id": "mask"
    },
    "reduce_dimension_AMUNX2227W": {
      "arguments": {
        "data": {
          "from_node": "mask_ZLVSM9863H"
        },
        "dimension": "bands",
        "reducer": {
          "process_graph": {
            "add_XNMVQ5622Q": {
              "arguments": {
                "x": {
                  "from_node": "array_element_UUJKP1906Z"
                },
                "y": {
                  "from_node": "array_element_SSYVV9002Q"
                }
              },
              "process_id": "add"
            },
            "array_element_SSYVV9002Q": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "index": 0,
                "return_nodata": false
              },
              "process_id": "array_element"
            },
            "array_element_UUJKP1906Z": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "index": 1,
                "return_nodata": false
              },
              "process_id": "array_element"
            },
            "divide_XCMSE5607C": {
              "arguments": {
                "x": {
                  "from_node": "subtract_QPLXP6946H"
                },
                "y": {
                  "from_node": "add_XNMVQ5622Q"
                }
              },
              "process_id": "divide",
              "result": true
            },
            "subtract_QPLXP6946H": {
              "arguments": {
                "x": {
                  "from_node": "array_element_UUJKP1906Z"
                },
                "y": {
                  "from_node": "array_element_SSYVV9002Q"
                }
              },
              "process_id": "subtract"
            }
          }
        }
      },
      "process_id": "reduce_dimension"
    },
    "save_result_MIPRN0888O": {
      "arguments": {
        "data": {
          "from_node": "reduce_dimension_AMUNX2227W"
        },
        "format": "netCDF",
        "options": {}
      },
      "process_id": "save_result",
      "result": true
    }
  }
}

I checked the output dir of that job, and it does not contain a openEO.nc file (or any other real result file), so that’s why you get a 404 on that request.

In the job’s logs I also see this:


Oct 21, 2022 @ 16:31:34.030ERROR
No netCDF written to /data/projects/OpenEO/j-e60e5c4107394a009b24286aecf8bf8f/openEO.nc, the datacube was empty.

Thanks for checking so quickly @stefaan.lippens. I’ll check why it’s empty. Maybe something due to the masking.

I checked the process graph step by step.

  • Calculating the NDVI withouth masking works and gives expected results as netcdf (
    j-ca3f6a3540174eeea3ee0128323e438d)
  • Downloading only the band CLM as netcdf works (
    j-de00367064164b669deed846d01ceef5)
  • Using the CLM and the process mask() to mask the clouds returns an empty netcdf. I tried masking before reducing along bands to ndvi and after (same result). (
    j-391a33c4f2ca4885a1c6a6ceff54dbd8;
    j-96f570fd371549f2a52f82ee0373dc53)

The Documentation of the CLM band states that it has: 0 for non-clouds, 1 clouds, 255 no data. So it should work out of the box with the openEO mask process:

…A mask is a raster data cube for which corresponding pixels among data and mask are compared and those pixels in data are replaced whose pixels in mask are non-zero (for numbers) or true (for boolean values). The pixel values are replaced with the value specified for replacement , which defaults to null (no data)…

It is also mentioned in the documentation of CLM band has a resoluiton of 160m, maybe that causes the problem. But since it’s in the same datacube I suspect it to be interoperable. The temporal dimension should also be the same.

Here’s the process graph that I actually want to execute:

{
  "process_graph": {
    "load_collection_IYXPY4136P": {
      "arguments": {
        "bands": [
          "B04",
          "B08"
        ],
        "id": "SENTINEL2_L2A_SENTINELHUB",
        "spatial_extent": {
          "east": 11.53504,
          "north": 46.4459,
          "south": 46.4369,
          "west": 11.52191
        },
        "temporal_extent": [
          "2018-03-01",
          "2018-05-01"
        ]
      },
      "process_id": "load_collection"
    },
    "load_collection_KORGF6766Z": {
      "arguments": {
        "bands": [
          "CLM"
        ],
        "id": "SENTINEL2_L2A_SENTINELHUB",
        "spatial_extent": {
          "east": 11.53504,
          "north": 46.4459,
          "south": 46.4369,
          "west": 11.52191
        },
        "temporal_extent": [
          "2018-03-01",
          "2018-05-01"
        ]
      },
      "process_id": "load_collection"
    },
    "mask_ZLVSM9863H": {
      "arguments": {
        "data": {
          "from_node": "load_collection_IYXPY4136P"
        },
        "mask": {
          "from_node": "load_collection_KORGF6766Z"
        }
      },
      "process_id": "mask"
    },
    "reduce_dimension_AMUNX2227W": {
      "arguments": {
        "data": {
          "from_node": "mask_ZLVSM9863H"
        },
        "dimension": "bands",
        "reducer": {
          "process_graph": {
            "add_XNMVQ5622Q": {
              "arguments": {
                "x": {
                  "from_node": "array_element_UUJKP1906Z"
                },
                "y": {
                  "from_node": "array_element_SSYVV9002Q"
                }
              },
              "process_id": "add"
            },
            "array_element_SSYVV9002Q": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "index": 0,
                "return_nodata": false
              },
              "process_id": "array_element"
            },
            "array_element_UUJKP1906Z": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "index": 1,
                "return_nodata": false
              },
              "process_id": "array_element"
            },
            "divide_XCMSE5607C": {
              "arguments": {
                "x": {
                  "from_node": "subtract_QPLXP6946H"
                },
                "y": {
                  "from_node": "add_XNMVQ5622Q"
                }
              },
              "process_id": "divide",
              "result": true
            },
            "subtract_QPLXP6946H": {
              "arguments": {
                "x": {
                  "from_node": "array_element_UUJKP1906Z"
                },
                "y": {
                  "from_node": "array_element_SSYVV9002Q"
                }
              },
              "process_id": "subtract"
            }
          }
        }
      },
      "process_id": "reduce_dimension"
    },
    "save_result_MIPRN0888O": {
      "arguments": {
        "data": {
          "from_node": "reduce_dimension_AMUNX2227W"
        },
        "format": "netCDF",
        "options": {}
      },
      "process_id": "save_result",
      "result": true
    }
  }
}

There seem to be different issues here.

I managed to reproduce: job finished without error, the result asset listing indeed lists an openEO.nc file, but download fails because it is not available. If the file is not there, we should list it in asset listing I think.

Second issue:

That is indeed a problem. Current API requires an explicit spatial resampling to align both cubes. The backend should have thrown an error because there is a difference in resolution.

Can you try adding a resample_cube_spatial before the masking?

Also, can you try a smaller temporal range? In principle, this shouldn’t be a problem, but I remember comparable issues related to certain dates and I just want to make sure we eliminate that as a possible cause.

I created github ticket about wrongly listing assets that were not generated: Better handling of empty results · Issue #255 · Open-EO/openeo-geopyspark-driver · GitHub

Thanks for the suggestions @stefaan.lippens!
I’ve tried now (with reduced temporal extent to 1 mnth):

  • loading two data cubes and resample_cube_spatial before going into mask, as you suggested (
    j-c5df3bdc4eae44b4be268ba340b78d68)
  • loading only one data cube with spectral bands and the CLM band and filter_bands before going into mask (
    j-8cb012c6639b4861b9fc0d1315a27747)

Both return empty results.

What I noticed when downloading the CLM band as .nc is that the values are 1 and 0, which is denoted as no data. Theoretically 255 should be no data. But don’t know if that affects anything within the processing chain (maybe that just occurs when writing to .nc). E.g.

gdalinfo openEO.nc

Driver: netCDF/Network Common Data Format
Files: openEO.nc
Size is 8, 7
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 32N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]]],
    CONVERSION["UTM zone 32N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Longitude of natural origin",9,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["m",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["m",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting",east,
            ORDER[1],
            LENGTHUNIT["m",1]],
        AXIS["northing",north,
            ORDER[2],
            LENGTHUNIT["m",1]],
    ID["EPSG",32632]]
Data axis to CRS axis mapping: 1,2
Origin = (693600.000000000000000,5146720.000000000000000)
Pixel Size = (160.000000000000000,-160.000000000000000)
Metadata:
  CLM#grid_mapping=crs
  CLM#long_name=CLM
  CLM#units=
  CLM#_FillValue=0
  CLM#_Unsigned=true
  crs#crs_wkt=PROJCS["WGS 84 / UTM zone 32N", GEOGCS["WGS 84", DATUM["World Geodetic System 1984", SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]], UNIT["degree", 0.017453292519943295], AXIS["Geodetic longitude", EAST], AXIS["Geodetic latitude", NORTH], AUTHORITY["EPSG","4326"]], PROJECTION["Transverse_Mercator", AUTHORITY["EPSG","9807"]], PARAMETER["central_meridian", 9.0], PARAMETER["latitude_of_origin", 0.0], PARAMETER["scale_factor", 0.9996], PARAMETER["false_easting", 500000.0], PARAMETER["false_northing", 0.0], UNIT["m", 1.0], AXIS["Easting", EAST], AXIS["Northing", NORTH], AUTHORITY["EPSG","32632"]]
  crs#spatial_ref=PROJCS["WGS 84 / UTM zone 32N", GEOGCS["WGS 84", DATUM["World Geodetic System 1984", SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]], UNIT["degree", 0.017453292519943295], AXIS["Geodetic longitude", EAST], AXIS["Geodetic latitude", NORTH], AUTHORITY["EPSG","4326"]], PROJECTION["Transverse_Mercator", AUTHORITY["EPSG","9807"]], PARAMETER["central_meridian", 9.0], PARAMETER["latitude_of_origin", 0.0], PARAMETER["scale_factor", 0.9996], PARAMETER["false_easting", 500000.0], PARAMETER["false_northing", 0.0], UNIT["m", 1.0], AXIS["Easting", EAST], AXIS["Northing", NORTH], AUTHORITY["EPSG","32632"]]
  NC_GLOBAL#Conventions=CF-1.9
  NC_GLOBAL#description=
  NC_GLOBAL#institution=openEO platform - Geotrellis backend: 0.6.5a1
  NC_GLOBAL#title=
  NETCDF_DIM_EXTRA={t}
  NETCDF_DIM_t_DEF={31,4}
  NETCDF_DIM_t_VALUES={10288,10290,10295,10298,10300,10305,10308,10310,10313,10315,10320,10323,10325,10328,10330,10333,10338,10340,10343,10345,10348,10350,10353,10355,10358,10360,10363,10365,10368,10370,10375}
  t#axis=T
  t#long_name=t
  t#standard_name=t
  t#units=days since 1990-01-01
  x#long_name=x coordinate of projection
  x#standard_name=projection_x_coordinate
  x#units=m
  y#long_name=y coordinate of projection
  y#standard_name=projection_y_coordinate
  y#units=m
Corner Coordinates:
Upper Left  (  693600.000, 5146720.000) ( 11d31'14.26"E, 46d26'46.56"N)
Lower Left  (  693600.000, 5145600.000) ( 11d31'12.59"E, 46d26'10.31"N)
Upper Right (  694880.000, 5146720.000) ( 11d32'14.20"E, 46d26'45.23"N)
Lower Right (  694880.000, 5145600.000) ( 11d32'12.52"E, 46d26' 8.98"N)
Center      (  694240.000, 5146160.000) ( 11d31'43.39"E, 46d26'27.77"N)
Band 1 Block=8x7 Type=Int16, ColorInterp=Undefined
  NoData Value=0
  Metadata:
    grid_mapping=crs
    long_name=CLM
    NETCDF_DIM_t=10288
    NETCDF_VARNAME=CLM
    units=
    _FillValue=0
    _Unsigned=true
Band 2 Block=8x7 Type=Int16, ColorInterp=Undefined
  NoData Value=0
  Metadata:
    grid_mapping=crs
    long_name=CLM
    NETCDF_DIM_t=10290
    NETCDF_VARNAME=CLM
    units=
    _FillValue=0
    _Unsigned=true

Also tried replacing the SENTINEL2_L2A_SENTINELHUB by SENTINEL2_L2A. Got the same empty result.

I’m getting a cloud masking to work using the CLP band (filter_band, apply: divide by 255, gt 0.3 as suggested here Cloud mask provides smaller spatial extent - #2 by bart.driessen). (j-a2ad0f3d530e4d55916ba0deed535501).
So cloud masking is not a blocking issue for me now.
Still testing the CLP and CLD bands a bit. Once I’m happy with a solution I’ll share the process graph and the R-Script.
Still I’d prefer using the CLM as a simple solution and see some value that masking with CLM works as well.