Masking does not work

The strange thing is that the only place where index=5 is used is:

scl = s2_cube.band("SCL")
mask_scl = (scl == 3)

(band SCL is sixth band in the initial SENTINEL2_L2A_SENTINELHUB cube)

Also strange is that I manage to get a result if I download synchronously (e.g. masked.download("res.nc")), while with the batch job, I also get the same error. I’m still trying to figure out what is going on.

If you are trying to run code and have a concrete problem (e.g. something does not work, or gives unexpected result), it always helps if you state which back-end you are trying on (openeo.vito.be, openeo-dev.vito.be, openeo.cloud, …) because the subtle feature differences might be relevant. Also, if you have the job id of a failing job (like j-c7e85b8f4df94475958e54ccb6d1e73a in your example): that also helps to zoom in on the problem quicker.

1 Like

This is currently our priority as we need to add more functions at the end of workflow, it would be great if you can spend time on this.

While I’m waiting for a batch job to return, some tips to improve the implementation:

This LOOKUPTABLE dictionary with string expressions and usage of eval is not a recommended code style. I think you can do the same in a more maintainable way as follows:

from openeo.processes import exp

LOOKUPTABLE = {
    "tropical": {
        "S1": lambda vv: 1 / (1 + exp(- (-7.17 + (-0.48 * vv)))),
        "S2": lambda ndvi, ndwi: 1 / (1 + exp(- (0.845 + (2.14 * ndvi) + (13.5 * ndwi)))),
        "S1_S2": lambda vv, ndwi: 1 / (1 + exp(- (-2.64 + (-0.23 * vv) + (8.6 * ndwi)))),
        },
    ...

...

def water_function(data):
    return LOOKUPTABLE[zone]["S2"](ndwi=data[8], ndvi=data[9])

This will give less surprises in the long term.

1 Like

Thanks for letting us know, we try to digest in that manner then.

The batch job is more problematic as you know.

it is not related to this topic but I have also experienced a batch job problem related to the S1 collection - missing pixels in raster, however synchronous download provides perfect data.

I’m still working on this and figuring out what the root cause is,
but this might be a workaround at the moment:

The resample_cube_spatial call there is not necessary because exclusion_mask already has the S2 spatial sampling.

If I just do

masked = s1_s2_water.mask(exclusion_mask)

in my testing, it seems I can avoid the error.

I was trying this masked = s1_s2_water.mask(exclusion_mask) send as batch job but it was not working. Did you get results of masked?
We want to use batch_job as my tasks is bigger

If you want to inspect the job:'j-68506a2d7ec24c1a9e939b86c684bb0d'

The workaround I proposed about dropping the resample_cube_spatial indeed isn’t a solution for you problem.

We found another possible workaround however: building the SCL and CLP mask from separate load_collection cubes.

I haven’t tried yet on your complete implementation, but something like this might work:


s2_cube = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent  = spatial_extent,
    temporal_extent = [start_date_exclusion, end_date],
    bands  = ['B02', 'B03', 'B04', 'B08', 'sunAzimuthAngles', 'sunZenithAngles'] )

s2_cube_masking = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent  = spatial_extent,
    temporal_extent = [start_date_exclusion, end_date],
    bands = ['CLP', 'SCL'] )

scl = s2_cube_masking.band("SCL")
mask_scl = (scl == 3) | (scl == 8) | (scl == 9) | (scl == 10) |(scl == 11)
# mask {1 = cloud, 0 = free cloud}
s2_cube = s2_cube.mask(mask_scl)

# CLP (cloud probabilities) based on S2cloudless (160 blocks and 10m resolution)
clp = s2_cube_masking.band("CLP")
mask_clp = (clp / 255) > 0.3  
# mask {1 = cloud, 0 = free cloud}
s2_cube = s2_cube.mask(mask_clp)

note the usage of s2_cube_masking in a couple of places (instead of a single s2_cube)

unfortnuately, it was not working with s2_cube_masking:
connection.job('j-9f19f17fb03c4355b6221980179f4113').logs()

Edited Code:


start_date           = '2021-06-01'
spatial_extent       = {'west': -74.06810760, 'east': -73.90597343, 'south': 4.689864510, 'north': 4.724080996, 'crs': 'epsg:4326'} #colombia
zone                 = "tropical"

start_date_dt_object = datetime.strptime(start_date, '%Y-%m-%d')

end_date             = (start_date_dt_object + relativedelta(months = +1)).date() ## End date, 1 month later (1st Feb. 2021)
start_date_exclusion = (start_date_dt_object + relativedelta(months = -1)).date() 
bands                = ['B02', 'B03', 'B04', 'B08', 'CLP', 'SCL' , 'sunAzimuthAngles', 'sunZenithAngles'] 


LOOKUPTABLE = {
    "tropical": {
        "S1": lambda vv: 1 / (1 + exp(- (-7.17 + (-0.48 * vv)))),
        "S2": lambda ndvi, ndwi: 1 / (1 + exp(- (0.845 + (2.14 * ndvi) + (13.5 * ndwi)))),
        "S1_S2": lambda vv, ndwi: 1 / (1 + exp(- (-2.64 + (-0.23 * vv) + (8.6 * ndwi)))),
        },
    "subtropical": 
       {
        "S1":  lambda vv, vh: 1 / (1 + exp(- (-8.1 + (-0.13 * vv) + (-0.27 * vh)))),
        "S2": lambda ndvi, ndwi: 1 / (1 + exp(- (0.845 + (2.14 * ndvi) + (13.5 * ndwi)))),
        "S1_S2": lambda vv, ndwi: 1 / (1 + exp(- (-2.64 + (-0.23 * vv) + (8.6 * ndwi)))),
        }
    }


s2_cube = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent  = spatial_extent,
    temporal_extent = [start_date_exclusion, end_date],
    bands           = bands)

# Scene classification data, based on Sen2Cor
# scl == 3    Cloud Shadows 
# scl == 8    Clouds medium probability
# scl == 9    Clouds high probability
# scl == 10   Cirrus
# scl == 11   Snow / Ice

s2_cube_masking = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent  = spatial_extent,
    temporal_extent = [start_date_exclusion, end_date],
    bands = ['CLP', 'SCL'] )

scl = s2_cube_masking.band("SCL")
mask_scl = (scl == 3) | (scl == 8) | (scl == 9) | (scl == 10) |(scl == 11)
# mask {1 = cloud, 0 = free cloud}
s2_cube = s2_cube.mask(mask_scl)

# CLP (cloud probabilities) based on S2cloudless (160 blocks and 10m resolution)
clp = s2_cube_masking.band("CLP")
mask_clp = (clp / 255) > 0.3  
# mask {1 = cloud, 0 = free cloud}
s2_cube = s2_cube.mask(mask_clp)


s2_count = s2_cube.filter_bands(bands = ["B08"])
s2_count = s2_count.reduce_dimension(reducer=lambda data: data.count(), dimension = "t")
s2_count = s2_count.rename_labels("bands",["count"])


s2_cube = append_index(s2_cube,"NDWI") ## index 8
s2_cube = append_index(s2_cube,"NDVI") ## index 9


def water_function(data):
    return LOOKUPTABLE[zone]["S2"](ndwi=data[8], ndvi=data[9])


s2_cube_water = s2_cube.reduce_dimension(reducer = water_function, dimension = "bands")
s2_cube_water = s2_cube_water.add_dimension("bands", "water_prob", type = "bands") 


s2_cube_water_threshold = s2_cube_water.apply_dimension(dimension="bands", process=lambda x: if_(x > 0.75, x,0))
s2_cube_water_threshold = s2_cube_water_threshold.rename_labels("bands",["w_T75"])


s2_cube_water_sum = s2_cube_water_threshold.reduce_dimension(reducer = "sum", dimension = "t")
s2_cube_water_sum = s2_cube_water_sum.rename_labels("bands",["sum"])

s2_cube_swf = s2_cube_water_sum.resample_cube_spatial(s2_count) / s2_count
s2_cube_swf = s2_cube_swf.rename_labels("bands",["swf"])

## calculate the median s2 water probability for 1-month.
s2_median_water = s2_cube_water.filter_temporal([start_date, end_date]).median_time()

s2_cube_median = s2_cube.filter_temporal([start_date, end_date]).median_time()


## Get Sentinel-1 data for a 1 month window and convert to ARD data.
s1_cube = connection.load_collection(
    'SENTINEL1_GRD', 
     spatial_extent = spatial_extent, 
     temporal_extent = [start_date, end_date], 
     bands = ['VH','VV'],
     properties = {"polarization": lambda p: p == "DV"})


s1_cube = s1_cube.ard_normalized_radar_backscatter(elevation_model="COPERNICUS_30")

s1_cube = s1_cube.rename_labels("bands",["VH","VV","mask", "incidence_angle"])
s1_cube_mask = s1_cube.band("mask")

# radar shadow mask has value 2
s1_mask_RS= (s1_cube_mask == 2)
s1_cube = s1_cube.mask(s1_mask_RS)

def log_(x):
      return 10*log (x, 10)

s1_median = s1_cube.median_time().apply(log_) 


def s1_water_function(data):
    return LOOKUPTABLE[zone]["S1"](vv=data[1])

s1_median_water = s1_median.reduce_dimension(reducer = s1_water_function, dimension = "bands")


exclusion_mask = (s1_median_water.resample_cube_spatial(s2_cube_swf) > 0.5) & (s2_cube_swf < 0.33)
exclusion_mask = exclusion_mask * 1.0

s1_median_water_mask = s1_median_water.mask(exclusion_mask.resample_cube_spatial(s1_median_water))
s1_median_water_mask = s1_median_water_mask * 1.0



def s1_s2_water_function(data):
    return LOOKUPTABLE[zone]["S1_S2"](vv=data[0], ndwi=data[1])

## Get S1 VV, and S2 NDWI, and apply s1_s2_water function

s1_s2_cube = s1_median.filter_bands(['VV']).resample_cube_spatial(s2_cube_median).merge_cubes(s2_cube_median.filter_bands(['NDWI']))  # works
s1_s2_water = s1_s2_cube.reduce_dimension(reducer = s1_s2_water_function, dimension = "bands").add_dimension("bands", "var", type = "bands")

s1_s2_water = s1_s2_water.rename_labels("bands",["var"])  
s1_s2_water = s1_s2_water.filter_bands(bands = ['var'])

exclusion_mask = exclusion_mask.add_dimension("bands", "var", type = "bands")
exclusion_mask = exclusion_mask.rename_labels("bands",["var"]) 
exclusion_mask = exclusion_mask.filter_bands(bands = ['var'])


masked = s1_s2_water.mask(exclusion_mask)
# s1_s2_mask = (s1_s2_water >= 0)*1.0
# s1_s2_masked = s1_s2_water.mask(s1_s2_water.apply(lambda x: x.eq(0)), replacement = 0)


s1_s2_masked_save = masked.save_result(format='netCDF') #GTiff #netCDF
my_job  = s1_s2_masked_save.send_job(title="masked")
results = my_job.start_and_wait().get_results()
results.download_files('masked')

Thanks for trying that. It doesn’t work yet, but at least we are past the original issue (Invalid band index 5, only 4 bands available.). Now the error is

File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1092, in mask
    return cube.mask(mask=mask, replacement=replacement)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 932, in mask
    mask_pyramid_levels = {
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 933, in <dictcomp>
    k: l.tile_to_layout(layout=self.pyramid.levels[k])
  File "/opt/venv/lib64/python3.8/site-packages/geopyspark/geotrellis/layer.py", line 1819, in tile_to_layout
    raise ValueError("The layout needs to have the same crs as the TiledRasterLayer")
ValueError: The layout needs to have the same crs as the TiledRasterLayer

can you try adding resample_cube_spatial again here:

masked = s1_s2_water.mask(exclusion_mask.resample_cube_spatial(s1_s2_water))

No success - 'j-eccc8fa0910240e0a2b375a5e5c29532'

Still that ValueError('The layout needs to have the same crs as the TiledRasterLayer') issue there.
So there is still a resample issue, but as far as I seen in your process graph, the necessary resample_spatial_cube nodes are there.

Do you think, it is feasible to fix a root cause?

Hi Andrea,
thanks for your patience, the pieces of the puzzle are finally coming together.
So there’s two separate issues that we need to solve, but it could be that the suggested workarounds are also better for you from a performance/accuracy perspective:

  1. The use of this separate s2_cube_masking cube, suggested by Stefaan will avoid loading data from sentinelhub which is then discarded later on. The other option would be to apply the mask in an ‘apply_dimension’ process on the bands. The original option will always be slower than these two alternatives. (Which doesn’t mean we won’t look into the original problem.)

  2. The use of s1_cube.ard_normalized_radar_backscatter generates files in an EPSG:4326 projection. This was an original limitation of card4l backscatter, which is rather strict. If you replace that call with the very similar s1_cube = s1_cube.sar_backscatter(coefficient="gamma0-terrain", mask=True, elevation_model="COPERNICUS_30"), you will get the same backscatter values, and avoid unnecessary reprojection and loss of accuracy and efficiency. Of course, same story here, we will also tackle the problem with using the card4l variant, but in the way that you use it, it seems to add no advantages and only adds trouble, because card4l is more strict with respect to certain metadata files that you are not using.

Can you try this last suggestion as well in your more complex example? I hope that we’re almost there, and can then also solve the root causes.

best regards,
Jeroen

Hey Jeroen,

Thanks for providing the suggestions:

  1. s2_cube_masking does not work. Also, I have tested with s1_s2_water.mask(exclusion_mask.resample_cube_spatial(s1_s2_water))
    You check the porblem: j-7fc5bce0bd894fe490a91488d5cff498
    b) Could you specifiy how exactly to apply mask via apply_dimension?

  2. I have replaced the s1_cube but no success :
    Job 'j-13f0b8556e8c4ed18ee817e6e1e4056c'

This is the code with implementing your suggestions, is there any problem(?):

start_date           = '2021-06-01'
spatial_extent       = {'west': -74.06810760, 'east': -73.90597343, 'south': 4.689864510, 'north': 4.724080996, 'crs': 'epsg:4326'} #colombia
zone                 = "tropical"

start_date_dt_object = datetime.strptime(start_date, '%Y-%m-%d')

end_date             = (start_date_dt_object + relativedelta(months = +1)).date() ## End date, 1 month later (1st Feb. 2021)
start_date_exclusion = (start_date_dt_object + relativedelta(months = -1)).date() 



LOOKUPTABLE = {
    "tropical": {
        "S1": lambda vv: 1 / (1 + exp(- (-7.17 + (-0.48 * vv)))),
        "S2": lambda ndvi, ndwi: 1 / (1 + exp(- (0.845 + (2.14 * ndvi) + (13.5 * ndwi)))),
        "S1_S2": lambda vv, ndwi: 1 / (1 + exp(- (-2.64 + (-0.23 * vv) + (8.6 * ndwi)))),
        },
    "subtropical": 
       {
        "S1":  lambda vv, vh: 1 / (1 + exp(- (-8.1 + (-0.13 * vv) + (-0.27 * vh)))),
        "S2": lambda ndvi, ndwi: 1 / (1 + exp(- (0.845 + (2.14 * ndvi) + (13.5 * ndwi)))),
        "S1_S2": lambda vv, ndwi: 1 / (1 + exp(- (-2.64 + (-0.23 * vv) + (8.6 * ndwi)))),
        }
    }


s2_cube = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent  = spatial_extent,
    temporal_extent = [start_date_exclusion, end_date],
    bands           = ['B02', 'B03', 'B04', 'B08', 'sunAzimuthAngles', 'sunZenithAngles'] )


# This will avoid loading data from sentinelhub which is then discarded later on
s2_cube_masking = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent  = spatial_extent,
    temporal_extent = [start_date_exclusion, end_date],
    bands = ['CLP', 'SCL'] )

# Scene classification data, based on Sen2Cor
# scl == 3    Cloud Shadows 
# scl == 8    Clouds medium probability
# scl == 9    Clouds high probability
# scl == 10   Cirrus
# scl == 11   Snow / Ice

scl = s2_cube_masking.band("SCL")
mask_scl = (scl == 3) | (scl == 8) | (scl == 9) | (scl == 10) |(scl == 11)
# mask {1 = cloud, 0 = free cloud}
s2_cube = s2_cube.mask(mask_scl)

# CLP (cloud probabilities) based on S2cloudless (160 blocks and 10m resolution)
clp = s2_cube_masking.band("CLP")
mask_clp = (clp / 255) > 0.3  
# mask {1 = cloud, 0 = free cloud}
s2_cube = s2_cube.mask(mask_clp)


s2_count = s2_cube.filter_bands(bands = ["B08"])
s2_count = s2_count.reduce_dimension(reducer=lambda data: data.count(), dimension = "t")
s2_count = s2_count.rename_labels("bands",["count"])


s2_cube = append_index(s2_cube,"NDWI") ## index 8
s2_cube = append_index(s2_cube,"NDVI") ## index 9


def water_function(data):
    return LOOKUPTABLE[zone]["S2"](ndwi=data[8], ndvi=data[9])


s2_cube_water = s2_cube.reduce_dimension(reducer = water_function, dimension = "bands")
s2_cube_water = s2_cube_water.add_dimension("bands", "water_prob", type = "bands") 


s2_cube_water_threshold = s2_cube_water.apply_dimension(dimension="bands", process=lambda x: if_(x > 0.75, x,0))
s2_cube_water_threshold = s2_cube_water_threshold.rename_labels("bands",["w_T75"])


s2_cube_water_sum = s2_cube_water_threshold.reduce_dimension(reducer = "sum", dimension = "t")
s2_cube_water_sum = s2_cube_water_sum.rename_labels("bands",["sum"])

s2_cube_swf = s2_cube_water_sum.resample_cube_spatial(s2_count) / s2_count
s2_cube_swf = s2_cube_swf.rename_labels("bands",["swf"])

## calculate the median s2 water probability for 1-month.
s2_median_water = s2_cube_water.filter_temporal([start_date, end_date]).median_time()

s2_cube_median = s2_cube.filter_temporal([start_date, end_date]).median_time()


## Get Sentinel-1 data for a 1 month window and convert to ARD data.
s1_cube = connection.load_collection(
    'SENTINEL1_GRD', 
     spatial_extent = spatial_extent, 
     temporal_extent = [start_date, end_date], 
     bands = ['VH','VV'],
     properties = {"polarization": lambda p: p == "DV"})


# s1_cube = s1_cube.ard_normalized_radar_backscatter(elevation_model="COPERNICUS_30")
s1_cube = s1_cube.sar_backscatter(coefficient="gamma0-terrain", mask=True, elevation_model="COPERNICUS_30")

s1_cube = s1_cube.rename_labels("bands",["VH","VV","mask", "incidence_angle"])
s1_cube_mask = s1_cube.band("mask")

# radar shadow mask has value 2
s1_mask_RS= (s1_cube_mask == 2)
s1_cube = s1_cube.mask(s1_mask_RS)

def log_(x):
      return 10*log (x, 10)

s1_median = s1_cube.median_time().apply(log_) 


def s1_water_function(data):
    return LOOKUPTABLE[zone]["S1"](vv=data[1])

s1_median_water = s1_median.reduce_dimension(reducer = s1_water_function, dimension = "bands")


exclusion_mask = (s1_median_water.resample_cube_spatial(s2_cube_swf) > 0.5) & (s2_cube_swf < 0.33)
exclusion_mask = exclusion_mask * 1.0

s1_median_water_mask = s1_median_water.mask(exclusion_mask.resample_cube_spatial(s1_median_water))
s1_median_water_mask = s1_median_water_mask * 1.0


def s1_s2_water_function(data):
    return LOOKUPTABLE[zone]["S1_S2"](vv=data[0], ndwi=data[1])

## Get S1 VV, and S2 NDWI, and apply s1_s2_water function

s1_s2_cube = s1_median.filter_bands(['VV']).resample_cube_spatial(s2_cube_median).merge_cubes(s2_cube_median.filter_bands(['NDWI']))  # works
s1_s2_water = s1_s2_cube.reduce_dimension(reducer = s1_s2_water_function, dimension = "bands").add_dimension("bands", "var", type = "bands")

s1_s2_water = s1_s2_water.rename_labels("bands",["var"])  
s1_s2_water = s1_s2_water.filter_bands(bands = ['var'])

exclusion_mask = exclusion_mask.add_dimension("bands", "var", type = "bands")
exclusion_mask = exclusion_mask.rename_labels("bands",["var"]) 
exclusion_mask = exclusion_mask.filter_bands(bands = ['var'])

masked = s1_s2_water.mask(exclusion_mask.resample_cube_spatial(s1_s2_water))


s1_s2_masked_save = masked.save_result(format='netCDF') #GTiff #netCDF
my_job  = s1_s2_masked_save.send_job(title="s1_s2_masked_save")
results = my_job.start_and_wait().get_results()
results.download_files('s1_s2_masked_save')

Hi Andrea,
I tried it, in fact, due to no longer loading SCL and CLP in the first cube, we need to adjust the indices.
I also made a minor adjustment to masking with SCL and CLP. This example now works here:

from openeo.processes import exp,log,if_

    connection = openeo.connect("openeo-dev.vito.be").authenticate_oidc()
    start_date = '2021-06-01'
    spatial_extent = {'west': -74.06810760, 'east': -73.90597343, 'south': 4.689864510, 'north': 4.724080996,
                      'crs': 'epsg:4326'}  # colombia
    zone = "tropical"

    start_date_dt_object = datetime.strptime(start_date, '%Y-%m-%d')

    end_date = (start_date_dt_object + relativedelta(months=+1)).date()  ## End date, 1 month later (1st Feb. 2021)
    start_date_exclusion = (start_date_dt_object + relativedelta(months=-1)).date()

    LOOKUPTABLE = {
        "tropical": {
            "S1": lambda vv: 1 / (1 + exp(- (-7.17 + (-0.48 * vv)))),
            "S2": lambda ndvi, ndwi: 1 / (1 + exp(- (0.845 + (2.14 * ndvi) + (13.5 * ndwi)))),
            "S1_S2": lambda vv, ndwi: 1 / (1 + exp(- (-2.64 + (-0.23 * vv) + (8.6 * ndwi)))),
        },
        "subtropical":
            {
                "S1": lambda vv, vh: 1 / (1 + exp(- (-8.1 + (-0.13 * vv) + (-0.27 * vh)))),
                "S2": lambda ndvi, ndwi: 1 / (1 + exp(- (0.845 + (2.14 * ndvi) + (13.5 * ndwi)))),
                "S1_S2": lambda vv, ndwi: 1 / (1 + exp(- (-2.64 + (-0.23 * vv) + (8.6 * ndwi)))),
            }
    }

    s2_cube = connection.load_collection(
        'SENTINEL2_L2A_SENTINELHUB',
        spatial_extent=spatial_extent,
        temporal_extent=[start_date_exclusion, end_date],
        bands=['B02', 'B03', 'B04', 'B08', 'sunAzimuthAngles', 'sunZenithAngles'])

    # This will avoid loading data from sentinelhub which is then discarded later on
    s2_cube_masking = connection.load_collection(
        'SENTINEL2_L2A_SENTINELHUB',
        spatial_extent=spatial_extent,
        temporal_extent=[start_date_exclusion, end_date],
        bands=['CLP', 'SCL'])

    # Scene classification data, based on Sen2Cor
    # scl == 3    Cloud Shadows
    # scl == 8    Clouds medium probability
    # scl == 9    Clouds high probability
    # scl == 10   Cirrus
    # scl == 11   Snow / Ice

    scl = s2_cube_masking.band("SCL")
    mask_scl = (scl == 3) | (scl == 8) | (scl == 9) | (scl == 10) | (scl == 11)
    # mask {1 = cloud, 0 = free cloud}
    #s2_cube = s2_cube.mask(mask_scl)

    # CLP (cloud probabilities) based on S2cloudless (160 blocks and 10m resolution)
    clp = s2_cube_masking.band("CLP")
    mask_clp = mask_scl | (clp / 255) > 0.3
    # mask {1 = cloud, 0 = free cloud}
    s2_cube = s2_cube.mask(mask_clp)

    s2_count = s2_cube.filter_bands(bands=["B08"])
    s2_count = s2_count.reduce_dimension(reducer=lambda data: data.count(), dimension="t")
    s2_count = s2_count.rename_labels("bands", ["count"])

    s2_cube = append_indices(s2_cube, ["NDWI","NDVI"])  ## index 6,7
    #s2_cube = append_index(s2_cube, "NDVI")  ## index 8

    def water_function(data):
        return LOOKUPTABLE[zone]["S2"](ndwi=data[6], ndvi=data[7])

    s2_cube_water = s2_cube.reduce_dimension(reducer=water_function, dimension="bands")
    s2_cube_water = s2_cube_water.add_dimension("bands", "water_prob", type="bands")

    s2_cube_water_threshold = s2_cube_water.apply_dimension(dimension="bands", process=lambda x: if_(x > 0.75, x, 0))
    s2_cube_water_threshold = s2_cube_water_threshold.rename_labels("bands", ["w_T75"])

    s2_cube_water_sum = s2_cube_water_threshold.reduce_dimension(reducer="sum", dimension="t")
    s2_cube_water_sum = s2_cube_water_sum.rename_labels("bands", ["sum"])

    s2_cube_swf = s2_cube_water_sum.resample_cube_spatial(s2_count) / s2_count
    s2_cube_swf = s2_cube_swf.rename_labels("bands", ["swf"])

    ## calculate the median s2 water probability for 1-month.
    s2_median_water = s2_cube_water.filter_temporal([start_date, end_date]).median_time()

    s2_cube_median = s2_cube.filter_temporal([start_date, end_date]).median_time()

    ## Get Sentinel-1 data for a 1 month window and convert to ARD data.
    s1_cube = connection.load_collection(
        'SENTINEL1_GRD',
        spatial_extent=spatial_extent,
        temporal_extent=[start_date, end_date],
        bands=['VH', 'VV'],
        properties={"polarization": lambda p: p == "DV"})

    # s1_cube = s1_cube.ard_normalized_radar_backscatter(elevation_model="COPERNICUS_30")
    s1_cube = s1_cube.sar_backscatter(coefficient="gamma0-terrain", mask=True, elevation_model="COPERNICUS_30")

    s1_cube = s1_cube.rename_labels("bands", ["VH", "VV", "mask", "incidence_angle"])
    s1_cube_mask = s1_cube.band("mask")

    # radar shadow mask has value 2
    s1_mask_RS = (s1_cube_mask == 2)
    s1_cube = s1_cube.mask(s1_mask_RS)

    def log_(x):
        return 10 * log(x, 10)

    s1_median = s1_cube.median_time().apply(log_)

    def s1_water_function(data):
        return LOOKUPTABLE[zone]["S1"](vv=data[1])

    s1_median_water = s1_median.reduce_dimension(reducer=s1_water_function, dimension="bands")

    exclusion_mask = (s1_median_water.resample_cube_spatial(s2_cube_swf) > 0.5) & (s2_cube_swf < 0.33)
    exclusion_mask = exclusion_mask * 1.0

    #s1_median_water_mask = s1_median_water.mask(exclusion_mask.resample_cube_spatial(s1_median_water))
    #s1_median_water_mask = s1_median_water_mask * 1.0

    def s1_s2_water_function(data):
        return LOOKUPTABLE[zone]["S1_S2"](vv=data[0], ndwi=data[1])

    ## Get S1 VV, and S2 NDWI, and apply s1_s2_water function

    s1_s2_cube = s1_median.filter_bands(['VV']).resample_cube_spatial(s2_cube_median).merge_cubes(
        s2_cube_median.filter_bands(['NDWI']))  # works
    s1_s2_water = s1_s2_cube.reduce_dimension(reducer=s1_s2_water_function, dimension="bands").add_dimension("bands",
                                                                                                             "var",
                                                                                                             type="bands")

    s1_s2_water = s1_s2_water.rename_labels("bands", ["var"])
    s1_s2_water = s1_s2_water.filter_bands(bands=['var'])

    exclusion_mask = exclusion_mask.add_dimension("bands", "var", type="bands")
    exclusion_mask = exclusion_mask.rename_labels("bands", ["var"])
    exclusion_mask = exclusion_mask.filter_bands(bands=['var'])

    masked = s1_s2_water.mask(exclusion_mask.resample_cube_spatial(s1_s2_water))

    s1_s2_masked_save = masked.save_result(format='netCDF')  # GTiff #netCDF
    my_job = s1_s2_masked_save.send_job(title="s1_s2_masked_save")
    results = my_job.start_and_wait().get_results()
    results.download_files('s1_s2_masked_save')

The ouput does not seem correct.


s2 cube is causing the smaller spatal extent:

Thanks, so one step further, we’ll figure out why the data isn’t there. So basically the sentinel-2 data is not loaded for the full cube or something? (Or it perhaps gets masked away.)

  1. s2_cube loading - fine

  2. mask_clp looks fine :

  3. Masking - s2_cube is masked by mask_clp then the final masked s2_cube does not look good: