Masked Sentinel 2 standard deviation

Hi, I have a question regarding the calculation of the reflectance standard deviation of a pixel: I’m currently working in Spyder, but the standard deviation I obtain with the “sd” process is (very) different from the one I obtain by downloading the data and computing it locally. Is there any problem with the NaN handling?
I paste below the code I’m using.

import openeo
from openeo.processes import is_nan, not_, sd, lte
import numpy as np

# %% Initialization

connection = openeo.connect("https://openeo.cloud")
connection.authenticate_oidc()

# Input variables

temp_ext = ["2020-01-31", "2020-12-31"]  # temporal range
spat_extent = {"west": 11.577, "south": 44.871, "east": 11.617, "north": 44.911}  # spatial range
band = ["B08"]  # NIR band
maskband = ["CLD", "CLP"]  # cloud mask bands
maskthr = 50  # mask threshold
filt = (np.ones([5, 5]) / 25).tolist()  # 5x5 averaging-kernel weights

# S2 datacube: L1C NIR band, only scenes with cloud cover < 70%

datacube = connection.load_collection(
    collection_id="SENTINEL2_L1C_SENTINELHUB",
    spatial_extent=spat_extent,
    temporal_extent=temp_ext,
    bands=band,
    properties={"eo:cloud_cover": lambda x: lte(x, 70)},
)

# mask datacube: S2 L2A cloud probability bands, same scene filter

mask0 = connection.load_collection(
    collection_id="SENTINEL2_L2A_SENTINELHUB",
    spatial_extent=spat_extent,
    temporal_extent=temp_ext,
    bands=maskband,
    properties={"eo:cloud_cover": lambda x: lte(x, 70)},
)
mask0 = mask0 >= maskthr  # threshold each probability band
mask0 = mask0.band(maskband[0]).logical_or(mask0.band(maskband[1]))  # cloudy if either band exceeds it
mask0 = mask0.resample_cube_spatial(datacube)  # align the mask grid with the data cube

# masking data

datacube = datacube.mask(mask0)  # apply the cloud mask

# %% CW mask calculation

s = datacube.apply_dimension(process='sd', dimension='t')  # S2 temporal standard deviation

s.download("std.nc", format="netcdf")
datacube.download("test.nc", format="netcdf")
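
For a local sanity check on the downloaded cube (assuming the masked pixels come back as NaN, which is an assumption here), numpy’s nanstd ignores the gaps while plain std does not:

```python
import numpy as np

# one pixel's NIR time series after cloud masking; NaN marks masked dates
ts = np.array([0.12, np.nan, 0.15, 0.11, np.nan, 0.14])

sd_naive = np.std(ts)      # any NaN poisons the plain standard deviation
sd_masked = np.nanstd(ts)  # ignores the masked observations
```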

I think I found the problem:
I did not rescale the S2 data, so the reflectance values go up to 10000. The standard deviation calculation squares those values, and I guess this exceeds the maximum value openEO can operate with. Is there a way to do the operation without rescaling the data?
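
A sketch of that hypothesis (the backend’s internal dtype is an assumption on my part): Sentinel-2 digital numbers are delivered as 16-bit integers, and squaring DN-scale reflectances in that type wraps around silently:

```python
import numpy as np

dn = np.array([4000, 9000, 10000], dtype=np.int16)  # DN-scale reflectance values

sq_wrapped = dn * dn                 # int16 * int16 stays int16 and wraps on overflow
sq_exact = dn.astype(np.int64) ** 2  # squares computed in a wide enough type
```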

Indeed, that’s a good hypothesis.
Could you perhaps try rescaling anyway, just to confirm it?

datacube.linear_scale_range(0,10000,0,1).apply_dimension(process='sd',dimension='t')

would do it already, you can always scale back afterwards if desired.
If we can confirm this, I can see if there’s a way to also give the correct result for limited data types.
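
Since the standard deviation scales linearly (sd(a·x) = a·sd(x)), whatever you compute on the 0–1 data can be mapped back to DN scale by multiplying by 10000. A quick numpy check of that identity:

```python
import numpy as np

x = np.array([1200.0, 3400.0, 9800.0, 500.0])  # DN-scale reflectance

sd_rescaled = np.std(x / 10000.0)  # sd computed on the 0-1 data
sd_dn = sd_rescaled * 10000.0      # scaled back to DN units
```

In openEO that back-scaling would be a simple multiplication of the result cube (the Python client supports cube arithmetic like `s * 10000`).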

I did not obtain exactly the same results as by downloading the dataset and applying the function externally, but it works fine (the average differences are on the order of 10^-12).
Just a suggestion: it could be useful to add a field to the dataset description reporting the maximum and minimum values of each band, in order to simplify the linear_scale_range operation.


Hi,
I have found other issues:
Following the standard deviation calculation, I need to compute the average value and extract the 5th-percentile value. I tried to do it but ran into some errors.

Regarding the average operation:

rect={
  "type":"FeatureCollection",
  "features":[
    {
      "type":"Feature",
      "geometry": {
        "type":"Polygon",
        "coordinates":[[[11.577,44.871],[11.577,44.911],[11.617,44.911],[11.617,44.871],[11.577,44.871]]]
      },
      "properties":{"name":"area1"}
    }
  ]
}

datacube = datacube.apply_dimension(process='sd', dimension='t')
m = datacube.aggregate_spatial(rect, 'mean')  # aggregate_spatial is a DataCube method
m.download("avrg.nc", format="netcdf")

error:
[500] Internal: Failed to process synchronously on backend vito: OpenEoApiError('[500] unknown: setting an array element with a sequence.') (ref: e261f57e-4f39-4220-8f77-abbf3f806303)

Regarding the quantiles operation: if I understood correctly, it is not directly applicable to the datacube I obtained, since it should be calculated spatially and the “reduce_spatial” operation is not available yet. Therefore, I tried to obtain a vector of data through the flatten_dimensions operation, but without any success. I tried both:

fl=datacube.process(process_id='flatten_dimensions',arguments={'dimensions':['x','y'],'target_dimension':'xy'})
fl.download("flat.nc", format="netcdf")

and

fl=datacube.flatten_dimensions(dimensions=['x','y'],target_dimension='xy')
fl.download("flat.nc", format="netcdf")

but I obtained these error messages:

OpenEoApiError: [500] Internal: Failed to process synchronously on backend vito: OpenEoApiError("[400] ProcessUnsupported: Process with identifier 'flatten_dimensions' is not available in namespace 'backend'. (ref: 3e0b0435-9554-4b8d-b6bf-1cdc89548654)") (ref: 240e5514-f22d-4c36-a9df-5b20fac32293)

OpenEoApiError: [500] Internal: Failed to process synchronously on backend vito: OpenEoApiError("[400] ProcessUnsupported: Process with identifier 'flatten_dimensions' is not available in namespace 'backend'. (ref: 74ae3924-9c96-4b86-9c0f-6347ff6d1271)") (ref: 9cffc967-8437-4f46-917e-ce4af8c0756d)

Am I doing something wrong?

Hi Paolo,
the ‘setting an array element’ error appears to be a bug when writing this time series to netCDF. We’ll have to fix that, but a workaround is to use JSON or CSV in the meantime.

For the quantiles, I’m not entirely sure yet what you’re trying to do, but if you want to have it next to the mean, you can actually specify multiple reducers. This is an example:

from openeo.processes import array_create, mean, quantiles

datacube = datacube.apply_dimension(process='sd', dimension='t')
m = datacube.aggregate_spatial(rect, lambda pixels: array_create([quantiles(pixels, q=3), mean(pixels)]))
json_ts = m.execute()

I did observe an error when trying to set the probabilities parameter of quantiles, but can you let me know whether this would solve your problem?
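
For reference, per the openEO `quantiles` definition, q intervals yield q−1 cut points, so q=3 returns two values. Locally the combined reducer corresponds to something like this (the exact interpolation rule may differ between the backend and numpy):

```python
import numpy as np

pixels = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

cuts = np.percentile(pixels, [100 / 3, 200 / 3])    # the two q=3 cut points
combined = np.concatenate([cuts, [pixels.mean()]])  # array_create([quantiles(...), mean(...)])
```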

thanks,
Jeroen

hi @jeroen.dries ,
Actually, I was trying to follow advice that I found in another discussion, Filter images and reducing spatially - #2 by m.mohr:

I think what you are looking for is the process reduce_spatial: openEO API Processes. Unfortunately, it is not available yet in openEO Platform. A workaround could be another process called flatten_dimensions, which is currently being worked on, but also still needs a bit of time until it’s available.

By reducing the 2D image to a flattened vector, I was going to calculate the 5th percentile through the function reduce_dimension.
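
For what it’s worth, the flattening is only a layout change: the per-date 5th percentile of the flattened x/y vector equals the percentile taken directly over both spatial axes (a numpy sketch, not openEO code):

```python
import numpy as np

rng = np.random.default_rng(42)
cube = rng.random((3, 4, 5))            # (t, y, x) stand-in for the sd cube

flat = cube.reshape(cube.shape[0], -1)  # flatten x and y into one 'xy' dimension
p5_flat = np.percentile(flat, 5, axis=1)
p5_direct = np.percentile(cube, 5, axis=(1, 2))
```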

I tried to run your instruction and it kinda works. I say “kinda” for two reasons:

  1. I also tried to use the “probabilities” parameter and it doesn’t work.
  2. I downloaded the data to compare the results, and they are slightly different: using the “array_create” method I am able to download the results in netCDF as well, but they do not coincide with the ones I obtain by processing the downloaded dataset directly; the difference is on the order of 1–5%.
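
One hedged guess for the 1–5% mismatch: quantile estimators are not unique, and different interpolation rules give percent-level differences on small samples. With numpy, for example (the `method` keyword needs numpy ≥ 1.22):

```python
import numpy as np

x = np.arange(1.0, 11.0)  # ten values: 1..10

p_linear = np.percentile(x, 5)                     # default linear interpolation
p_nearest = np.percentile(x, 5, method="nearest")  # nearest-rank estimator
```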

A third issue:

  3. Once I have obtained the percentiles, I should use the resulting value to create a mask. But I cannot select it: if I try to run a “filter_bands” operation on the quantiles result, an error is raised:

ValueError: Invalid band name/index 'band_0'. Valid names: ['B08']

while if I download the file, the “B08” band is not available at all. A workaround is to simply download the file and read the obtained value outside openEO.
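
As a local sketch of that final step (hypothetical array names; this is what the mask would compute once the percentile value is read from the download):

```python
import numpy as np

rng = np.random.default_rng(0)
sd_img = rng.random((4, 4)) * 0.05  # stand-in for the temporal-sd image

thr = np.percentile(sd_img, 5)      # the 5th-percentile value obtained outside openEO
mask = sd_img <= thr                # True where a pixel is at or below the threshold
```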