Problems when using apply_neighborhood

Dear all,

following a discussion that we had yesterday morning with @jeroen.verstraelen, I report some issues that I encountered when trying to use apply_neighborhood both for some basic example/test udfs and some more complicated udfs.

Here the main workflow:

from pathlib import Path
import openeo
import geopandas as gpd
import matplotlib.pyplot as plt
import os 
import xarray as xr
from openeo.udf import execute_local_udf
from openeo.processes import and_


# authentication
eoconn = openeo.connect("https://openeo-dev.vito.be")
eoconn.authenticate_oidc()
eoconn.describe_account()

# load the Copernicus fractional snow cover collection
scf = eoconn.load_collection(
    "FRACTIONAL_SNOW_COVER",
    spatial_extent  = {'west':10.728539,
                       'east':11.039333,
                       'south':46.647281,
                       'north':46.796379,
                       'crs':4326},
    temporal_extent=['2023-08-02','2023-08-15'],
    bands=["FSCTOC"]
)

# resample and align to a 500 m grid - should correspond to a MODIS datacube
scf_rsmpl = scf.resample_spatial(resolution=20, projection=32632,
                                        method = "near")
scf_bbox = scf_rsmpl.filter_bbox(west=631910, south=5167310, east=655890, 
                                 north=5184290, crs=32632)

# binarize
binarize = openeo.UDF.from_file('udf-binarize.py', 
                                context={"from_parameter": "context"})
scf_binary = scf_bbox.apply(process=binarize, 
                            context={"snowT": 20})


The code of udf-binarize.py is this (according to the previous discussion Passing user defined parameters to an UDF - #2 by stefaan.lippens):

from openeo.udf import XarrayDataCube

import numpy as np
import xarray as xr


def apply_datacube(cube: XarrayDataCube,
                   context: dict) -> XarrayDataCube:
    """
    If a pixel value is greater or equal then a threshold, will set up as 
    100. If smaller, will be set up as 0.
    
    FSCTOC (Copernicus) cloud values are set as 205 -> this value is kept
    0 (no snow) is set as no data
    """
    
    snowT = context['snowT']
    
    array = cube.get_array()
    

    
    # valid pixel, no cloud, SCF between snowT and 100 : set as 100 
    condition1 = array.notnull() & (array >= snowT) & (array <= 100) & (array!=205)
    array = xr.where(condition1, 100, array)

    # valid pixel, no cloud, SCF between 0 and snowT : set as 0 
    condition2 = array.isnull() | ((array >= 0) & (array < snowT) & (array!=205))
    array = xr.where(condition2, 0, array)

    array = array.astype(np.int16)

    return XarrayDataCube(array)

First issue: when testing udf_modify_spatial in this example User-Defined Functions (UDF) explained — openEO Python Client 0.28.0a1 documentation

udf_code = Path('udf_modify_spatial.py').read_text()

cube_updated = scf_binary.apply_neighborhood(
    lambda data: data.run_udf(udf=udf_code, runtime='Python-Jep', context=dict()),
    size=[
        {'dimension': 'x', 'value': 600, 'unit': 'px'},
        {'dimension': 'y', 'value': 500, 'unit': 'px'}
    ], overlap=[])

cube_updated.download(base_path + os.sep + 'scf_test.nc')

I get an error

OpenEoApiError: [500] Internal: Server error: Exception during Spark execution: jep.JepException: <class 'TypeError'>: unsupported operand type(s) for /: 'NoneType' and 'float' (ref: r-24021421af2e4effa52839680b5d29e5)

Second issue: similarly, I get the same error when testing the udf I created myself to do something similar - an aggregation to 500 m resolution by counting the number of snow covered pixels.



from openeo.udf import XarrayDataCube
from openeo.udf.debug import inspect
from openeo.metadata import CollectionMetadata

import numpy as np
import xarray as xr



def apply_metadata(input_metadata:CollectionMetadata, 
                   context:dict) -> CollectionMetadata:

    pixel_ratio = context["pixel_ratio"]
    
    xstep = input_metadata.get('x','step')
    ystep = input_metadata.get('y','step')
    new_metadata = {
          "x": {"type": "spatial", 
                "axis": "x", 
                "step": xstep/pixel_ratio, 
                "reference_system": 32632},
          "y": {"type": "spatial", 
                "axis": "y", 
                "step": ystep/pixel_ratio, 
                "reference_system": 32632},
          "t": {"type": "temporal"}
    }
    return CollectionMetadata(new_metadata)


def scf(array: np.array, 
        pixel_ratio: int = 20,
        scf_max: bool = False,
        scf_min: bool = False,
        codes: list = [205, 210, 254, 255],
        nv_thres: int = 40) -> np.array:
    
    """
    This function calculates the snow cover fraction from a snow cover map which 
    is given as input as a np.array
    
    Parameters
    ----------
    array : np.array 
        representing the snow cover map
    pixel_ratio : int, optional
        number of pixels to be aggregated to obtain the new resolution 
        (e.g. 20x20)
    scf_max : bool, optional
        if True, consider all the non valid pixels as snow
    scf_min : bool, optional
        if True, consider all the non valid pixels as snow free
    codes : list, optional
        list of the codes assigned as no data value
    nv_thres : int, optional
        max number of pixels which can be no data. If this number is matched or 
        exceeded, the aggregated pixel is assigned as no data
        
    Returns
    -------
    aggregated : np.array
        array with the aggregated fractional snow cover
    
    """
    #x and y dimensions
    assert array.ndim == 2
    
    
    # number of columns and rows of the output aggregated array
    nrows = int(np.shape(array)[0]/pixel_ratio)
    ncols = int(np.shape(array)[1]/pixel_ratio)
    
    #initialize aggregated array
    aggregated = np.ones(shape=(nrows, ncols)) * 255
    
    scf_correct = not(scf_min) and not(scf_max)
    
    # iterate over rows
    y_j = 0
    x_i = 0
    for j in range(0, nrows):     
        # reset column counter
        x_i = 0             
        # iterate over columns
        for i in range(0, ncols):

            # read the slice of the scene matching the current
            # estimator pixel
            data_ij = array[y_j:y_j + pixel_ratio,x_i: x_i + pixel_ratio]
            data_ij[np.isnan(data_ij)] = 0
            
            # check if the pixels are valid
            if any([erc in data_ij for erc in codes]):
                nv_sum = sum([np.sum(data_ij == erc) for erc in codes])
                
                if scf_min:
                    aggregated[j, i] = np.sum(data_ij[data_ij <= 100]) / data_ij.size 
                    
                if scf_max:
                    aggregated[j,i] = (np.sum(data_ij[data_ij <= 100]) + \
                                      nv_sum*100) / data_ij.size 
                
                
                if scf_correct and nv_sum < nv_thres:
                    # calculate snow cover fraction
                    aggregated[j, i] = np.sum(data_ij[data_ij <= 100]) / (data_ij.size - nv_sum)
            else: 
                
                aggregated[j,i] = np.sum(data_ij) / data_ij.size
            # advance column counter by number of high resolution pixels
            # contained in one low resoution pixels
            x_i += pixel_ratio
        
        # advance row counter by number of high resolution pixels
        # contained in one low resoution pixels
        y_j += pixel_ratio
        
    return aggregated
    

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    
    # pixel_ratio = context["pixel_ratio"]
    # scf_max = context["scf_max"]
    # scf_min = context["scf_min"]
    # codes = context["codes"]
    # nv_thres = context["nv_thres"]
    # D = np.array ([datetime.strptime(str(d)[0:10],'%Y-%m-%d') for d in dat.t.values])
    pixel_ratio = 25
    array: xr.DataArray = cube.get_array()

    # We make prediction and transform numpy array back to datacube

    # Pixel size of the original image
    init_pixel_size_x = array.coords['x'][-1] - array.coords['x'][-2]
    init_pixel_size_y = array.coords['y'][-1] - array.coords['y'][-2]

    
        
    # for date in array.coords['t']:
    #     cubearray: xarray.DataArray = array['t'==date].copy()
    cubearray: xr.DataArray = cube.get_array().copy()
    print(cubearray.coords)
    # if cubearray.data.ndim == 3 and cubearray.data.shape[0] == 1:
    cubearray = cubearray[0]
    # print(cubearray.coords)
    # print(cubearray.data.shape)
    predicted_array = scf(cubearray.data, 
                          pixel_ratio=pixel_ratio)
    
    # NEW COORDINATES
    xmin = array.coords['x'].min() - 10 + 250
    xmax = array.coords['x'].max() + 10 - 250
    ymin = array.coords['y'].min() - 10 + 250
    ymax = array.coords['y'].max() + 10 - 250
    coord_x = np.linspace(start = xmin, 
                          stop = xmax,
                          num = predicted_array.shape[-2])
    coord_y = np.linspace(start = ymin, 
                          stop = ymax,
                          num = predicted_array.shape[-1])
    
    predicted_cube = xr.DataArray(predicted_array, 
                                  dims=['x', 'y'], 
                                  coords=dict(x=coord_x, y=coord_y))

    return XarrayDataCube(predicted_cube)

aggregation = Path('udf-scf.py').read_text()

scf_aggregated = scf_binary.apply_neighborhood(
    lambda data: data.run_udf(udf=aggregation, 
                              runtime='Python-Jep',
                             context={"pixel_ratio": 25}),
    size=[
        {'dimension': 'x', 'value': 600, 'unit': 'px'},
        {'dimension': 'y', 'value': 500, 'unit': 'px'}
    ], overlap=[])
scf_aggregated.download(base_path + os.sep + 'scf_aggregated2.nc')

The problems are:

  • I am able to run that udf with apply but I get a wrong output - shift probably due to the chunks + metadata are note changed

  • When using apply_neighborhood I get the previous errors

Thank you in advance!

Hi Valentina,

Jeroen Verstralen just went on vacation. I hear he was going to look into it, but will be back next monday.
Is this urgent? If so I can also take a look into it.

Emile

Hi Valentina,

It looks like the udf_modify_spatial.py script from the documentation is outdated, I logged an issue for that. Here is a version that should work instead:

import xarray
from openeo.udf import XarrayDataCube
from openeo.udf.debug import inspect
from openeo.metadata import CollectionMetadata, SpatialDimension
import numpy as np


def apply_metadata(metadata: CollectionMetadata,
                   context: dict) -> CollectionMetadata:
    """
    Modify metadata according to up-sampling factor
    """
    new_dimensions = metadata._dimensions.copy()
    for index, dim in enumerate(new_dimensions):
        if isinstance(dim, SpatialDimension):
            new_dim = SpatialDimension(name=dim.name,
                                       extent=dim.extent,
                                       crs=dim.crs,
                                       step=dim.step / 2.0)
            new_dimensions[index] = new_dim

    updated_metadata = metadata._clone_and_update(dimensions=new_dimensions)
    return updated_metadata


def fancy_upsample_function(array: np.array, factor: int = 2) -> np.array:
    assert array.ndim == 3
    return array.repeat(factor, axis=-1).repeat(factor, axis=-2)
    

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    array: xarray.DataArray = cube.get_array()

    cubearray: xarray.DataArray = cube.get_array().copy() + 60

    # We make prediction and transform numpy array back to datacube

    # Pixel size of the original image
    init_pixel_size_x = cubearray.coords['x'][-1] - cubearray.coords['x'][-2]
    init_pixel_size_y = cubearray.coords['y'][-1] - cubearray.coords['y'][-2]

    if cubearray.data.ndim == 4:
        cubearray = cubearray[0]
    predicted_array = fancy_upsample_function(cubearray.data, 2)
    inspect(predicted_array, "test message")
    coord_x = np.linspace(start=cube.get_array().coords['x'].min(), stop=cube.get_array().coords['x'].max() + init_pixel_size_x,
                          num=predicted_array.shape[-1], endpoint=False)
    coord_y = np.linspace(start=cube.get_array().coords['y'].min(), stop=cube.get_array().coords['y'].max() + init_pixel_size_y,
                          num=predicted_array.shape[-2], endpoint=False)

    if cube.get_array().data.ndim == 4:
        # Add a new dimension for time.
        predicted_array = np.expand_dims(predicted_array, axis = 0)
        coord_t = cube.get_array().coords['t'].values[:1]
        predicted_cube = xarray.DataArray(predicted_array, dims=['t', 'bands', 'y', 'x'], coords=dict(t=coord_t, x=coord_x, y=coord_y))
        return XarrayDataCube(predicted_cube)

    predicted_cube = xarray.DataArray(predicted_array, dims = ['bands', 'y', 'x'], coords = dict(x = coord_x, y = coord_y))
    return XarrayDataCube(predicted_cube)

Using that script as a basis should also allow you to use apply_neighborhood for your second issue. I will take a look at the shift you mentioned too.

Hi @jeroen.verstraelen,

Thank you very much for the update.

Still, there are some small issues:

  • I suppose that there is a bug on the apply_metadata function. In fact, it returns a slight smaller output (2 pixels less) because the new spatial dimensions should be calculated w.r.t. the new resolution. In other words, extent should not correspond to dim.extent (old extent) but we need to update it… not clear how to do it. Is that extent a list, a tuple or what exactly?

  • I have tried to apply the function to my datacube scf_binary that contains 3 dates, but it returns only the first date as output. I am confused if it should loop automatically over the time dimension.

Thanks,
Valentina

Hi,

I will investigate the 2 pixel discrepancy that you mentioned in the output extent. For the current implementation the input and output extent of your UDF have to remain identical. So it is not possible to use something other than dim.extent. We are however working on making this possible in the future.

The script I provided is a demo to show how apply_metadata works and will require some changes for your usecase. For example, this section ensures that you only select the first date if you are working with a datacube that has 4 dimensions (time, bands, y, x):

if cubearray.data.ndim == 4:
        cubearray = cubearray[0]

It should be possible to keep the time dimension and only change the y, x coordinates/arrays.

Hope that helps,
Jeroen

Hi,

sorry but I did not get how to consider all the time steps.
I see that the provided scripts forces the first time step, but it is not clear to me how to repeat the same operation for all the time steps. Please, could you provide an example script? Also, could you look at the issue related to the 2 pixels discrepancy?

Thanks,
Valentina

Hi Valentina,

You can use this udf with the following apply_neighborhood:

udf = openeo.UDF.from_file('multires_udf.py', runtime="Python-Jep")
result = s2_cube.apply_neighborhood(
    udf,
    size=[
        {"dimension": "x", "value": 256, "unit": "px"},
        {"dimension": "y", "value": 256, "unit": "px"},
    ],
    overlap=[]
)

Hope that helps,
Jeroen

I have also created a github issue for the 2 pixels discrepancy here: