UDF chunking issue

dkvcsdvd · 26 August 2024 07:55

Hello All,

I am currently developing the pyeogpr package, an ML library that runs on openEO back ends. I created a UDF that does the matrix multiplications for pre-trained models. However, I have an issue with the back end “chunking” up the satellite image for (I guess) parallel processing.

The matrix calculations run on each pixel’s spectrum, which is a 1D array of size 10 (for Sentinel 2 L2C I currently use 10 bands). My issue is that the back end chunks the image into 3D blocks (bands, lat, lon) and does not tell me what their shapes are. Sometimes the spatial sizes are 128 or 256, but for Sentinel 3 OLCI L1B I have also seen 512. See my UDF below:

myudf = openeo.UDF("""
import numpy as np
import xarray

# Placeholder: the real value is a 1D numpy array of length 10 (one entry per band)
hyperparameter = np.zeros(10)

# TODO: use function to obtain x,ydim chunks dynamically instead of hard coding 256
chunks = 256

def broadcaster(array):
    # Expand a (10,) array to (10, chunks, chunks) so it lines up with the image block
    return np.broadcast_to(array[:, np.newaxis, np.newaxis], (10, chunks, chunks))

init_xr = xarray.DataArray()

def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
    hyp_ell_GREEN = broadcaster(hyperparameter)
    pixel_spectra = cube.values  # needed to get the variable "chunks" from here somehow

    # Doing math here with the chunked satellite image and the broadcasted hyperparameter
    resulting_variable = pixel_spectra - hyp_ell_GREEN
""")

I need help with defining the chunk size “dynamically” from within apply_datacube.

Thanks in advance
Dávid

jeroen.verstraelen · 26 August 2024 10:01

Hi,

The apply_neighborhood process sounds like just what you need. Here’s an extract from a community example on parcel delineation:

## Apply the segmentation UDF using `apply_neighborhood`
## An overlap of 32px is used, resulting in a 128x128 pixel input
segmentationband = ndviband.apply_neighborhood(
    process=openeo.UDF.from_file("udf_segmentation.py"),
    size=[
        {"dimension": "x", "value": 64, "unit": "px"},
        {"dimension": "y", "value": 64, "unit": "px"},
    ],
    overlap=[
        {"dimension": "x", "value": 32, "unit": "px"},
        {"dimension": "y", "value": 32, "unit": "px"},
    ],
)
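
As a small sanity check on the comment in the extract, the window size can be worked out from the `size` and `overlap` arguments. This is only a sketch: `window_shape` is a hypothetical helper, not part of openEO, and it assumes the overlap is added on both sides of each dimension, which is what the 128x128 figure in the comment implies.

```python
# Same size/overlap specification as in the apply_neighborhood call above.
size = [
    {"dimension": "x", "value": 64, "unit": "px"},
    {"dimension": "y", "value": 64, "unit": "px"},
]
overlap = [
    {"dimension": "x", "value": 32, "unit": "px"},
    {"dimension": "y", "value": 32, "unit": "px"},
]


def window_shape(size, overlap):
    """Return {dimension: pixels} for the block passed to the UDF.

    Assumption: the overlap is applied on both sides of each dimension.
    """
    pad = {o["dimension"]: o["value"] for o in overlap}
    return {s["dimension"]: s["value"] + 2 * pad.get(s["dimension"], 0) for s in size}


print(window_shape(size, overlap))  # {'x': 128, 'y': 128}
```

With an explicit `size` and `overlap`, the block shape seen by the UDF is chosen by the caller rather than by the back end.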

Hope that helps!

dkvcsdvd · 27 August 2024 08:32

Thanks for your reply, but I don’t understand why I need to use apply_neighborhood. My UDF is far too complex to rewrite; I only need to define the “chunks” variable. I managed to get to the point where I define the shapes like this:


myudf = openeo.UDF(

hyperparameter = 1D numpy array of length 10

def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:

    pixel_spectra = cube.values # after inspecting its shape: 10,128,128
    chunks = cube.values[1] # should be 128
    def broadcaster(array):
        return np.broadcast_to(array[:, np.newaxis, np.newaxis], (10, chunks , chunks ))

    hyp_ell_GREEN = broadcaster(hyperparameter)

    resulting_variable= pixel_spectra - hyperparameter
    # Doing math here with the chunked satellite image and the broadcasted hyperparameter

I would like to approach the problem this way, as it is far more straightforward. However, it gives me an error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

My job id is:
j-240827db892242b69c01d350b86c463a

Thanks
Dávid

jeroen.verstraelen · 28 August 2024 12:02

Hi,

The ValueError is a NumPy error that occurs in the following line:

np.broadcast_to(array[:, np.newaxis, np.newaxis], (10, chunks , chunks ))

Are you sure that your hyperparameter array can be expanded to these dimensions?
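
One possible cause, judging only from the snippet posted above: `cube.values[1]` selects the second band, which is a 2D array of pixels, not the integer 128. Passing an array where `np.broadcast_to` expects an integer size would produce exactly this kind of "truth value of an array" error. A minimal sketch of reading the sizes from the array's shape instead is shown below. It uses plain NumPy; `subtract_per_band` and the `np.arange` hyperparameter are hypothetical stand-ins, and the (bands, y, x) axis order is an assumption.

```python
import numpy as np

# Hypothetical stand-in for the real length-10 hyperparameter vector.
hyperparameter = np.arange(10, dtype=float)


def subtract_per_band(pixel_spectra: np.ndarray) -> np.ndarray:
    """Subtract one hyperparameter per band from every pixel of a block."""
    # .shape is a tuple of ints, e.g. (10, 128, 128).
    # Assumption: the axes are ordered (bands, y, x).
    n_bands, n_y, n_x = pixel_spectra.shape

    # Explicit broadcast, as in broadcaster(), with sizes read from the block.
    hyp = np.broadcast_to(
        hyperparameter[:, np.newaxis, np.newaxis], (n_bands, n_y, n_x)
    )
    return pixel_spectra - hyp


# Stand-in for cube.values of one 128x128 block.
pixel_spectra = np.ones((10, 128, 128))

# Indexing the values returns pixel data, not a size.
print(pixel_spectra[1].shape)  # (128, 128)

result = subtract_per_band(pixel_spectra)
print(result.shape)  # (10, 128, 128)

# NumPy broadcasting makes the explicit broadcast_to optional here.
implicit = pixel_spectra - hyperparameter[:, np.newaxis, np.newaxis]
print(np.array_equal(result, implicit))  # True
```

Because the length-1 axes added by `np.newaxis` are stretched automatically, the subtraction works for any block size (128, 256, 512) without a hard-coded `chunks` value.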