How to export UdfData?

Hi,

The result of executing a UDF locally with execute_local_udf() is an object of the class <class 'openeo.udf.udf_data.UdfData'>. In my case, it holds a XarrayDataCube with dimensions (‘bands’, ‘y’, ‘x’).

What would the best way to export this as a netcdf (with correct georeferencing)?

Kind regards,
Margot

Hi Margot,
getting the xarray itself is relatively easy via:
https://open-eo.github.io/openeo-python-client/api.html#openeo.udf.udf_data.UdfData.get_datacube_list

So I guess your main problem is writing with georeferencing? This is indeed something that is not easily supported by XArray, and we also don’t have it in the python client.
@michele.claus do you happen to know if xarray-processes-dask has a builtin function for that?

best regards,
Jeroen

@margot.verhulst maybe you can post a print() of the xarray object you are getting as output? You should check if the CRS is in it.

Anyway, this code would write your object to a netCDF.

xarray_object = XarrayDataCube_object.get_datacube_list()[0].array

Optional: write missing CRS in the xarray object (set the desired one instead of 4326):

import rioxarray
xarray_object.rio.write_crs(4326, inplace=True)

Write to netCDF:

xarray_object.to_netcdf("xarray_object.nc")

Thank you both for your answers!

My main issue was indeed the georeferencing.

With rioxarray, it seems to attach the CRS to the object. But there seems to be something wrong with the coordinates. I think the problem occurs earlier, so I created a minimal example to demonstrate:

import openeo
from openeo.udf import execute_local_udf

connection = openeo.connect("openeo.vito.be", auto_validate=False).authenticate_oidc()

cube_s2 = connection.load_collection(
    collection_id="TERRASCOPE_S2_TOC_V2",
    spatial_extent={"west": 4.40, "south": 50.75, "east": 4.43, "north": 50.78},
    temporal_extent=["2018-06-01", "2018-09-01"],
    bands=["B04", "B08", "SCL"])

cube_s2.download("cube_s2.nc", format="netcdf")

udf_local = openeo.UDF("""
import xarray


def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:

    print(f"x coordinates: {cube.coords['x']}")
    print(f"y coordinates: {cube.coords['y']}")
    
    return cube
""")

test_cube_vi_masked_inv_int_predictions_local = execute_local_udf(udf=udf_local,
                                                                  datacube='cube_s2.nc',
                                                                  fmt="NetCDF")

The strange thing is that the print statements return this, which doesn’t look like coordinates:

x coordinates: <xarray.DataArray 'x' (x: 219)>
array([  0,   1,   2, ..., 216, 217, 218], dtype=int64)
Dimensions without coordinates: x
y coordinates: <xarray.DataArray 'y' (y: 339)>
array([  0,   1,   2, ..., 336, 337, 338], dtype=int64)
Dimensions without coordinates: y

In my actual udf, this causes problems at the end because I use coords['x'] and coord['y'] to assign the coordinates again to my output.

I am not if this is related or relevant, but my console also shows this warning (I’m pretty sure I didn’t get this warning a couple of days ago).

Warning 3: Cannot find header.dxf (GDAL_DATA is not defined)

I further noticed that in the setting of a regular UDF (not locally executed), the coordinates are correct. Again a minimal example to demonstrate:

import openeo

connection = openeo.connect("openeo.vito.be", auto_validate=False).authenticate_oidc()

cube_s2 = connection.load_collection(
    collection_id="TERRASCOPE_S2_TOC_V2",
    spatial_extent={"west": 4.40, "south": 50.75, "east": 4.43, "north": 50.78},
    temporal_extent=["2018-06-01", "2018-09-01"],
    bands=["B04", "B08", "SCL"])

udf = openeo.UDF("""
import xarray
from openeo.udf.debug import inspect


def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:

    inspect(message=f"x coordinates: {cube.coords['x']}")
    inspect(message=f"y coordinates: {cube.coords['y']}")

    return cube
""")

cube_s2_udf = cube_s2.apply(process=udf)

cube_result = cube_s2_udf.save_result(format="NetCDF")
job = cube_result.create_job(out_format="NetCDF")
job.start_and_wait()

The inspect statements return the following, which looks correct:

x coordinates: <xarray.DataArray 'x' (x: 256)> array([598695., 598705., 598715., ..., 601225., 601235., 601245.], dtype=float32) Coordinates: * x (x) float32 5.987e+05 5.987e+05 5.987e+05 ... 6.012e+05 6.012e+05
y coordinates: <xarray.DataArray 'y' (y: 256)> array([5626335., 5626325., 5626315., ..., 5623805., 5623795., 5623785.], dtype=float32) Coordinates: * y (y) float32 5.626e+06 5.626e+06 5.626e+06 ... 5.624e+06 5.624e+06

FYI I am using the openeo python client version 0.27.0.

Thanks in advance!

Kind regards,
Margot

Hi Margot,

that’s an issue we also noticed when trying to use the local implementation of the UDFs (cc @valentina.premier). The netCDF file has the correct coordinate values.

@jeroen.dries @stefaan.lippens were you aware of this issue with the local UDF execution?

Hi Michele,

I’m sorry but I don’t understand which netCDF has the correct coordinate values?

Kind regards,
Margot

the part about non-georeferenced coordinates in execute_local_udf indeed looks like a bug, so let’s move that discussion to spatial coordinates are lost with execute_local_udf · Issue #549 · Open-EO/openeo-python-client · GitHub

FYI I just merged the fix for spatial coordinates are lost with execute_local_udf · Issue #549 · Open-EO/openeo-python-client · GitHub in the main branch of the openeo Python client.

This fix will be available with next release (0.28.0) which I will try to do this week or next week.

1 Like

Great, thank you, Stefaan!