Returning GeoDataFrame from UDF

Hello,

I am trying to write a UDF that takes an xr.DataArray as input and returns a geopandas.GeoDataFrame.

I have been experimenting with apply_udf_data, and my current implementation looks like this:

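from openeo.udf import FeatureCollection, UdfData, inspect

# waterline_from_land_water_raster and DEFAULT_OUT_LAYER are assumed to be defined elsewhere in this UDF file (not shown)

def apply_udf_data(data: UdfData) -> UdfData: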

    inspect(data=[data], message="Input UDFData inspection")

    cube = data.get_datacube_list()[0].get_array()
    inspect(data=[list(cube.dims), list(cube.shape)], message="Input UDF cube dims/shape")

    gdf = waterline_from_land_water_raster(
        da=cube,
        crs=data.user_context.get("crs"),
        simplify_tolerance=data.user_context.get("simplify_tolerance"),
        time_dim=data.user_context.get("time_dim", "time"),
    )

    inspect(data=[gdf], message="Output gdf")

    feature_collection = FeatureCollection(
        id=DEFAULT_OUT_LAYER,
        data=gdf,
    )

    data.set_feature_collection_list([feature_collection])

    inspect(data=[data], message="Output UDFData inspection")
    return data

I then execute the UDF like this:

from pathlib import Path

from openeo import UDF

# `mask` is the water/land mask DataCube built earlier (not shown)
path_to_udf = Path("udf_waterlines_from_water_land_mask.py")
udf = UDF.from_file(path_to_udf, context={"from_parameter": "context"})
waterlines_cube = mask.apply_dimension(process=udf, dimension="t", context={"crs": "EPSG:3857", "time_dim": "t"})
job = waterlines_cube.create_job(title="waterlines", auto_add_save_result=True)
job.start_and_wait()

The job completes successfully. However, when I inspect the metadata or download the results, I only see the GeoTIFF outputs. The FeatureCollection produced by the UDF does not seem to be included in the job results.
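A quick way to see exactly which files a finished batch job produced is to list the assets in its result metadata. The sketch below assumes the metadata dict returned by the openEO Python client's `job.get_results().get_metadata()`, which follows the STAC-style layout of the openEO API (an `"assets"` mapping with an `"href"` and a media `"type"` per asset). The helper names `summarize_assets` and `has_geojson_asset` are hypothetical, and the GeoJSON check assumes the backend advertises the standard `application/geo+json` media type.

```python
from typing import Any, Dict, List, Tuple


def summarize_assets(metadata: Dict[str, Any]) -> List[Tuple[str, str]]:
    """List (asset name, media type) pairs from STAC-style batch job result metadata.

    Assets without a "type" field are reported as "unknown".
    """
    assets = metadata.get("assets", {})
    # Sort by asset name so the listing is deterministic
    return [(name, asset.get("type", "unknown")) for name, asset in sorted(assets.items())]


def has_geojson_asset(metadata: Dict[str, Any]) -> bool:
    """Return True if any asset is advertised with a GeoJSON media type."""
    return any("geo+json" in media_type for _, media_type in summarize_assets(metadata))
```

In a live session this might be called as `summarize_assets(job.get_results().get_metadata())`. For a job whose final node saves a raster cube, only raster assets would be expected in that list, which matches what was observed above: the FeatureCollection set inside the UDF does not become a separate output.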

My question is: how should I run or configure the job so that I can access, read, or download the FeatureCollection output from the UDF?

I would really appreciate any help, as I could not find any examples of apply_udf_data usage and there is little documentation available.

Hi, thanks for posting.

Do you have an example job ID for this? Something in the form of j-26xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. With it, we can investigate the problem more easily.

Emile Sonneveld

Hi Emile,

For example, this job ID: j-2604071429594d6089f608ee76038150

Many thanks for looking into it!

Hi,

Typically, converting a raster to vectors is done with a separate process (raster_to_vector) rather than in a UDF. When I replaced the final save_result in your process graph with the following, I got a GeoJSON result:

{
  // ...
  "rastertovector1": {
    "arguments": {"data": {"from_node": "applydimension2"}},
    "process_id": "raster_to_vector"
  },
  "save1": {
    "arguments": {
      "data": {"from_node": "rastertovector1"},
      "format": "GEOJSON"
    },
    "process_id": "save_result",
    "result": true
  }
}
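Since a batch job only writes out what its result node (the one flagged `"result": true`) saves, it can help to check that node, and what feeds into it, before submitting. A minimal sketch, assuming the flat process-graph dict shape shown above (for a Python client `DataCube`, `cube.flat_graph()` should return this shape); the helper names `find_result_node` and `upstream_chain` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

FlatGraph = Dict[str, Dict[str, Any]]


def find_result_node(flat_graph: FlatGraph) -> Tuple[str, str, Optional[str]]:
    """Return (node id, process_id, "format" argument or None) of the single result node."""
    result_ids = [node_id for node_id, node in flat_graph.items() if node.get("result")]
    if len(result_ids) != 1:
        raise ValueError(f"expected exactly one result node, found {len(result_ids)}")
    node_id = result_ids[0]
    node = flat_graph[node_id]
    # save_result carries the output format in its arguments; most other processes do not
    fmt = node.get("arguments", {}).get("format")
    return node_id, node["process_id"], fmt


def upstream_chain(flat_graph: FlatGraph, node_id: str) -> List[str]:
    """Follow the "data" argument's from_node links upstream, collecting process ids.

    Stops at a node whose "data" is not a from_node reference, or at a node id
    that is missing from the (possibly elided) graph.
    """
    chain: List[str] = []
    current: Optional[str] = node_id
    while current is not None and current in flat_graph:
        node = flat_graph[current]
        chain.append(node["process_id"])
        data_arg = node.get("arguments", {}).get("data")
        current = data_arg.get("from_node") if isinstance(data_arg, dict) else None
    return chain
```

For the graph above, `find_result_node` reports `save1` as a `save_result` with format `GEOJSON`, and `upstream_chain(graph, "save1")` gives `["save_result", "raster_to_vector"]`. For the original job one would expect the chain to go from `save_result` straight to `apply_dimension` with a raster format, which is consistent with only GeoTIFFs appearing in the results.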

Or in Python:

waterlines_cube = waterlines_cube.raster_to_vector().save_result(format="GeoJSON")

(Job ID for future reference: j-2604081147164cba8d00acf39434b4c3)

Does this work for you?
Emile

Hi Emile,

This approach works, but it’s not quite what I need, as it only performs a raster-to-vector conversion.

My UDF instead generates waterline vectors (coastal polylines) from Sentinel-2–derived water/land masks, so it goes beyond straightforward raster vectorisation.
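To make the difference concrete: raster_to_vector returns polygons covering whole regions, while a waterline consists only of the pixel edges where a water cell touches a land cell. The sketch below extracts those edges from a small rectangular binary mask in pure Python. It assumes 1 = water and 0 = land, uses pixel-corner coordinates (x to the right, y downward), ignores edges along the raster border because they are not land/water contacts, and is not the poster's waterline_from_land_water_raster; the name `water_land_edges` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
Segment = Tuple[Point, Point]


def water_land_edges(mask: Sequence[Sequence[int]]) -> List[Segment]:
    """Return unit pixel edges separating cells with different values.

    The mask is assumed rectangular, with 1 = water and 0 = land.
    Coordinates are pixel corners: x is the column index, y the row index.
    """
    edges: List[Segment] = []
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    for r in range(n_rows):
        for c in range(n_cols):
            # Right neighbour differs: shared vertical edge at x = c + 1
            if c + 1 < n_cols and mask[r][c] != mask[r][c + 1]:
                edges.append(((c + 1, r), (c + 1, r + 1)))
            # Lower neighbour differs: shared horizontal edge at y = r + 1
            if r + 1 < n_rows and mask[r][c] != mask[r + 1][c]:
                edges.append(((c, r + 1), (c + 1, r + 1)))
    return edges
```

For `[[1, 1, 0], [1, 0, 0]]` this yields three unit segments forming a staircase between the water and land cells. Such segments could then be joined into polylines, for example with `shapely.ops.linemerge`, and mapped to map coordinates with the raster's affine transform.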

When I inspect the UDF logs, data.feature_collection_list contains the expected results. What I’m currently missing is a way to properly extract these feature collections from the output.

Do you have any suggestions on how to access them? Am I on the right track using apply_udf_data for this?

Alternatively, would it make more sense to use raster_to_vector first and then apply a UDF with apply_vectorcube to derive the final waterlines? However, this would require more code refactoring on my side…

Hi,

The more typical openEO approach would indeed be to first call raster_to_vector and then refine the result with apply_dimension on the geometry dimension.
Here is an example of what such code could look like:

import openeo

# `cube` is the raster DataCube to vectorize (defined earlier, not shown)
cube = cube.raster_to_vector()
udf_code = """
from openeo.udf import UdfData, FeatureCollection

def process_vector_cube(udf_data: UdfData) -> UdfData:
    [feature_collection] = udf_data.get_feature_collection_list()
    gdf = feature_collection.data
    gdf["geometry"] = gdf["geometry"].buffer(distance=5, resolution=2)  # Example processing
    udf_data.set_feature_collection_list([
        FeatureCollection(id="_", data=gdf),
    ])
    return udf_data
"""
cube = cube.apply_dimension(dimension="geometry", process=openeo.UDF(udf_code))
cube = cube.save_result(format="GeoJSON")
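For the waterline use case specifically, the refinement UDF could turn the vectorised water polygons into their boundary lines and optionally simplify them, mirroring the simplify_tolerance option of the original UDF. This is a sketch under several assumptions: it treats every feature returned by raster_to_vector as a water polygon (filtering by pixel value is left out, because the name of the value attribute depends on the backend), it reads a hypothetical `simplify_tolerance` key from the UDF context, and polygon boundaries also include edges along the raster border, which are not real waterlines. The variable name `WATERLINE_UDF` is likewise illustrative.

```python
# UDF source meant to be wrapped in openeo.UDF(...) and applied on the "geometry" dimension
# of the vector cube produced by raster_to_vector (not executed here).
WATERLINE_UDF = """
from openeo.udf import FeatureCollection, UdfData


def process_vector_cube(udf_data: UdfData) -> UdfData:
    [feature_collection] = udf_data.get_feature_collection_list()
    gdf = feature_collection.data
    # Polygon boundaries become (Multi)LineStrings, including any inner rings
    gdf["geometry"] = gdf.geometry.boundary
    # Hypothetical context key, mirroring simplify_tolerance in the original UDF
    tolerance = (udf_data.user_context or {}).get("simplify_tolerance")
    if tolerance:
        # Line simplification; the tolerance is expressed in the units of the CRS
        gdf["geometry"] = gdf.geometry.simplify(tolerance)
    udf_data.set_feature_collection_list([FeatureCollection(id="_", data=gdf)])
    return udf_data
"""
```

The tolerance could be supplied as `openeo.UDF(WATERLINE_UDF, context={"simplify_tolerance": 10})`, assuming the backend forwards the UDF context to vector-cube UDFs.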

I saw that your UDF uses shapes() to convert raster data to vector data, while the openEO backend used here relies on geotrellis.raster.vectorize.
I do not know off the top of my head what the exact differences are, but hopefully they are negligible.

We could also implement a way to run a UDF that performs the raster-to-vector conversion in one go, but that would require some development effort, and it is not currently supported by the openEO specification.
Are you working on this in the context of a specific project?

Kind regards,
Emile