Printing or logging from a UDF

Hi,
For debugging purposes I would like to use a print statement in a UDF (as suggested in another discussion), but I cannot see the output either below the cell executing the UDF or in the job logs. See the code example below. What is the proper way to do this?

udf_code = """
from openeo.udf import XarrayDataCube

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    print("test")
    return cube
"""

process = openeo.UDF(code=udf_code, runtime="Python", data={"from_parameter": "x"})
out_cube = in_cube.apply(process=process)
my_job = out_cube.send_job(title="test")
results = my_job.start_and_wait().get_results()

0:00:00 Job 'j-f6c5c0e454314179b205c5cb6f80efed': send 'start'
0:00:24 Job 'j-f6c5c0e454314179b205c5cb6f80efed': queued (progress N/A)
0:00:29 Job 'j-f6c5c0e454314179b205c5cb6f80efed': queued (progress N/A)
0:00:36 Job 'j-f6c5c0e454314179b205c5cb6f80efed': queued (progress N/A)
0:00:44 Job 'j-f6c5c0e454314179b205c5cb6f80efed': queued (progress N/A)
0:00:54 Job 'j-f6c5c0e454314179b205c5cb6f80efed': queued (progress N/A)
0:01:07 Job 'j-f6c5c0e454314179b205c5cb6f80efed': queued (progress N/A)
0:01:23 Job 'j-f6c5c0e454314179b205c5cb6f80efed': queued (progress N/A)
0:01:42 Job 'j-f6c5c0e454314179b205c5cb6f80efed': running (progress N/A)
0:02:06 Job 'j-f6c5c0e454314179b205c5cb6f80efed': running (progress N/A)
0:02:37 Job 'j-f6c5c0e454314179b205c5cb6f80efed': running (progress N/A)
0:03:14 Job 'j-f6c5c0e454314179b205c5cb6f80efed': running (progress N/A)
0:04:01 Job 'j-f6c5c0e454314179b205c5cb6f80efed': running (progress N/A)
0:04:59 Job 'j-f6c5c0e454314179b205c5cb6f80efed': finished (progress N/A)
print(my_job.logs())
[]

Indeed, this feature (logging from UDFs) does not yet work the way we want it to.
We’re still working on connecting all the necessary dots to make that possible for end users.

Hi Stefaan, would you have any update on the progress of UDF logging?

So far we don’t have anything working for end users.
With the summer months, we are also a bit short on development capacity to tackle this swiftly.

Hi,

I wanted to let you know that we now have initial support for logging from UDFs for debugging purposes.

Usage example:
In your UDF code, use the inspect logging function (based on the inspect process):

from openeo.udf import XarrayDataCube
from openeo.udf.debug import inspect

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    inspect(message="Hello UDF logging")
    inspect(data=[1,2,3], message="hello UDF logging with data")
    array = cube.get_array()
    array.values = 0.0001 * array.values
    return cube

When requesting the job logs, you should be able to see these log entries, for example in a Jupyter notebook using the job.logs() widget:
[Screenshot from 2022-09-21: job.logs() widget in a Jupyter notebook showing the inspect log entries]

At the moment, not all data types are properly supported in the data argument of inspect; we are working on that. But feel free to experiment with the current functionality already.
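A side note: batch job logs can contain a lot of backend noise, so it can help to filter the entries returned by job.logs() client-side before reading them. A minimal sketch (the log entries below are hypothetical examples, shaped like the dicts job.logs() returns, with at least "level" and "message" fields):

```python
# Hypothetical log entries, shaped like the dicts returned by job.logs()
# (each entry has at least "level" and "message" fields).
logs = [
    {"level": "info", "message": "Hello UDF logging"},
    {"level": "debug", "message": "internal backend bookkeeping"},
    {"level": "info", "message": "hello UDF logging with data: [1, 2, 3]"},
]

# Keep only the entries produced by our own inspect() calls,
# recognizable here by the message text we chose in the UDF.
udf_logs = [entry for entry in logs if "UDF logging" in entry["message"]]

for entry in udf_logs:
    print(f"{entry['level']}: {entry['message']}")
```

Picking a distinctive message prefix in your inspect() calls makes this kind of filtering trivial.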


Hi, thank you for this feature. Do I need to use a development version of openEO or does the latest release (0.23.0) already contain the logging capability?

Hi,

Note that the UDF runs backend-side, so this feature does not really depend on the version of the Python client you are using on your system.

Thank you for the clarification. I’m discovering openEO and I’m trying to do some logging. Here is the code I’m using:

import openeo

connection = openeo.connect("openeo.vito.be")
connection.authenticate_oidc()
collections = connection.list_collection_ids()
sentinel2_cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19},
    temporal_extent=["2021-02-01", "2021-04-30"],
    bands=["B02", "B04", "B08"],
)
blue = sentinel2_cube.band("B02") * 0.0001
red = sentinel2_cube.band("B04") * 0.0001
nir = sentinel2_cube.band("B08") * 0.0001
evi_cube = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

udf = openeo.UDF(
    """
from openeo.udf import XarrayDataCube
from openeo.udf.debug import inspect
def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    inspect(message="Hello UDF logging")
    inspect(data=[1,2,3], message="hello UDF logging with data")
    array = cube.get_array()
    array.values = 0.0001 * array.values
    return cube
"""
)
evi_cube.apply(process=udf)

inf_job = evi_cube.execute_batch(out_format="GTiff")

inf_job.get_results().download_files("/somewhere/")

print(inf_job.logs())

But nothing in the output seems related to my UDF logs. I can provide the job ID: j-09480bffb3df4ad28d402fdcc1c01ce8. Do you need any extra information?

Quick note (I’m not sure it is the root cause):

You need evi_cube = evi_cube.apply(process=udf) there; otherwise that line doesn’t have any effect.
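For context: openEO cube operations like apply() don’t modify the cube in place; they return a new DataCube that you need to keep a reference to. The same pattern shows up with immutable objects in plain Python:

```python
# Plain-Python analogy: methods on immutable objects return a new
# object instead of changing the original in place.
s = "evi cube"

s.upper()      # result is discarded, s is unchanged
print(s)       # -> evi cube

s = s.upper()  # assign the result to actually use it
print(s)       # -> EVI CUBE
```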

FYI: this snippet works for me regarding the inspect logging:

# imports and `connect` stuff omitted 
cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 5.11, "south": 51.11, "east": 5.12, "north": 51.12},
    temporal_extent=["2021-02-01", "2021-02-20"],
    bands=["B02", "B04"],
)
udf = openeo.UDF(
"""
from openeo.udf import XarrayDataCube
from openeo.udf.debug import inspect
def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    inspect(message="Hello UDF logging")
    inspect(data=[1,2,3], message="Hello UDF logging with data")
    array = cube.get_array()
    array.values = 0.0001 * array.values
    return cube
"""
)
cube = cube.apply(process=udf)
job = cube.execute_batch()

There is also the problem of the poor signal-to-noise ratio of the logs, but I created a ticket for that: Improve signal noise ratio of batch job logs · Issue #538 · Open-EO/openeo-geopyspark-driver · GitHub

Thank you for your quick reply. I updated my code as follows:

import openeo

connection = openeo.connect("openeo.vito.be")
connection.authenticate_oidc()
collections = connection.list_collection_ids()
sentinel2_cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19},
    temporal_extent=["2021-02-01", "2021-04-30"],
    bands=["B02", "B04", "B08"],
)
blue = sentinel2_cube.band("B02") * 0.0001
red = sentinel2_cube.band("B04") * 0.0001
nir = sentinel2_cube.band("B08") * 0.0001
evi_cube = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

udf = openeo.UDF(
    """
from openeo.udf import XarrayDataCube
from openeo.udf.debug import inspect
def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    inspect(message="Hello UDF logging")
    inspect(data=[1,2,3], message="hello UDF logging with data")
    array = cube.get_array()
    array.values = 0.0001 * array.values
    return cube
"""
)
out_cube = evi_cube.apply(process=udf)

my_job = out_cube.send_job(title="test")
results = my_job.start_and_wait().get_results()
print(my_job.logs())

I’m no longer using execute_batch and I’m not downloading files. However, the job fails, with ID j-650d652afc8c4ca5a275707ef95422cb.

I managed to get some logs with your example. However, once you introduce calculations on the cube, as follows:

blue = sentinel2_cube.band("B02") * 0.0001
red = sentinel2_cube.band("B04") * 0.0001
nir = sentinel2_cube.band("B08") * 0.0001
evi_cube = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

the job fails. Have you succeeded in reproducing this behaviour?

You stumbled on a bug, for which I created a ticket here: UDF failure on cube without bands dimension · Issue #539 · Open-EO/openeo-geopyspark-driver · GitHub

The workaround is to add a “bands” dimension before applying the UDF.
E.g. something like this:

evi_cube = evi_cube.add_dimension(name="bands", label="evi", type="bands")

Thank you for creating the issue and for suggesting the workaround! I’ll keep an eye on it.