UDF dependency issue

david.kovacs · 11 July 2024 09:44

Hi all,

I am having an issue with udf-dependency-archives in job_options. My intention is to provide larger ML models, stored as .txt files, to the UDF from GitHub. However, when I submit the job it fails, and I receive no logs or specific error messages. (j-2407118357734ee988413afce0818e04)

I provide an excerpt of my code:

udf2 = openeo.UDF("""
import os
import sys
import time
from configparser import ConfigParser
from pathlib import Path
from typing import Dict

import numpy as np
import xarray
from openeo.metadata import Band, CollectionMetadata
from openeo.udf import XarrayDataCube, inspect


def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:

    sys.path.insert(0, 'tmp/venv')
    from CNC_Cab_np import scaleFactor_GREEN  # accessed from the GitHub repo

    inspect(data=[scaleFactor_GREEN], message="scaleFactor_GREEN")
    
    # further processing of the ML algorithm with the parameters imported from GitHub file
""")

rescaled2 = s2_cube.apply_dimension(process=udf2, dimension="bands")  

S2_GPR_job = rescaled2.execute_batch(
    title="GPR",
    outputfile="GPR_hosting.nc",
    job_options = {
        'executor-memory': '10g',
        'udf-dependency-archives': [
            'https://github.com/SentiFLEXinel/EBD-GPR-CNC/blob/6e7083995819b8055768cd42585447cf8d74643b/#tmp/venv',
        ]
    }
)

stefaan.lippens · 17 July 2024 07:55

Does that job still fail when you run it now?

I see that attempt j-2407118357734ee988413afce0818e04 was from last week, when we had some availability issues.

Also note that you are apparently using openEO from Copernicus Data Space Ecosystem, while this forum is for openEO Platform (openeo.cloud). It’s recommended to reach out for support at https://forum.dataspace.copernicus.eu/

emile.sonneveld · 18 July 2024 08:51

Hey,

The link in udf-dependency-archives causes an obscure Spark error. Using this one instead, I got a bit further: https://raw.githubusercontent.com/SentiFLEXinel/EBD-GPR-CNC/6e7083995819b8055768cd42585447cf8d74643b/CNC_Cab_np.py

But the file probably needs some import statements of its own, such as import numpy as np.

Emile
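
For anyone who runs into the same thing: the difference between the two links is that the original one is a github.com "blob" page URL that stops at the commit (no file path), so it serves an HTML page, while the second one is a raw.githubusercontent.com URL that serves the file itself. Below is a minimal sketch of how to check an entry and convert a blob URL to its raw form. The helper names are made up for illustration, the reading of the part after # as the target folder is my understanding of the job option rather than something confirmed in this thread, and the blob URL for CNC_Cab_np.py is inferred from Emile's raw link.

```python
from urllib.parse import urldefrag, urlsplit


def split_dependency_entry(entry: str) -> tuple[str, str]:
    """Split "<url>#<target folder>" into (url, target folder).

    Hypothetical helper; assumes the fragment names the target folder.
    """
    url, target = urldefrag(entry)
    return url, target


def github_blob_to_raw(url: str) -> str:
    """Turn a github.com "blob" page URL into the matching raw file URL.

    Hypothetical helper; raises ValueError if the URL does not point at a file.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path to file...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL that points at a file: {url}")
    owner, repo, _, ref, *file_path = segments
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(file_path)}"


if __name__ == "__main__":
    commit = "6e7083995819b8055768cd42585447cf8d74643b"
    repo_url = "https://github.com/SentiFLEXinel/EBD-GPR-CNC"

    # The entry from the original post: the URL ends at the commit, no file.
    url, target = split_dependency_entry(f"{repo_url}/blob/{commit}/#tmp/venv")
    print("target folder:", target)
    try:
        github_blob_to_raw(url)
    except ValueError as err:
        print("rejected:", err)

    # A blob URL that does name a file converts to the raw link.
    print(github_blob_to_raw(f"{repo_url}/blob/{commit}/CNC_Cab_np.py"))
```

This only checks the shape of the URL. Whether the backend accepts a single .py file in udf-dependency-archives, rather than a zip archive, is not settled in this thread.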