Supplying user-based models

dkvcsdvd · 15 August 2024 13:34

Hi,

I recently created a Python package, pyeogpr, that runs on openEO backends and processes maps of biophysical traits.
I have the “core” UDF of the processing, and I would like to have a file that users can edit, containing their own pre-trained ML model for mapping.

I have already managed to pass a user-defined “.py” file as a UDF. However, I want to write code that supplies the batch process with both a UDF (coming from job_options = {'executor-memory': '10g', 'udf-dependency-archives': ["url of my udf"]}) and a file on my computer that contains my own pre-trained model.

To give an example:

import openeo

udf_cloud = openeo.UDF("""
import xarray

# I would like to define user parameters here, imported from a file on the user side,
# e.g. "C:/Users/dir/userparams.py"
from user_model import user_model_variable

def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
    variable = context["biovar"]

    result = user_model_variable * variable  # here I would do some arithmetic with the user model
    returned = xarray.DataArray(result)
    return returned
""", context={"from_parameter": "context"})


sensor = "SENTINEL2_L2A"
biovar = "CNC_Cab"
context = {"sensor": "Sentinel 2 ",
           "biovar":biovar}

# s2_cube: a Sentinel-2 L2A data cube loaded earlier (e.g. with connection.load_collection)
S2_cloud = s2_cube.apply_dimension(process=udf_cloud,
                                   dimension="bands",
                                   context=context)

S2_cloud_process = S2_cloud.execute_batch(
    title=f"{sensor} _ {biovar}",
    outputfile=f"{sensor} _ {biovar}.nc",
    job_options = {
        'executor-memory': '10g',
        'udf-dependency-archives': [
            'https://github.com/daviddkovacs/pyeogpr/raw/main/models/GPR_models_bulk.zip#tmp/venv'
        ]
    }
)

Ideally, users could upload a file from their local computer and use the package with their own variables, while the UDF itself would be pulled from the GitHub zip as specified in the job options.
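A minimal sketch of the client-side half of this, assuming (hypothetically) that the user's parameters live in a local JSON file rather than a .py file: the package reads the file on the user's machine and renders the values into the UDF source as a Python literal, so nothing has to be imported on the backend. The file name, the USERMODELPLACEHOLDER marker, the template body and the function names are illustrative, not part of pyeogpr or the openEO client.

```python
import json
from pathlib import Path

# Illustrative UDF source; the placeholder is replaced by the user's model literal.
UDF_CODE_TEMPLATE = """
import xarray

model = USERMODELPLACEHOLDER

def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
    return cube * model["user_model_variable"]
"""


def _reject_constant(name: str):
    # json accepts NaN/Infinity by default, but repr() of those is not a valid literal
    raise ValueError(f"{name} cannot be embedded as a Python literal")


def load_user_model(path: Path) -> dict:
    """Read model parameters from a JSON file on the user's machine (hypothetical format)."""
    with open(path, encoding="utf-8") as f:
        model = json.load(f, parse_constant=_reject_constant)
    if not isinstance(model, dict):
        raise ValueError("expected a JSON object at the top level")
    return model


def build_udf_code(path: Path) -> str:
    model = load_user_model(path)
    # For plain JSON values (dicts, lists, strings, numbers, true/false/null),
    # repr() of the parsed object is a valid Python literal.
    return UDF_CODE_TEMPLATE.replace("USERMODELPLACEHOLDER", repr(model))
```

The resulting string can then be wrapped in openeo.UDF(...) like any other UDF code.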

Thanks

stefaan.lippens · 19 August 2024 07:49

What kind of model are we talking about here? In your snippet it looks pretty simple (just some float values?).

If your model is indeed simple, e.g. you can represent it as a reasonably small Python construct of dicts, lists and tuples, you could consider embedding that object directly in the UDF code, so you don’t have to jump through various hoops to import it inside your UDF on the openEO backend.

E.g. something like this:

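import openeo

udf_code_template = """
import xarray

model = USERMODELPLACEHOLDER

def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
    # Use `model` dict here, e.g. scale the cube with a model coefficient
    return cube * model["user_model_variable"]
"""

user_model = {
    "user_model_variable": 123.456,
}

udf_cloud = openeo.UDF(
    udf_code_template.replace(
        "USERMODELPLACEHOLDER",
        # repr() turns the dict into a Python literal that can be pasted into the UDF source
        repr(user_model),
    )
)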

...

(Note: I use a plain str.replace() here for simplicity, but any templating solution that fits better would work too.)
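As one possible alternative to str.replace(), here is a minimal sketch using the standard library's string.Template. It also checks that repr() produced a literal that parses back to the same object, so values such as NaN or arbitrary objects are rejected before they end up as broken UDF code. The template body and the render_udf_code name are illustrative, not part of pyeogpr or the openEO client.

```python
import ast
from string import Template

# Illustrative UDF source; "$user_model" marks where the model literal goes.
UDF_TEMPLATE = Template('''
import xarray

model = $user_model

def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
    return cube * model["user_model_variable"]
''')


def render_udf_code(user_model: dict) -> str:
    """Embed a simple model (dicts, lists, tuples, numbers, strings) as a Python literal."""
    literal = repr(user_model)
    # The literal must parse back to the same object, otherwise it
    # cannot be safely pasted into the UDF source.
    try:
        round_trip = ast.literal_eval(literal)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"model is not a plain Python literal: {literal!r}") from exc
    if round_trip != user_model:
        raise ValueError("model does not survive a repr() round trip")
    return UDF_TEMPLATE.substitute(user_model=literal)


if __name__ == "__main__":
    print(render_udf_code({"user_model_variable": 123.456}))
```

The returned string could then be passed to openeo.UDF(...) as before. ast.literal_eval only accepts literal syntax, which is exactly the subset that is safe to embed this way.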

david.kovacs · 23 August 2024 07:18

Thank you Stefaan, with some modifications your example worked!