Hi,
I am working on a UDF that applies an externally trained scikit-learn model. I work in PyCharm with conda environments. The UDF works when I run it locally with execute_local_udf(), but when I switch to executing it on the backend, in my case via reduce_dimension(), the model won't load. I used the inspect logging function to verify that this is the point where the UDF fails.
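For context, I call the UDF on the backend roughly like this (the backend URL, collection, bands and extents below are placeholders, not my exact setup):

import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()
cube = connection.load_collection(
    "SENTINEL2_L2A",  # placeholder collection
    spatial_extent={"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1},
    temporal_extent=["2021-01-01", "2021-12-31"],
    bands=["B02", "B03", "B04"],  # placeholder bands
)
udf = openeo.UDF.from_file("udf_apply_model.py")  # the UDF discussed below
result = cube.reduce_dimension(dimension="bands", reducer=udf)
result.execute_batch("prediction.nc")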
Inside the UDF, the model is loaded with the following code:
import pickle, urllib.request
clf = pickle.load(urllib.request.urlopen(url_model))
The model is currently hosted on GitHub:
url_model = "https://raw.githubusercontent.com/MargotVerhulst/raw/master/results_bestModel-modelRF-norun1.pickle"
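For completeness, the UDF around this call looks roughly as follows (the prediction logic is simplified and not my exact code; "line 29" in the traceback below corresponds to the pickle.load call):

import pickle
import urllib.request

import xarray
from openeo.udf import XarrayDataCube

url_model = "https://raw.githubusercontent.com/MargotVerhulst/raw/master/results_bestModel-modelRF-norun1.pickle"

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    # Load the externally trained random forest; this is where the batch job fails
    clf = pickle.load(urllib.request.urlopen(url_model))

    # Stack the band dimension into (n_pixels, n_bands) for scikit-learn
    array = cube.get_array()  # DataArray, assumed dims (bands, y, x)
    stacked = array.stack(pixel=("y", "x")).transpose("pixel", "bands")
    prediction = clf.predict(stacked.values)

    # Return a cube without the reduced band dimension
    out = xarray.DataArray(
        prediction.reshape(len(array.y), len(array.x)),
        dims=["y", "x"],
        coords={"y": array.y, "x": array.x},
    )
    return XarrayDataCube(out)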
The error I get is (job_id='j-231105a259794a02ab7a3c6e7bf6e12c'):
OpenEO batch job failed: UDF Exception during Spark execution:
  File "/opt/venv/lib64/python3.8/site-packages/openeo/udf/run_code.py", line 180, in run_udf_code
    result_cube = func(cube=data.get_datacube_list()[0], context=data.user_context)
  File "<string>", line 29, in apply_datacube
  File "sklearn/tree/_tree.pyx", line 661, in sklearn.tree._tree.Tree.__setstate__
ValueError: Did not recognise loaded array dtype
It seems like this could be a dependency issue, where the version of scikit-learn on the backend does not match the version I used to train and pickle the model? I read this documentation here, which might be relevant, but I do not fully understand how to apply it. Can you help me with how I should proceed?
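If it helps with debugging, I assume I could log the scikit-learn version installed on the backend from inside the UDF, again with the inspect function, and compare it to my local conda environment:

import sklearn
from openeo.udf import inspect

def apply_datacube(cube, context):
    # Log the scikit-learn version available on the backend workers
    inspect(data=[sklearn.__version__], message="backend sklearn version")
    ...

Locally, the version to compare against would come from python -c "import sklearn; print(sklearn.__version__)" in the conda environment where the model was trained.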
Thank you in advance.
Kind regards,
Margot