I am trying to run the openEO parcel delineation code, but the segmentation jobs are not running: all three jobs (udf_segmentation, sobel_felzenszwalb_job and vectorization_job) fail with an error. Here is one of the failures:
segmentation_job = segmentationband.create_job(
    title="segmentation_onnx_job2",
    out_format="NetCDF",
    job_options=job_options
)
segmentation_job.start_and_wait()
segmentation_job.download_result(base_path / "delineation.nc")
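For context, job_options is defined earlier in my notebook; it basically just ships the ONNX dependencies for the UDF and looks roughly like the sketch below (the archive URL is a placeholder here, and I am not 100% sure "udf-dependency-archives" is the exact option name, so correct me if I got that wrong):

job_options = {
    "udf-dependency-archives": [
        # placeholder URL; in my notebook this points at the zip with onnxruntime and the model,
        # extracted into "onnx_deps" (the folder that shows up in the traceback below)
        "https://example.com/onnx_dependencies.zip#onnx_deps",
    ],
}

The run then fails like this: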
0:00:00 Job 'j-25021619274943b28944b65ea0432997': send 'start'
0:00:13 Job 'j-25021619274943b28944b65ea0432997': created (progress 0%)
0:00:18 Job 'j-25021619274943b28944b65ea0432997': created (progress 0%)
0:00:25 Job 'j-25021619274943b28944b65ea0432997': created (progress 0%)
0:00:33 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:00:43 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:00:55 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:01:11 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:01:30 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:01:54 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:02:24 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:03:02 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:03:49 Job 'j-25021619274943b28944b65ea0432997': running (progress N/A)
0:04:47 Job 'j-25021619274943b28944b65ea0432997': error (progress N/A)
Your batch job 'j-25021619274943b28944b65ea0432997' failed. Error logs:
[{'id': '[1739734317763, 770342]', 'time': '2025-02-16T19:31:57.763Z', 'level': 'error', 'message': 'Task 20 in stage 15.0 failed 4 times; aborting job'}, {'id': '[1739734317778, 421999]', 'time': '2025-02-16T19:31:57.778Z', 'level': 'error', 'message': 'Stage error: Job aborted due to stage failure: Task 20 in stage 15.0 failed 4 times, most recent failure: Lost task 20.3 in stage 15.0 (TID 974) (10.42.232.45 executor 12): org.apache.spark.api.python.PythonException: Traceback (most recent call last):\n File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main\n process()\n File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process\n serializer.dump_stream(out_iter, outfile)\n File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 146, in dump_stream\n for obj in iterator:\n File "/usr/local/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper\n return f(*args, **kwargs)\n File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/utils.py", line 64, in memory_logging_wrapper\n return function(*args, **kwargs)\n File "/opt/openeo/lib/python3.8/site-packages/epsel.py", line 44, in wrapper\n return _FUNCTION_POINTERS[key](*args, **kwargs)\n File "/opt/openeo/lib/python3.8/site-packages/epsel.py", line 37, in first_time\n return f(*args, **kwargs)\n File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 570, in tile_function\n result_data = run_udf_code(code=udf_code, data=data)\n File "/opt/openeo/lib/python3.8/site-packages/epsel.py", line 44, in wrapper\n return _FUNCTION_POINTERS[key](*args, **kwargs)\n File "/opt/openeo/lib/python3.8/site-packages/epsel.py", line 37, in first_time\n return f(*args, **kwargs)\n File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/udf.py", line 59, in run_udf_code\n return openeo.udf.run_udf_code(code=code, data=data)\n File "/opt/openeo/lib/python3.8/site-packages/openeo/udf/run_code.py", line 195, in run_udf_code\n result_cube: xarray.DataArray = func(cube=data.get_datacube_list()[0].get_array(), context=data.user_context)\n File "<string>", line 131, in apply_datacube\n File "<string>", line 48, in process_window_onnx\n File "<string>", line 32, in load_ort_sessions\n File "<string>", line 33, in \n File "onnx_deps/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__\n self._create_inference_session(providers, provider_options, disabled_optimizers)\n File "onnx_deps/onnxruntime/capi/onnxruntime_inference_collection.py", line 408, in _create_inference_session\n sess.initialize_session(providers, provider_options, disabled_optimizers)\nonnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: std::bad_alloc\n\n\tat org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:561)\n\tat org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:767)\n\tat org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:749)\n\tat org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)\n\tat org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)\n\tat org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)\n\tat org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)\n\tat org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)\n\tat org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)\n\tat org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)\n\tat org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:139)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\nDriver stacktrace:'}, {'id': '[1739734320039, 81136]', 'time': '2025-02-16T19:32:00.039Z', 'level': 'error', 'message': 'OpenEO batch job failed: UDF exception while evaluating processing graph. Please check your user defined functions. File "/opt/openeo/lib/python3.8/site-packages/openeo/udf/run_code.py", line 195, in run_udf_code\n result_cube: xarray.DataArray = func(cube=data.get_datacube_list()[0].get_array(), context=data.user_context)\n File "<string>", line 131, in apply_datacube\n File "<string>", line 48, in process_window_onnx\n File "<string>", line 32, in load_ort_sessions\n File "<string>", line 33, in \n File "onnx_deps/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__\n self._create_inference_session(providers, provider_options, disabled_optimizers)\n File "onnx_deps/onnxruntime/capi/onnxruntime_inference_collection.py", line 408, in _create_inference_session\n sess.initialize_session(providers, provider_options, disabled_optimizers)\nonnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: std::bad_alloc'}]
Full logs can be inspected in an openEO (web) editor or with connection.job('j-25021619274943b28944b65ea0432997').logs().
JobFailedException Traceback (most recent call last)
in <cell line: 0>()
4 job_options=job_options
5 )
----> 6 segmentation_job.start_and_wait()
7 segmentation_job.download_result(base_path / "delineation.nc")
/usr/local/lib/python3.11/dist-packages/openeo/rest/job.py in start_and_wait(self, print, max_poll_interval, connection_retry_interval, soft_error_max, show_error_logs)
336 f"Full logs can be inspected in an openEO (web) editor or with connection.job({self.job_id!r}).logs()."
337 )
--> 338 raise JobFailedException(
339 f"Batch job {self.job_id!r} didn't finish successfully. Status: {status} (after {elapsed()}).",
340 job=self,
JobFailedException: Batch job 'j-25021619274943b28944b65ea0432997' didn't finish successfully. Status: error (after 0:04:48).
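Since the underlying error is an ONNXRuntimeError with std::bad_alloc while the inference session is being created, my guess is that the executor's Python worker runs out of memory when it loads the ONNX model. Is increasing the memory via job_options the right fix here? I was thinking of something like the sketch below (the option names and values are my guesses from the job options documentation, so please correct me if these are not the right knobs):

job_options = {
    # plus the options I already pass (the UDF dependency archive etc.)
    "executor-memory": "4G",          # guessed value for the JVM side of each executor
    "executor-memoryOverhead": "4G",  # guessed value for the non-JVM (Python/UDF) side
}

Or is something else going wrong in these three jobs?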