Preflight process graph validation raised

Hi,

When applying my UDF, I received the following error:

Preflight process graph validation raised: [MissingProduct] Tile ('31UDS', '20241029') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20240723') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20241019') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20240909') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20241130') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20241029') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20240723') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20241019') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20240909') in collection 'TERRASCOPE_S2_TOC_V2' is not available. [MissingProduct] Tile ('31UDS', '20241130') in collection 'TERRASCOPE_S2_TOC_V2' is not available.

I then changed some code and now I receive this error:

Preflight process graph validation raised: [Internal] An error occurred while calling o420490.getProducts.
: java.net.SocketException: Connection reset
	at java.base/jdk.internal.reflect.GeneratedConstructorAccessor454.newInstance(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at java.base/sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1974)
	at java.base/sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1969)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1968)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
	at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334)
	at scalaj.http.HttpRequest.doConnection(Http.scala:367)
	at scalaj.http.HttpRequest.exec(Http.scala:343)
	at scalaj.http.HttpRequest.asString(Http.scala:492)
	at org.openeo.opensearch.OpenSearchClient.$anonfun$execute$1(OpenSearchClient.scala:146)
	at org.openeo.opensearch.package$.attempt$1(package.scala:46)
	at org.openeo.opensearch.package$.withRetries(package.scala:58)
	at org.openeo.opensearch.OpenSearchClient.execute(OpenSearchClient.scala:142)
	at org.openeo.opensearch.backends.CreodiasClient.getProductsFromPageCustom(CreodiasClient.scala:187)
	at org.openeo.opensearch.backends.CreodiasClient.getProductsOriginal(CreodiasClient.scala:86)
	at org.openeo.opensearch.backends.CreodiasClient.getProducts(CreodiasClient.scala:75)
	at jdk.internal.reflect.GeneratedMethodAccessor642.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.SocketException: Connection reset
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:186)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
	at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478)
	at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472)
	at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1454)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1065)
	at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
	at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
	at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
	at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:754)
	at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:689)
	at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:713)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1615)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
	at scalaj.http.HttpRequest.doConnection(Http.scala:365)
	... 20 more

The job id for this last error is:
j-241202a4797448a58d3cd7c6afbd5465

For context, the UDF has already been used to process Sentinel-2 time series for a large number of fields in Flanders. At the moment, I’m trying to apply it to a smaller subset of fields, using data from the current year. However, it seems that the satellite data is missing when it is passed to my UDF.

Kind regards,
Kato Vanpoucke

Hi Kato,

The error I see is:

  File "<string>", line 62, in apply_udf_data
  File "/opt/venv/lib64/python3.8/site-packages/sklearn/ensemble/_forest.py", line 865, in predict_proba
    X = self._validate_X_predict(X)
  File "/opt/venv/lib64/python3.8/site-packages/sklearn/ensemble/_forest.py", line 599, in _validate_X_predict
    X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
  File "/opt/venv/lib64/python3.8/site-packages/sklearn/base.py", line 580, in _validate_data
    self._check_feature_names(X, reset=reset)
  File "/opt/venv/lib64/python3.8/site-packages/sklearn/base.py", line 507, in _check_feature_names
    raise ValueError(message)
ValueError: The feature names should match those that were passed during fit.
Feature names seen at fit time, yet now missing:
- 05-10
- 05-11
- 15-10
- 15-11
- 25-10
- ...

Does that error mean more to you? It is the last visible error when checking the logs in the editor.
The message is truncated by sklearn itself, see sklearn/base.py in the scikit-learn repository on GitHub (commit 35b5ee65770ee7fb7bf15d0acb2ee96d7d3bf1ab).
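Since sklearn cuts the list short, it can help to compute the full difference yourself before calling predict_proba. Below is a minimal sketch: `diff_feature_names` is a hypothetical helper, not part of sklearn or openEO. It assumes you have the fit-time names available (for a model fitted on a pandas DataFrame, scikit-learn 1.0 and later stores them in the `feature_names_in_` attribute) and the column names of the table you are about to predict on.

```python
def diff_feature_names(fit_names, predict_names):
    """Compare feature names seen at fit time with those available at predict time.

    Returns (missing, unexpected), both in their original order:
    - missing: names the model was fitted on that are absent now
    - unexpected: names present now that the model never saw
    """
    fit_set = set(fit_names)
    predict_set = set(predict_names)
    missing = [name for name in fit_names if name not in predict_set]
    unexpected = [name for name in predict_names if name not in fit_set]
    return missing, unexpected


# Illustrative names only, following the "DD-MM" pattern from the traceback.
fit_names = ["05-10", "15-10", "25-10", "05-11", "15-11"]
predict_names = ["05-10"]

missing, unexpected = diff_feature_names(fit_names, predict_names)
print("missing at predict time:", missing)
print("not seen at fit time:", unexpected)
```

Inside the UDF you could pass something like `list(model.feature_names_in_)` and `list(X.columns)`, where `model` and `X` stand for your own fitted estimator and feature table.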

Hi Emile,

I also saw this message in the logs. 05-10, 15-10, 05-11, … are my feature names for the data of the 5th of October, 15th of October, 5th of November, …

I suspect that the datacube that is passed to sklearn is empty, because of the MissingProduct messages from the ‘TERRASCOPE_S2_TOC_V2’ collection.

Kind regards,
Kato

Hey Kato,

Some dates indeed have no data. For TERRASCOPE_S2_TOC_V2, clouds are masked away, and sometimes the extent can be right on the edge of a satellite image.

For 2024-07-01 for example, you can see that some parcels have no pixel data:

To debug it, it is possible to put a try-except around the root of the error, and add some extra information to it. For example:

from openeo.udf import UdfData

def apply_udf_data(data: UdfData):
    ## DATA ##
    # Load data
    ts_dict = data.get_structured_data_list()[0].data
    try:
        # All udf content..
        ...
    except Exception as e:
        # Re-raise with the input data attached so it shows up in the job logs
        raise Exception(f"data: {ts_dict} Error in UDF: {e}") from e

Example output, showing a date with full data and a date without:
...
    "2024-07-21T00:00:00Z": [
        [0.8570518051253425],
        [0.6750948457633819],
        [0.448033818602562],
        [0.4843438924413866],
        [0.5221023321151733],
        [0.4247104223874899],
        [0.7748762883061655],
        [0.3261186053522494],
        [0.3745484292507172],
        [0.3898260928180119],
        [0.6785347053879186],
        [0.6013477584719658],
        [0.6835786989591655],
        [0.671942118096025],
        [0.7652009990480211],
        [0.7984053597510236],
        [0.812631408231599],
        [0.7812918780230674],
        [0.7503446271643043],
        [0.6487146471179811],
        [0.4645040519535541],
        [0.4679112182678403],
        [0.5753125915569919],
        [0.7401473565253219],
        [0.5436416496621802],
        [0.7716637833913167],
        [0.6086455756879371],
        [0.4922040828988572],
        [0.632121983456285],
        [0.6265893798321486],
        [0.5093726385079447],
        [0.4625303537949272],
        [0.7344460727332475],
        [0.686734888653398],
        [0.6822119863136955],
        [0.6550523322395962],
        [0.6488686796798501],
        [0.5015533617858229],
        [0.3824223267360472],
        [0.8114082086831331],
        [0.3861597855124742],
        [0.5325922912784985],
        [0.8319205016317502],
        [0.5145040865127857],
        [0.4465995261418646],
        [0.3304970000938671],
        [0.3858031509004437],
        [0.3183579211754183],
        [0.8354987859725952],
        [0.7430333115082585],
        [0.7569319856794257],
        [0.5214956813993362],
        [0.7124942578767476],
        [0.2654576441530364],
        [0.2521615133646431],
        [0.2847584302796692],
        [0.283719279180617],
        [0.5317853645517908],
        [0.4937739551807782],
        [0.4785858993568728],
        [0.5271638310736134],
        [0.5725147386173626],
        [0.2989589810961544],
        [0.3201076213556986],
        [0.3895329804915302],
        [0.3594205772876739],
        [0.5677102953195572],
        [0.5464025835196177],
        [0.3475884056091308],
        [0.3116959840560159],
    ],
    "2024-07-01T00:00:00Z": [
        [0.5823072632153828],
        [0.1699785963664163],
        [0.184436729333053],
        [0.2172549554204519],
        [0.2047472642080204],
        [0.1677890313704582],
        ["NaN"],
        [0.6128483965541377],
        ["NaN"],
        [0.4472256359988696],
        [0.3669533133506775],
        [0.3288891437649727],
        [0.6153470992463307],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        [0.6758419781923294],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        [0.2571837585419416],
        [0.246400433013568],
        [0.4538096615246364],
        ["NaN"],
        ["NaN"],
        [0.636947397674833],
        [0.4769561469554901],
        [0.2432188093662262],
        [0.2592999690199551],
        [0.5305595848709345],
        [0.8136955518192716],
        ["NaN"],
        ["NaN"],
        [0.6019838022631269],
        ["NaN"],
        [0.2100491670587349],
        [0.3082389459815076],
        ["NaN"],
        ["NaN"],
        [0.7018024623394012],
        [0.5673989646377102],
        [0.269725786788123],
        [0.6898954724761802],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        [0.8185631275177002],
        [0.3823173540888481],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        [0.2989414166659117],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        [0.6745566910710828],
        ["NaN"],
        ["NaN"],
        [0.2156848811677524],
        ["NaN"],
        [0.7146707514921824],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        ["NaN"],
        [0.4450587159395218],
        [0.4101725849283843],
    ],
...
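With this many fields, the raw dump is hard to scan by eye. Here is a minimal sketch that counts the fields without data per date. It assumes `ts_dict` has the shape printed above (date string mapped to a list of per-field value lists) and that missing values may show up as a float NaN, as None, or as the string "NaN". The helpers `is_missing` and `summarize_missing` are hypothetical names, and the example values are shortened for illustration.

```python
import math


def is_missing(value):
    """True for None, a float NaN, or the string "NaN" as printed in the dump."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().lower() == "nan"
    return isinstance(value, float) and math.isnan(value)


def summarize_missing(ts_dict):
    """Map each date to (number of fields without any data, total number of fields)."""
    summary = {}
    for date, rows in ts_dict.items():
        # A field counts as missing when every value in its row is missing.
        missing = sum(1 for row in rows if all(is_missing(v) for v in row))
        summary[date] = (missing, len(rows))
    return summary


# Shortened, illustrative input in the same shape as the dump above.
example = {
    "2024-07-21T00:00:00Z": [[0.857], [0.675], [0.448]],
    "2024-07-01T00:00:00Z": [[0.582], ["NaN"], [float("nan")]],
}

for date, (missing, total) in summarize_missing(example).items():
    print(f"{date}: {missing}/{total} fields without data")
```

A summary like this can go into the exception message instead of the full `ts_dict`, which keeps the log entry short.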

Thank you Emile, I was able to debug the udf with your suggestion!

Kind regards,
Kato


You can now also use standard Python logging in the UDF that runs on the vector cube:

import logging

def apply_udf_data(data: UdfData):
    logging.warning("test")  # logging.warn is a deprecated alias
    ...