How to use quantiles on a 2D data cube (x, y)

I’m trying to compute a percentile using the quantiles process. It takes an array as input. I have a 2D data cube (x, y; e.g. a DEM). Which data cube process can I use here? I want to get back a single number.
I’ve tried it with aggregate_spatial and it fails*.

Here’s an example graph:

{
  "process_graph": {
    "aggregate2": {
      "arguments": {
        "data": {
          "from_node": "load1"
        },
        "geometries": {
          "features": [
            {
              "geometry": {
                "coordinates": [
                  [
                    [
                      11.433438104703031,
                      47.28888805806898
                    ],
                    [
                      11.285468923538287,
                      47.28888805806898
                    ],
                    [
                      11.285468923538287,
                      47.41113281562161
                    ],
                    [
                      11.433438104703031,
                      47.41113281562161
                    ],
                    [
                      11.433438104703031,
                      47.28888805806898
                    ]
                  ]
                ],
                "type": "Polygon"
              },
              "properties": null,
              "type": "Feature"
            },
            {
              "geometry": {
                "coordinates": [
                  [
                    [
                      11.410345656636142,
                      47.39046895229126
                    ],
                    [
                      11.412708610274276,
                      47.30682719883302
                    ],
                    [
                      11.312873819063277,
                      47.305625634515195
                    ],
                    [
                      11.311101603834683,
                      47.39046895229126
                    ],
                    [
                      11.311101603834683,
                      47.3932680568073
                    ],
                    [
                      11.410345656636142,
                      47.3932680568073
                    ],
                    [
                      11.410345656636142,
                      47.39046895229126
                    ]
                  ]
                ],
                "type": "Polygon"
              },
              "properties": null,
              "type": "Feature"
            }
          ],
          "type": "FeatureCollection"
        },
        "reducer": {
          "process_graph": {
            "quantiles1": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "probabilities": [
                  0.5
                ]
              },
              "process_id": "quantiles",
              "result": true
            }
          }
        }
      },
      "process_id": "aggregate_spatial"
    },
    "load1": {
      "arguments": {
        "id": "COPERNICUS_30",
        "spatial_extent": {
          "east": 11.433438104703031,
          "north": 47.411132815621606,
          "south": 47.28888805806898,
          "west": 11.285468923538287
        },
        "temporal_extent": [
          "2010-12-12T00:00:00Z",
          null
        ]
      },
      "process_id": "load_collection"
    },
    "save5": {
      "arguments": {
        "data": {
          "from_node": "aggregate2"
        },
        "format": "CSV"
      },
      "process_id": "save_result",
      "result": true
    }
  }
}

Here’s the job ID: j-240306b9609f4cc08f4436e16c33f8a9

Here’s the error:

Error communicating with MapOutputTracker
ERROR (+23s 795ms), ID: [1709741768407, 593386]

OpenEO batch job failed: java.lang.IllegalArgumentException: QuantilesParameterMissing: either 'q' or 'probabilities' argument needs to be set
ERROR (+25s 150ms), ID: [1709741769762, 899002]

PS: *(It works when I replace quantiles with median, but then it gives two values as the result, whereas it should be one?)

The reducer in aggregate_spatial is expected to take an array as input and return a single scalar as output.
The reducer you specify in your process graph (just a quantiles process) takes an array, but also returns an array.
After the quantiles, your reducer callback should therefore apply array_element to produce a scalar. So something like:

{
  "quantiles1": {
    "arguments": {"data": {"from_parameter": "data"}, "probabilities": [0.5]},
    "process_id": "quantiles"
  },
  "arrayelement1": {
    "process_id": "array_element",
    "arguments": {"data": {"from_node": "quantiles1"}, "index": 0},
    "result": true
  }
}
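To see why the extra array_element step matters, here is a small local sketch in plain Python (no openEO back-end involved) that mimics what this two-node callback computes for the pixel values of one polygon. The functions `quantiles` and `array_element` below are hypothetical local stand-ins for the openEO processes of the same name, and the interpolation method (linear, Hyndman & Fan type 7) is an assumption; a back-end may use a different method.

```python
def quantiles(data, probabilities):
    """Hypothetical local stand-in for the openEO `quantiles` process.

    Assumes linear interpolation between closest ranks (Hyndman & Fan type 7).
    Returns a list with one value per probability: an array, not a scalar.
    """
    values = sorted(data)
    n = len(values)
    result = []
    for p in probabilities:
        pos = p * (n - 1)              # fractional 0-based rank
        lower = int(pos)               # floor, since pos >= 0
        upper = min(lower + 1, n - 1)  # clamp at the last element
        frac = pos - lower
        result.append(values[lower] + (values[upper] - values[lower]) * frac)
    return result


def array_element(data, index):
    """Hypothetical local stand-in for the openEO `array_element` process."""
    return data[index]


# Made-up example pixel values (e.g. DEM heights) inside one polygon.
pixels = [812.0, 950.0, 1020.0, 1105.0, 2300.0]

q = quantiles(pixels, probabilities=[0.5])
print(q)       # [1020.0]  -> still an array
median = array_element(q, index=0)
print(median)  # 1020.0    -> a single scalar, as the reducer must return
```

The `quantiles` output is a one-element list even for a single probability, which is why the callback needs array_element to turn it into the scalar that aggregate_spatial expects.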

In what sense do you get two values? Could it be that the input cube you are passing to aggregate_spatial has two bands?

My bad, I had created 2 polygons accidentally. The median solution works as expected!
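Since aggregate_spatial produces one result per geometry, a quick way to catch this kind of mistake before submitting a job is to count the geometries passed in the `geometries` argument. Below is a minimal plain-Python sketch; `count_geometries` is a hypothetical helper, and it only handles the GeoJSON types that appear in this thread (Polygon, Feature, FeatureCollection) plus GeometryCollection. The coordinates are rounded from the first process graph above.

```python
def count_geometries(geojson):
    """Hypothetical helper: how many aggregated values to expect for `geojson`.

    Each feature of a FeatureCollection and each member of a
    GeometryCollection is treated as one geometry to aggregate over.
    """
    kind = geojson["type"]
    if kind == "FeatureCollection":
        return sum(count_geometries(f) for f in geojson["features"])
    if kind == "GeometryCollection":
        return len(geojson["geometries"])
    # A single Feature or a bare geometry (e.g. Polygon) counts as one.
    return 1


square = {
    "type": "Polygon",
    "coordinates": [[[11.29, 47.29], [11.43, 47.29], [11.43, 47.41],
                     [11.29, 47.41], [11.29, 47.29]]],
}
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": None, "geometry": square},
        {"type": "Feature", "properties": None, "geometry": square},
    ],
}
print(count_geometries(collection))  # 2 -> two polygons, two values
print(count_geometries(square))      # 1 -> one polygon, one value
```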

Result: [screenshot]

Hi @stefaan.lippens, thanks for the direct support!
I’ve adapted the process graph to include the array_element process as suggested.
I get a different error now.

Process graph:

{
  "process_graph": {
    "aggregate1": {
      "arguments": {
        "data": {
          "from_node": "load1"
        },
        "geometries": {
          "coordinates": [
            [
              [
                11.405728030894776,
                47.34336009384771
              ],
              [
                11.336232816497366,
                47.34336009384771
              ],
              [
                11.336232816497366,
                47.3831235941382
              ],
              [
                11.405728030894776,
                47.3831235941382
              ],
              [
                11.405728030894776,
                47.34336009384771
              ]
            ]
          ],
          "type": "Polygon"
        },
        "reducer": {
          "process_graph": {
            "array1": {
              "arguments": {
                "data": {
                  "from_node": "quantiles1"
                },
                "index": 0
              },
              "process_id": "array_element",
              "result": true
            },
            "quantiles1": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "probabilities": [
                  0.5
                ]
              },
              "process_id": "quantiles"
            }
          }
        }
      },
      "process_id": "aggregate_spatial"
    },
    "load1": {
      "arguments": {
        "id": "COPERNICUS_30",
        "spatial_extent": {
          "east": 11.433438104703031,
          "north": 47.411132815621606,
          "south": 47.28888805806898,
          "west": 11.285468923538287
        },
        "temporal_extent": [
          "2010-12-12T00:00:00Z",
          null
        ]
      },
      "process_id": "load_collection"
    },
    "save3": {
      "arguments": {
        "data": {
          "from_node": "aggregate1"
        },
        "format": "CSV"
      },
      "process_id": "save_result",
      "result": true
    }
  }
}
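Before concluding that the back-end is at fault, it can help to rule out structural problems in the callback itself. The sketch below checks two things in plain Python: that exactly one node carries `"result": true` (the openEO process-graph convention) and that every `from_node` reference points at an existing node. `iter_refs` and `check_callback` are hypothetical helpers, not part of any openEO client. Applied to the reducer above it reports no problems, which is consistent with the error being a back-end limitation rather than a malformed graph.

```python
def iter_refs(value):
    """Yield node ids referenced via {"from_node": ...} anywhere inside `value`.

    Nested callbacks ({"process_graph": ...}) are skipped, because their
    from_node references point at their own nodes, not at this graph's nodes.
    """
    if isinstance(value, dict):
        if "process_graph" in value:
            return
        if "from_node" in value:
            yield value["from_node"]
        for v in value.values():
            yield from iter_refs(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_refs(v)


def check_callback(process_graph):
    """Return a list of structural problems found in a (sub-)process graph."""
    problems = []
    result_nodes = [nid for nid, node in process_graph.items() if node.get("result")]
    if len(result_nodes) != 1:
        problems.append(f"expected exactly one result node, found {result_nodes}")
    for nid, node in process_graph.items():
        for ref in iter_refs(node.get("arguments", {})):
            if ref not in process_graph:
                problems.append(f"node {nid!r} references unknown node {ref!r}")
    return problems


# The reducer callback from the process graph above, as a Python dict.
reducer = {
    "array1": {
        "process_id": "array_element",
        "arguments": {"data": {"from_node": "quantiles1"}, "index": 0},
        "result": True,
    },
    "quantiles1": {
        "process_id": "quantiles",
        "arguments": {"data": {"from_parameter": "data"}, "probabilities": [0.5]},
    },
}
print(check_callback(reducer))  # []
```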

Job ID: j-2403075b05aa4d7095e3356f0f263653

Error:

OpenEO batch job failed: java.lang.IllegalArgumentException: The process Constants are not expected in a reducer callback, got value 0.5. is not supported to be used in a reducer callback.
ERROR (+50s 256ms), ID: [1709808964929, 932882]

OpenEO batch job failed: java.lang.IllegalArgumentException: The process Constants are not expected in a reducer callback, got value 0.5. is not supported to be used in a reducer callback.
ERROR (+2m 13s 936ms), ID: [1709809048609, 135653]

I also tried passing two probabilities to quantiles, e.g. [0.5, 0.75]. That results in the same error, but the message then appears only once instead of twice.

OK, this indeed looks like a bug.

I’ve escalated it here: "quantiles support in aggregate_spatial", Issue #714 in Open-EO/openeo-geopyspark-driver on GitHub.


I think we actually do (or should) support returning an array in aggregate_spatial. We need it when computing multiple statistics at once:
https://open-eo.github.io/openeo-python-client/basics.html#computing-multiple-statistics

Ah indeed, I wasn’t aware of that, but apparently the VITO backend supports aggregate_spatial reducers that return an array. For example, a reducer like the one below currently works:

{
  "process_graph": {
    "mean1": {
      "process_id": "mean",
      "arguments": {"data": {"from_parameter": "data"}}
    },
    "median1": {
      "process_id": "median",
      "arguments": {"data": {"from_parameter": "data"}}
    },
    "count1": {
      "process_id": "count",
      "arguments": {"data": {"from_parameter": "data"}}
    },
    "arraycreate1": {
      "process_id": "array_create",
      "arguments": {
        "data": [
          {"from_node": "mean1"},
          {"from_node": "median1"},
          {"from_node": "count1"}
        ]
      },
      "result": true
    }
  }
}
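As a rough local illustration, the effect of this array_create reducer on the pixel values of one polygon can be mimicked with Python's `statistics` module. `multi_stat_reducer` is a hypothetical helper, and representing no-data pixels as `None` and dropping them before computing all three statistics is an assumption meant to mirror openEO's default no-data handling; a back-end may differ.

```python
import statistics


def multi_stat_reducer(data):
    """Hypothetical local stand-in for the mean/median/count + array_create reducer.

    Assumption: no-data pixels are None and are ignored by all three statistics.
    Returns [mean, median, count]: an array, not a scalar.
    """
    valid = [v for v in data if v is not None]
    return [statistics.mean(valid), statistics.median(valid), len(valid)]


pixels = [812.0, None, 950.0, 1020.0, 1105.0]  # made-up example values
print(multi_stat_reducer(pixels))  # [971.75, 985.0, 4]
```

Each geometry then yields one such three-element array instead of a single number, which is exactly the behaviour that goes beyond the scalar-reducer contract of the processes spec.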

However, strictly speaking this is outside the openEO processes specification, so it should be considered an experimental feature.

Thanks for clarifying that! This will be useful for machine learning workflows!

Thanks @stefaan.lippens, we need the quantiles anyway, so we’re looking forward to a solution for that!