Filter images and reducing spatially

Hi all,

I would like to calculate some statistics on pixels within my datacube, in this case 75th percentile of the green band. I would then like to save this result per image. Later on in my algorithm I want to filter entire images based on this percentile.

I have two problems:

  1. The percentile operation is not commutative. Can I reduce all pixels in one go (so not first reduce_dimension("x") then reduce_dimension("y"), but both at the same time (all the green band values in one vector?).
  2. How can I filter entire images based on the result of a calculation? I cannot add some metadata per image. filter_temporal only seems to work based on dates. Is there a generic filter method of some kind?


Hi @jaapel !

  1. I think you are looking for is the process reduce_spatial: openEO API Processes Unfortunately, it is not available yet in openEO Platform. A workaround could be another process called flatten_dimensions, which is currently worked on, but also still needs a bit of time until it’s available.
  2. openEO doesn’t work on images, it instead works on data cubes. I don’t think you’ll be able to filter images, but maybe one of the back-ends knows a workaround?


Thanks for answering @m.mohr !

For point 1: do you have a time estimate for me so I can think about workarounds if that is needed? Perhaps using UDFs: two reduce steps, saving the intermediate data as an array (python UDF list?).

I hope there is a solution for point 2, filtering images for quality is key for the algorithm to perform well! Let me think of some workarounds…


Hi Jaap,

could you expand on what you mean with filter entire images based on the result of a calculation ?

From what I understand, it could be something like:

  1. Computing the average over the datacube, reducing the spatial dimensions (the image would be the data of each timestep in the datacube)
  2. If the average is above a threshold keep it otherwise discard it.

Basically, something similar to what a back-end is doing for filtering out the dates with too many clouds Filter less cloudy image - #5 by michele.claus

I will see if it is currently possible with the available processes, maybe reduce_spatial could be replaced with two reduce_dimension in a row, firstly over x and secondly over y or viceversa.

Hi @michele.claus ,

Your understanding is correct. It would be reducing spatial dimension using the percentile reducer, then thresholding above a certain value.

The problem with two reduce_dimensions is that doing two percentile reducers in a row will not result in the same value compared to taking all pixel values in the image at once. Perhaps not a major issue for now though. I can also discuss if we can find a filtering method that we can do “stepwise” over x and y.

Thanks for your example, filtering on cloud cover is super useful! Is there a method to view this image metadata (or properties) in the datacube?


Maybe @jeroen.dries or @bart.driessen can tell you more about the available properties of the collection.

The collection metadata gives you hints about the metadata and you can filter on several properties in load_collection, but after you have loaded the imagery into a data cube, you’ll not be able to filter on metadata any longer as then you have a data cube and not separate images any longer.

Hi Jaap,
at VITO, we’ve done similar things using either apply_neighborhood, or chunk_polygon. The best choice depend on what you define as an ‘image’. Can it be arbitrarily large, like would you take the 75% percentile of a whole continent? Or can we also define it as a 512x512 pixel region for instance? Or with chunk_polygon, an ‘image’ can be defined by a bounding polygon.

A different approach, that would take the whole data cube into account, would be to split your workflow in two requests: a first request that reduces your full datacube into percentiles per date, and then a second step that loads a new datacube with only the dates you selected based on percentile.
Only, I believe that for this second step we would need filter_labels, which is not fully supported yet.

What would work for you?

Hi Jeroen!

Great suggestions. In this case I look at all the images of a certain timerange (let say 2-3 years of imagery) that overlap a water reservoir. And then taking this percentile of the whole image’s (or collection of the image’s) green band. So depending on the how efficient this is the chunck_polygon or apply_neighborhood is, both would work. What would you suggest?

Hi Jaap,
for this case, I would certainly suggest chunk_polygon, because it will give you all the pixels within the reservoir. You can then use xarray to compute percentiles, and simply drop out the dates you don’t want, or set the pixels to nodata.

1 Like

Thanks for the suggestion! I will try this out

If I understand correctly you could then also use aggregate_spatial then with a geometry corresponding to this water reservoir. aggregate_spatial by design reduces along x and y dimension at the same time (like reduce_spatial mentioned above)

@jeroen.dries I switched to the VITO backend, as it supports chunk _polygon.
I guess I now need to chunk_polygon(my_geometry, ?udf?), or how would I apply the percentile reducer here?
I tried:, lambda data: processes.quantiles(data, probabilities=[score_percentile]))
which ended up in some java class cast error (see job ec0d892d-fd46-4083-8ada-3ebc3c4300bf in VITO backend). I think I am not yet grasping your suggestion yet.

Hi @jaapel ,
yes, this would need to be fully UDF based, I don’t have many examples in open source yet, except for this unit test:

Basically, you can work on multiple geometries at the same time, and the UDF will be invoked once per polygon.
The UDF will receive the full timeseries stack, so you can first use xarray to do the percentile reduction, and then apply your threshold filtering. You can simply set pixels to nodata if you want to leave them out, or leave out out specific labels entirely.

1 Like