Moving Average with Low Pass Filter

How to calculate a moving average of a satellite time-series ?

  • The idea here is to compute a moving average already on the cloud (not locally) for S5P data.
  • Specific request/idea/comment: I thought about using apply_dimension() with filter_temporal() as below:

moving_average ← function(data, context) {
return(p$filter_temporal(data = data, dimension = “t”))
}

datacube = p$apply_dimension(process = moving_average,
data = datacube, dimension = “t”
)

the function “moving_average” is not working for apply dimension. Do you have any ideas on how to implement this (this or any
other way)?
The image from Edzer’s book is the main reference for that and it should help find a solution…

Thank you in advance

In theory, but probably not yet supported, the ‘apply_neighourhood’ process allows you to apply a function, like mean, to a spatiotemporal window that you can specify.
What time window would you like to apply it to?

https://processes.openeo.org/#apply_neighborhood

I also faced this issue some time ago, but didn’t come up with a solution! apply_kernel: apply along temporal dimension? · Issue #324 · Open-EO/openeo-processes · GitHub

Actually I’m develoing an RShiny application that calls openEO processes. The time window would be user defined, but in general monthly / 3 months, something like that.

I gave a look with Matthias at the apply_neighborhood, but the problem is that it expects a two dimensional datacube, so this would not work for time (moving average).

I tried here passing a reduce_dimension process, but I’m not sure what’s the proposal for one dimension.

process1 = function(data, context = NULL) {
reduce1 = p$reduce_dimension(data = data, dimension = “t”, reducer = mean)
reduce1
}
datacube = p$apply_neighborhood(data = datacube, size = list(list(
“dimension” = “t”, “value” = “P30D”)), process = process1)

How feasible would it be to implement this? It would be great to have it, as this is the last step I require to finish UC5 for S5P data…

1 Like

I’m not sure how this should look like with apply_neighborhood. If you pass only t as dimension then you’d work on a 1D data cube in the callback, but which process to use then? apply_dimension on t with a mean? This is somewhat weird…

process2 = function(data, context = NULL) {
  m = p$mean(data)
  return(p$array_create(data = [m]))
}
process1 = function(data, context = NULL) {
  return(p$apply_dimension(data = data, dimension = “t”, reducer = process2))
}
datacube = p$apply_neighborhood(data = datacube, size = list(list(“dimension” = “t”, “value” = “P30D”)), process = process1)

I was actually more thinking of something like:

datacube = p$apply_neighborhood(data = datacube, size = list(list(
“dimension” = “t”, “value” = “P30D”),list(
“dimension” = “x”, “value” = “1”, "unit"="px"),list(
“dimension” = “y”, “value” = “1”, "unit"="px")), process = mean)

Of course, this assumes that the backend is allowed to ‘flatten’ the input to mean into a one dimensional array. Would that be ok?
We can indeed support this on vito backend, but do need to plan the work, so due to holidays this could move into september.

Great to see this moving; can you give us an ideawhen this will become available?

@m.mohr is my proposal allowed by the spec?
I may need to do some work on our apply_neighborhood in the coming weeks in any case, could be an option to also consider this.

@jeroen.dries No. The process provides a raster-cube in the callback, which can’t be passed to mean.

Also, wouldn’t you need to set the size to 1x1x1 and then add an overlap to the temporal dimension? Then you could likely make it somehow work. But overall, this is so complicated and seems to be a candidate for a new process.


A little off topic… I haven’t looked at the process for quite a while, but I also see some contrary definitions there. On the one hand, the description says:

The process must not add new dimensions, or remove entire dimensions, but the result can have different dimension labels.

On the other hand, in other places it says:

The dimension properties (name, type, labels, reference system and resolution) must remain unchanged, otherwise a DataCubePropertiesImmutable exception will be thrown.

and

A data cube with the newly computed values and the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged.

So I assume the first part needs to be removed.

Did we actually intend for this to pass datacubes into the callback? Very similar processes like apply_dimension also pass in a labeled array, which seems to make more sense as that is what allows our callback processes to work on it right?

You would indeed set size to 1x1x1 and set overlap in temporal dimension. Not sure why that is complicated? For backends, maintaining a large number of specialized processes is also complicated.

Yes, that is intentional. That’s how it is defined since the beginning because all array operations only work on 1D arrays and if you want to work on ND arrays you need to work on datacubes. Thus, apply_neighborhood is using datacubes in the callback.

What I’m not clear about when working with the process is: If you specify overlap, what is expected to be returned from the callback? the data with or without overlap?

It doesn’t feel very intuitive for such a simple use case and multiple people can’t come up with a solution so it seems it should be made possible?

ok, so there’s a few separate topics here:

  1. apply_neighborhood is incompatible with most other processes that we typically use in callbacks, except perhaps for run_udf, which allows ‘any’ as data type. This is not impossible to solve, either by:
    a) an explicit datacube to 1D array function
    b) an rule that allows a backend to implicitly flatten a datacube into a 1D array when needed
  2. some use cases that are covered by apply_neighborhood could be made simpler by having a specialized version with arguments set to fixed values, very much like we did in the CARD4L version of backscatter. This is mostly about convenience, which could also be taken care of in the clients, but it could be an option if the convenience process turns out to be popular.
  3. apply_neighborhood description needs to better define overlap, that requires a ticket.

So basically, if we agree on 1-a or 1-b, we can help out this use case already, and follow up on 2 and 3 separately?

  1. It must be explicit (via a new process - 1A) or we’d need to change the process definition for the callback to explicitly allow to work on arrays and data cubes (somewhat similar to 1B), because openEO is basically strictly typed. I have no strong preference yet. I’ve opened an issue for discussions: apply_neighborhood: Data type (datacube/array) in the callback · Issue #387 · Open-EO/openeo-processes · GitHub

  2. Yes, for sure: apply_neighborhood: What to return when an overlap is used? · Issue #386 · Open-EO/openeo-processes · GitHub

So now we have three open issues for apply_neighborhood. The third issue is apply_neighborhood : Descriptions are in conflict · Issue #385 · Open-EO/openeo-processes · GitHub