Moving Average with Low Pass Filter

huriel.reichel · 8 July 2022 11:18

How to calculate a moving average of a satellite time series?

The idea here is to compute the moving average in the cloud (not locally) for S5P data.
Specific request/idea/comment: I thought about using apply_dimension() with filter_temporal(), as below:

moving_average <- function(data, context) {
  return(p$filter_temporal(data = data, dimension = "t"))
}

datacube = p$apply_dimension(process = moving_average,
                             data = datacube, dimension = "t")

The function “moving_average” does not work with apply_dimension. Do you have any ideas on how to implement this (this way or any other)?
The image from Edzer’s book is the main reference for that and it should help find a solution…

Thank you in advance

jeroen.dries · 8 July 2022 14:11

In theory, though it is probably not yet supported, the ‘apply_neighborhood’ process allows you to apply a function, such as mean, to a spatiotemporal window that you specify.
What time window would you like to apply it to?

https://processes.openeo.org/#apply_neighborhood

michele.claus · 8 July 2022 14:18

I also faced this issue some time ago, but didn’t come up with a solution! See: apply_kernel: apply along temporal dimension? (Issue #324, Open-EO/openeo-processes, GitHub)

huriel.reichel · 8 July 2022 14:20

Actually, I’m developing an R Shiny application that calls openEO processes. The time window would be user-defined, but in general monthly or 3 months, something like that.
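For reference, here is a minimal local sketch (plain Python, not openEO) of what such a user-defined moving average would compute on an irregular time series. The function name `moving_average`, the `window_days` parameter and the centred window are assumptions made for illustration only; a backend implementation could define the window differently.

```python
from datetime import date, timedelta


def moving_average(dates, values, window_days=30):
    """Centred moving average over a time window given in days.

    For each date, average all valid values whose date lies within
    +/- window_days / 2 of it. None marks a missing observation and
    is skipped. Illustrative only; not an openEO process.
    """
    half = timedelta(days=window_days / 2)
    result = []
    for centre in dates:
        window = [
            v for d, v in zip(dates, values)
            if v is not None and abs(d - centre) <= half
        ]
        result.append(sum(window) / len(window) if window else None)
    return result


# Example: four observations, ten days apart, smoothed with a 30-day window.
obs_dates = [date(2022, 1, 1), date(2022, 1, 11), date(2022, 1, 21), date(2022, 1, 31)]
obs_values = [1.0, 2.0, 3.0, 4.0]
smoothed = moving_average(obs_dates, obs_values, window_days=30)
```

The output has the same time steps as the input, which is why this is an "apply"-style operation rather than a reduction.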

huriel.reichel · 18 July 2022 13:01

I had a look at apply_neighborhood with Matthias, but the problem is that it expects a two-dimensional data cube, so this would not work for time (moving average).

I tried passing a reduce_dimension process here, but I’m not sure what the intended approach is for a single dimension.

process1 = function(data, context = NULL) {
  reduce1 = p$reduce_dimension(data = data, dimension = "t", reducer = mean)
  reduce1
}
datacube = p$apply_neighborhood(data = datacube, size = list(list(
  "dimension" = "t", "value" = "P30D")), process = process1)

How feasible would it be to implement this? It would be great to have it, as this is the last step I require to finish UC5 for S5P data…

m.mohr · 18 July 2022 13:06

I’m not sure what this should look like with apply_neighborhood. If you pass only t as a dimension, then you’d work on a 1D data cube in the callback, but which process would you use then? apply_dimension on t with a mean? This is somewhat weird…

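# Callback for apply_dimension: compute the mean along t and wrap it
# in an array, because apply_dimension expects an array in return.
process2 = function(data, context = NULL) {
  m = p$mean(data)
  return(p$array_create(data = list(m)))
}
# Callback for apply_neighborhood: receives a 1D data cube (t only).
process1 = function(data, context = NULL) {
  return(p$apply_dimension(data = data, dimension = "t", process = process2))
}
datacube = p$apply_neighborhood(data = datacube, size = list(list("dimension" = "t", "value" = "P30D")), process = process1)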

jeroen.dries · 20 July 2022 12:49

I was actually more thinking of something like:

datacube = p$apply_neighborhood(data = datacube, size = list(
  list("dimension" = "t", "value" = "P30D"),
  list("dimension" = "x", "value" = 1, "unit" = "px"),
  list("dimension" = "y", "value" = 1, "unit" = "px")), process = mean)

Of course, this assumes that the backend is allowed to ‘flatten’ the input to mean into a one-dimensional array. Would that be ok?
We can indeed support this on the VITO backend, but we do need to plan the work, so due to holidays this could move into September.

edzer.pebesma · 20 September 2022 09:11

Great to see this moving; can you give us an idea of when this will become available?

jeroen.dries · 22 September 2022 13:04

@m.mohr is my proposal allowed by the spec?
I may need to do some work on our apply_neighborhood in the coming weeks in any case, so it could be an option to consider this as well.

m.mohr · 23 September 2022 09:56

@jeroen.dries No. The process provides a raster-cube in the callback, which can’t be passed to mean.

Also, wouldn’t you need to set the size to 1x1x1 and then add an overlap to the temporal dimension? Then you could likely make it somehow work. But overall, this is so complicated and seems to be a candidate for a new process.

A little off topic… I haven’t looked at the process for quite a while, but I also see some contradictory definitions there. On the one hand, the description says:

The process must not add new dimensions, or remove entire dimensions, but the result can have different dimension labels.

On the other hand, in other places it says:

The dimension properties (name, type, labels, reference system and resolution) must remain unchanged, otherwise a DataCubePropertiesImmutable exception will be thrown.

and

A data cube with the newly computed values and the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged.

So I assume the first part needs to be removed.

jeroen.dries · 23 September 2022 11:17

Did we actually intend for this to pass data cubes into the callback? Very similar processes like apply_dimension pass in a labeled array instead, which seems to make more sense, as that is what allows our callback processes to work on it, right?

You would indeed set size to 1x1x1 and set overlap in temporal dimension. Not sure why that is complicated? For backends, maintaining a large number of specialized processes is also complicated.
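To make the "size 1 plus temporal overlap" idea concrete, here is a minimal local sketch in plain Python (not openEO). It models only the t dimension: each chunk holds one time step, the callback receives that step plus `overlap` neighbours on each side, and returns a single value for the core step. The names `apply_with_overlap` and `window_mean` are hypothetical, and truncating the window at the series edges is an assumption; the process description does not settle what the callback should return when an overlap is used.

```python
def window_mean(window):
    """Callback: reduce the neighbourhood (core + overlap) to one value."""
    return sum(window) / len(window)


def apply_with_overlap(series, overlap, process):
    """Toy 1D model of a chunk size of 1 with a symmetric overlap.

    For every element, build the neighbourhood made of the element itself
    plus `overlap` elements before and after it (truncated at the edges,
    which is an assumption), and keep the callback's result for the core
    element only.
    """
    out = []
    for i in range(len(series)):
        lo = max(0, i - overlap)
        hi = min(len(series), i + overlap + 1)
        out.append(process(series[lo:hi]))
    return out


# Example: a 3-step moving mean (one neighbour on each side).
smoothed = apply_with_overlap([1, 2, 3, 4, 5], overlap=1, process=window_mean)
```

In this model the callback only ever sees a 1D sequence, which is exactly the point under discussion: in the actual process the callback receives a data cube instead.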

m.mohr · 23 September 2022 15:15

Yes, that is intentional. It has been defined that way from the beginning, because all array operations only work on 1D arrays, and if you want to work on ND arrays you need to work on data cubes. Thus, apply_neighborhood uses data cubes in the callback.

What I’m not clear about when working with the process is: if you specify an overlap, what is expected to be returned from the callback? The data with or without the overlap?

It doesn’t feel very intuitive for such a simple use case, and several people couldn’t come up with a solution, so it seems we should make this possible in a simpler way.

jeroen.dries · 28 September 2022 12:29

OK, so there are a few separate topics here:

1. apply_neighborhood is incompatible with most other processes that we typically use in callbacks, except perhaps for run_udf, which allows ‘any’ as data type. This is not impossible to solve, either by:
   a) an explicit data-cube-to-1D-array function
   b) a rule that allows a backend to implicitly flatten a data cube into a 1D array when needed
2. Some use cases that are covered by apply_neighborhood could be made simpler by having a specialized version with arguments set to fixed values, very much like we did in the CARD4L version of backscatter. This is mostly about convenience, which could also be taken care of in the clients, but it could be an option if the convenience process turns out to be popular.
3. The apply_neighborhood description needs to define overlap better; that requires a ticket.

So basically, if we agree on 1-a or 1-b, we can already help out this use case, and follow up on 2 and 3 separately?
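As a rough illustration of option 1-a, here is a local sketch in plain Python (not openEO) of an explicit flattening step followed by a 1D mean. The data cube chunk is modelled as nested lists ordered t, y, x; `flatten` and `array_mean` are hypothetical helper names and do not correspond to existing openEO processes.

```python
def flatten(chunk):
    """Recursively flatten nested lists (e.g. t x y x x) into a 1D list."""
    flat = []
    for item in chunk:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def array_mean(values):
    """Mean of a 1D list, standing in for a process that only accepts arrays."""
    return sum(values) / len(values)


# A chunk with 3 time steps and 1 x 1 pixels, as a window of "P30D" with
# 1 px in x and y might produce.
chunk = [[[1.0]], [[2.0]], [[6.0]]]
result = array_mean(flatten(chunk))
```

Option 1-b would amount to the backend performing the `flatten` step implicitly whenever a callback expecting an array is handed a data cube.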

m.mohr · 30 September 2022 10:15

It must be explicit (via a new process, 1-a), or we’d need to change the process definition for the callback to explicitly allow working on both arrays and data cubes (somewhat similar to 1-b), because openEO is basically strictly typed. I have no strong preference yet. I’ve opened an issue for discussion: apply_neighborhood: Data type (datacube/array) in the callback (Issue #387, Open-EO/openeo-processes, GitHub)
On the overlap ticket: yes, for sure. See: apply_neighborhood: What to return when an overlap is used? (Issue #386, Open-EO/openeo-processes, GitHub)

So now we have three open issues for apply_neighborhood. The third issue is: apply_neighborhood: Descriptions are in conflict (Issue #385, Open-EO/openeo-processes, GitHub)