Examples of 'aggregate_temporal_period'?

Hi,

Are there any examples of how to use aggregate_temporal_period to get, for example, yearly mean NDVI values within a given period? Is it possible to do this using the openEO R or Python library?

I am trying to do it using the visual editor but I have not managed to do it properly.

Thanks!

Hey,

yes, it is possible, but not with aggregate_temporal_period yet. Unfortunately, aggregate_temporal_period doesn’t support a yearly period yet; see also the corresponding issue: aggregate_temporal_period: No support for a period of type: year · Issue #145 · Open-EO/openeo-geopyspark-driver · GitHub

You can fall back to aggregate_temporal though. Here’s an example in R that uses pre-computed NDVI values:

p = processes()

# load the pre-computed NDVI collection for the area and time range of interest
datacube1 = p$load_collection(
  id = "CGLS_NDVI_V3_GLOBAL",
  spatial_extent = list("east" = 7.763041605582233, "north" = 52.10387582302508, "south" = 51.82970111402692, "west" = 7.498213761663553),
  temporal_extent = list("2016-01-01T00:00:00Z", "2022-01-01T00:00:00Z"))
# the collection holds a single NDVI band, so the band dimension can be dropped
datacube2 = p$drop_dimension(data = datacube1, name = "bands")
# reducer that averages all values falling into an interval
reducer1 = function(data) {
  return(p$mean(data = data))
}
# aggregate over six explicit one-year intervals, labelled by year
datacube5 = p$aggregate_temporal(
  data = datacube2,
  intervals = list(list("2016-01-01T00:00:00Z", "2017-01-01T00:00:00Z"), list("2017-01-01T00:00:00Z", "2018-01-01T00:00:00Z"), list("2018-01-01T00:00:00Z", "2019-01-01T00:00:00Z"), list("2019-01-01T00:00:00Z", "2020-01-01T00:00:00Z"), list("2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"), list("2021-01-01T00:00:00Z", "2022-01-01T00:00:00Z")),
  labels = list(2016, 2017, 2018, 2019, 2020, 2021),
  reducer = reducer1)
datacube4 = p$save_result(data = datacube5, format = "GTIFF")
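
As a side note, the intervals and labels don’t have to be written out by hand for longer time ranges; here is a small sketch in plain R that builds the same consecutive one-year intervals programmatically:

years = 2016:2021
# one [start, end) interval per year, formatted as ISO 8601 timestamps
intervals = lapply(years, function(y) {
  list(sprintf("%d-01-01T00:00:00Z", y), sprintf("%d-01-01T00:00:00Z", y + 1))
})
labels = as.list(years)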

Or, expressed as a process graph (JSON):

{
  "process_graph": {
    "1": {
      "arguments": {
        "bands": null,
        "id": "CGLS_NDVI_V3_GLOBAL",
        "spatial_extent": {
          "east": 7.763041605582233,
          "north": 52.10387582302508,
          "south": 51.82970111402692,
          "west": 7.498213761663553
        },
        "temporal_extent": [
          "2016-01-01T00:00:00Z",
          "2022-01-01T00:00:00Z"
        ]
      },
      "process_id": "load_collection"
    },
    "2": {
      "arguments": {
        "data": {
          "from_node": "1"
        },
        "name": "bands"
      },
      "process_id": "drop_dimension"
    },
    "4": {
      "arguments": {
        "data": {
          "from_node": "5"
        },
        "format": "GTIFF"
      },
      "process_id": "save_result",
      "result": true
    },
    "5": {
      "arguments": {
        "data": {
          "from_node": "2"
        },
        "dimension": null,
        "intervals": [
          [
            "2016-01-01T00:00:00Z",
            "2017-01-01T00:00:00Z"
          ],
          [
            "2017-01-01T00:00:00Z",
            "2018-01-01T00:00:00Z"
          ],
          [
            "2018-01-01T00:00:00Z",
            "2019-01-01T00:00:00Z"
          ],
          [
            "2019-01-01T00:00:00Z",
            "2020-01-01T00:00:00Z"
          ],
          [
            "2020-01-01T00:00:00Z",
            "2021-01-01T00:00:00Z"
          ],
          [
            "2021-01-01T00:00:00Z",
            "2022-01-01T00:00:00Z"
          ]
        ],
        "labels": [
          2016,
          2017,
          2018,
          2019,
          2020,
          2021
        ],
        "reducer": {
          "process_graph": {
            "1": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                }
              },
              "process_id": "mean",
              "result": true
            }
          }
        }
      },
      "process_id": "aggregate_temporal"
    }
  }
}

Hope it helps. Let me know if there are additional questions.

Thank you! However, I get the following error when I build datacube5:

Error in arguments[[param_name]]$setValue(call_arg) :
Function parameter do not match ProcessGraph parameter(s)

Any idea what might be the issue? Thanks!

Best,

Javier

Are you saying that you are converting the process above into code and get an error when you run the processing in the cloud?
If yes, can you please post the code so that we can reproduce the issue? Thanks!

I am using the same R code you suggested and I get the error on the R console when I create the datacube5 object like this, before starting the job:

datacube5 = p$aggregate_temporal(
  data = datacube2,
  intervals = list(list("2016-01-01T00:00:00Z", "2017-01-01T00:00:00Z"), list("2017-01-01T00:00:00Z", "2018-01-01T00:00:00Z"), list("2018-01-01T00:00:00Z", "2019-01-01T00:00:00Z"), list("2019-01-01T00:00:00Z", "2020-01-01T00:00:00Z"), list("2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"), list("2021-01-01T00:00:00Z", "2022-01-01T00:00:00Z")),
  labels = list(2016, 2017, 2018, 2019, 2020, 2021),
  reducer = reducer_mean)

I just changed the name of the reducer (compared to your code) as follows:

reducer_mean = function(data) {
  return(p$mean(data = data))
}

Thanks! Javier

Your reducer function is missing the context parameter. Either add the R dots argument (...) or an explicit context argument as a second parameter to the function.

function(data, context) {
  return(p$mean(data = data))
}
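
The variant with dots, which simply accepts and ignores any additional parameters passed by the client, would look like this:

function(data, ...) {
  return(p$mean(data = data))
}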

Thanks! Yet the job still fails. When I start the job, I get these warnings:

Warning messages:
1: In is.environment(value) || !is.na(value) :
‘length(x) = 2 > 1’ in coercion to ‘logical(1)’
2: In is.environment(value) || !is.na(value) :
‘length(x) = 6 > 1’ in coercion to ‘logical(1)’
3: In is.environment(value) || !is.na(value) :
‘length(x) = 6 > 1’ in coercion to ‘logical(1)’
4: In !is.environment(self$getValue()) && is.na(self$getValue()) :
‘length(x) = 4 > 1’ in coercion to ‘logical(1)’
5: In !is.environment(self$getValue()) && is.na(self$getValue()) :
‘length(x) = 2 > 1’ in coercion to ‘logical(1)’
6: In length(self$getValue()) == 0 || is.na(self$getValue()) :
‘length(x) = 2 > 1’ in coercion to ‘logical(1)’
7: In length(self$getValue()) == 0 || is.na(self$getValue()) :
‘length(x) = 6 > 1’ in coercion to ‘logical(1)’
8: In length(self$getValue()) == 0 || is.na(self$getValue()) :
‘length(x) = 6 > 1’ in coercion to ‘logical(1)’

The job then finishes with an error that I cannot debug, since as far as I know there is no information associated with it. The log is very hard to interpret and, among other things, it says something like: Py4JJavaError: An error occurred while calling o897.datacube_seq. : geotrellis.raster.gdal.MalformedDataException: Unable to construct dataset dimensions.

Could you please try the following code and let me know your opinion?

library(openeo)

con = connect(host = "https://openeo.cloud")

login()

p = processes()

datacube1 = p$load_collection(
  id = "CGLS_NDVI_V3_GLOBAL",
  spatial_extent = list("west" = -3.6701819762247494, "south" = 36.807954545454564, "east" = -2.4735910671338406, "north" = 37.25795454545457),
  temporal_extent = list("2016-01-01T00:00:00Z", "2022-01-01T00:00:00Z"))

datacube2 = p$drop_dimension(data = datacube1, name = "bands")

reducer_mean = function(data, context) {
  return(p$mean(data = data))
}

datacube5 = p$aggregate_temporal(
  data = datacube2,
  intervals = list(list("2016-01-01T00:00:00Z", "2017-01-01T00:00:00Z"), list("2017-01-01T00:00:00Z", "2018-01-01T00:00:00Z"), list("2018-01-01T00:00:00Z", "2019-01-01T00:00:00Z"), list("2019-01-01T00:00:00Z", "2020-01-01T00:00:00Z"), list("2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"), list("2021-01-01T00:00:00Z", "2022-01-01T00:00:00Z")),
  labels = list(2016, 2017, 2018, 2019, 2020, 2021),
  reducer = reducer_mean)

result = p$save_result(data = datacube5, format = "GTIFF")

job_id = create_job(graph = result, title = "Job_name", description = "This is a test", format = "GTIFF")

start_job(job_id)

Hi @javier.martinez ,

sorry about these issues. I’ve checked the process graph and it looks correct. I’ve now run it in the Web Editor to rule out that the R client is the problem, but it doesn’t work there either (the same error message, it seems). So this appears to be a problem with the processing engine in the background, which I can’t help with. I hope our experts at VITO can. @jeroen.dries @stefaan.lippens Can one of you have a look at this?

Best,
Matthias

I had a look and found your failed job. GDAL has a problem reading a particular file, but the error didn’t reveal which one.
I made a commit to add the file name to the logging, so we can find out.
In the meantime, you could also try a different collection, so you can continue experimenting with aggregate_temporal.

I tried the Python-based example below on openeo-dev.vito.be just now, and it worked. Can you also try yours again?

import openeo

connection = openeo.connect("openeo-dev.vito.be").authenticate_oidc()
fapar = connection.load_collection(
    "CGLS_NDVI_V3_GLOBAL",
    spatial_extent = {"west": -3.67, "south": 36.8, "east": -2.47, "north": 37.25},
    temporal_extent = ["2019-01-01", "2022-01-01"]).band("NDVI")

fapar.aggregate_temporal_period(period="year", reducer="mean").download("ndvi_mean.nc")

Changing the connection fixed the issue. Thanks!

No problem, the fix should also be on openeo.cloud now!

Hi again,

When I run this script, I get only one raster, which is presumably the mean value over the whole period (2016-2021 in my case) and not the mean value for each year, which is what I need to compute. I would expect to obtain a collection containing as many images as years; then I want to compute the interannual mean/median/etc. Do I need to follow a different approach?

Thanks!

Javier

@javier.martinez could you try using netCDF instead of GeoTIFF as the output format?
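
For reference, GeoTIFF has no native time dimension, so a multi-temporal result can end up flattened into a single raster, while netCDF can hold one slice per year in a single file. Assuming the back-end accepts "netCDF" as the format name (the exact spelling can be checked against the back-end’s list of supported file formats), the change would be:

# hypothetical change: write netCDF instead of GeoTIFF so the time dimension survives
result = p$save_result(data = datacube5, format = "netCDF")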

Just tried it, and I still only get one band. By the way, do I have to specify the output format twice, in both the save_result and the create_job commands?

If you specify the format in create_job, it will automatically append a save_result node at the end. If you have already set save_result, you can omit the format in create_job, as the sketch below shows.
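
A minimal sketch of the second option, reusing the variables from the script above (with the netCDF format suggested earlier):

# save_result already defines the output format,
# so create_job does not need a format argument
result = p$save_result(data = datacube5, format = "netCDF")
job_id = create_job(graph = result, title = "Job_name", description = "This is a test")
start_job(job_id)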

For the record: aggregate_temporal_period now works, and using it I get yearly values.
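
For completeness, here is a minimal R sketch of that simpler route, assuming the back-end now supports the yearly period in aggregate_temporal_period (as the Python example above demonstrates); the collection, extents, and netCDF output are taken from the earlier examples:

p = processes()

datacube = p$load_collection(
  id = "CGLS_NDVI_V3_GLOBAL",
  spatial_extent = list("west" = -3.67, "south" = 36.8, "east" = -2.47, "north" = 37.25),
  temporal_extent = list("2019-01-01", "2022-01-01"))
# one mean value per calendar year, no hand-written intervals needed
yearly = p$aggregate_temporal_period(
  data = datacube,
  period = "year",
  reducer = function(data, context) { p$mean(data = data) })
result = p$save_result(data = yearly, format = "netCDF")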