Examples of 'aggregate_temporal_period'?

Hi,

Are there any examples of how to use aggregate_temporal_period to get, for example, yearly mean NDVI values within a given period? Is it possible to do this using the openEO R or Python library?

I am trying to do it using the visual editor but I have not managed to do it properly.

Thanks!

Hey,

yes, it is possible, but not with aggregate_temporal_period yet. Unfortunately, aggregate_temporal_period doesn’t support a yearly period yet; see also the corresponding issue: aggregate_temporal_period: No support for a period of type: year · Issue #145 · Open-EO/openeo-geopyspark-driver · GitHub

You can fall back to aggregate_temporal though. Here’s an example in R that uses pre-computed NDVI values:

p = processes()

# load the pre-computed NDVI collection for the area and time range of interest
datacube1 = p$load_collection(
  id = "CGLS_NDVI_V3_GLOBAL",
  spatial_extent = list("east" = 7.763041605582233, "north" = 52.10387582302508, "south" = 51.82970111402692, "west" = 7.498213761663553),
  temporal_extent = list("2016-01-01T00:00:00Z", "2022-01-01T00:00:00Z"))
# the collection holds a single NDVI band, so the band dimension can be dropped
datacube2 = p$drop_dimension(data = datacube1, name = "bands")
# reducer that averages all values falling into an interval
reducer1 = function(data) {
  return(p$mean(data = data))
}
# aggregate over six explicit one-year intervals, labelled by year
datacube5 = p$aggregate_temporal(
  data = datacube2,
  intervals = list(list("2016-01-01T00:00:00Z", "2017-01-01T00:00:00Z"), list("2017-01-01T00:00:00Z", "2018-01-01T00:00:00Z"), list("2018-01-01T00:00:00Z", "2019-01-01T00:00:00Z"), list("2019-01-01T00:00:00Z", "2020-01-01T00:00:00Z"), list("2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"), list("2021-01-01T00:00:00Z", "2022-01-01T00:00:00Z")),
  labels = list(2016, 2017, 2018, 2019, 2020, 2021),
  reducer = reducer1)
datacube4 = p$save_result(data = datacube5, format = "GTIFF")
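
As a side note, the intervals and labels don’t have to be written out by hand for longer time ranges; here is a small sketch in plain R that builds the same consecutive one-year intervals programmatically:

years = 2016:2021
# one [start, end) interval per year, formatted as ISO 8601 timestamps
intervals = lapply(years, function(y) {
  list(sprintf("%d-01-01T00:00:00Z", y), sprintf("%d-01-01T00:00:00Z", y + 1))
})
labels = as.list(years)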

Or, expressed as a process graph (JSON):

{
  "process_graph": {
    "1": {
      "arguments": {
        "bands": null,
        "id": "CGLS_NDVI_V3_GLOBAL",
        "spatial_extent": {
          "east": 7.763041605582233,
          "north": 52.10387582302508,
          "south": 51.82970111402692,
          "west": 7.498213761663553
        },
        "temporal_extent": [
          "2016-01-01T00:00:00Z",
          "2022-01-01T00:00:00Z"
        ]
      },
      "process_id": "load_collection"
    },
    "2": {
      "arguments": {
        "data": {
          "from_node": "1"
        },
        "name": "bands"
      },
      "process_id": "drop_dimension"
    },
    "4": {
      "arguments": {
        "data": {
          "from_node": "5"
        },
        "format": "GTIFF"
      },
      "process_id": "save_result",
      "result": true
    },
    "5": {
      "arguments": {
        "data": {
          "from_node": "2"
        },
        "dimension": null,
        "intervals": [
          [
            "2016-01-01T00:00:00Z",
            "2017-01-01T00:00:00Z"
          ],
          [
            "2017-01-01T00:00:00Z",
            "2018-01-01T00:00:00Z"
          ],
          [
            "2018-01-01T00:00:00Z",
            "2019-01-01T00:00:00Z"
          ],
          [
            "2019-01-01T00:00:00Z",
            "2020-01-01T00:00:00Z"
          ],
          [
            "2020-01-01T00:00:00Z",
            "2021-01-01T00:00:00Z"
          ],
          [
            "2021-01-01T00:00:00Z",
            "2022-01-01T00:00:00Z"
          ]
        ],
        "labels": [
          2016,
          2017,
          2018,
          2019,
          2020,
          2021
        ],
        "reducer": {
          "process_graph": {
            "1": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                }
              },
              "process_id": "mean",
              "result": true
            }
          }
        }
      },
      "process_id": "aggregate_temporal"
    }
  }
}

Hope it helps. Let me know if there are additional questions.

Thank you! However, I get the following error when I build datacube5:

Error in arguments[[param_name]]$setValue(call_arg) :
Function parameter do not match ProcessGraph parameter(s)

Any idea what might be the issue? Thanks!

Best,

Javier

Are you saying that you are converting the process above into code and get an error when you run the processing in the cloud?
If yes, can you please post the code so that we can reproduce the issue? Thanks!

I am using the same R code you suggested and I get the error on the R console when I create the datacube5 object like this, before starting the job:

datacube5 = p$aggregate_temporal(
  data = datacube2,
  intervals = list(list("2016-01-01T00:00:00Z", "2017-01-01T00:00:00Z"), list("2017-01-01T00:00:00Z", "2018-01-01T00:00:00Z"), list("2018-01-01T00:00:00Z", "2019-01-01T00:00:00Z"), list("2019-01-01T00:00:00Z", "2020-01-01T00:00:00Z"), list("2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"), list("2021-01-01T00:00:00Z", "2022-01-01T00:00:00Z")),
  labels = list(2016, 2017, 2018, 2019, 2020, 2021),
  reducer = reducer_mean)

I just changed the name of the reducer (compared to your code) as follows:

reducer_mean = function(data) {
  return(p$mean(data = data))
}

Thanks! Javier

Your reducer function is missing the context parameter. Either add the R dots argument (...) or an explicit context argument as a second parameter to the function.

function(data, context) {
  return(p$mean(data = data))
}
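
The variant with dots, which simply accepts and ignores any additional parameters passed by the client, would look like this:

function(data, ...) {
  return(p$mean(data = data))
}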

Thanks! Yet the job still fails. When I start the job, I get these warnings:

Warning messages:
1: In is.environment(value) || !is.na(value) :
‘length(x) = 2 > 1’ in coercion to ‘logical(1)’
2: In is.environment(value) || !is.na(value) :
‘length(x) = 6 > 1’ in coercion to ‘logical(1)’
3: In is.environment(value) || !is.na(value) :
‘length(x) = 6 > 1’ in coercion to ‘logical(1)’
4: In !is.environment(self$getValue()) && is.na(self$getValue()) :
‘length(x) = 4 > 1’ in coercion to ‘logical(1)’
5: In !is.environment(self$getValue()) && is.na(self$getValue()) :
‘length(x) = 2 > 1’ in coercion to ‘logical(1)’
6: In length(self$getValue()) == 0 || is.na(self$getValue()) :
‘length(x) = 2 > 1’ in coercion to ‘logical(1)’
7: In length(self$getValue()) == 0 || is.na(self$getValue()) :
‘length(x) = 6 > 1’ in coercion to ‘logical(1)’
8: In length(self$getValue()) == 0 || is.na(self$getValue()) :
‘length(x) = 6 > 1’ in coercion to ‘logical(1)’

The job then finishes with an error that I cannot debug, since as far as I know there is no information associated with it. The log is very hard to interpret and, among other things, it says something like: Py4JJavaError: An error occurred while calling o897.datacube_seq. : geotrellis.raster.gdal.MalformedDataException: Unable to construct dataset dimensions.

Could you please try the following code and let me know your opinion?

library(openeo)

con = connect(host = "https://openeo.cloud")

login()

p = processes()

datacube1 = p$load_collection(
  id = "CGLS_NDVI_V3_GLOBAL",
  spatial_extent = list("west" = -3.6701819762247494, "south" = 36.807954545454564, "east" = -2.4735910671338406, "north" = 37.25795454545457),
  temporal_extent = list("2016-01-01T00:00:00Z", "2022-01-01T00:00:00Z"))

datacube2 = p$drop_dimension(data = datacube1, name = "bands")

reducer_mean = function(data, context) {
  return(p$mean(data = data))
}

datacube5 = p$aggregate_temporal(
  data = datacube2,
  intervals = list(list("2016-01-01T00:00:00Z", "2017-01-01T00:00:00Z"), list("2017-01-01T00:00:00Z", "2018-01-01T00:00:00Z"), list("2018-01-01T00:00:00Z", "2019-01-01T00:00:00Z"), list("2019-01-01T00:00:00Z", "2020-01-01T00:00:00Z"), list("2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"), list("2021-01-01T00:00:00Z", "2022-01-01T00:00:00Z")),
  labels = list(2016, 2017, 2018, 2019, 2020, 2021),
  reducer = reducer_mean)

result = p$save_result(data = datacube5, format = "GTIFF")

job_id = create_job(graph = result, title = "Job_name", description = "This is a test", format = "GTIFF")

start_job(job_id)

Hi @javier.martinez ,

sorry about these issues. I’ve checked the process graph and it looks correct. I’ve now run it in the Web Editor to rule out that the R client is the problem, but it doesn’t work there either (the same error message, it seems). So this appears to be a problem with the processing engine in the background, which I can’t help with. I hope our experts at VITO can. @jeroen.dries @stefaan.lippens Can one of you have a look at this?

Best,
Matthias

I had a look and found your failed job. GDAL has a problem reading a particular file, but the error didn’t reveal which one.
I made a commit to add the file name to the logging, so we can find out.
In the meantime, you could also try a different collection, so you can continue experimenting with aggregate_temporal.

I tried the Python-based example below on openeo-dev.vito.be just now, and it worked. Can you also try yours again?

import openeo

connection = openeo.connect("openeo-dev.vito.be").authenticate_oidc()
fapar = connection.load_collection(
    "CGLS_NDVI_V3_GLOBAL",
    spatial_extent = {"west": -3.67, "south": 36.8, "east": -2.47, "north": 37.25},
    temporal_extent = ["2019-01-01", "2022-01-01"]).band("NDVI")

fapar.aggregate_temporal_period(period="year", reducer="mean").download("ndvi_mean.nc")

Changing the connection fixed the issue. Thanks!

No problem, the fix should also be on openeo.cloud now!

Hi again,

When I run this script, I get only one raster, which is presumably the mean value over the whole period (2016-2021 in my case) and not the mean value for each year, which is what I need to compute. I would expect to obtain a collection containing as many images as years; then I want to compute the interannual mean/median/etc. Do I need to follow a different approach?

Thanks!

Javier

@javier.martinez could you try using netCDF instead of GeoTIFF as the output format?
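
For reference, GeoTIFF has no native time dimension, so a multi-temporal result can end up flattened into a single raster, while netCDF can hold one slice per year in a single file. Assuming the back-end accepts "netCDF" as the format name (the exact spelling can be checked against the back-end’s list of supported file formats), the change would be:

# hypothetical change: write netCDF instead of GeoTIFF so the time dimension survives
result = p$save_result(data = datacube5, format = "netCDF")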

Just tried it, and I still only get one band. By the way, do I have to specify the output format twice, in both the save_result and the create_job commands?

If you specify the format in create_job, it will automatically append a save_result node at the end. If you have already set save_result, you can omit the format in create_job, as the sketch below shows.
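
A minimal sketch of the second option, reusing the variables from the script above (with the netCDF format suggested earlier):

# save_result already defines the output format,
# so create_job does not need a format argument
result = p$save_result(data = datacube5, format = "netCDF")
job_id = create_job(graph = result, title = "Job_name", description = "This is a test")
start_job(job_id)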

For the record: aggregate_temporal_period now works, and using it I get yearly values.
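
For completeness, here is a minimal R sketch of that simpler route, assuming the back-end now supports the yearly period in aggregate_temporal_period (as the Python example above demonstrates); the collection, extents, and netCDF output are taken from the earlier examples:

p = processes()

datacube = p$load_collection(
  id = "CGLS_NDVI_V3_GLOBAL",
  spatial_extent = list("west" = -3.67, "south" = 36.8, "east" = -2.47, "north" = 37.25),
  temporal_extent = list("2019-01-01", "2022-01-01"))
# one mean value per calendar year, no hand-written intervals needed
yearly = p$aggregate_temporal_period(
  data = datacube,
  period = "year",
  reducer = function(data, context) { p$mean(data = data) })
result = p$save_result(data = yearly, format = "netCDF")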