Correlation operation

paolo.filippucci · 29 April 2022 08:48

Hi,
Is there a way to compute a Spearman or Pearson correlation? I could not find anything about it in the API documentation or the process descriptions…

https://open-eo.github.io/openeo-python-client/api.html#openeo.rest.connection.Connection.load_collection

jeroen.dries · 30 April 2022 08:39

Hi Paolo,
this is not available as a process. Looking at the formula, it might be possible to compute it based on more elementary processes such as standard deviation and mean?
The other option would be a user-defined function (UDF), which lets you use functions available in the much wider Python ecosystem.
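To illustrate the first option, here is a minimal plain-Python sketch (not openEO process code) of how Pearson correlation reduces to means and standard deviations, and how Spearman is the same formula applied to ranks. The function names are made up for this illustration.

```python
from statistics import fmean, pstdev


def pearson(x, y):
    """Pearson r built only from means and (population) standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # cov(x, y) = mean(x * y) - mean(x) * mean(y)
    covariance = fmean([a * b for a, b in zip(x, y)]) - fmean(x) * fmean(y)
    # Raises ZeroDivisionError if either series is constant.
    return covariance / (pstdev(x) * pstdev(y))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

The Pearson part maps onto elementary reductions over time (mean, standard deviation, mean of a product). The ranking step needed for Spearman is the part that is harder to express without a UDF.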

best regards,
Jeroen

paolo.filippucci · 2 May 2022 10:05

Hi @jeroen.dries,
This is exactly what I was trying to do, but it is not so easy.
In my code I have to create a pixel mask based on some criteria (like “the pixels whose temporal standard deviation is below the n-th percentile”), spatially average these pixels (obtaining a timeseries), and finally calculate the Spearman correlation between that timeseries and the timeseries of each pixel in the datacube. I am having some issues doing this.
Apparently, the result of a spatial average is not a datacube anymore: when I try to subtract the obtained timeseries from the full datacube, an error is raised:

C=datacube.mask(Cmask)
C=C.aggregate_spatial(rect, "mean")
datacub1=datacube-(C)
datacub1.download("data2.nc", format="netcdf")

OpenEoApiError: [500] unknown: 'AggregatePolygonResult' object has no attribute '_traces'

I also tried to download the data within C in a netcdf or csv format and then reload them from my environment (hoping this would also make the process faster) but I still obtain errors:

C.download("C.csv", format="csv")
C2=connection.load_disk_collection("csv", "C:/Users/Paolo/Desktop/bck uff/Dropbox (IRPI CNR)/BACKUP/CURRENT_USERS/p.filippucci/angelica/openeo/C.csv")

Here the system warns me that the file I am loading is not a datacube:

C:\Users\Paolo\miniconda3\envs\openeo\lib\site-packages\openeo\metadata.py:239: UserWarning: No cube:dimensions metadata
  complain("No cube:dimensions metadata")

and then, if I try to apply any operation to it (download it again, subtract it from the original datacube, add a dimension), the following error is raised:

OpenEoApiError: [500] unknown: The format is not supported by the backend: csv

I am currently stuck.

jeroen.dries · 2 May 2022 12:10

Hi Paolo,
this is not the full answer yet, but a relevant additional question: what are you doing the spatial aggregation over? Given that we’re working with datacubes here, this could be the whole world, a continent, a country, or a small parcel.
Another case could be aggregation over a moving window of a given size, e.g. 256x256 pixels.

Could you clarify?

thanks,
Jeroen

paolo.filippucci · 2 May 2022 12:23

Hi @jeroen.dries ,
You are right. The aggregation is performed over a region of 512*512 pixels (although most of them are masked out). All the analyses I am doing cover areas of similar dimensions.

jeroen.dries · 2 May 2022 14:37

Hi Paolo,
got it, you’ll probably need a UDF-based solution then anyway.
I would recommend a two-step process:

1. Compute the aggregated timeseries (you seem to have that already).
2. Compute the correlation in a UDF. The timeseries values can be injected directly into the UDF as a piece of Python code: the UDF code is built dynamically so that the timeseries values are part of the code.

With UDFs, you can also go for a one-step approach where the UDF basically computes the aggregation as well. In that case you’ll want to use apply_neighborhood with a 512x512 chunk size, which will be more memory intensive.
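As a rough sketch of the "values injected into the code" idea in the two-step approach: the helper below writes a list of numbers into UDF source text. `build_udf_code` and the `__REFERENCE__` placeholder are hypothetical names, and the time dimension name `"t"` is an assumption. The UDF body computes a per-pixel Pearson correlation with no special handling of missing values; for Spearman, the pixel series and the reference would both need to be ranked first.

```python
import math

# UDF source as text; __REFERENCE__ is a hypothetical placeholder that
# build_udf_code() replaces with the literal timeseries values.
UDF_TEMPLATE = '''
import numpy as np
import xarray
from openeo.udf import XarrayDataCube

# Reference timeseries, written into the source when the UDF is generated.
REFERENCE = xarray.DataArray(np.array(__REFERENCE__, dtype=float), dims=["t"])


def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    array = cube.get_array()
    # Deviations from the temporal mean ("t" is assumed to be the time dimension).
    a = array - array.mean("t")
    r = REFERENCE - REFERENCE.mean("t")
    # Per-pixel Pearson correlation with the reference timeseries.
    corr = (a * r).sum("t") / np.sqrt((a ** 2).sum("t") * (r ** 2).sum("t"))
    return XarrayDataCube(corr)
'''


def build_udf_code(timeseries):
    """Return UDF source code with the timeseries values embedded as a literal."""
    values = [float(v) for v in timeseries]
    if not all(math.isfinite(v) for v in values):
        # repr() of nan/inf is not a valid Python literal inside the generated code.
        raise ValueError("timeseries must contain only finite numbers")
    return UDF_TEMPLATE.replace("__REFERENCE__", repr(values))
```

The returned string is what would be handed to the openEO client as UDF code. How it is attached to the process graph depends on the client version and backend, and is not shown here.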

Does that help?

Jeroen

paolo.filippucci · 2 May 2022 15:12

I wanted to try something similar, but the aggregated timeseries cannot be used properly, as I mentioned before:

OpenEoApiError: [500] unknown: 'AggregatePolygonResult' object has no attribute '_traces'

Can you point me to some examples where UDFs are used?

jeroen.dries · 3 May 2022 06:32

Hi Paolo,
indeed, you’re trying to use the result from aggregate_spatial directly on the backend side, whereas my suggestion is to retrieve it locally (using .execute() or a batch job). Then you run a second batch job in which you inject the values.
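Once the aggregated result has been retrieved locally, it has to become a plain, time-ordered list before it can be injected. The sketch below assumes the result is a JSON-like mapping from ISO date strings to a list (one entry per geometry) of per-band values. That layout is an assumption, so check what your backend actually returns. `timeseries_to_list` is a hypothetical helper name.

```python
from datetime import datetime


def timeseries_to_list(result, geometry_index=0, band_index=0):
    """Return (dates, values) sorted by time from a {date: [[band values], ...]} mapping."""

    def parse(date_string):
        # Accept a trailing "Z" as UTC, which fromisoformat() rejects on older Pythons.
        return datetime.fromisoformat(date_string.replace("Z", "+00:00"))

    dates = sorted(result, key=parse)
    values = []
    for date in dates:
        bands = result[date][geometry_index]
        # An empty band list is treated as "no data for this date".
        values.append(bands[band_index] if bands else None)
    return dates, values
```

Dates that come back as `None` need to be dropped or filled before the values are embedded in a UDF, and the same dates must then be used on the datacube side so both series stay aligned.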

Basic UDF examples are here:
https://open-eo.github.io/openeo-python-client/udf.html
And a more complex one, showing the use of apply_neighborhood, is here:

github.com/openEOPlatform/parcel-delineation/blob/main/Parcel delineation.ipynb

As you’re working on a quite complex case, we could also set up a quick call to try and get you going!

paolo.filippucci · 3 May 2022 12:19

It would be very useful, thanks! When are you available?