Correlation operation

Hi,
Is there a way to compute a Spearman or Pearson correlation? I could not find anything about it in the API or the process descriptions…

https://open-eo.github.io/openeo-python-client/api.html#openeo.rest.connection.Connection.load_collection

Hi Paolo,
this is not available as a process. Looking at the formula, it might be possible to compute it based on more elementary processes such as standard deviation and mean?
The other option would be the use of a user defined function, which allows you to use functions available in the much wider Python ecosystem.
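
To make the "elementary processes" idea concrete, here is a minimal local sketch in plain NumPy (not openEO processes): Pearson is the covariance divided by the product of the two standard deviations, and Spearman is the same formula applied to ranks. The function names are only illustrative.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation built only from means and standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance, matching the population std (ddof=0) used below.
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))
```

The same arithmetic (subtract the mean, multiply, average, divide by the standard deviations) is what would have to be expressed with backend processes; the ranking step for Spearman is the part that is harder to build from elementary processes.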

best regards,
Jeroen

Hi @jeroen.dries,
This is exactly what I was trying to do, but it is not so easy.
In my code I have to create a pixel mask based on some criteria (like “the pixels whose temporal standard deviation is below the n-th percentile”), spatially average these pixels (obtaining a timeseries), and finally calculate the Spearman correlation between that timeseries and each datacube pixel’s timeseries. I have some issues doing this.
Apparently, the result of a spatial average is not a datacube anymore: when I try to subtract the obtained timeseries from the full datacube, an error is raised:

C=datacube.mask(Cmask)
C=C.aggregate_spatial(rect, "mean")
datacub1=datacube-(C)
datacub1.download("data2.nc", format="netcdf")

OpenEoApiError: [500] unknown: 'AggregatePolygonResult' object has no attribute '_traces'

I also tried to download the data in C in netCDF or CSV format and then reload it from my environment (hoping this would also make the process faster), but I still get errors:

C.download("C.csv", format="csv")
C2=connection.load_disk_collection("csv", "C:/Users/Paolo/Desktop/bck uff/Dropbox (IRPI CNR)/BACKUP/CURRENT_USERS/p.filippucci/angelica/openeo/C.csv")

Here the system warns me that the file I am loading is not a datacube:

C:\Users\Paolo\miniconda3\envs\openeo\lib\site-packages\openeo\metadata.py:239: UserWarning: No cube:dimensions metadata
  complain("No cube:dimensions metadata")

and then, if I try to apply any operation to it (download it again, subtract it from the original datacube, add a dimension), the following error is raised:

OpenEoApiError: [500] unknown: The format is not supported by the backend: csv

I am currently stuck.

Hi Paolo,
this is not the full answer yet, but a relevant additional question: what are you doing the spatial aggregation over? Given that we’re working with datacubes here, this could be the whole world, a continent, a country, or a small parcel.
Another case could be aggregation over a moving window of a given size, e.g. 256x256 pixels.

Could you clarify?

thanks,
Jeroen

Hi @jeroen.dries ,
You are right. The aggregation is performed over a region of 512x512 pixels (even if most of them are masked out). All the analyses I am doing are over areas of similar size.

Hi Paolo,
got it, you’ll probably need a UDF based solution then anyway.
I would recommend a two step process:

  1. Compute the aggregated timeseries (you seem to have that already)
  2. Compute the correlation in a UDF. The timeseries values can be injected directly into the UDF as a piece of Python code: the UDF code is built dynamically, so that the timeseries values are part of the code.
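
A rough sketch of what "building the UDF code dynamically" in step 2 could look like. Everything here is an assumption to adapt: the helper name `build_udf_source`, the time dimension being called "t", the `apply_datacube` signature taking an `xarray.DataArray`, and the choice of simple ordinal ranks (ties are not averaged, NaN pixels are not handled).

```python
import json
import math

# Template for the UDF source; __REFERENCE__ is replaced by a Python list literal.
UDF_TEMPLATE = '''
import numpy as np
import xarray

# Reference timeseries, pasted in when this source was generated.
REFERENCE = np.array(__REFERENCE__, dtype=float)


def _ranks(a, axis):
    # Ordinal ranks along `axis`; ties are NOT averaged in this sketch.
    return a.argsort(axis=axis).argsort(axis=axis).astype(float)


def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
    # Assumes a time dimension named "t" with the same length and order as REFERENCE.
    t_axis = cube.get_axis_num("t")
    data = np.moveaxis(_ranks(cube.values, t_axis), t_axis, -1)
    ref = _ranks(REFERENCE, 0)
    data_dev = data - data.mean(axis=-1, keepdims=True)
    ref_dev = ref - ref.mean()
    # Pearson correlation of the ranks, i.e. Spearman, for every pixel.
    corr = (data_dev * ref_dev).sum(axis=-1) / np.sqrt(
        (data_dev ** 2).sum(axis=-1) * (ref_dev ** 2).sum()
    )
    kept = [d for d in cube.dims if d != "t"]
    coords = {d: cube.coords[d] for d in kept if d in cube.coords}
    return xarray.DataArray(corr, dims=kept, coords=coords)
'''


def build_udf_source(reference):
    """Return UDF source code with the reference timeseries embedded as a literal."""
    values = [float(v) for v in reference]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("reference timeseries must not contain NaN or infinite values")
    return UDF_TEMPLATE.replace("__REFERENCE__", json.dumps(values))
```

The resulting string would then be handed to the backend as UDF code in the second job, for example as the reducer over the time dimension. How exactly to attach it, and whether the backend accepts a UDF that drops the time dimension like this, should be checked against the UDF documentation.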

With UDFs, you can also go for a one-step approach where the UDF also computes the aggregation. In that case you’ll want to use apply_neighborhood with a 512x512 chunk size, which will be more memory intensive.

Does that help?

Jeroen

I wanted to try something similar, but the aggregated timeseries cannot be properly used, as I mentioned before:

OpenEoApiError: [500] unknown: 'AggregatePolygonResult' object has no attribute '_traces'

Can you point me to some examples where UDFs are used?

Hi Paolo,
indeed, you’re trying to directly use the result from aggregate_spatial on the backend side, whereas my suggestion is to retrieve it (using .execute() or a batch job) locally. Then, you use a second batch job where you inject the values.
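
For the "retrieve it locally" part, a small helper along these lines could turn the downloaded result into a plain list of values. It assumes the JSON coming back from .execute() is a dictionary mapping each date string to one list per polygon, each holding one value per band; that layout is an assumption, so inspect the actual result first. `timeseries_from_result` is a hypothetical name.

```python
def timeseries_from_result(result, polygon=0, band=0):
    """Return (dates, values), sorted by date, for one polygon and one band.

    Assumed layout: {"2020-01-01T00:00:00Z": [[band0, band1, ...], ...], ...}
    Dates without data (empty list or None) become NaN.
    """
    dates, values = [], []
    # ISO date strings in the same format sort chronologically as text.
    for date in sorted(result):
        bands = result[date][polygon]
        value = bands[band] if bands else None
        dates.append(date)
        values.append(float("nan") if value is None else float(value))
    return dates, values


# Intended use, with C being the aggregate_spatial result from earlier in the thread:
#   result = C.execute()
#   dates, values = timeseries_from_result(result)
```

The `values` list is then what gets injected into the UDF code for the second job.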

The basic examples for UDF are here:
https://open-eo.github.io/openeo-python-client/udf.html
And a more complex one, showing the use of apply_neighborhood, is here:

As you’re working on a quite complex case, we could also set up a quick call to try and get you going!

It would be very useful, thanks! When are you available?