I don’t know to whom I should address this concern, but I would like to compare time series of a vegetation index (leaf area index) per land cover type in my study area.
I’m working with the VITO back-end and I’m getting the error message “SERVER-ERROR: 413 Request Entity Too Large: The data value transmitted exceeds the capacity limit”.
Would you know a workaround, maybe using another back-end? I would also like to know whether this error is due to the string parsing of the GeoJSON or whether the request itself is too demanding for the server.
I attached an R script to reproduce the error (as well as a GeoJSON with my study area):
Hi @adrien.michez. There is a process called read_vector that you could use to overcome this issue, but I’m not 100% sure it can load data coming from a source other than the Terrascope workspace.
@stefaan.lippens do you know something more about the read_vector process at VITO? Other suggestions?
Unfortunately I don’t know the R client well enough to give you a valid snippet for that. With the Python client, for example, you can just pass the URL (as a string) as the geometries argument of aggregate_spatial, which will automatically be converted to a read_vector construct.
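For example, roughly (a minimal sketch; the collection id and the GeoJSON URL below are just placeholders):

import openeo

# connect to the VITO back-end and load some collection (placeholders for illustration)
con = openeo.connect("https://openeo.vito.be").authenticate_oidc()
cube = con.load_collection("TERRASCOPE_S2_LAI_V2", bands=["LAI_10M"])

# a URL string passed as `geometries` is translated by the client into a read_vector construct
timeseries = cube.aggregate_spatial(
    geometries="https://example.com/my_study_area.geojson",  # placeholder URL
    reducer="mean",
)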
Hi, @adrien.michez, I looked through the R code. That looks good.
As for passing a GeoJSON from a URL to the geometries argument, I’m not sure whether this is currently supported by the R client. @florian.lahn, could you comment on that?
The R client has no hard-coded processes, so it is supported as long as the process description is valid, which unfortunately is not the case here.
For geometries in aggregate_spatial, the VITO back-end doesn’t expose in the schema that strings are allowed, so that will not work. It also doesn’t allow vector cubes, so this will not work either.
@stefaan.lippens From a schema standpoint, you need to adapt the schemas for this to work:
In aggregate_spatial you need to allow vector cubes and strings as input for geometries, otherwise the client will complain about invalid schemas (vector-cube vs. geojson). Not just the R client, but also the JS client and the Web Editor.
Optional, but recommended: in load_vector, the schema for the filename could be more specific (explicitly allow URLs and file names).
Yes, but in this case you don’t just implement the official specification; you actually implement more than the official spec. So you’d need to add two additional schemas: one for strings and one for vector cubes. Once we have a common understanding of vector cubes, that will also land in the official spec, but we are not there yet (while VITO somewhat is).
The read_vector change looks good and is appreciated.
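To make the “two additional schemas” point concrete, the schemas for the geometries parameter of aggregate_spatial could then look roughly like this (sketched as a Python literal rather than the JSON process description; the exact subtype names are an assumption, not a quote from the official spec):

# rough sketch of the schema list for the `geometries` parameter of aggregate_spatial
geometries_schemas = [
    {"type": "object", "subtype": "geojson"},      # inline GeoJSON, as in the official spec
    {"type": "object", "subtype": "vector-cube"},  # additional schema: vector cube input
    {"type": "string", "subtype": "uri"},          # additional schema: URL to a vector file
]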
I couldn’t immediately find the issue, but I checked your GeoJSON (LC_WALOUS_10m_modal.geojson in the adrienmichez/PUBLIC GitHub repository) and it seems to be quite a complex geometry, maybe with some overlap edge cases. Could you also try with a simpler geometry, to be sure that the basics work for you?
Yes, it works with a simpler geometry and not with the original file (see the snippet below)… But I don’t see any issue with the geometry of my “complex” GeoJSON: I’ve checked that the geometry is valid with st_is_valid() from the sf package in R.
With the complex geometry, I got the following error:
Incoming data type is not compatible for parameter “geometries”
If you want to track the error on your server, the id is: j-db13efdd3b7b41d587bca555c1079aeb
Could it be because there are MultiPolygon features? What should I do to perform the aggregation with such a file?
import openeo
# load data from openEO
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")
# Load data cube from the TERRASCOPE_S2_LAI_V2 collection.
datacube = con.load_collection(
    "TERRASCOPE_S2_LAI_V2",
    spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
    temporal_extent=["2015-01-01", "2021-07-31"],
    bands=["LAI_10M", "SCENECLASSIFICATION_20M"],
)
# get classification band
SCL = datacube.band("SCENECLASSIFICATION_20M")
LAI = datacube.band("LAI_10M")
# keep only scene classification values 4 (vegetation) and 5 (not vegetated / bare soil); mask all other pixels
mask = ~ ((SCL == 4) | (SCL == 5))
# masking
LAI_masked = LAI.mask(mask)
# temporal aggregation
LAI_masked_dekad = LAI_masked.aggregate_temporal_period(period="dekad", reducer="mean")
# extract and download time series
timeseries_dekad_geom_VO = LAI_masked_dekad.aggregate_spatial(geometries="https://github.com/adrienmichez/PUBLIC/raw/main/LC_WALOUS_10m_modal.geojson", reducer="mean")
res = timeseries_dekad_geom_VO.save_result(format="CSV")
job = res.create_job(title="LAI_spat_aggreg_geom_VO")
job.start_job()
# with a simpler geometry
timeseries_dekad_geom_simp = LAI_masked_dekad.aggregate_spatial(geometries="https://github.com/adrienmichez/PUBLIC/raw/main/grid_belleheid_TMP.geojson", reducer="mean")
res2 = timeseries_dekad_geom_simp.save_result(format="CSV")
job2 = res2.create_job(title="LAI_spat_aggreg_geom_simp")
job2.start_job()
I’m a bit confused about this error message. It doesn’t look like one that is coming from the VITO backend. Can you share some more error info (exception class, full stacktrace, screenshot, …)?
If I try your job with the original geometry I don’t get the error you quote, but just
...
0:13:02 Job 'j-b038b4d7e5a544f2b9a7e66be0e320f8': running (progress N/A)
0:14:03 Job 'j-b038b4d7e5a544f2b9a7e66be0e320f8': error (progress N/A)
Your batch job 'j-b038b4d7e5a544f2b9a7e66be0e320f8' failed.
Logs can be inspected in an openEO (web) editor or with `connection.job('j-b038b4d7e5a544f2b9a7e66be0e320f8').logs()`.
Printing logs:
[]
If I look at the back-end-side logs, however, I see:
Application application_1657503144471_109385 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1657503144471_109385_000001 exited with exitCode: -104
Failing this attempt.Diagnostics: [2022-08-31 09:46:05.337]Container [pid=19394,containerID=container_e5049_1657503144471_109385_01_000002] is running 4919296B beyond the ‘PHYSICAL’ memory limit. Current usage: 10.0 GB of 10 GB physical memory used; 18.5 GB of 21 GB virtual memory used. Killing container.
This means that the job was killed because it was using too much memory, which is most likely due to the complex GeoJSON.
I see you have 8 Features in your FeatureCollection.
Could you try splitting your file into 8 separate GeoJSON files, so you run one job per Feature? I guess you will get a bit further and be able to pinpoint the part of your geometry that is too heavy.
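For example, something along these lines (a rough sketch; the local file paths are placeholders):

import json

# split the FeatureCollection into one GeoJSON file per Feature
# (local input/output paths are placeholders)
with open("LC_WALOUS_10m_modal.geojson") as f:
    collection = json.load(f)

for i, feature in enumerate(collection["features"]):
    single = {"type": "FeatureCollection", "features": [feature]}
    with open(f"LC_WALOUS_10m_modal_feature_{i}.geojson", "w") as out:
        json.dump(single, out)

Each of the resulting files can then be used as the geometries argument of its own aggregate_spatial batch job, as in your snippet above.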
Thanks… I could also try to use only single-part features, but I would prefer a less trial-and-error approach. Is there a way to bypass the memory issue by using another, less demanding method?
For clean geometries without overlaps we have an efficient implementation of aggregate_spatial, but if there is any overlap between features or geometries, we have to take a heavier, slower, more demanding code path.
I’m afraid your geometry (consisting of “pixel clusters” that touch practically everywhere) triggers that check.
That’s why I proposed processing each feature separately, so that you don’t trigger the overlap check.
Another solution is to morphologically erode each of your features’ multipolygons, also to avoid triggering the overlap check (see the sketch at the end of this post).
Another solution might be a user/job option to disable the overlap check.
The user would then opt in to: I’m fine with the back-end (arbitrarily) associating each pixel with at most one feature, even if multiple features cover that pixel.
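To illustrate the erosion idea, a rough sketch with geopandas (the buffer distance, file paths and the metric CRS are assumptions you would need to adapt):

import geopandas as gpd

# shrink each (multi)polygon slightly so that neighbouring features no longer touch
gdf = gpd.read_file("LC_WALOUS_10m_modal.geojson")

# GeoJSON is in WGS84, so reproject to a metric CRS before buffering in metres
gdf_m = gdf.to_crs(epsg=31370)          # Belgian Lambert 72, assuming a Walloon study area
gdf_m["geometry"] = gdf_m.buffer(-5)    # negative buffer of 5 m (placeholder distance)

# drop features that collapsed to empty geometries and write back as WGS84 GeoJSON
gdf_eroded = gdf_m[~gdf_m.is_empty].to_crs(epsg=4326)
gdf_eroded.to_file("LC_WALOUS_10m_modal_eroded.geojson", driver="GeoJSON")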
That would be great! I tried with a negative buffer, but that brings us to the file size limitation of my GitHub repository.
Isn’t there another solution than GitHub for storing a file with a direct link? My GitHub repository (and I guess others too) is limited to 25 MB per file.