I don’t know to whom I should address this concern, but I would like to compare time series of a vegetation index (leaf area index) per land cover type in my study area.
I’m working with the VITO back-end and I’m getting the error message “SERVER-ERROR: 413 Request Entity Too Large: The data value transmitted exceeds the capacity limit”.
Would you know a workaround, maybe using another back-end? I would also like to know whether this error is due to the string parsing of the GeoJSON or whether the request itself is too demanding for the server.
I attached an R script to reproduce the error (as well as a GeoJSON with my study area):
Hi @adrien.michez. There is a process called read_vector that you could use to overcome this issue, but I’m not 100% sure it can load data coming from a source other than the Terrascope workspace.
@stefaan.lippens do you know something more about the read_vector process at VITO? Other suggestions?
Unfortunately I don’t know the R client well enough to give you a valid snippet for that. With the Python client, for example, you can just pass the URL (as a string) as the geometries argument of aggregate_spatial, which will automatically be converted to a read_vector construct.
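For example, roughly (a minimal sketch; the collection id and the GeoJSON URL below are just placeholders):

import openeo

# connect to the VITO back-end and load some collection (placeholders for illustration)
con = openeo.connect("https://openeo.vito.be").authenticate_oidc()
cube = con.load_collection("TERRASCOPE_S2_LAI_V2", bands=["LAI_10M"])

# a URL string passed as `geometries` is translated by the client into a read_vector construct
timeseries = cube.aggregate_spatial(
    geometries="https://example.com/my_study_area.geojson",  # placeholder URL
    reducer="mean",
)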
Hi, @adrien.michez, I looked through the R code. That looks good.
As for passing a GeoJSON from a URL to the geometries argument, I’m not sure whether this is currently supported by the R client. @florian.lahn, could you comment on that?
The R client has no hard-coded processes, so it is supported as long as the process description is valid, which unfortunately is not the case here.
For geometries in aggregate_spatial, the VITO back-end doesn’t expose in the schema that strings are allowed, so that will not work. It also doesn’t allow vector cubes, so this will not work either.
@stefaan.lippens From a schema standpoint, you need to adapt the schemas for this to work:
In aggregate_spatial you need to allow vector cubes and strings as input for geometries, otherwise the client will complain about invalid schemas (vector-cube vs. geojson). Not just the R client, but also the JS client and the Web Editor.
Optional, but recommended: in load_vector, the schema for the filename could be more specific (explicitly allow URLs and file names).
Yes, but in this case you don’t just implement the official specification; you actually implement more than the official spec. So you’d need to add two additional schemas: one for strings and one for vector cubes. Once we have a common understanding of vector cubes, that will also land in the official spec, but we are not there yet (while VITO somewhat is).
The read_vector change looks good and is appreciated.
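To make the “two additional schemas” point concrete, the schemas for the geometries parameter of aggregate_spatial could then look roughly like this (sketched as a Python literal rather than the JSON process description; the exact subtype names are an assumption, not a quote from the official spec):

# rough sketch of the schema list for the `geometries` parameter of aggregate_spatial
geometries_schemas = [
    {"type": "object", "subtype": "geojson"},      # inline GeoJSON, as in the official spec
    {"type": "object", "subtype": "vector-cube"},  # additional schema: vector cube input
    {"type": "string", "subtype": "uri"},          # additional schema: URL to a vector file
]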
I couldn’t immediately find the issue, but I checked your GeoJSON (LC_WALOUS_10m_modal.geojson in the adrienmichez/PUBLIC GitHub repository) and it seems to be quite a complex geometry, maybe with some overlap edge cases. Could you also try with a simpler geometry, to be sure that the basics work for you?
Yes, it works with a simpler geometry and not with the original file (see the snippet below)… But I don’t see any issue with the geometry of my “complex” GeoJSON: I’ve checked that the geometry is valid with st_is_valid() from the sf package in R.
With the complex geometry, I got the following error:
Incoming data type is not compatible for parameter “geometries”
If you want to track the error on your server, the id is: j-db13efdd3b7b41d587bca555c1079aeb
Could it be because there are MultiPolygon features? What should I do to perform the aggregation with such a file?
import openeo
# load data from openEO
con = openeo.connect("https://openeo.vito.be").authenticate_oidc(provider_id="egi")
# Load data cube from the TERRASCOPE_S2_LAI_V2 collection.
datacube = con.load_collection(
    "TERRASCOPE_S2_LAI_V2",
    spatial_extent={"west": 5.60, "south": 50.42, "east": 6.3, "north": 50.7},
    temporal_extent=["2015-01-01", "2021-07-31"],
    bands=["LAI_10M", "SCENECLASSIFICATION_20M"],
)
# get classification band
SCL = datacube.band("SCENECLASSIFICATION_20M")
LAI = datacube.band("LAI_10M")
# keep only scene classification values 4 (vegetation) and 5 (not vegetated / bare soil); mask all other pixels
mask = ~ ((SCL == 4) | (SCL == 5))
# masking
LAI_masked = LAI.mask(mask)
# temporal aggregation
LAI_masked_dekad = LAI_masked.aggregate_temporal_period(period="dekad", reducer="mean")
# extract and download time series
timeseries_dekad_geom_VO = LAI_masked_dekad.aggregate_spatial(geometries="https://github.com/adrienmichez/PUBLIC/raw/main/LC_WALOUS_10m_modal.geojson", reducer="mean")
res = timeseries_dekad_geom_VO.save_result(format="CSV")
job = res.create_job(title="LAI_spat_aggreg_geom_VO")
job.start_job()
# with a simpler geometry
timeseries_dekad_geom_simp = LAI_masked_dekad.aggregate_spatial(geometries="https://github.com/adrienmichez/PUBLIC/raw/main/grid_belleheid_TMP.geojson", reducer="mean")
res2 = timeseries_dekad_geom_simp.save_result(format="CSV")
job2 = res2.create_job(title="LAI_spat_aggreg_geom_simp")
job2.start_job()
I’m a bit confused about this error message. It doesn’t look like one that is coming from the VITO backend. Can you share some more error info (exception class, full stacktrace, screenshot, …)?
If I try your job with the original geometry I don’t get the error you quote, but just
...
0:13:02 Job 'j-b038b4d7e5a544f2b9a7e66be0e320f8': running (progress N/A)
0:14:03 Job 'j-b038b4d7e5a544f2b9a7e66be0e320f8': error (progress N/A)
Your batch job 'j-b038b4d7e5a544f2b9a7e66be0e320f8' failed.
Logs can be inspected in an openEO (web) editor or with `connection.job('j-b038b4d7e5a544f2b9a7e66be0e320f8').logs()`.
Printing logs:
[]
If I look at the back-end-side logs, however, I see:
Application application_1657503144471_109385 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1657503144471_109385_000001 exited with exitCode: -104
Failing this attempt.Diagnostics: [2022-08-31 09:46:05.337]Container [pid=19394,containerID=container_e5049_1657503144471_109385_01_000002] is running 4919296B beyond the ‘PHYSICAL’ memory limit. Current usage: 10.0 GB of 10 GB physical memory used; 18.5 GB of 21 GB virtual memory used. Killing container.
This means that the job was killed because it was using too much memory, which is most likely due to the complex GeoJSON.
I see you have 8 Features in your FeatureCollection.
Could you try splitting your file into 8 separate GeoJSON files, so you run one job per Feature? I guess you will get a bit further and be able to pinpoint the part of your geometry that is too heavy.
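For example, something along these lines (a rough sketch; the local file paths are placeholders):

import json

# split the FeatureCollection into one GeoJSON file per Feature
# (local input/output paths are placeholders)
with open("LC_WALOUS_10m_modal.geojson") as f:
    collection = json.load(f)

for i, feature in enumerate(collection["features"]):
    single = {"type": "FeatureCollection", "features": [feature]}
    with open(f"LC_WALOUS_10m_modal_feature_{i}.geojson", "w") as out:
        json.dump(single, out)

Each of the resulting files can then be used as the geometries argument of its own aggregate_spatial batch job, as in your snippet above.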
Thanks… I could also try to use only single-part features, but I would prefer a less trial-and-error approach. Is there a way to bypass the memory issue by using another, less demanding method?
For clean geometries without overlaps we have an efficient implementation of aggregate_spatial, but if there is any overlap between features or geometries, we have to take a heavier, slower, more demanding code path.
I’m afraid your geometry (consisting of “pixel clusters” that touch practically everywhere) triggers that check.
That’s why I proposed processing each feature separately, so that you don’t trigger the overlap check.
Another solution is to morphologically erode each of your features’ multipolygons, also to avoid triggering the overlap check (see the sketch at the end of this post).
Another solution might be a user/job option to disable the overlap check.
The user would then opt in to: I’m fine with the back-end (arbitrarily) associating each pixel with at most one feature, even if multiple features cover that pixel.
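To illustrate the erosion idea, a rough sketch with geopandas (the buffer distance, file paths and the metric CRS are assumptions you would need to adapt):

import geopandas as gpd

# shrink each (multi)polygon slightly so that neighbouring features no longer touch
gdf = gpd.read_file("LC_WALOUS_10m_modal.geojson")

# GeoJSON is in WGS84, so reproject to a metric CRS before buffering in metres
gdf_m = gdf.to_crs(epsg=31370)          # Belgian Lambert 72, assuming a Walloon study area
gdf_m["geometry"] = gdf_m.buffer(-5)    # negative buffer of 5 m (placeholder distance)

# drop features that collapsed to empty geometries and write back as WGS84 GeoJSON
gdf_eroded = gdf_m[~gdf_m.is_empty].to_crs(epsg=4326)
gdf_eroded.to_file("LC_WALOUS_10m_modal_eroded.geojson", driver="GeoJSON")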
That would be great! I tried with a negative buffer, but that brings us to the file size limitation of my GitHub repository.
Isn’t there another solution than GitHub for storing a file with a direct link? My GitHub repository (and I guess others too) is limited to 25 MB per file.