HTTP 413 error on increasing data size in demo

Hello

I am currently working with your demo from

to get more familiar with your framework.

Your code from the demo:

Cell 1:
crops = {"maize":1200,"potatos":5100,"sugarbeet":8100,"barley":1500,"soy":4100}
crop_samples = {name:gpd.read_file("UC3_resources/"+ name + "_2019.geojson", driver='GeoJSON') for name,code in crops.items()}

points_per_type = point_sample_fields(crop_samples, 30)

Cell 2:
for i in points_per_type.keys():
    sampled_features = features.filter_spatial(eval(points_per_type[i]))
    job = sampled_features.execute_batch(
        title="Point feature extraction",
        description="Feature extraction for p10,p50,p90,sd and tsteps",
        out_format="netCDF",
        sample_by_feature=True,
        job_options=job_options)
    results = job.get_results()
    results.download_files("./data/rf_300_"+i)

I then exchanged the data (I adapted the files in the folder UC3_resources) and increased its size, so instead of 10 areas per surface I used many more, which resulted in an HTTP 413 error.
Even with 20 areas per surface instead of 10 I received that error; with 10 it worked again.

Is it possible to increase the data per request?

Hi,
apologies for the late answer, apparently posts are ending up in a category where I don’t get notified.

Do you have more information on the actual error? It is certainly possible to request more data.
Do you have a job id, or logs from a job that failed?

thanks,
Jeroen

Hello

I tested it again (I increased the number of polygons in the files in the folder UC3_resources) and got the following exception:


OpenEoApiError: [413] Internal: 413 Request Entity Too Large: The data value transmitted exceeds the capacity limit.

The job identifier:

0:13:38 Job '57fb57ce-f3bf-4c31-b302-f8d5d55f7ee1': finished (progress N/A)

But I cannot see this job in the log at https://editor.openeo.cloud/

Hi,
the good news is that the job finished and indeed generated a number of samples.
I was, however, not yet able to find the error, so I don't yet know where it comes from.

Do you happen to have a stack trace that was printed along with that error?
Or, if you run connection.job('57fb57ce-f3bf-4c31-b302-f8d5d55f7ee1'), does that work?

best regards,
Jeroen

The stacktrace:

---------------------------------------------------------------------------
OpenEoApiError                            Traceback (most recent call last)
/tmp/ipykernel_46488/611437660.py in <module>
      1 for i in points_per_type.keys():
      2     sampled_features = features.filter_spatial(eval(points_per_type[i]))
----> 3     job = sampled_features.execute_batch(
      4         title="Point feature extraction",
      5         description="Feature extraction for p10,p50,p90,sd and tsteps",

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/datacube.py in execute_batch(self, outputfile, out_format, print, max_poll_interval, connection_retry_interval, job_options, **format_options)
   1534 
   1535         """
-> 1536         job = self.send_job(out_format, job_options=job_options, **format_options)
   1537         return job.run_synchronous(
   1538             outputfile=outputfile,

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/datacube.py in send_job(self, out_format, title, description, plan, budget, job_options, **format_options)
   1558             # add `save_result` node
   1559             img = img.save_result(format=out_format, options=format_options)
-> 1560         return self._connection.create_job(
   1561             process_graph=img.flat_graph(),
   1562             title=title, description=description, plan=plan, budget=budget, additional=job_options

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/connection.py in create_job(self, process_graph, title, description, plan, budget, additional)
    998             req["job_options"] = additional
    999 
-> 1000         response = self.post("/jobs", json=req, expected_status=201)
   1001 
   1002         if "openeo-identifier" in response.headers:

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/connection.py in post(self, path, json, **kwargs)
    158         :return: response: Response
    159         """
--> 160         return self.request("post", path=path, json=json, allow_redirects=False, **kwargs)
    161 
    162     def delete(self, path, **kwargs) -> Response:

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/connection.py in request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    106         expected_status = ensure_list(expected_status) if expected_status else []
    107         if check_error and status >= 400 and status not in expected_status:
--> 108             self._raise_api_error(resp)
    109         if expected_status and status not in expected_status:
    110             raise OpenEoRestError("Got status code {s!r} for `{m} {p}` (expected {e!r})".format(

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/connection.py in _raise_api_error(self, response)
    137             else:
    138                 exception = OpenEoApiError(http_status_code=status_code, message=text)
--> 139         raise exception
    140 
    141     def get(self, path, stream=False, auth: AuthBase = None, **kwargs) -> Response:

OpenEoApiError: [413] Internal: 413 Request Entity Too Large: The data value transmitted exceeds the capacity limit.

I can connect to the finished job:

but I cannot see the job at https://editor.openeo.cloud/

So after connecting to the job, can you try to download results again:

results = job.get_results()
results.download_files("./data/rf_300_"+i)

Which endpoint url are you using to create your connection, as we’ll want to use that as well for the web editor to see jobs, and also make sure that we’re correctly logged in.

I tried to download the data from the job:

[Screenshot from 2022-03-23 18-56-52]

My stack trace:

OpenEoApiError                            Traceback (most recent call last)
/tmp/ipykernel_70940/3423013294.py in <module>
      1 job_5 = connection.job('57fb57ce-f3bf-4c31-b302-f8d5d55f7ee1')
      2 results = job_5.get_results()
----> 3 results.download_files("./data/rf_300_test_download")

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/job.py in download_files(self, target)
    374             raise OpenEoClientException("The target argument must be a folder. Got {t!r}".format(t=str(target)))
    375         ensure_dir(target)
--> 376         return [a.download(target) for a in self.get_assets()]
    377 
    378 

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/job.py in <listcomp>(.0)
    374             raise OpenEoClientException("The target argument must be a folder. Got {t!r}".format(t=str(target)))
    375         ensure_dir(target)
--> 376         return [a.download(target) for a in self.get_assets()]
    377 
    378 

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/job.py in download(self, target, chunk_size)
    250         logger.info("Downloading Job result asset {n!r} from {h!s} to {t!s}".format(n=self.name, h=self.href, t=target))
    251         with target.open("wb") as f:
--> 252             response = self._get_response(stream=True)
    253             for block in response.iter_content(chunk_size=chunk_size):
    254                 f.write(block)

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/job.py in _get_response(self, stream)
    256 
    257     def _get_response(self, stream=True) -> requests.Response:
--> 258         return self.job.connection.get(self.href, stream=stream)
    259 
    260     def load_json(self) -> dict:

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/connection.py in get(self, path, stream, auth, **kwargs)
    148         :return: response: Response
    149         """
--> 150         return self.request("get", path=path, stream=stream, auth=auth, **kwargs)
    151 
    152     def post(self, path, json: dict = None, **kwargs) -> Response:

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/connection.py in request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    106         expected_status = ensure_list(expected_status) if expected_status else []
    107         if check_error and status >= 400 and status not in expected_status:
--> 108             self._raise_api_error(resp)
    109         if expected_status and status not in expected_status:
    110             raise OpenEoRestError("Got status code {s!r} for `{m} {p}` (expected {e!r})".format(

~/SRR2_notebooks/srr2_venv/lib/python3.9/site-packages/openeo/rest/connection.py in _raise_api_error(self, response)
    137             else:
    138                 exception = OpenEoApiError(http_status_code=status_code, message=text)
--> 139         raise exception
    140 
    141     def get(self, path, stream=False, auth: AuthBase = None, **kwargs) -> Response:

OpenEoApiError: [502] unknown: Bad Gateway

502 ‘Bad gateway’ simply means that the web service is not online, whereas our main endpoints are online:
https://openeocloud.vito.be/openeo/1.0.0
https://openeo.vito.be/openeo/1.0.0

It could be that you faced some intermittent downtime, for instance when the service was being restarted. Can you just try again?
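As an illustration of "just try again", here is a minimal retry sketch. The helper name `download_with_retry` is my own and not part of the openeo client; it simply wraps the documented `results.download_files(...)` call and retries on transient failures such as a 502:

```python
import time

# Hypothetical helper (not part of the openeo client): retry a results
# download a few times before giving up, to ride out transient 502s.
def download_with_retry(results, target, attempts=3, delay=30):
    for attempt in range(1, attempts + 1):
        try:
            return results.download_files(target)
        except Exception as exc:  # e.g. OpenEoApiError: [502] Bad Gateway
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
```

This is only useful for intermittent downtime; a reproducible 413 will fail on every attempt and needs the payload-size fix discussed below in the thread.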

Hello,

on the other endpoints I receive the same HTTP 413 error with the same stacktrace.

Hi,
we were able to dig deeper into it: the problem is likely that increasing your number of samples also increases the size of the HTTP request, to the point where it exceeds a certain threshold and you hit this error.
One solution is to check whether your samples GeoJSON contains overly detailed coordinates, or properties that are not needed by the back-end.
Alternatively, more long term, you can replace the inline geometries with a public URL to a GeoJSON file, which the back-end can then load. Is it an option for you to upload your samples, for instance to a service like Google Drive that allows creating public URLs?
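For the first option, here is a minimal sketch of trimming a GeoJSON FeatureCollection before sending it. The function name and the property selection are my own choices for illustration, not part of the demo:

```python
# Hypothetical helper (not from the demo): shrink a GeoJSON FeatureCollection
# in place by rounding coordinates and keeping only the properties you need.
def shrink_geojson(feature_collection, precision=5, keep_props=()):
    def round_coords(coords):
        # Coordinates nest as [x, y] pairs, or lists of such pairs
        # (rings, multipart geometries); recurse until we hit numbers.
        if isinstance(coords[0], (int, float)):
            return [round(c, precision) for c in coords]
        return [round_coords(c) for c in coords]

    for feature in feature_collection["features"]:
        geom = feature["geometry"]
        geom["coordinates"] = round_coords(geom["coordinates"])
        feature["properties"] = {
            k: v for k, v in feature.get("properties", {}).items()
            if k in keep_props
        }
    return feature_collection
```

At around 5 decimal places (roughly 1 m at the equator) the coordinates are usually still far more precise than a Sentinel-scale pixel, while the request body can shrink considerably.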

How can I replace the data with urls?

Your demo contains following code:

for i in points_per_type.keys():
    sampled_features = features.filter_spatial(eval(points_per_type[i]))
    job = sampled_features.execute_batch(
        title="Point feature extraction",
        description="Feature extraction for p10,p50,p90,sd and tsteps",
        out_format="netCDF",
        sample_by_feature=True,
        job_options=job_options)
    results = job.get_results()
    results.download_files("./data/rf_300_"+i)

where points_per_type contains the geodata. How can I now replace the actual data with links to files?

This is the documentation of your function filter_spatial:

    def filter_spatial(
            self,
            geometries
    ) -> 'DataCube':
        """
        Limits the data cube over the spatial dimensions to the specified geometries.

            - For polygons, the filter retains a pixel in the data cube if the point at the pixel center intersects with
              at least one of the polygons (as defined in the Simple Features standard by the OGC).
            - For points, the process considers the closest pixel center.
            - For lines (line strings), the process considers all the pixels whose centers are closest to at least one
              point on the line.

        More specifically, pixels outside of the bounding box of the given geometry will not be available after filtering.
        All pixels inside the bounding box that are not retained will be set to null (no data).

        :param geometries: One or more geometries used for filtering, specified as GeoJSON in EPSG:4326.
        :return: A data cube restricted to the specified geometries. The dimensions and dimension properties (name,
            type, labels, reference system and resolution) remain unchanged, except that the spatial dimensions have less
            (or the same) dimension labels.
        """

It expects geometry objects for the geometries parameter.

filter_spatial with URL

It's indeed not well documented, but you can pass a URL string directly to filter_spatial (instead of a geometry object). For example:

cube = cube.filter_spatial("https://raw.githubusercontent.com/Open-EO/openeo-python-driver/master/tests/data/geojson/FeatureCollection01.json")

While this client-side part should work, unfortunately the VITO/Terrascope back-end does not support remote URLs for the filter_spatial process yet:

>>> cube = con.load_collection("TERRASCOPE_S2_TOC_V2", temporal_extent=["2021-09-01", "2021-09-15"])
>>> cube = cube.filter_spatial("https://raw.githubusercontent.com/Open-EO/openeo-python-driver/master/tests/data/geojson/FeatureCollection01.json")
>>> cube.download("tmp.nc")
OpenEoApiError: [500] unknown: filter_spatial only supports dict but got DelayedVector('https://raw.githubusercontent.com/Open-EO/openeo-python-driver/master/tests/data/geojson/FeatureCollection01.json')

I created a ticket for that at support DelayedVector in filter_spatial · Issue #112 · Open-EO/openeo-python-driver · GitHub

Workaround

As a workaround for the current situation: filter_spatial is practically the same as the combination of filter_bbox and mask_polygon. So if you know (or extract) the bounding box of the geometry, you can achieve the same result.
For example, using a GeoJSON FeatureCollection from a URL, and extracting the total bounding box using GeoPandas:

import geopandas as gpd

geometry_url = "https://raw.githubusercontent.com/Open-EO/openeo-python-driver/master/tests/data/geojson/FeatureCollection01.json"

bbox = dict(zip(
    ["west", "south", "east", "north"], 
    gpd.read_file(geometry_url).total_bounds
))
temporal_extent = ["2021-09-01", "2021-09-15"]
cube = con.load_collection("TERRASCOPE_S2_TOC_V2", temporal_extent=temporal_extent)
cube = cube.filter_bbox(bbox)
cube = cube.mask_polygon(geometry_url)