Batch downloads are crashing after 20 min

Hi,
is there a time limit for downloading job results?

My batch jobs are crashing after 20 minutes. Small files, with a download time of under 20 minutes, work fine.

Code:

import io

import openeo
import shapely.wkt
import xarray as xr
from shapely.geometry import mapping


p = shapely.wkt.loads('POLYGON ((-71.01641 -33.146862, -71.007939 -33.125593, -70.994346 -33.108174, -70.976365 -33.082201, -70.948386 -33.056301, -70.934659 -33.039174, -70.93001 -33.034818, -70.901619 -33.016965, -70.887675 -33.003887, -70.86343 -32.99867, -70.843832 -32.99781, -70.809261 -33.000538, -70.789398 -33.003883, -70.769026 -33.015528, -70.748654 -33.027168, -70.736835 -33.05982, -70.715225 -33.091888, -70.704415 -33.107912, -70.699009 -33.115921, -70.698259 -33.128243, -70.697258 -33.144671, -70.695757 -33.169315, -70.693753 -33.202174, -70.693252 -33.210389, -70.692247 -33.226833, -70.690733 -33.251556, -70.689723 -33.268036, -70.688207 -33.292753, -70.686944 -33.313348, -70.685932 -33.329821, -70.699674 -33.346912, -70.718084 -33.368333, -70.731834 -33.385427, -70.740919 -33.398195, -70.754681 -33.415269, -70.758852 -33.427817, -70.767446 -33.448795, -70.776294 -33.465658, -70.775296 -33.48211, -70.773799 -33.506785, -70.771035 -33.552057, -70.769779 -33.57263, -70.768525 -33.593195, -70.767521 -33.609643, -70.828446 -33.71999, -70.768854 -33.955875, -70.925656 -34.391663, -70.953648 -34.875816, -70.968531 -34.912947, -70.974486 -34.949693, -70.980439 -34.986396, -70.995316 -35.023406, -71.010788 -35.053126, -71.025652 -35.090067, -71.042298 -35.105256, -71.056557 -35.149377, -71.071501 -35.186211, -71.086196 -35.230194, -71.100203 -35.281364, -71.115317 -35.318088, -71.394952 -35.420678, -71.408111 -35.421142, -71.43443 -35.422064, -71.470623 -35.423327, -71.487073 -35.423899, -71.506814 -35.424582, -71.513395 -35.424808, -71.539721 -35.425713, -71.552883 -35.426161, -71.566045 -35.42661, -71.586076 -35.421919, -71.850732 -35.401116, -72.101056 -35.347887, -72.108099 -35.345077, -72.115286 -35.333707, -72.125933 -35.325203, -72.133142 -35.313818, -72.13341 -35.299534, -72.137068 -35.29098, -72.140768 -35.27957, -72.096452 -35.171078, -72.073899 -34.927904, -72.077103 -34.897382, -72.078707 -34.882096, -71.9833 -34.503333, -71.909842 -34.176434, 
-71.821003 -33.914443, -71.648484 -33.765645, -71.649239 -33.757401, -71.646372 -33.736612, -71.459872 -33.502615, -71.458796 -33.497943, -71.459449 -33.487422, -71.453973 -33.481195, -71.450275 -33.471652, -71.443658 -33.461922, -71.437049 -33.452194, -71.433058 -33.447324, -71.430551 -33.441333, -71.429199 -33.440101, -71.425406 -33.4317, -71.420195 -33.423243, -71.418844 -33.422011, -71.413648 -33.413562, -71.405589 -33.404983, -71.392289 -33.389139, -71.38016 -33.376846, -71.372125 -33.367061, -71.365514 -33.359764, -71.36148 -33.356075, -71.35601 -33.352324, -71.350423 -33.350932, -71.346505 -33.344876, -71.342523 -33.34, -71.335907 -33.331477, -71.331988 -33.325419, -71.329672 -33.315873, -71.323167 -33.304991, -71.319175 -33.30012, -71.313837 -33.294023, -71.311262 -33.289212, -71.133578 -33.268193, -71.040515 -33.155919, -71.01641 -33.146862))')
mapped_polygon = mapping(p)

firstd = "2024-01-31"
lastd = "2024-02-05"

connection = openeo.connect(url="openeo.dataspace.copernicus.eu", default_timeout=9999999)

refresh_token = "<your refresh token>"  # add your token

connection.authenticate_oidc_refresh_token(refresh_token=refresh_token)

s2_cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=(firstd, lastd),
    spatial_extent=mapped_polygon,
    bands=["B02", "B03", "B04", "B08", "B8A", "B11", "B12"],
    max_cloud_cover=100,
)
job_ = s2_cube.create_job(title="test", out_format="netCDF")
job_.start_and_wait()

results = job_.get_results()
for asset in results.get_assets():
    ds = xr.open_dataset(io.BytesIO(asset.load_bytes()))
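As an aside: `load_bytes()` pulls each multi-GB asset fully into memory before xarray even sees it. An untested sketch, assuming the client’s `JobResults.download_files()` (which streams every asset to a local directory), that downloads to disk first and opens the netCDF files lazily from there:

```python
from pathlib import Path


def netcdf_paths(paths):
    """Keep only the netCDF files among a job's downloaded result assets."""
    return [Path(p) for p in paths if Path(p).suffix == ".nc"]


# Untested sketch, assuming `job_` from the snippet above:
#   out = Path("/tmp/openeo_results")
#   out.mkdir(parents=True, exist_ok=True)
#   downloaded = job_.get_results().download_files(out)  # streams assets to disk
#   datasets = [xr.open_dataset(p) for p in netcdf_paths(downloaded)]
```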

Hi,
would it be possible to share some signed URLs for these results? You can get them by simply copying the links to the outputs in the Web Editor, or by using the Web Editor’s share functionality to share the full metadata.
It’s not a known problem, but interesting to investigate. I’m also wondering why the download in fact takes 20 minutes.

Here is the URL: https://openeo.dataspace.copernicus.eu/openeo/1.2/jobs/j-24030166eb4d49cdb0c03559759f5ec3/results/Y2I5ZTFhM2UtODFiMy00MmU0LTg3NjYtZDA1OGE3OGI0NWY1/97eb02b6cfdf87a0456dba49c627cd24?expires=1709997289

Thanks, I’ll be using this as a test.
Note that in the meantime, I would recommend using a more advanced download tool for these larger files. ‘wget’, for instance, supports resuming interrupted downloads (the -c option).

The Python client currently doesn’t support that, as it’s only fairly recently that users started generating files of over 2 GB.
Unfortunately, network interruptions can never be ruled out, so being able to resume a download can be rather important.
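For reference, resuming is just an HTTP Range request under the hood. A rough sketch (my own, untested; the `requests` usage in the comments is a hypothetical illustration, not client functionality):

```python
import os


def resume_range_header(path):
    """Build an HTTP Range header continuing from a partial file, or {} to start fresh."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}


# Untested sketch mirroring what `wget -c` does:
#   import requests
#   with requests.get(url, headers=resume_range_header(local), stream=True, timeout=600) as r:
#       r.raise_for_status()
#       mode = "ab" if r.status_code == 206 else "wb"  # 206 = server honoured the Range
#       with open(local, mode) as f:
#           for chunk in r.iter_content(chunk_size=1 << 20):
#               f.write(chunk)
```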

I tried using wget to download the job result. Same outcome: small files download without problems, but with big files this error arises:

import subprocess
import os


#job_id = "j-2402262a0e5a43819aa802abc4ad428e"
job_id = "j-24022686817a48b58bc19c3c035f62b9"
job = connection.job(job_id)


# Check the job details for any files or assets
results = job.get_results()

# URL of the file to download
file_url = results.get_metadata()['assets']['openEO.nc']['href']

# Path where the file will be downloaded
download_path = "/tmp/"

# Command to execute wget with the -c option
command = ["wget", "-c", file_url, "-P", download_path]
#command = ["wget", "-c", file_url]
# Execute the command
subprocess.run(command)
file_path = download_path + file_url.split('/')[-1]  # note: this keeps the '?expires=...' query string in the filename

from netCDF4 import Dataset

nc_file = Dataset(file_path, 'r')
print(nc_file)
---------------------------------------------------------------------------

OSError                                   Traceback (most recent call last)

<ipython-input-68-0cf844a75395> in <cell line: 2>()
      1 from netCDF4 import Dataset
----> 2 nc_file = Dataset(file_path, 'r')  
      3 print(nc_file)

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -101] NetCDF: HDF error: '/tmp/openEO.nc?expires=1710265154'
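A side observation on the traceback: wget keeps the query string in the saved filename (hence `/tmp/openEO.nc?expires=1710265154`). The `NetCDF: HDF error` itself typically indicates an incomplete or truncated file rather than a naming problem, but a clean local name avoids confusion. A small helper of my own (not part of any client API) to derive the filename from a signed URL:

```python
from urllib.parse import urlparse


def asset_filename(signed_url):
    """Derive a clean local filename from a signed result URL (drops '?expires=...')."""
    return urlparse(signed_url).path.rsplit("/", 1)[-1]


# e.g. pass it to wget via -O so the saved file is 'openEO.nc':
#   command = ["wget", "-c", file_url, "-O", download_path + asset_filename(file_url)]
```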

Indeed, we’re following up in the ticket below. It seems to be something weird in the network.