Batch Job seems to run forever

I am trying to download the VV and VH bands of Sentinel-1 (S1) using batch jobs, but the job seems to run forever.

See below my data cube and batch job code:

from datetime import datetime

# `connection` (an authenticated openEO connection), `bbox`, `output_directory`
# and the `daterange` helper are defined earlier in the script.

# Define the start and end dates
start_date = datetime.strptime("2023-10-03", "%Y-%m-%d")
end_date = datetime.strptime("2023-10-03", "%Y-%m-%d")

# Loop over each date and download data
for single_date in daterange(start_date, end_date):  # Use daterange here
    current_date_str = single_date.strftime("%Y-%m-%d")
    print("Attempting to check and download data for ", current_date_str)
    
    # Load Sentinel-1 GRD data and filter by ascending orbit and VV&VH polarization for the current date
    s1_grd = connection.load_collection(
        "SENTINEL1_GRD",
        spatial_extent={"west": bbox[0], "south": bbox[1], "east": bbox[2], "north": bbox[3]},
        temporal_extent=[current_date_str, current_date_str],  # Use the single date for each iteration
        bands=["VV", "VH"],
        properties={"sat:orbit_state": lambda od: od == "ASCENDING"},
    )
    
    # Check if there is any data returned for the date
    if not s1_grd.metadata:  # If no metadata, skip the date
        print(f"No data available for {current_date_str}, skipping...")
        continue
    
    # Apply the backscatter coefficient as a process
    s1_grd = s1_grd.sar_backscatter(coefficient="sigma0-ellipsoid", elevation_model="COPERNICUS_30")

    
    job = s1_grd.execute_batch(
        output_file=f"s1_wc_{current_date_str}.tif",
        out_format="GTiff",  # Or any other format you need
    )
    results = job.get_results()

    # Define the output filename with the current date
    # output_filename = os.path.join(output_directory, f"s1_wc_{current_date_str}.tif")

    results.download_files(output_directory)

Hi,

Could you share the ID of the batch job that is hanging (for example: j-24101528a0184c738a582938e78a22aa)?
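If you are not sure where to find it, here is a minimal sketch of listing your recent batch jobs and their IDs with the openEO Python client (the back-end URL is illustrative; adjust it to the back-end you are using):

import openeo

# Connect and authenticate (the URL shown is an assumption; use your own back-end).
connection = openeo.connect("https://openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()

# List the batch jobs for this account; each entry carries the "j-..." id
# and the current status, which makes the hanging job easy to spot.
for job_info in connection.list_jobs():
    print(job_info["id"], job_info.get("status"), job_info.get("created"))

# The BatchJob object returned by execute_batch() also exposes the id
# directly via job.job_id and its status via job.status().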

Hi Jeroen,

Yes, I stopped the initial run, but I have now started a new batch job that does the same thing. Its ID is:
j-2410152e2fb74d85a0ba7921723ecaae

I investigated the issue, and it appears that most of the time is spent in the sar_backscatter() process. There are two options that could solve this:

  1. Set the max_processing_area_pixels option of the sar_backscatter process (by default it is 3072):
s1_grd.sar_backscatter(coefficient="sigma0-ellipsoid", elevation_model="COPERNICUS_30", options={"max_processing_area_pixels": 2500})
  2. Increase the Python executor memory:
job = s1_grd.execute_batch(
    output_file=f"s1_wc_{current_date_str}.tif",
    out_format="GTiff",
    job_options={'python-memory': '3g', 'executor-memoryOverhead': '100m'}
)

You can apply both solutions to your code at once and see if that solves the issue.
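For reference, a minimal sketch of the loop body with both changes combined (it reuses s1_grd and current_date_str from the code above and the values suggested here; treat it as a starting point rather than a definitive fix):

# Option 1: limit the processing tile size used by sar_backscatter.
s1_grd = s1_grd.sar_backscatter(
    coefficient="sigma0-ellipsoid",
    elevation_model="COPERNICUS_30",
    options={"max_processing_area_pixels": 2500},
)

# Option 2: give the Python executors more memory via job_options.
job = s1_grd.execute_batch(
    f"s1_wc_{current_date_str}.tif",
    out_format="GTiff",
    job_options={"python-memory": "3g", "executor-memoryOverhead": "100m"},
)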

Thank you for the suggestions, Jeroen. I will give them a try and see if they help.