Obtaining multiple variables

Hey @jeroen.dries or others.

Would it be possible to obtain two distinct files at once when we create and send a job?

Basically, this would mean including two variables in this part:

        output_save = output.save_result(format='GTiff')  # or 'netCDF'
        my_job = output_save.create_job(title=output_name, job_options=job_options)
        results = my_job.start_and_wait().get_results()
        results.download_files(output_name)

Currently there is only one output variable, which is a bit mask, but we would also like to download an RGB image (s2_cube_median) for the same area. That way we would send only one job but get two outputs (two separate files) at once.

This is the full code, where you can see the s2_cube_median and output variables defined.

Sometimes only one file will be needed, so it would be great to have an option to choose whether one or two files are produced.

Hi Andrea,
so if I understand correctly, the key thing is to get two separate files, right? Because we currently always return all variables/bands in a single file.
Are s2_cube_median and output sufficiently compatible to combine into a single datacube (with merge_cubes), or would these have to remain separate cubes?

If it’s separate cubes, then we would need to allow multiple save_result processes in a single graph.
The other option would perhaps be an output format parameter that simply creates separate files per variable/band.
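To make the first option concrete, here is a minimal sketch of what a process graph with two save_result nodes could look like, written as a plain Python dict. It only illustrates the shape of such a graph. Whether a back-end accepts and runs the second save_result node is exactly the feature under discussion. The node names, the collection ids, the band names and the `filename_prefix` option are all assumptions made for this illustration, and the `mask` node is only a stand-in for the real water-detection steps.

```python
import json


def build_process_graph(include_rgb: bool) -> dict:
    """Return an openEO-style process graph as a plain dict (placeholders only)."""
    graph = {
        # Stand-in for the real water-detection steps, which are not shown here.
        "mask": {
            "process_id": "load_collection",
            "arguments": {
                "id": "HYPOTHETICAL_WATER_MASK",
                "spatial_extent": None,
                "temporal_extent": None,
            },
        },
        # Main result: always written.
        "save_mask": {
            "process_id": "save_result",
            "arguments": {
                "data": {"from_node": "mask"},
                "format": "GTiff",
                "options": {"filename_prefix": "water_mask"},  # assumed option
            },
            "result": True,
        },
    }
    if include_rgb:
        # Optional branch: RGB bands reduced to a temporal median.
        graph["s2"] = {
            "process_id": "load_collection",
            "arguments": {
                "id": "SENTINEL2_L2A",  # hypothetical collection id
                "bands": ["B02", "B03", "B04"],  # hypothetical band names
                "spatial_extent": None,
                "temporal_extent": None,
            },
        }
        graph["median"] = {
            "process_id": "reduce_dimension",
            "arguments": {
                "data": {"from_node": "s2"},
                "dimension": "t",
                "reducer": {
                    "process_graph": {
                        "median1": {
                            "process_id": "median",
                            "arguments": {"data": {"from_parameter": "data"}},
                            "result": True,
                        }
                    }
                },
            },
        }
        # Second save_result node: the part that needs back-end support.
        graph["save_rgb"] = {
            "process_id": "save_result",
            "arguments": {
                "data": {"from_node": "median"},
                "format": "GTiff",
                "options": {"filename_prefix": "rgb_median"},  # assumed option
            },
        }
    return graph


def count_save_results(graph: dict) -> int:
    """Count the top-level save_result nodes in a process graph dict."""
    return sum(1 for node in graph.values() if node["process_id"] == "save_result")


if __name__ == "__main__":
    print(json.dumps(build_process_graph(include_rgb=True), indent=2))
```

The `include_rgb` flag keeps a single graph builder while still letting the user skip the second file.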

Any idea what would be better for your use case?

thanks,
Jeroen

Hi Jeroen,

Thanks for the quick response. It should remain two separate files.

Let me explain the purpose of the files a bit more:

  • the 1st file (output) should show the water extent as a binary raster. This file is the main result and should always be produced.
  • the 2nd file (s2_cube_median) should contain the RGB bands (B2, B3, B4), so we can use it as a background map behind the water extent for validation. This file should be optional, as a user might not be interested in it.

I am not really sure which option (multiple save_result processes or an output format parameter) is better for this purpose. Will they have the same processing time?
The main priority is that the second file is optional for the user and that the processing time stays reasonable.

@sulova.andrea it would be possible to have a single GeoTIFF with the water extent binary image + RGB bands, and the option would just trigger two different process graphs, no?
So if the user wants just the binary mask, it runs the simple process graph; if they need both, it runs the other one.

We would rather keep only one process graph, preferably one that can provide both files.

ok, so this is clearly a new feature. We’ve logged it, and it has already been requested by one other user, so it is a good candidate for implementation via the new user voting mechanism.

Hi @jeroen.dries and @michele.claus

I hope you’re doing well. I was wondering if you’ve had an opportunity to develop the process graphs that can provide the two files we discussed earlier. If you have, could you please share an example of how we can use a process graph to download both files simultaneously? I’m looking forward to exploring this functionality further. Thank you!

Hi Andrea,

the feature is still subject to change, but currently it can be done as shown below.
Some caveats:


    cube = connection.load_collection(
        collection_id,
        temporal_extent=temporal_extent,
        spatial_extent=spatial_extent_ethiopia
    )
    cube = cube.save_result(format="GTiff", options=dict(filename_prefix="result1"))

    # resampling 3x3 pixels to 1km resolution
    cube_1km = cube.resample_spatial(resolution=3.0 * 0.00297619047619, method="near")
    cube_1km.execute_batch(format="netCDF", filename_prefix="result2")
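
As a quick check on the resolution in the comment above: the pixel size 0.00297619047619 is very close to 1/336 of a degree, so three pixels span about 1/112 of a degree. The sketch below converts that to kilometres. The figure of 111.32 km per degree is an assumption that holds roughly at the equator; east-west distances shrink with latitude.

```python
# Pixel size used in the example, in degrees (taken from the snippet above).
PIXEL_DEG = 0.00297619047619
# Approximate length of one degree at the equator, in km (assumption).
KM_PER_DEGREE = 111.32

# Three source pixels per target pixel.
target_deg = 3.0 * PIXEL_DEG
target_km = target_deg * KM_PER_DEGREE

print(f"{target_deg:.8f} degrees is roughly {target_km:.2f} km at the equator")
```

Under these assumptions the target pixel comes out just under 1 km, which matches the "1km" in the comment.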

There are two files which I want to obtain:

1. binary mask:
    output = output.rename_labels("bands", ["surface_water"])

2. RGB image for the same location:
    s2_cube_median = s2_cube.filter_temporal([start_date, end_date]).median_time()

I followed your example, but it downloaded only one file, s2_cube_median.

This is what I ran:

    cube = output.save_result(format="GTiff", options=dict(filename_prefix="Ouput"))
    s2_cube_median.execute_batch(format="netCDF", filename_prefix="s2_cube_median")
    results.download_files(output_name)

Currently, I am using two separate batch jobs to achieve my objectives, but I would prefer to consolidate them into a single job.
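
One possible explanation, not confirmed in this thread: in the earlier example the second cube (cube_1km) is derived from the cube returned by the first save_result, so both save_result steps sit on one chain. In the snippet above, s2_cube_median is not derived from the cube returned by output.save_result, so the first save_result may never become part of the submitted graph. The sketch below models this with a toy `Node` class. It assumes that only the nodes reachable from the cube on which execute_batch is called are sent to the back-end. `Node`, `reachable` and `count_saves` are hypothetical helpers, not part of the openEO client, and "water_mask_steps" is a placeholder for the real processing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """Toy stand-in for one step of a process graph."""
    process_id: str
    inputs: List["Node"] = field(default_factory=list)


def reachable(result_node: Node) -> List[Node]:
    """Collect every node that the result node depends on, itself included."""
    seen_ids = set()
    found = []
    stack = [result_node]
    while stack:
        node = stack.pop()
        if id(node) in seen_ids:
            continue
        seen_ids.add(id(node))
        found.append(node)
        stack.extend(node.inputs)
    return found


def count_saves(result_node: Node) -> int:
    """Count the save_result steps that would end up in the submitted graph."""
    return sum(1 for n in reachable(result_node) if n.process_id == "save_result")


# Pattern from the earlier example: the second cube builds on the first save_result.
load = Node("load_collection")
save1 = Node("save_result", [load])
resampled = Node("resample_spatial", [save1])
chained = Node("save_result", [resampled])

# Pattern from the snippet above: two unrelated branches.
mask = Node("water_mask_steps")  # placeholder for the real mask processing
save_mask = Node("save_result", [mask])  # created, but nothing builds on it
s2 = Node("load_collection")
median = Node("median_time", [Node("filter_temporal", [s2])])
separate = Node("save_result", [median])  # the step added by execute_batch

print(count_saves(chained), count_saves(separate))
```

If this is indeed the cause, the second output would have to be computed from the cube returned by the first save_result. Whether that is workable when the two cubes are otherwise unrelated is not covered in this thread.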