Optimization without downloading the CSV

This is a simplified example of what I have to do in a more complex workflow.
I have to stretch a subset of S2 bands between 0 and 1 for many dates, without using a UDF (problems with chunks).
To do that I calculate the min and max of each band on each date:

```python
extr = s2_datacube_masked.aggregate_spatial(
    geometries=json_feature,
    reducer=lambda x: array_create([min(x), max(x)]),
)
```

Since I cannot use this datacube directly, I have to download a CSV that contains the data from `extr`, which comes in this format:

| date | feature_index | min(band_0) | max(band_0) | min(band_1) | max(band_1) |
| --- | --- | --- | --- | --- | --- |
| 2022-10-05T00:00:00.000Z | 0 | 0.05970000103116 | 1.08440005779266 | 0.035000000149012 | 1.07749998569489 |
| 2022-10-08T00:00:00.000Z | 0 | 0.061999998986721 | 0.857699990272522 | 0.03770000115037 | 0.84909999370575 |
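On the client side, the downloaded CSV can be parsed into a per-date, per-band lookup to drive the stretching loops. A minimal sketch with pandas, using the two sample rows above inline (the real file would be read with `pd.read_csv(...)`; the band names `band_0`/`band_1` are taken from the sample header):

```python
import io
import pandas as pd

# Sample of the CSV produced by aggregate_spatial (inlined here; in practice
# you would read the downloaded file instead).
csv_text = """date,feature_index,min(band_0),max(band_0),min(band_1),max(band_1)
2022-10-05T00:00:00.000Z,0,0.05970000103116,1.08440005779266,0.035000000149012,1.07749998569489
2022-10-08T00:00:00.000Z,0,0.061999998986721,0.857699990272522,0.03770000115037,0.84909999370575
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Build a {date: {band: (min, max)}} lookup for the two nested loops.
stats = {}
for _, row in df.iterrows():
    date = row["date"].date().isoformat()
    stats[date] = {
        band: (row[f"min({band})"], row[f"max({band})"])
        for band in ("band_0", "band_1")
    }

print(stats["2022-10-05"]["band_0"])
```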

(it would also be great if the column headers contained the actual band names instead of “band_0”)

Then I have to use two `for` loops to apply my stretch: the outer loop over dates, the inner loop over bands.

This is the very simple process that I have to perform on each band:

```python
def rescale2(x):
    return linear_scale_range(x, minx, maxx, output_min, output_max)

str_bands = curent_time_and_bands.apply(process=rescale2)
```

Lastly I have to merge the stretched bands of each date back into a datacube, and then merge each per-date datacube with the previous ones along the date dimension.
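The computation those two nested loops perform is just an independent linear stretch of every (date, band) slice. A pure NumPy illustration (the cube shape `(time, band, y, x)` and the random data are assumptions for the sketch; this is not openEO client code):

```python
import numpy as np

# Toy stand-in for the datacube: shape (time, band, y, x).
rng = np.random.default_rng(0)
cube = rng.uniform(0.03, 1.1, size=(2, 2, 4, 4))

stretched = np.empty_like(cube)
for t in range(cube.shape[0]):       # first loop: dates
    for b in range(cube.shape[1]):   # second loop: bands
        minx = cube[t, b].min()
        maxx = cube[t, b].max()
        # Same formula as openEO's linear_scale_range with output range [0, 1]
        stretched[t, b] = (cube[t, b] - minx) / (maxx - minx)

print(stretched.min(), stretched.max())
```

Each slice ends up spanning exactly [0, 1], which is what the per-date, per-band merge is reassembling afterwards.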

Is there a simpler way to do that?
Maybe without the need to download the CSV and to split and merge the datacube back together?

j-2403286ff8394e92919cac14dd4ec6d5

Hi,

at the moment, I don’t see another way to achieve that unfortunately.

In the VITO/Terrascope backend we are however working on an implementation of a vector_to_raster process, which should allow the following conceptual approach:

```python
# Calculate min and max for each time-band -> vector cube
min_max = cube.aggregate_spatial(...)
# "Rasterize" vector cube back to raster cube
min_max_raster = min_max.vector_to_raster()
# Append min/max as new bands
cube = cube.merge_cubes(min_max_raster)
# Rescale values
rescaled = cube.reduce_dimension(
    dimension="bands",
    reducer=lambda data: linear_scale_range(data[0], data[1], data[2], 0, 1),
)
```
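For reference, openEO's `linear_scale_range` process is a plain linear mapping from the input range to the output range, clipped to the output range. A small Python equivalent, useful for checking expected values locally:

```python
def linear_scale_range(x, input_min, input_max, output_min=0.0, output_max=1.0):
    """Python equivalent of openEO's linear_scale_range: linearly map
    [input_min, input_max] to [output_min, output_max], clipping the
    result to the output range."""
    y = (x - input_min) / (input_max - input_min) * (output_max - output_min) + output_min
    return max(output_min, min(output_max, y))

print(linear_scale_range(0.5, 0.0, 1.0, 0, 100))  # 50.0
```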