Hi all,
I am having some trouble understanding how UDFs are applied to datacubes. I want to apply an ML algorithm to each pixel's spectrum (21 bands for S3 OLCI), with hyperparameters supplied by an external .py file. The algorithm is written as plain matrix calculations, so no specific ML library is needed; it all runs on numpy. In numpy, I pass the function a 3D array (bands, lat, lon) and the hyperparameters from the model file, then run a double nested for loop over the lat-lon dimensions and apply the matrix multiplications per pixel to obtain a single value for that pixel, which is a biophysical variable (e.g. LAI).
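For illustration, the numpy workflow described above looks roughly like the sketch below. The per-pixel math (normalise the spectrum, then take a dot product with a weight vector) is only a hypothetical stand-in for the actual model, and `offset`, `scale` and `weights` are made-up names for the hyperparameters. The second function shows that, at least for this simple stand-in, the double loop can be replaced by numpy broadcasting.

```python
import numpy as np

def map_with_loop(scene, offset, scale, weights):
    """Per-pixel double loop over (lat, lon), as in the description above."""
    n_bands, ydim, xdim = scene.shape
    out = np.empty((ydim, xdim))
    for y in range(ydim):
        for x in range(xdim):
            spectrum = scene[:, y, x]                 # shape (n_bands,)
            normalised = (spectrum - offset) / scale  # hypothetical stand-in math
            out[y, x] = normalised @ weights          # one value per pixel
    return out

def map_vectorised(scene, offset, scale, weights):
    """Same computation without Python loops, using broadcasting."""
    normalised = (scene - offset[:, None, None]) / scale[:, None, None]
    # Contract the band axis: (bands,) with (bands, y, x) -> (y, x)
    return np.tensordot(weights, normalised, axes=(0, 0))

# Synthetic data: 21 bands on a small 4 x 5 grid
rng = np.random.default_rng(0)
scene = rng.random((21, 4, 5))
offset = rng.random(21)
scale = rng.random(21) + 0.5   # keep the divisor away from zero
weights = rng.random(21)

loop_map = map_with_loop(scene, offset, scale, weights)
fast_map = map_vectorised(scene, offset, scale, weights)
print(loop_map.shape, np.allclose(loop_map, fast_map))
```

Both functions return a (lat, lon) array with one value per pixel; whether the real model vectorises this cleanly depends on its matrix operations.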
My main questions are:
- How does openEO apply a UDF to the datacube? Looking at the GitHub source code and the examples, it seems that I don't need to run the double for loop to create the map, and that the UDF is automatically applied to all lat-lon pixels. Is that correct?
- How can I get the spectrum (e.g. a list of numbers for each pixel) to use in my calculations?
- How do I supply the external model.py file, and what path do I pass to sys.path?
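To make the second question concrete, this is roughly what I mean by a per-pixel spectrum in plain numpy. It is a minimal sketch on a synthetic array; the array and the variable names are illustrative only and are not openEO API.

```python
import numpy as np

# Synthetic (bands, lat, lon) array standing in for one S3 OLCI scene
n_bands, ydim, xdim = 21, 3, 4
scene = np.arange(n_bands * ydim * xdim, dtype=float).reshape(n_bands, ydim, xdim)

# Move the band axis last, then flatten the spatial axes: one row per pixel
spectra = np.moveaxis(scene, 0, -1).reshape(-1, n_bands)  # (ydim * xdim, n_bands)

# The row index of pixel (y, x) is y * xdim + x
y, x = 2, 1
pixel_spectrum = spectra[y * xdim + x]        # 1D array with 21 band values
spectrum_as_list = pixel_spectrum.tolist()    # plain Python list, if needed
```

What I am looking for is the equivalent of `pixel_spectrum` inside the UDF.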
Below is an MRE of my code. Thanks in advance.
import openeo

# `connection` is assumed to be an existing, authenticated openEO connection
bandlist = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10', 'B11', 'B12', 'B13', 'B14', 'B15', 'B16', 'B17', 'B18', 'B19', 'B20', 'B21']
bbox = [
    -0.7316519637662964,  # west
    39.043183874892605,   # south
    0.0723422016996551,   # east
    39.80699041715417,    # north
]
s3_cube = connection.load_collection(
    "SENTINEL3_OLCI_L1B",
    spatial_extent={"west": bbox[0], "south": bbox[1], "east": bbox[2], "north": bbox[3]},
    temporal_extent=["2022-03-01", "2022-03-31"],
    bands=bandlist,
).reduce_temporal("mean")
udf = openeo.UDF("""
import sys
import numpy as np

sys.path.append('dir of model.py')
import model

def GPR_mapping(S3_scene,          # 3D numpy array (bands, latitude, longitude)
                hyperparameter_1,  # variable 1 from model.py file (list, size of bands)
                hyperparameter_2,  # variable 2 from model.py file (list, size of bands)
                ):
    bands, ydim, xdim = S3_scene.shape
    variable_map = np.empty((ydim, xdim))
    for v in range(0, xdim):  # Do I need a double nested for loop to create a map with UDFs?
        for f in range(0, ydim):
            pixel_spectra = S3_scene[:, f, v]  # How do I get a list of band values from the datacube?
            intermediate_result = (pixel_spectra - hyperparameter_1) / hyperparameter_2  # Do something mathematical with the hyperparameters
            final_result = pow(intermediate_result, hyperparameter_1)  # Do something mathematical with the hyperparameters
            pixel_value = final_result / 2
            variable_map[f, v] = pixel_value  # Fill the variable map array to create the map
    return variable_map
""")
# Pass the UDF
LAI_map = s3_cube.apply(process=udf)
LAI_map.download("udf.nc")