Hi all
When doing an MNDWI calculation:
green: DataCube = filtered_dc.band("green")
swir: DataCube = filtered_dc.band("swir")
mndwi: DataCube = (green - swir) / (green + swir)
I am getting an error (job: 792f8392-3396-4589-83b4-b5ecdf1454f4, using the VITO backend).
Looking at the logs I find something suspicious:
Writing NetCDF from rdd with : 0 elements and 279 partitions.
I can get the result from the previous step just fine. Any ideas?
Jaap
So you can download green or swir separately as valid NetCDF files?
What temporal and spatial extent are you using?
Thanks for answering @stefaan.lippens !
Checked the datacube before the DataCube.band
method. Now that I check the intermediate step, I see that this band selection is giving me the error. Before this step, I run a UDF that does an apply
step (it does not change dimensions).
Perhaps some band metadata is lost during a udf?
When downloading TIFFs, I had to rename the bands from 1 2 3 4 5 6 7
to their respective band names.
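If band metadata really is lost somewhere, any later lookup by band name silently comes up empty. A minimal numpy sketch of that failure mode (the expected names are the ones from this cube; the degraded labels are hypothetical):

```python
import numpy as np

# Band labels as they should be, and as they might come back if a UDF
# step drops the metadata (names replaced by plain indices).
expected = np.array(["green", "swir", "cloudmask", "cloudp"])
degraded = np.array(["1", "2", "3", "4"])

# The same kind of lookup the UDF uses to find its quality band:
print(np.where(expected == "cloudp")[0])  # -> [3]
print(np.where(degraded == "cloudp")[0])  # -> [] (name lost, nothing found)
```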
The UDF:
from typing import Optional

import numpy as np
from xarray import DataArray

from openeo.udf import XarrayDataCube


def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    """Filter a time-series DataCube on a cloud-probability score.

    This function assumes a DataCube with dimension 't' as input; timesteps
    with a high relative cloud score are dropped from the result.

    Args:
        cube (XarrayDataCube): datacube to apply the UDF to.
        context (dict): key-value arguments.
    """
    # Load kwargs from context, falling back to defaults
    cutoff_percentile: Optional[float] = context.get("cutoff_percentile")
    if not cutoff_percentile:
        cutoff_percentile = 35
    cutoff_percentile = cutoff_percentile / 100.
    scale: Optional[float] = context.get("scale")
    if not scale:
        scale = 500
    score_percentile: Optional[float] = context.get("score_percentile")
    if not score_percentile:
        score_percentile = 75.
    score_percentile = score_percentile / 100.
    quality_band: Optional[str] = context.get("quality_band")
    if not quality_band:
        quality_band = "cloudp"
    array: DataArray = cube.get_array()
    # Need to get the band index, as bands are not dims here
    index = np.where(array["bands"].values == quality_band)[0][0]
    # Per-timestep score: the given percentile of the quality band over x/y
    score: DataArray = array.isel(bands=index).quantile([score_percentile], dim=["x", "y"])
    # Keep only timesteps whose relative score is below the cutoff
    filtered: DataArray = array.sel(t=score.where(score / np.max(score) < cutoff_percentile, drop=True).t)
    print(filtered.shape)
    return XarrayDataCube(array=filtered)
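As a side note, the filtering step of this UDF can be sanity-checked locally on a toy xarray cube, without any backend. A sketch using the default thresholds (75th score percentile, 0.35 cutoff) and a hand-crafted cloudp band where the last two timesteps are very cloudy:

```python
import numpy as np
import xarray as xr

# Toy cube: 5 timesteps, 4 bands, 2x2 pixels. The "cloudp" band is crafted
# so timesteps 3 and 4 score as very cloudy.
cloudp = np.array([0.1, 0.2, 0.1, 0.9, 1.0])[:, None, None] * np.ones((5, 2, 2))
other = np.ones((5, 3, 2, 2))
cube = xr.DataArray(
    np.concatenate([other, cloudp[:, None]], axis=1),
    dims=("t", "bands", "y", "x"),
    coords={"t": np.arange(5), "bands": ["green", "swir", "cloudmask", "cloudp"]},
)

# Same steps as the UDF:
index = np.where(cube["bands"].values == "cloudp")[0][0]
score = cube.isel(bands=index).quantile([0.75], dim=["x", "y"])
filtered = cube.sel(t=score.where(score / np.max(score) < 0.35, drop=True).t)
print(filtered.sizes["t"])  # -> 3 (the two cloudy timesteps are dropped)
```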
Update: using .band("1")
gives me the expected error: Invalid band name/index '1'. Valid names: ['green', 'swir', 'cloudmask', 'cloudp']
The error log indicates that the datacube is entirely empty at the end of the processing, so we’re looking for a step that somehow drops the data.
The chunk_polygon call uses a polygon that’s not in lon/lat coordinates, and also doesn’t specify a CRS. Could that perhaps be the problem? Can you see what happens if that polygon uses lon/lat?
(The ‘official’ GeoJSON spec only allows lon/lat coordinates; a CRS simply cannot be specified.)
Hi Jeroen, just tested this and indeed, this version seems to be working! I projected to UTM because I need to buffer the input polygons. So for this solution I need to reproject back to lat/lon, or are there alternatives for the UDF?
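For reference, the manual route (buffer in UTM, then reproject back to lon/lat before handing the GeoJSON to openEO) can be sketched client-side with shapely and pyproj. The EPSG code and geometry below are placeholders; pick the UTM zone of your area:

```python
from pyproj import Transformer
from shapely.geometry import box
from shapely.ops import transform

# Placeholder geometry in lon/lat and a placeholder UTM zone (31N).
poly = box(4.35, 50.85, 4.36, 50.86)
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32631", always_xy=True).transform
to_lonlat = Transformer.from_crs("EPSG:32631", "EPSG:4326", always_xy=True).transform

# Buffer by 100 m in projected coordinates, then go back to lon/lat
# so the polygon is valid GeoJSON input for chunk_polygon.
buffered = transform(to_lonlat, transform(to_utm, poly).buffer(100))
print(buffered.contains(poly))  # -> True
```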
There’s a process called ‘vector_buffer’ that’s meant for exactly this. That way you can use your original geometry, and let openEO take care of the necessary reprojection and buffering.
It’s still in draft, so not yet that well documented, but this should help you get started:
https://processes.openeo.org/draft/#vector_buffer
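Going by the draft spec, a vector_buffer node takes the input geometry and a buffer distance, so a process-graph fragment could look roughly like this (argument names taken from the draft page; the distance value is just an example):

```json
{
  "buffer": {
    "process_id": "vector_buffer",
    "arguments": {
      "geometry": {"from_parameter": "geometry"},
      "distance": 100
    }
  }
}
```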
Great to hear that openEO also has vector support!
Any examples of the VectorCube format? Can it be combined with a DataCube?
I see potential in using a large collection of geometries (reservoir geometries in the water-watch use case) as a VectorCube and sampling a DataCube with chunk_polygon
using the values from the VectorCube.