MNDWI processing job error: 0 elements

jaaplangemeijer · 29 March 2022 13:33

Hi all

When doing an MNDWI calculation:

green: DataCube = filtered_dc.band("green")
swir: DataCube = filtered_dc.band("swir")
mndwi: DataCube = (green - swir) / (green + swir)

I am getting an error (job: 792f8392-3396-4589-83b4-b5ecdf1454f4, using the VITO backend).
Looking at the logs I find something suspicious:
Writing NetCDF from rdd with : 0 elements and 279 partitions.

I can get the result from the previous step just fine. Any ideas?
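
For reference, the index in question is MNDWI = (green - swir) / (green + swir). Below is a minimal plain-Python sketch of that formula for a single pixel, independent of openEO. The function name `mndwi` and the choice to return `None` when the denominator is zero are assumptions made for illustration, not backend behaviour.

```python
from typing import Optional


def mndwi(green: float, swir: float) -> Optional[float]:
    """Modified Normalized Difference Water Index for one pixel.

    Returns None when green + swir is zero (illustrative choice),
    since the ratio is undefined there.
    """
    denominator = green + swir
    if denominator == 0:
        return None
    return (green - swir) / denominator


# Water reflects more in green than in SWIR, so it yields positive values.
water_like = mndwi(3, 1)
land_like = mndwi(1, 3)
undefined = mndwi(0, 0)
```

The value always lies between -1 and 1 for non-negative reflectances, which makes it a quick sanity check on any downloaded result.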

Jaap

stefaan.lippens · 29 March 2022 14:32

So you can download green or swir separately as valid NetCDF files?

What temporal and spatial extent are you using?

jaaplangemeijer · 29 March 2022 19:10

Thanks for answering @stefaan.lippens !
I checked the datacube before the DataCube.band call. Now that I check the intermediate step, I see that this band selection is what gives me the error. Before this step, I run a UDF that does an apply step (it does not change the dimensions).

Perhaps some band metadata is lost during a UDF?
When downloading TIFFs, I had to rename the bands from 1 2 3 4 5 6 7 to their respective band names.

jaaplangemeijer · 29 March 2022 19:12

The UDF:

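from typing import Optional

import numpy as np
from openeo.udf import XarrayDataCube
from xarray import DataArray


def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    """Filter the time steps of a DataCube by a quality-band score.

    This function assumes a DataCube with dimensions 't', 'bands', 'x' and 'y' as input.
    Time steps whose quality score is too high relative to the maximum score are dropped;
    the dimensions themselves are kept.

    Args:
        cube (XarrayDataCube): datacube to apply the udf to.
        context (dict): key-value arguments.
    """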

    # Load kwargs from context
    cutoff_percentile: Optional[float] = context.get("cutoff_percentile")
    if not cutoff_percentile:
        cutoff_percentile = 35
    cutoff_percentile = cutoff_percentile / 100.

    scale: Optional[float] = context.get("scale")
    if not scale:
        scale = 500

    score_percentile: Optional[float] = context.get("score_percentile")
    if not score_percentile:
        score_percentile = 75.
    score_percentile = score_percentile / 100.

    quality_band: Optional[str] = context.get("quality_band")
    if not quality_band:
        quality_band = "cloudp"

    array: DataArray = cube.get_array()
    # Need to get band index, as bands are not dims here
    index = np.where(array["bands"].values == quality_band)[0][0]
    score: DataArray = array.isel(bands=index).quantile([score_percentile], dim=["x", "y"])
    filtered: DataArray = array.sel(t=score.where(score / np.max(score) < cutoff_percentile, drop=True).t)
    print(filtered.shape)
    return XarrayDataCube(
        array=filtered
    )
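
To make the selection rule in this UDF easier to follow, here is a rough plain-Python sketch of the same idea without xarray: compute one quality score per time step (a percentile of the quality band over all pixels), normalise by the maximum score, and keep only the time steps below the cutoff. The helper names `percentile` and `select_time_steps` and the sample values are hypothetical, and the linear-interpolation percentile is an assumption about the quantile method.

```python
from typing import Dict, List


def percentile(values: List[float], q: float) -> float:
    """Quantile with linear interpolation, q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def select_time_steps(
    quality: Dict[str, List[float]],
    score_percentile: float = 0.75,
    cutoff: float = 0.35,
) -> List[str]:
    """Keep time steps whose score, relative to the maximum, is below cutoff."""
    scores = {t: percentile(v, score_percentile) for t, v in quality.items()}
    max_score = max(scores.values())
    if max_score == 0:
        # Ratio is undefined; nothing passes the comparison.
        return []
    return [t for t, s in scores.items() if s / max_score < cutoff]


# Hypothetical cloud-probability pixels per time step.
quality = {
    "2022-01-01": [0, 10, 20, 30, 40],
    "2022-01-06": [80, 90, 100, 100, 100],
    "2022-01-11": [0, 0, 5, 10, 20],
}
kept = select_time_steps(quality)
```

Because the scores are divided by the maximum score, the time step with the highest score can never pass a cutoff below 1, and if every time step scores similarly the result may be empty.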

jaaplangemeijer · 30 March 2022 07:36

Update: using .band("1") gives me the expected error: Invalid band name/index '1'. Valid names: ['green', 'swir', 'cloudmask', 'cloudp']

jeroen.dries · 30 March 2022 09:20

The error log indicates that the datacube is entirely empty at the end of the processing, so we’re looking for a step that somehow drops the data.
The chunk_polygon step uses a polygon that’s not in lon/lat coordinates, and also doesn’t specify a CRS. That could perhaps be the problem? Can you see what happens if that polygon uses lon/lat?
(The ‘official’ GeoJSON spec only allows lon/lat coordinates; a CRS simply cannot be specified.)
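
A quick way to catch this before submitting a job is to check whether the geometry's coordinates even fall within valid lon/lat ranges. The sketch below is a heuristic only, using hypothetical helper names (`iter_positions`, `looks_like_lonlat`) and made-up coordinates; it cannot prove a geometry is in lon/lat, but projected coordinates such as UTM eastings and northings will normally fail it.

```python
from typing import Iterator, Tuple


def iter_positions(coords) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for item in coords:
            yield from iter_positions(item)


def looks_like_lonlat(geometry: dict) -> bool:
    """Heuristic: all positions lie within lon [-180, 180], lat [-90, 90]."""
    return all(
        -180 <= x <= 180 and -90 <= y <= 90
        for x, y in iter_positions(geometry["coordinates"])
    )


# Made-up example polygons: one in lon/lat, one in UTM metres.
lonlat_polygon = {
    "type": "Polygon",
    "coordinates": [[[5.0, 51.2], [5.1, 51.2], [5.1, 51.3], [5.0, 51.2]]],
}
utm_polygon = {
    "type": "Polygon",
    "coordinates": [[[640000.0, 5675000.0], [641000.0, 5675000.0],
                     [641000.0, 5676000.0], [640000.0, 5675000.0]]],
}
lonlat_ok = looks_like_lonlat(lonlat_polygon)
utm_ok = looks_like_lonlat(utm_polygon)
```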

jaaplangemeijer · 30 March 2022 10:55

Hi Jeroen, I just tested this and indeed, this version seems to be working! I projected to UTM because I need to buffer the input polygons. So for this solution, do I need to reproject back into lat/lon, or are there alternatives for the UDF?

jeroen.dries · 30 March 2022 11:01

There’s a process called ‘vector_buffer’ that’s meant for exactly this. That way you can use your original geometry, and let openEO take care of the necessary reprojection and buffering.
It’s still in draft, so not yet that well documented, but this should help you get started:
https://processes.openeo.org/draft/#vector_buffer

jaaplangemeijer · 30 March 2022 12:38

Great to hear that openEO also has vector support!
Are there any examples of the VectorCube format? Can it be combined with the DataCube?

I see potential in using a large collection of geometries (reservoir geometries in the water-watch use case) as a VectorCube, and sampling a DataCube using chunk_polygon with the values from that VectorCube.