Terrascope backend upgrade

jeroen.dries · 18 January 2023 08:20

Another major update of the Terrascope backend:

merge_cubes and mask will now try to automatically resample input data cubes if required, reducing the need to always use resample_cube_spatial
When catalogs contain two versions of the same product, we now detect this better and try to use the most recent version. This improves performance, but also reproducibility.
sar_backscatter based on sentinelhub has improved defaults for orthorectification

The majority of improvements in this release are smaller performance improvements and bugfixes.
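
To illustrate the merge_cubes change: a process graph along these lines should no longer need a resample_cube_spatial node between the two loads and the merge. This is only a sketch written as a plain Python dict; the collection ids, band names and extents are placeholders, not real Terrascope collections.

```python
import json

# Placeholder extent and dates, for illustration only.
extent = {"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1}
dates = ["2023-01-01", "2023-02-01"]

process_graph = {
    "load_a": {
        "process_id": "load_collection",
        "arguments": {
            "id": "COLLECTION_A",  # hypothetical collection id
            "spatial_extent": extent,
            "temporal_extent": dates,
            "bands": ["BAND_A"],  # hypothetical band name
        },
    },
    "load_b": {
        "process_id": "load_collection",
        "arguments": {
            "id": "COLLECTION_B",  # hypothetical, on a different grid or resolution
            "spatial_extent": extent,
            "temporal_extent": dates,
            "bands": ["BAND_B"],  # hypothetical band name
        },
    },
    # The two cubes are merged directly, without an explicit
    # resample_cube_spatial node in between.
    "merge": {
        "process_id": "merge_cubes",
        "arguments": {
            "cube1": {"from_node": "load_a"},
            "cube2": {"from_node": "load_b"},
        },
    },
    "save": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "merge"}, "format": "GTiff"},
        "result": True,
    },
}

used_processes = sorted({node["process_id"] for node in process_graph.values()})
print(json.dumps(used_processes))
```

If you need a specific target grid or resampling method, adding resample_cube_spatial explicitly is still the way to control it.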

jeroen.dries · 27 February 2023 12:04

The Terrascope production backend received another upgrade.
Main features listed below!

2023-02-27 (0.6.7a1)

GeoParquet support to allow loading large vector files
Improved specific log messages
Better support for multiple filter_spatial processes in same process graph (Support sample_by_features when filter_spatial is used more than once · Issue #147 · Open-EO/openeo-geopyspark-driver · GitHub)
Bugfix for sampling sentinelhub based collections (regression: decision to use SHub batch process is based on bbox area instead of geometry area · Issue #279 · Open-EO/openeo-geopyspark-driver · GitHub)
vector_buffer: Throw an error when a negative buffer size results in invalid geometries (return error when negative buffering results in an empty geometry · Issue #164 · Open-EO/openeo-python-driver · GitHub)
batch jobs now also report usage of credits (reporting of credit usage in batch jobs · Issue #272 · Open-EO/openeo-geopyspark-driver · GitHub)
non-utm collections should now have a better alignment to the original rasters, if the process graph does not apply an explicit resampling (Image-shifts through bbox, filter_spatial, or spatial_extent · Issue #69 · Open-EO/openeo-geotrellis-extensions · GitHub)

2023-02-07 (0.6.7a1)

Added initial support for the inspect process. It can be used on datacubes and in callbacks.
The size of a single chunk is now automatically increased for larger jobs, to improve IO performance.
resample_cube_spatial is no longer needed in all cases when using merge_cubes or mask
Better detection of duplicate products in source catalogs
The ‘if’ process will no longer evaluate the branch that is not accepted (implement lazy `if` branching · Issue #109 · Open-EO/openeo-python-driver · GitHub)
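
As a rough illustration of what lazy branching means for the ‘if’ process: only the branch selected by the condition gets evaluated. The sketch below is plain Python, not the backend's implementation; the function names are made up for the example, and the parameter names simply mirror the value/accept/reject arguments of the process.

```python
from typing import Any, Callable, List

def lazy_if(value: bool, accept: Callable[[], Any], reject: Callable[[], Any]) -> Any:
    """Evaluate and return only the branch selected by `value`."""
    return accept() if value else reject()

evaluated: List[str] = []  # records which branches actually ran

def accept_branch() -> str:
    evaluated.append("accept")
    return "accepted result"

def reject_branch() -> str:
    # Stands in for an expensive computation that should be skipped.
    evaluated.append("reject")
    return "rejected result"

result = lazy_if(True, accept_branch, reject_branch)
print(result, evaluated)
```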

jeroen.dries · 8 March 2023 11:11

The previous update of Terrascope had to be rolled back for a short time, but is now back in effect.
In addition, this is included:

improvements to messages that are logged, as part of an ongoing effort to improve the quality of messages
area calculated and reported by batch job metadata should now be more accurate
derived-from links when using Sentinelhub based collections have been improved
more robustness for functionality that depends on external services, by adding more retries before failing
a new ‘filename_prefix’ format option allows more control over the filename of generated assets, which is otherwise set to default values like ‘openEO.tif’.
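
For the ‘filename_prefix’ option, a save_result node could look roughly like this. It is a sketch written as a plain Python dict; the node name and the prefix value are placeholders, and the exact file names the backend derives from the prefix are not shown here.

```python
# Sketch of a save_result node that passes the filename_prefix format option.
save_node = {
    "process_id": "save_result",
    "arguments": {
        "data": {"from_node": "previous_node"},  # hypothetical upstream node id
        "format": "GTiff",
        # Format options are passed in the `options` argument.
        "options": {"filename_prefix": "my_result"},  # placeholder prefix
    },
    "result": True,
}

print(save_node["arguments"]["options"])
```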

jeroen.dries · 20 March 2023 11:51

2023-03-20
The Terrascope production backend received another upgrade:

a bug in aggregate_temporal that could cause the first aggregation to be incorrectly set to nodata is fixed
logs are being improved, for instance to also show error messages generated by user code in a UDF

jeroen.dries · 23 March 2023 14:14

2023-03-23
In today’s upgrade the Terrascope backend started running on Spark 3.3.1. Generally speaking this should not result in very visible changes, but do let us know if you suddenly see issues!

jeroen.dries · 18 September 2023 06:46

2023-09-18
While some intermediate upgrades with smaller fixes were released over summer, we now again have a pretty major upgrade, featuring new processes for vector cubes and many small fixes.

One major change is the interpretation of time intervals as half-open, which was previously announced. This aligns the Terrascope backend with the specification and ensures compatibility with other backends.
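
A small sketch of what half-open means in practice, using plain Python datetimes rather than the backend's own code: the start of the interval is included, the end is excluded. The function name is made up for this example.

```python
from datetime import datetime

def in_half_open_interval(t: datetime, start: datetime, end: datetime) -> bool:
    """Return True if start <= t < end (start included, end excluded)."""
    return start <= t < end

start = datetime(2023, 1, 1)
end = datetime(2023, 2, 1)

first_instant = in_half_open_interval(datetime(2023, 1, 1), start, end)
last_january = in_half_open_interval(datetime(2023, 1, 31, 23, 59, 59), start, end)
end_instant = in_half_open_interval(datetime(2023, 2, 1), start, end)

print(first_instant, last_january, end_instant)
```

So an interval from 2023-01-01 to 2023-02-01 covers January, and an observation stamped exactly at the end instant falls outside it. Workflows that relied on the end being included may need an end date that is one step later.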

New collections were also added:

SENTINEL3_SYNERGY_VG1
SENTINEL3_SYNERGY_VG10
CGLS_FAPAR300_V1_GLOBAL
CGLS_GDMP300_V1_GLOBAL

The detailed list of changes can be found in our changelog, and contains many fixes and features that were requested by openEO platform users on this forum.

https://github.com/Open-EO/openeo-geopyspark-driver/blob/28a2acbcc107e60fea703fa203a9386f895a3030/CHANGELOG.md

# Changelog
All notable changes to this project will be documented in this file.

This project relies on continuous integration for new features. So we do not yet have explicitly versioned
releases. Releases are simply built continuously, automatically tested, deployed to a development environment and then to production.

Note that the openEO API provides a way to support stable and unstable versions in the same implementation:
https://openeo.org/documentation/1.0/developers/api/reference.html#operation/connect

If needed, feature flags are used to allow testing unstable features in development/production,
without compromising stable operations.

## [Unreleased]

## 2023-09-18 (0.9.5a1)

Important change: time intervals are now left closed. Workflows that are sensitive to exact time intervals may need
to be updated.

### Feature

jeroen.dries · 12 March 2024 08:00

We’re happy to announce a new set of improvements and features that were integrated in the Terrascope backend over the last months. In general, there has been a strong focus on small bugfixes and stability improvements, but we also were able to add new features, usually requested by our users.

The full changelog can be found in the link below, but we highlight a few noteworthy items:

A number of processes now use floating point operations automatically, avoiding unexpected results, for instance when subtracting values of unsigned data types.
Using ‘filter_labels’, it is now possible to load data for multiple time intervals rather than a single continuous one.
The most commonly used UDF signature can now use xarray.DataArray directly, rather than using an openEO wrapper class. This makes your UDFs look simpler.
The size of netCDF files that can be generated by openEO has increased. Do note that a 20GB netCDF file can be hard to handle and takes a long time to download.
Improved support for reading and generating GeoParquet files. This relatively new cloud-native format has some key advantages compared to traditional formats.
STAC metadata for results has gotten a few minor fixes to be compliant with the latest versions of the extensions. There are no big changes, but you may need to adjust if you depend on very specific STAC properties.
load_stac is getting more and more versatile, allowing you to integrate your own datasets.
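
To show why the switch to floating point matters for unsigned data: subtracting a larger value from a smaller one wraps around in unsigned 8-bit arithmetic, while the floating point result is what you would expect. This is a plain Python illustration of the pitfall with made-up values; it says nothing about how the backend promotes types internally.

```python
import ctypes

# Two made-up pixel values, as they might be stored in an unsigned 8-bit band.
a, b = 100, 150

# Unsigned 8-bit subtraction wraps around modulo 256.
wrapped = ctypes.c_uint8(a - b).value

# The same subtraction in floating point keeps the sign.
as_float = float(a) - float(b)

print(wrapped, as_float)
```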

https://github.com/Open-EO/openeo-geopyspark-driver/blob/907135bbe50ae2e40995f285064b7c2029774b2a/CHANGELOG.md

## Unreleased

## 0.28.1

- Support excluding Sentinel Hub processing units from usage reporting ([openeo-cdse-infra#37](https://github.com/eu-cdse/openeo-cdse-infra/issues/37)).

## 0.28.0

jeroen.dries · 19 September 2024 09:35

The Terrascope backend has again received a number of feature updates.
Important changes include:

Improved Quantile Processing: The ‘quantiles’ process has been updated to align with the specification when used with apply or reduce_dimension. This change adopts the prescribed interpolation approach, offering more accuracy and consistency.
New Geotiff Format Option: A new separate_asset_per_band format option has been added to Geotiff, allowing users to write individual bands into separate TIFF files, providing greater flexibility in handling multi-band data.
Expanded load_stac Support: The load_stac process now accepts a wider range of input collections, no longer requiring the presence of eo:bands. This enhancement broadens the compatibility of the function with diverse data.
resample_spatial received a fix to better respect a number of resampling methods.
raster_to_vector received a bunch of improvements to make it more stable, enabling more vector cube based use cases.
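
For the quantiles change, here is a minimal sketch of quantile computation with linear interpolation between neighbouring sorted values (often referred to as "type 7"). That this is the prescribed approach is our reading of the openEO process definition, so treat it as an assumption and check the definition for the authoritative rule. The function name is made up, and this is not the backend's code.

```python
import math
from typing import Sequence

def quantile_linear(values: Sequence[float], p: float) -> float:
    """Quantile for probability p in [0, 1], linearly interpolated
    between the two nearest order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    data = sorted(values)
    h = (len(data) - 1) * p          # fractional position in the sorted data
    lower = math.floor(h)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (h - lower) * (data[upper] - data[lower])

median = quantile_linear([4, 1, 3, 2], 0.5)
first_quartile = quantile_linear([4, 1, 3, 2], 0.25)
print(median, first_quartile)
```

If your results shift slightly after this upgrade, a different interpolation rule in the old behaviour is the likely reason.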

As usual, you can find more details in our changelog:

https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/CHANGELOG.md

## Unreleased

- quantiles, when used in apply_dimension was corrected to use the interpolation method that is prescribed by the openEO process definition.
- return STAC Items with valid date/time for time series job results ([#852](https://github.com/Open-EO/openeo-geopyspark-driver/issues/852))
- filter_labels now also supported for collections that use orfeo backscatter. ([Open-EO/openeo-geotrellis-extensions#320](https://github.com/Open-EO/openeo-geotrellis-extensions/issues/320))

jeroen.dries · 7 February 2025 08:17

The Terrascope backend continues to receive important improvements, thanks to the feedback on this forum. We do want to highlight these important evolutions:

export_workspace is maturing further, for instance with support for exporting STAC metadata to an API. This powerful feature is drastically simplifying the setup of large scale processing projects.
Performance improvements continue to be made, sometimes reducing costs by a factor of 10. Please do continue to report jobs with unexpected costs, as this user feedback is crucial to identify and solve performance problems.
Pixel level alignment between datacubes improved for specific complex process graphs, ensuring that you get more exact results.
Logging now reports the ‘stages’ where most time was spent, making it easier to find the root cause of performance issues.

Details can be found in our changelog:
https://openeo.vito.be/openeo/CHANGELOG