Sentinel Hub Batch Processing

Sentinel Hub Batch Processing takes the geometry of a large area and divides it according to a specified tile grid. Next, it executes processing requests for each tile in the grid and stores results to a given location at AWS S3 storage. All this is efficiently executed on the server-side. Because of the optimized performance, it is significantly faster than running the same process locally.

More information about batch processing is available at Sentinel Hub documentation pages:

The tutorial will show a standard process of using Batch Processing with sentinelhub-py. The process can be divided into:

Define and create a batch request
Analyse a batch request before it is executed
Run a batch requests job and check the outcome

Imports

The tutorial requires the package geopandas which is not a dependency of sentinelhub-py.

[ ]:

%matplotlib inline

import datetime as dt
import os

import geopandas as gpd

from sentinelhub import (
    CRS,
    BatchProcessClient,
    DataCollection,
    Geometry,
    MimeType,
    SentinelHubRequest,
    SHConfig,
    bbox_to_dimensions,
    monitor_batch_process_job,
)

# The following is not a package. It is a file utils.py which should be in the same folder as this notebook.
from utils import plot_image

1. Create a batch request

To create a batch request we need to do the following:

Define a Process API request which we would like to execute on a large area.
Select a tiling grid which will define how our area will be split into smaller tiles.
Set up an S3 bucket where results will be saved.

1.1 Define a Process API request

First, let’s set up the credentials the same way as in Sentinel Hub Process API tutorial.

[ ]:

config = SHConfig()

if config.sh_client_id == "" or config.sh_client_secret == "":
    print("Warning! To use Sentinel Hub Process API, please provide the credentials (client ID and client secret).")

For our area of interest, we’ll take an area of crop fields in California.

[ ]:

SHAPE_PATH = os.path.join(".", "data", "california_crop_fields.geojson")
area_gdf = gpd.read_file(SHAPE_PATH)

# Geometry of an entire area
full_geometry = Geometry(area_gdf.geometry.values[0], crs=CRS.WGS84)
# Bounding box of a test sub-area
test_bbox = Geometry(area_gdf.geometry.values[1], crs=CRS.WGS84).bbox

area_gdf.plot(column="name");

Let’s check a true-color satellite image of the entire area:

[ ]:

evalscript_true_color = """
    //VERSION=3
    function setup() {
        return {
            input: [{
                bands: ["B02", "B03", "B04"]
            }],
            output: {
                bands: 3
            }
        };
    }
    function evaluatePixel(sample) {
        return [sample.B04, sample.B03, sample.B02];
    }
"""

request = SentinelHubRequest(
    evalscript=evalscript_true_color,
    input_data=[
        SentinelHubRequest.input_data(
            data_collection=DataCollection.SENTINEL2_L2A,
        )
    ],
    responses=[SentinelHubRequest.output_response("default", MimeType.PNG)],
    geometry=full_geometry,
    size=(512, 512),
    config=config,
)

image = request.get_data()[0]

plot_image(image, factor=3.5 / 255, clip_range=(0, 1))

Next, let’s define an evalscript and time range. To better demonstrate the power of batch processing we’ll take an evalscript that returns a temporally-interpolated stack NDVI values.

Warning:

In the following cell parameters evalscript and time_interval are both defined for the same time interval. If you decide to change the time interval you have to change it both in the cell and in the evalscript code.

[ ]:

EVALSCRIPT_PATH = os.path.join(".", "data", "interpolation_evalscript.js")

with open(EVALSCRIPT_PATH) as fp:
    evalscript = fp.read()

time_interval = dt.date(year=2020, month=7, day=1), dt.date(year=2020, month=7, day=30)

Now we can define a Process API request and test it on a smaller sub-area to make sure we get back desired data.

[ ]:

%%time

sentinelhub_request = SentinelHubRequest(
    evalscript=evalscript,
    input_data=[
        SentinelHubRequest.input_data(
            data_collection=DataCollection.SENTINEL2_L1C,
            time_interval=time_interval,
        )
    ],
    responses=[
        SentinelHubRequest.output_response("NDVI", MimeType.TIFF),
        SentinelHubRequest.output_response("data_mask", MimeType.TIFF),
    ],
    bbox=test_bbox,
    size=bbox_to_dimensions(test_bbox, 10),
    config=config,
)

results = sentinelhub_request.get_data()[0]

print(f"Output data: {list(results)}")

plot_image(results["NDVI.tif"][..., 2])

We obtained stacks of NDVI values and data masks.

1.2 Define a batch client

The interface for Sentinel Hub Batch API is class BatchProcessClient. We initialize it with a configuration object that contains credentials and URLs of the services.

Note:

The BatchProcessClient interface uses the Version 2 API of batch processing. SentinelHubBatch is the client for Version 1.

[ ]:

client = BatchProcessClient(config=config)

1.3 Select a tiling grid

Batch API offers a number of pre-defined tiling grids. We can check which ones are available.

[ ]:

list(client.iter_tiling_grids())

Let’s select a 10km grid, which is based on Sentinel-2 data tiling grid in UTM coordinate reference systems.

There is also an option to check a definition for a single grid:

[ ]:

# Specify grid ID here:
GRID_ID = 1

client.get_tiling_grid(GRID_ID)

1.4 Set up an S3 bucket

For this step please follow instructions on how to provide a way for the Batch Process to write to your S3 bucket. We suggest using the AWS IAM Assume Role Workflow.

[ ]:

BUCKET_PATH = "???"

ROLE_ARN = "???"

1.5 Join batch request definition

Now we are ready to create an entire batch request. This step won’t trigger the actual processing. It will only save a batch request definition to the server-side.

[ ]:

sentinelhub_request = SentinelHubRequest(
    evalscript=evalscript,
    input_data=[
        SentinelHubRequest.input_data(
            data_collection=DataCollection.SENTINEL2_L1C,
            time_interval=time_interval,
        )
    ],
    responses=[
        SentinelHubRequest.output_response("NDVI", MimeType.TIFF),
        SentinelHubRequest.output_response("data_mask", MimeType.TIFF),
    ],
    geometry=full_geometry,
    # This time we don't specify size parameter
    config=config,
)

batch_request = client.create(
    sentinelhub_request,
    input=client.tiling_grid_input(grid_id=GRID_ID, resolution=10, buffer_x=50, buffer_y=50),
    output=client.raster_output(
        delivery=client.s3_specification(BUCKET_PATH, iam_role_arn=ROLE_ARN, region="eu-central-1")
    ),
    description="sentinelhub-py tutorial batch job",
)

batch_request

A batch request has been successfully created. The information about a request is provided in the form of a BatchProcessRequest dataclass object. From the object representation, we can see some of its main properties, such as status, which defines the current status of a batch request.

At this point you can write down your batch request ID. In case you restart your Python kernel or delete batch_request object you can always re-initialize it with the request ID:

[ ]:

# Write your batch_request.request_id here
REQUEST_ID = "???"

client.get_request(REQUEST_ID)

2. Analyse a batch request

Before we run a batch request job we can check currently defined batch requests and run an analysis to determine the outcome of a batch request. This step is also run automatically if we just run the batch request.

2.1 Investigate past batch requests

We already have our current batch request definition in batch_request variable. However, if we would like to find it again we can search the history of all created batch requests:

[ ]:

for request in client.iter_requests():
    print(request)

2.2 Run an analysis

At the moment we don’t have information about tiles or processing units yet. But we can order the service to calculate it.

The following will start the analysis on the server-side:

[ ]:

client.start_analysis(batch_request)

Depending on the size of our batch request it might take from a few seconds to a few minutes for analysis to finish. To determine if the analysis has finished we have to update batch request info and check the status information:

[ ]:

batch_request = client.get_request(batch_request)

batch_request

3. Run a batch request job

Once we decide to run a batch request job we can trigger it with the following:

[ ]:

client.start_job(batch_request)

Again we can check if a job has finished by updating batch request info.

[ ]:

batch_request = client.get_request(batch_request)

batch_request

This package also provides a utility function that monitors batch job execution by periodically checking for status and sleeping in between.

[ ]:

monitor_batch_process_job(batch_request, client, sleep_time=60)  # It will update progress every 60 seconds

Another option is to check which results have already been saved to the given S3 bucket.

When the job is running we can decide at any time to cancel it. Results that have already been produced will remain on the bucket.

[ ]:

client.stop_job(batch_request)

4. Alternative inputs/outputs

Apart from using the predefined tiling grids, one can also provide the grid via GeoPackage. Details on how the GPKG must be structured can be found here. To construct a suitable input for the request, you can use client.geopackage_input instead of client.tiling_grid_input.

The batch process also supports Zarr format output. To construct a suitable output field switch client.raster_output with client.zarr_output when creating the batch request