Bring Your Own COG

Sentinel Hub lets you access your own data stored in your S3 bucket through the Sentinel Hub API. Because the data remains in your bucket, you keep full control over it. No data replication is required, and you can use the full power of the Sentinel Hub service, including custom algorithms. More information here!

The Sentinel Hub Dashboard has a very user-friendly “Bring your own COG” tab. If you are not going to be creating collections, adding/updating collection tiles, etc. daily, the Dashboard tool is your friend. For the rest, this tutorial is a simple walk-through on creating, updating, listing, and deleting your BYOC collections through Python using sentinelhub-py.

Some general and BYOC related functionality imports:

[1]:
# Configure plots for inline use in Jupyter Notebook
%matplotlib inline

# Utilities
import boto3
import numpy as np
import datetime as dt
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Sentinel Hub
from sentinelhub import (
    SHConfig, DataCollection, Geometry, BBox, CRS,
    SentinelHubRequest, filter_times, bbox_to_dimensions, MimeType,
    SentinelHubBYOC, ByocCollection, ByocTile, ByocCollectionAdditionalData,
    DownloadFailedException
)

Prerequisites

The BYOC API requires a Sentinel Hub account. Please see the configuration instructions on how to set up your configuration.

[2]:
# Insert your credentials here in case you don't already have them in config.json file:
SH_CLIENT_ID = ''
SH_CLIENT_SECRET = ''
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''

config = SHConfig()

if SH_CLIENT_ID and SH_CLIENT_SECRET:
    config.sh_client_id = SH_CLIENT_ID
    config.sh_client_secret = SH_CLIENT_SECRET

if not config.sh_client_id or not config.sh_client_secret:
    print("Warning! To use Sentinel Hub BYOC API, please provide the credentials (client ID and client secret).")

if AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:
    config.aws_access_key_id = AWS_ACCESS_KEY_ID
    config.aws_secret_access_key = AWS_SECRET_ACCESS_KEY
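Hard-coding secrets in a notebook makes them easy to leak. As an alternative, here is a minimal sketch that reads them from environment variables. The helper `credentials_from_env` and the choice of variable names are assumptions made for illustration; sentinelhub-py does not define them.

```python
import os

# Hypothetical mapping from environment variable names to the SHConfig
# attributes used in the cell above.
ENV_TO_CONFIG_FIELD = {
    "SH_CLIENT_ID": "sh_client_id",
    "SH_CLIENT_SECRET": "sh_client_secret",
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
}


def credentials_from_env(environ=None):
    """Return {config attribute: value} for every non-empty variable found."""
    environ = os.environ if environ is None else environ
    return {
        field: environ[name]
        for name, field in ENV_TO_CONFIG_FIELD.items()
        if environ.get(name)
    }


# Possible usage with the `config` object created above:
# for field, value in credentials_from_env().items():
#     setattr(config, field, value)
```

Empty or missing variables are skipped, so values already stored in config.json are left untouched.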

BYOC collections

The SentinelHubBYOC class holds the methods for interacting with the Sentinel Hub BYOC service. Let’s initialize it with our config:

[3]:
# Initialize SentinelHubBYOC class
byoc = SentinelHubBYOC(config=config)

Create new collection

The easiest way to create a collection is to use its dataclass:

[4]:
new_collection = ByocCollection(name='new collection', s3_bucket='my-s3-bucket')

The new collection is accessible on the my-s3-bucket S3 bucket (please see how to configure your bucket for the Sentinel Hub service here).

The call to create the collection on Sentinel Hub returns the newly created collection, which gets its own collection id:

[5]:
created_collection = byoc.create_collection(new_collection)
[6]:
print('name:', created_collection['name'])
print('id:', created_collection['id'])
name: new collection
id: 7844c86f-abae-41be-a90c-c0df37f77225

Get a list of your collections

Now that we have created a data collection named new collection, we can retrieve it with the following code.

[7]:
my_collection = byoc.get_collection(created_collection['id'])

Let’s have a look at the collection we just created.

[8]:
my_collection
[8]:
{'id': '7844c86f-abae-41be-a90c-c0df37f77225',
 'userId': 'cb04dca1-c8f2-400e-8719-e24fced98fd6',
 'name': 'new collection',
 's3Bucket': 'my-s3-bucket',
 'created': '2021-07-07T10:16:36.216611Z'}

If you have a large number of collections and only want to load the info for a few of them, the following code is a good option:

[9]:
collections_iterator = byoc.iter_collections()
[10]:
my_collection_using_next = next(collections_iterator)
print('name:', my_collection_using_next['name'])
print('id:', my_collection_using_next['id'])
name: new collection
id: 7844c86f-abae-41be-a90c-c0df37f77225

Note: collections_iterator won’t necessarily return collections in the same order they were created. If you already have collections, the output above could show a collection other than the one we just created.
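If you only need the first few items, a sketch with `itertools.islice` avoids calling `next()` by hand and does not raise when fewer items exist. The helper name `take` is hypothetical, and the generator below is only a stand-in for `byoc.iter_collections()`.

```python
from itertools import islice


def take(iterable, count):
    """Return at most `count` items from `iterable` as a list."""
    return list(islice(iterable, count))


def fake_collections():
    """Stand-in for byoc.iter_collections(); yields collection-like dicts."""
    for index in range(5):
        yield {"id": f"id-{index}", "name": f"collection {index}"}


first_two = take(fake_collections(), 2)
print([collection["name"] for collection in first_two])
```

With the real service, the same helper could be called as `take(byoc.iter_collections(), 2)`.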

If you prefer to work with dataclasses, you can also run the following code:

my_collection = ByocCollection.from_dict(next(collections_iterator))

One can of course retrieve all of them in one go like so:

[11]:
my_collections = list(collections_iterator)

for collection in my_collections:
    print((collection['name'], collection['id']))
('new collection', '7844c86f-abae-41be-a90c-c0df37f77225')
('test', 'd46c3275-a0d6-47e7-9bcc-8e12dbf452a9')
('byoc-tutorial', 'e567ae7e-7981-49a4-8f37-eec793577d5d')

A convenient way to manage your collections is a pandas.DataFrame, which you can create like so:

[12]:
my_collections_df = pd.DataFrame(data=list(byoc.iter_collections()))
my_collections_df[['id','name','created']].head()
[12]:
   id                                    name            created
0  7844c86f-abae-41be-a90c-c0df37f77225  new collection  2021-07-07T10:16:36.216611Z
1  d46c3275-a0d6-47e7-9bcc-8e12dbf452a9  test            2021-07-01T10:06:21.165747Z
2  e567ae7e-7981-49a4-8f37-eec793577d5d  byoc-tutorial   2021-06-30T12:41:59.139282Z

Update existing collection

Anything you can do in the Dashboard’s “Bring your own COG” tab can also be done programmatically. Below we rename the collection new collection to renamed new collection:

[13]:
my_collection['name'] = 'renamed new collection'

When using next(), run the following code:

my_collection['name'] = 'renamed new collection'

When using dataclass, run the following code:

my_collection.name = 'renamed new collection'

When using list, run the following code:

collection_to_be_updated = [
    collection for collection in my_collections
    if collection['id'] == my_collection['id']
][0]
collection_to_be_updated['name'] = 'renamed new collection'
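Indexing the filtered list with `[0]` raises an IndexError when no collection matches. A sketch using `next()` with a default makes the missing case explicit. `find_by_id` is a hypothetical helper, and the sample dictionaries only mimic the shape of the collection info shown earlier.

```python
def find_by_id(items, item_id):
    """Return the first dict whose 'id' equals `item_id`, or None if absent."""
    return next((item for item in items if item["id"] == item_id), None)


# Sample data shaped like the collection info returned by the service.
sample_collections = [
    {"id": "7844c86f-abae-41be-a90c-c0df37f77225", "name": "new collection"},
    {"id": "d46c3275-a0d6-47e7-9bcc-8e12dbf452a9", "name": "test"},
]

match = find_by_id(sample_collections, "d46c3275-a0d6-47e7-9bcc-8e12dbf452a9")
if match is not None:
    # The dict is the same object as in the list, so the list sees the change.
    match["name"] = "renamed test"
```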

Note: While you can change other fields as well, s3_bucket cannot be changed, and the bitDepth of bands in the collection is something that is pertinent to the COGs themselves and populated during the ingestion.

To update the collection, call:

[14]:
byoc.update_collection(my_collection)
[14]:
''

Now we can see that the collection new collection has been renamed to renamed new collection.

[15]:
get_renamed_collection = byoc.get_collection(created_collection['id'])
print("name:", get_renamed_collection['name'])
print("id:", get_renamed_collection['id'])
name: renamed new collection
id: 7844c86f-abae-41be-a90c-c0df37f77225

Delete collection

If you are the owner of the collection, you can also delete it.

Warning:

Beware! Deleting the collection will also delete all its tiles!

[16]:
byoc.delete_collection(my_collection)
[16]:
''

The collection can also be deleted by passing its id to byoc.delete_collection(), as shown below:

byoc.delete_collection(my_collection['id'])

Trying to access this collection now will fail.

[17]:
try:
    deleted_collection = byoc.get_collection(my_collection['id'])
except DownloadFailedException as e:
    print(e)

BYOC tiles (COGs in the collection)

Your data needs to be organized into collections of tiles. Each tile needs to contain a set of bands and (optionally) an acquisition date and time. Tiles with the same bands can be grouped into collections. Think of the Sentinel-2 data source as a collection of Sentinel-2 tiles.

Tiles have to be stored in an S3 bucket and need to be in COG format. We will not go into details about the COGification process; users can have a look at the documentation or use the BYOC tool that will take care of creating a collection and ingesting the tiles for you.

Creating a new tile (and ingesting it to collection)

When we create a new tile and add it to the collection, the ingestion process runs on the Sentinel Hub side. It checks that the tile meets the COG specifications and that it conforms to the collection.

The simplest way to create a new tile is by using the ByocTile dataclass, which will complain if the required fields are missing.

[18]:
new_tile = ByocTile(
    path='2019/11/27/28V/(BAND).tif',
    sensing_time=dt.datetime(2019, 11, 27)
)

Note:

  • The most important field of the tile is its path on an s3 bucket. For example, if your band files are stored in s3://bucket-name/folder/, then set folder as the tile path. In this case, the band names will equal the file names. For example, the band B1 corresponds to the file s3://bucket-name/folder/B1.tiff. If your file names have something other than just the band name, such as a prefix, this is fine as long as the prefix is the same for all files. In this case, the path needs to include this prefix and also the band placeholder: (BAND). Adding the extension is optional. For example, this is what would happen if you would use the following path folder/tile_1_(BAND)_2019.tiff for the following files:
      • s3://bucket-name/folder/tile_1_B1_2019.tiff - the file would be used, the band name would be B1
      • s3://bucket-name/folder/tile_1_B2_2019.tiff - the file would be used, the band name would be B2
      • s3://bucket-name/folder/tile_2_B1_2019.tiff - the file would not be used
      • s3://bucket-name/folder/tile_2_B2_2019.tiff - the file would not be used
  • ByocTile takes sensing_time as an optional parameter, but setting it is highly recommended since it makes the collection “temporal” and helps you search for the data with Sentinel Hub services.
  • tile_geometry is optional: it is the bounding box of the tile and will be read from the COG file.
  • cover_geometry is an optional parameter describing where the data lies within the bounding box; it can be useful for optimized search. For a good explanation of the coverGeometry please see docs.
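The path matching rule described in the note above can be sketched offline with a regular expression. This is not the service's code, only an illustration of the documented behavior. The helper `band_from_key` is hypothetical, keys are taken relative to the bucket, and the sketch assumes the files end in .tif or .tiff.

```python
import re

PLACEHOLDER = "(BAND)"


def band_from_key(tile_path, key):
    """Return the band name if `key` belongs to the tile path, else None."""
    if PLACEHOLDER not in tile_path:
        # A plain folder path: each file name in the folder is a band name.
        tile_path = tile_path.rstrip("/") + "/" + PLACEHOLDER
    prefix, _, suffix = tile_path.partition(PLACEHOLDER)
    pattern = re.escape(prefix) + r"(?P<band>[^/.]+)" + re.escape(suffix)
    if not suffix.lower().endswith((".tif", ".tiff")):
        # The extension may be left out of the tile path (assumed .tif/.tiff).
        pattern += r"\.tiff?"
    match = re.fullmatch(pattern, key)
    return match.group("band") if match else None


keys = [
    "folder/tile_1_B1_2019.tiff",
    "folder/tile_1_B2_2019.tiff",
    "folder/tile_2_B1_2019.tiff",
    "folder/tile_2_B2_2019.tiff",
]
for key in keys:
    print(key, "->", band_from_key("folder/tile_1_(BAND)_2019.tiff", key))
```

Running such a check against your own object keys before creating a tile can help spot a mistyped prefix early, although only the ingestion itself is authoritative.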

Let’s create a new collection for these tiles.

[19]:
new_collection = ByocCollection(name='byoc-s2l2a-120m-mosaic', s3_bucket='sentinel-s2-l2a-mosaic-120')
created_collection = byoc.create_collection(new_collection)
[20]:
created_tile = byoc.create_tile(created_collection, new_tile)

The response from byoc.create_tile has a valid id, and its status is set to WAITING. Checking the tile status after a while (by requesting this tile) will tell you if it has been INGESTED or if the ingestion procedure FAILED. In case of failure, additional information (with the cause of failure) will be available in the tile additional_data.

[21]:
created_tile
[21]:
{'id': '6ff44606-54fd-40ce-afd7-2b4a7636845b',
 'created': '2021-07-07T10:16:44.000',
 'sensingTime': '2019-11-27T00:00:00',
 'path': '2019/11/27/28V/(BAND).tif',
 'status': 'WAITING'}

Add multiple tiles to a single collection

A data collection can of course contain multiple tiles. Note that adding multiple tiles works only if all of them have the same bands. Let’s add more tiles from the Sentinel-2 L2A 120m Mosaic listed on the open data registry on AWS to the collection.

We first define a function to get a list of paths for each tile:

[22]:
def list_objects_path(bucket, year_count, month_count, day_count, config):
    """List tile prefixes (year/month/day/tile/) for the first few years, months and days."""
    client = boto3.client(
        's3',
        aws_access_key_id=config.aws_access_key_id,
        aws_secret_access_key=config.aws_secret_access_key
    )

    def list_prefixes(prefix=''):
        # With Delimiter='/', S3 reports the "sub-folders" of a prefix as CommonPrefixes.
        # The key is missing when there are none, so fall back to an empty list.
        # Note: a single list_objects call returns at most 1000 entries.
        result = client.list_objects(Bucket=bucket, Delimiter='/', Prefix=prefix)
        return [item['Prefix'] for item in result.get('CommonPrefixes', [])]

    tiles_path = []
    for year in list_prefixes()[:year_count]:
        for month in list_prefixes(year)[:month_count]:
            for day in list_prefixes(month)[:day_count]:
                tiles_path.extend(list_prefixes(day))
    return tiles_path

Next we obtain a list of paths for tiles available on s3://sentinel-s2-l2a-mosaic-120/2019/1/1/.

[23]:
tiles_path = list_objects_path(
    bucket='sentinel-s2-l2a-mosaic-120',
    year_count=1,
    month_count=1,
    day_count=1,
    config=config
)

Then we can add tiles to the collection with a for loop.

[24]:
for tile in tiles_path:
    year, month, day = tile.split('/')[:3]
    byoc_tile = ByocTile(
        path=f'{tile}(BAND).tif',
        sensing_time=dt.datetime(int(year), int(month), int(day))
    )
    byoc.create_tile(created_collection, byoc_tile)

Note: Tile ingestion can take some time. Please wait a few minutes after the cell has finished running before heading to the next step.
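Instead of waiting a fixed amount of time, one could poll the tile status until it leaves WAITING. The sketch below assumes INGESTED and FAILED are the only final statuses (the ones named in this tutorial). `wait_for_ingestion` is a hypothetical helper, and the demonstration uses canned responses rather than the real API.

```python
import time

FINAL_STATUSES = {"INGESTED", "FAILED"}  # assumption: no other final statuses


def wait_for_ingestion(fetch_tile, timeout=600.0, interval=10.0, sleep=time.sleep):
    """Call `fetch_tile()` repeatedly until the tile reaches a final status.

    `fetch_tile` is any zero-argument callable returning the tile info dict,
    for example `lambda: byoc.get_tile(collection=collection_id, tile=tile_id)`.
    """
    waited = 0.0
    while True:
        tile = fetch_tile()
        if tile["status"] in FINAL_STATUSES:
            return tile
        if waited >= timeout:
            raise TimeoutError(
                f"Tile {tile.get('id')} still {tile['status']} after {waited:.0f} s"
            )
        sleep(interval)
        waited += interval


# Offline demonstration: two WAITING answers, then INGESTED.
responses = iter([
    {"id": "demo", "status": "WAITING"},
    {"id": "demo", "status": "WAITING"},
    {"id": "demo", "status": "INGESTED"},
])
result = wait_for_ingestion(lambda: next(responses), interval=0.0)
print(result["status"])
```

Passing `sleep` as a parameter keeps the helper easy to test; in a notebook the default `time.sleep` is what you would normally want.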

Get tiles from your collection

After byoc.create_tile has been executed, we can request the tile from the collection into which it was ingested. To request a specific tile, pass a collection id and a tile id to the get_tile method.

[25]:
tile = byoc.get_tile(collection=created_collection['id'], tile=created_tile['id'])
[26]:
tile
[26]:
{'id': '6ff44606-54fd-40ce-afd7-2b4a7636845b',
 'created': '2021-07-07T10:16:44.000',
 'sensingTime': '2019-11-27T00:00:00.000',
 'coverGeometry': {'type': 'MultiPolygon',
  'crs': {'type': 'name',
   'properties': {'name': 'urn:ogc:def:crs:EPSG::32628'}},
  'coordinates': [[[[299999.9980359557, 6899880.000729979],
     [600120.0000155062, 6899880.00067825],
     [600120.0000231846, 7100040.000720726],
     [299999.9970628682, 7100040.000814738],
     [299999.9980359557, 6899880.000729979]]]]},
 'tileGeometry': {'type': 'Polygon',
  'crs': {'type': 'name',
   'properties': {'name': 'urn:ogc:def:crs:EPSG::32628'}},
  'coordinates': [[[300000.0, 7100040.0],
    [600120.0, 7100040.0],
    [600120.0, 6899880.0],
    [300000.0, 6899880.0],
    [300000.0, 7100040.0]]]},
 'path': '2019/11/27/28V/(BAND).tif',
 'status': 'INGESTED',
 'additionalData': {'bandHeaderSizes': {'B02': 1328,
   'B03': 1328,
   'B04': 1328,
   'B08': 1328,
   'B11': 1328,
   'B12': 1328,
   'dataMask': 1328},
  'minMetersPerPixel': 120.0,
  'maxMetersPerPixel': 960.0}}

You can of course retrieve all tiles into a list.

[27]:
tiles = list(byoc.iter_tiles(created_collection))

In cases where you have a large collection with a lot of tiles and you would only like to load tile info for a few tiles, the following code using next() would be a good option for you:

tile = next(byoc.iter_tiles(created_collection))

To convert it to the ByocTile dataclass, use the code below:

tile = ByocTile.from_dict(next(byoc.iter_tiles(created_collection)))

Let’s take a look at the keys of the first dictionary, which contains the info of the first tile, in the returned list.

[28]:
list(tiles[0].keys())
[28]:
['id',
 'created',
 'sensingTime',
 'coverGeometry',
 'tileGeometry',
 'path',
 'status',
 'additionalData']

To check whether any tiles failed to be ingested, run the code below:

[29]:
tiles_failed_to_be_ingested = [tile['path'] for tile in tiles if tile['status'] == 'FAILED']
tiles_failed_to_be_ingested
[29]:
[]
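For larger collections it can help to see how many tiles are in each state at once. The following is a small sketch using `collections.Counter`. The helper `summarize_statuses` is hypothetical and the sample tiles are made up to show the shape of the data, including an empty additionalData entry.

```python
from collections import Counter


def summarize_statuses(tiles):
    """Count tiles per status and collect the info dicts of failed tiles."""
    counts = Counter(tile["status"] for tile in tiles)
    failed = [tile for tile in tiles if tile["status"] == "FAILED"]
    return counts, failed


# Made-up tile info dicts, for illustration only.
sample_tiles = [
    {"path": "2019/1/1/57L/(BAND).tif", "status": "INGESTED"},
    {"path": "2019/1/1/17P/(BAND).tif", "status": "WAITING"},
    {"path": "2019/1/1/19V/(BAND).tif", "status": "FAILED", "additionalData": {}},
]

counts, failed = summarize_statuses(sample_tiles)
print(dict(counts))
for tile in failed:
    # The cause of a failure is reported in the tile's additional data.
    print(tile["path"], tile.get("additionalData"))
```

With real data, `summarize_statuses(list(byoc.iter_tiles(created_collection)))` would give the same kind of overview.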

Visualize the tiles in your collection

Using ByocTile dataclass, which will properly parse tile geometries, date-time strings, etc., one can create a geopandas.GeoDataFrame.

Note: the geometries can be in different coordinate reference systems, so a transform to a common CRS might be needed.
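To see whether a transform is needed at all, one can inspect the CRS URN stored with each geometry in the raw tile dictionaries. The sketch below is offline, and the helper names are hypothetical. The first sample mirrors the CRS entry in the tile output shown earlier, while the second EPSG code is made up for contrast.

```python
def epsg_code(geometry):
    """Extract the integer EPSG code from a geometry dict's CRS URN."""
    urn = geometry["crs"]["properties"]["name"]  # e.g. 'urn:ogc:def:crs:EPSG::32628'
    return int(urn.rsplit(":", 1)[-1])


def distinct_epsg_codes(tiles, key="coverGeometry"):
    """Return the sorted EPSG codes used by the given tile info dicts."""
    return sorted({epsg_code(tile[key]) for tile in tiles if key in tile})


# Minimal stand-ins for tile info dicts; only the CRS part is filled in.
sample_tiles = [
    {"coverGeometry": {"crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::32628"}}}},
    {"coverGeometry": {"crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::32657"}}}},
]

codes = distinct_epsg_codes(sample_tiles)
print(codes)
```

More than one code in the result means the geometries have to be brought into a common CRS before plotting them together.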

[30]:
tile_iterator = byoc.iter_tiles(created_collection)
[31]:
# Take at most 100 tiles; zip stops early if the collection holds fewer
tiles_for_visualized = [
    ByocTile.from_dict(tile_info) for _, tile_info in zip(range(100), tile_iterator)
]

tiles_gdf = gpd.GeoDataFrame(
    tiles_for_visualized,
    geometry=[t.cover_geometry.transform(CRS.WGS84).geometry for t in tiles_for_visualized],
    crs='epsg:4326'
)
[32]:
tiles_gdf.head()
[32]:
   path                     other_data  status    tile_id                               tile_geometry                                      cover_geometry                                     created              sensing_time  additional_data                                    geometry
0  2019/1/1/57L/(BAND).tif  {}          INGESTED  00416730-13e6-4721-83d8-ae7c5db654b1  Geometry(POLYGON ((300000 9100000, 900120 9100...  Geometry(MULTIPOLYGON (((299999.9999987644 869...  2021-07-07 10:18:19  2019-01-01    {'bandHeaderSizes': {'B02': 1662, 'B03': 1662,...  MULTIPOLYGON (((157.16470 -11.75468, 162.66974...
1  2019/1/1/17P/(BAND).tif  {}          INGESTED  0103d09e-1c8b-44a7-a929-987fddc7848f  Geometry(POLYGON ((199980 1800000, 900060 1800...  Geometry(MULTIPOLYGON (((199979.9999914575 899...  2021-07-07 10:17:28  2019-01-01    {'bandHeaderSizes': {'B02': 1950, 'B03': 1950,...  MULTIPOLYGON (((-83.72242 8.13182, -77.37090 8...
2  2019/1/1/19V/(BAND).tif  {}          INGESTED  010bb880-8263-4053-a257-d49642e31c82  Geometry(POLYGON ((300000 7100040, 700080 7100...  Geometry(MULTIPOLYGON (((299999.9994199659 619...  2021-07-07 10:17:32  2019-01-01    {'bandHeaderSizes': {'B02': 1774, 'B03': 1774,...  MULTIPOLYGON (((-72.19940 55.90309, -65.79932 ...
3  2019/1/1/11V/(BAND).tif  {}          INGESTED  0187f59d-9967-406b-80cb-4e1029ead2f4  Geometry(POLYGON ((300000 7100040, 700080 7100...  Geometry(MULTIPOLYGON (((299999.9994199659 619...  2021-07-07 10:17:22  2019-01-01    {'bandHeaderSizes': {'B02': 1774, 'B03': 1774,...  MULTIPOLYGON (((-120.19940 55.90309, -113.7993...
4  2019/1/1/46U/(BAND).tif  {}          INGESTED  02be7c7f-b047-4c01-8456-d63f8555b10c  Geometry(POLYGON ((300000 6200040, 800040 6200...  Geometry(MULTIPOLYGON (((299999.9998340536 529...  2021-07-07 10:18:04  2019-01-01    {'bandHeaderSizes': {'B02': 1886, 'B03': 1886,...  MULTIPOLYGON (((90.32798 47.82152, 97.00575 47...
[33]:
fig, ax = plt.subplots(figsize=(17,8))
tiles_gdf.plot(ax=ax);
[Figure: plot of the cover geometries of the tiles in tiles_gdf]

In the above example, the ingested tiles are 100 tiles from the Sentinel-2 L2A 120m Mosaic, which contains 19869 tiles around the globe; this is why the tiles appear so sparse in the image above.

Updating and deleting a tile

Updating and deleting a tile follow the same logic as updating/deleting a collection.

  • To update a tile:
[34]:
tile['sensingTime'] = '2021-06-29T18:02:34'
byoc.update_tile(created_collection, tile)
[34]:
''

After updating we can see that the sensingTime has been changed.

[35]:
byoc.get_tile(collection=created_collection['id'], tile=created_tile['id'])['sensingTime']
[35]:
'2021-06-29T18:02:34.000'
  • To delete a tile:
[36]:
byoc.delete_tile(created_collection, tile)
[36]:
''

Now the tile is gone forever.

[37]:
tiles = list(byoc.iter_tiles(created_collection))
[tile for tile in tiles if tile['id'] == created_tile['id']]
[37]:
[]

Retrieve data from collection

Once we have created a collection and its tiles have been ingested, we can retrieve data from that collection. We will be using the Process API for this.

[38]:
data_collection = DataCollection.define_byoc(created_collection['id'])

Alternatively using dataclass:

data_collection = my_collection_dataclass.to_data_collection()
[39]:
tile_time = dt.datetime.fromisoformat(tiles[0]['sensingTime'])

If using dataclass run:

tile_time = tile_dataclass.sensing_time

Below we’re going to request a false color image of the Caspian Sea.

[40]:
caspian_sea_bbox = BBox([49.9604, 44.7176, 51.0481, 45.2324], crs=CRS.WGS84)
[41]:
false_color_evalscript = """
//VERSION=3
function setup() {
  return {
    input: ["B08","B04","B03", "dataMask"],
    output: { bands: 4 },
  };
}

var f = 2.5/10000;
function evaluatePixel(sample) {
  return [f*sample.B08, f*sample.B04, f*sample.B03, sample.dataMask];
}
"""

request = SentinelHubRequest(
    evalscript=false_color_evalscript,
    input_data=[
        SentinelHubRequest.input_data(
            data_collection=data_collection,
            time_interval=tile_time
        )
    ],
    responses=[
        SentinelHubRequest.output_response('default', MimeType.PNG)
    ],
    bbox=caspian_sea_bbox,
    size=bbox_to_dimensions(caspian_sea_bbox, 100),
    config=config
)
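To make the evalscript above easier to follow, here is a plain-Python mirror of its `evaluatePixel` function for a single pixel. It is only an illustration of the arithmetic, assuming the bands hold reflectance scaled by 10000. The sample values are made up, and the service itself runs the JavaScript version.

```python
FACTOR = 2.5 / 10000  # same gain as `f` in the evalscript


def evaluate_pixel(sample):
    """Python mirror of the evalscript's evaluatePixel for one pixel."""
    return [
        FACTOR * sample["B08"],  # near infrared, displayed as red
        FACTOR * sample["B04"],  # red, displayed as green
        FACTOR * sample["B03"],  # green, displayed as blue
        sample["dataMask"],      # 1 where data exists, 0 elsewhere (fourth output band)
    ]


# Made-up band values for one pixel, for illustration only.
print(evaluate_pixel({"B08": 3200, "B04": 800, "B03": 1000, "dataMask": 1}))
```

Because the near-infrared band is mapped to the red channel, vegetation, which reflects strongly in B08, shows up in red tones in the resulting image.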
[42]:
data = request.get_data()[0]
[43]:
fig, ax = plt.subplots(figsize=(15, 10))

ax.imshow(data)
ax.set_title(tile_time.date().isoformat(), fontsize=10)

plt.tight_layout()
[Figure: false color image of the Caspian Sea, titled with the tile date]