Sharing Sentinel Hub authentication session

Most Sentinel Hub services require users to authenticate with Sentinel Hub OAuth client credentials. Each authentication creates a new authentication session that typically lasts 1 hour and cannot be cancelled. Users are strongly encouraged to configure their processes so that there are never more than 100 active authentication sessions at any time.

This package automatically creates, caches, and reuses an authentication session within a single Python runtime process. Once the current session expires or the runtime process is restarted, it automatically creates a new session. This way most users don’t have to interact with session objects directly. However, once users parallelize their work over multiple Python processes, or even over a cluster of multiple compute instances, they should make sure that a single authentication session is shared between all of them.

Note:

Let’s say a user splits their area of interest into a large number of smaller bounding boxes. Then they parallelize data download from Sentinel Hub over these bounding boxes using a few different Python processes. It depends on the type of parallelization, but without implementing a session sharing mechanism such a procedure could try to create a new authentication session for each small bounding box. In extreme cases that could fail because an excessive number of authentication requests might be rate-limited by Sentinel Hub.

This tutorial will explain the basic mechanism of working with sessions in sentinelhub-py and how to implement a session transfer for different types of parallelization.

For the purposes of the tutorial only, let’s first configure a logging handler that will tell us exactly when a new session has been created:

[1]:
import logging
import sys
from logging import Formatter, StreamHandler

handler = StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)

formatter = Formatter("%(asctime)s: %(message)s")
handler.setFormatter(formatter)

session_logger = logging.getLogger("sentinelhub.download.session")
session_logger.setLevel(logging.DEBUG)
session_logger.addHandler(handler)

Session mechanism

In this chapter we’ll see how a session is created, how to extract a session token from a download client or a session object, and how to set it back into the client.

The tutorial requires a Sentinel Hub account. Please check the configuration instructions on how to set up your Sentinel Hub credentials.

[2]:
from sentinelhub import SentinelHubDownloadClient, SentinelHubSession, SHConfig, __version__

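# Compare numerically: as plain strings, "3.10.0" would sort below "3.6.0".
major_minor = tuple(int(part) for part in __version__.split(".")[:2])
assert major_minor >= (3, 6), "The minimal required package version for this tutorial is 3.6.0"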

config = SHConfig()
# config.sh_client_id = ""
# config.sh_client_secret = ""

if not config.sh_client_id or not config.sh_client_secret:
    print("Please provide the credentials (OAuth client ID and client secret).")

# The following endpoint can be accessed only if a user is authenticated:
EXAMPLE_URL = "https://services.sentinel-hub.com/oauth/tokeninfo"

By default, an instance of SentinelHubDownloadClient creates a Sentinel Hub authentication session automatically from the configured credentials the first time a request that requires a session is made.

[3]:
client = SentinelHubDownloadClient(config=config)

# At this point a session is created:
_ = client.get_json(EXAMPLE_URL, use_session=True)
2022-05-25 18:58:23,761: Creating a new authentication session with Sentinel Hub service

Notice that running the above cell multiple times will create a new session only the first time. This is because the session is cached in the SentinelHubDownloadClient class itself and not on any particular instance of this class.

A new session would be created if, before running the above cell, you did any of the following:

  • restart the Jupyter notebook’s kernel,

  • change OAuth credentials in the config object that is used to initialize the client,

  • wait until the current session expires (about 1 hour),

  • run SentinelHubDownloadClient.clear_cache().
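
The caching behaviour described above can be pictured with a small toy model. The sketch below is not the sentinelhub-py implementation: ToyClient, ToySession, and the credential-keyed dictionary are hypothetical names, used only to illustrate why a session stored on the class is visible to every instance and why changing credentials or clearing the cache leads to a new session. Session expiry is deliberately left out.

```python
import itertools
from typing import ClassVar, Dict, Tuple


class ToySession:
    """Stand-in for an authentication session; each instance gets a new id."""

    _ids = itertools.count(1)

    def __init__(self) -> None:
        self.session_id = next(ToySession._ids)


class ToyClient:
    """Toy client that caches sessions on the class, not on instances."""

    _session_cache: ClassVar[Dict[Tuple[str, str], ToySession]] = {}

    def __init__(self, client_id: str, client_secret: str) -> None:
        self._key = (client_id, client_secret)

    def get_session(self) -> ToySession:
        # Reuse a cached session for these credentials, or create one.
        if self._key not in ToyClient._session_cache:
            ToyClient._session_cache[self._key] = ToySession()
        return ToyClient._session_cache[self._key]

    @classmethod
    def clear_cache(cls) -> None:
        cls._session_cache.clear()


first = ToyClient("id-a", "secret-a").get_session()
second = ToyClient("id-a", "secret-a").get_session()  # different instance, same session
other = ToyClient("id-b", "secret-b").get_session()  # different credentials, new session

ToyClient.clear_cache()
after_clear = ToyClient("id-a", "secret-a").get_session()  # cache cleared, new session
```

In this model, first and second are the same object, while other and after_clear are distinct sessions.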

From the client we can obtain the session object which contains the authentication session token.

[4]:
session = client.get_session()
print(session)

token = session.token

# To avoid showing the whole token in this tutorial the following will partially hide it:
token = token.copy()
token["access_token"] = token["access_token"][:3] + "..."

token
<sentinelhub.download.session.SentinelHubSession object at 0x7f48640cba30>
[4]:
{'access_token': 'eyJ...', 'expires_in': 3599, 'expires_at': 1653501502.923365}

Alternatively, a session object can also be initialized directly. This way we can even configure how soon before expiry a token will be refreshed:

[5]:
session = SentinelHubSession(
    config=config,
    refresh_before_expiry=120,  # This is also the default value
)

# If a token would be extracted 120 seconds before expiry or later, the session
# object would automatically authenticate again.
token = session.token
2022-05-25 18:58:24,149: Creating a new authentication session with Sentinel Hub service
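
The refresh rule controlled by refresh_before_expiry amounts to a comparison of timestamps. The following is a minimal sketch of that rule, not the library's own code: needs_refresh is a hypothetical helper, and the token values are the ones printed earlier in this tutorial.

```python
from typing import Any, Dict


def needs_refresh(token: Dict[str, Any], now: float, refresh_before_expiry: float = 120) -> bool:
    """Return True once `now` is within `refresh_before_expiry` seconds of expiry."""
    return now >= token["expires_at"] - refresh_before_expiry


# Token fields as printed earlier in this tutorial (access token shortened).
example_token = {"access_token": "eyJ...", "expires_in": 3599, "expires_at": 1653501502.923365}

issued_at = example_token["expires_at"] - example_token["expires_in"]

fresh = needs_refresh(example_token, now=issued_at)  # token was just issued
early = needs_refresh(example_token, now=example_token["expires_at"] - 121)  # 121 s left
late = needs_refresh(example_token, now=example_token["expires_at"] - 119)  # 119 s left
```

Only the last call falls inside the 120-second window, so only late evaluates to True.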

A session object can at any point be cached on the client:

[6]:
client.cache_session(session)

cached_session = client.get_session()
cached_session is session
[6]:
True

A new session object can also be initialized from a token. However, such a session object is non-refreshing by default: once the token expires, it will not authenticate again.

[7]:
session = SentinelHubSession.from_token(token)

session
[7]:
<sentinelhub.download.session.SentinelHubSession at 0x7f4831ee2a60>

Session sharing

Different types of parallelization provide different support for memory sharing between processes. However, the following would be the most general description of the session sharing procedure:

  1. Create a single authentication session.

  2. Start a separate thread that is continuously running the following 2-step procedure:

    • extract a session token and send it to a shared memory space,

    • wait until the current session token is close to expiring and the token will have to be refreshed.

  3. Start parallelization with multiple Python processes.

  4. Each process should read the token from the shared memory space and cache it into its client object every time before it starts interacting with Sentinel Hub service.


Note that an alternative solution, in which a session object is given to each Python process only at the start of parallelization, works only if the entire parallelization finishes before the token expires.
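
Step 2 of the procedure, the publishing thread, can be sketched with the standard library alone. This is an illustration under simplifying assumptions rather than the library's implementation: TokenPublisher and fake_authenticate are hypothetical names, and the "shared memory space" is a plain dictionary, which in a real multi-process setup would be replaced by shared memory, a file, or an actor.

```python
import threading
import time
from typing import Any, Callable, Dict

Token = Dict[str, Any]


class TokenPublisher(threading.Thread):
    """Publishes a token to a shared store and sleeps until it needs a refresh."""

    def __init__(self, get_token: Callable[[], Token], store: Dict[str, Token], refresh_before_expiry: float) -> None:
        super().__init__(daemon=True)
        self.get_token = get_token
        self.store = store
        self.refresh_before_expiry = refresh_before_expiry
        self.first_publish = threading.Event()
        self._stop_event = threading.Event()

    def run(self) -> None:
        while not self._stop_event.is_set():
            # Step 1: extract a token and publish it.
            token = self.get_token()
            self.store["token"] = token
            self.first_publish.set()

            # Step 2: wait until shortly before the token expires (or until stopped).
            sleep_time = token["expires_at"] - self.refresh_before_expiry - time.time()
            self._stop_event.wait(timeout=max(sleep_time, 0))

    def stop(self) -> None:
        self._stop_event.set()
        self.join()


def fake_authenticate() -> Token:
    """Hypothetical stand-in for reading `session.token`."""
    return {"access_token": "abc", "expires_in": 3600, "expires_at": time.time() + 3600}


shared_store: Dict[str, Token] = {}
publisher = TokenPublisher(fake_authenticate, shared_store, refresh_before_expiry=300)
publisher.start()
publisher.first_publish.wait(timeout=5)

published_token = shared_store["token"]  # what a worker would read before each request
publisher.stop()
```

Waiting on an event instead of calling time.sleep lets the thread be stopped immediately when the parallelization finishes.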

The following code creates and serializes an authentication token:

[8]:
import json

session = SentinelHubSession(config=config, refresh_before_expiry=300)

serialized_token = json.dumps(session.token)
2022-05-25 18:58:24,337: Creating a new authentication session with Sentinel Hub service

The following code deserializes the authentication token and caches it in the download client of another Python process:

[9]:
token = json.loads(serialized_token)
session = SentinelHubSession.from_token(token)

SentinelHubDownloadClient.cache_session(
    session,
    # If this Python process does not use any OAuth credentials, pass
    # the following parameter:
    # universal=True
)
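
How the serialized token travels between processes depends on the setup. As one possibility, the sketch below passes it through a file, assuming all processes can reach a common filesystem path. The helpers write_token and read_token and the file name sh_token.json are hypothetical. The write goes to a temporary file first and is then renamed into place, so that a reader on the same local filesystem does not pick up a half-written token; network filesystems may offer weaker guarantees. Since the token is a secret, the file should be readable only by the user who runs the processes.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_token(token: Dict[str, Any], path: str) -> None:
    """Write the token to a temporary file, then move it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp_file:
        json.dump(token, tmp_file)
    os.replace(tmp_path, path)  # replaces any previous token file


def read_token(path: str) -> Dict[str, Any]:
    """Read the most recently published token."""
    with open(path) as token_file:
        return json.load(token_file)


with tempfile.TemporaryDirectory() as folder:
    token_path = os.path.join(folder, "sh_token.json")
    original = {"access_token": "abc", "expires_in": 3599, "expires_at": 1653501502.923365}
    write_token(original, token_path)
    restored = read_token(token_path)
```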

Parallelization frameworks

In this chapter we’ll provide implementations for 3 commonly used parallelization frameworks in Python: standard Python multiprocessing, Ray, and Dask.

Standard Python multiprocessing

For parallelization with the Python Standard Library (modules multiprocessing or concurrent.futures), sentinelhub-py already provides utilities that implement the procedure described in the previous chapter. The authentication token is passed to other processes using multiprocessing.shared_memory functionality.

[10]:
from concurrent.futures import ProcessPoolExecutor

from sentinelhub.download import SessionSharing, collect_shared_session


def remote_function(url: str, config: SHConfig) -> None:
    """A function that will run on a worker process.

    It collects a shared session, caches it, and then interacts with Sentinel Hub service
    """
    session = collect_shared_session()
    SentinelHubDownloadClient.cache_session(session)

    client = SentinelHubDownloadClient(config=config)
    client.get_json(url, use_session=True)


# This will create a session that will be shared with all workers
session = SentinelHubSession(config)

# For the duration of "with" statement this will run a thread that will share the given Sentinel Hub session
with SessionSharing(session), ProcessPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(remote_function, EXAMPLE_URL, config) for _ in range(10)]
    for future in futures:
        future.result()
2022-05-25 18:58:24,545: Creating a new authentication session with Sentinel Hub service
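
To give an idea of what passing a token through multiprocessing.shared_memory can look like, here is a minimal standalone sketch. It is not how SessionSharing is implemented internally: the length-prefixed layout, the helper names, and the 4096-byte block size are assumptions made for illustration. For simplicity the block is written and read in the same process, and there is no synchronization between writer and reader.

```python
import json
from multiprocessing import shared_memory
from typing import Any, Dict

HEADER_SIZE = 4  # bytes used to store the payload length


def write_token(memory: shared_memory.SharedMemory, token: Dict[str, Any]) -> None:
    """Store a length-prefixed JSON payload in the shared memory block."""
    payload = json.dumps(token).encode("utf-8")
    if HEADER_SIZE + len(payload) > memory.size:
        raise ValueError("Token does not fit into the shared memory block")
    memory.buf[:HEADER_SIZE] = len(payload).to_bytes(HEADER_SIZE, "big")
    memory.buf[HEADER_SIZE : HEADER_SIZE + len(payload)] = payload


def read_token(memory_name: str) -> Dict[str, Any]:
    """Attach to an existing block by name and decode the token."""
    memory = shared_memory.SharedMemory(name=memory_name)
    try:
        length = int.from_bytes(bytes(memory.buf[:HEADER_SIZE]), "big")
        payload = bytes(memory.buf[HEADER_SIZE : HEADER_SIZE + length])
    finally:
        memory.close()
    return json.loads(payload.decode("utf-8"))


owner = shared_memory.SharedMemory(create=True, size=4096)
try:
    sent = {"access_token": "abc", "expires_in": 3599, "expires_at": 1653501502.923365}
    write_token(owner, sent)
    received = read_token(owner.name)  # a worker would call this with the agreed name
finally:
    owner.close()
    owner.unlink()
```

The length prefix is needed because the operating system may round the block up to a larger size than requested, so the reader cannot rely on the block size to find the end of the payload.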

Ray

Session sharing in the Ray framework is even easier because Ray provides shared mutable objects called Ray Actors. A session object can be placed into a Ray Actor and shared with all processes.

[11]:
import ray


@ray.remote
class RaySessionActor:
    """This object has a mutable state and will be accessed by multiple Ray workers
    in a consecutive way."""

    def __init__(self, session: SentinelHubSession):
        self.session = session

    def get_valid_session(self) -> SentinelHubSession:
        """The following makes sure that a token is still valid or refreshed, and returns it in a
        non-refreshing session object."""
        token = self.session.token
        return SentinelHubSession.from_token(token)


@ray.remote
def remote_function(url: str, config: SHConfig, actor: RaySessionActor) -> None:
    """A function that will run on a worker process.

    It collects a shared session, caches it, and then interacts with Sentinel Hub service
    """
    session = ray.get(actor.get_valid_session.remote())
    SentinelHubDownloadClient.cache_session(session)

    client = SentinelHubDownloadClient(config=config)
    client.get_json(url, use_session=True)


ray.init(ignore_reinit_error=True)

session = SentinelHubSession(config)
actor = RaySessionActor.remote(session)

futures = [remote_function.remote(EXAMPLE_URL, config, actor) for _ in range(10)]
ray.get(futures)

ray.shutdown()
2022-05-25 18:58:29,316 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
2022-05-25 18:58:30,986: Creating a new authentication session with Sentinel Hub service

An implementation of this process for downloading data from the Sentinel Hub Process API can be seen in the eo-grow framework, which in combination with eo-learn relies heavily on Ray for large-scale processing.
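
The idea behind the actor, a single owner of the token that serves requests one at a time and authenticates only when needed, can also be tried out without Ray. The following is a thread-based analogue built on the standard library, under the assumption that a lock is an acceptable stand-in for the actor's serialized method calls. TokenHolder and fake_authenticate are hypothetical names and no request is sent to Sentinel Hub.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Token = Dict[str, Any]


class TokenHolder:
    """Thread-based analogue of the session actor: one owner, serialized access."""

    def __init__(self, authenticate: Callable[[], Token], refresh_before_expiry: float = 120) -> None:
        self._authenticate = authenticate
        self._refresh_before_expiry = refresh_before_expiry
        self._token: Optional[Token] = None
        self._lock = threading.Lock()

    def get_valid_token(self) -> Token:
        # The lock plays the role of the actor's one-call-at-a-time execution.
        with self._lock:
            if self._token is None or time.time() >= self._token["expires_at"] - self._refresh_before_expiry:
                self._token = self._authenticate()
            return dict(self._token)  # hand out a copy, like a non-refreshing session


authentication_times: List[float] = []


def fake_authenticate() -> Token:
    """Hypothetical stand-in for a real authentication request."""
    authentication_times.append(time.time())
    return {"access_token": "abc", "expires_in": 3600, "expires_at": time.time() + 3600}


holder = TokenHolder(fake_authenticate)

with ThreadPoolExecutor(max_workers=3) as executor:
    tokens = list(executor.map(lambda _: holder.get_valid_token(), range(10)))
```

All ten tasks receive a token, but fake_authenticate runs only once, which mirrors the single "Creating a new authentication session" line in the output above.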

Dask

Similarly to Ray, Dask implements Dask Actors. We will again put a session object into an actor and let it be accessed and refreshed there.

[12]:
from dask.distributed import Client


class DaskSessionActor:
    """This object has a mutable state and will be accessed by multiple Dask workers
    in a consecutive way."""

    def __init__(self, session: SentinelHubSession):
        self.session = session

    def get_valid_session(self) -> SentinelHubSession:
        """The following makes sure that a token is still valid or refreshed, and returns it in a
        non-refreshing session object."""
        token = self.session.token
        return SentinelHubSession.from_token(token)


def remote_function(url: str, config: SHConfig, actor: DaskSessionActor) -> None:
    """A function that will run on a worker process.

    It collects a shared session, caches it, and then interacts with Sentinel Hub service
    """
    session = actor.get_valid_session().result()
    SentinelHubDownloadClient.cache_session(session)

    client = SentinelHubDownloadClient(config=config)
    client.get_json(url, use_session=True)


dask_client = Client()

session = SentinelHubSession(config)
actor = dask_client.submit(DaskSessionActor, session, actor=True).result()

futures = [dask_client.submit(remote_function, EXAMPLE_URL, config, actor) for _ in range(10)]
for future in futures:
    future.result()

dask_client.shutdown()
dask_client.close()
2022-05-25 18:58:37,566: Creating a new authentication session with Sentinel Hub service