Sentinel Hub Feature Info Service (FIS)

One of the tools in sentinelhub-py package is a wrapper around Sentinel Hub Feature (or Statistical) Info Service (FIS). The service is documented at Sentinel Hub webpage.

The main purpose of FIS service is to enable users obtaining statistics about satellite data without actually having to download large amounts of raster imagery. It is most effective when you have a sparse collection of bounding boxes or polygons spread over a large area and you would only like to know the aggregated values for each of them.

In order to use FIS it is required to have Sentinel Hub account and configured instance ID. For more about configuration details please check prerequisites of notebook about WMS and WCS.

[1]:
%matplotlib inline

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import cm
from shapely.geometry import Polygon

from sentinelhub import FisRequest, BBox, Geometry, CRS, WcsRequest, CustomUrlParam, \
    DataSource, HistogramType
from sentinelhub.time_utils import iso_to_datetime

Note that pandas, seaborn and matplotlib are not dependencies of sentinelhub-py, therefore you will have to install them manually.

[2]:
INSTANCE_ID = ''  # In case you put instance ID into configuration file you can leave this unchanged

Exploring basic statistics

In the first example we will explore Sahara desert. There was a news about a sandstorm in March 2018 reported at Earth Observatory webpage. By a quick search in EO Browser we can see the same sandstorm was also observed by Sentinel-2: link.

Let’s try to find out how to detect such sandstorm only by looking at some basic statistics of Sentinel-2 data. With FIS we can do that without even downloading any image!

First we set our area of interest - the same part of Sahara desert in Algeria, where the sandstorm should be. For time interval we will take a period of 3 months around that time.

[3]:
sahara_bbox = BBox((1.0, 26.0, 1.3, 25.7), CRS.WGS84)
time_interval = ('2018-02-01', '2018-05-01')

In order to obtain data from FIS service we first have to initialize FisRequest class. Just like WmsRequest or WcsRequest it belongs in the same family of DataRequest classes and therefore works in the same way.

We will set the following parameters:

  • layer - name of the layer defined in Sentinel Hub Configurator. In this case we will use the layer with all Sentinel-2 L1C bands,
  • geometry_list - list of geometry objects (BBox or Geometry), statistics will be calculated for each of them separately,
  • time - statistics will be calculated for each acquisition in the give time interval separately,
  • resolution - spatial resolution on which to calculate statistics
  • data_folder - optional parameter for specifying location where the data should be saved locally
[4]:
fis_request = FisRequest(layer='BANDS-S2-L1C',
                         geometry_list=[sahara_bbox],
                         time=time_interval,
                         resolution='60m',
                         data_folder='./data',
                         instance_id=INSTANCE_ID)

The request can be executed by calling the standard get_data method. This will concurrently make calls to FIS for each geometry in the list and collect responses.

[5]:
fis_data = fis_request.get_data(save_data=True)  # Takes about 30s, to avoid redownloading we are saving results

fis_data[0]['C0'][0]
[5]:
{'basicStats': {'max': 0.27570000290870667,
  'mean': 0.16790866606430574,
  'min': 0.13920000195503235,
  'stDev': 0.00859753310640891},
 'date': '2018-04-30'}

The obtained data is a list of FIS responses for each geometry. Each response is a dictionary with statistics for each channel separately. In our case that is 13 channels, 1 for each Sentinel-2 band. Statistics is then further divided per each acquisition.

Let’s parse this data into a pandas data frame.

[6]:
def fis_data_to_dataframe(fis_data):
    """ Creates a DataFrame from list of FIS responses
    """
    COLUMNS = ['channel', 'date', 'min', 'max', 'mean', 'stDev']
    data = []

    for fis_response in fis_data:
        for channel, channel_stats in fis_response.items():
            for stat in channel_stats:
                row = [int(channel[1:]), iso_to_datetime(stat['date'])]

                for column in COLUMNS[2:]:
                    row.append(stat['basicStats'][column])

                data.append(row)

    return pd.DataFrame(data, columns=COLUMNS).sort_values(['channel', 'date'])


df = fis_data_to_dataframe(fis_data)

df
[6]:
channel date min max mean stDev
233 0 2018-02-04 0.1715 0.2577 0.185660 0.005714
232 0 2018-02-09 0.1478 0.2760 0.169501 0.007592
231 0 2018-02-14 0.1165 0.3047 0.170285 0.013389
230 0 2018-02-19 0.1327 0.2658 0.170829 0.015263
229 0 2018-02-24 0.4789 1.0189 0.730869 0.105041
228 0 2018-03-01 0.1468 0.2692 0.166823 0.007059
227 0 2018-03-06 0.1546 0.2754 0.171981 0.007268
226 0 2018-03-11 0.1452 0.2743 0.164977 0.007713
225 0 2018-03-16 0.1129 0.7090 0.282133 0.128851
224 0 2018-03-21 0.2313 0.2595 0.243259 0.002606
223 0 2018-03-26 0.1531 0.2687 0.170627 0.007189
222 0 2018-03-31 0.1621 0.2456 0.174637 0.005697
221 0 2018-04-05 0.1543 0.3441 0.173237 0.009972
220 0 2018-04-10 0.1951 0.9409 0.551534 0.177150
219 0 2018-04-15 0.1593 0.2687 0.174058 0.006738
218 0 2018-04-20 0.1538 0.2712 0.170792 0.007476
217 0 2018-04-25 0.1150 0.4022 0.174694 0.011503
216 0 2018-04-30 0.1392 0.2757 0.167909 0.008598
197 1 2018-02-04 0.1443 0.2704 0.181921 0.006705
196 1 2018-02-09 0.1112 0.2999 0.166571 0.009199
195 1 2018-02-14 0.0904 0.3209 0.166296 0.017122
194 1 2018-02-19 0.1076 0.2698 0.166125 0.017052
193 1 2018-02-24 0.4485 0.9791 0.716185 0.106975
192 1 2018-03-01 0.1196 0.2874 0.164644 0.008791
191 1 2018-03-06 0.1328 0.2918 0.169210 0.008578
190 1 2018-03-11 0.1223 0.2945 0.162785 0.009322
189 1 2018-03-16 0.0960 0.7012 0.274701 0.126075
188 1 2018-03-21 0.2475 0.2869 0.265075 0.003698
187 1 2018-03-26 0.1362 0.2822 0.168156 0.008458
186 1 2018-03-31 0.1473 0.2568 0.172887 0.006702
... ... ... ... ... ... ...
137 11 2018-03-06 0.2879 0.7148 0.544603 0.026648
136 11 2018-03-11 0.2620 0.7116 0.544551 0.025719
135 11 2018-03-16 0.1498 0.7504 0.459147 0.123486
134 11 2018-03-21 0.5960 0.7079 0.659035 0.017083
133 11 2018-03-26 0.3140 0.6940 0.550369 0.025360
132 11 2018-03-31 0.3647 0.6683 0.546176 0.023449
131 11 2018-04-05 0.3458 0.7054 0.555993 0.029377
130 11 2018-04-10 0.0946 0.6714 0.282479 0.117626
129 11 2018-04-15 0.3592 0.6987 0.557640 0.025207
128 11 2018-04-20 0.3500 0.6694 0.556031 0.024699
127 11 2018-04-25 0.1355 0.7007 0.553209 0.047051
126 11 2018-04-30 0.3188 0.6867 0.558972 0.026909
71 12 2018-02-04 0.2227 0.6240 0.445642 0.031939
70 12 2018-02-09 0.1625 0.6709 0.449539 0.034358
69 12 2018-02-14 0.0605 0.6641 0.437482 0.078005
68 12 2018-02-19 0.1575 0.6816 0.426998 0.057292
67 12 2018-02-24 0.1191 0.5793 0.241937 0.064104
66 12 2018-03-01 0.2045 0.6283 0.450406 0.034865
65 12 2018-03-06 0.2367 0.6079 0.441012 0.034247
64 12 2018-03-11 0.2201 0.6263 0.453911 0.035315
63 12 2018-03-16 0.1307 0.6469 0.397522 0.082114
62 12 2018-03-21 0.5117 0.6325 0.579117 0.019452
61 12 2018-03-26 0.2660 0.5908 0.449148 0.033681
60 12 2018-03-31 0.2934 0.5762 0.446576 0.032118
59 12 2018-04-05 0.2936 0.6088 0.459766 0.036881
58 12 2018-04-10 0.1020 0.5807 0.260120 0.095357
57 12 2018-04-15 0.3007 0.6025 0.460584 0.033551
56 12 2018-04-20 0.2926 0.5892 0.462507 0.035175
55 12 2018-04-25 0.1031 0.5932 0.447324 0.047799
54 12 2018-04-30 0.2652 0.6058 0.468056 0.036688

234 rows × 6 columns

The following code will plot timeseries of mean values (with standard deviation) for each band.

[7]:
BANDS = 'B01,B02,B03,B04,B05,B06,B07,B08,B8A,B09,B10,B11,B12'.split(',')

plt.figure(figsize=(12, 8))
for channel, (band, color) in enumerate(zip(BANDS, cm.jet(np.linspace(0, 1, 13)))):
    channel_df = df[df.channel == channel]
    plt.plot(channel_df.date, channel_df['mean'], '-o', markeredgewidth=1,
             color=color, markerfacecolor='None', label=band)
    plt.fill_between(list(channel_df.date),  channel_df['mean'] - channel_df['stDev'],
                     channel_df['mean'] + channel_df['stDev'], alpha=0.2, color=color)

plt.legend(loc='upper right');
../_images/examples_fis_request_13_0.png

As there is no vegetation in the desert the reflectance values shouldn’t change much over time. In our case it appears there are 4 acquisitions with different values that the rest - 5th, 9th, 10th and 14th acquisitions.

For all 4 acquisitions reflectance values are higher in visual spectra. However for 5th, 9th and 14th acquisition SWIR values (bands B11 and B12) are lover than average while for the 10th acquisition these values are higher than average. Hence this can lead us to conclusion that 5th, 9th and 14th acquisition contain clouds while 10th acquisition is the one with the sandstorm.

Just to be sure let’s download true color images for these acquisitions and visually verify the results.

[8]:
wcs_request = WcsRequest(layer='TRUE-COLOR-S2-L1C',
                         bbox= sahara_bbox,
                         time=time_interval,
                         resx='60m', resy='60m',
                         instance_id=INSTANCE_ID,
                         custom_url_params={CustomUrlParam.SHOWLOGO: False})

images = wcs_request.get_data()
[9]:
fig, axs = plt.subplots((len(images) + 2) // 3, 3, figsize=(10, 20))

for idx, (image, time) in enumerate(zip(images, wcs_request.get_dates())):
    axs.flat[idx].imshow(image)
    axs.flat[idx].set_title(time.date().isoformat())

fig.tight_layout()
../_images/examples_fis_request_16_0.png

Comparing histograms

Besides basic statistics FIS can also provide histograms of values split into specified number of bins. This allows us to analyze distribution of values without having to download entire images.

Let’s compare NDVI value distributions of the following bounding boxes and polygons:

[10]:
bbox1 = BBox([46.16, -16.15, 46.51, -15.58], CRS.WGS84)
bbox2 = BBox((1292344.0, 5195920.0, 1310615.0, 5214191.0), CRS.POP_WEB)

geometry1 = Geometry(Polygon([(-5.13, 48),
                              (-5.23, 48.09),
                              (-5.13, 48.17),
                              (-5.03, 48.08),
                              (-5.13, 48)]),
                     CRS.WGS84)
geometry2 = Geometry(Polygon([(1292344.0, 5205055.5),
                              (1301479.5, 5195920.0),
                              (1310615.0, 5205055.5),
                              (1301479.5, 5214191.0),
                              (1292344.0, 5205055.5)]),
                     CRS.POP_WEB)

Histogram can only be obtained by specifying parameter bins, which is the number of bins into which values should be divided. There are multiple ways of dividing values into bins currently supported:

[11]:
list(HistogramType)
[11]:
[<HistogramType.EQUALFREQUENCY: 'equalfrequency'>,
 <HistogramType.EQUIDISTANT: 'equidistant'>,
 <HistogramType.STREAMING: 'streaming'>]

In our case we will divide values into 20 bins and the size of each bin will be the same. We will use Landsat 8 data. For simplification we select such time interval that there is only 1 acquisition available.

[12]:
ndvi_script = 'return [(B05 - B04) / (B05 + B04)]'

histogram_request = FisRequest(layer='TRUE-COLOR-L8',
                               geometry_list=[bbox1, bbox2, geometry1, geometry2],
                               time=('2018-06-10', '2018-06-15'),
                               resolution='100m',
                               bins=20,
                               histogram_type=HistogramType.EQUIDISTANT,
                               data_source=DataSource.LANDSAT8,
                               custom_url_params={CustomUrlParam.EVALSCRIPT: ndvi_script},
                               instance_id=INSTANCE_ID)

histogram_data = histogram_request.get_data()
[13]:
histogram_data[0]
[13]:
{'C0': [{'basicStats': {'max': 0.8366492390632629,
    'mean': 0.1811541665712424,
    'min': -0.7464028596878052,
    'stDev': 0.42411094244425696},
   'date': '2018-06-14',
   'histogram': {'bins': [{'count': 15504.0, 'lowEdge': -0.7464028596878052},
     {'count': 17223.0, 'lowEdge': -0.6672502547502518},
     {'count': 6786.0, 'lowEdge': -0.5880976498126984},
     {'count': 3744.0, 'lowEdge': -0.5089450448751449},
     {'count': 1972.0, 'lowEdge': -0.42979243993759153},
     {'count': 2149.0, 'lowEdge': -0.35063983500003815},
     {'count': 2305.0, 'lowEdge': -0.2714872300624847},
     {'count': 2342.0, 'lowEdge': -0.19233462512493127},
     {'count': 2145.0, 'lowEdge': -0.11318202018737789},
     {'count': 2188.0, 'lowEdge': -0.0340294152498245},
     {'count': 3874.0, 'lowEdge': 0.04512318968772888},
     {'count': 9674.0, 'lowEdge': 0.12427579462528238},
     {'count': 17459.0, 'lowEdge': 0.20342839956283576},
     {'count': 44026.0, 'lowEdge': 0.28258100450038914},
     {'count': 42916.0, 'lowEdge': 0.36173360943794264},
     {'count': 28010.0, 'lowEdge': 0.4408862143754959},
     {'count': 16899.0, 'lowEdge': 0.5200388193130494},
     {'count': 11092.0, 'lowEdge': 0.5991914242506029},
     {'count': 5956.0, 'lowEdge': 0.6783440291881562},
     {'count': 852.0, 'lowEdge': 0.7574966341257097}]}}]}

For each of the 4 areas we got back a FIS response. Each response contains statistics for NDVI channel and a single acquisition date. Besides basic statistics it also contains a histogram.

Let’s plot them:

[14]:
plot_data = []
for idx, fis_response in enumerate(histogram_data):
    bins = fis_response['C0'][0]['histogram']['bins']

    counts = [value['count'] for value in bins]
    total_counts = sum(counts)
    counts = [round(100 * count / total_counts) for count in counts]

    bin_size = bins[1]['lowEdge'] - bins[0]['lowEdge']
    splits = [value['lowEdge'] + bin_size / 2 for value in bins]

    data = []
    for count, split in zip(counts, splits):
        data.extend([split] * count)
    plot_data.append(np.array(data))


fig, ax = plt.subplots(figsize=(12, 8))
ax = sns.violinplot(data=plot_data, ax=ax)
ax.set(xticklabels=['Area {}'.format(idx) for idx in range(len(histogram_data))])
plt.ylabel('NDVI', fontsize=15);
../_images/examples_fis_request_25_0.png