The Time Machine, identify drought events#

Hello there! 👋 We’re excited to take you on an insightful journey through our interactive notebook designed to explore drought events across the globe over time. Our main focus will be on understanding how droughts have varied both geographically and temporally.

To reach this goal we need to recognize and analyze drought occurrences worldwide from the year 1940 up to the present. We will do this by examining the Standardized Precipitation Evapotranspiration Index (SPEI) values, which help us understand moisture deficit better.

📌 Since each of you has different needs and interests, we’ve made the notebook as interactive as possible. We’ll guide you through the cells, but you’ll have the freedom to choose what to focus on: the geographical area, the type of index aggregation, and the time period.

What is SPEI?#

The SPEI is a powerful index used by scientists to determine drought conditions. It considers both precipitation and evapotranspiration (the sum of evaporation and plant transpiration from the Earth’s surface to the atmosphere) to give a standardized measure of moisture adequacy in different regions and times. You can find more information on the dedicated page of our handbook.
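To make the idea concrete, here is a minimal sketch of the two ingredients: a water balance (precipitation minus potential evapotranspiration) and a standardization step. All numbers are invented, and the z-score used here is only a simplified stand-in: the real SPEI standardizes the balance by fitting a probability distribution (commonly a log-logistic one), which this sketch does not attempt.

```python
import statistics

# Hypothetical values (mm) for the same calendar month in six different years.
precipitation = [80.0, 20.0, 55.0, 10.0, 95.0, 40.0]
potential_evapotranspiration = [60.0, 70.0, 65.0, 75.0, 55.0, 60.0]

# Climatic water balance: positive means surplus, negative means deficit.
water_balance = [p - pet for p, pet in zip(precipitation, potential_evapotranspiration)]

# Simplified standardization (z-score), an assumption made for illustration only.
mean_balance = statistics.mean(water_balance)
std_balance = statistics.stdev(water_balance)
standardized = [(d - mean_balance) / std_balance for d in water_balance]

for d, z in zip(water_balance, standardized):
    print(f"balance = {d:6.1f} mm -> standardized = {z:5.2f}")
```

The year with the largest deficit gets the most negative standardized value, which is how a drought shows up in the index.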

Data Source#

The data we will use comes from ERA5, one of the most comprehensive atmospheric datasets available. Specifically, we are working with ‘nc’ (NetCDF) files, a format for storing multidimensional scientific data so that it can be accessed and processed efficiently (for more details, see the dedicated page).

Let’s dive in and start our exploration to better understand the patterns and impacts of droughts around the world! 🌍

What we will do#

In this notebook, we will explore drought events from various points of view:

  1. By selecting a geographic area and a month of the year, we will observe the evolution of drought conditions from 1940 to the present using a slider of maps.

  2. We will study the same evolution using a scatterplot made with median and mean values for that area.

  3. We will delve deeper into the details with a boxplot chart for the same month and area.

  4. We will change the time dimension by looking at the distribution of the median values for a certain year across the twelve months.

  5. We will see the evolution over a range of years of our choice via a stripe chart, split by years and months.

  6. Finally, we will compare different accumulation windows to see the difference among them.
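An accumulation window can be pictured as a trailing sum: a 3-month window looks at the water balance of the current month plus the two before it. The sketch below assumes exactly that definition; the helper name accumulate and the sample values are hypothetical and are not part of the notebook's utils modules.

```python
def accumulate(values, window):
    """Sum each value with the (window - 1) values before it.

    Positions without enough history are returned as None.
    """
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(sum(values[i + 1 - window : i + 1]))
    return result


# Hypothetical monthly water balance (mm).
monthly_balance = [20.0, -50.0, -10.0, -65.0, 40.0, -20.0]

one_month = accumulate(monthly_balance, 1)
three_months = accumulate(monthly_balance, 3)

print("1-month window:", one_month)
print("3-month window:", three_months)
```

Longer windows smooth out single wet or dry months, which is why they tend to highlight prolonged events rather than short-lived ones.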

Setting Up the Environment#

Before we dive into the data analysis, we need to ensure our notebook has all the necessary tools and libraries. The following cell installs various Python packages that will help us manipulate data, create visualizations, and interact with our notebook more effectively.

If you haven’t installed these packages, uncomment the lines below and run the cell, otherwise skip it.

For more information on this step, you can refer to the Setting Up page in the handbook.

# !pip install numpy 
# !pip install xarray 
# !pip install netCDF4 
# !pip install "dask[complete]"
# !pip install folium
# !pip install matplotlib 
# !pip install plotly
# !pip install -U kaleido
# !pip install ipywidgets
# !pip install pyprojroot
# !pip install jupyterlab_widgets # only for JupyterLab environment

Importing the libraries#

Now we import the necessary Python libraries and modules that we’ll use throughout our analysis.

  • ipywidgets and IPython.display: to create interactive elements (like dropdowns) and display outputs within the notebook.

  • functools.partial: used to create partial functions: we can fix a certain number of arguments of a function and generate a new function.

  • datetime: for handling dates and times.

  • warnings: to control the display of warnings.

  • warnings.filterwarnings("ignore", category=RuntimeWarning) tells Python to ignore specific runtime warnings that might not be critical to halt our analysis, making the notebook output cleaner and focusing on essential messages.
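As a small illustration of functools.partial from the list above: widget callbacks receive a single change argument, so any extra parameters have to be fixed in advance. The callback describe_selection is hypothetical and only mimics the shape of the change dictionary that ipywidgets passes to observers.

```python
from functools import partial


def describe_selection(change, placeholder, label):
    """Hypothetical callback: report what a dropdown changed to."""
    new_value = change["new"]
    if new_value == placeholder:
        return f"{label}: nothing selected"
    return f"{label}: {new_value}"


# Fix the extra arguments now; the caller will later supply only `change`.
country_callback = partial(
    describe_selection,
    placeholder="no country selected...",
    label="Country",
)

print(country_callback({"new": "South Sudan"}))
print(country_callback({"new": "no country selected..."}))
```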

You may notice that while we explicitly install some packages using pip, others are imported directly without a corresponding installation command. This is because they either belong to the standard library, which comes bundled with Python (such as datetime), or are usually pre-installed in Jupyter environments (such as IPython).

We also need to import four custom modules from the utils folder: widgets_handler, coordinates_retrieve, data_preprocess and charts. These modules contain custom functions tailored to handle widgets, retrieve coordinates, preprocess data, and create charts. We will use the pyprojroot.here() function to locate them.

import sys
from pyprojroot import here
root = here()
sys.path.append(str(root / "chapters/shared/")) # Add the path to the utils and data directories
from ipywidgets import Layout, Dropdown, widgets
from IPython.display import display, clear_output, IFrame, Markdown
from functools import partial
import datetime
import numpy as np
import utils.widgets_handler as widgets_handler
import utils.coordinates_retrieve as coordinates_retrieve
import utils.data_preprocess as data_preprocess
import utils.charts as charts
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)

The initialization#

This cell sets up the initial state and interface for user interactions concerning drought data selection based on geographical and temporal parameters:

  • The variables country_list, months and accumulation_windows are initialized by loading data from JSON files using functions from the widgets_handler module. They contain, respectively, a list of worldwide countries and their first- and second-level administrative subareas, the 12 months, and the specific periods over which the SPEI values are calculated (e.g., 1 month, 3 months, etc.).

  • The subset_area, bounding_box and active_btn variables are initialized to hold the state of user selections and actions.

  • The selected dictionary is designed to hold the current selections of various parameters like country, administrative subareas, accumulation_window, month, and year.

  • placeholders provides placeholder text for each dropdown or interactive widget when no selection is made.

  • The widgets_handler.save_selection(...) call saves a default selection (South Sudan, a 12-month accumulation window and August, with placeholders for everything else) into a selection.json file to keep track of the user selections.
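The idea of persisting selections to a JSON file and falling back to placeholders when a key is missing can be sketched as follows. The functions save_selection_sketch and load_selection_sketch are hypothetical stand-ins; the actual widgets_handler implementation is not shown in this notebook and may differ.

```python
import json
import tempfile
from pathlib import Path


def save_selection_sketch(selection, path):
    """Write the selection dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(selection, f, indent=2)


def load_selection_sketch(path):
    """Read a selection dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


placeholders_demo = {
    "country": "no country selected...",
    "month": "no month selected...",
}

# A temporary directory keeps the sketch from touching real project files.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "selection.json"
    save_selection_sketch({"country": "South Sudan"}, path)
    previous = load_selection_sketch(path)

# Keys missing from the saved file fall back to their placeholder.
country = previous.get("country", placeholders_demo["country"])
month = previous.get("month", placeholders_demo["month"])
print(country)
print(month)
```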

country_list = widgets_handler.read_json_to_sorted_dict('countries.json')
months = widgets_handler.read_json_to_dict('months.json')
accumulation_windows = widgets_handler.read_json_to_dict('accumulation_windows.json')
subset_area = None
bounding_box = (None, None, None, None)
active_btn = None


selected = {
    "country": None,
    "adm1_subarea": None,
    "adm2_subarea": None,
    "accumulation_window": None,
    "month": None,
    "year": None,
    "year_range": ["2024", "2024"],
    "accumulation_windows_multiple" : ["1 month", "6 months", "12 months"],
    "twenty_years": None
}

placeholders = {
    "country": "no country selected...",
    "adm1_subarea": "no adm1 subarea selected...",
    "adm2_subarea": "no adm2 subarea selected...",
    "accumulation_window": "no accumulation window selected...",
    "month": "no month selected...",
    "year": "no year selected...",
    "year_range": ["2024", "2024"],
    "accumulation_windows_multiple" : ["1 month", "6 months", "12 months"],
    "twenty_years": "no period selected..."
}


widgets_handler.save_selection({
    "country": "South Sudan",
    "adm1_subarea": "no adm1 subarea selected...",
    "adm2_subarea": "no adm2 subarea selected...",
    "accumulation_window": "12 months",
    "month": "August",
    "year": "no year selected...",
    "year_range": ["2024", "2024"],
    "accumulation_windows_multiple" : ["1 month", "6 months", "12 months"],
    "twenty_years": "no period selected..."
})

The next cell sets up and configures the user interface to ensure interactivity.
We have dropdown widgets to select the country (or its subareas), the period (month, year, or a range of years), and the SPEI index accumulation_window.
The options in these dropdowns are dynamically populated from previously loaded JSON files or generated lists (such as the list of years from 1940 to the current year).
A selectors dictionary organizes all the selector widgets for efficient access and management in the code.
Separate Get data buttons are configured for different types of data retrieval based on the selections made via the dropdown menus.
An output_area widget is included to display results or messages dynamically based on the user’s selections and interactions with the buttons.

# Custom style and layout for descriptions and dropdowns
style = {'description_width': '200px'}
dropdown_layout = Layout(width='400px', display='flex', justify_content='flex-end')
range_layout = Layout(width='400px')
btn_layout = Layout(width='400px')
selectmultiple_layout = Layout(width='400px', display='flex', justify_content='flex-end')

# Dropdown for countries
country_names = [country['name'] for country in country_list]
country_selector = widgets.Dropdown(
    options=[placeholders['country']] + country_names,
    description='Select a country:',
    style=style,
    layout=dropdown_layout
)

# Dropdown for subareas, initially empty
adm1_subarea_selector = widgets.Dropdown(
    options=[placeholders['adm1_subarea']],
    description='a subarea of first level:',
    style=style,
    layout=dropdown_layout
)

adm2_subarea_selector = widgets.Dropdown(
    options=[placeholders['adm2_subarea']],
    description='or of second level:',
    style=style,
    layout=dropdown_layout
)

# Dropdown for accumulation_windows
accumulation_window_selector = widgets.Dropdown(
    options=[placeholders['accumulation_window']] + list(accumulation_windows.keys()),
    description='Select an accumulation window:',
    style=style,
    layout=dropdown_layout
)

# Dropdown for months
month_selector = widgets.Dropdown(
    options=[placeholders['month']] + list(months.keys()),
    description='Select a month:',
    style=style,
    layout=dropdown_layout
)

# Dropdown for years
current_year = datetime.datetime.now().year
years_options = [str(year) for year in range(1940, current_year + 1)]

year_selector = widgets.Dropdown(
    options=[placeholders['year']] + years_options,
    description='Select a year:',
    disabled=False,
    style=style,
    layout=dropdown_layout
)

# SelectionRangeSlider for years
year_range_selector = widgets.SelectionRangeSlider(
    options=years_options,
    index=(len(years_options) - 1, len(years_options) - 1),
    description='Select the year range:',
    disabled=False,
    style=style,
    layout=range_layout
)

# Multiple selector for accumulation windows
accumulation_windows_multiple_selector = widgets.SelectMultiple(
    options=list(accumulation_windows.keys()),
    value=["1 month", "6 months", "12 months"],
    rows=7, 
    description='Accumulation:',
    disabled=False,
    style=style,
    layout=selectmultiple_layout
)


# Dropdown for twenty years
twenty_years_selector = widgets.Dropdown(
    options=[placeholders['twenty_years']] + ["1943-1963", "1963-1983", "1983-2003", "2003-2023", f"1940-{current_year}"],
    description='Select a twenty-year period:',
    disabled=False,
    style=style,
    layout=dropdown_layout
)

selectors = {
    "country" : country_selector,
    "adm1_subarea": adm1_subarea_selector,
    "adm2_subarea": adm2_subarea_selector,
    "accumulation_window": accumulation_window_selector,
    "month": month_selector,
    "year": year_selector,
    "year_range": year_range_selector,
    "accumulation_windows_multiple": accumulation_windows_multiple_selector,
    "twenty_years": twenty_years_selector
}


month_widgets_btn = widgets.Button(
    description='Get data',
    disabled=False,
    button_style='info',
    tooltip='Click me',
    icon='filter',
    layout=btn_layout
)
month_widgets_btn.custom_name='month_widgets_btn'


year_widgets_btn = widgets.Button(
    description='Get data',
    disabled=False,
    button_style='info',
    tooltip='Click me',
    icon='filter',
    layout=btn_layout
)
year_widgets_btn.custom_name='year_widgets_btn'

year_range_widgets_btn = widgets.Button(
    description='Get data',
    disabled=False,
    button_style='info',
    tooltip='Click me',
    icon='filter',
    layout=btn_layout
)
year_range_widgets_btn.custom_name='year_range_widgets_btn'


accumulation_windows_widgets_btn = widgets.Button(
    description='Get data',
    disabled=False,
    button_style='info',
    tooltip='Click me',
    icon='filter',
    layout=btn_layout
)
accumulation_windows_widgets_btn.custom_name='accumulation_windows_widgets_btn'


# Output area for display updates
output_area = widgets.Output()

The functions in the next cell handle user input, process data based on those inputs, and update the notebook interface accordingly. Here’s a summary of the key components:

  • setup_observers function: sets up event listeners (observers) for UI widgets, specifically for the country selector dropdown. This function ensures efficient setup by setting the observers only once. When the value of the country selector changes, it triggers a function to update related subarea dropdowns based on the selected country. It uses a custom attribute to prevent multiple instances of observer setup.

  • update_and_get_data function: handles data retrieval and UI updates based on user interactions, such as button clicks. It identifies which button was pressed and updates the month/year selections accordingly. It then validates the selections and, if they are valid, clears the output, retrieves the geographic boundaries, and fetches the data for that area. Finally, it prints a summary of the fetched data and displays a map centered on the selected region.

  • on_button_clicked function: acts as a trigger for button clicks, calling update_and_get_data with the appropriate button identifier.

The observers (event listeners) are set up via setup_observers() at the end of the cell to ensure all widgets are ready to handle user input as soon as the notebook is run.
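The "set up only once" behaviour relies on storing a flag as an attribute of the function itself. A stripped-down sketch of that pattern, with a hypothetical setup_once function and a plain list standing in for the widget observers:

```python
registered = []


def setup_once():
    """Register the observer only on the first call."""
    # The flag lives on the function object, so it survives between calls.
    if not hasattr(setup_once, "already_done"):
        registered.append("country observer")
        setup_once.already_done = True


# Re-running the cell several times does not register duplicates.
setup_once()
setup_once()
setup_once()
print(registered)
```

Without the guard, every rerun of the cell would attach another copy of the same callback, and a single dropdown change would trigger it several times.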

def setup_observers():
    """
    Sets up observers for UI widgets to handle interactions and updates dynamically in a graphical user interface.
    This function ensures that observers are only set once using a function attribute to track whether observers have
    already been established, enhancing efficiency and preventing multiple bindings to the same event.

    Observer is attached to widgets for country selection. This observer triggers specific functions when the 'value' property 
    of the widgets changes, facilitating responsive updates to the user interface
    based on user interactions.

    Notes:
    - This function uses a custom attribute `observers_set` on itself to ensure observers are set only once.
    """
    if not hasattr(setup_observers, 'observers_set'):
        # When 'value' changes, update_subareas function will be called to update the dropdown menus
        # Create a partial function that includes the additional parameters
        country_selector.observe(
            partial(widgets_handler.update_subareas,
                    country_list=country_list,
                    placeholders=placeholders,
                    adm1_subarea_selector=adm1_subarea_selector,
                    adm2_subarea_selector=adm2_subarea_selector),
            'value')
        # Set a flag to indicate observers are set
        setup_observers.observers_set = True


            

def update_and_get_data(btn_name):
    """
    Update and retrieve data based on user interactions and selections.

    This function handles user interactions, validates selections, calculates geographic bounding boxes,
    fetches the corresponding data subset, and updates the output area with relevant information and a map display.
    Additionally, if the button name is 'accumulation_windows_widgets_btn', the function iterates over different
    accumulation windows and fetches data for each, returning a dictionary of the data subsets.
    
    Parameters:
    btn_name (str): The name of the button that triggered the interaction.

    Global Variables:
    selected (dict): Dictionary containing current selections for various parameters.
    placeholders (dict): Dictionary of placeholder values.
    output_area (ipywidgets.Output): The output area widget to display messages and results.
    subset_data (xarray.DataArray or dict of xarray.DataArray): Subset of data fetched based on the bounding box.
    index (str): Index for the subset data, constructed from accumulation_window value.
    bounding_box (tuple): Bounding box coordinates (min_lon, min_lat, max_lon, max_lat) for the selected area.
    active_btn (str): The name of the currently active button.

    Steps:
    1. Set the active button name.
    2. Update the month and year selections based on the button interaction.
    3. Validate the current selections.
    4. If selections are valid:
       a. Clear the output area.
       b. Retrieve the geographic boundaries for the selected area.
       c. Calculate the bounding box for the selected area.
       d. Fetch the data subset based on the bounding box.
       e. Determine the administrative level, selected area name, accumulation_window, and time period.
       f. Print information about the uploaded subset data.
       g. Display the map with the bounding box and appropriate zoom level.


    Notes:
    - The function assumes the existence of utility functions within the 'utils' modules (widgets_handler,
      coordinates_retrieve, data_preprocess) for handling interactions, validations, data fetching, and map display.
    - The global variables should be properly initialized before calling this function.
    """
    global selected, placeholders, output_area, subset_data, index, bounding_box, active_btn
    map_display = None
    active_btn = btn_name
    widgets_handler.month_year_interaction(btn_name, month_selector, year_selector, selected, placeholders)
    if widgets_handler.validate_selections(btn_name, selected, selectors, placeholders, output_area):
        with output_area:
            output_area.clear_output(wait=True)
            coordinates = coordinates_retrieve.get_boundaries(selected, country_list, placeholders)
            # print(coordinates)
            bounding_box = coordinates_retrieve.calculate_bounding_box(coordinates)
            print('geographic coordinates (min_lat, min_lon, max_lat, max_lon): ', bounding_box)            
                        
            # sample_coordinates = coordinates[:3] # Showing first 3 coordinates for brevity            
            # print('Original Coordinates Sample: ', sample_coordinates)  
            # print('Bounding Box: ', bounding_box)
                        
            # Fetching data using the bounding box            
            if btn_name == 'accumulation_windows_widgets_btn':
                accumulation_window = ', '.join(selectors['accumulation_windows_multiple'].value)                
                subset_data = {}
                for window in selectors['accumulation_windows_multiple'].value:
                    selectors['accumulation_window'].value = window
                    single_index_data = data_preprocess.get_xarray_data(btn_name, bounding_box, selectors, placeholders, months, accumulation_windows)
                    subset_data[window] = single_index_data
                    print(f"SPEI {window} subset data uploaded, wait...")
            else: 
                subset_data = data_preprocess.get_xarray_data(btn_name, bounding_box, selectors, placeholders, months, accumulation_windows)
                accumulation_window = selected['accumulation_window']
                index = f"SPEI{accumulation_windows[selectors['accumulation_window'].value]}"
            adm_level, selected_area = widgets_handler.get_adm_level_and_area_name(selected, placeholders)
            time_period = widgets_handler.get_period_of_time(btn_name, selected, placeholders)
                
            print(f"SPEI subset data uploaded for {selected_area}, administrative level {adm_level}, accumulation_window {accumulation_window}, period {time_period}")
            zoom_start = 4
            if adm_level == 'ADM1' or adm_level == 'ADM2':
                zoom_start = 8  
            map_display = coordinates_retrieve.display_map(bounding_box, zoom_start)
            map_iframe = coordinates_retrieve.display_map_in_iframe(map_display)
            display(map_iframe)

            
# Set up widget interaction
def on_button_clicked(btn):
    update_and_get_data(btn.custom_name)
    
# Setup observers
setup_observers()

The cell below is designed to reload and set up the user interface widgets based on previously saved selections, enhancing user experience by maintaining state across sessions or after a notebook refresh. It begins by loading previously saved selections from the selection.json file.
Then it restores the widget states: the values for the country, administrative subareas, accumulation_window, and month widgets are restored using data from the previously saved selections. If no previous data exists for a particular widget, it defaults to the placeholder value.
The on_click event for the Get data button (month_widgets_btn) is configured to trigger the on_button_clicked function when clicked. This function is responsible for initiating the data fetching and processing based on current widget selections.
Finally, all the widgets along with the output area are displayed. This includes the dropdown selectors for country, subareas, and accumulation_window, the month selector, and the button for initiating data retrieval. The output_area is where messages, errors, or the results (like maps or data summaries) will be shown after the user interacts with the widgets.

Data selection#

It’s time for your first analysis. The goal is to determine whether drought is or was present in a specific region during a given period (month) and to analyze how it has evolved over time (from 1940 onwards). From the dropdown menus, select the geographic area you are interested in, the accumulation window of the SPEI index, and the reference month.
Regarding the choice of the area, please take into account that the larger the area, the more computational power and time it will take to retrieve the data. So, if your device is not powerful, choose smaller areas, such as second-level subareas.

If you click the Get data button before choosing the necessary options from the dropdown menu, a message will be displayed under the widgets’ block explaining what you missed.
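The check behind that message can be pictured as a comparison between each required selection and its placeholder. The function missing_selections below is a hypothetical simplification; the real widgets_handler.validate_selections is not shown here and may apply different rules (for example, about which fields each button requires).

```python
def missing_selections(selected, placeholders, required):
    """Return the required keys that are unset or still show their placeholder."""
    return [
        key
        for key in required
        if selected.get(key) is None or selected.get(key) == placeholders.get(key)
    ]


placeholders_demo = {
    "country": "no country selected...",
    "accumulation_window": "no accumulation window selected...",
    "month": "no month selected...",
}

# Hypothetical state: only the country has been chosen so far.
selected_demo = {
    "country": "South Sudan",
    "accumulation_window": "no accumulation window selected...",
    "month": None,
}

missing = missing_selections(
    selected_demo, placeholders_demo, ["country", "accumulation_window", "month"]
)
if missing:
    print("Please select: " + ", ".join(missing))
```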

If all the selections are made, you will receive three messages:

  1. The retrieval of the coordinates for the selected area was successful (‘Coordinates retrieved for…’).

  2. The data retrieval was successful (‘SPEI subset data uploaded for…’).

  3. A map of the selected area is displayed, which can help you check if the area is the one you are interested in.
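To see why larger areas cost more, it helps to sketch how a bounding box is derived from an outline and how many 0.25-degree grid points fall inside it. The helpers below are hypothetical, the outline is invented, and the (min_lon, min_lat, max_lon, max_lat) ordering is an assumption; the extents were chosen to match the lon and lat coordinate ranges that appear in the sample output later in this notebook.

```python
def bounding_box_sketch(points):
    """points: list of (lon, lat) pairs.

    Returns (min_lon, min_lat, max_lon, max_lat); this ordering is an assumption.
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return min(lons), min(lats), max(lons), max(lats)


def grid_points(lower, upper, step=0.25):
    """Number of grid points between two bounds, both ends included."""
    return round((upper - lower) / step) + 1


# Invented outline whose extremes match the sample output's coordinates.
outline = [(24.25, 8.0), (30.0, 3.5), (36.0, 6.0), (31.0, 12.25)]

min_lon, min_lat, max_lon, max_lat = bounding_box_sketch(outline)
n_lon = grid_points(min_lon, max_lon)
n_lat = grid_points(min_lat, max_lat)

print("bounding box:", (min_lon, min_lat, max_lon, max_lat))
print("grid points per time step:", n_lon * n_lat)
```

Every time step multiplies that count again, so a country-sized box over many decades grows quickly compared with a second-level subarea.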

If you are in doubt about which options to choose, you can rely on the default ones: South Sudan is a country that is experiencing a severe and prolonged drought, so it can be useful to study how that drought has evolved over the years.

# Update existing selectors
previous_selection = widgets_handler.read_json_to_dict('selection.json')

# Set up widgets with previous settings
country_selector.value = previous_selection.get('country', placeholders['country'])
adm1_subarea_selector.value = previous_selection.get('adm1_subarea', placeholders['adm1_subarea'])
adm2_subarea_selector.value = previous_selection.get('adm2_subarea', placeholders['adm2_subarea'])
accumulation_window_selector.value = previous_selection.get('accumulation_window', placeholders['accumulation_window'])
month_selector.value = previous_selection.get('month', placeholders['month'])
month_widgets_btn.on_click(on_button_clicked)

# Display widgets
display(country_selector, adm1_subarea_selector, adm2_subarea_selector, accumulation_window_selector, month_selector, month_widgets_btn, output_area)

The data you retrieved is now available as subset_data[index], where index is the name of the SPEI index you have chosen (for example, SPEI12 for a 12-month accumulation window).

Using the data_preprocess.display_data_details function, you can examine your data to check the following:

  • If the values chosen from the dropdown menu are correct.

  • The number of time, latitude, and longitude values present.

  • A sample of the first SPEI values.

data_preprocess.display_data_details(active_btn, selected, subset_data[index])
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 data_preprocess.display_data_details(active_btn, selected, subset_data[index])

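NameError: name 'subset_data' is not defined

If you see a NameError like the one above, the Get data button has not been clicked yet: subset_data is only created inside update_and_get_data, which runs on a button click. Go back to the widgets, click Get data, wait for the confirmation messages, and then rerun this cell.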

It’s possible that the subset of data you select may contain missing or invalid entries. This could be due to factors like the data being from distant years, a malfunctioning remote sensor, or simply because the data point is over a sea. For instance, a value like -9999 is not acceptable. Additionally, some date formats might differ, causing further discrepancies.

The following function process_datarray is designed to tackle these issues and returns:

  • the processed_subset, which is the cleaned and corrected version of your data.

  • the change_summary, detailing the corrections applied to the original subset.

processed_subset, change_summary = data_preprocess.process_datarray(subset_data[index])
display(processed_subset)
display(Markdown('**Change summary:**'))
for key, val in change_summary.items():
    print(key, val)
<xarray.DataArray 'SPEI12' (time: 84, lat: 36, lon: 48)> Size: 1MB
dask.array<getitem, shape=(84, 36, 48), dtype=float64, chunksize=(1, 36, 48), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float64 384B 24.25 24.5 24.75 25.0 ... 35.25 35.5 35.75 36.0
  * lat      (lat) float64 288B 3.5 3.75 4.0 4.25 4.5 ... 11.5 11.75 12.0 12.25
  * time     (time) datetime64[ns] 672B 1940-08-01T06:00:00 ... 2023-08-01T06...
Attributes:
    long_name:  Standardized Drought Index (SPEI12)
    units:      -

Change summary:

invalid_values_replaced 1894
invalid_ratio 1.3
duplicates_removed 0
cftime_conversions 0
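The first two entries of the change summary can be illustrated with a small NumPy sketch. In the output above, 1894 replaced values out of 84 × 36 × 48 = 145152 cells is about 1.3%, so invalid_ratio appears to be a percentage; the sketch assumes that. The function clean_values_sketch is hypothetical and is not the actual process_datarray, which also handles duplicates and date conversions.

```python
import numpy as np


def clean_values_sketch(values, sentinel=-9999.0):
    """Replace sentinel and non-finite entries with NaN and report how many."""
    data = np.asarray(values, dtype=float).copy()  # leave the input untouched
    invalid = (data == sentinel) | ~np.isfinite(data)
    replaced = int(invalid.sum())
    data[invalid] = np.nan
    summary = {
        "invalid_values_replaced": replaced,
        # Assumption: the ratio is expressed as a percentage of all cells.
        "invalid_ratio": round(100 * replaced / data.size, 1),
    }
    return data, summary


# Tiny invented grid with one sentinel value and one infinite value.
raw = np.array([[-1.2, -9999.0, 0.4], [np.inf, 0.9, -2.1]])
cleaned, summary = clean_values_sketch(raw)

print(cleaned)
print(summary)
```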

The geographical distribution over time#

An effective way to assess drought conditions is by visually examining a series of maps in a timelapse format, with one map representing each year for the chosen month. This allows you to easily observe changes and trends in drought severity over time.

The following function plot_geographical_distribution illustrates drought conditions using two distinct colors: brown for drought and green for wet areas. The intensity of the colors indicates the severity of the conditions: deeper shades of brown represent more severe drought, while deeper shades of green indicate wetter conditions.

The geographic area is divided into regular squares, each corresponding to the 0.25-degree grid of the climate model (refer to the dedicated chapter for more details).
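To relate colour intensity to numbers, the sketch below labels SPEI values with severity classes. The thresholds follow a convention commonly used for standardized indices (steps at ±1, ±1.5 and ±2); they are an assumption made for illustration and are not taken from plot_geographical_distribution, whose colour scale may use different breaks.

```python
def classify_spei(value):
    """Label an SPEI value with a severity class (assumed thresholds)."""
    if value <= -2.0:
        return "extremely dry"
    if value <= -1.5:
        return "severely dry"
    if value <= -1.0:
        return "moderately dry"
    if value < 1.0:
        return "near normal"
    if value < 1.5:
        return "moderately wet"
    if value < 2.0:
        return "very wet"
    return "extremely wet"


for spei_value in (-2.66, -1.6, -1.0, 0.0, 1.2, 2.3):
    print(f"{spei_value:5.2f} -> {classify_spei(spei_value)}")
```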

Run the cell and explore the menu bar at the bottom. Here are some suggestions on what you can look for and examine using these maps:

  • Identify drought patterns: look for areas where the brown color is consistently deep or widespread, indicating regions that are experiencing severe and prolonged drought conditions.

  • Track changes over time: use the timelapse feature to observe how drought conditions evolve over the years. Pay attention to any patterns of expansion or contraction in the drought areas.

  • Compare wet and dry periods: contrast periods where green (wet) areas dominate with those where brown (dry) areas are more prevalent. This can help identify cycles of wet and dry periods.

  • Regional analysis: focus on specific regions within the geographic area to see how drought conditions vary from one place to another. Some regions may be more prone to drought than others.

  • Seasonal variations: if the maps cover different seasons within each year, examine how drought conditions change with the seasons. This can provide insights into the impact of seasonal climate patterns.

  • Long-term trends: examine the maps for any long-term trends, such as increasing or decreasing drought severity over decades. This can be valuable for understanding the broader climate trends in the region.

  • Anomalies: look for any unusual patterns or anomalies in the data, such as sudden changes in drought conditions that might indicate an unusual event or error in data collection.

charts.plot_geographical_distribution(processed_subset)

The importance of median and mean values#

To validate your observations and insights, you can check some key statistical measures. The compute_stats function calculates the most significant statistics for your data subset. For now, we’ll focus on the mean and the median.

The mean (average) gives you the average level of drought severity across the entire geographic area for each year relative to the considered month. This helps you understand the general intensity of drought across the whole region or during a specific period. However, the mean can be influenced by extreme values, so it might not always represent the “typical” condition.

The median is the middle value when all data points are sorted from lowest to highest. Unlike the mean, the median is barely affected by outliers or extreme values. In the context of drought, the median gives you a better sense of the “typical” condition in the region. For instance, since negative SPEI values indicate dry conditions, a median that is only slightly negative suggests that at least half of the area is experiencing mild dryness at most, even if a few parts are much worse.

By comparing the mean and median, you can get a clearer picture of drought conditions. If the mean is much lower (more negative) than the median, it suggests that some very severe droughts in parts of the area are pulling the average down, even if most of the region is less affected. Conversely, if the mean and median are close, it indicates a more consistent level of drought severity across the area.
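A tiny NumPy example shows how a single extreme cell separates the mean from the median, while a uniform field keeps them together. The values are invented and the array layout (time, lat, lon) is an assumption; compute_stats itself is not shown in this notebook and may work differently.

```python
import numpy as np

# Hypothetical cleaned SPEI cube with shape (time, lat, lon); NaN marks invalid cells.
spei = np.array([
    [[-0.2, -0.3], [-0.1, -3.8]],    # year 1: one cell far from the others
    [[-1.0, -1.1], [-0.9, np.nan]],  # year 2: uniform values, one missing cell
])

# Collapse the two spatial axes, ignoring NaN, to get one value per year.
yearly_mean = np.nanmean(spei, axis=(1, 2))
yearly_median = np.nanmedian(spei, axis=(1, 2))

for year, (mean, median) in enumerate(zip(yearly_mean, yearly_median), start=1):
    print(f"year {year}: mean = {mean:.2f}, median = {median:.2f}")
```

In the first year the mean and median differ by almost a full index unit because of one cell; in the second year they coincide.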

stat_values = data_preprocess.compute_stats(processed_subset)

To visually grasp these concepts, we use a scatterplot chart that displays the mean (circle symbol) and the median (diamond symbol) over the years.

charts.create_scatterplot(stat_values, accumulation_windows, selected, placeholders)

The scatterplot shows us the central tendencies (mean and median), but to get a complete picture of the data distribution, we also need to understand the spread of drought severity across the region, including the range, quartiles, and potential outliers.

This is where a boxplot becomes invaluable. A boxplot provides context to the mean and median values, helps visualize data distribution, spots outliers, and aids in understanding variability. The width of the box (interquartile range) shows how much conditions vary within the middle 50% of the data, while the whiskers extend to the minimum and maximum values, giving a comprehensive view of the range.

So let’s create our boxplot:

charts.create_boxplot(stat_values, accumulation_windows, selected, placeholders)
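The ingredients of a boxplot can also be computed by hand. The sketch below uses one common convention (fences at 1.5 times the interquartile range) to flag outliers; this is an assumption, and charts.create_boxplot may draw its whiskers differently. The function box_stats_sketch and the sample values are hypothetical.

```python
import statistics


def box_stats_sketch(values):
    """Quartiles, interquartile range and outliers for one box."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Invented values for one year, with a single cell far from the rest.
sample = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 6.0]
stats = box_stats_sketch(sample)
print(stats)
```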

A year in focus#

Once we’ve identified a peculiar event in a specific year for our chosen month, it can be useful to check if the other months of that same year experienced similar conditions. To do this, we’ll need to reset our selection, focusing on the entire year in question.

Running the cell below, you’ll see the dropdown menus with your previous selection of area and accumulation window. Simply choose the year and click the Get data button:

# Update existing selectors
previous_selection = widgets_handler.read_json_to_dict('selection.json')

# Set up widgets with previous settings
country_selector.value = previous_selection.get('country', placeholders['country'])
adm1_subarea_selector.value = previous_selection.get('adm1_subarea', placeholders['adm1_subarea'])
adm2_subarea_selector.value = previous_selection.get('adm2_subarea', placeholders['adm2_subarea'])
accumulation_window_selector.value = previous_selection.get('accumulation_window', placeholders['accumulation_window'])
year_selector.value = previous_selection.get('year', placeholders['year'])
year_widgets_btn.on_click(on_button_clicked)

# Display widgets
display(country_selector, adm1_subarea_selector, adm2_subarea_selector, accumulation_window_selector, year_selector, year_widgets_btn, output_area)

Examine your data and check that the reference period is now a year (and not a month):

data_preprocess.display_data_details(active_btn, selected, subset_data[index])
Country:  South Sudan
ADM1 subarea:  no adm1 subarea selected...
ADM2 subarea:  no adm2 subarea selected...
Year:  2023
accumulation_window:  12 months 

Time values in the subset: 13
Latitude values in the subset: 36
Longitude values in the subset: 48 

Data sample:  [[-2.6619802  -2.56654295 -2.17218893 -1.60352846 -3.06478493]
 [-2.30817321 -2.91754312 -1.72772561 -4.89807169 -2.76308567]
 [-4.62586418 -2.22838894 -3.12197531 -3.19537283 -2.62631398]
 [-2.27710202 -2.5525915  -2.77689676 -1.95485749 -2.57246632]
 [-1.50284922 -3.08933909 -2.82008094 -2.20390264 -2.39344491]]

Clean your data:

processed_subset, change_summary = data_preprocess.process_datarray(subset_data[index])
display(processed_subset)
display(Markdown('**Change summary:**'))
for key, val in change_summary.items():
    print(key, val)
<xarray.DataArray 'SPEI12' (time: 12, lat: 36, lon: 48)> Size: 166kB
dask.array<getitem, shape=(12, 36, 48), dtype=float64, chunksize=(1, 36, 48), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float64 384B 24.25 24.5 24.75 25.0 ... 35.25 35.5 35.75 36.0
  * lat      (lat) float64 288B 3.5 3.75 4.0 4.25 4.5 ... 11.5 11.75 12.0 12.25
  * time     (time) datetime64[ns] 96B 2023-01-01T06:00:00 ... 2023-12-01T06:...
Attributes:
    long_name:  Standardized Drought Index (SPEI12)
    units:      -

Change summary:

invalid_values_replaced 26
invalid_ratio 0.12
duplicates_removed 1
cftime_conversions 0
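The drop from 13 to 12 time values matches the single duplicate reported in the change summary. The internals of `process_datarray` are not shown in this notebook, so the following is only a sketch of the general idea on a plain list of values. The helper `clean_series`, the sentinel `FILL_VALUE`, and the rule for what counts as "invalid" are all assumptions made for illustration.

```python
import math

FILL_VALUE = -9999.0  # hypothetical sentinel for missing data (assumption)

def clean_series(timestamps, values):
    """Drop repeated timestamps and replace invalid values with NaN."""
    seen = set()
    cleaned = []
    summary = {"invalid_values_replaced": 0, "duplicates_removed": 0}
    for stamp, value in zip(timestamps, values):
        if stamp in seen:
            # Keep only the first occurrence of each timestamp
            summary["duplicates_removed"] += 1
            continue
        seen.add(stamp)
        if value == FILL_VALUE or not math.isfinite(value):
            value = math.nan
            summary["invalid_values_replaced"] += 1
        cleaned.append((stamp, value))
    return cleaned, summary

# Made-up example: one duplicated month, one sentinel, one infinite value
stamps = ["2023-01", "2023-02", "2023-02", "2023-03"]
values = [-1.2, FILL_VALUE, -0.8, float("inf")]
cleaned, summary = clean_series(stamps, values)

for key, val in summary.items():
    print(key, val)
```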

Calculate the median and see the result displayed as a line chart from January to December: the scatterplot provided a broader view across years, while the line chart offers a zoomed-in perspective on one year.

stat_values = data_preprocess.compute_stats(processed_subset, full_stats=False)

A month-by-month view within a single year can help you identify:

  • Seasonal patterns: the line chart can highlight seasonal trends or cycles within the year. For example, you might see worsening conditions during certain months, which could correlate with typical dry seasons.

  • Consistency and variability: the chart can reveal whether the peculiar event identified in the scatterplot is isolated to a specific month or if it persists across multiple months. If the line chart shows a consistent pattern throughout the year, it might suggest a prolonged period of drought. On the other hand, if the line chart fluctuates, the event might be more localized or temporary.

charts.create_linechart(stat_values, accumulation_windows, selected, placeholders)
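One way to put a number on "isolated versus persistent" is to look for the longest run of consecutive months below a chosen SPEI level. The sketch below does this on twelve made-up monthly medians. The helper `longest_run_below` and the threshold of -1.0 are illustrative assumptions, not something the notebook defines.

```python
def longest_run_below(values, threshold):
    """Return (start_index, length) of the longest run of values below threshold."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i, value in enumerate(values):
        if value < threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

# Hypothetical monthly medians, January to December (not real data)
monthly_median = [0.3, -0.2, -1.1, -1.4, -1.6, -1.2, -0.4, 0.1, -1.3, 0.2, 0.5, 0.4]

start, length = longest_run_below(monthly_median, threshold=-1.0)
print(f"Longest dry spell starts at month {start + 1} and lasts {length} months")
```

A long run suggests a prolonged event, while several short, separated dips point to more temporary conditions.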

Combining years and months#

Finally, to get a more comprehensive view of the drought phenomenon in a specific area, we combine data across years and months in a stripe chart. Select a range of years from the slider below and click the Get data button. Keep in mind that the larger the range, the more computing resources and processing time will be required.

# Update existing selectors
previous_selection = widgets_handler.read_json_to_dict('selection.json')

# Set up widgets with previous settings
country_selector.value = previous_selection.get('country', placeholders['country'])
adm1_subarea_selector.value = previous_selection.get('adm1_subarea', placeholders['adm1_subarea'])
adm2_subarea_selector.value = previous_selection.get('adm2_subarea', placeholders['adm2_subarea'])
accumulation_window_selector.value = previous_selection.get('accumulation_window', placeholders['accumulation_window'])
year_range_selector.value = tuple(previous_selection.get('year_range'))
year_range_widgets_btn.on_click(on_button_clicked)

# Display widgets
display(country_selector, adm1_subarea_selector, adm2_subarea_selector, accumulation_window_selector, year_range_selector, year_range_widgets_btn, output_area)

Review the data and clean it as needed:

data_preprocess.display_data_details(active_btn, selected, subset_data[index])
Country:  South Sudan
ADM1 subarea:  no adm1 subarea selected...
ADM2 subarea:  no adm2 subarea selected...
Year range:  ('2003', '2023')
accumulation_window:  12 months 

Time values in the subset: 253
Latitude values in the subset: 36
Longitude values in the subset: 48 

Data sample:  [[-0.00723115 -0.33742989  0.09283218 -0.20119271  0.58562874]
 [ 0.10346959  0.36178612 -0.27015748 -0.36490642  0.10695831]
 [-0.82519553  0.0786707  -0.44669251  0.31100318 -0.65462796]
 [-0.20311233 -0.17688248  0.30441433 -0.4889074  -0.22180889]
 [ 0.1458042  -0.2456796   0.42952066 -0.20736085 -0.02492739]]

processed_subset, change_summary = data_preprocess.process_datarray(subset_data[index])
display(processed_subset)
display(Markdown('**Change summary:**'))
for key, val in change_summary.items():
    print(key, val)
<xarray.DataArray 'SPEI12' (time: 252, lat: 36, lon: 48)> Size: 3MB
dask.array<getitem, shape=(252, 36, 48), dtype=float64, chunksize=(1, 36, 48), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float64 384B 24.25 24.5 24.75 25.0 ... 35.25 35.5 35.75 36.0
  * lat      (lat) float64 288B 3.5 3.75 4.0 4.25 4.5 ... 11.5 11.75 12.0 12.25
  * time     (time) datetime64[ns] 2kB 2003-01-01T06:00:00 ... 2023-12-01T06:...
Attributes:
    long_name:  Standardized Drought Index (SPEI12)
    units:      -

Change summary:

invalid_values_replaced 506
invalid_ratio 0.12
duplicates_removed 1
cftime_conversions 0

Calculate the median:

stat_values = data_preprocess.compute_stats(processed_subset, full_stats=False)

Create the stripe charts: the first one showing the years and months along the x-axis, the second showing years along the x-axis and the months along the y-axis.

These charts allow you to quickly identify temporal patterns, showing how drought conditions evolve over both months and years.

charts.create_stripechart(stat_values, accumulation_windows, selected, placeholders)
charts.create_stripechart(stat_values, accumulation_windows, selected, placeholders, 'year')
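The second layout (years along one axis, months along the other) amounts to rearranging a flat monthly series into a year-by-month grid. The sketch below shows that rearrangement on made-up values. The helper `to_year_month_grid` is hypothetical and says nothing about how `create_stripechart` works internally.

```python
from collections import defaultdict

def to_year_month_grid(series):
    """Turn [((year, month), value), ...] into {year: [12 monthly values]}.

    Months without data stay as None.
    """
    grid = defaultdict(lambda: [None] * 12)
    for (year, month), value in series:
        grid[year][month - 1] = value
    return dict(sorted(grid.items()))

# Hypothetical medians: a full year plus the first six months of the next one
series = [((2022, m), -0.1 * m) for m in range(1, 13)]
series += [((2023, m), 0.1 * m) for m in range(1, 7)]

grid = to_year_month_grid(series)
for year, row in grid.items():
    print(year, row)
```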

Comparing accumulation windows#

In the page on drought indices, we discussed the importance of accumulation windows in evaluating drought events. Our final chart allows you to compare different accumulation windows within the same area over a selected 20-year period.

An area chart is ideal for this comparison. Use the new widget block to select your desired 20-year period and the accumulation windows you want to compare.

By default, we suggest three accumulation windows: 1 month, 6 months, and 12 months—these cover a broad range of drought analyses. However, you can customize the selection by adding, removing, or changing these options. Keep in mind that selecting more windows may require additional computing power and could make the chart harder to read.

One last note: In the period selection, you can also choose to analyze the entire available dataset (from 1940 to today). For this, adequate computing power is recommended.
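As a reminder of what an accumulation window means, the sketch below sums a made-up monthly water balance (precipitation minus evapotranspiration) over the last 1 and 3 months. SPEI additionally standardizes these sums, a step that is left out here. The helper `accumulate` and the numbers are illustrative assumptions.

```python
def accumulate(balance, window):
    """Rolling sum over the last `window` months.

    Returns None for months that do not yet have a full window behind them.
    """
    out = []
    for i in range(len(balance)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(balance[i + 1 - window : i + 1]))
    return out

# Hypothetical monthly water balance in mm (precipitation minus PET)
balance = [10, -20, -30, 5, 15, -40, -10, 20]

acc1 = accumulate(balance, 1)
acc3 = accumulate(balance, 3)

print("1-month:", acc1)
print("3-month:", acc3)
```

Longer windows smooth out single wet or dry months, which is why the same period can look very different across accumulation windows.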

# Update existing selectors
previous_selection = widgets_handler.read_json_to_dict('selection.json')

# Set up widgets with previous settings
country_selector.value = previous_selection.get('country', placeholders['country'])
adm1_subarea_selector.value = previous_selection.get('adm1_subarea', placeholders['adm1_subarea'])
adm2_subarea_selector.value = previous_selection.get('adm2_subarea', placeholders['adm2_subarea'])
twenty_years_selector.value = previous_selection.get('twenty_years', placeholders['twenty_years'])
accumulation_windows_multiple_selector.value = previous_selection.get('accumulation_windows_multiple')
accumulation_windows_widgets_btn.on_click(on_button_clicked)

# Display widgets
display(country_selector, adm1_subarea_selector, adm2_subarea_selector, twenty_years_selector, accumulation_windows_multiple_selector, accumulation_windows_widgets_btn, output_area)

The subset data obtained from the selection is structured differently from what we’ve worked with so far. It’s a dictionary where the keys represent the accumulation windows, and the values are the corresponding xarray datasets.

However, this doesn’t alter our approach to the analysis. We simply iterate over the dictionary, applying our usual functions to each subset it contains, to review and clean the data as needed:

for subset_index in subset_data.keys():
    index = f"SPEI{accumulation_windows[subset_index]}"
    display(Markdown(f"\n\n***** **{index}** *****"))
    data_preprocess.display_data_details(active_btn, selected, subset_data[subset_index][index])
    print(f"\n")

***** SPEI1 *****

Country:  South Sudan
ADM1 subarea:  no adm1 subarea selected...
ADM2 subarea:  no adm2 subarea selected...
Year range:  2003-2023
accumulation_window:  12 months 

Time values in the subset: 253
Latitude values in the subset: 36
Longitude values in the subset: 48 

Data sample:  [[-0.13749638 -0.53352617 -0.63534763 -0.08522363 -0.21464767]
 [-0.15039811 -0.53314931 -0.27074458 -0.12734775 -0.03267155]
 [-0.59265133 -0.48203539 -0.14652794  0.07665202  0.22500588]
 [-0.68890869 -0.60710107 -0.22547872 -0.13108494  0.24221635]
 [-0.71595429 -0.71630059 -0.42140953 -0.3105275   0.04454926]]

***** SPEI6 *****

Country:  South Sudan
ADM1 subarea:  no adm1 subarea selected...
ADM2 subarea:  no adm2 subarea selected...
Year range:  2003-2023
accumulation_window:  12 months 

Time values in the subset: 253
Latitude values in the subset: 36
Longitude values in the subset: 48 

Data sample:  [[ 0.35685112  0.23813132  0.43483293  0.53181604  1.22459193]
 [ 0.48343057  0.64714322  0.17200884  0.31250313  0.95711225]
 [ 0.26635871  0.45428736  0.40258189  0.99706936 -0.17128737]
 [ 0.49590106  0.38232799  0.84229587  0.36533856  0.32816143]
 [ 0.38390663  0.3740498   0.9199373   0.58035906  0.59679102]]

***** SPEI12 *****

Country:  South Sudan
ADM1 subarea:  no adm1 subarea selected...
ADM2 subarea:  no adm2 subarea selected...
Year range:  2003-2023
accumulation_window:  12 months 

Time values in the subset: 253
Latitude values in the subset: 36
Longitude values in the subset: 48 

Data sample:  [[-0.00723115 -0.33742989  0.09283218 -0.20119271  0.58562874]
 [ 0.10346959  0.36178612 -0.27015748 -0.36490642  0.10695831]
 [-0.82519553  0.0786707  -0.44669251  0.31100318 -0.65462796]
 [-0.20311233 -0.17688248  0.30441433 -0.4889074  -0.22180889]
 [ 0.1458042  -0.2456796   0.42952066 -0.20736085 -0.02492739]]

processed_subset = {}
for subset_index in subset_data.keys():
    index = f"SPEI{accumulation_windows[subset_index]}"
    processed_subset[subset_index], change_summary = data_preprocess.process_datarray(subset_data[subset_index][index])
    display(Markdown(f"\n\n***** **{index}** *****"))
    display(processed_subset[subset_index])
    display(Markdown('**Change summary:**'))
    for key, val in change_summary.items():
        print(key, val)
    print("\n")

***** SPEI1 *****

<xarray.DataArray 'SPEI1' (time: 252, lat: 36, lon: 48)> Size: 3MB
dask.array<getitem, shape=(252, 36, 48), dtype=float64, chunksize=(1, 36, 48), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float64 384B 24.25 24.5 24.75 25.0 ... 35.25 35.5 35.75 36.0
  * lat      (lat) float64 288B 3.5 3.75 4.0 4.25 4.5 ... 11.5 11.75 12.0 12.25
  * time     (time) datetime64[ns] 2kB 2003-01-01T06:00:00 ... 2023-12-01T06:...
Attributes:
    long_name:  Standardized Drought Index (SPEI1)
    units:      -

Change summary:

invalid_values_replaced 506
invalid_ratio 0.12
duplicates_removed 1
cftime_conversions 0

***** SPEI6 *****

<xarray.DataArray 'SPEI6' (time: 252, lat: 36, lon: 48)> Size: 3MB
dask.array<getitem, shape=(252, 36, 48), dtype=float64, chunksize=(1, 36, 48), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float64 384B 24.25 24.5 24.75 25.0 ... 35.25 35.5 35.75 36.0
  * lat      (lat) float64 288B 3.5 3.75 4.0 4.25 4.5 ... 11.5 11.75 12.0 12.25
  * time     (time) datetime64[ns] 2kB 2003-01-01T06:00:00 ... 2023-12-01T06:...
Attributes:
    long_name:  Standardized Drought Index (SPEI6)
    units:      -

Change summary:

invalid_values_replaced 506
invalid_ratio 0.12
duplicates_removed 1
cftime_conversions 0

***** SPEI12 *****

<xarray.DataArray 'SPEI12' (time: 252, lat: 36, lon: 48)> Size: 3MB
dask.array<getitem, shape=(252, 36, 48), dtype=float64, chunksize=(1, 36, 48), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float64 384B 24.25 24.5 24.75 25.0 ... 35.25 35.5 35.75 36.0
  * lat      (lat) float64 288B 3.5 3.75 4.0 4.25 4.5 ... 11.5 11.75 12.0 12.25
  * time     (time) datetime64[ns] 2kB 2003-01-01T06:00:00 ... 2023-12-01T06:...
Attributes:
    long_name:  Standardized Drought Index (SPEI12)
    units:      -

Change summary:

invalid_values_replaced 506
invalid_ratio 0.12
duplicates_removed 1
cftime_conversions 0

To create our chart, we need to condense the information. We’ll do this by calculating the median for each selected accumulation window.

stat_values = {}
for subset_index in processed_subset.keys():
    stat_values[subset_index] = data_preprocess.compute_stats(processed_subset[subset_index], full_stats=False)

Here is what you can observe thanks to this chart:

  • Trend: identify long-term trends in drought severity by observing how the medians change over time across different accumulation windows. This helps in understanding whether drought conditions are worsening, improving, or remaining stable.

  • Window Comparison: compare the medians across various accumulation windows (e.g., 1 month, 6 months, 12 months) to understand how short-term versus long-term drought conditions vary. This can highlight differences in drought intensity based on the time scale of accumulation.

charts.create_combined_areachart(stat_values, selected, placeholders)
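If you want a number to go with the visual impression of a trend, one simple option is the least-squares slope of the medians over time, computed separately for each accumulation window. The sketch below uses made-up medians and a hypothetical helper `linear_slope`. It is one possible approach, not something the notebook's functions return.

```python
def linear_slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Hypothetical yearly medians per accumulation window (not real data)
medians_by_window = {
    "SPEI1": [0.0, -0.5, 0.4, -0.6, 0.1, -0.7],
    "SPEI12": [0.2, 0.1, 0.0, -0.1, -0.2, -0.3],
}

slopes = {name: linear_slope(vals) for name, vals in medians_by_window.items()}
for name, slope in slopes.items():
    print(f"{name}: {slope:+.3f} per step")
```

A negative slope means the medians drift toward drier conditions over the period. Comparing slopes across windows shows whether that drift is visible at short time scales, long ones, or both.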

This chart concludes the first leg of our journey into understanding the characteristics of drought. From here, you can either continue in this direction with a new notebook on mapping current global drought events, or move on and delve deeper into the methods for determining the severity of a drought event. We look forward to seeing you there.

Credits#

The list of countries, subareas, and their boundaries is obtained from the geoBoundaries Global Database of Political Administrative Boundaries.