Chapter 2: Taking Earth’s Temperature#

Part I: It’s getting hot(ter) in here: Long-Term Development of Global Earth Temperature Since 1850#

Imagine a doctor’s thermometer, but one that’s designed to gauge the health of our planet. As a fever reveals a person’s illness, Earth’s rising temperature exposes a global condition that’s equally concerning. This chapter, much like a medical investigation, unfolds the diagnosis of Earth’s thermal well-being.

The previous chapter delved into the enigmatic world of greenhouse gases (GHGs), namely carbon dioxide and methane, elucidating how their concentrations are increasing in our atmosphere. Now, we turn our lens to scrutinise the surface temperature of our Earth, which is also on the rise, a tell-tale symptom of increasing GHGs.

We have divided this chapter into three notebooks, each focusing on a distinctive aspect:

  1. Long-Term Development of Global Earth Temperature Since 1850 (this notebook)

  2. Comparing Reanalysis with Observations since 1950

  3. Visualising Recent Temperature Anomalies

Objective:
Our aim here is to perform a historical examination, comparing global surface temperature from 1850 to 2022 using various authoritative data sources. We will stitch together monthly resolved time series that trace the average surface temperature anomaly over the past ~170 years. The grand finale? A common plot that showcases these time series.

Here’s an overview of the datasets we’ll employ:

| Dataset | Spatial Coverage | Spatial Resolution | Temporal Coverage | Temporal Resolution | Provider |
|---|---|---|---|---|---|
| NOAAGlobalTempv5 | Global | 5º x 5º | 1850 - today | Monthly | NOAA |
| Berkeley Earth | Global | 1º x 1º | 1850 - today | Monthly | Berkeley Earth |
| GISTEMPv4 | Global | 2º x 2º | 1880 - today | Monthly | NASA |
| HadCRUT5 | Global | 5º x 5º | 1850 - today | Monthly | Met Office Hadley Centre |
| ERA5 | Global | 0.25º x 0.25º | 1940 - today | Monthly | C3S/ECMWF |

Additionally, we’ll wield the land-sea mask created by the respective provider to weight the datasets accurately.

Here’s what to expect:

  • Downloading, opening, and streamlining datasets: From providers as diverse as the climate they track.

  • Handling moderate data volumes (~ 5 GB): No worries, dask will take care of the work.

  • Calculating spatially and temporally correct averages: Like taking Earth’s temperature, but with mathematics.

  • Estimating uncertainties: A vital step in this scientific expedition, achieved through ensemble members.

NOTE: Before interacting with the following notebook, please ensure you've reviewed the How to Execute the Notebooks section.
Run the tutorial via free cloud platforms: Binder, Kaggle, or Colab.

Getting Set Up: Your Toolkit#

Prepare for our planetary health check by importing all necessary packages:

# Python Standard Libraries
import os
import zipfile
import urllib.request

# Data Manipulation Libraries
import numpy as np
import pandas as pd
import xarray as xr
import regionmask as rm
import dask

# Visualization Libraries
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from dask.diagnostics.progress import ProgressBar

# Climate Data Store API for retrieving climate data
import cdsapi

The following cell ensures a consistent figure layout. If you’re running this notebook on one of the cloud platforms, the stylesheet may not be present by default. In this case you can either upload the file or ignore the following cell. It’s about styling, and while it won’t affect your calculations, aesthetics matter, don’t they?

plt.style.use("../copernicus.mplstyle")

Additionally, we instruct dask to avoid the creation of large chunks that may arise in different calculations.

dask.config.set(**{"array.slicing.split_large_chunks": True})
<dask.config.set at 0x7f6be1dc3af0>

With the regionmask package, you’ll craft a land-sea mask for any xr.DataArray, something you’ll need later if our chosen data provider doesn’t offer additional weights.

# Boolean land-sea mask
lsm = rm.defined_regions.natural_earth_v5_0_0.land_110

Here’s where your creativity shines. Define the REGIONS that captivate your curiosity. Since the data sets (except ERA5) are relatively coarse, make your regions generous enough to ensure sufficient data. While we followed the definitions used in the C3S Climate Intelligence reports, feel free to change them or to add your own regions of interest! Your playground is as vast as the planet itself:

REGIONS = {
    "Global": {"lon": slice(-180, 180), "lat": slice(-90, 90)},
    "Northern Hemisphere": {"lon": slice(-180, 180), "lat": slice(0, 90)},
    "Southern Hemisphere": {"lon": slice(-180, 180), "lat": slice(-90, 0)},
    "Europe": {"lon": slice(-25, 40), "lat": slice(34, 72)},
    "Arctic": {"lon": slice(-180, 180), "lat": slice(66.6, 90)},
}

This is the stage where you pick the time frame that serves as the climatological reference period. For instance, you could use the currently valid period of 1991-2020.

REF_PERIOD = {"time": slice("1991", "2020")}

Since we are tapping into different data sources, each with its own format and peculiarities, let’s organize the data in folders for a better overview.

file_name = {}  # dictionary containing [data source : file name]

# Berkeley Earth
file_name.update({"berkeley": "temperature_berkeley.nc"})  

# GISTEMP
file_name.update({"gistemp_250km": "temperature_gistemp_250km.gz"})  # higher resolution
file_name.update({"gistemp_1200km": "temperature_gistemp_1200km.gz"})  # lower resolution
file_name.update({"gistemp_lsm": "temperature_gistemp_lsm.txt"})  # land sea mask

# HadCRUT
file_name.update({"hadcrut": "temperature_hadcrut.nc"})
file_name.update({"hadcrut_lsm": "temperature_hadcrut_lsm.nc"})  # land sea mask
file_name.update({"hadcrut_members": "temperature_hadcrut_ensemble_members.zip"})  # ensemble members

# ERA5
file_name.update({"era5": "temperature_era5.nc"})

# Create the paths to the files
path_to = {}
for source, file in file_name.items():
    root = "data/{:}/".format(source.split("_")[0])
    path_to.update({source: os.path.join(root, file)})

Create necessary directories if they don’t exist:

for file, path in path_to.items():
    os.makedirs(os.path.dirname(path), exist_ok=True)
    print("{:<15} --> {}".format(file, path))
berkeley        --> data/berkeley/temperature_berkeley.nc
gistemp_250km   --> data/gistemp/temperature_gistemp_250km.gz
gistemp_1200km  --> data/gistemp/temperature_gistemp_1200km.gz
gistemp_lsm     --> data/gistemp/temperature_gistemp_lsm.txt
hadcrut         --> data/hadcrut/temperature_hadcrut.nc
hadcrut_lsm     --> data/hadcrut/temperature_hadcrut_lsm.nc
hadcrut_members --> data/hadcrut/temperature_hadcrut_ensemble_members.zip
era5            --> data/era5/temperature_era5.nc

As we will see, the various data also come with different conventions regarding dimension names and coordinates. We make our work much easier by ensuring at the beginning that all data is in the same format. In our case, we want to streamline datasets as follows:

  • Dimension names are (time, lon, lat)

  • The monthly resolved time coordinate is in datetime format and fixed to the beginning of the month

  • The lon and lat coordinates are sorted by their values

  • The lon coordinate is defined from -180 to +180º (as opposed to 0 to 360º)

The following function streamline_coords will take care of these operations.

def streamline_coords(da):
    """Streamline the dimensions and coordinates of a DataArray.

    Parameters
    ----------
    da : xr.DataArray
        The DataArray to streamline.
    """

    # Ensure that time coordinate is fixed to the first day of the month
    if "time" in da.coords:
        # if already datetime, just ensure that it is the first day of the month
        if pd.api.types.is_datetime64_any_dtype(da.time):
            da.coords["time"] = da["time"].to_index().to_period("M").to_timestamp()
        # if float, convert to datetime
        elif da.time.dtype == float:
            first_year = int(da.time.values[0])
            time_coord = xr.cftime_range(start=f"{first_year}-01-01", periods=da.time.size, freq="MS").to_datetimeindex()
            da.coords["time"] = time_coord
        # else ¯\_(ツ)_/¯
        else:
            raise ValueError(
                f"Time coordinate is of type {da.time.dtype}, but must be either datetime or float."
            )

    # Ensure that spatial coordinates are called 'lon' and 'lat'
    if "longitude" in da.coords:
        da = da.rename({"longitude": "lon"})
    if "latitude" in da.coords:
        da = da.rename({"latitude": "lat"})

    # Ensure that lon/lat are sorted in ascending order
    da = da.sortby("lat")
    da = da.sortby("lon")

    # Ensure that lon is in the range [-180, 180]
    lon_min = da["lon"].min()
    lon_max = da["lon"].max()
    if lon_min < -180 or lon_max > 180:
        da.coords["lon"] = (da.coords["lon"] + 180) % 360 - 180
        da = da.sortby(da.lon)

    return da

Download and streamline the data#

Embarking on the setup and organization of our data, we’re now venturing into the realms of real-world numbers and figures. Here, patience is a virtue, as the time to request and download the data may vary. Typically, this process should take less than 30 minutes.

NOAA GlobalTemp#

The National Oceanic and Atmospheric Administration (NOAA) of the United States is our first destination. Start by opening the NOAAGlobalTemp data website at NOAA, then access the data through THREDDS. Click on the link to the data file, which brings you to an overview page showing different ways to access the data. Here we select the OPeNDAP protocol by copying the link shown, which we pass to xarray’s open_dataset, turning what could be a complex task into a simple two-liner.

url_to_noaa = "https://www.ncei.noaa.gov/thredds/dodsC/noaa-global-temp-v5.1/NOAAGlobalTemp_v5.1.0_gridded_s185001_e202307_c20230808T112655.nc"
noaa = xr.open_dataset(url_to_noaa)
noaa
<xarray.Dataset>
Dimensions:  (time: 2083, lat: 36, lon: 72, z: 1)
Coordinates:
  * time     (time) datetime64[ns] 1850-01-01 1850-02-01 ... 2023-07-01
  * lat      (lat) float32 -87.5 -82.5 -77.5 -72.5 -67.5 ... 72.5 77.5 82.5 87.5
  * lon      (lon) float32 2.5 7.5 12.5 17.5 22.5 ... 342.5 347.5 352.5 357.5
  * z        (z) float32 0.0
Data variables:
    anom     (time, z, lat, lon) float32 ...
Attributes: (12/66)
    Conventions:                     CF-1.6, ACDD-1.3
    title:                           NOAA Merged Land Ocean Global Surface Te...
    summary:                         NOAAGlobalTemp is a merged land-ocean su...
    institution:                     DOC/NOAA/NESDIS/National Centers for Env...
    id:                               gov.noaa.ncdc:C00934 
    naming_authority:                 gov.noaa.ncei 
    ...                              ...
    time_coverage_duration:          P173Y7M
    references:                      Vose, R. S., et al., 2012: NOAAs merged ...
    climatology:                     Climatology is based on 1971-2000 monthl...
    acknowledgment:                  The NOAA Global Surface Temperature Data...
    date_modified:                   2023-08-08T15:26:56Z
    date_issued:                     2023-08-08T15:26:56Z

Note that the data comes with an additional z dimension, which contains only one coordinate representing the surface level. Let’s select the surface level and drop the redundant dimension before we streamline our dataset:

noaa = noaa.isel(z=0, drop=True)
noaa = streamline_coords(noaa)
noaa
<xarray.Dataset>
Dimensions:  (time: 2083, lat: 36, lon: 72)
Coordinates:
  * time     (time) datetime64[ns] 1850-01-01 1850-02-01 ... 2023-07-01
  * lat      (lat) float32 -87.5 -82.5 -77.5 -72.5 -67.5 ... 72.5 77.5 82.5 87.5
  * lon      (lon) float32 -177.5 -172.5 -167.5 -162.5 ... 167.5 172.5 177.5
Data variables:
    anom     (time, lat, lon) float32 ...
Attributes: (12/66)
    Conventions:                     CF-1.6, ACDD-1.3
    title:                           NOAA Merged Land Ocean Global Surface Te...
    summary:                         NOAAGlobalTemp is a merged land-ocean su...
    institution:                     DOC/NOAA/NESDIS/National Centers for Env...
    id:                               gov.noaa.ncdc:C00934 
    naming_authority:                 gov.noaa.ncei 
    ...                              ...
    time_coverage_duration:          P173Y7M
    references:                      Vose, R. S., et al., 2012: NOAAs merged ...
    climatology:                     Climatology is based on 1971-2000 monthl...
    acknowledgment:                  The NOAA Global Surface Temperature Data...
    date_modified:                   2023-08-08T15:26:56Z
    date_issued:                     2023-08-08T15:26:56Z

As NOAA doesn’t offer any land-sea weights for this dataset, we’ll employ the ingenious functionality of regionmask to create a suitable land-sea mask ourselves.

noaa["lsm"] = lsm.mask(noaa).notnull()  # Create a boolean land-sea mask (1=land, 0=sea)
noaa
<xarray.Dataset>
Dimensions:  (time: 2083, lat: 36, lon: 72)
Coordinates:
  * time     (time) datetime64[ns] 1850-01-01 1850-02-01 ... 2023-07-01
  * lat      (lat) float32 -87.5 -82.5 -77.5 -72.5 -67.5 ... 72.5 77.5 82.5 87.5
  * lon      (lon) float32 -177.5 -172.5 -167.5 -162.5 ... 167.5 172.5 177.5
Data variables:
    anom     (time, lat, lon) float32 ...
    lsm      (lat, lon) bool True True True True ... False False False False
Attributes: (12/66)
    Conventions:                     CF-1.6, ACDD-1.3
    title:                           NOAA Merged Land Ocean Global Surface Te...
    summary:                         NOAAGlobalTemp is a merged land-ocean su...
    institution:                     DOC/NOAA/NESDIS/National Centers for Env...
    id:                               gov.noaa.ncdc:C00934 
    naming_authority:                 gov.noaa.ncei 
    ...                              ...
    time_coverage_duration:          P173Y7M
    references:                      Vose, R. S., et al., 2012: NOAAs merged ...
    climatology:                     Climatology is based on 1971-2000 monthl...
    acknowledgment:                  The NOAA Global Surface Temperature Data...
    date_modified:                   2023-08-08T15:26:56Z
    date_issued:                     2023-08-08T15:26:56Z
Note
The mask from regionmask is based on NaturalEarth shapefiles and is always boolean (consists only of 0s and 1s). In reality, grid cells (especially on the coast) consist of a certain proportion of water and land, which ideally should be reflected. Therefore, it is advisable to use the official land-sea mask of a dataset if possible. In our application, the differences due to this simplification are negligible.
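
To get a feeling for why a fractional mask can matter, here is a minimal, purely illustrative sketch (the two grid cells and their land fractions are invented): a coastal cell that is only partly land contributes less to a land-weighted average than a boolean mask would suggest.

# Hypothetical example: one fully-land cell and one coastal cell that is 40 % land
toy_anom = xr.DataArray([1.0, 3.0], dims="cell")               # invented anomalies
toy_boolean_mask = xr.DataArray([1.0, 1.0], dims="cell")       # regionmask-style: land yes/no
toy_fractional_mask = xr.DataArray([1.0, 0.4], dims="cell")    # provider-style: land fraction

print(toy_anom.weighted(toy_boolean_mask).mean().item())     # 2.0
print(toy_anom.weighted(toy_fractional_mask).mean().item())  # ≈ 1.57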

Berkeley Earth#

With the NOAA GlobalTemp data in our hands, we move to download data from Berkeley Earth. This dataset is available as a netCDF file, making it compatible with our process.

First, specify the URL and download the Berkeley Earth data to our designated path.

url_berkeley = 'https://berkeley-earth-temperature.s3.us-west-1.amazonaws.com/Global/Gridded/Land_and_Ocean_LatLong1.nc'
urllib.request.urlretrieve(url_berkeley, path_to['berkeley'])

Then, open the dataset and call the streamline_coords function to structure the coordinates.

berkeley = xr.open_dataset(path_to["berkeley"])
berkeley = streamline_coords(berkeley)
berkeley
<xarray.Dataset>
Dimensions:      (lon: 360, lat: 180, time: 2079, month_number: 12)
Coordinates:
  * lon          (lon) float32 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * lat          (lat) float32 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * time         (time) datetime64[ns] 1850-01-01 1850-02-01 ... 2023-03-01
Dimensions without coordinates: month_number
Data variables:
    land_mask    (lat, lon) float64 ...
    temperature  (time, lat, lon) float32 ...
    climatology  (month_number, lat, lon) float32 ...
Attributes:
    Conventions:           Berkeley Earth Internal Convention (based on CF-1.5)
    title:                 Native Format Berkeley Earth Surface Temperature A...
    history:               20-Apr-2023 07:02:14
    institution:           Berkeley Earth Surface Temperature Project
    land_source_history:   05-Apr-2023 08:20:01
    ocean_source_history:  20-Apr-2023 05:22:16
    comment:               This file contains Berkeley Earth surface temperat...
Note
Berkeley Earth data is also available in the Climate Data Store (CDS), but with ocean values masked. If you need land-only averages, the version in CDS can be used. However, note that the CDS does not provide a land-sea mask.

GISTEMP#

As our temperature exploration continues, we now turn to GISTEMP (Goddard Institute for Space Studies Surface Temperature Analysis). Here, we will handle two different versions of the data: the 250km smoothing version and the 1200km version. The former reveals detailed patterns, while the latter offers a smoother view and greater spatial coverage. Following the approach of Simmons et al. (2016), we’ll use the 250km version as our foundation and fill in missing values from the 1200km version.

  1. Downloading the GISTEMP Data and land-sea mask

url_gistemp_1200km = 'https://data.giss.nasa.gov/pub/gistemp/gistemp1200_GHCNv4_ERSSTv5.nc.gz'
url_gistemp_250km = 'https://data.giss.nasa.gov/pub/gistemp/gistemp250_GHCNv4.nc.gz'
url_gistemp_land_sea_mask = 'https://data.giss.nasa.gov/pub/gistemp/landmask.2degx2deg.txt'

urllib.request.urlretrieve(url_gistemp_1200km, path_to['gistemp_1200km'])
urllib.request.urlretrieve(url_gistemp_250km, path_to['gistemp_250km'])
urllib.request.urlretrieve(url_gistemp_land_sea_mask, path_to['gistemp_lsm'])
('data/gistemp/temperature_gistemp_lsm.txt',
 <http.client.HTTPMessage at 0x7f6b4468d4e0>)
  2. Loading and Combining the Datasets

Open both versions of the dataset and combine them, filling missing values in the 250 km version with values from the 1200 km version.

with xr.open_dataset(path_to['gistemp_1200km']) as gistemp_1200:
    gistemp_1200 = gistemp_1200["tempanomaly"]
with xr.open_dataset(path_to['gistemp_250km']) as gistemp_250:
    gistemp_250 = gistemp_250["tempanomaly"]
gistemp = gistemp_250.where(gistemp_250.notnull(), other=gistemp_1200)
  3. Loading the Land-Sea Mask and Streamlining Coordinates

Load the land-sea mask from the downloaded text file, merge it with the temperature data, and streamline the coordinates.

gistemp_lsm = pd.read_csv(
    path_to["gistemp_lsm"],
    sep=r"\s+",
    header=1,
    names=["lon", "lat", "mask"],
)
gistemp_lsm = gistemp_lsm.set_index(["lat", "lon"])
gistemp_lsm = gistemp_lsm.to_xarray()
gistemp = xr.merge([gistemp, gistemp_lsm["mask"]])
gistemp = streamline_coords(gistemp)
gistemp
<xarray.Dataset>
Dimensions:      (lat: 90, lon: 180, time: 1723)
Coordinates:
  * lat          (lat) float32 -89.0 -87.0 -85.0 -83.0 ... 83.0 85.0 87.0 89.0
  * lon          (lon) float32 -179.0 -177.0 -175.0 -173.0 ... 175.0 177.0 179.0
  * time         (time) datetime64[ns] 1880-01-01 1880-02-01 ... 2023-07-01
Data variables:
    tempanomaly  (time, lat, lon) float32 nan nan nan nan ... 0.24 0.24 0.24
    mask         (lat, lon) float64 1.0 1.0 1.0 1.0 1.0 ... 0.0 0.0 0.0 0.0 0.0
Attributes:
    long_name:     Surface temperature anomaly
    units:         K
    cell_methods:  time: mean
  4. Inspecting the Differences

Plot the first time step of the 250 km and 1200 km versions side by side to get a visual feel for their differences.

fig, axes = plt.subplots(ncols=2, figsize=(13, 3.5))
gistemp_250.isel(time=0).plot(ax=axes[0])
gistemp_1200.isel(time=0).plot(ax=axes[1])
axes[0].set_title('GISTEMP 250 km')
axes[1].set_title('GISTEMP 1200 km')
plt.show()
[Figure: GISTEMP temperature anomalies for the first time step — 250 km smoothing (left) vs 1200 km smoothing (right).]

HadCRUT5#

The HadCRUT5 dataset provided by the Met Office allows us to examine global climate anomalies and uncertainties. It includes:

  • A gridded (mean) version

  • A land-sea mask

  • Individual ensemble members, which help to quantify the uncertainty of the analysis.

  1. Download the Data

On the website of HadCRUT5, you’ll find two main versions:

  1. HadCRUT5 analysis time series

  2. HadCRUT5 analysis gridded data

Since we’re interested in computing the time series ourselves in order to adapt them to our region of interest, we choose the gridded data. Let’s copy the corresponding URLs and download the data:

url_hadcrut = 'https://www.metoffice.gov.uk/hadobs/hadcrut5/data/current/analysis/HadCRUT.5.0.1.0.analysis.anomalies.ensemble_mean.nc'
url_hadcrut_weights = 'https://www.metoffice.gov.uk/hadobs/hadcrut5/data/current/analysis/HadCRUT.5.0.1.0.analysis.weights.nc'
url_hadcrut_ens_members = 'https://www.metoffice.gov.uk/hadobs/hadcrut5/data/current/analysis/HadCRUT.5.0.1.0.analysis.anomalies.1_to_10_netcdf.zip'
urllib.request.urlretrieve(url_hadcrut, path_to['hadcrut'])
urllib.request.urlretrieve(url_hadcrut_weights, path_to['hadcrut_lsm'])
urllib.request.urlretrieve(url_hadcrut_ens_members, path_to['hadcrut_members'])
('data/hadcrut/temperature_hadcrut_ensemble_members.zip',
 <http.client.HTTPMessage at 0x7f9c05e605b0>)
  2. Unzip the Ensemble Members

We’ll need to extract the downloaded files:

with zipfile.ZipFile('data/hadcrut/temperature_hadcrut_ensemble_members.zip') as z:
    z.extractall('data/hadcrut/')
  3. Load and Organize the Data

Load the datasets and streamline them for further analysis:

with xr.open_dataset(path_to["hadcrut"]) as hadcrut:
    pass
with xr.open_dataset(path_to["hadcrut_lsm"]) as hadcrut_weights:
    pass
hadcrut_members = xr.open_mfdataset("data/hadcrut/*analysis*.nc", combine="nested", concat_dim="realization")
hadcrut_members = hadcrut_members.load()  # load members into memory
hadcrut = xr.Dataset(
    {
        "mean": hadcrut["tas_mean"],
        "weights": hadcrut_weights["weights"],
        "ensemble": hadcrut_members["tas"],
    }
)
hadcrut = streamline_coords(hadcrut)
hadcrut
<xarray.Dataset>
Dimensions:      (time: 2082, lat: 36, lon: 72, realization: 10)
Coordinates:
  * time         (time) datetime64[ns] 1850-01-01 1850-02-01 ... 2023-06-01
  * lat          (lat) float64 -87.5 -82.5 -77.5 -72.5 ... 72.5 77.5 82.5 87.5
  * lon          (lon) float64 -177.5 -172.5 -167.5 -162.5 ... 167.5 172.5 177.5
  * realization  (realization) int64 1 10 2 3 4 5 6 7 8 9
Data variables:
    mean         (time, lat, lon) float64 ...
    weights      (time, lat, lon) float64 ...
    ensemble     (realization, time, lat, lon) float64 nan nan ... 0.3561 0.3569
Note
The uncertainty estimation doesn't consider the imprecision in the spatial mean due to limited spatial coverage. Be aware of this when interpreting the results, especially for early time periods.
Note
The full HadCRUT5 analysis comprises 200 ensemble members, but we're using only the first 10 here. While this reduces processing time, be aware that it may lead to an underestimation of uncertainty.

ERA5 reanalysis#

Now, we load ERA5 from the Climate Data Store (CDS) using the cdsapi, including the land-sea mask.

  1. Set Up Your CDS API Key

You’ll need a specific key to access the CDS programmatically:

URL = 'https://cds.climate.copernicus.eu/api/v2'
KEY = '##################################'  # add your key here; the format should be {uid}:{api-key}
New to CDS? Consider the CDS tutorial for a detailed guide.
  2. Retrieve the Data

Use the following code to pull the data from CDS:

c = cdsapi.Client(url=URL, key=KEY)

c.retrieve(
  'reanalysis-era5-single-levels-monthly-means',
  {
    'format': 'netcdf',
    'product_type': 'monthly_averaged_reanalysis',
    'variable': ['2m_temperature', 'land_sea_mask'],
    'year': list(range(1950, 2023)),
    'month': list(range(1, 13)),
    'time': '00:00',
  },
  path_to['era5']
)

Unlike the other datasets, ERA5 provides absolute temperatures in Kelvin rather than anomalies. Convert them to Celsius for easier interpretation:

with xr.open_mfdataset(path_to["era5"]) as era5:
    # convert from Kelvin to Celsius
    era5["t2m"] = era5["t2m"] - 273.15
era5 = streamline_coords(era5)
era5
<xarray.Dataset>
Dimensions:  (lon: 1440, lat: 721, time: 876)
Coordinates:
  * lon      (lon) float32 -180.0 -179.8 -179.5 -179.2 ... 179.2 179.5 179.8
  * lat      (lat) float32 -90.0 -89.75 -89.5 -89.25 ... 89.25 89.5 89.75 90.0
  * time     (time) datetime64[ns] 1950-01-01 1950-02-01 ... 2022-12-01
Data variables:
    t2m      (time, lat, lon) float32 dask.array<chunksize=(876, 27, 720), meta=np.ndarray>
    lsm      (time, lat, lon) float32 dask.array<chunksize=(876, 27, 720), meta=np.ndarray>
Attributes:
    Conventions:  CF-1.6
    history:      2023-07-18 21:45:16 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...

Calculation of Spatial Averages#

In this section, we’ll delve into the process of calculating spatial averages from different datasets, specifically accounting for two essential factors:

  • Longitude-Latitude Grid Representation: The data sits on a regular longitude-latitude grid, so grid cells cover less and less area towards the poles; without area weighting, high latitudes would be over-represented in the average (see the short illustration after this list).

  • Land Proportion Weighting: When averaging over land, it’s necessary to weight the grid points according to the proportion of land in each cell.
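
As a quick numerical illustration (a throwaway sketch, separate from the main workflow), the relative area of a grid cell on a regular grid scales with the cosine of its latitude:

import numpy as np

# Relative cell area on a regular lon-lat grid scales with cos(latitude)
for lat in [0, 30, 60, 80]:
    print(f"{lat:>2}º latitude -> relative cell area {np.cos(np.deg2rad(lat)):.2f}")
# 0º -> 1.00, 30º -> 0.87, 60º -> 0.50, 80º -> 0.17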

These considerations are integral to accurately representing spatial patterns and trends. Here’s how we’ll proceed:

Step 1: Define the Weighted Spatial Average Function

First, we need a function that takes into account the above factors and can calculate the spatial average, even for subregions, by specifying a land-sea mask.

def weighted_spatial_average(da, region, land_mask=None):
    """Calculate the weighted spatial average of a DataArray.

    Parameters
    ----------
    da : xr.DataArray
        The DataArray to average.
    region : dict
        Dictionary of lon/lat slices defining the region of interest.
    land_mask : xr.DataArray, optional
        Land fraction (or boolean land mask) used as additional weights.
    """
    da = da.sel(**region)

    # Area weighting: calculate the area of each grid cell
    weights = np.cos(np.deg2rad(da.lat))

    # Optionally, apply land-sea mask
    if land_mask is not None:
        land_mask = land_mask.sel(**region)
        # fill up nan values with 0 so that they don't affect the weighted mean
        land_mask = land_mask.fillna(0)
        # combine land mask with weights
        weights = weights * land_mask

    # Compute the weighted mean
    return da.weighted(weights).mean(("lat", "lon"))
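
As a quick sanity check before looping over everything (a sketch, assuming the NOAA dataset from above is still in memory), the function can be called directly, e.g. for a land-only average over Europe:

# Example usage: land-only spatial average over Europe for the NOAA dataset
noaa_europe_land = weighted_spatial_average(noaa["anom"], REGIONS["Europe"], land_mask=noaa["lsm"])
noaa_europe_land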

Step 2: Prepare Dictionaries for Datasets and Land-Sea Masks

Before we can compute the averages, we need to define dictionaries that associate the data with the corresponding land-sea masks.

temps = {
    "HadCRUT": hadcrut["mean"],
    "HadCRUT_ensemble": hadcrut["ensemble"],
    "Berkeley": berkeley["temperature"],
    "GISTEMP": gistemp["tempanomaly"],
    "NOAA": noaa["anom"],
    "ERA5": era5["t2m"],
}
land_masks = {
    "HadCRUT": hadcrut["weights"],
    "HadCRUT_ensemble": hadcrut["weights"],
    "Berkeley": berkeley["land_mask"],
    "GISTEMP": gistemp["mask"],
    "NOAA": noaa["lsm"],
    "ERA5": era5["lsm"],
}

Step 3: Iterate Over Datasets to Calculate Mean Temperatures

Finally, we’ll loop through our datasets and compute the mean temperatures for a specified region, such as the Arctic.

region = "Arctic"

temp_evolution = {}
with ProgressBar():
    for source in temps:
        spatial_average = weighted_spatial_average(
            temps[source], REGIONS[region], land_masks[source]
        )
        temp_evolution[source] = spatial_average.compute()
temp_evolution = xr.Dataset(temp_evolution)
temp_evolution
[########################################] | 100% Completed | 118.59 s
<xarray.Dataset>
Dimensions:           (time: 2082, realization: 10)
Coordinates:
  * time              (time) datetime64[ns] 1850-01-01 1850-02-01 ... 2023-06-01
  * realization       (realization) int64 1 10 2 3 4 5 6 7 8 9
Data variables:
    HadCRUT           (time) float64 -1.962 -0.6203 -1.374 ... 2.094 2.331 1.309
    HadCRUT_ensemble  (realization, time) float64 -2.02 -0.3699 ... 2.168 1.19
    Berkeley          (time) float64 -1.055 0.07 0.6092 0.2195 ... nan nan nan
    GISTEMP           (time) float64 nan nan nan nan ... 2.965 2.102 2.159 1.558
    NOAA              (time) float64 -1.497 -0.4415 -1.032 ... 1.795 2.349 1.141
    ERA5              (time) float32 nan nan nan nan nan ... nan nan nan nan nan
Note: By changing the `region`, you can easily calculate averages for any other pre-defined region, allowing for versatile analysis and comparisons.

Step 4: Aligning Datasets to a Common Reference Period

Note that while all datasets except ERA5 are already expressed as anomalies, they use different reference periods, so we cannot directly compare them with each other yet. To align them, we’ll calculate anomalies relative to the period 1991-2020.

# Show anomalies with respect to the 1991-2020 climatology
temp_evolution = temp_evolution - temp_evolution.sel(REF_PERIOD).mean("time")

Step 5: Calculate Anomalies Relative to the Pre-Industrial Era

Next, we’ll analyze the temperature increase relative to the pre-industrial era, defined here as 1850-1900. We calculate the reference value as the average of the anomalies of all datasets within this time frame, excluding the HadCRUT ensemble members.

anom_1850_1900 = temp_evolution.drop_vars("HadCRUT_ensemble").sel(
    time=slice("1850", "1900")
)
mean_1850_1900 = anom_1850_1900.to_array().mean()
mean_1850_1900

Step 6: Smooth the Time Series Data

To provide a clearer picture, we’ll calculate the 60-month (5-year) centered moving average of the time series. This helps reduce inter-annual climate fluctuations, such as those due to the El Niño-Southern Oscillation (ENSO).

temp_evolution_smooth = temp_evolution.rolling(time=60, center=True).mean()

Step 7: Assess the Uncertainty of the HadCRUT Model

Understanding uncertainty is crucial. Here, we estimate a confidence interval for HadCRUT as the minimum/maximum value across the ensemble members at each time step.

confidence_interval = temp_evolution_smooth["HadCRUT_ensemble"].quantile([0.0, 1.0], dim="realization")
/home/nrieger/miniconda3/envs/tutorial/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1577: RuntimeWarning: All-NaN slice encountered
  result = np.apply_along_axis(_nanquantile_1d, axis, a, q,

Et voilà! Visualization Time#

With the calculations complete, we’re now ready to compare all the time series in a single graph. This will give us a comprehensive view of how temperature anomalies have evolved over different periods:

fig = plt.figure(figsize=(10, 3))
ax = fig.add_subplot(111)
ax2 = ax.twinx()
line_HadCRUT, = temp_evolution_smooth["HadCRUT"].plot(ax=ax, label="HadCRUT")
line_Berkeley, = temp_evolution_smooth["Berkeley"].plot(ax=ax, label="Berkeley")
line_GISTEMP, = temp_evolution_smooth["GISTEMP"].plot(ax=ax, label="GISTEMP")
line_NOAA, = temp_evolution_smooth["NOAA"].plot(ax=ax, label="NOAA")
line_ERA5, = temp_evolution_smooth["ERA5"].plot(ax=ax, label="ERA5")
area_HadCRUT = ax.fill_between(
    confidence_interval.time,
    confidence_interval.sel(quantile=0.0),
    confidence_interval.sel(quantile=1.0),
    color=plt.rcParams['axes.prop_cycle'].by_key()['color'][0], # same color (1st in the order) as the mean data from the same dataset (HadCRUT)
    alpha=0.3,
    lw=0,
    zorder=-1,
)

ax.legend(
    [(line_HadCRUT, area_HadCRUT), # add in the same handle the line and patch for the HadCRUT mean and ensembles
      line_Berkeley, line_GISTEMP, line_NOAA, line_ERA5], 
    ['HadCRUT', 'Berkeley', 'GISTEMP', 'NOAA', 'ERA5'],
    ncols=5, frameon=False, loc="upper center"\
    )


ax2.spines["right"].set_visible(True)
ax2.spines["top"].set_visible(True)
ax.xaxis.set_major_locator(mdates.YearLocator(20))

ax.set_xlabel("")
ax.set_ylabel("")
ax.set_title(f"{region} Mean Temperature Anomaly (in ºC) since 1850")
ax.text(
    -0.02,
    1,
    "Relative to \n1991-2020",
    rotation=0,
    ha="right",
    va="top",
    transform=ax.transAxes,
)
ax.text(
    1.02,
    1,
    "Increase above \n1850-1900\n reference level",
    rotation=0,
    ha="left",
    va="top",
    transform=ax.transAxes,
)
ax.axhline(mean_1850_1900, color=".5", lw=0.5, ls="--")

yticks = np.arange(-10, 10, 1)
ax2_yticks = yticks + mean_1850_1900.item()
ax2.set_yticks(ax2_yticks)
ax2.set_yticklabels(yticks)
ax.set_ylim(-4.1, 1.8)
ax2.set_ylim(-4.1, 1.8)

plt.show()
[Figure: Arctic mean temperature anomaly (in ºC) since 1850 for all five datasets, with the HadCRUT ensemble range shaded.]

Understanding Model Discrepancies#

Let’s kick off by observing that the datasets generally exhibit a high degree of agreement from the middle of the last century onwards. To quantify this, we calculate the spread, i.e. the range between the highest and lowest values at each time step. Note how the differences between datasets tend to grow as we move further back in time.

qs = temp_evolution_smooth.drop_vars(["HadCRUT_ensemble"]).to_array().quantile([0, 1], 'variable')
plt.figure(figsize=(10, 3))
qs.diff('quantile').plot.line(x='time', add_legend=False)
plt.ylim(0, 2.0)
plt.ylabel("Model spread (ºC)")
plt.xlabel("")
plt.title("Disagreement between models decreases over time")
plt.show()
/home/nrieger/miniconda3/envs/tutorial/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1577: RuntimeWarning: All-NaN slice encountered
  result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
[Figure: Spread between datasets (ºC) over time — the disagreement between datasets decreases towards the present.]

Why do these differences occur? There are many possible reasons, but one significant factor stands out: as we move further back in time, there were fewer observation stations, leading to limited spatial coverage. Let’s illustrate this using Berkeley Earth as our muse.

fig = plt.figure(figsize=(15, 5))
ax = [plt.subplot(1, 3, i+1, projection=ccrs.Robinson()) for i in range(3)]
kwargs = dict(transform=ccrs.PlateCarree(), cmap='RdBu_r', vmin=-8, vmax=8, add_colorbar=False, zorder=3)
berkeley['temperature'].sel(time='1850-01').plot(ax=ax[0], **kwargs)
berkeley['temperature'].sel(time='1900-01').plot(ax=ax[1], **kwargs)
berkeley['temperature'].sel(time='1950-01').plot(ax=ax[2], **kwargs)
for a in ax:
    a.add_feature(cfeature.LAND, lw=.5, color='.3', zorder=1)
    a.add_feature(cfeature.OCEAN, lw=.5, color='.7', zorder=2)
    a.coastlines(lw=.5, color='.3', zorder=4)
    a.set_global()
    a.set_title('')
ax[0].set_title('1850', loc='center')
ax[1].set_title('1900', loc='center')
ax[2].set_title('1950', loc='center')
plt.show()
[Figure: Berkeley Earth temperature anomalies for January 1850, 1900, and 1950, showing how spatial coverage increases over time.]

Reiterating an essential point: the uncertainty displayed for the HadCRUT dataset is likely an underestimate. Part of this can be attributed to the reduced ensemble size used here; another part stems from the limited spatial coverage at the beginning of the time series.
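
If you want to see the coverage effect directly, a rough diagnostic (a sketch, assuming the hadcrut dataset is still in memory; the fraction below is a simple cell count, not area-weighted) is to compute the share of grid cells that actually contain data at each time step:

# Fraction of HadCRUT grid cells containing data per time step (rough coverage proxy)
coverage = hadcrut["mean"].notnull().mean(("lat", "lon"))
plt.figure(figsize=(10, 2))
coverage.plot()
plt.ylabel("Fraction of cells\nwith data")
plt.title("HadCRUT5 spatial data coverage over time")
plt.show()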

Wrapping Up with Earth’s Temperature#

Finally, we’re going to address the burning question: Just how warm is our dear Earth?

By using absolute temperatures from ERA5, let’s compute the average temperature for the reference period (1991-2020) over a region of our choice.

# Taking Earth's mean temperature
region = "Arctic"  # <--- define the region here

land_mask = era5["lsm"]
clim_temp_era5 = era5["t2m"].sel(REF_PERIOD)  # consider the reference period
clim_temp_era5 = weighted_spatial_average(clim_temp_era5, REGIONS[region], land_mask=land_mask)

with ProgressBar():
    clim_temp_era5 = clim_temp_era5.compute()  # compute the result
[########################################] | 100% Completed | 95.00 s

Now that we have monthly mean temperatures for our reference period, computing the temporal average correctly requires weighting each month by its respective number of days. Let’s do this:

days_in_month = clim_temp_era5.time.dt.days_in_month
clim_temp_era5 = (clim_temp_era5 * days_in_month).sum() / days_in_month.sum()
print(f'{region} mean temperature between 1991 and 2020 was {clim_temp_era5.item():.1f} °C.')
Arctic mean temperature between 1991 and 2020 was -12.7 °C.

In Retrospect…#

And there you have it! From discerning Earth’s surface temperature changes since 1850, grappling with an array of datasets using handy tools like dask and regionmask, to calculating weighted averages spatially and temporally, we’ve covered a lot of ground together.

In the following part, we focus on comparing ERA5 surface temperatures in Europe with observational data.