Real-time Forecasts - Open Data Visio

Here we will focus on the latest forecast of 2 metre temperature of the AIFS-Single over the world over next seven days.

1. Set Up Your Environment and Find ECMWF Open Data¶

Open data will be downloaded from a publicly available Amazon S3 Bucket. First, the following Python libraries need to be installed in the current Jupyter kernel:

ecmwf-opendata to download data,
earthkit to analyse data,
xarray for multi-dimensional arrays of geospatial data,
pandas to perform powerful operations on datasets,
geopandas to handle geographic data of pandas objects,
xagg to aggregate gridded data to polygons, and
plotly for interactive data visualization.

If the packages are not installed yet, uncomment the code below and run it.

# !pip3 install earthkit ecmwf-opendata xarray pandas geopandas xagg plotly

from ecmwf.opendata import Client
import earthkit.data as ekd

import os

import xarray as xr
import pandas as pd
import geopandas as gpd
import xagg as xa
import plotly.express as px

Data and plots directories¶

DATADIR = './data_dir/'
os.makedirs(DATADIR, exist_ok=True)

PLOTSDIR = './plots/'
os.makedirs(PLOTSDIR, exist_ok=True)

When using the ls() method, a list of all the fields in the file we downloaded will be displayed.

filename = f"{DATADIR}2t_0-168h.grib2"

client = Client(source="aws")
client.retrieve(
    date=0,
    time=0,
    step=[step for step in range(0, 174, 6)],
    stream="oper",
    type="fc",
    levtype="sfc",
    levelist=[],
    param="2t",
    model="aifs-single",
    target=filename
    )
data = ekd.from_source("file", filename)
data.ls()

We will extract the date of the latest forecast from our dataset.

today = data.index("date")[0]
today

20250831

Our data can be converted to Xarray using the .to_xarray() method and the longitude coordinates brought to a [-180, 180] grid.
ECMWF data are by default on a [0, 360] grid.

ds = data.to_xarray()
ds_180 = ds.assign_coords(longitude=(((ds.longitude + 180) % 360) - 180)).sortby('longitude')

We need to convert 2-metre temperature from Kelvin to Celsius.

t2m_K = ds_180['2t']
t2m = t2m_K - 273.15
t2m = t2m.assign_attrs(t2m_K.attrs)
t2m.attrs['units'] = '° C'

The latitude and longitude columns will be renamed.

t2m = t2m.rename({'latitude':'lat','longitude':'lon'})
t2m

2. GeoJSON data¶

Our GeoJSON file containing a vector map of the world can be read with the geopandas package. The sov_a3 column will be renamed.
When using the head() method, a selected number of rows n (n=5 by default) and information about the fields will be displayed.

countries = gpd.read_file(f"{DATADIR}globe.geojson")
countries = countries[['sov_a3','geometry']]
countries = countries.rename(columns={'sov_a3':'cntry_id'})
countries = countries.sort_values(by=['cntry_id'])
countries.head()

For data visualisation purposes, we will calculate the centroid of our geometry object.

x_cntry, y_cntry = gpd.GeoSeries(countries.geometry).union_all().centroid.xy
_lat = y_cntry[0]
_lon = x_cntry[0]
_lat, _lon

(1.1627400958433622, 18.216117510970225)

3. Aggregating the data¶

We need to create a weight map of pixels in the gridded dataset by taking the intersect between each country polygon and all pixel polygons and compute the average area of overlap. For each country, then we will calculate the weighted average of the 2-metre temperatures. For instance, a pixel that covers a larger area of a country will have a different weight than a pixel that covers a smaller area.

# get overlap between pixels and polygons
weightmap = xa.pixel_overlaps(t2m, countries, subset_bbox=False)
# aggregate data in t2m onto polygons
aggregated = xa.aggregate(t2m, weightmap)
ds2t = aggregated.to_dataframe()
# export the data into a .csv file
ds2t.to_csv(f"{DATADIR}agg_t2m.csv")

To read our .csv file into Pandas DataFrame, we will use the read_csv() function. Then, we will sort the data by the step and cntry_id columns.

ds2t = pd.read_csv(f"{DATADIR}agg_t2m.csv")
ds2t = ds2t.sort_values(by=['step', 'cntry_id'])
ds2t.head()

We want to display the step range as a list of numbers from 0 to 168 and thus we will update it.

step1 = ds2t.iloc[0]['step']
step1

'0 days 00:00:00'

steps = ds2t['step'].unique()
steps

array(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00',
       '0 days 18:00:00', '1 days 00:00:00', '1 days 06:00:00',
       '1 days 12:00:00', '1 days 18:00:00', '2 days 00:00:00',
       '2 days 06:00:00', '2 days 12:00:00', '2 days 18:00:00',
       '3 days 00:00:00', '3 days 06:00:00', '3 days 12:00:00',
       '3 days 18:00:00', '4 days 00:00:00', '4 days 06:00:00',
       '4 days 12:00:00', '4 days 18:00:00', '5 days 00:00:00',
       '5 days 06:00:00', '5 days 12:00:00', '5 days 18:00:00',
       '6 days 00:00:00', '6 days 06:00:00', '6 days 12:00:00',
       '6 days 18:00:00', '7 days 00:00:00'], dtype=object)

ds2t['step'] = ds2t['step'].replace({'0 days 00:00:00': '0',
                                     '0 days 06:00:00': '6',
                                     '0 days 12:00:00': '12',
                                     '0 days 18:00:00': '18',
                                     '1 days 00:00:00': '24',
                                     '1 days 06:00:00': '30',
                                     '1 days 12:00:00': '36',
                                     '1 days 18:00:00': '42',
                                     '2 days 00:00:00': '48',
                                     '2 days 06:00:00': '54',
                                     '2 days 12:00:00': '60',
                                     '2 days 18:00:00': '66',
                                     '3 days 00:00:00': '72',
                                     '3 days 06:00:00': '78',
                                     '3 days 12:00:00': '84',
                                     '3 days 18:00:00': '90',
                                     '4 days 00:00:00': '96',
                                     '4 days 06:00:00': '102',
                                     '4 days 12:00:00': '108',
                                     '4 days 18:00:00': '114',
                                     '5 days 00:00:00': '120',
                                     '5 days 06:00:00': '126',
                                     '5 days 12:00:00': '132',
                                     '5 days 18:00:00': '138',
                                     '6 days 00:00:00': '144',
                                     '6 days 06:00:00': '150',
                                     '6 days 12:00:00': '156',
                                     '6 days 18:00:00': '162',
                                     '7 days 00:00:00': '168'}, regex=True)

4. Data visualisation¶

The interactive plot and animation below show analysis of 2-metre temperature for today at 00 UTC.

# Specify a time step up to 168
step_n = '168'

# Plot the data
fig = px.choropleth(ds2t.loc[ds2t.step == step_n],
                        geojson=countries,
                        featureidkey='properties.cntry_id',
                        locations='cntry_id',
                        color='2t',
                        color_continuous_scale="Spectral_r",
                        range_color = [-40, 50])
fig.update_layout(title=f"AIFS: 2 metre temperature (˚C) on {today} (+{step_n}h)",
                  width=650,
                  height=420)
fig.show()

fig = px.choropleth_map(ds2t,
                        geojson=countries,
                        featureidkey='properties.cntry_id',
                        locations='cntry_id',
                        color='2t',
                        animation_frame = 'step',
                        animation_group='cntry_id',
                        color_continuous_scale="Spectral_r",
                        range_color = [-40, 50],
                        map_style="satellite")
fig.update_layout(title=f"AIFS: 2 metre temperature (˚C) on {today} over the next seven days",
                  map_center={"lat": _lat, "lon": _lon},
                  map_zoom=0.8,
                  height=755)
fig.show()