Retrieve Data

In this chapter we will explore different ways to retrieve the ECMWF real-time open data using Python libraries. The main focus of this handbook will be on the earthkit and ecmwf-opendata packages.

Source: https://github.com/ecmwf/earthkit

The development environment setup

The tutorials use the Python programming language and its libraries:

earthkit to speed up weather and climate science workflows
ecmwf-opendata to download the ECMWF open data
requests (or wget) for sending HTTP requests
xarray for multi-dimensional arrays of geospatial data
pandas to perform powerful operations on datasets
matplotlib for creating static and interactive visualizations
cartopy for cartographic visualizations
plotly for interactive data visualization
geopandas to work with geographic data in pandas objects
xagg for aggregating raster data over polygons

We will install the packages with !pip3 install <package name> commands. The leading ! tells Jupyter Notebook to run the rest of the line as a shell command.

1. The `earthkit` and `ecmwf-opendata` packages

Here we will retrieve ECMWF real-time open data from the ECMWF Production Data Store (ECPDS).

If the packages are not installed yet, uncomment the code below and run it.

# datetime is part of the Python standard library and does not need to be installed
# !pip3 install earthkit-data ecmwf-opendata requests

from ecmwf.opendata import Client
import earthkit.data as ekd

client = Client(source="ecmwf")
request = {
    "date" : -1,
    "time" : 0,
    "step" : 12,
    "type" : "fc",
    "stream": "oper",
    "levtype" : "sfc",
    "model" : "aifs-single",
    "param" : "2t",
}
client.retrieve(request, "2t.grib2")

ds_2t = ekd.from_source("file", "2t.grib2")
ds_2t.ls()

ds_uv = ekd.from_source("ecmwf-open-data",
                        time=12,
                        param=["u", "v"],
                        levelist=[1000, 850, 500],
                        step=0
                       )
ds_uv.ls()

data = ekd.from_source("ecmwf-open-data",
                       date=-1,
                       time=12,
                       step=0,
                       param=['msl', 'tp'],
                       stream="oper",
                       type="fc",
                       levtype="sfc",
                       model=["ifs", "aifs-single"]
                       )
data.describe()

2. The `requests` package

import requests
import datetime

DATADIR = './'
today = datetime.date.today().strftime('%Y%m%d')  # use today's date or any of the previous four days
timez = "00z/"
model = "aifs-single/"
resol = "0p25/"
stream_ = "oper"
type_ = "fc"
step = "6"
filename = f'{today}{timez[:-2]}0000-{step}h-{stream_}-{type_}.grib2'

with requests.Session() as s:
    start = datetime.datetime.now()
    # Send the request through the session opened above and stream the body
    response = s.get(f'https://data.ecmwf.int/ecpds/home/opendata/{today}/{timez}{model}{resol}{stream_}/{filename}', stream=True)
    response.raise_for_status()
    # Write the file in 10 KiB chunks to the path that is read back below
    with open(f'{DATADIR}/{filename}', mode="wb") as file:
        for chunk in response.iter_content(chunk_size=10 * 1024):
            file.write(chunk)
    end = datetime.datetime.now()
    diff = end - start
    print(f'The {filename} file was downloaded in {diff.total_seconds():.0f} seconds.')

data = ekd.from_source("file", f'{DATADIR}/{filename}')
data.ls()
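The URL used above is assembled from the date, run time, model, resolution, stream, step and type. The sketch below wraps that pattern in a small function so that other dates and steps can be requested without editing the string by hand. The helper name build_url and its default values are illustrative assumptions, not part of any ECMWF package; the root URL is the one used by the wget and curl examples in this chapter.

```python
from datetime import date, timedelta

# Root taken from the wget and curl examples in this chapter.
ROOT = "https://data.ecmwf.int/forecasts"


def build_url(day, hour=0, step=6, model="aifs-single", resol="0p25",
              stream="oper", type_="fc", root=ROOT):
    """Hypothetical helper: assemble the URL of one open-data GRIB2 file."""
    yyyymmdd = day.strftime("%Y%m%d")
    hh = f"{hour:02d}"
    filename = f"{yyyymmdd}{hh}0000-{step}h-{stream}-{type_}.grib2"
    return f"{root}/{yyyymmdd}/{hh}z/{model}/{resol}/{stream}/{filename}"


# Yesterday's 00 UTC run at step 6 h
yesterday = date.today() - timedelta(days=1)
print(build_url(yesterday))

# The IFS file requested in the wget example of this chapter
print(build_url(date(2025, 5, 25), step=24, model="ifs"))
```

The returned string can be passed directly to requests.get or to a command-line downloader.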

3. The `wget` command-line tool

To install wget on Debian-based Linux distributions, execute:

sudo apt-get install wget

For very large files, use the -b option to run the download in the background. wget then writes a wget-log file to the working directory, which you can use to check the progress and status of the download. To save the retrieved file in another directory, use the -P option:

ROOT="https://data.ecmwf.int/forecasts"
yyyymmdd="20250525"
HH="00"
model="ifs"
resol="0p25"
stream="oper"
step="24"
U="h"
type="fc"
format="grib2"
wget -P ../datadownload/ -b "${ROOT}/${yyyymmdd}/${HH}z/${model}/${resol}/${stream}/${yyyymmdd}${HH}0000-${step}${U}-${stream}-${type}.${format}"

4. The `curl` command-line tool

When you need to download a single field from a GRIB file, inspect the corresponding index file and look for the parameter you are interested in. For example, to download only the 2 m temperature at step=0h from the 00 UTC HRES forecast on 08 June 2025, find its entry in the index file:

{"domain": "g", "date": "20250608", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "0", "levtype": "sfc", "param": "2t", "_offset": 62556419, "_length": 660091}

Use the values of the _offset and _length keys to calculate start_bytes and end_bytes. The byte range is inclusive at both ends, hence the -1:

start_bytes = _offset = 62556419
end_bytes = _offset + _length - 1 = 62556419 + 660091 - 1 = 63216509
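The same lookup and arithmetic can be sketched in Python. The example assumes that each line of the index file is one JSON object like the entry shown above; the function names find_fields and byte_range are illustrative, not part of any ECMWF package.

```python
import json

# Index entry copied from the example above. In practice these lines would be
# read from the index file that accompanies the GRIB file.
index_lines = [
    '{"domain": "g", "date": "20250608", "time": "0000", "expver": "0001", '
    '"class": "od", "type": "fc", "stream": "oper", "step": "0", '
    '"levtype": "sfc", "param": "2t", "_offset": 62556419, "_length": 660091}',
]


def find_fields(lines, **wanted):
    """Return the index entries whose keys match all of the wanted values."""
    entries = (json.loads(line) for line in lines if line.strip())
    return [e for e in entries if all(e.get(k) == v for k, v in wanted.items())]


def byte_range(entry):
    """Return the inclusive (start, end) byte positions of one GRIB message."""
    start = entry["_offset"]
    end = start + entry["_length"] - 1
    return start, end


entry = find_fields(index_lines, param="2t", step="0")[0]
start_bytes, end_bytes = byte_range(entry)
print(start_bytes, end_bytes)  # 62556419 63216509
```

Note that the index stores values such as step as strings, so the filter values must be strings as well.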

ROOT="https://data.ecmwf.int/forecasts"
yyyymmdd="20250608"
HH="00"
model="ifs"
resol="0p25"
stream="oper"
step="0"
U="h"
type="fc"
format="grib2"
start_bytes=62556419
end_bytes=63216509

curl --range "${start_bytes}-${end_bytes}" "${ROOT}/${yyyymmdd}/${HH}z/${model}/${resol}/${stream}/${yyyymmdd}${HH}0000-${step}${U}-${stream}-${type}.${format}" --output 2t.grib2
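A partial download like the curl command above can also be made from Python by sending an HTTP Range header. The sketch below uses urllib from the standard library rather than requests, so it runs without extra packages; the function names range_request and download_range are illustrative assumptions. The actual download call is left commented out, so running the block only builds and prints the header.

```python
import urllib.request


def range_request(url, start_bytes, end_bytes):
    """Build a GET request for the inclusive byte range start_bytes..end_bytes."""
    headers = {"Range": f"bytes={start_bytes}-{end_bytes}"}
    return urllib.request.Request(url, headers=headers)


def download_range(url, start_bytes, end_bytes, target):
    """Fetch only the requested bytes and write them to the target file."""
    request = range_request(url, start_bytes, end_bytes)
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        out.write(response.read())


# Same file and byte range as in the curl example
url = ("https://data.ecmwf.int/forecasts/20250608/00z/ifs/0p25/oper/"
       "20250608000000-0h-oper-fc.grib2")
request = range_request(url, 62556419, 63216509)
print(request.get_header("Range"))  # bytes=62556419-63216509

# Uncomment to perform the download:
# download_range(url, 62556419, 63216509, "2t.grib2")
```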
