
Retrieve Data


In this chapter we explore different ways to retrieve ECMWF real-time open data using Python libraries. The main focus of this handbook is on the earthkit and ecmwf-opendata packages.

Figure 1: Source: https://github.com/ecmwf/earthkit

The development environment setup

The tutorials use the Python programming language and the following libraries:

  • earthkit to speed up weather and climate science workflows
  • ecmwf-opendata to download the ECMWF open data
  • numpy for scientific computing of multi-dimensional arrays
  • cfgrib to map GRIB files to the NetCDF data model
  • requests (or urllib3, wget) for sending HTTP requests
  • xarray for multi-dimensional arrays of geospatial data
  • eccodes to decode and encode GRIB/BUFR files
  • matplotlib for creating static and interactive visualizations
  • cartopy for cartographic visualizations
  • plotly (or metview) for interactive data visualization

We will install the packages using !pip3 install <package name> commands. Jupyter Notebooks execute these as shell commands because each command begins with the ! mark.
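For example, a single notebook cell like the one below installs them all (the PyPI package names are assumed here to match the import names; the earthkit data component is also published as earthkit-data):

!pip3 install earthkit ecmwf-opendata numpy cfgrib requests xarray eccodes matplotlib cartopy plotly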

1. The earthkit and ecmwf-opendata packages

Here we will retrieve ECMWF real-time open data from the ECMWF Data Store.

from ecmwf.opendata import Client
import earthkit.data as ekd

client = Client(source="ecmwf")
request = {
    "date" : -1,              # yesterday's run (relative dates count back from today)
    "time" : 0,               # 00 UTC run
    "step" : 12,              # forecast step in hours
    "type" : "fc",            # forecast
    "stream": "oper",         # operational forecast stream
    "levtype" : "sfc",        # surface-level fields
    "model" : "aifs-single",  # AIFS single (deterministic) forecast
    "param" : "2t",           # 2 metre temperature
}
client.retrieve(request, "2t.grib2")

ds_2t = ekd.from_source("file", "2t.grib2")
ds_2t.ls()
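client.retrieve() also returns a result object; as a quick check (reusing the request above), its datetime attribute reports the base time of the forecast run that was actually downloaded:

result = client.retrieve(request, "2t.grib2")
print(result.datetime)  # base time of the downloaded forecast run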
# Retrieve u and v wind components on three pressure levels directly through earthkit's ecmwf-open-data source
ds_uv = ekd.from_source("ecmwf-open-data",
                        time=12,
                        param=["u", "v"],
                        levelist=[1000, 850, 500],
                        step=0
                       )
ds_uv.ls()
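A fieldlist like ds_uv can also be filtered in memory; a minimal sketch using earthkit's sel() with the metadata names shown by ls() keeps only the u component at 500 hPa:

u500 = ds_uv.sel(shortName="u", level=500)  # select a single parameter/level combination
u500.ls()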

When we set the source to azure, the data hosted on Microsoft Azure will be accessed.

client = Client(source="azure")
request = {
    "time": 12,
    "type": "fc",
    "step": 0,
    "param": "2t",
}
# client.retrieve(request, "azure_2t_data.grib2")
# dm_2t = ekd.from_source("file", "azure_2t_data.grib2")
# dm_2t.ls()

Below are two examples of downloading data from Amazon's AWS location.

client = Client(source="aws")
request = {
    "time": 0,
    "type": "fc",
    "step": 24,
    "param": "2t",
}
client.retrieve(request, "aws_2t_data.grib2")
da_2t = ekd.from_source("file", "aws_2t_data.grib2")
da_2t.ls()
data = ekd.from_source("s3", {
    "endpoint": "s3.amazonaws.com",
    "region": "eu-central-1",
    "bucket": "ecmwf-forecasts",
    "objects": "20230118/00z/0p4-beta/oper/20230118000000-0h-oper-fc.grib2"
}, anon=True)
ds = data.to_xarray()
ds
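Once converted to xarray, the dataset can be explored with the usual xarray tools; a minimal sketch (the exact variable names depend on the GRIB-to-xarray mapping in use):

print(ds.data_vars)  # variables decoded from the GRIB file
print(ds.coords)     # coordinates such as time, level, latitude and longitude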

2. The requests package

import requests
import datetime
import earthkit.data as ekd

DATADIR = './'
today = datetime.date.today().strftime('%Y%m%d') # a user can choose current date or data up to four days before today 
timez = "00z/"
model = "aifs-single/"
resol = "0p25/"
stream_ = "oper"
type_ = "fc"
step = "6"
filename = f'{today}{timez[:-2]}0000-{step}h-{stream_}-{type_}.grib2'

with requests.Session() as s:
    try:
        start = datetime.datetime.now()
        # stream the GRIB file in chunks instead of loading it into memory at once
        response = s.get(f'https://data.ecmwf.int/ecpds/home/opendata/{today}/{timez}{model}{resol}{stream_}/{filename}', stream=True)
        if response.status_code == 200:
            with open(filename, mode="wb") as file:
                for chunk in response.iter_content(chunk_size=10 * 1024):
                    file.write(chunk)
            end = datetime.datetime.now()
            diff = end - start
            print(f'The {filename} file downloaded in {diff.seconds} seconds.')
        else:
            print(f'There is no file {filename} to download (HTTP {response.status_code}).')
    except requests.exceptions.RequestException as err:
        print(f'The download of {filename} failed: {err}')

ds_fc = ekd.from_source("file", f'{DATADIR}{filename}')  # open the downloaded file with earthkit
The 20250608000000-6h-oper-fc.grib2 file downloaded in 8 seconds.
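As the comment in the cell above notes, runs up to four days before today can also be requested; only the date string needs to change. A small sketch:

import datetime

# the 00 UTC run from two days ago instead of today's run
two_days_ago = (datetime.date.today() - datetime.timedelta(days=2)).strftime('%Y%m%d')
print(two_days_ago)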

3. The wget command-line tool

To install wget on Linux, execute

sudo apt-get install wget

For extremely large files, it is recommended to use the -b option, which downloads the content in the background. A wget-log file will also appear in your working directory and can be used to check the download progress and status. You can save the retrieved file in another directory using the -P option.

ROOT="https://data.ecmwf.int/forecasts"
yyyymmdd="20250525"
HH="00"
model="ifs"
resol="0p25"
stream="oper"
step="24"
U="h"
type="fc"
format="grib2"
wget -P ../datadownload/ -b "$ROOT/$yyyymmdd/$HH"z"/$model/$resol/$stream/$yyyymmdd$HH"0000"-$step$U-$stream-$type.$format"
ROOT="https://data.ecmwf.int/forecasts"
yyyymmdd="20250608"
HH="00"
model="ifs"
resol="0p25"
stream="oper"
step="0"
U="h"
type="fc"
format="grib2"
start_bytes=62556419
end_bytes=63216509

wget "$ROOT/$yyyymmdd/$HH"z"/$model/$resol/$stream/$yyyymmdd$HH"0000"-$step$U-$stream-$type.$format" --header="Range: bytes=$start_bytes-$end_bytes"

4. The curl command-line tool

When you need to download a single field from a GRIB file, inspect the corresponding index file and look for the parameter of interest. For example, to download only the 2m temperature at step=0h from the 00 UTC HRES forecast on 08 June 2025

{"domain": "g", "date": "20250608", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "0", "levtype": "sfc", "param": "2t", "_offset": 62556419, "_length": 660091}

use the values of the _offset and _length keys and calculate start_bytes and end_bytes (the _offset and _length values of a specific field differ for each forecast run!)

start_bytes = _offset = 62556419
end_bytes = _offset + _length - 1 = 62556419 + 660091 - 1 = 63216509
ROOT="https://data.ecmwf.int/forecasts"
yyyymmdd="20250608"
HH="00"
model="ifs"
resol="0p25"
stream="oper"
step="0"
U="h"
type="fc"
format="grib2"
start_bytes=62556419
end_bytes=63216509

curl --range "$start_bytes-$end_bytes" "$ROOT/$yyyymmdd/$HH"z"/$model/$resol/$stream/$yyyymmdd$HH"0000"-$step$U-$stream-$type.$format" --output 2t.grib2
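
The same index-file lookup can be scripted in Python. Below is a minimal sketch (it assumes the .index file sits next to the .grib2 file under the same base name, as in the examples above) that finds the 2t record and downloads just that byte range with requests:

import json
import requests

ROOT = "https://data.ecmwf.int/forecasts"
# values taken from the curl example above; adjust them to a forecast run that is still online
base = f"{ROOT}/20250608/00z/ifs/0p25/oper/20250608000000-0h-oper-fc"

# the .index file holds one JSON record per GRIB field
index_lines = requests.get(f"{base}.index").text.splitlines()
for line in index_lines:
    entry = json.loads(line)
    if entry.get("param") == "2t" and entry.get("levtype") == "sfc":
        start_bytes = int(entry["_offset"])
        end_bytes = start_bytes + int(entry["_length"]) - 1
        headers = {"Range": f"bytes={start_bytes}-{end_bytes}"}
        grib_bytes = requests.get(f"{base}.grib2", headers=headers).content
        with open("2t.grib2", "wb") as f:
            f.write(grib_bytes)
        break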