torchgeo.datasets

In torchgeo, we define two types of datasets: Geospatial Datasets and Non-geospatial Datasets. These abstract base classes are documented in more detail in Base Classes.

Geospatial Datasets

GeoDataset is designed for datasets that contain geospatial information, like latitude, longitude, coordinate system, and projection. Datasets containing this kind of information can be combined using IntersectionDataset and UnionDataset.
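In torchgeo these combinations are built with the `&` and `|` operators on GeoDataset. The toy sketch below is not torchgeo's implementation (which indexes bounding boxes in a spatial index); plain Python sets stand in for spatial coverage, just to illustrate the semantics: an IntersectionDataset can only be sampled where both inputs have data, a UnionDataset wherever either does.

```python
# Toy illustration of IntersectionDataset / UnionDataset semantics.
# Each "dataset" is reduced to the set of spatial tiles it covers.
landsat_tiles = {(0, 0), (0, 1), (1, 0)}  # imagery coverage
cdl_tiles = {(1, 0), (1, 1), (0, 1)}      # label coverage

# IntersectionDataset: valid sample locations need BOTH imagery and labels.
intersection = landsat_tiles & cdl_tiles

# UnionDataset: a sample can come from EITHER source.
union = landsat_tiles | cdl_tiles

print(sorted(intersection))  # [(0, 1), (1, 0)]
print(len(union))            # 4
```

With real datasets the same operators apply directly, e.g. `dataset = landsat & cdl` yields an IntersectionDataset whose samples contain both imagery and masks.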

| Dataset | Type | Source | License | Size (px) | Resolution (m) |
|---|---|---|---|---|---|
| Aboveground Woody Biomass | Masks | Landsat, LiDAR | CC-BY-4.0 | 40,000x40,000 | 30 |
| AgriFieldNet | Imagery, Masks | Sentinel-2 | CC-BY-4.0 | 256x256 | 10 |
| Airphen | Imagery | Airphen | | 1,280x960 | 0.047–0.09 |
| Aster Global DEM | DEM | Aster | public domain | 3,601x3,601 | 30 |
| Canadian Building Footprints | Geometries | Bing Imagery | ODbL-1.0 | | |
| Chesapeake Land Cover | Imagery, Masks | NAIP | CC-BY-4.0 | | 1 |
| Global Mangrove Distribution | Masks | Remote Sensing, In Situ Measurements | public domain | | 3 |
| Cropland Data Layer | Masks | Landsat | public domain | | 30 |
| EDDMapS | Points | Citizen Scientists | | | |
| EnviroAtlas | Imagery, Masks | NAIP, NLCD, OpenStreetMap | CC-BY-4.0 | | 1 |
| Esri2020 | Masks | Sentinel-2 | CC-BY-4.0 | | 10 |
| EU-DEM | DEM | Aster, SRTM, Russian Topomaps | CSCDA-ESA | | 25 |
| EuroCrops | Geometries | EU Countries | CC-BY-SA-4.0 | | |
| GBIF | Points | Citizen Scientists | CC0-1.0 OR CC-BY-4.0 OR CC-BY-NC-4.0 | | |
| GlobBiomass | Masks | Landsat | CC-BY-4.0 | 45,000x45,000 | 100 |
| iNaturalist | Points | Citizen Scientists | | | |
| L7 Irish | Imagery, Masks | Landsat | CC0-1.0 | 8,400x7,500 | 15, 30 |
| L8 Biome | Imagery, Masks | Landsat | CC0-1.0 | 8,900x8,900 | 15, 30 |
| LandCover.ai Geo | Imagery, Masks | Aerial | CC-BY-NC-SA-4.0 | 4,200–9,500 | 0.25–0.5 |
| Landsat | Imagery | Landsat | public domain | 8,900x8,900 | 30 |
| NAIP | Imagery | Aerial | public domain | 6,100x7,600 | 1 |
| NCCM | Masks | Sentinel-2 | CC-BY-4.0 | | 10 |
| NLCD | Masks | Landsat | public domain | | 30 |
| Open Buildings | Geometries | Maxar, CNES/Airbus | CC-BY-4.0 OR ODbL-1.0 | | |
| PRISMA | Imagery | PRISMA | | 512x512 | 5–30 |
| Sentinel | Imagery | Sentinel | CC-BY-SA-3.0-IGO | 10,000x10,000 | 10 |
| South Africa Crop Type | Imagery, Masks | Sentinel-2 | CC-BY-4.0 | 256x256 | 10 |
| South America Soybean | Masks | Landsat, MODIS | | | 30 |

Aboveground Woody Biomass

class torchgeo.datasets.AbovegroundLiveWoodyBiomassDensity(paths='data', crs=None, res=None, transforms=None, download=False, cache=True)[source]

Bases: RasterDataset

Aboveground Live Woody Biomass Density dataset.

The Aboveground Live Woody Biomass Density dataset is a global-scale, wall-to-wall map of aboveground biomass at ~30m resolution for the year 2000.

Dataset features:

  • Masks with per-pixel live woody biomass density estimates in megagrams biomass per hectare at ~30m resolution (~40,000x40,000 px)

Dataset format:

  • geojson file that contains download links to tif files

  • single-channel geotiffs with the pixel values representing biomass density

If you use this dataset in your research, please give credit to:

New in version 0.3.

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

filename_glob = '*N_*E.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '^\n        (?P<latitude>[0-9][0-9][A-Z])_\n        (?P<longitude>[0-9][0-9][0-9][A-Z])*\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name
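Because this filename_regex is written in verbose form, it can be exercised directly with Python's re module. The tile name below follows the documented *N_*E pattern (the specific tile is a hypothetical example):

```python
import re

# filename_regex from AbovegroundLiveWoodyBiomassDensity, verbatim;
# re.VERBOSE makes the embedded whitespace insignificant.
filename_regex = r"""^
    (?P<latitude>[0-9][0-9][A-Z])_
    (?P<longitude>[0-9][0-9][0-9][A-Z])*
"""

match = re.match(filename_regex, "00N_000E.tif", re.VERBOSE)
assert match is not None
print(match.group("latitude"), match.group("longitude"))  # 00N 000E
```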

__init__(paths='data', crs=None, res=None, transforms=None, download=False, cache=True)[source]

Initialize a new Dataset instance.

Parameters:
Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

AgriFieldNet

class torchgeo.datasets.AgriFieldNet(paths='data', crs=None, classes=[0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 36], bands=['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'], transforms=None, cache=True)[source]

Bases: RasterDataset

AgriFieldNet India Challenge dataset.

The AgriFieldNet India Challenge dataset includes satellite imagery from Sentinel-2 cloud-free composites (single snapshot) and crop-type labels collected by ground survey. The Sentinel-2 data are matched with their corresponding labels. The dataset contains 7081 fields, split into training and test sets (5551 fields in train, 1530 in test). Satellite imagery and labels are tiled into 256x256 chips, adding up to 1217 tiles.

The fields are distributed across all chips; some chips contain only train or only test fields, and some contain both. Since the labels are derived from data collected on the ground, not all pixels in each chip are labeled. If the field ID for a pixel is 0, that pixel is not included in either the train or test set (and its crop label is 0 as well).

The train and test sets have slightly different crop-type distributions. The train set follows the distribution of the ground reference data, which is skewed, with a few dominant crops over-represented. The test set was drawn randomly from an area-weighted field list that ensured fields with less common crop types were better represented. The original dataset can be downloaded from Source Cooperative.

Dataset format:

  • images are 12-band Sentinel-2 data

  • masks are tiff images with unique values representing the class and field id

Dataset classes:

  • 0 - No-Data

  • 1 - Wheat

  • 2 - Mustard

  • 3 - Lentil

  • 4 - No Crop/Fallow

  • 5 - Green pea

  • 6 - Sugarcane

  • 8 - Garlic

  • 9 - Maize

  • 13 - Gram

  • 14 - Coriander

  • 15 - Potato

  • 16 - Berseem

  • 36 - Rice

If you use this dataset in your research, please cite the following dataset:

New in version 0.6.

filename_regex = '\n        ^ref_agrifieldnet_competition_v1_source_\n        (?P<unique_folder_id>[a-z0-9]{5})\n        _(?P<band>B[0-9A-Z]{2})_10m\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

rgb_bands: list[str] = ['B04', 'B03', 'B02']

Names of RGB bands in the dataset, used for plotting

all_bands: list[str] = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12']

Names of all available bands in the dataset

cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 255), 1: (255, 211, 0, 255), 2: (255, 37, 37, 255), 3: (0, 168, 226, 255), 4: (255, 158, 9, 255), 5: (37, 111, 0, 255), 6: (255, 255, 0, 255), 8: (111, 166, 0, 255), 9: (0, 175, 73, 255), 13: (222, 166, 9, 255), 14: (222, 166, 9, 255), 15: (124, 211, 255, 255), 16: (226, 0, 124, 255), 36: (137, 96, 83, 255)}

Color map for the dataset, used for plotting
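The class IDs and cmap entries above can be combined for custom legends or rendering. A small sketch (values copied from the documented classes and cmap; the `legend_entry` helper is my own, not a torchgeo API):

```python
# Subset of the documented AgriFieldNet classes and colour map.
class_names = {0: "No-Data", 1: "Wheat", 2: "Mustard", 3: "Lentil",
               4: "No Crop/Fallow", 5: "Green pea", 6: "Sugarcane",
               8: "Garlic", 9: "Maize", 13: "Gram", 14: "Coriander",
               15: "Potato", 16: "Berseem", 36: "Rice"}
cmap = {0: (0, 0, 0, 255), 36: (137, 96, 83, 255)}  # truncated for brevity

def legend_entry(class_id):
    """Pair a class name with its RGBA colour (hypothetical helper)."""
    return class_names[class_id], cmap[class_id]

print(legend_entry(36))  # ('Rice', (137, 96, 83, 255))
```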

__init__(paths='data', crs=None, classes=[0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 36], bands=['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'], transforms=None, cache=True)[source]

Initialize a new AgriFieldNet dataset instance.

Parameters:
Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(query)[source]

Retrieve data, labels, and field ids indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

data, label, and field ids at that index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

Airphen

class torchgeo.datasets.Airphen(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: RasterDataset

Airphen dataset.

Airphen is a multispectral scientific camera developed by agronomists and photonics engineers at Hiphen to match plant measurement needs and constraints.

Main characteristics:

  • 6 Synchronized global shutter sensors

  • Sensor resolution 1280 x 960 pixels

  • Data format (.tiff, 12 bit)

  • SD card storage

  • Metadata information: Exif and XMP

  • Internal or external GPS

  • Synchronization with different sensors (TIR, RGB, others)

If you use this dataset in your research, please cite the following paper:

New in version 0.6.

all_bands: list[str] = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8']

Names of all available bands in the dataset

rgb_bands: list[str] = ['B4', 'B3', 'B1']

Names of RGB bands in the dataset, used for plotting
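When plotting, the RGB channel indices are found by looking up rgb_bands within all_bands; with the band order documented above this resolves as follows (a sketch of the lookup, not Airphen.plot() itself):

```python
# Band lists as documented on the Airphen class.
all_bands = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8"]
rgb_bands = ["B4", "B3", "B1"]

# Index of each RGB band within the loaded band stack.
rgb_indices = [all_bands.index(band) for band in rgb_bands]
print(rgb_indices)  # [3, 2, 0]
```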

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

Aster Global DEM

class torchgeo.datasets.AsterGDEM(paths='data', crs=None, res=None, transforms=None, cache=True)[source]

Bases: RasterDataset

Aster Global Digital Elevation Model Dataset.

The Aster Global Digital Elevation Model dataset is a Digital Elevation Model (DEM) on a global scale. The dataset can be downloaded from the Earth Data website after making an account.

Dataset features:

  • DEMs at 30 m per pixel spatial resolution (3601x3601 px)

  • data collected from the Aster instrument

Dataset format:

  • DEMs are single-channel tif files

New in version 0.3.

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

filename_glob = 'ASTGTMV003_*_dem*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
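The glob can be checked against a candidate filename with the standard fnmatch module; the tile name below is a hypothetical example following the ASTGTMV003 naming scheme:

```python
from fnmatch import fnmatch

filename_glob = "ASTGTMV003_*_dem*"

# Matches a DEM tile regardless of its extension...
print(fnmatch("ASTGTMV003_N00E006_dem.tif", filename_glob))  # True
# ...but rejects files from other products in the same archive.
print(fnmatch("ASTGTMV003_N00E006_num.tif", filename_glob))  # False
```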

filename_regex = '\n        (?P<name>[ASTGTMV003]{10})\n        _(?P<id>[A-Z0-9]{7})\n        _(?P<data>[a-z]{3})*\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

__init__(paths='data', crs=None, res=None, transforms=None, cache=True)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | list[str]) – one or more root directories to search or files to load, here the collection of individual zip files for each tile should be found

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Canadian Building Footprints

class torchgeo.datasets.CanadianBuildingFootprints(paths='data', crs=None, res=1e-05, transforms=None, download=False, checksum=False)[source]

Bases: VectorDataset

Canadian Building Footprints dataset.

The Canadian Building Footprints dataset contains 11,842,186 computer-generated building footprints in all Canadian provinces and territories in GeoJSON format. This data is freely available for download and use.

__init__(paths='data', crs=None, res=1e-05, transforms=None, download=False, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by VectorDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, it is possible to show subplot titles and/or use a custom suptitle.

Chesapeake Land Cover

class torchgeo.datasets.Chesapeake(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: RasterDataset, ABC

Abstract base class for all Chesapeake datasets.

Chesapeake Bay High-Resolution Land Cover Project dataset.

This dataset was collected by the Chesapeake Conservancy’s Conservation Innovation Center (CIC) in partnership with the University of Vermont and WorldView Solutions, Inc. It consists of one-meter resolution land cover information for the Chesapeake Bay watershed (~100,000 square miles of land).

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 0), 1: (0, 197, 255, 255), 2: (0, 168, 132, 255), 3: (38, 115, 0, 255), 4: (76, 230, 0, 255), 5: (163, 255, 115, 255), 6: (255, 170, 0, 255), 7: (255, 0, 0, 255), 8: (156, 156, 156, 255), 9: (0, 0, 0, 255), 10: (115, 115, 0, 255), 11: (230, 230, 0, 255), 12: (255, 255, 115, 255), 13: (197, 0, 255, 255)}

Color map for the dataset, used for plotting

abstract property base_folder: str

Parent directory of dataset in URL.

abstract property filename: str

Filename to find/store dataset in.

abstract property zipfile: str

Name of zipfile in download URL.

abstract property md5: str

MD5 checksum to verify integrity of dataset.

property url: str

URL to download dataset from.

__init__(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, it is possible to show subplot titles and/or use a custom suptitle.

class torchgeo.datasets.Chesapeake7(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

Complete 7-class dataset.

This version of the dataset is composed of 7 classes:

  0. No Data: Background values

  1. Water: All areas of open water including ponds, rivers, and lakes

  2. Tree Canopy and Shrubs: All woody vegetation including trees and shrubs

  3. Low Vegetation: Plant material less than 2 meters in height including lawns

  4. Barren: Areas devoid of vegetation consisting of natural earthen material

  5. Impervious Surfaces: Human-constructed surfaces less than 2 meters in height

  6. Impervious Roads: Impervious surfaces that are used for transportation

  7. Aberdeen Proving Ground: U.S. Army facility with no labels

filename_glob = 'Baywide_7class_20132014.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 0), 1: (0, 197, 255, 255), 2: (38, 115, 0, 255), 3: (163, 255, 115, 255), 4: (255, 170, 0, 255), 5: (156, 156, 156, 255), 6: (0, 0, 0, 255), 7: (197, 0, 255, 255)}

Color map for the dataset, used for plotting

class torchgeo.datasets.Chesapeake13(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

Complete 13-class dataset.

This version of the dataset is composed of 13 classes:

  0. No Data: Background values

  1. Water: All areas of open water including ponds, rivers, and lakes

  2. Wetlands: Low vegetation areas located along marine or estuarine regions

  3. Tree Canopy: Deciduous and evergreen woody vegetation over 3-5 meters in height

  4. Shrubland: Heterogeneous woody vegetation including shrubs and young trees

  5. Low Vegetation: Plant material less than 2 meters in height including lawns

  6. Barren: Areas devoid of vegetation consisting of natural earthen material

  7. Structures: Human-constructed objects made of impervious materials

  8. Impervious Surfaces: Human-constructed surfaces less than 2 meters in height

  9. Impervious Roads: Impervious surfaces that are used for transportation

  10. Tree Canopy over Structures: Tree cover overlapping impervious structures

  11. Tree Canopy over Impervious Surfaces: Tree cover overlapping impervious surfaces

  12. Tree Canopy over Impervious Roads: Tree cover overlapping impervious roads

  13. Aberdeen Proving Ground: U.S. Army facility with no labels

filename_glob = 'Baywide_13Class_20132014.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.ChesapeakeDC(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

This subset of the dataset contains data only for Washington, D.C.

filename_glob = 'DC_11001/DC_11001.img'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.ChesapeakeDE(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

This subset of the dataset contains data only for Delaware.

filename_glob = 'DE_STATEWIDE.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.ChesapeakeMD(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

This subset of the dataset contains data only for Maryland.

Note

This dataset requires the following additional library to be installed:

filename_glob = 'MD_STATEWIDE.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.ChesapeakeNY(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

This subset of the dataset contains data only for New York.

Note

This dataset requires the following additional library to be installed:

filename_glob = 'NY_STATEWIDE.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.ChesapeakePA(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

This subset of the dataset contains data only for Pennsylvania.

filename_glob = 'PA_STATEWIDE.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.ChesapeakeVA(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

This subset of the dataset contains data only for Virginia.

Note

This dataset requires the following additional library to be installed:

filename_glob = 'CIC2014_VA_STATEWIDE.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.ChesapeakeWV(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: Chesapeake

This subset of the dataset contains data only for West Virginia.

filename_glob = 'WV_STATEWIDE.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.ChesapeakeCVPR(root='data', splits=['de-train'], layers=['naip-new', 'lc'], transforms=None, cache=True, download=False, checksum=False)[source]

Bases: GeoDataset

CVPR 2019 Chesapeake Land Cover dataset.

The CVPR 2019 Chesapeake Land Cover dataset contains two layers of NAIP aerial imagery, Landsat 8 leaf-on and leaf-off imagery, Chesapeake Bay land cover labels, NLCD land cover labels, and Microsoft building footprint labels.

This dataset was organized to accompany the 2019 CVPR paper, “Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data”.

The paper “Resolving label uncertainty with implicit generative models” added an additional layer of data to this dataset containing a prior over the Chesapeake Bay land cover classes generated from the NLCD land cover labels. For more information about this layer see the dataset documentation.

If you use this dataset in your research, please cite the following paper:

__init__(root='data', splits=['de-train'], layers=['naip-new', 'lc'], transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • splits (Sequence[str]) – a list of strings in the format “{state}-{train,val,test}” indicating the subset of data to use, for example “ny-train”

  • layers (Sequence[str]) – a list containing a subset of “naip-new”, “naip-old”, “lc”, “nlcd”, “landsat-leaf-on”, “landsat-leaf-off”, “buildings”, or “prior_from_cooccurrences_101_31_no_osm_no_buildings” indicating which layers to load

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:
__getitem__(query)[source]

Retrieve image/mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of image/mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

New in version 0.4.
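Queries like the one above span both space and time. A minimal sketch of the six-field query and an overlap test (torchgeo ships its own BoundingBox class; this namedtuple only mirrors the documented field order, and the coordinates are made up):

```python
from collections import namedtuple

# Mirrors the (minx, maxx, miny, maxy, mint, maxt) ordering used by
# GeoDataset queries; torchgeo's real BoundingBox has more features.
BoundingBox = namedtuple("BoundingBox", "minx maxx miny maxy mint maxt")

def intersects(a, b):
    """True when two boxes overlap in both space and time."""
    return (a.minx <= b.maxx and a.maxx >= b.minx
            and a.miny <= b.maxy and a.maxy >= b.miny
            and a.mint <= b.maxt and a.maxt >= b.mint)

patch = BoundingBox(0.0, 256.0, 0.0, 256.0, 0.0, 1.0)
print(intersects(patch, BoundingBox(128.0, 384.0, 128.0, 384.0, 0.0, 1.0)))  # True
print(intersects(patch, BoundingBox(512.0, 768.0, 0.0, 256.0, 0.0, 1.0)))    # False
```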

Global Mangrove Distribution

class torchgeo.datasets.CMSGlobalMangroveCanopy(paths='data', crs=None, res=None, measurement='agb', country='AndamanAndNicobar', transforms=None, cache=True, checksum=False)[source]

Bases: RasterDataset

CMS Global Mangrove Canopy dataset.

The CMS Global Mangrove Canopy dataset consists of a single-band map at 30m resolution of either aboveground biomass (agb), basal area weighted height (hba95), or maximum canopy height (hmax95).

The dataset must be manually downloaded from the above link, where you can make an account and subsequently download the dataset.

New in version 0.3.

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

filename_regex = '^\n        (?P<mangrove>[A-Za-z]{8})\n        _(?P<variable>[a-z0-9]*)\n        _(?P<country>[A-Za-z][^.]*)\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name
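As with the other raster datasets, this regex can be tried against a filename with re.VERBOSE. The example name below is an assumption pieced together from the pattern and the documented defaults ('agb', 'AndamanAndNicobar'), not a verified file from the archive:

```python
import re

# filename_regex from CMSGlobalMangroveCanopy, verbatim.
filename_regex = r"""^
    (?P<mangrove>[A-Za-z]{8})
    _(?P<variable>[a-z0-9]*)
    _(?P<country>[A-Za-z][^.]*)
"""

match = re.match(filename_regex, "Mangrove_agb_AndamanAndNicobar.tif", re.VERBOSE)
assert match is not None
print(match.group("variable"), match.group("country"))  # agb AndamanAndNicobar
```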

__init__(paths='data', crs=None, res=None, measurement='agb', country='AndamanAndNicobar', transforms=None, cache=True, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | list[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • measurement (str) – which of the three measurements, ‘agb’, ‘hba95’, or ‘hmax95’

  • country (str) – country for which to retrieve data

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Cropland Data Layer

class torchgeo.datasets.CDL(paths='data', crs=None, res=None, years=[2023], classes=[0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 74, 75, 76, 77, 81, 82, 83, 87, 88, 92, 111, 112, 121, 122, 123, 124, 131, 141, 142, 143, 152, 176, 190, 195, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 254], transforms=None, cache=True, download=False, checksum=False)[source]

Bases: RasterDataset

Cropland Data Layer (CDL) dataset.

The Cropland Data Layer, hosted on CropScape, provides a raster, geo-referenced, crop-specific land cover map for the continental United States. The CDL also includes a crop mask layer and planting frequency layers, as well as boundary, water and road layers. The Boundary Layer options provided are County, Agricultural Statistics Districts (ASD), State, and Region. The data is created annually using moderate resolution satellite imagery and extensive agricultural ground truth.

The dataset contains 134 classes; for a description of each class, see the xls file at the top of this page.

If you use this dataset in your research, please cite it using the following format:

filename_glob = '*_30m_cdls.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n        ^(?P<date>\\d+)\n        _30m_cdls\\..*$\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

date_format = '%Y'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
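Taken together, filename_regex and date_format determine how a timestamp is recovered from each file. As a sketch using only the standard library (the filename is hypothetical but follows the documented pattern):

```python
import re
from datetime import datetime

# filename_regex and date_format as documented on the CDL class;
# re.VERBOSE lets the pattern contain the literal whitespace shown above
filename_regex = r"""
    ^(?P<date>\d+)
    _30m_cdls\..*$
"""
date_format = "%Y"

# hypothetical filename following the documented pattern
match = re.match(filename_regex, "2023_30m_cdls.tif", re.VERBOSE)
assert match is not None
mint = datetime.strptime(match.group("date"), date_format)
print(mint.year)  # 2023
```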

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 255), 1: (255, 211, 0, 255), 2: (255, 37, 37, 255), 3: (0, 168, 226, 255), 4: (255, 158, 9, 255), 5: (37, 111, 0, 255), 6: (255, 255, 0, 255), 10: (111, 166, 0, 255), 11: (0, 175, 73, 255), 12: (222, 166, 9, 255), 13: (222, 166, 9, 255), 14: (124, 211, 255, 255), 21: (226, 0, 124, 255), 22: (137, 96, 83, 255), 23: (217, 181, 107, 255), 24: (166, 111, 0, 255), 25: (213, 158, 188, 255), 26: (111, 111, 0, 255), 27: (171, 0, 124, 255), 28: (160, 88, 137, 255), 29: (111, 0, 73, 255), 30: (213, 158, 188, 255), 31: (209, 255, 0, 255), 32: (124, 153, 255, 255), 33: (213, 213, 0, 255), 34: (209, 255, 0, 255), 35: (0, 175, 73, 255), 36: (255, 166, 226, 255), 37: (166, 241, 139, 255), 38: (0, 175, 73, 255), 39: (213, 158, 188, 255), 41: (168, 0, 226, 255), 42: (166, 0, 0, 255), 43: (111, 37, 0, 255), 44: (0, 175, 73, 255), 45: (175, 124, 255, 255), 46: (111, 37, 0, 255), 47: (255, 102, 102, 255), 48: (255, 102, 102, 255), 49: (255, 204, 102, 255), 50: (255, 102, 102, 255), 51: (0, 175, 73, 255), 52: (0, 222, 175, 255), 53: (83, 255, 0, 255), 54: (241, 162, 120, 255), 55: (255, 102, 102, 255), 56: (0, 175, 73, 255), 57: (124, 211, 255, 255), 58: (232, 190, 255, 255), 59: (175, 255, 222, 255), 60: (0, 175, 73, 255), 61: (190, 190, 120, 255), 63: (147, 204, 147, 255), 64: (198, 213, 158, 255), 65: (204, 190, 162, 255), 66: (255, 0, 255, 255), 67: (255, 143, 171, 255), 68: (185, 0, 79, 255), 69: (111, 69, 137, 255), 70: (0, 120, 120, 255), 71: (175, 153, 111, 255), 72: (255, 255, 124, 255), 74: (181, 111, 92, 255), 75: (0, 166, 130, 255), 76: (232, 213, 175, 255), 77: (175, 153, 111, 255), 81: (241, 241, 241, 255), 82: (153, 153, 153, 255), 83: (73, 111, 162, 255), 87: (124, 175, 175, 255), 88: (232, 255, 190, 255), 92: (0, 255, 255, 255), 111: (73, 111, 162, 255), 112: (211, 226, 249, 255), 121: (153, 153, 153, 255), 122: (153, 153, 153, 255), 123: (153, 153, 153, 255), 124: (153, 153, 153, 255), 131: (204, 190, 162, 255), 141: (147, 204, 147, 255), 142: (147, 204, 147, 255), 143: (147, 204, 147, 255), 152: (198, 213, 158, 255), 176: (232, 255, 190, 255), 190: (124, 175, 175, 255), 195: (124, 175, 175, 255), 204: (0, 255, 139, 255), 205: (213, 158, 188, 255), 206: (255, 102, 102, 255), 207: (255, 102, 102, 255), 208: (255, 102, 102, 255), 209: (255, 102, 102, 255), 210: (255, 143, 171, 255), 211: (51, 73, 51, 255), 212: (226, 111, 37, 255), 213: (255, 102, 102, 255), 214: (255, 102, 102, 255), 215: (102, 153, 77, 255), 216: (255, 102, 102, 255), 217: (175, 153, 111, 255), 218: (255, 143, 171, 255), 219: (255, 102, 102, 255), 220: (255, 143, 171, 255), 221: (255, 102, 102, 255), 222: (255, 102, 102, 255), 223: (255, 143, 171, 255), 224: (0, 175, 73, 255), 225: (255, 211, 0, 255), 226: (255, 211, 0, 255), 227: (255, 102, 102, 255), 228: (255, 211, 0, 255), 229: (255, 102, 102, 255), 230: (137, 96, 83, 255), 231: (255, 102, 102, 255), 232: (255, 37, 37, 255), 233: (226, 0, 124, 255), 234: (255, 158, 9, 255), 235: (255, 158, 9, 255), 236: (166, 111, 0, 255), 237: (255, 211, 0, 255), 238: (166, 111, 0, 255), 239: (37, 111, 0, 255), 240: (37, 111, 0, 255), 241: (255, 211, 0, 255), 242: (0, 0, 153, 255), 243: (255, 102, 102, 255), 244: (255, 102, 102, 255), 245: (255, 102, 102, 255), 246: (255, 102, 102, 255), 247: (255, 102, 102, 255), 248: (255, 102, 102, 255), 249: (255, 102, 102, 255), 250: (255, 102, 102, 255), 254: (37, 111, 0, 255)}

Color map for the dataset, used for plotting

__init__(paths='data', crs=None, res=None, years=[2023], classes=[0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 74, 75, 76, 77, 81, 82, 83, 87, 88, 92, 111, 112, 121, 122, 123, 124, 131, 141, 142, 143, 152, 176, 190, 195, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 254], transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • years (list[int]) – list of years for which to use cdl layer

  • classes (list[int]) – list of classes to include, the rest will be mapped to 0 (defaults to all classes)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

New in version 0.5: The years and classes parameters.

Changed in version 0.5: root was renamed to paths.
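As an illustrative sketch (not the library's internal implementation), the classes behavior described above amounts to a simple lookup in which any class code outside the requested list collapses to 0:

```python
# Illustrative only: remap raw CDL class codes so that codes outside the
# requested `classes` list become 0 (background); the pixel values are
# hypothetical.
classes = [0, 1, 24]            # keep background, class 1, and class 24
raw_mask = [1, 5, 24, 176, 1]   # raw class codes for five pixels

remapped = [code if code in classes else 0 for code in raw_mask]
print(remapped)  # [1, 0, 24, 0, 1]
```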

__getitem__(query)[source]

Retrieve mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

EDDMapS

class torchgeo.datasets.EDDMapS(root='data')[source]

Bases: GeoDataset

Dataset for EDDMapS.

EDDMapS, Early Detection and Distribution Mapping System, is a web-based mapping system for documenting invasive species and pest distribution. Launched in 2005 by the Center for Invasive Species and Ecosystem Health at the University of Georgia, it was originally designed as a tool for state Exotic Pest Plant Councils to develop more complete distribution data of invasive species. Since then, the program has expanded to include the entire US and Canada as well as to document certain native pest species.

EDDMapS query results can be downloaded in CSV, KML, or Shapefile format. This dataset currently only supports CSV files.

If you use an EDDMapS dataset in your research, please cite it like so:

  • EDDMapS. YEAR. Early Detection & Distribution Mapping System. The University of Georgia - Center for Invasive Species and Ecosystem Health. Available online at https://www.eddmaps.org/; last accessed DATE.

New in version 0.3.

__init__(root='data')[source]

Initialize a new Dataset instance.

Parameters:

root (str) – root directory where dataset can be found

Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(query)[source]

Retrieve metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

EnviroAtlas

class torchgeo.datasets.EnviroAtlas(root='data', splits=['pittsburgh_pa-2010_1m-train'], layers=['naip', 'prior'], transforms=None, prior_as_input=False, cache=True, download=False, checksum=False)[source]

Bases: GeoDataset

EnviroAtlas dataset covering four cities with prior and weak input data layers.

The EnviroAtlas dataset contains NAIP aerial imagery, NLCD land cover labels, OpenStreetMap roads, water, waterways, and waterbodies, Microsoft building footprint labels, high-resolution land cover labels from the EPA EnviroAtlas dataset, and high-resolution land cover prior layers.

This dataset was organized to accompany the 2022 paper, “Resolving label uncertainty with implicit generative models”. More details can be found at https://github.com/estherrolf/implicit-posterior.

If you use this dataset in your research, please cite the following paper:

New in version 0.3.

__init__(root='data', splits=['pittsburgh_pa-2010_1m-train'], layers=['naip', 'prior'], transforms=None, prior_as_input=False, cache=True, download=False, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • splits (Sequence[str]) – a list of strings in the format “{state}-{train,val,test}” indicating the subset of data to use, for example “ny-train”

  • layers (Sequence[str]) – a list containing a subset of valid_layers indicating which layers to load

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • prior_as_input (bool) – bool describing whether the prior is used as an input (True) or as supervision (False)

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]

Retrieve image/mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of image/mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Note: only plots the “naip” and “lc” layers.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

ValueError – if the NAIP layer isn’t included in self.layers

Return type:

Figure

Esri2020

class torchgeo.datasets.Esri2020(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: RasterDataset

Esri 2020 Land Cover Dataset.

The Esri 2020 Land Cover dataset consists of a global single band land use/land cover map derived from ESA Sentinel-2 imagery at 10m resolution with a total of 10 classes. It was published in July 2021 and uses the Universal Transverse Mercator (UTM) projection. This dataset only contains labels, no raw satellite imagery.

The 10 classes, plus a No Data value, are:

  1. No Data

  2. Water

  3. Trees

  4. Grass

  5. Flooded Vegetation

  6. Crops

  7. Scrub/Shrub

  8. Built Area

  9. Bare Ground

  10. Snow/Ice

  11. Clouds

A more detailed explanation of the individual classes can be found here.

If you use this dataset please cite the following paper:

New in version 0.3.

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

filename_glob = '*_20200101-20210101.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '^\n        (?P<id>[0-9][0-9][A-Z])\n        _(?P<date>\\d{8})\n        -(?P<processing_date>\\d{8})\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name
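As a sketch using only the standard library, the Esri2020 filename pattern above can be exercised directly (the tile name is hypothetical but follows the documented glob and regex):

```python
import re

# filename_regex as documented on the Esri2020 class (verbose mode)
filename_regex = r"""^
    (?P<id>[0-9][0-9][A-Z])
    _(?P<date>\d{8})
    -(?P<processing_date>\d{8})
"""

# hypothetical tile name matching the documented pattern
match = re.match(filename_regex, "34S_20200101-20210101.tif", re.VERBOSE)
assert match is not None
print(match.group("id"), match.group("date"))  # 34S 20200101
```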

__init__(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

EU-DEM

class torchgeo.datasets.EUDEM(paths='data', crs=None, res=None, transforms=None, cache=True, checksum=False)[source]

Bases: RasterDataset

European Digital Elevation Model (EU-DEM) Dataset.

The EU-DEM dataset is a Digital Elevation Model of reference for the entire European region. The dataset can be downloaded from this website after making an account. A dataset factsheet is available here.

Dataset features:

  • DEMs at 25 m per pixel spatial resolution (~40,000x40,000 px)

  • vertical accuracy of +/- 7 m RMSE

  • data fused from ASTER GDEM, SRTM and Russian topomaps

Dataset format:

  • DEMs are single-channel tif files

If you use this dataset in your research, please give credit to:

New in version 0.3.

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

filename_glob = 'eu_dem_v11_*.TIF'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '(?P<name>[eudem_v11]{10})_(?P<id>[A-Z0-9]{6})'

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

__init__(paths='data', crs=None, res=None, transforms=None, cache=True, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load, here the collection of individual zip files for each tile should be found

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

EuroCrops

class torchgeo.datasets.EuroCrops(paths='data', crs=CRS.from_epsg(4326), res=1e-05, classes=None, transforms=None, download=False, checksum=False)[source]

Bases: VectorDataset

EuroCrops Dataset (Version 9).

The EuroCrops dataset combines “all publicly available self-declared crop reporting datasets from countries of the European Union” into a unified format. The dataset is released under CC BY 4.0 Deed.

The dataset consists of shapefiles containing a total of 22M polygons. Each polygon is tagged with a “EC_hcat_n” attribute indicating the harmonized crop name grown within the polygon in the year associated with the shapefile.

If you use this dataset in your research, please follow the citation guidelines at https://github.com/maja601/EuroCrops#reference.

New in version 0.6.

filename_glob = '*_EC*.shp'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n        ^(?P<country>[A-Z]{2})\n        (_(?P<region>[A-Z]+))?\n        _\n        (?P<date>\\d{4})\n        _\n        (?P<suffix>EC(?:21)?)\n        \\.shp$\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

date_format = '%Y'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group.
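As a stdlib-only sketch, the EuroCrops regex and date format above can be combined to recover the country and year (the shapefile name is hypothetical but follows the documented pattern):

```python
import re
from datetime import datetime

# filename_regex and date_format as documented on the EuroCrops class
filename_regex = r"""
    ^(?P<country>[A-Z]{2})
    (_(?P<region>[A-Z]+))?
    _
    (?P<date>\d{4})
    _
    (?P<suffix>EC(?:21)?)
    \.shp$
"""
date_format = "%Y"

# hypothetical shapefile name following the documented pattern
match = re.match(filename_regex, "FR_2018_EC21.shp", re.VERBOSE)
assert match is not None
year = datetime.strptime(match.group("date"), date_format).year
print(match.group("country"), year)  # FR 2018
```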

__init__(paths='data', crs=CRS.from_epsg(4326), res=1e-05, classes=None, transforms=None, download=False, checksum=False)[source]

Initialize a new EuroCrops instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search for files to load

  • crs (CRS) – coordinate reference system (CRS) to warp to (defaults to WGS-84)

  • res (float) – resolution of the dataset in units of CRS

  • classes (list[str] | None) – list of classes to include (specified by their HCAT code), the rest will be mapped to 0 (defaults to all classes)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

get_label(feature)[source]

Get label value to use for rendering a feature.

Parameters:

feature (Feature) – the fiona.model.Feature from which to extract the label.

Returns:

the integer label, or 0 if the feature should not be rendered.

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by VectorDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

GBIF

class torchgeo.datasets.GBIF(root='data')[source]

Bases: GeoDataset

Dataset for the Global Biodiversity Information Facility.

GBIF, the Global Biodiversity Information Facility, is an international network and data infrastructure funded by the world’s governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth.

This dataset is intended for use with GBIF’s occurrence records. It may or may not work for other GBIF datasets. Data for a particular species or region of interest can be downloaded from the above link.

If you use a GBIF dataset in your research, please cite it according to:

New in version 0.3.

__init__(root='data')[source]

Initialize a new Dataset instance.

Parameters:

root (str) – root directory where dataset can be found

Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(query)[source]

Retrieve metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

GlobBiomass

class torchgeo.datasets.GlobBiomass(paths='data', crs=None, res=None, measurement='agb', transforms=None, cache=True, checksum=False)[source]

Bases: RasterDataset

GlobBiomass dataset.

The GlobBiomass dataset consists of global pixel-wise aboveground biomass (AGB) and growing stock volume (GSV) maps.

Dataset features:

  • estimates of AGB and GSV around the world at ~100m per pixel resolution (45,000x45,000 px)

  • standard error maps of respective measurement at same resolution

Dataset format:

  • estimate maps are single-channel

  • standard error maps are single-channel

The data can be manually downloaded from this website.

If you use this dataset please cite it with the following citation:

New in version 0.3.

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

filename_regex = '^\n        (?P<tile>[0-9A-Z]*)\n        _(?P<measurement>[a-z]{3})\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

__init__(paths='data', crs=None, res=None, measurement='agb', transforms=None, cache=True, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • measurement (str) – use data from ‘agb’ or ‘gsv’ measurement

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

__getitem__(query)[source]

Retrieve image/mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample at that index consisting of a two-channel measurement mask, where the first channel is the measurement and the second is the standard error map

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

iNaturalist

class torchgeo.datasets.INaturalist(root='data')[source]

Bases: GeoDataset

Dataset for iNaturalist.

iNaturalist is a joint initiative of the California Academy of Sciences and the National Geographic Society. It allows citizen scientists to upload observations of organisms that can be downloaded by scientists and researchers.

If you use an iNaturalist dataset in your research, please cite it according to:

New in version 0.3.

__init__(root='data')[source]

Initialize a new Dataset instance.

Parameters:

root (str) – root directory where dataset can be found

Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(query)[source]

Retrieve metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

L7 Irish

class torchgeo.datasets.L7Irish(paths='data', crs=CRS.from_epsg(3857), res=None, bands=['B10', 'B20', 'B30', 'B40', 'B50', 'B61', 'B62', 'B70', 'B80'], transforms=None, cache=True, download=False, checksum=False)[source]

Bases: RasterDataset

L7 Irish dataset.

The L7 Irish dataset is based on Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Level-1G scenes. Manually generated cloud masks are used to train and validate cloud cover assessment algorithms, which in turn are intended to compute the percentage of cloud cover in each scene.

Dataset features:

  • Images divided between 9 unique biomes

  • 206 scenes from Landsat 7 ETM+ sensor

  • Imagery from global tiles acquired between June 2000 and December 2001

  • 9 Level-1 spectral bands with 30 m per pixel resolution

Dataset format:

  • Images are composed of single multiband geotiffs

  • Labels are multiclass, stored in single geotiffs

  • Level-1 metadata (MTL.txt file)

  • Landsat 7 ETM+ bands: (B10, B20, B30, B40, B50, B61, B62, B70, B80)

Dataset classes:

  1. Fill

  2. Cloud Shadow

  3. Clear

  4. Thin Cloud

  5. Cloud

If you use this dataset in your research, please cite the following:

New in version 0.5.

filename_glob = 'L71*.TIF'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n        ^L71\n        (?P<wrs_path>\\d{3})\n        (?P<wrs_row>\\d{3})\n        _(?P=wrs_row)\n        (?P<date>\\d{8})\n        \\.TIF$\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

date_format = '%Y%m%d'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
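The L7Irish regex above is notable for its (?P=wrs_row) backreference, which requires the WRS row to appear twice in the filename. A stdlib-only sketch with a hypothetical scene name following the documented pattern:

```python
import re
from datetime import datetime

# filename_regex and date_format as documented on the L7Irish class;
# note the (?P=wrs_row) backreference repeating the WRS row
filename_regex = r"""
    ^L71
    (?P<wrs_path>\d{3})
    (?P<wrs_row>\d{3})
    _(?P=wrs_row)
    (?P<date>\d{8})
    \.TIF$
"""
date_format = "%Y%m%d"

# hypothetical scene name following the documented pattern
match = re.match(filename_regex, "L71045026_02620010815.TIF", re.VERBOSE)
assert match is not None
acquired = datetime.strptime(match.group("date"), date_format)
print(match.group("wrs_path"), match.group("wrs_row"), acquired.date())
```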

separate_files = False

True if data is stored in a separate file for each band, else False.

rgb_bands: list[str] = ['B30', 'B20', 'B10']

Names of RGB bands in the dataset, used for plotting

all_bands: list[str] = ['B10', 'B20', 'B30', 'B40', 'B50', 'B61', 'B62', 'B70', 'B80']

Names of all available bands in the dataset

__init__(paths='data', crs=CRS.from_epsg(3857), res=None, bands=['B10', 'B20', 'B30', 'B40', 'B50', 'B61', 'B62', 'B70', 'B80'], transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new L7Irish instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to EPSG:3857)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • bands (Sequence[str]) – bands to return (defaults to all bands)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]

Retrieve image/mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of image, mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]
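The (minx, maxx, miny, maxy, mint, maxt) ordering of the query is easy to get wrong. As a rough stand-in for the library's BoundingBox, a named tuple makes the field order explicit; the coordinates below are invented EPSG:3857 values, not taken from any real scene:

```python
from typing import NamedTuple

class BBox(NamedTuple):
    # Same field order as the documented query:
    # (minx, maxx, miny, maxy, mint, maxt)
    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float

# Invented EPSG:3857 extent plus a time range in UNIX seconds.
query = BBox(minx=-8.0e6, maxx=-7.9e6, miny=4.0e6, maxy=4.1e6,
             mint=0.0, maxt=1.7e9)
assert query.minx < query.maxx and query.miny < query.maxy
print(query.maxx - query.minx)  # spatial width in CRS units
```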

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

L8 Biome

class torchgeo.datasets.L8Biome(paths, crs=CRS.from_epsg(3857), res=None, bands=['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11'], transforms=None, cache=True, download=False, checksum=False)[source]

Bases: RasterDataset

L8 Biome dataset.

The L8 Biome dataset is a validation dataset for cloud cover assessment algorithms, consisting of Pre-Collection Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) terrain-corrected (Level-1T) scenes.

Dataset features:

  • Images evenly divided between 8 unique biomes

  • 96 scenes from Landsat 8 OLI/TIRS sensors

  • Imagery from global tiles acquired between April 2013 and October 2014

  • 11 Level-1 spectral bands with 30 m per pixel resolution

Dataset format:

  • Images are composed of single multiband geotiffs

  • Labels are multiclass, stored in single geotiffs

  • Quality assurance bands, stored in single geotiffs

  • Level-1 metadata (MTL.txt file)

  • Landsat 8 OLI/TIRS bands: (B1, B2, B3, B4, B5, B6, B7, B8, B9, B10, B11)

Dataset classes:

  1. Fill

  2. Cloud Shadow

  3. Clear

  4. Thin Cloud

  5. Cloud

If you use this dataset in your research, please cite the following:

New in version 0.5.

filename_glob = 'LC8*.TIF'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n        ^LC8\n        (?P<wrs_path>\\d{3})\n        (?P<wrs_row>\\d{3})\n        (?P<date>\\d{7})\n        (?P<gsi>[A-Z]{3})\n        (?P<version>\\d{2})\n        \\.TIF$\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

date_format = '%Y%j'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
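Unlike L7 Irish, the date here is a 7-digit year plus day-of-year value, which date_format = '%Y%j' parses directly. The scene name below is hypothetical, constructed to match the documented pattern:

```python
import re
from datetime import datetime

# The filename_regex documented above, in VERBOSE mode.
filename_regex = re.compile(
    r"""
        ^LC8
        (?P<wrs_path>\d{3})
        (?P<wrs_row>\d{3})
        (?P<date>\d{7})
        (?P<gsi>[A-Z]{3})
        (?P<version>\d{2})
        \.TIF$
    """,
    re.VERBOSE,
)

# Hypothetical pre-collection scene: path 001, row 011,
# day 127 of 2014, ground station LGN, version 00.
match = filename_regex.match("LC80010112014127LGN00.TIF")
assert match is not None

# '%Y%j' interprets '2014127' as year 2014, day-of-year 127.
date = datetime.strptime(match.group("date"), "%Y%j")
print(date.date())
```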

separate_files = False

True if data is stored in a separate file for each band, else False.

rgb_bands: list[str] = ['B4', 'B3', 'B2']

Names of RGB bands in the dataset, used for plotting

all_bands: list[str] = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11']

Names of all available bands in the dataset

__init__(paths, crs=CRS.from_epsg(3857), res=None, bands=['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11'], transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new L8Biome instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to EPSG:3857)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • bands (Sequence[str]) – bands to return (defaults to all bands)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]

Retrieve image/mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of image, mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

LandCover.ai Geo

class torchgeo.datasets.LandCoverAIBase(root='data', download=False, checksum=False)[source]

Bases: Dataset[dict[str, Any]], ABC

Abstract base class for LandCover.ai Geo and NonGeo datasets.

The LandCover.ai (Land Cover from Aerial Imagery) dataset supports automatic mapping of buildings, woodlands, water and roads from aerial images. This implementation is specifically for Version 1 of LandCover.ai.

Dataset features:

  • land cover from Poland, Central Europe

  • three spectral bands - RGB

  • 33 orthophotos with 25 cm per pixel resolution (~9000x9500 px)

  • 8 orthophotos with 50 cm per pixel resolution (~4200x4700 px)

  • total area of 216.27 km2

Dataset format:

  • rasters are three-channel GeoTiffs with EPSG:2180 spatial reference system

  • masks are single-channel GeoTiffs with EPSG:2180 spatial reference system

Dataset classes:

  1. building (1.85 km2)

  2. woodland (72.02 km2)

  3. water (13.15 km2)

  4. road (3.5 km2)

If you use this dataset in your research, please cite the following paper:

New in version 0.5.

__init__(root='data', download=False, checksum=False)[source]

Initialize a new LandCover.ai dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • transforms – a function/transform that takes input sample and its target as entry and returns a transformed version

  • cache – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

abstract __getitem__(query)[source]

Retrieve image, mask and metadata indexed by query.

Parameters:

query (Any) – coordinates or an index

Returns:

sample of image, mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

class torchgeo.datasets.LandCoverAIGeo(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Bases: LandCoverAIBase, RasterDataset

LandCover.ai Geo dataset.

See the abstract LandCoverAIBase class to find out more.

New in version 0.5.

filename_glob = 'images/*.tif'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '.*tif'

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

__init__(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new LandCover.ai Geo dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]

Retrieve image/mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of image, mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

Landsat

class torchgeo.datasets.Landsat(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: RasterDataset, ABC

Abstract base class for all Landsat datasets.

Landsat is a joint NASA/USGS program, providing the longest continuous space-based record of Earth’s land in existence.

If you use this dataset in your research, please cite it using the following format:

If you use any of the following Level-2 products, there may be additional citation requirements, including papers you can cite. See the “Citation Information” section of the following pages:

filename_regex = '\n        ^L\n        (?P<sensor>[COTEM])\n        (?P<satellite>\\d{2})\n        _(?P<processing_correction_level>[A-Z0-9]{4})\n        _(?P<wrs_path>\\d{3})\n        (?P<wrs_row>\\d{3})\n        _(?P<date>\\d{8})\n        _(?P<processing_date>\\d{8})\n        _(?P<collection_number>\\d{2})\n        _(?P<collection_category>[A-Z0-9]{2})\n        _(?P<band>[A-Z0-9_]+)\n        \\.\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

separate_files = True

True if data is stored in a separate file for each band, else False.
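Because separate_files is True, each band lives in its own file and the subclass glob templates (e.g. Landsat8.filename_glob = 'LC08_*_{}.*') are filled in per requested band. A sketch with Python's re and fnmatch, using a hypothetical Collection 2 Level-2 filename constructed to match the documented pattern:

```python
import re
from fnmatch import fnmatch

# The Collection filename_regex documented above, VERBOSE mode.
filename_regex = re.compile(
    r"""
        ^L
        (?P<sensor>[COTEM])
        (?P<satellite>\d{2})
        _(?P<processing_correction_level>[A-Z0-9]{4})
        _(?P<wrs_path>\d{3})
        (?P<wrs_row>\d{3})
        _(?P<date>\d{8})
        _(?P<processing_date>\d{8})
        _(?P<collection_number>\d{2})
        _(?P<collection_category>[A-Z0-9]{2})
        _(?P<band>[A-Z0-9_]+)
        \.
    """,
    re.VERBOSE,
)

# Hypothetical Landsat 8 Collection 2 Level-2 surface reflectance file.
name = "LC08_L2SP_039037_20180510_20200901_02_T1_SR_B4.TIF"
match = filename_regex.match(name)
assert match is not None
print(match.group("satellite"), match.group("band"))

# The 'band' placeholder in the glob template is substituted with each
# requested band name to locate its file:
glob = "LC08_*_{}.*".format("SR_B4")
assert fnmatch(name, glob)
```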

abstract property default_bands: list[str]

Bands to load by default.

__init__(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • bands (Sequence[str] | None) – bands to return (defaults to all bands)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

class torchgeo.datasets.Landsat9(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat8

Landsat 9 Operational Land Imager (OLI-2) and Thermal Infrared Sensor (TIRS-2).

filename_glob = 'LC09_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat8(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat

Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS).

filename_glob = 'LC08_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: list[str] = ['SR_B4', 'SR_B3', 'SR_B2']

Names of RGB bands in the dataset, used for plotting

class torchgeo.datasets.Landsat7(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat

Landsat 7 Enhanced Thematic Mapper Plus (ETM+).

filename_glob = 'LE07_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: list[str] = ['SR_B3', 'SR_B2', 'SR_B1']

Names of RGB bands in the dataset, used for plotting

class torchgeo.datasets.Landsat5TM(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat4TM

Landsat 5 Thematic Mapper (TM).

filename_glob = 'LT05_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat5MSS(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat4MSS

Landsat 5 Multispectral Scanner (MSS).

filename_glob = 'LM05_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat4TM(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat

Landsat 4 Thematic Mapper (TM).

filename_glob = 'LT04_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: list[str] = ['SR_B3', 'SR_B2', 'SR_B1']

Names of RGB bands in the dataset, used for plotting

class torchgeo.datasets.Landsat4MSS(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat

Landsat 4 Multispectral Scanner (MSS).

filename_glob = 'LM04_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: list[str] = ['B3', 'B2', 'B1']

Names of RGB bands in the dataset, used for plotting

class torchgeo.datasets.Landsat3(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat1

Landsat 3 Multispectral Scanner (MSS).

filename_glob = 'LM03_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat2(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat1

Landsat 2 Multispectral Scanner (MSS).

filename_glob = 'LM02_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

class torchgeo.datasets.Landsat1(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: Landsat

Landsat 1 Multispectral Scanner (MSS).

filename_glob = 'LM01_*_{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

rgb_bands: list[str] = ['B6', 'B5', 'B4']

Names of RGB bands in the dataset, used for plotting

NAIP

class torchgeo.datasets.NAIP(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: RasterDataset

National Agriculture Imagery Program (NAIP) dataset.

The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. A primary goal of the NAIP program is to make digital ortho photography available to governmental agencies and the public within a year of acquisition.

NAIP is administered by the USDA’s Farm Service Agency (FSA) through the Aerial Photography Field Office in Salt Lake City. This “leaf-on” imagery is used as a base layer for GIS programs in FSA’s County Service Centers, and is used to maintain the Common Land Unit (CLU) boundaries.

If you use this dataset in your research, please cite it using the following format:

filename_glob = 'm_*.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n        ^m\n        _(?P<quadrangle>\\d+)\n        _(?P<quarter_quad>[a-z]+)\n        _(?P<utm_zone>\\d+)\n        _(?P<resolution>\\d+)\n        _(?P<date>\\d+)\n        (?:_(?P<processing_date>\\d+))?\n        \\..*$\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name
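The NAIP tile-naming convention encoded above can be checked with a quick sketch; the filename is hypothetical, and '%Y%m%d' is assumed as the encoding of the acquisition date:

```python
import re
from datetime import datetime

# The NAIP filename_regex documented above, VERBOSE mode.
filename_regex = re.compile(
    r"""
        ^m
        _(?P<quadrangle>\d+)
        _(?P<quarter_quad>[a-z]+)
        _(?P<utm_zone>\d+)
        _(?P<resolution>\d+)
        _(?P<date>\d+)
        (?:_(?P<processing_date>\d+))?
        \..*$
    """,
    re.VERBOSE,
)

# Hypothetical quarter-quad tile: quadrangle 3807511, NE quarter,
# UTM zone 18, 60 cm resolution, acquired 2018-11-04.
match = filename_regex.match("m_3807511_ne_18_060_20181104.tif")
assert match is not None
date = datetime.strptime(match.group("date"), "%Y%m%d")
print(match.group("quadrangle"), match.group("utm_zone"), date.date())
```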

all_bands: list[str] = ['R', 'G', 'B', 'NIR']

Names of all available bands in the dataset

rgb_bands: list[str] = ['R', 'G', 'B']

Names of RGB bands in the dataset, used for plotting

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

NCCM

class torchgeo.datasets.NCCM(paths='data', crs=None, res=None, years=[2019], transforms=None, cache=True, download=False, checksum=False)[source]

Bases: RasterDataset

The Northeastern China Crop Map Dataset.

Link: https://www.nature.com/articles/s41597-021-00827-9

This dataset provides annual 10 m crop maps of the major crops (maize, soybean, and rice) in Northeast China from 2017 to 2019. The maps were produced using hierarchical mapping strategies, random forest classifiers, interpolated and smoothed 10-day Sentinel-2 time series, and optimized features drawn from the spectral, temporal, and textural characteristics of the land surface. The resulting maps have high overall accuracies (OA) based on ground truth data. The dataset covers three years: 2017, 2018, and 2019.

The dataset contains 5 classes:

  1. paddy rice

  2. maize

  3. soybean

  4. other crops and lands

  5. nodata

Dataset format:

  • Three .TIF files containing the labels

  • JavaScript code to download images from the dataset.

If you use this dataset in your research, please cite the following paper:

New in version 0.6.

filename_regex = 'CDL(?P<date>\\d{4})_clip'

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

filename_glob = 'CDL*.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

date_format = '%Y'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
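With only a year captured, date_format = '%Y' resolves to January 1 of that year, and the index expands mint/maxt to cover the whole year. A minimal sketch, using a hypothetical filename that matches the documented glob and regex:

```python
import re
from datetime import datetime

# The NCCM filename_regex documented above: a 4-digit year
# captured as the 'date' group.
match = re.match(r"CDL(?P<date>\d{4})_clip", "CDL2019_clip.tif")
assert match is not None

# date_format = '%Y' parses the bare year.
date = datetime.strptime(match.group("date"), "%Y")
print(date.year)
```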

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 255, 0, 255), 1: (255, 0, 0, 255), 2: (255, 255, 0, 255), 3: (128, 128, 128, 255), 15: (255, 255, 255, 255)}

Color map for the dataset, used for plotting

__init__(paths='data', crs=None, res=None, years=[2019], transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new dataset.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • years (list[int]) – list of years for which to use nccm layers

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]

Retrieve mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by NCCM.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

NLCD

class torchgeo.datasets.NLCD(paths='data', crs=None, res=None, years=[2019], classes=[0, 11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 52, 71, 81, 82, 90, 95], transforms=None, cache=True, download=False, checksum=False)[source]

Bases: RasterDataset

National Land Cover Database (NLCD) dataset.

The NLCD dataset is a land cover product that covers the United States and Puerto Rico. The current implementation supports maps for the continental United States only. The product is a joint effort between the United States Geological Survey (USGS) and the Multi-Resolution Land Characteristics Consortium (MRLC) which released the first product in 2001 with new updates every five years since then.

The dataset contains the following 17 classes:

  1. Background

  2. Open Water

  3. Perennial Ice/Snow

  4. Developed, Open Space

  5. Developed, Low Intensity

  6. Developed, Medium Intensity

  7. Developed, High Intensity

  8. Barren Land (Rock/Sand/Clay)

  9. Deciduous Forest

  10. Evergreen Forest

  11. Mixed Forest

  12. Shrub/Scrub

  13. Grassland/Herbaceous

  14. Pasture/Hay

  15. Cultivated Crops

  16. Woody Wetlands

  17. Emergent Herbaceous Wetlands

Detailed descriptions of the classes can be found here.

Dataset format:

  • single channel .img file with integer class labels

If you use this dataset in your research, please use the corresponding citation:

New in version 0.5.

filename_glob = 'nlcd_*_land_cover_l48_*.img'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = 'nlcd_(?P<date>\\d{4})_land_cover_l48_(?P<publication_date>\\d{8})\\.img'

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

date_format = '%Y'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 0), 11: (70, 107, 159, 255), 12: (209, 222, 248, 255), 21: (222, 197, 197, 255), 22: (217, 146, 130, 255), 23: (235, 0, 0, 255), 24: (171, 0, 0, 255), 31: (179, 172, 159, 255), 41: (104, 171, 95, 255), 42: (28, 95, 44, 255), 43: (181, 197, 143, 255), 52: (204, 184, 121, 255), 71: (223, 223, 194, 255), 81: (220, 217, 57, 255), 82: (171, 108, 40, 255), 90: (184, 217, 235, 255), 95: (108, 159, 184, 255)}

Color map for the dataset, used for plotting
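A minimal sketch of how the cmap colorizes a mask, using only a subset of the documented entries: class codes outside the requested classes list are remapped to 0 (the transparent background entry) before lookup. The 2x2 mask values are made up:

```python
# Subset of the documented NLCD color map (code -> RGBA).
cmap = {
    0: (0, 0, 0, 0),           # Background (transparent)
    11: (70, 107, 159, 255),   # Open Water
    41: (104, 171, 95, 255),   # Deciduous Forest
    82: (171, 108, 40, 255),   # Cultivated Crops
}
classes = [0, 11, 41]  # keep water and deciduous forest only

mask = [[11, 82], [41, 0]]  # hypothetical class codes
# Codes not in `classes` are mapped to 0, then colorized.
remapped = [[c if c in classes else 0 for c in row] for row in mask]
rgba = [[cmap[c] for c in row] for row in remapped]
print(rgba)
```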

__init__(paths='data', crs=None, res=None, years=[2019], classes=[0, 11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 52, 71, 81, 82, 90, 95], transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • years (list[int]) – list of years for which to use nlcd layer

  • classes (list[int]) – list of classes to include, the rest will be mapped to 0 (defaults to all classes)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(query)[source]

Retrieve mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of mask and metadata at that index

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Open Buildings

class torchgeo.datasets.OpenBuildings(paths='data', crs=None, res=0.0001, transforms=None, checksum=False)[source]

Bases: VectorDataset

Open Buildings dataset.

The Open Buildings dataset consists of computer-generated building detections across the African continent.

Dataset features:

  • 516M building detections as polygons with centroid lat/long

  • covering area of 19.4M km2 (64% of the African continent)

  • confidence score and Plus Code

Dataset format:

  • csv files containing building detections compressed as csv.gz

  • metadata geojson file

The data can be downloaded from here. Additionally, the metadata geometry file must be placed in the root directory as tiles.geojson.

If you use this dataset in your research, please cite the following technical report:

New in version 0.3.

filename_glob = '*_buildings.csv'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

__init__(paths='data', crs=None, res=0.0001, transforms=None, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float) – resolution of the dataset in units of CRS

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

__getitem__(query)[source]

Retrieve image/mask and metadata indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

sample of image/mask and metadata for the given query. If there are no matching shapes within the query, an empty raster is returned

Raises:

IndexError – if query is not found in the index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

PRISMA

class torchgeo.datasets.PRISMA(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: RasterDataset

PRISMA dataset.

PRISMA (PRecursore IperSpettrale della Missione Applicativa, Hyperspectral Precursor of the Application Mission) is a medium-resolution hyperspectral imaging satellite developed, owned, and operated by the Italian Space Agency (ASI, Agenzia Spaziale Italiana). It is the successor to the discontinued HypSEO (Hyperspectral Satellite for Earth Observation) mission.

PRISMA carries two sensor instruments, the HYC (Hyperspectral Camera) module and the PAN (Panchromatic Camera) module. The HYC sensor is a prism spectrometer for two bands, VIS/NIR (Visible/Near Infrared) and NIR/SWIR (Near Infrared/Shortwave Infrared), with a total of 237 channels across both bands. Its primary mission objective is the high resolution hyperspectral imaging of land, vegetation, inner waters and coastal zones. The second sensor module, PAN, is a high resolution optical imager, and is co-registered with HYC data to allow testing of image fusion techniques.

The HYC module has a spatial resolution of 30 m and operates in two bands, a 66 channel VIS/NIR band with a spectral interval of 400-1010 nm, and a 171 channel NIR/SWIR band with a spectral interval of 920-2505 nm. It uses a pushbroom scanning technique with a swath width of 30 km, and a field of regard of 1000 km either side. The PAN module also uses a pushbroom scanning technique, with identical swath width and field of regard but spatial resolution of 5 m.

PRISMA is in a sun-synchronous orbit, with an altitude of 614 km, an inclination of 98.19° and its LTDN (Local Time on Descending Node) is at 1030 hours.

If you use this dataset in your research, please cite the following paper:

Note

PRISMA imagery is distributed as HDF5 files. However, TorchGeo does not yet have support for reprojecting and windowed reading of HDF5 files. This data loader requires you to first convert all files from HDF5 to GeoTIFF using something like this script.

New in version 0.6.

filename_glob = 'PRS_*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n        ^PRS\n        _(?P<level>[A-Z\\d]+)\n        _(?P<product>[A-Z]+)\n        (_(?P<order>[A-Z_]+))?\n        _(?P<start>\\d{14})\n        _(?P<stop>\\d{14})\n        _(?P<version>\\d{4})\n        (_(?P<valid>\\d))?\n        \\.\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

date_format = '%Y%m%d%H%M%S'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
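RasterDataset compiles filename_regex with re.VERBOSE, so the whitespace in the pattern above is ignored, and start/stop are parsed with date_format to become mint/maxt in the index. A sketch of how the pattern parses a converted GeoTIFF name (the file name below is hypothetical):

```python
import re
from datetime import datetime

# filename_regex from the PRISMA class, compiled the way RasterDataset does.
regex = re.compile(
    r"""
    ^PRS
    _(?P<level>[A-Z\d]+)
    _(?P<product>[A-Z]+)
    (_(?P<order>[A-Z_]+))?
    _(?P<start>\d{14})
    _(?P<stop>\d{14})
    _(?P<version>\d{4})
    (_(?P<valid>\d))?
    \.
    """,
    re.VERBOSE,
)

m = regex.search("PRS_L2D_STD_20200327103000_20200327103004_0001.tif")
assert m is not None
# start/stop are parsed with date_format = '%Y%m%d%H%M%S'
mint = datetime.strptime(m.group("start"), "%Y%m%d%H%M%S")
maxt = datetime.strptime(m.group("stop"), "%Y%m%d%H%M%S")
print(m.group("level"), mint.isoformat())  # L2D 2020-03-27T10:30:00
```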

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Sentinel

class torchgeo.datasets.Sentinel(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]

Bases: RasterDataset

Abstract base class for all Sentinel datasets.

Sentinel is a family of satellites launched by the European Space Agency (ESA) under the Copernicus Programme.

If you use this dataset in your research, please cite it using the following format:

class torchgeo.datasets.Sentinel1(paths='data', crs=None, res=10, bands=['VV', 'VH'], transforms=None, cache=True)[source]

Bases: Sentinel

Sentinel-1 dataset.

The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites, operating day and night performing C-band synthetic aperture radar imaging, enabling them to acquire imagery regardless of the weather.

Data can be downloaded from:

Product Types:

Polarizations:

  • HH: horizontal transmit, horizontal receive

  • HV: horizontal transmit, vertical receive

  • VV: vertical transmit, vertical receive

  • VH: vertical transmit, horizontal receive

Acquisition Modes:

Note

At the moment, this dataset only supports the GRD product type. Data must be radiometrically terrain corrected (RTC). This can be done manually using a DEM, or you can download an On Demand RTC product from ASF DAAC.

Note

Mixing \(\gamma_0\) and \(\sigma_0\) backscatter coefficient data is not recommended. Similarly, power, decibel, and amplitude scale data should not be mixed, and TorchGeo does not attempt to convert all data to a common scale.

New in version 0.4.

filename_regex = '\n        ^S1(?P<mission>[AB])\n        _(?P<mode>SM|IW|EW|WV)\n        _(?P<date>\\d{8}T\\d{6})\n        _(?P<polarization>[DS][HV])\n        (?P<orbit>[PRO])\n        _RTC(?P<spacing>\\d{2})\n        _(?P<package>G)\n        _(?P<backscatter>[gs])\n        (?P<scale>[pda])\n        (?P<mask>[uw])\n        (?P<filter>[nf])\n        (?P<area>[ec])\n        (?P<matching>[dm])\n        _(?P<product>[0-9A-Z]{4})\n        _(?P<band>[VH]{2})\n        \\.\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

date_format = '%Y%m%dT%H%M%S'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
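Because filename_regex is compiled with re.VERBOSE, the pattern above decomposes an RTC product name into its processing flags. A sketch with a hypothetical ASF-style file name:

```python
import re

# filename_regex from the Sentinel1 class, as compiled by RasterDataset.
regex = re.compile(
    r"""
    ^S1(?P<mission>[AB])
    _(?P<mode>SM|IW|EW|WV)
    _(?P<date>\d{8}T\d{6})
    _(?P<polarization>[DS][HV])
    (?P<orbit>[PRO])
    _RTC(?P<spacing>\d{2})
    _(?P<package>G)
    _(?P<backscatter>[gs])
    (?P<scale>[pda])
    (?P<mask>[uw])
    (?P<filter>[nf])
    (?P<area>[ec])
    (?P<matching>[dm])
    _(?P<product>[0-9A-Z]{4})
    _(?P<band>[VH]{2})
    \.
    """,
    re.VERBOSE,
)

m = regex.search("S1A_IW_20220101T120000_DVP_RTC30_G_gpuned_A1B2_VV.tif")
assert m is not None
# mission A, IW mode, dual-pol, gamma_0 backscatter in power scale, VV band
print(m.group("mission"), m.group("backscatter"), m.group("band"))  # A g VV
```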

all_bands: list[str] = ['HH', 'HV', 'VV', 'VH']

Names of all available bands in the dataset

separate_files = True

True if data is stored in a separate file for each band, else False.

__init__(paths='data', crs=None, res=10, bands=['VV', 'VH'], transforms=None, cache=True)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | list[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float) – resolution of the dataset in units of CRS (defaults to 10)

  • bands (Sequence[str]) – bands to return (defaults to [“VV”, “VH”])

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

filename_glob = 'S1*{}.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

class torchgeo.datasets.Sentinel2(paths='data', crs=None, res=10, bands=None, transforms=None, cache=True)[source]

Bases: Sentinel

Sentinel-2 dataset.

The Copernicus Sentinel-2 mission comprises a constellation of two polar-orbiting satellites placed in the same sun-synchronous orbit, phased at 180° to each other. It monitors variability in land surface conditions; its wide swath width (290 km) and high revisit time (10 days at the equator with one satellite, and 5 days with two satellites, resulting in 2–3 days at mid-latitudes under cloud-free conditions) support monitoring of changes to Earth’s surface.

date_format = '%Y%m%dT%H%M%S'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.

all_bands: list[str] = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12']

Names of all available bands in the dataset

rgb_bands: list[str] = ['B04', 'B03', 'B02']

Names of RGB bands in the dataset, used for plotting

separate_files = True

True if data is stored in a separate file for each band, else False.

__init__(paths='data', crs=None, res=10, bands=None, transforms=None, cache=True)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | list[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float) – resolution of the dataset in units of CRS (defaults to 10)

  • bands (Sequence[str] | None) – bands to return (defaults to all bands)

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

Raises:

DatasetNotFoundError – If dataset is not found.

Changed in version 0.5: root was renamed to paths.

filename_glob = 'T*_*_{}*.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = '\n        ^T(?P<tile>\\d{{2}}[A-Z]{{3}})\n        _(?P<date>\\d{{8}}T\\d{{6}})\n        _(?P<band>B[018][\\dA])\n        (?:_(?P<resolution>{}m))?\n        \\..*$\n    '

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name
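Note that Sentinel2.filename_regex is a str.format template: the doubled braces are literal regex quantifiers, and the single {} placeholder is filled with the requested resolution before compiling. A sketch with a hypothetical tile name:

```python
import re
from datetime import datetime

# Template copied from Sentinel2.filename_regex; {} is filled with the
# resolution (here 10) before the pattern is compiled.
template = r"""
    ^T(?P<tile>\d{{2}}[A-Z]{{3}})
    _(?P<date>\d{{8}}T\d{{6}})
    _(?P<band>B[018][\dA])
    (?:_(?P<resolution>{}m))?
    \..*$
"""
regex = re.compile(template.format(10), re.VERBOSE)

m = regex.search("T41XNE_20200829T083611_B02_10m.tif")
assert m is not None
print(m.group("tile"), m.group("band"), m.group("resolution"))  # 41XNE B02 10m
# date is parsed with date_format = '%Y%m%dT%H%M%S'
print(datetime.strptime(m.group("date"), "%Y%m%dT%H%M%S").isoformat())
```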

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.

South Africa Crop Type

class torchgeo.datasets.SouthAfricaCropType(paths='data', crs=None, classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], bands=['VH', 'VV', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'], transforms=None)[source]

Bases: RasterDataset

South Africa Crop Type Challenge dataset.

The South Africa Crop Type Challenge dataset includes satellite imagery from Sentinel-1 and Sentinel-2 and crop type labels collected by aerial and vehicle survey from May 2017 to March 2018. Data was provided by the Western Cape Department of Agriculture and is available via the Radiant Earth Foundation. For each field ID the dataset contains time series imagery and a single label mask. Since TorchGeo does not yet support time series datasets, the first available imagery in July is returned for each field. Note that the dates of the S1 and S2 imagery for a given field are not guaranteed to be the same. Each pixel in the label mask contains an integer field number and crop type class.

Dataset format:

  • images are 2-band Sentinel-1 and 12-band Sentinel-2 data with a cloud mask

  • masks are TIFF images with unique values representing the class and field ID

Dataset classes:

  1. No Data

  2. Lucerne/Medics

  3. Planted pastures (perennial)

  4. Fallow

  5. Wine grapes

  6. Weeds

  7. Small grain grazing

  8. Wheat

  9. Canola

  10. Rooibos

If you use this dataset in your research, please cite the following dataset:

  • Western Cape Department of Agriculture, Radiant Earth Foundation (2021) “Crop Type Classification Dataset for Western Cape, South Africa”, Version 1.0, Radiant MLHub, https://doi.org/10.34911/rdnt.j0co8q

New in version 0.6.

filename_regex = '\n        ^(?P<field_id>[0-9]*)\n        _(?P<date>[0-9]{4}_[0-9]{2}_[0-9]{2})\n        _(?P<band>(B[0-9A-Z]{2} | VH | VV))\n        _10m'

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

date_format = '%Y_%m_%d'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
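The spaces inside the band alternation above are harmless because the pattern is compiled with re.VERBOSE, which strips unescaped whitespace. A sketch with a hypothetical file name:

```python
import re
from datetime import datetime

# filename_regex from SouthAfricaCropType, compiled as RasterDataset does.
regex = re.compile(
    r"""
    ^(?P<field_id>[0-9]*)
    _(?P<date>[0-9]{4}_[0-9]{2}_[0-9]{2})
    _(?P<band>(B[0-9A-Z]{2} | VH | VV))
    _10m
    """,
    re.VERBOSE,
)

m = regex.search("0192_2017_07_06_B04_10m.tif")
assert m is not None
print(m.group("field_id"), m.group("band"))  # 0192 B04
# date is parsed with date_format = '%Y_%m_%d'
print(datetime.strptime(m.group("date"), "%Y_%m_%d").date())  # 2017-07-06
```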

rgb_bands: list[str] = ['B04', 'B03', 'B02']

Names of RGB bands in the dataset, used for plotting

all_bands: list[str] = ['VH', 'VV', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12']

Names of all available bands in the dataset

cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 255), 1: (255, 211, 0, 255), 2: (255, 37, 37, 255), 3: (0, 168, 226, 255), 4: (255, 158, 9, 255), 5: (37, 111, 0, 255), 6: (255, 255, 0, 255), 7: (222, 166, 9, 255), 8: (111, 166, 0, 255), 9: (0, 175, 73, 255)}

Color map for the dataset, used for plotting

__init__(paths='data', crs=None, classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], bands=['VH', 'VV', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'], transforms=None)[source]

Initialize a new South Africa Crop Type dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • classes (list[int]) – list of classes to include, the rest will be mapped to 0 (defaults to all classes)

  • bands (list[str]) – the subset of bands to load

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(query)[source]

Retrieve imagery and mask indexed by query.

Parameters:

query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index

Returns:

data and labels at that index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

South America Soybean

class torchgeo.datasets.SouthAmericaSoybean(paths='data', crs=None, res=None, years=[2021], transforms=None, cache=True, download=False, checksum=False)[source]

Bases: RasterDataset

South America Soybean Dataset.

This dataset contains annual 30 m soybean maps of South America from 2001 to 2021.

Link: https://www.nature.com/articles/s41893-021-00729-z

Dataset contains 2 classes:

  1. other

  2. soybean

Dataset Format:

  • 21 .tif files

If you use this dataset in your research, please cite the following paper:

New in version 0.6.

filename_glob = 'South_America_Soybean_*.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

filename_regex = 'South_America_Soybean_(?P<year>\\d{4})'

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

date_format = '%Y'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
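Here the regex captures only a year, and with date_format = '%Y' the parsed timestamp resolves to the start of that year, so each file's time extent spans the named calendar year. A sketch with a hypothetical file name:

```python
import re
from datetime import datetime

# filename_regex from SouthAmericaSoybean applied to an example file name.
m = re.search(r"South_America_Soybean_(?P<year>\d{4})",
              "South_America_Soybean_2021.tif")
assert m is not None
print(datetime.strptime(m.group("year"), "%Y"))  # 2021-01-01 00:00:00
```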

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, a custom __getitem__() method must be implemented.

__init__(paths='data', crs=None, res=None, years=[2021], transforms=None, cache=True, download=False, checksum=False)[source]

Initialize a new Dataset instance.

Parameters:
  • paths (str | collections.abc.Iterable[str]) – one or more root directories to search or files to load

  • crs (rasterio.crs.CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | None) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)

  • years (list[int]) – list of years for which to use the South America Soybean layer

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Non-geospatial Datasets

NonGeoDataset is designed for datasets that lack geospatial information. These datasets can still be combined using ConcatDataset.

C = classification, R = regression, S = semantic segmentation, I = instance segmentation, T = time series, CD = change detection, OD = object detection

Dataset

Task

Source

License

# Samples

# Classes

Size (px)

Resolution (m)

Bands

ADVANCE

C

Google Earth, Freesound

CC-BY-4.0

5,075

13

512x512

0.5

RGB

Benin Cashew Plantations

S

Airbus Pléiades

CC-BY-4.0

70

6

1,122x1,186

10

MSI

BigEarthNet

C

Sentinel-1/2

CDLA-Permissive-1.0

590,326

19–43

120x120

10

SAR, MSI

BioMassters

R

Sentinel-1/2 and Lidar

CC-BY-4.0

256x256

10

SAR, MSI

ChaBuD

CD

Sentinel-2

OpenRAIL

356

2

512x512

10

MSI

Cloud Cover Detection

S

Sentinel-2

CC-BY-4.0

22,728

2

512x512

10

MSI

COWC

C, R

CSUAV AFRL, ISPRS, LINZ, AGRC

AGPL-3.0-only

388,435

2

256x256

0.15

RGB

CropHarvest

C

Sentinel-1/2, SRTM, ERA5

CC-BY-SA-4.0

70,213

351

1x1

10

SAR, MSI, SRTM

Kenya Crop Type

S

Sentinel-2

CC-BY-SA-4.0

4,688

7

3,035x2,016

10

MSI

DeepGlobe Land Cover

S

DigitalGlobe +Vivid

803

7

2,448x2,448

0.5

RGB

DFC2022

S

Aerial

CC-BY-4.0

3,981

15

2,000x2,000

0.5

RGB

ETCI2021 Flood Detection

S

Sentinel-1

66,810

2

256x256

5–20

SAR

EuroSAT

C

Sentinel-2

MIT

27,000

10

64x64

10

MSI

FAIR1M

OD

Gaofen/Google Earth

CC-BY-NC-SA-3.0

15,000

37

1,024x1,024

0.3–0.8

RGB

FireRisk

C

NAIP Aerial

CC-BY-NC-4.0

91,872

7

320x320

1

RGB

Forest Damage

OD

Drone imagery

CDLA-Permissive-1.0

1,543

4

1,500x1,500

RGB

GID-15

S

Gaofen-2

150

15

6,800x7,200

3

RGB

IDTReeS

OD,C

Aerial

CC-BY-4.0

591

33

200x200

0.1–1

RGB

Inria Aerial Image Labeling

S

Aerial

360

2

5,000x5,000

0.3

RGB

LandCover.ai

S

Aerial

CC-BY-NC-SA-4.0

10,674

5

512x512

0.25–0.5

RGB

LEVIR-CD

CD

Google Earth

637

2

1,024x1,024

0.5

RGB

LEVIR-CD+

CD

Google Earth

985

2

1,024x1,024

0.5

RGB

LoveDA

S

Google Earth

CC-BY-NC-SA-4.0

5,987

7

1,024x1,024

0.3

RGB

MapInWild

S

Sentinel-1/2, ESA WorldCover, NOAA VIIRS DNB

CC-BY-4.0

1,018

1

1,920x1,920

10–463.83

SAR, MSI, 2020_Map, avg_rad

Million-AID

C

Google Earth

1M

51–73

0.5–153

RGB

NASA Marine Debris

OD

PlanetScope

Apache-2.0

707

1

256x256

3

RGB

OSCD

CD

Sentinel-2

CC-BY-4.0

24

2

40–1,180

60

MSI

PASTIS

I

Sentinel-1/2

CC-BY-4.0

2,433

19

128x128xT

10

MSI

PatternNet

C

Google Earth

30,400

38

256x256

0.06–5

RGB

Potsdam

S

Aerial

38

6

6,000x6,000

0.05

MSI

ReforesTree

OD, R

Aerial

CC-BY-4.0

100

6

4,000x4,000

0.02

RGB

RESISC45

C

Google Earth

CC-BY-NC-4.0

31,500

45

256x256

0.2–30

RGB

Rwanda Field Boundary

S

PlanetScope

NICFI AND CC-BY-4.0

70

2

256x256

4.7

RGB + NIR

Seasonal Contrast

T

Sentinel-2

CC-BY-4.0

100K–1M

264x264

10

MSI

SeasoNet

S

Sentinel-2

CC-BY-4.0

1,759,830

33

120x120

10

MSI

SEN12MS

S

Sentinel-1/2, MODIS

CC-BY-4.0

180,662

33

256x256

10

SAR, MSI

SKIPP’D

R

Fish-eye

CC-BY-4.0

363,375

64x64

RGB

So2Sat

C

Sentinel-1/2

CC-BY-4.0

400,673

17

32x32

10

SAR, MSI

SpaceNet

I

WorldView-2/3 Planet Lab Dove

CC-BY-SA-4.0

1,889–28,728

2

102–900

0.5–4

MSI

SSL4EO-L

T

Landsat

CC0-1.0

1M

264x264

30

MSI

SSL4EO-S12

T

Sentinel-1/2

CC-BY-4.0

1M

264x264

10

SAR, MSI

SSL4EO-L Benchmark

S

Landsat & CDL

CC0-1.0

25K

134

264x264

30

MSI

SSL4EO-L Benchmark

S

Landsat & NLCD

CC0-1.0

25K

17

264x264

30

MSI

SustainBench Crop Yield

R

MODIS

CC-BY-SA-4.0

11K

32x32

MSI

Tropical Cyclone

R

GOES 8–16

CC-BY-4.0

108,110

256x256

4K–8K

MSI

UC Merced

C

USGS National Map

public domain

2,100

21

256x256

0.3

RGB

USAVars

R

NAIP Aerial

CC-BY-4.0

100K

4

RGB, NIR

Vaihingen

S

Aerial

33

6

1,281–3,816

0.09

RGB

VHR-10

I

Google Earth, Vaihingen

MIT

800

10

358–1,728

0.08–2

RGB

Western USA Live Fuel Moisture

R

Landsat8, Sentinel-1

CC-BY-NC-ND-4.0

2,615

xView2

CD

Maxar

CC-BY-NC-SA-4.0

3,732

4

1,024x1,024

0.8

RGB

ZueriCrop

I, T

Sentinel-2

116K

48

24x24

10

MSI

ADVANCE

class torchgeo.datasets.ADVANCE(root='data', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

ADVANCE dataset.

The ADVANCE dataset is a dataset for audio-visual aerial scene recognition.

Dataset features:

  • 5,075 pairs of geotagged audio recordings and images

  • three spectral bands - RGB (512x512 px)

  • 10-second audio recordings

Dataset format:

  • images are three-channel jpgs

  • audio files are in wav format

Dataset classes:

  1. airport

  2. beach

  3. bridge

  4. farmland

  5. forest

  6. grassland

  7. harbour

  8. lake

  9. orchard

  10. residential

  11. sparse shrub land

  12. sports land

  13. train station

If you use this dataset in your research, please cite the following paper:

Note

This dataset requires the following additional library to be installed:

  • scipy to load the audio files to tensors

__init__(root='data', transforms=None, download=False, checksum=False)[source]

Initialize a new ADVANCE dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample at the given index within the dataset.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

New in version 0.2.

Benin Cashew Plantations

class torchgeo.datasets.BeninSmallHolderCashews(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]

Bases: NonGeoDataset

Smallholder Cashew Plantations in Benin dataset.

This dataset contains labels for cashew plantations in a 120 km2 area in the center of Benin. Each pixel is classified as well-managed plantation, poorly-managed plantation, no plantation, or one of several other classes. The labels are generated using a combination of ground data collection with a handheld GPS device and final corrections based on Airbus Pléiades imagery. See this website for dataset details.

Specifically, the data consists of Sentinel-2 imagery from a 120 km2 area in the center of Benin over 71 points in time from 11/05/2019 to 10/30/2020 and polygon labels for six classes (plus a no-data value):

  1. No data

  2. Well-managed plantation

  3. Poorly-managed plantation

  4. Non-plantation

  5. Residential

  6. Background

  7. Uncertain

If you use this dataset in your research, please cite the following:

Note

This dataset requires the following additional library to be installed:

  • radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub

__init__(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]

Initialize a new Benin Smallholder Cashew Plantations Dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • chip_size (int) – size of chips

  • stride (int) – spacing between chips, if less than chip_size, then there will be overlap between chips

  • bands (tuple[str, ...]) – the subset of bands to load

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • api_key (str | None) – a RadiantEarth MLHub API key to use for downloading the dataset

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

  • verbose (bool) – if True, print messages when new tiles are loaded

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample at the given index within the dataset.

Parameters:

index (int) – index to return

Returns:

a dict containing image, mask, transform, crs, and metadata at index.

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of chips in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, time_step=0, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • time_step (int) – time step at which to access image, beginning with 0

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

New in version 0.2.

BigEarthNet

class torchgeo.datasets.BigEarthNet(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

BigEarthNet dataset.

The BigEarthNet dataset is a dataset for multilabel remote sensing image scene classification.

Dataset features:

  • 590,326 patches from 125 Sentinel-1 and Sentinel-2 tiles

  • Imagery from tiles in Europe between Jun 2017 - May 2018

  • 12 spectral bands with 10-60 m per pixel resolution (base 120x120 px)

  • 2 synthetic aperture radar bands (120x120 px)

  • 43 or 19 scene classes from the 2018 CORINE Land Cover database (CLC 2018)

Dataset format:

  • images are composed of multiple single channel geotiffs

  • labels are multilabel, stored in a single json file per image

  • the mapping of Sentinel-1 to Sentinel-2 patches is stored in the Sentinel-1 json files

  • Sentinel-1 bands: (VV, VH)

  • Sentinel-2 bands: (B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)

  • All bands: (VV, VH, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)

  • Sentinel-2 bands are of different spatial resolutions and upsampled to 10m

Dataset classes (43):

  1. Continuous urban fabric

  2. Discontinuous urban fabric

  3. Industrial or commercial units

  4. Road and rail networks and associated land

  5. Port areas

  6. Airports

  7. Mineral extraction sites

  8. Dump sites

  9. Construction sites

  10. Green urban areas

  11. Sport and leisure facilities

  12. Non-irrigated arable land

  13. Permanently irrigated land

  14. Rice fields

  15. Vineyards

  16. Fruit trees and berry plantations

  17. Olive groves

  18. Pastures

  19. Annual crops associated with permanent crops

  20. Complex cultivation patterns

  21. Land principally occupied by agriculture, with significant areas of natural vegetation

  22. Agro-forestry areas

  23. Broad-leaved forest

  24. Coniferous forest

  25. Mixed forest

  26. Natural grassland

  27. Moors and heathland

  28. Sclerophyllous vegetation

  29. Transitional woodland/shrub

  30. Beaches, dunes, sands

  31. Bare rock

  32. Sparsely vegetated areas

  33. Burnt areas

  34. Inland marshes

  35. Peatbogs

  36. Salt marshes

  37. Salines

  38. Intertidal flats

  39. Water courses

  40. Water bodies

  41. Coastal lagoons

  42. Estuaries

  43. Sea and ocean

Dataset classes (19):

  1. Urban fabric

  2. Industrial or commercial units

  3. Arable land

  4. Permanent crops

  5. Pastures

  6. Complex cultivation patterns

  7. Land principally occupied by agriculture, with significant areas of natural vegetation

  8. Agro-forestry areas

  9. Broad-leaved forest

  10. Coniferous forest

  11. Mixed forest

  12. Natural grassland and sparsely vegetated areas

  13. Moors, heathland and sclerophyllous vegetation

  14. Transitional woodland, shrub

  15. Beaches, dunes, sands

  16. Inland wetlands

  17. Coastal wetlands

  18. Inland waters

  19. Marine waters

The source for the above dataset classes, their respective ordering, and 43-to-19-class mappings can be found here:
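Since each patch carries one or more of the classes above, a multilabel target is naturally represented as a multi-hot vector. The following sketch is illustrative only (not torchgeo's internal code), and the class indices used in the example are hypothetical:

```python
def encode_multilabel(class_indices, num_classes=19):
    """Return a multi-hot list of length num_classes for a multilabel target."""
    target = [0] * num_classes
    for i in class_indices:
        target[i] = 1
    return target

# e.g. a patch labeled with 0-based classes 4 and 10 (hypothetical example)
target = encode_multilabel([4, 10], num_classes=19)
```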

If you use this dataset in your research, please cite the following paper:

__init__(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)[source]

Initialize a new BigEarthNet dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – train/val/test split to load

  • bands (str) – load Sentinel-1 bands, Sentinel-2, or both. one of {s1, s2, all}

  • num_classes (int) – number of classes to load in target. one of {19, 43}

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

New in version 0.2.

BioMassters

class torchgeo.datasets.BioMassters(root='data', split='train', sensors=['S1', 'S2'], as_time_series=False)[source]

Bases: NonGeoDataset

BioMassters Dataset for Aboveground Biomass prediction.

Dataset intended for Aboveground Biomass (AGB) prediction over Finnish forests based on Sentinel 1 and 2 data with corresponding target AGB mask values generated by Light Detection and Ranging (LiDAR).

Dataset Format:

  • .tif files for Sentinel 1 and 2 data

  • .tif file for pixel wise AGB target mask

  • .csv files for metadata regarding features and targets

Dataset Features:

  • 13,000 target AGB masks of size 256x256 px

  • 12 months of data per target mask

  • Sentinel 1 and Sentinel 2 data for each location

  • Sentinel 1 available for every month

  • Sentinel 2 available for almost every month (gaps are due to an ESA acquisition halt over the region during particular periods)

If you use this dataset in your research, please cite the following paper:

New in version 0.5.

__init__(root='data', split='train', sensors=['S1', 'S2'], as_time_series=False)[source]

Initialize a new instance of BioMassters dataset.

If as_time_series=False (the default), each time step becomes its own sample with the target being shared across multiple samples.
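The effect of as_time_series on the number of samples can be sketched as follows. This is an assumed simplification: with one target mask and up to 12 monthly observations, each month becomes its own sample when as_time_series=False (in practice missing Sentinel-2 months reduce the count):

```python
def approx_num_samples(num_targets, months_per_target=12, as_time_series=False):
    """Rough sample count under the assumed behavior described above."""
    if as_time_series:
        return num_targets                     # one stacked time series per target
    return num_targets * months_per_target     # one sample per month and target

approx_num_samples(100, as_time_series=True)   # 100
approx_num_samples(100, as_time_series=False)  # 1200
```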

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – train or test split

  • sensors (Sequence[str]) – which sensors to consider for the sample, Sentinel 1 and/or Sentinel 2 (‘S1’, ‘S2’)

  • as_time_series (bool) – whether or not to return all available time-steps or just a single one for a given target location

Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and labels at that index

Raises:

IndexError – if index is out of range of the dataset

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the length of the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

ChaBuD

class torchgeo.datasets.ChaBuD(root='data', split='train', bands=['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'], transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

ChaBuD dataset.

ChaBuD is a dataset for Change detection for Burned area Delineation and is used for the ChaBuD ECML-PKDD 2023 Discovery Challenge.

Dataset features:

  • Sentinel-2 multispectral imagery

  • binary masks of burned areas

  • 12 multispectral bands

  • 356 pairs of pre and post images with 10 m per pixel resolution (512x512 px)

Dataset format:

  • single hdf5 dataset containing images and masks

Dataset classes:

  1. no change

  2. burned area

If you use this dataset in your research, please cite the following paper:

Note

This dataset requires the following additional library to be installed:

  • h5py to load the dataset

New in version 0.6.

__init__(root='data', split='train', bands=['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12'], transforms=None, download=False, checksum=False)[source]

Initialize a new ChaBuD dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train” or “val”

  • bands (list[str]) – the subset of bands to load

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

sample containing image and mask

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Cloud Cover Detection

class torchgeo.datasets.CloudCoverDetection(root='data', split='train', bands=['B02', 'B03', 'B04', 'B08'], transforms=None, download=False, api_key=None, checksum=False)[source]

Bases: NonGeoDataset

Cloud Cover Detection Challenge dataset.

This training dataset was generated as part of a crowdsourcing competition on DrivenData.org and was later validated by a team of expert annotators. See this website for dataset details.

The dataset consists of Sentinel-2 satellite imagery and corresponding cloudy labels stored as GeoTiffs. There are 22,728 chips in the training data, collected between 2018 and 2020.

Each chip has:

  • 4 multi-spectral bands from Sentinel-2 L2A product. The four bands are [B02, B03, B04, B08] (refer to Sentinel-2 documentation for more information about the bands).

  • A label raster for the corresponding source tile, representing a binary per-pixel classification of cloud vs. not cloud.
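Because the labels are binary per-pixel cloud masks, a simple derived quantity is the cloud fraction of a chip. A minimal sketch in plain Python (illustrative only, operating on a mask already converted to 0/1 values):

```python
def cloud_fraction(mask):
    """Fraction of pixels labeled as cloud in a binary 2D mask (0/1 values)."""
    pixels = [p for row in mask for p in row]
    return sum(pixels) / len(pixels)

# a tiny 2x2 example mask: three of four pixels are cloud
cloud_fraction([[0, 1], [1, 1]])  # 0.75
```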

If you use this dataset in your research, please cite the following paper:

Note

This dataset requires the following additional library to be installed:

  • radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub

New in version 0.4.

__init__(root='data', split='train', bands=['B02', 'B03', 'B04', 'B08'], transforms=None, download=False, api_key=None, checksum=False)[source]

Initialize a new Cloud Cover Detection dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – train/val/test split to load

  • bands (Sequence[str]) – the subset of bands to load

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • api_key (str | None) – a RadiantEarth MLHub API key to use for downloading the dataset

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]

Return the number of items in the dataset.

Returns:

length of the dataset

Return type:

int

__getitem__(index)[source]

Return a sample from the dataset.

Parameters:

index (int) – index to return

Returns:

data and label at given index

Return type:

dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

COWC

class torchgeo.datasets.COWC(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset, ABC

Abstract base class for the COWC dataset.

The Cars Overhead With Context (COWC) data set is a large set of annotated cars from overhead. It is useful for training a device such as a deep neural network to learn to detect and/or count cars.

The dataset has the following attributes:

  1. Data from overhead at 15 cm per pixel resolution at ground (all data is EO).

  2. Data from six distinct locations: Toronto, Canada; Selwyn, New Zealand; Potsdam and Vaihingen, Germany; Columbus, Ohio and Utah, United States.

  3. 32,716 unique annotated cars. 58,247 unique negative examples.

  4. Intentional selection of hard negative examples.

  5. Established baseline for detection and counting tasks.

  6. Extra testing scenes for use after validation.

If you use this dataset in your research, please cite the following paper:

abstract property base_url: str

Base URL to download dataset from.

abstract property filenames: list[str]

List of files to download.

abstract property md5s: list[str]

List of MD5 checksums of files to download.

abstract property filename: str

Filename containing train/test split and target labels.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Initialize a new COWC dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train” or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

New in version 0.2.

class torchgeo.datasets.COWCCounting(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: COWC

COWC Dataset for car counting.

class torchgeo.datasets.COWCDetection(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: COWC

COWC Dataset for car detection.

CropHarvest

class torchgeo.datasets.CropHarvest(root='data', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

CropHarvest dataset.

CropHarvest is a crop classification dataset.

Dataset features:

  • single pixel time series with crop-type labels

  • 18 bands per image over 12 months

Dataset format:

  • arrays are 12x18 with 18 bands over 12 months

Dataset properties:

  1. is_crop - whether or not a single pixel contains cropland

  2. classification_label - optional field identifying a specific crop type

  3. dataset - source dataset for the imagery

  4. lat - latitude

  5. lon - longitude
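The shapes implied by the format and properties above can be illustrated with a dummy sample. The dictionary keys below mirror the listed properties but are hypothetical, not torchgeo's exact schema, and the values are placeholders:

```python
# Dummy CropHarvest-style sample: a 12 months x 18 bands array plus metadata.
sample = {
    "array": [[0.0] * 18 for _ in range(12)],  # 12 months x 18 bands
    "label": 1,                # is_crop
    "dataset": "example_source",
    "lat": 0.0,
    "lon": 0.0,
}

months = len(sample["array"])
bands = len(sample["array"][0])
```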

If you use this dataset in your research, please cite the following paper:

This dataset requires the following additional library to be installed:

  • h5py to load the dataset

New in version 0.6.

__init__(root='data', transforms=None, download=False, checksum=False)[source]

Initialize a new CropHarvest dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

single pixel time-series array and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, subtitle=None)[source]

Plot a sample from the dataset using bands for Agriculture RGB composite.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • subtitle (str | None) – optional subtitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Kenya Crop Type

class torchgeo.datasets.CV4AKenyaCropType(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]

Bases: NonGeoDataset

CV4A Kenya Crop Type dataset.

Used in a competition in the Computer Vision for Agriculture (CV4A) workshop at ICLR 2020. See this website for dataset details.

Consists of 4 tiles of Sentinel 2 imagery from 13 different points in time.

Each tile has:

  • 13 multi-band observations throughout the growing season. Each observation includes 12 bands from Sentinel-2 L2A product, and a cloud probability layer. The twelve bands are [B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12] (refer to Sentinel-2 documentation for more information about the bands). The cloud probability layer is a product of the Sentinel-2 atmospheric correction algorithm (Sen2Cor) and provides an estimated cloud probability (0-100%) per pixel. All of the bands are mapped to a common 10 m spatial resolution grid.

  • A raster layer indicating the crop ID for the fields in the training set.

  • A raster layer indicating field IDs for the fields (both training and test sets). Fields with a crop ID 0 are the test fields.

There are 3,286 fields in the train set and 1,402 fields in the test set.

If you use this dataset in your research, please cite the following paper:

Note

This dataset requires the following additional library to be installed:

  • radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub

__init__(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]

Initialize a new CV4A Kenya Crop Type Dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • chip_size (int) – size of chips

  • stride (int) – spacing between chips, if less than chip_size, then there will be overlap between chips

  • bands (tuple[str, ...]) – the subset of bands to load

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • api_key (str | None) – a RadiantEarth MLHub API key to use for downloading the dataset

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

  • verbose (bool) – if True, print messages when new tiles are loaded

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
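The chip_size and stride parameters control how each tile is cut into chips; with stride less than chip_size, consecutive chips overlap. The sliding-window arithmetic can be sketched as follows (illustrative only; the actual tiling code may handle tile edges differently, and the tile size in the example is hypothetical):

```python
def chips_per_tile(height, width, chip_size=256, stride=128):
    """Number of full chip positions in a sliding window over a tile."""
    rows = (height - chip_size) // stride + 1
    cols = (width - chip_size) // stride + 1
    return rows * cols

chips_per_tile(3035, 2016)  # hypothetical tile size in pixels
```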

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data, labels, field ids, and metadata at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of chips in the dataset.

Returns:

length of the dataset

Return type:

int

get_splits()[source]

Get the field_ids for the train/test splits from the dataset directory.

Returns:

list of training field_ids and list of testing field_ids

Return type:

tuple[list[int], list[int]]

plot(sample, show_titles=True, time_step=0, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • time_step (int) – time step at which to access image, beginning with 0

  • suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

New in version 0.2.

DeepGlobe Land Cover

class torchgeo.datasets.DeepGlobeLandCover(root='data', split='train', transforms=None, checksum=False)[source]

Bases: NonGeoDataset

DeepGlobe Land Cover Classification Challenge dataset.

The DeepGlobe Land Cover Classification Challenge dataset offers high-resolution sub-meter satellite imagery for the task of semantic segmentation into urban, agriculture, rangeland, forest, water, barren, and unknown areas. It contains 1,146 satellite images of size 2,448 x 2,448 pixels, split into training/validation/test sets; the original dataset can be downloaded from Kaggle. However, we only use the training set of 803 images, since the original validation and test sets are not accompanied by labels. The dataset we use, with a custom train/test split, can be downloaded from Kaggle (created as part of the Computer Vision by Deep Learning (CS4245) course offered at TU Delft).

Dataset format:

  • images are RGB data

  • masks are RGB images with unique RGB values representing each class

Dataset classes:

  1. Urban land

  2. Agriculture land

  3. Rangeland

  4. Forest land

  5. Water

  6. Barren land

  7. Unknown

File names for satellite images and the corresponding mask image are id_sat.jpg and id_mask.png, where id is an integer assigned to every image.
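Since the masks encode classes as RGB colors, evaluation typically maps each pixel's color to a class index. A sketch of that lookup follows; the colormap below is an assumption (a commonly cited DeepGlobe palette) and should be verified against the actual data:

```python
# Assumed DeepGlobe colormap (verify against the dataset itself).
COLORMAP = {
    (0, 255, 255): 0,    # Urban land
    (255, 255, 0): 1,    # Agriculture land
    (255, 0, 255): 2,    # Rangeland
    (0, 255, 0): 3,      # Forest land
    (0, 0, 255): 4,      # Water
    (255, 255, 255): 5,  # Barren land
    (0, 0, 0): 6,        # Unknown
}

def rgb_to_class(pixel):
    """Map an (R, G, B) tuple to a class index; unknown colors fall back to 6."""
    return COLORMAP.get(tuple(pixel), 6)
```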

If you use this dataset in your research, please cite the following paper:

Note

This dataset can be downloaded using:

$ pip install kaggle  # place api key at ~/.kaggle/kaggle.json
$ kaggle datasets download -d geoap96/deepglobe2018-landcover-segmentation-traindataset
$ unzip deepglobe2018-landcover-segmentation-traindataset.zip

New in version 0.3.

__init__(root='data', split='train', transforms=None, checksum=False)[source]

Initialize a new DeepGlobeLandCover dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train” or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

  • alpha (float) – opacity with which to render predictions on top of the imagery

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

DFC2022

class torchgeo.datasets.DFC2022(root='data', split='train', transforms=None, checksum=False)[source]

Bases: NonGeoDataset

DFC2022 dataset.

The DFC2022 dataset is used as a benchmark dataset for the 2022 IEEE GRSS Data Fusion Contest and extends the MiniFrance dataset for semi-supervised semantic segmentation. The dataset consists of a train set containing labeled and unlabeled imagery and an unlabeled validation set. The dataset can be downloaded from the IEEEDataPort DFC2022 website.

Dataset features:

  • RGB aerial images at 0.5 m per pixel spatial resolution (~2,000x2,000 px)

  • DEMs at 1 m per pixel spatial resolution (~1,000x1,000 px)

  • Masks at 0.5 m per pixel spatial resolution (~2,000x2,000 px)

  • 16 land use/land cover categories

  • Images collected from the IGN BD ORTHO database

  • DEMs collected from the IGN RGE ALTI database

  • Labels collected from the UrbanAtlas 2012 database

  • Data collected from 19 regions in France

Dataset format:

  • images are three-channel geotiffs

  • DEMs are single-channel geotiffs

  • masks are single-channel geotiffs whose pixel values represent the class

Dataset classes:

  1. No information

  2. Urban fabric

  3. Industrial, commercial, public, military, private and transport units

  4. Mine, dump and construction sites

  5. Artificial non-agricultural vegetated areas

  6. Arable land (annual crops)

  7. Permanent crops

  8. Pastures

  9. Complex and mixed cultivation patterns

  10. Orchards at the fringe of urban classes

  11. Forests

  12. Herbaceous vegetation associations

  13. Open spaces with little or no vegetation

  14. Wetlands

  15. Water

  16. Clouds and Shadows

If you use this dataset in your research, please cite the following paper:

New in version 0.3.

__init__(root='data', split='train', transforms=None, checksum=False)[source]

Initialize a new DFC2022 dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train” or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

ETCI2021 Flood Detection

class torchgeo.datasets.ETCI2021(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

ETCI 2021 Flood Detection dataset.

The ETCI2021 dataset is a flood detection dataset.

Dataset features:

  • 33,405 VV & VH Sentinel-1 Synthetic Aperture Radar (SAR) images

  • 2 binary masks per image representing water body & flood, respectively

  • 2 polarization band images (VV, VH) of 3 RGB channels per band

  • 3 RGB channels per band generated by the Hybrid Pluggable Processing Pipeline (hyp3)

  • Images with 5x20 m per pixel resolution (256x256 px) taken in Interferometric Wide Swath acquisition mode

  • Flood events from 5 different regions

Dataset format:

  • VV band three-channel png

  • VH band three-channel png

  • water body mask single-channel png where no water body = 0, water body = 255

  • flood mask single-channel png where no flood = 0, flood = 255

Dataset classes:

  1. no flood/water

  2. flood/water
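The water body and flood masks store pixel values of 0 or 255; for training they are typically binarized to {0, 1}. A minimal sketch of that conversion (illustrative only, not torchgeo's internal code):

```python
def binarize(mask_values):
    """Convert 0/255 mask values to 0/1 class labels."""
    return [1 if v == 255 else 0 for v in mask_values]

binarize([0, 255, 255, 0])  # [0, 1, 1, 0]
```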

If you use this dataset in your research, please add the following to your acknowledgements section:

The authors would like to thank the NASA Earth Science Data Systems Program,
NASA Digital Transformation AI/ML thrust, and IEEE GRSS for organizing
the ETCI competition.
__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Initialize a new ETCI 2021 dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train”, “val”, or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

EuroSAT

class torchgeo.datasets.EuroSAT(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]

Bases: NonGeoClassificationDataset

EuroSAT dataset.

The EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consists of 10 target classes with a total of 27,000 labeled and geo-referenced images.

Dataset format:

  • rasters are 13-channel GeoTiffs

  • labels are values in the range [0,9]

Dataset classes:

  • Annual Crop

  • Forest

  • Herbaceous Vegetation

  • Highway

  • Industrial Buildings

  • Pasture

  • Permanent Crop

  • Residential Buildings

  • River

  • SeaLake

This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:
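Labels in [0, 9] index the ten classes. Assuming the label ordering follows the alphabetical class list above (an assumption worth verifying against the dataset's directory structure), a lookup is simply:

```python
# Class ordering assumed to match the alphabetical list above.
EUROSAT_CLASSES = [
    "Annual Crop", "Forest", "Herbaceous Vegetation", "Highway",
    "Industrial Buildings", "Pasture", "Permanent Crop",
    "Residential Buildings", "River", "SeaLake",
]

def label_to_name(label):
    """Map an integer label in [0, 9] to its class name."""
    return EUROSAT_CLASSES[label]
```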

If you use this dataset in your research, please cite the following papers:

__init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]

Initialize a new EuroSAT dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train”, “val”, or “test”

  • bands (Sequence[str]) – a sequence of band names to load

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

New in version 0.3: The bands parameter.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

New in version 0.2.

class torchgeo.datasets.EuroSAT100(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]

Bases: EuroSAT

Subset of EuroSAT containing only 100 images.

Intended for tutorials and demonstrations, not for benchmarking.

Maintains the same file structure, classes, and train-val-test split. Each class has 10 images (6 train, 2 val, 2 test), for a total of 100 images.

New in version 0.5.

FAIR1M

class torchgeo.datasets.FAIR1M(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

FAIR1M dataset.

The FAIR1M dataset is a dataset for remote sensing fine-grained oriented object detection.

Dataset features:

  • 15,000+ images with 0.3-0.8 m per pixel resolution (1,000-10,000 px)

  • 1 million object instances

  • 5 object categories, 37 object sub-categories

  • three spectral bands - RGB

  • images taken by Gaofen satellites and Google Earth

Dataset format:

  • images are three-channel tiffs

  • labels are xml files with PASCAL VOC like annotations

Dataset classes:

  1. Passenger Ship

  2. Motorboat

  3. Fishing Boat

  4. Tugboat

  5. other-ship

  6. Engineering Ship

  7. Liquid Cargo Ship

  8. Dry Cargo Ship

  9. Warship

  10. Small Car

  11. Bus

  12. Cargo Truck

  13. Dump Truck

  14. other-vehicle

  15. Van

  16. Trailer

  17. Tractor

  18. Excavator

  19. Truck Tractor

  20. Boeing737

  21. Boeing747

  22. Boeing777

  23. Boeing787

  24. ARJ21

  25. C919

  26. A220

  27. A321

  28. A330

  29. A350

  30. other-airplane

  31. Baseball Field

  32. Basketball Court

  33. Football Field

  34. Tennis Court

  35. Roundabout

  36. Intersection

  37. Bridge

If you use this dataset in your research, please cite the following paper:

New in version 0.2.
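Since the labels are described as PASCAL VOC-like XML, reading the annotated object names is a standard ElementTree traversal. The exact FAIR1M schema is not reproduced in this documentation, so the generic VOC-style layout below (an object element with a name child) is an assumption:

```python
import xml.etree.ElementTree as ET

# Hedged sketch: parse object class names from a VOC-style annotation file.
# The XML layout here is a generic assumption, not the exact FAIR1M schema.
VOC_XML = """
<annotation>
  <object><name>Boeing737</name></object>
  <object><name>Small Car</name></object>
</annotation>
"""

def object_names(xml_text):
    """Collect the class name of every annotated object."""
    root = ET.fromstring(xml_text)
    return [obj.findtext("name") for obj in root.iter("object")]

names = object_names(VOC_XML)
```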

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Initialize a new FAIR1M dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train”, “val”, or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Changed in version 0.5: Added split and download parameters.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

FireRisk

class torchgeo.datasets.FireRisk(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoClassificationDataset

FireRisk dataset.

The FireRisk dataset is a dataset for remote sensing fire risk classification.

Dataset features:

  • 91,872 images with 1 m per pixel resolution (320x320 px)

  • 70,331 and 21,541 train and val images, respectively

  • three spectral bands - RGB

  • 7 fire risk classes

  • images extracted from NAIP tiles

Dataset format:

  • images are three-channel pngs

Dataset classes:

  1. high

  2. low

  3. moderate

  4. non-burnable

  5. very_high

  6. very_low

  7. water

If you use this dataset in your research, please cite the following paper:

New in version 0.5.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Initialize a new FireRisk dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train” or “val”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Forest Damage

class torchgeo.datasets.ForestDamage(root='data', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

Forest Damage dataset.

The ForestDamage dataset contains drone imagery that can be used for tree identification, as well as tree damage classification for larch trees.

Dataset features:

  • 1543 images

  • 101,878 tree annotations

  • subset of 840 images contain 44,522 annotations about tree health (Healthy (H), Light Damage (LD), High Damage (HD)), all other images have “other” as damage level

Dataset format:

Dataset Classes:

  1. other

  2. healthy

  3. light damage

  4. high damage

If the download fails or stalls, it is recommended to try azcopy as suggested here. It is expected that the downloaded data file with name Data_Set_Larch_Casebearer can be found in root.

If you use this dataset in your research, please use the following citation:

  • Swedish Forest Agency (2021): Forest Damages - Larch Casebearer 1.0. National Forest Data Lab. Dataset.

New in version 0.3.

__init__(root='data', transforms=None, download=False, checksum=False)[source]

Initialize a new ForestDamage dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

GID-15

class torchgeo.datasets.GID15(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

GID-15 dataset.

The GID-15 dataset is a dataset for semantic segmentation.

Dataset features:

  • images taken by the Gaofen-2 (GF-2) satellite over 60 cities in China

  • masks representing 15 semantic categories

  • three spectral bands - RGB

  • 150 images with 3 m per pixel resolution (6800x7200 px)

Dataset format:

  • images are three-channel pngs

  • masks are single-channel pngs

  • colormapped masks are 3 channel tifs

Dataset classes:

  1. background

  2. industrial_land

  3. urban_residential

  4. rural_residential

  5. traffic_land

  6. paddy_field

  7. irrigated_land

  8. dry_cropland

  9. garden_plot

  10. arbor_woodland

  11. shrub_land

  12. natural_grassland

  13. artificial_grassland

  14. river

  15. lake

  16. pond

If you use this dataset in your research, please cite the following paper:
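GID-15 ships both single-channel index masks and 3-channel colormapped masks, so recovering class indices from a colormapped mask is a per-pixel dictionary lookup. A minimal sketch, assuming 0-based class indices; the RGB triples below are hypothetical placeholders, not the dataset's actual colormap:

```python
# Hedged sketch: map colormapped (r, g, b) pixels back to class indices.
# These colors are hypothetical; the real GID-15 colormap is not shown here.
COLORMAP = {
    (0, 0, 0): 0,      # background
    (200, 0, 0): 1,    # industrial_land (hypothetical color)
    (0, 0, 200): 14,   # lake (hypothetical color)
}

def rgb_to_index(mask_rgb):
    """Map each (r, g, b) pixel of a flattened mask to its class index."""
    return [COLORMAP[px] for px in mask_rgb]

indices = rgb_to_index([(0, 0, 0), (200, 0, 0), (0, 0, 200)])
```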

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Initialize a new GID-15 dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train”, “val”, or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

New in version 0.2.

IDTReeS

class torchgeo.datasets.IDTReeS(root='data', split='train', task='task1', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

IDTReeS dataset.

The IDTReeS dataset is a dataset for tree crown detection.

Dataset features:

  • RGB Image, Canopy Height Model (CHM), Hyperspectral Image (HSI), LiDAR Point Cloud

  • Remote sensing and field data generated by the National Ecological Observatory Network (NEON)

  • 0.1 - 1m resolution imagery

  • Task 1 - object detection (tree crown delineation)

  • Task 2 - object classification (species classification)

  • Train set contains 85 images

  • Test set (task 1) contains 153 images

  • Test set (task 2) contains 353 images and tree crown polygons

Dataset format:

  • optical - three-channel RGB 200x200 geotiff

  • canopy height model - one-channel 20x20 geotiff

  • hyperspectral - 369-channel 20x20 geotiff

  • point cloud - Nx3 LAS file (.las), some files contain RGB colors per point

  • shapefiles (.shp) containing polygons

  • csv file containing species labels and other metadata for each polygon

Dataset classes:

  1. ACPE

  2. ACRU

  3. ACSA3

  4. AMLA

  5. BETUL

  6. CAGL8

  7. CATO6

  8. FAGR

  9. GOLA

  10. LITU

  11. LYLU3

  12. MAGNO

  13. NYBI

  14. NYSY

  15. OXYDE

  16. PEPA37

  17. PIEL

  18. PIPA2

  19. PINUS

  20. PITA

  21. PRSE2

  22. QUAL

  23. QUCO2

  24. QUGE2

  25. QUHE2

  26. QULA2

  27. QULA3

  28. QUMO4

  29. QUNI

  30. QURU

  31. QUERC

  32. ROPS

  33. TSCA

If you use this dataset in your research, please cite the following paper:

New in version 0.2.
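The plot() method below takes an hsi_indices argument that picks three of the 369 hyperspectral channels to build a false-color composite. The selection itself is just channel indexing, sketched here on a tiny 4-channel, 2x2 cube of nested lists standing in for the real imagery:

```python
# Sketch of false-color channel selection: choose three channels from a
# channels-first cube (C x H x W) to serve as the R, G, B planes.

def false_color(cube, indices=(0, 1, 2)):
    """Select three channels from a channels-first cube."""
    r, g, b = indices
    return [cube[r], cube[g], cube[b]]

cube = [[[c] * 2] * 2 for c in range(4)]  # 4 channels of constant values
rgb = false_color(cube, indices=(3, 1, 0))
```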

__init__(root='data', split='train', task='task1', transforms=None, download=False, checksum=False)[source]

Initialize a new IDTReeS dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train” or “test”

  • task (str) – ‘task1’ for detection, ‘task2’ for detection + classification (only relevant for split=’test’)

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None, hsi_indices=(0, 1, 2))[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

  • hsi_indices (tuple[int, int, int]) – tuple of indices to create HSI false color image

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

plot_las(index)[source]

Plot a sample point cloud at the index.

Parameters:

index (int) – index to plot

Returns:

pyvista.PolyData object. Run pyvista.plot(point_cloud, …) to display

Raises:

ImportError – if pyvista is not installed

Return type:

pyvista.Plotter

Changed in version 0.4: Ported from Open3D to PyVista, colormap parameter removed.

Inria Aerial Image Labeling

class torchgeo.datasets.InriaAerialImageLabeling(root='data', split='train', transforms=None, checksum=False)[source]

Bases: NonGeoDataset

Inria Aerial Image Labeling Dataset.

The Inria Aerial Image Labeling dataset is a building detection dataset over dissimilar settlements ranging from densely populated areas to alpine towns. Refer to the dataset homepage to download the dataset.

Dataset features:

  • Coverage of 810 km² (405 km² for training and 405 km² for testing)

  • Aerial orthorectified color imagery with a spatial resolution of 0.3 m

  • Number of images: 360 (train: 180, test: 180)

  • Train cities: Austin, Chicago, Kitsap, West Tyrol, Vienna

  • Test cities: Bellingham, Bloomington, Innsbruck, San Francisco, East Tyrol

Dataset format:

  • Imagery - RGB aerial GeoTIFFs of shape 5000 x 5000

  • Labels - RGB aerial GeoTIFFs of shape 5000 x 5000

If you use this dataset in your research, please cite the following paper:

New in version 0.3.

Changed in version 0.5: Added support for a val split.

__init__(root='data', split='train', transforms=None, checksum=False)[source]

Initialize a new InriaAerialImageLabeling Dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – train/val/test split

  • transforms (collections.abc.Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version.

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

__len__()[source]

Return the number of samples in the dataset.

Returns:

length of the dataset

Return type:

int

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

LandCover.ai

class torchgeo.datasets.LandCoverAI(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: LandCoverAIBase, NonGeoDataset

LandCover.ai dataset.

See the abstract LandCoverAIBase class to find out more.

Note

This dataset requires the following additional library to be installed:

  • opencv-python to generate the train/val/test split
__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Initialize a new LandCover.ai dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train”, “val”, or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

LEVIR-CD

class torchgeo.datasets.LEVIRCDBase(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset, ABC

Abstract base class for the LEVIRCD datasets.

New in version 0.6.

__init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Initialize a new LEVIR-CD base dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train” or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional suptitle to use for figure

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

New in version 0.2.

class torchgeo.datasets.LEVIRCD(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: LEVIRCDBase

LEVIR-CD dataset.

The LEVIR-CD dataset is a dataset for building change detection.

Dataset features:

  • image pairs of 20 different urban regions across Texas between 2002-2018

  • binary change masks representing building change

  • three spectral bands - RGB

  • 637 image pairs with 50 cm per pixel resolution (~1024x1024 px)

Dataset format:

  • images are three-channel pngs

  • masks are single-channel pngs where no change = 0, change = 255

Dataset classes:

  1. no change

  2. change

If you use this dataset in your research, please cite the following paper:

New in version 0.6.
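The change masks are single-channel PNGs where no change = 0 and change = 255, while the dataset classes are the two indices 0 (no change) and 1 (change). A minimal sketch of converting pixel values to class indices, shown on a flattened mask of plain Python ints:

```python
# Map 0/255 mask pixel values to 0/1 class indices.

def mask_to_classes(mask):
    """Convert 0/255 pixel values to 0/1 class indices."""
    return [1 if px == 255 else 0 for px in mask]

classes = mask_to_classes([0, 255, 255, 0])
```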

LEVIR-CD+

class torchgeo.datasets.LEVIRCDPlus(root='data', split='train', transforms=None, download=False, checksum=False)[source]

Bases: LEVIRCDBase

LEVIR-CD+ dataset.

The LEVIR-CD+ dataset is a dataset for building change detection.

Dataset features:

  • image pairs of 20 different urban regions across Texas between 2002-2020

  • binary change masks representing building change

  • three spectral bands - RGB

  • 985 image pairs with 50 cm per pixel resolution (~1024x1024 px)

Dataset format:

  • images are three-channel pngs

  • masks are single-channel pngs where no change = 0, change = 255

Dataset classes:

  1. no change

  2. change

If you use this dataset in your research, please cite the following paper:

Changed in version 0.6.

LoveDA

class torchgeo.datasets.LoveDA(root='data', split='train', scene=['urban', 'rural'], transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

LoveDA dataset.

The LoveDA dataset is a semantic segmentation dataset.

Dataset features:

  • 2713 urban scene and 3274 rural scene HSR images, spatial resolution of 0.3m

  • image source is Google Earth platform

  • total of 166768 annotated objects from Nanjing, Changzhou and Wuhan cities

  • dataset comes with predefined train, validation, and test set

  • dataset differentiates between ‘rural’ and ‘urban’ images

Dataset format:

  • images are three-channel pngs with dimension 1024x1024

  • segmentation masks are single-channel pngs

Dataset classes:

  1. background

  2. building

  3. road

  4. water

  5. barren

  6. forest

  7. agriculture

No-data regions are assigned the value 0 and should be ignored.

If you use this dataset in your research, please cite the following paper:

New in version 0.2.
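Because the value 0 marks no-data in LoveDA masks, those pixels must be skipped before computing any per-class statistics. A minimal sketch on a flattened mask of plain ints, assuming the remaining values 1-7 correspond to the seven classes listed above:

```python
from collections import Counter

# Count class occurrences in a mask while skipping no-data (0) pixels.

def valid_class_counts(mask):
    """Count class occurrences, ignoring no-data (0) pixels."""
    return Counter(px for px in mask if px != 0)

counts = valid_class_counts([0, 1, 1, 4, 0, 7])
```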

__init__(root='data', split='train', scene=['urban', 'rural'], transforms=None, download=False, checksum=False)[source]

Initialize a new LoveDA dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train”, “val”, or “test”

  • scene (list[str]) – specify whether to load only ‘urban’, only ‘rural’ or both

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

image and mask at that index with image of dimension 3x1024x1024 and mask of dimension 1024x1024

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of datapoints in the dataset.

Returns:

length of dataset

Return type:

int

plot(sample, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

MapInWild

class torchgeo.datasets.MapInWild(root='data', modality=['mask', 'esa_wc', 'viirs', 's2_summer'], split='train', transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

MapInWild dataset.

The MapInWild dataset is curated for the task of wilderness mapping on a pixel-level. MapInWild is a multi-modal dataset and comprises various geodata acquired and formed from different RS sensors over 1018 locations: dual-pol Sentinel-1, four-season Sentinel-2 with 10 bands, ESA WorldCover map, and Visible Infrared Imaging Radiometer Suite NightTime Day/Night band. The dataset consists of 8144 images with the shape of 1920 × 1920 pixels. The images are weakly annotated from the World Database of Protected Areas (WDPA).

Dataset features:

  • 1018 areas globally sampled from the WDPA

  • 10-Band Sentinel-2

  • Dual-pol Sentinel-1

  • ESA WorldCover Land Cover

  • Visible Infrared Imaging Radiometer Suite NightTime Day/Night Band

If you use this dataset in your research, please cite the following paper:

New in version 0.5.

__init__(root='data', modality=['mask', 'esa_wc', 'viirs', 's2_summer'], split='train', transforms=None, download=False, checksum=False)[source]

Initialize a new MapInWild dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • modality (list[str]) – the modality to download. Choose from: “mask”, “esa_wc”, “viirs”, “s1”, “s2_temporal_subset”, “s2_[season]”.

  • split (str) – one of “train”, “validation”, or “test”

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample image-mask pair returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Million-AID

class torchgeo.datasets.MillionAID(root='data', task='multi-class', split='train', transforms=None, checksum=False)[source]

Bases: NonGeoDataset

Million-AID Dataset.

The MillionAID dataset consists of one million aerial images from Google Earth Engine that offers either a multi-class learning task with 51 classes or a multi-label learning task with 73 different possible labels. For more details please consult the accompanying paper.

Dataset features:

  • RGB aerial images with varying resolutions from 0.5 m to 153 m per pixel

  • images within classes can have different pixel dimension

Dataset format:

  • images are three-channel jpg

If you use this dataset in your research, please cite the following paper:

New in version 0.3.
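The two tasks differ in label representation: multi-class (51 classes) assigns one label per image, multi-label (73 labels) allows several. A hedged sketch of the corresponding one-hot and multi-hot encodings; the class counts come from the documentation above, the chosen indices are arbitrary examples:

```python
# Contrast the two documented MillionAID tasks via their label encodings.

def one_hot(index, num_classes):
    """Multi-class target: exactly one active entry."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec

def multi_hot(indices, num_labels):
    """Multi-label target: one active entry per present label."""
    vec = [0] * num_labels
    for i in indices:
        vec[i] = 1
    return vec

mc = one_hot(3, 51)         # multi-class task: 51 classes
ml = multi_hot([2, 7], 73)  # multi-label task: 73 possible labels
```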

__init__(root='data', task='multi-class', split='train', transforms=None, checksum=False)[source]

Initialize a new MillionAID dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • task (str) – type of task, either “multi-class” or “multi-label”

  • split (str) – train or test split

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found.

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

NASA Marine Debris

class torchgeo.datasets.NASAMarineDebris(root='data', transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]

Bases: NonGeoDataset

NASA Marine Debris dataset.

The NASA Marine Debris dataset is a dataset for detection of floating marine debris in satellite imagery.

Dataset features:

  • 707 patches with 3 m per pixel resolution (256x256 px)

  • three spectral bands - RGB

  • 1 object class: marine_debris

  • images taken by Planet Labs PlanetScope satellites

  • imagery taken from 2016-2019 from coasts of Greece, Honduras, and Ghana

Dataset format:

  • images are three-channel geotiffs in uint8 format

  • labels are numpy files (.npy) containing bounding box (xyxy) coordinates

  • additional: images in jpg format and labels in geojson format

If you use this dataset in your research, please cite the following paper:

Note

This dataset requires the following additional library to be installed:

  • radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub

New in version 0.2.
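The labels are .npy files of bounding boxes in xyxy order (xmin, ymin, xmax, ymax). A minimal sketch of deriving box areas from that layout, using plain tuples in place of a NumPy array:

```python
# Compute the pixel area of (xmin, ymin, xmax, ymax) bounding boxes.

def box_area(box):
    """Area of an xyxy box in pixels."""
    xmin, ymin, xmax, ymax = box
    return (xmax - xmin) * (ymax - ymin)

areas = [box_area(b) for b in [(0, 0, 10, 10), (5, 5, 8, 9)]]
```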

__init__(root='data', transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]

Initialize a new NASA Marine Debris Dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • api_key (str | None) – a RadiantEarth MLHub API key to use for downloading the dataset

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

  • verbose (bool) – if True, print messages when new tiles are loaded

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and labels at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

OSCD

class torchgeo.datasets.OSCD(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]

Bases: NonGeoDataset

OSCD dataset.

The Onera Satellite Change Detection dataset addresses the issue of detecting changes between satellite images from different dates. Imagery comes from Sentinel-2 which contains varying resolutions per band.

Dataset format:

  • images are 13-channel tifs

  • masks are single-channel pngs where no change = 0, change = 255

Dataset classes:

  1. no change

  2. change

If you use this dataset in your research, please cite the following paper:

New in version 0.2.

__init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]

Initialize a new OSCD dataset instance.

Parameters:
  • root (str) – root directory where dataset can be found

  • split (str) – one of “train” or “test”

  • bands (Sequence[str]) – a sequence of band names to load

  • transforms (collections.abc.Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]] | None) – a function/transform that takes input sample and its target as entry and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]

Return a sample from the dataset at the given index.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, torch.Tensor]

__len__()[source]

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(