torchgeo.datasets¶
In torchgeo, we define two types of datasets: Geospatial Datasets and Non-geospatial Datasets. These abstract base classes are documented in more detail in Base Classes.
Geospatial Datasets¶
GeoDataset is designed for datasets that contain geospatial information, like latitude, longitude, coordinate system, and projection. Datasets containing this kind of information can be combined using IntersectionDataset and UnionDataset.
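The combining behavior can be sketched with plain bounding-box arithmetic. The helpers below are a simplified illustration of the spatial logic, not torchgeo's implementation; in torchgeo itself the composition is written as dataset = landsat & cdl (intersection) or landsat | cdl (union).

```python
# Simplified sketch (not torchgeo code): an IntersectionDataset samples from
# the overlap of its members' bounds, while a UnionDataset covers the
# envelope of both.

def intersect_bounds(a, b):
    """Overlap of two (minx, maxx, miny, maxy) boxes, or None if disjoint."""
    minx, maxx = max(a[0], b[0]), min(a[1], b[1])
    miny, maxy = max(a[2], b[2]), min(a[3], b[3])
    if minx >= maxx or miny >= maxy:
        return None
    return (minx, maxx, miny, maxy)

def union_bounds(a, b):
    """Envelope covering both boxes."""
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))

landsat_bounds = (100.0, 200.0, 40.0, 80.0)
cdl_bounds = (150.0, 250.0, 60.0, 90.0)
print(intersect_bounds(landsat_bounds, cdl_bounds))  # (150.0, 200.0, 60.0, 80.0)
```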
| Dataset | Type | Source | Size (px) | Resolution (m) |
| --- | --- | --- | --- | --- |
| Aboveground Woody Biomass | Masks | Landsat, LiDAR | 40,000x40,000 | 30 |
| Aster Global DEM | Masks | Aster | 3,601x3,601 | 30 |
| Canadian Building Footprints | Geometries | Bing Imagery | | |
| Chesapeake Land Cover | Imagery, Masks | NAIP | | 1 |
| Global Mangrove Distribution | Masks | Remote Sensing, In Situ Measurements | | 3 |
| Cropland Data Layer | Masks | Landsat | | 30 |
| EDDMapS | Points | Citizen Scientists | | |
| EnviroAtlas | Imagery, Masks | NAIP, NLCD, OpenStreetMap | | 1 |
| Esri2020 | Masks | Sentinel-2 | | 10 |
| EU-DEM | Masks | Aster, SRTM, Russian Topomaps | | 25 |
| GBIF | Points | Citizen Scientists | | |
| GlobBiomass | Masks | Landsat | 45,000x45,000 | 100 |
| iNaturalist | Points | Citizen Scientists | | |
| L7 Irish | Imagery, Masks | Landsat | 8,400x7,500 | 15, 30 |
| L8 Biome | Imagery, Masks | Landsat | 8,900x8,900 | 15, 30 |
| LandCover.ai Geo | Imagery, Masks | Aerial | 4,200–9,500 | 0.25–0.5 |
| Landsat | Imagery | Landsat | 8,900x8,900 | 30 |
| NAIP | Imagery | Aerial | 6,100x7,600 | 1 |
| NLCD | Masks | Landsat | | 30 |
| Open Buildings | Geometries | Maxar, CNES/Airbus | | |
| Sentinel | Imagery | Sentinel | 10,000x10,000 | 10 |
Aboveground Woody Biomass¶
- class torchgeo.datasets.AbovegroundLiveWoodyBiomassDensity(paths='data', crs=None, res=None, transforms=None, download=False, cache=True)[source]¶
Bases:
RasterDataset
Aboveground Live Woody Biomass Density dataset.
The Aboveground Live Woody Biomass Density dataset is a global-scale, wall-to-wall map of aboveground biomass at ~30m resolution for the year 2000.
Dataset features:
Masks with per-pixel live woody biomass density estimates in megagrams biomass per hectare at ~30m resolution (~40,000x40,000 px)
Dataset format:
geojson file that contains download links to tif files
single-channel geotiffs with the pixel values representing biomass density
If you use this dataset in your research, please give credit to:
New in version 0.3.
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- filename_glob = '*N_*E.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '^\n (?P<latitude>[0-9][0-9][A-Z])_\n (?P<longitude>[0-9][0-9][0-9][A-Z])*\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name
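As a sketch of how the base class applies this pattern (filename_regex is compiled with re.VERBOSE, so the embedded whitespace is ignored), a hypothetical tile name parses as:

```python
import re

# filename_regex from AbovegroundLiveWoodyBiomassDensity, reproduced verbatim.
filename_regex = r"""^
    (?P<latitude>[0-9][0-9][A-Z])_
    (?P<longitude>[0-9][0-9][0-9][A-Z])*
"""
# "00N_000E.tif" is a hypothetical tile name chosen to match the pattern.
match = re.match(filename_regex, "00N_000E.tif", re.VERBOSE)
print(match.groupdict())  # {'latitude': '00N', 'longitude': '000E'}
```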
- __init__(paths='data', crs=None, res=None, transforms=None, download=False, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises:
FileNotFoundError – if no files are found in paths
Changed in version 0.5: root was renamed to paths.
Aster Global DEM¶
- class torchgeo.datasets.AsterGDEM(paths='data', crs=None, res=None, transforms=None, cache=True)[source]¶
Bases:
RasterDataset
Aster Global Digital Elevation Model Dataset.
The Aster Global Digital Elevation Model dataset is a Digital Elevation Model (DEM) on a global scale. The dataset can be downloaded from the Earth Data website after making an account.
Dataset features:
DEMs at 30 m per pixel spatial resolution (3601x3601 px)
data collected from the Aster instrument
Dataset format:
DEMs are single-channel tif files
New in version 0.3.
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- filename_glob = 'ASTGTMV003_*_dem*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '\n (?P<name>[ASTGTMV003]{10})\n _(?P<id>[A-Z0-9]{7})\n _(?P<data>[a-z]{3})*\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name
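The AsterGDEM pattern can be exercised the same way; the tile id below is a hypothetical example in the ASTGTM naming style, not a specific real tile:

```python
import re

# filename_regex from AsterGDEM; compiled with re.VERBOSE by the base class.
filename_regex = r"""
    (?P<name>[ASTGTMV003]{10})
    _(?P<id>[A-Z0-9]{7})
    _(?P<data>[a-z]{3})*
"""
match = re.match(filename_regex, "ASTGTMV003_N34W119_dem.tif", re.VERBOSE)
print(match.groupdict())  # {'name': 'ASTGTMV003', 'id': 'N34W119', 'data': 'dem'}
```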
- __init__(paths='data', crs=None, res=None, transforms=None, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, list[str]]) – one or more root directories to search or files to load, here the collection of individual zip files for each tile should be found
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises:
FileNotFoundError – if no files are found in paths
RuntimeError – if dataset is missing
Changed in version 0.5: root was renamed to paths.
Canadian Building Footprints¶
- class torchgeo.datasets.CanadianBuildingFootprints(paths='data', crs=None, res=1e-05, transforms=None, download=False, checksum=False)[source]¶
Bases:
VectorDataset
Canadian Building Footprints dataset.
The Canadian Building Footprints dataset contains 11,842,186 computer generated building footprints in all Canadian provinces and territories in GeoJSON format. This data is freely available for download and use.
- __init__(paths='data', crs=None, res=1e-05, transforms=None, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float) – resolution of the dataset in units of CRS
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in root
RuntimeError – if download=False and data is not found, or checksum=True and checksums don’t match
Changed in version 0.5: root was renamed to paths.
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, Any]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
Figure
Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, it is possible to show subplot titles and/or use a custom suptitle.
Chesapeake Land Cover¶
- class torchgeo.datasets.Chesapeake(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
RasterDataset, ABC
Abstract base class for all Chesapeake datasets.
Chesapeake Bay High-Resolution Land Cover Project dataset.
This dataset was collected by the Chesapeake Conservancy’s Conservation Innovation Center (CIC) in partnership with the University of Vermont and WorldView Solutions, Inc. It consists of one-meter resolution land cover information for the Chesapeake Bay watershed (~100,000 square miles of land).
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 0), 1: (0, 197, 255, 255), 2: (0, 168, 132, 255), 3: (38, 115, 0, 255), 4: (76, 230, 0, 255), 5: (163, 255, 115, 255), 6: (255, 170, 0, 255), 7: (255, 0, 0, 255), 8: (156, 156, 156, 255), 9: (0, 0, 0, 255), 10: (115, 115, 0, 255), 11: (230, 230, 0, 255), 12: (255, 255, 115, 255), 13: (197, 0, 255, 255)}¶
Color map for the dataset, used for plotting
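The cmap attribute can be used to colorize a class mask for plotting; the snippet below is a plain-Python sketch of that mapping (only the first three entries of cmap are shown):

```python
# Sketch: map each class id in a 2-D mask to its 8-bit RGBA tuple
# (class 0 is fully transparent in the Chesapeake color map).
cmap = {0: (0, 0, 0, 0), 1: (0, 197, 255, 255), 2: (0, 168, 132, 255)}  # truncated

def colorize(mask):
    """Map a 2-D list of class ids to RGBA tuples."""
    return [[cmap[v] for v in row] for row in mask]

mask = [[0, 1], [2, 1]]
print(colorize(mask)[0][1])  # (0, 197, 255, 255)
```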
- __init__(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in paths
RuntimeError – if download=False but dataset is missing or checksum fails
Changed in version 0.5: root was renamed to paths.
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, Any]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
Figure
Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, it is possible to show subplot titles and/or use a custom suptitle.
- class torchgeo.datasets.Chesapeake7(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
Complete 7-class dataset.
This version of the dataset is composed of 7 classes:
No Data: Background values
Water: All areas of open water including ponds, rivers, and lakes
Tree Canopy and Shrubs: All woody vegetation including trees and shrubs
Low Vegetation: Plant material less than 2 meters in height including lawns
Barren: Areas devoid of vegetation consisting of natural earthen material
Impervious Surfaces: Human-constructed surfaces less than 2 meters in height
Impervious Roads: Impervious surfaces that are used for transportation
Aberdeen Proving Ground: U.S. Army facility with no labels
- filename_glob = 'Baywide_7class_20132014.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Chesapeake13(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
Complete 13-class dataset.
This version of the dataset is composed of 13 classes:
No Data: Background values
Water: All areas of open water including ponds, rivers, and lakes
Wetlands: Low vegetation areas located along marine or estuarine regions
Tree Canopy: Deciduous and evergreen woody vegetation over 3-5 meters in height
Shrubland: Heterogeneous woody vegetation including shrubs and young trees
Low Vegetation: Plant material less than 2 meters in height including lawns
Barren: Areas devoid of vegetation consisting of natural earthen material
Structures: Human-constructed objects made of impervious materials
Impervious Surfaces: Human-constructed surfaces less than 2 meters in height
Impervious Roads: Impervious surfaces that are used for transportation
Tree Canopy over Structures: Tree cover overlapping impervious structures
Tree Canopy over Impervious Surfaces: Tree cover overlapping impervious surfaces
Tree Canopy over Impervious Roads: Tree cover overlapping impervious roads
Aberdeen Proving Ground: U.S. Army facility with no labels
- filename_glob = 'Baywide_13Class_20132014.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.ChesapeakeDC(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
This subset of the dataset contains data only for Washington, D.C.
- filename_glob = 'DC_11001/DC_11001.img'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.ChesapeakeDE(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
This subset of the dataset contains data only for Delaware.
- filename_glob = 'DE_STATEWIDE.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.ChesapeakeMD(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
This subset of the dataset contains data only for Maryland.
Note
This dataset requires the following additional library to be installed:
zipfile-deflate64 to extract the proprietary deflate64 compressed zip file.
- filename_glob = 'MD_STATEWIDE.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.ChesapeakeNY(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
This subset of the dataset contains data only for New York.
Note
This dataset requires the following additional library to be installed:
zipfile-deflate64 to extract the proprietary deflate64 compressed zip file.
- filename_glob = 'NY_STATEWIDE.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.ChesapeakePA(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
This subset of the dataset contains data only for Pennsylvania.
- filename_glob = 'PA_STATEWIDE.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.ChesapeakeVA(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
This subset of the dataset contains data only for Virginia.
Note
This dataset requires the following additional library to be installed:
zipfile-deflate64 to extract the proprietary deflate64 compressed zip file.
- filename_glob = 'CIC2014_VA_STATEWIDE.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.ChesapeakeWV(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
Chesapeake
This subset of the dataset contains data only for West Virginia.
- filename_glob = 'WV_STATEWIDE.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.ChesapeakeCVPR(root='data', splits=['de-train'], layers=['naip-new', 'lc'], transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
GeoDataset
CVPR 2019 Chesapeake Land Cover dataset.
The CVPR 2019 Chesapeake Land Cover dataset contains two layers of NAIP aerial imagery, Landsat 8 leaf-on and leaf-off imagery, Chesapeake Bay land cover labels, NLCD land cover labels, and Microsoft building footprint labels.
This dataset was organized to accompany the 2019 CVPR paper, “Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data”.
The paper “Resolving label uncertainty with implicit generative models” added an additional layer of data to this dataset containing a prior over the Chesapeake Bay land cover classes generated from the NLCD land cover labels. For more information about this layer see the dataset documentation.
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', splits=['de-train'], layers=['naip-new', 'lc'], transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
splits (Sequence[str]) – a list of strings in the format “{state}-{train,val,test}” indicating the subset of data to use, for example “ny-train”
layers (Sequence[str]) – a list containing a subset of “naip-new”, “naip-old”, “lc”, “nlcd”, “landsat-leaf-on”, “landsat-leaf-off”, “buildings”, or “prior_from_cooccurrences_101_31_no_osm_no_buildings” indicating which layers to load
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in root
RuntimeError – if download=False but dataset is missing or checksum fails
AssertionError – if splits or layers are not valid
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image/mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
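The query semantics above can be sketched in plain Python. This is a simplified stand-in for the underlying spatial index, not torchgeo internals; the file name and bounds are illustrative:

```python
# Sketch (not torchgeo code): a BoundingBox query hits every file whose
# spatiotemporal extent intersects it; an empty hit list raises IndexError,
# matching the documented __getitem__ behavior.
from collections import namedtuple

BoundingBox = namedtuple("BoundingBox", ["minx", "maxx", "miny", "maxy", "mint", "maxt"])

def lookup(index, query):
    hits = [
        name for name, b in index.items()
        if b.minx < query.maxx and b.maxx > query.minx
        and b.miny < query.maxy and b.maxy > query.miny
        and b.mint <= query.maxt and b.maxt >= query.mint
    ]
    if not hits:
        raise IndexError(f"query: {query} not found in index")
    return hits

index = {"de_1m_2013.tif": BoundingBox(0, 100, 0, 100, 0, 1)}
print(lookup(index, BoundingBox(10, 20, 10, 20, 0, 1)))  # ['de_1m_2013.tif']
```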
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.4.
Global Mangrove Distribution¶
- class torchgeo.datasets.CMSGlobalMangroveCanopy(paths='data', crs=None, res=None, measurement='agb', country='AndamanAndNicobar', transforms=None, cache=True, checksum=False)[source]¶
Bases:
RasterDataset
CMS Global Mangrove Canopy dataset.
The CMS Global Mangrove Canopy dataset consists of a single band map at 30m resolution of either aboveground biomass (agb), basal area weighted height (hba95), or maximum canopy height (hmax95).
The dataset must be manually downloaded from the above link after making an account.
New in version 0.3.
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- filename_regex = '^\n (?P<mangrove>[A-Za-z]{8})\n _(?P<variable>[a-z0-9]*)\n _(?P<country>[A-Za-z][^.]*)\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name
- __init__(paths='data', crs=None, res=None, measurement='agb', country='AndamanAndNicobar', transforms=None, cache=True, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, list[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
measurement (str) – which of the three measurements, ‘agb’, ‘hba95’, or ‘hmax95’
country (str) – country for which to retrieve data
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in paths
RuntimeError – if dataset is missing or checksum fails
AssertionError – if the country or measurement arguments are invalid
Changed in version 0.5: root was renamed to paths.
Cropland Data Layer¶
- class torchgeo.datasets.CDL(paths='data', crs=None, res=None, years=[2022], classes=[0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 74, 75, 76, 77, 81, 82, 83, 87, 88, 92, 111, 112, 121, 122, 123, 124, 131, 141, 142, 143, 152, 176, 190, 195, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 254], transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
RasterDataset
Cropland Data Layer (CDL) dataset.
The Cropland Data Layer, hosted on CropScape, provides a raster, geo-referenced, crop-specific land cover map for the continental United States. The CDL also includes a crop mask layer and planting frequency layers, as well as boundary, water and road layers. The Boundary Layer options provided are County, Agricultural Statistics Districts (ASD), State, and Region. The data is created annually using moderate resolution satellite imagery and extensive agricultural ground truth.
The dataset contains 134 classes, for a description of the classes see the xls file at the top of this page.
If you use this dataset in your research, please cite it using the following format:
- filename_glob = '*_30m_cdls.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '\n ^(?P<date>\\d+)\n _30m_cdls\\..*$\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

band: replaced with requested band name
- date_format = '%Y'¶
Date format string used to parse date from filename.
Not used if filename_regex does not contain a date group.
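Together, filename_regex and date_format determine the temporal extent of each file. A sketch with a hypothetical 2022 CDL filename:

```python
import re
from datetime import datetime

# filename_regex and date_format from CDL; the base class compiles the
# pattern with re.VERBOSE and parses the date group with strptime.
filename_regex = r"""
    ^(?P<date>\d+)
    _30m_cdls\..*$
"""
date_format = "%Y"
match = re.match(filename_regex, "2022_30m_cdls.tif", re.VERBOSE)
date = datetime.strptime(match.group("date"), date_format)
print(date.year)  # 2022
```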
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 255), 1: (255, 211, 0, 255), 2: (255, 37, 37, 255), 3: (0, 168, 226, 255), 4: (255, 158, 9, 255), 5: (37, 111, 0, 255), 6: (255, 255, 0, 255), 10: (111, 166, 0, 255), 11: (0, 175, 73, 255), 12: (222, 166, 9, 255), 13: (222, 166, 9, 255), 14: (124, 211, 255, 255), 21: (226, 0, 124, 255), 22: (137, 96, 83, 255), 23: (217, 181, 107, 255), 24: (166, 111, 0, 255), 25: (213, 158, 188, 255), 26: (111, 111, 0, 255), 27: (171, 0, 124, 255), 28: (160, 88, 137, 255), 29: (111, 0, 73, 255), 30: (213, 158, 188, 255), 31: (209, 255, 0, 255), 32: (124, 153, 255, 255), 33: (213, 213, 0, 255), 34: (209, 255, 0, 255), 35: (0, 175, 73, 255), 36: (255, 166, 226, 255), 37: (166, 241, 139, 255), 38: (0, 175, 73, 255), 39: (213, 158, 188, 255), 41: (168, 0, 226, 255), 42: (166, 0, 0, 255), 43: (111, 37, 0, 255), 44: (0, 175, 73, 255), 45: (175, 124, 255, 255), 46: (111, 37, 0, 255), 47: (255, 102, 102, 255), 48: (255, 102, 102, 255), 49: (255, 204, 102, 255), 50: (255, 102, 102, 255), 51: (0, 175, 73, 255), 52: (0, 222, 175, 255), 53: (83, 255, 0, 255), 54: (241, 162, 120, 255), 55: (255, 102, 102, 255), 56: (0, 175, 73, 255), 57: (124, 211, 255, 255), 58: (232, 190, 255, 255), 59: (175, 255, 222, 255), 60: (0, 175, 73, 255), 61: (190, 190, 120, 255), 63: (147, 204, 147, 255), 64: (198, 213, 158, 255), 65: (204, 190, 162, 255), 66: (255, 0, 255, 255), 67: (255, 143, 171, 255), 68: (185, 0, 79, 255), 69: (111, 69, 137, 255), 70: (0, 120, 120, 255), 71: (175, 153, 111, 255), 72: (255, 255, 124, 255), 74: (181, 111, 92, 255), 75: (0, 166, 130, 255), 76: (232, 213, 175, 255), 77: (175, 153, 111, 255), 81: (241, 241, 241, 255), 82: (153, 153, 153, 255), 83: (73, 111, 162, 255), 87: (124, 175, 175, 255), 88: (232, 255, 190, 255), 92: (0, 255, 255, 255), 111: (73, 111, 162, 255), 112: (211, 226, 249, 255), 121: (153, 153, 153, 255), 122: (153, 153, 153, 255), 123: (153, 153, 153, 255), 124: (153, 153, 153, 255), 131: (204, 190, 162, 
255), 141: (147, 204, 147, 255), 142: (147, 204, 147, 255), 143: (147, 204, 147, 255), 152: (198, 213, 158, 255), 176: (232, 255, 190, 255), 190: (124, 175, 175, 255), 195: (124, 175, 175, 255), 204: (0, 255, 139, 255), 205: (213, 158, 188, 255), 206: (255, 102, 102, 255), 207: (255, 102, 102, 255), 208: (255, 102, 102, 255), 209: (255, 102, 102, 255), 210: (255, 143, 171, 255), 211: (51, 73, 51, 255), 212: (226, 111, 37, 255), 213: (255, 102, 102, 255), 214: (255, 102, 102, 255), 215: (102, 153, 77, 255), 216: (255, 102, 102, 255), 217: (175, 153, 111, 255), 218: (255, 143, 171, 255), 219: (255, 102, 102, 255), 220: (255, 143, 171, 255), 221: (255, 102, 102, 255), 222: (255, 102, 102, 255), 223: (255, 143, 171, 255), 224: (0, 175, 73, 255), 225: (255, 211, 0, 255), 226: (255, 211, 0, 255), 227: (255, 102, 102, 255), 228: (255, 211, 0, 255), 229: (255, 102, 102, 255), 230: (137, 96, 83, 255), 231: (255, 102, 102, 255), 232: (255, 37, 37, 255), 233: (226, 0, 124, 255), 234: (255, 158, 9, 255), 235: (255, 158, 9, 255), 236: (166, 111, 0, 255), 237: (255, 211, 0, 255), 238: (166, 111, 0, 255), 239: (37, 111, 0, 255), 240: (37, 111, 0, 255), 241: (255, 211, 0, 255), 242: (0, 0, 153, 255), 243: (255, 102, 102, 255), 244: (255, 102, 102, 255), 245: (255, 102, 102, 255), 246: (255, 102, 102, 255), 247: (255, 102, 102, 255), 248: (255, 102, 102, 255), 249: (255, 102, 102, 255), 250: (255, 102, 102, 255), 254: (37, 111, 0, 255)}¶
Color map for the dataset, used for plotting
- __init__(paths='data', crs=None, res=None, years=[2022], classes=[0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 74, 75, 76, 77, 81, 82, 83, 87, 88, 92, 111, 112, 121, 122, 123, 124, 131, 141, 142, 143, 152, 176, 190, 195, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 254], transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
years (list[int]) – list of years for which to use cdl layer
classes (list[int]) – list of classes to include, the rest will be mapped to 0 (defaults to all classes)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if years or classes are invalid
FileNotFoundError – if no files are found in paths
RuntimeError – if download=False but dataset is missing or checksum fails
New in version 0.5: The years and classes parameters.
Changed in version 0.5: root was renamed to paths.
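The documented classes behavior (requested class ids are kept, everything else is mapped to 0) can be sketched in plain Python. This is an illustration of the described behavior, not torchgeo's exact code; ids 1 and 24 are the CDL codes for corn and winter wheat, used here as examples:

```python
# Sketch: keep only the requested CDL class ids; map all other pixel
# values to 0 (background), as described in the classes parameter docs.
classes = [0, 1, 24]

def remap(mask):
    return [[v if v in classes else 0 for v in row] for row in mask]

print(remap([[1, 24], [57, 0]]))  # [[1, 24], [0, 0]]
```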
- __getitem__(query)[source]¶
Retrieve mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, Any]) – a sample returned by __getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
Figure
Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, it is possible to show subplot titles and/or use a custom suptitle.
EDDMapS¶
- class torchgeo.datasets.EDDMapS(root='data')[source]¶
Bases:
GeoDataset
Dataset for EDDMapS.
EDDMapS, Early Detection and Distribution Mapping System, is a web-based mapping system for documenting invasive species and pest distribution. Launched in 2005 by the Center for Invasive Species and Ecosystem Health at the University of Georgia, it was originally designed as a tool for state Exotic Pest Plant Councils to develop more complete distribution data of invasive species. Since then, the program has expanded to include the entire US and Canada as well as to document certain native pest species.
EDDMapS query results can be downloaded in CSV, KML, or Shapefile format. This dataset currently only supports CSV files.
If you use an EDDMapS dataset in your research, please cite it like so:
EDDMapS. YEAR. Early Detection & Distribution Mapping System. The University of Georgia - Center for Invasive Species and Ecosystem Health. Available online at https://www.eddmaps.org/; last accessed DATE.
New in version 0.3.
- __init__(root='data')[source]¶
Initialize a new Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
- Raises:
FileNotFoundError – if no files are found in root
- __getitem__(query)[source]¶
Retrieve metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
EnviroAtlas¶
- class torchgeo.datasets.EnviroAtlas(root='data', splits=['pittsburgh_pa-2010_1m-train'], layers=['naip', 'prior'], transforms=None, prior_as_input=False, cache=True, download=False, checksum=False)[source]¶
Bases:
GeoDataset
EnviroAtlas dataset covering four cities with prior and weak input data layers.
The EnviroAtlas dataset contains NAIP aerial imagery, NLCD land cover labels, OpenStreetMap roads, water, waterways, and waterbodies, Microsoft building footprint labels, high-resolution land cover labels from the EPA EnviroAtlas dataset, and high-resolution land cover prior layers.
This dataset was organized to accompany the 2022 paper, “Resolving label uncertainty with implicit generative models”. More details can be found at https://github.com/estherrolf/implicit-posterior.
If you use this dataset in your research, please cite the following paper:
New in version 0.3.
- __init__(root='data', splits=['pittsburgh_pa-2010_1m-train'], layers=['naip', 'prior'], transforms=None, prior_as_input=False, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
splits (Sequence[str]) – a list of strings in the format “{state}-{train,val,test}” indicating the subset of data to use, for example “ny-train”
layers (Sequence[str]) – a list containing a subset of valid_layers indicating which layers to load
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
prior_as_input (bool) – bool describing whether the prior is used as an input (True) or as supervision (False)
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in root
RuntimeError – if download=False but dataset is missing or checksum fails
AssertionError – if splits or layers are not valid
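The valid split strings follow the pattern seen in the default value 'pittsburgh_pa-2010_1m-train'. As an illustrative sketch (this helper is hypothetical, not part of torchgeo), the region identifier and the train/val/test subset can be separated at the last hyphen:

```python
def parse_split(name: str) -> tuple[str, str]:
    """Split an EnviroAtlas split string into (region, subset).

    Illustrative only; assumes the subset name follows the last '-'.
    """
    region, subset = name.rsplit("-", 1)
    return region, subset

print(parse_split("pittsburgh_pa-2010_1m-train"))
# ('pittsburgh_pa-2010_1m', 'train')
```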
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image/mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
Note: only plots the “naip” and “lc” layers.
- Parameters:
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if the NAIP layer isn’t included in
self.layers
- Return type:
Esri2020¶
- class torchgeo.datasets.Esri2020(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
RasterDataset
Esri 2020 Land Cover Dataset.
The Esri 2020 Land Cover dataset consists of a global single band land use/land cover map derived from ESA Sentinel-2 imagery at 10m resolution with a total of 10 classes. It was published in July 2021 and used the Universal Transverse Mercator (UTM) projection. This dataset only contains labels, no raw satellite imagery.
The 10 classes are:
No Data
Water
Trees
Grass
Flooded Vegetation
Crops
Scrub/Shrub
Built Area
Bare Ground
Snow/Ice
Clouds
A more detailed explanation of the individual classes can be found here.
If you use this dataset please cite the following paper:
New in version 0.3.
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- filename_glob = '*_20200101-20210101.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '^\n (?P<id>[0-9][0-9][A-Z])\n _(?P<date>\\d{8})\n -(?P<processing_date>\\d{8})\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
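As a sketch of how this pattern behaves, the expression can be exercised directly with Python's re module in verbose mode (the tile name below is hypothetical, constructed to follow the documented glob '*_20200101-20210101.*'):

```python
import re

# Esri2020.filename_regex, compiled in verbose mode so the whitespace
# and newlines in the class attribute are ignored
pattern = re.compile(
    r"""^
    (?P<id>[0-9][0-9][A-Z])
    _(?P<date>\d{8})
    -(?P<processing_date>\d{8})
    """,
    re.VERBOSE,
)

# Hypothetical tile name following the documented glob
match = pattern.match("45K_20200101-20210101.tif")
assert match is not None
print(match.group("id"), match.group("date"), match.group("processing_date"))
# 45K 20200101 20210101
```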
- __init__(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in paths
RuntimeError – if download=False but dataset is missing or checksum fails
Changed in version 0.5: root was renamed to paths.
EU-DEM¶
- class torchgeo.datasets.EUDEM(paths='data', crs=None, res=None, transforms=None, cache=True, checksum=False)[source]¶
Bases:
RasterDataset
European Digital Elevation Model (EU-DEM) Dataset.
The EU-DEM dataset is a Digital Elevation Model of reference for the entire European region. The dataset can be downloaded from this website after making an account. A dataset factsheet is available here.
Dataset features:
DEMs at 25 m per pixel spatial resolution (~40,000x40,000 px)
vertical accuracy of +/- 7 m RMSE
data fused from ASTER GDEM, SRTM and Russian topomaps
Dataset format:
DEMs are single-channel tif files
If you use this dataset in your research, please give credit to:
New in version 0.3.
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- filename_glob = 'eu_dem_v11_*.TIF'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '(?P<name>[eudem_v11]{10})_(?P<id>[A-Z0-9]{6})'¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
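A minimal sketch of this pattern in action: the character class [eudem_v11]{10} matches the literal prefix 'eu_dem_v11', and the id group captures the tile code (the tile name below is a hypothetical example following the glob 'eu_dem_v11_*.TIF'):

```python
import re

# EUDEM.filename_regex
pattern = re.compile(r"(?P<name>[eudem_v11]{10})_(?P<id>[A-Z0-9]{6})")

# Hypothetical tile name following the documented glob
match = pattern.match("eu_dem_v11_E30N30.TIF")
assert match is not None
print(match.group("name"), match.group("id"))  # eu_dem_v11 E30N30
```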
- __init__(paths='data', crs=None, res=None, transforms=None, cache=True, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load, here the collection of individual zip files for each tile should be found
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in paths
Changed in version 0.5: root was renamed to paths.
GBIF¶
- class torchgeo.datasets.GBIF(root='data')[source]¶
Bases:
GeoDataset
Dataset for the Global Biodiversity Information Facility.
GBIF, the Global Biodiversity Information Facility, is an international network and data infrastructure funded by the world’s governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth.
This dataset is intended for use with GBIF’s occurrence records. It may or may not work for other GBIF datasets. Data for a particular species or region of interest can be downloaded from the above link.
If you use a GBIF dataset in your research, please cite it according to:
New in version 0.3.
- __init__(root='data')[source]¶
Initialize a new Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
- Raises:
FileNotFoundError – if no files are found in root
- __getitem__(query)[source]¶
Retrieve metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
GlobBiomass¶
- class torchgeo.datasets.GlobBiomass(paths='data', crs=None, res=None, measurement='agb', transforms=None, cache=True, checksum=False)[source]¶
Bases:
RasterDataset
GlobBiomass dataset.
The GlobBiomass dataset consists of global pixel-wise aboveground biomass (AGB) and growing stock volume (GSV) maps.
Dataset features:
estimates of AGB and GSV around the world at ~100m per pixel resolution (45,000x45,000 px)
standard error maps of respective measurement at same resolution
Dataset format:
estimate maps are single-channel
standard error maps are single-channel
The data can be manually downloaded from this website.
If you use this dataset please cite it with the following citation:
Santoro, M. et al. (2018): GlobBiomass - global datasets of forest biomass. PANGAEA, https://doi.org/10.1594/PANGAEA.894711
New in version 0.3.
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- filename_regex = '^\n (?P<tile>[0-9A-Z]*)\n _(?P<measurement>[a-z]{3})\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
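A short sketch of this pattern in verbose mode; the measurement group distinguishes 'agb' from 'gsv' files (the filename below is hypothetical):

```python
import re

# GlobBiomass.filename_regex, compiled in verbose mode
pattern = re.compile(
    r"""^
    (?P<tile>[0-9A-Z]*)
    _(?P<measurement>[a-z]{3})
    """,
    re.VERBOSE,
)

# Hypothetical tile filename
match = pattern.match("N40E020_agb.tif")
assert match is not None
print(match.group("tile"), match.group("measurement"))  # N40E020 agb
```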
- __init__(paths='data', crs=None, res=None, measurement='agb', transforms=None, cache=True, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
measurement (str) – use data from ‘agb’ or ‘gsv’ measurement
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in paths
RuntimeError – if dataset is missing or checksum fails
AssertionError – if measurement argument is invalid, or not a str
Changed in version 0.5: root was renamed to paths.
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample at index consisting of measurement mask with 2 channels, where the first is the measurement and the second the error map
- Raises:
IndexError – if query is not found in the index
- Return type:
iNaturalist¶
- class torchgeo.datasets.INaturalist(root='data')[source]¶
Bases:
GeoDataset
Dataset for iNaturalist.
iNaturalist is a joint initiative of the California Academy of Sciences and the National Geographic Society. It allows citizen scientists to upload observations of organisms that can be downloaded by scientists and researchers.
If you use an iNaturalist dataset in your research, please cite it according to:
New in version 0.3.
- __init__(root='data')[source]¶
Initialize a new Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
- Raises:
FileNotFoundError – if no files are found in root
- __getitem__(query)[source]¶
Retrieve metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
L7 Irish¶
- class torchgeo.datasets.L7Irish(paths='data', crs=CRS.from_epsg(3857), res=None, bands=['B10', 'B20', 'B30', 'B40', 'B50', 'B61', 'B62', 'B70', 'B80'], transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
RasterDataset
L7 Irish dataset.
The L7 Irish dataset is based on Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Level-1G scenes. Manually generated cloud masks are used to train and validate cloud cover assessment algorithms, which in turn are intended to compute the percentage of cloud cover in each scene.
Dataset features:
Images divided between 9 unique biomes
206 scenes from Landsat 7 ETM+ sensor
Imagery from global tiles between June 2000–December 2001
9 Level-1 spectral bands with 30 m per pixel resolution
Dataset format:
Images are composed of single multiband geotiffs
Labels are multiclass, stored in single geotiffs
Level-1 metadata (MTL.txt file)
Landsat 7 ETM+ bands: (B10, B20, B30, B40, B50, B61, B62, B70, B80)
Dataset classes:
Fill
Cloud Shadow
Clear
Thin Cloud
Cloud
If you use this dataset in your research, please cite the following:
New in version 0.5.
- filename_glob = 'L71*.TIF'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '\n ^L71\n (?P<wrs_path>\\d{3})\n (?P<wrs_row>\\d{3})\n _(?P=wrs_row)\n (?P<date>\\d{8})\n \\.TIF$\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
- date_format = '%Y%m%d'¶
Date format string used to parse date from filename.
Not used if filename_regex does not contain a date group.
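A sketch of how the regex and date format work together; note the (?P=wrs_row) backreference, which requires the row number to appear twice in the name (the scene name below is hypothetical, constructed to match the pattern):

```python
import re
from datetime import datetime

# L7Irish.filename_regex, compiled in verbose mode
pattern = re.compile(
    r"""
    ^L71
    (?P<wrs_path>\d{3})
    (?P<wrs_row>\d{3})
    _(?P=wrs_row)
    (?P<date>\d{8})
    \.TIF$
    """,
    re.VERBOSE,
)

# Hypothetical scene name: path 220, row 060, acquired 2000-05-15
match = pattern.match("L71220060_06020000515.TIF")
assert match is not None
date = datetime.strptime(match.group("date"), "%Y%m%d")  # date_format above
print(match.group("wrs_path"), match.group("wrs_row"), date.date())
# 220 060 2000-05-15
```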
- separate_files = False¶
True if data is stored in a separate file for each band, else False.
- all_bands: list[str] = ['B10', 'B20', 'B30', 'B40', 'B50', 'B61', 'B62', 'B70', 'B80']¶
Names of all available bands in the dataset
- __init__(paths='data', crs=CRS.from_epsg(3857), res=None, bands=['B10', 'B20', 'B30', 'B40', 'B50', 'B61', 'B62', 'B70', 'B80'], transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new L7Irish instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to EPSG:3857)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if download=False and data is not found, or checksums don't match
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image, mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
L8 Biome¶
- class torchgeo.datasets.L8Biome(paths, crs=CRS.from_epsg(3857), res=None, bands=['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11'], transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
RasterDataset
L8 Biome dataset.
The L8 Biome dataset is a validation dataset for cloud cover assessment algorithms, consisting of Pre-Collection Landsat 8 Operational Land Imager (OLI) Thermal Infrared Sensor (TIRS) terrain-corrected (Level-1T) scenes.
Dataset features:
Images evenly divided between 8 unique biomes
96 scenes from Landsat 8 OLI/TIRS sensors
Imagery from global tiles between April 2013–October 2014
11 Level-1 spectral bands with 30 m per pixel resolution
Dataset format:
Images are composed of single multiband geotiffs
Labels are multiclass, stored in single geotiffs
Quality assurance bands, stored in single geotiffs
Level-1 metadata (MTL.txt file)
Landsat 8 OLI/TIRS bands: (B1, B2, B3, B4, B5, B6, B7, B8, B9, B10, B11)
Dataset classes:
Fill
Cloud Shadow
Clear
Thin Cloud
Cloud
If you use this dataset in your research, please cite the following:
New in version 0.5.
- filename_glob = 'LC8*.TIF'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '\n ^LC8\n (?P<wrs_path>\\d{3})\n (?P<wrs_row>\\d{3})\n (?P<date>\\d{7})\n (?P<gsi>[A-Z]{3})\n (?P<version>\\d{2})\n \\.TIF$\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
- date_format = '%Y%j'¶
Date format string used to parse date from filename.
Not used if filename_regex does not contain a date group.
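A sketch of the pattern and its day-of-year date format, '%Y%j' (the scene name below is hypothetical, constructed to match the pattern):

```python
import re
from datetime import datetime

# L8Biome.filename_regex, compiled in verbose mode; the 7-digit date
# is year plus day-of-year, matching the '%Y%j' date_format above
pattern = re.compile(
    r"""
    ^LC8
    (?P<wrs_path>\d{3})
    (?P<wrs_row>\d{3})
    (?P<date>\d{7})
    (?P<gsi>[A-Z]{3})
    (?P<version>\d{2})
    \.TIF$
    """,
    re.VERBOSE,
)

# Hypothetical scene name: path 042, row 008, day 220 of 2013
match = pattern.match("LC80420082013220LGN00.TIF")
assert match is not None
date = datetime.strptime(match.group("date"), "%Y%j")
print(date.date())  # 2013-08-08
```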
- separate_files = False¶
True if data is stored in a separate file for each band, else False.
- all_bands: list[str] = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11']¶
Names of all available bands in the dataset
- __init__(paths, crs=CRS.from_epsg(3857), res=None, bands=['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11'], transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new L8Biome instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to EPSG:3857)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if download=False and data is not found, or checksums don't match
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image, mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
LandCover.ai Geo¶
- class torchgeo.datasets.LandCoverAIBase(root='data', download=False, checksum=False)[source]¶
Bases: Dataset[dict[str, Any]], ABC
Abstract base class for LandCover.ai Geo and NonGeo datasets.
The LandCover.ai (Land Cover from Aerial Imagery) dataset supports automatic mapping of buildings, woodlands, water and roads from aerial images. This implementation is specifically for Version 1 of LandCover.ai.
Dataset features:
land cover from Poland, Central Europe
three spectral bands - RGB
33 orthophotos with 25 cm per pixel resolution (~9000x9500 px)
8 orthophotos with 50 cm per pixel resolution (~4200x4700 px)
total area of 216.27 km2
Dataset format:
rasters are three-channel GeoTiffs with EPSG:2180 spatial reference system
masks are single-channel GeoTiffs with EPSG:2180 spatial reference system
Dataset classes:
building (1.85 km2)
woodland (72.02 km2)
water (13.15 km2)
road (3.5 km2)
If you use this dataset in your research, please cite the following paper:
New in version 0.5.
- __init__(root='data', download=False, checksum=False)[source]¶
Initialize a new LandCover.ai dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
transforms – a function/transform that takes input sample and its target as entry and returns a transformed version
cache – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if download=False and data is not found, or checksums don't match
- abstract __getitem__(query)[source]¶
Retrieve image, mask and metadata indexed by index.
- Parameters:
query (Any) – coordinates or an index
- Returns:
sample of image, mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
- class torchgeo.datasets.LandCoverAIGeo(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases: LandCoverAIBase, RasterDataset
LandCover.ai Geo dataset.
See the abstract LandCoverAIBase class to find out more.
New in version 0.5.
- filename_glob = 'images/*.tif'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '.*tif'¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
- __init__(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new LandCover.ai NonGeo dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if download=False and data is not found, or checksums don't match
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image, mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
Landsat¶
- class torchgeo.datasets.Landsat(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases: RasterDataset, ABC
Abstract base class for all Landsat datasets.
Landsat is a joint NASA/USGS program, providing the longest continuous space-based record of Earth’s land in existence.
If you use this dataset in your research, please cite it using the following format:
If you use any of the following Level-2 products, there may be additional citation requirements, including papers you can cite. See the “Citation Information” section of the following pages:
- filename_regex = '\n ^L\n (?P<sensor>[COTEM])\n (?P<satellite>\\d{2})\n _(?P<processing_correction_level>[A-Z0-9]{4})\n _(?P<wrs_path>\\d{3})\n (?P<wrs_row>\\d{3})\n _(?P<date>\\d{8})\n _(?P<processing_date>\\d{8})\n _(?P<collection_number>\\d{2})\n _(?P<collection_category>[A-Z0-9]{2})\n _(?P<band>[A-Z0-9_]+)\n \\.\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
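A sketch of this pattern applied to a Collection 2 Level-2 band file (the filename below is hypothetical but follows the documented naming convention):

```python
import re

# Landsat.filename_regex, compiled in verbose mode
pattern = re.compile(
    r"""
    ^L
    (?P<sensor>[COTEM])
    (?P<satellite>\d{2})
    _(?P<processing_correction_level>[A-Z0-9]{4})
    _(?P<wrs_path>\d{3})
    (?P<wrs_row>\d{3})
    _(?P<date>\d{8})
    _(?P<processing_date>\d{8})
    _(?P<collection_number>\d{2})
    _(?P<collection_category>[A-Z0-9]{2})
    _(?P<band>[A-Z0-9_]+)
    \.
    """,
    re.VERBOSE,
)

# Hypothetical Collection 2 filename: Landsat 8 OLI, surface-reflectance band 2
match = pattern.match("LC08_L2SP_023032_20210622_20210629_02_T1_SR_B2.TIF")
assert match is not None
print(match.group("satellite"), match.group("date"), match.group("band"))
# 08 20210622 SR_B2
```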
- separate_files = True¶
True if data is stored in a separate file for each band, else False.
- __init__(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Optional[Sequence[str]]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises:
FileNotFoundError – if no files are found in paths
Changed in version 0.5: root was renamed to paths.
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if the RGB bands are not included in
self.bands
- Return type:
Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.
- class torchgeo.datasets.Landsat9(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat8
Landsat 9 Operational Land Imager (OLI-2) and Thermal Infrared Sensor (TIRS-2).
- filename_glob = 'LC09_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat8(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat
Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS).
- filename_glob = 'LC08_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat7(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat
Landsat 7 Enhanced Thematic Mapper Plus (ETM+).
- filename_glob = 'LE07_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat5TM(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat4TM
Landsat 5 Thematic Mapper (TM).
- filename_glob = 'LT05_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat5MSS(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat4MSS
Landsat 5 Multispectral Scanner (MSS).
- filename_glob = 'LM05_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat4TM(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat
Landsat 4 Thematic Mapper (TM).
- filename_glob = 'LT04_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat4MSS(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat
Landsat 4 Multispectral Scanner (MSS).
- filename_glob = 'LM04_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat3(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat1
Landsat 3 Multispectral Scanner (MSS).
- filename_glob = 'LM03_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat2(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat1
Landsat 2 Multispectral Scanner (MSS).
- filename_glob = 'LM02_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Landsat1(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
Landsat
Landsat 1 Multispectral Scanner (MSS).
- filename_glob = 'LM01_*_{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
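Each Landsat class's filename_glob contains a {} placeholder that is filled with a band name before searching for files. A minimal sketch of that matching with Python's stdlib fnmatch (the scene file name below is hypothetical, and filling the placeholder with str.format is an illustration of the documented behavior, not the exact torchgeo code path):

```python
import fnmatch

# The Landsat7 glob with its band placeholder filled in for band "B1"
pattern = "LE07_*_{}.*".format("B1")  # -> "LE07_*_B1.*"

# A hypothetical Landsat 7 scene file name
name = "LE07_L2SP_022033_20000417_20200918_02_T1_B1.TIF"

assert fnmatch.fnmatchcase(name, pattern)
# A Landsat 5 TM file should not match the Landsat 7 pattern
assert not fnmatch.fnmatchcase("LT05_L2SP_022033_20000417_20200918_02_T1_B1.TIF", pattern)
```

The sensor-specific prefixes (LE07, LT05, LM01, ...) are what keep each glob from picking up files belonging to a different Landsat dataset.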
NAIP¶
- class torchgeo.datasets.NAIP(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
RasterDataset
National Agriculture Imagery Program (NAIP) dataset.
The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. A primary goal of the NAIP program is to make digital ortho photography available to governmental agencies and the public within a year of acquisition.
NAIP is administered by the USDA’s Farm Service Agency (FSA) through the Aerial Photography Field Office in Salt Lake City. This “leaf-on” imagery is used as a base layer for GIS programs in FSA’s County Service Centers, and is used to maintain the Common Land Unit (CLU) boundaries.
If you use this dataset in your research, please cite it using the following format:
- filename_glob = 'm_*.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '\n ^m\n _(?P<quadrangle>\\d+)\n _(?P<quarter_quad>[a-z]+)\n _(?P<utm_zone>\\d+)\n _(?P<resolution>\\d+)\n _(?P<date>\\d+)\n (?:_(?P<processing_date>\\d+))?\n \\..*$\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
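The regular expression above is written in verbose mode, so whitespace inside it is ignored. A short sketch of extracting the named groups from a hypothetical NAIP file name with the stdlib re module:

```python
import re

# The filename_regex documented above, compiled in verbose mode
filename_regex = r"""
    ^m
    _(?P<quadrangle>\d+)
    _(?P<quarter_quad>[a-z]+)
    _(?P<utm_zone>\d+)
    _(?P<resolution>\d+)
    _(?P<date>\d+)
    (?:_(?P<processing_date>\d+))?
    \..*$
"""
pattern = re.compile(filename_regex, re.VERBOSE)

# Hypothetical NAIP tile name
match = pattern.match("m_3008501_ne_16_1_20110815_20111017.tif")
assert match is not None
print(match.group("date"))  # 20110815, the group used to compute mint/maxt
```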
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.
NLCD¶
- class torchgeo.datasets.NLCD(paths='data', crs=None, res=None, years=[2019], classes=[0, 11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 52, 71, 81, 82, 90, 95], transforms=None, cache=True, download=False, checksum=False)[source]¶
Bases:
RasterDataset
National Land Cover Database (NLCD) dataset.
The NLCD dataset is a land cover product that covers the United States and Puerto Rico. The current implementation supports maps for the continental United States only. The product is a joint effort between the United States Geological Survey (USGS) and the Multi-Resolution Land Characteristics Consortium (MRLC), which released the first product in 2001 and has issued updates every five years since.
The dataset contains the following 17 classes:
Background
Open Water
Perennial Ice/Snow
Developed, Open Space
Developed, Low Intensity
Developed, Medium Intensity
Developed, High Intensity
Barren Land (Rock/Sand/Clay)
Deciduous Forest
Evergreen Forest
Mixed Forest
Shrub/Scrub
Grassland/Herbaceous
Pasture/Hay
Cultivated Crops
Woody Wetlands
Emergent Herbaceous Wetlands
Detailed descriptions of the classes can be found here.
Dataset format:
single channel .img file with integer class labels
If you use this dataset in your research, please use the corresponding citation:
New in version 0.5.
- filename_glob = 'nlcd_*_land_cover_l48_*.img'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = 'nlcd_(?P<date>\\d{4})_land_cover_l48_(?P<publication_date>\\d{8})\\.img'¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
- date_format = '%Y'¶
Date format string used to parse date from filename.
Not used if filename_regex does not contain a date group.
- is_image = False¶
True if dataset contains imagery, False if dataset contains mask
- cmap: dict[int, tuple[int, int, int, int]] = {0: (0, 0, 0, 0), 11: (70, 107, 159, 255), 12: (209, 222, 248, 255), 21: (222, 197, 197, 255), 22: (217, 146, 130, 255), 23: (235, 0, 0, 255), 24: (171, 0, 0, 255), 31: (179, 172, 159, 255), 41: (104, 171, 95, 255), 42: (28, 95, 44, 255), 43: (181, 197, 143, 255), 52: (204, 184, 121, 255), 71: (223, 223, 194, 255), 81: (220, 217, 57, 255), 82: (171, 108, 40, 255), 90: (184, 217, 235, 255), 95: (108, 159, 184, 255)}¶
Color map for the dataset, used for plotting
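The cmap maps each NLCD class code to an RGBA color for plotting. A minimal sketch of colorizing a mask with that lookup (the 2x2 mask is made up, only a few class codes from the map above are shown, and plain Python lists stand in for arrays):

```python
# Subset of the NLCD color map documented above: class code -> RGBA
cmap = {0: (0, 0, 0, 0), 11: (70, 107, 159, 255), 41: (104, 171, 95, 255)}

# Hypothetical 2x2 mask of NLCD class codes
mask = [[0, 11], [41, 11]]

# Colorize: look up each pixel's class color
rgba = [[cmap[value] for value in row] for row in mask]

print(rgba[0][1])  # (70, 107, 159, 255), the Open Water color
```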
- __init__(paths='data', crs=None, res=None, years=[2019], classes=[0, 11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 52, 71, 81, 82, 90, 95], transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
years (list[int]) – list of years for which to use nlcd layer
classes (list[int]) – list of classes to include, the rest will be mapped to 0 (defaults to all classes)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)
- Raises:
AssertionError – if years or classes are invalid
FileNotFoundError – if no files are found in paths
RuntimeError – if download=False but dataset is missing or checksum fails
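As documented for the classes parameter, any class code not in the requested list is mapped to 0 in the returned mask. A toy sketch of that remapping (the mask values are made up; this illustrates the documented behavior, not the torchgeo implementation itself):

```python
# Keep only Open Water (11) and Deciduous Forest (41); everything else -> 0,
# mirroring the behavior documented for the `classes` parameter.
classes = {0, 11, 41}

mask = [11, 22, 41, 95, 0]
remapped = [value if value in classes else 0 for value in mask]

print(remapped)  # [11, 0, 41, 0, 0]
```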
- __getitem__(query)[source]¶
Retrieve mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
Open Buildings¶
- class torchgeo.datasets.OpenBuildings(paths='data', crs=None, res=0.0001, transforms=None, checksum=False)[source]¶
Bases:
VectorDataset
Open Buildings dataset.
The Open Buildings dataset consists of computer generated building detections across the African continent.
Dataset features:
516M building detections as polygons with centroid lat/long
covering an area of 19.4M km2 (64% of the African continent)
confidence score and Plus Code
Dataset format:
csv files containing building detections compressed as csv.gz
metadata geojson file
The data can be downloaded from here. The metadata geometry file must also be placed in the root directory as tiles.geojson.
If you use this dataset in your research, please cite the following technical report:
New in version 0.3.
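The building detections ship as CSV rows whose geometry column holds a WKT polygon. A sketch of reading such rows with the stdlib csv module; the column names and values below are illustrative of the dataset's shape, not taken verbatim from a real file:

```python
import csv
import io

# Hypothetical two-column extract in the general shape of the Open Buildings CSVs
data = io.StringIO(
    "latitude,longitude,area_in_meters,confidence,geometry\n"
    '6.5244,3.3792,92.5,0.8123,"POLYGON((3.3791 6.5243, 3.3793 6.5243, '
    '3.3793 6.5245, 3.3791 6.5245, 3.3791 6.5243))"\n'
)

# Filter low-confidence detections before rasterizing, keeping the WKT geometry
rows = [row for row in csv.DictReader(data) if float(row["confidence"]) >= 0.7]
print(rows[0]["geometry"][:7])  # POLYGON
```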
- filename_glob = '*_buildings.csv'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- __init__(paths='data', crs=None, res=0.0001, transforms=None, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float) – resolution of the dataset in units of CRS
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
FileNotFoundError – if no files are found in paths
Changed in version 0.5: root was renamed to paths.
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image/mask and metadata for the given query; if no matching shapes are found within the query, an empty raster is returned
- Raises:
IndexError – if query is not found in the index
- Return type:
Sentinel¶
- class torchgeo.datasets.Sentinel(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
RasterDataset
Abstract base class for all Sentinel datasets.
Sentinel is a family of satellites launched by the European Space Agency (ESA) under the Copernicus Programme.
If you use this dataset in your research, please cite it using the following format:
- class torchgeo.datasets.Sentinel1(paths='data', crs=None, res=10, bands=['VV', 'VH'], transforms=None, cache=True)[source]¶
Bases:
Sentinel
Sentinel-1 dataset.
The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites that perform C-band synthetic aperture radar imaging day and night, enabling them to acquire imagery regardless of the weather.
Data can be downloaded from:
Product Types:
Level-0: Raw (RAW)
Level-1: Single Look Complex (SLC)
Level-1: Ground Range Detected (GRD)
Level-2: Ocean (OCN)
Polarizations:
HH: horizontal transmit, horizontal receive
HV: horizontal transmit, vertical receive
VV: vertical transmit, vertical receive
VH: vertical transmit, horizontal receive
Acquisition Modes:
Stripmap (SM)
Interferometric Wide Swath (IW)
Extra-Wide Swath (EW)
Wave (WV)
Note
At the moment, this dataset only supports the GRD product type. Data must be radiometrically terrain corrected (RTC). This can be done manually using a DEM, or you can download an On Demand RTC product from ASF DAAC.
Note
Mixing \(\gamma_0\) and \(\sigma_0\) backscatter coefficient data is not recommended. Similarly, power, decibel, and amplitude scale data should not be mixed, and TorchGeo does not attempt to convert all data to a common scale.
New in version 0.4.
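The power, decibel, and amplitude scales mentioned in the note above are related by standard radar conventions; the conversions below are those general formulas, not a TorchGeo API:

```python
import math

power = 0.01  # backscatter coefficient in power scale

db = 10 * math.log10(power)   # power -> decibel
amplitude = math.sqrt(power)  # power -> amplitude

print(db, amplitude)  # -20.0 0.1
```

Because these scales are nonlinear transformations of one another, averaging or training across mixed-scale tiles silently changes the statistics of the data, which is why the note advises against mixing them.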
- filename_regex = '\n ^S1(?P<mission>[AB])\n _(?P<mode>SM|IW|EW|WV)\n _(?P<date>\\d{8}T\\d{6})\n _(?P<polarization>[DS][HV])\n (?P<orbit>[PRO])\n _RTC(?P<spacing>\\d{2})\n _(?P<package>G)\n _(?P<backscatter>[gs])\n (?P<scale>[pda])\n (?P<mask>[uw])\n (?P<filter>[nf])\n (?P<area>[ec])\n (?P<matching>[dm])\n _(?P<product>[0-9A-Z]{4})\n _(?P<band>[VH]{2})\n \\.\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
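A sketch of matching a hypothetical RTC product name against the verbose-mode expression documented above (the file name is invented to fit the pattern, not a real ASF product ID):

```python
import re

filename_regex = r"""
    ^S1(?P<mission>[AB])
    _(?P<mode>SM|IW|EW|WV)
    _(?P<date>\d{8}T\d{6})
    _(?P<polarization>[DS][HV])
    (?P<orbit>[PRO])
    _RTC(?P<spacing>\d{2})
    _(?P<package>G)
    _(?P<backscatter>[gs])
    (?P<scale>[pda])
    (?P<mask>[uw])
    (?P<filter>[nf])
    (?P<area>[ec])
    (?P<matching>[dm])
    _(?P<product>[0-9A-Z]{4})
    _(?P<band>[VH]{2})
    \.
"""
pattern = re.compile(filename_regex, re.VERBOSE)

# Hypothetical file name in the shape of the On Demand RTC naming scheme
match = pattern.match("S1A_IW_20220101T120000_DVP_RTC30_G_gpuned_ABCD_VV.tif")
assert match is not None
print(match.group("date"), match.group("band"))  # 20220101T120000 VV
```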
- date_format = '%Y%m%dT%H%M%S'¶
Date format string used to parse date from filename.
Not used if filename_regex does not contain a date group.
- separate_files = True¶
True if data is stored in a separate file for each band, else False.
- __init__(paths='data', crs=None, res=10, bands=['VV', 'VH'], transforms=None, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, list[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to [“VV”, “VH”])
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises:
AssertionError – if bands is invalid
FileNotFoundError – if no files are found in paths
Changed in version 0.5: root was renamed to paths.
- filename_glob = 'S1*{}.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- class torchgeo.datasets.Sentinel2(paths='data', crs=None, res=10, bands=None, transforms=None, cache=True)[source]¶
Bases:
Sentinel
Sentinel-2 dataset.
The Copernicus Sentinel-2 mission comprises a constellation of two polar-orbiting satellites placed in the same sun-synchronous orbit, phased at 180° to each other. It monitors variability in land surface conditions; its wide swath width (290 km) and high revisit time (10 days at the equator with one satellite, 5 days with two satellites under cloud-free conditions, resulting in 2–3 days at mid-latitudes) support monitoring of changes on Earth's surface.
- date_format = '%Y%m%dT%H%M%S'¶
Date format string used to parse date from filename.
Not used if filename_regex does not contain a date group.
- all_bands: list[str] = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12']¶
Names of all available bands in the dataset
- separate_files = True¶
True if data is stored in a separate file for each band, else False.
- __init__(paths='data', crs=None, res=10, bands=None, transforms=None, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Optional[Sequence[str]]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises:
FileNotFoundError – if no files are found in paths
Changed in version 0.5: root was renamed to paths
- filename_glob = 'T*_*_{}*.*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- filename_regex = '\n ^T(?P<tile>\\d{{2}}[A-Z]{{3}})\n _(?P<date>\\d{{8}}T\\d{{6}})\n _(?P<band>B[018][\\dA])\n (?:_(?P<resolution>{}m))?\n \\..*$\n '¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
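Note the doubled braces in the stored expression: the string is a str.format template, and the lone {} is filled with the requested resolution before the regex is compiled. A sketch with a hypothetical Sentinel-2 band file name:

```python
import re

template = r"""
    ^T(?P<tile>\d{{2}}[A-Z]{{3}})
    _(?P<date>\d{{8}}T\d{{6}})
    _(?P<band>B[018][\dA])
    (?:_(?P<resolution>{}m))?
    \..*$
"""
# Filling the placeholder turns the escaped {{ }} back into { } quantifiers
pattern = re.compile(template.format(10), re.VERBOSE)

# Hypothetical Sentinel-2 band file name
match = pattern.match("T41XNE_20200829T083611_B02_10m.tif")
assert match is not None
print(match.group("tile"), match.group("band"))  # 41XNE B02
```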
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if the RGB bands are not included in self.bands
- Return type:
Changed in version 0.3: Method now takes a sample dict, not a Tensor. Additionally, possible to show subplot titles and/or use a custom suptitle.
Non-geospatial Datasets¶
NonGeoDataset
is designed for datasets that lack geospatial information. These datasets can still be combined using ConcatDataset
.
| Dataset | Task | Source | # Samples | # Classes | Size (px) | Resolution (m) | Bands |
|---|---|---|---|---|---|---|---|
| ADVANCE | C | Google Earth, Freesound | 5,075 | 13 | 512x512 | 0.5 | RGB |
| Benin Cashew Plantations | S | Airbus Pléiades | 70 | 6 | 1,122x1,186 | 10 | MSI |
| BigEarthNet | C | Sentinel-1/2 | 590,326 | 19–43 | 120x120 | 10 | SAR, MSI |
| BioMassters | R | Sentinel-1/2 and Lidar | | | 256 | 10 | SAR, MSI |
| Cloud Cover Detection | S | Sentinel-2 | 22,728 | 2 | 512x512 | 10 | MSI |
| COWC | C, R | CSUAV AFRL, ISPRS, LINZ, AGRC | 388,435 | 2 | 256x256 | 0.15 | RGB |
| CV4A Kenya Crop Type | S | Sentinel-2 | 4,688 | 7 | 3,035x2,016 | 10 | MSI |
| DeepGlobe Land Cover | S | DigitalGlobe +Vivid | 803 | 7 | 2,448x2,448 | 0.5 | RGB |
| DFC2022 | S | Aerial | 3,981 | 15 | 2,000x2,000 | 0.5 | RGB |
| ETCI2021 Flood Detection | S | Sentinel-1 | 66,810 | 2 | 256x256 | 5–20 | SAR |
| EuroSAT | C | Sentinel-2 | 27,000 | 10 | 64x64 | 10 | MSI |
| FAIR1M | OD | Gaofen/Google Earth | 15,000 | 37 | 1,024x1,024 | 0.3–0.8 | RGB |
| FireRisk | C | NAIP Aerial | 91,872 | 7 | 320x320 | 1 | RGB |
| Forest Damage | OD | Drone imagery | 1,543 | 4 | 1,500x1,500 | | RGB |
| GID-15 | S | Gaofen-2 | 150 | 15 | 6,800x7,200 | 3 | RGB |
| IDTReeS | OD, C | Aerial | 591 | 33 | 200x200 | 0.1–1 | RGB |
| Inria Aerial Image Labeling | S | Aerial | 360 | 2 | 5,000x5,000 | 0.3 | RGB |
| LandCover.ai | S | Aerial | 10,674 | 5 | 512x512 | 0.25–0.5 | RGB |
| LEVIR-CD+ | CD | Google Earth | 985 | 2 | 1,024x1,024 | 0.5 | RGB |
| LoveDA | S | Google Earth | 5,987 | 7 | 1,024x1,024 | 0.3 | RGB |
| MapInWild | S | Sentinel-1/2, ESA WorldCover, NOAA VIIRS DNB | 1018 | 1 | 1920x1920 | 10–463.83 | SAR, MSI, 2020_Map, avg_rad |
| Million-AID | C | Google Earth | 1M | 51–73 | | 0.5–153 | RGB |
| NASA Marine Debris | OD | PlanetScope | 707 | 1 | 256x256 | 3 | RGB |
| OSCD | CD | Sentinel-2 | 24 | 2 | 40–1,180 | 60 | MSI |
| PASTIS | I | Sentinel-1/2 | 2,433 | 19 | 128x128xT | 10 | MSI |
| PatternNet | C | Google Earth | 30,400 | 38 | 256x256 | 0.06–5 | RGB |
| Potsdam | S | Aerial | 38 | 6 | 6,000x6,000 | 0.05 | MSI |
| ReforesTree | OD, R | Aerial | 100 | 6 | 4,000x4,000 | 0.02 | RGB |
| RESISC45 | C | Google Earth | 31,500 | 45 | 256x256 | 0.2–30 | RGB |
| Rwanda Field Boundary | S | Planetscope | 70 | 2 | 256x256 | 4.7 | RGB + NIR |
| Seasonal Contrast | T | Sentinel-2 | 100K–1M | | 264x264 | 10 | MSI |
| SeasoNet | S | Sentinel-2 | 1,759,830 | 33 | 120x120 | 10 | MSI |
| SEN12MS | S | Sentinel-1/2, MODIS | 180,662 | 33 | 256x256 | 10 | SAR, MSI |
| SKIPP'D | R | Fish-eye | 363,375 | | 64x64 | | RGB |
| So2Sat | C | Sentinel-1/2 | 400,673 | 17 | 32x32 | 10 | SAR, MSI |
| SpaceNet | I | WorldView-2/3, Planet Lab Dove | 1,889–28,728 | 2 | 102–900 | 0.5–4 | MSI |
| SSL4EO-L | T | Landsat | 1M | | 264x264 | 30 | MSI |
| SSL4EO-S12 | T | Sentinel-1/2 | 1M | | 264x264 | 10 | SAR, MSI |
| SustainBench Crop Yield | R | MODIS | 11k | | 32x32 | | MSI |
| Tropical Cyclone | R | GOES 8–16 | 108,110 | | 256x256 | 4K–8K | MSI |
| UC Merced | C | USGS National Map | 2,100 | 21 | 256x256 | 0.3 | RGB |
| USAVars | R | NAIP Aerial | 100K | | | 4 | RGB, NIR |
| Vaihingen | S | Aerial | 33 | 6 | 1,281–3,816 | 0.09 | RGB |
| VHR-10 | I | Google Earth, Vaihingen | 800 | 10 | 358–1,728 | 0.08–2 | RGB |
| Western USA Live Fuel Moisture | R | Landsat8, Sentinel-1 | 2615 | | | | |
| xView2 | CD | Maxar | 3,732 | 4 | 1,024x1,024 | 0.8 | RGB |
| ZueriCrop | I, T | Sentinel-2 | 116K | 48 | 24x24 | 10 | MSI |
ADVANCE¶
- class torchgeo.datasets.ADVANCE(root='data', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
ADVANCE dataset.
The ADVANCE dataset is a dataset for audio visual scene recognition.
Dataset features:
5,075 pairs of geotagged audio recordings and images
three spectral bands - RGB (512x512 px)
10-second audio recordings
Dataset format:
images are three-channel jpgs
audio files are in wav format
Dataset classes:
airport
beach
bridge
farmland
forest
grassland
harbour
lake
orchard
residential
sparse shrub land
sports land
train station
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
scipy to load the audio files to tensors
- __init__(root='data', transforms=None, download=False, checksum=False)[source]¶
Initialize a new ADVANCE dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if download=False and data is not found, or checksums don't match
- __getitem__(index)[source]¶
Return an index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
Benin Cashew Plantations¶
- class torchgeo.datasets.BeninSmallHolderCashews(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]¶
Bases:
NonGeoDataset
Smallholder Cashew Plantations in Benin dataset.
This dataset contains labels for cashew plantations in a 120 km2 area in the center of Benin. Each pixel is classified as well-managed plantation, poorly-managed plantation, no plantation, or other classes. The labels are generated using a combination of ground data collection with a handheld GPS device and final corrections based on Airbus Pléiades imagery. See this website for dataset details.
Specifically, the data consists of Sentinel-2 imagery from a 120 km2 area in the center of Benin over 71 points in time from 11/05/2019 to 10/30/2020, with polygon labels for the following classes:
No data
Well-managed plantation
Poorly-managed plantation
Non-plantation
Residential
Background
Uncertain
If you use this dataset in your research, please cite the following:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __init__(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]¶
Initialize a new Benin Smallholder Cashew Plantations Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
chip_size (int) – size of chips
stride (int) – spacing between chips, if less than chip_size, then there will be overlap between chips
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
verbose (bool) – if True, print messages when new tiles are loaded
- Raises:
RuntimeError – if download=False but dataset is missing or checksum fails
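Chips are cut from the tile with a sliding window, so a stride smaller than chip_size produces overlapping neighbours. A back-of-the-envelope sketch of how the chip count along one dimension follows from the defaults (the arithmetic is illustrative of the windowing idea, not the exact torchgeo implementation or its edge handling):

```python
def num_chips(dim: int, chip_size: int = 256, stride: int = 128) -> int:
    # Number of full chips a sliding window produces along one dimension
    return (dim - chip_size) // stride + 1

# Along a hypothetical 1,122-pixel side: window origins at 0, 128, 256, ...
print(num_chips(1122))  # 7
```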
- __getitem__(index)[source]¶
Return an index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
a dict containing image, mask, transform, crs, and metadata at index.
- Return type:
- __len__()[source]¶
Return the number of chips in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, time_step=0, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
time_step (int) – time step at which to access image, beginning with 0
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if the RGB bands are not included in
self.bands
- Return type:
New in version 0.2.
BigEarthNet¶
- class torchgeo.datasets.BigEarthNet(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
BigEarthNet dataset.
The BigEarthNet dataset is a dataset for multilabel remote sensing image scene classification.
Dataset features:
590,326 patches from 125 Sentinel-1 and Sentinel-2 tiles
Imagery from tiles in Europe between Jun 2017 - May 2018
12 spectral bands with 10-60 m per pixel resolution (base 120x120 px)
2 synthetic aperture radar bands (120x120 px)
43 or 19 scene classes from the 2018 CORINE Land Cover database (CLC 2018)
Dataset format:
images are composed of multiple single channel geotiffs
labels are multiclass, stored in a single json file per image
mapping of Sentinel-1 to Sentinel-2 patches are within Sentinel-1 json files
Sentinel-1 bands: (VV, VH)
Sentinel-2 bands: (B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)
All bands: (VV, VH, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)
Sentinel-2 bands are of different spatial resolutions and upsampled to 10m
Dataset classes (43):
Continuous urban fabric
Discontinuous urban fabric
Industrial or commercial units
Road and rail networks and associated land
Port areas
Airports
Mineral extraction sites
Dump sites
Construction sites
Green urban areas
Sport and leisure facilities
Non-irrigated arable land
Permanently irrigated land
Rice fields
Vineyards
Fruit trees and berry plantations
Olive groves
Pastures
Annual crops associated with permanent crops
Complex cultivation patterns
Land principally occupied by agriculture, with significant areas of natural vegetation
Agro-forestry areas
Broad-leaved forest
Coniferous forest
Mixed forest
Natural grassland
Moors and heathland
Sclerophyllous vegetation
Transitional woodland/shrub
Beaches, dunes, sands
Bare rock
Sparsely vegetated areas
Burnt areas
Inland marshes
Peatbogs
Salt marshes
Salines
Intertidal flats
Water courses
Water bodies
Coastal lagoons
Estuaries
Sea and ocean
Dataset classes (19):
Urban fabric
Industrial or commercial units
Arable land
Permanent crops
Pastures
Complex cultivation patterns
Land principally occupied by agriculture, with significant areas of natural vegetation
Agro-forestry areas
Broad-leaved forest
Coniferous forest
Mixed forest
Natural grassland and sparsely vegetated areas
Moors, heathland and sclerophyllous vegetation
Transitional woodland, shrub
Beaches, dunes, sands
Inland wetlands
Coastal wetlands
Inland waters
Marine waters
The source for the above dataset classes, their respective ordering, and 43-to-19-class mappings can be found here:
If you use this dataset in your research, please cite the following paper:
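Because the labels are multilabel, each sample's target is a multi-hot vector over the chosen class set. A plain-Python sketch of that encoding using names from the 19-class list above (this illustrates the idea, not torchgeo's exact encoding code):

```python
classes19 = [
    "Urban fabric",
    "Industrial or commercial units",
    "Arable land",
    # ... the remaining class names from the 19-class list above
]

def multi_hot(labels, classes):
    # 1 where the class is present in the sample's label set, else 0
    return [1 if name in labels else 0 for name in classes]

sample_labels = {"Urban fabric", "Arable land"}
print(multi_hot(sample_labels, classes19))  # [1, 0, 1]
```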
- __init__(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)[source]¶
Initialize a new BigEarthNet dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – train/val/test split to load
bands (str) – load Sentinel-1 bands, Sentinel-2 bands, or both; one of {s1, s2, all}
num_classes (int) – number of classes to load in target; one of {19, 43}
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- __getitem__(index)[source]¶
Return an index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
BioMassters¶
- class torchgeo.datasets.BioMassters(root='data', split='train', sensors=['S1', 'S2'], as_time_series=False)[source]¶
Bases:
NonGeoDataset
BioMassters Dataset for Aboveground Biomass prediction.
Dataset for Aboveground Biomass (AGB) prediction over Finnish forests, based on Sentinel-1 and Sentinel-2 data with corresponding target AGB masks generated by Light Detection and Ranging (LiDAR).
Dataset Format:
.tif files for Sentinel 1 and 2 data
.tif file for pixel wise AGB target mask
.csv files for metadata regarding features and targets
Dataset Features:
13,000 target AGB masks of size (256x256px)
12 months of data per target mask
Sentinel 1 and Sentinel 2 data for each location
Sentinel 1 available for every month
Sentinel 2 available for almost every month (missing for some months due to an ESA acquisition halt over the region during particular periods)
If you use this dataset in your research, please cite the following paper:
New in version 0.5.
- __init__(root='data', split='train', sensors=['S1', 'S2'], as_time_series=False)[source]¶
Initialize a new instance of BioMassters dataset.
If as_time_series=False (the default), each time step becomes its own sample, with the target shared across multiple samples.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – train or test split
sensors (Sequence[str]) – which sensors to consider for the sample, Sentinel 1 and/or Sentinel 2 (‘S1’, ‘S2’)
as_time_series (bool) – whether or not to return all available time-steps or just a single one for a given target location
- Raises:
AssertionError – if split or sensors is invalid
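With as_time_series=True, the monthly inputs for one target are stacked along a leading time dimension; otherwise each month is returned as its own sample. A toy sketch of that distinction, with nested lists standing in for (channels, height, width) arrays (illustrative only, not the torchgeo implementation):

```python
# Twelve hypothetical monthly inputs for one AGB target;
# toy 1x1 nested lists stand in for (C, H, W) tensors
months = [[[float(m)]] for m in range(12)]

as_time_series = True
if as_time_series:
    sample = months      # one sample carrying all 12 time steps
else:
    sample = months[0]   # each month would instead be its own sample

print(len(sample))  # 12 time steps
```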
- __getitem__(index)[source]¶
Return an index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
data and labels at that index
- Raises:
IndexError – if index is out of range of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
Cloud Cover Detection¶
- class torchgeo.datasets.CloudCoverDetection(root='data', split='train', bands=['B02', 'B03', 'B04', 'B08'], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
NonGeoDataset
Cloud Cover Detection Challenge dataset.
This training dataset was generated as part of a crowdsourcing competition on DrivenData.org, and was later validated by a team of expert annotators. See this website for dataset details.
The dataset consists of Sentinel-2 satellite imagery and corresponding cloudy labels stored as GeoTiffs. There are 22,728 chips in the training data, collected between 2018 and 2020.
Each chip has:
4 multi-spectral bands from Sentinel-2 L2A product. The four bands are [B02, B03, B04, B08] (refer to Sentinel-2 documentation for more information about the bands).
Label raster for the corresponding source tile representing a binary classification of whether each pixel is cloud or not.
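Since the label raster is binary, a per-chip cloud fraction can be computed directly from the mask. A minimal sketch using plain Python lists (real samples are torch.Tensors, but the arithmetic is the same):

```python
def cloud_fraction(mask):
    # mask: 2D list of 0/1 values, where 1 marks a cloud pixel
    flat = [v for row in mask for v in row]
    return sum(flat) / len(flat)

mask = [[0, 1], [1, 1]]
print(cloud_fraction(mask))  # 0.75
```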
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
New in version 0.4.
- __init__(root='data', split='train', bands=['B02', 'B03', 'B04', 'B08'], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new Cloud Cover Detection Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – train/val/test split to load
bands (Sequence[str]) – the subset of bands to load
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
but dataset is missing or checksum fails
- __len__()[source]¶
Return the number of items in the dataset.
- Returns:
length of the dataset
- Return type:
- __getitem__(index)[source]¶
Return a sample from the dataset.
- Parameters:
index (int) – index to return
- Returns:
data and label at given index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if dataset does not contain an RGB band
- Return type:
COWC¶
- class torchgeo.datasets.COWC(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
,ABC
Abstract base class for the COWC dataset.
The Cars Overhead With Context (COWC) data set is a large set of annotated cars from overhead. It is useful for training a device such as a deep neural network to learn to detect and/or count cars.
The dataset has the following attributes:
Data from overhead at 15 cm per pixel resolution at ground (all data is EO).
Data from six distinct locations: Toronto, Canada; Selwyn, New Zealand; Potsdam and Vaihingen, Germany; Columbus, Ohio and Utah, United States.
32,716 unique annotated cars. 58,247 unique negative examples.
Intentional selection of hard negative examples.
Established baseline for detection and counting tasks.
Extra testing scenes for use after validation.
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new COWC dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if
split
argument is invalid
RuntimeError – if
download=False
and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
Kenya Crop Type¶
- class torchgeo.datasets.CV4AKenyaCropType(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]¶
Bases:
NonGeoDataset
CV4A Kenya Crop Type dataset.
Used in a competition in the Computer Vision for Agriculture (CV4A) workshop at ICLR 2020. See this website for dataset details.
Consists of 4 tiles of Sentinel 2 imagery from 13 different points in time.
Each tile has:
13 multi-band observations throughout the growing season. Each observation includes 12 bands from Sentinel-2 L2A product, and a cloud probability layer. The twelve bands are [B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12] (refer to Sentinel-2 documentation for more information about the bands). The cloud probability layer is a product of the Sentinel-2 atmospheric correction algorithm (Sen2Cor) and provides an estimated cloud probability (0-100%) per pixel. All of the bands are mapped to a common 10 m spatial resolution grid.
A raster layer indicating the crop ID for the fields in the training set.
A raster layer indicating field IDs for the fields (both training and test sets). Fields with a crop ID 0 are the test fields.
There are 3,286 fields in the train set and 1,402 fields in the test set.
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __init__(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]¶
Initialize a new CV4A Kenya Crop Type Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
chip_size (int) – size of chips
stride (int) – spacing between chips, if less than chip_size, then there will be overlap between chips
bands (tuple[str, ...]) – the subset of bands to load
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
verbose (bool) – if True, print messages when new tiles are loaded
- Raises:
RuntimeError – if
download=False
but dataset is missing or checksum fails
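The relationship between chip_size and stride determines how many (possibly overlapping) chips each tile yields. A rough sketch of the count along one dimension (the dataset's internal chipping may handle tile edges differently):

```python
def chips_per_dim(tile_size: int, chip_size: int, stride: int) -> int:
    # number of chip positions along one image dimension
    return (tile_size - chip_size) // stride + 1

# a 3072-px tile with 256-px chips and 50% overlap (stride=128)
print(chips_per_dim(3072, 256, 128))  # 23
```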
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data, labels, field ids, and metadata at that index
- Return type:
- __len__()[source]¶
Return the number of chips in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, time_step=0, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
time_step (int) – time step at which to access image, beginning with 0
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
DeepGlobe Land Cover¶
- class torchgeo.datasets.DeepGlobeLandCover(root='data', split='train', transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
DeepGlobe Land Cover Classification Challenge dataset.
The DeepGlobe Land Cover Classification Challenge dataset offers high-resolution sub-meter satellite imagery for the task of semantic segmentation to detect areas of urban, agriculture, rangeland, forest, water, barren, and unknown land cover. It contains 1,146 satellite images of size 2448x2448 px in total, split into training/validation/test sets; the original dataset can be downloaded from Kaggle. However, we only use the training set of 803 images, since the original validation and test sets are not accompanied by labels. The dataset we use, with a custom train/test split, can be downloaded from Kaggle (created as part of the Computer Vision by Deep Learning (CS4245) course offered at TU Delft).
Dataset format:
images are RGB data
masks are RGB images with unique RGB values representing each class
Dataset classes:
Urban land
Agriculture land
Rangeland
Forest land
Water
Barren land
Unknown
File names for satellite images and the corresponding mask image are id_sat.jpg and id_mask.png, where id is an integer assigned to every image.
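Because the masks encode classes as RGB colors, training typically requires converting them to integer class indices first. A sketch using the color map commonly reported for DeepGlobe (assumed here; verify the colors against the actual data before relying on them):

```python
# RGB -> class index; colors as commonly reported for DeepGlobe (assumption)
COLORMAP = {
    (0, 255, 255): 0,    # Urban land
    (255, 255, 0): 1,    # Agriculture land
    (255, 0, 255): 2,    # Rangeland
    (0, 255, 0): 3,      # Forest land
    (0, 0, 255): 4,      # Water
    (255, 255, 255): 5,  # Barren land
    (0, 0, 0): 6,        # Unknown
}

def rgb_to_index(pixel):
    # look up the class index for a single (R, G, B) pixel
    return COLORMAP[tuple(pixel)]

print(rgb_to_index((0, 255, 0)))  # 3 (Forest land)
```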
If you use this dataset in your research, please cite the following paper:
New in version 0.3.
- __init__(root='data', split='train', transforms=None, checksum=False)[source]¶
Initialize a new DeepGlobeLandCover dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
DFC2022¶
- class torchgeo.datasets.DFC2022(root='data', split='train', transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
DFC2022 dataset.
The DFC2022 dataset is used as a benchmark dataset for the 2022 IEEE GRSS Data Fusion Contest and extends the MiniFrance dataset for semi-supervised semantic segmentation. The dataset consists of a train set containing labeled and unlabeled imagery and an unlabeled validation set. The dataset can be downloaded from the IEEEDataPort DFC2022 website.
Dataset features:
RGB aerial images at 0.5 m per pixel spatial resolution (~2,000x2,000 px)
DEMs at 1 m per pixel spatial resolution (~1,000x1,000 px)
Masks at 0.5 m per pixel spatial resolution (~2,000x2,000 px)
16 land use/land cover categories
Images collected from the IGN BD ORTHO database
DEMs collected from the IGN RGE ALTI database
Labels collected from the UrbanAtlas 2012 database
Data collected from 19 regions in France
Dataset format:
images are three-channel geotiffs
DEMs are single-channel geotiffs
masks are single-channel geotiffs with pixel values representing the class
Dataset classes:
No information
Urban fabric
Industrial, commercial, public, military, private and transport units
Mine, dump and construction sites
Artificial non-agricultural vegetated areas
Arable land (annual crops)
Permanent crops
Pastures
Complex and mixed cultivation patterns
Orchards at the fringe of urban classes
Forests
Herbaceous vegetation associations
Open spaces with little or no vegetation
Wetlands
Water
Clouds and Shadows
If you use this dataset in your research, please cite the following paper:
New in version 0.3.
- __init__(root='data', split='train', transforms=None, checksum=False)[source]¶
Initialize a new DFC2022 dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if
split
is invalid
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
ETCI2021 Flood Detection¶
- class torchgeo.datasets.ETCI2021(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
ETCI 2021 Flood Detection dataset.
The ETCI2021 dataset is a dataset for flood detection.
Dataset features:
33,405 VV & VH Sentinel-1 Synthetic Aperture Radar (SAR) images
2 binary masks per image representing water body & flood, respectively
2 polarization band images (VV, VH) of 3 RGB channels per band
3 RGB channels per band generated by the Hybrid Pluggable Processing Pipeline (hyp3)
Images with 5x20 m per pixel resolution (256x256 px) taken in Interferometric Wide Swath acquisition mode
Flood events from 5 different regions
Dataset format:
VV band three-channel png
VH band three-channel png
water body mask single-channel png where no water body = 0, water body = 255
flood mask single-channel png where no flood = 0, flood = 255
Dataset classes:
no flood/water
flood/water
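The png masks store values 0 and 255; converting them to the 0/1 class values listed above is a small preprocessing step. A plain-Python sketch:

```python
def binarize(mask_values):
    # water/flood masks are stored as 0 (absent) or 255 (present)
    return [1 if v == 255 else 0 for v in mask_values]

print(binarize([0, 255, 255, 0]))  # [0, 1, 1, 0]
```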
If you use this dataset in your research, please add the following to your acknowledgements section:
The authors would like to thank the NASA Earth Science Data Systems Program, NASA Digital Transformation AI/ML thrust, and IEEE GRSS for organizing the ETCI competition.
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new ETCI 2021 dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if
split
argument is invalid
RuntimeError – if
download=False
and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
EuroSAT¶
- class torchgeo.datasets.EuroSAT(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B08A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoClassificationDataset
EuroSAT dataset.
The EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consists of 10 target classes with a total of 27,000 labeled and geo-referenced images.
Dataset format:
rasters are 13-channel GeoTiffs
labels are values in the range [0,9]
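For RGB visualization of the 13-band rasters, the red, green, and blue channels correspond to Sentinel-2 bands B04, B03, and B02; their channel positions can be looked up from the band tuple in the class signature:

```python
# band order as it appears in the EuroSAT class signature
bands = ("B01", "B02", "B03", "B04", "B05", "B06", "B07",
         "B08", "B08A", "B09", "B10", "B11", "B12")
rgb_indices = tuple(bands.index(b) for b in ("B04", "B03", "B02"))
print(rgb_indices)  # (3, 2, 1)
```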
Dataset classes:
Industrial Buildings
Residential Buildings
Annual Crop
Permanent Crop
River
Sea and Lake
Herbaceous Vegetation
Highway
Pasture
Forest
This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:
If you use this dataset in your research, please cite the following papers:
- __init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B08A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶
Initialize a new EuroSAT dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
bands (Sequence[str]) – a sequence of band names to load
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if
split
argument is invalid
RuntimeError – if
download=False
and data is not found, or checksums don’t match
New in version 0.3: The bands parameter.
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if RGB bands are not found in dataset
- Return type:
New in version 0.2.
- class torchgeo.datasets.EuroSAT100(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B08A', 'B09', 'B10', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶
Bases:
EuroSAT
Subset of EuroSAT containing only 100 images.
Intended for tutorials and demonstrations, not for benchmarking.
Maintains the same file structure, classes, and train-val-test split. Each class has 10 images (6 train, 2 val, 2 test), for a total of 100 images.
New in version 0.5.
FAIR1M¶
- class torchgeo.datasets.FAIR1M(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
FAIR1M dataset.
The FAIR1M dataset is a dataset for remote sensing fine-grained oriented object detection.
Dataset features:
15,000+ images with 0.3-0.8 m per pixel resolution (1,000-10,000 px)
1 million object instances
5 object categories, 37 object sub-categories
three spectral bands - RGB
images taken by Gaofen satellites and Google Earth
Dataset format:
images are three-channel tiffs
labels are xml files with PASCAL VOC like annotations
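PASCAL VOC-style XML labels can be inspected with the standard library. A minimal sketch over a simplified, hypothetical annotation snippet (FAIR1M's real files also carry oriented-bounding-box points):

```python
import xml.etree.ElementTree as ET

# simplified, hypothetical annotation snippet
xml = """
<annotation>
  <object><name>Boeing737</name></object>
  <object><name>Small Car</name></object>
</annotation>
"""

root = ET.fromstring(xml)
# collect the class name of every annotated object
names = [obj.findtext("name") for obj in root.iter("object")]
print(names)  # ['Boeing737', 'Small Car']
```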
Dataset classes:
Passenger Ship
Motorboat
Fishing Boat
Tugboat
other-ship
Engineering Ship
Liquid Cargo Ship
Dry Cargo Ship
Warship
Small Car
Bus
Cargo Truck
Dump Truck
other-vehicle
Van
Trailer
Tractor
Excavator
Truck Tractor
Boeing737
Boeing747
Boeing777
Boeing787
ARJ21
C919
A220
A321
A330
A350
other-airplane
Baseball Field
Basketball Court
Football Field
Tennis Court
Roundabout
Intersection
Bridge
If you use this dataset in your research, please cite the following paper:
New in version 0.2.
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new FAIR1M dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if
split
argument is invalid
RuntimeError – if
download=False
and data is not found, or checksums don’t match
Changed in version 0.5: Added split and download parameters.
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
FireRisk¶
- class torchgeo.datasets.FireRisk(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoClassificationDataset
FireRisk dataset.
The FireRisk dataset is a dataset for remote sensing fire risk classification.
Dataset features:
91,872 images with 1 m per pixel resolution (320x320 px)
70,331 and 21,541 train and val images, respectively
three spectral bands - RGB
7 fire risk classes
images extracted from NAIP tiles
Dataset format:
images are three-channel pngs
Dataset classes:
high
low
moderate
non-burnable
very_high
very_low
water
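Folder-backed classification datasets such as those built on NonGeoClassificationDataset typically derive integer labels from class directory names in sorted order. With the classes above, the mapping would look like this (an assumption for illustration; verify against the dataset's classes attribute):

```python
classes = ["high", "low", "moderate", "non-burnable",
           "very_high", "very_low", "water"]
# folder-backed classification datasets usually label by sorted name order
label_of = {name: idx for idx, name in enumerate(sorted(classes))}
print(label_of["very_high"])  # 4
```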
If you use this dataset in your research, please cite the following paper:
New in version 0.5.
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new FireRisk dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “val”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if
split
argument is invalid
RuntimeError – if
download=False
but dataset is missing or checksum fails
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
Forest Damage¶
- class torchgeo.datasets.ForestDamage(root='data', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
Forest Damage dataset.
The ForestDamage dataset contains drone imagery that can be used for tree identification, as well as tree damage classification for larch trees.
Dataset features:
1543 images
101,878 tree annotations
a subset of 840 images contains 44,522 annotations about tree health (Healthy (H), Light Damage (LD), High Damage (HD)); all other images have “other” as damage level
Dataset format:
images are three-channel jpgs
annotations are in Pascal VOC XML format
Dataset Classes:
other
healthy
light damage
high damage
If the download fails or stalls, it is recommended to try azcopy as suggested here. It is expected that the downloaded data file with name
Data_Set_Larch_Casebearer
can be found in root.
If you use this dataset in your research, please use the following citation:
Swedish Forest Agency (2021): Forest Damages - Larch Casebearer 1.0. National Forest Data Lab. Dataset.
New in version 0.3.
- __init__(root='data', transforms=None, download=False, checksum=False)[source]¶
Initialize a new ForestDamage dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
GID-15¶
- class torchgeo.datasets.GID15(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
GID-15 dataset.
The GID-15 dataset is a dataset for semantic segmentation.
Dataset features:
images taken by the Gaofen-2 (GF-2) satellite over 60 cities in China
masks representing 15 semantic categories
three spectral bands - RGB
150 images with 3 m per pixel resolution (6800x7200 px)
Dataset format:
images are three-channel pngs
masks are single-channel pngs
colormapped masks are 3 channel tifs
Dataset classes:
background
industrial_land
urban_residential
rural_residential
traffic_land
paddy_field
irrigated_land
dry_cropland
garden_plot
arbor_woodland
shrub_land
natural_grassland
artificial_grassland
river
lake
pond
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new GID-15 dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if
split
argument is invalid
RuntimeError – if
download=False
and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
IDTReeS¶
- class torchgeo.datasets.IDTReeS(root='data', split='train', task='task1', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
IDTReeS dataset.
The IDTReeS dataset is a dataset for tree crown detection.
Dataset features:
RGB Image, Canopy Height Model (CHM), Hyperspectral Image (HSI), LiDAR Point Cloud
Remote sensing and field data generated by the National Ecological Observatory Network (NEON)
0.1-1 m resolution imagery
Task 1 - object detection (tree crown delineation)
Task 2 - object classification (species classification)
Train set contains 85 images
Test set (task 1) contains 153 images
Test set (task 2) contains 353 images and tree crown polygons
Dataset format:
optical - three-channel RGB 200x200 geotiff
canopy height model - one-channel 20x20 geotiff
hyperspectral - 369-channel 20x20 geotiff
point cloud - Nx3 LAS file (.las), some files contain RGB colors per point
shapefiles (.shp) containing polygons
csv file containing species labels and other metadata for each polygon
Dataset classes:
ACPE
ACRU
ACSA3
AMLA
BETUL
CAGL8
CATO6
FAGR
GOLA
LITU
LYLU3
MAGNO
NYBI
NYSY
OXYDE
PEPA37
PIEL
PIPA2
PINUS
PITA
PRSE2
QUAL
QUCO2
QUGE2
QUHE2
QULA2
QULA3
QUMO4
QUNI
QURU
QUERC
ROPS
TSCA
If you use this dataset in your research, please cite the following paper:
New in version 0.2.
- __init__(root='data', split='train', task='task1', transforms=None, download=False, checksum=False)[source]¶
Initialize a new IDTReeS dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
task (str) – ‘task1’ for detection, ‘task2’ for detection + classification (only relevant for split=’test’)
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
ImportError – if laspy is not installed
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None, hsi_indices=(0, 1, 2))[source]¶
Plot a sample from the dataset.
- Parameters:
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
- plot_las(index)[source]¶
Plot a sample point cloud at the index.
- Parameters:
index (int) – index to plot
- Returns:
pyvista.PolyData object. Run pyvista.plot(point_cloud, …) to display
- Raises:
ImportError – if pyvista is not installed
- Return type:
Changed in version 0.4: Ported from Open3D to PyVista, colormap parameter removed.
Inria Aerial Image Labeling¶
- class torchgeo.datasets.InriaAerialImageLabeling(root='data', split='train', transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
Inria Aerial Image Labeling Dataset.
The Inria Aerial Image Labeling dataset is a building detection dataset over dissimilar settlements ranging from densely populated areas to alpine towns. Refer to the dataset homepage to download the dataset.
Dataset features:
Coverage of 810 km2 (405 km2 for training and 405 km2 for testing)
Aerial orthorectified color imagery with a spatial resolution of 0.3 m
Number of images: 360 (train: 180, test: 180)
Train cities: Austin, Chicago, Kitsap, West Tyrol, Vienna
Test cities: Bellingham, Bloomington, Innsbruck, San Francisco, East Tyrol
Dataset format:
Imagery - RGB aerial GeoTIFFs of shape 5000 x 5000
Labels - RGB aerial GeoTIFFs of shape 5000 x 5000
If you use this dataset in your research, please cite the following paper:
New in version 0.3.
Changed in version 0.5: Added support for a val split.
- __init__(root='data', split='train', transforms=None, checksum=False)[source]¶
Initialize a new InriaAerialImageLabeling Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – train/val/test split
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version.
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split is invalid
RuntimeError – if dataset is missing
- __len__()[source]¶
Return the number of samples in the dataset.
- Returns:
length of the dataset
- Return type:
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
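Because the Inria tiles are 5000 x 5000 px, they are usually cut into smaller patches before training. A sketch of the corner arithmetic (hypothetical helper; the 512 px patch size and stride are assumptions, not part of the dataset):

```python
def patch_corners(size=5000, patch=512, stride=512):
    """Top-left corners of patches covering one square Inria tile.

    With stride 512 a 5000 px side leaves a remainder strip, so the last
    corner is clamped to size - patch; edge patches overlap slightly.
    """
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # cover the remainder at the tile edge
    return [(row, col) for row in starts for col in starts]
```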
LandCover.ai¶
- class torchgeo.datasets.LandCoverAI(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
LandCoverAIBase
,NonGeoDataset
LandCover.ai dataset.
See the abstract LandCoverAIBase class to find out more.
Note
This dataset requires the following additional library to be installed:
opencv-python to generate the train/val/test split
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new LandCover.ai dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
LEVIR-CD+¶
- class torchgeo.datasets.LEVIRCDPlus(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
LEVIR-CD+ dataset.
The LEVIR-CD+ dataset is a dataset for building change detection.
Dataset features:
image pairs of 20 different urban regions across Texas between 2002-2020
binary change masks representing building change
three spectral bands - RGB
985 image pairs with 50 cm per pixel resolution (~1024x1024 px)
Dataset format:
images are three-channel pngs
masks are single-channel pngs where no change = 0, change = 255
Dataset classes:
no change
change
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new LEVIR-CD+ dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
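Since the change masks store change as 255 and no change as 0, they are typically mapped to class indices {0, 1} before computing a loss. A pure-Python sketch of the mapping (with torch.Tensor masks the vectorized one-liner (mask > 0).long() achieves the same thing):

```python
def binarize_change_mask(mask):
    """Map a {0, 255}-valued change mask to {0, 1} class indices.

    Operates elementwise on a nested list here; with torch, use
    (mask > 0).long() instead for a vectorized equivalent.
    """
    return [[1 if v == 255 else 0 for v in row] for row in mask]
```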
LoveDA¶
- class torchgeo.datasets.LoveDA(root='data', split='train', scene=['urban', 'rural'], transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
LoveDA dataset.
The LoveDA dataset is a semantic segmentation dataset.
Dataset features:
2713 urban scene and 3274 rural scene HSR images, spatial resolution of 0.3m
image source is Google Earth platform
total of 166768 annotated objects from Nanjing, Changzhou and Wuhan cities
dataset comes with predefined train, validation, and test set
dataset differentiates between ‘rural’ and ‘urban’ images
Dataset format:
images are three-channel pngs with dimension 1024x1024
segmentation masks are single-channel pngs
Dataset classes:
background
building
road
water
barren
forest
agriculture
No-data regions are assigned 0 and should be ignored.
If you use this dataset in your research, please cite the following paper:
New in version 0.2.
- __init__(root='data', split='train', scene=['urban', 'rural'], transforms=None, download=False, checksum=False)[source]¶
Initialize a new LoveDA dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
scene (list[str]) – specify whether to load only ‘urban’, only ‘rural’ or both
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
AssertionError – if scene argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
image and mask at that index with image of dimension 3x1024x1024 and mask of dimension 1024x1024
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of dataset
- Return type:
- plot(sample, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
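Because no-data pixels are stored as 0 and the seven classes follow, a common preprocessing step shifts labels down by one so that the classes run 0–6 and no-data becomes an ignore index for the loss. A sketch (the 0 = no-data, 1–7 = classes layout is an assumption inferred from the description above; verify against the downloaded masks):

```python
def remap_loveda_mask(mask, ignore_index=-1):
    """Shift mask values so classes are 0..6 and no-data (raw 0) becomes
    ignore_index, matching e.g. CrossEntropyLoss(ignore_index=-1).

    Assumes raw values 0 (no-data) and 1..7 (the seven classes).
    """
    return [[ignore_index if v == 0 else v - 1 for v in row] for row in mask]
```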
MapInWild¶
- class torchgeo.datasets.MapInWild(root='data', modality=['mask', 'esa_wc', 'viirs', 's2_summer'], split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
MapInWild dataset.
The MapInWild dataset is curated for the task of pixel-level wilderness mapping. It is a multi-modal dataset comprising geodata acquired from different remote sensing sensors over 1018 locations: dual-pol Sentinel-1, four-season Sentinel-2 with 10 bands, the ESA WorldCover map, and the Visible Infrared Imaging Radiometer Suite (VIIRS) nighttime Day/Night band. The dataset consists of 8144 images of 1920 × 1920 pixels, weakly annotated from the World Database on Protected Areas (WDPA).
Dataset features:
1018 areas globally sampled from the WDPA
10-Band Sentinel-2
Dual-pol Sentinel-1
ESA WorldCover Land Cover
Visible Infrared Imaging Radiometer Suite NightTime Day/Night Band
If you use this dataset in your research, please cite the following paper:
New in version 0.5.
- __init__(root='data', modality=['mask', 'esa_wc', 'viirs', 's2_summer'], split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new MapInWild dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
modality (list[str]) – the modality to download. Choose from: “mask”, “esa_wc”, “viirs”, “s1”, “s2_temporal_subset”, “s2_[season]”.
split (str) – one of “train”, “validation”, or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample image-mask pair returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
Million-AID¶
- class torchgeo.datasets.MillionAID(root='data', task='multi-class', split='train', transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
Million-AID Dataset.
The MillionAID dataset consists of one million aerial images from Google Earth Engine. It offers either a multi-class learning task with 51 classes or a multi-label learning task with 73 different possible labels. For more details, please consult the accompanying paper.
Dataset features:
RGB aerial images with varying resolutions from 0.5 m to 153 m per pixel
images within classes can have different pixel dimensions
Dataset format:
images are three-channel jpg
If you use this dataset in your research, please cite the following paper:
New in version 0.3.
- __init__(root='data', task='multi-class', split='train', transforms=None, checksum=False)[source]¶
Initialize a new MillionAID dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
task (str) – type of task, either “multi-class” or “multi-label”
split (str) – train or test split
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if dataset is not found
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
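For the multi-label task, targets are usually encoded as a multi-hot vector over the 73 possible labels so they can feed a binary cross-entropy loss. A hypothetical encoding helper:

```python
def multi_hot(labels, num_classes=73):
    """Encode a list of class indices as a multi-hot vector of length
    num_classes (73 for the Million-AID multi-label task)."""
    vec = [0] * num_classes
    for c in labels:
        vec[c] = 1
    return vec
```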
NASA Marine Debris¶
- class torchgeo.datasets.NASAMarineDebris(root='data', transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]¶
Bases:
NonGeoDataset
NASA Marine Debris dataset.
The NASA Marine Debris dataset is a dataset for detection of floating marine debris in satellite imagery.
Dataset features:
707 patches with 3 m per pixel resolution (256x256 px)
three spectral bands - RGB
1 object class: marine_debris
images taken by Planet Labs PlanetScope satellites
imagery taken from 2016-2019 from coasts of Greece, Honduras, and Ghana
Dataset format:
images are three-channel geotiffs in uint8 format
labels are numpy files (.npy) containing bounding box (xyxy) coordinates
additional: images in jpg format and labels in geojson format
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
New in version 0.2.
- __init__(root='data', transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]¶
Initialize a new NASA Marine Debris Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
verbose (bool) – if True, print messages when new tiles are loaded
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and labels at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
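The .npy label files store boxes in xyxy order (per the format notes above); converting to xywh is a common step when a detector or plotting API expects width/height boxes. A minimal sketch:

```python
def xyxy_to_xywh(box):
    """Convert an (xmin, ymin, xmax, ymax) bounding box to
    (xmin, ymin, width, height)."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)
```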
OSCD¶
- class torchgeo.datasets.OSCD(root='data', split='train', bands='all', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
OSCD dataset.
The Onera Satellite Change Detection dataset addresses the issue of detecting changes between satellite images from different dates. Imagery comes from Sentinel-2, whose bands have varying resolutions.
Dataset format:
images are 13-channel tifs
masks are single-channel pngs where no change = 0, change = 255
Dataset classes:
no change
change
If you use this dataset in your research, please cite the following paper:
New in version 0.2.
- __init__(root='data', split='train', bands='all', transforms=None, download=False, checksum=False)[source]¶
Initialize a new OSCD dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
PASTIS¶
- class torchgeo.datasets.PASTIS(root='data', folds=(0, 1, 2, 3, 4), bands='s2', mode='semantic', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
PASTIS dataset.
The PASTIS dataset is a dataset for time-series panoptic segmentation of agricultural parcels.
Dataset features:
support for the original PASTIS and PASTIS-R versions of the dataset
2,433 time-series with 10 m per pixel resolution (128x128 px)
18 crop categories, 1 background category, 1 void category
semantic and instance annotations
3 Sentinel-1 Ascending bands
3 Sentinel-1 Descending bands
10 Sentinel-2 L2A multispectral bands
Dataset format:
time-series and annotations are in numpy format (.npy)
Dataset classes:
Background
Meadow
Soft Winter Wheat
Corn
Winter Barley
Winter Rapeseed
Spring Barley
Sunflower
Grapevine
Beet
Winter Triticale
Winter Durum Wheat
Fruits Vegetables Flowers
Potatoes
Leguminous Fodder
Soybeans
Orchard
Mixed Cereal
Sorghum
Void Label
If you use this dataset in your research, please cite the following papers:
New in version 0.5.
- __init__(root='data', folds=(0, 1, 2, 3, 4), bands='s2', mode='semantic', transforms=None, download=False, checksum=False)[source]¶
Initialize a new PASTIS dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
folds (Sequence[int]) – a sequence of integers from 0 to 4 specifying which of the five dataset folds to include
bands (str) – load Sentinel-1 ascending path data (s1a), Sentinel-1 descending path data (s1d), or Sentinel-2 data (s2)
mode (str) – load semantic (semantic) or instance (instance) annotations
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
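The folds parameter makes 5-fold cross-validation straightforward: hold out one fold for validation and one for testing, and train on the rest. A hypothetical helper for building the three fold tuples:

```python
def fold_splits(val_fold, test_fold, n_folds=5):
    """Return (train, val, test) fold tuples for PASTIS-style k-fold CV.

    Fold indices are 0-based, matching the folds parameter above.
    """
    assert val_fold != test_fold, 'validation and test folds must differ'
    train = tuple(f for f in range(n_folds) if f not in (val_fold, test_fold))
    return train, (val_fold,), (test_fold,)
```

For example, the training split would then be constructed with folds=fold_splits(3, 4)[0].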
PatternNet¶
- class torchgeo.datasets.PatternNet(root='data', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoClassificationDataset
PatternNet dataset.
The PatternNet dataset is a dataset for remote sensing scene classification and image retrieval.
Dataset features:
30,400 images with 6-50 cm per pixel resolution (256x256 px)
three spectral bands - RGB
38 scene classes, 800 images per class
Dataset format:
images are three-channel jpgs
Dataset classes:
airplane
baseball_field
basketball_court
beach
bridge
cemetery
chaparral
christmas_tree_farm
closed_road
coastal_mansion
crosswalk
dense_residential
ferry_terminal
football_field
forest
freeway
golf_course
harbor
intersection
mobile_home_park
nursing_home
oil_gas_field
oil_well
overpass
parking_lot
parking_space
railway
river
runway
runway_marking
shipping_yard
solar_panel
sparse_residential
storage_tank
swimming_pool
tennis_court
transformer_station
wastewater_treatment_plant
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', transforms=None, download=False, checksum=False)[source]¶
Initialize a new PatternNet dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
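PatternNet was designed with image retrieval in mind: embed each image with a trained network, then rank the gallery by cosine similarity to the query embedding. The ranking step in pure Python (the embeddings themselves would come from whatever model you train; this only sketches the similarity search):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_gallery(query, gallery):
    """Indices of gallery embeddings, most similar to the query first."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -scores[i])
```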
Potsdam¶
- class torchgeo.datasets.Potsdam2D(root='data', split='train', transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
Potsdam 2D Semantic Segmentation dataset.
The Potsdam dataset is a dataset for urban semantic segmentation used in the 2D Semantic Labeling Contest - Potsdam. This dataset uses the “4_Ortho_RGBIR.zip” and “5_Labels_all.zip” files to create the train/test sets used in the challenge. The dataset can be requested at the challenge homepage. Note that the server contains additional data for 3D Semantic Labeling, which is currently not supported.
Dataset format:
images are 4-channel geotiffs
masks are 3-channel geotiffs with unique RGB values representing the class
Dataset classes:
Clutter/background
Impervious surfaces
Building
Low Vegetation
Tree
Car
If you use this dataset in your research, please cite the following paper:
New in version 0.2.
- __init__(root='data', split='train', transforms=None, checksum=False)[source]¶
Initialize a new Potsdam dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
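Since the masks encode classes as unique RGB values, training usually starts by mapping each color to an integer class index. A sketch; the colors below follow the commonly published ISPRS color scheme and should be treated as assumptions to verify against the downloaded labels:

```python
# Assumed ISPRS Potsdam color scheme (verify against your copy of the labels);
# class indices follow the order listed in the documentation above.
POTSDAM_COLORMAP = {
    (255, 0, 0): 0,      # clutter/background
    (255, 255, 255): 1,  # impervious surfaces
    (0, 0, 255): 2,      # building
    (0, 255, 255): 3,    # low vegetation
    (0, 255, 0): 4,      # tree
    (255, 255, 0): 5,    # car
}

def rgb_to_class(pixel, colormap=POTSDAM_COLORMAP):
    """Map one RGB mask pixel to its integer class index."""
    return colormap[tuple(pixel)]
```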
ReforesTree¶
- class torchgeo.datasets.ReforesTree(root='data', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
ReforesTree dataset.
The ReforesTree dataset contains drone imagery that can be used for tree crown detection, tree species classification and Aboveground Biomass (AGB) estimation.
Dataset features:
100 high resolution RGB drone images at 2 cm/pixel of size 4,000 x 4,000 px
more than 4,600 tree crown box annotations
tree crowns matched with field measurements of diameter at breast height (DBH), and computed AGB and carbon values
Dataset format:
images are three-channel pngs
annotations are csv files
Dataset Classes:
other
banana
cacao
citrus
fruit
timber
If you use this dataset in your research, please cite the following paper:
New in version 0.3.
- __init__(root='data', transforms=None, download=False, checksum=False)[source]¶
Initialize a new ReforesTree dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if download=False and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
RESISC45¶
- class torchgeo.datasets.RESISC45(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoClassificationDataset
NWPU-RESISC45 dataset.
The RESISC45 dataset is a dataset for remote sensing image scene classification.
Dataset features:
31,500 images with 0.2-30 m per pixel resolution (256x256 px)
three spectral bands - RGB
45 scene classes, 700 images per class
images extracted from Google Earth from over 100 countries
image conditions with high variability (resolution, weather, illumination)
Dataset format:
images are three-channel jpgs
Dataset classes:
airplane
airport
baseball_diamond
basketball_court
beach
bridge
chaparral
church
circular_farmland
cloud
commercial_area
dense_residential
desert
forest
freeway
golf_course
ground_track_field
harbor
industrial_area
intersection
island
lake
meadow
medium_residential
mobile_home_park
mountain
overpass
palace
parking_lot
railway
railway_station
rectangular_farmland
river
roundabout
runway
sea_ice
ship
snowberg
sparse_residential
stadium
storage_tank
tennis_court
terrace
thermal_power_station
wetland
This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new RESISC45 dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
Rwanda Field Boundary¶
- class torchgeo.datasets.RwandaFieldBoundary(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04'), transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
NonGeoDataset
Rwanda Field Boundary Competition dataset.
This dataset contains field boundaries for smallholder farms in eastern Rwanda. The NASA Harvest program funded a team of annotators from TaQadam to label Planet imagery for the 2021 growing season for the purpose of conducting the Rwanda Field Boundary Detection Challenge. The dataset includes rasterized labeled field boundaries and time-series satellite imagery from Planet’s NICFI program. Planet’s basemap imagery is provided for six months (March, April, August, October, November, and December). Note that only fields large enough to be differentiated in the PlanetScope imagery, and fully contained within the chips, were labeled. The paired dataset is provided in 256x256 chips for a total of 70 tiles covering 1532 individual fields.
The labels are provided as binary semantic segmentation labels:
No field-boundary
Field-boundary
If you use this dataset in your research, please cite the following:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
New in version 0.5.
- __init__(root='data', split='train', bands=('B01', 'B02', 'B03', 'B04'), transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new RwandaFieldBoundary instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if download=False but dataset is missing or checksum fails, or if download=True and api_key=None
- __getitem__(index)[source]¶
Return the sample at the given index.
- Parameters:
index (int) – index to return
- Returns:
a dict containing image, mask, transform, crs, and metadata at index.
- Return type:
- __len__()[source]¶
Return the number of chips in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, time_step=0, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
time_step (int) – time step at which to access image, beginning with 0
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if the RGB bands are not included in self.bands
- Return type:
Seasonal Contrast¶
- class torchgeo.datasets.SeasonalContrastS2(root='data', version='100k', seasons=1, bands=['B4', 'B3', 'B2'], transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
Sentinel 2 imagery from the Seasonal Contrast paper.
The Seasonal Contrast imagery dataset contains Sentinel 2 imagery patches sampled from different points in time around the 10k most populated cities on Earth.
Dataset features:
Two versions: 100K and 1M patches
12 band Sentinel 2 imagery from 5 points in time at each location
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', version='100k', seasons=1, bands=['B4', 'B3', 'B2'], transforms=None, download=False, checksum=False)[source]¶
Initialize a new SeasonalContrastS2 instance.
New in version 0.5: The seasons parameter.
- Parameters:
root (str) – root directory where dataset can be found
version (str) – one of “100k” or “1m” for the version of the dataset to use
seasons (int) – number of seasonal patches to sample per location, 1–5
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if version argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
sample with an “image” in SCxHxW format where S is the number of seasons
- Return type:
Changed in version 0.5: Image shape changed from 5xCxHxW to SCxHxW
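The SCxHxW layout described in the version 0.5 change above can be unpacked back into per-season images. A minimal sketch with NumPy (the season count, band count, and patch size here are illustrative assumptions, not the dataset's actual values):

```python
import numpy as np

# Illustrative shapes only: S seasons of C-band imagery stacked along
# the channel axis, as described in the version 0.5 change above.
S, C, H, W = 2, 3, 264, 264
image = np.zeros((S * C, H, W))  # shape of the returned "image": SCxHxW

# Recover the per-season view: SxCxHxW
per_season = image.reshape(S, C, H, W)
```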
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if the RGB bands are not included in self.bands or the sample contains a “prediction” key
- Return type:
New in version 0.2.
SeasoNet¶
- class torchgeo.datasets.SeasoNet(root='data', split='train', seasons={'Fall', 'Snow', 'Spring', 'Summer', 'Winter'}, bands=('10m_RGB', '10m_IR', '20m', '60m'), grids=[1, 2], concat_seasons=1, transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
SeasoNet Semantic Segmentation dataset.
The SeasoNet dataset consists of 1,759,830 multi-spectral Sentinel-2 image patches, taken from 519,547 unique locations, covering the whole surface area of Germany. Annotations are provided in the form of pixel-level land cover and land usage segmentation masks from the German land cover model LBM-DE2018 with land cover classes based on the CORINE Land Cover database (CLC) 2018. The set is split into two overlapping grids of roughly 880,000 samples each, shifted by half the patch size in both dimensions. Within each grid, the images themselves do not overlap.
Dataset format:
images are 16-bit GeoTiffs, split into separate files based on resolution
images include 12 spectral bands with 10, 20 and 60 m per pixel resolutions
masks are single-channel 8-bit GeoTiffs
Dataset classes:
Continuous urban fabric
Discontinuous urban fabric
Industrial or commercial units
Road and rail networks and associated land
Port areas
Airports
Mineral extraction sites
Dump sites
Construction sites
Green urban areas
Sport and leisure facilities
Non-irrigated arable land
Vineyards
Fruit trees and berry plantations
Pastures
Broad-leaved forest
Coniferous forest
Mixed forest
Natural grasslands
Moors and heathland
Transitional woodland/shrub
Beaches, dunes, sands
Bare rock
Sparsely vegetated areas
Inland marshes
Peat bogs
Salt marshes
Intertidal flats
Water courses
Water bodies
Coastal lagoons
Estuaries
Sea and ocean
If you use this dataset in your research, please cite the following paper:
New in version 0.5.
- __init__(root='data', split='train', seasons={'Fall', 'Snow', 'Spring', 'Summer', 'Winter'}, bands=('10m_RGB', '10m_IR', '20m', '60m'), grids=[1, 2], concat_seasons=1, transforms=None, download=False, checksum=False)[source]¶
Initialize a new SeasoNet dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val” or “test”
seasons (Collection[str]) – list of seasons to load
grids (Iterable[int]) – which of the overlapping grids to load
concat_seasons (int) – number of seasonal images to return per sample. If 1, each seasonal image is returned as its own sample; otherwise, seasonal images are randomly picked from the seasons specified in seasons and returned as stacked tensors
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
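The concat_seasons stacking described above can be sketched with NumPy (the band count, patch size, and season count here are assumptions for illustration):

```python
import numpy as np

# Hypothetical: concat_seasons=2 seasonal images of 12 bands each,
# concatenated along the channel axis into one SCxHxW sample.
concat_seasons = 2
seasonal_images = [np.zeros((12, 120, 120)) for _ in range(concat_seasons)]
sample_image = np.concatenate(seasonal_images, axis=0)
```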
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
sample at that index containing the image with shape SCxHxW and the mask with shape HxW, where
S = self.concat_seasons
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, show_legend=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
show_legend (bool) – flag indicating whether to show a legend for the segmentation masks
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – If bands does not contain all RGB bands.
- Return type:
SEN12MS¶
- class torchgeo.datasets.SEN12MS(root='data', split='train', bands=('VV', 'VH', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
SEN12MS dataset.
The SEN12MS dataset contains 180,662 patch triplets of corresponding Sentinel-1 dual-pol SAR data, Sentinel-2 multi-spectral images, and MODIS-derived land cover maps. The patches are distributed across the land masses of the Earth and spread over all four meteorological seasons. This is reflected by the dataset structure. All patches are provided in the form of 16-bit GeoTiffs containing the following specific information:
Sentinel-1 SAR: 2 channels corresponding to sigma nought backscatter values in dB scale for VV and VH polarization.
Sentinel-2 Multi-Spectral: 13 channels corresponding to the 13 spectral bands (B1, B2, B3, B4, B5, B6, B7, B8, B8a, B9, B10, B11, B12).
MODIS Land Cover: 4 channels corresponding to IGBP, LCCS Land Cover, LCCS Land Use, and LCCS Surface Hydrology layers.
If you use this dataset in your research, please cite the following paper:
Note
This dataset can be automatically downloaded using the following bash script:
for season in 1158_spring 1868_summer 1970_fall 2017_winter
do
    for source in lc s1 s2
    do
        wget "ftp://m1474000:m1474000@dataserv.ub.tum.de/ROIs${season}_${source}.tar.gz"
        tar xvzf "ROIs${season}_${source}.tar.gz"
    done
done

for split in train test
do
    wget "https://raw.githubusercontent.com/schmitt-muc/SEN12MS/3a41236a28d08d253ebe2fa1a081e5e32aa7eab4/splits/${split}_list.txt"
done
or manually downloaded from https://dataserv.ub.tum.de/s/m1474000 and https://github.com/schmitt-muc/SEN12MS/tree/master/splits. This download will likely take several hours.
- __init__(root='data', split='train', bands=('VV', 'VH', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'), transforms=None, checksum=False)[source]¶
Initialize a new SEN12MS dataset instance.
The bands argument allows for subsetting the bands returned by the dataset. Integers in bands index into a stack of Sentinel 1 and Sentinel 2 imagery. Indices 0 and 1 correspond to the Sentinel 1 imagery, while indices 2 through 14 correspond to the Sentinel 2 imagery.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
bands (Sequence[str]) – a sequence of band indices to use where the indices correspond to the array index of combined Sentinel 1 and Sentinel 2
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
RuntimeError – if data is not found in root, or checksums don’t match
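The band-to-index mapping described above can be illustrated with a small sketch. band_indices is a hypothetical helper written for this example, not part of the torchgeo API; the band order follows the default bands argument above.

```python
# The combined Sentinel-1 + Sentinel-2 band stack, in the order given
# by the default `bands` argument above.
ALL_BANDS = (
    "VV", "VH",                                      # Sentinel-1 (indices 0-1)
    "B01", "B02", "B03", "B04", "B05", "B06", "B07",
    "B08", "B8A", "B09", "B10", "B11", "B12",        # Sentinel-2 (indices 2-14)
)

def band_indices(bands):
    """Hypothetical helper: map band names to indices in the combined stack."""
    return [ALL_BANDS.index(b) for b in bands]
```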
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
SKIPP’D¶
- class torchgeo.datasets.SKIPPD(root='data', split='trainval', task='nowcast', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
SKy Images and Photovoltaic Power Dataset (SKIPP’D).
The SKIPP’D dataset contains ground-based fish-eye photos of the sky for solar forecasting tasks.
Dataset Format:
.hdf5 file containing images and labels
.npy files with corresponding datetime timestamps
Dataset Features:
fish-eye RGB images (64x64px)
power output measurements from 30-kW rooftop PV array
1-min interval across 3 years (2017-2019)
Nowcast task:
349,372 images under the split key trainval
14,003 images under the split key test
Forecast task:
130,412 images under the split key trainval
2,462 images under the split key test
consists of a concatenated RGB time-series of 16 time-steps
If you use this dataset in your research, please cite:
New in version 0.5.
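The forecast task's concatenated RGB time series can be pictured with a short NumPy sketch (channel-axis stacking is assumed from the "16 time-steps" description above; shapes are illustrative):

```python
import numpy as np

# 16 time steps of 64x64 RGB sky images, concatenated channel-wise
# into a single 48-channel array.
T = 16
frames = [np.zeros((3, 64, 64)) for _ in range(T)]
stacked = np.concatenate(frames, axis=0)

# Recover the per-frame view if needed: TxCxHxW
per_frame = stacked.reshape(T, 3, 64, 64)
```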
- __init__(root='data', split='trainval', task='nowcast', transforms=None, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “trainval” or “test”
task (str) – one of “nowcast” or “forecast”
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)
- Raises:
AssertionError – if task or split is invalid
ImportError – if h5py is not installed
RuntimeError – if download=False but dataset is missing or checksum fails
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
So2Sat¶
- class torchgeo.datasets.So2Sat(root='data', version='2', split='train', bands=('S1_B1', 'S1_B2', 'S1_B3', 'S1_B4', 'S1_B5', 'S1_B6', 'S1_B7', 'S1_B8', 'S2_B02', 'S2_B03', 'S2_B04', 'S2_B05', 'S2_B06', 'S2_B07', 'S2_B08', 'S2_B8A', 'S2_B11', 'S2_B12'), transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
So2Sat dataset.
The So2Sat dataset consists of corresponding synthetic aperture radar and multispectral optical image data acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and a corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world, and comes with a variety of different splits.
This implementation covers the 2nd and 3rd versions of the dataset as described in the author’s github repository: https://github.com/zhu-xlab/So2Sat-LCZ42.
The different versions are as follows:
Version 2: This version contains imagery from 52 cities and is split into train/val/test as follows:
Training: 42 cities around the world
Validation: western half of 10 other cities covering 10 cultural zones
Testing: eastern half of the 10 other cities
Version 3: A version of the dataset with 3 different train/test splits, as follows:
Random split: every city 80% training / 20% testing (randomly sampled)
Block split: every city is split in a geospatial 80%/20%-manner
Cultural 10: 10 cities from different cultural zones are held back for testing purposes
Dataset classes:
Compact high rise
Compact mid rise
Compact low rise
Open high rise
Open mid rise
Open low rise
Lightweight low rise
Large low rise
Sparsely built
Heavy industry
Dense trees
Scattered trees
Bush, scrub
Low plants
Bare rock or paved
Bare soil or sand
Water
If you use this dataset in your research, please cite the following paper:
Note
The version 2 dataset can be automatically downloaded using the following bash script:
for split in training validation testing
do
    wget ftp://m1483140:m1483140@dataserv.ub.tum.de/$split.h5
done
or manually downloaded from https://dataserv.ub.tum.de/index.php/s/m1483140. This download will likely take several hours.
The version 3 datasets can be downloaded using the following bash script:
for version in random block culture_10
do
    for split in training testing
    do
        wget -P $version/ ftp://m1613658:m1613658@dataserv.ub.tum.de/$version/$split.h5
    done
done
or manually downloaded from https://mediatum.ub.tum.de/1613658.
- __init__(root='data', version='2', split='train', bands=('S1_B1', 'S1_B2', 'S1_B3', 'S1_B4', 'S1_B5', 'S1_B6', 'S1_B7', 'S1_B8', 'S2_B02', 'S2_B03', 'S2_B04', 'S2_B05', 'S2_B06', 'S2_B07', 'S2_B08', 'S2_B8A', 'S2_B11', 'S2_B12'), transforms=None, checksum=False)[source]¶
Initialize a new So2Sat dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
version (str) – one of “2”, “3_random”, “3_block”, or “3_culture_10”
split (str) – one of “train”, “validation”, or “test”
bands (Sequence[str]) – a sequence of band names to use where the indices correspond to the array index of combined Sentinel 1 and Sentinel 2
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
RuntimeError – if data is not found in root, or checksums don’t match
New in version 0.3: The bands parameter.
New in version 0.5: The version parameter.
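The default band tuple above interleaves Sentinel-1 and Sentinel-2 names; selecting one sensor's subset by name prefix can be sketched as follows (sensor_bands is a hypothetical helper for this example, and the prefix convention is taken from the default bands argument above):

```python
# Default band names from the So2Sat constructor above.
DEFAULT_BANDS = (
    "S1_B1", "S1_B2", "S1_B3", "S1_B4", "S1_B5", "S1_B6", "S1_B7", "S1_B8",
    "S2_B02", "S2_B03", "S2_B04", "S2_B05", "S2_B06", "S2_B07", "S2_B08",
    "S2_B8A", "S2_B11", "S2_B12",
)

def sensor_bands(prefix):
    """Hypothetical helper: select one sensor's bands by name prefix."""
    return tuple(b for b in DEFAULT_BANDS if b.startswith(prefix))
```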
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
ValueError – if RGB bands are not found in dataset
- Return type:
New in version 0.2.
SpaceNet¶
- class torchgeo.datasets.SpaceNet(root, image, collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
NonGeoDataset
,ABC
Abstract base class for the SpaceNet datasets.
The SpaceNet datasets are a set of datasets that altogether contain >11M building footprints and ~20,000 km of road labels mapped over high-resolution satellite imagery obtained from a variety of sensors such as Worldview-2, Worldview-3, and Dove.
Note
The SpaceNet datasets require the following additional library to be installed:
- radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __init__(root, image, collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
image (str) – image selection
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
but dataset is missing
- __len__()[source]¶
Return the number of samples in the dataset.
- Returns:
length of the dataset
- Return type:
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
- class torchgeo.datasets.SpaceNet1(root='data', image='rgb', transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
SpaceNet
SpaceNet 1: Building Detection v1 Dataset.
SpaceNet 1 is a dataset of building footprints over the city of Rio de Janeiro.
Dataset features:
No. of images: 6940 (8 Band) + 6940 (RGB)
No. of polygons: 382,534 building labels
Area Coverage: 2544 sq km
GSD: 1 m (8 band), 50 cm (rgb)
Chip size: 101 x 110 (8 band), 406 x 438 (rgb)
Dataset format:
Imagery - Worldview-2 GeoTIFFs
8Band.tif (Multispectral)
RGB.tif (Pansharpened RGB)
Labels - GeoJSON
labels.geojson
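The GeoJSON label format above can be read with the standard library. A minimal sketch (the file contents here are a fabricated stand-in showing the FeatureCollection structure, not real SpaceNet data):

```python
import json

# A tiny stand-in for labels.geojson: one building footprint polygon.
raw = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
     "properties": {}}
  ]
}"""

labels = json.loads(raw)
num_footprints = len(labels["features"])
```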
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', image='rgb', transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 1 Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
image (str) – image selection which must be “rgb” or “8band”
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
but dataset is missing
- class torchgeo.datasets.SpaceNet2(root='data', image='PS-RGB', collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
SpaceNet
SpaceNet 2: Building Detection v2 Dataset.
SpaceNet 2 is a dataset of building footprints over the cities of Las Vegas, Paris, Shanghai and Khartoum.
Collection features:
AOI | Area (km²) | # Images | # Buildings
Las Vegas | 216 | 3850 | 151,367
Paris | 1030 | 1148 | 23,816
Shanghai | 1000 | 4582 | 92,015
Khartoum | 765 | 1012 | 35,503
Imagery features:
| PAN | MS | PS-MS | PS-RGB
GSD (m) | 0.31 | 1.24 | 0.30 | 0.30
Chip size (px) | 650 x 650 | 162 x 162 | 650 x 650 | 650 x 650
Dataset format:
Imagery - Worldview-3 GeoTIFFs
PAN.tif (Panchromatic)
MS.tif (Multispectral)
PS-MS (Pansharpened Multispectral)
PS-RGB (Pansharpened RGB)
Labels - GeoJSON
label.geojson
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', image='PS-RGB', collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 2 Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
image (str) – image selection which must be in [“MS”, “PAN”, “PS-MS”, “PS-RGB”]
collections (list[str]) – collection selection which must be a subset of: [sn2_AOI_2_Vegas, sn2_AOI_3_Paris, sn2_AOI_4_Shanghai, sn2_AOI_5_Khartoum]. If unspecified, all collections will be used.
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
but dataset is missing
- class torchgeo.datasets.SpaceNet3(root='data', image='PS-RGB', speed_mask=False, collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
SpaceNet
SpaceNet 3: Road Network Detection.
SpaceNet 3 is a dataset of road networks over the cities of Las Vegas, Paris, Shanghai, and Khartoum.
Collection features:
AOI | Area (km²) | # Images | # Road Network Labels (km)
Vegas | 216 | 854 | 3685
Paris | 1030 | 257 | 425
Shanghai | 1000 | 1028 | 3537
Khartoum | 765 | 283 | 1030
Imagery features:
| PAN | MS | PS-MS | PS-RGB
GSD (m) | 0.31 | 1.24 | 0.30 | 0.30
Chip size (px) | 1300 x 1300 | 325 x 325 | 1300 x 1300 | 1300 x 1300
Dataset format:
Imagery - Worldview-3 GeoTIFFs
PAN.tif (Panchromatic)
MS.tif (Multispectral)
PS-MS (Pansharpened Multispectral)
PS-RGB (Pansharpened RGB)
Labels - GeoJSON
labels.geojson
If you use this dataset in your research, please cite the following paper:
New in version 0.3.
- __init__(root='data', image='PS-RGB', speed_mask=False, collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 3 Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
image (str) – image selection which must be in [“MS”, “PAN”, “PS-MS”, “PS-RGB”]
speed_mask (Optional[bool]) – use multi-class speed mask (created by binning roads at 10 mph increments) as label if true, else use binary mask
collections (list[str]) – collection selection which must be a subset of: [sn3_AOI_2_Vegas, sn3_AOI_3_Paris, sn3_AOI_4_Shanghai, sn3_AOI_5_Khartoum]. If unspecified, all collections will be used.
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
but dataset is missing
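The 10 mph binning behind speed_mask can be sketched as follows (the exact bin edges and class numbering are assumptions for illustration, not the dataset's actual encoding):

```python
def speed_to_class(speed_mph):
    """Hypothetical: bin a road speed at 10 mph increments.

    Class 0 is reserved for background (no road) in this sketch, so
    0-9 mph maps to class 1, 10-19 mph to class 2, and so on.
    """
    return int(speed_mph // 10) + 1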
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
SpaceNet.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
- class torchgeo.datasets.SpaceNet4(root='data', image='PS-RGBNIR', angles=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
SpaceNet
SpaceNet 4: Off-Nadir Buildings Dataset.
SpaceNet 4 is a dataset of WorldView-2 imagery captured at 27 varying off-nadir angles, together with associated building footprints, over the city of Atlanta. The off-nadir angles range from 7 to 54 degrees.
Dataset features:
No. of chipped images: 28,728 (PAN/MS/PS-RGBNIR)
No. of label files: 1064
No. of building footprints: >120,000
Area Coverage: 665 sq km
Chip size: 225 x 225 (MS), 900 x 900 (PAN/PS-RGBNIR)
Dataset format:
Imagery - Worldview-2 GeoTIFFs
PAN.tif (Panchromatic)
MS.tif (Multispectral)
PS-RGBNIR (Pansharpened RGBNIR)
Labels - GeoJSON
labels.geojson
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', image='PS-RGBNIR', angles=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 4 Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
image (str) – image selection which must be in [“MS”, “PAN”, “PS-RGBNIR”]
angles (list[str]) – angle selection which must be in [“nadir”, “off-nadir”, “very-off-nadir”]
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
but dataset is missing
- class torchgeo.datasets.SpaceNet5(root='data', image='PS-RGB', speed_mask=False, collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
SpaceNet3
SpaceNet 5: Automated Road Network Extraction and Route Travel Time Estimation.
SpaceNet 5 is a dataset of road networks over the cities of Moscow, Mumbai and San Juan (unavailable).
Collection features:
AOI | Area (km²) | # Images | # Road Network Labels (km)
Moscow | 1353 | 1353 | 3066
Mumbai | 1021 | 1016 | 1951
Imagery features:
| PAN | MS | PS-MS | PS-RGB
GSD (m) | 0.31 | 1.24 | 0.30 | 0.30
Chip size (px) | 1300 x 1300 | 325 x 325 | 1300 x 1300 | 1300 x 1300
Dataset format:
Imagery - Worldview-3 GeoTIFFs
PAN.tif (Panchromatic)
MS.tif (Multispectral)
PS-MS (Pansharpened Multispectral)
PS-RGB (Pansharpened RGB)
Labels - GeoJSON
labels.geojson
If you use this dataset in your research, please use the following citation:
The SpaceNet Partners, “SpaceNet5: Automated Road Network Extraction and Route Travel Time Estimation from Satellite Imagery”, https://spacenet.ai/sn5-challenge/
New in version 0.2.
- __init__(root='data', image='PS-RGB', speed_mask=False, collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 5 Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
image (str) – image selection which must be in [“MS”, “PAN”, “PS-MS”, “PS-RGB”]
speed_mask (Optional[bool]) – use multi-class speed mask (created by binning roads at 10 mph increments) as label if true, else use binary mask
collections (list[str]) – collection selection which must be a subset of: [sn5_AOI_7_Moscow, sn5_AOI_8_Mumbai]. If unspecified, all collections will be used.
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
but dataset is missing
- class torchgeo.datasets.SpaceNet6(root='data', image='PS-RGB', transforms=None, download=False, api_key=None)[source]¶
Bases:
SpaceNet
SpaceNet 6: Multi-Sensor All-Weather Mapping.
SpaceNet 6 is a dataset of optical and SAR imagery over the city of Rotterdam.
Collection features:
AOI | Area (km²) | # Images | # Building Footprint Labels
Rotterdam | 120 | 3401 | 48,000
Imagery features:
| PAN | RGBNIR | PS-RGB | PS-RGBNIR | SAR-Intensity
GSD (m) | 0.5 | 2.0 | 0.5 | 0.5 | 0.5
Chip size (px) | 900 x 900 | 450 x 450 | 900 x 900 | 900 x 900 | 900 x 900
Dataset format:
Imagery - GeoTIFFs from Worldview-2 (optical) and Capella Space (SAR)
PAN.tif (Panchromatic)
RGBNIR.tif (Multispectral)
PS-RGB (Pansharpened RGB)
PS-RGBNIR (Pansharpened RGBNIR)
SAR-Intensity (SAR Intensity)
Labels - GeoJSON
labels.geojson
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
New in version 0.4.
- __init__(root='data', image='PS-RGB', transforms=None, download=False, api_key=None)[source]¶
Initialize a new SpaceNet 6 Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
image (str) – image selection which must be in [“PAN”, “RGBNIR”, “PS-RGB”, “PS-RGBNIR”, “SAR-Intensity”]
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
- Raises:
RuntimeError – if
download=False
but dataset is missing
- class torchgeo.datasets.SpaceNet7(root='data', split='train', transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
SpaceNet
SpaceNet 7: Multi-Temporal Urban Development Challenge.
SpaceNet 7 is a dataset consisting of medium-resolution (4.0 m) satellite imagery mosaics acquired from Planet Labs’ Dove constellation between 2017 and 2020. It includes ≈24 images (one per month) covering >100 unique geographies, and comprises >40,000 km² of imagery with exhaustive polygon labels of the building footprints therein, totaling over 11M individual annotations.
Dataset features:
No. of train samples: 1423
No. of test samples: 466
No. of building footprints: 11,080,000
Area Coverage: 41,000 sq km
Chip size: 1023 x 1023
GSD: ~4m
Dataset format:
Imagery - Planet Dove GeoTIFF
mosaic.tif
Labels - GeoJSON
labels.geojson
If you use this dataset in your research, please cite the following paper:
New in version 0.2.
- __init__(root='data', split='train', transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 7 Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – split selection which must be in [“train”, “test”]
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
but dataset is missing
SSL4EO¶
- class torchgeo.datasets.SSL4EO[source]¶
Bases:
NonGeoDataset
Base class for all SSL4EO datasets.
Self-Supervised Learning for Earth Observation (SSL4EO) is a collection of large-scale multimodal multitemporal datasets for unsupervised/self-supervised pre-training in Earth observation.
New in version 0.5.
- class torchgeo.datasets.SSL4EOL(root='data', split='oli_sr', seasons=1, transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
SSL4EO-L dataset.
Landsat version of SSL4EO.
The dataset consists of a parallel corpus (same locations and dates for SR/TOA) for the following sensors:
Satellites | Sensors | Level | # Bands
Landsat 4–5 | TM | TOA | 7
Landsat 7 | ETM+ | SR | 6
Landsat 7 | ETM+ | TOA | 9
Landsat 8–9 | OLI+TIRS | TOA | 11
Landsat 8–9 | OLI | SR | 7
Each patch has the following properties:
264 x 264 pixels
Resampled to 30 m resolution (7920 x 7920 m)
Single multispectral GeoTIFF file
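The patch geometry above is internally consistent; a one-line check of the stated numbers:

```python
# 264 x 264 pixels resampled to 30 m resolution -> 7920 x 7920 m extent.
pixels = 264
gsd_m = 30
extent_m = pixels * gsd_m
```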
Note
Each split is 300–400 GB and requires 3x that to concatenate and extract tarballs. Tarballs can be safely deleted after extraction to save space. The dataset takes about 1.5 hrs to download and checksum and another 3 hrs to extract.
If you use this dataset in your research, please cite the following paper:
New in version 0.5.
- __init__(root='data', split='oli_sr', seasons=1, transforms=None, download=False, checksum=False)[source]¶
Initialize a new SSL4EOL instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of [‘tm_toa’, ‘etm_toa’, ‘etm_sr’, ‘oli_tirs_toa’, ‘oli_sr’]
seasons (int) – number of seasonal patches to sample per location, 1–4
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)
- Raises:
AssertionError – if any arguments are invalid
RuntimeError – if
download=False
but dataset is missing or checksum fails
- __getitem__(index)[source]¶
Return a sample from the dataset at the given index.
- Parameters:
index (int) – index to return
- Returns:
image sample
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
- class torchgeo.datasets.SSL4EOS12(root='data', split='s2c', seasons=1, transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
SSL4EO-S12 dataset.
Sentinel-1/2 version of SSL4EO.
The dataset consists of unlabeled patch triplets (Sentinel-1 dual-pol SAR, Sentinel-2 top-of-atmosphere multispectral, Sentinel-2 surface reflectance multispectral) from 251,079 locations across the globe, each patch covering 2640 m x 2640 m and including four seasonal time stamps.
If you use this dataset in your research, please cite the following paper:
Note
This dataset can be downloaded using:
$ export RSYNC_PASSWORD=m1660427.001
$ rsync -av rsync://m1660427.001@dataserv.ub.tum.de/m1660427.001/ .
The dataset is about 1.5 TB when compressed and 3.7 TB when uncompressed, and takes roughly 36 hrs to download, 1 hr to checksum, and 12 hrs to extract.
New in version 0.5.
- __init__(root='data', split='s2c', seasons=1, transforms=None, checksum=False)[source]¶
Initialize a new SSL4EOS12 instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “s1” (Sentinel-1 dual-pol SAR), “s2c” (Sentinel-2 Level-1C top-of-atmosphere reflectance), and “s2a” (Sentinel-2 Level-2a surface reflectance)
seasons (int) – number of seasonal patches to sample per location, 1–4
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
RuntimeError – if dataset is missing or checksum fails
- __getitem__(index)[source]¶
Return a sample from the dataset at the given index.
- Parameters:
index (int) – index to return
- Returns:
image sample
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
SustainBench Crop Yield¶
- class torchgeo.datasets.SustainBenchCropYield(root='data', split='train', countries=['usa'], transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
SustainBench Crop Yield Dataset.
This dataset contains MODIS band histograms and soybean yield estimates for selected counties in the USA, Argentina and Brazil. The dataset is part of the SustainBench datasets for tackling the UN Sustainable Development Goals (SDGs).
Dataset Format:
.npz files of stacked samples
Dataset Features:
input histograms of MODIS pixel values for 7 surface reflectance bands and 2 surface temperature bands, binned into 32 value ranges across 32 timesteps, resulting in 32x32x9 input images
regression target value of soybean yield in metric tonnes per harvested hectare
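To make the 32x32x9 layout concrete, here is a sketch that builds one such histogram stack from synthetic data (the pixel values and pixel count are hypothetical; only the binning scheme follows the description above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw MODIS pixel values: (32 timesteps, 1000 pixels, 9 bands).
raw = rng.integers(0, 255, size=(32, 1000, 9))

# Bin each band at each timestep into 32 value ranges -> one 32x32x9 "image".
hist = np.stack(
    [
        np.stack(
            [np.histogram(raw[t, :, b], bins=32, range=(0, 255))[0] for b in range(9)],
            axis=-1,
        )
        for t in range(32)
    ]
)
assert hist.shape == (32, 32, 9)  # (timesteps, bins, bands)
```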
If you use this dataset in your research, please cite:
New in version 0.5.
- __init__(root='data', split='train', countries=['usa'], transforms=None, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “dev”, or “test”
countries (list[str]) – which countries to include in the dataset
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)
- Raises:
AssertionError – if countries contains invalid countries or if split is invalid
RuntimeError – if download=False but dataset is missing or checksum fails
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- __getitem__(index)[source]¶
Return a sample from the dataset at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
Tropical Cyclone¶
- class torchgeo.datasets.TropicalCyclone(root='data', split='train', transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
NonGeoDataset
Tropical Cyclone Wind Estimation Competition dataset.
A collection of tropical storms in the Atlantic and East Pacific Oceans from 2000 to 2019 with corresponding maximum sustained surface wind speed. This dataset is split into training and test categories for the purpose of a competition.
See https://www.drivendata.org/competitions/72/predict-wind-speeds/ for more information about the competition.
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
Changed in version 0.4: Class name changed from TropicalCycloneWindEstimation to TropicalCyclone to be consistent with TropicalCycloneDataModule.
- __init__(root='data', split='train', transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new Tropical Cyclone Wind Estimation Competition Dataset.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
RuntimeError – if download=False but dataset is missing or checksum fails
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
UC Merced¶
- class torchgeo.datasets.UCMerced(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoClassificationDataset
UC Merced Land Use dataset.
The UC Merced Land Use dataset is a land use classification dataset of 2,100 256x256 px RGB images at 1 ft resolution, extracted from the USGS National Map Urban Area Imagery collection for urban locations around the U.S. It covers 21 land use classes (100 images per class).
Dataset features:
land use class labels from around the U.S.
three spectral bands - RGB
21 classes
Dataset classes:
agricultural
airplane
baseballdiamond
beach
buildings
chaparral
denseresidential
forest
freeway
golfcourse
harbor
intersection
mediumresidential
mobilehomepark
overpass
parkinglot
river
runway
sparseresidential
storagetanks
tenniscourt
This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new UC Merced dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
and data is not found, or checksums don’t match
- plot(sample, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
NonGeoClassificationDataset.__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
USAVars¶
- class torchgeo.datasets.USAVars(root='data', split='train', labels=['treecover', 'elevation', 'population'], transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
USAVars dataset.
The USAVars dataset is a reproduction of the dataset used in the paper “A generalizable and accessible approach to machine learning with global satellite imagery”. Specifically, this dataset includes 1 sq km crops of NAIP imagery resampled to 4 m/px, centered on ~100k points sampled randomly from the contiguous United States. Each point has three continuous-valued labels (taken from the dataset released in the paper): tree cover percentage, elevation, and population density.
Dataset format:
images are 4-channel GeoTIFFs
labels are singular float values
Dataset labels:
tree cover
elevation
population density
If you use this dataset in your research, please cite the following paper:
New in version 0.3.
- __init__(root='data', split='train', labels=['treecover', 'elevation', 'population'], transforms=None, download=False, checksum=False)[source]¶
Initialize a new USAVars dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – train/val/test split to load
labels (list[str]) – which labels to include (a subset of “treecover”, “elevation”, and “population”)
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if invalid labels are provided
RuntimeError – if
download=False
and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return a sample from the dataset at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_labels=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_labels (bool) – flag indicating whether to show labels above panel
suptitle (Optional[str]) – optional string to use as a suptitle
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
Vaihingen¶
- class torchgeo.datasets.Vaihingen2D(root='data', split='train', transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
Vaihingen 2D Semantic Segmentation dataset.
The Vaihingen dataset is a dataset for urban semantic segmentation used in the 2D Semantic Labeling Contest - Vaihingen. This dataset uses the “ISPRS_semantic_labeling_Vaihingen.zip” and “ISPRS_semantic_labeling_Vaihingen_ground_truth_COMPLETE.zip” files to create the train/test sets used in the challenge. The dataset can be downloaded from here. Note, the server contains additional data for 3D Semantic Labeling which are currently not supported.
Dataset format:
images are 3-channel RGB geotiffs
masks are 3-channel geotiffs with unique RGB values representing the class
Dataset classes:
Clutter/background
Impervious surfaces
Building
Low Vegetation
Tree
Car
If you use this dataset in your research, please cite the following paper:
New in version 0.2.
- __init__(root='data', split='train', transforms=None, checksum=False)[source]¶
Initialize a new Vaihingen2D dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- __getitem__(index)[source]¶
Return a sample from the dataset at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
VHR-10¶
- class torchgeo.datasets.VHR10(root='data', split='positive', transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
NWPU VHR-10 dataset.
Northwestern Polytechnical University (NWPU) very-high-resolution ten-class (VHR-10) remote sensing image dataset.
Consists of 800 VHR optical remote sensing images: 715 color images acquired from Google Earth with spatial resolutions ranging from 0.5 to 2 m, and 85 pansharpened color infrared (CIR) images acquired from Vaihingen data with a spatial resolution of 0.08 m.
The dataset is divided into two sets:
Positive image set (650 images), each image containing at least one target
Negative image set (150 images), containing no targets
The positive image set consists of objects from ten classes:
Airplanes (757)
Ships (302)
Storage tanks (655)
Baseball diamonds (390)
Tennis courts (524)
Basketball courts (159)
Ground track fields (163)
Harbors (224)
Bridges (124)
Vehicles (477)
Includes object detection bounding boxes from original paper and instance segmentation masks from follow-up publications. If you use this dataset in your research, please cite the following papers:
Note
This dataset requires the following additional libraries to be installed:
pycocotools to load the annotations.json file for the “positive” image set
rarfile to extract the dataset, which is stored in a RAR file
- __init__(root='data', split='positive', transforms=None, download=False, checksum=False)[source]¶
Initialize a new VHR-10 dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “positive” or “negative”
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if split argument is invalid
ImportError – if split="positive" and pycocotools is not installed
RuntimeError – if download=False and data is not found, or checksums don’t match
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None, show_feats='both', box_alpha=0.7, mask_alpha=0.7)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
suptitle (Optional[str]) – optional string to use as a suptitle
show_titles (bool) – flag indicating whether to show titles above each panel
show_feats (Optional[str]) – optional string to pick features to be shown: boxes, masks, both
box_alpha (float) – alpha value of box
mask_alpha (float) – alpha value of mask
- Returns:
a matplotlib Figure with the rendered sample
- Raises:
AssertionError – if show_feats argument is invalid
ImportError – if plotting masks and scikit-image is not installed
- Return type:
New in version 0.4.
Western USA Live Fuel Moisture¶
- class torchgeo.datasets.WesternUSALiveFuelMoisture(root='data', input_features=['slope(t)', 'elevation(t)', 'canopy_height(t)', 'forest_cover(t)', 'silt(t)', 'sand(t)', 'clay(t)', 'vv(t)', 'vh(t)', 'red(t)', 'green(t)', 'blue(t)', 'swir(t)', 'nir(t)', 'ndvi(t)', 'ndwi(t)', 'nirv(t)', 'vv_red(t)', 'vv_green(t)', 'vv_blue(t)', 'vv_swir(t)', 'vv_nir(t)', 'vv_ndvi(t)', 'vv_ndwi(t)', 'vv_nirv(t)', 'vh_red(t)', 'vh_green(t)', 'vh_blue(t)', 'vh_swir(t)', 'vh_nir(t)', 'vh_ndvi(t)', 'vh_ndwi(t)', 'vh_nirv(t)', 'vh_vv(t)', 'slope(t-1)', 'elevation(t-1)', 'canopy_height(t-1)', 'forest_cover(t-1)', 'silt(t-1)', 'sand(t-1)', 'clay(t-1)', 'vv(t-1)', 'vh(t-1)', 'red(t-1)', 'green(t-1)', 'blue(t-1)', 'swir(t-1)', 'nir(t-1)', 'ndvi(t-1)', 'ndwi(t-1)', 'nirv(t-1)', 'vv_red(t-1)', 'vv_green(t-1)', 'vv_blue(t-1)', 'vv_swir(t-1)', 'vv_nir(t-1)', 'vv_ndvi(t-1)', 'vv_ndwi(t-1)', 'vv_nirv(t-1)', 'vh_red(t-1)', 'vh_green(t-1)', 'vh_blue(t-1)', 'vh_swir(t-1)', 'vh_nir(t-1)', 'vh_ndvi(t-1)', 'vh_ndwi(t-1)', 'vh_nirv(t-1)', 'vh_vv(t-1)', 'slope(t-2)', 'elevation(t-2)', 'canopy_height(t-2)', 'forest_cover(t-2)', 'silt(t-2)', 'sand(t-2)', 'clay(t-2)', 'vv(t-2)', 'vh(t-2)', 'red(t-2)', 'green(t-2)', 'blue(t-2)', 'swir(t-2)', 'nir(t-2)', 'ndvi(t-2)', 'ndwi(t-2)', 'nirv(t-2)', 'vv_red(t-2)', 'vv_green(t-2)', 'vv_blue(t-2)', 'vv_swir(t-2)', 'vv_nir(t-2)', 'vv_ndvi(t-2)', 'vv_ndwi(t-2)', 'vv_nirv(t-2)', 'vh_red(t-2)', 'vh_green(t-2)', 'vh_blue(t-2)', 'vh_swir(t-2)', 'vh_nir(t-2)', 'vh_ndvi(t-2)', 'vh_ndwi(t-2)', 'vh_nirv(t-2)', 'vh_vv(t-2)', 'slope(t-3)', 'elevation(t-3)', 'canopy_height(t-3)', 'forest_cover(t-3)', 'silt(t-3)', 'sand(t-3)', 'clay(t-3)', 'vv(t-3)', 'vh(t-3)', 'red(t-3)', 'green(t-3)', 'blue(t-3)', 'swir(t-3)', 'nir(t-3)', 'ndvi(t-3)', 'ndwi(t-3)', 'nirv(t-3)', 'vv_red(t-3)', 'vv_green(t-3)', 'vv_blue(t-3)', 'vv_swir(t-3)', 'vv_nir(t-3)', 'vv_ndvi(t-3)', 'vv_ndwi(t-3)', 'vv_nirv(t-3)', 'vh_red(t-3)', 'vh_green(t-3)', 'vh_blue(t-3)', 'vh_swir(t-3)', 'vh_nir(t-3)', 'vh_ndvi(t-3)', 
'vh_ndwi(t-3)', 'vh_nirv(t-3)', 'vh_vv(t-3)', 'lat', 'lon'], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Bases:
NonGeoDataset
Western USA Live Fuel Moisture Dataset.
This tabular-style dataset contains fuel moisture (mass of water in vegetation) and remotely sensed variables for the western United States. It contains 2,615 datapoints and 138 variables. For more details see the dataset page.
Dataset Format:
.geojson file for each datapoint
Dataset Features:
138 remote sensing derived variables, some with a time dependency
2615 datapoints with regression target of predicting fuel moisture
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
New in version 0.5.
- __init__(root='data', input_features=['slope(t)', 'elevation(t)', 'canopy_height(t)', 'forest_cover(t)', 'silt(t)', 'sand(t)', 'clay(t)', 'vv(t)', 'vh(t)', 'red(t)', 'green(t)', 'blue(t)', 'swir(t)', 'nir(t)', 'ndvi(t)', 'ndwi(t)', 'nirv(t)', 'vv_red(t)', 'vv_green(t)', 'vv_blue(t)', 'vv_swir(t)', 'vv_nir(t)', 'vv_ndvi(t)', 'vv_ndwi(t)', 'vv_nirv(t)', 'vh_red(t)', 'vh_green(t)', 'vh_blue(t)', 'vh_swir(t)', 'vh_nir(t)', 'vh_ndvi(t)', 'vh_ndwi(t)', 'vh_nirv(t)', 'vh_vv(t)', 'slope(t-1)', 'elevation(t-1)', 'canopy_height(t-1)', 'forest_cover(t-1)', 'silt(t-1)', 'sand(t-1)', 'clay(t-1)', 'vv(t-1)', 'vh(t-1)', 'red(t-1)', 'green(t-1)', 'blue(t-1)', 'swir(t-1)', 'nir(t-1)', 'ndvi(t-1)', 'ndwi(t-1)', 'nirv(t-1)', 'vv_red(t-1)', 'vv_green(t-1)', 'vv_blue(t-1)', 'vv_swir(t-1)', 'vv_nir(t-1)', 'vv_ndvi(t-1)', 'vv_ndwi(t-1)', 'vv_nirv(t-1)', 'vh_red(t-1)', 'vh_green(t-1)', 'vh_blue(t-1)', 'vh_swir(t-1)', 'vh_nir(t-1)', 'vh_ndvi(t-1)', 'vh_ndwi(t-1)', 'vh_nirv(t-1)', 'vh_vv(t-1)', 'slope(t-2)', 'elevation(t-2)', 'canopy_height(t-2)', 'forest_cover(t-2)', 'silt(t-2)', 'sand(t-2)', 'clay(t-2)', 'vv(t-2)', 'vh(t-2)', 'red(t-2)', 'green(t-2)', 'blue(t-2)', 'swir(t-2)', 'nir(t-2)', 'ndvi(t-2)', 'ndwi(t-2)', 'nirv(t-2)', 'vv_red(t-2)', 'vv_green(t-2)', 'vv_blue(t-2)', 'vv_swir(t-2)', 'vv_nir(t-2)', 'vv_ndvi(t-2)', 'vv_ndwi(t-2)', 'vv_nirv(t-2)', 'vh_red(t-2)', 'vh_green(t-2)', 'vh_blue(t-2)', 'vh_swir(t-2)', 'vh_nir(t-2)', 'vh_ndvi(t-2)', 'vh_ndwi(t-2)', 'vh_nirv(t-2)', 'vh_vv(t-2)', 'slope(t-3)', 'elevation(t-3)', 'canopy_height(t-3)', 'forest_cover(t-3)', 'silt(t-3)', 'sand(t-3)', 'clay(t-3)', 'vv(t-3)', 'vh(t-3)', 'red(t-3)', 'green(t-3)', 'blue(t-3)', 'swir(t-3)', 'nir(t-3)', 'ndvi(t-3)', 'ndwi(t-3)', 'nirv(t-3)', 'vv_red(t-3)', 'vv_green(t-3)', 'vv_blue(t-3)', 'vv_swir(t-3)', 'vv_nir(t-3)', 'vv_ndvi(t-3)', 'vv_ndwi(t-3)', 'vv_nirv(t-3)', 'vh_red(t-3)', 'vh_green(t-3)', 'vh_blue(t-3)', 'vh_swir(t-3)', 'vh_nir(t-3)', 'vh_ndvi(t-3)', 'vh_ndwi(t-3)', 'vh_nirv(t-3)', 'vh_vv(t-3)', 
'lat', 'lon'], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new Western USA Live Fuel Moisture Dataset.
- Parameters:
root (str) – root directory where dataset can be found
input_features (list[str]) – which input features to include
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
AssertionError – if input_features contains invalid variable names
RuntimeError – if download=False but dataset is missing or checksum fails
xView2¶
- class torchgeo.datasets.XView2(root='data', split='train', transforms=None, checksum=False)[source]¶
Bases:
NonGeoDataset
xView2 dataset.
The xView2 dataset is a dataset for building disaster change detection. This dataset object uses the “Challenge training set (~7.8 GB)” and “Challenge test set (~2.6 GB)” data from the xView2 website as the train and test splits. Note that the xView2 website contains other data under the xView2 umbrella that are not included here, e.g. the “Tier3 training data”, the “Challenge holdout set”, and the “full data”.
Dataset format:
images are three-channel pngs
masks are single-channel pngs where the pixel values represent the class
Dataset classes:
background
no damage
minor damage
major damage
destroyed
If you use this dataset in your research, please cite the following paper:
New in version 0.2.
- __init__(root='data', split='train', transforms=None, checksum=False)[source]¶
Initialize a new xView2 dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- __getitem__(index)[source]¶
Return a sample from the dataset at the given index.
- Parameters:
index (int) – index to return
- Returns:
data and label at that index
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, show_titles=True, suptitle=None, alpha=0.5)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional string to use as a suptitle
alpha (float) – opacity with which to render predictions on top of the imagery
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
ZueriCrop¶
- class torchgeo.datasets.ZueriCrop(root='data', bands=('NIR', 'B03', 'B02', 'B04', 'B05', 'B06', 'B07', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶
Bases:
NonGeoDataset
ZueriCrop dataset.
The ZueriCrop dataset is a dataset for time-series instance segmentation of crops.
Dataset features:
Sentinel-2 multispectral imagery
instance masks of 48 crop categories
nine multispectral bands
116k images with 10 m per pixel resolution (24x24 px)
~28k time-series containing 142 images each
Dataset format:
single hdf5 dataset containing images, semantic masks, and instance masks
data is parsed into images and instance masks, boxes, and labels
one mask per time-series
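As a sketch, one sample from a time series would have shapes like the following (placeholder arrays only; the real dataset parses torch Tensors from the hdf5 file):

```python
import numpy as np

# Hypothetical per-sample layout, per the description above:
# 142 timesteps x 9 spectral bands x 24x24 px imagery,
# and a single instance/semantic mask shared by the whole series.
image = np.zeros((142, 9, 24, 24), dtype=np.float32)
mask = np.zeros((24, 24), dtype=np.int64)  # class ids in [0, 48)
```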
Dataset classes:
48 fine-grained hierarchical crop categories
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
h5py to load the dataset
- __init__(root='data', bands=('NIR', 'B03', 'B02', 'B04', 'B05', 'B06', 'B07', 'B11', 'B12'), transforms=None, download=False, checksum=False)[source]¶
Initialize a new ZueriCrop dataset instance.
- Parameters:
root (str) – root directory where dataset can be found
bands (Sequence[str]) – the subset of Sentinel-2 bands to load
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises:
RuntimeError – if
download=False
and data is not found, or checksums don’t match
- __getitem__(index)[source]¶
Return a sample from the dataset at the given index.
- Parameters:
index (int) – index to return
- Returns:
sample containing image, mask, bounding boxes, and target label
- Return type:
- __len__()[source]¶
Return the number of data points in the dataset.
- Returns:
length of the dataset
- Return type:
- plot(sample, time_step=0, show_titles=True, suptitle=None)[source]¶
Plot a sample from the dataset.
- Parameters:
sample (dict[str, torch.Tensor]) – a sample returned by
__getitem__()
time_step (int) – time step at which to access image, beginning with 0
show_titles (bool) – flag indicating whether to show titles above each panel
suptitle (Optional[str]) – optional suptitle to use for figure
- Returns:
a matplotlib Figure with the rendered sample
- Return type:
New in version 0.2.
Base Classes¶
If you want to write your own custom dataset, you can extend one of these abstract base classes.
GeoDataset¶
- class torchgeo.datasets.GeoDataset(transforms=None)[source]¶
Bases:
Dataset[dict[str, Any]], ABC
Abstract base class for datasets containing geospatial information.
Geospatial information includes things like:
coordinates (latitude, longitude)
resolution
GeoDataset is a special class of datasets. Unlike NonGeoDataset, the presence of geospatial information allows two or more datasets to be combined based on latitude/longitude. This allows users to do things like:
Combine image and target labels and sample from both simultaneously (e.g. Landsat and CDL)
Combine datasets for multiple image sources for multimodal learning or data fusion (e.g. Landsat and Sentinel)
These combinations require that all queries are present in both datasets, and can be combined using an IntersectionDataset:
dataset = landsat & cdl
Users may also want to:
Combine datasets for multiple image sources and treat them as equivalent (e.g. Landsat 7 and Landsat 8)
Combine datasets for disparate geospatial locations (e.g. Chesapeake NY and PA)
These combinations require that all queries are present in at least one dataset, and can be combined using a UnionDataset:
dataset = landsat7 | landsat8
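The set semantics of & and | can be illustrated with a toy class (this is NOT the torchgeo implementation, which combines spatiotemporal indices rather than sets of named tiles):

```python
# Toy sketch of intersection/union semantics: each "dataset" is modeled
# as the set of regions it covers.
class ToyGeoDataset:
    def __init__(self, regions):
        self.regions = set(regions)

    def __and__(self, other):
        # Intersection: a query must be answerable by BOTH datasets.
        return ToyGeoDataset(self.regions & other.regions)

    def __or__(self, other):
        # Union: a query must be answerable by AT LEAST ONE dataset.
        return ToyGeoDataset(self.regions | other.regions)

landsat = ToyGeoDataset({"tile_a", "tile_b"})
cdl = ToyGeoDataset({"tile_b", "tile_c"})
assert (landsat & cdl).regions == {"tile_b"}
assert (landsat | cdl).regions == {"tile_a", "tile_b", "tile_c"}
```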
- filename_glob = '*'¶
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
- __add__ = None¶
GeoDataset addition can be ambiguous and is no longer supported. Users should instead use the intersection (&) or union (|) operator.
- abstract __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image/mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
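The query is a six-element bounding box. A toy sketch of the tuple and the kind of intersection test an index lookup performs (the names here are hypothetical, not torchgeo's BoundingBox class):

```python
from collections import namedtuple

# Toy (minx, maxx, miny, maxy, mint, maxt) box; mint/maxt are timestamps.
Box = namedtuple("Box", "minx maxx miny maxy mint maxt")

def intersects(a: Box, b: Box) -> bool:
    # Two boxes overlap iff they overlap in x, in y, AND in time.
    return (
        a.minx <= b.maxx and a.maxx >= b.minx
        and a.miny <= b.maxy and a.maxy >= b.miny
        and a.mint <= b.maxt and a.maxt >= b.mint
    )

tile = Box(0, 10, 0, 10, 0, 100)
query = Box(5, 15, 5, 15, 50, 60)
assert intersects(tile, query)
```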
- __and__(other)[source]¶
Take the intersection of two
GeoDataset
.- Parameters:
other (GeoDataset) – another dataset
- Returns:
a single dataset
- Raises:
ValueError – if other is not a
GeoDataset
- Return type:
New in version 0.2.
- __or__(other)[source]¶
Take the union of two GeoDatasets.
- Parameters:
other (GeoDataset) – another dataset
- Returns:
a single dataset
- Raises:
ValueError – if other is not a
GeoDataset
- Return type:
New in version 0.2.
- __len__()[source]¶
Return the number of files in the dataset.
- Returns:
length of the dataset
- Return type:
- __str__()[source]¶
Return the informal string representation of the object.
- Returns:
informal string representation
- Return type:
- property bounds: BoundingBox¶
Bounds of the index.
- Returns:
(minx, maxx, miny, maxy, mint, maxt) of the dataset
- property crs: CRS¶
Coordinate reference system (CRS) of the dataset.
- Returns:
RasterDataset¶
- class torchgeo.datasets.RasterDataset(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Bases:
GeoDataset
Abstract base class for GeoDataset stored as raster files.
- filename_regex = '.*'¶
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
- date_format = '%Y%m%d'¶
Date format string used to parse date from filename.
Not used if filename_regex does not contain a date group.
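A sketch of how a subclass might pair filename_regex with date_format to extract the date (for mint/maxt) and band from a filename (the Landsat-style pattern and filename here are hypothetical):

```python
import re
from datetime import datetime

# Hypothetical subclass attributes: a named-group regex plus a date format.
filename_regex = r"^LC08_(?P<date>\d{8})_(?P<band>B\d{1,2})\.tif$"
date_format = "%Y%m%d"

match = re.match(filename_regex, "LC08_20200115_B4.tif")
assert match is not None

# The "date" group is parsed with date_format to build the temporal index.
mint = datetime.strptime(match.group("date"), date_format)
# The "band" group is used when separate_files is True.
band = match.group("band")
```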
- is_image = True¶
True if dataset contains imagery, False if dataset contains mask
- separate_files = False¶
True if data is stored in a separate file for each band, else False.
- property dtype: dtype¶
The dtype of the dataset (overrides the dtype of the data file via a cast).
- Returns:
the dtype of the dataset
New in version 0.5.
- __init__(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Optional[Sequence[str]]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises:
FileNotFoundError – if no files are found in
paths
Changed in version 0.5: root was renamed to paths.
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image/mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
dict[str, Any]
VectorDataset¶
- class torchgeo.datasets.VectorDataset(paths='data', crs=None, res=0.0001, transforms=None, label_name=None)[source]¶
Bases:
GeoDataset
Abstract base class for
GeoDataset
stored as vector files.
- __init__(paths='data', crs=None, res=0.0001, transforms=None, label_name=None)[source]¶
Initialize a new Dataset instance.
- Parameters:
paths (Union[str, Iterable[str]]) – one or more root directories to search or files to load
crs (Optional[CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float) – resolution of the dataset in units of CRS
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
label_name (Optional[str]) – name of the dataset property that has the label to be rasterized into the mask
- Raises:
FileNotFoundError – if no files are found in paths
New in version 0.4: The label_name parameter.
Changed in version 0.5: root was renamed to paths.
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of image/mask and metadata at that index
- Raises:
IndexError – if query is not found in the index
- Return type:
dict[str, Any]
NonGeoDataset¶
- class torchgeo.datasets.NonGeoDataset[source]¶
Bases:
Dataset[dict[str, Any]], ABC
Abstract base class for datasets lacking geospatial information.
This base class is designed for datasets with pre-defined image chips.
- abstract __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters:
index (int) – index to return
- Returns:
data and labels at that index
- Raises:
IndexError – if index is out of range of the dataset
- Return type:
dict[str, Any]
NonGeoClassificationDataset¶
- class torchgeo.datasets.NonGeoClassificationDataset(root='data', transforms=None, loader=<function default_loader>, is_valid_file=None)[source]¶
Bases:
NonGeoDataset, ImageFolder
Abstract base class for classification datasets lacking geospatial information.
This base class is designed for datasets with pre-defined image chips that are organized into separate folders per class.
- __init__(root='data', transforms=None, loader=<function default_loader>, is_valid_file=None)[source]¶
Initialize a new NonGeoClassificationDataset instance.
- Parameters:
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[dict[str, torch.Tensor]], dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
loader (Optional[Callable[[str], Any]]) – a callable function which takes as input a path to an image and returns a PIL Image or numpy array
is_valid_file (Optional[Callable[[str], bool]]) – A function that takes the path of an Image file and checks if the file is a valid file
IntersectionDataset¶
- class torchgeo.datasets.IntersectionDataset(dataset1, dataset2, collate_fn=<function concat_samples>, transforms=None)[source]¶
Bases:
GeoDataset
Dataset representing the intersection of two GeoDatasets.
This allows users to do things like:
Combine image and target labels and sample from both simultaneously (e.g. Landsat and CDL)
Combine datasets for multiple image sources for multimodal learning or data fusion (e.g. Landsat and Sentinel)
These combinations require that all queries are present in both datasets, and can be combined using an IntersectionDataset:
dataset = landsat & cdl
New in version 0.2.
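The & expression can be pictured with a toy sketch. The classes below are hypothetical stand-ins with 1-D extents, not torchgeo's implementation; they only show how __and__ can build an intersection dataset whose extent is the overlap of its inputs.

```python
class ToyGeoDataset:
    """Hypothetical stand-in for GeoDataset with a 1-D extent."""

    def __init__(self, bounds):
        self.bounds = bounds  # (minx, maxx) only, for brevity

    def __and__(self, other):
        return ToyIntersectionDataset(self, other)


class ToyIntersectionDataset(ToyGeoDataset):
    """Queries must fall inside both inputs' extents."""

    def __init__(self, ds1, ds2):
        minx = max(ds1.bounds[0], ds2.bounds[0])
        maxx = min(ds1.bounds[1], ds2.bounds[1])
        if minx > maxx:
            raise RuntimeError("datasets have no spatial intersection")
        super().__init__((minx, maxx))


landsat = ToyGeoDataset((0, 10))
cdl = ToyGeoDataset((5, 15))
dataset = landsat & cdl  # mirrors the documented "dataset = landsat & cdl"
print(dataset.bounds)  # (5, 10)
```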
- __init__(dataset1, dataset2, collate_fn=<function concat_samples>, transforms=None)[source]¶
Initialize a new Dataset instance.
- Parameters:
dataset1 (GeoDataset) – the first dataset
dataset2 (GeoDataset) – the second dataset
collate_fn (Callable[[Sequence[dict[str, Any]]], dict[str, Any]]) – function used to collate samples
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
- Raises:
RuntimeError – if datasets have no spatiotemporal intersection
ValueError – if either dataset is not a
GeoDataset
New in version 0.4: The transforms parameter.
- __getitem__(query)[source]¶
Retrieve image and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of data/labels and metadata at that index
- Raises:
IndexError – if query is not within bounds of the index
- Return type:
dict[str, Any]
- __str__()[source]¶
Return the informal string representation of the object.
- Returns:
informal string representation
- Return type:
str
- property crs: CRS¶
coordinate reference system (CRS) of both datasets.
- Returns:
the coordinate reference system of both datasets
UnionDataset¶
- class torchgeo.datasets.UnionDataset(dataset1, dataset2, collate_fn=<function merge_samples>, transforms=None)[source]¶
Bases:
GeoDataset
Dataset representing the union of two GeoDatasets.
This allows users to do things like:
Combine datasets for multiple image sources and treat them as equivalent (e.g. Landsat 7 and Landsat 8)
Combine datasets for disparate geospatial locations (e.g. Chesapeake NY and PA)
These combinations require that all queries are present in at least one dataset, and can be combined using a UnionDataset:
dataset = landsat7 | landsat8
New in version 0.2.
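The | expression can be pictured the same way. The class below is a hypothetical stand-in, not torchgeo's implementation; it only shows that the union's extent covers both inputs, so a query need only fall inside one of them.

```python
class ToyGeoDataset:
    """Hypothetical stand-in for GeoDataset with a 1-D extent."""

    def __init__(self, bounds):
        self.bounds = bounds  # (minx, maxx) only, for brevity

    def __or__(self, other):
        # The union's extent is the hull of both extents.
        return ToyGeoDataset((min(self.bounds[0], other.bounds[0]),
                              max(self.bounds[1], other.bounds[1])))


landsat7 = ToyGeoDataset((0, 10))
landsat8 = ToyGeoDataset((8, 20))
dataset = landsat7 | landsat8  # mirrors "dataset = landsat7 | landsat8"
print(dataset.bounds)  # (0, 20)
```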
- __init__(dataset1, dataset2, collate_fn=<function merge_samples>, transforms=None)[source]¶
Initialize a new Dataset instance.
- Parameters:
dataset1 (GeoDataset) – the first dataset
dataset2 (GeoDataset) – the second dataset
collate_fn (Callable[[Sequence[dict[str, Any]]], dict[str, Any]]) – function used to collate samples
transforms (Optional[Callable[[dict[str, Any]], dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
- Raises:
ValueError – if either dataset is not a
GeoDataset
New in version 0.4: The transforms parameter.
- __getitem__(query)[source]¶
Retrieve image and metadata indexed by query.
- Parameters:
query (BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns:
sample of data/labels and metadata at that index
- Raises:
IndexError – if query is not within bounds of the index
- Return type:
dict[str, Any]
- __str__()[source]¶
Return the informal string representation of the object.
- Returns:
informal string representation
- Return type:
str
- property crs: CRS¶
coordinate reference system (CRS) of both datasets.
- Returns:
the coordinate reference system of both datasets
Utilities¶
- class torchgeo.datasets.BoundingBox(minx, maxx, miny, maxy, mint, maxt)[source]¶
Bases:
object
Data class for indexing spatiotemporal data.
- __post_init__()[source]¶
Validate the arguments passed to __init__().
- Raises:
ValueError – if bounding box is invalid (minx > maxx, miny > maxy, or mint > maxt)
New in version 0.2.
- __getitem__(key: int) float [source]¶
- __getitem__(key: slice) list[float]
Index the (minx, maxx, miny, maxy, mint, maxt) tuple.
- Parameters:
key – integer or slice object
- Returns:
the value(s) at that index
- Raises:
IndexError – if key is out of bounds
- __contains__(other)[source]¶
Whether or not other is within the bounds of this bounding box.
- Parameters:
other (BoundingBox) – another bounding box
- Returns:
True if other is within this bounding box, else False
- Return type:
bool
New in version 0.2.
- __or__(other)[source]¶
The union operator.
- Parameters:
other (BoundingBox) – another bounding box
- Returns:
the minimum bounding box that contains both self and other
- Return type:
BoundingBox
New in version 0.2.
- __and__(other)[source]¶
The intersection operator.
- Parameters:
other (BoundingBox) – another bounding box
- Returns:
the intersection of self and other
- Raises:
ValueError – if self and other do not intersect
- Return type:
BoundingBox
New in version 0.2.
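A compact way to see both operators side by side is a plain-Python sketch; the class below is a hypothetical stand-in for BoundingBox, not torchgeo's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for BoundingBox."""
    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float

    def __or__(self, other):
        # Union: the minimum box containing both.
        return Box(min(self.minx, other.minx), max(self.maxx, other.maxx),
                   min(self.miny, other.miny), max(self.maxy, other.maxy),
                   min(self.mint, other.mint), max(self.maxt, other.maxt))

    def __and__(self, other):
        # Intersection: raises ValueError if the boxes do not overlap.
        box = Box(max(self.minx, other.minx), min(self.maxx, other.maxx),
                  max(self.miny, other.miny), min(self.maxy, other.maxy),
                  max(self.mint, other.mint), min(self.maxt, other.maxt))
        if box.minx > box.maxx or box.miny > box.maxy or box.mint > box.maxt:
            raise ValueError("boxes do not intersect")
        return box


a = Box(0, 10, 0, 10, 0, 1)
b = Box(5, 15, 5, 15, 0, 1)
print((a & b).maxx, (a | b).maxx)  # 10 15
```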
- property area: float¶
Area of bounding box.
Area is defined as spatial area.
- Returns:
area
New in version 0.3.
- property volume: float¶
Volume of bounding box.
Volume is defined as spatial area times temporal range.
- Returns:
volume
New in version 0.3.
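A quick worked example, with a plain tuple standing in for a BoundingBox: area ignores the temporal extent, while volume multiplies it in.

```python
# (minx, maxx, miny, maxy, mint, maxt)
minx, maxx, miny, maxy, mint, maxt = 0.0, 10.0, 0.0, 5.0, 0.0, 2.0

area = (maxx - minx) * (maxy - miny)  # spatial area only
volume = area * (maxt - mint)         # spatial area times temporal range

print(area, volume)  # 50.0 100.0
```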
- intersects(other)[source]¶
Whether or not two bounding boxes intersect.
- Parameters:
other (BoundingBox) – another bounding box
- Returns:
True if bounding boxes intersect, else False
- Return type:
bool
- split(proportion, horizontal=True)[source]¶
Split BoundingBox in two.
- Parameters:
proportion (float) – split proportion in range (0, 1)
horizontal (bool) – whether the split is horizontal or vertical
- Returns:
A tuple with the resulting BoundingBoxes
- Return type:
tuple[torchgeo.datasets.utils.BoundingBox, torchgeo.datasets.utils.BoundingBox]
New in version 0.5.
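The splitting arithmetic can be sketched with plain tuples in place of BoundingBox; this is a hedged simplification, and the assumption that a horizontal split cuts along the x axis is ours for illustration.

```python
def split(box, proportion, horizontal=True):
    """Split a (minx, maxx, miny, maxy, mint, maxt) tuple in two."""
    minx, maxx, miny, maxy, mint, maxt = box
    if horizontal:
        # Cut along the x axis at the given proportion of the width.
        cut = minx + proportion * (maxx - minx)
        return ((minx, cut, miny, maxy, mint, maxt),
                (cut, maxx, miny, maxy, mint, maxt))
    # Otherwise cut along the y axis.
    cut = miny + proportion * (maxy - miny)
    return ((minx, maxx, miny, cut, mint, maxt),
            (minx, maxx, cut, maxy, mint, maxt))


left, right = split((0, 10, 0, 4, 0, 1), proportion=0.3)
print(left[1], right[0])  # 3.0 3.0
```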
- __delattr__(name)¶
Implement delattr(self, name).
- __eq__(other)¶
Return self==value.
- __hash__()¶
Return hash(self).
- __init__(minx, maxx, miny, maxy, mint, maxt)¶
- __repr__()¶
Return repr(self).
- __setattr__(name, value)¶
Implement setattr(self, name, value).
- __weakref__¶
list of weak references to the object (if defined)
Collation Functions¶
- torchgeo.datasets.stack_samples(samples)[source]¶
Stack a list of samples along a new axis.
Useful for forming a mini-batch of samples to pass to torch.utils.data.DataLoader.
- Parameters:
samples (Iterable[dict[Any, Any]]) – list of samples
- Returns:
a single sample
- Return type:
dict[Any, Any]
New in version 0.2.
- torchgeo.datasets.concat_samples(samples)[source]¶
Concatenate a list of samples along an existing axis.
Useful for joining samples in a torchgeo.datasets.IntersectionDataset.
- Parameters:
samples (Iterable[dict[Any, Any]]) – list of samples
- Returns:
a single sample
- Return type:
dict[Any, Any]
New in version 0.2.
- torchgeo.datasets.merge_samples(samples)[source]¶
Merge a list of samples.
Useful for joining samples in a torchgeo.datasets.UnionDataset.
- Parameters:
samples (Iterable[dict[Any, Any]]) – list of samples
- Returns:
a single sample
- Return type:
dict[Any, Any]
New in version 0.2.
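The difference between the three collation behaviors can be sketched with plain dicts and lists in place of tensors; these are hypothetical minimal versions for illustration, not torchgeo's implementations.

```python
def stack(samples):
    # New axis: each key maps to the list of per-sample values.
    return {key: [s[key] for s in samples] for key in samples[0]}


def concat(samples):
    # Existing axis: per-key values (lists here) are concatenated.
    out = {}
    for s in samples:
        for key, value in s.items():
            out[key] = out.get(key, []) + value
    return out


def merge(samples):
    # Later samples overwrite earlier ones key by key.
    out = {}
    for s in samples:
        out.update(s)
    return out


a = {"image": [1, 2]}
b = {"image": [3, 4]}
print(stack([a, b])["image"])   # [[1, 2], [3, 4]]
print(concat([a, b])["image"])  # [1, 2, 3, 4]
print(merge([a, b])["image"])   # [3, 4]
```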
- torchgeo.datasets.unbind_samples(sample)[source]¶
Reverse of stack_samples().
Useful for turning a mini-batch of samples into a list of samples. These individual samples can then be plotted using a dataset’s plot method.
- Parameters:
sample (dict[Any, collections.abc.Sequence[Any]]) – a mini-batch of samples
- Returns:
list of samples
- Return type:
list[dict[Any, Any]]
New in version 0.2.
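The inverse relationship with stacking can be sketched the same way; this is an illustrative stand-in operating on lists rather than tensors.

```python
def unbind(sample):
    # Turn a dict of equal-length sequences back into a list of dicts.
    n = len(next(iter(sample.values())))
    return [{key: values[i] for key, values in sample.items()}
            for i in range(n)]


batch = {"image": [[1, 2], [3, 4]], "label": [0, 1]}
samples = unbind(batch)
print(samples[0])  # {'image': [1, 2], 'label': 0}
```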
Splitting Functions¶
- torchgeo.datasets.random_bbox_assignment(dataset, lengths, generator=<torch._C.Generator object>)[source]¶
Split a GeoDataset by randomly assigning its index’s BoundingBoxes.
This function will go through each BoundingBox in the GeoDataset’s index and randomly assign it to new GeoDatasets.
- Parameters:
dataset (GeoDataset) – dataset to be split
lengths (Sequence[float]) – lengths or fractions of splits to be produced
generator (Optional[Generator]) – generator used for the random permutation
- Returns:
A list of the subset datasets.
New in version 0.5.
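The assignment strategy can be sketched in plain Python. The function below is a hypothetical simplification that shuffles opaque items (standing in for BoundingBoxes) and slices them by fraction.

```python
import random


def random_bbox_assignment(boxes, fractions, seed=0):
    """Randomly assign items to splits according to fractions."""
    rng = random.Random(seed)
    shuffled = list(boxes)
    rng.shuffle(shuffled)
    splits, start = [], 0
    for fraction in fractions:
        count = round(fraction * len(shuffled))
        splits.append(shuffled[start:start + count])
        start += count
    return splits


train, val = random_bbox_assignment(range(10), [0.8, 0.2])
print(len(train), len(val))  # 8 2
```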
- torchgeo.datasets.random_bbox_splitting(dataset, fractions, generator=<torch._C.Generator object>)[source]¶
Split a GeoDataset by randomly splitting its index’s BoundingBoxes.
This function will go through each BoundingBox in the GeoDataset’s index, split it in a random direction and assign the resulting BoundingBoxes to new GeoDatasets.
- Parameters:
dataset (GeoDataset) – dataset to be split
fractions (Sequence[float]) – fractions of splits to be produced
generator (Optional[Generator]) – generator used for the random permutation
- Returns:
A list of the subset datasets.
New in version 0.5.
- torchgeo.datasets.random_grid_cell_assignment(dataset, fractions, grid_size=6, generator=<torch._C.Generator object>)[source]¶
Overlay a grid on a GeoDataset and randomly assign cells to new GeoDatasets.
This function will go through each BoundingBox in the GeoDataset’s index, overlay a grid over it, and randomly assign each cell to new GeoDatasets.
- Parameters:
dataset (GeoDataset) – dataset to be split
fractions (Sequence[float]) – fractions of splits to be produced
grid_size (int) – number of rows and columns for the grid
generator (Optional[Generator]) – generator used for the random permutation
- Returns:
A list of the subset datasets.
New in version 0.5.
- torchgeo.datasets.roi_split(dataset, rois)[source]¶
Split a GeoDataset by intersecting it with an ROI for each desired new GeoDataset.
- Parameters:
dataset (GeoDataset) – dataset to be split
rois (Sequence[BoundingBox]) – regions of interest of splits to be produced
- Returns:
A list of the subset datasets.
New in version 0.5.
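The behavior of roi_split can likewise be sketched with 1-D extents; this is a hypothetical simplification, whereas torchgeo intersects full BoundingBoxes.

```python
def roi_split(boxes, rois):
    """Keep, per ROI, the (lo, hi) extents that overlap it."""
    def intersects(a, b):
        return a[0] < b[1] and a[1] > b[0]
    return [[box for box in boxes if intersects(box, roi)] for roi in rois]


boxes = [(0, 2), (3, 5), (6, 8)]
splits = roi_split(boxes, rois=[(0, 4), (5, 9)])
print([len(split) for split in splits])  # [2, 1]
```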