torchgeo.datasets¶
In torchgeo, we define two types of datasets: Geospatial Datasets and Non-geospatial Datasets. These abstract base classes are documented in more detail in Base Classes.
Geospatial Datasets¶
GeoDataset is designed for datasets that contain geospatial information, like latitude, longitude, coordinate system, and projection. Datasets containing this kind of information can be combined using ZipDataset.
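ZipDataset can only pair samples where its member datasets overlap in space (and time). The core idea can be sketched without torchgeo; `BBox`, `intersects`, and `intersection` below are illustrative stand-ins for torchgeo's bounding-box machinery, not its actual API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Minimal stand-in for a spatial bounding box (hypothetical, for illustration)."""
    minx: float
    maxx: float
    miny: float
    maxy: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap in both spatial dimensions."""
    return a.minx < b.maxx and a.maxx > b.minx and a.miny < b.maxy and a.maxy > b.miny


def intersection(a: BBox, b: BBox) -> BBox:
    """Spatial intersection of two overlapping boxes."""
    return BBox(max(a.minx, b.minx), min(a.maxx, b.maxx),
                max(a.miny, b.miny), min(a.maxy, b.maxy))


# An imagery tile and a label tile only make a usable training pair where they overlap:
imagery = BBox(0.0, 10.0, 0.0, 10.0)
labels = BBox(5.0, 15.0, 5.0, 15.0)
print(intersection(imagery, labels))  # BBox(minx=5.0, maxx=10.0, miny=5.0, maxy=10.0)
```

This is also why both datasets must share a coordinate reference system before they can be combined: the box arithmetic is only meaningful when the coordinates are comparable.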
Canadian Building Footprints¶
- class torchgeo.datasets.CanadianBuildingFootprints(root='data', crs=None, res=1e-05, transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VectorDataset
Canadian Building Footprints dataset.
The Canadian Building Footprints dataset contains 11,842,186 computer-generated building footprints in all Canadian provinces and territories in GeoJSON format. This data is freely available for download and use.
- __init__(root='data', crs=None, res=1e-05, transforms=None, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float) – resolution of the dataset in units of CRS
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
FileNotFoundError – if no files are found in root
RuntimeError – if download=False and data is not found, or checksum=True and checksums don’t match
- Return type
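The default res=1e-05 is expressed in units of the CRS. Assuming a geographic CRS in decimal degrees (an assumption; the actual CRS defaults to that of the first file found), a rough conversion shows this is on the order of one meter on the ground:

```python
import math

# Back-of-the-envelope conversion from degrees to meters; the circumference
# value is approximate and the CRS being in degrees is an assumption.
EARTH_CIRCUMFERENCE_M = 40_075_000  # approximate, along the equator


def degrees_to_meters(deg: float, latitude: float = 0.0) -> float:
    """Approximate ground distance of `deg` degrees of longitude at `latitude`."""
    meters_per_degree = EARTH_CIRCUMFERENCE_M / 360.0
    return deg * meters_per_degree * math.cos(math.radians(latitude))


print(round(degrees_to_meters(1e-05), 2))  # 1.11 (meters, at the equator)
```

At higher latitudes the east-west ground distance shrinks with cos(latitude), so the same angular resolution covers less ground across much of Canada.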
Chesapeake Bay High-Resolution Land Cover Project¶
- class torchgeo.datasets.Chesapeake(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.RasterDataset, abc.ABC
Abstract base class for all Chesapeake datasets.
Chesapeake Bay High-Resolution Land Cover Project dataset.
This dataset was collected by the Chesapeake Conservancy’s Conservation Innovation Center (CIC) in partnership with the University of Vermont and WorldView Solutions, Inc. It consists of one-meter resolution land cover information for the Chesapeake Bay watershed (~100,000 square miles of land).
For more information, see:
- __init__(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
FileNotFoundError – if no files are found in root
RuntimeError – if download=False but dataset is missing or checksum fails
- Return type
- class torchgeo.datasets.Chesapeake7(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
Complete 7-class dataset.
This version of the dataset is composed of 7 classes:
No Data: Background values
Water: All areas of open water including ponds, rivers, and lakes
Tree Canopy and Shrubs: All woody vegetation including trees and shrubs
Low Vegetation: Plant material less than 2 meters in height including lawns
Barren: Areas devoid of vegetation consisting of natural earthen material
Impervious Surfaces: Human-constructed surfaces less than 2 meters in height
Impervious Roads: Impervious surfaces that are used for transportation
Aberdeen Proving Ground: U.S. Army facility with no labels
- class torchgeo.datasets.Chesapeake13(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
Complete 13-class dataset.
This version of the dataset is composed of 13 classes:
No Data: Background values
Water: All areas of open water including ponds, rivers, and lakes
Wetlands: Low vegetation areas located along marine or estuarine regions
Tree Canopy: Deciduous and evergreen woody vegetation over 3-5 meters in height
Shrubland: Heterogeneous woody vegetation including shrubs and young trees
Low Vegetation: Plant material less than 2 meters in height including lawns
Barren: Areas devoid of vegetation consisting of natural earthen material
Structures: Human-constructed objects made of impervious materials
Impervious Surfaces: Human-constructed surfaces less than 2 meters in height
Impervious Roads: Impervious surfaces that are used for transportation
Tree Canopy over Structures: Tree cover overlapping impervious structures
Tree Canopy over Impervious Surfaces: Tree cover overlapping impervious surfaces
Tree Canopy over Impervious Roads: Tree cover overlapping impervious roads
Aberdeen Proving Ground: U.S. Army facility with no labels
- class torchgeo.datasets.ChesapeakeDC(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
This subset of the dataset contains data only for Washington, D.C.
- class torchgeo.datasets.ChesapeakeDE(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
This subset of the dataset contains data only for Delaware.
- class torchgeo.datasets.ChesapeakeMD(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
This subset of the dataset contains data only for Maryland.
- class torchgeo.datasets.ChesapeakeNY(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
This subset of the dataset contains data only for New York.
- class torchgeo.datasets.ChesapeakePA(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
This subset of the dataset contains data only for Pennsylvania.
- class torchgeo.datasets.ChesapeakeVA(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
This subset of the dataset contains data only for Virginia.
- class torchgeo.datasets.ChesapeakeWV(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.Chesapeake
This subset of the dataset contains data only for West Virginia.
- class torchgeo.datasets.ChesapeakeCVPR(root='data', splits=['de-train'], layers=['naip-new', 'lc'], transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.GeoDataset
CVPR 2019 Chesapeake Land Cover dataset.
The CVPR 2019 Chesapeake Land Cover dataset contains two layers of NAIP aerial imagery, Landsat 8 leaf-on and leaf-off imagery, Chesapeake Bay land cover labels, NLCD land cover labels, and Microsoft building footprint labels.
This dataset was organized to accompany the 2019 CVPR paper, “Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data”.
If you use this dataset in your research, please cite the following paper:
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters
query (torchgeo.datasets.BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns
sample of image/mask and metadata at that index
- Raises
IndexError – if query is not found in the index
- Return type
Dict[str, Any]
- __init__(root='data', splits=['de-train'], layers=['naip-new', 'lc'], transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
splits (Sequence[str]) – a list of strings in the format “{state}-{train,val,test}” indicating the subset of data to use, for example “ny-train”
layers (List[str]) – a list containing a subset of “naip-new”, “naip-old”, “lc”, “nlcd”, “landsat-leaf-on”, “landsat-leaf-off”, “buildings” indicating which layers to load
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
FileNotFoundError – if no files are found in root
RuntimeError – if download=False but dataset is missing or checksum fails
- Return type
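The splits argument follows a "{state}-{train,val,test}" naming convention. The validator and the state list below are hypothetical illustrations of that convention, not torchgeo code:

```python
# Assumed state abbreviations for the Chesapeake CVPR dataset; treat this
# list as illustrative rather than authoritative.
VALID_STATES = {"de", "md", "ny", "pa", "va", "wv"}
VALID_PHASES = {"train", "val", "test"}


def is_valid_split(name: str) -> bool:
    """Check that a split name matches the '{state}-{train,val,test}' format."""
    state, sep, phase = name.partition("-")
    return sep == "-" and state in VALID_STATES and phase in VALID_PHASES


print(is_valid_split("ny-train"))  # True
print(is_valid_split("ny_train"))  # False (wrong separator)
```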
- class torchgeo.datasets.ChesapeakeCVPRDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the Chesapeake CVPR Land Cover dataset.
Uses the random splits defined per state to partition tiles into train, val, and test sets.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, train_splits, val_splits, test_splits, patches_per_tile=200, patch_size=256, batch_size=64, num_workers=0, class_set=7, **kwargs)[source]¶
Initialize a LightningDataModule for Chesapeake CVPR based DataLoaders.
- Parameters
root_dir (str) – The root argument to pass to the ChesapeakeCVPR Dataset classes
train_splits (List[str]) – The splits used to train the model, e.g. [“ny-train”]
val_splits (List[str]) – The splits used to validate the model, e.g. [“ny-val”]
test_splits (List[str]) – The splits used to test the model, e.g. [“ny-test”]
patches_per_tile (int) – The number of patches per tile to sample
patch_size (int) – The size of each patch in pixels (test patches will be 1.5 times this size)
batch_size (int) – The batch size to use in all created DataLoaders
num_workers (int) – The number of workers to use in all created DataLoaders
class_set (int) – The high-resolution land cover class set to use - 5 or 7
kwargs (Any) –
- Return type
- center_crop(size=512)[source]¶
Returns a function to perform a center crop transform on a single sample.
- Parameters
size (int) – output image size
- Returns
function to perform center crop
- Return type
Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]
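The center-crop arithmetic can be sketched in plain Python; the real transform operates on torch.Tensor samples with "image" and "mask" keys, so `center_crop_2d` below is an illustrative stand-in:

```python
def center_crop_2d(grid, size):
    """Crop a 2D nested list to `size` x `size` around its center."""
    h, w = len(grid), len(grid[0])
    top = (h - size) // 2
    left = (w - size) // 2
    return [row[left:left + size] for row in grid[top:top + size]]


# A 4x4 grid numbered 0..15; a size-2 center crop keeps the middle block.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
print(center_crop_2d(grid, 2))  # [[5, 6], [9, 10]]
```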
- nodata_check(size=512)[source]¶
Returns a function to check for nodata or mis-sized input.
- Parameters
size (int) – output image size
- Returns
function to check for nodata values
- Return type
Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]
- pad_to(size=512, image_value=0, mask_value=0)[source]¶
Returns a function to perform a padding transform on a single sample.
- Parameters
size (int) – output image size
image_value (int) – value to pad image with
mask_value (int) – value to pad mask with
- Returns
function to perform padding
- Return type
Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]
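The padding idea can be sketched the same way; `pad_to_2d` below is a one-sided, plain-Python illustration (the real transform pads tensors, filling images with image_value and masks with mask_value):

```python
def pad_to_2d(grid, size, value=0):
    """Pad a 2D nested list with `value` on the right/bottom until `size` x `size`."""
    h, w = len(grid), len(grid[0])
    padded = [row + [value] * (size - w) for row in grid]
    padded += [[value] * size for _ in range(size - h)]
    return padded


print(pad_to_2d([[1, 2], [3, 4]], 3))  # [[1, 2, 0], [3, 4, 0], [0, 0, 0]]
```

Padding and nodata checking work together: samples that fall on a tile edge are padded up to the expected patch size rather than discarded.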
- prepare_data()[source]¶
Confirms that the dataset is downloaded on the local node.
This method is called once per node, while setup() is called once per GPU.
- Return type
- setup(stage=None)[source]¶
Create the train/val/test splits based on the original Dataset objects.
The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
Cropland Data Layer (CDL)¶
- class torchgeo.datasets.CDL(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)¶
Bases:
torchgeo.datasets.RasterDataset
Cropland Data Layer (CDL) dataset.
The Cropland Data Layer, hosted on CropScape, provides a raster, geo-referenced, crop-specific land cover map for the continental United States. The CDL also includes a crop mask layer and planting frequency layers, as well as boundary, water and road layers. The Boundary Layer options provided are County, Agricultural Statistics Districts (ASD), State, and Region. The data is created annually using moderate resolution satellite imagery and extensive agricultural ground truth.
If you use this dataset in your research, please cite it using the following format:
- __init__(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)[source]¶
Initialize a new Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)
- Raises
FileNotFoundError – if no files are found in root
RuntimeError – if download=False but dataset is missing or checksum fails
- Return type
Landsat¶
- class torchgeo.datasets.Landsat(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.RasterDataset, abc.ABC
Abstract base class for all Landsat datasets.
Landsat is a joint NASA/USGS program, providing the longest continuous space-based record of Earth’s land in existence.
If you use this dataset in your research, please cite it using the following format:
- __init__(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises
FileNotFoundError – if no files are found in root
- Return type
- class torchgeo.datasets.Landsat9(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat8
Landsat 9 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS).
- class torchgeo.datasets.Landsat8(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat
Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS).
- class torchgeo.datasets.Landsat7(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat
Landsat 7 Enhanced Thematic Mapper Plus (ETM+).
- class torchgeo.datasets.Landsat5TM(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat4TM
Landsat 5 Thematic Mapper (TM).
- class torchgeo.datasets.Landsat5MSS(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat4MSS
Landsat 5 Multispectral Scanner (MSS).
- class torchgeo.datasets.Landsat4TM(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat
Landsat 4 Thematic Mapper (TM).
- class torchgeo.datasets.Landsat4MSS(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat
Landsat 4 Multispectral Scanner (MSS).
- class torchgeo.datasets.Landsat3(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat1
Landsat 3 Multispectral Scanner (MSS).
- class torchgeo.datasets.Landsat2(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat1
Landsat 2 Multispectral Scanner (MSS).
- class torchgeo.datasets.Landsat1(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Landsat
Landsat 1 Multispectral Scanner (MSS).
National Agriculture Imagery Program (NAIP)¶
- class torchgeo.datasets.NAIP(root, crs=None, res=None, transforms=None, cache=True)¶
Bases:
torchgeo.datasets.RasterDataset
National Agriculture Imagery Program (NAIP) dataset.
The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. A primary goal of the NAIP program is to make digital ortho photography available to governmental agencies and the public within a year of acquisition.
NAIP is administered by the USDA’s Farm Service Agency (FSA) through the Aerial Photography Field Office in Salt Lake City. This “leaf-on” imagery is used as a base layer for GIS programs in FSA’s County Service Centers, and is used to maintain the Common Land Unit (CLU) boundaries.
If you use this dataset in your research, please cite it using the following format:
- class torchgeo.datasets.NAIPChesapeakeDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the NAIP and Chesapeake datasets.
Uses the train/val/test splits from the dataset.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(naip_root_dir, chesapeake_root_dir, batch_size=64, num_workers=0, patch_size=256, **kwargs)[source]¶
Initialize a LightningDataModule for NAIP and Chesapeake based DataLoaders.
- Parameters
naip_root_dir (str) – directory containing NAIP data
chesapeake_root_dir (str) – directory containing Chesapeake data
batch_size (int) – The batch size to use in all created DataLoaders
num_workers (int) – The number of workers to use in all created DataLoaders
patch_size (int) – size of patches to sample
kwargs (Any) –
- Return type
- prepare_data()[source]¶
Make sure that the dataset is downloaded.
This method is only called once per run.
- Return type
- setup(stage=None)[source]¶
Initialize the main Dataset objects.
This method is called once per GPU per run.
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
Sentinel¶
- class torchgeo.datasets.Sentinel(root, crs=None, res=None, transforms=None, cache=True)¶
Bases:
torchgeo.datasets.RasterDataset
Abstract base class for all Sentinel datasets.
Sentinel is a family of satellites launched by the European Space Agency (ESA) under the Copernicus Programme.
If you use this dataset in your research, please cite it using the following format:
- class torchgeo.datasets.Sentinel2(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)¶
Bases:
torchgeo.datasets.Sentinel
Sentinel-2 dataset.
The Copernicus Sentinel-2 mission comprises a constellation of two polar-orbiting satellites placed in the same sun-synchronous orbit, phased at 180° to each other. It aims at monitoring variability in land surface conditions, and its wide swath width (290 km) and high revisit time (10 days at the equator with one satellite, and 5 days with 2 satellites under cloud-free conditions which results in 2-3 days at mid-latitudes) will support monitoring of Earth’s surface changes.
- __init__(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises
FileNotFoundError – if no files are found in root
- Return type
Non-geospatial Datasets¶
VisionDataset is designed for datasets that lack geospatial information. These datasets can still be combined using ConcatDataset.
ADVANCE (AuDio Visual Aerial sceNe reCognition datasEt)¶
- class torchgeo.datasets.ADVANCE(root='data', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
ADVANCE dataset.
The ADVANCE dataset is a dataset for audio visual scene recognition.
Dataset features:
5,075 pairs of geotagged audio recordings and images
three spectral bands - RGB (512x512 px)
10-second audio recordings
Dataset format:
images are three-channel jpgs
audio files are in wav format
Dataset classes:
airport
beach
bridge
farmland
forest
grassland
harbour
lake
orchard
residential
sparse shrub land
sports land
train station
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
scipy to load the audio files to tensors
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', transforms=None, download=False, checksum=False)[source]¶
Initialize a new ADVANCE dataset instance.
- Parameters
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
Smallholder Cashew Plantations in Benin¶
- class torchgeo.datasets.BeninSmallHolderCashews(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)¶
Bases:
torchgeo.datasets.VisionDataset
Smallholder Cashew Plantations in Benin dataset.
This dataset contains labels for cashew plantations in a 120 km2 area in the center of Benin. Each pixel is classified as Well-managed plantation, Poorly-managed plantation, No plantation, or another class. The labels are generated using a combination of ground data collection with a handheld GPS device and final corrections based on Airbus Pléiades imagery. See this website for dataset details.
Specifically, the data consists of Sentinel-2 imagery from a 120 km2 area in the center of Benin over 71 points in time from 11/05/2019 to 10/30/2020 and polygon labels for 6 classes (plus a no-data value):
No data
Well-managed plantation
Poorly-managed plantation
Non-plantation
Residential
Background
Uncertain
If you use this dataset in your research, please cite the following:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
image, mask, and metadata at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]¶
Initialize a new Benin Smallholder Cashew Plantations Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
chip_size (int) – size of chips
stride (int) – spacing between chips, if less than chip_size, then there will be overlap between chips
bands (Tuple[str, ...]) – the subset of bands to load
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
verbose (bool) – if True, print messages when new tiles are loaded
- Raises
RuntimeError – if download=False but dataset is missing or checksum fails
- Return type
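With chip_size=256 and stride=128, consecutive chips overlap by half. Standard sliding-window arithmetic gives the number of chips per tile; the 1024-pixel tile size below is only an example, not a property of this dataset:

```python
def chips_per_axis(tile_size: int, chip_size: int, stride: int) -> int:
    """Number of chip positions along one axis of a tile (sliding window)."""
    return (tile_size - chip_size) // stride + 1


# Example: a hypothetical 1024 px tile with the default chip_size/stride.
per_axis = chips_per_axis(1024, 256, 128)
print(per_axis, per_axis * per_axis)  # 7 49
```

Setting stride equal to chip_size would make the chips non-overlapping; a smaller stride trades more samples for redundancy between neighbors.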
BigEarthNet¶
- class torchgeo.datasets.BigEarthNet(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
BigEarthNet dataset.
The BigEarthNet dataset is a dataset for multilabel remote sensing image scene classification.
Dataset features:
590,326 patches from 125 Sentinel-1 and Sentinel-2 tiles
Imagery from tiles in Europe between Jun 2017 - May 2018
12 spectral bands with 10-60 m per pixel resolution (base 120x120 px)
2 synthetic aperture radar bands (120x120 px)
43 or 19 scene classes from the 2018 CORINE Land Cover database (CLC 2018)
Dataset format:
images are composed of multiple single channel geotiffs
labels are multiclass, stored in a single json file per image
mapping of Sentinel-1 to Sentinel-2 patches is stored in the Sentinel-1 json files
Sentinel-1 bands: (VV, VH)
Sentinel-2 bands: (B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)
All bands: (VV, VH, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)
Sentinel-2 bands are of different spatial resolutions and upsampled to 10m
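The band orderings above imply a simple piece of index bookkeeping for the bands argument; the helper below is an illustrative sketch, not torchgeo's implementation:

```python
# Band names as listed above: Sentinel-1 (SAR) first, then Sentinel-2.
S1 = ["VV", "VH"]
S2 = ["B01", "B02", "B03", "B04", "B05", "B06",
      "B07", "B08", "B8A", "B09", "B11", "B12"]
ALL = S1 + S2


def band_indices(bands: str):
    """Indices into the 'all' ordering for the {s1, s2, all} choices."""
    chosen = {"s1": S1, "s2": S2, "all": ALL}[bands]
    return [ALL.index(b) for b in chosen]


print(band_indices("s1"))  # [0, 1]
print(band_indices("s2"))  # [2, 3, ..., 13]
```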
Dataset classes (43):
Agro-forestry areas
Airports
Annual crops associated with permanent crops
Bare rock
Beaches, dunes, sands
Broad-leaved forest
Burnt areas
Coastal lagoons
Complex cultivation patterns
Coniferous forest
Construction sites
Continuous urban fabric
Discontinuous urban fabric
Dump sites
Estuaries
Fruit trees and berry plantations
Green urban areas
Industrial or commercial units
Inland marshes
Intertidal flats
Land principally occupied by agriculture, with significant areas of natural vegetation
Mineral extraction sites
Mixed forest
Moors and heathland
Natural grassland
Non-irrigated arable land
Olive groves
Pastures
Peatbogs
Permanently irrigated land
Port areas
Rice fields
Road and rail networks and associated land
Salines
Salt marshes
Sclerophyllous vegetation
Sea and ocean
Sparsely vegetated areas
Sport and leisure facilities
Transitional woodland/shrub
Vineyards
Water bodies
Water courses
Dataset classes (19):
Urban fabric
Industrial or commercial units
Arable land
Permanent crops
Pastures
Complex cultivation patterns
Land principally occupied by agriculture, with significant areas of natural vegetation
Agro-forestry areas
Broad-leaved forest
Coniferous forest
Mixed forest
Natural grassland and sparsely vegetated areas
Moors, heathland and sclerophyllous vegetation
Transitional woodland, shrub
Beaches, dunes, sands
Inland wetlands
Coastal wetlands
Inland waters
Marine waters
If you use this dataset in your research, please cite the following paper:
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)[source]¶
Initialize a new BigEarthNet dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – train/val/test split to load
bands (str) – load Sentinel-1 bands, Sentinel-2, or both. one of {s1, s2, all}
num_classes (int) – number of classes to load in target. one of {19, 43}
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Return type
- class torchgeo.datasets.BigEarthNetDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the BigEarthNet dataset.
Uses the train/val/test splits from the dataset.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, bands='all', num_classes=19, batch_size=64, num_workers=0, **kwargs)[source]¶
Initialize a LightningDataModule for BigEarthNet based DataLoaders.
- Parameters
root_dir (str) – The root argument to pass to the BigEarthNet Dataset classes
bands (str) – load Sentinel-1 bands, Sentinel-2, or both. one of {s1, s2, all}
num_classes (int) – number of classes to load in target. one of {19, 43}
batch_size (int) – The batch size to use in all created DataLoaders
num_workers (int) – The number of workers to use in all created DataLoaders
kwargs (Any) –
- Return type
- prepare_data()[source]¶
Make sure that the dataset is downloaded.
This method is only called once per run.
- Return type
- setup(stage=None)[source]¶
Initialize the main Dataset objects.
This method is called once per GPU per run.
Cars Overhead With Context (COWC)¶
- class torchgeo.datasets.COWC(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset, abc.ABC
Abstract base class for the COWC dataset.
The Cars Overhead With Context (COWC) dataset is a large set of annotated cars from overhead imagery. It is useful for training a device such as a deep neural network to learn to detect and/or count cars.
The dataset has the following attributes:
Data from overhead at 15 cm per pixel resolution at ground (all data is EO).
Data from six distinct locations: Toronto, Canada; Selwyn, New Zealand; Potsdam and Vaihingen, Germany; Columbus, Ohio and Utah, United States.
32,716 unique annotated cars. 58,247 unique negative examples.
Intentional selection of hard negative examples.
Established baseline for detection and counting tasks.
Extra testing scenes for use after validation.
If you use this dataset in your research, please cite the following paper:
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new COWC dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
- class torchgeo.datasets.COWCCounting(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.COWC
COWC Dataset for car counting.
- class torchgeo.datasets.COWCDetection(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.COWC
COWC Dataset for car detection.
- class torchgeo.datasets.COWCCountingDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the COWC Counting dataset.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, seed, batch_size=64, num_workers=0, **kwargs)[source]¶
Initialize a LightningDataModule for COWC Counting based DataLoaders.
- Parameters
root_dir (str) – The root argument to pass to the COWCCounting Dataset class
seed (int) – The seed value to use when doing the dataset random_split
batch_size (int) – The batch size to use in all created DataLoaders
num_workers (int) – The number of workers to use in all created DataLoaders
kwargs (Any) –
- Return type
- prepare_data()[source]¶
Initialize the main Dataset objects for use in setup().
This includes optionally downloading the dataset. This is done once per node, while setup() is done once per GPU.
- Return type
- setup(stage=None)[source]¶
Create the train/val/test splits based on the original Dataset objects.
The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
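The seed parameter above makes the dataset random_split deterministic. A sketch of what a seeded index split looks like; the 80/20 proportion here is an illustrative assumption, not a value documented for this DataModule:

```python
import random

def seeded_split(n, seed, train_frac=0.8):
    """Deterministically split indices 0..n-1 into train/val lists.

    A sketch of a seeded random split; the 80/20 proportion is an
    illustrative assumption, not the library's documented value.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_frac)
    return indices[:cut], indices[cut:]

train_idx, val_idx = seeded_split(100, seed=0)
print(len(train_idx), len(val_idx))  # 80 20
```

Reusing the same seed reproduces the same split across runs, which is what makes experiments with this DataModule comparable.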
CV4A Kenya Crop Type Competition¶
- class torchgeo.datasets.CV4AKenyaCropType(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)¶
Bases:
torchgeo.datasets.VisionDataset
CV4A Kenya Crop Type dataset.
Used in a competition at the Computer Vision for Agriculture (CV4A) workshop at ICLR 2020. See this website for dataset details.
Consists of 4 tiles of Sentinel 2 imagery from 13 different points in time.
Each tile has:
13 multi-band observations throughout the growing season. Each observation includes 12 bands from Sentinel-2 L2A product, and a cloud probability layer. The twelve bands are [B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12] (refer to Sentinel-2 documentation for more information about the bands). The cloud probability layer is a product of the Sentinel-2 atmospheric correction algorithm (Sen2Cor) and provides an estimated cloud probability (0-100%) per pixel. All of the bands are mapped to a common 10 m spatial resolution grid.
A raster layer indicating the crop ID for the fields in the training set.
A raster layer indicating field IDs for the fields (both training and test sets). Fields with a crop ID 0 are the test fields.
There are 3,286 fields in the train set and 1,402 fields in the test set.
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data, labels, field ids, and metadata at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)[source]¶
Initialize a new CV4A Kenya Crop Type Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
chip_size (int) – size of chips
stride (int) – spacing between chips, if less than chip_size, then there will be overlap between chips
bands (Tuple[str, ...]) – the subset of bands to load
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
verbose (bool) – if True, print messages when new tiles are loaded
- Raises
RuntimeError – if download=False but dataset is missing or checksum fails
- Return type
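The chip_size and stride parameters above control how tiles are cut into chips: with a stride smaller than chip_size, consecutive chips overlap. A sketch of how many full chips fit along one axis of a tile (assuming chips are only taken where they fit entirely inside the tile, with no padding; this is an illustration, not the exact torchgeo implementation):

```python
def chips_per_axis(extent, chip_size=256, stride=128):
    """Number of full chips that fit along one axis of a tile.

    Assumes no padding: chips are counted only where they fit entirely
    inside the tile. Illustrative sketch, not torchgeo's exact logic.
    """
    if extent < chip_size:
        return 0
    return (extent - chip_size) // stride + 1

# With the defaults (chip_size=256, stride=128), consecutive chips
# overlap by 128 px, as the stride description above states.
print(chips_per_axis(1024))  # 7
```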
ETCI2021 Flood Detection¶
- class torchgeo.datasets.ETCI2021(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
ETCI 2021 Flood Detection dataset.
The ETCI2021 dataset is a dataset for flood detection.
Dataset features:
33,405 VV & VH Sentinel-1 Synthetic Aperture Radar (SAR) images
2 binary masks per image representing water body & flood, respectively
2 polarization band images (VV, VH), with 3 RGB channels per band
3 RGB channels per band generated by the Hybrid Pluggable Processing Pipeline (hyp3)
Images with 5x20 m per pixel resolution (256x256 px) taken in Interferometric Wide Swath acquisition mode
Flood events from 5 different regions
Dataset format:
VV band three-channel png
VH band three-channel png
water body mask single-channel png where no water body = 0, water body = 255
flood mask single-channel png where no flood = 0, flood = 255
Dataset classes:
no flood/water
flood/water
If you use this dataset in your research, please add the following to your acknowledgements section:
The authors would like to thank the NASA Earth Science Data Systems Program, NASA Digital Transformation AI/ML thrust, and IEEE GRSS for organizing the ETCI competition.
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new ETCI 2021 dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
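The water body and flood masks above encode classes as pixel values 0 and 255. For training, these are typically mapped to class indices {0, 1}; a minimal sketch of that mapping (with torchgeo the values would be tensor elements, but the mapping is identical):

```python
def binarize_mask(values):
    """Map ETCI-style mask pixel values {0, 255} to class indices {0, 1}."""
    return [1 if v == 255 else 0 for v in values]

print(binarize_mask([0, 255, 255, 0]))  # [0, 1, 1, 0]
```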
EuroSAT¶
- class torchgeo.datasets.EuroSAT(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionClassificationDataset
EuroSAT dataset.
The EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consists of 10 target classes with a total of 27,000 labeled and geo-referenced images.
Dataset format:
rasters are 13-channel GeoTiffs
labels are values in the range [0,9]
Dataset classes:
Industrial Buildings
Residential Buildings
Annual Crop
Permanent Crop
River
Sea and Lake
Herbaceous Vegetation
Highway
Pasture
Forest
This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:
If you use this dataset in your research, please cite the following papers:
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new EuroSAT dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
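Labels are integers in [0, 9]; a lookup table turns them back into class names. The list below copies the order in which the classes are listed above; whether that order matches the dataset's actual integer labels is an assumption for illustration:

```python
# Hypothetical label-to-name lookup. The ordering below follows the
# class list in the docs above; the actual integer-label ordering used
# by the dataset is an assumption here.
EUROSAT_CLASSES = [
    "Industrial Buildings", "Residential Buildings", "Annual Crop",
    "Permanent Crop", "River", "Sea and Lake", "Herbaceous Vegetation",
    "Highway", "Pasture", "Forest",
]

def label_to_name(label):
    """Map an integer label in [0, 9] to a class name."""
    return EUROSAT_CLASSES[label]

print(label_to_name(4))  # River
```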
GID-15 (Gaofen Image Dataset)¶
- class torchgeo.datasets.GID15(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
GID-15 dataset.
The GID-15 dataset is a dataset for semantic segmentation.
Dataset features:
images taken by the Gaofen-2 (GF-2) satellite over 60 cities in China
masks representing 15 semantic categories
three spectral bands - RGB
150 images with 3 m per pixel resolution (6800x7200 px)
Dataset format:
images are three-channel pngs
masks are single-channel pngs
colormapped masks are 3 channel tifs
Dataset classes:
background
industrial_land
urban_residential
rural_residential
traffic_land
paddy_field
irrigated_land
dry_cropland
garden_plot
arbor_woodland
shrub_land
natural_grassland
artificial_grassland
river
lake
pond
If you use this dataset in your research, please cite the following paper:
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new GID-15 dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
LandCover.ai (Land Cover from Aerial Imagery)¶
- class torchgeo.datasets.LandCoverAI(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
LandCover.ai dataset.
The LandCover.ai (Land Cover from Aerial Imagery) dataset is a dataset for automatic mapping of buildings, woodlands, water and roads from aerial images. This implementation is specifically for Version 1 of Landcover.ai.
Dataset features:
land cover from Poland, Central Europe
three spectral bands - RGB
33 orthophotos with 25 cm per pixel resolution (~9000x9500 px)
8 orthophotos with 50 cm per pixel resolution (~4200x4700 px)
total area of 216.27 km2
Dataset format:
rasters are three-channel GeoTiffs with EPSG:2180 spatial reference system
masks are single-channel GeoTiffs with EPSG:2180 spatial reference system
Dataset classes:
building (1.85 km2)
woodland (72.02 km2)
water (13.15 km2)
road (3.5 km2)
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
opencv-python to generate the train/val/test split
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new LandCover.ai dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
- class torchgeo.datasets.LandCoverAIDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the LandCover.ai dataset.
Uses the train/val/test splits from the dataset.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]¶
Initialize a LightningDataModule for LandCover.ai based DataLoaders.
- prepare_data()[source]¶
Make sure that the dataset is downloaded.
This method is only called once per run.
- Return type
- setup(stage=None)[source]¶
Initialize the main Dataset objects.
This method is called once per GPU per run.
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
LEVIR-CD+ (LEVIR Change Detection +)¶
- class torchgeo.datasets.LEVIRCDPlus(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
LEVIR-CD+ dataset.
The LEVIR-CD+ dataset is a dataset for building change detection.
Dataset features:
image pairs of 20 different urban regions across Texas between 2002-2020
binary change masks representing building change
three spectral bands - RGB
985 image pairs with 50 cm per pixel resolution (~1024x1024 px)
Dataset format:
images are three-channel pngs
masks are single-channel pngs where no change = 0, change = 255
Dataset classes:
no change
change
If you use this dataset in your research, please cite the following paper:
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new LEVIR-CD+ dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
PatternNet¶
- class torchgeo.datasets.PatternNet(root='data', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionClassificationDataset
PatternNet dataset.
The PatternNet dataset is a dataset for remote sensing scene classification and image retrieval.
Dataset features:
30,400 images with 6-50 cm per pixel resolution (256x256 px)
three spectral bands - RGB
38 scene classes, 800 images per class
Dataset format:
images are three-channel jpgs
Dataset classes:
airplane
baseball_field
basketball_court
beach
bridge
cemetery
chaparral
christmas_tree_farm
closed_road
coastal_mansion
crosswalk
dense_residential
ferry_terminal
football_field
forest
freeway
golf_course
harbor
intersection
mobile_home_park
nursing_home
oil_gas_field
oil_well
overpass
parking_lot
parking_space
railway
river
runway
runway_marking
shipping_yard
solar_panel
sparse_residential
storage_tank
swimming_pool
tennis_court
transformer_station
wastewater_treatment_plant
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', transforms=None, download=False, checksum=False)[source]¶
Initialize a new PatternNet dataset instance.
- Parameters
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Return type
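With 38 classes and 800 images per class, a flat sample index can be mapped back to a class index by integer division, assuming samples are stored contiguously by class. That ordering is an assumption for illustration, not a documented guarantee of the dataset:

```python
IMAGES_PER_CLASS = 800  # per the dataset description above

def index_to_class(index):
    """Map a flat sample index to a class index.

    Assumes samples are stored contiguously by class (800 per class);
    that ordering is an illustrative assumption, not documented behavior.
    """
    return index // IMAGES_PER_CLASS

print(index_to_class(799), index_to_class(800))  # 0 1
```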
RESISC45 (Remote Sensing Image Scene Classification)¶
- class torchgeo.datasets.RESISC45(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionClassificationDataset
RESISC45 dataset.
The RESISC45 dataset is a dataset for remote sensing image scene classification.
Dataset features:
31,500 images with 0.2-30 m per pixel resolution (256x256 px)
three spectral bands - RGB
45 scene classes, 700 images per class
images extracted from Google Earth from over 100 countries
image conditions with high variability (resolution, weather, illumination)
Dataset format:
images are three-channel jpgs
Dataset classes:
airplane
airport
baseball_diamond
basketball_court
beach
bridge
chaparral
church
circular_farmland
cloud
commercial_area
dense_residential
desert
forest
freeway
golf_course
ground_track_field
harbor
industrial_area
intersection
island
lake
meadow
medium_residential
mobile_home_park
mountain
overpass
palace
parking_lot
railway
railway_station
rectangular_farmland
river
roundabout
runway
sea_ice
ship
snowberg
sparse_residential
stadium
storage_tank
tennis_court
terrace
thermal_power_station
wetland
This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new RESISC45 dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Return type
- class torchgeo.datasets.RESISC45DataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the RESISC45 dataset.
Uses the train/val/test splits from the dataset.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]¶
Initialize a LightningDataModule for RESISC45 based DataLoaders.
- prepare_data()[source]¶
Make sure that the dataset is downloaded.
This method is only called once per run.
- Return type
- setup(stage=None)[source]¶
Initialize the main Dataset objects.
This method is called once per GPU per run.
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
Seasonal Contrast¶
- class torchgeo.datasets.SeasonalContrastS2(root='data', version='100k', bands=['B4', 'B3', 'B2'], transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
Sentinel 2 imagery from the Seasonal Contrast paper.
The Seasonal Contrast imagery dataset contains Sentinel 2 imagery patches sampled from different points in time around the 10k most populated cities on Earth.
Dataset features:
Two versions: 100K and 1M patches
12 band Sentinel 2 imagery from 5 points in time at each location
If you use this dataset in your research, please cite the following paper:
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
- sample with an “image” in 5xCxHxW format, where the 5 indexes over the same patch sampled at different points in time by the SeCo method
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', version='100k', bands=['B4', 'B3', 'B2'], transforms=None, download=False, checksum=False)[source]¶
Initialize a new SeCo dataset instance.
- Parameters
root (str) – root directory where dataset can be found
version (str) – one of “100k” or “1m” for the version of the dataset to use
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
bands (List[str]) –
- Raises
AssertionError – if version argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
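Each sample's “image” stacks 5 temporal views of the same patch along the first dimension (5xCxHxW). Selecting one view is just indexing that dimension; sketched here with nested lists standing in for a tensor:

```python
def pick_temporal_view(image, t):
    """Select one of the 5 temporal views from a 5xCxHxW "image".

    Sketched with nested lists standing in for a tensor; with torchgeo
    the equivalent is indexing the first dimension of a torch.Tensor.
    """
    if not 0 <= t < len(image):
        raise IndexError("temporal index out of range")
    return image[t]

# A toy 5 x 1 x 2 x 2 stack: 5 timepoints, 1 channel, 2x2 pixels.
stack = [[[[t, t], [t, t]]] for t in range(5)]
print(pick_temporal_view(stack, 3))  # [[[3, 3], [3, 3]]]
```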
SEN12MS¶
- class torchgeo.datasets.SEN12MS(root='data', split='train', bands=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], transforms=None, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
SEN12MS dataset.
The SEN12MS dataset contains 180,662 patch triplets of corresponding Sentinel-1 dual-pol SAR data, Sentinel-2 multi-spectral images, and MODIS-derived land cover maps. The patches are distributed across the land masses of the Earth and spread over all four meteorological seasons. This is reflected by the dataset structure. All patches are provided in the form of 16-bit GeoTiffs containing the following specific information:
Sentinel-1 SAR: 2 channels corresponding to sigma nought backscatter values in dB scale for VV and VH polarization.
Sentinel-2 Multi-Spectral: 13 channels corresponding to the 13 spectral bands (B1, B2, B3, B4, B5, B6, B7, B8, B8a, B9, B10, B11, B12).
MODIS Land Cover: 4 channels corresponding to IGBP, LCCS Land Cover, LCCS Land Use, and LCCS Surface Hydrology layers.
If you use this dataset in your research, please cite the following paper:
Note
This dataset can be automatically downloaded using the following bash script:
for season in 1158_spring 1868_summer 1970_fall 2017_winter
do
    for source in lc s1 s2
    do
        wget "ftp://m1474000:m1474000@dataserv.ub.tum.de/ROIs${season}_${source}.tar.gz"
        tar xvzf "ROIs${season}_${source}.tar.gz"
    done
done

for split in train test
do
    wget "https://raw.githubusercontent.com/schmitt-muc/SEN12MS/master/splits/${split}_list.txt"
done
or manually downloaded from https://dataserv.ub.tum.de/s/m1474000 and https://github.com/schmitt-muc/SEN12MS/tree/master/splits. This download will likely take several hours.
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', split='train', bands=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], transforms=None, checksum=False)[source]¶
Initialize a new SEN12MS dataset instance.
The bands argument allows for the subsetting of bands returned by the dataset. Integers in bands index into a stack of Sentinel 1 and Sentinel 2 imagery. Indices 0 and 1 correspond to the Sentinel 1 imagery, while indices 2 through 14 correspond to the Sentinel 2 imagery.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
bands (List[int]) – a list of band indices to use where the indices correspond to the array index of combined Sentinel 1 and Sentinel 2
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if data is not found in root, or checksums don’t match
- Return type
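Based on the index layout described above (indices 0-1 for Sentinel 1, indices 2-14 for the 13 Sentinel 2 bands) and the band_set options documented for the DataModule below, a helper can translate a band set name into indices for the bands argument. The helper itself is an illustrative sketch, not a torchgeo API:

```python
# Combined S1+S2 band stack per the docs above: indices 0-1 are the
# Sentinel 1 channels (VV, VH); indices 2-14 are the 13 Sentinel 2
# bands B1..B12 (including B8a).
S2_BANDS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8",
            "B8a", "B9", "B10", "B11", "B12"]

def band_indices(band_set):
    """Translate a band_set name into indices for the `bands` argument.

    Mirrors the band_set options documented for SEN12MSDataModule
    ("all", "s1", "s2-all", "s2-reduced"); the helper is a sketch,
    not part of torchgeo.
    """
    if band_set == "s1":
        return [0, 1]
    if band_set == "s2-all":
        return list(range(2, 15))
    if band_set == "s2-reduced":
        reduced = ["B2", "B3", "B4", "B8", "B11", "B12"]
        return [2 + S2_BANDS.index(b) for b in reduced]
    return list(range(15))  # "all"

print(band_indices("s2-reduced"))  # [3, 4, 5, 9, 13, 14]
```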
- class torchgeo.datasets.SEN12MSDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the SEN12MS dataset.
Implements 80/20 geographic train/val splits and uses the test split from the classification dataset definitions. See setup() for more details.
Uses the Simplified IGBP scheme defined in the 2020 Data Fusion Competition. See https://arxiv.org/abs/2002.08254.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, seed, band_set='all', batch_size=64, num_workers=0, **kwargs)[source]¶
Initialize a LightningDataModule for SEN12MS based DataLoaders.
- Parameters
root_dir (str) – The root argument to pass to the SEN12MS Dataset classes
seed (int) – The seed value to use when doing the sklearn based ShuffleSplit
band_set (str) – The subset of S1/S2 bands to use. Options are: “all”, “s1”, “s2-all”, and “s2-reduced” where the “s2-reduced” set includes: B2, B3, B4, B8, B11, and B12.
batch_size (int) – The batch size to use in all created DataLoaders
num_workers (int) – The number of workers to use in all created DataLoaders
kwargs (Any) –
- Return type
- setup(stage=None)[source]¶
Create the train/val/test splits based on the original Dataset objects.
The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.
We split samples between train and val geographically with proportions of 80/20. This mimics the geographic test set split.
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
So2Sat¶
- class torchgeo.datasets.So2Sat(root='data', split='train', transforms=None, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
So2Sat dataset.
The So2Sat dataset consists of corresponding synthetic aperture radar and multispectral optical image data acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and a corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world, and comes with a split into fully independent, non-overlapping training, validation, and test sets.
This implementation focuses on the 2nd version of the dataset as described in the author’s github repository https://github.com/zhu-xlab/So2Sat-LCZ42 and hosted at https://mediatum.ub.tum.de/1483140. This version is identical to the first version of the dataset but includes the test data. The splits are defined as follows:
Training: 42 cities around the world
Validation: western half of 10 other cities covering 10 cultural zones
Testing: eastern half of the 10 other cities
If you use this dataset in your research, please cite the following paper:
Note
This dataset can be automatically downloaded using the following bash script:
for split in training validation testing
do
    wget ftp://m1483140:m1483140@dataserv.ub.tum.de/$split.h5
done
or manually downloaded from https://dataserv.ub.tum.de/index.php/s/m1483140. This download will likely take several hours.
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', split='train', transforms=None, checksum=False)[source]¶
Initialize a new So2Sat dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train”, “validation”, or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if data is not found in root, or checksums don’t match
- Return type
- class torchgeo.datasets.So2SatDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the So2Sat dataset.
Uses the train/val/test splits from the dataset.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, batch_size=64, num_workers=0, bands='rgb', unsupervised_mode=False, **kwargs)[source]¶
Initialize a LightningDataModule for So2Sat based DataLoaders.
- Parameters
root_dir (str) – The root argument to pass to the So2Sat Dataset classes
batch_size (int) – The batch size to use in all created DataLoaders
num_workers (int) – The number of workers to use in all created DataLoaders
bands (str) – Either “rgb” or “s2”
unsupervised_mode (bool) – Makes the train dataloader return imagery from the train, val, and test sets
kwargs (Any) –
- Return type
- prepare_data()[source]¶
Make sure that the dataset is downloaded.
This method is only called once per run.
- Return type
- setup(stage=None)[source]¶
Initialize the main Dataset objects.
This method is called once per GPU per run.
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
SpaceNet¶
- class torchgeo.datasets.SpaceNet(root, image, collections=[], transforms=None, download=False, api_key=None, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
,abc.ABC
Abstract base class for the SpaceNet datasets.
The SpaceNet datasets are a set of datasets that all together contain >11M building footprints and ~20,000 km of road labels mapped over high-resolution satellite imagery obtained from Worldview-2 and Worldview-3 sensors.
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root, image, collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
image (str) – image selection
collections (List[str]) – collection selection
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
RuntimeError – if download=False but dataset is missing
- Return type
- class torchgeo.datasets.SpaceNet1(root, image='rgb', transforms=None, download=False, api_key=None, checksum=False)¶
Bases:
torchgeo.datasets.SpaceNet
SpaceNet 1: Building Detection v1 Dataset.
SpaceNet 1 is a dataset of building footprints over the city of Rio de Janeiro.
Dataset features:
No. of images: 6940 (8 Band) + 6940 (RGB)
No. of polygons: 382,534 building labels
Area Coverage: 2544 sq km
GSD: 1 m (8 band), 50 cm (rgb)
Chip size: 101 x 110 (8 band), 406 x 438 (rgb)
Dataset format:
Imagery - Worldview-2 GeoTIFFs
8Band.tif (Multispectral)
RGB.tif (Pansharpened RGB)
Labels - GeoJSON
labels.geojson
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __init__(root, image='rgb', transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 1 Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
image (str) – image selection which must be “rgb” or “8band”
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version.
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
RuntimeError – if download=False but dataset is missing
- Return type
- class torchgeo.datasets.SpaceNet2(root, image='PS-RGB', collections=[], transforms=None, download=False, api_key=None, checksum=False)¶
Bases:
torchgeo.datasets.SpaceNet
SpaceNet 2: Building Detection v2 Dataset.
SpaceNet 2 is a dataset of building footprints over the cities of Las Vegas, Paris, Shanghai and Khartoum.
Collection features:
AOI         Area (km2)   # Images   # Buildings
Las Vegas   216          3850       151,367
Paris       1030         1148       23,816
Shanghai    1000         4582       92,015
Khartoum    765          1012       35,503
Imagery features:
                 PAN         MS          PS-MS       PS-RGB
GSD (m)          0.31        1.24        0.30        0.30
Chip size (px)   650 x 650   162 x 162   650 x 650   650 x 650
Dataset format:
Imagery - Worldview-3 GeoTIFFs
PAN.tif (Panchromatic)
MS.tif (Multispectral)
PS-MS (Pansharpened Multispectral)
PS-RGB (Pansharpened RGB)
Labels - GeoJSON
label.geojson
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __init__(root, image='PS-RGB', collections=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 2 Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
image (str) – image selection which must be in [“MS”, “PAN”, “PS-MS”, “PS-RGB”]
collections (List[str]) – collection selection which must be a subset of: [sn2_AOI_2_Vegas, sn2_AOI_3_Paris, sn2_AOI_4_Shanghai, sn2_AOI_5_Khartoum]
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
RuntimeError – if download=False but dataset is missing
- Return type
- class torchgeo.datasets.SpaceNet4(root, image='PS-RGBNIR', angles=[], transforms=None, download=False, api_key=None, checksum=False)¶
Bases:
torchgeo.datasets.SpaceNet
SpaceNet 4: Off-Nadir Buildings Dataset.
SpaceNet 4 is a dataset of 27 WV-2 collects of imagery, captured at off-nadir angles ranging from 7 to 54 degrees, with associated building footprints over the city of Atlanta.
Dataset features:
No. of chipped images: 28,728 (PAN/MS/PS-RGBNIR)
No. of label files: 1064
No. of building footprints: >120,000
Area Coverage: 665 sq km
Chip size: 225 x 225 (MS), 900 x 900 (PAN/PS-RGBNIR)
Dataset format:
Imagery - Worldview-2 GeoTIFFs
PAN.tif (Panchromatic)
MS.tif (Multispectral)
PS-RGBNIR (Pansharpened RGBNIR)
Labels - GeoJSON
labels.geojson
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __init__(root, image='PS-RGBNIR', angles=[], transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new SpaceNet 4 Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
image (str) – image selection which must be in [“MS”, “PAN”, “PS-RGBNIR”]
angles (List[str]) – angle selection which must be in [“nadir”, “off-nadir”, “very-off-nadir”]
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory.
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
RuntimeError – if download=False but dataset is missing
- Return type
Tropical Cyclone Wind Estimation Competition¶
- class torchgeo.datasets.TropicalCycloneWindEstimation(root='data', split='train', transforms=None, download=False, api_key=None, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
Tropical Cyclone Wind Estimation Competition dataset.
A collection of tropical storms in the Atlantic and East Pacific Oceans from 2000 to 2019 with corresponding maximum sustained surface wind speed. This dataset is split into training and test categories for the purpose of a competition.
See https://www.drivendata.org/competitions/72/predict-wind-speeds/ for more information about the competition.
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
radiant-mlhub to download the imagery and labels from the Radiant Earth MLHub
- __init__(root='data', split='train', transforms=None, download=False, api_key=None, checksum=False)[source]¶
Initialize a new Tropical Cyclone Wind Estimation Competition Dataset.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train” or “test”
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
api_key (Optional[str]) – a RadiantEarth MLHub API key to use for downloading the dataset
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if download=False but dataset is missing or checksum fails
- Return type
- class torchgeo.datasets.CycloneDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the NASA Cyclone dataset.
Implements 80/20 train/val splits based on hurricane storm ids. See setup() for more details.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, seed, batch_size=64, num_workers=0, api_key=None, **kwargs)[source]¶
Initialize a LightningDataModule for NASA Cyclone based DataLoaders.
- Parameters
root_dir (str) – The root argument to pass to the TropicalCycloneWindEstimation Dataset classes
seed (int) – The seed value to use when doing the sklearn based GroupShuffleSplit
batch_size (int) – The batch size to use in all created DataLoaders
num_workers (int) – The number of workers to use in all created DataLoaders
api_key (Optional[str]) – The RadiantEarth MLHub API key to use if the dataset needs to be downloaded
kwargs (Any) –
- Return type
- prepare_data()[source]¶
Initialize the main Dataset objects for use in setup().
This includes optionally downloading the dataset. This is done once per node, while setup() is done once per GPU.
- Return type
- setup(stage=None)[source]¶
Create the train/val/test splits based on the original Dataset objects.
The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.
We split samples between train/val by the storm_id property, i.e. all samples with the same storm_id value will be either in the train or the val split. This is important for testing one type of generalizability: given a new storm, can we predict its wind speed? The test set, however, contains some storms from the training set (specifically, the latter parts of those storms) as well as some novel storms.
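The storm-id grouping described above can be sketched in plain Python. This is an illustration only: the sample-id format and the 80/20 fraction below are assumptions, and torchgeo's actual implementation uses sklearn's GroupShuffleSplit rather than this helper.

```python
import random

def group_split(samples, group_of, val_fraction=0.2, seed=0):
    """Split samples so that no group (e.g. storm_id) spans both splits."""
    groups = sorted({group_of(s) for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_val = max(1, int(len(groups) * val_fraction))
    val_groups = set(groups[:n_val])
    train = [s for s in samples if group_of(s) not in val_groups]
    val = [s for s in samples if group_of(s) in val_groups]
    return train, val

# Hypothetical sample ids of the form "<storm_id>_<frame>"
samples = [f"storm{i}_{t}" for i in range(10) for t in range(5)]
train, val = group_split(samples, group_of=lambda s: s.split("_")[0])
```

Because whole groups are assigned to one side, every frame of a given storm lands entirely in train or entirely in val, which is exactly the generalizability property the split is meant to test.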
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
NWPU VHR-10¶
- class torchgeo.datasets.VHR10(root='data', split='positive', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
NWPU VHR-10 dataset.
Northwestern Polytechnical University (NWPU) very-high-resolution ten-class (VHR-10) remote sensing image dataset.
Consists of 800 VHR optical remote sensing images, where 715 color images were acquired from Google Earth with the spatial resolution ranging from 0.5 to 2 m, and 85 pansharpened color infrared (CIR) images were acquired from Vaihingen data with a spatial resolution of 0.08 m.
The dataset is divided into two sets:
Positive image set (650 images), in which each image contains at least one target
Negative image set (150 images), which does not contain any targets
The positive image set consists of objects from ten classes:
Airplanes (757)
Ships (302)
Storage tanks (655)
Baseball diamonds (390)
Tennis courts (524)
Basketball courts (159)
Ground track fields (163)
Harbors (224)
Bridges (124)
Vehicles (477)
Includes object detection bounding boxes from original paper and instance segmentation masks from follow-up publications. If you use this dataset in your research, please cite the following papers:
Note
This dataset requires the following additional libraries to be installed:
pycocotools to load the annotations.json file for the “positive” image set
rarfile to extract the dataset, which is stored in a RAR file
- __init__(root='data', split='positive', transforms=None, download=False, checksum=False)[source]¶
Initialize a new VHR-10 dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “positive” or “negative”
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
AssertionError – if split argument is invalid
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
UC Merced¶
- class torchgeo.datasets.UCMerced(root='data', split='train', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionClassificationDataset
UC Merced dataset.
The UC Merced dataset is a land use classification dataset of 2,100 256x256 px RGB images at 1 ft resolution, depicting urban locations around the U.S. The images were extracted from the USGS National Map Urban Area Imagery collection and cover 21 land use classes (100 images per class).
Dataset features:
land use class labels from around the U.S.
three spectral bands - RGB
21 classes
Dataset classes:
agricultural
airplane
baseballdiamond
beach
buildings
chaparral
denseresidential
forest
freeway
golfcourse
harbor
intersection
mediumresidential
mobilehomepark
overpass
parkinglot
river
runway
sparseresidential
storagetanks
tenniscourt
This dataset uses the train/val/test splits defined in the “In-domain representation learning for remote sensing” paper:
If you use this dataset in your research, please cite the following paper:
- __init__(root='data', split='train', transforms=None, download=False, checksum=False)[source]¶
Initialize a new UC Merced dataset instance.
- Parameters
root (str) – root directory where dataset can be found
split (str) – one of “train”, “val”, or “test”
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
- class torchgeo.datasets.UCMercedDataModule(*args, **kwargs)¶
Bases:
pytorch_lightning.core.datamodule.LightningDataModule
LightningDataModule implementation for the UC Merced dataset.
Uses random train/val/test splits.
- Parameters
args (Any) –
kwargs (Any) –
- Return type
LightningDataModule
- __init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]¶
Initialize a LightningDataModule for UCMerced based DataLoaders.
- prepare_data()[source]¶
Make sure that the dataset is downloaded.
This method is only called once per run.
- Return type
- setup(stage=None)[source]¶
Initialize the main Dataset objects.
This method is called once per GPU per run.
- train_dataloader()[source]¶
Return a DataLoader for training.
- Returns
training data loader
- Return type
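A seeded random three-way split like the one this module performs can be sketched in plain Python. The 80/10/10 fractions below are an assumption for illustration, not torchgeo's documented values:

```python
import random

def random_split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly partition range(n) into train/val/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

# UC Merced: 21 classes x 100 images = 2100 chips in total
train_idx, val_idx, test_idx = random_split_indices(2100)
```

Fixing the seed makes the split reproducible across runs, which matters when the same indices must be reused for train, val, and test DataLoaders.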
ZueriCrop¶
- class torchgeo.datasets.ZueriCrop(root='data', transforms=None, download=False, checksum=False)¶
Bases:
torchgeo.datasets.VisionDataset
ZueriCrop dataset.
ZueriCrop is a dataset for time-series instance segmentation of crops.
Dataset features:
Sentinel-2 multispectral imagery
instance masks of 48 crop categories
nine multispectral bands
116k images with 10 m per pixel resolution (24x24 px)
~28k time-series containing 142 images each
Dataset format:
single hdf5 dataset containing images, semantic masks, and instance masks
data is parsed into images and instance masks, boxes, and labels
one mask per time-series
Dataset classes:
48 fine-grained hierarchical crop categories
If you use this dataset in your research, please cite the following paper:
Note
This dataset requires the following additional library to be installed:
h5py to load the dataset
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
sample containing image, mask, bounding boxes, and target label
- Return type
Dict[str, torch.Tensor]
- __init__(root='data', transforms=None, download=False, checksum=False)[source]¶
Initialize a new ZueriCrop dataset instance.
- Parameters
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
- Raises
RuntimeError – if download=False and data is not found, or checksums don’t match
- Return type
Base Classes¶
If you want to write your own custom dataset, you can extend one of these abstract base classes.
GeoDataset¶
- class torchgeo.datasets.GeoDataset(transforms=None)¶
Bases:
torch.utils.data.Dataset[Dict[str, Any]], abc.ABC
Abstract base class for datasets containing geospatial information.
Geospatial information includes things like:
coordinates (latitude, longitude)
resolution
These kinds of datasets are special because they can be combined. For example:
Combine Landsat8 and CDL to train a model for crop classification
Combine NAIP and Chesapeake to train a model for land cover mapping
This isn’t true for VisionDataset, where the lack of geospatial information prohibits swapping image sources or target labels.
- __add__(other)[source]¶
Merge two GeoDatasets.
- Parameters
other (torchgeo.datasets.GeoDataset) – another dataset
- Returns
a single dataset
- Raises
ValueError – if other is not a GeoDataset, or if datasets do not overlap, or if datasets do not have the same coordinate reference system (CRS)
- Return type
- abstract __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters
query (torchgeo.datasets.BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns
sample of image/mask and metadata at that index
- Raises
IndexError – if query is not found in the index
- Return type
Dict[str, Any]
- __len__()[source]¶
Return the number of files in the dataset.
- Returns
length of the dataset
- Return type
- __str__()[source]¶
Return the informal string representation of the object.
- Returns
informal string representation
- Return type
- property bounds: torchgeo.datasets.BoundingBox¶
Bounds of the index.
- Returns
(minx, maxx, miny, maxy, mint, maxt) of the dataset
RasterDataset¶
- class torchgeo.datasets.RasterDataset(root, crs=None, res=None, transforms=None, cache=True)¶
Bases:
torchgeo.datasets.GeoDataset
Abstract base class for GeoDataset stored as raster files.
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters
query (torchgeo.datasets.BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns
sample of image/mask and metadata at that index
- Raises
IndexError – if query is not found in the index
- Return type
Dict[str, Any]
- __init__(root, crs=None, res=None, transforms=None, cache=True)[source]¶
Initialize a new Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
- Raises
FileNotFoundError – if no files are found in
root
- Return type
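One common way to realize the cache=True behaviour is to memoize file handles so repeated spatial queries against the same raster skip the open cost. The sketch below uses functools.lru_cache and plain open() purely for illustration; torchgeo's actual caching and its use of rasterio may differ:

```python
import os
import tempfile
from functools import lru_cache

# Illustrative only: in practice a raster library's open call
# (e.g. rasterio.open) would replace the plain open() here.
@lru_cache(maxsize=128)
def cached_open(path):
    return open(path, "rb")

tmp = tempfile.NamedTemporaryFile(suffix=".tif", delete=False)
tmp.write(b"fake raster bytes")
tmp.close()

handle1 = cached_open(tmp.name)
handle2 = cached_open(tmp.name)  # served from the cache, no second open()
```

The trade-off is the usual one for handle caching: faster repeated sampling at the cost of keeping up to maxsize file descriptors open.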
- plot(data)[source]¶
Plot a data sample.
- Parameters
data (torch.Tensor) – the data to plot
- Raises
AssertionError – if is_image is True and data has a different number of channels than expected
- Return type
VectorDataset¶
- class torchgeo.datasets.VectorDataset(root='data', crs=None, res=0.0001, transforms=None)¶
Bases:
torchgeo.datasets.GeoDataset
Abstract base class for GeoDataset stored as vector files.
- __getitem__(query)[source]¶
Retrieve image/mask and metadata indexed by query.
- Parameters
query (torchgeo.datasets.BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns
sample of image/mask and metadata at that index
- Raises
IndexError – if query is not found in the index
- Return type
Dict[str, Any]
- __init__(root='data', crs=None, res=0.0001, transforms=None)[source]¶
Initialize a new Dataset instance.
- Parameters
root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float) – resolution of the dataset in units of CRS
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
- Raises
FileNotFoundError – if no files are found in
root
- Return type
- plot(data)[source]¶
Plot a data sample.
- Parameters
data (torch.Tensor) – the data to plot
- Return type
VisionDataset¶
- class torchgeo.datasets.VisionDataset(*args, **kwds)¶
Bases:
torch.utils.data.Dataset[Dict[str, Any]], abc.ABC
Abstract base class for datasets lacking geospatial information.
This base class is designed for datasets with pre-defined image chips.
- abstract __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and labels at that index
- Raises
IndexError – if index is out of range of the dataset
- Return type
Dict[str, Any]
VisionClassificationDataset¶
- class torchgeo.datasets.VisionClassificationDataset(root, transforms=None, loader=<function default_loader>, is_valid_file=None)¶
Bases:
torchgeo.datasets.VisionDataset, torchvision.datasets.ImageFolder
Abstract base class for classification datasets lacking geospatial information.
This base class is designed for datasets with pre-defined image chips which are separated into separate folders per class.
- __getitem__(index)[source]¶
Return a sample at the given index within the dataset.
- Parameters
index (int) – index to return
- Returns
data and label at that index
- Return type
Dict[str, torch.Tensor]
- __init__(root, transforms=None, loader=<function default_loader>, is_valid_file=None)[source]¶
Initialize a new VisionClassificationDataset instance.
- Parameters
root (str) – root directory where dataset can be found
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
loader (Optional[Callable[[str], Any]]) – a callable function which takes as input a path to an image and returns a PIL Image or numpy array
is_valid_file (Optional[Callable[[str], bool]]) – A function that takes the path of an Image file and checks if the file is a valid file
- Return type
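The folder-per-class layout this base class expects can be discovered with a few lines of standard-library Python, roughly what torchvision's ImageFolder does internally. The class names below are hypothetical and chosen only to mirror the UC Merced style:

```python
import os
import tempfile

def find_classes(root):
    """Map each class subdirectory under root to an integer label,
    mirroring the folder-per-class layout this base class expects."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {cls: idx for idx, cls in enumerate(classes)}

# Toy layout: root/<class>/<image>
root = tempfile.mkdtemp()
for cls in ("forest", "beach", "harbor"):
    os.makedirs(os.path.join(root, cls))
    open(os.path.join(root, cls, "img0.png"), "w").close()

labels = find_classes(root)
```

Sorting the directory names first keeps the class-to-label mapping stable across runs and file systems.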
ZipDataset¶
- class torchgeo.datasets.ZipDataset(datasets)¶
Bases:
torchgeo.datasets.GeoDataset
Dataset for merging two or more GeoDatasets.
For example, this allows you to combine an image source like Landsat8 with a target label like CDL.
- __getitem__(query)[source]¶
Retrieve image and metadata indexed by query.
- Parameters
query (torchgeo.datasets.BoundingBox) – (minx, maxx, miny, maxy, mint, maxt) coordinates to index
- Returns
sample of data/labels and metadata at that index
- Raises
IndexError – if query is not within bounds of the index
- Return type
Dict[str, Any]
- __init__(datasets)[source]¶
Initialize a new Dataset instance.
- Parameters
datasets (Sequence[torchgeo.datasets.GeoDataset]) – list of datasets to merge
- Raises
ValueError – if datasets contains non-GeoDatasets, do not overlap, are not in the same coordinate reference system (CRS), or do not have the same resolution
- Return type
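The ValueError conditions listed above can be illustrated with a small pure-Python sketch. FakeGeoDataset and validate_merge are hypothetical stand-ins for this illustration, not torchgeo code:

```python
class FakeGeoDataset:
    """Minimal stand-in exposing only the attributes checked below."""
    def __init__(self, crs, res, bounds):
        self.crs = crs
        self.res = res
        self.bounds = bounds  # (minx, maxx, miny, maxy)

def validate_merge(datasets):
    """Raise ValueError under the preconditions the docs above list."""
    first = datasets[0]
    for ds in datasets[1:]:
        if ds.crs != first.crs:
            raise ValueError("datasets are not in the same CRS")
        if ds.res != first.res:
            raise ValueError("datasets do not have the same resolution")
        minx1, maxx1, miny1, maxy1 = first.bounds
        minx2, maxx2, miny2, maxy2 = ds.bounds
        if maxx1 < minx2 or maxx2 < minx1 or maxy1 < miny2 or maxy2 < miny1:
            raise ValueError("datasets do not overlap")

imagery = FakeGeoDataset("EPSG:32618", 10.0, (0, 100, 0, 100))
labels = FakeGeoDataset("EPSG:32618", 10.0, (50, 150, 50, 150))
validate_merge([imagery, labels])  # same CRS/res and overlapping: no error
```

Validating these preconditions up front is what lets the merged dataset answer any spatiotemporal query against a single shared CRS and resolution.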
- __len__()[source]¶
Return the number of files in the dataset.
- Returns
length of the dataset
- Return type
- __str__()[source]¶
Return the informal string representation of the object.
- Returns
informal string representation
- Return type
- property bounds: torchgeo.datasets.BoundingBox¶
Bounds of the index.
- Returns
(minx, maxx, miny, maxy, mint, maxt) of the dataset
Utilities¶
- class torchgeo.datasets.BoundingBox(minx, maxx, miny, maxy, mint, maxt)¶
Bases:
Tuple[float, float, float, float, float, float]
Data class for indexing spatiotemporal data.
- Parameters
- Return type
- static __new__(cls, minx, maxx, miny, maxy, mint, maxt)[source]¶
Create a new instance of BoundingBox.
- Parameters
- Raises
ValueError – if bounding box is invalid (minx > maxx, miny > maxy, or mint > maxt)
- Return type
- __repr__()[source]¶
Return the formal string representation of the object.
- Returns
formal string representation
- Return type
- intersects(other)[source]¶
Whether or not two bounding boxes intersect.
- Parameters
other (torchgeo.datasets.BoundingBox) – another bounding box
- Returns
True if bounding boxes intersect, else False
- Return type
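The intersection test can be sketched with a plain NamedTuple (an illustration, not torchgeo's implementation): two boxes intersect exactly when their x, y, and t intervals all overlap.

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Illustrative (minx, maxx, miny, maxy, mint, maxt) bounding box."""
    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float

    def intersects(self, other: "Box") -> bool:
        # Boxes intersect iff the x, y, and t intervals all overlap.
        return (self.minx <= other.maxx and self.maxx >= other.minx
                and self.miny <= other.maxy and self.maxy >= other.miny
                and self.mint <= other.maxt and self.maxt >= other.mint)

a = Box(0, 10, 0, 10, 0, 1)
b = Box(5, 15, 5, 15, 0, 1)    # overlaps a in x, y, and t
c = Box(20, 30, 20, 30, 0, 1)  # disjoint from a in x and y
```

Note the test is symmetric: each axis check compares both min-vs-max orderings, so a.intersects(b) and b.intersects(a) always agree.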