torchgeo.samplers
Samplers
Samplers are used to index a dataset, retrieving a single query at a time. For VisionDataset, dataset objects can be indexed with integers, and PyTorch’s built-in samplers are sufficient. For GeoDataset, dataset objects require a bounding box for indexing. For this reason, we define our own GeoSampler implementations below. These can be used like so:
from torch.utils.data import DataLoader
from torchgeo.datasets import Landsat
from torchgeo.samplers import RandomGeoSampler
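dataset = Landsat(...)
# draw 100 random patches per epoch, each 1000 x 1000 in the dataset's CRS units
sampler = RandomGeoSampler(dataset, size=1000, length=100)
# each query the sampler yields is a bounding box passed to dataset.__getitem__
dataloader = DataLoader(dataset, sampler=sampler)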
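Conceptually, the DataLoader takes each query the sampler yields and passes it to the dataset's __getitem__. The following pure-Python sketch imitates that flow without torch or torchgeo, so BoundingBox, ToyGeoDataset, fixed_sampler, and load_all are hypothetical stand-ins rather than library classes; batching, collation, and worker processes are deliberately left out.

```python
from typing import Iterator, List, NamedTuple


class BoundingBox(NamedTuple):
    """Hypothetical stand-in for a query: (minx, maxx, miny, maxy, mint, maxt)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


class ToyGeoDataset:
    """Hypothetical dataset indexed by a bounding box instead of an integer."""

    def __getitem__(self, query: BoundingBox) -> dict:
        # A real GeoDataset would read pixels for this box; the toy just
        # reports the box and its spatial area.
        area = (query.maxx - query.minx) * (query.maxy - query.miny)
        return {"bbox": query, "area": area}


def fixed_sampler() -> Iterator[BoundingBox]:
    """Hypothetical sampler that yields two hard-coded queries."""
    yield BoundingBox(0, 10, 0, 10, 0, 1)
    yield BoundingBox(10, 30, 0, 10, 0, 1)


def load_all(dataset: ToyGeoDataset, sampler: Iterator[BoundingBox]) -> List[dict]:
    # Roughly what DataLoader(dataset, sampler=sampler) does, one query at a time.
    return [dataset[query] for query in sampler]


samples = load_all(ToyGeoDataset(), fixed_sampler())
print([s["area"] for s in samples])  # [100, 200]
```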
Random Geo Sampler
- class torchgeo.samplers.RandomGeoSampler(dataset, size, length, roi=None)
Bases: torchgeo.samplers.GeoSampler
Samples elements from a region of interest randomly.
This is particularly useful during training when you want to maximize the size of the dataset and return as many random chips as possible.
This sampler is not recommended for use with tile-based datasets. Use RandomBatchGeoSampler instead.
- __init__(dataset, size, length, roi=None)
Initialize a new Sampler instance.
The size argument can be either:
- a single float, in which case the same value is used for the height and width dimensions
- a tuple of two floats, in which case the first float is used for the height dimension and the second float for the width dimension
- Parameters
dataset (torchgeo.datasets.GeoDataset) – dataset to index from
size (Union[Tuple[float, float], float]) – dimensions of each patch in units of CRS
length (int) – number of random samples to draw per epoch
roi (Optional[torchgeo.datasets.BoundingBox]) – region of interest to sample from (minx, maxx, miny, maxy, mint, maxt) (defaults to the bounds of dataset.index)
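The size rule above amounts to a small normalization step, and the same rule is stated for the size and stride arguments of the other samplers on this page. This is a sketch only: to_height_width is a hypothetical helper, not a torchgeo function.

```python
from typing import Tuple, Union


def to_height_width(size: Union[float, Tuple[float, float]]) -> Tuple[float, float]:
    """Hypothetical helper: expand `size` into an explicit (height, width) pair."""
    if isinstance(size, (int, float)):
        # A single number is used for both the height and the width.
        return (float(size), float(size))
    # Otherwise the first value is the height and the second the width.
    height, width = size
    return (float(height), float(width))


print(to_height_width(256))         # (256.0, 256.0)
print(to_height_width((128, 512)))  # (128.0, 512.0)
```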
- __iter__()
Return the index of a dataset.
- Returns
(minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset
- Return type
Iterator[torchgeo.datasets.BoundingBox]
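As a rough illustration of the queries RandomGeoSampler produces, the sketch below draws length patches uniformly at random, choosing each lower-left corner so the whole patch fits inside the ROI and keeping the ROI's full time range. These choices are assumptions made for illustration; torchgeo's implementation may differ (for example in how it picks tiles or handles time), and BoundingBox and random_boxes are local, hypothetical stand-ins.

```python
import random
from typing import Iterator, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    """Hypothetical stand-in: (minx, maxx, miny, maxy, mint, maxt)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def random_boxes(
    roi: BoundingBox, size: Tuple[float, float], length: int, seed: int = 0
) -> Iterator[BoundingBox]:
    """Yield `length` random (height, width) patches that lie entirely inside `roi`."""
    height, width = size
    rng = random.Random(seed)
    for _ in range(length):
        # Pick the lower-left corner so the patch cannot extend past the ROI.
        minx = rng.uniform(roi.minx, roi.maxx - width)
        miny = rng.uniform(roi.miny, roi.maxy - height)
        # Simplifying assumption: every patch spans the ROI's whole time range.
        yield BoundingBox(minx, minx + width, miny, miny + height, roi.mint, roi.maxt)


roi = BoundingBox(0, 5000, 0, 3000, 0, 1)
for box in random_boxes(roi, size=(1000, 1000), length=3):
    print(box)
```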
Grid Geo Sampler
- class torchgeo.samplers.GridGeoSampler(dataset, size, stride, roi=None)
Bases: torchgeo.samplers.GeoSampler
Samples elements in a grid-like fashion.
This is particularly useful during evaluation when you want to make predictions for an entire region of interest. You want to minimize the amount of redundant computation by minimizing overlap between chips.
Usually the stride should be slightly smaller than the chip size, so that each chip overlaps its neighbors slightly. This overlap prevents stitching artifacts when the individual prediction patches are combined. The overlap between adjacent chips (size - stride) should be approximately equal to the receptive field of the CNN.
- __init__(dataset, size, stride, roi=None)
Initialize a new Sampler instance.
The size and stride arguments can be either:
- a single float, in which case the same value is used for the height and width dimensions
- a tuple of two floats, in which case the first float is used for the height dimension and the second float for the width dimension
- Parameters
dataset (torchgeo.datasets.GeoDataset) – dataset to index from
size (Union[Tuple[float, float], float]) – dimensions of each patch in units of CRS
stride (Union[Tuple[float, float], float]) – distance to skip between each patch
roi (Optional[torchgeo.datasets.BoundingBox]) – region of interest to sample from (minx, maxx, miny, maxy, mint, maxt) (defaults to the bounds of dataset.index)
- __iter__()
Return the index of a dataset.
- Returns
(minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset
- Return type
Iterator[torchgeo.datasets.BoundingBox]
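The grid layout and the overlap rule (size - stride) can be sketched as follows, assuming a 1000 x 1000 ROI in CRS units, 256-unit chips, and a 224-unit stride, which gives a 32-unit overlap suited to a hypothetical network whose receptive field is roughly that size. grid_boxes and BoundingBox are hypothetical stand-ins, not torchgeo code.

```python
import math
from typing import Iterator, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    """Hypothetical stand-in: (minx, maxx, miny, maxy, mint, maxt)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def grid_boxes(
    roi: BoundingBox, size: Tuple[float, float], stride: Tuple[float, float]
) -> Iterator[BoundingBox]:
    """Yield (height, width) patches laid out on a regular grid inside `roi`."""
    height, width = size
    stride_y, stride_x = stride
    # Number of patches that fit entirely inside the ROI along each axis.
    rows = math.floor((roi.maxy - roi.miny - height) / stride_y) + 1
    cols = math.floor((roi.maxx - roi.minx - width) / stride_x) + 1
    for i in range(rows):
        miny = roi.miny + i * stride_y
        for j in range(cols):
            minx = roi.minx + j * stride_x
            yield BoundingBox(minx, minx + width, miny, miny + height, roi.mint, roi.maxt)


roi = BoundingBox(0, 1000, 0, 1000, 0, 1)
boxes = list(grid_boxes(roi, size=(256, 256), stride=(224, 224)))
print(len(boxes))                     # 16
print(boxes[0].maxx - boxes[1].minx)  # 32, i.e. size - stride
```

This simple version only places chips that fit entirely inside the ROI, so a strip narrower than the stride along the far edges (72 units here) is left uncovered; a real sampler needs some policy for that remainder.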
Batch Samplers
When working with large tile-based datasets, randomly sampling patches from each tile can be extremely time-consuming. It’s much more efficient to choose a tile, load it, warp it to the appropriate coordinate reference system (CRS) and resolution, and then sample random patches from that tile to construct a mini-batch of data. For this reason, we define our own BatchGeoSampler implementations below. These can be used like so:
from torch.utils.data import DataLoader
from torchgeo.datasets import Landsat
from torchgeo.samplers import RandomBatchGeoSampler
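dataset = Landsat(...)
# each item the sampler yields is a list of 10 bounding boxes (one mini-batch)
sampler = RandomBatchGeoSampler(dataset, size=1000, batch_size=10, length=100)
# pass it as batch_sampler, not sampler, because it already yields whole batches
dataloader = DataLoader(dataset, batch_sampler=sampler)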
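The difference between sampler= and batch_sampler= can be imitated in plain Python: with sampler=, DataLoader groups single queries into batches of batch_size (default 1), whereas with batch_sampler= each list the sampler yields is already one batch. simulate_sampler and simulate_batch_sampler are hypothetical helpers that ignore collation and workers, and plain strings stand in for bounding boxes.

```python
from typing import Any, Callable, Iterable, List, Sequence


def simulate_sampler(
    getitem: Callable[[Any], Any], sampler: Iterable[Any], batch_size: int = 1
) -> List[List[Any]]:
    """Roughly DataLoader(dataset, sampler=..., batch_size=...): group single queries."""
    batches: List[List[Any]] = []
    current: List[Any] = []
    for query in sampler:
        current.append(getitem(query))
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:  # keep a final partial batch, as with drop_last=False
        batches.append(current)
    return batches


def simulate_batch_sampler(
    getitem: Callable[[Any], Any], batch_sampler: Iterable[Sequence[Any]]
) -> List[List[Any]]:
    """Roughly DataLoader(dataset, batch_sampler=...): each yielded list is one batch."""
    return [[getitem(query) for query in batch] for batch in batch_sampler]


# Strings stand in for bounding boxes; str.upper stands in for dataset.__getitem__.
print(simulate_sampler(str.upper, ["a", "b", "c"]))            # [['A'], ['B'], ['C']]
print(simulate_batch_sampler(str.upper, [["a", "b"], ["c"]]))  # [['A', 'B'], ['C']]
```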
Random Batch Geo Sampler
- class torchgeo.samplers.RandomBatchGeoSampler(dataset, size, batch_size, length, roi=None)
Bases: torchgeo.samplers.BatchGeoSampler
Samples batches of elements from a region of interest randomly.
This is particularly useful during training when you want to maximize the size of the dataset and return as many random chips as possible.
- __init__(dataset, size, batch_size, length, roi=None)
Initialize a new Sampler instance.
The size argument can be either:
- a single float, in which case the same value is used for the height and width dimensions
- a tuple of two floats, in which case the first float is used for the height dimension and the second float for the width dimension
- Parameters
dataset (torchgeo.datasets.GeoDataset) – dataset to index from
size (Union[Tuple[float, float], float]) – dimensions of each patch in units of CRS
batch_size (int) – number of samples per batch
length (int) – number of samples per epoch
roi (Optional[torchgeo.datasets.BoundingBox]) – region of interest to sample from (minx, maxx, miny, maxy, mint, maxt) (defaults to the bounds of dataset.index)
- __iter__()
Return the indices of a dataset.
- Returns
batch of (minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset
- Return type
Iterator[List[torchgeo.datasets.BoundingBox]]
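Following the strategy described at the start of the Batch Samplers section (choose a tile, load it, then cut many random patches from it), RandomBatchGeoSampler-style iteration might be sketched like this. It assumes length counts samples, so an epoch has length // batch_size batches, and that tiles are given as bounding boxes; random_batches and BoundingBox are hypothetical, not torchgeo's implementation.

```python
import random
from typing import Iterator, List, NamedTuple, Sequence, Tuple


class BoundingBox(NamedTuple):
    """Hypothetical stand-in: (minx, maxx, miny, maxy, mint, maxt)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def random_batches(
    tiles: Sequence[BoundingBox],
    size: Tuple[float, float],
    batch_size: int,
    length: int,
    seed: int = 0,
) -> Iterator[List[BoundingBox]]:
    """Yield batches of random patches, each batch cut from one randomly chosen tile."""
    height, width = size
    rng = random.Random(seed)
    # Assumption: `length` counts samples, so an epoch has length // batch_size batches.
    for _ in range(length // batch_size):
        tile = rng.choice(tiles)  # only this tile needs to be loaded for the batch
        batch: List[BoundingBox] = []
        for _ in range(batch_size):
            minx = rng.uniform(tile.minx, tile.maxx - width)
            miny = rng.uniform(tile.miny, tile.maxy - height)
            batch.append(
                BoundingBox(minx, minx + width, miny, miny + height, tile.mint, tile.maxt)
            )
        yield batch


tiles = [BoundingBox(0, 5000, 0, 5000, 0, 1), BoundingBox(5000, 10000, 0, 5000, 0, 1)]
batches = list(random_batches(tiles, size=(1000, 1000), batch_size=10, length=100))
print(len(batches), len(batches[0]))  # 10 10
```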
Base Classes
If you want to write your own custom sampler, you can extend one of these abstract base classes.
Geo Sampler
- class torchgeo.samplers.GeoSampler(dataset, roi=None)
Bases: torch.utils.data.Sampler[torchgeo.datasets.BoundingBox], abc.ABC
Abstract base class for sampling from GeoDataset.
Unlike PyTorch’s Sampler, GeoSampler returns enough geospatial information to uniquely index any GeoDataset. This includes things like latitude, longitude, height, width, projection, coordinate system, and time.
- __init__(dataset, roi=None)
Initialize a new Sampler instance.
- Parameters
dataset (torchgeo.datasets.GeoDataset) – dataset to index from
roi (Optional[torchgeo.datasets.BoundingBox]) – region of interest to sample from (minx, maxx, miny, maxy, mint, maxt) (defaults to the bounds of dataset.index)
- abstract __iter__()
Return the index of a dataset.
- Returns
(minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset
- Return type
Iterator[torchgeo.datasets.BoundingBox]
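A custom sampler mainly needs to implement __iter__. So that it runs without torch or torchgeo, the sketch below subclasses a local abc stand-in (GeoSamplerSketch) rather than torchgeo.samplers.GeoSampler; CenterSampler is a hypothetical example that yields one square patch centered in the ROI. With torchgeo installed, the subclass would derive from GeoSampler and take a dataset as documented above.

```python
import abc
from typing import Iterator, NamedTuple


class BoundingBox(NamedTuple):
    """Hypothetical stand-in: (minx, maxx, miny, maxy, mint, maxt)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


class GeoSamplerSketch(abc.ABC):
    """Hypothetical stand-in mirroring GeoSampler: holds an ROI, leaves __iter__ abstract."""

    def __init__(self, roi: BoundingBox) -> None:
        self.roi = roi

    @abc.abstractmethod
    def __iter__(self) -> Iterator[BoundingBox]:
        """Yield one bounding box per query."""


class CenterSampler(GeoSamplerSketch):
    """Hypothetical custom sampler: a single square patch centered in the ROI."""

    def __init__(self, roi: BoundingBox, size: float) -> None:
        super().__init__(roi)
        self.size = size

    def __iter__(self) -> Iterator[BoundingBox]:
        cx = (self.roi.minx + self.roi.maxx) / 2
        cy = (self.roi.miny + self.roi.maxy) / 2
        half = self.size / 2
        yield BoundingBox(
            cx - half, cx + half, cy - half, cy + half, self.roi.mint, self.roi.maxt
        )

    def __len__(self) -> int:
        return 1  # number of queries per epoch


sampler = CenterSampler(BoundingBox(0, 100, 0, 50, 0, 1), size=10)
print(list(sampler))
# [BoundingBox(minx=45.0, maxx=55.0, miny=20.0, maxy=30.0, mint=0, maxt=1)]
```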
Batch Geo Sampler
- class torchgeo.samplers.BatchGeoSampler(dataset, roi=None)
Bases: torch.utils.data.Sampler[List[torchgeo.datasets.BoundingBox]], abc.ABC
Abstract base class for sampling from GeoDataset.
Unlike PyTorch’s BatchSampler, BatchGeoSampler returns enough geospatial information to uniquely index any GeoDataset. This includes things like latitude, longitude, height, width, projection, coordinate system, and time.
- __init__(dataset, roi=None)
Initialize a new Sampler instance.
- Parameters
dataset (torchgeo.datasets.GeoDataset) – dataset to index from
roi (Optional[torchgeo.datasets.BoundingBox]) – region of interest to sample from (minx, maxx, miny, maxy, mint, maxt) (defaults to the bounds of dataset.index)
- abstract __iter__()
Return a batch of indices of a dataset.
- Returns
batch of (minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset
- Return type
Iterator[List[torchgeo.datasets.BoundingBox]]
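Likewise, a custom batch sampler yields lists of bounding boxes. The hypothetical QuadrantBatchSampler below, built on a local abc stand-in instead of torchgeo.samplers.BatchGeoSampler, emits one batch per tile containing that tile's four quadrants, so every batch reads from a single tile, which is the motivation given for batch samplers in the Batch Samplers section.

```python
import abc
from typing import Iterator, List, NamedTuple, Sequence


class BoundingBox(NamedTuple):
    """Hypothetical stand-in: (minx, maxx, miny, maxy, mint, maxt)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


class BatchGeoSamplerSketch(abc.ABC):
    """Hypothetical stand-in for BatchGeoSampler: __iter__ yields lists of queries."""

    @abc.abstractmethod
    def __iter__(self) -> Iterator[List[BoundingBox]]:
        """Yield one list of bounding boxes per mini-batch."""


class QuadrantBatchSampler(BatchGeoSamplerSketch):
    """Hypothetical sampler: one batch per tile, holding that tile's four quadrants."""

    def __init__(self, tiles: Sequence[BoundingBox]) -> None:
        self.tiles = list(tiles)

    def __iter__(self) -> Iterator[List[BoundingBox]]:
        for t in self.tiles:
            midx = (t.minx + t.maxx) / 2
            midy = (t.miny + t.maxy) / 2
            # Lower-left, lower-right, upper-left, upper-right quadrants.
            yield [
                BoundingBox(t.minx, midx, t.miny, midy, t.mint, t.maxt),
                BoundingBox(midx, t.maxx, t.miny, midy, t.mint, t.maxt),
                BoundingBox(t.minx, midx, midy, t.maxy, t.mint, t.maxt),
                BoundingBox(midx, t.maxx, midy, t.maxy, t.mint, t.maxt),
            ]

    def __len__(self) -> int:
        return len(self.tiles)  # one batch per tile


tiles = [BoundingBox(0, 100, 0, 100, 0, 1), BoundingBox(100, 200, 0, 100, 0, 1)]
sampler = QuadrantBatchSampler(tiles)
print(len(sampler), [len(batch) for batch in sampler])  # 2 [4, 4]
```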