torchgeo.samplers

Samplers

Samplers index a dataset, retrieving a single query at a time. A VisionDataset is indexed with integers, so PyTorch’s built-in samplers are sufficient. A GeoDataset is indexed with a bounding box instead, so we define our own GeoSampler implementations below. These can be used like so:

from torch.utils.data import DataLoader

from torchgeo.datasets import Landsat
from torchgeo.samplers import RandomGeoSampler

dataset = Landsat(...)
sampler = RandomGeoSampler(dataset, size=1000, length=100)
dataloader = DataLoader(dataset, sampler=sampler)

Random Geo Sampler

class torchgeo.samplers.RandomGeoSampler(dataset, size, length, roi=None)

Bases: torchgeo.samplers.GeoSampler

Samples elements from a region of interest randomly.

This is particularly useful during training when you want to maximize the size of the dataset and return as many random chips as possible.

This sampler is not recommended for use with tile-based datasets. Use RandomBatchGeoSampler instead.
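To make the idea of random chips concrete, here is a minimal sketch, not the library's implementation. The names `Box` and `random_boxes` are hypothetical; the sketch assumes a single rectangular region of interest and keeps the time bounds fixed, and it does not consult a dataset or spatial index.

```python
import random
from typing import Iterator, NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in that uses the same field order as the docs."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def random_boxes(roi: Box, size: float, length: int, seed: int = 0) -> Iterator[Box]:
    """Yield `length` square boxes of side `size` placed uniformly inside `roi`."""
    rng = random.Random(seed)
    for _ in range(length):
        # Draw the lower-left corner so the whole chip stays inside the ROI.
        minx = rng.uniform(roi.minx, roi.maxx - size)
        miny = rng.uniform(roi.miny, roi.maxy - size)
        yield Box(minx, minx + size, miny, miny + size, roi.mint, roi.maxt)


roi = Box(0.0, 1000.0, 0.0, 500.0, 0.0, 1.0)
boxes = list(random_boxes(roi, size=100.0, length=5))
```

Each yielded box plays the role of one index into the dataset, and `length` fixes how many are drawn per epoch.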

__init__(dataset, size, length, roi=None)[source]

Initialize a new Sampler instance.

The size argument can either be:

  • a single float - in which case the same value is used for the height and width dimension

  • a tuple of two floats - in which case, the first float is used for the height dimension, and the second float for the width dimension
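The two accepted forms of size can be reduced to a single (height, width) pair. The helper below is a hypothetical illustration of that rule, assumed for this example rather than taken from the library:

```python
from typing import Tuple, Union


def to_pair(value: Union[float, Tuple[float, float]]) -> Tuple[float, float]:
    """Expand a scalar to (height, width); pass a 2-tuple through unchanged."""
    if isinstance(value, (int, float)):
        return (float(value), float(value))
    height, width = value  # raises ValueError unless there are exactly two items
    return (float(height), float(width))


square = to_pair(256)        # same value for height and width
rect = to_pair((128, 512))   # height first, then width
```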

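Parameters

  • dataset - dataset to index
  • size - dimensions of each chip, in either of the two forms above
  • length - number of random samples to draw per epoch
  • roi - region of interest to sample from, as (minx, maxx, miny, maxy, mint, maxt); optional, default None
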
__iter__()[source]

Return the index of a dataset.

Returns

(minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset

Return type

Iterator[torchgeo.datasets.BoundingBox]

__len__()[source]

Return the number of samples in a single epoch.

Returns

length of the epoch

Return type

int

Grid Geo Sampler

class torchgeo.samplers.GridGeoSampler(dataset, size, stride, roi=None)

Bases: torchgeo.samplers.GeoSampler

Samples elements in a grid-like fashion.

This is particularly useful during evaluation when you want to make predictions for an entire region of interest. You want to minimize the amount of redundant computation by minimizing overlap between chips.

Usually the stride should be slightly smaller than the chip size, so that each chip overlaps its neighbors a little. This prevents stitching artifacts when the prediction patches are combined. The overlap between adjacent chips (chip_size - stride) should be approximately equal to the receptive field of the CNN.
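The relationship between chip size, stride, and overlap can be worked through along one axis. This is a sketch under stated assumptions: `grid_origins` is a hypothetical helper, and chips that would extend past the upper edge are simply dropped, which may differ from how the library treats the remainder.

```python
from typing import List


def grid_origins(lo: float, hi: float, size: float, stride: float) -> List[float]:
    """Lower edges of chips of length `size`, stepped by `stride`, that fit in [lo, hi]."""
    if size > hi - lo:
        return []
    count = int((hi - lo - size) // stride) + 1
    return [lo + i * stride for i in range(count)]


size, stride = 256.0, 192.0
origins = grid_origins(0.0, 1024.0, size, stride)
overlap = size - stride  # shared width between adjacent chips
```

With these numbers five chips cover the axis, and adjacent chips share 64 units. Applying the same computation to the other axis gives the full grid, and the product of the two counts is the number of patches.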

__init__(dataset, size, stride, roi=None)[source]

Initialize a new Sampler instance.

The size and stride arguments can either be:

  • a single float - in which case the same value is used for the height and width dimension

  • a tuple of two floats - in which case, the first float is used for the height dimension, and the second float for the width dimension

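Parameters

  • dataset - dataset to index
  • size - dimensions of each chip, in either of the two forms above
  • stride - distance between the origins of consecutive chips, in either of the two forms above
  • roi - region of interest to sample from, as (minx, maxx, miny, maxy, mint, maxt); optional, default None
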
__iter__()[source]

Return the index of a dataset.

Returns

(minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset

Return type

Iterator[torchgeo.datasets.BoundingBox]

__len__()[source]

Return the number of samples over the ROI.

Returns

number of patches that will be sampled

Return type

int

Batch Samplers

When working with large tile-based datasets, randomly sampling patches from each tile can be extremely time-consuming. It’s much more efficient to choose a tile, load it, warp it to the appropriate coordinate reference system (CRS) and resolution, and then sample random patches from that tile to construct a mini-batch of data. For this reason, we define our own BatchGeoSampler implementations below. These can be used like so:

from torch.utils.data import DataLoader

from torchgeo.datasets import Landsat
from torchgeo.samplers import RandomBatchGeoSampler

dataset = Landsat(...)
sampler = RandomBatchGeoSampler(dataset, size=1000, batch_size=10, length=100)
dataloader = DataLoader(dataset, batch_sampler=sampler)

Random Batch Geo Sampler

class torchgeo.samplers.RandomBatchGeoSampler(dataset, size, batch_size, length, roi=None)

Bases: torchgeo.samplers.BatchGeoSampler

Samples batches of elements from a region of interest randomly.

This is particularly useful during training when you want to maximize the size of the dataset and return as many random chips as possible.
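The tile-first strategy can be sketched as follows. This is an illustration only: `random_batches` and the `Bounds` alias are hypothetical, tiles are given as plain rectangles, and the time bounds are left out for brevity.

```python
import random
from typing import Iterator, List, Tuple

# (minx, maxx, miny, maxy) only; time bounds are omitted to keep the sketch short.
Bounds = Tuple[float, float, float, float]


def random_batches(
    tiles: List[Bounds], size: float, batch_size: int, length: int, seed: int = 0
) -> Iterator[List[Bounds]]:
    """Yield `length` batches; every box in a batch comes from one randomly chosen tile."""
    rng = random.Random(seed)
    for _ in range(length):
        tminx, tmaxx, tminy, tmaxy = rng.choice(tiles)  # pick the tile once per batch
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(tminx, tmaxx - size)
            y = rng.uniform(tminy, tmaxy - size)
            batch.append((x, x + size, y, y + size))
        yield batch


tiles = [(0.0, 1000.0, 0.0, 1000.0), (2000.0, 3000.0, 0.0, 1000.0)]
batches = list(random_batches(tiles, size=100.0, batch_size=4, length=3))
```

Because all boxes in a batch fall inside the same tile, that tile only needs to be loaded and warped once per batch.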

__init__(dataset, size, batch_size, length, roi=None)[source]

Initialize a new Sampler instance.

The size argument can either be:

  • a single float - in which case the same value is used for the height and width dimension

  • a tuple of two floats - in which case, the first float is used for the height dimension, and the second float for the width dimension

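Parameters

  • dataset - dataset to index
  • size - dimensions of each chip, in either of the two forms above
  • batch_size - number of samples per batch
  • length - length of an epoch (see __len__ below)
  • roi - region of interest to sample from, as (minx, maxx, miny, maxy, mint, maxt); optional, default None
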
__iter__()[source]

Return the indices of a dataset.

Returns

batch of (minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset

Return type

Iterator[List[torchgeo.datasets.BoundingBox]]

__len__()[source]

Return the number of batches in a single epoch.

Returns

number of batches in an epoch

Return type

int

Base Classes

If you want to write your own custom sampler, you can extend one of these abstract base classes.

Geo Sampler

class torchgeo.samplers.GeoSampler(dataset, roi=None)

Bases: torch.utils.data.Sampler[torchgeo.datasets.BoundingBox], abc.ABC

Abstract base class for sampling from GeoDataset.

Unlike PyTorch’s Sampler, GeoSampler returns enough geospatial information to uniquely index any GeoDataset. This includes things like latitude, longitude, height, width, projection, coordinate system, and time.

__init__(dataset, roi=None)[source]

Initialize a new Sampler instance.

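Parameters

  • dataset - dataset to index
  • roi - region of interest to sample from, as (minx, maxx, miny, maxy, mint, maxt); optional, default None
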
abstract __iter__()[source]

Return the index of a dataset.

Returns

(minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset

Return type

Iterator[torchgeo.datasets.BoundingBox]
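The general shape of a custom sampler is sketched below. To stay self-contained it does not import torchgeo: `Box` and `SamplerSketch` are hypothetical stand-ins for the bounding box type and the abstract base class, and `CenterSampler` is an invented example. A real subclass would extend GeoSampler and yield the library's own bounding boxes.

```python
import abc
from typing import Iterator, NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in that uses the same field order as the docs."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


class SamplerSketch(abc.ABC):
    """Hypothetical stand-in for the base class: stores the ROI, leaves __iter__ abstract."""

    def __init__(self, roi: Box) -> None:
        self.roi = roi

    @abc.abstractmethod
    def __iter__(self) -> Iterator[Box]:
        """Yield bounding boxes that index the dataset."""


class CenterSampler(SamplerSketch):
    """Yields a single square chip centered on the ROI."""

    def __init__(self, roi: Box, size: float) -> None:
        super().__init__(roi)
        self.size = size

    def __iter__(self) -> Iterator[Box]:
        cx = (self.roi.minx + self.roi.maxx) / 2
        cy = (self.roi.miny + self.roi.maxy) / 2
        half = self.size / 2
        yield Box(cx - half, cx + half, cy - half, cy + half, self.roi.mint, self.roi.maxt)

    def __len__(self) -> int:
        return 1


sampler = CenterSampler(Box(0.0, 100.0, 0.0, 60.0, 0.0, 1.0), size=20.0)
chips = list(sampler)
```

The only required piece is `__iter__`; `__len__` is added here so that code which asks for the epoch length gets an answer.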

Batch Geo Sampler

class torchgeo.samplers.BatchGeoSampler(dataset, roi=None)

Bases: torch.utils.data.Sampler[List[torchgeo.datasets.BoundingBox]], abc.ABC

Abstract base class for sampling from GeoDataset.

Unlike PyTorch’s BatchSampler, BatchGeoSampler returns enough geospatial information to uniquely index any GeoDataset. This includes things like latitude, longitude, height, width, projection, coordinate system, and time.

__init__(dataset, roi=None)[source]

Initialize a new Sampler instance.

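Parameters

  • dataset - dataset to index
  • roi - region of interest to sample from, as (minx, maxx, miny, maxy, mint, maxt); optional, default None
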
abstract __iter__()[source]

Return a batch of indices of a dataset.

Returns

batch of (minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset

Return type

Iterator[List[torchgeo.datasets.BoundingBox]]
