Alternatives
TorchGeo is not the only geospatial machine learning library out there; there are a number of alternatives you can consider using. The goal of this page is to provide an up-to-date listing of these libraries and the features they support, in order to help you decide which library is right for you. Criteria for inclusion on this list include:
geospatial: Must be primarily intended for working with geospatial, remote sensing, or satellite imagery data. This rules out libraries like torchvision, which provides little to no support for multispectral data or geospatial transforms.
machine learning: Must provide basic machine learning functionality. This rules out libraries like GDAL, which is useful for data loading but offers no support for machine learning.
library: Must be an actively developed software library with testing and releases on repositories like PyPI or CRAN. This rules out libraries like TorchSat, RoboSat, and Solaris, which have been abandoned and are no longer maintained.
When deciding which library is most useful to you, it is worth considering the features they support, how actively the library is being developed, and how popular the library is, roughly in that order.
Note
Software is a living, breathing organism and is constantly undergoing change. If any of the information on this page is incorrect or out of date, or if you want to add a new project to this list, please open a PR!
Last updated: 28 August 2024
Features
Key: ✅ full support, 🚧 partial support, ❌ no support
| Library | ML Backend | I/O Backend | Spatial Backend | Transform Backend | Datasets | Weights | CLI | Reprojection | STAC | Time-Series |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TorchGeo | PyTorch | GDAL, h5py, laspy, OpenCV, pandas, pillow, scipy | R-tree | Kornia | 82 | 68 | ✅ | ✅ | ❌ | 🚧 |
| eo-learn | scikit-learn | GDAL, OpenCV, pandas | geopandas | numpy | 0 | 0 | ❌ | ❌ | ❌ | 🚧 |
| Raster Vision | PyTorch, TensorFlow* | GDAL, OpenCV, pandas, pillow, scipy, xarray | STAC | Albumentations | 0 | 6 | ✅ | ✅ | ✅ | ❌ |
| PaddleRS | PaddlePaddle | GDAL, OpenCV | shapely | numpy | 7 | 14 | 🚧 | ❌ | ❌ | 🚧 |
| samgeo | PyTorch | GDAL, OpenCV, pandas | geopandas | numpy | 0 | 0 | ❌ | ❌ | ❌ | ❌ |
| DeepForest | PyTorch | GDAL, OpenCV, pandas, pillow, scipy | R-tree | Albumentations | 0 | 3 | ❌ | ❌ | ❌ | ❌ |
| sits | R Torch | GDAL | tidyverse |  | 22 | 0 | ❌ | ✅ | ✅ | ✅ |
| TerraTorch | PyTorch | GDAL, h5py, pandas, xarray | R-tree | Albumentations | 16 | 1 | ✅ | ❌ | ❌ | 🚧 |
| scikit-eo | scikit-learn, TensorFlow | pandas, scipy, numpy, rasterio | geopandas | numpy | 0 | 0 | ❌ | ❌ | ❌ | 🚧 |
*Support for TensorFlow was dropped in Raster Vision 0.12.
ML Backend: The machine learning libraries used by the project. For example, if you are a scikit-learn user, eo-learn may be perfect for you, but if you need more advanced deep learning support, you may want to choose a different library.
I/O Backend: The I/O libraries used by the project to read data. This gives you a rough idea of which file formats are supported. For example, if you need to work with lidar data, a project that uses laspy may be important to you.
Spatial Backend: The spatial library used to perform spatial joins and compute intersections based on geospatial metadata. This may be important to you if you intend to scale up your experiments.
Transform Backend: The transform library used to perform data augmentation. For example, Kornia performs all augmentations on PyTorch Tensors, allowing you to run your transforms on the GPU for an entire mini-batch at a time; the first sketch after these definitions shows what that looks like.
Datasets: The number of geospatial datasets built into the library. Note that most projects have something similar to TorchGeo's RasterDataset and VectorDataset, allowing you to work with generic raster and vector files (the second sketch below shows this). Collections of datasets are only counted a single time, so data loaders for Landsats 1–9 are a single dataset, and data loaders for SpaceNets 1–8 are also a single dataset.
Weights: The number of model weights pre-trained on geospatial data that are offered by the library. Note that most projects support hundreds of model architectures via a library like PyTorch Image Models, and can use models pre-trained on ImageNet. There are far fewer libraries that provide foundation model weights pre-trained on multispectral satellite imagery; the third sketch below loads one such set of weights.
CLI: Whether or not the library has a command-line interface. This low-code or no-code solution is convenient for users with limited programming experience, and can offer nice features for reproducing research and fast experimentation.
Reprojection: Whether or not the library supports automatic reprojection and resampling of data. Without this, users are forced to manually warp data using a library like GDAL if they want to combine datasets in different coordinate systems or spatial resolutions. The second sketch below demonstrates this as well.
STAC: Whether or not the library supports the SpatioTemporal Asset Catalog (STAC). STAC is becoming a popular means of indexing into spatiotemporal data like satellite imagery; the final sketch below shows a typical STAC query.
Time-Series: Whether or not the library supports time-series modeling. For many remote sensing applications, time-series data provide important signals.
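To make the Transform Backend distinction concrete, here is a minimal sketch of batch-level augmentation with Kornia. The tensor shapes and the particular transforms are illustrative assumptions rather than code taken from any library above; the point is that an entire multispectral mini-batch is augmented on the GPU in a single call.

```python
import torch
from kornia import augmentation as K

# A mini-batch of 16 "images" with 13 spectral bands (e.g. Sentinel-2), 256x256 px.
# Kornia operates on plain tensors, so multispectral inputs need no special casing.
batch = torch.rand(16, 13, 256, 256)

# Kornia augmentations are nn.Modules, so they compose with nn.Sequential
# and run on whatever device the input tensor lives on.
transforms = torch.nn.Sequential(
    K.RandomHorizontalFlip(p=0.5),
    K.RandomVerticalFlip(p=0.5),
    K.RandomResizedCrop(size=(224, 224)),
)

if torch.cuda.is_available():
    batch = batch.cuda()
    transforms = transforms.cuda()

augmented = transforms(batch)  # whole mini-batch augmented in one call
print(augmented.shape)  # torch.Size([16, 13, 224, 224])
```

By contrast, numpy- and Albumentations-based pipelines typically augment one sample at a time on the CPU.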
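The Datasets and Reprojection rows are related: TorchGeo's RasterDataset is its mechanism for working with generic raster files, and combining two geospatial datasets triggers automatic reprojection and resampling. A sketch, with hypothetical file layouts and directory names:

```python
from torch.utils.data import DataLoader
from torchgeo.datasets import RasterDataset, stack_samples
from torchgeo.samplers import RandomGeoSampler


class Sentinel2(RasterDataset):
    # Hypothetical layout: one JP2 file per band, matched by a glob pattern.
    filename_glob = "T*_B02_10m.jp2"
    all_bands = ("B02", "B03", "B04", "B08")
    separate_files = True


class LandCover(RasterDataset):
    # A single-band raster of class labels; is_image=False marks it as a mask.
    filename_glob = "landcover_*.tif"
    is_image = False


imagery = Sentinel2(paths="data/sentinel2")  # e.g. stored in a UTM projection
labels = LandCover(paths="data/landcover")   # e.g. stored in EPSG:4326

# The & operator builds an IntersectionDataset: patches are sampled from the
# spatial overlap, with the labels warped on the fly to the imagery's CRS and
# resolution.
dataset = imagery & labels

sampler = RandomGeoSampler(dataset, size=256, length=100)
loader = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)

for sample in loader:
    x, y = sample["image"], sample["mask"]
```

Without this kind of support, the label raster would first have to be warped to the imagery's coordinate system manually, for example with GDAL's gdalwarp.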
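For the Weights column, the difference between generic ImageNet weights and weights pre-trained on multispectral satellite imagery looks like this in TorchGeo's API (the timm model name is just an example):

```python
import timm
from torchgeo.models import ResNet18_Weights, resnet18

# ResNet-18 pre-trained on all 13 Sentinel-2 bands with MoCo self-supervision.
model = resnet18(weights=ResNet18_Weights.SENTINEL2_ALL_MOCO)

# The same architecture via PyTorch Image Models, pre-trained on 3-band ImageNet.
rgb_model = timm.create_model("resnet18", pretrained=True)
```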
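Finally, to illustrate what STAC indexing buys you: a catalog can be queried by space and time without downloading anything up front. This sketch uses pystac-client (not one of the libraries compared here) against a public STAC API; the endpoint, bounding box, and asset key are assumptions for the example.

```python
from pystac_client import Client

# Open a public STAC API and search for Sentinel-2 scenes over a bounding box.
catalog = Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-105.3, 39.9, -105.1, 40.1],
    datetime="2024-06-01/2024-06-30",
    max_items=10,
)

# Each item carries spatiotemporal metadata and links to the underlying assets.
for item in search.items():
    print(item.id, item.datetime, item.assets["red"].href)
```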
GitHub
These are metrics that can be scraped from GitHub.
| Library | Contributors | Forks | Watchers | Stars | Issues | PRs | Releases | Commits | Core SLOCs | Test SLOCs | Test Coverage | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TorchGeo | 72 | 308 | 44 | 2,409 | 419 | 1,714 | 11 | 2,074 | 30,761 | 16,058 | 100% | MIT |
| eo-learn | 40 | 300 | 46 | 1,108 | 159 | 638 | 44 | 2,470 | 8,207 | 5,932 | 92% | MIT |
| Raster Vision | 32 | 381 | 71 | 2,046 | 697 | 1,382 | 22 | 3,614 | 22,779 | 9,429 | 90% | Apache-2.0 |
| PaddleRS | 23 | 89 | 13 | 374 | 91 | 116 | 3 | 644 | 21,859 | 3,384 | 48% | Apache-2.0 |
| samgeo | 17 | 281 | 55 | 2,834 | 129 | 104 | 27 | 186 | 5,598 | 92 | 22% | MIT |
| DeepForest | 17 | 172 | 17 | 474 | 413 | 301 | 44 | 864 | 3,357 | 1,794 | 86% | MIT |
| sits | 14 | 76 | 28 | 451 | 622 | 583 | 44 | 6,244 | 24,284 | 8,697 | 94% | GPL-2.0 |
| TerraTorch | 9 | 10 | 9 | 121 | 46 | 92 | 2 | 243 | 10,101 | 583 | 44% | Apache-2.0 |
| scikit-eo | 6 | 17 | 8 | 132 | 20 | 11 | 15 | 496 | 1,636 | 94 | 37% | Apache-2.0 |
Contributors: The number of contributors. This is one of the most important metrics for project development. The more developers you have, the higher the bus factor, and the more likely the project is to survive. More contributors also means more new features and bug fixes.
Forks: The number of times the git repository has been forked. This gives you an idea of how many people are attempting to modify the source code, even if they have not (yet) contributed back their changes.
Watchers: The number of people watching activity on the repository. These are people who are interested enough to get notifications for every issue, PR, release, or discussion.
Stars: The number of people who have starred the repository. This is not the best metric for number of users, and instead gives you a better idea about the amount of hype surrounding the project.
Issues: The total number of open and closed issues. Although it may seem counterintuitive, the more issues, the better. Large projects like PyTorch have tens of thousands of open issues. This does not mean that PyTorch is broken; it means that it is popular and has enough users to discover corner cases and open feature requests.
PRs: The total number of open and closed pull requests. This tells you how active development of the project has been. Note that this metric can be artificially inflated by bots like dependabot.
Releases: The number of software releases. The frequency of releases varies from project to project. The important thing to look for is multiple releases.
Commits: The number of commits on the main development branch. This is another metric for how active development has been. However, this can vary a lot depending on whether PRs are merged with or without squashing first.
Core SLOCs: The number of source lines of code in the core library, excluding empty lines and comments. This tells you how large the library is, and how long it would take someone to write something like it themselves. We use scc to compute SLOCs and exclude markup languages like Markdown from the count.
Test SLOCs: The number of source lines of code in the testing suite, excluding empty lines and comments. This tells you how well tested the project is. A good goal to strive for is a similar amount of code for testing as there is in the core library itself.
Test Coverage: The percentage of the core library that is hit by unit tests. This is especially important for interpreted languages like Python and R where there is no compiler type checking. 100% test coverage is ideal, but 80% is considered good.
License: The license the project is distributed under. For commercial researchers, this may be very important and decide whether or not they are able to use the software.
Downloads
These are download metrics for the project. Note that these numbers can be artificially inflated by mirrors and installs during continuous integration. They give you a better idea of the number of projects that depend on a library than the number of users of that library.
| Library | PyPI/CRAN Last Week | PyPI/CRAN Last Month | PyPI/CRAN All Time | Conda All Time | Total All Time |
| --- | --- | --- | --- | --- | --- |
| TorchGeo | 1,828 | 9,789 | 255,293 | 21,108 | 276,401 |
| eo-learn | 319 | 1,560 | 141,983 | 36,205 | 178,188 |
| Raster Vision | 138 | 652 | 61,938 | 3,254 | 65,192 |
| PaddleRS | 10 | 36 | 1,642 | 0 | 1,642 |
| samgeo | 1,553 | 7,363 | 117,664 | 18,147 | 135,811 |
| DeepForest | 564 | 3,652 | 761,520 | 62,869 | 824,389 |
| sits | 304 | 648 | 12,767 | 78,976 | 91,743 |
| TerraTorch | 259 | 988 | 2,378 | 0 | 2,378 |
| scikit-eo | 162 | 621 | 12,048 | 0 | 12,048 |
PyPI Downloads: The number of downloads from the Python Packaging Index. PyPI download metrics are computed by PyPI Stats and PePy.
CRAN Downloads: The number of downloads from the Comprehensive R Archive Network. CRAN download metrics are computed by Meta CRAN and DataScienceMeta.
Conda Downloads: The number of downloads from Conda Forge. Conda download metrics are computed by Conda Forge.