for ecologists and practitioners
During the 2020-2021 period, I was hired to develop novel workflows in the Google Earth Engine (GEE) cloud-based platform to conduct spatial ecological analyses and to obtain spatiotemporal information from remotely sensed products to inform movement analyses.
GEE is a free web-based spatial analysis platform that requires only a web browser and an internet connection to programmatically access and analyze data from its multi-petabyte catalog of regularly-updated satellite imagery (e.g., MODIS, Landsat, Sentinel) and other geospatial datasets.
Taking advantage of GEE's extensive data catalog and the computing power of Google's computing network, we can now run species distribution models and functional and structural connectivity analyses very quickly, without needing multiple software packages to compile environmental variables and conduct analyses. This project is part of the Working Land and Seascapes Amplification and Innovation Grant, SCBI.
Species distribution models in GEE
We developed a workflow in GEE to fit species distribution models, which are increasingly used in the fields of ecology, wildlife management, and conservation.
This workflow allows researchers and conservation practitioners to rapidly obtain outputs from computationally demanding analyses and offers potential to produce multi-temporal or near-real time estimates of habitat suitability by incorporating remote sensing data from active satellite missions.
We implemented a workflow for species distribution modeling in GEE that includes importing species occurrence data into the GEE platform, selecting and preparing predictor variables, and performing model fitting with spatial or temporal split-block cross-validation techniques. We present three case studies that demonstrate: i) a rigorous SDM workflow that produces informative model predictions and accompanying assessment metrics and ii) that the code can be modified to leverage GEE’s massive data catalog and supercomputing capabilities for more complex analyses, such as predicting changes in habitat suitability over time.
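The spatial block cross-validation idea mentioned above can be illustrated outside GEE with a minimal Python sketch. It is not the workflow's own code: the function name `spatial_block_folds`, the one-degree block size, and the coordinates are all hypothetical, and the sketch only shows the general principle that nearby points are held out together.

```python
import random
from collections import defaultdict


def spatial_block_folds(points, block_size_deg, n_folds, seed=0):
    """Assign each (lon, lat) point to a fold so that all points falling in
    the same grid block share a fold."""
    # Group point indices by the grid block that contains them.
    blocks = defaultdict(list)
    for idx, (lon, lat) in enumerate(points):
        key = (int(lon // block_size_deg), int(lat // block_size_deg))
        blocks[key].append(idx)

    # Shuffle the blocks reproducibly, then deal them out to folds in turn.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    fold_of_point = [None] * len(points)
    for i, key in enumerate(keys):
        for idx in blocks[key]:
            fold_of_point[idx] = i % n_folds
    return fold_of_point


# Hypothetical occurrence coordinates (lon, lat) in decimal degrees.
occurrences = [(-84.1, 9.9), (-84.2, 9.8), (-79.5, 8.9), (-79.6, 9.1),
               (-75.3, 6.2), (-60.0, -3.1), (-60.2, -3.3), (-55.9, -12.4)]
folds = spatial_block_folds(occurrences, block_size_deg=1.0, n_folds=2)
```

Each fold is then used once as the test set while the remaining folds train the model. Because whole blocks are withheld, test points are spatially separated from training points, which tends to give a more conservative accuracy estimate than a purely random split.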
Main Conclusions: Our SDM workflow allows users to benefit from the high speed and performance of GEE without the need for significant computing infrastructure. This workflow may be especially beneficial to researchers in countries where computing power or internet bandwidth is limited, as alternative workflows frequently require the download, storage, and processing of large raster datasets. We also discuss key limitations of implementing SDMs in GEE, such as user memory limits and the lack of high-level functions. We include a step-by-step guide that addresses several of these challenges by providing custom functions for species distribution modeling and introducing batch processing to avoid memory limits.
We included three main case studies to demonstrate some of the possibilities for analysis using GEE:
1 - Modelling species distribution using presence only data
In the basic SDM framework, we modeled the habitat suitability and potential distribution of Bradypus variegatus. The habitat suitability model (left) was created by averaging ten random forest models. The potential distribution map (right) was calculated using a majority vote from the ten random forest binary classifications. The red dots indicate occurrence records of Bradypus variegatus between 2003 and 2020.
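The two ways of combining model runs described above, averaging and majority voting, can be sketched in plain Python. This is an illustration under stated assumptions rather than the GEE code itself: the function names are hypothetical, the example uses three runs over four pixels instead of ten runs over a map, and a tied vote is counted as absence, which the source does not specify.

```python
def mean_suitability(runs):
    """Average per-pixel suitability across model runs."""
    n_runs = len(runs)
    return [sum(values) / n_runs for values in zip(*runs)]


def majority_vote(binary_runs):
    """Mark a pixel as present (1) when more than half of the runs predict
    presence. Ties count as absence (an assumption of this sketch)."""
    n_runs = len(binary_runs)
    return [1 if sum(votes) * 2 > n_runs else 0 for votes in zip(*binary_runs)]


# Hypothetical outputs: one list per model run, one value per pixel.
suitability_runs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.6],
    [0.7, 0.3, 0.5, 0.5],
]
binary_runs = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]

habitat_suitability = mean_suitability(suitability_runs)
potential_distribution = majority_vote(binary_runs)
```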
2 - Accounting for temporal dynamics in forest cover
The second case study accounts for the temporal component of the occurrence data by matching the remotely sensed data to the time each presence record was collected. The output model shows changes in habitat suitability for Cebus capucinus over the last two decades (2000-2019).
The main map shows the regression slope calculated from 20 years of habitat suitability values. Red pixels are areas with declining habitat suitability and blue pixels are areas where habitat suitability increased across years. Only pixels with significant trends are shown (alpha = 0.05). This output was obtained by fitting a linear regression to habitat suitability values for each pixel across 20 years of data.
The zoom-in boxes show from bottom left to bottom right: predicted habitat suitability for 2000, predicted habitat suitability for 2019, regression slope (i.e., direction of change in habitat suitability), and the deforested areas (in red) obtained from Hansen et al. (2013).
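The per-pixel trend test described for the main map can be sketched for a single pixel as follows. This is a simplified illustration, not the GEE implementation: the function name and the two example time series are hypothetical, and significance is checked against the two-sided critical t value for 18 degrees of freedom (20 yearly values minus 2 fitted parameters) instead of computing a p-value.

```python
import math

# Two-sided critical t value for alpha = 0.05 with 18 degrees of freedom.
T_CRIT_DF18 = 2.101


def slope_and_t(years, values):
    """Ordinary least-squares slope of values on years, with its t statistic."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the standard error of the slope.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    if std_err == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / std_err
    return slope, t_stat


years = list(range(2000, 2020))  # 20 yearly suitability predictions

# Hypothetical pixels: one losing suitability, one fluctuating without trend.
declining = [0.9 - 0.02 * i + (0.01 if i % 2 == 0 else -0.01) for i in range(20)]
stable = [0.5 + (0.05 if i % 2 == 0 else -0.05) for i in range(20)]

slope_d, t_d = slope_and_t(years, declining)
slope_s, t_s = slope_and_t(years, stable)

declining_is_significant = abs(t_d) > T_CRIT_DF18  # would be drawn in red
stable_is_significant = abs(t_s) > T_CRIT_DF18     # would be masked out
```

Applied to every pixel, the sign of the slope gives the map colour (negative for declining suitability, positive for increasing) and the significance check decides whether the pixel is displayed at all.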
3 - Modelling species distribution at high spatial resolution using unclassified satellite images as predictor variables
In the third case study, we demonstrate the full potential of GEE by modeling the habitat suitability of Wood Thrush (Hylocichla mustelina) across the eastern continental USA (4,606,284 km²) at 90 m spatial resolution.
This analysis used a combination of 6,150 Landsat 8 Surface Reflectance Collection 2 images, 3 global Advanced Land Observing Satellite (ALOS) Phased Array L-band Synthetic Aperture Radar (SAR) mosaics, and averaged temperature datasets. The model was fit using 34,880 presence records. Model predictions were made over 705,147,693 pixels. The model took 22 h to export.
The zoom-in boxes show details of habitat suitability predictions at 90 m spatial resolution.
Red dots represent 2,000 presence locations of Hylocichla mustelina, randomly selected from the 34,880 observations used for modeling.
The potential distribution is also shown in the bottom left corner of the figure.
The Google Earth Engine code and data used in this study are freely available at the following Google Earth Engine repository
You can access the tutorials to fit Species Distribution Models in Google Earth Engine below or at the following link.
The following video explains the basics for implementing SDMs in GEE
Enhancing Animal Movement Analyses: Spatiotemporal Matching of Animal Positions with Remotely Sensed Data Using Google Earth Engine and R
Movement ecologists have witnessed a rapid increase in the amount of animal position data collected over the past few decades, as well as a concomitant increase in the availability of ecologically relevant remotely sensed data.
Many researchers, however, lack the computing resources necessary to incorporate the vast spatiotemporal aspects of the available datasets, especially in countries with fewer economic resources, which limits the scope of ecological inquiry.
We developed an R coding workflow that bridges the gap between R and the multi-petabyte catalogue of remotely sensed data available in Google Earth Engine (GEE) to efficiently extract raster pixel values that best match the spatiotemporal aspects (i.e., spatial location and time) of each animal’s GPS position.
As an example, we extracted Normalized Difference Vegetation Index information from the MOD13Q1 data product for 12,344 GPS animal locations by matching the closest MODIS image in the time series to each GPS fix. Data extractions were completed in approximately 3 min.
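The temporal half of this matching, picking the image closest in time to each GPS fix, can be sketched in Python with a binary search. This is a local illustration and not the R workflow itself: the function name, image dates, NDVI values, and fix times are hypothetical, and the spatial lookup is left out by assuming the values have already been sampled at one location.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_image_index(image_times, fix_time):
    """Return the index of the image whose timestamp is closest to fix_time.

    image_times must be sorted in ascending order.
    """
    pos = bisect_left(image_times, fix_time)
    if pos == 0:
        return 0
    if pos == len(image_times):
        return len(image_times) - 1
    before, after = image_times[pos - 1], image_times[pos]
    # Ties go to the earlier image.
    return pos - 1 if fix_time - before <= after - fix_time else pos


# Hypothetical 16-day composite dates and NDVI values at one location.
image_times = [datetime(2020, 1, 1) + timedelta(days=16 * i) for i in range(4)]
ndvi_values = [0.31, 0.35, 0.42, 0.47]

# Hypothetical GPS fix times.
fixes = [
    datetime(2020, 1, 5, 6, 0),
    datetime(2020, 1, 30, 18, 0),
    datetime(2020, 3, 1),
]
matched_ndvi = [ndvi_values[nearest_image_index(image_times, t)] for t in fixes]
```

Fixes that fall before the first image or after the last one are matched to the nearest end of the series. A real analysis might instead discard fixes whose nearest image is more than some maximum time gap away.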
In a second case study, we extracted hourly air temperature from the ERA5-Land dataset for 33,074 GPS fixes from 12 different wildebeest (Connochaetes taurinus) in approximately 34 min. We then investigated the relationship between step length (i.e., the net distance between sequential GPS locations) and temperature and found that animals move less as temperature increases.
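Step length as defined above can be computed from consecutive fixes with a great-circle distance. The sketch below assumes the haversine formula and coordinates in decimal degrees; the source does not state how distances were calculated, and the function names and track are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def step_lengths(track):
    """Distances between consecutive (lon, lat) fixes of one animal,
    given in time order."""
    return [haversine_m(*a, *b) for a, b in zip(track, track[1:])]


# Hypothetical track of three fixes (lon, lat) for a single animal.
track = [(36.0, -1.0), (36.0, -1.01), (36.01, -1.01)]
steps = step_lengths(track)
```

Each step length can then be paired with the temperature extracted for the fix that starts or ends the step before modeling the relationship between the two.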
These case studies illustrate the potential to explore novel questions in animal movement research using high-temporal-resolution, remotely sensed data products. The workflow we present is efficient and customizable, with data extractions occurring over relatively short time periods. While computing times to extract remotely sensed data from GEE will vary depending on internet speed, the approach described has the potential to facilitate access to computationally demanding processes for a greater variety of researchers and may lead to increased use of remotely sensed data in the field of movement ecology.
We present a step-by-step tutorial on how to use the code and adapt it to other data products that are available in GEE. You can see it below or at this link.
We are currently developing a workflow and an online web application to conduct connectivity analyses.