IBM PAIRS: Scalable big geospatial-temporal data and analytics as-a-service

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Siyuan Lu, Hendrik Hamann

Abstract

The rapid growth of geospatial-temporal data from sources such as satellites, drones, weather modeling, and IoT sensors, accumulating at a pace of petabytes to exabytes annually, opens unprecedented opportunities for both scientific and industrial applications. However, the sheer size and complexity of such data present significant challenges for conventional geospatial information systems (GIS), which are supported by relational geospatial databases, and for cloud-based geospatial services based on file systems (manifested as object stores or “cold” tape storage).

To fully exploit the value of geospatial-temporal data, particularly by leveraging the latest advances in machine learning (ML) and artificial intelligence (AI), a new paradigm for platforms and services is required. The necessary salient features include: (i) scalable cloud-based deployment capable of handling hundreds of petabytes of data, (ii) harmonization of data to mask its complexity (schema, map projection, etc.) from end users, (iii) advanced search of data at the “pixel level” (in contrast to the “file level”), and (iv) “in-data” analytics and computation to avoid downloading massive volumes of data over the internet.

In this chapter, we review current trends in the design, implementation, and functionality of such geospatial-temporal platforms and associated services, focusing on those based upon scalable key-value datastores. IBM PAIRS (Physical Analytics Integrated Data and Repository Services) Geoscope is used as an example to illustrate how the architecture and key-value datastore design support the aforementioned features as well as high-performance data ingestion, query, and analytics. The specific implementation of a publicly available PAIRS instance is presented along with its performance benchmarking.
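
To make the idea of “pixel level” search on a key-value datastore concrete, the following is a minimal sketch, not the PAIRS implementation. It assumes a global longitude/latitude grid, a Z-order (Morton) code as the spatial part of the key, and a toy in-memory sorted store standing in for a distributed one. All names (`interleave_bits`, `pixel_key`, `SortedKeyValueStore`, `query_pixel`) and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x occupies even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y occupies odd bit positions
    return code


def pixel_key(layer_id: int, lon: float, lat: float, timestamp: int, bits: int = 16) -> tuple:
    """Build a sortable (layer, cell, time) key for one pixel (assumed key layout)."""
    cells = 1 << bits
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return (layer_id, interleave_bits(col, row, bits), timestamp)


class SortedKeyValueStore:
    """Toy in-memory stand-in for a distributed, sorted key-value datastore."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value              # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key <= stop."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


def query_pixel(store, layer_id, lon, lat, t_start, t_end):
    """Fetch the time series of one pixel with a single contiguous key-range scan."""
    layer, cell, _ = pixel_key(layer_id, lon, lat, 0)
    rows = store.scan((layer, cell, t_start), (layer, cell, t_end))
    return [(key[2], value) for key, value in rows]


# Illustrative data: three timestamps for one pixel, one record for another pixel.
store = SortedKeyValueStore()
for t, value in [(100, 281.5), (200, 282.1), (300, 283.0)]:
    store.put(pixel_key(1, -73.9, 40.7, t), value)
store.put(pixel_key(1, 10.0, 50.0, 200), 275.0)

print(query_pixel(store, 1, -73.9, 40.7, 150, 300))  # [(200, 282.1), (300, 283.0)]
```

Because the layer and the cell code precede the timestamp in the key, all records for one pixel are stored contiguously, so a point query becomes one range scan rather than a search through files.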

Furthermore, we review the RESTful API of IBM PAIRS. The API is minimalistic and designed to give end users with different perspectives (data providers, industrial analysts, software developers, and data scientists) a smooth experience in exploiting geospatial-temporal data. API interaction with PAIRS is illustrated through a few query examples and use cases in extended-range weather forecasting and electric utilities. The use cases also highlight how contextual insights can be gained rapidly through a variety of “cross-layer” queries and analytics that reveal relationships and patterns and predict trends.

DOI

https://doi.org/10.31223/X5XK5J

Subjects

Earth Sciences, Environmental Sciences, Oceanography and Atmospheric Sciences and Meteorology

Keywords

Weather, geospatial, Big-Data, spatial-temporal

Dates

Published: 2020-11-02 06:51

Last Updated: 2020-11-02 14:51

License

CC BY Attribution 4.0 International