Spatial association between regionalizations using the information-theoretical V-measure

ABSTRACT There is keen interest in calculating spatial associations between two variables spanning the same study area. Many methods for calculating such associations have been proposed, but the case when both variables are categorical is underdeveloped despite the fact that many datasets of interest are in the form of either regionalizations or thematic maps. In this paper, we advance this case by adapting the so-called V-measure method from its original information-theoretical formulation to the analysis of variance formulation, which provides more insight for spatial analysis. We present a step-by-step derivation of the V-measure from the perspective of the analysis of variance. The method produces three indices of global association and two sets of local association indicators, which can be mapped to indicate the spatial distribution of association strength. Open-source software for calculating all indices from vector datasets accompanies the paper. To showcase the utility of the V-measure, we identify three different application contexts (comparative, associative, and derivative) and present an example of each. The V-measure method has several advantages over the widely used Mapcurves method: it has clear interpretations in terms of mutual information as well as in terms of analysis of variance, it provides a more precise assessment of association, it is ready to use through the accompanying software, and the examples given in the paper serve as a guide to the gamut of its possible applications. Two specific contributions stemming from our re-analysis of the V-measure are the finding of a conceptual flaw in the Geographical Detector (a method to quantify associations between numerical and categorical spatial variables) and a proposal for a new, cartographically based algorithm for finding an optimal number of regions in clustering-derived regionalizations.


Introduction
A common task in spatial data analysis is to calculate the degree to which two variables are spatially associated. Both a global measure (a single-value assessment of the overall association) and local measures (association at each observation unit) are sought-after indicators. The approach to this task depends on the form of the data.
If both variables are numerical, multivariate spatial correlation methods (Wartenberg 1985, Getis and Ord 1992, Lee 2001) are applied. If one variable is numerical and another is categorical, the so-called Geographical Detector (Wang et al. 2010) can be used. The case when both variables are categorical is frequently referred to as a map comparison (Foody 2007).
There are two different contexts which call for map comparison. In most cases, the context is the comparison of thematic maps (for example, land cover maps of the same area at different times), where map units (often raster cells) are assigned a unique category from a relatively short list of possible themes. In thematic maps, many disjointed map units are assigned the same category. Another context is a comparison of regionalizations. A regionalization is a segmentation of the entire spatial domain (an area of interest) into a set of geographically meaningful single-connected units, each having its unique name. Examples of regionalizations include maps of climate classification (Kottek et al. 2006, Peel et al. 2007, Cannon 2012, Zscheischler et al. 2012, Zhang and Yan 2014, Netzel and Stepinski 2016), maps of ecoregions (Olson et al. 2001, Bailey 2014, Omernik and Griffith 2014), and administrative maps. Note that in practice, the single-connectedness of all regions is a goal which is rarely achieved. All examples given above have some regions consisting of disjointed parts (for example, in the regionalization of the USA into the states, the state of Michigan consists of two disjointed parts). Thus, for the purpose of this paper, there is no difference between a regionalization and a thematic map if, in the latter, we consider the sets of units assigned to the same category (sometimes referred to as strata; see, for example, Wang et al. (2010) or Metzger et al. (2012)) as regions. In the rest of this paper, we will use the term regionalization to cover both contexts.
The bulk of the previous work on map comparison (Power et al. 2001, Hagen 2003, Foody 2004, Visser and DeNijs 2006) was done in the context of raster thematic maps. Such methods overlay two raster maps and perform a cell-by-cell comparison to assess the similarity between the two maps. Hargrove et al. (2006) discussed many disadvantages of such an approach and proposed a map comparison based on a degree of overlap between regions in the two maps (the so-called 'Mapcurves' method). More recently, Sadahiro and Oguchi (2015) proposed another overlap method of map comparison.
Here, we propose a different overlap method for assessing a degree of spatial association between regionalizations. The proposed method is a reinterpretation of the V-measure concept (Rosenberg and Hirschberg 2007) from its original information-theoretical formulation to the analysis of variance formulation. In this form, the V-measure is directly comparable to the Geographical Detector (Wang et al. 2010) and can be used to reveal its shortcoming, while its original interpretation, in terms of the mutual information, gives it a solid theoretical ground. The V-measure can also be used to determine the optimal number of regions in regionalizations originating from data clustering.
In addition to re-introducing the V-measure, we also identify and describe three different contexts in which it could be used: (1) comparative, (2) associative, and (3) derivative. The comparative context involves comparing two regionalizations created to depict the same realm. One example of such a context is a comparison of the classic, global map of climate types (Köppen 1936) with more recent global maps of climate types obtained by clustering global datasets of climatic variables (Metzger et al. 2012, Zscheischler et al. 2012, Zhang and Yan 2014, Netzel and Stepinski 2016). Another example of the comparative context is ecoregion mapping. For the USA, there are three widely referenced delineations of ecoregions: one developed by the US Environmental Protection Agency (Omernik and Griffith 2014), another developed by the US Forest Service (Bailey 2014), and the third, the Terrestrial Ecoregions of the World, developed by Olson et al. (2001). They all depict the same realm but use different methodologies; our method can quantify the degree of similarity between those maps and identify locations of largest disagreement.
An associative context involves finding magnitudes of associations between a target regionalization (response variable) and a number of regionalizations corresponding to possible predictors of this target. An example of such a context is a regionalization of a domain into ecoregions as a target and categorical maps of land cover, landforms, soils, and climate covering the same domain as possible predictors for ecoregions (Nowosad and Stepinski 2018a).
Finally, the derivative context pertains to regionalizations obtained via algorithmic clustering of the domain. Examples of regionalizations created via clustering include newer maps of global climate types (see above), a map of land pattern types in the USA (Niesterowicz and Stepinski 2013), and maps of forest types in Canada (Partington and Cardille 2013, Niesterowicz and Stepinski 2017). When creating a regionalization via clustering, it is not immediately clear into how many clusters (regions) to divide the domain. The computer science community has developed several heuristics to determine an 'optimal' number of clusters (Davies and Bouldin 1979, Rousseeuw 1987, Salvador and Chan 2004); they all aim at minimizing dissimilarities between data instances within clusters and maximizing dissimilarities between the clusters. Our method selects the number of clusters in a spatial dataset from a different, cartographic, perspective by determining the number of regions above which a further change to the regionalization (a spatial manifestation of clustering) does not change the map in a meaningful way.

Methodology
The V-measure originated in the field of computer science as a measure for comparison of different clusterings of the same domain. Clustering is the task of grouping a set of objects into clusters in such a way that objects in the same cluster are more similar to each other than to those in other clusters. The important observation is that comparing regionalizations is conceptually equivalent to comparing clusterings. There is a rich literature describing many different measures proposed to quantify comparison between two clusterings of the same domain (for reviews see Denoeud and Guénoche (2006) and Wagner and Wagner (2007)). From among possible cluster comparison measures, we find the V-measure to be particularly well-suited for comparing regionalizations. It can be easily reinterpreted from a discrete domain to a continuous domain by replacing counting objects with calculating overlap areas between regions. It has an appealing interpretation in terms of information theory. It provides both global and local measures of association. Finally, the V-measure's construction is conceptually similar to the Geographical Detector, which helps to identify the weakness in the latter.
Let us denote the area of the domain as $A$. Consider two different regionalizations of the domain. To make the discussion clearer, we will refer to the first one as a regionalization and to the second one as a partition. The regionalization $R$ divides the domain into $n$ regions $r_i$, $i = 1, \ldots, n$. The partition $Z$ divides the domain into $m$ zones $z_j$, $j = 1, \ldots, m$. We use the term zone to denote a region in the second regionalization. Superposition of the regionalization and the partition divides the domain into $n \times m$ segments having areas $a_{i,j}$, $i = 1, \ldots, n$; $j = 1, \ldots, m$, where $a_{i,j}$ is the area of the segment of the domain that belongs simultaneously to region $i$ and to zone $j$. The entire area of a region $r_i$ is $A_i = \sum_{j=1}^{m} a_{i,j}$, the entire area of a zone $z_j$ is $A_j = \sum_{i=1}^{n} a_{i,j}$, and the area of the entire domain is $A = \sum_{j=1}^{m} \sum_{i=1}^{n} a_{i,j}$. Two different metrics are needed to evaluate the spatial association between two regionalizations: homogeneity and completeness.
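The overlay bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not part of the paper: both regionalizations are rasterized onto the same grid, so that overlap areas reduce to cell counts (the accompanying software works on vector data instead). The function name `overlap_areas` and the toy grids are hypothetical.

```python
from collections import Counter


def overlap_areas(regions, zones, cell_area=1.0):
    """Tabulate a[(i, j)], A_i, A_j and A from two label grids of equal shape.

    regions[y][x] is the region label and zones[y][x] the zone label of a cell.
    Areas are approximated by summing the area of equally sized cells.
    """
    a = Counter()
    for region_row, zone_row in zip(regions, zones):
        for r, z in zip(region_row, zone_row):
            a[(r, z)] += cell_area

    region_area = Counter()  # A_i: sum of a[(i, j)] over zones j
    zone_area = Counter()    # A_j: sum of a[(i, j)] over regions i
    for (r, z), area in a.items():
        region_area[r] += area
        zone_area[z] += area

    total_area = sum(a.values())  # A: area of the entire domain
    return dict(a), dict(region_area), dict(zone_area), total_area


# Toy 3 x 4 domain with three regions and two zones (made-up labels).
regions = [
    ["r1", "r1", "r2", "r2"],
    ["r1", "r1", "r2", "r2"],
    ["r3", "r3", "r3", "r3"],
]
zones = [
    ["z1", "z1", "z1", "z2"],
    ["z1", "z1", "z1", "z2"],
    ["z2", "z2", "z2", "z2"],
]
a, A_i, A_j, A = overlap_areas(regions, zones)
```

Segments with zero overlap are simply absent from `a`, which is convenient later because empty segments contribute nothing to the entropy sums.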
Consider the following expression:

$$\underbrace{\sum_{j=1}^{m} \frac{A_j}{A}\,\frac{S_j^R}{S^R}}_{\text{inhomogeneity of partition with respect to regionalization}} \qquad (1)$$

The numerator of the fraction $S_j^R / S^R$ in Equation (1) measures the inhomogeneity of a given zone in terms of regions. This is measured in terms of the Shannon entropy (Shannon 1948):

$$S_j^R = -\sum_{i=1}^{n} \frac{a_{i,j}}{A_j} \log \frac{a_{i,j}}{A_j} \qquad (2)$$

If $S_j^R = 0$, then zone $j$ is homogeneous in terms of regions (it is a part of a single region). As the value of $S_j^R$ increases, zone $j$ becomes increasingly inhomogeneous in terms of regions (it overlays an increasing number of regions). Equation (2) quantifies the level of this inhomogeneity, or the variance of regions in zone $j$. However, we are not so much interested in the absolute value of the zone inhomogeneity as in its value relative to the inhomogeneity of the entire domain with respect to regions (the denominator of the fraction in Equation (1)). This is because for the partition to be associated with the regionalization, the regions should be colocated with the zones, so the regions within zones should have less variance than within the entire domain. The dispersion of regions in the entire domain is also given by the Shannon entropy:

$$S^R = -\sum_{i=1}^{n} \frac{A_i}{A} \log \frac{A_i}{A} \qquad (3)$$

The overall inhomogeneity of the partition with respect to the regionalization is $\sum_{j=1}^{m} (A_j/A)(S_j^R/S^R)$, an area-weighted average of the $S_j^R/S^R$ ratios calculated over all zones (see Equation (1)). The value of the overall inhomogeneity changes from 0 in the perfectly homogeneous case (each zone is within a single region) to 1 when each zone has the same composition of regions as the entire domain. The homogeneity metric is supposed to be an increasing function of the average homogeneity of zones with respect to regions; therefore, it is defined as

$$h = 1 - \sum_{j=1}^{m} \frac{A_j}{A}\,\frac{S_j^R}{S^R} \qquad (4)$$

and it has a range between 0 and 1.
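As a hedged illustration of the entropy-based homogeneity described above, consider a made-up domain (not taken from the paper) with two regions and two zones, natural logarithms, and overlap areas $a_{1,1} = 2$, $a_{2,1} = 2$, $a_{1,2} = 0$, $a_{2,2} = 4$. To avoid the index clash between region and zone areas, the areas are written out in words.

```latex
\begin{align*}
\text{zone areas: } & 2 + 2 = 4 \text{ and } 0 + 4 = 4, \qquad
\text{region areas: } 2 + 0 = 2 \text{ and } 2 + 4 = 6, \qquad A = 8 \\
S_1^R &= -\left(\tfrac{2}{4}\ln\tfrac{2}{4} + \tfrac{2}{4}\ln\tfrac{2}{4}\right) = \ln 2 \approx 0.693 \\
S_2^R &= -\left(\tfrac{4}{4}\ln\tfrac{4}{4}\right) = 0
  \quad \text{(the empty segment contributes nothing)} \\
S^R &= -\left(\tfrac{2}{8}\ln\tfrac{2}{8} + \tfrac{6}{8}\ln\tfrac{6}{8}\right) \approx 0.562 \\
h &= 1 - \left(\tfrac{4}{8}\cdot\tfrac{0.693}{0.562} + \tfrac{4}{8}\cdot\tfrac{0}{0.562}\right)
  \approx 1 - 0.616 = 0.384
\end{align*}
```

The first zone is split evenly between both regions, so its local ratio $S_1^R / S^R \approx 1.23$ exceeds 1, while the second zone lies entirely within one region and has a ratio of 0. The area-weighted average of the two ratios stays within the range from 0 to 1.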
Note that the homogeneity metric is not sufficient to assess the degree of association between the regionalization and the partition. A high value of $h$ assures that zones are homogeneous with respect to regions, but it does not assure that regions are homogeneous with respect to zones. For example, when a single region extends over multiple zones, each zone will be homogeneous but there will be no association between the regionalization and the partition. Therefore, we need to calculate the homogeneity of regions with respect to zones. This metric, called completeness and denoted by $c$, is calculated analogously to homogeneity but with the roles of regions and zones reversed:

$$c = 1 - \underbrace{\sum_{i=1}^{n} \frac{A_i}{A}\,\frac{S_i^Z}{S^Z}}_{\text{inhomogeneity of regionalization with respect to partition}} \qquad (8)$$

where $S_i^Z = -\sum_{j=1}^{m} (a_{i,j}/A_i) \log (a_{i,j}/A_i)$ is the Shannon entropy of zones within region $i$ and $S^Z = -\sum_{j=1}^{m} (A_j/A) \log (A_j/A)$ is the Shannon entropy of zones in the entire domain. Completeness, like homogeneity, has a range between 0 and 1 and is an increasing function of the average homogeneity of regions with respect to zones. The single, overall measure of spatial association between the regionalization and the partition is called the V-measure (Rosenberg and Hirschberg 2007) and is given by the (optionally weighted) harmonic mean of homogeneity and completeness:

$$V_\beta = \frac{(1 + \beta)\, h\, c}{\beta h + c} \qquad (9)$$

where $\beta$ is a weight given to $c$ relative to $h$; $V_\beta \to h$ if $\beta \to 0$, and for $\beta = 1$, $V_1$ is the harmonic mean of $h$ and $c$. The V-measure has a range between 0 (no spatial association) and 1 (a perfect association). Note that if we change the roles of the regionalization and the partition, then the regionalization provides the zones and the partition provides the regions; we do not need to recalculate the measures $h$ and $c$, as $h_{new} = c$, $c_{new} = h$, and the value of $V_1$ remains the same:

$$V_1 = \frac{2 h c}{h + c} = \frac{2\, h_{new}\, c_{new}}{h_{new} + c_{new}}$$

Figure 1 illustrates the procedure of calculating $h$, $c$, and $V_1$ using a simple example. These three quantities are the global measures of association between the two regionalizations. $V_1$ is an overall global measure to be used when a single-number assessment of association is required. As a pair, the values of $h$ and $c$ provide more information than $V_1$ alone. The ratios $S_j^R / S^R$, $j = 1, \ldots, m$, and $S_i^Z / S^Z$, $i = 1, \ldots, n$, are the local measures of association between the two regionalizations. They could be used to map the degree of local correspondence between the two regionalizations.
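The global and local indices described above can be sketched directly from a matrix of overlap areas. The following Python is a minimal illustration, not the code of the accompanying package; the function names, the convention of returning a ratio of 0 when the domain-wide entropy is 0, and the toy matrix are assumptions made here.

```python
import math


def entropy(areas):
    """Shannon entropy (natural log) of the area shares in `areas`."""
    total = sum(areas)
    return -sum((x / total) * math.log(x / total) for x in areas if x > 0)


def safe_ratio(local_entropy, global_entropy):
    """Local entropy relative to the domain entropy (0 if the latter is 0)."""
    return 0.0 if global_entropy == 0 else local_entropy / global_entropy


def vmeasure(a, beta=1.0):
    """Return (h, c, V_beta, zone_ratios, region_ratios).

    a[i][j] is the overlap area of region i and zone j.
    """
    n, m = len(a), len(a[0])
    region_area = [sum(a[i]) for i in range(n)]
    zone_area = [sum(a[i][j] for i in range(n)) for j in range(m)]
    total = sum(region_area)

    s_r = entropy(region_area)  # dispersion of regions in the whole domain
    s_z = entropy(zone_area)    # dispersion of zones in the whole domain

    # Local indicators: S_j^R / S^R per zone and S_i^Z / S^Z per region.
    zone_ratios = [safe_ratio(entropy([a[i][j] for i in range(n)]), s_r)
                   for j in range(m)]
    region_ratios = [safe_ratio(entropy(a[i]), s_z) for i in range(n)]

    # Global indices: one minus the area-weighted average of local ratios.
    h = 1 - sum(zone_area[j] / total * zone_ratios[j] for j in range(m))
    c = 1 - sum(region_area[i] / total * region_ratios[i] for i in range(n))
    denom = beta * h + c
    v = 0.0 if denom == 0 else (1 + beta) * h * c / denom
    return h, c, v, zone_ratios, region_ratios


# Toy overlap matrix: three regions (rows) by two zones (columns).
a = [[4, 0],
     [2, 2],
     [0, 4]]
h, c, v1, zone_ratios, region_ratios = vmeasure(a)

# Swapping the roles of regionalization and partition transposes the matrix.
a_t = [list(col) for col in zip(*a)]
h_new, c_new, v1_new, _, _ = vmeasure(a_t)
```

In this toy case the middle region is split evenly between the two zones, so it is the only region with a non-zero local ratio, and swapping the roles exchanges `h` and `c` while leaving `v1` unchanged.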

Software
We wrote an open-source R package (Nowosad and Stepinski 2018b) implementing the V-measure. The package, called SABRE (Spatial Association Between REgionalizations), is designed to work with vector (shapefile) input data. Given two vector maps, SABRE calculates values of $V_\beta$, $h$, and $c$ to be used as the global assessment of association between the two maps. It also returns maps of local associations utilizing the values $S_j^R / S^R$, $j = 1, \ldots, m$, and $S_i^Z / S^Z$, $i = 1, \ldots, n$. SABRE also implements the Mapcurves method (Hargrove et al. 2006) for vector maps.

Applications
In this section, we present examples of how the V-measure may be used in each of the three contexts identified in the Introduction: to compare two regionalizations, to calculate a degree of association between a response map and maps of factor variables, and to decide on the number of regions in a regionalization obtained by means of a clustering algorithm.

Comparing ecoregionalizations of the United States
Ecoregions are the result of a division of land into areal units, each having a homogeneous ecosystem that contrasts with its surroundings. The US Environmental Protection Agency (EPA) delineated ecoregions in the conterminous US at four hierarchical levels of precision (Omernik 1987, Omernik and Griffith 2014). We use the EPA Level III map as the first regionalization; it delineates the US into $n = 85$ regions (see Figure 2(a)). For comparison, we use the Terrestrial Ecoregions of the World (TEW) map (Olson et al. 2001) restricted to the boundaries of the conterminous USA as the second regionalization; it delineates the USA into $m = 72$ zones (see Figure 2(b)). Both maps are supposed to reflect the same realm but were constructed using different methodologies. The EPA map was constructed by analyzing the patterns and composition of biotic and abiotic phenomena that affect or reflect differences in ecosystems. The TEW map is based on the synthesis of previous biogeographical studies. Visual comparison of Figure 2(a,b) reveals the overall similarity between the two maps, but also local differences between them. The V-measure method can quantify the similarity and depict the locations of greatest differences between the two maps.
Using SABRE, we calculated $h = 0.79$, $c = 0.87$, and $V_1 = 0.83$ as global measures of association between the EPA and TEW maps. Recall from Section 2 that $h$ measures the average homogeneity of TEW zones with respect to EPA regions (Equation 4 and Figure 2(d)) and $c$ measures the homogeneity of EPA regions with respect to TEW zones (Equation 8 and Figure 2(c)). Visually, the map in Figure 2(c) appears to be more homogeneous than the map in Figure 2(d), in agreement with the quantitative assessment $c > h$. This is because there are more EPA ecoregions than TEW ecoregions, so it is more likely that TEW ecoregions cross through multiple EPA ecoregions than vice versa. However, overall, the two maps are highly associated, as indicated by the high value of $V_1$. The two inhomogeneity maps (Figure 2(c,d)) identify locations where the two maps differ. The biggest difference between the two maps is in the middle of the country, where a single TEW ecoregion (named 'Central forest-grassland transition') intersects 12 different EPA ecoregions.
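As a quick consistency check of the reported values, the harmonic mean of the rounded $h$ and $c$ can be worked out by hand; only the three numbers quoted above are used.

```latex
\[
V_1 = \frac{2\,h\,c}{h + c}
    = \frac{2 \times 0.79 \times 0.87}{0.79 + 0.87}
    = \frac{1.3746}{1.66}
    \approx 0.83
\]
```

The result agrees with the reported $V_1$ to two decimal places, which is as much as the rounded inputs allow.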

Associations between a map of ecoregions and its factors
As we mentioned in the previous subsection, the EPA regionalization of the conterminous USA is based on the analysis of patterns and composition of biotic and abiotic factors including geology, landforms, soils, vegetation, climate, land cover, wildlife, and hydrology. Here, we demonstrate the utility of the V-measure to assess the degree of correspondence between the EPA Level III map of ecoregions and maps of four such factors: land cover, soils, landforms, and climate. For clarity, we restrict this demonstration to the territory of a single state, New Mexico.
The factors are all in the form of thematic (categorical) maps. We use the European Space Agency's (ESA) Climate Change Initiative (CCI) 300 m resolution global land cover map (CCI-LC 2015), which classifies land cover worldwide into 22 classes. Soil data are provided by the 250 m resolution global SoilGrids (Hengl et al. 2017) reclassified to 12 orders. Landforms data are a 250 m resolution classification of landforms into 17 classes (Karagulle et al. 2017). Finally, the climate data are provided by clustering a set of bioclimatic variables on a worldwide climatic grid into 37 classes (Metzger et al. 2012).
Figure 3 shows a map of EPA Level III ecoregions and the maps of the four factors within the state of New Mexico. We use SABRE to calculate values of $h$, $c$, and $V_1$ to assess the spatial association between the EPA ecoregionalization (eight ecoregions within the state of New Mexico) and a thematic map of each factor. The 'Thematic maps' section of Table 1 shows the results. The first column (denoted by $m$) in this section lists the number of categories in a given map present within the state of New Mexico; this is also the number of zones in the factor map. The values of $h$ measure the average homogeneity of factors' zones with respect to ecoregions and the values of $c$ measure the homogeneity of ecoregions with respect to factors' zones. Note that values of $c$ tend to be higher than the values of $h$ (except for landforms), indicating that ecoregions are more homogeneous with respect to land cover, soils, and, in particular, the climate, than categories of factors are homogeneous with respect to ecoregions (for example, multiple ecoregions are found within the climate category 'cool, semi-dry'). Overall, associations between the map of ecoregions and thematic maps of individual factors are low, as indicated by small values of $V_1$.
However, it is important to note that EPA ecoregions were not constructed on the basis of homogeneity of factor categories, but rather on the basis of homogeneity of patterns of factor categories. We used a method for pattern-based segmentation of thematic maps (Jasiewicz et al. 2018, Nowosad and Stepinski 2018a) to calculate segmentations of the area of New Mexico with respect to homogeneity of patterns of land cover categories, soil classes, and landforms categories. The climate zones have too large a spatial extent for calculation of pattern at the scale of the state of New Mexico. Figure 4 (top row) shows the segmentations. Note that there are many more segments than ecoregions. This is because segments are the results of machine delineation, which painstakingly kept track of all changes in a pattern, whereas ecoregions are the result of manual mapping, which is much more generalized. The middle row of Figure 4 shows inhomogeneity maps of ecoregions with respect to segments and the bottom row of Figure 4 shows inhomogeneity maps of segments with respect to ecoregions.
We calculated values of $h$, $c$, and $V_1$ to assess the spatial association between the EPA ecoregionalization and the three segmentations. The 'Segmentations' section of Table 1 shows the results, with $m$ indicating the number of segments. Note that the values of $h$ are high because small segments usually are contained within a single ecoregion, but the values of $c$ are lower because larger ecoregions usually contain several segments. Overall, associations between the map of ecoregions and maps delineating homogeneous patterns of factors are relatively high (as indicated by values of $V_1$), and, in any case, significantly higher than associations between the map of ecoregions and thematic maps of individual factors.

Table 1. Spatial associations between the EPA map of ecoregions in the state of New Mexico and its biotic and abiotic factors.
Column groups: Factor; Thematic maps; Segmentations.

Selecting a number of clusters in regionalizations stemming from clustering
A number of studies have proposed algorithmic regionalization by means of clustering a large number of small local areal units (elements) into a small number of larger regions (clusters of elements) based on similarity of features. This includes clustering local climates (Metzger et al. 2012, Zhang and Yan 2014, Netzel and Stepinski 2016) to obtain climatic zones, clustering local environmental conditions to obtain ecoregions (Hargrove and Hoffman 2005), and clustering local landscapes to obtain regions of a uniform pattern of land cover (Niesterowicz and Stepinski 2013, Partington and Cardille 2013, Niesterowicz et al. 2016). All these studies encounter the problem of selecting the number of clusters and thus the number of regions in the resultant map. The number of regions is estimated using methods developed for non-spatial clustering (Davies and Bouldin 1979, Rousseeuw 1987, Salvador and Chan 2004). The V-measure offers a different, distinctly spatial method for estimating the number of regions resulting from clustering.
In the proposed method, a sequence of clusterings with a consecutively increasing number of clusters is calculated. Next, for each clustering, the value of the V-measure between this clustering and the subsequent clustering is calculated. This value indicates the degree of similarity between the maps stemming from the two clusterings. For clusterings with a small number of clusters, the maps are different and $V_1$(map1, map2) is relatively small. As the number of clusters increases, the two consecutive maps become more similar and $V_1$(map1, map2) increases. The map with an optimal number of regions (clusters) is the one for which $V_1$ achieves the maximum value.
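The selection procedure described above can be sketched as follows. This is a minimal illustration under assumptions made here, not the authors' implementation: each map is given as a flat list of region labels over the same equally sized cells (so areas reduce to cell counts), and the names `v1`, `best_number_of_regions`, and the toy maps are hypothetical.

```python
import math
from collections import Counter


def v1(labels_a, labels_b):
    """V_1 between two labelings of the same equally sized cells."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)

    def entropy(counts):
        return -sum(k / n * math.log(k / n) for k in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    # Area-weighted entropies of one labeling inside the units of the other.
    a_in_b = -sum(k / n * math.log(k / count_b[b]) for (_, b), k in joint.items())
    b_in_a = -sum(k / n * math.log(k / count_a[a]) for (a, _), k in joint.items())

    h = 1.0 if h_a == 0 else 1 - a_in_b / h_a
    c = 1.0 if h_b == 0 else 1 - b_in_a / h_b
    return 0.0 if h + c == 0 else 2 * h * c / (h + c)


def best_number_of_regions(maps):
    """maps: dict mapping N to the labeling with N regions.

    Returns (N with the largest V_1 between map N and map N + 1, all scores).
    """
    scores = {n: v1(maps[n], maps[n + 1]) for n in sorted(maps) if n + 1 in maps}
    return max(scores, key=scores.get), scores


# Toy sequence of regionalizations of eight cells (made-up labels).
maps = {
    2: [0, 0, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 2, 2, 1, 1, 1, 1],
    4: [0, 0, 2, 2, 1, 1, 3, 3],
    5: [0, 4, 2, 2, 1, 1, 3, 3],
}
best_n, scores = best_number_of_regions(maps)
```

With real data, the labelings would come from running the clustering algorithm once for each candidate number of clusters, and the curve of `scores` against N would be inspected in the same way as the dependence of $V_1$ on N discussed in the text.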
To demonstrate the proposed method, we consider the problem of regionalization of land cover patterns. We start with a 210 km × 210 km study area located around Atlanta, Georgia, with land cover represented by the 30 m resolution National Land Cover Dataset 2011 (NLCD 2011). We tessellate this area into 4,900 square-sized local landscapes (each consisting of 100 × 100 NLCD cells) as shown in Figure 5(a). Next, we cluster local landscapes using a method described by Niesterowicz et al. (2016) but using a non-hierarchical partitioning around medoids (PAM) clustering algorithm (Kaufman and Rousseeuw 1987). We performed 19 clusterings, assuming the number of clusters from $N = 2$ to $N = 20$. Figure 5(b) shows the dependence of $V_1$(map1, map2) on $N$, where map1 is a regionalization with $N$ regions and map2 is a regionalization with $N + 1$ regions. The value of $V_1$ achieves its maximum at $N = 11$; thus, we selected a map with 11 regions as the optimal regionalization.
The top row of Figure 6 shows 3 out of 19 regionalizations of the Atlanta study area, using $N = 4$, $N = 6$, and $N = 11$ regions, respectively. The middle row of Figure 6 shows the corresponding subsequent regionalizations ($N = 5$, $N = 7$, and $N = 12$). The bottom row of Figure 6 shows inhomogeneity maps of the $N$ regions in the top map in terms of the $N + 1$ regions in the middle map. Changing the number of regions from $N = 4$ to $N = 5$ results in a separation of the blue region from the light-green region; thus, the light-green region is relatively inhomogeneous with respect to the other three regions in the $N = 4$ regionalization (see the rightmost column in Figure 6). Because the light-green region occupies a large portion of the study area, its inhomogeneity enters the calculation of the V-measure with a high weight, resulting in a relatively low value of $V_1$. Changing the number of regions from $N = 11$ to $N = 12$ results in dividing the blue region into two different regions (see the leftmost column in Figure 6). However, because the blue region in the $N = 11$ regionalization occupies a small portion of the study area, its inhomogeneity enters into the calculation of the V-measure with a small weight, resulting in a relatively high value of $V_1$.

Discussion and conclusions
In this paper, we have re-introduced the V-measure to the geographic community. This measure, popular in the part of the computer science community dealing with evaluation of clustering algorithms but rarely used in geographical research (for an exception see Netzel and Stepinski (2016)), is a valuable addition to GIS analyses aimed at quantifying the spatial association between two variables.
We re-derived the V-measure from the perspective of variance analysis (Section 2) instead of from the original perspective of information theory, making it more relevant to spatial analysis. In its variance analysis formulation, the V-measure (intended for quantifying the spatial association between two regionalizations) has the same form as the Geographical Detector method (Wang et al. 2010) (intended for quantifying the spatial association between a regionalization and a numerical variable). In the Geographical Detector, the numerical variable is a response variable ($G$), whereas the categorical variable is a potential determinant ($D$). The spatial association index is called the power of determinant $P(D, G)$ and is given as

$$P(D, G) = 1 - \sum_{k=1}^{K} \frac{n_k}{n}\,\frac{\sigma_k}{\sigma} \qquad (10)$$

where $K$ is the number of zones formed by the categorical variable $D$, $n_k$ is the number of measurements of $G$ within a zone $k$, $n = \sum_{k=1}^{K} n_k$ is the number of all measurements of $G$ in the entire domain, $\sigma_k$ is the variance of variable $G$ within a zone $k$, and $\sigma$ is the variance of variable $G$ in the entire domain. Note that the mathematical form of $P(D, G)$ (Equation 10) is identical to the mathematical forms of $h$ (Equation 4) and $c$ (Equation 8). The only difference is that in $h$ and $c$ the variance is calculated using the Shannon entropy because the variable is categorical.
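A minimal Python sketch of the power of determinant, following the verbal definition above (one minus the measurement-weighted average of within-zone variance relative to the variance over the entire domain), may make the parallel with $h$ and $c$ more concrete. The function name, the use of the population variance, and the sample data are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import pvariance


def power_of_determinant(values, zones):
    """P(D, G) for measurements `values` of G labeled by `zones` of D."""
    groups = defaultdict(list)
    for g, d in zip(values, zones):
        groups[d].append(g)

    n = len(values)
    total_variance = pvariance(values)  # variance of G in the entire domain
    if total_variance == 0:
        raise ValueError("G is constant, so P(D, G) is undefined")

    # Weighted average of within-zone variances, weights n_k / n.
    within = sum(len(v) / n * pvariance(v) for v in groups.values())
    return 1 - within / total_variance


# Made-up measurements: G differs strongly between the two zones of D.
values = [1, 2, 1, 2, 9, 10, 9, 10]
zones = ["a", "a", "a", "a", "b", "b", "b", "b"]
p = power_of_determinant(values, zones)
```

The structure mirrors the homogeneity computation: each zone contributes its own dispersion, weighted by its share of the domain, and the result is normalized by the dispersion of the whole domain.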
Our derivation in Section 2 reveals a problem with the Geographical Detector method. It calculates the relative homogeneity of variable $G$ with respect to $D$, but not the relative homogeneity of $D$ with respect to $G$. This is because the variable $G$ is numerical and does not naturally form zones. However, this leaves open the possibility that the assessment of the spatial association between $G$ and $D$ may be inaccurate if similar values of $G$ extend over multiple zones of $D$. In such a case, the Geographical Detector method will incorrectly indicate a high spatial association. If there is a large number of $G$ measurements, we suggest first segmenting the domain with respect to homogeneity of $G$ values and then assessing the spatial association between $D$ and the segmentation of $G$ using the V-measure.
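The situation described above can be reproduced with a small made-up dataset in which the same value of $G$ extends over two zones of $D$. The sketch below is only an illustration under assumptions made here: grouping identical $G$ values stands in for a real segmentation of $G$, measurements stand in for equally sized areal units, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import pvariance


def power_of_determinant(values, zones):
    """One minus the weighted within-zone variance over the total variance."""
    groups = defaultdict(list)
    for g, d in zip(values, zones):
        groups[d].append(g)
    n = len(values)
    within = sum(len(v) / n * pvariance(v) for v in groups.values())
    return 1 - within / pvariance(values)


def v1(labels_a, labels_b):
    """V_1 between two labelings of the same equally sized units."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)

    def entropy(counts):
        return -sum(k / n * math.log(k / n) for k in counts.values())

    a_in_b = -sum(k / n * math.log(k / count_b[b]) for (_, b), k in joint.items())
    b_in_a = -sum(k / n * math.log(k / count_a[a]) for (a, _), k in joint.items())
    h = 1 - a_in_b / entropy(count_a)
    c = 1 - b_in_a / entropy(count_b)
    return 2 * h * c / (h + c)


# G takes only two values, but D splits each of them into two zones.
g = [1, 1, 1, 1, 5, 5, 5, 5]
d = ["a", "a", "b", "b", "c", "c", "d", "d"]

p = power_of_determinant(g, d)       # every zone has zero variance of G
g_segments = [f"g={x}" for x in g]   # toy segmentation of G by value
v = v1(g_segments, d)
```

Here `p` equals 1 because $G$ is constant within every zone, whereas `v` is about 0.67: each zone lies within one segment of $G$, but each segment of $G$ is split evenly between two zones, which lowers the second of the two homogeneity terms to 0.5.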
The V-measure has several advantages over the widely used Mapcurves method. First, the V-measure has a clear interpretation in terms of information theory (as the mutual information between two variables representing the two regionalizations; see Rosenberg and Hirschberg (2007)) as well as in terms of variance analysis. Second, the V-measure provides more precise information than Mapcurves. $V_\beta = 1$ only if the two regionalizations are identical, whereas the Mapcurves score equals 1 every time one regionalization is a subdivision of the second regionalization. This is because although Mapcurves considers two goodness-of-fit scores (which are conceptually rough equivalents of our $h$ and $c$), it only uses the larger one as an overall score. Third, we provide the R package SABRE, which calculates the V-measure between two regionalizations given in the vector (shapefile) format, which makes an immediate calculation of the V-measure for real-world datasets possible.
We identified three broad contexts for application of the V-measure. In Section 3, we gave a specific example for each context. These examples are intended as a guide to using the V-measure. The context of finding an optimal number of regions for clustering-based regionalization is perhaps the most novel application of the method, as it uses a series of increasingly specific regionalizations to determine an optimal number of regions.
The reason why the V-measure works for determining an optimal number of regions is as follows. If the number of regions is too small, then the regions are strongly inhomogeneous and an additional region is likely to significantly change the configuration of the regionalization to improve the homogeneity of the regions. This results in a small value of $V_1$ (left part of Figure 5(b)). If the number of regions is too large, then the regions are almost homogeneous and an additional region is artificially imposed, resulting in decreased spatial association and a relatively small value of $V_1$ (right part of Figure 5(b)). If the number of regions is close to being optimal, an additional region causes only a small adjustment to the configuration of the regionalization, resulting in a high value of $V_1$.
Overall, we have contributed to a better understanding of the V-measure in the context of spatial analysis, including its connection to the Geographical Detector. We have also demonstrated its utility in a number of different spatial analyses and provided its software implementation. One direction for future development is to combine an algorithm for regionalization of a numerical variable with the V-measure algorithm to address the shortcoming of the Geographical Detector method.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the University of Cincinnati Space Exploration Institute.

Notes on contributors
Jakub Nowosad is a postdoctoral fellow at the Space Informatics Lab. His main research is focused on developing and applying data mining and pattern-based spatial methods to large datasets in order to broaden our understanding of processes and patterns in the environment. During his PhD he worked on predicting pollen concentration of Corylus, Alnus, and Betula using machine learning and GIS. His research interests also include spatial analysis, statistics, and programming. Jakub is an avid R user and an active member of the R community.
Tomasz Stepinski is the Thomas Jefferson Chair Professor of Space Exploration at the University of Cincinnati and the Director of the Space Informatics Lab. His recent area of research is the development of automated tools for intelligent and intuitive exploration of very large Earth and planetary datasets. He led the team who developed the GeoPAT2a toolbox for pattern-based spatial analysis. He is also interested in computational approaches to geodemographics, racial segregation and diversity.

Figure 1. An example illustrating an assessment of the association between two regionalizations. The red regionalization segments a rectangular domain into four regions. The blue regionalization (partition) segments the same domain into three regions (zones). The variance of red regions in the three zones and the variance of blue zones in four regions are shown. Values of $a_{i,j}$ (in arbitrary units) are given in the part of the table enclosed by the thick-edged rectangle.

Figure 2. Spatial association between two ecoregionalizations of the conterminous U.S. The top row shows the EPA Level III map of ecoregions (a) and the TEW map of ecoregions (b). In both maps, different ecoregions are shown by random colors. The bottom row shows a map of inhomogeneity of EPA ecoregions in terms of TEW ecoregions (c) and a map of inhomogeneity of TEW ecoregions in terms of EPA ecoregions (d). Inhomogeneity (variance) is measured by normalized Shannon entropy.

Figure 3. EPA Level III ecoregions in the state of New Mexico and the maps of four factors influencing the delineation of these ecoregions. Legends for the maps of the factors show only dominant categories.

Figure 4. (Top row) Segmentations of influencing factors for delineation of ecoregions with respect to homogeneity of patterns of their categories. Segments are indicated by random colors. (Middle row) Inhomogeneity maps of ecoregions with respect to segments. (Bottom row) Inhomogeneity maps of segments with respect to ecoregions.

Figure 5. (a) NLCD 2011 over the study area located around Atlanta, Georgia, tessellated into 4,900 local landscapes, each having a size of 3 km × 3 km. Different colors indicate different land cover categories as described by the legend. (Right) Results of V-measure analysis using consecutive regionalizations with an increasing number of regions: (b) $V_1$, (c) homogeneity, (d) completeness.

Figure 6. Examples of regionalizations of the Atlanta study area. Each column consists of a regionalization with a given number of regions (top) and a regionalization with one additional region (middle). The inhomogeneity map of regions in the top map with respect to regions in the middle map is given at the bottom of the column. Colors in the top and middle rows indicate different regions.