Chapter 1 Introduction
As application areas such as earth science, public health, public transportation, environmental management, social media services, location services, and multimedia started to produce large and rich datasets, it quickly became clear that potentially valuable knowledge was embedded in this data in the form of various spatial features. Spatial co-location pattern mining was developed to identify these interesting but hidden relationships between spatial features (Shekhar & Huang, 2001; Huang et al., 2004; Shekhar et al., 2015).
Spatial co-location patterns (SCPs) represent subsets of spatial features (spatial objects, events, or attributes), and SCP mining is essential for revealing frequent co-occurrence patterns among spatial features in various applications. For example, these techniques can show that West Nile virus usually appears in areas where mosquitoes are abundant and poultry are kept, or help botanists discover that 80% of sub-humid evergreen broadleaved forests grow with orchid plants (Wang et al., 2009b).
In this chapter, we first briefly look at the emergence, evolution, and development of SCP mining; summarize the current major challenges and issues facing SCP mining techniques; and indicate how preference-based SCP mining may be the future. Finally, an overview of the related content of the book is given, and the topics covered in each chapter are briefly introduced.
1.1 The Background and Applications
The emergence of SCP mining techniques has been driven by three forces:
First, with the development of general data mining techniques, the objects being mined extended from the initial relational and transactional data to spatial data. Spatial data has become important and widely used, and it contains richer and more complex information than traditional relational or transactional data.
Although general data mining originated in relational and transactional databases, the rich knowledge that can be discovered from spatial databases has drawn attention to research on SCP mining.
Second, areas such as mobile computing, scientific simulations, business science, environmental observation, climate measurements, geographic search logs, and so on are continually producing enormous quantities of rich spatial data. Manual analysis of these large spatial datasets is impractical, and there is a consequent need for efficient computational analysis techniques for the automatic extraction of the potentially valuable information. The emergence of data mining and knowledge discovery would have been very constrained without the development of geo-spatial data analysis.
Third, unlike traditional data, spatial data is often inherently related: the closer two spatial objects are to each other, the more likely they are to have similar properties. For example, the closer cities are geographically, the more similar they are in natural resources, climate, temperature, and economic status. However, because spatial data is combined with other characteristics in massive, multidimensional databases, possibly with uncertainty, it is necessary to use specific and targeted techniques. At its simplest, spatial co-location pattern discovery is directed toward processing data with spatial contexts to find subsets of spatial features that are frequently located together.
Spatial co-location pattern (SCP) mining, as one important area in spatial data mining, has been extensively researched for the past twenty years (Shekhar & Huang, 2001; Huang et al., 2004; Huang et al., 2008; Yoo et al., 2004; Yoo & Shekhar, 2006; Celik et al., 2007; Lin & Lim, 2008; Xiao et al., 2008; Wang et al., 2008; Wang et al., 2009a, b; Yoo & Bow, 2011a, b, 2012, 2019; Wang et al., 2013a, b; Barua & Sander, 2014; Qian et al., 2014; Andrzejewski & Boinski, 2015; Li et al., 2016; Zhao et al., 2016; Ouyang et al., 2017; Wang et al., 2018a, b, c; Yao et al., 2018; Bao & Wang, 2019; Ge et al., 2021; Yoo et al., 2014, 2020; Liu et al., 2020; Yao et al., 2021). An early paper described a co-location pattern as “a set of spatial features (spatial objects, events, or attributes) which are frequently observed together in a spatial proximity.” Its authors also defined a distance-based interest measure, called the participation index, to assess the prevalence of a co-location, along with some of the basic nomenclature that has been used ever since.
Let $F$ be the set of spatial features and $S$ be the set of spatial instances. For a feature $f \in F$, the set of all instances of $f$ is denoted as $N(f)$. Let $R$ be a neighbor relationship over pairwise instances. Given two instances $i \in S$ and $i' \in S$, we say they have a neighbor relationship if the distance between them is no larger than a user-specified distance threshold $d$, i.e., $R(i, i') \Leftrightarrow \mathrm{distance}(i, i') \le d$. A co-location $c$ is a subset of the feature set $F$, $c \subseteq F$. The number of features in $c$ is call