
Excerpt

Chapter 1 Introduction
　　As application areas such as earth science, public health, public transportation, environmental management, social media services, location services, and multimedia started to produce large and rich datasets, it quickly became clear that there was potentially valuable knowledge embedded in this data in the form of various spatial features. Spatial co-location pattern mining developed to identify these interesting but hidden relationships between spatial features (Shekhar & Huang, 2001; Huang et al., 2004; Shekhar et al., 2015).
　　Spatial co-location patterns (SCPs) represent subsets of spatial features (spatial objects, events, or attributes), and SCP mining is essential to reveal the frequent co-occurrence patterns among spatial features in various applications. For example, these techniques can show that West Nile virus usually appears in areas where mosquitoes are abundant and poultry are kept, or that 80% of sub-humid evergreen broadleaved forests grow with orchid plants (Wang et al., 2009b).
　　In this chapter, we first briefly look at the emergence, evolution, and development of SCP mining; summarize the current major challenges and issues troubling SCP mining techniques; and indicate how preference-based SCP mining may be the future. Finally, an overview of the related content of the book is given and the topics that will be covered in each chapter are briefly introduced.
　　1.1 The Background and Applications
　　The emergence of SCP mining techniques has been driven by three forces:
　　First, with the development of general data mining techniques, the objects being mined extended from the initial relational and transactional data to spatial data. Spatial data has become important and widely used, and it contains richer and more complex information than traditional relation-based or transaction-based data.
　　Although general data mining originated in relational and transactional databases, the rich knowledge that can be discovered in spatial databases has drawn attention to research on SCP mining.
　　Second, areas such as mobile computing, scientific simulations, business science, environmental observation, climate measurements, geographic search logs, and so on are continually producing enormous quantities of rich spatial data. Manual analysis of these large spatial datasets is impractical, and there is a consequent need for efficient computational analysis techniques for the automatic extraction of the potentially valuable information. The emergence of data mining and knowledge discovery would have been very constrained without the development of geo-spatial data analysis.
　　Third, unlike traditional data, spatial data is often inherently related: the closer two spatial objects are located, the more likely they are to have similar properties. For example, the closer the geographical locations of cities are, the more similar they are in natural resources, climate, temperature, and economic status. However, because spatial data is combined with other characteristics in massive, multi-dimensional databases, possibly with uncertainty, it is necessary to use specific and targeted techniques. At its simplest, spatial co-location pattern discovery is directed toward processing data with spatial contexts to find subsets of spatial features that are frequently located together.
　　Spatial co-location pattern (SCP) mining, as one important area in spatial data mining, has been extensively researched for the past twenty years (Shekhar & Huang, 2001; Huang et al., 2004; Huang et al., 2008; Yoo et al., 2004; Yoo & Shekhar, 2006; Celik et al., 2007; Lin & Lim, 2008; Xiao et al., 2008; Wang et al., 2008; Wang et al., 2009a, b; Yoo & Bow, 2011a, b, 2012, 2019; Wang et al., 2013a, b; Barua & Sander, 2014; Qian et al., 2014; Andrzejewski & Boinski, 2015; Li et al., 2016; Zhao et al., 2016; Ouyang et al., 2017; Wang et al., 2018a, b, c; Yao et al., 2018; Bao & Wang, 2019; Ge et al., 2021; Yoo et al., 2014, 2020; Liu et al., 2020; Yao et al., 2021). An early paper described a co-location pattern as “a set of spatial features (spatial objects, events, or attributes) which are frequently observed together in a spatial proximity.” Its authors also defined a distance-based interest measure called the participation index to assess the prevalence of a co-location, along with some of the basic nomenclature that has been used ever since.
　　Let $F$ be the set of spatial features and $S$ be the set of spatial instances. For a feature $f \in F$, the set of all instances of $f$ is denoted as $N(f)$. Let $R$ be a neighbor relationship over pairwise instances. Given two instances $i \in S$, $i' \in S$, we say they have a neighbor relationship if the distance between them is no larger than a user-specified distance threshold $d$, i.e., $R(i, i') \Leftrightarrow \mathrm{distance}(i, i') \le d$. A co-location $c$ is a subset of the feature set $F$, $c \subseteq F$. The number of features in c is call
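The neighbor relationship defined above can be illustrated with a minimal Python sketch. It is not taken from the book: the toy instances, the function names `neighbor_pairs` and `instances_of`, and the choice of Euclidean distance are all assumptions made for illustration (the text only requires some distance measure and a user-specified threshold d).

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: (instance id, feature, x, y).
instances = [
    ("A1", "A", 0.0, 0.0),
    ("B1", "B", 1.0, 0.0),
    ("C1", "C", 5.0, 5.0),
    ("A2", "A", 5.0, 6.0),
]


def instances_of(instances, feature):
    """N(f): ids of all instances that belong to the given feature."""
    return {iid for iid, f, _, _ in instances if f == feature}


def neighbor_pairs(instances, d):
    """R: pairs of instance ids whose distance is no larger than d.

    Euclidean distance is assumed here; any other metric could be
    substituted without changing the rest of the sketch.
    """
    pairs = set()
    for (id1, _, x1, y1), (id2, _, x2, y2) in combinations(instances, 2):
        if dist((x1, y1), (x2, y2)) <= d:
            pairs.add((id1, id2))
    return pairs


if __name__ == "__main__":
    print(sorted(instances_of(instances, "A")))
    print(sorted(neighbor_pairs(instances, d=1.5)))
```

This brute-force version compares every pair of instances, which is only reasonable for small examples; a spatial index or grid partition would normally replace the double loop on real datasets.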


Contents
1 Introduction
1.1 The Background and Applications
1.2 The Evolution and Development
1.3 The Challenges and Issues
1.4 Content and Organization of the Book
2 Maximal Prevalent Co-location Patterns
2.1 Introduction
2.2 Why the MCHT Method Is Proposed for Mining MPCPs
2.3 Formal Problem Statement and Appropriate Mining Framework
2.3.1 Co-Location Patterns
2.3.2 Related Work
2.3.3 Contributions and Novelties
2.4 The Novel Mining Solution
2.4.1 The Overall Mining Framework
2.4.2 Bit-String-Based Maximal Clique Enumeration
2.4.3 Constructing the Participating Instance Hash Table
2.4.4 Calculating Participation Indexes and Filtering MPCPs
2.4.5 The Analysis of Time and Space Complexities
2.5 Experiments
2.5.1 Data Sets
2.5.2 Experimental Objectives
2.5.3 Experimental Results and Analysis
2.6 Chapter Summary
3 Maximal Sub-prevalent Co-location Patterns
3.1 Introduction
3.2 Basic Concepts and Properties
3.3 A Prefix-Tree-Based Algorithm (PTBA)
3.3.1 Basic Idea
3.3.2 Algorithm
3.3.3 Analysis and Pruning
3.4 A Partition-Based Algorithm (PBA)
3.4.1 Basic Idea
3.4.2 Algorithm
3.4.3 Analysis of Computational Complexity
3.5 Comparison of PBA and PTBA
3.6 Experimental Evaluation
3.6.1 Synthetic Data Generation
3.6.2 Comparison of Computational Complexity Factors
3.6.3 Comparison of Expected Costs Involved in Identifying Candidates
3.6.4 Comparison of Candidate Pruning Ratio
3.6.5 Effects of the Parameter Clumpy
3.6.6 Scalability Tests
3.6.7 Evaluation with Real Data Sets
3.7 Related Work
3.8 Chapter Summary
4 SPI-Closed Prevalent Co-location Patterns
4.1 Introduction
4.2 Why SPI-Closed Prevalent Co-locations Improve Mining
4.3 The Concept of SPI-Closed and Its Properties
4.3.1 Classic Co-location Pattern Mining
4.3.2 The Concept of SPI-Closed
4.3.3 The Properties of SPI-Closed
4.4 SPI-Closed Miner
4.4.1 Preprocessing and Candidate Generation
4.4.2 Computing Co-location Instances and Their PI Values
4.4.3 The SPI-Closed Miner
4.5 Qualitative Analysis of the SPI-Closed Miner
4.5.1 Discovering the Correct SPI-Closed Co-location Set Ω
4.5.2 The Running Time of SPI-Closed Miner
4.6 Experimental Evaluation
4.6.1 Experiments on Real-life Data Sets
4.6.2 Experiments with Synthetic Data Sets
4.7 Related Work
4.8 Chapter Summary
5 Top-k Probabilistically Prevalent Co-location Patterns
5.1 Introduction
5.2 Why Mining Top-k Probabilistically Prevalent Co-location Patterns (Top-k PPCPs)
5.3 Definitions
5.3.1 Spatially Uncertain Data
5.3.2 Prevalent Co-locations
5.3.3 Prevalence Probability
5.3.4 Min_PI-Prevalence Probabilities
5.3.5 Top-k PPCPs
5.4 A Framework of Mining Top-k PPCPs
5.4.1 Basic Algorithm
5.4.2 Analysis and Pruning of Algorithm 5.1
5.5 Improved Computation of P(c, min_PI)
5.5.1 0-1-Optimization
5.5.2 The Matrix Method
5.5.3 Polynomial Matrices
5.6 Approximate Computation of P(c, min_PI)
5.7 Experimental Evaluations
5.7.1 Evaluation on Synthetic Data Sets
5.7.2 Evaluation on Real Data Sets
5.8 Chapter Summary
6 Non-redundant Prevalent Co-location Patterns
6.1 Introduction
6.2 Why We Need to Explore Non-redundant Prevalent Co-locations
6.3 Problem Definition
6.3.1 Semantic Distance
6.3.2 δ-Covered
6.3.3 The Problem Definition and Analysis
6.4 The RRclosed Method
6.5 The RRnull Method
6.5.1 The Method
6.5.2 The Algorithm
6.5.3 The Correctness Analysis
6.5.4 The Time Complexity Analysis
6.5.5 Comparative Analysis
6.6 Experimental Results
6.6.1 On the Three Real Data Sets
6.6.2 On the Synthetic Data Sets
6.7 Related Work
6.8 Chapter Summary
7 Dominant Spatial Co-location Patterns
7.1 Introduction
7.2 Why Dominant SCPs Are Useful to Mine
7.3 Related Work
7.4 Preliminaries and Problem Formulation
7.4.1 Preliminaries
7.4.2 Definitions
7.4.3 Formal Problem Formulation
7.4.4 Discussion of Progress
7.5 Proposed Algorithm for Mining Dominant SCPs
7.5.1 Basic Algorithm for Mining Dominant SCPs
7.5.2 Pruning Strategies
7.5.3 An Improved Algorithm
7.5.4 Comparison of Complexity
7.6 Experimental Study
7.6.1 Data Sets
7.6.2 Efficiency
7.6.3 Effectiveness
7.6.4 Real Applications
7.7 Chapter Summary
8 High Utility Co-location Patterns
8.1 Introduction
8.2 Why We Need High Utility Co-location Pattern Mining
8.3 Related W
