Geospatial data analysis : from querying to visualized exploration
Date of Issue: 2018-02-09
Interdisciplinary Graduate School (IGS)
With the proliferation of social sharing platforms (e.g., Foursquare and Yelp) and online social media (e.g., Twitter, Facebook and Instagram), large collections of geospatial data, such as geo-tagged photos and geo-textual posts, are becoming available. The availability of a substantial amount of such geospatial objects gives prominence to the spatial keyword query, which finds the geo-textual objects that best match the query arguments by exploiting both locations and textual descriptions. As an important type of spatial keyword query, the m-closest keywords (mCK) query is useful in many applications, such as detecting the locations of web resources. However, existing work does not study the intractability of this problem and provides only exact algorithms, which are computationally expensive. Since the volume of geospatial data keeps growing rapidly, an emerging challenge is how to explore and understand the data before analyzing it. Presenting geospatial data through a visualization system is undoubtedly the most efficient way for end users to do so; however, a remaining challenge is how to read such oversized data effectively. We address this challenge from two perspectives. On the one hand, users may become exhausted and distracted if too many results are displayed at the same time. Following the rule of “Less is more”, we can show users a small set of representative objects instead of the whole collection. The challenge is therefore how to build a map rendering system that efficiently selects a small set of representative objects from the user’s current region of interest. On the other hand, a characteristic of geospatial data is that objects of similar types or functions tend to appear together. This is common in real life; for example, more shopping malls cluster in the downtown area of a city than in its residential zones. To this end, we can explore geospatial data by dividing the whole space into several functional regions according to the utilities of the geospatial objects.
First, we study the m-closest keywords (mCK) query, which finds a group of objects that together cover all query keywords and have the smallest diameter, defined as the largest distance between any pair of objects in the group. We prove that the problem of answering mCK queries is NP-hard. We first devise a greedy algorithm with an approximation ratio of 2. Then, we observe that an mCK query can be answered approximately by finding the circle with the smallest diameter that encloses a group of objects together covering all query keywords. We prove that the group enclosed in this circle answers the mCK query with an approximation ratio of 2/√3. Based on this, we develop an algorithm that finds such a circle exactly, but it has a high time complexity. To improve efficiency, we propose two further algorithms that find such a circle approximately, with a ratio of (2/√3 + ϵ). Finally, we propose an exact algorithm that uses the group found by the (2/√3 + ϵ)-approximation algorithm to obtain the optimal group. We conduct extensive experiments on real-life datasets. The experimental results offer insights into both the efficiency and the accuracy of the proposed approximation algorithms, and they also demonstrate that our exact algorithm outperforms the best known algorithm by an order of magnitude.
Next, we study how to develop an interactive map exploration system for visualization. We propose that such a system should support the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency. The first two are fundamental requirements of a map exploration system: it should efficiently select a small set of representative objects from the user’s current region of interest, and no two selected objects should be so close to each other that users cannot distinguish them in the limited space of a screen. We formalize this as the Spatial Object Selection (SOS) problem, prove that it is NP-hard, and develop a novel approximation algorithm with performance guarantees. To further support interactive exploration of geospatial data on maps, we propose the Interactive SOS (ISOS) problem, in which we enrich the SOS problem with the zooming consistency and panning consistency constraints. The objective of ISOS is to provide a seamless experience for end users as they interactively explore the data by navigating the map. We extend our algorithm for the SOS problem to solve the ISOS problem, and propose a new pre-fetching strategy that significantly enhances efficiency. Finally, we conduct extensive experiments to show the efficiency and scalability of our approach.
Last but not least, we study how to partition geospatial objects into functional regions according to their utilities and spatial distributions. The aim is to aggregate similar and adjacent objects into enclosed regions that indicate certain functions. Because the attribute distributions of the geospatial objects are aggregated, some information is lost during this process. We define an information loss measure to quantify how much information is lost when two sets of objects are merged. To reduce the number of possible regions, we exploit the existing road network and restrict region boundaries to roads, since roads form the natural partition of a city: people live in, and POIs (points of interest) fall in, these road-segmented regions. We formulate this problem as the Functional Region Segmentation (FRS) problem and prove that it is NP-hard. We develop a bottom-up greedy algorithm to solve the FRS problem, which terminates in a finite number of steps. The results of empirical studies show that our proposed algorithm solves the FRS problem efficiently and effectively.
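As a rough illustration of how a greedy strategy can reach an approximation ratio of 2 for the mCK query, the following Python sketch anchors on each object that carries the rarest query keyword and, for every remaining keyword, picks the object nearest to that anchor. This is one common way to obtain such a bound and is not claimed to be the algorithm developed in the thesis. The function names `greedy_mck` and `diameter`, the `(point, keywords)` object representation, and the use of Euclidean distance are all assumptions made for illustration. The bound follows because an optimal group must contain some object with the rarest keyword; when that object is the anchor, every picked object lies within the optimal diameter of it, so by the triangle inequality any two picked objects are at most twice the optimal diameter apart.

```python
from itertools import combinations
from math import dist, inf


def diameter(group):
    """Largest pairwise Euclidean distance in a group of (point, keywords)."""
    return max((dist(a[0], b[0]) for a, b in combinations(group, 2)), default=0.0)


def greedy_mck(objects, query):
    """Hypothetical greedy sketch for an mCK query.

    objects: list of (point, keywords) pairs, e.g. ((x, y), {"cafe"}).
    query:   iterable of keywords that the returned group must cover.
    Returns (group, diameter), or None if some keyword cannot be covered.
    """
    query = set(query)
    # Index the objects that carry each query keyword.
    by_kw = {k: [o for o in objects if k in o[1]] for k in query}
    if any(not candidates for candidates in by_kw.values()):
        return None

    # Anchor on the rarest keyword to keep the number of candidate groups small.
    rarest = min(sorted(query), key=lambda k: len(by_kw[k]))

    best_group, best_diam = None, inf
    for anchor in by_kw[rarest]:
        group = [anchor]
        covered = set(anchor[1]) & query
        for k in sorted(query):
            if k in covered:
                continue
            # Nearest object to the anchor that carries keyword k.
            nearest = min(by_kw[k], key=lambda o: dist(o[0], anchor[0]))
            group.append(nearest)
            covered |= set(nearest[1]) & query
        d = diameter(group)
        if d < best_diam:
            best_group, best_diam = group, d
    return best_group, best_diam


# Toy data, invented for this example only.
objects = [
    ((0, 0), {"cafe"}),
    ((1, 0), {"atm"}),
    ((0, 1), {"park"}),
    ((10, 10), {"cafe", "atm"}),
    ((50, 50), {"park"}),
    ((11, 10), {"atm"}),
]
group, diam = greedy_mck(objects, {"cafe", "atm", "park"})
```

On this toy input the sketch returns the three objects near the origin. The linear scans for the nearest object are kept for brevity; a practical implementation would typically replace them with a spatial index.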