Selecting the best group of objects in spatial databases and graphs
Date of Issue: 2018-01-31
Interdisciplinary Graduate School (IGS)
With the prominence of GPS-enabled mobile devices, people can easily acquire their locations and access online services anywhere. Spatial information bridges the gap between the offline world and online social networks, which has led to the emergence of location-based social networks (LBSNs) and event-based social networks (EBSNs). For instance, users can share their feelings and locations with their friends through LBSNs like Foursquare, and post a review of a restaurant through Yelp. As another example, people can find interesting local events to attend, or organize their own events, through Meetup. We refer to the geo-tagged data (e.g., posts, check-ins, reviews, POIs) and the users in the social networks as objects. The availability of large-scale social data paves the way for designing data-driven approaches to various applications and tasks. We are interested in one type of query called the group optimization query. Specifically, a group optimization query aims to select a group of objects that together satisfy a specified constraint such that a pre-defined score function of the group is optimized. In this dissertation, we define and investigate novel group optimization queries on spatial data and social networks. For group optimization queries on spatial data, we define an axis-aligned rectangular region of a given size as the spatial constraint: all the selected spatial objects should be close enough that they can be covered by a rectangle of the given size. For group optimization queries in social networks, we define a set of attributes as the attribute constraint: all the selected nodes should together cover the specified set of attributes. With the spatial constraint and the attribute constraint, we define and investigate group optimization queries on spatial data and social networks by adopting different functions to measure the quality of the selected objects.
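The two constraints described above can be illustrated with a minimal Python sketch. The function names `fits_in_rectangle` and `covers_attributes` are hypothetical and chosen only for illustration. The sketch assumes that points are `(x, y)` pairs, that the rectangle is axis-aligned with closed boundaries, and that each node carries a set of attributes.

```python
from typing import Iterable, Sequence, Set, Tuple


def fits_in_rectangle(points: Sequence[Tuple[float, float]], a: float, b: float) -> bool:
    """Spatial constraint: can all points be covered by one axis-aligned a x b rectangle?

    This holds exactly when the bounding box of the points is at most
    a wide and at most b high.
    """
    if not points:
        return True  # an empty group is trivially coverable
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs) <= a) and (max(ys) - min(ys) <= b)


def covers_attributes(node_attrs: Iterable[Set[str]], required: Set[str]) -> bool:
    """Attribute constraint: do the selected nodes together cover every required attribute?"""
    covered: Set[str] = set()
    for attrs in node_attrs:
        covered |= attrs
    return required <= covered
```

For example, the points `(0, 0)`, `(1, 2)` and `(2, 1)` fit in a 2 × 2 rectangle but not in a 1.5 × 2 one, since their bounding box is 2 wide.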
We conduct three studies on spatial data, namely best aggregation region detection, bursty region detection, and attribute-based similar region detection. We also conduct a study on social networks, namely influential organizers detection. In our study of best aggregation region detection, we propose the best region search (BRS) problem. Specifically, given a set O of spatial objects, the size a × b of a query rectangle, and a submodular monotone aggregate score function, the BRS problem aims at identifying a rectangular region of size a × b such that the aggregate score of the spatial objects inside the region is maximized. We then propose an algorithm to find the exact location of the region with the maximum score. Assuming that slight imprecision in the solution is acceptable, we further propose an algorithm that finds an approximate answer to the BRS problem whose approximation ratio is bounded by a constant. In our study of bursty region detection, we propose a problem that uses a stream of spatial objects to detect and maintain, in real time, a bursty region of a given size in a specified geographic area. We adopt two sliding windows to model the burstiness of a region. To handle spatial streams with high arrival rates, we design several pruning techniques to avoid frequent recomputation. In addition, we propose two approximate algorithms with better efficiency. We also extend the proposed solutions to support continuous detection of top-k bursty regions. In our study of attribute-based similar region detection, we propose the attribute-based similar region search (ASRS) problem, which aims at finding a region of the same size as the query rectangle such that the attribute distance between the two regions is minimized. We propose a grid-based method to address the ASRS problem efficiently.
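To make the BRS problem statement concrete, the following Python sketch gives a brute-force baseline. It is not the exact or approximate algorithm proposed in the dissertation. It assumes objects are `(x, y, weight)` triples, that the rectangle has closed boundaries, and that the score function is monotone (the default, a sum of non-negative weights, is one simple example). The names `best_region_brute_force` and `sum_score` are hypothetical. Because the score is monotone, it is enough to try rectangles whose left edge lies on some object's x-coordinate and whose bottom edge lies on some object's y-coordinate: shifting any rectangle right and up until its edges touch the leftmost and lowest objects it contains never drops an object.

```python
from typing import Callable, Optional, Sequence, Tuple

SpatialObject = Tuple[float, float, float]  # (x, y, weight); layout is an assumption


def sum_score(objs: Sequence[SpatialObject]) -> float:
    """Sum of weights; monotone (and submodular) when weights are non-negative."""
    return sum(w for _, _, w in objs)


def best_region_brute_force(
    objects: Sequence[SpatialObject],
    a: float,
    b: float,
    score: Callable[[Sequence[SpatialObject]], float] = sum_score,
) -> Tuple[Optional[Tuple[float, float]], float]:
    """Return (bottom-left corner, score) of a best a x b rectangle, in O(n^3) time."""
    best_corner: Optional[Tuple[float, float]] = None
    best_score = score([])
    xs = sorted({o[0] for o in objects})
    ys = sorted({o[1] for o in objects})
    for x0 in xs:
        for y0 in ys:
            # Collect the objects inside the closed rectangle [x0, x0+a] x [y0, y0+b].
            inside = [
                o for o in objects
                if x0 <= o[0] <= x0 + a and y0 <= o[1] <= y0 + b
            ]
            s = score(inside)
            if best_corner is None or s > best_score:
                best_corner, best_score = (x0, y0), s
    return best_corner, best_score


objects = [(0.0, 0.0, 1.0), (1.0, 1.0, 2.0), (1.5, 0.5, 1.0), (5.0, 5.0, 3.0)]
corner, total = best_region_brute_force(objects, a=2.0, b=2.0)
```

In this small example the three objects near the origin fit in one 2 × 2 rectangle and together outweigh the isolated object at `(5, 5)`. The cubic running time is what makes such a baseline impractical on large datasets.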
In our study of influential organizers detection, we formulate the influential cover set problem, which aims to select k users who together cover a set of required skills such that their influence is maximized. We adopt the independent cascade model to evaluate the influence of users. We first propose two heuristic greedy algorithms, which are very efficient. We then propose a third algorithm, which has an approximation ratio of 2 and is guaranteed to find a feasible solution if one exists.
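The influential cover set problem can be illustrated with a small Python sketch that estimates influence under the independent cascade model by Monte Carlo simulation and then enumerates all groups of k users. This exhaustive search is only a baseline for tiny inputs; it is not one of the three algorithms proposed in the dissertation. The graph layout (`graph[u]` as a list of `(neighbor, probability)` pairs), the function names, and the number of simulation runs are all assumptions made for illustration.

```python
import random
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # graph[u] = [(v, activation probability), ...]


def simulate_ic(graph: Graph, seeds: Tuple[str, ...], rng: random.Random) -> int:
    """Run one independent cascade and return the number of activated users."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active user gets one chance to activate each inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_influence(graph: Graph, seeds: Tuple[str, ...], runs: int = 200, seed: int = 0) -> float:
    """Average cascade size over `runs` simulations with a fixed random seed."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def best_cover_set(
    graph: Graph, skills: Dict[str, Set[str]], required: Set[str], k: int, runs: int = 200
) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Enumerate all k-user groups; keep the feasible group with the highest estimated influence."""
    best, best_inf = None, -1.0
    for group in combinations(sorted(skills), k):
        covered = set().union(*(skills[u] for u in group))
        if not required <= covered:
            continue  # group does not cover all required skills
        inf = estimate_influence(graph, group, runs)
        if inf > best_inf:
            best, best_inf = group, inf
    return best, best_inf


# Toy instance with deterministic edges (probability 1.0) so the result is easy to follow.
graph = {"a": [("c", 1.0)], "b": [("d", 1.0)], "c": [("e", 1.0), ("f", 1.0)], "d": []}
skills = {"a": {"db"}, "b": {"ml"}, "c": {"ml"}, "d": {"db"}}
group, influence = best_cover_set(graph, skills, required={"db", "ml"}, k=2)
```

Here users `a` and `b` together cover both skills and reach all six users, so they form the best group. When no group of size k covers the required skills, the function returns `None` for the group.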
DRNTU::Engineering::Computer science and engineering::Information systems::Database management