TY - GEN
T1 - Region sampling and estimation of geosocial data with dynamic range calibration
AU - Li, Yanhua
AU - Steiner, Moritz
AU - Bao, Jie
AU - Wang, Limin
AU - Zhu, Ting
PY - 2014
Y1 - 2014
N2 - Location-based social networks (LBSNs) are becoming increasingly popular with the fast deployment of broadband mobile networks and the growing prevalence of versatile mobile devices. This success has attracted great interest in studying and measuring the characteristics of LBSNs, such as Facebook Places, Yelp, and Google+ Local. However, it is often prohibitive, and sometimes too costly, to obtain a detailed and complete snapshot of an LBSN due to its usually massive scale. In this work, taking Foursquare as an example, we focus on sampling and estimating restricted geographic regions in LBSNs, such as a city or a country. By exploiting the application programming interfaces (APIs) provided by Foursquare for geographic search, we first show how to obtain the 'ground truth', namely, a complete set of all venues (i.e., places) in a specified region. Then, we propose random region sampling algorithms that allow us to draw representative samples of venues, and design unbiased estimators of regional characteristics of venues. We validate the efficiency of our sampling algorithms on Foursquare using complete datasets obtained from 12 regions, such as Switzerland, New York City, and Los Angeles. Our results apply to sampling and estimation in all GeoDatabases, such as Facebook Places, Yelp, and Google+ Local, that have venue search APIs similar to Foursquare's. These location service providers can also benefit from our results to enable efficient online statistic estimation.
UR - http://www.scopus.com/inward/record.url?scp=84901771371&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901771371&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2014.6816726
DO - 10.1109/ICDE.2014.6816726
M3 - Conference contribution
AN - SCOPUS:84901771371
SN - 9781479925544
T3 - Proceedings - International Conference on Data Engineering
SP - 1096
EP - 1107
BT - 2014 IEEE 30th International Conference on Data Engineering, ICDE 2014
PB - IEEE Computer Society
T2 - 30th IEEE International Conference on Data Engineering, ICDE 2014
Y2 - 31 March 2014 through 4 April 2014
ER -