doi: 10.3934/jimo.2018057

An adaptive probabilistic algorithm for online k-center clustering

1. 

Department of Information and Operations Research, College of Applied Sciences, Beijing University of Technology, Beijing 100124, China

2. 

School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China

* Corresponding author: Dongmei Zhang

Received  August 2017 Revised  December 2017 Published  April 2018

The $k$-center clustering is one of the well-studied clustering problems in computer science. We are given a set of data points $P\subseteq R^d$, where $R^d$ is $d$ dimensional Euclidean space. We need to select $k≤ |P|$ points as centers and partition the set $P$ into $k$ clusters with each point connecting to its nearest center. The goal is to minimize the maximum radius. We consider the so-called online $k$-center clustering model where the data points in $R^d$ arrive over time. We present the bi-criteria $(\frac{n}{k}, (\log\frac{U^*}{L^*})^2)$-competitive algorithm and $(\frac{n}{k}, \logγ\log\frac{nγ}{k})$-competitive algorithm for semi-online and fully-online $k$-center clustering respectively, where $U^*$ is the maximum cluster radius of optimal solution, $L^*$ is the minimum distance of two distinct points of $P$, $γ$ is the ratio of the maximum distance of two distinct points and the minimum distance of two distinct points of $P$ and $n$ is the number of points that will arrive in total.

Citation: Ruiqi Yang, Dachuan Xu, Yicheng Xu, Dongmei Zhang. An adaptive probabilistic algorithm for online k-center clustering. Journal of Industrial & Management Optimization, doi: 10.3934/jimo.2018057
References:
[1]

D. AchlioptasM. Chrobak and J. Noga, Competitive analysis of randomized paging algorithms, Theoretical Computer Science, 234 (2000), 203-218. doi: 10.1016/S0304-3975(98)00116-9.

[2]

J. Bar-IlanG. Kortsarz and D. Peleg, How to allocate network centers, Journal of Algorithms, 15 (1993), 385-415. doi: 10.1006/jagm.1993.1047.

[3]

M. CharikarC. ChekuriT. Feder and R. Motwani, Incremental clustering and dynamic information retrieval, SIAM Journal on Computing, 33 (2004), 1417-1440. doi: 10.1137/S0097539702418498.

[4]

S. ChaudhuriN. Garg and R. Ravi, The p-neighbor k-center problem, Information Processing Letters, 65 (1998), 131-134. doi: 10.1016/S0020-0190(97)00224-X.

[5]

S. Chechik and D. Peleg, The fault-tolerant capacitated k-center problem, Theoretical Computer Science, 566 (2015), 12-25. doi: 10.1016/j.tcs.2014.11.017.

[6]

M. Cygan, M. T. Hajiaghayi and S. Khuller, LP rounding for k-centers with non-uniform hard capacities, Proceedings of the fifity-third Annual Symposium on Foundations of Computer Science (FOCS). IEEE, (2012), 273–282.

[7]

S. B. DavidA. BorodinR. KarpG. Tardos and A. Wigderson, On the power of randomization in on-line algorithms, Algorithmica, 11 (1994), 2-14. doi: 10.1007/BF01294260.

[8]

T. Feder and D. Greene, Optimal algorithms for approximate clustering, Proceedings of the twentieth Annual ACM Symposium on Theory of Computing, ACM, (1988), 434–444. doi: 10.1145/62212.62255.

[9]

C. G. FernandesS. P. de Paula and L. L. C. Pedrosa, Improved approximation algorithms for capacitated fault-tolerant k-center, Algorithmica, 80 (2018), 1041-1072.

[10]

T. F. Gonzalez, Clustering to minimize the maximum intercluster distance, Theoretical Computer Science, 38 (1985), 293-306. doi: 10.1016/0304-3975(85)90224-5.

[11]

D. S. Hochbaum and D. B. Shmoys, A best possible heuristic for the k-center problem, Mathematics of Operations Research, 10 (1985), 180-184. doi: 10.1287/moor.10.2.180.

[12]

W. L. Hsu and G. L. Nemhauser, Easy and hard bottleneck location problems, Discrete Applied Mathematics, 1 (1979), 209-215. doi: 10.1016/0166-218X(79)90044-1.

[13]

S. KhullerR. Pless and Y. J. Sussmann, Fault tolerant k-center problems, Theoretical Computer Science, 242 (2000), 237-245. doi: 10.1016/S0304-3975(98)00222-9.

[14]

S. Khuller and Y. J. Sussmann, The capacitated k-center problem, SIAM Journal on Discrete Mathematics, 13 (2000), 403-418. doi: 10.1137/S0895480197329776.

[15]

S. O. Krumke, On a generalization of the p-center problem, Information Processing Letters, 56 (1995), 67-71. doi: 10.1016/0020-0190(95)00141-X.

[16]

E. Liberty, R. Sriharsha and M. Sviridenko, An algorithm for online k-means clustering, Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), Society for Industrial and Applied Mathematics, (2016), 81–89. doi: 10.1137/1.9781611974317.7.

[17]

R. G. Michael and S. J. David, Computers and intractability: A guide to the theory of NP-completeness, W. H. Freeman, San Francisco, (1979), 90-91.

show all references

References:
[1]

D. AchlioptasM. Chrobak and J. Noga, Competitive analysis of randomized paging algorithms, Theoretical Computer Science, 234 (2000), 203-218. doi: 10.1016/S0304-3975(98)00116-9.

[2]

J. Bar-IlanG. Kortsarz and D. Peleg, How to allocate network centers, Journal of Algorithms, 15 (1993), 385-415. doi: 10.1006/jagm.1993.1047.

[3]

M. CharikarC. ChekuriT. Feder and R. Motwani, Incremental clustering and dynamic information retrieval, SIAM Journal on Computing, 33 (2004), 1417-1440. doi: 10.1137/S0097539702418498.

[4]

S. ChaudhuriN. Garg and R. Ravi, The p-neighbor k-center problem, Information Processing Letters, 65 (1998), 131-134. doi: 10.1016/S0020-0190(97)00224-X.

[5]

S. Chechik and D. Peleg, The fault-tolerant capacitated k-center problem, Theoretical Computer Science, 566 (2015), 12-25. doi: 10.1016/j.tcs.2014.11.017.

[6]

M. Cygan, M. T. Hajiaghayi and S. Khuller, LP rounding for k-centers with non-uniform hard capacities, Proceedings of the fifity-third Annual Symposium on Foundations of Computer Science (FOCS). IEEE, (2012), 273–282.

[7]

S. B. DavidA. BorodinR. KarpG. Tardos and A. Wigderson, On the power of randomization in on-line algorithms, Algorithmica, 11 (1994), 2-14. doi: 10.1007/BF01294260.

[8]

T. Feder and D. Greene, Optimal algorithms for approximate clustering, Proceedings of the twentieth Annual ACM Symposium on Theory of Computing, ACM, (1988), 434–444. doi: 10.1145/62212.62255.

[9]

C. G. FernandesS. P. de Paula and L. L. C. Pedrosa, Improved approximation algorithms for capacitated fault-tolerant k-center, Algorithmica, 80 (2018), 1041-1072.

[10]

T. F. Gonzalez, Clustering to minimize the maximum intercluster distance, Theoretical Computer Science, 38 (1985), 293-306. doi: 10.1016/0304-3975(85)90224-5.

[11]

D. S. Hochbaum and D. B. Shmoys, A best possible heuristic for the k-center problem, Mathematics of Operations Research, 10 (1985), 180-184. doi: 10.1287/moor.10.2.180.

[12]

W. L. Hsu and G. L. Nemhauser, Easy and hard bottleneck location problems, Discrete Applied Mathematics, 1 (1979), 209-215. doi: 10.1016/0166-218X(79)90044-1.

[13]

S. KhullerR. Pless and Y. J. Sussmann, Fault tolerant k-center problems, Theoretical Computer Science, 242 (2000), 237-245. doi: 10.1016/S0304-3975(98)00222-9.

[14]

S. Khuller and Y. J. Sussmann, The capacitated k-center problem, SIAM Journal on Discrete Mathematics, 13 (2000), 403-418. doi: 10.1137/S0895480197329776.

[15]

S. O. Krumke, On a generalization of the p-center problem, Information Processing Letters, 56 (1995), 67-71. doi: 10.1016/0020-0190(95)00141-X.

[16]

E. Liberty, R. Sriharsha and M. Sviridenko, An algorithm for online k-means clustering, Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), Society for Industrial and Applied Mathematics, (2016), 81–89. doi: 10.1137/1.9781611974317.7.

[17]

R. G. Michael and S. J. David, Computers and intractability: A guide to the theory of NP-completeness, W. H. Freeman, San Francisco, (1979), 90-91.

Figure 1.  An example of a sequence of points $1, x^2, ..., x^{2t}$ coming over time. The circles with dotted lines are the clusters produced by an online algorithm. Solid ones are those produced by the optimal off-line algorithm
Figure 2.  An illustration of partition of $S^*_i$ for any given $i\in[k]$
[1]

Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial & Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565

[2]

Haixia Liu, Jian-Feng Cai, Yang Wang. Subspace clustering by (k,k)-sparse matrix factorization. Inverse Problems & Imaging, 2017, 11 (3) : 539-551. doi: 10.3934/ipi.2017025

[3]

Gurkan Ozturk, Mehmet Tahir Ciftci. Clustering based polyhedral conic functions algorithm in classification. Journal of Industrial & Management Optimization, 2015, 11 (3) : 921-932. doi: 10.3934/jimo.2015.11.921

[4]

Jiping Tao, Ronghuan Huang, Tundong Liu. A $2.28$-competitive algorithm for online scheduling on identical machines. Journal of Industrial & Management Optimization, 2015, 11 (1) : 185-198. doi: 10.3934/jimo.2015.11.185

[5]

Guojun Gan, Kun Chen. A soft subspace clustering algorithm with log-transformed distances. Big Data & Information Analytics, 2016, 1 (1) : 93-109. doi: 10.3934/bdia.2016.1.93

[6]

Jiangchuan Fan, Xinyu Guo, Jianjun Du, Weiliang Wen, Xianju Lu, Brahmani Louiza. Analysis of the clustering fusion algorithm for multi-band color image. Discrete & Continuous Dynamical Systems - S, 2018, 0 (0) : 1233-1249. doi: 10.3934/dcdss.2019085

[7]

Aude Hofleitner, Tarek Rabbani, Mohammad Rafiee, Laurent El Ghaoui, Alex Bayen. Learning and estimation applications of an online homotopy algorithm for a generalization of the LASSO. Discrete & Continuous Dynamical Systems - S, 2014, 7 (3) : 503-523. doi: 10.3934/dcdss.2014.7.503

[8]

Baoli Shi, Zhi-Feng Pang, Jing Xu. Image segmentation based on the hybrid total variation model and the K-means clustering strategy. Inverse Problems & Imaging, 2016, 10 (3) : 807-828. doi: 10.3934/ipi.2016022

[9]

Gaidi Li, Zhen Wang, Dachuan Xu. An approximation algorithm for the $k$-level facility location problem with submodular penalties. Journal of Industrial & Management Optimization, 2012, 8 (3) : 521-529. doi: 10.3934/jimo.2012.8.521

[10]

Jiping Tao, Zhijun Chao, Yugeng Xi. A semi-online algorithm and its competitive analysis for a single machine scheduling problem with bounded processing times. Journal of Industrial & Management Optimization, 2010, 6 (2) : 269-282. doi: 10.3934/jimo.2010.6.269

[11]

Ran Ma, Jiping Tao. An improved 2.11-competitive algorithm for online scheduling on parallel machines to minimize total weighted completion time. Journal of Industrial & Management Optimization, 2018, 14 (2) : 497-510. doi: 10.3934/jimo.2017057

[12]

Jie Shen, Jian Lv, Fang-Fang Guo, Ya-Li Gao, Rui Zhao. A new proximal chebychev center cutting plane algorithm for nonsmooth optimization and its convergence. Journal of Industrial & Management Optimization, 2018, 14 (3) : 1143-1155. doi: 10.3934/jimo.2018003

[13]

Yongbin Ou, Cun-Quan Zhang. A new multimembership clustering method. Journal of Industrial & Management Optimization, 2007, 3 (4) : 619-624. doi: 10.3934/jimo.2007.3.619

[14]

Petteri Piiroinen, Martin Simon. Probabilistic interpretation of the Calderón problem. Inverse Problems & Imaging, 2017, 11 (3) : 553-575. doi: 10.3934/ipi.2017026

[15]

Sarah Day; William D. Kalies; Konstantin Mischaikow and Thomas Wanner. Probabilistic and numerical validation of homology computations for nodal domains. Electronic Research Announcements, 2007, 13: 60-73.

[16]

Jianguo Dai, Wenxue Huang, Yuanyi Pan. A category-based probabilistic approach to feature selection. Big Data & Information Analytics, 2017, 2 (5) : 1-8. doi: 10.3934/bdia.2017020

[17]

Yufei Sun, Grace Aw, Kok Lay Teo, Guanglu Zhou. Portfolio optimization using a new probabilistic risk measure. Journal of Industrial & Management Optimization, 2015, 11 (4) : 1275-1283. doi: 10.3934/jimo.2015.11.1275

[18]

Joost R. Santos. Interdependency analysis with multiple probabilistic sector inputs. Journal of Industrial & Management Optimization, 2008, 4 (3) : 489-510. doi: 10.3934/jimo.2008.4.489

[19]

Vitali Kapovitch, Anton Petrunin, Wilderich Tuschmann. On the torsion in the center conjecture. Electronic Research Announcements, 2018, 25: 27-35. doi: 10.3934/era.2018.25.004

[20]

Keith Burns, Amie Wilkinson. Dynamical coherence and center bunching. Discrete & Continuous Dynamical Systems - A, 2008, 22 (1&2) : 89-100. doi: 10.3934/dcds.2008.22.89

2017 Impact Factor: 0.994

Metrics

  • PDF downloads (23)
  • HTML views (310)
  • Cited by (0)

Other articles
by authors

[Back to Top]