2016, 1(1): 15-29. doi: 10.3934/bdia.2016.1.15

Towards big data processing in clouds: An online cost-minimization approach

1. College of Information System and Management, National University of Defense Technology, Changsha 410073, Hunan, China

2. College of Information System and Management, National University of Defense Technology, Changsha, Hunan, 410073

3. Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada M3J 1P3

Received July 2015; Revised August 2015; Published September 2015

Owing to its elastic, on-demand resource provisioning, cloud computing offers a cost-effective and powerful platform for processing big data. Under this paradigm, a Data Service Provider (DSP) may rent geographically distributed datacenters to process large amounts of data. Because the data are generated dynamically and resource prices vary over time, moving the data from different geographic locations to different datacenters, while provisioning adequate computation resources to process them, is essential to achieving cost effectiveness for the DSP. In this paper, a joint online approach is proposed to address this task. We formulate the problem as a joint stochastic optimization problem, which is then decoupled into two independent subproblems via the Lyapunov framework. Our method minimizes the long-term time-average cost, which comprises computing, storage, bandwidth and latency costs. Theoretical analysis shows that our online algorithm produces a solution whose cost is within a bounded gap of the optimal solution obtained by offline computation, and guarantees that data processing completes within preset delays.
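To illustrate the kind of decision rule the Lyapunov framework typically yields, the following is a minimal sketch of a generic drift-plus-penalty controller for a single data queue. It is not the paper's algorithm: the function name `drift_plus_penalty`, the parameters `V` and `b_max`, the single-queue model, and the rule of paying only for data actually served are all assumptions made for illustration.

```python
def drift_plus_penalty(arrivals, prices, V, b_max):
    """Toy single-queue drift-plus-penalty controller (illustrative only).

    arrivals[t]: data arriving in slot t (hypothetical units)
    prices[t]:   cost per unit of data processed in slot t
    V:           weight on cost versus queue backlog
    b_max:       maximum data that can be processed per slot
    """
    Q = 0.0            # current backlog of unprocessed data
    total_cost = 0.0
    backlog = []
    for a, p in zip(arrivals, prices):
        # Minimize V*p*b - Q*b over b in [0, b_max]. The objective is
        # linear in b, so the minimizer is all-or-nothing.
        b = b_max if Q > V * p else 0.0
        served = min(b, Q)          # cannot process more than is queued
        total_cost += p * served    # assumption: pay only for served data
        Q = Q - served + a          # queue update for the next slot
        backlog.append(Q)
    return total_cost / len(arrivals), backlog


# Example: constant arrivals and prices, two settings of V.
avg_cost_low_v, backlog_low_v = drift_plus_penalty([1, 1, 1, 1], [1, 1, 1, 1], V=0, b_max=2)
avg_cost_high_v, backlog_high_v = drift_plus_penalty([1, 1, 1, 1], [1, 1, 1, 1], V=10, b_max=2)
```

In this toy setting, a larger `V` makes the controller wait for a larger backlog before paying to process, which lowers cost at the expense of longer queues. This is the usual cost-versus-delay trade-off of the drift-plus-penalty technique.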
Citation: Weidong Bao, Wenhua Xiao, Haoran Ji, Chao Chen, Xiaomin Zhu, Jianhong Wu. Towards big data processing in clouds: An online cost-minimization approach. Big Data & Information Analytics, 2016, 1 (1) : 15-29. doi: 10.3934/bdia.2016.1.15


