American Institute of Mathematical Sciences

2016, 12(2): 515-527. doi: 10.3934/jimo.2016.12.515

A combinatorial optimization approach to the selection of statistical units

 1 Università di Roma "Sapienza", Dip. di Ing. Informatica, Automatica e Gestionale (DIAG), Via Ariosto 25, Roma, 00185, Italy 2 Italian National Statistic Office “Istat", Dip. per i Censimenti e gli Archivi Amm. e Statistici (DICA), Viale Oceano Pacifico 171, Roma, 00144, Italy 3 Italian National Statistic Office “Istat”, Dip. per i Censimenti e gli Archivi Amm. e Statistici (DICA), Viale Oceano Pacifico 171, Roma, 00144, Italy

Received  April 2014 Revised  November 2014 Published  June 2015

In the case of some large statistical surveys, the set of units that will constitute the scope of the survey must be selected. We focus on the real case of a Census of Agriculture, where the units are farms. Surveying each unit has a cost and brings a different portion of the whole information. In this case, one wants to determine a subset of units producing the minimum total cost for being surveyed and representing at least a certain portion of the total information. Uncertainty aspects also occur, because the portion of information corresponding to each unit is not perfectly known before surveying it. The proposed approach is based on combinatorial optimization, and the arising decision problems are modeled as multidimensional binary knapsack problems. Experimental results show the effectiveness of the proposed approach.
Citation: Renato Bruni, Gianpiero Bianchi, Alessandra Reale. A combinatorial optimization approach to the selection of statistical units. Journal of Industrial & Management Optimization, 2016, 12 (2) : 515-527. doi: 10.3934/jimo.2016.12.515
References:
 [1] E. Balas, Facets of the knapsack polytope,, Mathematical Programming, 8 (1975), 146. doi: 10.1007/BF01580440. [2] R. M. Bell and M. L. Cohen, Coverage Measurement in the 2010 Census - Panel on Correlation Bias and Coverage Measurement in the 2010 Decennial Census,, The National Academic Press, (2008). [3] G. Bianchi, R. Bruni and A. Reale, Information reconstruction via discrete optimization for agricultural census data,, Applied Mathematical Sciences, 6 (2012), 6241. [4] G. Bianchi, R. Bruni and A. Reale, Balancing of agricultural census data by using discrete optimization,, Optimization Letters, 8 (2014), 1553. doi: 10.1007/s11590-013-0652-3. [5] G. Bianchi, R. Bruni and A. Reale, Open source integer linear programming solvers for error localization in numerical data,, In Advances in Theoretical and Applied Statistics (eds. N. Torelli, (2012). doi: 10.1007/978-3-642-35588-2_28. [6] E. Boros, A. Scozzari, F. Tardella and P. Veneziani, Polynomially computable bounds for the probability of the union of events,, Mathematics of Operations Research, 39 (2014), 1311. doi: 10.1287/moor.2014.0657. [7] R. Bruni, Discrete models for data imputation,, Discrete Applied Mathematics, 144 (2004), 59. doi: 10.1016/j.dam.2004.04.004. [8] R. Bruni, Error correction for massive data sets,, Optimization Methods and Software, 20 (2005), 295. doi: 10.1080/10556780512331318281. [9] European Parliament, Regulation of the European Parliament,, N. 1166/2008, (1166). [10] Food and Agriculture Organization of the United Nations (FAO), A System of Integrated Agricultural Censuses and Surveys,, Vol.1 - World Programme for the Census of Agriculture 2010. FAO Statistical Development Series (2005)., (2005). [11] M. Ferri and M. Piccioni, Optimal selection of statistical units: An approach via simulated annealing,, Computational Statistics & Data Analysis, 13 (1992), 47. doi: 10.1016/0167-9473(92)90153-7. [12] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness,, W.H. Freeman, (1979). [13] R. M. Groves, F. J. Jr. Fowler, M. P. Couper, J. M. Lepkowski, E. Singer and R. Tourangeau, Survey Methodology,, Wiley Series in Survey Methodology, (2009). [14] T. Hailperin, Best possible inequalities for the probability of a logical function of events,, The American Mathematical Monthly, 72 (1965), 343. doi: 10.2307/2313491. [15] P. L. Hammer, E. L. Johnson and U. N. Peled, Facets of regular 0-1 polytopes,, Mathematical Programming, 8 (1975), 179. doi: 10.1007/BF01580442. [16] W. K. Kremers, Completeness and unbiased estimation for sum-quota sampling,, Journal of the American Statistical Association, 81 (1986), 1070. [17] H. Marchand, A. Martin, R. Weismantel and L. Wolsey, Cutting planes in integer and mixed integer programming,, Discrete Applied Mathematics, 123 (2002), 397. doi: 10.1016/S0166-218X(01)00348-1. [18] C. A. Moser, Quota sampling,, Journal of the Royal Statistical Society, 115 (1952), 411. doi: 10.2307/2980740. [19] A. Mucherino, P. Papajorgji and P. M. Pardalos, Data Mining in Agriculture,, Springer, (2009). doi: 10.1007/978-0-387-88615-2. [20] K. G. Murty, Linear Programming,, John Wiley & Sons Inc., (1983). [21] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization,, John Wiley, (1988). doi: 10.1002/9781118627372. [22] T. M. F. Smith, Populations and selection: Limitations of statistics,, Journal of the Royal Statistical Society Series A (Statistics in Society), 156 (1993), 144. doi: 10.2307/2982726. [23] S. Sudman, Probability sampling with quotas,, Journal of the American Statistical Association, 61 (1966), 749. doi: 10.1080/01621459.1966.10480903. [24] L. A. Wolsey, Faces for a linear inequality in 0-1 variables,, Mathematical Programming, 8 (1975), 165. doi: 10.1007/BF01580441. [25] Y. Zhang, F. Zhang and M. Cai, Some new results on multi-dimension Knapsack problem,, Journal of Industrial and Management Optimization, 1 (2005), 315. doi: 10.3934/jimo.2005.1.315.

show all references

References:
 [1] E. Balas, Facets of the knapsack polytope,, Mathematical Programming, 8 (1975), 146. doi: 10.1007/BF01580440. [2] R. M. Bell and M. L. Cohen, Coverage Measurement in the 2010 Census - Panel on Correlation Bias and Coverage Measurement in the 2010 Decennial Census,, The National Academic Press, (2008). [3] G. Bianchi, R. Bruni and A. Reale, Information reconstruction via discrete optimization for agricultural census data,, Applied Mathematical Sciences, 6 (2012), 6241. [4] G. Bianchi, R. Bruni and A. Reale, Balancing of agricultural census data by using discrete optimization,, Optimization Letters, 8 (2014), 1553. doi: 10.1007/s11590-013-0652-3. [5] G. Bianchi, R. Bruni and A. Reale, Open source integer linear programming solvers for error localization in numerical data,, In Advances in Theoretical and Applied Statistics (eds. N. Torelli, (2012). doi: 10.1007/978-3-642-35588-2_28. [6] E. Boros, A. Scozzari, F. Tardella and P. Veneziani, Polynomially computable bounds for the probability of the union of events,, Mathematics of Operations Research, 39 (2014), 1311. doi: 10.1287/moor.2014.0657. [7] R. Bruni, Discrete models for data imputation,, Discrete Applied Mathematics, 144 (2004), 59. doi: 10.1016/j.dam.2004.04.004. [8] R. Bruni, Error correction for massive data sets,, Optimization Methods and Software, 20 (2005), 295. doi: 10.1080/10556780512331318281. [9] European Parliament, Regulation of the European Parliament,, N. 1166/2008, (1166). [10] Food and Agriculture Organization of the United Nations (FAO), A System of Integrated Agricultural Censuses and Surveys,, Vol.1 - World Programme for the Census of Agriculture 2010. FAO Statistical Development Series (2005)., (2005). [11] M. Ferri and M. Piccioni, Optimal selection of statistical units: An approach via simulated annealing,, Computational Statistics & Data Analysis, 13 (1992), 47. doi: 10.1016/0167-9473(92)90153-7. [12] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness,, W.H. Freeman, (1979). [13] R. M. Groves, F. J. Jr. Fowler, M. P. Couper, J. M. Lepkowski, E. Singer and R. Tourangeau, Survey Methodology,, Wiley Series in Survey Methodology, (2009). [14] T. Hailperin, Best possible inequalities for the probability of a logical function of events,, The American Mathematical Monthly, 72 (1965), 343. doi: 10.2307/2313491. [15] P. L. Hammer, E. L. Johnson and U. N. Peled, Facets of regular 0-1 polytopes,, Mathematical Programming, 8 (1975), 179. doi: 10.1007/BF01580442. [16] W. K. Kremers, Completeness and unbiased estimation for sum-quota sampling,, Journal of the American Statistical Association, 81 (1986), 1070. [17] H. Marchand, A. Martin, R. Weismantel and L. Wolsey, Cutting planes in integer and mixed integer programming,, Discrete Applied Mathematics, 123 (2002), 397. doi: 10.1016/S0166-218X(01)00348-1. [18] C. A. Moser, Quota sampling,, Journal of the Royal Statistical Society, 115 (1952), 411. doi: 10.2307/2980740. [19] A. Mucherino, P. Papajorgji and P. M. Pardalos, Data Mining in Agriculture,, Springer, (2009). doi: 10.1007/978-0-387-88615-2. [20] K. G. Murty, Linear Programming,, John Wiley & Sons Inc., (1983). [21] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization,, John Wiley, (1988). doi: 10.1002/9781118627372. [22] T. M. F. Smith, Populations and selection: Limitations of statistics,, Journal of the Royal Statistical Society Series A (Statistics in Society), 156 (1993), 144. doi: 10.2307/2982726. [23] S. Sudman, Probability sampling with quotas,, Journal of the American Statistical Association, 61 (1966), 749. doi: 10.1080/01621459.1966.10480903. [24] L. A. Wolsey, Faces for a linear inequality in 0-1 variables,, Mathematical Programming, 8 (1975), 165. doi: 10.1007/BF01580441. [25] Y. Zhang, F. Zhang and M. Cai, Some new results on multi-dimension Knapsack problem,, Journal of Industrial and Management Optimization, 1 (2005), 315. doi: 10.3934/jimo.2005.1.315.
 [1] Subrata Dasgupta. Disentangling data, information and knowledge. Big Data & Information Analytics, 2016, 1 (4) : 377-389. doi: 10.3934/bdia.2016016 [2] Jose-Luis Roca-Gonzalez. Designing dynamical systems for security and defence network knowledge management. A case of study: Airport bird control falconers organizations. Discrete & Continuous Dynamical Systems - S, 2015, 8 (6) : 1311-1329. doi: 10.3934/dcdss.2015.8.1311 [3] Sunmoo Yoon, Maria Patrao, Debbie Schauer, Jose Gutierrez. Prediction models for burden of caregivers applying data mining techniques. Big Data & Information Analytics, 2017, 2 (5) : 1-9. doi: 10.3934/bdia.2017014 [4] Anupama N, Sudarson Jena. A novel approach using incremental under sampling for data stream mining. Big Data & Information Analytics, 2017, 2 (5) : 1-13. doi: 10.3934/bdia.2017017 [5] Zhen Mei. Manifold data mining helps businesses grow more effectively. Big Data & Information Analytics, 2016, 1 (2&3) : 275-276. doi: 10.3934/bdia.2016009 [6] Qinglei Zhang, Wenying Feng. Detecting coalition attacks in online advertising: A hybrid data mining approach. Big Data & Information Analytics, 2016, 1 (2&3) : 227-245. doi: 10.3934/bdia.2016006 [7] Feyza Gürbüz, Panos M. Pardalos. A decision making process application for the slurry production in ceramics via fuzzy cluster and data mining. Journal of Industrial & Management Optimization, 2012, 8 (2) : 285-297. doi: 10.3934/jimo.2012.8.285 [8] Sarah Ibri. An efficient distributed optimization and coordination protocol: Application to the emergency vehicle management. Journal of Industrial & Management Optimization, 2015, 11 (1) : 41-63. doi: 10.3934/jimo.2015.11.41 [9] Xueting Cui, Xiaoling Sun, Dan Sha. An empirical study on discrete optimization models for portfolio selection. Journal of Industrial & Management Optimization, 2009, 5 (1) : 33-46. doi: 10.3934/jimo.2009.5.33 [10] Dan Li, Li-Ping Pang, Fang-Fang Guo, Zun-Quan Xia. An alternating linearization method with inexact data for bilevel nonsmooth convex optimization. Journal of Industrial & Management Optimization, 2014, 10 (3) : 859-869. doi: 10.3934/jimo.2014.10.859 [11] Lianshuan Shi, Enmin Feng, Huanchun Sun, Zhaosheng Feng. A two-step algorithm for layout optimization of structures with discrete variables. Journal of Industrial & Management Optimization, 2007, 3 (3) : 543-552. doi: 10.3934/jimo.2007.3.543 [12] Tao Zhang, Yue-Jie Zhang, Qipeng P. Zheng, P. M. Pardalos. A hybrid particle swarm optimization and tabu search algorithm for order planning problems of steel factories based on the Make-To-Stock and Make-To-Order management architecture. Journal of Industrial & Management Optimization, 2011, 7 (1) : 31-51. doi: 10.3934/jimo.2011.7.31 [13] Donglei Du, Tianping Shuai. Errata to:''Optimal preemptive online scheduling to minimize $l_{p}$ norm on two processors''[Journal of Industrial and Management Optimization, 1(3) (2005), 345-351.]. Journal of Industrial & Management Optimization, 2008, 4 (2) : 339-341. doi: 10.3934/jimo.2008.4.339 [14] Guillaume Bal, Eric Bonnetier, François Monard, Faouzi Triki. Inverse diffusion from knowledge of power densities. Inverse Problems & Imaging, 2013, 7 (2) : 353-375. doi: 10.3934/ipi.2013.7.353 [15] Jian-Wu Xue, Xiao-Kun Xu, Feng Zhang. Big data dynamic compressive sensing system architecture and optimization algorithm for internet of things. Discrete & Continuous Dynamical Systems - S, 2015, 8 (6) : 1401-1414. doi: 10.3934/dcdss.2015.8.1401 [16] Rodrigo I. Brevis, Jaime H. Ortega, David Pardo. A source time reversal method for seismicity induced by mining. Inverse Problems & Imaging, 2017, 11 (1) : 25-45. doi: 10.3934/ipi.2017002 [17] Alexandre Bayen, Rinaldo M. Colombo, Paola Goatin, Benedetto Piccoli. Traffic modeling and management: Trends and perspectives. Discrete & Continuous Dynamical Systems - S, 2014, 7 (3) : i-ii. doi: 10.3934/dcdss.2014.7.3i [18] A. Marigo, Benedetto Piccoli. Cooperative controls for air traffic management. Communications on Pure & Applied Analysis, 2003, 2 (3) : 355-369. doi: 10.3934/cpaa.2003.2.355 [19] Andrew P. Sage. Risk in system of systems engineering and management. Journal of Industrial & Management Optimization, 2008, 4 (3) : 477-487. doi: 10.3934/jimo.2008.4.477 [20] Shaojun Lan, Yinghui Tang, Miaomiao Yu. System capacity optimization design and optimal threshold $N^{*}$ for a $GEO/G/1$ discrete-time queue with single server vacation and under the control of Min($N, V$)-policy. Journal of Industrial & Management Optimization, 2016, 12 (4) : 1435-1464. doi: 10.3934/jimo.2016.12.1435

2016 Impact Factor: 0.994