September 2019, 1(3): 329-387. doi: 10.3934/fods.2019015

Randomized learning of the second-moment matrix of a smooth function

Armin Eftekhari 1, Michael B. Wakin 2, Ping Li 3 and Paul G. Constantine 4

1. Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

2. Department of Electrical Engineering, Colorado School of Mines, Golden, CO 80401, USA

3. Departments of Statistics and Computer Science, Rutgers University, Piscataway, NJ 08854, USA

4. Department of Computer Science, University of Colorado Boulder, Boulder, CO 80309, USA

* Corresponding author: Armin Eftekhari

Published September 2019

Fund Project: AE was partially supported by the Alan Turing Institute under EPSRC grant EP/N510129/1 and by Turing Seed Funding grant SF019. MBW was partially supported by NSF grant CCF-1409258 and NSF CAREER grant CCF-1149225. PGC was partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under award DE-SC0011077, and by the Defense Advanced Research Projects Agency's Enabling Quantification of Uncertainty in Physical Systems program.

Consider an open set $ \mathbb{D}\subseteq\mathbb{R}^n $, equipped with a probability measure $ \mu $. An important characteristic of a smooth function $ f:\mathbb{D}\rightarrow\mathbb{R} $ is its second-moment matrix $ \Sigma_{\mu}: = \int \nabla f(x) \nabla f(x)^* \mu(dx) \in\mathbb{R}^{n\times n} $, where $ \nabla f(x)\in\mathbb{R}^n $ is the gradient of $ f(\cdot) $ at $ x\in\mathbb{D} $ and $ * $ stands for transpose. For instance, the span of the leading $ r $ eigenvectors of $ \Sigma_{\mu} $ forms an active subspace of $ f(\cdot) $, which contains the directions along which $ f(\cdot) $ changes the most and is of particular interest in ridge approximation. In this work, we propose a simple algorithm for estimating $ \Sigma_{\mu} $ from random point evaluations of $ f(\cdot) $ without imposing any structural assumptions on $ \Sigma_{\mu} $. Theoretical guarantees for this algorithm are established with the aid of the same technical tools that have proved valuable in the context of covariance matrix estimation from partial measurements.
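
As a rough illustration of the quantity being estimated (not the algorithm analyzed in the paper), the second-moment matrix and its active subspace can be sketched in a few lines of Python: approximate $ \nabla f $ at independent draws from $ \mu $ by finite differences, average the outer products, and keep the leading $ r $ eigenvectors of the result. The forward-difference scheme, the function names, and the Gaussian toy measure below are assumptions made only for this sketch.

import numpy as np

def estimate_second_moment(f, sample_mu, n, N=200, eps=1e-4, rng=None):
    """Crude Monte Carlo estimate of Sigma_mu = E[grad f(x) grad f(x)^*].

    Gradients are approximated by forward differences, so each center x_i
    costs n + 1 evaluations of f. This only illustrates the target quantity;
    it is not the sampling scheme whose guarantees the paper establishes
    (which reuses a few random neighbors per center).
    """
    rng = np.random.default_rng() if rng is None else rng
    Sigma = np.zeros((n, n))
    for _ in range(N):
        x = sample_mu(rng)                     # draw a center x ~ mu
        fx = f(x)
        g = np.empty(n)
        for k in range(n):                     # forward-difference gradient
            e = np.zeros(n)
            e[k] = eps
            g[k] = (f(x + e) - fx) / eps
        Sigma += np.outer(g, g)
    return Sigma / N

def active_subspace(Sigma, r):
    """Orthonormal basis for the span of the leading r eigenvectors."""
    w, V = np.linalg.eigh(Sigma)               # eigenvalues in ascending order
    return V[:, ::-1][:, :r]

# Toy usage: f(x) = 0.5 * x^* A x with mu the standard Gaussian on R^n.
n, r = 20, 3
A = np.diag(np.linspace(1.0, 0.01, n))
f = lambda x: 0.5 * x @ A @ x
Sigma_hat = estimate_second_moment(f, lambda rng: rng.standard_normal(n), n)
U = active_subspace(Sigma_hat, r)              # n-by-r basis of an estimated active subspace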

Citation: Armin Eftekhari, Michael B. Wakin, Ping Li, Paul G. Constantine. Randomized learning of the second-moment matrix of a smooth function. Foundations of Data Science, 2019, 1 (3) : 329-387. doi: 10.3934/fods.2019015

Figure 1.  Visualization of the problem setup. The probability measure $ \mu $ is supported on the domain $ \mathbb{D}\subseteq \mathbb{R}^n $ of a smooth function $ f(\cdot) $. Here, $ N = 3 $ and $ X = \{x_i\}_{i = 1}^N $ are drawn independently from $ \mu $. For sufficiently small $ \epsilon $, we let $ \mathbb{B}_{x_i, \epsilon} $ denote the $ \epsilon $-neighborhood of each $ x_i $ and set $ \mathbb{B}_{X, \epsilon} = \cup_{i = 1}^N \mathbb{B}_{x_i, \epsilon} $. On $ \mathbb{B}_{X, \epsilon} $, $ \mu $ induces the conditional measure $ \mu_{X, \epsilon} $, from which $ N_{X, \epsilon} $ points are independently drawn and collected in $ Y_{X, \epsilon} = \{y_{ij}\}_{i, j} $. Here, $ x_1 $ has $ N_{x_1, \epsilon} = 2 $ neighbors in $ Y_{X, \epsilon} $ and we set $ Y_{x_1, \epsilon} = \{y_{1, j} \}_{j = 1}^{N_{x_1, \epsilon}} $. Similarly, $ Y_{x_2, \epsilon} $ and $ Y_{x_3, \epsilon} $ are formed. Note that $ Y_{X, \epsilon} = \cup_{i = 1}^N Y_{x_i, \epsilon} $. Our objective is to estimate the second-moment matrix of $ f(\cdot) $ (with respect to the probability measure $ \mu $) given $ \{x_i, f(x_i)\} $ and $ \{y_{ij}, f(y_{ij})\} $
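
The sampling setup of Figure 1 can be mimicked numerically. The sketch below draws the centers from $ \mu $ and then approximates the conditional measure $ \mu_{X, \epsilon} $ by picking a center uniformly at random and sampling uniformly in its $ \epsilon $-ball, which is exact only when $ \mu $ is locally uniform; the helper name and this uniform approximation are assumptions made for the example, not part of the paper's construction.

import numpy as np

def draw_setup(sample_mu, n, N, N_total, eps, rng=None):
    """Draw N centers X = {x_i} from mu and N_total neighbor points from the
    union of eps-balls B_{X,eps}, mimicking Figure 1 (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.stack([sample_mu(rng) for _ in range(N)])     # centers x_1, ..., x_N
    Y = [[] for _ in range(N)]                           # Y_{x_i, eps} for each center
    for _ in range(N_total):
        i = rng.integers(N)                              # choose a ball B_{x_i, eps}
        d = rng.standard_normal(n)                       # uniform point in that ball:
        d *= eps * rng.random() ** (1.0 / n) / np.linalg.norm(d)
        Y[i].append(X[i] + d)                            # y_{ij} lands in B_{x_i, eps}
    return X, Y
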
Figure 2.  A simulation study with a 500-dimensional ($ n = 500 $) quadratic function, showing the relative error of the approximation in (7) as a function of the total number of samples $ N_{X, \epsilon} $. All simulations use $ \epsilon = 10^{-4} $. Each subfigure uses a different value for the minimum number $ N_{X, \text{min}, \epsilon} $ of points in the $ \epsilon $-neighborhood of each center point in $ X $ and varies the number $ N $ of centers. Each of the five groups of blue dots in a subfigure corresponds to a different value of $ N $; within each group there are ten replications. The slope of each line is $ -1/2 $, which matches the expected convergence rate in (6).
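
The $ N_{X, \epsilon}^{-1/2} $ trend reported in Figure 2 can be checked informally with the finite-difference sketch above: for a quadratic $ f(x) = \frac{1}{2} x^* A x $ and a standard Gaussian $ \mu $, the gradient is $ Ax $, so $ \Sigma_{\mu} = A^2 $ is known in closed form and the relative error of a Monte Carlo estimate can be tracked directly. The dimension and sample sizes below are placeholders chosen for speed, not the values used in the paper.

import numpy as np

# Informal convergence check in the spirit of Figure 2 (assumes the
# estimate_second_moment sketch above is in scope).
n = 50                                         # smaller than the paper's n = 500, for speed
A = np.diag(np.linspace(1.0, 0.01, n))
f = lambda x: 0.5 * x @ A @ x
Sigma_true = A @ A                             # E[(Ax)(Ax)^*] = A^2 for x ~ N(0, I_n)

for N in [50, 200, 800, 3200]:
    S = estimate_second_moment(f, lambda rng: rng.standard_normal(n), n, N=N)
    rel_err = np.linalg.norm(S - Sigma_true, 2) / np.linalg.norm(Sigma_true, 2)
    print(N, rel_err)                          # error should shrink roughly like N**(-1/2)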