Big Data and Information Analytics (BDIA)

On balancing between optimal and proportional categorical predictions

Pages: 129-137, Volume 1, Issue 1, January 2016. doi:10.3934/bdia.2016.1.129


Wenxue Huang - Department of Mathematics, Guangzhou University, Guangzhou, Guangdong 510006, China
Yuanyi Pan - Kochava Inc, 414 Church Street, Suite 306, Sandpoint, Idaho 83864, United States

Abstract: A bias-variance dilemma arises in categorical data mining and analysis: a prediction method can aim to maximize the overall point-hit accuracy either without constraint or subject to the constraint of minimizing the distribution bias, but it can hardly achieve both at the same time. This article proposes a scheme to balance these two prediction objectives. An experiment on a real data set demonstrates some of the scheme's characteristics, and some basic properties of the scheme are also discussed.
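
To make the two objectives named in the abstract concrete, the following minimal sketch contrasts them on a single conditional distribution. It is an illustration only and not the balancing scheme proposed in the article. It assumes that "optimal prediction" means always predicting the most probable category and that "proportional prediction" means drawing the predicted category at random from the conditional distribution. The distribution `cond_dist` and all function names are hypothetical.

```python
import random
from collections import Counter


def optimal_prediction(cond_dist):
    """Predict the most probable category (assumed meaning of 'optimal')."""
    return max(cond_dist, key=cond_dist.get)


def proportional_prediction(cond_dist, rng):
    """Draw a category with probability equal to its conditional probability
    (assumed meaning of 'proportional')."""
    categories = list(cond_dist)
    weights = [cond_dist[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


def expected_hit_rate_optimal(cond_dist):
    """Point-hit accuracy of always predicting the mode: max_i p_i."""
    return max(cond_dist.values())


def expected_hit_rate_proportional(cond_dist):
    """Point-hit accuracy of sampling from the distribution: sum_i p_i^2."""
    return sum(p * p for p in cond_dist.values())


# Hypothetical conditional distribution of a three-category target.
cond_dist = {"a": 0.6, "b": 0.3, "c": 0.1}

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 10_000

# Distribution of predictions produced by each rule over n cases.
optimal_counts = Counter(optimal_prediction(cond_dist) for _ in range(n))
proportional_counts = Counter(
    proportional_prediction(cond_dist, rng) for _ in range(n)
)

print("expected hit rate, optimal:     ", expected_hit_rate_optimal(cond_dist))
print("expected hit rate, proportional:", expected_hit_rate_proportional(cond_dist))
print("predicted shares, optimal:      ",
      {c: optimal_counts[c] / n for c in cond_dist})
print("predicted shares, proportional: ",
      {c: proportional_counts[c] / n for c in cond_dist})
```

Under these assumptions, the mode rule has the higher expected hit rate (max p_i is never smaller than the sum of p_i^2) but predicts a single category every time, so its predicted distribution is far from the true one. The sampling rule reproduces the distribution approximately at the cost of a lower hit rate. This is the tension that a balancing scheme has to resolve.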

Keywords:  Bias-variance dilemma, categorical data, optimal prediction, proportional prediction, point estimation, conditional distribution.
Mathematics Subject Classification:  Primary: 68T10, 62H20; Secondary: 62G86.

Received: May 2015; Revised: August 2015; Available Online: September 2015.