# American Institute of Mathematical Sciences

ISSN: 2380-6966

eISSN: 2380-6974

## Big Data & Information Analytics

2016, Volume 1, Issue 1

2016, 1(1): i-iii doi: 10.3934/bdia.2016.1.1i +[Abstract](382) +[PDF](145.5KB)
Abstract:
The growing appreciation for the role of Big Data in addressing important problems and societal needs has led to many welcome developments, including the new journal we are celebrating with its inaugural issue today. In Toronto, an interdisciplinary network of faculty has been working together on big data problems for a number of years through the Centre for Information Visualization and Data-Driven Design (CIVDDD). Founded and led by Nick Cercone (York) and supported by the Ontario provincial government, this network has brought together researchers in data analytics and visualization from York University, OCAD University and the University of Toronto with industry partners to tackle challenging Big Data problems. A key emphasis of CIVDDD is the importance of fusing analytics with powerful visualization methods that allow the full value of the data to be extracted.

2016, 1(1): v-v doi: 10.3934/bdia.2016.1.1v +[Abstract](553) +[PDF](92.0KB)
Abstract:
Professors Jimmy Huang, Ali Asgary and Jianhong Wu from York University in Canada have received \$1,650,000 over six years through the Natural Sciences and Engineering Research Council of Canada's (NSERC) Collaborative Research and Training Experience (CREATE) Grants to lead the industry program titled "Computational Approaches for Advanced Disaster, Emergency and Rapid Response Simulation (ADERSIM)". The proposed Advanced Disaster, Emergency, and Rapid Response Simulation NSERC CREATE industry stream program will enhance Canada's capacities in public safety and emergency management through innovative training, research, and development of professionals in state-of-the-art simulations and emergency management information systems.

2016, 1(1): 1-13 doi: 10.3934/bdia.2016.1.1 +[Abstract](422) +[PDF](721.9KB)
Abstract:
Cloud computing has attracted growing attention for its ability to provide on-demand services, and mobile cloud computing (MCC) makes an increasing number of applications and computational services available on mobile devices. In MCC, computation offloading, which provides remote execution of applications for mobile devices, is one of the most important challenges. Here we introduce ant colony optimization (ACO) to address this challenge and propose an ACO-based solution to the computation offloading problem. The proposed method can be readily implemented in practice and has low computational complexity.

2016, 1(1): 15-29 doi: 10.3934/bdia.2016.1.15 +[Abstract](608) +[PDF](547.6KB)
Abstract:
Owing to the elastic, on-demand nature of its resource provisioning, cloud computing provides a cost-effective and powerful technology for processing big data. Under this paradigm, a Data Service Provider (DSP) may rent geographically distributed datacenters to process its large amounts of data. As the data are dynamically generated and resource pricing varies over time, moving the data from different geographic locations to different datacenters while provisioning adequate computation resources to process them is an essential task for a DSP to achieve cost effectiveness. In this paper, a joint online approach is proposed to address this task. We formulate the problem as a joint stochastic optimization problem, which is then decoupled into two independent subproblems via the Lyapunov framework. Our method is able to minimize the long-term time-average cost, including computing cost, storage cost, bandwidth cost and latency cost. Theoretical analysis shows that our online algorithm produces a solution within an upper bound of the optimal solution achieved through offline computing and guarantees that the data processing can be completed within preset delays.

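
To make the Lyapunov idea in this abstract concrete, here is a minimal sketch of the generic drift-plus-penalty rule on a toy problem. It is not the paper's formulation: it assumes a single backlog queue and a single time-varying price, and the names `drift_plus_penalty`, `v` and `max_rate` are hypothetical. The storage, bandwidth and latency costs and the two decoupled subproblems mentioned in the abstract are left out.

```python
def drift_plus_penalty(prices, arrivals, v=2.0, max_rate=5.0):
    """Toy online scheduler with one backlog queue and one time-varying price.

    In each slot, choose the processing amount b in [0, max_rate] that
    minimizes v * price * b - backlog * b (cost penalty plus drift term).
    Only the current price and backlog are used, so no forecast is needed.
    """
    backlog = 0.0
    decisions = []
    total_cost = 0.0
    for price, arrived in zip(prices, arrivals):
        # The objective is linear in b, so the minimizer sits at an endpoint.
        rate = max_rate if backlog > v * price else 0.0
        served = min(rate, backlog)  # cannot process more than is queued
        total_cost += price * served
        backlog = backlog - served + arrived
        decisions.append(served)
    return decisions, backlog, total_cost


if __name__ == "__main__":
    print(drift_plus_penalty(prices=[1.0, 5.0, 1.0], arrivals=[3.0, 3.0, 3.0]))
```

In this sketch a larger `v` puts more weight on cost and lets the backlog grow before data is processed, while a smaller `v` drains the queue sooner at a higher price.
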
2016, 1(1): 31-79 doi: 10.3934/bdia.2016.1.31 +[Abstract](617) +[PDF](870.8KB)
Abstract:
This position paper is based on a major cooperative research and development proposal to form a Big Data Research, Analytics, and Information Network (BRAIN). Challenges presented by Big Data research are introduced and several projects are sketched in four important Big Data research theme areas, the solutions of which will further decision making in these areas of investigation. The four themes are large-scale data analytics and cloud computing, computational biology, health informatics, and interactive content analytics. These theme areas are certainly not exhaustive; rather, they indicate the wide variety of areas in which Big Data now figures in decision analytics. The importance of training highly qualified personnel (HQP), knowledge mobilization and novelty is discussed.

2016, 1(1): 81-91 doi: 10.3934/bdia.2016.1.81 +[Abstract](428) +[PDF](426.3KB)
Abstract:
With the amount of accumulated data reaching the tens of billions, HBase, a distributed key-value database, plays a significant role in providing effective, high-throughput data service and management. However, there is no good solution for applications involving spatio-temporal data, because query processing in HBase is inefficient for them. In this paper, we pose the spatio-temporal keyword search problem for HBase, which is a meaningful issue in real life and a new challenge on this platform. To solve this problem, we design a novel access model for HBase that contains row keys for indexing the spatio-temporal dimensions and Bloom filters for quickly detecting the existence of query keywords. We then develop two algorithms for spatio-temporal keyword queries: one is suitable for queries with ordinary selectivity, and the other is a parallel algorithm based on MapReduce that targets large range queries. We evaluate our algorithms on a real dataset, and the empirical results show that they are capable of handling spatio-temporal keyword queries efficiently.

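
The two ingredients named in this abstract, sortable row keys over space and time plus a Bloom filter over keywords, can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: the key layout in `make_row_key` (time bucket followed by grid cell), the `BloomFilter` parameters, and the dictionary standing in for an HBase table are all hypothetical and are not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def make_row_key(lat, lon, timestamp, cells_per_degree=10, bucket_seconds=3600):
    """Hypothetical row key: zero-padded time bucket, then grid row and column."""
    bucket = int(timestamp) // bucket_seconds
    row = int((lat + 90.0) * cells_per_degree)
    col = int((lon + 180.0) * cells_per_degree)
    return f"{bucket:010d}_{row:04d}_{col:04d}"


def search(store, start_key, end_key, keyword):
    """Scan rows with start_key <= key < end_key, skipping rows the filter rules out."""
    hits = []
    for key in sorted(store):
        if start_key <= key < end_key:
            bloom, records = store[key]
            if bloom.might_contain(keyword):
                # The filter can give false positives, so confirm on the records.
                hits.extend(r for r in records if keyword in r.split())
    return hits
```

Because the keys are zero-padded, lexicographic order matches time order, so a time range becomes a contiguous key range; the Bloom filter then avoids reading rows that cannot contain the keyword.
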
2016, 1(1): 93-109 doi: 10.3934/bdia.2016.1.93 +[Abstract](692) +[PDF](387.2KB)
Abstract:
Entropy weighting used in some soft subspace clustering algorithms is sensitive to the scaling parameter. In this paper, we propose a novel soft subspace clustering algorithm that uses log-transformed distances in the objective function. The proposed algorithm allows users to choose a value of the scaling parameter easily because the entropy weighting in the proposed algorithm is less sensitive to the scaling parameter. In addition, the proposed algorithm is less sensitive to noise because a point far away from its cluster center is given a small weight in the cluster center calculation. Experiments on both synthetic datasets and real datasets are used to demonstrate the performance of the proposed algorithm.

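
The sensitivity mentioned in this abstract can be seen in a small numerical sketch. It assumes the softmax-style weight update common to entropy-weighted k-means variants, where a dimension's weight is proportional to `exp(-dispersion / gamma)`. The `log1p` transform below is only an illustrative stand-in for "log-transformed distances"; it is not the objective function of the proposed algorithm, and the dispersion values are made up.

```python
import math


def entropy_weights(dispersions, gamma):
    """Dimension weights proportional to exp(-dispersion / gamma), summing to 1."""
    smallest = min(dispersions)  # shift by the minimum for numerical stability
    scores = [math.exp(-(d - smallest) / gamma) for d in dispersions]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    raw = [50.0, 400.0, 900.0]             # made-up per-dimension dispersions
    logged = [math.log1p(d) for d in raw]  # illustrative log transform
    for gamma in (1.0, 10.0, 100.0):
        print(gamma, entropy_weights(raw, gamma), entropy_weights(logged, gamma))
```

With raw dispersions and a small `gamma`, nearly all weight collapses onto one dimension, so `gamma` has to be matched to the data scale; compressing the dispersions with a logarithm narrows the range of `gamma` values that matter.
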
2016, 1(1): 111-127 doi: 10.3934/bdia.2016.1.111 +[Abstract](525) +[PDF](5848.4KB)
Abstract:
Since they were recently proposed, curriculum learning (CL) and self-paced learning (SPL) have attracted increasing attention owing to their many successful applications. While the rationale for this learning regime is currently inspired heuristically by human cognitive principles, there is still no sound theory explaining the intrinsic mechanism behind its effectiveness, especially in successful attempts on big/noisy data. To address this issue, this paper presents theoretical results that give insight into this learning scheme. Specifically, we first formulate a new learning problem that aims to learn a proper classifier from samples generated from a training distribution that deviates from the target distribution. Furthermore, we find that the CL/SPL regime provides a feasible strategy for solving this learning problem. In particular, by first introducing high-confidence/easy samples and gradually involving low-confidence/complex ones in learning, the CL/SPL process latently minimizes an upper bound of the expected risk under the target distribution, purely using data from the deviated training distribution. We further construct a new SPL algorithm based on random sampling, which better complies with our theory, and substantiate its effectiveness by experiments on synthetic and real data.

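
The easy-first schedule described in this abstract can be illustrated with the standard hard-threshold form of self-paced learning: alternate between selecting samples whose loss is below an age parameter and refitting on the selected samples, then raise the threshold. The sketch below applies this to a toy problem (estimating a mean under squared loss) rather than to a classifier, and it is not the random-sampling algorithm the paper constructs. The function name and the parameters `lam`, `growth` and `rounds` are hypothetical.

```python
def self_paced_mean(samples, lam=1.0, growth=1.5, rounds=10):
    """Estimate a mean with hard-threshold self-paced learning.

    Each round keeps only the "easy" samples (squared loss below lam),
    refits on them, and then raises lam so harder samples can enter later.
    """
    estimate = sorted(samples)[len(samples) // 2]  # start from a middle value
    for _ in range(rounds):
        losses = [(x - estimate) ** 2 for x in samples]
        selected = [x for x, loss in zip(samples, losses) if loss < lam]
        if selected:
            estimate = sum(selected) / len(selected)
        lam *= growth  # admit progressively harder samples
    return estimate


if __name__ == "__main__":
    data = [0.9, 1.0, 1.1, 1.2, 0.8, 50.0]  # five inliers and one gross outlier
    print(self_paced_mean(data))
```

With these settings the threshold never grows large enough to admit the outlier, so the estimate stays near the inlier mean; a longer schedule would eventually include every sample.
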
2016, 1(1): 129-137 doi: 10.3934/bdia.2016.1.129 +[Abstract](577) +[PDF](289.1KB)
Abstract:
A bias-variance dilemma in categorical data mining and analysis is that a prediction method can aim at maximizing the overall point-hit accuracy either without constraint or under the constraint of minimizing the distribution bias. However, one can hardly achieve both at the same time. A scheme to balance these two prediction objectives is proposed in this article. An experiment with a real data set is conducted to demonstrate some of the scheme's characteristics. Some basic properties of the scheme are also discussed.