R440, Astronomy-Mathematics Building, NTU
(台灣大學天文數學館 440室)
Kernel Regression and Estimation: Learning Theory/Application and Errors-in-Variables Model Analysis
Pei-Yuan Wu (National Taiwan University)
Abstract
Kernel methods play an important role in both supervised and unsupervised machine learning applications such as clustering, classification, regression, and feature selection. Instead of explicitly representing data samples as feature vectors, kernel methods only require a user-defined kernel function that describes the similarity between pairs of data samples.
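As a rough illustration of this idea (a sketch, not material from the talk), the Gaussian RBF kernel below measures the similarity of two samples directly, so a learner only ever needs the n-by-n matrix of pairwise similarities rather than explicit feature vectors; the bandwidth parameter gamma is an arbitrary choice made here for the example.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel: similarity between two samples x and z."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Small illustrative dataset; the kernel never needs explicit feature maps,
# only pairwise similarities collected in an n-by-n Gram matrix.
X = np.random.randn(5, 3)          # 5 samples, 3 raw attributes
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])
print(K.shape)                      # (5, 5)
```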
There are several challenges in applying kernel methods to big data analysis, which is usually characterized by the 5Vs: Volume, Velocity, Variety, Veracity, and Value. Regarding the volume and velocity issues, the Gaussian radial basis function (RBF) kernel, one of the most popular and effective kernels, suffers from a curse-of-dimensionality problem: its learning and classification complexities grow drastically with the size of the training dataset. The first part of the talk is dedicated to the cost-effectiveness issue in kernel-based learning algorithms. A fast kernel ridge regression learning algorithm, combined with a truncated-RBF kernel, is introduced to offer computational advantages over the traditional support vector machine with a Gaussian RBF kernel. Applications to arrhythmia detection and a large-scale free-text active authentication system demonstrate the cost-effectiveness of our approach.
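To make the scaling issue concrete, a plain (non-accelerated) kernel ridge regression baseline with a Gaussian RBF kernel can be sketched as follows; the hyperparameters gamma and lam are placeholders chosen for the example. This is the standard formulation whose n-by-n linear system and training-set-sized prediction cost motivate the faster algorithm discussed in the talk, not the proposed method itself.

```python
import numpy as np

def rbf_gram(X, Z, gamma=1.0):
    """Pairwise Gaussian RBF similarities between rows of X and rows of Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def krr_fit(X, y, gamma=1.0, lam=1e-2):
    """Standard kernel ridge regression: solve an n-by-n linear system,
    roughly O(n^3) in the number of training samples n."""
    K = rbf_gram(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test, gamma=1.0):
    """Prediction cost also grows with the size of the training set."""
    return rbf_gram(X_test, X_train, gamma) @ alpha

X, y = np.random.randn(200, 4), np.random.randn(200)
alpha = krr_fit(X, y)
print(krr_predict(X, alpha, X[:3]))
```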
The second part of the talk is dedicated to the veracity issue, where some of the collected features may be erroneous. This motivates an errors-in-variables analysis for kernel-based estimation theory. The talk will discuss the impact of input noise on nonlinear regression functions through a spectral decomposition analysis, which decomposes a nonlinear function into spectral components, each with its own independent and heterogeneous "robustness" to input noise. Our work establishes a theoretical foundation for selecting nonlinear basis functions under errors-in-variables models so that the estimation error caused by input noise is minimized.
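For orientation, the standard errors-in-variables regression setup alluded to here can be written as below; the notation (f, the basis functions \phi_k, and the noise terms) is chosen for illustration rather than taken from the talk.

```latex
% Errors-in-variables regression: the clean input x is never observed directly.
\begin{align*}
  y         &= f(x) + \varepsilon        && \text{(output noise)} \\
  \tilde{x} &= x + \eta                  && \text{(input/measurement noise)} \\
  f(x)      &= \sum_{k} c_k \, \phi_k(x) && \text{(expansion in basis functions)}
\end{align*}
% Only (\tilde{x}, y) is available to the learner; each component \phi_k can
% respond differently to the input noise \eta, which is why the choice of
% basis functions affects the resulting estimation error.
```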
People who are far away from the NCTS are also welcome to join the forum via Skype.