## MEMBERS

### Program Members

I was with NEC Corporation from 1989 until March 2006, where I worked on fundamental research in machine learning and data mining, its applications, and the implementation of these findings in products. Since joining Kyushu University in 2006, I have been conducting education and research activities based primarily on fundamental research as well as industrial applications. My emphasis, particularly in education, is on producing graduates with both mathematical knowledge and an engineering sense.

Machine learning is a technology that gives machines the ability to learn in a manner similar to humans. Roughly speaking, machine learning has two major aspects: an information-based one, which concerns extracting as much information as possible from given data, and a computational one, which concerns executing the learning process as quickly as possible. My group focuses primarily on the former (information-based induction science).

Specifically, our research covers the following areas: 1) the Minimum Description Length (MDL) principle, 2) analysis of Bayesian statistics using information geometry, 3) application of anomaly detection to information security, and 4) data mining in time-series analysis.

The MDL principle, initially proposed by Rissanen of IBM in 1978, is based on coding in information theory, and it yields the MDL criterion for model selection, an important problem in statistics and learning theory. Model selection means choosing an appropriate model for a given data set from multiple candidates. For instance, the figure on the right shows the problem of learning a classification rule that separates ○ and × into two regions. In this figure, the model on the left has too many exceptions, i.e., it does not fit the data sufficiently, while the one on the right is too complicated and too sensitive to noise to make predictions on new data. The model in the center is an appropriate one. Under the MDL criterion, we measure the complexity of a model as the "model description length" and the fitness of the model to the data as the "data description length". The MDL criterion then says: select the model for which the sum of the two lengths is minimum. It has been demonstrated that this criterion leads to quick convergence to the true model under certain regularity conditions.
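The two-part criterion above can be sketched in a few lines of code. The following toy example (my own illustration, not a method discussed in the text) chooses between a fixed fair-coin model and a Bernoulli model with one estimated parameter, charging the usual (1/2) log₂ n bits of model description length for encoding that parameter:

```python
import math

def bernoulli_code_length(heads: int, n: int) -> float:
    """Two-part code length in bits: the data description length
    n * H(p_hat) plus a (1/2) * log2(n) model description length
    for the single estimated parameter p_hat."""
    p = heads / n
    if p in (0.0, 1.0):
        data_bits = 0.0  # degenerate ML estimate encodes the data for free
    else:
        data_bits = -n * (p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return data_bits + 0.5 * math.log2(n)

def fair_code_length(n: int) -> float:
    """Fixed fair-coin model: exactly 1 bit per flip, no parameter to encode."""
    return float(n)

def mdl_select(heads: int, n: int) -> str:
    """MDL criterion: pick the model whose total description length
    (model part + data part) is smaller."""
    if bernoulli_code_length(heads, n) < fair_code_length(n):
        return "biased"
    return "fair"
```

With 90 heads in 100 flips, the estimated-parameter model compresses the data enough to pay for its parameter; with 55 heads it does not, so the MDL criterion keeps the simpler fair-coin model.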

Rissanen further developed this concept and established the notion of Stochastic Complexity (SC), which can be formulated as the limiting code length attainable when data are compressed using a model. SC is used not only as a criterion for model selection, but also as a guiding principle for deriving good learning methods.
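In its Bayes-mixture form, this can be written as follows (a standard formulation, not taken verbatim from the text; $p(x^n \mid \theta)$ is the model likelihood and $w$ a prior over the parameter space $\Theta$):

```latex
\mathrm{SC}(x^n) \;=\; -\log \int_{\Theta} p(x^n \mid \theta)\, w(\theta)\, d\theta
```

That is, the stochastic complexity is the code length assigned to the data by the mixture of all distributions in the model class.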

In my work on SC, I collaborated with Barron of Yale University and Kawabata of the University of Electro-Communications to demonstrate that SC can be achieved for exponential families of densities and for the important class of Markov models by the method of Bayes mixtures with the Jeffreys prior, and we further determined their values down to the constant term. It should be noted that a Markov model asymptotically behaves like an exponential family, and therefore these results can be understood from a unified viewpoint. Furthermore, I demonstrated the possibility that this result can be extended to non-exponential families from an information-geometric point of view.
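The quantities involved can be sketched as follows (standard notation, added here for illustration: $I(\theta)$ is the Fisher information matrix, $k$ the parameter dimension, and $\hat{\theta}$ the maximum-likelihood estimate). The Jeffreys prior and the asymptotic code length of its Bayes mixture are

```latex
w_J(\theta) = \frac{\sqrt{\det I(\theta)}}{\int_{\Theta}\sqrt{\det I(\theta')}\,d\theta'},
\qquad
-\log m_J(x^n) = -\log p(x^n \mid \hat{\theta})
  + \frac{k}{2}\log\frac{n}{2\pi}
  + \log\int_{\Theta}\sqrt{\det I(\theta)}\,d\theta + o(1),
```

so the Jeffreys-mixture code length matches the asymptotic form of the stochastic complexity down to the constant term.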

Information geometry is differential geometry on a space of probability distributions, and it is distinctive in that it uses a one-parameter family of connections called the α-connections. In information geometry, an exponential family of distributions is characterized as a subspace whose embedding exponential curvature is zero within the space of all probability distributions. Therefore, the result described earlier means that the code length of the Bayes mixture is closely related to the exponential curvature of the model. More recently, we have also been considering the geometry of subspaces of a Markov model. As another topic in information geometry, I collaborated with Amari to introduce an α-prior (a volume element parallel with respect to the α-connection) that includes the Jeffreys prior as a special case, and to analyze its properties.
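For reference, one common way to write the α-connection is in terms of the log-likelihood $\ell_\theta(x) = \log p(x;\theta)$ (standard information-geometry notation, added here for illustration rather than taken from the text):

```latex
\Gamma^{(\alpha)}_{ij,k}(\theta)
 = \mathbb{E}_\theta\!\left[\left(\partial_i \partial_j \ell_\theta
   + \frac{1-\alpha}{2}\,\partial_i \ell_\theta\, \partial_j \ell_\theta\right)
   \partial_k \ell_\theta\right]
```

For $\alpha = 0$ this is the Levi-Civita connection of the Fisher metric, whose Riemannian volume element $\sqrt{\det I(\theta)}$ is, up to normalization, exactly the Jeffreys prior.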

Our research in application areas is based primarily on MDL and information geometry, and it concentrates on areas where concrete problems exist. While I was with NEC, I collaborated with Yamanishi of NEC to devise a novel anomaly detection method applicable to areas such as incident detection in network security. My group has also contributed to travel time prediction in Intelligent Transport Systems (ITS) in collaboration with Nakata of NEC. Our work in network security has been, and continues to be, conducted in collaboration with the National Institute of Information and Communications Technology (NICT).
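The code-length view of anomaly detection can be illustrated with a minimal sketch (my own toy example with an online Gaussian model; it is not the method developed at NEC): each observation is scored by its negative log density, i.e., its code length, under a model fitted sequentially to the past, so poorly compressible points stand out as anomalies.

```python
import math

def logloss_scores(xs, mu0=0.0, var0=1.0, decay=0.05):
    """Score each observation by its negative log density (code length
    in nats) under a Gaussian whose mean and variance are updated online
    with exponential forgetting; large scores flag anomalies."""
    mu, var = mu0, var0
    scores = []
    for x in xs:
        # code length of x under the current model, before updating
        score = 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)
        scores.append(score)
        # sequential update: move the estimates toward the new point
        mu += decay * (x - mu)
        var = (1 - decay) * var + decay * (x - mu) ** 2
        var = max(var, 1e-6)  # keep the variance strictly positive
    return scores
```

For a sequence of small fluctuations followed by a large outlier, the outlier receives a far larger score than any earlier point.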

I also helped establish the Information-Based Induction Sciences (IBIS) workshops in 1998 and am still involved in their operation. Furthermore, I intend to contribute to Math-for-Industry by clarifying the mathematical structures behind engineering issues surrounding machine learning, establishing universal solutions based on them, and training people who can advance these activities.