Learning from Multiple Sources with Applications to Robotics
NIPS Workshop, December 12th 2009, Whistler, Canada
Background

Learning from multiple sources. Traditionally, machine learning research has focused on analyzing the output of a single data source. Often, however, multiple sources of data are available that reflect the same phenomenon or related ones. Recently, much research has been done on data integration in fields such as bioinformatics, multimodal signal processing, and information retrieval. Such settings call for sophisticated machine learning algorithms that find the relevant information shared between the different data sources.

In one traditional setup, features from many data sources are observed together. Such research is inspired, for instance, by the human brain's ability to integrate different sensory input streams into a representation of its environment. Both generative and discriminative approaches have been used for this setup. Early approaches extracted a set of features for each data source by optimizing a dependency criterion, as in canonical correlation analysis (CCA; [9]), its kernel variants [8,13], and methods that optimize mutual information between extracted features [2]. For example, CCA has been used to derive spectral co-regularization methods. In these discriminative approaches, incorporating prior knowledge about the shared information is difficult; one possible solution is a probabilistic generative approach [1,11,14]. Additionally, CCA-type approaches assume that the data consist of independent pairs of related data points drawn from the same distribution, which can be restrictive in real-world problems with complex co-variation. Recently, methods have been introduced that do not need pre-given pairs of points [22].
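
As a rough illustration of the dependency criterion behind CCA [9], the following Python sketch finds one projection per data source such that the two projected features are maximally correlated. The function name `cca_first_pair`, the small ridge term `reg`, and the toy two-view data are assumptions made for illustration; this is not taken from any of the cited implementations.

```python
import numpy as np


def cca_first_pair(X, Y, reg=1e-8):
    """First canonical correlation and weight vectors for paired views X, Y.

    X has shape (n, p) and Y has shape (n, q); row i of X is paired with
    row i of Y. `reg` is a small ridge term (an assumption of this sketch)
    that keeps the covariance matrices invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    # Whiten each view with its Cholesky factor: M = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T

    # Singular values of M are the canonical correlations.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the leading singular vectors back to the original coordinates.
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0])
    return s[0], wx, wy


# Toy data: both views are noisy linear functions of one shared latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ np.array([[1.0, -0.5, 0.3]]) + 0.1 * rng.normal(size=(500, 3))
Y = z @ np.array([[0.8, 0.2]]) + 0.1 * rng.normal(size=(500, 2))

rho, wx, wy = cca_first_pair(X, Y)
print(f"first canonical correlation: {rho:.3f}")
```

Note that the sketch relies on row i of `X` being paired with row i of `Y`, which is exactly the pairing assumption discussed above.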

In transfer learning and multitask learning, models for several learning problems are inferred at the same time. The most traditional scenario has several outputs for the same input (e.g. [5]); more generally, learning is done from separate data sets, where each set (task) is partially relevant to the task(s) of interest, and dependencies between tasks are modeled to transfer information between them [3,4,10,23]. The subfields of domain adaptation [7] and learning under covariate shift [19,20] typically assume that the conditional distribution p(t|x) of the target t given the input x remains unchanged between tasks, while the input distribution p(x) differs.
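
To make the covariate shift assumption concrete: if p(t|x) is shared and only p(x) changes, the expected loss under the target inputs equals the expected loss under the source inputs reweighted by w(x) = p_target(x) / p_source(x). The Python sketch below applies this to a weighted least-squares fit. It assumes, purely for illustration, that both input densities are known one-dimensional Gaussians; in practice the ratio has to be estimated. All function names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weights(x, source=(0.0, 1.0), target=(1.0, 0.5)):
    """w(x) = p_target(x) / p_source(x) for known Gaussian input densities.

    The (mean, std) pairs are assumptions of this sketch.
    """
    return gaussian_pdf(x, *target) / gaussian_pdf(x, *source)


def weighted_linear_fit(x, t, w):
    """Minimize sum_i w_i * (a + b * x_i - t_i)^2 and return (a, b)."""
    A = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], t * sw, rcond=None)
    return coef


# Source inputs come from N(0, 1); the relation t = x^2 is shared by both
# domains, but a linear model cannot represent it, so the best linear fit
# depends on where the inputs lie.
rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=5000)
t_src = x_src ** 2

w = importance_weights(x_src)
coef_plain = weighted_linear_fit(x_src, t_src, np.ones_like(x_src))
coef_shift = weighted_linear_fit(x_src, t_src, w)
print("unweighted (intercept, slope):", coef_plain)
print("importance-weighted (intercept, slope):", coef_shift)
```

The weighted fit uses only source samples, yet it targets the error that would be measured on inputs drawn from the target distribution.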

In multiview learning (e.g. [6]), techniques have been proposed whose performance provably improves with the number of "useful" views. Different formalizations of, and heuristic approaches to, multiview learning include ensemble methods, graphical models, and methods that combine multiple kernels with a co-regularization function. Many of these methods are semi-supervised: unlabeled data are used to build view-dependent co-regularizers.
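
The idea of a co-regularizer built from unlabeled data can be sketched with two linear ridge regressors, one per view, that are penalized for disagreeing on unlabeled points. This is one simple instance chosen for illustration, not the formulation of any specific cited method; the function name `coregularized_ridge` and the parameters `gamma` (ridge strength) and `lam` (disagreement penalty) are assumptions.

```python
import numpy as np


def coregularized_ridge(X1, X2, y, U1, U2, gamma=1.0, lam=1.0):
    """Two-view linear regression with a disagreement penalty.

    Minimizes
        ||X1 w1 - y||^2 + ||X2 w2 - y||^2
        + gamma * (||w1||^2 + ||w2||^2)
        + lam * ||U1 w1 - U2 w2||^2,
    where (X1, X2) are the two views of the labeled points and (U1, U2)
    are the two views of the unlabeled points. Setting the gradient to
    zero gives one linear system in the stacked weights [w1; w2].
    """
    d1, d2 = X1.shape[1], X2.shape[1]
    A11 = X1.T @ X1 + gamma * np.eye(d1) + lam * (U1.T @ U1)
    A22 = X2.T @ X2 + gamma * np.eye(d2) + lam * (U2.T @ U2)
    A12 = -lam * (U1.T @ U2)
    A = np.block([[A11, A12], [A12.T, A22]])
    b = np.concatenate([X1.T @ y, X2.T @ y])
    w = np.linalg.solve(A, b)
    return w[:d1], w[d1:]


# Toy data: few labeled points, many unlabeled ones, two noisy views of
# the same latent signal.
rng = np.random.default_rng(0)


def make_views(n):
    z = rng.normal(size=(n, 1))
    v1 = z @ np.array([[1.0, 0.5]]) + 0.3 * rng.normal(size=(n, 2))
    v2 = z @ np.array([[-0.7, 0.2, 1.0]]) + 0.3 * rng.normal(size=(n, 3))
    return z[:, 0], v1, v2


y, X1, X2 = make_views(10)
_, U1, U2 = make_views(200)

w1, w2 = coregularized_ridge(X1, X2, y, U1, U2, gamma=1.0, lam=10.0)
print("mean squared disagreement on unlabeled data:",
      np.mean((U1 @ w1 - U2 @ w2) ** 2))
```

With `lam=0` the two regressors decouple into independent ridge regressions; increasing `lam` pulls their predictions on the unlabeled points together.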

Robotics. In robotics, the ultimate goal is to develop robots able to do a wide variety of tasks in unconstrained settings, with a degree of robustness and adaptability similar to that of humans. Robots typically use different sensor modalities (vision, audio, range sensors, etc.) to capture information from the environment, so combining these multi-sensor data in a principled way is of the utmost importance. Typical applications of multi-modal learning in robotics include localization and mapping, where the robot dynamically builds a spatial representation from range-sensor, visual, and possibly auditory data, and object grasping and manipulation, where the visual input used to recognize and localize the object must be coordinated with the internal sensorimotor map that determines the arm trajectory and hand posture. Current approaches to these problems are largely heuristic and heavily tailored to the application at hand. Hence, there is a strong need for principled approaches that are well grounded in theory and able to accommodate the varied structure of the sensor data, the intrinsic dynamics of the problem, and the large amount of unlabeled data that is received continuously.

Several approaches to learning from multiple sources have been used in robotics. Transfer learning methods with low computational complexity have been designed to reduce the training sample complexity of regression and classification tasks [17]. Multi-view learning has been used in place recognition, where information from different sensors helps to deal with long-term visual variations in indoor environments [18]. Thrun and Mitchell [16,21] studied the exchange of knowledge between different tasks in the context of artificial neural networks and argued for the importance of knowledge-transfer schemes for lifelong robot learning. Knowledge transfer has also been studied from the perspective of reinforcement learning (RL), including the transfer of learned skills between different RL agents [12,15].

References
  1. Bach, F. R. and Jordan, M. I. 2005. A Probabilistic Interpretation of Canonical Correlation Analysis. Technical Report 688, Dept. of Statistics, University of California, Berkeley.
  2. Becker, S. 1996. Mutual Information Maximization: Models of Cortical Self-Organization. Network: Computation in Neural Systems, 7, 7-31.
  3. Bickel, S., Sawade, C., and Scheffer, T. 2009. Transfer Learning by Distribution Matching for Targeted Advertising. Advances in Neural Information Processing Systems (NIPS).
  4. Bonilla, E. V., Chai, K. M. A., and Williams, C. K. I. 2008. Multitask Gaussian Process Prediction. Advances in Neural Information Processing Systems (NIPS).
  5. Caruana, R. 1997. Multitask Learning. Machine Learning, 28, 41-75.
  6. Christoudias, M., Urtasun, R., and Darrell, T. 2008. Multi-View Learning in the Presence of View Disagreement. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 9 pp.
  7. Daumé III, H. and Marcu, D. 2006. Domain Adaptation for Statistical Classifiers. Journal of Artificial Intelligence Research, 26, 101-126.
  8. Hardoon, D. R., Szedmak, S., and Shawe-Taylor, J. 2004. Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Computation, 16(12), 2639-2664.
  9. Hotelling, H. 1936. Relations between two sets of variates. Biometrika, 28, 321-377.
  10. Kaski, S. and Peltonen, J. 2007. Learning from Relevant Tasks Only. Pages 608-615 of Machine Learning: ECML 2007.
  11. Klami, A. and Kaski, S. 2006. Generative models that discover dependencies between two data sets. Pages 123-128 of Machine Learning for Signal Processing XVI.
  12. Konidaris, G. and Barto, A. G. 2006. Autonomous shaping: knowledge transfer in reinforcement learning. Proceedings of the 23rd International Conference on Machine Learning.
  13. Lai, P. L. and Fyfe, C. 2000. Kernel and Nonlinear Canonical Correlation Analysis. International Journal of Neural Systems 10(5), 365-377.
  14. Leen, G. and Fyfe, C. 2006. A Gaussian Process Latent Variable Model Formulation of Canonical Correlation Analysis. Pages 413-418 of: Proceedings of the 14th European Symposium of Artificial Neural Networks (ESANN).
  15. Malak, R. J., Jr. and Khosla, P. K. 2001. A framework for the adaptive transfer of robot skill knowledge using reinforcement learning agents. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'01).
  16. Mitchell, T. 2006. The discipline of machine learning. Technical Report CMU-ML-06-108, CMU.
  17. Orabona, F., Castellini, C., Caputo, B., Fiorilla, A. E. and Sandini, G. 2009. Model adaptation with least-squares SVM for adaptive hand prosthetics. Proceedings of the International Conference on Robotics and Automation (ICRA).
  18. Pronobis, A., Martínez Mozos, O., and Caputo, B. 2008. SVM-based Discriminative Accumulation Scheme for Place Recognition. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'08).
  19. Storkey, A. J. and Sugiyama, M. 2007. Mixture Regression for Covariate Shift. Advances in Neural Information Processing Systems (NIPS).
  20. Sugiyama, M., Krauledat, M., and Müller, K.-R. 2007. Covariate Shift Adaptation by Importance Weighted Cross Validation. Journal of Machine Learning Research, 8, 985-1005.
  21. Thrun, S., and Mitchell, T. 1995. Lifelong robot learning. Robotics and Autonomous Systems 15.
  22. Tripathi, A., Klami, A., and Kaski, S. 2009. Using dependencies to pair samples for multi-view learning. Pages 1561-1564 of Proceedings of ICASSP 2009.
  23. Zhang, J., Ghahramani, Z., and Yang, Y. 2008. Flexible Latent Variable Models for Multitask Learning. Machine Learning, 73(3), 221-242.
Contact Persons

For questions about the workshop, contact David R. Hardoon at D.Hardoon AT cs.ucl.ac.uk.
For questions about the website, contact Simon Rogers at srogers AT dcs.gla.ac.uk.