SSPNet Summer School
on Social Signal Processing

Monday, June 3rd 2013

Afternoon Session (14.30-18.00)

Alessandro Vinciarelli

Social Signal Processing: an Introduction

The goal of this course is to provide a general introduction to Social Signal Processing (SSP), the domain aimed at the modelling, analysis and synthesis of nonverbal communication in social interactions. The first part of the course will introduce the core concepts of SSP, in particular nonverbal communication and its relationship with computing technologies. The second part will show examples of the methodologies typically applied in SSP, from the collection of data to the experiments and their interpretation. The third part will highlight recent SSP trends as well as the most important open issues of the domain. Furthermore, it will introduce the overall design of the school and the different teachers and courses.

A.Vinciarelli, M.Pantic, and H.Bourlard, Social Signal Processing: Survey of an Emerging Domain, Image and Vision Computing Journal, vol. 27, no. 12, pp. 1743-1759, 2009.

A.Vinciarelli, M.Pantic, D.Heylen, C.Pelachaud, I.Poggi, F.D'Errico and M.Schroeder, Bridging the Gap Between Social Animal and Unsocial Machine: A Survey of Social Signal Processing, IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 69-87, 2012.

G.Mohammadi and A.Vinciarelli, Automatic Personality Perception: Prediction of Trait Attribution Based on Prosodic Features, IEEE Transactions on Affective Computing, vol. 3, no. 3, pp. 273-284, 2012.

A.Vinciarelli, Capturing Order in Social Interactions, IEEE Signal Processing Magazine, vol. 26, no. 5, pp. 133-137, September 2009.

H.Salamin, S.Favre, and A.Vinciarelli, Automatic Role Recognition in Multiparty Recordings: Using Social Affiliation Networks for Feature Extraction, IEEE Transactions on Multimedia, vol. 11, no. 7, pp. 1373-1380, 2009.

Tuesday, June 4th 2013

Morning Session (9.00-12.30)

Kerstin Dautenhahn

The course will cover the following topics:
1) The first part will introduce foundational issues in Human-Robot Interaction (HRI) research, with a particular emphasis on companion robots. Key concepts and methodological issues will be covered.
2) The second part will provide examples of recent HRI research trends with companion robots, with a focus on assistive applications, i.e. applications in which companion robots are meant to provide physical, cognitive and social assistance (e.g. to elderly people) or are used as tools in robot-assisted therapy for children with autism.
3) The third part will discuss key research challenges, open issues and future trends.

M.A. Goodrich and A.C. Schultz, Human-Robot Interaction: A Survey, Foundations and Trends in Human-Computer Interaction, vol. 1, no. 3, pp. 203-275, 2007.

K. Dautenhahn, Socially intelligent robots: dimensions of human-robot interaction, Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 362, no. 1480, pp. 679-704, 2007.

T. Fong, I. Nourbakhsh and K. Dautenhahn, A Survey of Socially Interactive Robots, Robotics and Autonomous Systems, vol. 42, no. 3-4, pp. 143-166, 2003.

KASPAR Project


Afternoon Session (14.30-18.00)

Adam Kendon

Communication conduct in co-present interaction: Three Lectures

The course will cover the following topics:
1) Varieties and nature of co-presence. We will discuss the varieties of co-presence, how they recruit different communication channels, the different ways in which persons provide information for each other, and the different kinds of participation. We will also cover the distinction between focused and unfocused interaction and the spatial organization of gatherings.
2) Structures of participation in focused interaction. The organisation of occasions of focused interaction and how participation frameworks may be organised. A discussion of how participants establish interactional axes or address reciprocals with one another and the forms of visible bodily action that are involved in this.
3) Modality orchestration in utterance production. Communicatively explicit acts or "utterances" in co-present interaction commonly involve the mobilisation of different modalities which function at several different semiotic levels simultaneously. We will examine examples to illustrate how visible bodily actions - head movements, movements of the hands, positioning of the body, and so forth - articulate with units of speech and discuss the significance of such "multimodal orchestration" for the understanding of language and interaction.

A. Kendon, Erving Goffman's approach to the study of face-to-face interaction, In A. Wootton and P. Drew (eds.), Erving Goffman: Exploring the Interaction Order, Cambridge: Polity Press, pp. 14-40, 1988.

A. Kendon, Behavioral foundations for the process of frame attunement in face-to-face interaction, In G.P. Ginsburg, M. Brenner, and M. von Cranach (eds.), Discovery Strategies in the Psychology of Action, London: Academic Press, pp. 229-253, 1985.

A. Kendon, Spacing and orientation in co-present interaction, In A. Esposito, N. Campbell, C. Vogel, A. Hussein and A. Nijholt (eds.), Development of Multimodal Interfaces: Active Listening and Synchrony, Second COST 2102 International Training School, Springer Verlag, pp. 1-15, 2010.

A. Kendon, Kinesic Components of Multimodal Utterances, Berkeley Linguistics Society Proceedings 2009.

A. Kendon, Some topics in Gesture Studies, In A. Esposito, M. Bratanic, E. Keller and M. Marinaro (eds.), The fundamentals of verbal and non verbal communication and the biometrical issues, IOS Press BV for NATO SERIES PUBLISHING, pp. 3-19, 2007.

Wednesday, June 5th 2013

Morning Session (9.00-12.30)

Nicu Sebe

Human Centered Computing (HCC) is an emerging field that aims at bridging the existing gaps between the various disciplines involved with the design and implementation of computing systems that support people's activities. HCC aims at tightly integrating human sciences (e.g. social and cognitive) and computer science (e.g. human-computer interaction (HCI), signal processing, machine learning, and computer vision) for the design of computing systems with a human focus from beginning to end. This course will address the existing challenges in HCC and will focus on real-time and robust solutions for eye detection and tracking, head pose estimation and their applications to gaze estimation, attention detection and personality.

R. Valenti, N. Sebe, T. Gevers, Combining Head Pose and Eye Location Information for Gaze Estimation, IEEE Transactions on Image Processing, vol. 21, pp. 802-815, 2012.

R. Valenti, N. Sebe, T. Gevers, What are you looking at? Improving Visual Gaze Estimation by Saliency, International Journal of Computer Vision, vol. 98, pp. 324-334, 2012.

B. Lepri, R. Subramanian, K. Kalimeri, J. Staiano, F. Pianesi, N. Sebe, Connecting Meeting Behavior with Extraversion - A Systematic Study, IEEE Transactions on Affective Computing, 2013 (to appear).

H. Joho, J. Staiano, N. Sebe, J. Jose, Looking at the viewer: analysing facial activity to detect personal highlights of multimedia contents, Multimedia Tools and Applications, vol. 51, pp. 505-523, 2011.

Afternoon Session (14.30-18.00)

Philippe Schyns

The face expresses a number of signals that the brain can code within a few hundred milliseconds. Amongst these, facial expressions of emotion have been of particular biological importance for the survival of the species. Here, I will discuss the state of the art in understanding what information in the face represents each of the six basic facial expressions of emotion (i.e. happy, surprise, fear, disgust, anger and sadness). I will then review the dynamics of cortical coding of this information, both from event-related potentials and from oscillatory activity. Finally, I will discuss a new approach that generalises the extraction of information to dynamically rendered three-dimensional faces.

R.E. Jack, O.G.B. Garrod, H. Yu, R. Caldara and P.G. Schyns, Facial expressions of emotion are not culturally universal, Proceedings of the National Academy of Sciences of the United States of America, vol. 109, pp. 7241-7244, 2012.

P.G. Schyns, G. Thut, and J. Gross, Cracking the code of oscillatory activity, PLoS Biology, vol. 9, no. 5, e1001064, 2011.

N. Van Rijsbergen and P.G. Schyns, Dynamics of trimming the content of face representations for categorization in the brain, PLoS Computational Biology, vol. 5, no. 11, e1000561, 2009.

P.G. Schyns, L. Petro, and M.L. Smith, Dynamics of visual information integration in the brain to categorize facial expressions, Current Biology, vol. 17, pp. 1580-1585, 2007.

M.L. Smith, G.W. Cottrell, F. Gosselin and P.G. Schyns, Transmitting and decoding facial expressions, Psychological Science, vol. 16, pp. 184-189, 2005.

R. Adolphs, F. Gosselin, T.W. Buchanan, D. Tranel, P.G. Schyns, A.R. Damasio, A mechanism for impaired fear recognition after amygdala damage, Nature, vol. 433, pp. 68-72, 2005.

Thursday, June 6th 2013

Morning Session (9.00-12.30)

Jan de Ruiter

Multimodal Human-Human Communication

The course will consist of three parts: a) theory, b) models, and c) data.

In the first part I will lay some conceptual foundations by addressing the central question: what is communication? This seemingly trivial question is in fact difficult to answer. For instance, how do we distinguish it from other forms of interaction, like gravity? Important here are the differences between symptoms and signals, and the corresponding notions of manipulation, exploitation, and intentionality.

In the second part I will discuss the famous communication model by Shannon, and why and how this model fails to capture the essential properties of the communication between intentional agents like humans. Relevant here is also the discussion about the so-called “conduit metaphor”. Further models I will address are the Interactive Alignment model, and the theory of non-natural meaning by Grice.

In the third and final part, I will talk about some empirical studies and debates. I will address nonverbal communication, and the popular myth that our communication relies to a very large degree on nonverbal (or better, 'non-linguistic') communication. I will also discuss the communicative aspects (or lack thereof) of facial expressions and different types of speech-related gesture. To what degree are these modalities communicative, and what research methods can we use to find out more?

A.J. Fridlund, Evolution and facial action in reflex, social motive, and paralanguage, Biological Psychology, vol. 32, pp. 3-100, 1991.

H.P. Grice, Meaning, Philosophical Review, vol. 66, pp. 377-388, 1957.

A. Kendon, Do gestures communicate?: A review, Research on Language and Social Interaction, vol. 27, no. 3, pp. 175-200, 1994.

R.M. Krauss, W. Apple, N. Morency, C. Wenzel and W. Winton, Verbal, vocal, and visible factors in judgments of another's affect, Journal of Personality and Social Psychology, vol. 40, pp. 312-319, 1981.

M.J. Pickering and S. Garrod, Toward a mechanistic psychology of dialogue, Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169-226, 2004.

M.J. Reddy, The conduit metaphor. A case of frame conflict in our language about language, In A. Ortony (Ed.), Metaphor and Thought, pp. 284-310, Cambridge University Press, 1979.

C.E. Shannon, A mathematical theory of communication, Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948.

Afternoon Session (14.30-18.00)

Bjoern Schuller

The Researcher's Guide to Challenge one's Community

Research Challenges hold the promise of providing unified test-beds for evaluation and exchange of ideas across the community on specific given tasks. Provided they are properly defined, they may help overcome the frequent lack of comparability of findings that results from different data-sets, partitioning, evaluation measures, and many further conditions that can vary. If successfully organised, they may serve as a long-standing reference that helps to advance research in their field. This lecture aims to provide a tutorial overview of how to design and hold such Challenges. The discussion is based on the presenter's experience in organising four consecutive first-of-their-kind Challenges held at INTERSPEECH from 2009 to 2012, dealing with various aspects of Computational Paralinguistics such as emotion and personality of speakers, and the first two Audio/Visual Emotion Challenges, as well as participation in many further related events such as CHiME, MediaEval or MIREX. The interactive presentation follows the time-line from task preparation and sponsor acquisition, through proposal and advertising, to result collection, holding of the actual Challenge, awarding, and post-event activities such as dissemination and editing of related Special Issues. Details will be discussed for each of these steps, focussing on the domain of Social Signal Processing and in particular touching on the provision of features, baselines, partitioning, evaluation measures, fusion and analysis of participants' results. Examples will include a wider selection of typical evaluation campaigns held in the broader field.

Key references

B. Schuller, "The Computational Paralinguistics Challenge," IEEE Signal Processing Magazine, vol. 29, pp. 97-101, 2012.

B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. Mueller, and S. Narayanan, "Paralinguistics in Speech and Language - State-of-the-Art and the Challenge," Computer Speech and Language, Special Issue on Paralinguistics in Naturalistic Speech and Language, vol. 27, pp. 4-39, 2013.

B. Schuller, A. Batliner, S. Steidl, and D. Seppi, "Recognising Realistic Emotions and Affect in Speech: State of the Art and Lessons Learnt from the First Challenge," Speech Communication, Special Issue on Sensing Emotion and Affect - Facing Realism in Speech Processing, vol. 53, pp. 1062-1087, 2011.

B. Schuller, S. Steidl, A. Batliner, E. Noth, A. Vinciarelli, F. Burkhardt, R. van Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi, and B. Weiss, "The INTERSPEECH 2012 Speaker Trait Challenge," in Proceedings INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, (Portland, OR), ISCA, 2012.

B. Schuller, M. Valstar, R. Cowie, and M. Pantic, "AVEC 2012 - The Continuous Audio/Visual Emotion Challenge," in Proceedings of the 14th ACM International Conference on Multimodal Interaction, ICMI, (Santa Monica, CA), pp. 449-456, ACM, 2012.

Friday, June 7th 2013

Morning Session (9.00-12.30)

Louis-Philippe Morency

Human face-to-face communication is a little like a dance, in that participants continuously adjust their behaviors based on verbal and nonverbal displays and signals. Human interpersonal behaviors have long been studied in linguistics, communication, sociology and psychology. Recent advances in machine learning, pattern recognition and signal processing have enabled a new generation of computational tools to analyze, recognize and predict human communication behaviors during social interactions. This new research direction has broad applicability, including the improvement of human behavior recognition, the synthesis of natural animations for robots and virtual humans, the development of intelligent tutoring systems, and the diagnosis of social disorders (e.g., autism spectrum disorder).

The objectives of this course are: (1) to give a general overview of human communicative behaviors (language, vocal and nonverbal) and show a parallel with computer science subfields (natural language processing, speech processing and computer vision); (2) to understand the multimodal challenge of human communication (e.g. speech and gesture synchrony) and learn about multimodal signal processing; and (3) to understand the social aspect of human communication and its implications for statistical and probabilistic modeling.

L.-P. Morency, Computational Study of Human Communication Dynamics, Proceedings of International Workshop on Social Signal Processing (SSPW2011), pp. 13-18, 2011.

L.-P. Morency, R. Mihalcea and P. Doshi, Towards Multimodal Sentiment Analysis: Harvesting Opinions from The Web, Proceedings of the International Conference on Multimodal Interfaces, pp. 169-176, 2011.

L.-P. Morency, I. de Kok and J. Gratch, Context-based Recognition during Human Interactions: Automatic Feature Selection and Encoding Dictionary, International Conference on Multimodal Interfaces, pp. 181-188, 2008.

L.-P. Morency, J. Whitehill and J. Movellan, Generalized Adaptive View-based Appearance Model: Integrated Framework for Monocular Head Pose Estimation, International Conference on Automatic Face and Gesture Recognition, pp. 1-8, 2008.

L.-P. Morency, A. Quattoni and T. Darrell, Latent-Dynamic Discriminative Models for Continuous Gesture Recognition, Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2007.
