The Talking Thinking Teaching Head

From Talking Heads to Teaching Heads

Towards the Total Turing Test

How can real-time interaction between humans and machines be made significantly more effective? The Thinking Head team takes current research on Talking Heads into the realm of Thinking Heads, in the process addressing a range of fundamental interdisciplinary issues about verbal-aural communication, the most efficient of human communication systems. The approach is novel in its integration of best-practice Talking Head science and technology with careful analysis and evaluation from the perspective of cognitive science to create a tight feedback loop for Thinking Head development and elaboration.

The Head X Research Platform is one of Flinders AILab's major research outputs for this project: a high-fidelity Thinking Head that is freely available to other researchers, both for basic research that further develops the individual technologies in the pipeline and for applied research, where our own applications focus on Assistive and Educational Technology.

The research and technology are relevant to human-machine communication, telecommunications, e-commerce, and mobile phone technology; to personalised aids for disabled users, the hearing impaired, the elderly, and children with learning difficulties; and to foreign language learning. They will also facilitate the development of animation in new media, film, and in particular games. The various Heads have been demonstrated widely, and public visibility for the project has been boosted by high-profile installations and exhibitions, including the Arts Festival preceding the Beijing Olympics, and a permanent display, as well as occasional robotic displays, at the Powerhouse Museum in Sydney.

The Thinking Head incorporates components focussed on dialogue management, speech generation and speech understanding. At the same time, the project seeks to move beyond the current engineering orientation to explore the evolution of interactive behaviour and the role of emotion and facial gestures in communication. The ability of the Thinking Head to display and understand emotions and gestures is being explored in association with performance artists and technologists at our partner institutions, and is leading to increased understanding of how to produce realistic animation models for the game and movie industries. In a multiyear interactive museum display, a large projection screen was used to display word associations while "colouring" the ambience to match the emotions being expressed.
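
At its core this is a perceive-decide-act loop over those components. The Python sketch below shows the shape of such a pipeline; every class and method name is an illustrative placeholder, not the actual Thinking Head or Head X API.

# A minimal sketch of the dialogue loop described above. All names are
# illustrative placeholders, not the actual Thinking Head or Head X API.

class ThinkingHeadPipeline:
    def __init__(self, recognizer, dialogue_manager, synthesizer, face):
        self.recognizer = recognizer              # speech understanding
        self.dialogue_manager = dialogue_manager  # dialogue management
        self.synthesizer = synthesizer            # speech generation
        self.face = face                          # facial animation and emotion

    def step(self, audio_frame):
        """Run one turn of the interaction loop."""
        utterance = self.recognizer.transcribe(audio_frame)
        reply, emotion = self.dialogue_manager.respond(utterance)
        self.face.express(emotion)  # e.g. colour the ambience to match
        return self.synthesizer.speak(reply)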

Future directions for the Talking Head will incorporate and extend the Flinders University Lip Reading and Audio-Visual Speech Recognition technology developed by Prof. David Powers and Dr Trent Lewis, integrated with Auditory Speech Recognition and Speech Synthesis technology from Carnegie Mellon University in partnership with A/Prof. Alan Black and Dr Tanja Schultz. We are also starting to use EEG to monitor subjects interacting with the Thinking Head, both to understand their learning and engagement with the technology and to develop a Hybrid Audio-Visual Brain-Computer Interface that uses multimodal input to improve speech understanding.
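
A simple way to see what audio-visual integration buys is decision-level (late) fusion, where the audio and visual recognizers' scores are combined with a stream weight reflecting how reliable each channel is. The sketch below is a generic weighted-fusion baseline under that assumption, not the specific weighting measures of our published work.

import numpy as np

def fuse_av_scores(audio_logp, visual_logp, alpha=0.7):
    """Late fusion of per-class log-probabilities from the audio and
    visual recognizers. alpha weights the audio stream and would be
    lowered when the acoustic channel is noisy."""
    fused = alpha * audio_logp + (1.0 - alpha) * visual_logp
    return int(np.argmax(fused))  # index of the winning phoneme/word

# Toy example: three candidate phonemes; audio favours 0, video favours 2.
audio = np.log(np.array([0.6, 0.3, 0.1]))
visual = np.log(np.array([0.2, 0.2, 0.6]))
print(fuse_av_scores(audio, visual, alpha=0.5))  # prints 0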

KIT has an associated program in Evolutionary Robotics and Natural Language Learning, building on Prof. Powers' Robot Baby and Language Learning research as well as the research of Dr Martin Luerssen and Dr Richard Leibbrandt on Grammar Evolution and the induction of part-of-speech categories from child-directed speech (CHILDES). This program seeks to evolve improved architectures and develop the adaptability required to deal with changing social, linguistic and environmental conditions. Another way of looking at this is that we are looking to develop a system that can pass the Total Turing Test, or TTT. Turing felt that to pass his Imitation Game, the traditional "pen pal" Turing Test, a computer would need to actually learn as a robot and deal with the real world and its social and cultural context. This includes behaving and acting in a way that is indistinguishable from humans, and thus also addressing Human Computer Interaction at the level of gestures, emotions and expressions – see The Role of Emotion (sidebar Feature on this page). In fact, Harnad and Schweizer have each proposed higher levels of indistinguishability, or TTTTs: such Total Total Turing Tests or Truly Total Turing Tests impose even stronger conditions.
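
To make the induction side concrete, the toy sketch below clusters words by their left and right contexts, the distributional cue that underlies induction of part-of-speech categories from corpora such as CHILDES. The miniature corpus and parameters are invented for illustration and do not reproduce the published method.

import numpy as np
from sklearn.cluster import KMeans

# Toy distributional part-of-speech induction: words that appear in
# similar left/right contexts end up in the same cluster. The corpus
# here is a tiny stand-in for child-directed speech such as CHILDES.
corpus = "the dog sees the cat . a dog bites a bone . the cat sees a dog .".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

# Each word's feature vector counts its left and right neighbours.
vectors = np.zeros((len(vocab), 2 * len(vocab)))
for i, w in enumerate(corpus):
    if i > 0:
        vectors[index[w], index[corpus[i - 1]]] += 1
    if i < len(corpus) - 1:
        vectors[index[w], len(vocab) + index[corpus[i + 1]]] += 1

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)
for cluster in range(3):
    print(cluster, [w for w, l in zip(vocab, labels) if l == cluster])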

 

Headline Projects/Products/Acronyms

CleverMe/Clevertar – A clever friend is always close by

  Open Day Buddy, Clevertar Connect – a companion for elderly residents at Australia's Award-Winning Retirement Village, On Statenborough.

What else? An Android Real Estate Agent, and many more to come...
You haven't seen the last of this Clever Avatar!

 

™Head 0+ – In-house workhorse for the Thinking Head project

  THEMAC - Flinders University Open Day MC

UWS/Powerhouse Articulated Head and Adopt-a-Robot

 

™Head X – Flinders' Freely Available Customizable Virtual Head

™PETA: Pedagogical Embodied Teaching Agent
(Languages and Literacy, Health and Motivational Interviewing;
German with TUB, Berlin, and Mandarin with BJUT, Beijing)

™VALIANT: Virtual Autonomous Literacy, Indigenous-health And Numeracy Tutor
(Reading, Arithmetic, Basic Health Principles - with NACCHO and CDU)

AVAST: Autonomous Virtual Agent for Social Tutoring
(with the assistance of Novitatech, DeafCanDo and AutismSA)

MANA: Memory, Appointment & Navigation Assistant
(with AlzheimersSA, Resthaven, MDPP, OFTA and Google Calendar)

MAGICian: Multiple Autonomous Ground Vehicles International Challenge
by ™ian – Innovative Autonomous Navigators with Individual Accent & Nationality
(funded by the US Air Force Research Laboratory, FA2386-10-4024)

[Figure: The six basic emotions as displayed by a Head X character.]

™WebHead – web-delivered relative of Head X and Clevertar

MANA and Motivational Interviewing are likely to be the first web apps.

AIML/ALICE bots can be ported trivially to Head 0+, Head X or WebHead.
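
As a sketch of how such a port might look in Python: the kernel calls below are the real python-aiml API, while speak_through_head() is a hypothetical stand-in for whichever text-to-speech entry point the target head exposes.

import aiml  # the python-aiml package

kernel = aiml.Kernel()
kernel.learn("alice.aiml")  # any standard AIML/ALICE rule file

def speak_through_head(text):
    """Hypothetical hook: hand the bot's reply to the head's TTS."""
    print("HEAD SAYS:", text)  # placeholder for the real synthesis call

while True:
    line = input("> ")
    if not line:
        break
    speak_through_head(kernel.respond(line))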

  

™STANLIE: System To Analyze Natural Language In Environments

Magrathea, Microjaea and Hybrid World

 

Recent Publications


Books

Luerssen, M.H., 2009. Experimental investigations into graph grammar evolution: a novel approach to evolutionary design, Saarbrücken, Germany: VDM Verlag Dr. Müller.

Book chapters

Anderson, T.A., Chen, Z., Wen, Y., Milne, M.K., Atyabi, A., Treharne, K., Matsumoto, T., Jia, X., Luerssen, M.H., Lewis, T.W., et al., 2012. Thinking Head MulSeMedia: A Storytelling Environment for Embodied Language Learning. In Multiple Sensorial Media Advances and Applications: New Developments in MulSeMedia. Hershey, Pennsylvania: IGI Global, pp. 182-203.

Milne, M.K., Luerssen, M.H., Leibbrandt, R.E., Lewis, T.W., & Powers, D.M., 2011. Embodied Conversational Agents for Education in Autism. In A Comprehensive Book on Autism Spectrum Disorders. Rijeka, Croatia: InTech, pp. 387-412.

Milne, M.K., Luerssen, M.H., Lewis, T.W., Leibbrandt, R.E., & Powers, D.M., 2011. Designing and Evaluating Interactive Agents as Social Skills Tutors for Children with Autism Spectrum Disorder. In Conversational Agents and Natural Language Interaction: Techniques and Effective Practices. Hershey, USA: IGI Global, pp. 23-48.

Luerssen, M.H. & Powers, D.M., 2009. An empirical study of graph grammar evolution. In Evolutionary Computation. Vukovar, Croatia: In-Tech, pp. 445-472.

Refereed journal articles

Luerssen, M.H. & Powers, D.M., 2008. Evolving encapsulated programs as shared grammars. Genetic Programming and Evolvable Machines, 9(3), 203-228.

Lewis, T.W. & Powers, D.M., 2003. Audio-Visual Speech Recognition using Red Exclusion and Neural Networks. Journal of Research and Practice in Information Technology, 35(1), 41-64.

Refereed conference papers

Cottrell, J., Fitzgibbon, S.P., Lewis, T.W., & Powers, D.M., 2012. Investigating a Gaze-Tracking Brain Computer Interface Concept Using Steady State Visually Evoked Potentials. Spring World Congress on Engineering and Technology.

Ali, H.B., Powers, D.M., Leibbrandt, R.E., & Lewis, T.W., 2011. Comparison of Region Based and Weighted Principal Component Analysis and Locally Salient ICA in Terms of Facial Expression Recognition. Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2011, 368, 81-89.

Burnham, D.K., Estival, D., Fazio, S., Viethen, J., Cox, F.M., Dale, R., Cassidy, S., Epps, J.R., Togneri, R., Wagner, M., et al., 2011. Building an audio-visual corpus of Australian English: large corpus collection with an economical portable and replicable Black Box. Interspeech 2011, Proceedings of the 12th Annual Conference of the International Speech Communication Association, 841-844.

Powers, D.M., Luerssen, M.H., Lewis, T.W., Leibbrandt, R.E., Milne, M.K., Pashalis, J., & Treharne, K., 2010. MANA for the Ageing. Proceedings of the 2010 Workshop on Companionable Dialogue Systems, ACL 2010, 7-12.

Newman, W., Franzel, D., Matsumoto, T., Leibbrandt, R.E., Lewis, T.W., Luerssen, M.H., & Powers, D.M., 2010. Hybrid World Object Tracking For A Virtual Teaching Agent. Proceedings of the International Joint Conference on Neural Networks 2010, 2244-2252.

Luerssen, M.H., Lewis, T.W., & Powers, D.M., 2010. Head X: Customizable Audiovisual Synthesis for a Multi-purpose Virtual Head. AI 2010: Advances in Artificial Intelligence, 486-495.

Milne, M.K., Luerssen, M.H., Lewis, T.W., Leibbrandt, R.E., & Powers, D.M., 2010. Development of a Virtual Agent Based Social Tutor for Children with Autism Spectrum Disorders. Proceedings of the International Joint Conference on Neural Networks 2010, 1555-1563.

Lewis, T.W. & Powers, D.M., 2008. Distinctive Feature Fusion for Recognition of Australian English Consonants. Interspeech 2008, 2671-2674.

Powers, D.M., Leibbrandt, R.E., Pfitzner, D.M., Luerssen, M.H., Lewis, T.W., Stevens, K., & Abrahamyan, A., 2008. Language Teaching in a Mixed Reality Games Environment. 1st International Conference on Pervasive Technologies Related to Assistive Environments, PETRA 2008, 282.

Leibbrandt, R.E., Luerssen, M.H., Matsumoto, T., Treharne, K., Lewis, T.W., Santi, M.L., & Powers, D.M., 2008. An immersive game-like teaching environment with simulated teacher and hybrid world. Animation, multimedia, IPTV and edutainment: proceedings of CGAT '08, 215-222.

Lewis, T.W. & Powers, D.M., 2005. Distinctive feature fusion for improved audio-visual phoneme recognition. Proceedings of the 8th International Symposium on Signal Processing and Its Applications, 1(1580196), 62-65.

Lewis, T.W. & Powers, D.M., 2004. Sensor Fusion Weighting Measures in Audio-Visual Speech Recognition. Proceedings of the Twenty-Seventh Australasian Computer Science Conference (ACSC2004), Dunedin, New Zealand. CRPIT, 26 (1).

 

Further Information

We would be pleased to supply further information about our activities. In particular, opportunities exist for high-achieving postgraduates to join the program. Please contact Professor David Powers.

 

Available Products

Downloads

  • Clipsal Homespeak (I2Net Orion) – Control your home or C-Bus installation by talking. Initially developed by undergraduates, it has proved popular with people with disabilities.

  • Head X – Free Customizable Virtual Head developed under the ARC Thinking Systems SRI, "From Talking Heads to Thinking Heads". Free download for Windows systems (also tested on Mac virtual machines).


Recent Successes

  • Thinking Head: Burnham, D.K., Dale, R., Stevens, C.J., Powers, D.M., Davis, C.W., Buchholz, J.M., Kuratate, T., Kim, J., Paine, G.C., Kitamura, C.M., Wagner, M., Moeller, S., Black, A.W., Schultz, T. and Bothe, H.H. (2006-2010). From Talking Heads to Thinking Heads: A Research Platform for Human Communication. ARC Thinking Systems: $3,400,000.
  • AusTalk : Burnham, D., Cox, F., Butcher, A., Fletcher, J., Wagner, M., Epps, J., Ingram, J., Arciuli, J., Togneri, R., Rose, P., Kemp, N., Cutler, A., Dale, R., Kuratate, T., Powers, D., Cassidy, S., Grayden, D., Loakes, D., Bennamoun, M., Lewis, T., Goecke, R., Best, C., Bird, S., Ambikairajah, E., Hajek, J., Ishihara, S., Kinoshita, Y., Tran, D., Chetty, G. and Onslow, M. (2010). The Big Australian Speech Corpus: An audio-visual speech corpus of Australian English. ARC LIEF: $650,000.

People

Feature

The Role of Emotion

 

What does it take to think, talk and act like an ordinary person? This remains one of the great challenges of Artificial Intelligence and Cognitive Science. It is easier to produce a champion chess-playing program, or to provide university-level advice on any subject, than it is to duplicate the capabilities of a typical two-year-old.

The Turing Test focuses on the brain as a computer that communicates in normal language, and in 1950 Alan Turing predicted that by the year 2000 a computer would fool 30% of the people who talked to it for five minutes into thinking it was a person. This was actually achieved by the winner of the Loebner Prize at the annual competition held at Flinders in 1998!

However, Turing thought that real intelligence required sensors and the ability to understand the world, and we built this condition into the requirements for the Loebner Prize Gold Medal - a kind of show-and-tell aspect to the Turing Test. Harnad talks about the Total Turing Test, which requires not just sensors but robotic capabilities: the ability to interact with the world and to learn about the world and society, and their laws, at the same time as you learn about language and its laws (otherwise known as grammar). This is the focus of KIT and its AI/LT Lab. But Harnad goes on to talk about the Total Total Turing Test - that is, physical indistinguishability. Why on earth would we need this?

Language and intelligence are caught up with every aspect of who we are and how we interact with the world, our fellow humans, and the rest of creation. They are caught up with our drives and our feelings, our hungers and our pains, and unless the computer, robot or android has the same physical structure as us, the best we can do is attempt to simulate all these things. Silicon-based "lifeforms" may indeed be possible, and androids that are superficially human-like may indeed be created by us, and may indeed be intelligent, but they are unlikely to be mistaken for humans for very long. Indeed, we find it unsettling to talk to, or even just watch, a human-like being that somehow doesn't quite gel as being human - the so-called Uncanny Valley effect.

A major part of our research effort thus goes into exploring the gestures, expressions and emotions that colour our conversations and convey information beyond the mere words of a simple Talking Head. Having a Teaching Head show appropriate rather than neutral expression can lead to students learning much more, and achieving on average a grade point higher! Of course, the flip side of putting emotional expression into our faces and speech is recognizing emotional expression in the speech and faces we see and hear. This is also the capability which, when missing, leads to a diagnosis of autism - for our AIs!
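
As one illustration of the recognition side, the sketch below classifies the six basic emotions from face images with a plain eigenface baseline: project flattened images onto principal components, then pick the nearest class centroid. The random arrays stand in for real face data, and this generic baseline is not the region-based PCA/ICA method from our publications.

import numpy as np

rng = np.random.default_rng(0)
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
X_train = rng.random((60, 32 * 32))    # 60 flattened 32x32 face crops (placeholder data)
y_train = np.repeat(np.arange(6), 10)  # 10 training examples per emotion

mean = X_train.mean(axis=0)
# Principal components from the SVD of the centred training data.
_, _, components = np.linalg.svd(X_train - mean, full_matrices=False)

def project(X):
    """Project face vectors onto the top 20 eigenfaces."""
    return (X - mean) @ components[:20].T

centroids = np.stack([project(X_train[y_train == c]).mean(axis=0) for c in range(6)])

def classify(face):
    """Label a flattened face image with the nearest emotion centroid."""
    z = project(face[None, :])[0]
    return EMOTIONS[int(np.argmin(np.linalg.norm(centroids - z, axis=1)))]

print(classify(rng.random(32 * 32)))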

Rather than developing an Autistic Head, we are putting a lot of effort into understanding emotion and expression, and in fact designing Teaching Heads that teach a standard and specific social skills curriculum to children with Autism Spectrum Disorders, Attention Deficit Hyperactivity Disorder, or Hearing Impairment. We are also designing a Teaching Head to help health sciences students learn the skills necessary for motivational interviewing - that is, helping people understand their problems, decide they want to make changes, and determine a solution.