EB-5: Hume - Transcript - Part 1

[00:02.682]πŸ‘©β€πŸŽ€ Ate-A-Pi: So Alan, what is Hume?

[00:07.342]πŸ§‘β€πŸ”¬ Alan Cowen: Hume is a research lab and startup. We build AI models that understand more than just language. They understand the voice, they understand facial expression, and they can learn from those signals. And our goal is to optimize AI models for human wellbeing.

[00:27.482]πŸ‘©β€πŸŽ€ Ate-A-Pi: So, okay, you've just released your first product. It is, I would call it, kind of an emotion detector for voices and faces to start off with, and a text to speech after that. Would you say that's correct?

[00:53.262]πŸ§‘β€πŸ”¬ Alan Cowen: It unites language understanding, expression measurement, and text to speech. So it's a full voice-to-voice API. And it links together text to speech with your voice and the language that you're saying, in a way that I don't think any other AI model does. So there's AI models that have a separate text to speech model, a separate large language model, separate transcription,
but they don't link them together and don't do anything end to end to control the voice based on your voice and your language.

[01:32.026]πŸ‘©β€πŸŽ€ Ate-A-Pi: So I played around with the demo. You have a demo site, and when you go to the demo site, it's basically a voice chat that you can engage in. And I'm not a very good tester because I'm very tech, so I always test these things on family members. So I handed it off to my 15-year-old niece.
What I noted was, she started chatting with the bot, and she was actually able to engage for much longer than I expected. It was almost a five-minute back-and-forth conversation, which is unusual, because normally when you talk to ChatGPT, you kind of get in there, get what you want, and get out, right? Because the level of engagement
is not significant enough, especially for someone who is not technically inclined, to actually want to engage that deeply. And so I watched as she had this conversation. She asked what kind of board games the bot liked. The bot came back with Settlers of Catan, which is a big deal, because in our family, Catan is the game that we have played probably hundreds of hours.
And so she had a very deep discussion on Settlers of Catan at that point. And then at the end of the five-minute chat, she asked, okay, what age do you think I am? And the bot came back with mid-20s to early 30s. And she asked why, and the bot gave some explanations. And we noticed that the bot was, how do we say, trying to be nice to her. It was buttering her up.
It was saying, oh, you know, because you're so mature, and you're so elegant, and you like such strategic games. And she kind of got it. She was like, I don't think it's being completely above board; I think it's trying to be nicer to me than it should. But she appreciated it anyway. So it was a very interesting interaction, and it raised a lot of...

[03:52.538]πŸ‘©β€πŸŽ€ Ate-A-Pi: So the first thing that we noted was that it is very fast. The latency felt much lower; in terms of processing what she was saying and coming back with a response, it just felt much faster. Was that something that you guys optimized for?

[04:12.132]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah. So part of that is that we have a model that detects when you're done speaking because that depends on the voice, depends on the vocal modulations, as much as it does language. We're able to do that much more accurately than other people. And then we can return a fast generated response. We have like a fast and slow system. So we can return a fast response and it carries on the conversation. It's a very conversationally intelligent small model.

[04:17.114]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[04:41.572]πŸ§‘β€πŸ”¬ Alan Cowen: And then we have a slow response, which can integrate tools. So for the slow response, we have our own larger model, but we also use Claude. We're using Claude pretty heavily in the demo. And that factors into the slow response, which takes longer. But the slow response is also integrated into the small model that understands your voice. So even if the response is coming from Claude, Claude has information that we send it about what your voice is doing. And it can...

[05:07.866]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm. Mm-hmm.

[05:09.732]πŸ§‘β€πŸ”¬ Alan Cowen: Send back language and then we can take that language and speak it out in the right tone of voice as well. So.

[05:15.578]πŸ‘©β€πŸŽ€ Ate-A-Pi: So yeah, go ahead.

[05:18.116]πŸ§‘β€πŸ”¬ Alan Cowen: I was just gonna say that setup enables us to have both extremely fast latency and have the best reasoning as well.

[05:26.138]πŸ‘©β€πŸŽ€ Ate-A-Pi: So when you say a fast response, the faster model, it's basically a smaller model which receives your input. What is it supposed to do? Does it come back with, hey, this is a provisional response that I will start speaking, and then wait for an updated response from the back end? Is that what it's doing?

[05:47.108]πŸ§‘β€πŸ”¬ Alan Cowen: That's pretty much it. And it links up to text to speech as well. Yeah. So it's what we call an empathic LLM. So it has an understanding of your voice natively along with language. And it's using that to detect when you're done speaking and respond itself. So it's sort of this end-to-end audio model.
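
To make that fast/slow split concrete, here is a minimal sketch of how such an orchestration could look, assuming a setup like the one described: a small conversational model answers immediately to keep latency low, while a larger reasoning model (for example Claude, given expression measures alongside the transcript) finishes in the background. The function names, timings, and replies are hypothetical; this is not Hume's API.

```python
# Minimal, hypothetical sketch of a fast/slow voice-response loop (not Hume's API).
import asyncio

async def fast_reply(transcript: str, prosody: dict) -> str:
    """Small conversational model: returns a quick provisional response."""
    await asyncio.sleep(0.05)  # stand-in for tens of milliseconds of inference
    return "Mm, that's interesting. Tell me more."

async def slow_reply(transcript: str, prosody: dict) -> str:
    """Larger reasoning model (e.g. Claude) that also receives expression measures."""
    await asyncio.sleep(1.0)  # stand-in for a slower, fuller generation
    return "Settlers of Catan is a great pick; the trading really rewards strategy."

async def respond(transcript: str, prosody: dict) -> None:
    # Launch both; speak the fast reply immediately, then continue with the slow one.
    slow_task = asyncio.create_task(slow_reply(transcript, prosody))
    print("speak:", await fast_reply(transcript, prosody))  # keeps turn latency low
    print("speak:", await slow_task)  # integrated once the larger model returns

asyncio.run(respond("what board games do you like?", {"interest": 0.8}))
```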

[05:51.418]πŸ‘©β€πŸŽ€ Ate-A-Pi: I see. I see.

[06:06.778]πŸ‘©β€πŸŽ€ Ate-A-Pi: So when the model is making a decision on what to say, is it getting passed, let's say, a kind of breakdown of what I've said? Like hello, but hello with attachments, like syntactic sugar indicating anger or embarrassment? Because I noticed that's how you break down the speech when you do the emotion detection:
the same word can have classifications of various different kinds of emotion when you do the detection. So in the back end, are you passing those kinds of syntactic sugar, those tagged words, into, let's say, a back end or a Claude, and then asking Claude, when you respond, to use similar kinds of syntactic sugar or tags to indicate what kind of emotion
the model should be expressing when vocalizing? Is that something that you do?

[07:07.908]πŸ§‘β€πŸ”¬ Alan Cowen: So we don't turn it into tags. We actually have a model that natively understands these expression measures that we generate. And that model is actually trained on tons and tons of data that we collect. So we have our own survey platform, and we have people watching videos and talking to each other and acting and they're labeling their own expressions. And we collect data for millions of people doing this.
And so we have a huge database of that kind of data. And we have extremely accurate and nuanced speech prosody models, so models of the tune, rhythm, and timbre of speech, that come out of that. So we're actually measuring hundreds of dimensions of vocal modulation that are influencing the language that we're outputting.
When we use Claude, some of that data we're sending to Claude. We've sort of optimized what we send to Claude; you don't want to send it too much. But some of the voice data we're sending to Claude, and that's also decided based on what's going to generate the best response from Claude. And then Claude sends back its response, and that goes back into the model that has the more nuanced prosody understanding, if that makes sense.
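
As a rough illustration of that flow, the sketch below sends a trimmed set of expression measures alongside the transcript to a backend LLM instead of rewriting the text with tags. The dimension names, scores, and prompt format are invented for the example and are not Hume's actual schema.

```python
# Hypothetical sketch: pass only the strongest prosody measures along with the user's text.
def build_llm_context(user_text: str, prosody_scores: dict, top_k: int = 3) -> str:
    """Keep only the top-k expression dimensions so the prompt stays small."""
    top = sorted(prosody_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    measures = ", ".join(f"{name}={score:.2f}" for name, score in top)
    return f"User said: {user_text!r}\nVocal expression measures: {measures}"

scores = {"amusement": 0.71, "interest": 0.64, "doubt": 0.22, "calmness": 0.15}
print(build_llm_context("what age do you think I am?", scores))
```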

[08:24.154]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right. And how does the model decide what emotions to invoke when it vocalizes? Like how does that happen? I mean, am I controlling a puppet? If I use the API, do I control a puppet to say, hey, be angry when you say this? Or is it supposed to? Because how would that work?

[08:48.356]πŸ§‘β€πŸ”¬ Alan Cowen: It's not rule-based; it's all learned behavior. So we have data from people having conversations and their ratings of those conversations. And so we know, based on that data, or our model knows because it's trained on that data, what kind of response is going to elicit a positive reaction. So that's how we train the model. And it learns that if somebody's sad

[08:52.41]πŸ‘©β€πŸŽ€ Ate-A-Pi: I see.

[08:56.73]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm. Mm-hmm.

[09:16.484]πŸ§‘β€πŸ”¬ Alan Cowen: and the listener expresses sympathy, that generates more positive reaction. Or if somebody's angry and the listener is more conciliatory, it generates a positive reaction. Also, those terms like anger, sad, we're really referring to...
speech modulations; they're kind of objective things. You kind of have to give them a label, otherwise you can't talk about what's going on. So we surface those labels for visibility. But actually it's a little bit more nuanced than that. So sometimes it'll say it's angry, but actually that just means it's using a pattern of speech that people might
associate with anger in isolation. But it's just a pattern of speech, and even though it sounds like it's bad to be angry, that might be the right pattern of speech in a given situation to elicit a positive reaction from the user.

[10:07.546]πŸ‘©β€πŸŽ€ Ate-A-Pi: So it's really about this sense of naturalness, of what would naturally feel to a human like the right emotional response, to evoke a positive response in the other person.

[10:20.26]πŸ§‘β€πŸ”¬ Alan Cowen: naturalness is kind of stage one. Yeah, you want to train it to be natural. You want to train it to sort of have the understanding of the distribution of how somebody might respond to somebody with their voice first. And then you want to teach it to pick from that distribution. It's not just gonna take the mode. It's gonna say, all right, out of this distribution of possible responses,

[10:23.258]πŸ‘©β€πŸŽ€ Ate-A-Pi: Stage one.

[10:34.618]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm. Mm-hmm.

[10:43.94]πŸ§‘β€πŸ”¬ Alan Cowen: these are the ones that lead to the most positive reactions in the user. And we're conditioning that not just on the voice, but also the language. So it's generating language and voice; really, speech is just a combination of language and prosody. It's generating the right combination to generate a positive reaction from the user based on their
speech and prosody throughout the whole conversation. So it's like a richer model; it's not just relying on language. You have other language models that are multimodal in a sense, but actually they're language at the interface, and then you insert videos or you insert audio. That's not what we're doing. We're multimodal in a deeper sense. The interface layer is multimodal in the sense that it understands both voice and language, and it can also understand facial expression. We haven't integrated that into our API yet.
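
One way to picture the "learn the distribution, then pick from it" idea is as scoring candidate (language, prosody) pairs with a learned model of user reactions and choosing the highest-scoring one. The sketch below uses a hand-written toy reaction table as a stand-in; in the system described here, that scoring would come from a model trained on conversation ratings.

```python
# Toy sketch of reranking candidate responses by predicted user reaction (illustrative only).
candidates = [
    {"text": "I'm sorry to hear that.", "prosody": "sympathetic"},
    {"text": "Okay, noted.",            "prosody": "neutral"},
    {"text": "Let's figure this out.",  "prosody": "conciliatory"},
]

def predicted_reaction(user_prosody: str, candidate: dict) -> float:
    """Stand-in reaction model: sympathy for sadness, conciliation for anger, etc."""
    table = {("sad", "sympathetic"): 0.90, ("angry", "conciliatory"): 0.85}
    return table.get((user_prosody, candidate["prosody"]), 0.40)

best = max(candidates, key=lambda c: predicted_reaction("sad", c))
print(best["text"], "|", best["prosody"])  # picks the sympathetic candidate for a sad user
```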

[11:33.914]πŸ‘©β€πŸŽ€ Ate-A-Pi: So it's actually understanding the content as well. If it receives a happy birthday, it understands that that is supposed to be a happy event, and it should respond in that way.

[11:45.06]πŸ§‘β€πŸ”¬ Alan Cowen: Right.

[11:48.9]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah, yeah. Unless you say it like, "happy birthday...", then it's confused, and maybe it needs to confirm that it's your birthday or something. You know, it understands the tone of voice.

[11:55.834]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right, right, right. That is amazing. So, just to take a step back: you're one of the pioneers of this affective computing field over the last, I think, 10, 15 years. What is the fundamental question that you've been chasing down for the last decade, decade and a half? Because when I look at your track record through Yale
and Berkeley and Facebook and Google, you've been chasing down this fundamental idea for quite a while now. What is that fundamental idea, the fundamental question that you've been trying to chase down?

[12:36.292]πŸ§‘β€πŸ”¬ Alan Cowen: I would say it's more than just a question. It's sort of a way of looking at emotion that is, I think, very different than it was when I started 15 years ago, which is kind of like Copernicus looking at the stars and documenting things. Just like, this moves that way, this moves that way, and not oversimplifying it. The way that people were looking at emotion pretty much since the 60s was in terms of six facial expressions.
Nobody, or not many people, studied the voice. There are some studies, but they're also very reductive. And that was pioneered by Paul Ekman, who is my advisor's advisor. He went around with pictures of six facial expressions, posed by himself in many cases, or by actors. And he went to different cultures and he asked people what these images meant to them. And he found some consistency across cultures. And for some reason, those six expressions just stuck.
And that's all people studied for a long time. And it was sort of implicit that these are the six emotions: anger, fear, sadness, surprise, disgust, and happiness. And there's only one positive emotion, which is, you know... you can already say that there
should be more than one, because we have awe, pleasure, amusement, love; you would think that people would have understood that there's more than one positive emotion. But that was sort of the conceptualization of emotion that persisted for a long time. And then there's a more reductive version, which is that there's actually just positive-negative and calm-aroused. Those were the two basic frameworks people used up until relatively recently. And the first thing I did was, all right, let's just observe human behavior and see if it fits into those
classification schemes. And it doesn't. If you try to arrange human behavior in different situations, in terms of how people label expressions and so forth, along those axes or into those six categories, it captures about 25% of the variance that's reliable, in terms of what situations expressions are associated with or how people conceptualize them in different ways. And so there was something huge missing there, and I just started mapping out the broader space

[14:51.718]πŸ§‘β€πŸ”¬ Alan Cowen: of what expressions look like and just publishing, oh, this is like, these are 28 different dimensions of facial expression that people find to be distinct in the US. And later we did a study across many countries and found that at least 21 of those are very consistent across every country on earth. And what are the situations in which people express emotion? Well, let's look in online videos. And so at Google, we analyzed.
millions and millions of online videos, we analyzed over 20 million.

[15:21.85]πŸ‘©β€πŸŽ€ Ate-A-Pi: So that was in 2020, I think; you published a study in Nature, and you had, I think, 100,000-plus one-to-three-second YouTube clips, and you had classifications done on them. You had people classifying what emotions were expressed. And so that was a landmark study, right? What were the really significant findings that you guys found?

[15:26.852]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah.

[15:49.732]πŸ§‘β€πŸ”¬ Alan Cowen: So we looked at 16 different dimensions of facial expression that we were able to measure at Google using models that we had trained then. And all 16 of them were associated with
similar contexts in different cultures. So over 60% of the variance in the context-expression correlations was preserved across cultures. And you could see that in specific contexts, like, let's say, fireworks. In videos with fireworks, there was a higher rate of people expressing awe in every culture. We divided the earth into 12 cultural regions just to compute those correlations, but you could also look at individual countries. There were 144 countries
in the study, and almost every country has the awe expression, which is like... I can't really do it, but anyway, it's a hard one. It's associated with fireworks and a few other contexts. And concentration is associated with martial arts, and confusion is associated
with similar things, and sports is associated with triumph, for example. There's this triumph facial expression that looks a little bit like anger, but it's positive. You could confuse it for anger if you put it in isolation, but it's associated with both team sports and individual sports all around the world. So we see these consistent, reliable associations, and I looked at how many dimensions you need to explain that, and you need 16 dimensions. So each facial expression has a distinct meaning. They're
all associated with similar contexts across cultures. I don't think anybody had really looked at this at all. Nobody had really looked at context-expression correlations across cultures, because they're so subtle, because there's already so much noise in natural videos as to what's going on and what emotions people could be expressing. So you have to look at it at a large scale. So we were the first ones to do that.
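
A crude sketch of the kind of analysis described here, correlating context-expression association patterns between cultural regions and asking how much structure is shared, might look like the following. The data is synthetic and the numbers are arbitrary; it only shows the shape of the computation, not the study itself.

```python
# Synthetic sketch: how similar are context-expression associations between two regions?
import numpy as np

rng = np.random.default_rng(1)
n_contexts, n_expressions = 40, 16
shared = rng.normal(size=(n_contexts, n_expressions))  # structure common to all regions

def region_associations(noise: float) -> np.ndarray:
    """Context x expression associations for one region: shared structure plus regional noise."""
    return shared + noise * rng.normal(size=shared.shape)

region_a = region_associations(noise=0.8)
region_b = region_associations(noise=0.8)
r = np.corrcoef(region_a.ravel(), region_b.ravel())[0, 1]
print(f"between-region correlation of associations: r = {r:.2f}")
print(f"shared variance ~ {r**2:.0%}")
```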

[17:49.7]πŸ§‘β€πŸ”¬ Alan Cowen: Most studies that looked at cultural variation in expression found massive amounts of cultural variation, supposedly, but most of the time it was just noise, because they were small samples and the effect sizes are relatively small. You know, it's a context or a word that has a different meaning across cultures, it's different people who have different associations with the same expression, every expression has multiple meanings, and all of these will reduce your correlation. And so it was argued,
by some people, that there was actually no consistency at all in the meaning of facial expressions across cultures. And they pointed to countries in East Asia versus the US and said nothing is the same. Well, I mean, the outliers are really the small-scale remote cultures, but that's sort of a separate issue. But even in these...

[18:31.61]πŸ‘©β€πŸŽ€ Ate-A-Pi: Outliers. Yeah.

[18:43.684]πŸ§‘β€πŸ”¬ Alan Cowen: Like in a large-scale country, like China and the US, in our data we see extremely similar usage of facial expressions across cultures, with variations in the intensity on average. So people are much less likely to smile intensely in East Asia than in the West, at least in online videos. But the smile is still associated with the same contexts, even though it's not as intense. But the people,
the psychologists who studied this in very small samples, usually with a few images that were culturally biased, looking at specific words that don't translate well... all these different sources of noise. There are psychologists who thought that, you know, 80% of the variance was cultural and 20% was systematic,
and almost none of that was consistently tied to expressions. People have made all these kinds of wild arguments based on very small datasets with a lot of noise. So they wouldn't consider this an outlier effect; they thought the norm was that nothing is preserved. You could go to China and see people smile and it means something totally different than what you think. Which is not true. Yeah.

[19:50.362]πŸ‘©β€πŸŽ€ Ate-A-Pi: My gosh. And I mean, this kind of work overturns something like half a century of anthropology, like Claude Lévi-Strauss, all of that stuff just gets thrown out. I mean, there must have been a lot of pushback, right? A lot of academic pushback, because all of these guys, they're, you know,

[20:04.548]πŸ§‘β€πŸ”¬ Alan Cowen: Hehehe

[20:17.146]πŸ‘©β€πŸŽ€ Ate-A-Pi: in their 70s, 60s; they've been there for a long time, and they all go to these conferences and they smoke weed, and they have these grand theories with small sample sizes, well-preserved, and they've been in the field for a long time. And then you just go in there and you have YouTube videos and you have evidence and you have data, and it must have been very, very challenging. I mean...

[20:26.148]πŸ§‘β€πŸ”¬ Alan Cowen: Yes.

[20:44.804]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah.
In that kind of situation, where you've had many decades of people expounding theories without much data, people get really entrenched about their theories. And because they don't have sort of a data-led orientation, their approach to their career is: I'm going to defend this theory. That is my career. Each paper is going to be an argument in favor of my treasured theory. So those people do not review papers well when the papers contradict that theory. They'll always try to dig and
find some issue and say that this tiny issue they found actually changes all the results, and therefore it needs to be rejected. Of course, if you're a statistician or a data scientist, you'll see the broader picture, and so it's actually easier sometimes to publish those studies in journals like Nature than in the domain-specific journals where you're going to get those reviewers. But what's funny is, in those situations, yes,
you had extremes in the science. There were some scientists who thought that emotions are very universal and some who thought they were very culturally variable. A lot of it depends on sort of the incentive structure in your subfield. So if you have more of an anthropological lens, that subfield tends to emphasize cultural differences in part because...
That's why it's interesting to go study other cultures. If you go study other cultures and they're exactly the same as your culture, it's not a very interesting finding, right? So they tend to have publication bias toward things that have cultural diversity. And so there was a lot of pushback from that direction. What was interesting is that the first person who really made a strong statement about emotional behavior was Charles Darwin.

[22:31.716]πŸ§‘β€πŸ”¬ Alan Cowen: And there was no baggage of any theory and he was very kind of inductive in his approach. And he just goes through this massive list of expressions that he sees in different cultures, in all the countries that he's visiting.
And also in primates and animals. He notes all of these similarities, and there's a whole book about it, The Expression of the Emotions in Man and Animals, which is very interesting. And because there's no theory behind it, it's all inductive, he just finds all these massive parallels, and he starts to try to guess at why these things exist, but not with the kind of evolutionary psychology lens that we have now, where we tell a story and it's more of an argument. He was very honest that it was all speculation, but that there had to be similarities
in the origins of these expressions, because they were so similar across cultures and animals. And it actually goes back even before that. So the early anthropologists, who nobody trusts anymore because they were colonial, but the early anthropologists who came to the Americas, who were basically the conquistadors, wrote these codices,
basically their version of journal publications, these handwritten codices where they would describe the behaviors of
native people on the continent. And there are all these descriptions of funerals where there were crying ceremonies, and they draw them out and everyone's crying. And they talk about laughter, and they also allude to the universality of expression. And then you also get deeper into pottery that shows different expressions. We actually did a study on pottery and sculpture in the ancient Americas, and it shows similar expressions across cultures. So if you're just being very inductive, and just describing things, you notice these

[24:25.094]πŸ§‘β€πŸ”¬ Alan Cowen: similarities. And then in the mid-20th century, it became popular for anthropologists and psychologists to second-guess kind of our folk intuitions about psychology. And that's when you see a lot of publications that say there's actually nothing across cultures that's preserved in our emotional behavior, which turns out to be very mistaken, unfortunately.

[24:50.138]πŸ‘©β€πŸŽ€ Ate-A-Pi: Indeed. After the Nature publication in 2020, you founded Hume in 2021, correct? So what kind of drove that idea? Like, what did you need to do that was different, that you decided to found Hume?

[25:08.9]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah, I mean, I was at Google before and I was also doing research independently as a postdoc at Berkeley on and off. And there was sort of a disconnect between what I thought were the resources that were needed to solve the problem and what Google could easily provide as a large company. Not that they didn't have the resources, but...
it was unusual to run large psychology studies at Google. Like, just nobody, I mean, there's UX studies, but it's a completely different paradigm where it's under the terms of use. But collecting data based on consent to participate in an experiment and all of the affordances that that allows where you can ask people about their feelings and sort of look at their cultural background and so forth. Like, that was very, very difficult to do at Google. And I...

[26:03.418]πŸ‘©β€πŸŽ€ Ate-A-Pi: You basically needed to run a science project. You needed to run a basic fundamental science project to collect basic data in order to define the field. And it just wasn't the right setup at Google in order to do that.

[26:14.18]πŸ§‘β€πŸ”¬ Alan Cowen: Exactly.

[26:18.404]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah, there was also a long time horizon on when that would produce something that would be valuable. My intuition was always: first understand expressions, then mix them into generative models so that generative models understand more than just what you're saying, they understand how you're saying it. And then you'll improve the generative models by fine-tuning on expression data, which is what we do now.
Obviously it took a few years to get there, but at Google it's hard to get buy-in for that kind of long time horizon based on research intuitions. I was the only psychologist, really; I mean, there were other psychologists at Google, but I was the only one actually doing psychology research. So it was hard to get buy-in for that; it was just unusual. And I also had the support of many people at Google in going out and starting my own thing.
So it just made sense to go get that data on my own.

[27:08.346]πŸ‘©β€πŸŽ€ Ate-A-Pi: And let's talk a bit about... one of the things I noted, which is very unusual for any AI company, is that you actually defined the study and then went out and got the data. And really the fundamental thing was this: you have classified data on the voice, even on the vocal bursts. I noted that you guys have a vocal burst data set. You have a facial expression data set.

[27:21.54]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah.

[27:38.202]πŸ‘©β€πŸŽ€ Ate-A-Pi: You even updated the facial action coding system, ECMN's facial action coding system, which I think he and Friesen used to zap each other with electrodes in the face in the 70s. So you even updated that data set. So how difficult was it to define the study and go out and get the data? What were the challenges there?

[27:48.612]πŸ§‘β€πŸ”¬ Alan Cowen: Right. Right.

[28:02.564]πŸ§‘β€πŸ”¬ Alan Cowen: So yeah, first, the problem that we needed to solve was this:
If you just take a bunch of images of people and you get raters to rate them, or if you take audio samples and you get raters to rate them, then people's ratings will depend strongly on the context and who's forming those facial expressions and whether they're outside or inside, and all of these other factors add noise and bias to the model. Some of that bias we were able to address, some of it we weren't. So we found that the bias put kind of a ceiling on what we were able to do internally at Google, where we could only capture like 40% of what people were expressing,
and the rest of the dimensions that we would get were so dependent on other things. Like for example, we would see in the ratings that when people were wearing sunglasses, their expression was rated completely differently. It was like the default for a neutral expression with sunglasses is to rate it as somebody who's expressing pride, basically. And that's what the model picks up on. And that's one of the more overt ones. But then there are all these subtle things too that we had to kind of rule out.
And the best way to do it is to get different people to form expressions and then train the model to identify the expression independent of who is forming it. And that requires you to use methods from psychology where you can randomize participants relative to the task they're undergoing, what they might be feeling, who they're talking to, what they're talking about, all of these different things. We can start to parse these all.
with experimental randomization. So we use all these different kinds of randomizations in our studies. We collect data at the scale that you need to train machine learning models. I don't think that anybody else really does this. We collect psychology data that's experimentally controlled, but we do it at the scale that we need to do it to train machine learning models. And nobody in psychology does that. So nobody in machine learning does the experimental control, but nobody in psychology does the scale. So we have to combine those. It is very costly to do it, but it's worth it.

[30:01.894]πŸ§‘β€πŸ”¬ Alan Cowen: And that's how we train models that are able to identify extremely nuanced dimensions of facial expression, of vocal expression, speech prosody, vocal bursts like laughs and sighs, interjections, umms and ah-hahs, facial action codes. They're able to do all of these things at a very subtle level without it getting washed out by noise and bias.
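
To illustrate the experimental-control point in the simplest terms, here is a hypothetical sketch of randomly assigning participants to expression prompts, so that who a person is (and incidental cues like sunglasses or lighting) is decorrelated from the label a model later learns to predict. The scale and prompt names are invented.

```python
# Hypothetical sketch: randomize which participant performs which expression prompt.
import random

participants = [f"participant_{i}" for i in range(8)]
prompts = ["amusement", "awe", "confusion", "triumph"] * 2  # one prompt per participant

random.shuffle(prompts)  # break any link between a person's identity and their prompt
for person, prompt in zip(participants, prompts):
    print(person, "->", prompt)  # each recorded clip plus its prompt becomes a training example
```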

[30:24.57]πŸ‘©β€πŸŽ€ Ate-A-Pi: Let's talk a little bit about semantic space theory. What is semantic space theory?

[30:31.332]πŸ§‘β€πŸ”¬ Alan Cowen: Semantic space theory... so there were two theories of emotion in emotion science, basically, or three: basic emotion theory, constructivism, and appraisal theory. And I just thought, you know, they're not really data-driven theories. Like, the basic emotion theory
paradigm was: there were six expressions that Paul Ekman chose to go out and show people, so there are six emotions. Not based on any actual data.

[31:00.09]πŸ‘©β€πŸŽ€ Ate-A-Pi: That's it. The richness of human experience boiled down to six classified emotions. Right. Right.

[31:08.708]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah, exactly. And then, on the opposite end of the spectrum, there's valence and arousal, which was the paradigm for constructivism: that you could identify variations in how pleasant or unpleasant an expression was, and how calm or excited it was, and that explained everything. But that was based on principal component analysis that was done on like 1,000 images in like the 40s and 50s. And that was the most rigorous study of it. I mean, it was ahead of its
time, but because the sample size was so small, you could not discover more dimensions than that. And so there was a lot of work to do to figure out how you come up with a way of defining the taxonomy without assuming that it has certain properties, that it has this many clusters or this many dimensions, or that this is the way to conceptualize it. And so the first thing
I realized we had to do was come up with a new kind of framework, a new theory of emotion taxonomies. And that's what semantic space theory is. It says all taxonomies of emotion have three properties. The first is their dimensionality, like how many different kinds of emotion there are. The second is their distribution, how many different clusters there are in that space. And the third is their conceptualization: what do people call these expressions, and how does that vary
depending on where things lie in the space? And these are all separate things. If there are clusters, you imagine each cluster has a different name: Anger, Surprise, Sadness, Fear, Disgust, and Happiness; that would be the basic six theory. If everything is just continuous, and there are only two dimensions that they vary on, that could be valence and arousal, or it could be something else, right? But let's separate these questions first.
And once you define emotion taxonomies that way, there's a way to actually derive them from the data, which is: you estimate dimensionality, distribution, and conceptualization simultaneously. And you do it with the cost function being

[33:15.428]πŸ§‘β€πŸ”¬ Alan Cowen: How closely can I recapitulate human behavior based on the locations of things along these dimensions, essentially? How accurately can I guess what the average person will say this expression means? Or what the distribution of people would be across individuals and cultures?
How accurately can I predict what context this will occur in and so forth? And so that's what semantic space theory does. And from that standpoint, we can say, okay, there's 24 different dimensions of vocal bursts. So laughs have many dimensions and sighs and screams, and ahs and ahas and interjections. There's 28 different dimensions of facial expression that you see across cultures, at least 21 that are preserved across all cultures and have pretty much exactly the same meaning.
And in speech prosody, there are at least 18 different dimensions that we see. That's a little trickier, because they don't exist in every language. Or rather, it's not clear they don't exist in every language, but you can't have somebody in English name all those dimensions when they're hearing, for example, Mandarin. That could be for many reasons; there's a lot of tonality to Mandarin and you're just not used to it.
So there's probably more cultural universality there, but there's 18 dimensions that people can name that are preserved across many, many languages.
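
As a very loose sketch of the estimation idea, asking how many dimensions you need before positions in the space recapitulate held-out human judgments, the following uses synthetic ratings, plain PCA, and cross-validated regression as stand-ins for the actual procedure. The sizes and dimensionalities are arbitrary; only the shape of the question is the point.

```python
# Synthetic sketch: how many dimensions are needed to predict held-out judgments?
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
ratings = rng.random((500, 30))  # 500 expressions x 30 emotion-term ratings (made up)
contexts = ratings @ rng.random((30, 5)) + 0.1 * rng.random((500, 5))  # judgments to predict

for k in (2, 6, 16, 24):
    scores = PCA(n_components=k).fit_transform(ratings)  # k-dimensional "semantic space"
    r2 = cross_val_score(LinearRegression(), scores, contexts, cv=5).mean()
    print(f"{k:>2} dimensions -> held-out R^2 = {r2:.2f}")  # gains flatten once k is large enough
```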

[34:44.346]πŸ‘©β€πŸŽ€ Ate-A-Pi: So instead of basically naming the emotions, you just kind of admit that there are emotions and that people of different cultures can have commonality in those emotions and that it's a continuous space rather than discrete. And you can jointly, if we both decide this is an emotion, we can both kind of identify it across cultures. Even though we might have slightly different

[35:03.236]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah.

[35:13.69]πŸ‘©β€πŸŽ€ Ate-A-Pi: names or meanings for it internally, we can both kind of like, this is that kind of emotion. And then once you have that continuous space there, you kind of add the names as you want. You can kind of label the names as you wish. And different people can have different labels and different clustering, but the space is there. The space is there and the points in the space are there. And then you can interpolate, right, I guess.

[35:40.964]πŸ§‘β€πŸ”¬ Alan Cowen: Exactly. Yeah.
And the labels are emotions, they're descriptions, they're valence and arousal, anything you want to label these emotions, these states with that you think are responsible for these behaviors. But what's important is that we're not trying to read minds. We don't assume that all of these are distinct subjective experiences. I mean, we can measure reported experience, but you just can't measure subjective experience. So what we're trying to do is figure out if there are dimensions that explain what
is reliable and what we can observe.

[36:15.29]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right. So it doesn't go to qualia; it stops at the external expression of the emotion. How did you... did you have this idea before you started the research work, or was it something that you came to as a result of the data? Did you have a hunch?

[36:20.324]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah.

[36:27.3]πŸ§‘β€πŸ”¬ Alan Cowen: It, it, yeah. Sort of.

[36:43.236]πŸ§‘β€πŸ”¬ Alan Cowen: It took me a while to think about what was needed in the field, and I think in 2016 I came up with this idea of semantic space theory, and that really helped analyze a lot of the data that we had gathered. And so then we started publishing a lot.

[36:59.546]πŸ‘©β€πŸŽ€ Ate-A-Pi: Amazing. So.
What, okay, I'm gonna ask an open -ended question. What is emotion?

[37:13.188]πŸ§‘β€πŸ”¬ Alan Cowen: Yeah, that's the golden question, right? I think emotion is the underlying dimensions that explain self-reported experience and emotional behavior. I know that's kind of recursive, but bear with me. I mean, like expressions, and the way that we report on emotions affecting our decisions. Emotions are sort of the underlying dimensions of that
dimensional space that can explain those things. That's a somewhat circular definition; it relies on people to kind of report on what they think is an emotion. And in reality, there's sort of a continuous spectrum between emotion and cognition. So when you look at states like confusion, it's sort of a cognitive state, but it has a qualia. I think it does have some self-reported...
consistent, what people say is a consistent experience of being confused. And it doesn't necessarily line up with the belief that you don't know something, or the belief that you don't understand something. You can, you can feel confused, independent of belief. So another example where the belief and the sort of emotional disposition towards something really vary is like if you're on a glass bridge, right? If you're looking down,
you can be scared of falling while knowing that you're not actually in danger. So there's an emotional disposition that you're feeling. Tamar Gendler, the philosopher, calls it an alief. I don't know if that's really a useful term, because people don't know it, but it's distinct from the belief that you're actually safe as you're crossing this bridge. And so emotions are those things, those experiences, that sort of reverberate through our psyche.
I would say what makes them different than sensations, like visual sensations, auditory sensations, temperature, so forth, is that emotions have a way of reverberating across propositional logic about the world.

[39:27.108]πŸ§‘β€πŸ”¬ Alan Cowen: So in other words, you could say, I don't like this person and they've given me a coffee, so I'm not gonna, and the taste of the coffee is bad. There's this logical relationship between the coffee and the emotion, and the person and the emotion spreads from one to the other. Or you could, maybe a more clear example is somebody dips a cockroach into your orange juice and takes it out.
Totally sterilized cockroach, not dangerous at all. But now the orange juice is disgusting, right? It's viscerally disgusting. Your perception of it doesn't really change, but the feeling
that is caused by that perception is completely different now than it would have been before the cockroach was put in. And the only thing that changes that is this logical relationship. Like you could, if you knocked out memory, you wouldn't even know. Like there's something that links the cockroach and the orange juice through your memory and emotions reverberate across that line. And I think that's what makes them different from perceptions. And because we can't control them and they're not, they can kind of exist.
in parallel to our beliefs, they're different than beliefs and cognition. That's a long-winded answer.
