EB-3: Ego - Transcript - Part 1

Transcript: Ego - Part 1



[00:02.87]👩‍🎤 Ate-A-Pi: All right, and go off cam. All right, so here we are. Welcome to Emergent Behavior, podcast number three. I am here today with Peggy and Vish. They are from Ego Avatars. And I'm going to, you know, let's take it away. Let's just start: Vish, what do you guys do?

[00:27.326]🗡️ vish: Yeah, so we started the company as a startup that allowed anyone to become a 3D-generated avatar with perfect face tracking quality, which is what we're using right now. It's an iOS app called Ego. You can use a character creator and scan your face to create a 3D avatar that you can customize and then live stream with. We started last year, but we've since pivoted after joining YC. So we're a YC company, in the current batch, W24.
Peggy and I actually originally met at Facebook. We've had a long-term goal to create pseudonymous 3D social spaces and to build an infinite game, right? If you're in a 3D pseudonymous social space, there are things you'd want to do with your avatar and with others, whether they're real people or AI; that's always been our North Star vision. Since joining YC, we've decided to go a little more ambitious than just doing avatars. We're actually building a game and a game engine that's gonna allow anyone to create and generate their own 3D worlds along with avatars and fill them with AI-generated NPCs. So that's a long answer, but yeah, that's what we're working on.

[01:32.338]👩‍🎤 Ate-A-Pi: Um, Peggy, what does "pseudonymous social space" mean? It seems like kind of a long, you know, set of words. What does that really mean?

[01:48.062]🐱 Peggy: Yeah, so I think historically, most pseudonymous social spaces have existed online in MMOs: Club Penguin, MapleStory, even WoW, World of Warcraft. That's historically where we saw pseudonymous social spaces. Obviously, even currently, you have things that are a little more niche, like Second Life, VRChat, et cetera, where people show up as avatars, whether that's in VR or on the PC, and they talk with other people and all engage in this massive roleplay online in a 3D world.
And when Vish and I met at Facebook, I mean, Facebook is obviously chasing this big vision with Horizon Worlds and everything, and we basically were like, you know, this space has always existed in the form of games. So what we should really be building, if we want to build a pseudonymous online space, is a game where people can play as avatars, with a core gameplay loop that draws people into the game, and then people basically stay after the game for that social space, that social community.

[02:57.046]👩‍🎤 Ate-A-Pi: Mm-hmm.

[03:17.175]👩‍🎤 Ate-A-Pi: Um, yeah, go ahead.

[03:17.682]🗡️ vish: Yes, just to add to that, the fundamental motivation behind why we decided to build Ego was the observation that the one thing common across all human societies is masking traditions. If you go all the way back to, you know, a random hill tribe in New Guinea, or Indian culture, Chinese cultures, every culture on earth at some point has had a tradition where people wear masks, take on the role of gods and spirits, and tell stories.
And we saw that manifest in online 3D spaces, where people could come out of their body and become something else. That's fundamentally something people want, especially kids. We just didn't see a product doing it in a seamless way, so we decided to do it. We saw the future coming with generative AI, especially GenAI in 3D, and we thought, let's build in this space.

[04:05.89]👩‍🎤 Ate-A-Pi: Just to take a step back, I'm going to describe my avatar and the build process behind it. I am using an Unreal MetaHuman. I'm running Unreal Engine on an RTX 4070, I think, on a Windows laptop.
And I have an iPhone focused on my face, which is capturing ARKit data. That's being piped to Unreal Engine, and from there to a virtual camera, which has a zoom. We're in a Zoom-like studio environment. Unreal had a free broadcast studio map, so I'm in that broadcast studio map.
The avatar itself was designed by a game company called Infinite Production from the Czech Republic, infinite.cz. It was originally, I guess, a MetaHuman scaffold. I gave certain hints on what I wanted the facial features to look like, they took some time, and this is what I got back. I have blue hair. The hair moves.
I have glasses, and the facial features move, perhaps not as expressively as they could, but it's expressive. Perhaps you guys can describe exactly how your avatars were designed so that we have a sense of how that works for you.

[05:45.846]🐱 Peggy: Yeah, for sure. I can go ahead and start. So the design is actually very, very similar to the workflow that you just described, Ate. There are a couple of improvements that we made. I used to work on face tracking at Meta Reality Labs. My team actually shipped what is the equivalent of face tracking for Meta, very similar to Apple's ARKit Memoji,

[06:02.506]👩‍🎤 Ate-A-Pi: Mm-hmm.

[06:12.406]🐱 Peggy: but for Meta, for both AR and VR avatars. Part of my team worked on the Quest Pro and the other part worked on the Meta video avatar product. And by working there, we found a couple of tricks to make the avatar more expressive.
One is just using face tracking without a depth sensor, so you can use face tracking with any camera. That was pretty cool. Another thing we did at Meta was work with scientists to figure out how different people's facial muscles move and translate that on both the avatar side and the tracking side. So on the avatar side, we actually

[06:46.898]👩‍🎤 Ate-A-Pi: Oh.

[07:08.59]🐱 Peggy: had artists hand-sculpt our avatars into different blend shapes, which are just how the mesh deforms while you're talking or moving different facial features. We had the artists hand-sculpt them anatomically, the way a human face moves. We've also improved the face tracking itself: we made it smoother and less jittery, made it faster, and made it available on any camera.

[07:14.367]👩‍🎤 Ate-A-Pi: Mm-hmm.

[07:37.794]🐱 Peggy: So those were some of the things we did.
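Peggy's description of hand-sculpted blend shapes and smoother, less jittery tracking maps onto a common technique: each blend shape is stored as per-vertex offsets from a neutral mesh, the tracker emits one weight per shape every frame, and a filter damps frame-to-frame jitter. The Python below is a minimal sketch under those assumptions; the exponential moving average and the `jawOpen` shape name (borrowed from ARKit's naming) are illustrative choices, not Ego's or Meta's actual pipeline.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def apply_blend_shapes(neutral: List[Vec3],
                       deltas: Dict[str, List[Vec3]],
                       weights: Dict[str, float]) -> List[Vec3]:
    """Deformed mesh = neutral + sum over shapes of weight * per-vertex offset."""
    out = [list(v) for v in neutral]
    for name, w in weights.items():
        w = min(max(w, 0.0), 1.0)  # tracker weights are expected in [0, 1]
        for vert, d in zip(out, deltas[name]):
            for axis in range(3):
                vert[axis] += w * d[axis]
    return [tuple(v) for v in out]


class WeightSmoother:
    """Exponential moving average per blend shape to damp tracker jitter."""

    def __init__(self, alpha: float = 0.5) -> None:
        # alpha = 1.0 passes raw values through; smaller values are smoother but lag more.
        self.alpha = alpha
        self.state: Dict[str, float] = {}

    def update(self, raw: Dict[str, float]) -> Dict[str, float]:
        for name, value in raw.items():
            prev = self.state.get(name, value)  # first frame: start at the raw value
            self.state[name] = prev + self.alpha * (value - prev)
        return dict(self.state)


if __name__ == "__main__":
    neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    deltas = {"jawOpen": [(0.0, -1.0, 0.0), (0.0, -0.5, 0.0)]}
    smoother = WeightSmoother(alpha=0.5)
    smoother.update({"jawOpen": 0.0})            # mouth closed
    weights = smoother.update({"jawOpen": 1.0})  # sudden jump is halved to 0.5
    print(apply_blend_shapes(neutral, deltas, weights))
    # prints [(0.0, -0.5, 0.0), (1.0, -0.25, 0.0)]
```

The trade-off lives in `alpha`: heavier smoothing hides tracker noise but makes the avatar react later, which is why a production system would likely tune it per shape.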

[07:40.33]👩‍🎤 Ate-A-Pi: Right on. So you guys on your side right now: I'm just going to describe Peggy as an anime avatar with green and blue hair and expressive features, and Vish, who looks like a warrior from, you know, he's got battle gear. I guess he looks a little bit like...

[07:52.558]🐱 Peggy: I'm gonna go to bed.

[07:58.58]🐱 Peggy: Mm-hmm.

[08:09.782]🗡️ vish: Geralt. Yeah, Geralt of Rivia, my favorite game.

[08:10.334]👩‍🎤 Ate-A-Pi: Geralt, that's right. That's right. Yes. Yeah. That's right, Geralt of Rivia, so The Witcher. And you use an iPhone or an Android phone focused on your face, and it's running the Ego app. Is that correct? It's running the Ego app, and that Ego app is connecting to a laptop

[08:32.823]🗡️ vish: That's correct.

[08:39.89]👩‍🎤 Ate-A-Pi: or a computer. Yes. Okay.

[08:40.514]🗡️ vish: It's a desktop PC, but it could be any PC. We just connect it with a QR code, and then I link it up to OBS to get the background, which is just Ego.

[08:48.306]👩‍🎤 Ate-A-Pi: Right. And then from there, you're piping into this Zoom-like environment. Is that correct?

[08:54.666]🗡️ vish: Yeah, we just create a virtual camera in OBS and then turn on the virtual camera.

[08:58.526]👩‍🎤 Ate-A-Pi: Right, right. So you don't require ARKit anymore because you're just using vision. Is that an accurate description?

[09:09.646]🐱 Peggy: So we still require some sort of tracking ability. ARKit is the best on iOS, but obviously ARKit does not exist on other platforms, especially natively on Windows. So we use different tracking software for that.

[09:12.329]👩‍🎤 Ate-A-Pi: Right.
I see. I see.

[09:19.838]👩‍🎤 Ate-A-Pi: All right.

[09:26.853]👩‍🎤 Ate-A-Pi: Are you able to take input directly from the laptop camera instead of going from phone to laptop?

[09:33.694]🐱 Peggy: Yep. We shipped it. It's available on Steam. If you go to Steam and search for Ego, you can try it. The face tracking quality is actually not as good because it doesn't use ARKit. ARKit is one of the better, probably the best, face tracking software out there. But we did manage to ship a version that doesn't require ARKit and runs on a laptop or PC directly.

[09:34.539]🗡️ vish: Yeah.

[09:36.936]👩‍🎤 Ate-A-Pi: Oh.

[09:44.374]👩‍🎤 Ate-A-Pi: Mm-hmm.

[09:49.532]👩‍🎤 Ate-A-Pi: Mm-hmm.

[09:57.318]🗡️ vish: Yeah, the real problem is that we lose depth information, which makes face tracking a lot harder. It's not impossible to get to ARKit-level quality from any camera input, but it was an engineering challenge we didn't want to spend time on for the PC, because the discovery we made last year when we shipped the product is that a lot of people who stream as avatars don't actually care too much about face tracking quality. They index more heavily on customization. So that was just like...

[10:06.919]👩‍🎤 Ate-A-Pi: Mm-hmm.

[10:20.51]👩‍🎤 Ate-A-Pi: Oh, they index more heavily on customization, I see.

[10:23.222]🗡️ vish: That's right, yeah, that was a discovery we made. And that's something that doesn't scale with artists. You need to keep hiring more and more artists to satisfy every demand people might have for how they want their avatars to look. This could change in the future with 3D gen, but the technology for 3D gen is still relatively nascent. So we're not quite at the point where we can just prompt a beautiful beard and a man with white hair into existence and have it look exactly like this.

[10:47.174]👩‍🎤 Ate-A-Pi: Right on. So right now, I can download the Ego app on Steam and iOS.

[10:57.282]🗡️ vish: The Ego app is not available on iOS. As we mentioned, we pivoted away, so we decided not to release the iOS app. We released a Steam app anyway, just to put it out there. It's free for anyone to use, but we're not actively supporting it. We're actively developing our game engine instead of continuing to do the avatar stuff. But eventually, once we have the game ready, we do expect people to use the Ego app to interface with their avatar within the game, so they can communicate with other players and also with AI NPCs.

[11:22.506]👩‍🎤 Ate-A-Pi: So let's get to the game engine. I'm guessing it's an MMORPG kind of world. Is that correct?

[11:32.538]🗡️ vish: That's the eventual vision, but what we're shipping right now is slightly different. We decided to go down this route because we believed three key things to be true. The first one is that the problem with a lot of games, especially MMOs and open-world games, is that they're relatively empty, especially if you don't have other players there. The single-player experience in most of these games kind of sucks. The way to solve that problem is to have autonomous AI NPCs, or you can call them agents, who have their own lives and their own dynamic, almost human-like behaviors, navigating the 3D world and interacting and engaging not just with one another, but also with the player characters, the human-driven players.
So we believe, one, AI NPCs are gonna get a lot better, and they're gonna solve, or at least improve, the problem of empty 3D worlds. Number two, 3D gen is gonna get better and better, to the point where you can accurately simulate not just the physics and collisions you need in the 3D world, but also any asset you want to bring into existence, right?
So let's say I want my armor to appear green instead of what it is currently, which is kind of black. I should be able to write that in natural language, and the engine should be able to interpret it and change the asset to the right color, or even give me a Viking helmet on top of my head if I want one, right? We know that's already getting better. It's not quite there yet. We're able to generate relatively high-quality meshes using SDFs and Gaussian splatting, but it's not quite game-ready yet. We expect it to get there in about three to four months. So that's the second axiom: 3D generation

[12:42.228]👩‍🎤 Ate-A-Pi: Right, right.

[12:56.482]🗡️ vish: from natural-language text and reference images is gonna get better and cheaper in the near future. The third and final one is that fundamentally, when people are in 3D worlds or expressing themselves in a 3D space, they want to create experiences for one another, or they want to experience something together. Usually that takes the form of a game. And we know for sure that kids and teens like making games for each other. There's evidence enough of that in Roblox and pretty much the entire modding community online. People like making things for one another.
But it's currently hard to do. You need to understand scripting languages like TypeScript or Lua. It's still possible for someone to do it, but it's hard. If you could instead use natural language to describe the experiences you wanted to script, that would basically increase the number of people doing scripting. It would increase the number of people customizing their games and gameplay experiences, which leads to a lot more activity, right? So that's the third thing we believe to be true: AI is gonna make it easy for people to generate pretty good scripting code with just natural language prompts.
So if we believe all three of those things to be true, you can basically create a game that is also a game engine. The first thought anyone has when they play a game is, hey, wouldn't it be cool if XYZ happened? Or I'd like this mechanic to be slightly different. Or I'd like a motorcycle to suddenly appear in front of me. So that's the vision we were going after: build an AI-native game engine. It's something the current players aren't really doing or able to do. And then use avatars and expression as a way of finding and building social community within these spaces.

[14:25.594]👩‍🎤 Ate-A-Pi: Right, right. Go ahead.

[14:26.146]🐱 Peggy: This is also kind of an insight of, sorry, go ahead. Okay. An insight that we got while building our VTubing software, the first version of Ego, which is the avatars we have right now, is that people wanted more and more customization. And because we were making avatars with a character creator in one style, and different styles with artists,

[14:30.122]👩‍🎤 Ate-A-Pi: No, no, go ahead.

[14:44.598]👩‍🎤 Ate-A-Pi: Mm-hmm.

[14:53.558]🐱 Peggy: it's really hard to manually scale up the number of artists required to make more and more art assets, especially 3D art assets. That becomes more and more possible with AI, however. Right now it's not quite there yet, which is why we decided not to focus on this, but basically, instead of scaling up the number of artists required to make these high-quality models, all you have to do is prompt AI to generate these high-quality characters. Again, it's not there yet. It's gonna be there in about one to two years.
But we realized that lowering the barrier of entry to creation, especially creation in 3D, is something people really care about. When they used Ego, they wanted to customize everything, they wanted to create everything. And this is when we realized: oh, what if we lower the barrier of entry for anyone who doesn't know how to code or how to do 3D art, and give them the tools, in our case just a simple prompt, so they're able to prompt, edit, and generate whatever they want? Wouldn't that be cool? Wouldn't more people want to do it? So yeah, that's kind of where this came from.

[16:08.918]👩‍🎤 Ate-A-Pi: So, yeah, go ahead.

[16:09.902]🗡️ vish: And right now we're kind of just testing hypotheses. So, sorry, go ahead. I know we've been talking for a little bit, so happy to. So we wanted to test the hypotheses one by one, right? The first hypothesis we tested was: do people want to express themselves pseudonymously as a 3D avatar? That turned out to be true last year. A lot of people did. The problem was that we couldn't scale to meet the customization demands of the vast number of people who wanted to appear as, like you, Ate-A-Pi, or me, like Geralt of Rivia, but

[16:14.014]👩‍🎤 Ate-A-Pi: No, no, go ahead, go ahead.

[16:39.17]🗡️ vish: kind of looks like me. So we couldn't do that, right? So we thought, okay, what's the next thing we can test? The next thing we can test is: can AI NPCs make 3D worlds a lot richer and more dynamic? And we read a paper, you might have heard of it, from last year: the Generative Agents paper, commonly known as the Smallville paper. The paper is super cool, for those who don't know. They basically demonstrated that you

[16:59.268]👩‍🎤 Ate-A-Pi: Yeah.

[17:08.77]🗡️ vish: could inhabit a village with agents hanging around in a 2D space with their own routines and their own lives, and then you'd basically see emergent, almost human-like behavior come out of that in the way they navigate these spaces. That was really interesting to us, and we're like, okay, what if we adapted that into 3D, and what if we had these agents converse with one another? So we built that first, and I live-streamed these agents on a Twitch channel called twitch.tv

[17:35.607]👩‍🎤 Ate-A-Pi: Mm-hmm.

[17:37.67]🗡️ vish: And what we realized is that audiences don't really care about the agents doing routines. They would care about it if they were in the space. But when they watched the agents talk to one another, they were fascinated by the conversations the agents were having, which gave us a deep insight. We're like, okay, people like to see these AI agents talk to one another. What if we brought in characters from popular IPs and had them talk to one another? Something common, like Hermione and Harry Potter, right? Sure, people would love to see them. But what if we had Draco talk to Big Bird? What if we had Big Bird talk to Shrek? What would they even talk about?
So that's where our stream is right now: we're just bizarrely, randomly putting in 3D characters from popular media and IP and having them talk to one another and engage in witty banter and conversations. And that's the tool we're going to be launching in about two weeks. We're going to allow anyone to summon any character from any popular media, or even characters of their own imagination, put in a personality prompt, and see what kind of fully voiced conversations they have in 3D.

[18:33.206]👩‍🎤 Ate-A-Pi: So let me, there's a lot to unpack there. Let me break it down. Let's start off with Smallville. I think Smallville was using GPT-3 or GPT-4. I think it was 3. I can't remember.

[18:50.798]🐱 Peggy: was GPT-3 and predecessor almost to GPT-3.

[18:53.242]👩‍🎤 Ate-A-Pi: Yeah, GPT-3. And I think what they saw emerging was, for example, the agents would organize a birthday party and invite each other, and they would show up at the birthday party, all without human intervention. Maybe there was a seed at the beginning, but then it just developed over time. So when you guys say... There's also, I think, the Endless Seinfeld at some point.

[19:15.55]🐱 Peggy: Yeah.

[19:22.198]👩‍🎤 Ate-A-Pi: There was the Endless Seinfeld, which was this Seinfeld show that continued forever and ever. It had really bad graphics and also really bad jokes, but somehow it lasted for quite a while. I think they had several hundred viewers at a minimum throughout. I think they were live for like a month or so; I don't know exactly how long. So when you guys say, okay, you have this kind of 3D world, you have this game engine.

[19:31.842]🗡️ vish: Yeah.

[19:52.438]👩‍🎤 Ate-A-Pi: So a 3D world is basically just a set of coordinates where you have the locations of everything that's there, and then you have graphics generated based on those coordinates. You have some physics, etc., to calculate what happens there. When you say an AI-native 3D game engine, what does AI-native mean?

[20:10.06]🗡️ vish: Mm-hmm.

[20:21.674]🐱 Peggy: I think the short answer is that we let people create with just prompting, without understanding how to do 3D art or how to code. Something that's super intuitive for kids, teens, non-technical people, anyone who doesn't have any technical skills.

[20:34.655]👩‍🎤 Ate-A-Pi: Mm-hmm.

[20:41.266]🗡️ vish: Let's actually bring it down to what people do in games first. People show up as some 3D representation of themselves or a character, they navigate a 3D space, and they can take certain actions in that space. They can communicate with characters, either other human-driven players or NPCs. Once they get a quest, they go out and do something or other, and they get some sort of player progression and reward. I think those are the principles of a game in a 3D space. Would you agree with that?

[21:10.835]👩‍🎤 Ate-A-Pi: Um, yeah, I think so. Yeah.

[21:14.29]🗡️ vish: Yeah. And then there's some sort of emergent social behavior that comes out of it. Once you make friends, you want to come back, not just for the gameplay loop, but because there are other people you like co-inhabiting these 3D spaces. So I think that's a game. Now, what an AI-native version of that means is that you abstract away the difficulty of generating interactions, characters, and the worlds themselves, so that people don't need to find art assets and textures and do the whole UV mapping and all that nonsense that is very complicated for most people. Rigging especially, oh gosh, rigging is really difficult. All the difficult stuff is more or less done automatically, either procedurally or through AI gen, right? Then players can focus on bringing their imaginations to reality. It's like what Midjourney does: creating artwork is a really technical process, and you abstract that away

[21:47.702]👩‍🎤 Ate-A-Pi: Thank you. Yeah.

[22:12.142]🗡️ vish: to a natural language prompt or a reference image. We believe the same can be done in games. And when you add scripting, which is the ability to imbue some sort of life force into assets or have them interact with the world and players in fun and interesting ways, you can take the scripting language, TypeScript or Lua, away and have a player describe it in natural language. That is an AI-native game engine.

[22:36.31]👩‍🎤 Ate-A-Pi: So instead of spending an expert's time and hours creating a 3D blend shape, et cetera, importing it into the engine, and then writing some complex TypeScript, Lua, et cetera, to say, you know, this is what this bicycle can do, this is the speed, this is how fast it goes, you can instead wave a magic wand and say...
I want a bicycle, red in color, with rocket engines, and it goes really fast. And then, boom. Right? Yeah.

[23:13.438]🗡️ vish: Yeah. And that's kind of the holy grail, right? There are a bunch of people attempting this, but we have a fundamental insight that differs from how most people are approaching the space. The reality is that what you described is a really fun novelty, but if you just do it and then go around, that doesn't mean anything, right? Like, oh, this is cool, I can build this bike and zoom around a 3D space, but so what? What can you do with it? Who else is watching you, right? That's where the game aspect becomes pretty important.

[23:15.935]🐱 Peggy: Yeah.

[23:42.982]👩‍🎤 Ate-A-Pi: Mm-hmm.

[23:43.862]🗡️ vish: Generally speaking, modding communities, and even what currently exists with Roblox, happen because people are already in the game and their imagination goes, well, what if I could do this? What if I could do that instead? That's why Skyrim's modding community has lasted for such a long time: you have a very strong base game, right? One that is fun and keeps the gameplay loops really tight. That's our fundamental insight: you need to build both for it to work. There are a lot of people building AI game engines, but it doesn't mean anything unless there's a game

[24:00.198]👩‍🎤 Ate-A-Pi: Mm-hmm.

[24:11.81]🗡️ vish: underlying it that traps people in there. And all this insight basically comes from us watching a ton of anime, like Sword Art Online, and reading all the books about these fun spaces that people actually want to see. The commonality in all of them is that there's some core gameplay loop that ties players in, and then they get to use AI to customize their own game experience, or have some sort of fun, emergent social behavior with either other players or AI NPCs. That's our insight.

[24:13.127]👩‍🎤 Ate-A-Pi: Mm-hmm.

[24:18.286]🐱 Peggy: I'm sorry.

[24:23.432]👩‍🎤 Ate-A-Pi: Right.

[24:38.488]👩‍🎤 Ate-A-Pi: So let's, yeah, go ahead.

[24:39.478]🐱 Peggy: We also realized that people really like to play with other people, whether that's their friends in this kind of multiplayer world space, or AI NPCs that can act as their friends or act as a real character in some ways. We found that really compelling, which is why we started on the AI NPCs, AI characters talking to each other in 3D space.

[25:04.451]🗡️ vish: Mm-hmm.

[25:06.871]👩‍🎤 Ate-A-Pi: Right. I have so many questions, so I'm just going to... Let's go to the AI NPC build. I guess Character.AI has done quite a bit on this. Initially, last year, they were up there with these interactive personalities.

[25:10.545]🗡️ vish: That's good!

[25:11.171]🐱 Peggy: I'm going to go eat some.

[25:32.058]👩‍🎤 Ate-A-Pi: What do you think they do? Is it an LLM attached to a fine-tuned interaction, in and out, like input-output pairs? Is that what they have? And is that what you think works as an AI NPC, or is it something else? How does that work?

[25:57.726]🐱 Peggy: It's something more than that. The way Character.AI does it is they prompt-engineer an LLM into taking on the personality of a specific character, whether that's famous characters in anime, in Genshin, in games, or in real life, like Donald Trump, Joe Biden, et cetera.

[26:10.346]👩‍🎤 Ate-A-Pi: Mm-hmm.

[26:24.55]🐱 Peggy: So you could basically prompt the LLM to roleplay as the specific character, like AI Trump and AI Biden, and then talk to them. However, in a game, which is a 3D space, a 3D world, you need more. You can't just stand there and chat with the agent. I feel like that's very, very limiting. You need

[26:51.719]👩‍🎤 Ate-A-Pi: Mm-hmm.

[26:53.582]🐱 Peggy: them to basically act as if they are their own person. So instead of NPCs in games just standing around, imagine if they had their own goals in life. I think the Stanford Smallville paper showed that, you know, when Isabella was having her Valentine's Day party, the next day all of the agents actually showed up to her house. And the way to do that is basically you add memory

[27:18.294]👩‍🎤 Ate-A-Pi: Mm-hmm.

[27:23.278]🐱 Peggy: to the agent, and you also add a system of routines, what they plan to do in their daily lives, and also their own goals and motivations. On top of this, you also give them an action space for performing actions in a 3D space, whether that's walking around, chopping wood, building a house, or, you know, what else do you do? Fighting monsters.

[27:33.779]👩‍🎤 Ate-A-Pi: Right.

[27:49.031]👩‍🎤 Ate-A-Pi: Mm-hmm.

[27:50.326]🐱 Peggy: All of this is now possible for the agent because it has its own goals, motivations, and plan, and now it has this action space to actually go ahead and do it. OpenAI has this thing called function calling, which abstractly does almost the same thing. What we've done is basically give function calling 3D actions that the agent can actually take in a 3D space.
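Giving function calling 3D actions can be sketched as a tool list the model sees plus a dispatcher that validates the model's proposed call before the engine runs it. Everything here is hypothetical: the action names `walk_to` and `chop_wood`, the `GameWorld` stand-in, and the call format, whose layout only mirrors the general shape of OpenAI-style tool definitions (a JSON Schema under `parameters`, arguments returned as a JSON string). No model is actually called.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical tool list that would accompany the NPC's prompt.
TOOLS: List[Dict[str, Any]] = [
    {"type": "function", "function": {
        "name": "walk_to",
        "description": "Walk to a point in the 3D world.",
        "parameters": {"type": "object",
                       "properties": {"x": {"type": "number"},
                                      "y": {"type": "number"},
                                      "z": {"type": "number"}},
                       "required": ["x", "y", "z"]}}},
    {"type": "function", "function": {
        "name": "chop_wood",
        "description": "Chop the nearest tree.",
        "parameters": {"type": "object", "properties": {}, "required": []}}},
]


@dataclass
class GameWorld:
    """Stand-in for the engine: records what the NPC did."""
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    inventory: List[str] = field(default_factory=list)

    def walk_to(self, x: float, y: float, z: float) -> str:
        self.position = (x, y, z)  # a real engine would path-find and animate here
        return f"arrived at {self.position}"

    def chop_wood(self) -> str:
        self.inventory.append("wood")
        return "chopped wood"


def dispatch(world: GameWorld, call: Dict[str, str]) -> str:
    """Check a model-proposed call against TOOLS, then run it in the world."""
    specs = {t["function"]["name"]: t["function"]["parameters"] for t in TOOLS}
    name = call.get("name", "")
    if name not in specs:
        return f"error: unknown action {name!r}"
    args = json.loads(call.get("arguments") or "{}")
    missing = [k for k in specs[name]["required"] if k not in args]
    if missing:
        return f"error: missing arguments {missing}"
    # Drop any keys the schema does not declare before calling the handler.
    args = {k: v for k, v in args.items() if k in specs[name]["properties"]}
    handler: Callable[..., str] = getattr(world, name)
    return handler(**args)


if __name__ == "__main__":
    world = GameWorld()
    print(dispatch(world, {"name": "walk_to", "arguments": '{"x": 3, "y": 0, "z": 5}'}))
    print(dispatch(world, {"name": "chop_wood", "arguments": "{}"}))
    print(dispatch(world, {"name": "fly", "arguments": "{}"}))
```

Returning error strings instead of raising keeps the agent loop alive: the message can be fed back to the model so it can pick a valid action on its next turn.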

[28:04.906]👩‍🎤 Ate-A-Pi: Mm-hmm.

[28:10.454]👩‍🎤 Ate-A-Pi: Mm-hmm. Right.

[28:13.302]🗡️ vish: So our fundamental insight: what Character.AI built was totally cool, right? Super awesome. People spend like two to three hours a day talking to these characters. Our insight was, well, what else can you do, right? This is awesome. We understand that people enjoy talking to characters, either of their own imagination or from IP, but what else can you do with these characters? Nothing much beyond text, right? And our hypothesis is that people wanna do more, right? When you're in a 3D space, the space of possible actions you can take increases almost infinitely.
So we want to understand what that looks like, whether that's engaging to players, and whether that makes a 3D pseudonymous space more likely to exist.

[28:50.194]👩‍🎤 Ate-A-Pi: Right on, right on. If you were to use a particular LLM right now as the engine for an NPC, what would you use? I mean, I think OpenAI GPT-4 is obviously too slow and expensive. Would you use Llama 2? Would you use something else, Mistral? Just your personal preference at this point in time, given that things change every, like...

[29:17.075]🗡️ vish: This is an area of active debate within our company. Yeah.

[29:20.265]👩‍🎤 Ate-A-Pi: What are the pros and cons? What are the preferences? What do you guys do?

[29:25.682]🐱 Peggy: Yeah, so what's really interesting and almost counterintuitive is that OpenAI GPT-4 actually has one of the best roleplay systems out there. It actually acts as the character itself. Now, what's really annoying is exactly what you said: it's super slow. It's not real time. It takes like 20 seconds for it to, you know, generate a single

[29:51.818]🗡️ vish: And it's a little too agreeable. We have to do a lot of prompt engineering to make sure that, you know, they're not just hugboxing each other.

[30:00.292]🐱 Peggy: Yeah, so we've been experimenting with some other uncensored LLMs. The one that seems most promising right now is Mistral, the 8x, I forgot, like 17-billion-parameter model. There are a lot of uncensored versions of it that are fine-tuned on

[30:15.062]👩‍🎤 Ate-A-Pi: Mm-hmm.

[30:23.77]🐱 Peggy: a variety of different datasets, including some trained on fanfiction forums and Reddit and really interesting parts of the internet.

[30:33.818]👩‍🎤 Ate-A-Pi: And just to interrupt a moment, what does fine-tuning in this sense involve? Is it input-output pairs, LoRA, something like that?

[30:43.782]🐱 Peggy: I think they just do something really simple. Yeah, some of them use LoRAs, but I think most of them are literally just trained on additional corpora, additional text, which honestly seems to work quite well. I think the downside of using something like Mistral is that it's actually not very good at instruction following. With ChatGPT, basically, you tell it to do this and it will do it, even if it's not, you know, the

[30:44.732]👩‍🎤 Ate-A-Pi: Mm-hmm.

[30:51.67]👩‍🎤 Ate-A-Pi: I see.

[30:53.259]🗡️ vish: Mm-hmm.

[31:03.606]👩‍🎤 Ate-A-Pi: Mm-hmm.

[31:12.198]🐱 Peggy: best version, like it's too nice or too censored or whatever, it still does it most of the time. And if you prompt it to do it again, it'll actually do it. So what we've found, and we're still experimenting, is basically using a version of Mistral, an uncensored Mistral, but then using a wrapper like GPT-3.5, which still does instruction following pretty well, and

[31:17.236]👩‍🎤 Ate-A-Pi: Mm-hmm.

[31:40.878]🐱 Peggy: massaging the output into the correct JSON format, or whatever format we need to actually make it work with our game engine. So at this rate, we already have multiple layers of LLMs. It's chaining together prompts, which is why I think frameworks like LangChain have become really popular. You basically have to create this whole system of prompt engineering: prompts on top of prompts on top of prompts, and then

[31:44.728]🗡️ vish: Hmm

[31:55.795]👩‍🎤 Ate-A-Pi: Right.

[32:08.794]🐱 Peggy: memory, which goes to a database; we use Supabase as a backend, right? It's just this whole super complicated system.

[32:10.258]👩‍🎤 Ate-A-Pi: Yeah.
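The layered setup Peggy describes, an uncensored model for in-character text and GPT-3.5 as a wrapper that massages it into JSON, can be sketched as a two-stage chain with validation and retries. The model calls are hypothetical callables (no real API is used), the JSON fields `speaker`, `line`, and `action` are invented for illustration, and a plain list stands in for the Supabase-backed memory.

```python
import json
from typing import Any, Callable, Dict, List, Optional

LLM = Callable[[str], str]  # hypothetical: prompt in, completion text out
REQUIRED_KEYS = {"speaker", "line", "action"}


def parse_reply(raw: str) -> Optional[Dict[str, Any]]:
    """Return the object if raw is JSON containing the required keys, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not REQUIRED_KEYS.issubset(obj):
        return None
    return obj


def npc_turn(character: str, situation: str, creative_llm: LLM,
             formatter_llm: LLM, memory_log: List[Dict[str, Any]],
             retries: int = 2) -> Dict[str, Any]:
    # Stage 1: free-form, in-character text from the roleplay model.
    draft = creative_llm(f"You are {character}. {situation}\nRespond in character.")
    # Stage 2: an instruction-following model reshapes it into engine JSON.
    prompt = ('Rewrite the following as JSON with keys "speaker", "line", "action" '
              f"(speaker is {character}):\n{draft}")
    for _ in range(retries + 1):
        reply = parse_reply(formatter_llm(prompt))
        if reply is not None:
            memory_log.append({"character": character, **reply})  # stand-in for a DB insert
            return reply
        prompt += "\nThat was not valid JSON with the required keys. Try again."
    # Fall back to an idle action instead of stalling the game loop.
    return {"speaker": character, "line": draft, "action": "idle"}


if __name__ == "__main__":
    replies = iter(["Sure! Here you go...",
                    '{"speaker": "Shrek", "line": "Get out of my swamp!", "action": "point"}'])
    log: List[Dict[str, Any]] = []
    result = npc_turn("Shrek", "Big Bird wanders into the swamp.",
                      creative_llm=lambda p: "Oi! Get out of my swamp!",
                      formatter_llm=lambda p: next(replies),
                      memory_log=log)
    print(result["action"], len(log))  # prints "point 1"
```

Splitting the work this way lets each model do what it is good at: the roleplay model never has to follow a format, and the formatter never has to stay in character. The cost is an extra model call per line, which matters for the latency concerns raised above.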
