Cost of dialogues – part 2 (with video!)

In December, I wrote an entry here about the cost of dialogues in games, promising a follow-up once we'd figured out exactly how we were going to tackle this in Dragon Commander.

The problem we needed to solve: because Dragon Commander features tons of choices & consequences, it also features a veritable avalanche of dialogue that somehow needs to be presented to our players.

In our dream scenario, all of this dialogue is fully animated and voiced, but because we’re dealing with several hours of dialogue in multiple languages, the cost of doing this is quite high and it’d actually be insane to animate it all.

So after long discussions and deliberations, we decided to do the insane thing.

Was this really such a wise idea?

Here’s a video of something that transpired in the Larian offices a month ago. I suggest you have a look at it and then read on…

As you can see, we bought a facial capture system – I guess it’s obvious from the video that some in our team were quite excited about this momentous event ;)

Our plan of approach is to put those cameras you see in the video in the voice recording booth, and then use this captured data to put emotion in the faces of our 3D protagonists and antagonists.

This facial capture data will then be overlaid on top of motion captured body animations (we also have a motion capture system in the office), and the end result should be believable dialogues when talking to all of the characters in Dragon Commander.

At least that’s the plan.

The decision to do it this way came after checking plenty of other solutions, ranging from trying to set something up ourselves with Kinect devices (cheapest) to hiring simultaneous body & facial capture studios (most expensive).

The latter had prices in the range of $1,000 to $2,000 per minute, which would have cost us between $0.5M and $1M. I actually contemplated this for some time, but then decided against it. I figured that in the end we'd be best served by coming up with a homebrew solution, even if that causes a bit more pain and might not give us the highest-quality results in the short term.
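For those who want to check the math, here's the back-of-the-envelope version in a few lines of Python. The 500 minutes is simply what the quoted totals imply (roughly eight hours) – it's an illustration, not an exact figure from our scripts.

    # Back-of-the-envelope for the outsourced capture option.
    # The minute count is inferred from the totals above, not a hard figure.
    minutes_of_dialogue = 500            # roughly eight hours, assumed
    rate_low, rate_high = 1000, 2000     # USD per captured minute

    print(minutes_of_dialogue * rate_low)    # 500000  -> ~$0.5M
    print(minutes_of_dialogue * rate_high)   # 1000000 -> ~$1M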

My thinking was that for whatever game we do, we’ll always need to hire voice actors, so in all cases that’s a cost we’ll have to carry. Now, while they are acting, they are actually generating the data we need – we just need the ability to extract that data and project it on 3D characters.

The equipment we bought allows us to record the facial marker data at 100 frames per second from seven directions. Should we discover for some reason that that's not enough, we can always add extra cameras, but from what we've seen so far, the raw data seems good enough to work with.

So if we organize ourselves such that for every future recording session, we record the facial expressions of the actors in addition to their voice, we should have sufficient base material to work from. Obviously, this does cause extra complications in the recording booth as we’re increasing actor/studio time and thus recording cost, but from the tests we’ve done, it looks like it should be manageable.

The real problem is mapping this data to our 3D characters, preferably in an automated fashion, and if that's not possible, at least in a semi-automated manner. The software solutions we have for the moment give us reasonable results, but it's clear that the current off-the-shelf solutions don't let us use all of the rich detail that's present in the raw data. Or in other words – when you look at the markers that were tracked, and then at how that translates into animations, you see there's a significant loss of data.
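To make that a bit more concrete, here's a simplified sketch of the general technique – illustration only, not our actual tools or pipeline. A common way to drive a face rig from optical data is to solve, frame by frame, for the blendshape weights that best reproduce the captured marker offsets. The detail loss I'm talking about is essentially the residual of that fit: anything the rig's shapes can't express gets thrown away.

    import numpy as np

    # Simplified retargeting sketch (illustration only, not our pipeline):
    # fit blendshape weights to captured marker offsets, one frame at a time.
    #
    # B:     (num_blendshapes, num_markers * 3) marker displacement each
    #        blendshape produces at full strength, measured on the neutral rig.
    # frame: (num_markers * 3,) captured marker offsets from the neutral pose.
    def solve_frame(B, frame):
        weights, *_ = np.linalg.lstsq(B.T, frame, rcond=None)
        return np.clip(weights, 0.0, 1.0)      # keep weights in rig range

    # Made-up example: 4 blendshapes, 30 markers, a pose mixing two shapes.
    rng = np.random.default_rng(0)
    B = rng.normal(size=(4, 90))
    frame = 0.7 * B[0] + 0.2 * B[3]
    weights = solve_frame(B, frame)
    print(weights)                              # roughly [0.7, 0, 0, 0.2]
    print(np.linalg.norm(B.T @ weights - frame))  # residual = detail the rig can't express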

Additionally, you also see that manual labor is required to fix wrong interpretations of the data. This is probably going to be the most expensive part of the entire operation, but it’s also the area that presents us with the largest opportunity to reduce costs, if we can be clever about it.

To be honest, I have no clue exactly how much we’ll be spending in the end on this, but I do know a couple of things already:

a)    Whatever result we get, it’s going to be better than what we did in Divinity II: The Dragon Knight Saga

b)    If we invest sufficient time into working on the mapping process, it can only get better, which will benefit not only Dragon Commander, but also our future games

Before concluding, there's one question I still need to answer: why did we decide to do the insane thing after all?

Some people commented that we shouldn't be doing this and should instead focus on the gameplay. My gut says they're right, but gameplay is something you create from many layers, and visualisation is definitely an important one. Specifically, in this particular game, I really think it's vital that we provide you with believable characters, and we should invest sufficient resources to try to achieve that.

This game is about you living the life of a Dragon Commander who's forging an empire. That means that in addition to giving you the experience of dealing with your troops and deciding on what strategy/tactics to use, we also want you to deal with politics, the media and whatever social life you have left.

To do that convincingly, we need characters that look lifelike enough. Hence, facial expressions, motion capture, full voice recordings, truckloads of cash ;)

Whether or not it'll be worth it, I'll only know when the results are in, but I have high hopes.

I’ll keep you posted on how it goes. If all goes well, the E3 footage should show you how it turned out, because in addition to showing off project E there, we’re also planning on showing the multiplayer of Dragon Commander, together with more details on everything you can do in the game.

Have a great weekend!

  • Arne

    Wow, things have definitely changed since the '90s. I wish you good luck with your new equipment. And happy Easter holidays!

  • Illusive Man

    Interesting.
    With the “Do it ourselves” approach, you’ll have to deal with the process Try / Fail / Retry / Fail / Retry / Almost succeed / Retry.
    The good part is that when you’ve done it once, next time it will be better, and better the time after.
    The bad part IMO is that it might push back Dragon Commander’s release until you’re “satisfied” with the result of your mocap / voice recordings.

    Anyway, can’t wait for some videos of DC and project E.

    • http://www.lar.net/ Swen Vincke

      That’s a very correct observation – I realize that on the quality front we might have to compromise, but I was indeed thinking that at some point we should learn how to do this anyway, so we might as well start now. The decision became a lot easier when, after I think two weeks of trials, I saw something that was most definitely better than what we had in DKS and actually showed emotion.

  • http://twitter.com/optitrack OptiTrack

    Great post and hilarious vid, guys.  Some tips, if you haven’t explored them already:

    We have a free plugin for streaming opticals into MotionBuilder, if that’s part of your pipeline.  Alongside the plugin, we have a “goody pack” that you can download and use as a basic guideline for driving bone-based face rigs via optical mocap data.  It contains sample data & scripts for use with the MoBu plugin.

    Both the plugin and the goody pack can be downloaded from our Expression download page (about halfway down):

    http://www.naturalpoint.com/optitrack/downloads/expression.html

    • http://www.lar.net/ Swen Vincke

      Happy to hear you guys have a sense of humor ;)  

      Yes, we are using MotionBuilder but found that the way it’s set up doesn’t map all of the detail that we can see is present in the optical data. That’s why we decided to add a few extra software layers to see if it’ll help – eyelids are one problem area, as are subtleties around the mouth.

      From what I could see, the optical data itself seems to be quite solid.     

  • Ryandann

    Didn’t get to watch the vid, due to my great iPhone service not wanting to load it, so this may have been addressed there:

    Will going this route mean cutting dialogue/choices and the outcomes to those choices? That was the give (face capture/full voice), so what was the take?

    • http://www.lar.net/ Swen Vincke

      I hope not. I think we would’ve had a hard time not fully voicing the entire game, and this entire set-up is us trying to use the data that is generated during the voice recording and map it automatically or semi-automatically to the animations of our characters.

  • qpqpqp

    Congrats, man! I am a stalwart walls-of-text stick-in-the-mud, but I can understand your reasoning perfectly well. Thus far, it seems impossible to get a fully animated game with the dialogue density of the classics, but you can try! And your trying will likely result in more sales, and better skills and resources for the future. Just like I’ve wasted a lot of time learning 3D modelling, given that I can draw a lot better and faster, but it’s a skill I can take advantage of later…

  • Arne

    Did any of the veterans at Larian ever imagine doing motion capture? Who knows what technology will be used in 10 years? By the looks of it there will always be enthusiasts to discover it, for many, many years to come. Good luck!

  • Farflame

    I’m a little surprised that you announced full voiceover. In the first post you leaned more towards “use the money for game features”, and most players supported that and hinted that you should use partial voiceover (with a focus on quality, not quantity). Does it mean that:
    - you have enough money now, so you don’t have to care so much about the cost?
    - you will shorten the text a little?
    - you changed priorities in production?
    - there is no room to add more features to the game, so the rest of the budget can go to voiceover?

    • http://www.lar.net/ Swen Vincke

      The voice budget isn’t that much of a problem – well, obviously we spend quite some money on it, but in the scope of things, it’s not the biggest problem. It can cost us up to the equivalent of one man-year per language, but remember that when we’re dealing with the development of games like DC, we’re used to thinking in many man-years.

      But accompanying the voice recording with animated characters, that can be an entirely different ballpark, because there your cost can easily run up to over 10 man-years, and that obviously has a bigger impact.

      What we’re trying to do is still get the animated actors, but at a fraction of the impact that we might have if we’d go for the brute force approach.

  • Florian Enders

    Now all you need is this engine for the rest of the body animations: http://www.naturalmotion.com/euphoria