Shark still looks fake

This is the third article in the Back to the Future theme week series.


A holographic shark in the future

The parser is a curious piece of technology. Since its first appearance about 40 years ago it has survived almost unchanged to this day, apart from superficial improvements.

Parsers come in two generations of sophistication. The first generation is the now practically extinct two-word parser that only understands commands in the form of VERB or VERB NOUN. The second generation is the modern parser that understands VERB NOUN PREPOSITION NOUN and VERB [NOUN PREPOSITION] TEXT where "text" is freeform input.
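To make the difference between the two generations concrete, here is a minimal sketch in Python. The function names and the tiny preposition list are invented for illustration and don't come from any real development system. The sketch also skips the freeform TEXT form, articles and multi-word nouns.

```python
# Hypothetical preposition list; a real system would know far more words.
PREPOSITIONS = {"in", "on", "with", "under", "to"}


def parse_two_word(command):
    """First generation: accepts only VERB or VERB NOUN."""
    words = command.lower().split()
    if len(words) == 1:
        return {"verb": words[0]}
    if len(words) == 2:
        return {"verb": words[0], "noun": words[1]}
    return None  # empty or longer input is rejected


def parse_second_generation(command):
    """Second generation: additionally accepts VERB NOUN PREPOSITION NOUN."""
    simple = parse_two_word(command)
    if simple is not None:
        return simple
    words = command.lower().split()
    if len(words) == 4 and words[2] in PREPOSITIONS:
        return {
            "verb": words[0],
            "noun": words[1],
            "preposition": words[2],
            "second_noun": words[3],
        }
    return None  # still a fixed grammar: anything else is not understood
```

Both functions only match fixed word patterns, which is why neither has any hope with a sentence the author didn't anticipate.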

The third generation, which we don't yet have, is a parser that understands any reasonable input and is able to transform the player's intent into lower-level tokens for the story engine. For example, a third-generation parser would know how to interpret the intent from OK LET'S TAKE A PEEK INSIDE THAT MAILBOX NOW and tokenize it into [LOOK IN] [MAILBOX], all without the author having to anticipate and write grammar rules for that specific input or even those specific words.

The good news is that we probably have the technology to make a third-generation parser, at least to a reasonable extent. It would solve a big part of the age-old tutorial problem and remove many of the frustrations associated with the parser. It would take a dedicated team some serious effort and university-level research, but it's certainly doable.

The bad news is that there's another piece of the puzzle that needs to complement the parser, or that kind of sophistication would go completely to waste. It's not enough for the story to understand input; it also has to respond to it appropriately.

Let's look at an example. In the past a relatively common complaint was that the parser didn't understand commands with adverbs, like OPEN DOOR CAREFULLY. In reality patching the parser to recognize adverbs is trivial in most modern development systems. A story that "understands" adverbs would be expected to respond to input something like this:

CLOSE DOOR

You close the door.

CLOSE DOOR QUIETLY

You close the door, careful not to make a sound.

CLOSE DOOR AGGRESSIVELY

You slam the door closed!

Where are those different responses coming from? It's not the parser that spawns them. Someone has to write the text, either as default responses to the CLOSE DOOR [ADVERB] action or as custom responses for interacting with this specific door.

Realistically you'd group synonymous and closely related adverbs together, ending up with maybe 3-5 separate adverb groups. Even in the best case scenario you'd have to write up to five extra custom responses for every action to take into account all reasonable user commands.
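The grouping could be sketched roughly like this in Python. The group names, the adverbs in each group and the function names are all made up for the example, and only two groups are shown. The response texts are the ones from the door transcript earlier.

```python
# Hypothetical adverb groups: each set lists adverbs treated as equivalent.
ADVERB_GROUPS = {
    "quiet": {"quietly", "silently", "softly", "carefully"},
    "forceful": {"aggressively", "violently", "angrily", "forcefully"},
}

# One hand-written response per group, plus the plain response under None.
DOOR_RESPONSES = {
    None: "You close the door.",
    "quiet": "You close the door, careful not to make a sound.",
    "forceful": "You slam the door closed!",
}


def adverb_group(adverb):
    """Return the name of the group the adverb belongs to, or None."""
    for group, members in ADVERB_GROUPS.items():
        if adverb.lower() in members:
            return group
    return None


def close_door(adverb=None):
    """Pick the response for CLOSE DOOR [ADVERB]."""
    group = adverb_group(adverb) if adverb else None
    # Unknown adverbs silently fall back to the plain response here.
    return DOOR_RESPONSES.get(group, DOOR_RESPONSES[None])
```

The lookup itself is trivial. Every entry in DOOR_RESPONSES still has to be written by hand, and the same goes for every other action in the story.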

After the author has done all that work, the end result is perhaps mildly interesting for the player to explore but still has no real meaning within the story. It's not really worth the huge amount of extra work just to acknowledge the adverbs the player has used by varying the game's responses. If you drop a glass violently instead of carefully, shouldn't it break? Shouldn't NPCs react differently if you talk to them amicably or aggressively? How should the parser respond if the command is nonsensical, like CLOSE DOOR LOVINGLY?

Any adverb-aware system that had any real effect on the gameplay would suffer from a combinatorial explosion of both the extra responses that would need to be written and the results of actions that it would have to take into account. Making such a game wouldn't be beyond imagination, but it would practically require dedicated effort from a full-time team. Apart from trivially short works it wouldn't be feasible for a solo hobbyist.
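A back-of-the-envelope calculation shows how quickly the writing workload grows. The function below is only a rough sketch: it assumes every existing response needs one variant per adverb group, and the story sizes in the usage example are invented numbers, not measurements from any real game. It counts text alone and ignores the changed outcomes (broken glasses, offended NPCs) that would also need to be modeled.

```python
def responses_needed(default_actions, custom_responses, adverb_groups):
    """Count the responses to write if each one needs a variant per adverb group.

    default_actions:  actions that have a default response
    custom_responses: object-specific responses the story has on top of those
    adverb_groups:    number of adverb groups the story recognizes
    """
    variants = adverb_groups + 1  # the plain response plus one per group
    return (default_actions + custom_responses) * variants


# Hypothetical story: 30 default actions and 100 custom responses.
without_adverbs = responses_needed(30, 100, 0)  # 130
with_adverbs = responses_needed(30, 100, 5)     # 780
```

With five adverb groups the same hypothetical story needs six times as much response text.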

Doc Brown wearing a "mind reading device" on his head

"Do you know what this means? It means that this damn thing doesn't work at all!"

To further illustrate why the parser and prose are inseparable, consider a story whose parser has a human-level understanding of language but whose story engine isn't sophisticated enough to deal with the input.

First a neutral command. This is what you'd generally see in any standard parser game.

TAKE ALMANAC

You reach for the almanac very carefully, holding your breath so that Mr. Strickland won't notice you...

Now imagine a more complex command:

DASH TOWARDS THE ALMANAC AND GRAB IT QUICKLY

The story's prepared response to a neutral TAKE ALMANAC command would be almost completely the opposite of what the player's intention is. If the story engine ignores everything in the command except the basic intent, the result is this:

DASH TOWARDS THE ALMANAC AND GRAB IT QUICKLY

You reach for the almanac very carefully, holding your breath so that Mr. Strickland won't notice you...

The effect is jarring, and because the real command is masked, it looks like the parser has almost completely misunderstood the player's intent.

Another option would be to communicate the lower level command to which the parser reduces the original command.

DASH TOWARDS THE ALMANAC AND GRAB IT QUICKLY

[–> TAKE ALMANAC]
You reach for the almanac very carefully, holding your breath so that Mr. Strickland won't notice you...

This would justify the discrepancy between the input and the response, but exposing the internal workings of the parser would make the complex parser even more of a gimmick. Once the player notices the pattern there's no point in continuing to write the "natural" phrases when it's obvious that they're just going to be reduced to the bare minimum. (As a teaching device it wouldn't be that bad though.)
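The echo approach could look something like the following Python sketch. The names are hypothetical, and the hard part is deliberately left out: the reduced (VERB, NOUN) pair is assumed to arrive from a smarter parser that doesn't exist yet.

```python
# Hypothetical canned response, keyed by the reduced command.
RESPONSES = {
    ("TAKE", "ALMANAC"): (
        "You reach for the almanac very carefully, holding your breath "
        "so that Mr. Strickland won't notice you..."
    ),
}


def respond(raw_input, reduced):
    """Build the story's output for a command.

    raw_input: what the player typed
    reduced:   the (VERB, NOUN) pair the parser is assumed to have produced
    """
    canonical = " ".join(reduced)
    lines = []
    # Echo the reduced command only when it differs from what was typed.
    if raw_input.strip().upper() != canonical:
        lines.append(f"[-> {canonical}]")
    lines.append(RESPONSES[reduced])
    return "\n".join(lines)
```

Called with DASH TOWARDS THE ALMANAC AND GRAB IT QUICKLY and the pair ("TAKE", "ALMANAC"), it prints the bracketed echo line followed by the careful response. The adverb and everything else the player typed never reach the response text.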

The third option is to discard any command that conflicts with what the story is expecting, but that would not end well. It would lead to either horrible guess-the-verb-and-adverb puzzles or incredible frustration when the parser seemingly understands the basic intent but refuses to carry out the action.

The final option is to write separate responses for every type of intent, but this has the same problems as mentioned above, most notably the multiplied effort required to write the text and test all the combinations.

To sum it up: A system with a mismatch between the capabilities of the parser and the capabilities of the engine is not viable. The two are intrinsically linked. We're in a situation where improving the parser requires advancements in technology in several areas, both in understanding the input and generating the response. A smart parser is somewhat feasible, but only if the content generation problem is solved.

If all this sounds too pessimistic, fear not! Tomorrow we'll explore some untapped potential that could already be available with the tech we have now.

