Starborn is a short interactive sci-fi story that uses a simple keyword interface instead of a standard IF parser. It was made as a speed-if for this year's New Year's Speed-if event on ifMUD. It was released to the public about a week ago and at first gained the usual amount of interest: the logs show 7 online plays on the release day and fewer on the next two days.
Last Wednesday, however, C.E.J. Pacian wrote a blog post about it, which was picked up by the IndieGames.com blog. From there a couple of people tweeted about it, and suddenly the play count jumped to about 600 plays per day. The main Parchment site reported similar amounts of traffic.
I'm running the online version of the story on a modified version of Parchment that saves the transcripts to the server. I now have 1557 transcripts consisting of 23 896 turns (and counting, but the traffic is slowing down considerably). To my knowledge this is more play data collected than for any other single IF game so far.
So here's a random bunch of analysis from those transcripts. There are no spoilers other than keywords used throughout the game, but if you haven't played yet the data would probably make more sense if you at least checked out a couple of turns just to see what the parser looks like. Online version and downloads are available.
Parser errors
Since the gameplay consists of typing single keywords given by the story, the share of understood input was high (93%). Most of the errors were typos (53.3% of all invalid input).
Type | Amount | % of all commands | % of invalid input |
---|---|---|---|
Typo | 899 | 3.8% | 53.3% |
IF command | 491 | 2.1% | 29.1% |
Sign of confusion | 79 | 0.3% | 4.7% |
Poke | 73 | 0.3% | 4.3% |
Gibberish | 69 | 0.3% | 4.1% |
Non-keyword | 45 | 0.2% | 2.7% |
Swear word | 32 | 0.1% | 1.9% |
total | 1688 |
Standard IF commands were disabled, so they count as unrecognized commands. The "sign of confusion" category contains commands like HUH, ?, WHERE, or verb commands that don't follow the standard IF convention (I GO OUT). A "poke" is the player testing the limits of the parser by trying to communicate with it (HELLO, BYE, OH NO) or trying to do things out of character, mostly involving trying to kill poor Kevin (DIE, KILL COUSIN), and even testing whether the prompt recognizes Unix commands (LS, CD ..). "Gibberish" is a random string of letters or arbitrary words that don't appear anywhere in the story text (WALRUS, AWESOME). Non-keywords are words that appear in the story text but aren't keywords that you can choose (MOM, HOME).
command | # | % of errors | % of all commands |
---|---|---|---|
look | 109 | 6.5% | 0.46% |
foucalt | 50 | 3.0% | 0.21% |
exit | 42 | 2.5% | 0.18% |
something | 36 | 2.1% | 0.15% |
excercise room | 31 | 1.8% | 0.13% |
docking bay | 20 | 1.2% | 0.08% |
starborn | 18 | 1.1% | 0.08% |
go | 18 | 1.1% | 0.08% |
? | 18 | 1.1% | 0.08% |
dr.strepke | 15 | 0.9% | 0.06% |
turn on music | 14 | 0.8% | 0.06% |
shuttleport | 14 | 0.8% | 0.06% |
leave | 13 | 0.8% | 0.05% |
megellan | 13 | 0.8% | 0.05% |
open door | 13 | 0.8% | 0.05% |
walk | 12 | 0.7% | 0.05% |
m | 12 | 0.7% | 0.05% |
open | 11 | 0.7% | 0.05% |
cousins | 11 | 0.7% | 0.05% |
dock | 11 | 0.7% | 0.05% |
weighlessness | 11 | 0.7% | 0.05% |
turn music on | 10 | 0.6% | 0.04% |
back | 10 | 0.6% | 0.04% |
excercise equipment | 10 | 0.6% | 0.04% |
Sticking with it to the end
About 11% of players quit without typing a single command, and another 11% quit after their first command. Roughly a quarter (26%) kept playing until the game's conclusion.
Outcome | Transcripts | % |
---|---|---|
Loaded the story, didn’t play | 170 | 11% |
Quit after first turn | 177 | 11% |
Played more than one turn, not to completion | 805 | 52% |
Played to completion | 405 | 26% |
total | 1557 |
What happens when the story tells the player that the input wasn't understood? In this case the game tells the player up front what kind of input it accepts. The parser can complain about the input only if the player disregards or misunderstands the instructions, makes a typo, or if the game has a bug that doesn't parse valid input correctly. There is a bug in Starborn where commanding MAGELLAN in one room gives a frustrating "Which do you mean, Magellan or the Magellan?" disambiguation question, but other than that the keyword system is immune to implementation issues where input consistent with the game's rules is disregarded because the author didn't implement a correct response.
Player-made errors such as typos can affect the player's experience just as much as the game's shortcomings. If you aren't able to make progress in the game, no matter the reason, you're likely to become frustrated and quit. I took a look at what happened when the player gave invalid input on the first turn. I expected that those players would be more likely to quit immediately than those who gave a valid command, but it turned out that there was no significant difference in that regard. They were, however, more likely to stop playing later on before reaching the end.
Outcome | Parser error on turn 1 | % | Valid command on turn 1 | % |
---|---|---|---|---|
Quit immediately | 24 | 10.7% | 153 | 13.2% |
Kept playing, but not to completion | 158 | 70.5% | 647 | 55.6% |
Kept playing and finished the game | 42 | 18.8% | 363 | 31.2% |
total | 224 | | 1163 | |
The reason for this might be that people who are prone to making typos are more likely to become frustrated when the story keeps stalling. This theory is supported by the fact that people who didn't complete the story gave almost twice as many invalid commands as people who completed it (as a proportion of their total input; transcripts with fewer than 10 moves are not counted).
Outcome | Invalid commands (% of all input) |
---|---|
Game completed | 5.3% |
Game unfinished (at least 10 turns) | 10.1% |
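For anyone who wants to check the "no significant difference" remark against the first-turn table, here is a minimal sketch using a two-proportion z-test. The choice of test is my assumption for illustration and the function name is made up; the counts are the ones from the table.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Return the z statistic for the difference between two proportions."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the assumption that both groups behave the same.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error

# Group A: parser error on turn 1 (224 transcripts).
# Group B: valid command on turn 1 (1163 transcripts).
z_quit = two_proportion_z(24, 224, 153, 1163)      # quit immediately
z_finished = two_proportion_z(42, 224, 363, 1163)  # finished the game

print(f"quit immediately:  z = {z_quit:.2f}")
print(f"finished the game: z = {z_finished:.2f}")
```

With the usual 5% threshold (|z| above roughly 1.96), the "quit immediately" difference stays below the threshold at about -1.0, while the "finished the game" difference is well above it at about -3.8, which matches the reading of the table above.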
The first turn
The story starts with short instructions on how to play.
The story you are about to play uses a keyword interface. Whenever you see a word in upper case you can type it on the command prompt to advance the story. If you prefer a different way of showing the keywords, command SETUP to change the settings.
Whenever you wish to see the list of currently available keywords, type L or just press enter.
Please press SPACE to continue.
After pressing space the screen is cleared and the game starts.
by Juhana Leinonen
It's quiet, as always. I've turned the MUSIC off so that it doesn't bother my COUSIN's sleep. A long day of travel and WEIGHTLESSNESS has exhausted him.
The door leads out to the HALLWAY.
At this point the player has four keywords to choose from. I expected the first keyword to be picked most often as the first command and the last keyword second most often, and the transcripts show that the assumption was correct.
First command | # | % of commands |
---|---|---|
MUSIC | 551 | 40% |
COUSIN | 112 | 8% |
WEIGHTLESSNESS | 109 | 8% |
HALLWAY | 184 | 13% |
L or enter | 140 | 10% |
IF command | 130 | 9% |
SETUP | 49 | 4% |
gibberish | 25 | 1.8% |
typo | 22 | 1.6% |
multiple keywords | 20 | 1.4% |
ABOUT, HELP, CREDITS | 12 | 0.9% |
other | 33 | 2.4% |
total | 1387 |
What wasn't as expected was that some people (20 out of the total, that's more than one percent) typed every available keyword at the same time ("MUSIC COUSIN WEIGHTLESSNESS HALLWAY"). Other typical mistakes were typos and assuming that the keyword in "...my COUSIN's sleep" is "COUSIN'S" including the apostrophe and the "s". Presumably out of habit the players would also try the usual IF commands (X ME, TURN MUSIC ON).
Category "other" includes, among others, choosing words from the text that weren't keywords, swearing, and trying to communicate with the parser (HI, OK, LOL).
Leading the player
I assumed that if choosing a keyword leads to a description that contains exactly one new keyword, the player is likely to pick that keyword instead of any of the earlier, still unexplored ones. For example, if the player chooses the keyword MUSIC in the intro, the description will be:
The computer has a comprehensive library of movies, music, books and games. They have been just about the only available entertainment on the MAGELLAN.
The transcripts reveal that the most common command was indeed MAGELLAN, although I expected the figure to be much higher.
Command after MUSIC | # | % |
---|---|---|
MAGELLAN | 344 | 69% |
previous keyword | 140 | 28% |
other | 18 | 4% |
total | 502 |
By default the story shows available keywords in the story text in upper case, so about one sixth of the commands were typed in upper case as well.
Input case | # | % |
---|---|---|
All lower case | 17652 | 74% |
All upper case | 4150 | 17% |
Mixed case | 672 | 3% |
Other | 1422 | 6% |
total | 23896 |
("Other" includes empty input, numbers, and other input that contains no letters.)
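A breakdown like this is easy to reproduce from raw commands. The following is only a sketch of one way to do the bucketing, with a made-up function name and sample commands; it treats anything without letters as "other".

```python
from collections import Counter

def classify_case(command: str) -> str:
    """Bucket a command by the case of its letters, ignoring everything else."""
    letters = [ch for ch in command if ch.isalpha()]
    if not letters:
        return "other"  # empty input, numbers, punctuation only
    if all(ch.islower() for ch in letters):
        return "lower"
    if all(ch.isupper() for ch in letters):
        return "upper"
    return "mixed"

# Hypothetical sample input, not taken from the real transcripts.
sample_commands = ["music", "MAGELLAN", "Cousin", "", "42", "?", "dr.strepke"]
counts = Counter(classify_case(command) for command in sample_commands)
print(counts)
```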
As the game never instructs the player that they can type only one of the words in a two-word keyword, it wasn't surprising that the players were generally inclined to type the entire name of the keyword.
Keyword | Both words written | One word written | % of one word input |
---|---|---|---|
SICK BAY | 669 | 15 | 2% |
DOCKING STATION | 1533 | 134 | 8% |
TICKET MACHINE | 517 | 47 | 8% |
Typos
How many ways can you write WEIGHTLESSNESS? At least 20 different ways:
weghtlessness | weifhtlessness | weighlessneess | weighlessness |
weightessness | weightkessness | weightlesness | weightless |
weightlessneess | weightlessnes | weightlessness | weightlesssness |
weightltssness | weigtlessness | weitghlessness | wheightlessness |
wieghtlessness | wiehgtlessness | wightlessness | wwightlessness |
Because you can never take into account every possible way a word could be typoed, it'd be great if the parser could just go through the available vocabulary and choose the closest match to the player's input. You could then have the parser understand the most outrageous misspellings the players could come up with. This game is an exception of sorts because all available keywords (and therefore all acceptable commands) are at all times visible to the player so there's no need to be wary of the parser choosing a word that the player hasn't encountered yet, but even in a traditional IF game you could have the parser go through only the words that have been previously shown to the player.
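As a rough sketch of the closest-match idea, the snippet below compares the input against the currently visible keywords using the Levenshtein (edit) distance. The function names, the keyword list and the cutoff of two edits are assumptions made for illustration, not how Starborn actually works.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def closest_keyword(typed, visible_keywords, max_distance=2):
    """Return the nearest visible keyword, or None if nothing is close enough."""
    typed = typed.strip().upper()
    best = min(visible_keywords, key=lambda kw: levenshtein(typed, kw), default=None)
    if best is None or levenshtein(typed, best) > max_distance:
        return None
    return best

# Keywords visible on the first turn.
visible = ["MUSIC", "COUSIN", "WEIGHTLESSNESS", "HALLWAY"]
print(closest_keyword("weighlessness", visible))
print(closest_keyword("cousin's", visible))
print(closest_keyword("walrus", visible))
```

The cutoff matters: without it, gibberish like WALRUS would always be "corrected" to some keyword. A fixed limit of two edits is arbitrary; scaling it with the length of the keyword might work better.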
Further steps
Just fixing the 30 most common unrecognized commands would fix one third of all unrecognized input. The cases where the player types more than one keyword at a time or tries to use a standard IF command can be handled with a better error message, something like "Please type keywords only, one at a time. Currently available keywords are ..."
Analyzing the transcripts is great for finding out which unimplemented commands need attention or what synonyms to add, but it's by no means a replacement for traditional testing: the transcripts don't tell you how good the story is, whether the puzzles work, whether the pacing is right, whether there are spelling errors in the text... The benefit of data collection from a live release is that the data represents the actual audience. The beta testers are probably on average more experienced players than people playing the game after its release.
It'd be great to have a similar volume of data from a traditional parser IF game so that the results would be more generally applicable. Even a couple of transcripts from a lot of different games would probably show interesting results.
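Here is a small sketch of how that error message could be triggered. The verb list, the function name and the fallback message are all placeholders I made up for the example; only the hint text comes from the suggestion above.

```python
# Assumed list of common IF verbs; a real game would need a fuller one.
IF_VERBS = {"LOOK", "X", "EXAMINE", "GO", "OPEN", "TAKE", "TURN", "WALK", "EXIT", "LEAVE"}

def respond(command, visible_keywords):
    """Return (keyword, None) on a match, or (None, message) otherwise."""
    text = " ".join(command.upper().split())  # normalize case and spacing
    if text in visible_keywords:
        return text, None
    words = text.split()
    mentioned = [kw for kw in visible_keywords if kw in text]
    # Several keywords at once, or input that starts like a verb command.
    if len(mentioned) > 1 or (words and words[0] in IF_VERBS):
        hint = ("Please type keywords only, one at a time. "
                "Currently available keywords are " + ", ".join(visible_keywords) + ".")
        return None, hint
    return None, "That's not a keyword I recognize."  # placeholder fallback

visible = ["MUSIC", "COUSIN", "WEIGHTLESSNESS", "HALLWAY"]
print(respond("music", visible))
print(respond("MUSIC COUSIN WEIGHTLESSNESS HALLWAY", visible)[1])
print(respond("turn music on", visible)[1])
```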
(Updates: follow-up after an update; the recording plugin for Parchment released)
I wonder if it would be possible to have a more accurate typo-fixer than Aaron's extension. For a keyword based approach like this the dictionary would be so small that calculating the Levenshtein distances shouldn't take too long at all. For a fuller IF work we'd need some way to filter the dictionary first. I don't know how to do that though...
I wonder if there's some way to make a good guess of the percentage of prior IF players from these numbers?
George: Considering that almost 10% of players tried a "normal" IF command right after being told that the story uses a keyword parser, I'd guess that the percentage is quite high. (Of course some people might have just tried a verb command even though they weren't familiar with IF before.)
Oh, and just to be precise, most of the stats I counted were for your site, as it still uses my proxy. I'm not sure how many hits there were on parchment.googlecode.com.
How IF is played...
With Starborn, one of the first games of 2011, author Juhana Leinonen has done some statistics, analyzing over 1,500 transcripts that were recorded through the online version of the game. It is not a typical game, however, since...
Maybe something like Inform's disambig stuff, where it takes a decent guess at what you meant and tells you what correction it's made?
FOUCALT
(I assume you meant FOUCAULT)
That way, the player is made aware that they've made a mistake (and will hopefully learn to improve their spelling :) ), but it doesn't stall their gameplay.
Thank you so much for this, J. It's a very interesting analysis that illuminates not only some interesting perspectives on how people approach IF play, but also the value of embracing Parchment.
Hi,
Excellent stuff, this is.
I'm working on a game prototype that would really profit from a Parchment hack that would enable saving transcripts of a game session to the server. Could you consider sharing the source code of your hack with me?
Antti: Sure. I'm moving to making a proposal of integrating this to the official Parchment branch, but in the meanwhile I can package what I have and send that. Might take a day or two while I put everything together...