Audio puzzles are often divisive: many players get stuck and cannot get past them. However, I believe this approachability problem lies in the design of existing implementations, not in audio puzzles as a concept.
In this article I address three key concepts for the design of approachable audio puzzles in games.
For a game to be fun the player needs to know how to play it. Thus games have established common design patterns, such as:
The challenge in designing good audio puzzles is that there are no established tropes. They rarely feature in games, and perceiving differences in the timbre and pitch of sounds is not a common skill in Western cultures in general.
Thus the first principle of designing good audio puzzles is:
Do not assume any base skill from the player.
Many audio puzzles introduce tasks that are too difficult too quickly, for example the piano puzzle in Myst. If you are not familiar with it, the puzzle works like this:
For someone with zero background in music this requires far too many leaps of logic in one go.
Solving audio puzzles involves many separate skills:
Instead of expecting the player to learn all of this at once:
Introduce concepts gradually over multiple puzzles of growing complexity.
A game that demonstrates this idea is The Witness: it introduces its audio puzzles in a very simple form and then makes them progressively more complex.
The following video is a good analysis of how the game works:
Using that principle, the Myst piano puzzle could be improved as follows:
These could also be shown to the player by an NPC in the game interacting with the puzzle in a cut scene.
It is worth considering how games communicate with their players, and the impact that audio puzzles have on it. Games may communicate:
The majority of games rely almost exclusively on visual communication, and many games that use audio puzzles are no exception.
Using audio puzzles in such a context is problematic because the game has been training the player to look for visual cues.
Suddenly expecting them to listen instead is jarring, as it breaks the player's mental model of how the game works. Games try to work around this with visual cues such as an in-world speaker, but it is still not ideal.
Thus I believe:
If they are used at all, audio puzzles must be a core element of the entire gameplay experience, building in complexity over the whole game.
By featuring numerous audio puzzles of increasing complexity, the points that I mentioned above happen naturally. Concepts can be introduced in a simple form, and progressively develop with the player's skill.
Very few people are actually 'tone deaf'; most simply lack training. By analogy, if you haven't learned to read Chinese, it looks like a random mess of squiggles. Ear training works the same way.
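As a rough illustration of what "increasing complexity" could mean in practice, here is a minimal sketch of a puzzle progression. Everything in it is an assumption made for illustration, not something taken from an existing game: the `PuzzleLevel` type, the example `LEVELS`, and the choice to measure difficulty by sequence length and by the smallest pitch gap between consecutive notes. The only standard part is the equal-temperament formula that converts a MIDI note number to a frequency.

```python
from __future__ import annotations

from dataclasses import dataclass

A4_HZ = 440.0   # standard concert pitch for A4
A4_MIDI = 69    # MIDI note number of A4


def midi_to_hz(note: int) -> float:
    """Equal-temperament frequency of a MIDI note number."""
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)


@dataclass
class PuzzleLevel:
    """Hypothetical puzzle: a note sequence the player must reproduce."""
    name: str
    notes: list[int]  # MIDI note numbers, in playing order


# Hypothetical progression: sequences get longer and pitch gaps get smaller.
LEVELS = [
    PuzzleLevel("two notes, an octave apart", [60, 72]),
    PuzzleLevel("two notes, a fifth apart", [60, 67]),
    PuzzleLevel("three notes, wide steps", [60, 67, 64]),
    PuzzleLevel("four notes, narrow steps", [60, 62, 64, 62]),
]


def smallest_interval(level: PuzzleLevel) -> int:
    """Smallest pitch gap, in semitones, between consecutive notes."""
    return min(abs(b - a) for a, b in zip(level.notes, level.notes[1:]))


def is_gradual(levels: list[PuzzleLevel]) -> bool:
    """True if no level is shorter, or has wider minimum gaps, than the one before it."""
    for prev, cur in zip(levels, levels[1:]):
        if len(cur.notes) < len(prev.notes):
            return False
        if smallest_interval(cur) > smallest_interval(prev):
            return False
    return True
```

A designer could run a check like `is_gradual(LEVELS)` over a planned puzzle list to catch a sudden difficulty spike before playtesting. Real difficulty also depends on timbre, tempo, and context, which this sketch ignores.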
By breaking things down and building complexity progressively, the player can learn the skills they need, without it being overwhelming:
And by using appropriate visual design, such as a musical instrument, a game can clearly communicate to the player the need to listen.
It would be interesting to see more creative uses of audio puzzles, as the possibility space remains largely unexplored. Good uses of them in games could go a long way towards dispelling much of the cultural misinformation about musical skill.