Touch-based interfaces are still in sort of an embryonic state. We’re not quite sure exactly what works yet, so we try a lot of different approaches.

The aspect of the touch interface that still seems fraught with uncertainty is the role of gestures. Are they the input paradigm of the future? If so, how do we make them discoverable and intuitive? Few of us can afford commercials to explain our gestural interfaces.

Some developers have taken to creating in-app tutorials that greet their users on launch, but come on, that sucks. There’s nothing surprising or delightful about an instruction manual.

On one hand, you have people dedicated to and invested in the idea that gestural interfaces are the way of the future: a way to create super-rational interactions with a minimum of on-screen controls. On the other are people who believe gestures are opaque, lacking in discoverability, and hard to teach to users (particularly if you aren’t Apple).

Until recently, I hadn’t been able to take a position I felt comfortable with. One seemed too naive, too fetishistic, and the other too staunch and curmudgeonly. I want to see the envelope pushed; I just don’t want to have to five-finger pinch the little bastard open.

I believe that gestures are opaque and lacking in discoverability, yet I love many of the gestures in iOS. Swipes, pinches, zooms, taps, drags. I want to keep them all.

Reconciling those two beliefs seemed impossible until I really thought about the gestures that just feel right and what makes them feel that way.

The First Class

The gestures I have no desire to live without all seem to share a few traits. More than that, all those traits stem from one overarching similarity: direct manipulation of content. (It’s important when discussing gestures in this way to describe them in the context of the content they operate on.)

Swiping cells in a table view, pinch-zooming a photo, swiping photos in a scroll view; these are all gestures that we’ve come to know very well. It’s hard to remember a time when they didn’t feel right. Many people attribute that to Apple’s masterful demonstrations of how such gestures work, which on its face seems entirely reasonable (it’s likely what stopped me from thinking about this before). I think there’s more to it than that.

These gestures feel natural because they’re discoverable, and they’re discoverable because they have a gradation of feedback.

Imagine you’re in the iOS Photos app for the first time after taking a few photos. You tap on your camera roll and you’re presented with a grid of photos. Putting your finger down on a photo causes it to dim slightly, instant feedback to your tap.

You move your finger a little more and the view starts to move up and down, tracking your finger. A series of successive changes in response to your gesture. Specifically, changes in the position of the on-screen (and off-screen) content. Gradation of feedback.

You are now equipped to make a reasonable assumption about what a faster swipe might do. You try it, and it sends photos scrolling past your finger. This is what I will call a First Class Gesture.
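To make the three stages concrete, here is a minimal sketch of that gradation as a toy model. It is not how the Photos app or any iOS scroll view is actually implemented; the class, the method names, and the friction value are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PhotoGrid:
    """Toy model of a scrollable photo grid (hypothetical, not an iOS API)."""
    offset: float = 0.0    # scroll position, in points
    dimmed: bool = False   # is the touched photo dimmed?
    velocity: float = 0.0  # leftover scroll speed after release, points/sec

    def touch_down(self) -> None:
        # Stage 1: instant feedback the moment the finger lands.
        self.dimmed = True
        self.velocity = 0.0

    def drag(self, delta: float) -> None:
        # Stage 2: the content tracks the finger one-to-one.
        self.dimmed = False
        self.offset += delta

    def release(self, finger_velocity: float) -> None:
        # Stage 3: the content keeps whatever speed the finger had.
        self.velocity = finger_velocity

    def step(self, dt: float, friction: float = 0.9) -> None:
        # Coast one frame, then slow down. The friction value is arbitrary.
        self.offset += self.velocity * dt
        self.velocity *= friction


def coast_distance(finger_velocity: float, frames: int = 60, dt: float = 1 / 60) -> float:
    """Touch, drag 10 points, release at the given speed, and coast."""
    grid = PhotoGrid()
    grid.touch_down()
    grid.drag(10.0)
    grid.release(finger_velocity)
    for _ in range(frames):
        grid.step(dt)
    return grid.offset
```

Every stage responds in proportion to what the finger does, so a faster release always coasts further than a slower one. That proportionality is what lets a first-time user guess the outcome of a flick before ever trying it.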

The Second Class

Anointing a group of gestures as “First Class” implies that there’s at least a second class. These are the gestures that shouldn’t be used as a primary interface mechanism. Second Class Gestures seem disconnected from their resulting action or state change. They seem arbitrary and opaque in much the same way as keyboard shortcuts on the desktop. They usually require a reference or tutorial to learn, and even then only sink in after some amount of repetition.

Some examples are swipe-to-delete in Mail, the swipes left and right to reveal related tweets and conversations in Tweetbot, and double-taps and triple-taps to perform ancillary actions on buttons or views.

Many of these gestures don’t fail across the board with regard to the criteria for first class gestures. Most of them (when implemented correctly) show adequate feedback and even a meaningful gradation of that feedback.

But they do all lack one thing: a meaningful relation to the content. Directly manipulating content to move it or zoom it is obviously directly related to its position onscreen. Swiping a cell that represents a tweet to see its related ones is arbitrary. There’s nothing in that swipe that really means “show me related content”. (I don’t mean to pick on Tweetbot too much; it’s just a nice, familiar example. It’s actually a really nice app.)

These gestures aren’t necessarily “bad”. They’re just not an adequate primary input method. They’re learnable shortcuts that should have obvious UI alternatives but they should by no means be an exclusive means of input.
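One way to hold yourself to that rule is to list every action in your app alongside the ways a user can reach it, then flag anything reachable only by gesture. The sketch below is just that idea in a few lines; the `Action` type, its field names, and the sample actions are hypothetical, not taken from any real app or framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """An app action and the ways to trigger it (hypothetical model)."""
    name: str
    visible_controls: List[str] = field(default_factory=list)   # buttons, menu items
    gesture_shortcuts: List[str] = field(default_factory=list)  # second class gestures


def gesture_only(actions: List[Action]) -> List[str]:
    """Names of actions that have a gesture shortcut but no visible control."""
    return [
        a.name
        for a in actions
        if a.gesture_shortcuts and not a.visible_controls
    ]


# Made-up inventory for illustration only.
actions = [
    Action("delete_message", visible_controls=["Edit > Delete"],
           gesture_shortcuts=["swipe left on cell"]),
    Action("show_related", gesture_shortcuts=["swipe right on cell"]),
]
```

Here `gesture_only(actions)` returns `["show_related"]`, the one action a user could never find without being told about the swipe.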


So, all that to say, some gestures are more intuitive than others.

I don’t think the future of touch interfaces will be some hyper-rational utopia free from the shackles of on-screen controls, nor do I think it will be a drab, boring future full of UI chrome and buttons for every conceivable option.

So, if you’re tinkering with a novel gestural interface, think about which class it falls into. If it falls into the first, congratulations, you’ve done something very difficult. Share that with the world.

If it falls into the second class, that’s fine too.

If you’re against opaque shortcuts, throw it out and try again. If not, just make sure to include an obvious alternative to that control. If it controls an important function of your app, you’ll need an alternative. Users may get frustrated long before they are ever rewarded with the discovery of your clever gesture.