Earlier this month I heard a fascinating talk by Facebook’s Alan Cannistraro at iOSDevUK, in which he shared his thoughts around the next big developments in technology. One aspect of his talk has been bobbling around in my head ever since, and it’s this aspect that I’d like to explore further in this post.

Alan talked about the relevance of physical context when considering a given set of devices. Specifically, he noted that there will always be convergence towards a single device within a given physical context. Here’s the example he used to explain what this means in practice.

The past few years have seen a huge growth in what we know as the ‘quantified self’. One good example of this is the tracking of your fitness activity and exercise using simple, dedicated devices such as the Fitbit or Nike+ FuelBand. These devices are attached to your person, and use motion sensors to count steps and activity without requiring user input.

The defining characteristics of these devices are:

  1. they are always with you
  2. they are constantly tracking your movement without your involvement
  3. they can connect to your smartphone to upload that movement data for aggregation and analysis.

Now: in the iPhone 5S, Apple added a new chip – the M7 – which handles all tracking and monitoring of device motion for the iPhone. By handing this job to a dedicated chip, the task of motion tracking can now be achieved with minimal impact on battery life, and can be left to run all day.

The key thing about the iPhone is that if you own one, it will always be in your pocket. It’s the first thing you’ll make sure you pick up before leaving the house, and the one thing you’re least likely to forget. Indeed, as Alan noted, it’s almost more important than your house keys or wallet. (And it has the potential to replace both of those in the future, too.)

So now, with the M7, we have a device that 1) is already always with you; 2) is able to constantly track your movement; and 3) already has its own network connection built in. By adding the M7 chip to the iPhone 5S, there’s no longer any need to carry a Fitbit for motion tracking. And this is what Alan means by device convergence – within a certain physical context, one device will accomplish many tasks, without the need to carry around multiple gadgets.

This shouldn’t come as a surprise, however. We’ve already seen device convergence to a great extent with the iPhone, which has replaced a whole host of individual devices.

The iPhone in your pocket is:

  • a point-and-shoot camera
  • a personal music player
  • a Dictaphone
  • a handheld games console
  • a video camera
  • a Sat Nav
  • a pocket diary
  • a contacts book
  • a notebook
  • a pocket calculator

…and, of course, a mobile phone. (And this is all before you install any apps on it yourself.)

I’d like to dig into this a little deeper, however, and specifically to look at what Alan referred to as convergence within a specific ‘physical context’. It’s this aspect of his talk that has been bobbling around in my brain the most.

Essentially, there are several different physical contexts within which this kind of convergence can take place. The extent of the convergence, and the level to which it can occur, is defined and constrained by the physical characteristics of the context in question. More generally, single-device convergence doesn’t happen across these physical contexts – it only happens within the scope of each individual context.

To help explain and explore this further, I’d like to suggest several recognisable contexts in which devices are currently operating. Each of these contexts is a candidate for continuing convergence in the future.

The first physical context is what we would currently recognise as the iPhone. However: the iPhone is not a phone. It can make phone calls, sure, but it is in no way defined by its phone-ness. Making calls is not even the device’s primary use for the majority of users. The ‘phone’ aspect of an iPhone is a single app on a multi-purpose device, and is nothing more than a synchronous, single-channel, voice-only communications app, long since superseded by its FaceTime capabilities. The only thing that actually makes the iPhone ‘phone’-like, as opposed to its sister device the iPod touch, is its always-on cellular connection.

So what, then, is an iPhone? To answer this question, we need to think of it in terms of its defining physical context. Strictly speaking, it has two contexts: the pocket, and the hand. The iPhone, in essence, is an amalgam of all the things that are possible within a pocket-and-hand-sized form factor. The physical and ergonomic properties of this context come first; the device is shaped and formed to fit this context. By chance, this makes it a good form factor for making phone calls (with speaker and microphone at opposite ends). It also happens to make it good for taking pictures (with a large screen for reviewing photos, and a physical shutter button in a convenient place near to the holder’s index finger). These two capabilities are not the device’s defining characteristics, however – rather, they are useful by-products of the characteristics of its defining physical context.

So, with this initial Pocket-and-Hand context in mind, what other primary physical contexts might exist, and how might they be utilised?

Firstly, let’s look at three different physical contexts that achieve one common goal. This is what I call the ‘field of vision’ goal. I define this goal as a device’s ability to fill our field of vision as completely as possible. When working effectively, such a device keeps our visual focus and attention, while avoiding distraction from other visual inputs. It does this by ensuring that our field of vision is sufficiently filled by the contextual device so as to leave no room for distraction. (Note that this isn’t an effective goal for the Pocket-and-Hand – its physical context makes it slightly too small to achieve this goal effectively.)

The three primary classes of device that set out to achieve this goal are what I would call:

  1. the Fixed Distant Display
  2. the Handheld Portable Display, and
  3. the Head-Mounted Display

Some of these contexts fulfil the field-of-vision goal more effectively than others, and each has a trade-off between portability, flexibility, and completeness of fill.

The first context – Fixed Distant Display – is what we would traditionally call a ‘television’. However, its ability to display half-hour sitcoms is not its defining characteristic. Rather, it is defined by its large size, fixed position, and comfortable viewing distance from our eyes. Its large size means that it can fill our field of vision just as effectively as a smaller, nearer display, while placing less stress on the eye due to its relative distance. (It places less stress on the arms than a Handheld Portable Display, too.) The Fixed Distant Display’s limitation – at least in terms of filling our field of vision – is that it demands we look at a fixed location, rather than moving itself with our own field of vision as we turn our heads. It is also not directly manipulable, due to its distance from the viewer, and requires some form of indirect control mechanism to interact with whatever it is displaying.

The second context – the Handheld Portable Display – is typified by what we currently know as tablet devices. These tend to be held at a relaxed arm’s length, and have a size that balances between field-of-vision-filling and acceptable weight and ergonomics. They can be moved – albeit manually – in reasonable sync with our field of vision, and are directly manipulable due to their immediate proximity to the user’s hands.

The third physical context – the Head-Mounted Display – would include devices such as Google Glass and Oculus Rift. These devices sit at the other extreme to Fixed Distant Display, and fill the field of vision by being small but very close to the eye. Being head-mounted, they tend to move perfectly in sync with the field of vision, albeit at the expense of placing more stress on the eye due to their proximity. They also suffer from the additional challenge of social stigma, given the importance we place on facial communication in social interaction.

There is a fifth context of particular interest at the moment, which I’ve decided to name ‘Permanent Wrist’. (You’ll probably know it more commonly as a ‘smartwatch’.) However, as with Pocket-and-Hand’s misleading label of ‘smartphone’, the smartwatch’s defining characteristic is not its ability to tell the time. Rather, this context is defined by the speed and convenience of access to a small, wrist-mounted display. This may be coupled with the ability to make subtle notifications through gentle vibration; an ability to track the user’s motion at a hand (and therefore gestural) level, rather than at an overall body (or pocket) level; and a means to provide rudimentary input response with a second hand through a small number of on-device buttons.

As with Pocket-and-Hand, Permanent Wrist benefits hugely from always being to hand (no pun intended). Its defining characteristics make it particularly useful as a notification and information consumption device for short, timely communications. Notifications (in this general sense) can arrive at any time, and so this ever-presence is key to Permanent Wrist’s effectiveness, especially given that its speed of access (as opposed to having to remove a device from a pocket) is one of the defining benefits of its physical context. This speed of access, combined with the relative lack of opportunity for response, also makes Permanent Wrist less socially intrusive than removing a device from a pocket (in a business meeting, for example).

So: that defines five physical contexts in which devices currently operate (although there are certainly more). Given that a smartwatch is no more a watch than an iPhone is a phone, why then do we use such over-specific terms to define and market devices within these physical contexts?

As humans, we struggle to move straight to this arbitrary view of devices and objects without some kind of reference. We tend to think of the future in terms of our current experience. So, to help people transition to these new, multi-function devices, we contextualise them into something we already know (hence ‘Apple TV’ for a device that plugs into a Fixed Distant Display). The very fact that we often prepend the word ‘smart’ to an existing device category is telling in itself – i.e. “this new thing is a smarter version of that thing you are already familiar with”. This gives a way to ease people into a more flexible device than they might otherwise be comfortable with. (It’s also telling that the primary response to the iPad at launch was “what would I use it for?”. For iPad, the Handheld Portable Display context came with no prior reference point for the majority of users.)

So why am I breaking away from these historical reference points, and using the more abstract definitions above, to think about these devices? Well, I have three main reasons.

Firstly, it helps us to see the potential beyond what we already know. Thinking of a device in terms of its defining physical attributes makes it easier to spot potential applications for that device, outside of how we might have used other devices previously.

Secondly, it helps us to spot where the convergence between existing devices is likely to occur. If you can envisage the capabilities of one device being fulfilled within the physical context of another, then that convergence will almost certainly happen sooner or later, as Alan suggested in his talk.

Thirdly, it enables us to spot where a device has the potential to be even more useful when combined with a device from another physical context. Rather than thinking about these objects as discrete devices, we can then start to think about combinations of devices as systems, with new possibilities beyond what each device can do in isolation. This takes us beyond convergence within one physical context, and into a wider convergence of devices as a combined system within our physical environment as a whole.

To give an example: the combination of Pocket-and-Hand and Permanent Wrist gives an always-connected, non-intrusive notification system for all of your networked messages. And the combination of Pocket-and-Hand and Fixed Distant Display is what we would currently know as a ‘games console’, but without the need for a separate dedicated device.

This, then, is another opportunity for device manufacturers – creating devices that both satisfy their own physical context, and also work together as an integrated whole to create combined systems that simply would not be possible in isolation.