‘The cracks are more visible online’: why virtual interaction is more complex than it looks

Since the start of the COVID-19 pandemic with its lockdowns and work-from-home adaptations, pressure has risen to interact online. Transitioning traditional face-to-face (FtF) methods to online conferencing and consultation was already underway, of course, and COVID-19 has accelerated that trend.

Online interaction (OL) that worked as effectively as FtF could offer many advantages to geographically dispersed organisations. Travelling to FtF events consumes time and money. It can be hard to coordinate busy diaries and, once gathered, pressure exists to ‘do everything now, while we’re all here’.

In contrast, OL is cheap, easy to organise and simpler to coordinate, allowing shorter, more frequent and more focused exchanges. Yet OL meetings bring some downsides, often seeming clunky, artificial and dissatisfying. For example, committee meetings, which can be somewhat boring FtF, may become dreadful if transplanted to an OL setting without suitable adaptation[1].

[1] Not all effects of OL are negative, however. We use an FtF process where people tell a story to two others, then turn their backs to listen in silence while the others discuss what they heard, before changing places and repeating. This has considerable impact on the storyteller and the ‘analysts’. It seems likely that using breakout rooms of three and replacing back-turning with muting the camera and microphone for the teller affords greater listening and learning than FtF.

Previously, we at Kinnford prided ourselves on employing successful methods in FtF contexts to promote energy and engagement. Facing the disruption of the pandemic, we’ve searched for OL alternatives that might be as effective.

Combining our experiences, a burgeoning commentary by others and our social science training, we’ve identified three crucial, inter-connected factors that underpin interaction and which (a) explain why OL can become clunky and (b) may guide improvements to it.

First, ‘being is embodied’. Andy Clark cites a lovely passage from a sci-fi story where aliens discover humans and cannot fathom how meat, lacking any central processor unit, can function.

We are, as Clark says, ‘thinking meat, feeling meat’.

The implications are profound. Cognitive neuroscience shows cognition and emotion are not separate; they are singular (call it C/E). C/E is a cascading process of prediction, perception and interpretation of events. Imagine a kayaker shooting a rapid: reading the water, balancing herself and the kayak and adjusting her course moment to moment. That is how C/E unfolds.

Moreover, C/E is not abstract, ethereal consciousness. Rather, it is embodied, embedded in contexts and extended: we think ‘beyond the brain’ with other body parts and things (imagine the potter thinking with hands, clay and wheel) and in interaction with others (a lively party is co-created with intermingled C/E). Contrast the C/E of a hungry man, wearing suit and tie, walking briskly towards a café, hot and uncomfortable in the noon sun, with that of the same man in a comfortable chair and casual clothes, caressed by a gentle breeze, brandy in hand, watching the sun sink into the ocean after a fine dinner.

Finally, C/E is helped or hindered by features that afford or discourage responses. ‘Affordances’ (‘triggers’ is a rough synonym) are numerous and typically we set them up routinely, knowing, without thinking deeply, what C/E we want to afford (or trigger). A table setting, for example, affords many encoded possibilities (but largely excludes others). This table is not set to play Scrabble, nor for a family dinner with toddlers. You won’t be served beans on toast.

In the OL environment, many important perceptual data are limited: we see faces, perhaps, but not postures, hands or crossed legs. Side conversations and meaningful glances are absent. In contrast, ‘chat’ is slow, verbally more formal and ‘flat’. (Think how ‘nice’ means myriad things dependent on tone and nuance.) OL, we do not share the same ambience of temperature and smell. We cannot touch. (While touching is rightly constrained in a world concerned about paedophiles and harassers, it remains true that touch is essential to human communication and wellbeing.) We are less clear about what affords good interaction OL and what does not.

Lack of embodiment, then, is a key issue, to which we will return.

Second, interaction is a ‘team sport’ which we learn to play from birth. In the 1960s and 70s, sociologist Harold Garfinkel summarised some of the key features: interaction, he said, is ‘more or less artfully accomplished’.

Garfinkel showed—and a raft of social science elaborates this argument—that the boundaries around interaction, the sense-making within it and the perception of a comfortable flow result from everyone playing along. Like a volleyball team, everyone shares the effort of keeping the ball in the air.

The team effort is ‘artfully accomplished’ when the ball stays aloft and no one needs to think about who should bunt it, how high, etc. Garfinkel saw this as an interactive achievement, not just people following clear-cut rules like automata.

This game varies between cultures (and a foreign game can be confusing) but people can play along pretty well, most of the time.

How do we interact in OL Land? There’s no established culture to draw upon, no Lonely Planet Guide available to navigate the unfamiliar. Relying on FtF assumptions may leave us ‘lost in translation’.

Third, we operate with two brain systems. System 1 consists of automatic rules of thumb or ‘heuristics’ (which may include stereotypes) while System 2 is conscious thought and problem solving. Social interaction is managed mostly by System 1 (biologically low cost, rapid and reliable) freeing up resources for the occasional System 2 action (costly and slow), a weighting that conserves our energy and resources.

This links with the previous point: ordinarily, we deploy many heuristics spiced with a little System 2 thinking to get along. But our familiar FtF heuristics often don’t work OL. Instead, we have to use System 2, which is slow, hard work and may be exhausting.

What conclusions might follow? We don’t have a detailed answer but we have a few suggestions.

Starting with C/E as embodied, embedded etc.: in OL events ‘we are all in the same meeting but not the same room’. Jane, perhaps, is in her office, using the desktop with a good camera and microphone, door closed, lighting adjusted. She feels and looks professional and engaged. Meanwhile, John is at home in the spare bedroom, balancing his iPad on his knees. The blinds are up and he’s harshly backlit. He’s keeping one ear on what the kids are doing and the builder’s noise from next door is loud. He does not look or feel professional, while distraction affects his engagement. Furthermore, depending on familiarity and expertise with the technology and software, the ‘extension’ of C/E may differ for Jane and John.

Resources may differ: John doesn’t have an office or desktop at home, and perhaps he’s not had the same training opportunities. Also, most people know how to ‘turn up’ to an FtF meeting, what the dress codes might be, how to present oneself, etc. The same isn’t true OL and the varied log-in locations may have major effects on participation and group dynamics.

Absent a normative code, it’s not always clear what ‘place settings’ to prepare and not everyone ‘turns up’ the same way. This is a cultural work-in-progress which may eventually settle into shared knowledge that organises embodiment, embedding and extension of C/E OL.

Turning second to interaction as a team sport akin to volleyball, the same limitation seems relevant: lacking the 200,000 years of human social evolution that grounds FtF, we’ve not yet worked out what OL does and does not afford and have no reliable rules of thumb for cooperating and collaborating. In FtF we signal, often without realising it, with gestures, gaze and so on, but we do not have an OL code to replace these. A useful strategy is to keep communicating and reflecting on this: every time we re-examine FtF and ‘make the implicit, explicit’ we increase the chance that we can find a different way to achieve outcomes we mutually seek.

Finally, the heuristics we use FtF may or may not translate to the OL context. How we interpret silences when FtF, for example, may not be a good guide to silence OL, let alone to the simulations of silence: the muted camera and microphone. It will take time and effort to constitute a reliable set of OL heuristics. Meanwhile, we consume a lot of System 2 resources, which may explain why people find their attention span OL is shorter and the overall impact more exhausting.

OL is not going to go away—we need to keep working at understanding and performing it better, combining trial-and-error experience on the one hand with the most useful concepts on the other. This short piece has hopefully been a small step in that direction.