Chapter in The Future of Text III, 2022
{Naomi silently moves her tongue without opening her mouth. The MSR sensor – the mumbled speech recogniser – on her neck detects her intent and opens a matrix of chapter previews. She points with her finger in the air. Then she opens her hand. The room dims down while transforming into her preferred reading environment. Naomi has smart-designed this room according to some old photographs she got from her great-great-grandmother Isa Bowman [Isa Bowman: The Story of Lewis Carroll]. She begins to read:}
Language is a well-formed sequence of words to express thoughts and ideas. Spoken language is linear. Spoken language can be turned into text by writing it down. Text is linear to the extent that it consists of rows of words, separated by automatic line feeds at the margin, or by hard carriage return control characters to give way to a new thought in the following paragraph. Once a sheet of paper is filled up, the words continue their journey on the next page… until this is full… and so on… The sheets of paper pile up to form a book. A book is a physical object in real life.
Books are a natural habitat for text – same as magazines, newspapers, reports, hand-written letters… basically all paper-based media. Before capturing the messages on paper or papyrus our ancestors used to impress clay or carve in stone. A few thousand years later we use invisible magnetic or electronic charges as computer storage and memory. Each charged physical spot represents a bit, a binary digit 1 or 0. Eight bits to the byte and a decoding convention like ASCII or Unicode – these are the basic principles to interpret the bits as characters and to display them with glowing pixels on screen. All three modalities of text – pre-paper, paper, digital – are still in use today; for instance (i) on gravestones, (ii) for the classical publishing industry, and (iii) for all kinds of computer media from personal word processing to social media.
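The step from stored bits to readable characters can be sketched in a few lines. This is only an illustration, assuming Python as the notation; the variable names and the sample bit pattern are chosen for this example and do not come from any particular system.

```python
# Eight bits form one byte; a decoding convention turns bytes into characters.
bits = "01010100 01100101 01111000 01110100"  # four bytes, written out as bits

# Group the bit string into bytes (integers between 0 and 255).
raw = bytes(int(group, 2) for group in bits.split())

# ASCII maps every byte value below 128 to exactly one character.
text = raw.decode("ascii")

# UTF-8, one encoding of Unicode, may need several bytes for one character.
umlaut = "ü".encode("utf-8")

print(text)         # the decoded word
print(len(umlaut))  # number of bytes used for the single character "ü"
```

The same bytes yield different characters under a different convention, which is why the decoding convention belongs to the text just as much as the bits themselves.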
Text is linear – thinking is not. Language has the expressive power to put complex ideas into words by utilising its meta-referential properties. This enables an author to directly approach a reader and point to certain sections of the text. Complex causalities or abstract ideas can be described and discussed with words. New concepts or things can be handled by assigning new names, and by putting them in context with familiar terms. In fact it is quite difficult to find words that are not metaphorically derived from prior words. Quotes are often used to indicate that a word is not meant as such but shall be understood in a metaphorical sense. ‘Virtual’ is another attribute to inform readers that the following word should not be taken literally. We will discuss “virtual reality” further below.
A discussion among several people can be captured with linear text – as long as they do not speak at the same time. If they do anyway, we would either need a multi-track score like music notation for the instruments of an orchestra, or the text itself explains that the following sentences are meant to be spoken simultaneously. That would be an example of written language’s meta-referentiality. Footnotes are like a second track as well. They are anchored to the main body by a marker, a little spatial hint which indicates when reading the side track might be intended and appropriate.
Hypertext – a term coined by Ted Nelson in the early 1960s – is non-sequential writing. Text passages are individual units. They can be connected by hyperlinks to provide related content to each other. Each link bears the invitation to follow a different, but somehow connected thought through a rabbit hole. {Naomi smiles.} Quotes and references are first-class citizens in hypertext because the origin can always be accessed in its original context. Link and reference structures are visible on screen, e.g. as lines or coloured shapes between related text sections. [Ted Nelson, 1972: Parallel Textface™ in Xanadu™ in Matthias Müller-Prove, 2002: Vision and Reality of Hypertext and Graphical User Interfaces, section 2.1.2] Sixty years after the idea of hypertext, the online environment is not a dream come true.
On the pro side, the Internet is a common communications infrastructure connecting all continents. It delivers all kinds of data and services to each point on the planet. A tremendous success and innovation which shall be used in a beneficial way for all of us. However… The Web as we know it today has almost nothing to do with the original vision of an interconnected dynamic global library. The only link between the hypertext of the 1960s and the Web of the 1990s is the hyperlink between Web pages. Even Web 2.0 is history already. Web 2.0 was a term made popular by O’Reilly’s Web 2.0 conference series in the 2000s. It is the shift from tech-savvy or professional website creators to average people who want to upload “user generated content” and edit their personal pages. Web 2.0 is the beginning of a democratic medium where everybody can participate and easily edit wikis and write blog articles for the interconnected blogosphere. Since the 2010s big tech and media corporations have ruled the market, for instance Meta (Facebook, Instagram, WhatsApp, Meta Quest 2 and presumably the Metaverse), Amazon (Kindle, Echo, Prime, AWS), Apple (Mac, iPhone, iPad, Watch, podcasting, TV), Google (search, Docs, YouTube, Android, A.I. research), Microsoft (LinkedIn, Teams, Skype, Flight Simulator), Zoom, and Twitter (a global micro-blogging platform until its acquisition in October 2022). Tencent (Qzone, WeChat) and Sina (Weibo) dominate the market in China while ByteDance’s TikTok is popular around the globe. This list is far from being comprehensive. Gaming is a huge sector that is also quite relevant for VR because level designers already have the know-how to create engaging 3D worlds.
The most important revenue stream is selling ads. Therefore the social media platforms collect user data on a massive scale to offer micro-targeting services to marketers. In short: users’ online time and behavioural usage profiles are sold to run targeted commercial and political campaigns. If you are not paying for a service, then you are the product.
Cooling down. Back to text.
Reading is a linear repetitive activity. It is a fast cascade of focusing on the words to harvest their meaning. Not every word is deciphered one by one. Instead the eye jumps 3 to 7 times per line to send sharp signals to the brain. Frequent reading improves the ability to detect certain patterns in the shape of text to obtain the meaning quite efficiently.
Reading a detective story remains linear even in case of cheating: Reading the last pages first is just a different order of reading the one-dimensional text. Scientific papers use footnotes or offer supplemental material in the appendix. Reading is optional; it’s up to the reader to take any way through a text.
Even reading hypertext is a linear activity. At certain points in text-space and personal-time the reader makes a deliberate decision to jump to a next chunk of text. Therefore browsing hypertext remains personally linear. However the reader (or user) might get lost in cyberspace. Then it is a matter of information architecture to provide a useful and usable navigation structure with sufficient hints to guide the reader (or user) along an intended trail.
According to Marshall McLuhan speech is a cold medium: »so little is given and so much has to be filled in by the listener.« [Marshall McLuhan, 1964: Understanding Media, chapter 2] Even more so when speech is delivered as text. Intonation, mood, and any body language of the speaker or author are missing during a pure reading experience.
Reading is a cool activity – like in cool jazz. The reader has to contribute her own background and fantasy to unfold the whole story. Reading text stimulates the brain to create a mental theatre with the plot and ideas that are encapsulated in black ink on white paper.
A similar phenomenon is called closure [Scott McCloud, 1993: Understanding Comics]. The reader of comic strips has to close the gaps between frames by imagining the missing pictures. [cf. The New Yorker cover, Feb 25, 2008; via Barbara Tversky’s chapter in this volume] {While Naomi’s eye cascades over the reference, the image dissolves next to the paper. A gaze causes the image to zoom and she ponders a book’s shelf life.}
As a visual 2D medium, graphic novels are still a cold medium, while movies are a hot medium – to follow McLuhan’s terminology. There is no need to apply imagination to complete the rich visuals and Dolby surround audio of blended effects and a symphonic music score.
Much like movies, virtual reality (VR) is a hot medium. The user experiences a 3D world which is projected into a sphere of pixels while an endless audio track plays over headphones. Alternatives to head mounted displays should be mentioned as well: For instance the CAVE (Cave Automatic Virtual Environment) is a stereo projection inside a box – large enough for a human to make a few steps. Other systems use large rooms covered with curved OLED displays behind protection glass on the floor. Amusement parks try to attract people with 360° domes – similar to a planetarium’s night sky projections. All systems have some advantages and also some drawbacks for certain contexts of use. Technical requirements, affordability, ergonomic form factor of the hardware, availability and compatibility of software, interoperability with other computer platforms, interactivity, the lack of well established VR design patterns and poor usability… just to name a few issues that need to be addressed.
On the other hand there are several features of VR that make the platform desirable and interesting to explore new concepts – not just for gaming. VR offers more degrees of freedom than TV or cinema, i.e. the user can turn the head to look around, change her position by “walking”, and interact with virtual objects by “touching” “buttons” and “pulling” “levers”. Hand tracking and gesture recognition are necessary to interact with virtual objects.
The term immersion is used as a measure of how convincing the VR experience is, whether the user believes she is “really - there - now”. The sensational impression of presence is supported by high resolution 3D graphics, high refresh rates, and extremely short lag times on turning the head to mitigate motion sickness.
It depends on the implementation effort of the development team whether believable creatures or humanoid characters populate the scenery and whether ambient sound provides subtle cues and realistic flavours while acting inside the VR environment. Good quality in all these aspects is necessary to offer an immersive experience.
VR is a slightly cooler medium than TV because the user can interact with the scenery and change the flow of events. In other words, VR requires physical and mental user participation while a cinema experience can be watched and enjoyed quite motionless from the armchair. But VR is definitely a hot medium compared to text because reading text requires creative imagination to revive the written words. Ready-made VR worlds just need to be observed.
It always poses problems when cold and hot media compete for the user’s attention. Images draw attention over text. Videos draw attention over text and images. As a young medium, VR requires the most lead time to get started before use. The perceived cost/value relation of reading in VR is just too high at the moment.
Text is text independent from the medium, whether it be paper or pixels. But since McLuhan’s »The medium is the message« we must consider the channel, the display properties, the interaction design, and the social context.
Text in the post-paper modality is mostly used for news and information or for personal short text forms like e-mail, micro-blogging, public or private chats, and texting <sic!>. Books have not fully completed the transition into the digital world yet. Too rigid the software compared to paper – too tiresome the reading activity itself. Better display devices with higher resolution, higher refresh rates, or even electronic ink offer an experience of text that is as stable and legible as printed text on paper. However, a few issues remain: digital text is not spatially persistent. It always depends on the tool and the recent click or swipe activities how and where a paragraph is displayed. Hence it is a desperate attempt to look for a paragraph that was located somewhere on the upper third of a right page roughly after the introduction. Other interaction challenges are personal highlighting and annotations. Some proprietary silo solutions are available. But none of them is as flexible as pencil scribbles on paper or as standardised and connected as the Web itself. None of the annotation solutions explores the realm of dynamically connecting people and media.
Display quality gets better. Goggles get smaller and more ergonomic to wear for longer periods of time. Lab experiments are being conducted to use contact lenses instead of clunky headsets. Hopefully interface capabilities and usability for reading and annotating text and for text authoring tools will improve as well.
Text in the really real life – excluding the printed word and the digital domain for a moment – occurs in public urban spaces. Text IRL is used on highway signs, as street labels, signage on and in buildings, even as hints on doors – PUSH/PULL – not to bump your head. Text IRL is used on billboards; picture the neon marketing messages on Times Square or Piccadilly’s large urban displays, which blend into the digital world already. Text IRL has a purpose to inform the “users of RL” about certain features; for instance how to find your way in a city, or which coffee to order in a restaurant. Text IRL supports RL by delivering necessary or superfluous information to the inhabitants of the space.
Text IRL without a function might be considered as art. There are a couple of examples of this category. Maybe graffiti? Maybe city branding campaigns like the letter sculptures I amsterdam? Certainly urban word art which makes the pedestrians slow down and ponder the philosophical relation between letter sculptures and the location.
These considerations are quite relevant for VR if you acknowledge that artificially created reality aims to mimic the real world until the scenery becomes indistinguishable and the sensational impression can be considered perfect. The real world is the primary metaphor of the virtual world until it passes a VR Turing Test.
Virtual objects might stimulate our senses like their counterparts in the real world. Flipping through a virtual book might provide a sense of weight, haptics of paper, the sound of waving sheets, perhaps even a fresh breeze of air or the smell of yellowed paper. A gesture with a finger is sufficient to flip (or scroll?) through the pages.
Initially a new medium will embrace all content that has been created for prior generations of media technology until the characteristics become clear and evolve into a new medium in its own right. Hence it is no surprise that several Hollywood movies depict the future of VR in quite classical terms. Three movies stand out: In »Disclosure« [Barry Levinson, 1994; based on a novel by Michael Crichton, 1993] the VR user virtually walks through a virtual library and opens cabinets to look for specific documents in virtual folders. »Minority Report« [Steven Spielberg, 2002; John Underkoffler as a consultant for presumable user interface concepts] introduces hand gesture interaction on large curved screens to sift through a huge media library to find evidence. The user moves like a conductor in front of an orchestra to skim through image and video footage. Finally the »Matrix« trilogy in 4 parts [Lilly and Lana Wachowski, 1999-2021]: VR is indistinguishable from reality and the only perceived state of being. There are only a few glitches in the matrix that make the hero Neo suspicious of his perceived reality. {»Follow the white rabbit,« Naomi mumbles. At the periphery the scene from Matrix fades in.}
All of these Hollywood interaction design video prototypes are impressive – that’s part of the success of the movies. But do they represent a usable and desirable concept for text in VR as well? It is more likely that VR will be a 3D TikTok horror show with billboards, subtitles, speech bubbles; more like massively multiplayer online games (MMOG) with plenty of targeted marketing messages.
The cold text medium and the fairly hot VR medium do not fit together. The high definition environment will swamp any cold text medium that appears as a shy digital object. The virtual world offers so many attractions that the user cannot focus her attention on longer text blocks to read. The same is true for writing. Too many distractions provide a poor environment for sound reasoning or for creating engaging stories.
Like in the real world the environment matters for concentrated reading or creative writing. If anything is possible in VR, then dedicated 3D rooms should be designed and offered to support authors and readers. Interior designers might be involved to create cozy and calm rooms which display the corpus of text as primary digital objects. Related material is within reach. Significance can be mapped to distance. Filing and retrieval of documents should not simply mimic real library architecture where long and narrow aisles lead to sky-high bookshelves. In real life shelf space is a scarce resource. Space in VR is endless. Effective and efficient navigation structures are crucial in VR. The visual design of VR libraries shall not resemble the aesthetics of sci-fi movies. Instead some imagery of real and therefore familiar libraries might set the mood and expectations to interact with the collections. Mood images work like icons and labels and provide orientation to the user. Algorithmic magic shall augment and assist the user’s ability to browse papers and connect the dots for new creative conclusions. Interacting with resources should not be any simpler than the motto »information at your fingertips.« The action to offer more material or to visualise concepts in animated 3D graphics must only be a response to a clearly articulated wish of the user – such as the tip of a finger or a mumbled command. Otherwise the focus of attention is lured away to different media.
A new interaction language for gestures needs to be established. We’ve had mouse clicks and drag’n’drop for desktop WIMP systems (windows, icons, menus, pointing device). Swipe, pinch and tap are finger gestures on mobile touch devices.
Take the full body tracking from »Minority Report«. Any gesture can be interpreted to control the virtual environment. Raising an eyebrow, nodding the head, shrugging the shoulders, conducting with both arms… The possibilities and degrees of freedom to trigger actions in the VR environment are tremendous. Therefore it is necessary to establish vendor-independent conventions for how to interact and behave in VR. The systems will adapt to individual preferences and habits like they do today for speech recognition. A prediction model will always calculate the user’s intention based on the current context and be ready to offer related information on demand. Gentle micro feedback – visual, audible or haptic force feedback – tells the user about the responsive state of the system.
Augmented reality (AR) will adopt the interaction paradigms from VR. In addition, an internal digital twin of the real space needs to be kept up-to-date. The AR experience might be more comfortable and satisfying than being in a VR world because the natural and therefore familiar environment is always present and can be used as a reference point and as a backdrop to superimpose digital text and other media. Real surfaces become interactive displays. Sticky notes become virtual sticky notes that can be placed on augmented surfaces or on virtual work spaces.
Collaborating with other people in shared AR environments can also be a productive setting; less for writing text, but to inspect and create hovering models in space.
Alan Kay shared an anecdote from times when he was a student at University of Utah in the mid-1960s. Alan and a classmate got the assignment to improve a Simula program. An endless paper printer had produced an almost endless printout of the program. They rolled out the paper “scroll” down a hallway. While crawling across the paper they shouted their findings to each other to understand the object-oriented principles of the programming language. (Later this experience helped Alan Kay to shape Smalltalk.) – The hallway scenario makes sense in VR or AR as well. An innovative approach would be to identify problems and scenarios (for dealing with text) that can be tackled more easily in an infinite 3D space than with a windows environment or even on small mobile screens.
Finally, the paper metaphor becomes less relevant. Typewriters are exhibited in museums. DTP (desktop publishing), word processing, and electronic mail, among other means of online communication, have been common practice for more than a generation. Reading and writing text on screen does not have to refer to the paper metaphor anymore. People grow up with swiping text on smart phones. Pupils and students are always connected on free wifi. Autocomplete is the preferred input method for virtual on-screen keyboards. Voice UI is used for home entertainment systems. However, voice-to-text has yet to be proven as a viable input modality for longer texts.
Josh Clark was concerned regarding “Natural User Interfaces” for touch devices. He said, »We are creating the illusion that there is no user illusion anymore.«
We – as interaction designers – are deluding ourselves when we aim towards this objective for VR once again. There is always a conceptual design layer and a technical layer between the user and the service. Any usage is always mediated by the artificial environment. It is the responsibility of product & interaction designers to create solutions that meet the expectations and needs of the users in all regards.
Gestalt laws and human physiology are universal and should not be ignored. User centred design for AR & VR {Naomi looks at an empty spot on the margin and starts to mumble. The following text appears: “The human is at the center of the virtual sphere. The system augments her abilities to access a world of libraries, to find patterns among the texts and to enter new thoughts that can be read by other people. Smart heuristics should support her intentions and provide a productive and inspiring environment for research and collaboration.”} User centred design for AR & VR will have to find solutions that initially look and feel familiar even in 3D. Copying the real world can only be a first step. In the long run interaction paradigms of desktop and mobile will be extended to utilise a virtual 3D world that is projected into a 360° sphere or augmented onto the real world. Free floating windows in space are merely a minimum viable solution. “Physical” motion and whole-body gestures will be added to the interaction modes of mouse, multi-touch and voice. The virtual depth of VR can be used to create primary working areas, secondary side spaces and rooms in the vicinity for other resources or other primary activities. Rooms offer a specific set of actions. Rooms can be considered like apps today [cf. Hans Zimmer’s composing room]. Multi-user environments need to pay attention to privacy concerns in shared spaces. But they offer the opportunity for collaborative dynamic spaces to tackle wicked problems collectively.
{Naomi moves two fingers downwards followed by a thumbs up gesture. The matrix of previews shows stacks for each chapter of »The Future of Text«, volume 3. Some stacks look a little bit crumpled. She will continue with Mez Breeze’s article tomorrow. The room lights up again. Naomi still prefers to actually read instead of having a SmartAssistant reading it to her.}