Matthias Müller-Prove 4-June-2023
Published in The Future of Text Journal 2.7(2023): AI & VR before Apple’s ‘RealityPro’
- »The best way to predict the future is to invent it.«
History is perceived as a sequence of logical steps only through the rearview mirror. This brief compilation of computing events from 1963 onwards is no exception. Nonetheless, it is quite likely that 2023 will be seen as a significant year because it once again marks a shift in computing paradigms. We went from punch cards and command line interfaces (CLIs) to GUIs. Since the 1990s, desktop windows have coexisted with browser windows for accessing the Web. Mobile touch devices have been part of our daily life since the first decade of the millennium – the iPhone was introduced in 2007.
Paradigm shifts in interaction design do not replace the previous modalities. In fact, they are all used in parallel today. [cf. Back to Childhood – Infantilisation of UI Design, 2011] Depending on the task, we still enter text commands at terminal shells or in search engines’ text fields; we drag file icons around, and touch, swipe and pinch pixels behind glass. Voice commands basically swap keyboard input for speech recognition. Because they emulate a human conversation, the interaction feels even more natural – until some commands are not properly interpreted and the illusion falls apart. Users have to cope with it. The systems represent decades of development layered on top of each other. Usage is intuitive only within a very small scope – on a larger scale, metaphors and command structures are far from coherent and consistent.
2023: AI and VR
The research lab OpenAI has ignited an industry hype by launching ChatGPT 3.5 last year and ChatGPT 4 in 2023. Large language models and deep artificial neural networks generate text at a level of quality that until now seemed reserved for humans. By harvesting an average of all human writing, AI systems get better at “understanding” the intentions of their users and at responding in a seemingly reasonable and correct fashion. However, algorithmic philosophy does not yet provide a framework to describe what is happening in this area. Vint Cerf put it this way [Future of Text and AI, 20-Apr-2023]:
»Our problem is that the Turing test turned out to be too easy now for the ChatGPT to pass.«
It might be too early to provide an assessment of the techno-social implications of the new computing capabilities.
The other technology that finally reaches the professional and prosumer market in 2023 is VR/MR/AR (virtual, mixed and augmented reality; cross reality –XR– is used as an umbrella term). XR systems have been under development for quite some time. A year before 1963, Morton Heilig introduced the analog system Sensorama, which enhanced a movie clip with stereo audio, aromas, wind, and vibrations. [interview footage with M.Heilig]
The Sword of Damocles by Ivan Sutherland et al. (1966) is considered the first computer-based MR prototype: the user was able to inspect a free-floating cube by moving around. His perspective was tracked with telescopic arms from the ceiling, hence the nickname. Since then, various research and development teams have aimed to provide a satisfying user experience in virtual worlds for play and work. Recent products include Google Glass, Microsoft HoloLens, and the Oculus Quest for entering the Metaverse.
For decades Apple has had a track record of shaping the industry with its products and services. The desktop publishing revolution was fuelled by the Mac. The digital music industry was completely disrupted by iPods/iTunes, which led to the iPhone and the App Store marketplace. Other companies like Google, Samsung and Microsoft followed to build smartphones and tablets and their own operating systems and app ecosystems.
Tomorrow, on June 5, 2023, Apple will introduce its VR headset. As in desktop-GUI computing or mobile touch devices, all other research and development teams will conduct competitive analyses of Apple’s design solutions. This will inspire and inform their own road maps. Together they will establish a de facto standard for how to interact in XR environments by means of head-mounted displays, gestures and plenty of sensors.
As mentioned before, a paradigm shift in human-computer interaction has always just added a new layer on top of the established stack of HCI design patterns. Users will continue to choose the device and the input mode that best suits their tasks. As a consequence, data, tools and users will, and should, be able to migrate seamlessly between devices and systems.
A user – Eve – might start to take notes on a new idea by recording a voice message to herself. Later she opens the list in text form on her laptop computer and continues to refine her concept. In order to incorporate related research and references, she might virtually dive into the ocean of literature and other media of her field. The 3D info space is arranged in a way that makes her feel comfortable and safe. It enables her to discover related materials with ease. Connecting the dots, drawing conclusions, and enhancing her considerations is a computer-augmented thought process.
It is a desired future to reuse and further improve common interaction modalities for XR – if possible and if it makes sense. Let’s take a look at the established interaction areas.
The desktop metaphor is fundamental to personal computing. For virtual and augmented scenarios it has to be replaced by a 3D rooms metaphor. Keep in mind that metaphors in computing are not just digital copies of real objects; instead they use familiar concepts and extend the conceptual model to offer functionalities that would otherwise not be understandable and usable at all.
WIMP (an acronym for windows, icons, menus, pointing device): windows, i.e. rectangular areas with scrollbars to reveal sections of a much larger canvas… Windows can overlap like sheets of paper on a desk. Due to the limited screen real estate, mobile devices don’t have windows. Here, the only window is identical with the screen. Virtual rooms do not have edges – everything is projected into a sphere. Stereoscopic sight adds the third dimension of vicinity and distance. It has yet to be decided if metaphorical shelves –like the stockroom in The Matrix†– offer good access to documents and media files.
Icons either represent (closed) documents and folders or they provide cues for command buttons or apps. Both aspects are needed in XR. But simply reusing the graphical language of GUIs and mobile devices for VR does not seem appropriate. New symbolic items that are born-3D should be developed and evaluated.
The use of menus is slower than other command panels such as tool bars or hotkeys. The primary benefit of menus is a complete overview of the functional space of an application. The lack of a standard menu structure for apps on mobile devices makes it extremely difficult to use apps occasionally. Apart from a few standard buttons, there is no familiarity with apps that are used infrequently. Obscure interaction gestures remain hidden and cannot be learned by climbing through the menu tree. XR will face the same issues until standards and conventions help users to feel safe and in control in new XR app environments.
The P in WIMP stands for pointing device. Originally it was a mouse, but today trackpads are more common than mice for laptop computers. Another advantage of trackpads is multi-finger gesture support. These gestures have been adopted from mobile devices. Pointing and multi-finger gestures will also be used for XR interactions. New gestures will involve the hand, both hands, and even body gestures like nodding or shrugging the shoulders. Some of these new gestures may eventually even be used on other device categories.
A unique UI element that can be triggered by point and click is the hyperlink. It connects two Web pages with each other and teleports the user from a marked piece of text to a destination somewhere else on the Web. Magic portals are already used in VR games to change levels or virtual scenes quickly. Generic XR hyperlinks might connect various XR information environments with each other much like a browser jumps from one webpage to another.
VUIs – voice user interfaces – are the next interaction modality; they evolved from speech-to-text dictation systems into voice command systems for mobile, car and home entertainment and shopping devices. As soon as the error rate comes down to an acceptable level, voice commands can be an additional input modality for XR environments as well.
The blinking i-beam cursor was invented in the late 1960s.
Larry Tesler introduced text selection by dragging the mouse and cut/copy/paste commands for the clipboard metaphor at Xerox PARC in the 1970s. Innovation research shows that new paradigms have a lead time of at least a decade and a lifetime of half a century or more. This can be explained with the Pace Layer Model by Stewart Brand [introduced in his book The Clock Of The Long Now: Time and Responsibility, 1999]. Changes in infrastructure are much slower than the market cycles of products or the fashion collections of the seasons. Media theory says that a new medium will first use and embrace the former medium before an appropriate new language for its content formats evolves.
XR is in the same situation. Prevailing GUI, gesture/touch and VUI interaction design systems will first be adopted for XR before a mainstream of typical interaction behaviours gets established. Chances are high that 2023 will be seen as the year of an AI breakthrough and of a paradigm shift for XR due to the launch of Apple’s headset.
It is important to consider the entire suite of devices and to design for scenarios that involve flawless migration between devices, various applications, apps and virtual environments. Proprietary silos will slow down innovation. Generic and flexible interaction methods that are always available, like the clipboard, will improve the adoption of XR environments and increase the joy of use. Standard system behaviour will flatten the learning curves and improve productivity.
Now is the time to shape the XR tools in order to support and augment collective human reasoning and collaboration.
The user Eve will decide whether Apple’s solution fits her needs and matches her expectations regarding augmented and virtual reality – or whether the industry needs to come up with even better tools to enhance Eve’s digital life.