(The hybrid testing approach: qualitative recommendations and quantitative metrics)
Paper at Mensch und Computer 2005, Linz 4. - 7.9.2005
Tim Bosenick SirValUse Consulting GmbH Germany
J.O. Bugental † USA
Matthias Müller-Prove Sun Microsystems GmbH
Hybrid testing combines the need for qualitative recommendations to improve a user interface with the need for quantitative figures to express the user experience of that interface. On the one hand, the recruitment, the structure of the sessions and the tasks are standardized so that controlled and repeated measurements are possible under the same conditions over time. During task solving, certain measures are taken and observations recorded. In conjunction with subjective assessments, these measurements form the basis of a calibrated and condensed index that expresses the user experience of the user interface in one figure. On the other hand, it is possible to explore usability problems qualitatively after each task so that deep insights into user behaviour are achievable. These data allow recommendations to be made that help to improve the interface.
User Experience, qualitative and quantitative testing, hybrid testing
Qualitative usability tests belong to the standard repertoire of companies that want to design user-friendly websites or develop computer applications best suited to their users (cf. Nielsen 1993, Mayhew 1999, Travis 2003). In general, these tests are conducted with 10 to 20 participants who represent the potential or actual user group. The objective is to find problems as effectively as possible in order to improve the user interface. If the company takes the topic seriously, usability tests are carried out not only with the final product but iteratively throughout the entire development process. This way, serious blunders can be discovered early and expensive mistakes avoided.
The costs associated with the iterative qualitative approach can be considerable. Wherever costs are incurred, the investment must be justified. A method of making usability measurable is therefore sought after, because this is the only way to quantitatively assess the optimizations initiated through the qualitative tests.
Whereas qualitative testing is invaluable for product design, it cannot produce data for quantitative indices. Even worse, the qualitative approach interferes with a purely quantitative approach because the facilitator’s interventions introduce side effects on the participant’s actions. Objective and repeatable measurements are not possible under these conditions.
The objective of the presented approach is to express the usability of a software product numerically by collecting data from a sufficient number of cases via observations and other measurements that are as objective as possible. Subjective factors should not be disregarded, for they also contribute to an integrated user experience. These factors are above all those that describe the dimensions of usefulness, appearance/design, and emotional quality (or »joy of use«).
The hybrid approach for usability testing consists of two phases. A pre-phase is needed to prepare for the hybrid testing sessions, to adjust the measurement tools, and, most importantly, to define the usability metric. The act of measuring takes place in the following hybrid phase. For iterative usability tests, the hybrid phase is repeated over time.
During the pre-phase, 3 to 6 classic usability testing sessions are conducted: performance of task, thinking aloud, observing the participant’s behaviour, etc. Usability issues are assembled in an initial list of usability problems. The list will be used to observe and count problems in the subsequent measuring phase. The pilot study also provides some indication for the expected duration of the performance of the tasks. This defines a time limit for each task.
The next step consists of defining scales for the dimensions which are relevant for measurement. Usability can and must be expressed with »hard« data, for example, with extensive time and click measurements and with the exact recording of the participant’s behaviour. According to DIN ISO 9241, usability is defined as a combination of effectiveness, efficiency and satisfaction. Effectiveness is defined as accuracy and completeness with which users achieve specific goals. For the hybrid approach we use:
In order to measure the task completion, a task is split into several logical steps. The degree of task completion is the number of completed logical steps in relation to the total number of logical steps for a task.
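The ratio described above can be written down directly. The following is a minimal sketch; the function name and the example numbers are illustrative assumptions, not part of the tracking software used in the studies.

```python
def task_completion(completed_steps: int, total_steps: int) -> float:
    """Return the degree of task completion as a fraction between 0 and 1."""
    if total_steps <= 0:
        raise ValueError("a task needs at least one logical step")
    if not 0 <= completed_steps <= total_steps:
        raise ValueError("completed_steps must lie between 0 and total_steps")
    return completed_steps / total_steps


# Hypothetical example: a task split into 4 logical steps, 3 of them completed.
print(task_completion(3, 4))  # 0.75
```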
Efficiency is defined as resources expended in relation to the accuracy and completeness with which users achieve goals. For our approach we use:
Satisfaction is defined as the subjective assessment of the participants on a validated questionnaire. We use:
Other subjective factors that are important for the user experience, such as appearance/design and usefulness, can be assessed with a specially developed questionnaire after the performance of each task.
The indices for effectiveness, efficiency and satisfaction will be calibrated and adjusted using appropriate methods (factor, regression and discriminant analysis).
The hybrid phase has four main parts. First is the recruitment of the necessary number of participants. Parts two and three (measurement and post-exploration) are executed in sequence for each task and each participant. The final part is obtaining the measures in accordance with the usability metric and constructing the usability indices.
The participants for the usability testing sessions should be representative of the users of the domain to be tested (e.g. software, website, mobile phone). The recruitment conforms to well-defined quotas for the segmentation of user groups. For iterative testing, the composition of the sample needs to be constant for each usability study in order to obtain comparable results.
The greater the number of participants involved, the smaller the sampling error, which makes it easier to establish a significant difference between two measurements. Statistical considerations (via t-test calculation) have shown that 20 is the minimum number of participants needed to get meaningful data.
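The relationship between sample size and sampling error can be illustrated with a short sketch. It computes the standard error of the mean and Welch's t statistic for two samples using only the Python standard library. The function names and all data are invented for illustration; the sketch does not reproduce the calculation that led to the figure of 20 participants.

```python
import math
import statistics


def standard_error(sample: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def welch_t(sample_a: list[float], sample_b: list[float]) -> float:
    """Welch's t statistic for the difference between two sample means."""
    se_a = standard_error(sample_a)
    se_b = standard_error(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(se_a ** 2 + se_b ** 2)


# Invented time-on-task values (seconds) for two releases of a product.
release_1 = [120.0, 95.0, 140.0, 110.0, 130.0]
release_2 = [100.0, 90.0, 105.0, 85.0, 95.0]

# With the same spread, a larger sample yields a smaller standard error.
print(standard_error(release_1))
print(standard_error(release_1 * 4))
print(welch_t(release_1, release_2))
```

The larger the absolute t statistic, the less plausible it is that the two releases differ only by chance; a full test would additionally compare it against the t distribution.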
In order to avoid interference by the facilitator, the participant works alone in the testing room on a given task. The usability coefficients and usability problems are tracked from the observation room. The measurement itself is carried out under constant and controlled conditions. This is ensured to a large extent by the use of tracking software that was specifically developed for this purpose. The software is able to record the time on task, clicks and keystrokes. Usability problems and use of features can be documented as well.
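The kind of record such tracking software might keep per participant and task can be sketched as a small data structure. Everything here (class name, field names, methods) is a hypothetical illustration; the actual software is not described in more detail in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class TaskLog:
    """Hypothetical per-task record kept by an observer in the observation room."""
    participant_id: str
    task_id: str
    time_limit_s: float
    time_on_task_s: float = 0.0
    clicks: int = 0
    keystrokes: int = 0
    problems: list[str] = field(default_factory=list)

    def record_click(self) -> None:
        self.clicks += 1

    def record_keystroke(self) -> None:
        self.keystrokes += 1

    def record_problem(self, problem_id: str) -> None:
        """Note a problem from the list assembled in the pre-phase."""
        self.problems.append(problem_id)

    def finish(self, elapsed_s: float) -> None:
        """Store the time on task, capped at the task's time limit."""
        self.time_on_task_s = min(elapsed_s, self.time_limit_s)


log = TaskLog(participant_id="P01", task_id="T1", time_limit_s=180.0)
log.record_click()
log.record_keystroke()
log.record_problem("label-unclear")
log.finish(elapsed_s=200.0)
print(log)
```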
When the time limit is reached or when the task has been completed successfully within the given time, the facilitator enters the testing room and asks the participant to assess her task performance subjectively with a standardized questionnaire. After that, observed issues are explored in order to gain further qualitative insights regarding the usability problems. Additional questions can be included at any time. If completing the task is a precondition for the following task, the facilitator helps the participant complete it.
In order to construct a high-level usability index, the factors (degree of task fulfilment, number of encountered problems, and elapsed time) are normalized and weighted. The results are distilled into objective and subjective indices, so that the usability of the product is expressed with a few simple numbers (cf. Sauro & Kindlund 2005 for a similar approach).
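A minimal sketch of how three such factors could be normalized to a common 0 to 1 scale and combined into one figure is given below. The normalization scheme, the weights, and the 0 to 100 scale are placeholder assumptions for illustration only; in the approach described here, the indices are calibrated with statistical methods rather than fixed by hand.

```python
def usability_index(
    completion: float,
    problems: int,
    elapsed_s: float,
    *,
    max_problems: int,
    time_limit_s: float,
    weights: tuple[float, float, float] = (0.5, 0.25, 0.25),  # assumed weights
) -> float:
    """Combine three normalized factors into one score between 0 and 100."""
    scores = (
        completion,                                     # already between 0 and 1
        1 - min(problems, max_problems) / max_problems,  # fewer problems is better
        1 - min(elapsed_s, time_limit_s) / time_limit_s,  # less time is better
    )
    weighted = sum(w * s for w, s in zip(weights, scores))
    return 100 * weighted / sum(weights)


# Hypothetical task: 75 % completed, 2 of at most 8 listed problems, 90 s of 180 s.
print(usability_index(0.75, 2, 90.0, max_problems=8, time_limit_s=180.0))  # 68.75
```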
The usability metric, as presented so far, allows the project team to rate a product over time or to compare it with the competition.
The presented hybrid approach for usability testing delivers qualitative and quantitative data. It saves cost and time as the same team conducts one test rather than two. Furthermore, recruitment has to be done only once for the hybrid testing phase.
However, the hybrid approach incurs high costs for obtaining qualitative data. If a product is tested twice to measure the difference between two releases, at least 40 participants are needed, all of whom also take part in the qualitative exploration. This is far more than the qualitative findings require according to Nielsen’s saturation curve of usability testing (Nielsen 2000), but it is the minimum needed to obtain valid quantitative data.
The approach has proven worthwhile in a couple of studies.