Methods for evaluating functional prototypes

In this post I will take a look at three methods for evaluating functional prototypes.

The methods are:
1) System usability scale;
2) AttrakDiff; and
3) Think aloud protocol.

When usability is measured, the general classes of usability should be covered: effectiveness, efficiency, and satisfaction. Vastly different metrics can be used to measure these attributes. Context-specificity makes comparing the usability of different systems very difficult. It also means that a design that works very well in one system, with one set of users and use cases, might not work well in another context. Copying usable solutions is therefore risky.

The usability of an artefact is defined by the context it is used in, so every usability study would seem to require its own detailed approach. To make usability testing easier and more universally comparable, the System Usability Scale (SUS) was created. SUS is a reliable, low-cost usability scale that can be used universally to assess a system's usability. It is a Likert scale, and the overall outcome is a score between 0 and 100. Altogether it is a valuable, reliable and robust evaluation tool that allows usability evaluation to be performed effectively.
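To make the 0–100 outcome concrete, here is a minimal sketch of the standard SUS scoring rule: odd-numbered (positively worded) items contribute their rating minus 1, even-numbered (negatively worded) items contribute 5 minus their rating, and the sum is multiplied by 2.5. The example answers are made up.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten
    Likert responses, each an integer from 1 (strongly disagree)
    to 5 (strongly agree)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:          # odd-numbered (positive) items
            total += r - 1
        else:                   # even-numbered (negative) items
            total += 5 - r
    return total * 2.5

# Example: a fairly positive, hypothetical set of answers
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 2]))  # 82.5
```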

AttrakDiff assesses users' feelings about a system with a questionnaire. The questionnaire studies both hedonic and pragmatic dimensions using semantic differentials. The data acquired is quantitative and comparable, much like with the System Usability Scale, but the weakness of AttrakDiff is that it captures the reflections of the users rather than the experiences themselves. The approach is used both in lab and field studies.
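A semantic-differential questionnaire of this kind is typically scored by averaging item ratings within each dimension. The sketch below assumes a 7-point scale recoded to -3..+3; the item groupings and ratings are illustrative, not the official AttrakDiff item set.

```python
from statistics import mean

# Hypothetical semantic-differential responses, recoded to -3..+3.
responses = {
    "pragmatic": [2, 1, 3, 2],       # e.g. confusing <-> clear
    "hedonic": [1, 2, 0, 1],         # e.g. dull <-> captivating
    "attractiveness": [2, 2],        # e.g. ugly <-> attractive
}

# One mean score per dimension makes systems directly comparable.
scores = {dim: mean(items) for dim, items in responses.items()}
for dim, score in scores.items():
    print(f"{dim}: {score:+.2f}")
```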

The think-aloud protocol adds another dimension to the studies. During usability testing, users are asked to talk aloud and say whatever comes to mind while performing the set tasks. The thoughts are often not even related to the task itself but are rather whatever occurs to the user while performing it: what they are looking at, thinking, doing, and feeling. A set of questionnaires is also used, before or after the tasks as necessary.

Methods for Collecting UX Data

There are three articles about physiological measures for collecting UX data on the table today. I will try to weigh the pros and cons to see which would be best for our own UX evaluation process. Not all of these methods suit evaluating prototypes of every type; some are used mainly in web design and similar contexts. I need to find the most suitable approach for evaluating user experience.

The three physiological measuring approaches are:

Visual Complexity Evaluation
Pupillary dilation monitoring
Eye tracking techniques

Visual complexity evaluation is often used in website design, but its outcomes and effects are not always fully utilized or understood. In the studies, a hypothesis was proposed and tested: that increasing a website's complexity would have a detrimental cognitive and emotional impact on users. Users want their web environments to be usable and appealing, and visual complexity may be a huge factor in their first impressions and even later usage decisions. If it works this way, the effect might also run in the other direction: appealing, simple webpages might attract users regardless of the content. The studies also included passive viewing task (PVT) and visual search task (VST) methods.

The second study monitored pupillary dilation during music-induced aesthetic responses (chills). It concentrated on the correlation between music-induced chills and pupil reaction. While listening to different songs, participants were asked to press a button whenever they got a chill. The moment a participant pressed the button was then correlated with the pupil reaction. The study concluded that pupil reaction during passive music listening can be monitored and translated into aesthetic responses.
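The analysis described above can be sketched roughly as follows: compare average pupil dilation in a window after each button press against the overall baseline. The sampling rate, window length, and data are illustrative assumptions, not values from the study.

```python
from statistics import mean

RATE = 50       # pupil samples per second (assumed)
WINDOW = 2.0    # seconds analysed after each button press (assumed)

def chill_dilation(pupil, press_times):
    """Return mean dilation in the post-press windows minus the
    whole-recording baseline. pupil is a list of diameter samples;
    press_times are button-press times in seconds."""
    baseline = mean(pupil)
    windows = []
    for t in press_times:
        start = int(t * RATE)
        segment = pupil[start:start + int(WINDOW * RATE)]
        if segment:
            windows.append(mean(segment))
    return mean(windows) - baseline if windows else 0.0

# Hypothetical recording: pupil widens from 3.0 mm to 4.0 mm at t = 1 s.
pupil = [3.0] * 50 + [4.0] * 100
print(chill_dilation(pupil, [1.0]))
```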

The third study was performed with touch-screen devices and soft keyboards. Eye tracking was used to evaluate the user experience of participants when using different soft keyboard layouts. The aim of the study was to provide input into soft keyboard layout design to help users type more effectively.

I think that our concept, a collaborative music-making experience, could benefit a lot from pupillary dilation monitoring, to see whether people get engaged and enjoy the process itself. On the other hand, the experience needs to be as simple as possible for users to step in and start using our machine, so visual complexity needs to be toned down to the very minimum in order to attract users. Either of these methods could then be used to evaluate the user experience of our concept/prototype.



Methods for evaluating early prototypes (Reading assignment 2)

This evaluation and discussion is to learn and select from three methods for evaluating early prototypes. The methods are:

(1) Multiple Sorting
(2) Contextual laddering
(3) Wizard of Oz

Multiple Sorting
The first method addressed is multiple sorting. It relies on the assumption that people's conceptualising and understanding of their world, and therefore their knowledge, is based on categorisation. A simple linear scale method (a semantic differential, e.g. awkward vs. easy to use) was used to collect information on how we see the world. Since a person is constantly updating their understanding of their surroundings and the world, Kelly (Kelly, G. A. The Psychology of Personal Constructs, Volume One: Theory and Personality. New York: Norton, 1955.) developed a more multidimensional technique for eliciting personal constructs in an interview context. This technique is known as the 'Repertory Grid Test'.
In the Repertory Grid Test, participants are asked to think about triads of items. They must describe what is similar between two of them and why the third one is different. This method brings out contrastive dimensions, and participants are then asked to rate the items along them. The data collected forms a grid and helps to explain how people interpret the items and the connections generated by other people on the same items.
Both of these methods rely on the assumption that our constructs are polar, yet categorization can often be multidimensional. The Repertory Grid Test is also quite time-consuming, since the participants must be interviewed, and verbalizing the answers is often difficult. To address the linear approach of the earlier tests, new variations have been devised. A newer version of the test provides participants with a wide variety of objects and allows them to sort as many into a group as they like. After each sorting, participants are asked for the reasons behind their decisions. Multidimensional Scalogram Analysis (MSA) is then performed on the resulting sort data to yield spatial maps of constructs for interpretation alongside the interview discussions.
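A common first step before an analysis like MSA is to tabulate how often each pair of items was sorted into the same group across participants; pairs that co-occur frequently end up close together on the spatial map. The sort data below is made up for illustration.

```python
from itertools import combinations
from collections import Counter

# Hypothetical free-sort data: each participant groups items as they like.
sorts = [
    [{"A", "B"}, {"C", "D", "E"}],        # participant 1
    [{"A", "B", "C"}, {"D", "E"}],        # participant 2
    [{"A"}, {"B", "C"}, {"D", "E"}],      # participant 3
]

# Count how often each pair of items lands in the same group.
co = Counter()
for groups in sorts:
    for group in groups:
        for pair in combinations(sorted(group), 2):
            co[pair] += 1

for pair, count in sorted(co.items()):
    print(pair, count)  # e.g. ('D', 'E') co-occurs in all three sorts
```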
Contextual laddering 
Laddering is a well-known interview technique in which customers are repeatedly asked to explain why an attribute they have given to a product is important. After the background of an attribute has been explained, the question why is asked again, e.g. "Why is that important to you?" By probing into the reasons why, the interviewee 'climbs up the ladder'. This way, the reasons (consequences) why certain attributes are important are first revealed, followed by an expression of how these consequences serve personal values. UX laddering is useful when designing for attributes that offer value and meaningful user experiences. The goal of laddering is to identify and understand the links between key perceptual elements across the range of attributes, consequences and values.
Wizard of Oz (WOz)
The WOz method helps designers avoid getting locked into a particular design or working under an incorrect set of assumptions about user preferences, because it lets them explore and evaluate designs before investing the considerable development time needed to build a complete prototype. The method is particularly useful for exploring user interfaces for pervasive, ubiquitous, or mixed-reality systems that combine complex sensing and intelligent control logic. The problem with the method is that it requires engineering an interface and integrating it with an incomplete system; when the system under investigation is developed further, the WOz interface usually is not, and is thus used only once or twice.
In WOz, a person acts as a wizard and performs the steps that have not yet been automated or implemented by the system itself, manually closing the gaps in the current state of development. As development progresses, the gaps the wizard has to fill during testing shrink.
In our case, the last method seems a good one to use, since we need to test the overall system on potential users without certain very important functionality having been developed. To see whether it makes sense to go forward with the same plan, we need to test the overall perception of the system.

Three methods for evaluating design. Which would I choose?

There are three methods for evaluating design currently on the table. I will explore them and decide which I would use to evaluate our future work and designs.

The methods are as follows:

(1) Sentence Completion
(2) AXE (Anticipated eXperience Evaluation)
(3) Exploration test

The sentence completion technique is often used in psychology and marketing, and it can be developed and applied to evaluating symbolic meaning. In the paper "Sentence Completion for Evaluating Symbolic Meaning" by Sari Kujala and Piia Nurkka, sentence completion is used in two case studies to evaluate how people give symbolic meaning to objects based on their design and associations. Respondents are given a questionnaire of sentence beginnings, which they complete in ways that are meaningful to them.


The sentence completion technique can be used in interviews and nowadays often via electronic channels (electronic questionnaires via e-mail, etc.). Interviewing allows collecting large amounts of information and following how people react and associate things and events with a design. The downside of interviewing is, of course, the organizational side: gathering the necessary resources, recruiting people, and setting up the meetings is a very resource-heavy process. Web-based questionnaires are an easier route, but they have downsides too: the amount of attention and input you get from participants is often lower, and it depends on the associations people have with the objects or design being studied.

Interviewing users in studies has many drawbacks that need to be addressed. When evaluating concepts, it is necessary to get feedback from potential users; especially important are the perceived product character and its individual features. Through this information it is possible to identify potential issues early on and make modifications to avoid them. But the bottom line is that concepts are being evaluated, and concepts are abstract. The presentation of a concept or the visual look of an early prototype might sway the feedback one way or the other, and even the storytelling behind the necessity or creation of a concept might add new and confusing thoughts to the mix. There is also the problem of talking about the future and putting it into words: it becomes very difficult when people are asked to formulate their future needs, especially because describing experiences in words is hard, and adding imagination to the pot makes it harder still.

To overcome these difficulties in concept evaluation, the AXE (Anticipated eXperience Evaluation) method is proposed by Lutz Gegner and Mikael Runonen. It is an approach for gaining insights into the perceived value of concepts by using image pairs as stimuli in user interviews. The approach has three steps: concept briefing, concept evaluation, and data analysis.

First, in concept briefing, the concept is presented to the participants in the same manner and order each time. All information, such as concept narratives and extra material, is also provided to the participants so that they can access it at any time during the session. In the second and main part of the AXE method, concept evaluation, participants are given image pairs with a scale between them. The images ensure that all input to the participants is similarly structured and help steer them to talk about the experiential aspects they perceive. The generative visual and enabling scaling methods help solve some of the challenges that normal interviews have. The image pairs are composed to display a contrast and are linked through a scale to strengthen the idea of bipolarity. By selecting a position between the images, people express their perception of the product and their preference.
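The response-capture step above can be sketched as follows: for each image pair, record where on the scale the participant placed the concept ("chosen") and which end they personally prefer ("preferred"), then flag pairs where the two diverge as prompts for the follow-up interview. The item names, scale range, and divergence rule are my own assumptions, not from the AXE paper.

```python
# Each tuple: (left image, right image, chosen position, preferred position)
# on a scale from -3 (left image) to +3 (right image).
pairs = [
    ("cluttered", "minimal", +2, +3),
    ("playful", "serious", -2, -2),
    ("manual", "automatic", +1, -2),
]

# Flag pairs where perception and preference point to opposite ends;
# these become topics for the evaluation interview.
follow_up = [(left, right) for left, right, chosen, preferred in pairs
             if (chosen > 0) != (preferred > 0)]
print(follow_up)
```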

An evaluation interview is also carried out during concept evaluation, to gain a deeper understanding of why the participant chose one image or the other. The visuals a participant associates with the concept are not always the visuals they prefer, and when there is a difference between the chosen and preferred images, the interviewer can gain insight into what would make the concept better. It is very important to use only the adjectives and information provided by the participant during the interviews; otherwise the validity of the feedback can be significantly lowered.

During data analysis, the data is transcribed and partitioned into manageable segments, and an analytical framework is built. In the framework, every segment is coded and categorized. The main categories reflect the current state of the concept and comprise perceived product features, associated attributes and anticipated consequences. All these categories and subcategories provide input for UX development concerning the evaluated concept.

The third method, Vermersch's 'explicitation' interviewing technique (Ann Light, "Vermersch's 'explicitation' interviewing technique used in analysing human-computer interaction", December 1999), is more interview-based and gives HCI in particular a wide range of applications. HCI researchers try to understand the use of technologies and regularly use qualitative research methods to do so; this approach helps them investigate how tasks are completed. The method allows participants to enter evocation: the interviewee is encouraged to think of a particular episode involving an activity under investigation and enter a state of evocation so that the episode can be described in detail. This gives researchers a clearer overview of what goes on in people's minds when they perform certain tasks. The interview has to be carried out with extreme care and detail to avoid misguiding the interviewee.

All of the described methods have tasks they perform best. In our case, where we have an idea that has not yet been fully described and formulated, we find that we should use the first method, sentence completion. We would use it to find out what perceived qualities and symbolic meaning our proposed concept has. We wish to see whether users give the concept meanings such as having fun or spending quality time with friends or family. By providing participants with a questionnaire of open-ended sentence beginnings, we can see whether our own vision and idea go hand in hand with the perception of possible users, and whether symbolic meaning is given to the concept as we hope.