
new features. After launching a major redesign, they saw
an initial decline in their user satisfaction metric (measured
on a 7-point bipolar scale). However, this metric recovered
over time, indicating that change aversion was probably the
cause, and that once users got used to the new design, they
liked it. With this information, the team was able to make a
more confident decision to keep the new design.
Engagement
Engagement is the user’s level of involvement with a
product; in the metrics context, the term is normally used to
refer to behavioral proxies such as the frequency, intensity,
or depth of interaction over some time period. Examples
might include the number of visits per user per week, or the
number of photos uploaded per user per day. It is generally
more useful to report Engagement metrics as an average per
user, rather than as a total count – because an increase in
the total could be a result of more users, not more usage.
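To make the per-user framing concrete, the following minimal Python sketch computes average visits per user per week from a hypothetical event log of (user_id, timestamp) pairs; the log schema and function name are illustrative assumptions, not any production pipeline.

from collections import defaultdict
from datetime import datetime, timedelta

def avg_visits_per_user(visits, week_start):
    """Average number of visits per user in the week starting at week_start.

    `visits` is an assumed list of (user_id, timestamp) pairs; the log
    format is illustrative only.
    """
    week_end = week_start + timedelta(days=7)
    counts = defaultdict(int)
    for user_id, ts in visits:
        if week_start <= ts < week_end:
            counts[user_id] += 1
    if not counts:
        return 0.0
    # Reporting a per-user average rather than a raw total separates
    # "more usage per user" from "more users".
    return sum(counts.values()) / len(counts)

visits = [("u1", datetime(2009, 6, 1, 9)), ("u1", datetime(2009, 6, 3, 14)),
          ("u2", datetime(2009, 6, 2, 11))]
print(avg_visits_per_user(visits, datetime(2009, 6, 1)))  # -> 1.5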
For example, the Gmail team wanted to understand more
about the level of engagement of their users than was
possible with the PULSE metric of seven-day active users
(which simply counts how many users visited the product at
least once within the last week). Reasoning that engaged users should be checking their email account regularly, as part of their daily routine, we chose as our metric the percentage of active users who visited the product on five or more days during the last week. We also found that this
was strongly predictive of longer-term retention, and
therefore could be used as a bellwether for that metric.
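A similar sketch of this "five or more days out of seven" measure, again over an assumed (user_id, timestamp) visit log rather than the actual Gmail data pipeline:

from datetime import timedelta

def pct_highly_engaged(visits, week_start):
    """Percentage of the week's active users who visited on 5+ distinct days."""
    week_end = week_start + timedelta(days=7)
    days_active = {}
    for user_id, ts in visits:
        if week_start <= ts < week_end:
            days_active.setdefault(user_id, set()).add(ts.date())
    if not days_active:
        return 0.0
    engaged = sum(1 for days in days_active.values() if len(days) >= 5)
    return 100.0 * engaged / len(days_active)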
Adoption and Retention
Adoption and Retention metrics can be used to provide stronger insight into counts of unique users in a given time period (e.g. seven-day active users),
addressing the problem of distinguishing new users from
existing users. Adoption metrics track how many new users
start using a product during a given time period (for
example, the number of accounts created in the last seven
days), and Retention metrics track how many of the users
from a given time period are still present in some later time
period (for example, the percentage of seven-day active
users in a given week who are still seven-day active three
months later). What counts as “using” a product can vary
depending on its nature and goals. In some cases just
visiting its site might count. In others, you might want to
count a visitor as having adopted a product only if they
have successfully completed a key task, like creating an
account. Like Engagement, Retention can be measured over
different time periods – for some products you might want
to look at week-to-week Retention, while for others
monthly or 90-day might be more appropriate. Adoption
and Retention tend to be especially useful for new products
and features, or those undergoing redesigns; for more
established products they tend to stabilize over time, except
for seasonal changes or external events.
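As a rough illustration, the definitions above might be computed as follows, assuming a list of account-creation dates for Adoption and sets of active user ids for Retention; these helpers are hypothetical, not any team's implementation.

def adoption_count(signup_dates, window_start, window_end):
    """Number of new accounts created in [window_start, window_end)."""
    return sum(window_start <= d < window_end for d in signup_dates)

def retention_rate(base_actives, later_actives):
    """Percentage of users active in a base window who are active in a later one.

    Both arguments are sets of user ids, e.g. the seven-day active users for a
    given week and for a week three months later.
    """
    if not base_actives:
        return 0.0
    return 100.0 * len(base_actives & later_actives) / len(base_actives)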
For example, during the stock market meltdown in
September 2008, Google Finance had a surge in both page
views and seven-day active users. However, these metrics
did not indicate whether the surge was driven by new users
interested in the crisis, or existing users panic-checking
their investments. Without knowing who was making more
visits, it was difficult to know if or how to change the site.
We looked at Adoption and Retention metrics to separate
these user types, and examine the rate at which new users
were choosing to continue using the site. The team was
able to use this information to better understand the
opportunities presented by event-driven traffic spikes.
Task Success
Finally, the “Task Success” category encompasses several
traditional behavioral metrics of user experience, such as
efficiency (e.g. time to complete a task), effectiveness (e.g.
percent of tasks completed), and error rate. One way to
measure these on a large scale is via a remote usability or
benchmarking study, where users can be assigned specific
tasks. With web server log file data, it can be difficult to
know which task the user was trying to accomplish,
depending on the nature of the site. If an optimal path exists
for a particular task (e.g. a multi-step sign-up process) it is
possible to measure how closely users follow it [7].
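For illustration, effectiveness and efficiency for a single assigned task in a remote benchmarking study might be summarized as below; the per-session schema is an assumption made for this sketch.

def task_success_summary(sessions):
    """Effectiveness (percent of tasks completed) and efficiency (mean seconds).

    `sessions` is an assumed list of dicts like {"completed": bool,
    "seconds": float} for a single assigned task; the schema is illustrative.
    """
    if not sessions:
        return {"pct_completed": 0.0, "mean_seconds": None}
    completed = [s for s in sessions if s["completed"]]
    pct = 100.0 * len(completed) / len(sessions)
    mean_seconds = (sum(s["seconds"] for s in completed) / len(completed)
                    if completed else None)
    return {"pct_completed": pct, "mean_seconds": mean_seconds}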
For example, Google Maps used to have two different types
of search boxes – a dual box for local search, where users
could enter the “what” and “where” aspects separately (e.g.
[pizza][nyc]) and a single search box that handled all kinds
of searches (including local searches such as [pizza nyc], or
[nyc] followed by [pizza]). The team believed that the
single-box approach was simplest and most efficient, so, in
an A/B test, they tried a version that offered only the single
box. They compared error rates in the two versions, finding
that users in the single-box condition were able to
successfully adapt their search strategies. This assured the
team that they could remove the dual box for all users.
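As a general-purpose illustration of comparing error rates between two A/B arms (not the Maps team's actual analysis), a standard two-proportion z-test is one option; the counts below are made up.

import math

def two_proportion_ztest(errors_a, n_a, errors_b, n_b):
    """Two-sided z-test for a difference in error rates between two A/B arms."""
    p_a, p_b = errors_a / n_a, errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# Made-up counts for illustration only.
print(two_proportion_ztest(errors_a=120, n_a=10000, errors_b=135, n_b=10000))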
GOALS – SIGNALS – METRICS
No matter how user-centered a metric is, it is unlikely to be
useful in practice unless it explicitly relates to a goal, and
can be used to track progress towards that goal. We
developed a simple process that steps teams through
articulating the goals of a product or feature, then
identifying signals that indicate success, and finally
building specific metrics to track on a dashboard.
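One lightweight way to record the outcome of this process is a simple mapping from a HEART category to its goals, signals, and metrics; the sketch below reuses the Engagement examples mentioned earlier and is purely illustrative.

# A minimal, hypothetical goals -> signals -> metrics mapping for one HEART
# category, suitable for driving a dashboard; the entries are illustrative.
GSM_MAP = {
    "Engagement": {
        "goal": "Users find the product valuable enough to use it regularly",
        "signals": ["visits", "photos uploaded"],
        "metrics": ["visits per user per week",
                    "photos uploaded per user per day"],
    },
}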
Goals
The first step is identifying the goals of the product or
feature, especially in terms of user experience. What tasks
do users need to accomplish? What is the redesign trying to
achieve? Use the HEART framework to prompt articulation
of goals (e.g. is it more important to attract new users, or to
encourage existing users to become more engaged?). Some
tips that we have found helpful:
• Different team members may disagree about what the project goals are. This process provides a great