
opportunity to collect all the different ideas and work
towards consensus (and buy-in for the chosen metrics).
• Goals for the success of a particular project or feature may be
different from those for the product as a whole.
• Do not get too distracted at this stage by worrying about whether
or how it will be possible to find relevant signals or metrics.
Signals
Next, think about how success or failure in the goals might
manifest itself in user behavior or attitudes. What actions
would indicate the goal had been met? What feelings or
perceptions would correlate with success or failure? At this
stage you should consider what your data sources for these
signals will be. For logs-based behavioral signals, for example,
are the relevant actions currently being logged, or could they
be? How will you gather attitudinal signals? Could you
deploy a survey on a regular basis? Logs and surveys are
the two signal sources we have used most often, but there
are other possibilities (e.g. using a panel of judges to
provide ratings). Some tips that we have found helpful:
• Choose signals that are sensitive and specific to the goal: they
should move only when the user experience is better or worse, not
for other, unrelated reasons.
• Sometimes failure is easier to identify than success (e.g.
abandonment of a task, “undo” events [1], frustration); a
logs-based sketch of such failure signals follows these tips.
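To make the logs-based case concrete, the following is a minimal
sketch (in Python) of how failure-oriented signals such as task
abandonment and “undo” counts might be derived from an event log.
The record fields and event types used here are illustrative
assumptions, not a description of any particular product’s logging.

# Minimal sketch: deriving failure signals (task abandonment, undo
# counts) from a hypothetical event log. Field names and event types
# are assumptions for illustration.
def failure_signals(events):
    """events: iterable of dicts such as
    {"task_id": "t42", "type": "task_start" | "task_complete" | "undo"}.
    Returns the fraction of started tasks never completed and the
    number of undo events per started task."""
    started, completed, undos = set(), set(), 0
    for e in events:
        if e["type"] == "task_start":
            started.add(e["task_id"])
        elif e["type"] == "task_complete":
            completed.add(e["task_id"])
        elif e["type"] == "undo":
            undos += 1
    if not started:
        return None  # no tasks observed in this window
    return {
        "abandonment_rate": len(started - completed) / len(started),
        "undos_per_started_task": undos / len(started),
    }

events = [
    {"task_id": "t1", "type": "task_start"},
    {"task_id": "t1", "type": "undo"},
    {"task_id": "t1", "type": "task_complete"},
    {"task_id": "t2", "type": "task_start"},  # never completed
]
print(failure_signals(events))
# {'abandonment_rate': 0.5, 'undos_per_started_task': 0.5}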
Metrics
Finally, think about how these signals can be translated into
specific metrics, suitable for tracking over time on a
dashboard. Some tips that we have found helpful:
• Raw counts will go up as your user base grows, and need to be
normalized; ratios, percentages, or averages per user are often
more useful (see the sketch after this list).
• There are many challenges in ensuring accuracy of metrics based
on web logs, such as filtering out traffic from automated sources
(e.g. crawlers, spammers), and ensuring that all of the important
user actions are being logged (which may not happen by default,
especially in the case of AJAX or Flash-based applications).
• If it is important to be able to compare your project or product
to others, you may need to track additional metrics from the
standard set used by those products.
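Since these tips concern turning raw event counts into accurate,
normalized numbers, here is a minimal sketch of what that step might
look like over one week of log records. The field names, the
user-agent heuristic for automated traffic, and the specific metrics
are assumptions made for illustration, not the pipeline used for the
products described in this note.

# Minimal sketch: normalized, bot-filtered weekly metrics from raw
# log records. All field names and heuristics are assumptions.
BOT_MARKERS = ("bot", "crawler", "spider")  # crude user-agent check

def weekly_metrics(records, total_users):
    """records: iterable of dicts such as
    {"user": "u1", "user_agent": "...", "action": "share"},
    collected over one week. Returns normalized metrics, not raw counts."""
    actions_per_user = {}
    for r in records:
        ua = r.get("user_agent", "").lower()
        if any(marker in ua for marker in BOT_MARKERS):
            continue  # drop traffic from automated sources
        actions_per_user[r["user"]] = actions_per_user.get(r["user"], 0) + 1
    active_users = len(actions_per_user)
    total_actions = sum(actions_per_user.values())
    return {
        # share of the whole user base active this week
        "pct_active_users": 100.0 * active_users / total_users if total_users else 0.0,
        # average actions per active user, stable as the base grows
        "avg_actions_per_active_user": total_actions / active_users if active_users else 0.0,
    }

Dividing by the active-user count (rather than reporting a raw total)
keeps the average stable as the user base grows, which is the point
of the first tip above.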
CONCLUSIONS
We have spent several years working on the problem of
developing large-scale user-centered product metrics. This
has led to our development of the HEART framework and
the Goals-Signals-Metrics process, which we have applied
to more than 20 different products and projects from a wide
variety of areas within Google. We have described several
examples in this note of how the resulting metrics have
helped product teams make decisions that are both data-
driven and user-centered. We have also found that the
framework and process are extremely helpful for focusing
discussions with teams. They have generalized to enough of
our company’s own products that we are confident that
teams in other organizations will be able to reuse or adapt
them successfully. We have fine-tuned both the framework
and process over more than a year of use, but the core of
each has remained stable, and the framework’s categories
are comprehensive enough to fit new metrics ideas into.
Because large-scale behavioral metrics are relatively new,
we hope to see more CHI research on this topic – for
example, to establish which metrics in each category give
the most accurate reflection of user experience quality.
ACKNOWLEDGMENTS
Thanks to Aaron Sedley, Geoff Davis, and Melanie Kellar
for contributing to HEART, and Patrick Larvie for support.
REFERENCES
1. Akers, D. et al. (2009). Undo and Erase Events as
Indicators of Usability Problems. Proc of CHI 2009,
ACM Press, pp. 659-668.
2. Burby, J. & Atchison, S. (2007). Actionable Web
Analytics. Indianapolis: Wiley Publishing, Inc.
3. Chi, E. et al. (2002). LumberJack: Intelligent Discovery
and Analysis of Web User Traffic Composition. Proc of
WebKDD 2002, ACM Press, pp. 1-15.
4. Dean, J. & Ghemawat, S. (2008). MapReduce:
Simplified Data Processing on Large Clusters.
Communications of the ACM, 51 (1), pp. 107-113.
5. Google Analytics: http://www.google.com/analytics
6. Grimes, C. et al. (2007). Query Logs Alone are not
Enough. Proc of WWW 07 Workshop on Query Log
Analysis: http://querylogs2007.webir.org
7. Gwizdka, J. & Spence, I. (2007). Implicit Measures of
Lostness and Success in Web Navigation. Interacting
with Computers 19(3), pp. 357-369.
8. Hadoop: http://hadoop.apache.org/core
9. Kaushik, A. (2007). Web Analytics: An Hour a Day.
Indianapolis: Wiley Publishing, Inc.
10. Kohavi, R. et al. (2007). Practical Guide to Controlled
Experiments on the Web. Proc of KDD 07, ACM Press,
pp. 959-967.
11. Omniture: http://www.omniture.com
12. Pike, R. et al. (2005). Interpreting the Data: Parallel
Analysis with Sawzall. Scientific Programming (13), pp.
277-298.
13. Tullis, T. & Albert, W. (2008). Measuring the User
Experience. Burlington: Morgan Kaufmann.
14. UserZoom: http://www.userzoom.com
15. Weischedel, B. & Huizingh, E. (2006). Website
Optimization with Web Metrics: A Case Study. Proc of
ICEC 06, ACM Press, pp. 463-470.