[PDF]Data Analysis With Microsoft Excel
Please sign in to contact this author
Lata
Analysis
with
Microsoft® Excel
Updated for Office 2007®
Kenneth N. Berk
Illinois State University
Patrick Carey
Carey Associates, Inc.
oN BROOKS/COLE
| CENGAGE Learning”
Australia * Brazil « Japan * Korea * Mexico » Singapore * Spain * United Kingdom « United States
oe BROOKS/COLE
4 CENGAGE Learning”
Data Analysis with Microsoft® Excel: © 2010, 2004 Brooks/Cole, Cengage Learning
Updated for Office 2007®, Third Edition
ALL RIGHTS RESERVED. No part of this work covered by the copyright
Berk, Carey
herein may be reproduced, transmitted, stored, or used in any form or by
any means graphic, electronic, or mechanical, including but not limited to
photocopying, recording, scanning, digitizing, taping, Web distribution,
information networks, or information storage and retrieval systems, except
Associate Editor: Daniel Seibert as permitted under Section 107 or 108 of the 1976 United States Copyright
Editorial Assistant: Shaylin Walsh Act, without the prior written permission of the publisher.
Publisher: Richard Stratton
Senior Sponsoring Editor: Molly Taylor
Associate Media Editor: Catie Ronquillo
For product information and technology assistance, contact us at
Senior Marketing Manager: Greta Kleinert Cengage Learning Customer & Sales Support, 1-800-354-9706
Marketing Coordinator: Erica O’Connell For permission to use material from this text or product,
submit all requests online at www.cengage.com/permissions.
Further permissions questions can be emailed to
permissionrequest@cengage.com.
Marketing Communications Manager:
Mary Anne Payumo
Content Project Manager: Jessica Rasile
Art Director: Linda Helcher
Library of Congress Control Number: 2009928574
Print Buyer: Linda Hsu
ISBN-13: 978-0-495-39178-4
Permissions Editor: Margaret
Chamberlain-Gaston ISBN-10: 0-495-39178-6
Production Service/Compositor:
PrePress PMG Brooks/Cole
Photo Manager: John Hill 20 Channel Center Street
Cover Designer: Blue Bungalow Design le MA 02210
Cover Image: ©Fotolia
Cengage Learning products are represented in Canada by
Nelson Education, Ltd.
For your course and learning solutions, visit www.cengage.com
Purchase any of our products at your local college store or at our preferred
online store www.ichapters.com
Printed in the United States of America
1234567131211 41009
About the Authors
Kenneth N. Berk
Kenneth N. Berk (Ph.D., University of Minnesota) is an emeritus professor
of mathematics at Illinois State University and a Fellow of the American
Statistical Association. Berk was editor of Software Reviews for the American
Statistician for six years. He served as chair of the Statistical Computing
Section of the American Statistical Association. He has twice co-chaired
the annual Symposium on the Interface between Computing Science and
Statistics.
Patrick Carey
Patrick Carey received his M.S. in biostatistics from the University of
Wisconsin where he worked as a researcher in the General Clinical Research
Center designing and analyzing clinical studies. He coauthored his first
textbook with Ken Berk on using Excel as a statistical tool. He and his wife
Joan founded Carey Associates, Inc., a software textbook development com-
pany. He has since authored or coauthored over 20 academic and trade texts
for the software industry. Besides books on data analysis, Carey has written
on the Windows® operating system, Web page design, database manage-
ment, the Internet, browsers, and presentation graphics software. Patrick,
Joan, and their six children live in Wisconsin.
I thank my wife Laura for her advice, because here she is
the one who knows about publishing books.
—Kenneth N. Berk
Thanks to my wife, Joan, and my children, John Paul, Thomas,
Peter, Michael, Stephen, and Catherine, for their love and
support.
—Patrick M. Carey
Preface
Introduction
Data Analysis with Microsoft® Excel: Updated for Office 2007® harnesses
the power of Excel and transforms it into a tool for learning basic statistical
analysis. Students learn statistics in the context of analyzing data. We feel
that it is important for students to work with real data, analyzing real-world
problems, so that they understand the subtleties and complexities of analy-
sis that make statistics such an integral part of understanding our world.
The data set topics range from business examples to physiological studies
on NASA astronauts. Because students work with real data, they can appre-
ciate that in statistics no answers are completely final and that intuition and
creativity are as much a part of data analysis as is plugging numbers into
a software package. This text can serve as the core text for an introductory
statistics course or as a supplemental text. It also allows nontraditional stu-
dents outside of the classroom setting to teach themselves how to use Excel
to analyze sets of real data so they can make informed business forecasts
and decisions.
Users of this book need not have any experience with Excel, although
previous experience would be helpful. The first three chapters of the book
cover basic concepts of mouse and Windows operation, data entry, formulas
and functions, charts, and editing and saving workbooks. Chapters 4 through
12 emphasize teaching statistics with Excel as the instrument.
Using Excel in a Statistics Course
Spreadsheets have become one of the most popular forms of computer soft-
ware, second only to word processors. Spreadsheet software allows the user
to combine data, mathematical formulas, text, and graphics together in a
single report or workbook. For this reason, spreadsheets have become indis-
pensable tools for business, as they have also become popular in scientific
research. Excel in particular has won a great deal of acclaim for its ease of
use and power.
As spreadsheets have expanded in power and ease of use, there has been
increased interest in using them in the classroom. There are many advan-
tages to using Excel in an introductory statistics course. An important ad-
vantage is that students, particularly business students, are more likely to
be familiar with spreadsheets and are more comfortable working with data
entered into a spreadsheet. Since spreadsheet software is very common at
colleges and universities, a statistics instructor can teach a course without
requiring students to purchase an additional software package.
Having identified the strengths of Excel for teaching basic statistics, it
would be unfair not to include a few warnings. Spreadsheets are not statistics
packages, and there are limits to what they can do in replacing a full-featured
statistics package. This is why we have included our own downloadable
add-in, StatPlus™. It expands some of Excel’s statistical capabilities. (We
explain the use of StatPlus where appropriate throughout the text.) Using
Excel for anything other than an introductory statistics course would prob-
ably not be appropriate due to its limitations. For example, Excel can easily
perform balanced two-way analysis of variance but not unbalanced two-way
analysis of variance. Spreadsheets are also limited in handling data with
missing values. While we recommend Excel for a basic statistics course, we
feel it is not appropriate for more advanced analysis.
System Information
You will need the following hardware and software to use Data Analysis
with Microsoft® Excel: Updated for Office 2007°:
e A Windows-based PC.
e Windows XP or Windows Vista.
e Excel 2007. If you are using an earlier edition of Excel, you will have to
use an earlier edition of Data Analysis with Microsoft® Excel.
e Internet access for downloading the software files accompanying the text.
The Data Analysis with Microsoft® Excel package includes:
e The text, which includes 12 chapters, a reference section for Excel’s
statistical functions, Analysis ToolPak commands, StatPlus Add-In
commands, and a bibliography.
e¢ The companion website at www.cengage.com/statistics/berk contains
92 different data sets from real-life situations plus a summary of what
the data set files cover, ten interactive Concept Tutorials, and installa-
tion files for StatPlus—our statistical application. Chapter 1 of the text
includes instructions for installing the files.
e An Instructor’s Manual with solutions to all the exercises in the text is
available, password-protected on the companion website, to adopting
instructors.
Preface v
Excel’s Statistical Tools
Excel comes with 81 statistical functions and 59 mathematical functions.
There are also functions devoted to business and engineering problems. The
statistical functions that basic Excel provides include descriptive statistics
such as means, standard deviations, and rank statistics. There are also
cumulative distribution and probability density functions for a variety of
distributions, both continuous and discrete.
The Analysis ToolPak is an add-in that is included with Excel. If you
have not loaded the Analysis ToolPak, you will have to install it from your
original Excel installation.
The Analysis ToolPak adds the following capabilities to Excel:
e Analysis of variance, including one-way, two-way without replication,
and two-way balanced with replication
Correlation and covariance matrices
Tables of descriptive statistics
One-parameter exponential smoothing
Histograms with user-defined bin values
Moving averages
Random number generation for a variety of distributions
Rank and percentile scores
Multiple linear regression
Random sampling
t tests, including paired and two sample, assuming equal and unequal
variances
e ztests
In this book we make extensive use of the Analysis ToolPak for multiple
linear regression problems and analysis of variance.
StatPlus™
Since the Analysis ToolPak does not do everything that an introductory sta-
tistics course requires, this textbook comes with an additional add-in called
the StatPlus™ Add-In that fills in some of the gaps left by basic Excel 2007
and the Analysis ToolPak.
Additional commands provided by the StatPlus Add-In give users the
ability to:
Create random sets of data
Manipulate data columns
Create random samples from large data sets
Generate tables of univariate statistics
vi
Preface
Create statistical charts including boxplots, histograms, and normal
probability plots
Create quality control charts
Perform one-sample and two-sample ¢ tests and z tests
Perform non-parametric analyses
Perform time series analyses, including exponential and seasonal
smoothing
Manipulate charts by adding data labels and breaking charts down into
categories
Perform non parametric analyses
Create and analyze tabular data
A full description of these commands is included in the Appendix’s
Reference section and through on-line help available with the application.
Concept Tutorials
Included with the StatPlus add-in are ten interactive Excel tutorials that pro-
vide students a visual and hands-on approach to learning statistical concepts.
These tutorials cover:
Boxplots
Probability
Probability distributions
Random samples
Population statistics
The Central Limit Theorem
Confidence intervals
Hypothesis tests
Exponential smoothing
Linear regression
Preface vii
Acknowledgments
We thank Mac Mendelsohn, Managing Editor at Course Technology, for his
support and enthusiasm for the First Edition of this book. For this edition,
our thanks to Jessica Rasile, Content Project Manager, Blue Bungalow Design
for the cover design, and Carol A. Loomis, Copyeditor, for their professional
attention to all the details of production.
Special thanks go to our reviewers, who gave us valuable insights into
improving the book in each edition: Aaron S. Liswood, Sierra Nevada College;
Abbot L. Packard, State University of West Georgia; Andrew E. Coop, US Air
Force Academy; Barry Bombay, J. Sargeant Reynolds Community College;
Beth Eschenback, Humboldt State University; Bruce Trumbo, California
State University — Hayward; Carl Grafton, Auburn University; Carl R.
Williams, University of Memphis; Cheryl Dale, William Carey College; Dang
Tran, California State University — Los Angeles; Bruce Marsh, Texas A &
M University — Kingsvile; Edward J. Williams, University of Michigan —
Dearborn; Eric Zivot, University of Washington; Farrokh Alemi, George
Mason University; Faye Teer, James Madison University; Gordon Dahl,
University of Rochester; Ian Hardie, University of Maryland; Jack Harris,
Hobart and William Smith Colleges; Ames E. Pratt, Cornell University;
James Zumbrunnen, Colorado State University; John A. Austin, Jr.,
Louisiana State University — Shreveport; Kelwyn A. D’Souza, Hampton
University; Kevin Griffin, Eastern Arizona College; Lea Cloninger, University
of Illinois at Chicago; Lorrie Hoffman, University of Central Florida; Marion
G. Sobol, Southern Methodist University, and Matthew C. Dixon, USAF
Academy.
We thank Laura Berk, Peter Berk, Robert Beyer, David Booth, Orlyn Edge,
Stephen Friedberg, Maria Gillett, Richard Goldstein, Glenn Hart, Lotus
Hershberger, Les Montgomery, Joyce Nervades, Diane Warfield, and Kemp
Wills for their assistance with the data sets in this book. We especially want to
thank Dr. Jeff Steagall, who wrote some of the original material for Chapter 12,
Quality Control. If we have missed anyone, please forgive the omission.
Kenneth N. Berk
Patrick M. Carey
viii
Contents
Chapter 1
GETTING STARTED WITH EXCEL 1
Getting Started 2
Special Files for This Book 2
Installing the StatPlus Files 2
Excel and Spreadsheets 4
Launching Excel 5
Viewing the Excel Window 6
Running Excel Commands 7
Excel Workbooks and Worksheets 10
Opening a Workbook 10
Scrolling through a Workbook 11
Worksheet Cells 14
Selecting a Cell 14
Moving Cells 16
Printing from Excel 18
Previewing the Print Job 18
Setting Up the Page 19
Printing the Page 21
Saving Your Work 22
Excel Add-Ins 24
Loading the StatPlus Add-In 24
Loading the Data Analysis ToolPak 28
Unloading an Add-In 30
Features of StatPlus 30
Using StatPlus Modules 30
Hidden Data 31
Linked Formulas 32
Setup Options 32
Exiting Excel 34
Chapter 2
WORKING WITH DATA 35
Data Entry 36
Entering Data from the Keyboard 36
Entering Data with Autofill 37
Inserting New Data 40
Data Formats 41
Formulas and Functions 45
Inserting a Simple Formula 46
Inserting an Excel Function 47
Cell References 50
Range Names 51
Sorting Data 54
Querying Data 55
Using the AutoFilter 56
Using the Advanced Filter 59
Using Calculated Values 62
Importing Data from Text Files 63
Importing Data from Databases 68
Using Excel’s Database Query Wizard 68
Specifying Criteria and Sorting Data 71
Exercises 75
Chapter 3
WORKING WITH CHARTS 81
Introducing Excel Charts 82
Introducing Scatter Plots 86
Editing a Chart 91
Resizing and Moving an Embedded
Chart 91
Moving a Chart to a Chart Sheet 93
Working with Chart and Axis Titles 94
Editing the Chart Axes 97
Working with Gridlines and Legends 100
Editing Plot Symbols 102
Identifying Data Points 105
Selecting a Data Row 106
Labeling Data Points 107
Formatting Labels 109
Creating Bubble Plots 110
Breaking a Scatter Plot into
Categories 117
Plotting Several Variables 120
Exercises 123
Chapter 4
DESCRIBING YOUR DATA 128
Variables and Descriptive Statistics 129
Frequency Tables 131
Creating a Frequency Table 132
Using Bins in a Frequency Table 134
Defining Your Own Bin Values 136
Working with Histograms 138
Creating a Histogram 138
Shapes of Distributions 141
Breaking a Histogram into Categories 143
Working with Stem and Leaf Plots 146
Distribution Statistics 151
Percentiles and Quartiles 151
Measures of the Center: Means, Medians,
and the Mode 154
Measures of Variability 159
Measures of Shape: Skewness and
Kurtosis 162
Outliers 164
Working with Boxplots 165
Concept Tutorials: Boxplots 166
Exercises 175
Chapter 5
PROBABILITY DISTRIBUTIONS 182
Probability 183
Probability Distributions 184
Discrete Probability Distributions 185
Continuous Probability Distributions 186
Concept Tutorials: PDFs 187
Random Variables and Random Samples 189
Concept Tutorials: Random Samples 190
The Normal Distribution 193
Concept Tutorials:
The Normal Distribution 194
Excel Worksheet Functions 196
Using Excel to Generate Random
Normal Data 197
Charting Random Normal Data 199
The Normal Probability Plot 201
Parameters and Estimators 205
The Sampling Distribution 206
Concept Tutorials:
Sampling Distributions 211
The Standard Error 212
The Central Limit Theorem 212
Concept Tutorials:
The Central Limit Theorem 213
Exercises 218
Chapter 6
STATISTICAL INFERENCE 224
Confidence Intervals 225
z Test Statistic and z Values 225
Calculating the Confidence Interval
with Excel 228
Interpreting the Confidence Interval 229
Concept Tutorials:
The Confidence Interval 229
Hypothesis Testing 232
Types of Error 233
An Example of Hypothesis Testing 234
Acceptance and Rejection Regions 234
p Values 235
Concept Tutorials: Hypothesis Testing 236
Additional Thoughts about
Hypothesis Testing 239
The t Distribution 240
Concept Tutorials: The t Distribution 241
Working with the t Statistic 242
Constructing a t Confidence Interval 243
The Robustness of t 243
Applying the t Test to Paired Data 244
Applying a Nonparametric Test to
Paired Data 250
The Wilcoxon Signed Rank Test 250
The Sign Test 253
The Two-Sample t Test 255
Comparing the Pooled and Unpooled
Test Statistics 256
Working with the Two-Sample
t Statistic 256
Testing for Equality of Variance 258
Applying the t Test to Two-Sample Data 259
Applying a Nonparametric Test to
Two-Sample Data 265
Final Thoughts about Statistical Inference 267
Exercises 268
Chapter 7
TABLES 275
PivotTables 276
Removing Categories from a
PivotTable 280
Changing the Values Displayed
by the PivotTable 282
Displaying Categorical Data in a
Bar Chart 283
Displaying Categorical Data in a
Pie Chart 285
Two-Way Tables 288
Computing Expected Counts 291
The Pearson Chi-Square Statistic 293
>>>