Working With JMP
Introductory Guide
The real voyage of discovery consists not in seeking new
landscapes, but in having new eyes.
Marcel Proust
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2008. JMP 8 Introductory
Guide. Cary, NC: SAS Institute Inc.
JMP 8 Introductory Guide
Copyright 2008, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-59994-920-8
All rights reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the
prior written permission of the publisher, SAS Institute Inc.
For a Web download or e-book: Your use of this publication shall be governed by the terms established by
the vendor at the time you acquire this publication.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related
documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set
forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st printing, October 2008
SAS Publishing provides a complete selection of books and electronic products to help customers use SAS
software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hardcopy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
Contents
JMP Introductory Guide
1 Introducing JMP
Your First Look
What You Need to Know
Learning About JMP
Using Tutorials
Searching in the Help
Learning About Statistical and JSL Terms
Using the Context-Sensitive Help
Learning JMP Tips & Tricks
Using This Book in Combination with Other Included Books
Conventions Used in this Book
Step 1: Start JMP
Step 2: Open a JMP Data Table
Step 3: Learn About the Data Table
Specifying the Values Type
Data Table Cursor Forms
Selecting Rows and Columns
Step 4: Select an Analysis
Casting Columns Into Roles
Step 5: View the Output Report
Graphs and Charts
Statistical Tables and Text
Step 6: Save the JMP Output Report
A Practice Tutorial
Open a Data Table
Select an Analysis
Spin the Cowboy Hat
Summarizing Data
Look Closely at the Data
Look Before You Leap
Grouping Data
Creating Statistics for Groups
Charting Statistics from Grouped Data
Charting Statistics for Two Groups
Finding a Subgroup with Multiple Characteristics
Comparative Scatterplots
What Has Been Discovered?
Finding the Best Points
Chapter Summary
Looking at Distributions
Histograms, moments, quantiles, and proportions
Look Before You Leap
Displaying Distributions
Understanding Histograms of Nominal and Ordinal Variables
Understanding Histograms of Continuous Values
Learning About Report Tables
Reports for Continuous Variables
Frequency Table for Ordinal or Nominal Variables
Adding a Computed Column
Creating Subsets
Chapter Summary
A Factorial Analysis
Designed Modeling
Look Before You Leap
Open a Data Table
What Questions Can Be Answered?
The Fit Model Window
Graphical Display: Leverage Plots
Quantify Results
Analysis of Variance
Mean Estimates and Statistical Comparisons
Chapter Summary
Exploring Data
Finding Exceptions
Solubility Data
One-Dimensional Views
Two-Dimensional Views
Three-Dimensional Views
Principal Components and Biplots
Multivariate Distance
Chapter Summary
10 Multiple Regression
Examining Multiple Explanations
Aerobic Fitness Data
Fitting Plane
Fit Planes to Test Effects
Whole Model Tests
More and More Regressors
Interpreting Leverage Plots
Collinearity
Chapter Summary
Index
JMP Introductory Guide
Blazek, Michael Friendly, Joe Hockman, Frank Shen, J.H. Goodman, David Ikl, Barry Hembree, Dan
Obermiller, Jeff Sweeney, Lynn Vanatta, and Kris Ghosh.
Also, we thank Dick DeVeaux, Gray McQuarrie, Robert Stine, George Fraction, Avigdor Cahaner, Jos
Ramirez, Gudmunder Axelsson, Al Fulmer, Cary Tuckfield, Ron Thisted, Nancy McDermott, Veronica
Czitrom, Tom Johnson, Cy Wegman, Paul Dwyer, DaRon Huffaker, Kevin Norwood, Mike Thompson, Jack Reese, Francois Mainville, and John Wass.
We also thank the following individuals for expert advice in their statistical specialties: R. Hocking and
P. Spector for advice on effective hypotheses; Robert Mee for screening design generators; Greg Piepel,
Peter Goos, J. Stuart Hunter, Dennis Lin, Doug Montgomery, and Chris Nachtsheim for advice on
design of experiments; Jason Hsu for advice on multiple comparisons methods (not all of which we
were able to incorporate in JMP); Ralph O'Brien for advice on homogeneity of variance tests; Ralph
O'Brien and S. Paul Wright for advice on statistical power; Keith Muller for advice in multivariate
methods; Harry Martz, Wayne Nelson, Ramon Leon, Dave Trindade, Paul Tobias, and William Q.
Meeker for advice on reliability plots; Lijian Yang and J.S. Marron for bivariate smoothing design;
George Milliken and Yurii Bulavski for development of mixed models; Will Potts and Cathy
Maahs-Fladung for data mining; Clay Thompson for advice on contour plotting algorithms; and Tom
Little, Damon Stoddard, Blanton Godfrey, Tim Clapp, and Joe Ficalora for advice in the area of Six
Sigma; and Josef Schmee and Alan Bowman for advice on simulation and tolerance design.
For sample data, thanks to Patrice Strahle for Pareto examples, the Texas air control board for the pollution data, and David Coleman for the pollen (eureka) data.
Translations
Erin Vang coordinated localization. Noriki Inoue, Kyoko Takenaka, and Masakazu Okada of SAS
Japan were indispensable throughout the project. Special thanks to Professor Toshiro Haga (retired, Science University of Tokyo) and Professor Hirohiko Asano (Tokyo Metropolitan University) for reviewing
our Japanese translation. Special thanks to Dr. Fengshan Bai, Dr. Xuan Lu, and Dr. Jianguo Li, professors at Tsinghua University in Beijing, and their assistants Rui Guo, Shan Jiang, Zhicheng Wan, and
Qiang Zhao, for reviewing the Simplified Chinese translation. Finally, thanks to all the members of our
outstanding translation teams.
Past Support
Many people were important in the evolution of JMP. Special thanks to David DeLong, Mary Cole,
Kristin Nauta, Aaron Walker, Ike Walker, Eric Gjertsen, Dave Tilley, Ruth Lee, Annette Sanders, Tim
Christensen, Jeff Polzin, Eric Wasserman, Charles Soper, Wenjie Bao, and Junji Kishimoto. Thanks to
SAS Institute quality assurance by Jeanne Martin, Fouad Younan, and Frank Lassiter. Additional testing
for Versions 3 and 4 was done by Li Yang, Brenda Sun, Katrina Hauser, and Andrea Ritter.
Also thanks to Jenny Kendall, John Hansen, Eddie Routten, David Schlotzhauer, and James Mulherin.
Thanks to Steve Shack, Greg Weier, and Maura Stokes for testing JMP Version 1.
Thanks for support from Charles Shipp, Harold Gugel (d), Jim Winters, Matthew Lay, Tim Rey,
Rubin Gabriel, Brian Ruff, William Lisowski, David Morganstein, Tom Esposito, Susan West, Chris
Fehily, Dan Chilko, Jim Shook, Ken Bodner, Rick Blahunka, Dana C. Aultman, and William Fehlner.
Technology License Notices
The ImageMan DLL is used with permission of Data Techniques, Inc.
XRender is Copyright 2002 Keith Packard. KEITH PACKARD DISCLAIMS ALL WARRANTIES WITH
REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL KEITH PACKARD BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Credits
Chapter 1
Introducing JMP
Your First Look
JMP uses an extraordinary graphical interface to display and analyze data. JMP is software for interactive statistical graphics and includes:
a data table window for editing, entering, and manipulating data
a broad range of graphical and statistical methods for data analysis
an extensive design of experiments module
options to highlight and display subsets of data
a formula editor for each table column to compute values as needed
a facility for grouping data and computing summary statistics
special plots, charts, and communication capability for quality improvement techniques
tools for printing and for moving analysis results between applications
a scripting language for saving and creating frequently used routines
This introductory chapter gives basic information about using JMP.
Contents
What You Need to Know
Learning About JMP
Using Tutorials
Searching in the Help
Learning About Statistical and JSL Terms
Using the Context-Sensitive Help
Learning JMP Tips & Tricks
Using This Book in Combination with Other Included Books
Conventions Used in this Book
Step 1: Start JMP
Step 2: Open a JMP Data Table
Step 3: Learn About the Data Table
Specifying the Values Type
Data Table Cursor Forms
Selecting Rows and Columns
Step 4: Select an Analysis
Casting Columns Into Roles
Step 5: View the Output Report
Graphs and Charts
Statistical Tables and Text
Step 6: Save the JMP Output Report
A Practice Tutorial
Open a Data Table
Select an Analysis
Spin the Cowboy Hat
What You Need to Know
Using Tutorials
JMP provides three types of tutorials:
Beginner's Tutorial The Beginner's Tutorial steps you through the JMP interface and explains the
basics of how to use JMP. It is accessible through JMP's Tip of the Day window, which appears when
you start JMP. To start the tutorial from the Tip of the Day window, click Enter Beginner's
Tutorial. Or, start the tutorial by selecting Help (View on the Macintosh) > Tutorials > Beginner's
Tutorial.
Specific Analysis Tutorials Tutorials that step you through creating an analysis in JMP are found
under Help (View on the Macintosh) > Tutorials. These tutorials describe how to create a chart, compare
means, design an experiment, and more.
JMP Introductory Guide The JMP Introductory Guide is a collection of tutorials designed to help
you learn JMP strategies. If you did not receive a printed copy of this book, view the .pdf file by
selecting Help > Books > JMP Introductory Guide. By following along with these step-by-step
examples, you can quickly become familiar with JMP menus, options, and report windows.
Learning About JMP
On Windows and Linux, the Help > Contents, Help > Search, and Help > Index commands access
the JMP help system. The help system provides navigable online JMP documentation.
On the Macintosh, the Help > JMP Help command displays a list of JMP help items with search capabilities and a table of contents.
JSL Operators Index Presents a list of JSL operators, such as Sin, Cos, Sqrt, and Abbrev Date that
you would use when writing JSL. Highlight an operator name to see a description of the operator
appear in the window on the right. Click the Topic Help button to locate the topic in the online
help.
Object Scripting Index Presents a list of JSL objects. These are scriptable JSL building blocks.
Highlight an object name and messages the object recognizes appear in the window on the right.
DisplayBox Scripting Index Presents a list of the elements that make up a JMP report. These elements are the JSL building blocks with which you build output. Highlight a Display Box and available messages for each object appear in the window on the right.
Select the help tool from the Tools menu and click the place in a data table or report where
you need assistance (Figure 1.2). Context-sensitive help describes the items in the area you clicked.
Figure 1.2 Use the Help Tool for Context-Sensitive Help
In some reports, make a small circle with your cursor to reveal information about the item in the
area.
Figure 1.3 Making a Circle with the Cursor Displays Help
In some menus, hold the cursor on menu items to reveal information about the menu item.
Figure 1.4 Display a Description of Menu Items
Conventions Used in this Book
When you first start JMP, you see the Tip of the Day window. This window provides tips about using
JMP that you might not know.
To turn off the Tip of the Day, uncheck Show tips at startup. To view it again, select Help (View on
the Macintosh) > Tip of the Day.
Also use the JMP Quick Reference Guide to learn more advanced commands in JMP. View this document by selecting Help > Books > JMP Quick Reference Card.
Step 1: Start JMP
To start the online tutorial, click Enter Beginner's Tutorial. Or, click the Close button to close the
window and follow the tutorials in this book.
Selecting File > New (or clicking the New Data Table button on the JMP Starter window) creates
and displays a data table with an empty data grid. First, add rows and columns, then type in or paste
in new data. For details, see the JMP User Guide.
Selecting File > Open (or clicking the Open Data Table button on the JMP Starter window) presents a file selection window (Figure 1.6) with a list of existing tables. Select a file and click Open.
For details, see the JMP User Guide.
Step 3: Learn About the Data Table
The JMP data table window is a flexible way to prepare data. Using it, you can accomplish a variety of
table management tasks, such as:
Editing the value in any cell
Ordinal: values are ordered categories, which can have either numeric or character values.
Modeling types are changeable depending on how you want to look at your data.
For example, a variable like age should be specified continuous to find the mean
(average) age, but nominal or ordinal to find frequency counts for each age value.
The default modeling type is nominal for character values and continuous for
numeric values. To assign a different modeling type to your variables:
1 Click the icon next to the variable name.
2 Select the appropriate modeling type.
For details, see the JMP User Guide.
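The effect of a modeling type can be pictured outside JMP. The following is a minimal Python sketch, not JMP code: the function names `default_modeling_type` and `summarize` and the sample values are hypothetical, chosen only to mirror the rule described above (continuous by default for numeric values, nominal for character values; a mean for continuous data, frequency counts for nominal or ordinal data).

```python
from collections import Counter
from statistics import mean


def default_modeling_type(values):
    """Return the default described in the text: continuous for numeric
    values, nominal for character values (hypothetical helper)."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "continuous" if numeric else "nominal"


def summarize(values, modeling_type):
    """Continuous columns yield a mean; nominal and ordinal columns yield
    frequency counts (ordinal counts are listed in value order)."""
    if modeling_type == "continuous":
        return mean(values)
    counts = Counter(values)
    if modeling_type == "ordinal":
        return dict(sorted(counts.items()))
    return dict(counts)


# Hypothetical sample data, not taken from any JMP table.
ages = [12, 13, 12, 14, 13, 12]
sexes = ["F", "M", "F", "F", "M", "M"]

mean_age = summarize(ages, default_modeling_type(ages))  # average age
age_counts = summarize(ages, "ordinal")                  # counts per age value
sex_counts = summarize(sexes, default_modeling_type(sexes))
```

The same `ages` column gives a mean when treated as continuous and a frequency table when treated as ordinal, which is the reason the text describes modeling types as changeable.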
Click to select column or double-click to edit column name
Click to deselect rows
I-Beam Cursor
The cursor is an I-beam when it is over text in the data grid or highlighted column names in the data
grid or column panels. To edit text in the data grid:
Hand Cursor
The cursor changes to a hand when you move the mouse over a red triangle icon or a diamond-shaped disclosure button (the button looks different on Windows and on the Macintosh).
Click the red triangle to reveal the menu and select a menu item. Click the disclosure button to open or
close a panel.
Instructions
Highlight a row
Highlight a column: Click the background area above the column name. Or, click the column name in the columns panel to the left of the data grid.
Extend a selection of rows or columns: Shift-click the first and last rows or columns of the desired range.
Make a discontiguous selection
Step 4: Select an Analysis
There are a variety of analyses available through the Analyze and Graph menus in the main menu. An
alternate way to access these analyses is through toolbar buttons and selections in the JMP Starter window. Selecting an analysis in the Analyze or Graph menus produces graphs, charts, plots, and/or tables.
For example, to see a histogram of columns in the data table you have open, select
Analyze > Distribution. Then, complete the window and click OK.
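As a rough picture of what a histogram summarizes, the sketch below counts values in equal-width bins. It is an illustration in Python under stated assumptions, not a description of how JMP chooses its bins; the function name `histogram_counts` and the sample values are hypothetical.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals
    spanning the range of the data (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values match
    counts = [0] * bins
    for v in values:
        # The maximum value lands on the upper edge, so clamp it to the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical numeric column.
readings = [150, 151, 153, 158, 160, 163, 165, 168]
counts = histogram_counts(readings, bins=3)
```

Each count corresponds to the height of one histogram bar.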
The JMP analysis methods are like stages or platforms for variables to dramatize their values. Each analysis requires information about which variables play what roles in an analysis.
The most typical variable roles are:
Y, Response Identifies a column as a response or dependent variable whose distribution is to be studied.
X, Factor Identifies a column as an independent, classification, or explanatory variable whose values divide the rows into sample groups.
Weight Identifies a numeric column whose values supply weights for each response.
Freq Identifies a numeric column whose values assign a frequency to each row for the analysis.
By Identifies a column that is used to create a report consisting of separate analyses.
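To make the roles concrete, here is a minimal Python sketch of how they could shape a simple analysis: the mean of a Y column within each level of an X column, computed separately for each By level. The function `group_means`, the column names, and the rows are all hypothetical, and treating Freq and Weight as plain multipliers is a simplifying assumption that holds for a mean but not for every statistic.

```python
from collections import defaultdict


def group_means(rows, y, x, freq=None, weight=None, by=None):
    """Mean of column `y` for each level of `x`, reported separately for
    each level of `by`. Freq and Weight both act as multipliers here,
    which is a simplification made for this sketch."""
    sums = defaultdict(lambda: [0.0, 0.0])  # (by level, x level) -> [sum w*y, sum w]
    for row in rows:
        w = (row[freq] if freq else 1) * (row[weight] if weight else 1)
        key = (row[by] if by else None, row[x])
        sums[key][0] += w * row[y]
        sums[key][1] += w
    return {key: total / wsum for key, (total, wsum) in sums.items()}


# Hypothetical rows: "bp" plays Y, "dose" plays X, "n" plays Freq, "site" plays By.
rows = [
    {"site": "A", "dose": "low", "bp": 160, "n": 2},
    {"site": "A", "dose": "low", "bp": 150, "n": 1},
    {"site": "A", "dose": "high", "bp": 140, "n": 1},
    {"site": "B", "dose": "low", "bp": 155, "n": 1},
]
means = group_means(rows, y="bp", x="dose", freq="n", by="site")
```

The result is keyed by (By level, X level), so each By level can be read as its own small report.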
Step 5: View the Output Report
Histogram
Outlier Box Plot
Display Options
To enhance the default graphical displays that show your results, JMP provides options that you can
add to them. These options are found by clicking the red triangle icon beside a report name. For example, the red triangle icon next to the histogram name lists available report options (Figure 1.11). For
practice, try selecting different combinations of these options and watch the effect they have on the displays and reports.
Select a column in Select Columns.
Select Y, Columns in Cast Selected Columns into Roles.
Select variables for Weight, Freq, and By.
Click OK.
Step 6: Save the JMP Output Report
JMP also gives you the ability to change the appearance of these tables. For details, see the JMP User
Guide.
Journal window
A Practice Tutorial
Before you begin the tutorials in the following chapters of this book, complete this brief practice tutorial that is a short guided tour through a JMP analysis. Follow the steps to see a three-dimensional scatterplot.
This data table has three numeric columns and two row state columns. Columns x and y are x- and
y-coordinates, and z is created using the function
$z = \sin\sqrt{x^2 + y^2}$
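For readers who want to see where such z values come from, the following Python sketch evaluates the function on a grid of x and y coordinates. It assumes the Cowboy Hat function is z = sin(sqrt(x² + y²)); the grid range, the step count, and the names `cowboy_hat` and `make_grid` are hypothetical and are not read from the JMP sample table.

```python
import math


def cowboy_hat(x, y):
    """z = sin(sqrt(x^2 + y^2)), the function assumed for the z column."""
    return math.sin(math.sqrt(x * x + y * y))


def make_grid(lo=-5.0, hi=5.0, steps=21):
    """Return (x, y, z) triples on an evenly spaced square grid.
    The range and step count are illustrative assumptions."""
    step = (hi - lo) / (steps - 1)
    coords = [lo + i * step for i in range(steps)]
    return [(x, y, cowboy_hat(x, y)) for x in coords for y in coords]


points = make_grid()
```

Because z depends only on the distance from the origin, the surface is symmetric about the z axis, which gives the plot its hat-like shape.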
Select an Analysis
To plot the three columns of information from the Cowboy Hat data table:
Choose the Scatterplot 3D command from the Graph menu.
Select the x, y, and z columns from the column selector list on the left side of the window, and click
Y, Columns, as shown in Figure 1.14.
Figure 1.14 Scatterplot 3D Column Selection Window
These column names now appear in the list on the right side of the window.
Click OK.
The scatterplot 3-D appears. Initially, the data points look like a two-dimensional plot because the z
dimension is projected onto the x-y plane.
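One way to picture why the initial view looks flat is an orthographic projection that simply drops the depth coordinate. The Python sketch below is an assumption made for illustration, not a description of how JMP renders the plot; `rotate_about_x` and `project_to_screen` are hypothetical names.

```python
import math


def rotate_about_x(point, angle):
    """Rotate a 3-D point about the x axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x, y * c - z * s, y * s + z * c)


def project_to_screen(point):
    """Orthographic projection onto the x-y plane: the z value is discarded."""
    x, y, _ = point
    return (x, y)


p = (1.0, 2.0, 0.5)
q = (1.0, 2.0, -0.9)  # same x and y as p, different z

flat_p = project_to_screen(p)
flat_q = project_to_screen(q)  # identical to flat_p: z is invisible
tilted_p = project_to_screen(rotate_about_x(p, math.pi / 2))
```

Before any rotation, points that differ only in z land on the same screen position. After the plot is rotated, the z value contributes to the screen position and the third dimension becomes visible.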
Press the Shift key, and give the plot a push with the cursor.
To stop the scatterplot 3-D, click again in the scatterplot 3-D frame.
Chapter 2
Creating a JMP Data Table
Entering and Plotting Data
This lesson evaluates a new drug developed to lower blood pressure. Data were recorded over a
six-month period for the following treatment groups:
300 mg dose
450 mg dose
placebo
control
The following are the mean monthly blood pressures for each group, recorded in a journal. This lesson
shows how to enter data values into the data table and to create a single neat and informative line chart
that shows the study results.
Objectives
Create rows and columns in a data table, one at a time and in groups.
Enter data into JMP.
Create a chart using the Chart command.
Rescale axes in a plot.
Animate a plot.
Month    Control  Placebo  300mg  450mg
March    165      163      166    168
April    162      159      165    163
May      164      158      161    153
June     162      161      158    151
July     166      158      160    148
August   163      158      157    150
Contents
Starting a JMP Session
Creating Rows and Columns in a JMP Data Table
Add Columns
Set Column Characteristics
Add Rows
Entering Data
Plotting Data
Document the Report
Chapter Summary
The data values for this project are blood pressure statistics collected over six months and recorded in a
notebook page as shown in Figure 2.1.
Figure 2.1 Notebook of Raw Study Data
Month    Control  Placebo  300mg  450mg
March    165      163      166    168
April    162      159      165    163
May      164      158      161    153
June     162      161      158    151
July     166      158      160    148
August   163      158      157    150
Add Columns
First create the number of rows and columns that are needed.
Select Cols > Add Multiple Columns, which prompts for the number of columns to add, where to
add them, and which type of columns to add.
Columns can have different characteristics. By default, they contain numeric data. However, the values
for month in this example are character values. To change the Month column from numeric to
character:
Highlight the column by clicking either in the area at the top of the column or the area beside its
name in the columns panel.
Select Cols > Column Info to display the window in Figure 2.2.
Change Month to a character variable, as shown in Figure 2.2, by clicking the box beside Data Type.
The Column Info window is also used to change other column characteristics and to access the JMP
formula editor for computing column values.
Figure 2.2 Change Data Type
Add Rows
Now add new rows to the table:
Choose Rows > Add Rows.
Specify six new rows.
Alternatively, double-click anywhere in the body of the table to automatically fill it with new rows up
through the position of the cursor.
Select File > Save to name the table BP Study.jmp and save it.
The data table is now ready to hold data values. To summarize the table evolution so far, you:
Began with a new untitled table.
Added enough rows and columns to accommodate the raw data.
Tailored the characteristics of the table by giving the table and columns descriptive names.
Changed the data type of the Month column to accept character values.
Entering Data
To enter data into the data table, type values into their appropriate table cells.
Type the values from the study journal (Figure 2.1) into the BP Study.jmp table as shown here.
Plotting Data
When working with the Analyze and Graph menu commands, you tell JMP which columns to work
with and what to do with them. This section shows how to plot the months across the horizontal (x)
axis and the columns of blood pressure statistics for each treatment group overlaid on the vertical (y)
axis.
Select Graph > Chart.
The window in Figure 2.3 appears.
Assign x and y roles and choose the type of chart. This example specification is for a bar chart, with data
(as opposed to statistics) as chart points.
Ensure that the default choice Vertical is selected from the chart type drop-down list.
Select one continuous variable in the list.
Hold down the Shift key and press the down arrow key to select the other continuous variables.
Click OK. The properties icon now appears next to the column name in the data table's columns
panel, indicating that the column contains a property.
In the analysis report, click the red triangle and select Script > Redo Analysis.
Rescale the Plot Axis
By default, y-axis scaling begins at zero and the overlay chart looks like the one shown here. But, to
present easy-to-read information, the y-axis needs to be rescaled and the chart needs labels.
Double-click the y-axis area, which accesses the Axis Specification window (Figure 2.5).
This window gives you the ability to:
Set the minimum and maximum of the axis scale.
Specify the tick mark increment.
Request minor tick marks.
Request grid lines at major or minor tick marks.
Format numeric axes.
Use either a linear or log-based scale.
In this example, the plotted values range from about 145 to 175.
Enter these figures into the Axis Specification window for Minimum and Maximum.
Change the increment for the tick marks from 50 to 1 by entering a 1 in the Increment box.
Click OK.
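The axis limits chosen above can be sanity-checked outside JMP. The following Python sketch is not part of JMP; it holds the Figure 2.1 readings in a plain dictionary (the variable names and layout are illustrative assumptions) and confirms that every plotted value falls inside the 145 to 175 range.

```python
# Blood pressure readings as read from Figure 2.1, March through August.
# The dictionary layout and names are illustrative, not a JMP structure.
bp_study = {
    "Control": [165, 162, 164, 162, 166, 163],
    "Placebo": [163, 159, 158, 161, 158, 158],
    "300mg": [166, 165, 161, 158, 160, 157],
    "450mg": [168, 163, 153, 151, 148, 150],
}

# Flatten all columns to find the overall range of plotted values.
all_values = [v for column in bp_study.values() for v in column]
lowest, highest = min(all_values), max(all_values)

# Axis limits entered in the Axis Specification window in this lesson.
axis_min, axis_max = 145, 175
fits = axis_min <= lowest and highest <= axis_max

print(lowest, highest, fits)
```

Because the smallest and largest readings sit a few units inside the limits, the rescaled axis shows all points without wasting space on the empty region below 145.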
Tip: The magnifier tool, found in the Tools menu and the cursor tool bar, can also be used to
change the scale of graphs. Drag the magnifier diagonally across the points of interest to see the chart
automatically adjust. Double-click the plot frame to reset the plot to its original scale.
Click the edge of the graph and drag it to the right to increase its width.
Change the name of the axis from Y to Blood Pressure:
Place cursor over Analysis Report until cursor becomes an I-bar.
Click for a text box and enter Blood Pressure.
Figure 2.6 Bar Chart with Modified Y-Axis, Titles, and Footnotes
Chapter Summary
A study was done to evaluate the effect of a new drug on blood pressure. To complete this analysis, you:
Used the New Data Table command in the File menu to create a new JMP table.
Created the appropriate number of rows and columns for the data.
Typed the data into the empty data grid.
Used the Chart command in the Graph menu to request a bar chart of blood pressure measures over
time.
Ordered the values in chronological order so they would appear properly in the chart.
Tailored the chart with a specific axis scale and axis name, and added a plot title and footnote with
the annotate tool.
Chapter 3
Summarizing Data
Look Closely at the Data
The hot dog is a questionable item on a school cafeteria menu because of its reputation as an unhealthy
food, possibly classified in the junk food category. Many students feel this is unpatriotic and are upset.
This lesson examines the hot dog as a menu item, but not before looking into the multitude of brands
available. The data show information about cost, nutritional ingredients of concern, and taste preference for 54 hot dog brands. This information is sufficient to provide a summary of hot dog statistics
and to identify the brands that are:
most nutritious
least costly
best tasting
The taste, cost, and nutritional variables used in this chapter are an enhancement of data from Moore,
D. S., and McCabe, G. P. (1989), Introduction to the Practice of Statistics, and Consumer Reports (1986).
The brand names were changed to fictional names, and the taste preference labels correspond to a taste
preference scale.
Objectives
Find and mark subgroups of data
Produce scatterplots using the Fit Y by X command and use them as discovery tools
Label individual points in plots
Produce and plot summary statistics
Contents
Look Before You Leap
Grouping Data
Creating Statistics for Groups
Charting Statistics from Grouped Data
Charting Statistics for Two Groups
Finding a Subgroup with Multiple Characteristics
Comparative Scatterplots
What Has Been Discovered?
Finding the Best Points
Chapter Summary
Look Before You Leap
When you installed JMP, a folder named Sample Data was also installed. In that folder is a file named
Hot Dogs.jmp. Open the Hot Dogs.jmp file to see the data shown in Figure 3.1.
Figure 3.1 Hot Dogs Data Table
Examine the resulting report to see the distributions and levels of each variable.
Grouping Data
Of course, health is a primary concern of a school cafeteria. It is interesting to see if type of hot dog
plays a role in healthfulness. In particular:
Which type of hot dog has the fewest calories?
Is the amount of sodium different in the three types of hot dogs?
Which hot dogs have the highest protein content?
Which hot dogs taste good and are healthy?
To address these issues, the data need to be grouped into hot dog type and taste
preference categories with summary statistics computed for each group. The
Summary command in the Tables menu groups data and computes summary
statistics.
The Summary command creates a summary table. This table summarizes columns from the active data table, called its source table. The Hot Dogs.jmp table
is the source table in this example. A summary table has a single row for each
level (value) of a specified variable.
Select Tables > Summary.
Select Type and click the Group button to see the window as shown in Figure 3.3.
Click OK.
Figure 3.3 Summary Window
A summary table is not independent of its source table. It has these characteristics:
When rows are highlighted in the summary table, their corresponding rows highlight in the source
table.
The summary table is not saved when closed. Select File > Save As to specify a name and location
for the table.
Figure 3.4 Summary Table for Type of Hot Dog
The Hot Dogs By (Type) summary table (Figure 3.4) appears in a new window. The Type column lists
hot dog type, and the N Rows column gives the frequency of each type in the source table.
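The idea of one summary row per level, with a row count, can be sketched in a few lines of Python. This is only an analogy for what the Summary command produces; the type values below are hypothetical and are not read from Hot Dogs.jmp.

```python
from collections import Counter

# Hypothetical values standing in for the Type column of a source table.
hot_dog_types = ["Beef", "Meat", "Poultry", "Beef", "Poultry", "Beef", "Meat"]

# One entry per level, holding the number of source rows at that level,
# analogous to the row-count column of a summary table.
n_rows = Counter(hot_dog_types)

for level in sorted(n_rows):
    print(level, n_rows[level])
```

The counts always add up to the number of rows in the source data, which is a quick check that no row was dropped during grouping.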
The new columns of statistics are displayed in the Hot Dogs By (Type) table (top table of Figure 3.6).
Repeat the previous steps to create a second summary table of Hot Dogs by Taste to look at health
factors and hot dog tastiness. The Hot Dogs By (Taste) summary table shows average calories,
sodium, and protein-to-fat ratio for each taste category (bottom table of Figure 3.6).
Figure 3.6 Summary Statistics for Hot Dog Groups
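As a rough illustration of what a column of group statistics contains, the Python sketch below computes a mean per group. The calorie figures are invented for the example and are not the values in Hot Dogs.jmp; the names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (type, calories) pairs; not values from Hot Dogs.jmp.
rows = [
    ("Beef", 180), ("Beef", 150),
    ("Meat", 170), ("Meat", 140),
    ("Poultry", 100), ("Poultry", 120),
]

# Collect the calorie values that belong to each hot dog type.
calories_by_type = defaultdict(list)
for hot_dog_type, calories in rows:
    calories_by_type[hot_dog_type].append(calories)

# One mean per group, like a Mean(Calories) column in a summary table.
mean_calories = {t: mean(vals) for t, vals in calories_by_type.items()}

print(mean_calories)
```

The same pattern extends to any other statistic by replacing `mean` with a different function of the grouped values.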
Select Graph > Chart, assign both grouping variables as Categories (X), and give the N Rows
column the y role.
Select Statistics > Data.
Click OK.
This produces the chart shown here. In this
example, there are side-by-side charts that
show the frequency for each taste within each
type of hot dog.
Note: Graph > Chart can also be used to
directly chart data grouped by two variables;
the data do not have to be grouped first by
Tables > Summary.
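Counting rows for each combination of two grouping variables is the computation behind such a chart. A minimal Python sketch, using made-up type and taste pairs rather than the real table, might look like this:

```python
from collections import Counter

# Hypothetical (type, taste) pairs, one per brand; not the real data.
brands = [
    ("Beef", "Bland"), ("Beef", "Medium"), ("Beef", "Medium"),
    ("Poultry", "Medium"), ("Poultry", "Scrumptious"), ("Meat", "Medium"),
]

# Frequency of each (type, taste) combination. Combinations that never
# occur are simply absent and read back as zero.
counts = Counter(brands)

for (hot_dog_type, taste), n in sorted(counts.items()):
    print(hot_dog_type, taste, n)
```

Each printed line corresponds to one bar in a side-by-side chart of taste within type.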
To label each bar with the frequency it represents:
Label > Label by Value is selected by
default. Right-click the bars and select
Label > Show Labels.
The chart shows that the poultry hot dogs
excelled in nutrition factors and that most
people find them medium-tasting. However,
because the sodium content appears slightly
high in some poultry brands, more investigation is needed.
Finding a Subgroup with Multiple Characteristics
Continue the search for the ideal hot dog. Add special markers to the summary table, Hot Dogs by
(Type, Taste), that identify each type of hot dog.
In the Hot Dogs by (Type, Taste) summary table:
Shift-click or click and drag over the medium and scrumptious beef rows (2 and 3) to select them.
Use the Markers command in the Rows menu to assign them the Z marker.
Deselect those rows.
Shift-click or drag the medium and scrumptious meat rows (5 and 6) to select them, assign them the
Y marker, and deselect them.
Shift-click or drag the medium and scrumptious poultry rows (8 and 9), assign them the X marker,
and deselect them.
The type-taste summary table now looks like the one shown here, and the corresponding rows in the
Hot Dogs.jmp table are marked likewise.
Comparative Scatterplots
Now, examine the relevant variables with scatterplots to identify specific points (brands). The Fit Y by
X command in the Analyze menu produces scatterplots when both the x and y are continuous numeric
variables.
The following scatterplots graphically show the relationship of cost and the nutritional factors together.
Click the Hot Dogs.jmp source table to make it active.
Select Analyze > Fit Y by X.
Make your selections in the window, giving $/lb Protein the y role and both $/oz and Protein/Fat
the x role.
Click OK.
This produces the $/lb Protein by $/oz and $/lb Protein by Protein/Fat scatterplots.
Click the red triangle icon and select Group By.
Choose Type as the grouping variable from the list of variables in the Grouping window.
Press the Alt key (Alt-Shift on Linux and Option on Macintosh) and drag the brush in the lower left
quadrant of the Calories by Sodium scatterplot, as shown in Figure 3.9.
These points represent brands with both low sodium and low calories. The highlighted points of these
healthiest brands also highlight in the other scatterplots.
Figure 3.9 Select Low Sodium and Low Calorie Brands
The costs of meat and beef brands range from low to high. However, it is not surprising to see the tight
low-cost cluster of poultry brands (X-marked) at the lower left of the $/lb Protein by $/oz scatterplot.
The highlighted points include poultry brands, one meat brand, and one beef brand. The selected beef
point (Z-marked) is in the upper-right corner of the plot, which places it in the most expensive category. The single meat point (Y-marked) is more costly than the poultry brands but less than the beef
brands.
A bigger surprise appears in the $/lb Protein by Protein/Fat scatterplot. As the protein-to-fat ratio
increases, the cost per pound of protein stays about the same. Further, the poultry brands not only cost
the least but also contain the most protein. Most of the selected points are in the three highest protein
categories.
The density ellipses on the Calories by Sodium scatterplot show clearly that the poultry brands have
about the same range of sodium content as the meat and beef brands, but many poultry brands have
fewer calories.
As a final step, use Analyze > Fit Y by X to look again at the two scatterplots that compare costs.
Select Analyze > Fit Y by X.
Assign $/lb Protein as Y.
Assign both $/oz and Protein/Fat as X.
Click OK.
The plot in Figure 3.11 shows that the Estate Chicken brand is the most economical of the three
labeled brands (showing $/oz as continuous). The plot to the right indicates that the Calorie-less Turkey brand is in the group with the highest proportion of protein (showing Protein/Fat as nominal).
Figure 3.11 Winning Hot Dog Brands.
Chapter Summary
This lesson examined different hot dog brands for a cafeteria menu. A JMP table has data for 54 brands
of hot dog showing type of hot dog, taste preference, nutritional factors, and cost factors.
To find the ideal hot dog, we did the following:
Created a summary table that groups the data by hot dog type and by taste preference within each
hot dog type.
Used Graph > Chart to chart summary statistics and identify the subset of hot dog brands that are
both the most nutritious and the best tasting.
Assigned different markers to each type of hot dog.
Used Analyze > Fit Y by X to see scatterplots that compare cost factors and nutritional factors.
Selected the points representing the lowest-cost, most nutritious brands, and used the Label/Unlabel command in the Rows menu to identify the Calorie-less Turkey brand as a possible cafeteria hot dog.
See the JMP User Guide for details about the Summary command. For scatterplot and bar chart examples, see the JMP Statistics and Graphics Guide.
Chapter 4
Looking at Distributions
Histograms, moments, quantiles, and proportions
The students in a local school are participating in a health study. This lesson summarizes basic information about the students for the school system's health care specialists. The data collected include age,
sex, weight, and height.
To document the sample of participating students and identify any students with unusual characteristics who may need special attention, we will need to view summaries of the data. This lesson produces
reports with graphs and short, straightforward explanations.
Objectives
Use the distribution analysis to explore several variables at once.
Produce reports of moments, quantiles, frequencies, and proportions.
Use the formula editor to compute a column's value.
Create a subset of a data table.
Contents
Look Before You Leap
Displaying Distributions
Understanding Histograms of Nominal and Ordinal Variables
Understanding Histograms of Continuous Values
Learning About Report Tables
Reports for Continuous Variables
Frequency Table for Ordinal or Nominal Variables
Adding a Computed Column
Creating Subsets
Chapter Summary
Look Before You Leap
The first step in this analysis is to become familiar with the data in the Big Class.jmp file. Looking at
the information in the JMP data table helps us decide which summary charts and tables to use in the
health report.
Open the Big Class.jmp data table to see the data table shown in Figure 4.1.
Figure 4.1 Big Class.jmp Data Table
The file contains the name, age, sex, height, and weight for each student participating in the health
study. The data table is in order by age, and sex is ordered within each age group.
Even though there are only five columns of information, these variables address the following questions:
How many boys and how many girls are there?
How old are they?
What is the average height and weight of the students?
Are there any students drastically younger or older than the average age?
Are there any students whose height or weight might signal the need for special attention?
Displaying Distributions
To summarize the data:
Select the Distribution command from the Analyze menu.
In the window that appears, select the age and sex columns as Y, Columns.
Click OK.
The frequencies table that appears shows that the class of 40 contains 18 girls and 22 boys.
In histograms of ordinal and nominal variables, you can display a mosaic plot by clicking
the red triangle icon in the variable's title bar and selecting Mosaic Plot. A mosaic plot
(shown in the figure to the right) visualizes the proportion of each ordinal or nominal
level within the sample. It has a section for each level of the variable, where the size of the
section is proportional to the corresponding group's size. Think of a mosaic plot as a bar
chart with its bars stacked end to end.
Both height and weight appear to have approximately normal (bell-shaped) distributions, but notice
the extremely high weight value. It will be examined more closely later.
It is important to present data in the best possible form. Sometimes it is worthwhile to experiment with
the shape of a histogram by changing the number of bars or altering their arrangement on the axis.
To adjust the histogram bars:
Select the hand from the graph cursor tool bar.
Position the hand on the bars and press the mouse button to grab the plot.
Move the hand to the left to increase the bar width and combine intervals (see Graphs and Charts,
p. 14). The number of bars decreases as the bar size increases.
Move the hand to the right to decrease the bar width, showing more bars.
Move the hand up or down to change the boundaries of the bins. The height of each bar adjusts
according to the new number of observations within each bin.
Using Outlier Box Plots
Available by default in histograms with continuous variables, the outlier box plot (see Figure 4.4) is a
schematic that shows the sample distribution and allows identification of points with extreme values,
sometimes called outliers. You can display and hide an outlier box plot by clicking the red triangle icon
in the variable's title bar and selecting Outlier Box Plot.
The ends of the box are the 25th and 75th quantiles, also called the quartiles. The difference between
the quartiles is the interquartile range. The line across the middle of the box identifies the median sample value.
The lines extending from each end of the box are sometimes called whiskers. The whiskers extend from
the ends of the box to the outermost data points that fall within the distance computed as quartile
± 1.5*(interquartile range). Points beyond the whiskers indicate extreme values that are possible
outliers. To label a point, click the point to highlight it, and then select Rows > Label/Unlabel.
The red bracket along the edge of the box identifies the shortest half, which is the most dense 50% of
the observations.
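The whisker rule described above, 1.5 times the interquartile range beyond each quartile, can be worked through numerically. The Python sketch below uses invented weights and the standard library's linear-interpolation quantiles; JMP may use a different quantile definition, so treat the exact quartile values as an assumption of this example.

```python
from statistics import quantiles

# Hypothetical weights; not values from any JMP sample table.
weights = [64, 79, 84, 91, 95, 99, 105, 112, 128, 172]

# Quartiles by linear interpolation over the sorted data.
q1, median, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

# Limits for the whiskers: 1.5 * IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the outermost points that are still inside the fences.
inside = [w for w in weights if lower_fence <= w <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Points beyond the fences are the possible outliers.
possible_outliers = [w for w in weights if w < lower_fence or w > upper_fence]

print(q1, q3, iqr, whisker_low, whisker_high, possible_outliers)
```

Note that the whiskers stop at actual data values, not at the fences themselves, which is why the upper whisker here ends well short of the upper limit.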
Figure 4.4 Outlier Box Plot
Callouts: 25th percentile, 75th percentile, interquartile range, shortest half, possible outliers
Learning About Report Tables
Looking at the quantile box plot and means rectangle together helps see if data are distributed normally,
as shown in Figure 4.5. If data are distributed normally (bell shaped), then the 50th quantile and the
mean are the same and other quantiles show symmetrically above and below them.
Adding a Computed Column
Construct the formula that calculates values for the ratio column as follows:
Highlight the empty term in the formula and select weight from the list of column names in the
upper-left corner of the formula editor.
Press the divide key on the formula editor keypad.
With the empty denominator term highlighted, select height from the list of column names.
When the formula is complete, click Apply or OK on the formula editor, or just close its window.
The new column called ratio is now in the Big Class data table as shown here. Its values are the computed weight-to-height ratio for each student.
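The formula built in the steps above is simply weight divided by height, evaluated once per row. A small Python sketch of the same row-wise calculation follows; the students and their measurements are invented for illustration and do not come from Big Class.jmp.

```python
# Hypothetical students; names and measurements are invented.
students = [
    {"name": "A", "height": 60, "weight": 90},
    {"name": "B", "height": 64, "weight": 160},
    {"name": "C", "height": 58, "weight": 80},
    {"name": "D", "height": 66, "weight": 132},
]

# Equivalent of a formula column: weight / height, computed for every row.
for row in students:
    row["ratio"] = row["weight"] / row["height"]

for row in students:
    print(row["name"], round(row["ratio"], 2))
```

As with a formula column, the ratio depends only on values in the same row, so adding or editing a row requires recomputing only that row.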
The highlighted bars in the histogram represent a ratio either greater than or equal to 2.25 or less than
1.5. The corresponding points automatically highlight in the data table and in all other reports generated from the Big Class data table.
Creating Subsets
Looking in the Big Class data table allows examination of the selected rows, but scrolling through a
large data grid can be tedious. For the final report to the health researchers, include a separate list containing only the highlighted students, those with extreme values. To do this, use Tables menu commands to create new data tables or modify existing tables.
Select Tables > Subset or click the Big Class red triangle icon and select Tables > Subset.
Click OK to accept the default choices presented in the window.
This creates a new data table that has only the selected rows and columns from the active data table.
The new data table, shown in Figure 4.8, contains only the students that have extreme weight-to-height
ratios. By default, the table is named Subset of Big Class. Change the name by clicking the existing
name (Subset of Big Class) in the panel located on the top left side of the window. The table can be
saved, exported for use in another application, or printed.
Figure 4.8 Data Table Containing a Selected Subset
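The selection rule used in this lesson (a ratio of at least 2.25, or below 1.5) and the resulting subset can be expressed as a simple filter. The sketch below is a Python analogy with hypothetical rows, not a description of how JMP implements Tables > Subset.

```python
# Hypothetical rows with a precomputed weight-to-height ratio.
big_class = [
    {"name": "A", "ratio": 1.50},
    {"name": "B", "ratio": 2.50},
    {"name": "C", "ratio": 1.38},
    {"name": "D", "ratio": 2.00},
    {"name": "E", "ratio": 2.25},
]


def is_extreme(ratio):
    """Return True for ratios of at least 2.25 or below 1.5."""
    return ratio >= 2.25 or ratio < 1.5


# Copy the selected rows into a new list so the source rows stay untouched,
# in the same spirit as creating a new table from the selected rows.
subset = [dict(row) for row in big_class if is_extreme(row["ratio"])]

print([row["name"] for row in subset])
```

The boundary cases matter: a ratio of exactly 2.25 is selected, while a ratio of exactly 1.5 is not.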
Chapter Summary
In this chapter, the demographic and vital data of students participating in a health study were summarized. The profile was completed using the Distribution command and the data management features
of the JMP data table.
The Distribution command displayed histograms and box plots or stacked (mosaic) bar charts for each
variable assigned the role of response variable (y). Using display and text report options to look more
closely at the data, the following actions were completed:
Adjusted the number of bars and the scale of the histograms
Produced supporting statistical reports showing moments and quantiles of numeric variables and
frequencies and proportions of nominal and ordinal variables
Created a new column in the data table computed as a function of existing columns
Highlighted histogram bars to identify a subset of rows in the data table
Created a new data table from a subset of highlighted rows
Graphs and text reports can be printed directly from JMP. Graphs and reports can be copied to a JMP
journal or into other applications to complete a report for the school system health care specialists.
See the chapter Univariate Analysis in the JMP Statistics and Graphics Guide for more information
about distributions.
Chapter 5
Comparing Group Means
Testing Differences
The company has decided to replace all computer keyboards with the brand that produces the fastest
accurate typing. The employees participated in a study to help decide what kind of keyboards to buy.
The company selected three different brands of keyboards to test. These keyboards were randomly
assigned to three groups of employees with comparable typing skills. The employees completed typing
tests and recorded their words-per-minute scores.
This lesson finds out if the typing scores are significantly better on any one brand of keyboard than on
the others.
Objectives
Use the Fit Y by X command to produce plots and analyses appropriate for a one-way analysis of
variance.
Use JMP's interactive capabilities to examine differences among groups.
Produce text reports to display differences among groups.
Contents
Look Before You Leap
Graphical Display of Grouped Data
Choose Variable Roles
Show Points
Fit Means Option
Fit Quantiles
Comparison Circles
Quantify Results
Analysis of Variance
Mean Estimates and Statistical Comparisons
Chapter Summary
The first step is to become familiar with the data. The typing test scores are in a JMP file so that they
can be reviewed and the kind of analysis determined.
When you installed JMP, a folder named Sample Data was also installed. In that folder is a file
named Typing Data.jmp. Open the file Typing Data.jmp.
The Typing Data table appears in the form of a data grid, as shown here.
The data table has columns named brand and speed. The modeling type for each column shows to the
left of each column name in the columns panel. The character variable brand has nominal values
and the numeric variable speed has continuous values.
There are 17 rows that represent typing scores for 17 employees. However, the number of participants
in the groups differs because some of the scheduled participants did not show up for the study. Perhaps
other statistics for the groups differ also. In particular:
Is the mean (average) typing speed the same for each brand?
Does any one of the three brands of keyboard stand out from the others?
Does it make a difference as to which brand the employees use?
Selecting Fit Y by X and completing the window produces a statistical analysis appropriate for the variable roles (x and y) and the modeling type (continuous and nominal or ordinal) of each variable.
Y, Response identifies a response (dependent) variable.
X, Factor identifies a classification (independent) variable.
The next step is to choose an analysis that investigates if there is a statistical difference between the
group mean values.
Show Points
Each of the typing test scores is plotted for each brand of keyboard. Note that the distance between tick
marks on the brand axis is proportional to the sample size of each group. The mean typing score for the
total sample is shown as a horizontal line across the plot.
It is easy to see at a glance that most participants who used the SPEEDYTYPE machines typed faster
than the others.
The mean scores of the REGAL and WORD-O-MATIC keyboards appear to be nearly the same, but note
that the SPEEDYTYPE mean is much higher (Figure 5.3).
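The group means and the overall sample mean shown on the plot are straightforward to compute. The Python sketch below uses invented words-per-minute scores, not the values in Typing Data.jmp, and the group sizes are deliberately unequal to mirror the study.

```python
from statistics import mean

# Hypothetical words-per-minute scores; not the values in Typing Data.jmp.
scores = {
    "REGAL": [50, 52, 54],
    "SPEEDYTYPE": [68, 70, 72, 74],
    "WORD-O-MATIC": [45, 55, 59],
}

# One mean per keyboard brand.
group_means = {brand: mean(vals) for brand, vals in scores.items()}

# Mean of the total sample, pooling every score regardless of brand.
all_scores = [v for vals in scores.values() for v in vals]
grand_mean = mean(all_scores)

print(group_means, grand_mean)
```

When group sizes differ, the mean of the total sample is generally not the simple average of the group means, because larger groups contribute more scores.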
The next logical step is to check the distribution of points within each group. This gives a better idea of
the spread of the values and shows the distance of extreme values from the center of the data.
Click the red triangle icon and select Quantiles.
When you select the Quantiles command, JMP automatically overlays a quantile box plot on each
group of typing scores, as shown in Figure 5.5. JMP also displays the report in Figure 5.5, which lists
the standard percentiles for each keyboard. The median (50th percentile) is the typing speed that
divides the sample in half. This means that 50% of the employees had speeds greater than the median,
and the other half had lower speeds.
Figure 5.5 Fit Quantiles Option
Figure 5.6 illustrates the quantile box plot. The median, or 50th quantile, shows as a line in the body of
the box. The top and bottom of the box represent the 75th and 25th quantiles, also called the upper
and lower quartiles. The box encompasses the interquartile range of the sample data. The 10th and
90th quantiles show as lines above and below each box.
Looking at the quantile box plot and the means diamond together helps show if data are distributed
normally within a group. If data are normally distributed (bell shaped), the 50th percentile and the
mean are the same and the other quantiles are arranged symmetrically above and below the median.
Figure 5.6 Quantiles Box Plot (plot labels: sample mean, group mean, and the 90th, 75th, 50th, 25th, and 10th percentiles)
Fit Quantiles
The quantile box plots (Figure 5.5) show a difference in variation of scores across the three groups. The
scores in the REGAL group cluster tightly around the mean score but the WORD-O-MATIC scores show
much more variation. However, even with this variation among the groups, the SPEEDYTYPE brand
still appears to promote the best performance.
Comparison Circles
To complete the typing data inspection:
Click the red triangle icon and choose
Compare Means > All Pairs, Tukey HSD.
This option produces statistical reports (discussed later) and automatically draws a set of
comparison circles to the right of the plot that provide a graphical test of whether the mean typing
scores are statistically different. Comparison circles for the three word-processor groups are
shown in Figure 5.7.
The center of each circle is aligned with the mean of the group it represents. For the Student's t-test,
the diameter of each circle spans the 95% confidence interval for each group. Whenever two circles intersect, the confidence intervals of the two
means overlap, suggesting that the means may not be significantly different. Whenever two circles do
not intersect, the group means they represent are significantly different.
Click the SPEEDYTYPE comparison circle.
This graphically illustrates that the SPEEDYTYPE machine is statistically better than the other
machines. The comparison circles highlight to show the statistical magnitude of the difference between
typing scores. Circles for groups that are statistically the same have the same color.
Figure 5.7 Comparison Circles
The comparison circle for the SPEEDYTYPE brand does not intersect with either of the other two. The
REGAL and WORD-O-MATIC brands are statistically slower than SPEEDYTYPE but do not appear different
from each other. A later section, Mean Estimates and Statistical Comparisons, p. 66, discusses
the multiple comparison tests the comparison circles represent.
Quantify Results
Now, examine the report beneath the plot, which consists of several tables. The Summary of Fit table,
shown in Figure 5.8, summarizes the typing data distribution with these statistics:
Rsquare (R2) quantifies the proportion of total variation in the typing scores resulting from different keyboards rather than from different people.
Rsquare Adj adjusts Rsquare for the number of parameters in the model.
Root Mean Square Error (RMSE) is a measure of the variation in the typing scores that can be
attributed to different people rather than to different machines.
Mean of Response
Observations
Analysis of Variance
When you select the Means/Anova command from the red triangle icon in the title bar, JMP gives you
a standard analysis of variance table. If there are only two group levels, the report also includes a t-test
table.
Note that the value of the F-probability (Prob>F) for the Analysis of Variance is 0.0004. This implies
that differences as great as seen in this typing trial are expected only four times in 10,000 similar trials if
the keyboards did not really promote different typing performances.
The Analysis of Variance table has the following information:
Source identifies the sources of variation in the typing scores.
Sum of Squares (SS for short) quantifies the variation associated with each source.
C. Total is the corrected total SS. It divides (partitions) into the SS attributable to brand and the
SS for Error. The brand SS is the variation in the typing scores explained by the analysis of variance
model, which hypothesizes that the keyboards are different. The Error SS is the remaining or unexplained
variation.
Mean Square
F Ratio
Prob > F is the probability of obtaining a greater F-value if the mean typing scores for the keyboards differed only because different people were typing on them rather than because the keyboards promoted different scores in any way.
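To make the partition of the sums of squares concrete, here is a minimal Python sketch of a one-way analysis of variance. The scores are hypothetical (they are not the typing data from this chapter), and the function name oneway_anova is an illustrative choice. The sketch stops at the F ratio; turning it into Prob > F requires the F distribution, which a statistics package would supply.

```python
# Hypothetical typing scores for three keyboard brands (not the JMP sample data).
groups = {
    "REGAL": [68, 70, 71, 69, 72],
    "SPEEDYTYPE": [78, 82, 80, 85, 80],
    "WORD-O-MATIC": [62, 75, 70, 66, 77],
}


def oneway_anova(groups):
    """Partition the corrected total SS into model (brand) SS and error SS."""
    all_scores = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_scores) / len(all_scores)

    # C. Total: squared deviations of every score from the overall mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)

    # Error: squared deviations of every score from its own group mean.
    ss_error = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        ss_error += sum((y - group_mean) ** 2 for y in ys)

    # Model (brand): whatever the group means explain.
    ss_model = ss_total - ss_error

    df_model = len(groups) - 1
    df_error = len(all_scores) - len(groups)
    f_ratio = (ss_model / df_model) / (ss_error / df_error)
    return {
        "SS Model": ss_model,
        "SS Error": ss_error,
        "SS C. Total": ss_total,
        "DF Model": df_model,
        "DF Error": df_error,
        "F Ratio": f_ratio,
    }


table = oneway_anova(groups)
```

A large F ratio means the variation between brand means is large relative to the variation among people using the same brand.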
Number
Lower 95%
Upper 95%
When you select the Compare Means command from the red triangle icon in the title bar, JMP gives
several multiple comparison options to statistically compare pairs of groups. This example uses the All
Pairs, Tukey HSD option, which performs a statistical means comparison for the three pairs of means
using the Tukey-Kramer HSD (honestly significant difference) test (Tukey 1953, Kramer 1956). This
means comparison method compares the actual difference between group means with the difference
that would be significantly different. The difference needed for statistical significance is called the LSD
(least significant difference).
The graphical results show as the comparison circles previously seen in Figure 5.7. The circles' centers
represent the actual difference in the group means. The corresponding report is the Means Comparisons table (Figure 5.9), which shows the actual absolute difference between each mean and the LSD.
The top half of the report gives information based on a Student's t comparison of each pair. The bottom
half shows the results of the Tukey-Kramer multiple comparison tests. Pairs with a positive value
are significantly different. The Means Comparisons table confirms the visual results in Figure 5.7.
Chapter Summary
In this chapter, the difference in mean typing scores for three brands of keyboard was summarized using
the Fit Y by X command in the Analyze menu. This command was also used to:
Plot the typing scores for the three brands of keyboard.
Overlay a means diamond on each group of typing scores to compare the means of each group.
Overlay a quantile box plot on each group of typing scores to compare the shape of the distribution
of scores in each group.
Produce comparison circles to visualize the difference in mean typing scores.
Compute and display a one-way analysis of variance table, which confirmed that at least one pair of
means is statistically different.
Display a table of the group means and standard errors.
Display a table showing the multiple comparison statistical test results for group means.
Using the selection tool from the Tools menu, the graphs or tables can be copied and prepared in
a report for presentation. The analysis concludes that, in this typing trial, the SPEEDYTYPE keyboard
produced significantly higher scores than either of the other two brands.
See the chapter Oneway Layout of the JMP Statistics and Graphics Guide for a complete discussion of
one-way analysis of variance.
Chapter 6
Analyzing Categorical Data
Comparing Proportions
Survey data are frequently categorical data rather than measurement data. Analysis of categorical data
begins by simply counting the number of responses in categories and subcategories. Counting is easy,
but interpreting the relationship between categories based on counts is more complex. It requires computing probabilities and evaluating the likelihood of these probabilities compared to expectations.
For example, an American automobile manufacturer, feeling the pinch of competition from foreign
auto sales, needs a market analysis before proceeding with a multimillion-dollar advertising campaign.
A random sample of people is surveyed. The auto manufacturer wants to know each participant's age,
sex, marital status, and auto information. The auto information consists of the manufacturing country,
the car's size, and the car's type (family, work, or sporty). This information may
provide the advertising experts with direction for the upcoming advertising campaign.
Who buys what?
Objectives
Use the Fit Y by X command to compare two variables consisting of categorical data.
Use the formula editor to re-code a categorical variable as a numeric variable.
Produce and examine graphs and statistics appropriate for the comparison of proportions such as
Chi-squared tests and mosaic plots.
Contents
Look Before You Leap
Open a Data Table
Address the Research Question
Modify the Data Table
Contingency Table Reports
Cast Variables Into Roles
Contingency Table Mosaic Plots
Chapter Summary
The first step is to become familiar with the data. Begin by reviewing the data to determine the best
way to proceed with the market analysis.
) modeling types.
The distribution of a variable and its corresponding quantiles provide a good way to form sample
groups. Use the distribution of the age column to find a reasonable value of age that divides the sample
into two groups.
Select Analyze > Distribution.
When the Distribution window appears, select age as the analysis column (Y, Column) and click
OK.
JMP displays a histogram with an accompanying outlier box plot, Quantiles table, and Moments table.
The Quantiles table, shown here, identifies 30 as the median age.
The next step is to create a new column whose values identify whether a subject's age is greater than 30,
or is less than or equal to 30.
Select Cols > New Column to display the New Column window, shown in Figure 6.1, which is
used to define column characteristics.
Data Type, Modeling Type, and Format options define the new columns characteristics. Enter charac-
Suppose 0 represents ages greater than the median (30) and 1 represents the ages less than or equal to
the median. To create a formula that divides the sample into two groups, follow these steps:
Click Conditional in the function selector list and select the If function.
Highlight the left side of the comparison clause and click age in the Table Columns list.
Double-click the right side of the comparison clause to obtain a text entry box.
Enter 30 for the numeric comparison.
Double-click the term denoted then clause.
Enter "1" (in double quotes because this column is a character variable).
Double-click the term denoted else clause.
Enter "0" (with double quotes).
The complete equation should look like the one shown here.
Click Apply, OK, or the formula editor's close box to fill the new column with calculated values.
Tip: Instead of using the buttons in the formula editor, you can double-click the outermost nesting box
to create a single text entry box and enter if(age<=30, "1", "0"). Then, press Enter (or Return) or click
outside the text box, and the formula appears in formatted form.
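The recoding rule built in the steps above can be sketched outside JMP as well. The following Python fragment is only an illustration of the same logic: the function name age_group and the sample ages are hypothetical, and the codes are kept as character strings to mirror the character column described above.

```python
def age_group(age):
    """Return "1" for ages of 30 or less and "0" for ages greater than 30."""
    return "1" if age <= 30 else "0"


# Hypothetical ages, chosen to include the boundary value 30.
ages = [22, 30, 31, 45]
codes = [age_group(age) for age in ages]
```

Note that an age of exactly 30 falls in the "1" group, because the comparison is less than or equal to the median.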
The nominal age grouping variable shows the relationship of age to the
other nominal variables using contingency tables. To look at combinations
of two variables:
Choose Analyze > Fit Y by X.
JMP does the statistical analysis appropriate for the variables' modeling types
and role assignments.
A mosaic chart has side-by-side divided bars for each level of its x variable. The bars are divided into
segments proportional to each discrete level (value) of the y variable. The mosaic chart in Figure 6.4
shows the relationship of marital status to the manufacturing country.
The width of each bar is proportional to the sample size. When the lines dividing the bars align horizontally, the response proportions are the same. When the lines are far apart, the response rates of the
samples might be statistically different.
Figure 6.4 Mosaic Plot Axes (plot labels: response rates; proportion of married people with Japanese cars)
The country by age group mosaic plot shows that the proportion of American car owners 30 years or
over is only slightly greater than the proportion of American car owners under age 30.
The most significant relationship is seen between marital status and country. The mosaic plot, shown
previously in Figure 6.4, and its supporting Tests table (Figure 6.5), suggest that married people are
more likely than single people to own American cars.
The Likelihood Ratio and Pearson Chi-squared tests evaluate the relationship between an automobile's
country of manufacture and the marital status of its owner. If no relationship exists between country and
marital status, a Chi-squared value as large as or larger than the one computed in this survey would occur only
seven times in 100 similar surveys.
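For readers who want to see how the two test statistics are formed, the Python sketch below computes the Pearson and Likelihood Ratio Chi-squared values from a two-way table of counts. The counts are hypothetical and are not the survey results; the function name chi_squared_tests is illustrative. The closed-form probability used at the end is valid only for exactly 2 degrees of freedom, which is what a table with two rows and three columns gives; other table shapes need a statistics library.

```python
import math

# Hypothetical counts: rows are marital status, columns are country of manufacture.
table = {
    "Married": {"American": 60, "European": 20, "Japanese": 70},
    "Single": {"American": 40, "European": 20, "Japanese": 90},
}


def chi_squared_tests(table):
    """Return (Pearson, Likelihood Ratio, degrees of freedom) for a count table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_total = {r: sum(table[r].values()) for r in rows}
    col_total = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_total.values())

    pearson = 0.0
    likelihood = 0.0
    for r in rows:
        for c in cols:
            observed = table[r][c]
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_total[r] * col_total[c] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:
                likelihood += 2 * observed * math.log(observed / expected)

    df = (len(rows) - 1) * (len(cols) - 1)
    return pearson, likelihood, df


pearson, likelihood, df = chi_squared_tests(table)

# Upper-tail probability; exp(-x / 2) is exact only when df == 2.
prob = math.exp(-pearson / 2) if df == 2 else None
```

The probability is the chance of seeing a Chi-squared value at least this large when no relationship exists, so small probabilities point toward a real relationship.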
Figure 6.5 Table of Statistical Tests for Marital Status By Country
Scroll the report to see the relationship between size of car and each x variable (sex, marital status,
and age).
The three mosaic plots indicate no relationship between car size and gender, marital status, or age
group. This is seen numerically by looking at the Contingency Tables and the Tests tables beneath each
of the mosaic plots (see Figure 6.6).
Note that by default, Col% and Row% also appear in the Contingency Tables. Right-click (Ctrl-click
on the Macintosh) the table to access the Columns menu to turn columns on and off.
The Chi-squared values support the hypothesis that the purchase of large, medium, and small cars is
not significantly different across the sex, marital status, and age group factor levels. The Chi-squared
probabilities range from 0.06 to 0.30, so if no relationship exists, you should expect Chi-squared values
at least this large to occur six to 30 times in 100 similar surveys.
It probably makes no difference what size cars appear in advertisements.
Figure 6.6 Tables for Relationships with Size of Car
The market survey categorizes cars based on both size and type, where a car's type is work, sporty, or
family.
Scroll to see the plots that show the relationship between type of car and the three x variables.
The mosaic plots in Figure 6.7 show that the type of car varies for levels of marital status and age
group. As perhaps expected, many of the cars owned by married people are family automobiles, while
the largest proportion of cars owned by single people are sporty cars.
These statistical results reveal that American auto manufacturers might want to direct advertising plans
toward married couples.
Figure 6.7 Reports for Type of Car and Marital Status and Age Group
So, American automobile manufacturers may choose to focus advertisements toward married couples
buying family-type automobiles.
It follows logically that a relationship between age group and type of car also exists because older people
are more likely to be married. The graph to the right in Figure 6.7 shows that the proportion of people over 30 years old who own family cars is much greater than the proportion of those under 30. The small
Chi-squared probabilities support the significant difference in proportions. The Chi-squared probabilities of 0.0005
mean that proportions as varied as these are expected to occur only five times in 10,000 similar surveys.
Chapter Summary
This chapter looked at relationships between categorical variables obtained from a survey. The survey
recorded age, sex, marital status, and information about the type of automobile owned by a random
sample of people in the same geographical area. The auto information included manufacturing country,
size, and type of car. Car types were classified as work, sporty, and family. The question "Is the size of
car, type of car, or manufacturing country related to the age, gender, or marital status of the owner?"
was investigated.
The Fit Y by X command produced nine mosaic charts with supporting statistical summaries that
show:
No relationship between either sex or age and manufacturing country.
A significant relationship between marital status and manufacturing country with married people
more likely to own American cars than single people.
Chapter 7
Regression and Curve Fitting
Visualizing Relationships
This lesson demonstrates the interactive regression capabilities of JMP.
The data is from Eppright et al (1972) as reported in Eubank (1988, p. 272). The study subjects are
young males. The variables in the data table are age (in months) and the ratio of weight to height. A
third variable classifies the subjects into two groups based on age. The goal is to describe and model the
growth pattern of subjects for the age range given in the data table.
Objectives
Use the Fit Y by X command to fit least-squares lines to continuous data.
Fit polynomial curves and cubic splines to the data set and explore their goodness of fit.
Journal and save analysis results.
Use the Group By command to fit different lines to certain groups of data.
Contents
Look Before You Leap
Open a JMP File
Select an Analysis
Choose Variable Roles
Fitting Models to Continuous Data
Fitting the Mean
Fitting a Line
Understanding the Summary of Fit Table
Understanding the Analysis of Variance Table
Understanding the Parameter Estimates Table
Excluding Points
Journaling JMP Results
Examining a Polynomial Fit (Linear Regression)
Fitting a Spline
Fitting By Groups
Chapter Summary
The first step is to become familiar with the data. Begin by reviewing the data to determine the best
way to proceed with the regression.
Select an Analysis
To fit regression curves:
Select Analyze > Fit Y by X.
The Fit Y by X analysis does four kinds of analyses, depending on the modeling types of the x and y variables:
Regression analysis when both x and y have continuous values, as in this example.
Categorical analysis when both x and y have nominal or ordinal values.
Analysis of variance when x is nominal and y has continuous values.
Logistic regression when x is continuous and y has nominal or ordinal values.
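The four cases listed above amount to a lookup on the pair of modeling types. The Python sketch below restates that lookup; it is not JMP's implementation, the function name fit_y_by_x_analysis is hypothetical, and treating an ordinal x the same as a nominal x in the analysis of variance case is an assumption based on the earlier description of the keyboard example.

```python
def fit_y_by_x_analysis(x_type, y_type):
    """Name the kind of analysis described in the text for a pair of modeling types."""
    categorical = {"nominal", "ordinal"}
    if x_type == "continuous" and y_type == "continuous":
        return "regression"
    if x_type in categorical and y_type in categorical:
        return "categorical analysis"
    if x_type in categorical and y_type == "continuous":
        # Assumption: an ordinal x is handled like a nominal x here.
        return "analysis of variance"
    if x_type == "continuous" and y_type in categorical:
        return "logistic regression"
    raise ValueError(f"unknown modeling types: {x_type!r}, {y_type!r}")


# The growth example in this chapter has continuous x (age) and continuous y (ratio).
growth_analysis = fit_y_by_x_analysis("continuous", "continuous")
```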
Fitting a Line
To fit a simple regression line through the data points:
Click the red triangle icon in the title bar and select Fit Line.
The regression line minimizes the sum of squared vertical distances from each point to the line of fit. Because
of this property, it is sometimes referred to as the line of best fit.
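The least-squares property can be checked by hand on a small data set. The following Python sketch uses the standard closed-form formulas for the slope and intercept of a simple regression line; the data values and the function names are hypothetical and are not taken from Growth.jmp.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_error(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 2, 5, 4]

intercept, slope = fit_line(xs, ys)
best = sum_squared_error(xs, ys, intercept, slope)

# Any other line, such as one with a slightly steeper slope, has a larger error.
other = sum_squared_error(xs, ys, intercept, slope + 0.1)
```

Comparing best with other shows the sense in which the fitted line is the line of best fit: changing the slope or intercept can only increase the sum of squared distances.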
When clicked, the red triangle icon on the scatterplot title bar
reveals a variety of fitting commands and additional display
options. Options include Show Points, fitting commands, and
other features. The Show Points command alternately hides or
displays the points in the plot. Fitting options can be as simple
as fitting a straight line or as involved as drawing density ellipses.
Fitting options can be used repeatedly to overlay different fits
on the same scatterplot.
Rsquare (R2) quantifies the proportion of total variation in the growth ratios accounted for by fitting the regression line.
Rsquare Adj adjusts Rsquare for the number of parameters in the model.
Root Mean Square Error (RMSE) is a measure of the variation in the ratio values that is attributable to different people rather than to different ages.
Mean of Response
Observations
Note: The first line of the report is the regression equation, which is editable.
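The statistics in the Summary of Fit table follow directly from the fitted values. The Python sketch below computes them using the usual textbook definitions; the data, the fitted line (intercept 1.4, slope 0.8, which is the least-squares line for these made-up points), and the function name summary_of_fit are all assumptions for illustration, and JMP's report may format or label the values differently.

```python
import math


def summary_of_fit(ys, fitted, n_params):
    """Compute Summary of Fit style statistics from responses and fitted values."""
    n = len(ys)
    mean_response = sum(ys) / n
    ss_total = sum((y - mean_response) ** 2 for y in ys)
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    df_error = n - n_params
    return {
        "Rsquare": 1 - ss_error / ss_total,
        # Adjusted for the number of parameters through the degrees of freedom.
        "Rsquare Adj": 1 - (ss_error / df_error) / (ss_total / (n - 1)),
        "Root Mean Square Error": math.sqrt(ss_error / df_error),
        "Mean of Response": mean_response,
        "Observations": n,
    }


# Hypothetical data and its least-squares line (2 parameters: intercept and slope).
ages = [0, 1, 2, 3, 4]
ratios = [1, 3, 2, 5, 4]
fitted = [1.4 + 0.8 * age for age in ages]

stats = summary_of_fit(ratios, fitted, n_params=2)
```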
In addition to producing a Summary of Fit table, clicking the red triangle icon and selecting Fit Line
produces an Analysis of Variance table.
The elements of the table give an indication of how well the straight line fits the data points:
Source identifies the sources of variation in the growth ratio values (Model, Error, and C. Total).
DF lists the degrees of freedom associated with each source.
Sum of Squares (SS for short) quantifies the variation associated with each variation source. The
C. Total SS is the corrected total SS computed from all the ratio values. It divides (partitions) into
the SS for Model and SS for Error. The Model SS is the amount of the total variation in the ratio
scores explained by fitting a straight line to the data. The Error SS is the remaining or unexplained
variation.
Mean Square lists the Sum of Squares divided by its associated degrees of freedom (DF) for
Model and Error.
F Ratio is the regression (Model) mean square divided by the Error mean square.
Prob > F is the probability of a greater F-value occurring if the ratio values differed only because
of different subjects rather than because the subjects are different ages.
In this example, the significance of the F-value is 0.0001, which strongly indicates that the linear fit to
the weight/height growth pattern is significantly better than the horizontal line that fits the sample
mean to the data.
Term
Estimate
Std Error
t Ratio
Prob > |t| is the probability of a greater absolute t-value occurring by chance alone if the parameter has no effect in the model.
The significant F-ratio in the Analysis of Variance table tells the student that the regression line fits significantly better than the horizontal line at the mean (the simple mean model). However, while the
regression line looks like a good fit for age groups above seven months, it does not describe the data well
for ages younger than seven months.
Excluding Points
Because the low-age points are the trouble spots for the linear fit, remove them from the analysis and
try fitting the model to the remaining values.
To highlight these outliers and exclude them from the analysis:
Select the lasso tool from the Tools menu or toolbar.
Drag the lasso around the points to be excluded.
Select Rows > Exclude/Unexclude to exclude the selected points.
Right-click the selected points, select Row Markers, and select X to assign the X marker to the
excluded points.
Click the red triangle icon in the title bar and choose the Fit Line command again to see the results
of excluding the low-age points.
The scatterplot shown here has both regression lines. The low-age points still show on the plot but are
not included in the second regression line's computation.
The open journal file contains all reports from the active report window. Plots can be resized, opened,
or closed, as can outlines. This allows for printing of certain parts of the report.
The tables in Figure 7.3 show the R2 value from the Summary of Fit tables for the linear fit, the second
degree polynomial fit, and the third degree fit. As polynomial terms are added to the model, the regression curve appears to fit the data better.
Fitting a Spline
Even the polynomial fit of degree 3 does not quite reach the outlying points of the very young subjects.
A free-form function that acts as if it smooths the data, such as a smoothing spline, may be better.
Use the Remove Fit command on both polynomial fits, so that only the
first linear regression line shows on the scatterplot.
Click the red triangle icon on the title bar and select Fit Spline three times,
with lambda values of 10, 1,000, and 100,000.
Lambda is a tuning factor that determines the flexibility of the spline. The
Fit Spline command submenu (shown to the left in Figure 7.4) lists lambda
values. The three new fits are overlaid on the scatterplot.
Figure 7.3 Comparison of Linear Fit and Polynomial Fits of Degree 2 and 3
(Plot labels: linear fit, lambda = 100,000, lambda = 1,000, lambda = 10)
Inspecting the plot shows that the lambda = 10 curve is too flexible, so local error has too
great an effect on it. The lambda = 100,000 curve is too stiff. It is so straight that it does not reach
down to model the lower ages closely. However, the lambda = 1,000 curve fits well. Its shape is not
influenced by local errors, and it appears to fit the data smoothly.
If a report of these results is needed, journal these results.
Select Edit > Journal.
The Journal command appends the scatterplot with spline fits and
text reports to the open journal file. After journaling the final analyses, the following draft notes about the spline-fitting technique
can be added at the bottom of the journal window.
Select the annotate tool from the tool bar.
Click and drag a large box at the bottom of the report.
Add the following text to the box.
"This fitting technique applies a cubic polynomial to the interval
between points; the polynomial is joined such that the curve meets at the
same point with the same slope to form a continuous and smooth curve. A
small enough lambda could make such a curve go through every point, which
would model the error, not the mean. A moderate lambda value forces the
curve to be smoother, i.e., less curved. This is accomplished by adding a
curvature penalty to the optimization that minimizes the sum of squares
error."
By comparing various regression fits, notice that both the polynomial fits and the spline fit with moderate flexibility best describe the data. These models show that infants grow most rapidly during the first
months of life and that growth rate decreases significantly at approximately 12 months.
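The curvature penalty mentioned in the draft journal note can be written as the textbook smoothing-spline criterion shown below, where f is the fitted curve and lambda is the tuning factor. This is the standard form from the statistical literature; how JMP scales lambda internally is not stated in this guide, so the exact correspondence to the lambda values on the Fit Spline submenu is an assumption.

```latex
\min_{f} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
  \;+\; \lambda \int \bigl( f''(x) \bigr)^2 \, dx
```

The first term is the sum of squares error and the second is the curvature penalty. As lambda approaches zero the penalty vanishes and the curve can pass through every point; as lambda grows, curvature becomes costly and the curve approaches a straight line, which matches the behavior of the lambda = 10 and lambda = 100,000 fits described above.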
Excluding Points, p. 88 in this chapter, shows how to overlay a linear fit for the whole sample with a
linear fit for children over the age of seven months. Carry this idea one step further with overlay fits to
compare children under the age of one year with children over one year.
In the Growth.jmp data table, create a new column called group to act as a grouping variable.
Right-click in the new column area of the data table and select New Column from the resulting
menu. Write the column name and click OK.
Right-click in the Group column (Command-click on the Macintosh) and select Formula.
Enter the formula shown in Figure 7.5.
Click Conditional in the function selector list and select the If function.
The expression term, denoted expr, is highlighted.
Choose a < b from the Comparison functions.
The left side of the comparison clause is highlighted. Click age in the column selection list.
Enter 12 for the numeric comparison.
Double-click the term denoted then clause.
Enter "Babies" (in double quotes because this column is a character variable).
Double-click the term denoted else clause.
Enter "Toddlers" (with double quotes).
Click Apply and then OK.
This assigns the value "Babies" to each child less than 12 months old, and "Toddlers" to children who are
12 months or older.
Figure 7.5 Computed Age Grouping Variable
Fitting By Groups
Clear the Smoothing Spline fits still showing, such as those seen in Figure 7.4. To do this, click the
red triangle icon for each of the three smoothing spline fits and select
Remove Fit.
Click the red triangle icon and select Group By to display the window shown here.
Select group, the newly created grouping variable, and click OK.
Choose the Fit Line command.
With a grouping variable (group) in effect, the overlaid regression lines shown in Figure 7.6 appear
automatically. The points that correspond to each regression give a dramatic visualization of the steep
growth rate for babies during the first year of life compared to the more moderate growth rate of toddlers and small children age one to five years.
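The effect of a grouping variable can be imitated by splitting the rows by group and fitting a separate least-squares line to each part. The Python sketch below does this with hypothetical ages and ratios (not the values in Growth.jmp); the function names group_label and fit_line are illustrative, and the cutoff of 12 months follows the formula described earlier in this section.

```python
def group_label(age_months):
    """Assign "Babies" to ages under 12 months and "Toddlers" otherwise."""
    return "Babies" if age_months < 12 else "Toddlers"


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical ages (months) and weight-to-height ratios for illustration only.
ages = [2, 4, 6, 8, 10, 12, 24, 36, 48, 60]
ratios = [0.50, 0.59, 0.65, 0.75, 0.81, 0.85, 0.89, 0.90, 0.95, 0.97]

# Split the rows by group, as the Group By command does before fitting.
grouped = {}
for age, ratio in zip(ages, ratios):
    xs, ys = grouped.setdefault(group_label(age), ([], []))
    xs.append(age)
    ys.append(ratio)

# One regression line per group; keep only the slopes for comparison.
slopes = {name: fit_line(xs, ys)[1] for name, (xs, ys) in grouped.items()}
```

With data shaped like this, the slope for the Babies group is much steeper than the slope for the Toddlers group, which is the pattern the overlaid lines are meant to reveal.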
Figure 7.6 Regression Lines for Levels of a Grouping Variable
Chapter Summary
To analyze some bivariate data, the Fit Y by X command was used to examine a variety of regression
model fits. The task was to model and describe the growth pattern of subjects over a range of ages. Growth
was measured using the ratio of weight to height, and the task was accomplished by:
Fitting the mean to use as a baseline for comparison with other regression models, and evaluating the fit using
statistical text reports.
Excluding outliers and again fitting a straight line to compare the R2 values given by the Summary
of Fit tables for both lines.
Fitting second and third degree polynomials to see if they model the growth pattern more
realistically.
Fitting smoothing splines with lambda values of 10, 1,000, and 100,000 and comparing them with
each other and with the linear fit.
Clicking the red triangle icon and selecting the grouping facility (Group By) to compare growth
rates of babies under the age of one year with toddlers from age one to five years.
Using the Journal command to append each of these regression reports and graphs to a journal file.
Chapter 8
A Factorial Analysis
Designed Modeling
This lesson examines two treatments of popcorn. The plain, everyday type has been around for years,
but researchers claim to have discovered a special treatment of corn kernels. This new process supposedly increases the popcorn yield as measured by popcorn volume from a given measure of kernels.
Is this true? If so, how much is the increase? Are these increases the same for all groups of conditions?
The special treatment raises the cost of the popcorn, so the increase in yield must be significant enough
to warrant the higher costs.
The popcorn data used in this chapter and for examples in the JMP User Guide and the JMP Statistics
and Graphics Guide are artificial, but the experiment was inspired by experimental data reported in Box,
Hunter, and Hunter (1978).
Objectives
Learn techniques to analyze a designed factorial experiment using the Fit Model command.
Evaluate and interpret effects using JMP's interactive graphical techniques.
Examine supporting text reports.
Evaluate the significance of interaction effects using interaction plots.
Save a model's predicted values for each observation.
Contents
Look Before You Leap
Open a Data Table
What Questions Can Be Answered?
The Fit Model Window
Graphical Display: Leverage Plots
Quantify Results: Statistical Reports
Analysis of Variance
Summary Reports For The Whole Model
Summary Reports for Effects
Chapter Summary
Look Before You Leap
The popcorn yield data are the result of a designed experiment. The same amounts of different kinds of
corn were methodically popped under different conditions. First, look at the data to review the results
of the popcorn experiment.
The Fit Model Window
Select Effect Leverage from the box beside Emphasis at the top right of the Fit Model window.
Click Run Model to estimate the model parameters and view the results.
Graphical Display: Leverage Plots
The Fit Model command graphically displays the whole model and each model effect as the
leverage plots shown in Figure 8.2 through Figure 8.5. It is possible to tell at a glance whether the factorial model explains the popcorn data and which factors are most influential.
The Whole Model plot to the left in Figure 8.2 shows actual yield by predicted yield values with a
regression line and 95% confidence curves. The regression line and the 95% confidence curves cross
the sample mean (the horizontal line), which shows that the whole factorial model (all effects together)
explains a significant proportion of the variation in popcorn yield.
There is also a significant difference in yield between the two types of popcorn, as shown in the
right-hand leverage plot for the popcorn main effect. The small p-values beneath the plots quantify the
significant model fit and popcorn effect.
Figure 8.2 Leverage Plots of Actual by Predicted and of Popcorn Effect
In Figure 8.3, the confidence curves for oil amt and the popcorn*oil amt interaction do not cross the
horizontal mean line (rather, they encompass the mean line). This shows that neither of these factors
significantly affected popcorn yield.
Figure 8.3 Leverage Plots for the Oil Amt and Its Interaction with Popcorn
The leverage plots in Figure 8.4 show that the batch size effect (batch) and the interaction between
popcorn type and batch size (popcorn*batch) are significant effects. This means that the size of the
batch makes a difference in the popcorn yield. Furthermore, the significant interaction means that
batch size affects each type of popcorn differently.
Figure 8.4 Leverage Plots for Batch and Its Interaction with Popcorn
The two leverage plots in Figure 8.5 show that there is no significant interaction between
amount of oil and batch size.
Figure 8.5 Leverage Plots for Other Interaction Effects
For more information about interpretation of leverage plots, see the chapters "Understanding JMP
Analyses" and "Standard Least Squares: Introduction" and the appendix "Statistical Details" of the JMP
Statistics and Graphics Guide.
If the Fit Model window is closed, click the red triangle icon on the report and select Script > Redo Analysis
to open a new Fit Model window.
Analysis of Variance
The whole model leverage plot in Figure 8.6 shows that the two-factor model describes the popcorn
experiment well. Examine the tables that accompany the whole model leverage plot.
The Analysis of Variance table (in Figure 8.6) that accompanies the whole model leverage plot quantifies the analysis results. It lists the partitioning of the total variation of the sample into components. The
ratio of the Mean Square components forms an F-statistic that evaluates the effectiveness of the model
fit. If the probability associated with the F-ratio is small, then the analysis of variance model fits better
statistically than the simple model that contains only the overall response mean.
Source identifies the sources of variation in the popcorn yield values (Model, Error, and C. Total).
Sum of Squares (SS for short) quantifies the variation in yield. C. Total is the corrected total SS.
It is divided (partitioned) into the SS for Model and SS for Error. The SS for Model is the variation
in the yield explained by the analysis of variance model, which hypothesizes that the model factors
have a significant effect. The SS for Error is the remaining or unexplained variation.
Mean Square is a sum of squares divided by its associated degrees of freedom.
F Ratio is the Model mean square divided by the Error mean square.
Prob > F is the probability of a greater F ratio occurring if the variation in popcorn yield resulted
from chance alone rather than from the model effects.
In this example, the p-value (Prob > F) is 0.0001. JMP indicates a significant p-value by placing an
asterisk beside it. This low p-value implies that the difference found in the popcorn yield
produced by this experiment is expected only 1 time in 10,000 similar trials if the model factors do not
affect the popcorn yield.
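The partition described by the Analysis of Variance table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the yields below are made-up numbers for a single two-level factor, not the popcorn data, and the function name anova_table is hypothetical. It shows how C. Total splits into Model and Error and how the mean squares form the F ratio.

```python
# Hypothetical yields for two groups; NOT the popcorn data used in this chapter.
groups = {
    "plain": [8.0, 9.0, 10.0, 9.0],
    "gourmet": [12.0, 13.0, 11.0, 12.0],
}


def anova_table(groups):
    """Partition the corrected total SS into Model and Error and form the F ratio."""
    values = [y for ys in groups.values() for y in ys]
    n = len(values)
    grand_mean = sum(values) / n

    # C. Total: variation of every response around the overall mean.
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    # Error: variation of each response around its own group mean.
    ss_error = sum(
        (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
    )
    # Model: the part of C. Total that the group means explain.
    ss_model = ss_total - ss_error

    df_model = len(groups) - 1
    df_error = n - len(groups)
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "SS Model": ss_model,
        "SS Error": ss_error,
        "SS C. Total": ss_total,
        "MS Model": ms_model,
        "MS Error": ms_error,
        "F Ratio": ms_model / ms_error,
    }


table = anova_table(groups)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

Turning the F ratio into Prob > F requires the F distribution, which this sketch leaves to the JMP report.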
Other tables in the Fit Model report provide statistical summaries. The Summary of Fit table shows the numeric summaries of
the response for the factorial model:
Rsquare (R²) of 0.809 tells the scientist that the two-factor model explains nearly 81% of the variation in the data.
Rsquare Adj adjusts R² to make it more comparable over models with different numbers of parameters.
Root Mean Square Error (sometimes called the RMSE) is a measure of the variation in the yield scores that can be attributed to random error rather than differences in the model's factors.
Mean of Response is the mean of the yield scores.
Observations is the number of observations used in the fit.
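The Summary of Fit quantities follow from the same sums of squares. The sketch below uses the standard textbook formulas and is not a description of JMP's internal computation; the data are made-up, and the function name summary_of_fit and its n_params argument (the number of estimated parameters, including the intercept) are assumptions for illustration.

```python
import math


def summary_of_fit(actual, predicted, n_params):
    """Standard summary statistics; n_params counts the intercept as a parameter."""
    n = len(actual)
    mean_response = sum(actual) / n
    ss_total = sum((y - mean_response) ** 2 for y in actual)
    ss_error = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    df_error = n - n_params
    return {
        "RSquare": 1 - ss_error / ss_total,
        # Penalizes extra parameters by comparing mean squares, not raw sums.
        "RSquare Adj": 1 - (ss_error / df_error) / (ss_total / (n - 1)),
        "Root Mean Square Error": math.sqrt(ss_error / df_error),
        "Mean of Response": mean_response,
        "Observations": n,
    }


# Hypothetical responses and model predictions (two group means).
actual = [8.0, 9.0, 10.0, 9.0, 12.0, 13.0, 11.0, 12.0]
predicted = [9.0, 9.0, 9.0, 9.0, 12.0, 12.0, 12.0, 12.0]
fit = summary_of_fit(actual, predicted, n_params=2)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```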
The F-test probabilities in the Effect Tests table tell the scientist
that all model effects explain a significant proportion of the total variation. JMP indicates a significant
F-value by placing an asterisk beside it. There is also a table that gives the parameter estimates for the
model.
The nature of the interaction is important in the interpretation of the popcorn experiment. To examine
the significant popcorn*batch interaction,
Click the red triangle icon from the Response yield title bar and select
Factor Profiling > Interaction Plots.
This command plots the least squares means for each combination of effect levels, as shown in
Figure 8.7.
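To see what an interaction plot summarizes, the following sketch computes the mean response for each combination of two factors. It assumes a balanced design, where the least squares means coincide with the simple cell means. The records are invented for illustration and are not the popcorn data; cell_means is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (popcorn, batch, yield) records; not the chapter's data.
trials = [
    ("plain", "small", 9.0), ("plain", "small", 10.0),
    ("plain", "large", 8.0), ("plain", "large", 9.0),
    ("gourmet", "small", 15.0), ("gourmet", "small", 16.0),
    ("gourmet", "large", 9.0), ("gourmet", "large", 8.0),
]


def cell_means(trials):
    """Mean response for every combination of the two factor levels."""
    cells = defaultdict(list)
    for popcorn, batch, y in trials:
        cells[(popcorn, batch)].append(y)
    return {cell: sum(ys) / len(ys) for cell, ys in cells.items()}


means = cell_means(trials)

# Effect of batch size within each popcorn type. If these differences are
# unequal, the lines in an interaction plot are not parallel.
batch_effect = {
    popcorn: means[(popcorn, "small")] - means[(popcorn, "large")]
    for popcorn in ("plain", "gourmet")
}
print(batch_effect)
```

In this made-up example the batch effect is much larger for one popcorn type than for the other, which is the pattern an interaction plot displays as non-parallel lines.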
The Save Columns > Prediction Formula command creates a new column in the Popcorn data table called Pred Formula yield that contains the predicted values for each experimental condition.
The prediction formula, shown at the bottom of Figure 8.8, becomes part of the column information.
To see this formula:
Highlight the new column name (Pred Formula yield).
Select Formula from the Cols menu.
The prediction formula can be copied to the clipboard using standard copy and paste techniques.
Results show that popcorn should be packaged:
in small packages so that the yield will be good.
in family size packages with smaller packets inside.
in family size packages with popping instructions that clearly state the best batch size for good
results.
Chapter Summary
In this chapter, a designed experiment evaluated the difference in yield between two types of popcorn.
A three-factor factorial experimental design was the basis for popcorn popping trials. The results were
analyzed by using the Analyze > Fit Model command. The following results were found:
The leverage plots for the factorial analysis of three factors showed one main effect and its associated
interactions to be insignificant.
A more compact two-factor analysis with interaction adequately described the variation in yield for
the popcorn trials.
The interaction between the two main effects was significant. The Least Squares Means table for the
interaction showed how the two types of popcorn behaved under different popping conditions.
The new, more expensive gourmet popcorn had better yield than the plain everyday type only if
popped in small batches.
Chapter 9
Exploring Data
Finding Exceptions
Exploration is the search to find something new, the endeavor to make some discovery. For data analysis, exploratory study is often the most fruitful part of the analytical process because it is the most
open to serendipity. Something noticed in a data set can be the seed of an important advance.
There are two important aspects of exploration:
What is the pattern or shape of the data?
Are there points unusually far away from the bulk of the data (outliers)?
When exploring data composed of many variables, the great challenge is dealing with this high dimensionality. There can be many variables that have interesting relationships, but it's hard to visualize the
relationship of more than a few variables at a time.
Objectives
Use graphical techniques to search for outliers in one, two, three, and higher dimensions.
Perform a principal components analysis and examine it graphically.
Examine outliers graphically using Mahalanobis distance.
Contents
Solubility Data
One-Dimensional Views
Two-Dimensional Views
Three-Dimensional Views
Principal Components and Biplots
Multivariate Distance
Chapter Summary
Solubility Data
This lesson examines compounds for those with unusual solubility patterns in various solvents. When
you installed JMP, a folder named Sample Data was also installed. In that folder is a file named
Solubility.jmp. Data from an experiment by Koehler, Grigorus, and Dunn (1988) are in the
Solubility.jmp file.
Open Solubility.jmp.
There are 72 compounds tested with six solvents, in columns called 1-Octanol, Ether, Chloroform,
Benzene, Carbon Tetrachloride, and Hexane.
The Labels column in the table should serve as a label variable (Figure 9.1) so that compound names,
rather than row numbers, identify points in plots. Although this is already done for you in
Solubility.jmp, to assign the label role yourself, select the columns and then select
Cols > Label/Unlabel. Alternatively, click the red triangle in the columns panel, make sure Labels is
highlighted, and select Label/Unlabel from the resulting menu.
Figure 9.1 Solubility Data Table
One-Dimensional Views
The Distribution command helps you summarize data one column at a time. It does not show any relationships between variables, but the shape of the individual distributions helps identify the
one-dimensional outliers.
To begin exploring the solubility data:
Choose Analyze > Distribution.
Select the six solubility columns and click the Y, Columns button.
Click OK.
Their histograms, resized and trimmed of other output, are shown in Figure 9.2.
Click any histogram bar.
That bar, and all other representations of that data, are highlighted in all related windows. To see how
outlying values are distributed in the other histograms:
Shift-click the outlying bars in each histogram.
This identifies the outlying rows in each single dimension.
Use the Rows > Markers palette to assign the X marker to these selected rows.
The markers appear in the data table and in subsequent plots.
Figure 9.2 One-Dimensional Views
To create a new data table that contains only the outlying rows:
Use the Tables > Subset command as shown here.
Click OK to accept the default settings.
Scroll through the new subset table to see the compound names of the one-dimensional outliers.
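The steps above pick one-dimensional outliers by eye from the histograms. A rough programmatic analogue is sketched below. It swaps in the common 1.5 × interquartile range rule, which is an assumption and not the procedure described in this section, and the values are invented rather than taken from Solubility.jmp.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the positions of values lying more than k IQRs beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical measurements for one column; the last value sits far from the rest.
column = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.1, 5.0]
flagged = iqr_outliers(column)
print(flagged)
```

Running the same function on each column separately mirrors the one-column-at-a-time view: it cannot detect points that are unusual only in combination with other variables.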
Two-Dimensional Views
Return to Solubility.jmp.
Select Analyze > Multivariate Methods > Multivariate.
Highlight all the continuous columns in the table and click the Y, Columns button.
Click OK.
This displays a correlation matrix and a scatterplot matrix of all 30 two-dimensional scatterplots
(Figure 9.3).
The one-dimensional outliers appear as Xs in each scatterplot. Note in the scatterplot matrix that many
of the variables appear to be correlated, as evidenced by the diagonal flattening of the normal bivariate
density ellipses. There appear to be two groups of variables that correlate among themselves but are not
very correlated with variables in the other group.
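The correlation matrix behind the scatterplot matrix can be sketched as follows. With six columns there are 6 × 5 = 30 off-diagonal cells, which matches the 30 scatterplots mentioned above. The column names and values are placeholders, not the solubility data, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation for every ordered pair of columns, keyed by (row, column) name."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


def off_diagonal_cells(k):
    """Number of scatterplots in a k-by-k scatterplot matrix (diagonal excluded)."""
    return k * (k - 1)


# Placeholder columns for illustration only.
columns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 1.0, 3.0, 2.0],
}
corr = correlation_matrix(columns)
print(off_diagonal_cells(6))
```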
The variables Ether and 1-Octanol appear to make up one group, and the other group consists of the
remaining four variables. These two groups are outlined on the scatterplot matrix shown in Figure 9.3.
Scan these plots looking for outliers (points that fall outside the bivariate ellipses) of a two-dimensional
nature and identify them with square markers using the following steps.
Double-click on Selected in the rows panel in the Solubility.jmp data table to clear your current
selection.
Shift-click each outlier.
Select Rows > Markers and select the square marker from the palette.
Now, both one- and two-dimensional outliers are identified.
Three-Dimensional Views
To see points in three dimensions:
Double-click on Selected in the rows panel in the Solubility.jmp data table to clear the row selection.
Select Graph > Scatterplot 3D, which opens the Scatterplot 3D window.
Add all six continuous variables to the Y, Columns list.
Click OK.
After the plot appears, change the drop-down lists below the plot to any combination of the three
variables.
The goal is to look for points away from the point cloud for each combination of three variables. To aid
in this search:
Rotate and examine each three-dimensional plot by dragging the plot with the mouse.
Hover over points to identify outliers.
Figure 9.4 shows two three-dimensional outlying points in the view of Ether by 1-Octanol by Benzene
that hadn't been apparent before. To label them:
Shift-click these points.
Select Rows > Label/Unlabel.
Their labels, METHYLACETATE and ACETONE, appear on the plot.
Figure 9.4 Spotting Outliers in a Three-Dimensional View
To illustrate this:
Click the red triangle icon in the Principal Components/Factor Analysis title bar and select
Eigenvectors.
The result is the Principal Components table in Figure 9.6. The cumulative percent row (Cum Percent)
shows that the first three principal components account for 97.8% of the six-dimensional variation.
Figure 9.6 Principal Components Report
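The cumulative percent row is a running total of the eigenvalues divided by their sum. The sketch below illustrates the arithmetic with invented eigenvalues. They are not the values in Figure 9.6, and cumulative_percent is a hypothetical helper name.

```python
from itertools import accumulate


def cumulative_percent(eigenvalues):
    """Running share of total variation explained by the leading components."""
    total = sum(eigenvalues)
    return [100.0 * running / total for running in accumulate(eigenvalues)]


# Hypothetical eigenvalues for six standardized variables (they sum to about 6).
eigenvalues = [3.0, 2.0, 0.7, 0.2, 0.07, 0.03]
cum = cumulative_percent(eigenvalues)
print([round(c, 1) for c in cum])
```

Reading the third entry of the result gives the share of variation captured by the first three components, which is how the 97.8% figure in the report is interpreted.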
Multivariate Distance
The basic concept of distance in several dimensions relates to the
correlation of the variables. For example, in a Multivariate scatterplot cell for Benzene by Chloroform (Figure 9.3),
HYDROQUINONE is located away from the point cloud. This
compound is not particularly unusual in either the x or y direction alone, but it is a two-dimensional outlier because of its
unusual distance from the strong linear relationship between the
two variables. The ellipse is a 95% density contour for a bivariate
normal distribution with the means, standard deviations, and
correlation estimated from the data. The concept of distance
that takes into account the multivariate normal density contours
is called Mahalanobis distance.
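The idea can be made concrete for two variables. The sketch below computes the Mahalanobis distance of each point from the mean using the sample covariance matrix, written out for the two-dimensional case. The data are invented: the last point is unremarkable in x and in y separately but lies off the linear trend, much like the compound described above.

```python
import math


def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix

    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form d' S^-1 d, with S^-1 = [[syy, -sxy], [-sxy, sxx]] / det.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


# Hypothetical data: eight points near a line plus one point off the trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 4.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 8.0, 7.0]
distances = mahalanobis_2d(xs, ys)
farthest = distances.index(max(distances))
print(farthest, round(distances[farthest], 2))
```

The same quadratic form extends to any number of dimensions by using the full inverse covariance matrix.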
Though only three dimensions can be visualized at a time, the
Mahalanobis distance can be calculated for any number of dimensions. To produce a plot of the Mahalanobis distance:
Select Outlier Analysis > Mahalanobis Distances from the menu accessed by the red triangle at
the top of the multivariate report.
Figure 9.7 shows the Mahalanobis distance by the row number for each data point. To label these
points:
Select the brush tool.
While holding down the Shift key, drag the brush over the points labeled in Figure 9.7. These are
the five points with the greatest Mahalanobis distances.
Select Rows > Label/Unlabel.
Chapter Summary
In this example, commands from the Analyze and Graph menus were used for data exploration to
locate and identify unusual points. The data were first examined in one dimension using the
Distribution command and then in two dimensions using the Multivariate command to look for
unusual points in histograms and scatterplots.
Next, the Principal Components command was used to plot three columns at a time. This technique
was used to summarize six dimensions and to plot principal component rays. The Principal Components table showed that the first three principal components accounted for more than 97% of the total
variation.
Finally, the Outlier Analysis command in the Multivariate report produced the Mahalanobis outlier
distance plot, which summarizes the points in six dimensions. The multivariate outliers were highlighted and labeled in this multi-dimensional space.
See the chapter "Correlations and Multivariate Techniques" in the JMP Statistics and Graphics Guide
for documentation and examples of multivariate analyses. "Three-Dimensional Scatterplots" in the
JMP Statistics and Graphics Guide documents the 3D plot.
Chapter 10
Multiple Regression
Examining Multiple Explanations
Multiple regression is the technique of fitting or predicting a response by a linear combination of several regressor variables. The fitting principle is like simple linear regression, but the space of the fit is in
three or more dimensions, making it more difficult to visualize. With multiple regressors, there are
more opportunities to model the data well, but the process is more complicated.
This chapter begins with an example of a two-regressor fit that includes three-dimensional graphics for
visualization. The example is then extended to include six regressors (but unfortunately no
seven-dimensional graphics to go with it).
Objectives
Illustrate the concept of a fitting plane using graphical techniques.
Combine data tables using the Concatenate command.
Explore a three-dimensional version of a leverage plot.
Contents
Aerobic Fitness Data
Fitting Plane
Fit Planes to Test Effects
Whole Model Tests
More and More Regressors
Interpreting Leverage Plots
Collinearity
Chapter Summary
Aerobic Fitness Data
Aerobic fitness can be evaluated using a special test that measures the oxygen uptake of a person while
running on a treadmill for a prescribed distance. However, it would be more economical to evaluate fitness with a formula that predicts oxygen uptake with simpler measurements.
To identify such an equation, runtime and fitness measurements were taken for 31 participants who ran
1.5 miles. The participants' ages were also recorded.
When you installed JMP, a folder named Sample Data was also installed. In that folder is a file
named Fitness.jmp. Open Fitness.jmp.
The data are shown in Figure 10.1. For purposes of illustration, certain values of MaxPulse and
RunPulse have been changed from data reported by Rawlings (1988, p. 124).
Investigate Age and Runtime as predictors of oxygen uptake using the Fit Model platform.
To examine a multiple regression model with two effects,
Highlight Oxy in the Select Columns list and click Y.
Highlight both Age and Runtime.
Click Add to specify them as model effects.
You should now see the completed dialog shown in Figure 10.2.
Figure 10.2 Completed Fit Model Dialog for Multiple Regression with Two Effects
Clicking the red triangle icon and selecting Save Columns displays a list of save commands. To save
predicted values and the prediction equation for this model:
Click the red triangle icon and select Save Columns > Prediction Formula.
This command creates a new column in the Fitness data table called Pred Formula Oxy. Its values are
the calculated predicted values for the model. To see the column's formula:
Right-click the Pred Formula Oxy column name and select Formula.
The Formula window opens and displays the formula
88.4356809 + -0.1509571 * Age + -3.1987736 * Runtime
This formula defines a plane of fit for Oxy as a function of Age and Runtime.
Click Cancel to close the window and return to the data table window.
Fitting Plane
JMP can show relationships between Oxy, Runtime, and Age in three dimensions with a surface plot.
Select Graph > Surface Plot.
Add Oxy and Pred Formula Oxy as Columns, and click OK.
Change the Style drop-down menu to Needles, and view the plot, as shown in Figure 10.4.
Figure 10.4 Initial View of the Surface Plot of Oxy, Age, and Runtime
Click and drag to rotate the plot so it looks like that in Figure 10.5.
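The prediction formula saved in the Pred Formula Oxy column (88.4356809 + -0.1509571 * Age + -3.1987736 * Runtime) can be evaluated directly to see how the plane of fit behaves. In the sketch below the coefficients are copied from that formula. The function name and the input values are hypothetical and are not rows of Fitness.jmp.

```python
def pred_formula_oxy(age, runtime):
    """Plane of fit for Oxy, using the coefficients reported in the Formula window."""
    return 88.4356809 + -0.1509571 * age + -3.1987736 * runtime


# Hypothetical inputs chosen only to illustrate the calculation.
baseline = pred_formula_oxy(age=40, runtime=10)
older = pred_formula_oxy(age=41, runtime=10)
slower = pred_formula_oxy(age=40, runtime=11)

print(round(baseline, 4))
# Each coefficient is the change in predicted Oxy per unit change in that
# regressor while the other regressor is held fixed.
print(round(older - baseline, 7), round(slower - baseline, 7))
```

Because the formula is linear in both regressors, the predicted surface is a flat plane whose tilt along each axis is given by the corresponding coefficient.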
Figure 10.5 Observed Points using Age, Oxy, and Runtime with the Predicted Plane of Fit
(Figure callouts: observed Oxy values; plane of predicted Oxy values.)
To compare this fitted line with the plane in the previous example,
Select Graph > Surface Plot.
Add Oxy, Pred Formula Oxy, and Pred Formula Oxy 2 as Columns and click OK.
In the Point Response column drop-down menu, select Oxy for both Pred Formula Oxy and Pred
Formula Oxy 2.
In the Style drop-down menu, select Needles for both Pred Formula Oxy and Pred Formula
Oxy 2.
In the Surface drop-down menu, select Both Sides for Pred Formula Oxy and Pred Formula
Oxy 2.
Both this grid and the one in Figure 10.5 represent least squares regression planes, but this plane has a
slope of zero in the orientation of the Runtime axis. Figure 10.6 shows the plot from an angle.
Figure 10.6 Three-Dimensional Plot with Regression Planes
(Figure callouts: observed values (x); Oxy by Age and Runtime regression plane; Oxy by Age regression plane.)
Open the Ranges outline by clicking the diamond-shaped disclosure button (Figure 10.8).
Click the Oxy button and change the minimum to 32, the maximum to 60, and the increment to 5.
Click the Age button and change the minimum to 32, the maximum to 60, and the increment to 5.
Your plot shows the bivariate regression plane edge-on and represents the linear combination of the
effects fit by the plane.
Look at the significance of each regressor with t-ratios in the Parameter Estimates table or F-ratios in
the Effect Tests table (see Figure 10.10). Because each effect has only one parameter, the F-ratios are
the squares of the t-ratios, and have the same significance probabilities.
The Age variable seems significant, but Weight does not. The Runtime variable seems highly significant. Both RunPulse and MaxPulse also seem significant, but MaxPulse is less significant than
RunPulse.
Figure 10.10 Statistical Tables for Multiple Regression
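The statement that a one-parameter F-ratio is the square of the t-ratio can be checked numerically. The sketch below does so for the simplest case, a regression with a single regressor, which is a simplification of the six-regressor model in this chapter. The data are invented and the function name is hypothetical.

```python
import math


def slope_t_and_f(x, y):
    """t-ratio for the slope and F-ratio for the model in a one-regressor fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    ss_error = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ms_error = ss_error / (n - 2)  # two estimated parameters

    t_ratio = slope / math.sqrt(ms_error / sxx)  # estimate / standard error
    ss_model = slope * sxy  # variation explained by the regressor (1 DF)
    f_ratio = ss_model / ms_error
    return t_ratio, f_ratio


# Hypothetical data with some scatter around a line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_ratio, f_ratio = slope_t_and_f(x, y)
print(math.isclose(t_ratio ** 2, f_ratio, rel_tol=1e-9))
```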
Figure 10.11 Leverage Plots for the Age and Weight Effects
The leverage plot for Runtime shows that Runtime is the most significant of all the regressors. The
Runtime leverage line and its confidence curves cross the horizontal mean at a steep angle.
The leverage plots for RunPulse and MaxPulse shown in Figure 10.12 are similar. Each is somewhat
shrunken on the x-axis. This indicates that other variables are related in a strong, linear fashion to these
two regressors, which means the two effects are strongly correlated with each other.
Figure 10.12 Leverage Plots for the RunPulse and MaxPulse Effects
Collinearity
When two or more regressors have a strong correlation, they are said to be collinear. These regression
points occupy a narrow band showing their linear relationship. When a plane is fit representing collinear regressors, the plane fits the points well in the direction where they are widely scattered. However, in the direction where the scatter is very narrow, the fit is weak and the plane is unstable.
In text reports, this phenomenon translates into high standard errors for the parameter estimates and
potentially high values for the parameter estimates themselves. This occurs because a small random
error in the narrow direction can have a huge effect on the slope of the corresponding fitting plane. An
indication of collinearity in leverage plots is when the points tend to collapse toward the center of the
plot in the x direction.
The Longley.jmp example shows collinearity geometrically in the strongly correlated regressors, X1 and
X2. To examine these regressors, see Figure 10.13, which shows rotated views of the regression
planes. They illustrate a regression of Y on X1, Y on X2, and Y on both. Most of the points are near the
intersection of the three planes. All three planes fit the data well, but their vastly different slopes show
that the fit is unstable.
Geometrically, collinearity between two regressors means that the points they represent do not spread
out in x space enough to provide stable support for a plane. Instead, the points cluster around the center, causing the plane to be unstable. The regressors act as substitutes for each other to define one direction redundantly. This is cured by dropping one of the collinear regressors from the model. In this case,
drop either X1 or X2 from the model, since both measure essentially the same thing.
Figure 10.13 Comparison of RunPulse and MaxPulse Effects
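One way to put a number on the instability described in this section is the variance inflation factor, a standard diagnostic that this guide does not itself introduce. For two regressors it is 1 / (1 - r²), where r is their correlation, and it measures how much the variance of a slope estimate grows because of collinearity. The sketch below uses invented values, not the Longley or Fitness data, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(x1, x2):
    """Variance inflation factor when the model contains exactly two regressors."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical regressors: the first pair is nearly collinear, the second is not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
collinear_vif = vif_two_regressors(x1, x2)

u = [1.0, 2.0, 3.0, 4.0]
v = [1.0, -1.0, -1.0, 1.0]
independent_vif = vif_two_regressors(u, v)

print(round(collinear_vif, 1), round(independent_vif, 1))
```

A factor near 1 means the regressors support the plane in independent directions. A very large factor corresponds to the narrow band of points described above, where dropping one regressor stabilizes the fit.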
Chapter Summary
Multiple regression uses the same fitting principle as simple regression, but accounting for significance
is more subtle. Each regressor opens a new dimension for fitting a hyperplane, and its significance is
tested by how much the fit suffers in its absence. When regressors correlate to each other, they are said
to be collinear, and they define directions where the fitting hyperplane is not well supported.
References
Becker, R.A., and Cleveland, W.S. (1987), Brushing Scatterplots, Technometrics, 29, 2.
Belsley, D.A., Kuh, E., and Welsch, R.E. (1980), Regression Diagnostics, New York: John Wiley & Sons.
Box, G.E.P., Hunter, W.G., and Hunter, J.S. (1978), Statistics for Experimenters, New York: John Wiley
& Sons, Inc.
Daniel, C. and Wood, F. (1980), Fitting Equations to Data, Revised Edition, New York: John Wiley &
Sons, Inc.
Draper, N. and Smith, H. (1981), Applied Regression Analysis, 2nd Edition, New York: John Wiley &
Sons, Inc.
Eppright, E.S., Fox, H.M., Fryer, B.A., Lamkin, G.H., Vivian, V.M., and Fuller, E.S. (1972),
Nutrition of Infants and Preschool Children in the North Central Region of the United States of
America, World Review of Nutrition and Dietetics, 14.
Eubank, R.L. (1988), Spline Smoothing and Nonparametric Regression, New York: Marcel Dekker.
Gabriel, K.R. (1982), Biplot, Encyclopedia of Statistical Sciences, Volume 1, Kotz and Johnson editors,
New York: John Wiley & Sons, Inc.
Hartigan, J.A. and Kleiner, B. (1981), Mosaics for Contingency Tables, Proceedings of the 13th
Symposium on the Interface between Computer Science and Statistics, W. F. Eddy editor, New York:
Springer.
Hawkins, D.M. (1974), The Detection of Errors in Multivariate Data Using Principal Components,
Journal of the American Statistical Association, 69.
Koehler, M.G., Grigorus, S., and Dunn, J.D. (1988), The Relationship Between Chemical Structure
and the Logarithm of the Partition Coefficient, Quantitative Structure Activity Relationships, 7.
Leven, J. R., Serlin, R. C., and Webne-Behrman, L. (1989), Analysis of Variance Through Simple
Correlation, American Statistician, 43.
Mosteller, F. and Tukey, J.W. (1977), Data Analysis and Regression, Reading Mass: Addison-Wesley.
Rawlings, J.O., Pantula, S.G., and Dickey, D.A. (1998), Applied Regression Analysis: A Research Tool, 2nd ed., New York, NY: Springer-Verlag New York Inc.
Sall, J. P. (1990), Leverage Plots for General Linear Hypotheses, American Statistician, 308-315.
SAS Institute (1987), SAS/Stat Guide for Personal Computers, Version 6 Edition, Cary NC: SAS Institute
Inc.
Snedecor, G.W. and Cochran, W.G. (1967), Statistical Methods, Ames Iowa: Iowa State University
Press.
Winer, B.J. (1971), Statistical Principles in Experimental Design, 2nd Edition, New York: McGraw-Hill,
Inc.
Index
JMP Introductory Guide
Symbols
? tool 3, 5
A
Add Columns 24
Add Rows 25
Add Statistics Column 35
B
bar chart 27
beginner's tutorial 3, 8
box plot 50
BP Study.jmp 25
C
C. Total 65
calculator example 53
Car Poll.jmp 71
categorical analysis see Fit Y by X, Fit Model
categorical data 69–79
categorical type 10
character column 10
Charts 26, 36
Chi-Square 77
classification variable 13
collinear 129
Column Info 25
column name 24
columns 9
Compare Means 57, 64, 66–67
Comparison Circles 64
confidence curves 101, 127
confidence interval 50, 62
construct formula 52–53
Construct Model Effects 100, 103
continuous 10, 15, 60, 83
Count 52
Cowboy Hat.jmp 16
create subset 52–54
crossed effects 100
Cum Prob 52
cursors in data table 10
curve-fitting 83
D
data grid 9
data table
create 21–30
density contour 116
designed experiment 99
DF 65, 87, 104
disclosure control 51
discrete data see nominal, ordinal
Display Options 14
DisplayBox scripting index 5
distance 116
Distribution 72, 111
example 14
documentation overview 7
double-arrow cursor 11
drug experiment example 21–30
E
enter data 26
Estimate 87
Example button 4
F
F Ratio 66, 87, 103-104
F Statistic 103
Factor role 13
factorial analysis example 97-107
fit by groups 93
Fit Line 85, 94
Fit Model 100
fit plane 124
Fit Polynomial 90
Fit Y by X 39, 59, 75
Fit Spline 91
Fitness.jmp 121
formula 52
  example 53
  prediction 107, 123
F-probability see Prob > F
Freq role 13
frequencies see Distribution
frequency table 15, 52
G
Graph menu 13
Group By 94-95
grouped charts see Charts
grouped fitting 93
grouping data 34, 37
grouping variable 13
Growth.jmp 83
H
hand cursor 12, 49
Help
  using online help 3, 5
high dimensionality 109
highlight see Select
histogram see Distribution
honestly significant difference see Tukey-Kramer HSD
hotdog example 31-43
Hotdogs.jmp 33
I
I-beam cursor 11
independent variable 13
Index tab on JMP Starter 4
interaction 99, 105
interquartile range 50
J
JMP Starter window 8
Journal 88, 92
journaling analysis results 88
JSL Operators menu item 5
L
Label/Unlabel 50, 111
lambda 91
large cross cursor 11
Launch button 4
Launch button in Help Index 4
Least Squares Means table 105-106
Level 52, 66
leverage plot 102, 126
Likelihood Ratio 76
linked table 35
local error 92
logistic regression see Fit Y by X, Fit Model
Longley.jmp 129
M
magnifier tool 29
Mahalanobis distance 116
main effect 99-100
Markers 88, 112
Mean 66
Mean of Response 65, 86, 105
Mean Square 66, 87, 104
Means Diamonds 50, 61-62
Means for Oneway Anova table 66
Means, Anova/t-test 61
median 50
menus
  tips 6
modeling type 10, 60
modify data table 71-74
moments see Distribution
mosaic plot see Distribution
multiple comparison see Compare Means
multiple regression 119, 121-130
N
New Column 52
  computing values 53
New Data Table 8
nominal 10, 15, 60, 75, 83
normal distribution 49, 51
notation used in manuals 7
Number 66
numeric column 10
O
object scripting index 5
Observations 65, 86, 105
Open Data Table 8
operators 5
ordinal 10, 15, 60, 83
outlier 50, 88, 109, 116
outlier box plot see Distribution
P
Parameter Estimates table 87
partitioning 103
pattern in data 109
Pearson Chi Square 76
percentile see quantile
plane fit 124
platforms 13
pointer cursor 12
Polynomial fit 90
post hoc see Compare Means
Prediction Formula 106
Prob 52
Prob > |t| 87
Prob > F 66, 87, 104
Q
Quantile Box Plot 50, 63
Quantile Box Plot see Distribution
question mark (?) tool 3, 5
Quick Reference Guide 7
R
R2 91
references 131
regression analysis 83
regression example 83
regression line 101
regression see Fit Y by X, Fit Model
Remove Fit 86, 91
report options 14
rescale axis 28
resize plot 89
Response role 13
role 13, 26, 75
Root Mean Square Error (RMSE) 65, 86, 105
rows 9
Rsquare 65, 86, 105
Rsquare Adj 65, 86, 105
Run Script
  button on JMP Starter 4
S
Save As 89
Save Predicteds 86
Save Prediction Formula 122
Script button 4
select rows and columns 12
selection tool 67
Shift-Tab 26
shortest half 50
Show Points 85
smoothing 91
solubility study 109-117
Solubility.jmp 111
Source 65, 87, 104
Spin Principal Components 115
spline 91
start JMP 23
statistical index 4
statistical summaries see Distribution
Std Error 66, 87
StdErrProb 52
Subset 52-55, 112
Sum of Squares 65, 87, 104
summarizing data 31-43
Summary 34
Summary of Fit table 65, 86, 91, 105
Multivariate 112
T
Tab 26
tension 91
Term 87
three-dimensional plots 119-130
tick mark 60
Tip of the day 7
Tools 29
Topic Help button 4
t-ratio 87
t-test 87
tutorial 3, 8
tutorial examples
  data table 21-30
  drug experiment 21-30
  exploratory study 109-117
  multiple regression 119-130
  popcorn experiment 97-107
  regression analysis 83
  summarizing data 31-43
  survey data 69-79
  typing study 57-67
tutorials
  learning JMP 3
Typing Data.jmp 59
typing study 57-67
W-Z
Weight role 13
weight-height ratio example 83
whiskers 50
Whole-model plot 101
X, Factor role 13
Y, Response role 13
Your Turn
We welcome your feedback.
If you have comments about this book, please send them to
yourturn@sas.com. Include the full title and page
numbers (if applicable).
support.sas.com/saspress
SAS Documentation
To successfully implement applications using SAS software, companies in every industry and on every
continent all turn to the one source for accurate, timely, and reliable information: SAS documentation. We
currently produce the following types of reference documentation: online help that is built into the software,
tutorials that are integrated into the product, reference documentation delivered in HTML and PDF, free on
the Web, and hard-copy books.
support.sas.com/publishing
support.sas.com/LE
Other brand and product names are trademarks of their respective companies. © 2008 SAS Institute Inc. All rights reserved. 474059US.0108