R Commands Good

This document provides an overview of common R commands for statistical modeling and analysis. It introduces commands for reading in data, descriptive statistics, linear modeling, resampling techniques, and graphical displays. Key commands covered include lm() for linear regression, summary() and anova() for model evaluation, and resample() for bootstrapping standard errors. The document is intended to help users remember basic R syntax for common statistical analyses.

Uploaded by

shubhang2392

R Commands for Introduction to Statistical Modeling

Daniel Kaplan, October 5, 2008

This sheet is intended to help you remember R commands and some of the ways they are used. It’s assumed that you already understand the statistics and purpose of the commands.
> marks the command you type.
+ marks the second line, if any, of the command.

Installing R
Download and execute the “binary” file appropriate for your operating system: Windows, Mac OS X, others.

Starting R
Datasets and convenience functions for the Introduction to Statistical Modeling course are contained in the “workspace” file ISM.Rdata. Double-click on that file to start a new session of R.
Functions defined in ISM.Rdata are: resample, shuffle, r.squared, do.

Reading in Spreadsheet/Tabular Data
A data table (called a “data frame” in R) is organized into cases and variables.
• Data from the ISM course
Relevant operators: ISMdata. This takes a file name (in quotes) and returns a data frame.
> kids = ISMdata("kidsfeet.csv")
> runners = ISMdata("ten-mile-race.csv")
• Your own data
Store your data in a spreadsheet in CSV format. There should be a header row. After that, each row is one case, each column is one variable.
Relevant operators: read.csv
> mydata = read.csv("fish.csv")

Simple Descriptive Statistics
For describing data one variable at a time.
Relevant operators: mean, sd, median, IQR, summary, quantile, table, prop.table.
There are two basic styles when selecting a variable from a data frame: using with or using the $ reference syntax:
> with( kids, mean(width))
[1] 8.992308
> mean( kids$width )
[1] 8.992308
Either way is fine. I encourage the $ method.
• Quantitative Variables
> sd( kids$width )
[1] 0.5095843
> median( kids$width )
[1] 9
> IQR( kids$width )
[1] 0.7
> quantile( kids$width, 0.60 )
 60%
9.08
> summary( kids$width )
 Min 1st Qu. Median Mean 3rd Qu.  Max
7.90    8.65   9.00 8.99    9.35 9.80
• Categorical Variables
Count the number of cases at each level:
> table( kids$sex )
 B  G
20 19
Convert the count to a proportion.
> prop.table(table( kids$sex ))
        B         G
0.5128205 0.4871795

Linear Modeling
Constructing linear models.
Relevant operators: lm, r.squared, summary, anova
Fit a model. All of these three styles are equivalent, but I recommend the first one:
> mod = lm( width ~ length + sex, data=kids)
or
> mod = with(kids, lm( width ~ length + sex))
or even
> mod = lm(kids$width~kids$length+kids$sex)
Display the coefficients:
> mod
Coefficients:
(Intercept)  length    sexG
      3.641   0.221  -0.233
• R-squared
> r.squared(mod)
[1] 0.45954
• Regression table including standard errors:
> summary(mod)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.6412     1.2506    2.91   0.0061
length        0.2210     0.0497    4.45    8e-05
sexG         -0.2325     0.1293   -1.80   0.0806
• ANOVA table
> anova(mod)
Analysis of Variance Table
Response: width
            Df Sum Sq Mean Sq  F value  Pr(>F)
(Intercept)  1   3154    3154 21287.88 < 2e-16
length       1      4       4    27.38 7.4e-06
sex          1   0.48    0.48     3.23    0.08
Residuals   36      5    0.15

Resampling
Relevant operators: resample, shuffle, do
• Bootstrapping a Standard Error
The standard error reflects variability due to sampling.
> with( kids, mean(width) )
[1] 8.9923
> trials = do(500)*
+ with( resample(kids), mean(width) )
> sd(trials)
[1] 0.076145
For a model:
> lm( width~length+sex, data=kids)
(Intercept)  length    sexG
      3.641   0.221  -0.233
> trials = do(500)*
+ lm( width~length+sex, data=resample(kids))
> sd(trials)
(Intercept)   length     sexG
   0.939525 0.036822 0.120385
• Hypothesis Testing
Implement the null hypothesis that sex is not related to width, but length might be:
> trials = do(500)*
+ lm( width~length+shuffle(sex), data=kids)
> mean(trials)
(Intercept)    length shuffle(sex)G
  2.8575581 0.2480365     0.0051818
> sd(trials)
(Intercept)    length shuffle(sex)G
  0.2228825 0.0085814     0.1326187

Graphics
• Scatter Plot
> xyplot( width ~ length, data=kids)
[Figure: scatter plot of width (8.0 to 9.5) against length (22 to 27)]
Try: xyplot(width~length|sex,data=kids)
• Box and Whiskers Plot
> bwplot( width ~ sex, data=kids)
[Figure: box-and-whisker plots of width (8.0 to 9.5) for sex B and G]
• Histograms
> histogram( ~ age, data=runners)
Don’t forget the leading tilde (~).
[Figure: histogram of age (20 to 80); vertical axis is Percent of Total (0 to 25)]
Extra: Try also
> histogram( ~ age | sex, data=runners)
• Bar Charts
> barchart( table(kids$sex), horizontal=FALSE)
[Figure: bar chart of Freq (0 to 20) for sex B and G]

Simulations
Simulations generate data from a hypothetical causal network.
Relevant function: run.sim
Available hypothetical causal networks include: campaign.spending, jock, university.test, heights, electro, aspirin, salaries, etc.
To see the variables, type the name of the hypothetical causal network:
> campaign.spending
Causal Network with 4 vars:
==============================
popularity is exogenous
polls <== popularity
spending <== polls
vote <== popularity & spending
• Observational Study
> run.sim(campaign.spending, 4)
  popularity  polls spending   vote
1     44.141 46.875   59.484 42.780
2     25.577 30.991   46.322 18.655
3     47.376 46.616   45.275 52.207
4     48.393 50.003   56.839 54.389
• Experimental Study
Two types of experiments are supported.
Create the experimental intervention of the desired length:
> intervene = rep( c(0,100), length.out = 5)
1) Impose the intervention directly.
> run.sim( campaign.spending, 4,
+ spending=intervene)
  popularity  polls spending   vote
1     49.931 50.071        0 48.009
2     66.392 69.752      100 74.800
3     73.079 71.553        0 58.396
4     51.752 53.136      100 57.155
2) Add the intervention on top of the “natural” values:
> run.sim( campaign.spending, 4,
+ spending=intervene, inject=TRUE)
  popularity  polls spending   vote
1     54.129 52.972   59.853 67.638
2     70.974 65.608  126.752 85.580
3     75.549 71.873   34.934 67.062
4     50.391 48.954  160.389 74.577
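
The run.sim function and the campaign.spending network ship with the course files, so they are not available in a plain R session. The block below is a rough base-R sketch of the same idea, not the course implementation: the function name sim_campaign, every coefficient, and every noise level are invented for illustration. Only the arrow structure (popularity exogenous, polls from popularity, spending from polls, vote from popularity and spending) and the two intervention styles come from the sheet.

```r
# Hypothetical stand-in for run.sim(campaign.spending, n).
# All numeric constants below are assumptions made up for this sketch.
sim_campaign <- function(n, spending = NULL, inject = FALSE) {
  popularity <- rnorm(n, mean = 50, sd = 15)     # popularity is exogenous
  polls      <- popularity + rnorm(n, sd = 3)    # polls <== popularity
  natural    <- 100 - polls + rnorm(n, sd = 5)   # spending <== polls
  if (is.null(spending)) {
    spend <- natural                             # observational study
  } else {
    spend <- rep(spending, length.out = n)       # recycle intervention to n cases
    if (inject) spend <- natural + spend         # add on top of the "natural" values
  }
  # vote <== popularity & spending
  vote <- 0.8 * popularity + 0.1 * spend + rnorm(n, sd = 4)
  data.frame(popularity, polls, spending = spend, vote)
}

set.seed(1)
intervene <- rep(c(0, 100), length.out = 4)

observed <- sim_campaign(4)                                       # observational
imposed  <- sim_campaign(4, spending = intervene)                 # 1) impose directly
injected <- sim_campaign(4, spending = intervene, inject = TRUE)  # 2) inject on top

print(imposed)
```

With the intervention imposed directly, the spending column holds exactly the intervention values; with inject=TRUE it holds the simulated natural spending plus the intervention.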

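The resample, shuffle, do, and r.squared helpers used in the Linear Modeling and Resampling sections are defined in ISM.Rdata. For a session without that file, the same steps can be approximated with base R alone. This is a minimal sketch under stated assumptions: the kids data frame here is made-up data shaped like kidsfeet.csv (39 cases, 20 B and 19 G), resample_rows is a hypothetical helper, and the numbers it produces will not match the output shown in the sheet.

```r
# Made-up data standing in for kidsfeet.csv (an assumption, not the real file).
set.seed(42)
n <- 39
kids <- data.frame(
  length = rnorm(n, mean = 24.7, sd = 1.3),
  sex    = factor(rep(c("B", "G"), length.out = n))   # 20 B, 19 G
)
kids$width <- 3.6 + 0.22 * kids$length - 0.23 * (kids$sex == "G") +
  rnorm(n, sd = 0.4)

# Hypothetical stand-in for resample(): draw rows with replacement.
resample_rows <- function(df) df[sample(nrow(df), replace = TRUE), , drop = FALSE]

# R-squared without r.squared(): read it from the model summary.
mod <- lm(width ~ length + sex, data = kids)
rsq <- summary(mod)$r.squared

# Bootstrap standard error of the mean, replacing do(500) * ... with replicate().
boot_means <- replicate(500, mean(resample_rows(kids)$width))
se_mean    <- sd(boot_means)

# Bootstrap standard errors of the model coefficients (3 x 500 matrix).
boot_coefs <- replicate(500,
  coef(lm(width ~ length + sex, data = resample_rows(kids))))
se_coefs   <- apply(boot_coefs, 1, sd)

# Null hypothesis that sex is unrelated to width: sample() without
# replacement permutes sex, playing the role of shuffle(sex).
null_coefs <- replicate(500,
  coef(lm(width ~ length + sample(sex), data = kids)))
null_mean  <- rowMeans(null_coefs)
null_sd    <- apply(null_coefs, 1, sd)

print(se_mean)
print(se_coefs)
```

The observed sexG coefficient from mod can then be compared against the spread of the shuffled coefficients in null_coefs to judge how unusual it is under the null hypothesis.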