This guide introduces SAS users to R by providing examples that make use of the tidyverse collection of packages. It covers importing and manipulating data frames, conditional filtering, combining datasets, counting and summarizing data, sorting, and dealing with strings. Examples are provided for common tasks like creating new variables, conditional editing, plotting, and more. Keyboard shortcuts for some R operations like assignment and piping are also included.

SAS <-> R :: CHEAT SHEET

Introduction

This guide aims to familiarise SAS users with R.
R examples make use of the tidyverse collection of packages.
Install tidyverse: install.packages("tidyverse")
Attach tidyverse packages for use: library(tidyverse)
R data here is held in 'data frames', and occasionally vectors (via c( )).
Other R structures (lists, matrices…) are not explored here.
Keyboard shortcuts (RStudio): <- is Alt + - ; %>% is Ctrl + Shift + M


Datasets; drop, keep & rename variables

SAS:
    data new_data;
      set old_data;
    run;
R:
    new_data <- old_data

SAS:
    data new_data (keep=id);
      set old_data (drop=job_title);
    run;
R:
    new_data <- old_data %>%
      select(-job_title) %>%
      select(id)

SAS:
    data new_data (drop=temp:);
      set old_data;
    run;
R:
    new_data <- old_data %>%
      select(-starts_with("temp"))
Cf. contains( ), ends_with( )

SAS:
    data new_data;
      set old_data;
      rename old_name = new_name;
    run;
R:
    new_data <- old_data %>%
      rename(new_name = old_name)
Note order differs.


Conditional filtering

SAS:
    data new_data;
      set old_data;
      if Sex = "M";
    run;
R:
    new_data <- old_data %>%
      filter(Sex == "M")

SAS:
    data new_data;
      set old_data;
      if year in (2010,2011,2012);
    run;
R:
    new_data <- old_data %>%
      filter(year %in% c(2010,2011,2012))

SAS:
    data new_data;
      set old_data;
      by id;
      if first.id;
    run;
R:
    new_data <- old_data %>%
      group_by(id) %>%
      slice(1)
Could use slice(n( )) for last.

SAS:
    data new_data;
      set old_data;
      if dob > "25APR1990"d;
    run;
R:
    new_data <- old_data %>%
      filter(dob > as.Date("1990-04-25"))


New variables, conditional editing

SAS:
    data new_data;
      set old_data;
      total_income = wages + benefits;
    run;
R:
    new_data <- old_data %>%
      mutate(total_income = wages + benefits)

SAS:
    data new_data;
      set old_data;
      if hours > 30 then full_time = "Y";
      else full_time = "N";
    run;
R:
    new_data <- old_data %>%
      mutate(full_time = if_else(hours > 30, "Y", "N"))

SAS:
    data new_data;
      set old_data;
      if temp > 20 then weather = "Warm";
      else if temp > 10 then weather = "Mild";
      else weather = "Cold";
    run;
R:
    new_data <- old_data %>%
      mutate(weather = case_when(
        temp > 20 ~ "Warm",
        temp > 10 ~ "Mild",
        TRUE ~ "Cold"))


Counting and Summarising

SAS:
    proc freq data = old_data;
      table job_type;
    run;
R:
    old_data %>%
      count(job_type)
For percent, add:
    %>% mutate(percent = n*100/sum(n))

SAS:
    proc freq data = old_data;
      table job_type*region;
    run;
R:
    old_data %>%
      count(job_type, region)

SAS:
    proc summary data = old_data nway;
      class job_type region;
      output out = new_data;
    run;
R:
    new_data <- old_data %>%
      group_by(job_type, region) %>%
      summarise(Count = n())
An equivalent of the SAS output without nway is not trivially produced in R.

SAS:
    proc summary data = old_data nway;
      class job_type region;
      var salary;
      output out = new_data
        sum(salary) = total_salaries;
    run;
R:
    new_data <- old_data %>%
      group_by(job_type, region) %>%
      summarise(total_salaries = sum(salary),
                Count = n())
Lots of summary functions in both languages.
Swap summarise( ) for mutate( ) to add summary data to original data.


Combining datasets

SAS:
    data new_data;
      set data_1 data_2;
    run;
R:
    new_data <- bind_rows(data_1, data_2)
Cf. rbind( ), which produces an error if columns are not identical.

SAS:
    data new_data;
      merge data_1 (in=in_1) data_2;
      by id;
      if in_1;
    run;
R:
    new_data <- left_join(data_1, data_2, by = "id")
Cf. full_join( ), right_join( ), inner_join( )


Some plotting in R

    ggplot(my_data, aes(year, sales)) +
      geom_point() + geom_line()

    ggplot(my_data, aes(year, sales)) +
      geom_point() + geom_line() + ylim(0, 40) +
      labs(x = "", y = "Sales per year")

    ggplot(my_data, aes(year, sales, colour = dept)) +
      geom_point() + geom_line()

    ggplot(my_data, aes(year, sales, fill = dept)) +
      geom_col()
Note 'colour' for lines & points, 'fill' for shapes.

    ggplot(my_data, aes(year, sales, fill = dept)) +
      geom_col(position = "dodge") + coord_flip()
Cf. position = "fill" for 100% stacked bars/cols.
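The note under Counting and Summarising about swapping summarise( ) for mutate( ) can be sketched as follows. This is only an illustration: the data frame and its values are invented here, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
old_data <- tibble(
  job_type = c("Nurse", "Nurse", "Clerk"),
  region   = c("East", "East", "West"),
  salary   = c(30000, 34000, 25000)
)

# summarise() collapses the data to one row per job_type/region group
group_totals <- old_data %>%
  group_by(job_type, region) %>%
  summarise(total_salaries = sum(salary), Count = n(), .groups = "drop")

# mutate() keeps every original row and repeats the group value on each row
with_totals <- old_data %>%
  group_by(job_type, region) %>%
  mutate(total_salaries = sum(salary), Count = n()) %>%
  ungroup()

nrow(group_totals)  # one row per group
nrow(with_totals)   # same number of rows as old_data
```

The mutate( ) form is roughly what a SAS user would get by merging the proc summary output back onto the original dataset by the class variables.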

CC BY SA Brendan O’Dowd • brendanjodowd@gmail.com • Updated 2021-09


Sorting and Row-Wise Operations

SAS:
    proc sort data=old_data out=new_data;
      by id descending income;
    run;
R:
    new_data <- old_data %>%
      arrange(id, desc(income))

SAS:
    proc sort data=old_data nodup;
      by id job_type;
    run;
R:
    old_data <- old_data %>%
      arrange(id, job_type) %>%
      distinct()
Note nodup relies on adjacency of duplicate rows, distinct( ) does not.

SAS:
    proc sort data=old_data nodupkey;
      by id;
    run;
R:
    old_data <- old_data %>%
      arrange(id) %>%
      group_by(id) %>%
      slice(1)

SAS:
    data new_data;
      set old_data;
      by id descending income;
      if first.id;
    run;
R:
    new_data <- old_data %>%
      group_by(id) %>%
      slice(which.max(income))
Cf. which.min( )
Swap to preserve duplicate maxima: … slice_max(income)
Alternatively: … filter(income == max(income))

SAS:
    data new_data;
      set old_data;
      prev_id = lag(id);
    run;
R:
    new_data <- old_data %>%
      mutate(prev_id = lag(id, 1))
Cf. lead( ) for subsequent rows.

SAS:
    data new_data;
      set old_data;
      by id;
      counter + 1;
      if first.id then counter = 1;
    run;
R:
    new_data <- old_data %>%
      group_by(id) %>%
      mutate(counter = row_number())


Converting and Rounding

SAS:
    data new_data;
      set old_data;
      num_var = input("5", 8.);
      text_var = put(5, 8.);
    run;
R:
    new_data <- old_data %>%
      mutate(num_var = as.numeric("5")) %>%
      mutate(text_var = as.character(5))

SAS:
    data new_data;
      set old_data;
      nearest_5 = round(x, 5);
      two_decimals = round(x, 0.01);
    run;
R:
    new_data <- old_data %>%
      mutate(nearest_5 = round(x/5)*5) %>%
      mutate(two_decimals = round(x, digits = 2))


Creating functions to modify datasets

SAS:
    %macro add_variable(dataset_name);
      data &dataset_name;
        set &dataset_name;
        new_variable = 1;
      run;
    %mend;
    %add_variable(my_data);
R:
    add_variable <- function(dataset_name){
      dataset_name <- dataset_name %>%
        mutate(new_variable = 1)
      return(dataset_name)
    }
    my_data <- add_variable(my_data)
Note SAS can modify within the macro, whereas R creates a copy within the function.


Dealing with strings

SAS:
    data new_data;
      set old_data;
      if find(job_title, "Health");
    run;
R:
    new_data <- old_data %>%
      filter(str_detect(job_title, "Health"))

SAS:
    data new_data;
      set old_data;
      if job_title =: "Health";
    run;
R:
    new_data <- old_data %>%
      filter(str_detect(job_title, "^Health"))
Use ^ for start of string, $ for end of string, e.g. "Health$"

SAS:
    data new_data;
      set old_data;
      substring = substr(big_string, 3, 4);
    run;
R:
    new_data <- old_data %>%
      mutate(substring = str_sub(big_string, 3, 6))
Returns characters 3 to 6. Note SAS uses <start>, <length>, R uses <start>, <end>.

SAS:
    data new_data;
      set old_data;
      address = tranwrd(address, "Street", "St");
    run;
R:
    new_data <- old_data %>%
      mutate(address = str_replace_all(address, "Street", "St"))
Cf. str_replace( ) for first instance of pattern only.

SAS:
    data new_data;
      set old_data;
      full_name = catx(" ", first_name, surname);
    run;
R:
    new_data <- old_data %>%
      mutate(full_name = str_c(first_name, surname, sep = " "))
Drop sep = " " for equivalent to cats( ) in SAS.

SAS:
    data new_data;
      set old_data;
      first_word = scan(sentence, 1);
    run;
R:
    new_data <- old_data %>%
      mutate(first_word = word(sentence, 1))
R example preserves punctuation at the end of words, SAS doesn't.

SAS:
    data new_data;
      set old_data;
      house_number = compress(address, , "dk");
    run;
R:
    new_data <- old_data %>%
      mutate(house_number = str_extract(address, "\\d+"))
Wide range of regexps in both languages; this example extracts digits only.


File operations

SAS: Operate in the 'Work' library. Use libname to define file locations.
R: Operate in a particular 'working directory' (identify using getwd( )). Move to other locations using setwd( ).

SAS:
    libname library_name "file_location";
    data library_name.saved_data;
      set data_in_use;
    run;
R:
    save(data_in_use, file = "file_location/saved_data.rda")
or
    setwd("file_location")
    save(data_in_use, file = "saved_data.rda")

SAS:
    libname library_name "file_location";
    data data_in_use;
      set library_name.saved_data;
    run;
R:
    load("file_location/saved_data.rda")
or
    setwd("file_location")
    load("saved_data.rda")
save( ) can store multiple data frames in a single .rda file, load( ) will restore all of these.

SAS:
    proc import datafile = "my_file.csv"
      out = my_data dbms = csv;
    run;
R:
    my_data <- read_csv("my_file.csv")
Both examples assume column headers in csv file.
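The note that R creates a copy within the function can be seen in a small sketch like the one below. The three-row my_data is made up for illustration; add_variable( ) is the function from Creating functions to modify datasets.

```r
library(dplyr)

add_variable <- function(dataset_name) {
  dataset_name %>% mutate(new_variable = 1)
}

my_data <- tibble(id = 1:3)  # hypothetical data for illustration

# Calling the function without assigning the result leaves my_data unchanged
unused <- add_variable(my_data)
before <- "new_variable" %in% names(my_data)  # FALSE

# Assigning the result back is what actually updates my_data
my_data <- add_variable(my_data)
after <- "new_variable" %in% names(my_data)   # TRUE
```

This is why the R example ends with my_data <- add_variable( my_data ), whereas the SAS macro call alone is enough.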
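The <start>, <length> versus <start>, <end> difference noted under Dealing with strings is easy to trip over when translating SAS code. A minimal sketch of one way to bridge it is shown below; sas_substr( ) is a hypothetical helper written for this example and is not part of stringr, and the string value is made up.

```r
library(stringr)

# Hypothetical helper (not part of stringr): accepts SAS-style <start>, <length>
sas_substr <- function(string, start, len) {
  str_sub(string, start, start + len - 1)
}

big_string <- "ABCDEFGHIJ"  # made-up value for illustration

r_style   <- str_sub(big_string, 3, 6)     # characters 3 to 6: "CDEF"
sas_style <- sas_substr(big_string, 3, 4)  # 4 characters from position 3: "CDEF"

identical(r_style, sas_style)
```

The conversion is simply end = start + length - 1, so the SAS call substr( big_string , 3 , 4 ) and the R call str_sub( big_string , 3 , 6 ) pick out the same characters.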
