SAS Tutorial: Presented By: Shashi Kumar
R: R is the open-source counterpart of SAS and has traditionally been used in academia and research. Because it is open source,
the latest techniques are released quickly. A lot of documentation is available on the internet, and it is a very
cost-effective option.
Python: Python started as an open-source scripting language, and its usage has grown over time. Today it offers libraries
(numpy, scipy and matplotlib) and functions for almost any statistical operation or model building you may want to do. Since the
introduction of pandas, it has become very strong at operations on structured data.
Secondary Windows
1. Results: Lists the procedures that were submitted and executed successfully.
2. Explorer: Provides easy navigation to SAS libraries, the window system, My Computer, etc.
The DATA step consists of a group of SAS statements that begins with a DATA statement. The DATA statement
begins the process of building a SAS data set and names the data set. The statements that make up the DATA step
are compiled, and the syntax is checked.
1. SAS Statement: A SAS statement begins with a SAS keyword and ends with a semicolon (;).
Properties:-
1.1 A single SAS statement can be written across multiple lines.
1.2 Multiple SAS statements can be written on a single line.
1.3 Words are separated by one or more blanks.
2. Step Boundary: A step ends at a step boundary, i.e. a RUN or QUIT statement, or the beginning of a new DATA or PROC step.
3. SAS Step: A SAS step is a group of SAS statements that begins with the keyword DATA or PROC and ends with a step boundary.
Presented By : Shashi Kumar
YouTube Channel : https://lnkd.in/fNSUTDE
Topic: Data Step, Step Boundary, Statement
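The three rules above can be seen in one short program. This is a minimal sketch that assumes only the sashelp.class sample table shipped with SAS; the output name work.teens and the variable ratio are hypothetical.

```sas
/* DATA step: begins with the DATA keyword */
data work.teens;
   set sashelp.class;          /* each statement ends with a semicolon */
   where age >= 13;
   ratio = weight
           / height;           /* one statement written across two lines */
run;                           /* step boundary: ends the DATA step */

/* PROC step: begins with the PROC keyword; several statements on one line */
proc print data=work.teens; var name age ratio; run;
```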
Numeric vs. Character Variables
Property         Numeric                                  Character
Values           digits 0-9, integers, decimal numbers    any characters: letters, digits, special characters and " " (blank)
Alignment        right-aligned                            left-aligned
Missing value    . (period)                               " " (blank)
Default length   8 bytes                                  8 bytes (when read with list input)
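A small sketch of how the two types behave, using hypothetical inline data. With list input, a single period stands for a missing value in both numeric and character fields, but SAS stores it as . for the numeric variable and as a blank for the character one.

```sas
/* Hypothetical data: score is numeric, grade is character ($) */
data work.missing_demo;
   input id score grade $;
   datalines;
1 85 A
2 . B
3 70 .
;
run;

/* score shows . for id 2 (right-aligned); grade is blank for id 3 (left-aligned) */
proc print data=work.missing_demo; run;
```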
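Valid and invalid SAS variable names (a name must start with a letter or an underscore, may contain only letters, digits and underscores, can be up to 32 characters long, and is not case-sensitive):
test       valid
test123    valid
_test      valid
@test      invalid (starts with @)
4test      invalid (starts with a digit)
Test&      invalid (contains &)
TEST       valid (refers to the same variable as test)
_12test    valid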
Library :
• A SAS library is simply a collection of SAS files stored in the same folder or directory on your computer. Other files can be stored in the same folder or
directory, but only files with SAS file extensions are recognized as part of the SAS library.
• Depending on your needs, a SAS library is one of two types:
1. Temporary: Work. All content is deleted after the SAS session ends.
2. Permanent: Sashelp, or any library assigned with a libref. Content is retained even after the SAS session ends.
Dataset one is stored in the temporary library “Work” (all content deleted after the SAS session). Dataset two is stored in the permanent library “sk” (content retained even after the SAS session).
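The two data sets described above could be created as follows. This is a sketch: the libref sk comes from the text, but the folder path in the LIBNAME statement is a placeholder that must be replaced with an existing directory.

```sas
/* Assign the permanent library sk (placeholder path: use an existing folder) */
libname sk "C:\SAS\mydata";

/* One-level name: stored in the temporary WORK library, deleted at session end */
data one;
   set sashelp.class;
run;

/* Two-level name libref.dataset: stored in sk, kept after the session ends */
data sk.two;
   set sashelp.class;
run;
```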
Proc Print :-
Syntax:-
proc print data=input-dataset <options>; run;
Options:-
NOOBS:- Suppresses the Obs (observation number) column in the PROC PRINT output.
DOUBLE:- Double-spaces the listing output, inserting a blank line between observations.
OBS=n:- A data set option (in parentheses after the data set name) that prints only the first n observations.
Examples:-
proc print data=sashelp.class; run;                /* all variables and observations */
proc print data=sashelp.class; var age sex; run;   /* only age and sex, in that order */
proc print data=sashelp.class; id age; run;        /* age replaces the Obs column */
proc print data=sashelp.class double; run;         /* double-spaced output */
proc print data=sashelp.class (obs=10); run;       /* first 10 observations only */
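The options can be combined in one step. A minimal sketch on sashelp.class: OBS= limits the rows, NOOBS drops the Obs column, and VAR picks the columns.

```sas
proc print data=sashelp.class (obs=5) noobs;   /* first 5 rows, no Obs column */
   var name sex age;                            /* only these columns, in this order */
run;
```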
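Proc Contents:-
Syntax:-
proc contents data=input-dataset <options>; run;
It displays the descriptor portion of a data set. By default, the result is shown in three parts:
1. Attributes
2. Engine/Host Information
3. List of Variables (alphabetical)
Examples:-
proc contents data=sashelp.class; run;          /* default three-part report */
proc contents data=sashelp.class varnum; run;   /* variables listed in creation order */
proc contents data=sashelp._all_; run;          /* every member of the Sashelp library */
proc contents data=sashelp._all_ nods; run;     /* library directory only; NODS applies with _ALL_ */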
Before moving to the PDV section, let us explore how SAS processes a DATA step in sequence.
SAS processes each step in two phases:
1. Compilation Phase
2. Execution Phase
➢ In the compilation phase, SAS checks the syntax of the submitted code. If there is a syntax error, SAS stops further processing.
At the end of the compilation phase (i.e. once the syntax is checked and good to go), SAS creates
the descriptor portion of the data set.
In short, the compilation phase:
1. Checks for syntax errors
2. Creates the PDV
3. Creates the descriptor portion
Descriptor Information:
Descriptor information is the information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for
example, the name of the data set and its member type, the date and time the data set was created, and the number, names and data types
(character or numeric) of the variables.
Execution Phase:
• The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_
automatic variable is incremented by 1.
• SAS sets the newly created program variables to missing in the program data vector (PDV).
• SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data
vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
• SAS executes any subsequent programming statements for the current record.
• At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system
automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing
in the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing
here.
• SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current
observation.
• The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.
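In short, each iteration follows three steps:
1. The DATA statement starts the iteration, and new variables are initialized to missing.
2. The SET statement reads one observation into the PDV.
3. An implicit OUTPUT writes the observation, and an implicit RETURN goes back to the top of the DATA step.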
Program Data Vector (PDV):
The PDV is a logical area in memory where SAS builds a data set one observation at a time. SAS creates two automatic temporary variables in the PDV; they are not written to the output data set.
1. _N_ : The number of times the DATA step has iterated. When one observation is read per iteration, it equals the current observation number.
2. _ERROR_ : Indicates whether a data error occurred in the current record. It has two values:
a) 0 : No error in the record.
b) 1 : One or more errors in the record. It does not indicate how many errors there are.
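Program:-
data one;   /* A: each iteration starts here; new variables are set to missing */
set test;   /* B: reads one observation into the PDV */
run;        /* C: implicit OUTPUT and RETURN */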
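The reset-to-missing behaviour of the PDV can be observed with PUTLOG. A minimal sketch; the data set pdv_demo and the variable flag are hypothetical. Because flag is created by an assignment statement, it is reset to missing at the top of every iteration, so only the first observation keeps 'Y'. _N_ and _ERROR_ appear in the log but are not written to work.pdv_demo.

```sas
data work.pdv_demo;
   set sashelp.class (obs=3);        /* read the first 3 observations only */
   if _n_ = 1 then flag = 'Y';       /* assigned on the first iteration only */
   putlog _n_= _error_= name= flag=; /* write the PDV values to the log */
run;
```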
IF/Keep/Drop Statements
data six1;
   set sashelp.class;
   if age < 15 then salary = age*1000;
   else if age = 15 then salary = age*2000;
   else salary = age*3000;     /* age greater than 15 */
run;
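The heading above also mentions KEEP and DROP, which the example does not show. A minimal sketch (the data set names six2 and six3 are hypothetical): KEEP= as a data set option lists the variables to write, while a DROP statement lists the ones to leave out.

```sas
/* KEEP= data set option: only name, age and salary are written to six2 */
data six2 (keep=name age salary);
   set sashelp.class;
   if age < 15 then salary = age*1000;
   else if age = 15 then salary = age*2000;
   else salary = age*3000;
run;

/* DROP statement: every variable except height and weight is written */
data six3;
   set sashelp.class;
   drop height weight;
run;
```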
IF-THEN-DO Statement :
IF condition/expression THEN DO;
   statement1;
   statement2;
END;
-----------------------------------------------
IF condition/expression THEN DO;
   statement1;
   statement2;
END;
ELSE IF condition/expression THEN DO;
   statement1;
   statement2;
END;
[Flowchart: Condition is True: Statement1, Statement2 execute; Condition is False: the ELSE IF condition is checked.]
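A concrete version of the second form, sketched on sashelp.class with hypothetical variables group and bonus. The LENGTH statement matters here: without it, group would take its length (4) from the first assignment, 'Male', and 'Female' would be stored as 'Fema'.

```sas
data do_demo;
   set sashelp.class;
   length group $ 6;            /* long enough for 'Female' */
   if sex = 'M' then do;
      group = 'Male';
      bonus = age * 100;
   end;
   else if sex = 'F' then do;
      group = 'Female';
      bonus = age * 200;
   end;
run;
```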
Proc Sort:
Caution: If you don't specify OUT=, the input (parent) data set is overwritten by the sorted data.
proc sort data=three out=five nodup dupout=six;
by _all_; run;
➢ NODUP keeps the first observation and removes the following ones when an entire row is a duplicate of the previous row. Sorting BY _ALL_ places identical rows next to each other.
➢ DUPOUT=six saves the removed duplicate observations.
proc sort data=three out=seven nouniquekey uniqueout=eight;
by eid; run;
➢ NOUNIQUEKEY eliminates observations whose BY-variable value is unique, keeping only observations with duplicate key values.
➢ The unique observations eliminated from the output can be saved in another data set by using the UNIQUEOUT= option.
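To try both sorts, the input data set three can be created with a few hypothetical rows. In this sketch, five keeps 4 rows and six receives the 1 exact duplicate; seven keeps the 4 rows whose eid occurs more than once, and eight receives the 1 row with a unique eid (103).

```sas
/* Hypothetical input: one exact duplicate row (101 Amit); eid 103 occurs once */
data three;
   input eid name $;
   datalines;
101 Amit
102 Bina
101 Amit
103 Chen
102 Dev
;
run;

proc sort data=three out=five nodup dupout=six;
   by _all_;
run;

proc sort data=three out=seven nouniquekey uniqueout=eight;
   by eid;
run;
```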
Combining SAS Datasets
Vertically: Append (PROC APPEND), Concatenation, Interleaving
Horizontally: Merge Statement, SQL Join, Hash object
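data combined;          /* dat1 and dat2 must both be sorted by ID first */
   merge dat1 dat2;
   by ID;
run;
proc sql;               /* full join: same result as the merge for one-to-one matches; no prior sort needed */
   create table combined2 as
   select coalesce(dat1.id, dat2.id) as id, info, info2
   from dat1 full join dat2 on dat1.ID = dat2.ID;
quit;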
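The vertical methods listed above can be sketched with two hypothetical monthly data sets that share the same variables. Concatenation stacks them in order, interleaving (SET with BY) keeps the combined rows sorted by id, and PROC APPEND adds the rows of feb to the end of jan in place.

```sas
data jan;
   input id sales;
   datalines;
1 100
3 300
;
run;

data feb;
   input id sales;
   datalines;
2 200
4 400
;
run;

/* Concatenation: all jan rows, then all feb rows (ids 1, 3, 2, 4) */
data both;
   set jan feb;
run;

/* Interleaving: inputs already sorted by id, result ordered 1, 2, 3, 4 */
data inter;
   set jan feb;
   by id;
run;

/* PROC APPEND: feb rows are added to jan itself; no new data set is created */
proc append base=jan data=feb;
run;
```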
Working with Date/Time/DateTime :
Format: Used to convert standard (stored) data into non-standard (display) form. Syntax: format variable-name <$>formatW.d;
Informat: Used to convert non-standard data into standard data.
W : Total width
d : Number of decimal places
$ : Indicates a character format
10/09/08 : Non-standard :::: Informat :::: MMDDYY8.
$40,000  : Non-standard :::: Informat :::: DOLLAR7.
74,000   : Non-standard :::: Informat :::: COMMA6.
34555    : Standard     :::: Format   :::: ?? (COMMA6., MMDDYY8., DATE9., ...)
i.e., format doj date9.;
(A format can be either SAS built-in or user-defined with PROC FORMAT.)
Format examples:
Format      Value   Result
HOUR.       53132   15
DDMMYY10.   18703   17/03/2011
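The informat and format conversions above can be checked with INPUT (text to value) and PUT (value to text). A minimal sketch; the variable names are hypothetical apart from doj, which the text uses.

```sas
data convert_demo;
   doj    = input('10/09/08', mmddyy8.);   /* text -> SAS date (9 Oct 2008; 2-digit year resolved by YEARCUTOFF=) */
   salary = input('$40,000', dollar7.);    /* -> 40000 */
   bonus  = input('74,000', comma6.);      /* -> 74000 */
   doj_text   = put(doj, date9.);          /* -> '09OCT2008' */
   bonus_text = put(bonus, comma6.);       /* -> '74,000' */
   dd_text    = put(18703, ddmmyy10.);     /* -> '17/03/2011', as in the table */
   format doj date9.;                      /* display doj as 09OCT2008 */
run;
```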
4. MONTH :- Extracts the month from a SAS date and returns a number from 1 to 12.
(January=1, February=2, March=3, April=4 ... December=12)
5. YEAR :- Extracts the year from a SAS date and returns a 4-digit year.
6. QTR :- Extracts the quarter from a SAS date and returns a number from 1 to 4.
(January-March=1, April-June=2, July-September=3, October-December=4)
7. MDY :- Returns a SAS date value from numeric month, day and year values.
Function   Argument        Result
WEEKDAY    13FEB2019       4 (Wednesday; Sunday=1)
MONTH      13FEB2019       2
QTR        13FEB2019       1
MDY        (02,13,2019)*   21593 (days from 1 Jan 1960)
*Arguments must be numeric.
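The table values can be reproduced with a date literal. A minimal sketch; the variable names are hypothetical.

```sas
data date_demo;
   d  = '13FEB2019'd;       /* date literal: stored as days since 01JAN1960 */
   wd = weekday(d);         /* 4 (Wednesday; Sunday=1) */
   m  = month(d);           /* 2 */
   q  = qtr(d);             /* 1 */
   y  = year(d);            /* 2019 */
   d2 = mdy(2, 13, 2019);   /* 21593, the same value as d */
   put d= wd= m= q= y= d2=; /* write the values to the log */
run;
```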
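Date/Time Functions and Formats :
data test;
   set sashelp.air;                              /* DATE holds a SAS date value */
   A = today();                                  /* current date */
   B = day(date);                                /* day of the month, 1-31 */
   C = weekday(date);                            /* 1=Sunday ... 7=Saturday */
   D = month(date);                              /* 1-12 */
   E = qtr(date);                                /* 1-4 */
   F = year(date);                               /* 4-digit year */
   G = MDY(1,1,1960);                            /* 0: day zero of the SAS calendar */
   H = MDY(D,B,F);                               /* rebuilds the same date as DATE */
   I = MDY(month(date),1,year('08JAN1960'd));    /* 1st day of DATE's month, in 1960 */
   J = date;
   format A G H date9. J weekdate24.;            /* J is displayed with the weekday name */
run;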
Character and Numeric Function Reference:
Trim(variable)                         * Removes trailing blanks
Strip(variable)                        * Removes leading and trailing blanks
Left(variable)                         * Left-aligns a character string
Right(variable)                        * Right-aligns a character string
Lowcase(variable)                      * Converts to lowercase (Upcase converts to uppercase)
Propcase(variable)                     * Converts the first letter of each word to uppercase and the remaining letters to lowercase
Length(variable)                       * Returns the length of the string, not counting trailing blanks
Char(variable,pos)                     * Returns the single character at the specified position in a character string
Round(variable<,unit>)                 * Rounds to the nearest multiple of unit (default 1)
Nmiss(variable/vector)                 * Counts missing values across a row (numeric variables only)
Cmiss(variable/vector)                 * Counts missing values across a row (both character and numeric variables)
N(variable/vector)                     * Counts non-missing numeric values
Scan(string,n,delimiters)              * Returns the nth word of a character value
Find(string,substring,modifiers,start) * Searches a string for the substring and returns the position of its first occurrence (0 if not found)
Cat(string1,string2,...stringn)        * Concatenates without removing leading or trailing blanks
Catt(string1,string2,...stringn)       * Removes trailing blanks before concatenating
Cats(string1,string2,...stringn)       * Removes leading and trailing blanks before concatenating
Catx(delimiter,string1,...stringn)     * Removes leading and trailing blanks and inserts the delimiter between strings
Tranwrd(source,target,replacement)     * Replaces every occurrence of target in the source string with replacement
Compress(source,characters,modifiers)  * Removes the characters listed in the characters argument from the source
Compbl(string)                         * Translates each run of two or more consecutive blanks into a single blank
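Several functions in this list are not used in the examples that follow (COMPBL, PROPCASE, COMPRESS, N, NMISS, CMISS). A minimal sketch with hypothetical values:

```sas
data func_demo;
   s = 'shashi   kumar';
   a = compbl(s);                          /* 'shashi kumar' */
   b = propcase(a);                        /* 'Shashi Kumar' */
   c = compress('(123) 456-789', '()- ');  /* '123456789' */
   x = 10; y = .; z = 3;
   blank = ' ';
   n_count   = n(x, y, z);                 /* 2 non-missing */
   nmiss_cnt = nmiss(x, y, z);             /* 1 missing (numeric arguments only) */
   cmiss_cnt = cmiss(x, y, blank);         /* 2 missing: y and blank */
run;
```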
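Character and Numeric Functions:
data test;
   x = ' Ram ';
   y = ' Sita ';
   A = trim(x);          /* removes trailing blanks */
   B = strip(x);         /* removes leading and trailing blanks */
   C = left(x);          /* 'Ram  ': leading blank moved to the end */
   D = right(x);         /* '  Ram': trailing blank moved to the front */
   E = upcase(x);
   F = lowcase(x);
   g = length(x);        /* 4: the leading blank counts, the trailing blank does not */
   H = x!!y;             /* !! is the concatenation operator, same as || */
   I = length(H);
   J = char(x,2);        /* 'R' */
   K = length(x);
   L = length(D);        /* 5 */
run;
proc contents data=test; run;   /* runs after the DATA step: a PROC statement is a step boundary */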
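data test2;
   infile datalines;
   input num;
   A = int(num);         /* integer part */
   B = ceil(num);        /* smallest integer >= num */
   C = floor(num);       /* largest integer <= num */
   D = round(num);       /* nearest integer */
   E = round(num,5);     /* nearest multiple of 5 */
   F = round(num,11);    /* nearest multiple of 11 */
   G = round(num,.33);   /* nearest multiple of 0.33 */
   datalines;
10
15
32.5
79
37.9
-10
-23.6
;
run;
• If num is positive, int(num) = floor(num).
• If num is negative, int(num) = ceil(num).
• Round: rounds to the nearest multiple of the second argument.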
Character and Numeric Functions:
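data test3;
   name = 'Shashi Kumar';
   x = substr(name,1,2);          /* 'Sh' */
   y = substr(name,3,2);          /* 'as' */
   z = substr(name,1,7);          /* 'Shashi ' (7th character is the blank) */
   A = substr('Mohan',3,1);       /* 'h' */
   B = substr(upcase(name),3,4);  /* 'ASHI' */
run;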
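data test4;
   A = 'Today is FRIDAY';
   B = scan(A,2);              /* 'is' */
   C = scan(A,1,'a');          /* 'Tod': 'a' is the only delimiter */
   D = scan(A,-1);             /* 'FRIDAY': negative n counts from the right */
   E = find(A,'a');            /* 4 */
   F = find(A,'A');            /* 14: the search is case-sensitive */
   G = find(A,'a',7);          /* 0: no lowercase 'a' from position 7 on */
   H = find(A,'a','i',7);      /* 14: 'i' modifier ignores case */
   I = ' Ram ';
   J = ' Sita ';
   K = cat(I,J,I);             /* all blanks kept */
   L = catt(I,J,I);            /* ' Ram Sita Ram' */
   M = cats(I,J,I);            /* 'RamSitaRam' */
   N = catx('/',I,J,I);        /* 'Ram/Sita/Ram' */
   O = catx('0',I,J,I);        /* 'Ram0Sita0Ram' */
   P = tranwrd(N,'/','*');     /* 'Ram*Sita*Ram' */
   Q = tranwrd(N,'Ram','Sita');/* 'Sita/Sita/Sita' */
   R = '$500';
   S = 0;
   T = put(S,date9.);          /* numeric -> character: '01JAN1960' */
   U = input(R,dollar4.);      /* character -> numeric: 500 */
run;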
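Proc Freq:-
Options can be placed in the TABLES statement after a slash (/) to suppress the display of default statistics:
1. NOCUM (cumulative frequencies and percentages)
2. NOPERCENT (percentages)
3. NOFREQ (frequency counts)
4. NOROW (row percentages)
5. NOCOL (column percentages)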
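A minimal sketch on sashelp.class. The OUT= option (not covered above) saves the one-way counts to a data set; the name work.sex_freq is hypothetical.

```sas
proc freq data=sashelp.class;
   tables sex / nocum out=work.sex_freq;   /* one-way table without cumulative columns */
   tables sex*age / norow nocol nopercent; /* crosstab showing only the frequency counts */
run;
```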
Proc Means:-
It provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations.
1. The MEANS procedure produces a summary report that displays descriptive statistics.
2. The VAR statement specifies the analysis variables and their order in the results.
3. The CLASS statement identifies the variables whose values define subgroups for the analysis.
4. By default, PROC MEANS reports N, Mean, Std Dev, Minimum and Maximum.
Note: Without a VAR statement, PROC MEANS analyzes all numeric variables in the data set.
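A minimal sketch on sashelp.class showing the VAR and CLASS statements together. The OUTPUT statement (not described above) saves the means to a hypothetical data set work.class_stats, with one overall row (_TYPE_=0) plus one row per value of sex.

```sas
proc means data=sashelp.class;            /* default: N, Mean, Std Dev, Minimum, Maximum */
   class sex;                             /* statistics for each value of sex */
   var height weight;                     /* analysis variables, in this order */
   output out=work.class_stats mean=;     /* means saved under the original variable names */
run;
```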
e.g.:
length name $ 5;
length age 3;
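A minimal sketch of the LENGTH statement above (the data set name and values are hypothetical). The statement has to come before the variables are first used; a character value longer than the declared length is truncated.

```sas
data length_demo;
   length name $ 5 age 3;     /* name: 5 characters; age: 3-byte numeric */
   name = 'Shashi Kumar';     /* stored as 'Shash' */
   age  = 30;
run;
```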