SAS Tutorial

Presented By : Shashi Kumar

YouTube Channel : https://lnkd.in/fNSUTDE
Classification: Genpact Internal
What is SAS, and Why SAS?
SAS ("Statistical Analysis System") is a software suite developed by SAS Institute for advanced analytics, multivariate analysis, business intelligence, data management, and predictive analytics.

Here is a brief description of the three ecosystems:

SAS: SAS has long been the market leader in the commercial analytics space. The software offers a huge array of statistical functions, has good GUIs (Enterprise Guide and Enterprise Miner) that help people learn quickly, and comes with strong technical support. However, it ends up being the most expensive option, and it is currently being updated with the latest trends, such as SAS Viya (R and Python integration).

R: R is the open-source counterpart of SAS and has traditionally been used in academia and research. Because of its open-source nature, the latest techniques get released quickly. There is a lot of documentation available on the internet, and it is a very cost-effective option.

Python: Python originated as an open-source scripting language, and its usage has grown over time. Today it has libraries (NumPy, SciPy, and matplotlib) and functions for almost any statistical operation or model building you may want to do. Since the introduction of pandas, it has become very strong at operations on structured data.

Data Integration (DI)
Visual Analytics (VA)
Enterprise Guide (EG)
Customer Intelligence (CI)

Foundation of SAS
SAS :-
It is a highly flexible and integrated software environment that is used to access, manipulate, manage, analyze, and report on data.

SAS Session

Primary Windows
1. Output: Contains the reports of the procedures that have been submitted and executed.
2. Log: Provides information about SAS program execution.
   a. Note (blue): informational messages, such as the number of observations and the data set names.
   b. Warning (green): execution continues.
   c. Error (red): depending on the error, execution stops or continues.
3. Editor: The place where a SAS program is written, edited, and submitted for execution.

Secondary Windows
1. Results: Contains the list of procedures that were submitted and executed successfully.
2. Explorer: Provides easy navigation to SAS libraries, the window system, My Computer, etc.

Topic: DATA Step, Step Boundary, Statement

The DATA step consists of a group of SAS statements that begins with a DATA statement. The DATA statement begins the process of building a SAS data set and names the data set. The statements that make up the DATA step are compiled, and the syntax is checked.

1. SAS Statement: A SAS statement begins with a SAS identifying keyword and ends with a semicolon (;).
Properties:-
1.1 A single SAS statement can be written across multiple lines.
1.2 Multiple SAS statements can be written on a single line.
1.3 Words are separated by one or more blanks.
2. Step Boundary: A step ends at a SAS identifying keyword such as RUN or QUIT, or at the beginning of a new step.
3. SAS Step: A combination of SAS statements. A SAS step begins with an identifying keyword (DATA or PROC) and ends at a step boundary.

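
The statement and step-boundary rules above can be seen in a short program. This is only an illustrative sketch: the data set name work.demo and the variable ratio are made up for the example, while sashelp.class is the sample data set used elsewhere in this tutorial.

```sas
/* A DATA step: it starts at the DATA keyword and ends at the RUN step boundary. */
data work.demo;
   set sashelp.class;
   ratio = weight
           / height;   /* one statement written across two lines */
run;

/* Two statements on one line. PROC starts a new step; RUN ends it. */
proc print data=work.demo; run;
```

Each statement ends at its semicolon, so line breaks and extra blanks do not change how SAS reads the program.
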
Topic: SAS Variable

Property             | Numeric                                  | Character
Values               | Digits 0-9: integers and decimal numbers | Any character values: letters, digits, special characters, and " " (blank)
Alignment            | Right aligned                            | Left aligned
Missing value        | Shown as . (dot)                         | Shown as " " (blank)
Default length       | 8 bytes                                  | 8 bytes (for example with list input)
Capacity at 8 bytes  | About 15-16 significant digits           | 8 characters
Minimum length       | 3 bytes                                  | 1 byte
Maximum length       | 8 bytes                                  | 32767 bytes

Topic: SAS Naming Convention for Variable and Data set
1. A name can hold a maximum of 32 characters.
2. A name must begin with a letter or _ (underscore); from the second character onward it can contain letters, digits, or _ (underscore).
3. Special characters are not permitted, except _ (underscore).
4. SAS names are case insensitive; they may be written in upper, lower, or proper case.

Examples:
test     : valid
test123  : valid
_test    : valid
@test    : invalid (begins with a special character)
4test    : invalid (begins with a digit)
Test&    : invalid (contains a special character)
TEST     : valid
_12test  : valid

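
One way to check candidate names against these rules is the NVALID function, which returns 1 for a valid name and 0 otherwise. The sketch below is an illustration rather than part of the original slides; it assumes the 'V7' naming rules, which correspond to the convention listed above, and the variable names name and valid are arbitrary.

```sas
data _null_;
   length name $ 32;
   /* Loop over the example names and test each one. */
   do name = 'test', 'test123', '_test', '@test', '4test', 'Test&', 'TEST', '_12test';
      valid = nvalid(strip(name), 'v7');   /* 1 = valid SAS name, 0 = invalid */
      put name= valid=;
   end;
run;
```

The results are written to the log, one line per name.
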
Library :
• A SAS library is simply a collection of SAS files that are stored in the same folder or directory on your computer. Other files can be stored in the same folder or directory, but only the files that have SAS file extensions are recognized as part of the SAS library.
• Depending on your need, a SAS library is one of two types:

Temporary: Work. All content is deleted after the SAS session.
Permanent: Sashelp or a user-defined libref. Content is retained even after the SAS session.

A library can contain: Dataset, View, Catalog, SAS pgm file.

Library : Temporary to permanent
Syntax:
Libname <libref> <engine> "path";
e.g. Libname sk "D:\Users\703215742\Desktop\SAS CLASS";
This LIBNAME statement specifies sk as a reference to a SAS library.

In the statement below, EXCEL names the engine that supports the connection to the .XLSX file type:

Libname sk EXCEL "D:\Users\703215742\Desktop\SAS CLASS\file.xlsx";

Libname sk "D:\Users\703215742\Desktop\SAS CLASS\file.xlsx";

Access Teradata in SAS:

libname sk teradata user="userid" password="password" mode=teradata server="servername" connection=global dbmstemp=yes;

<libref> : - Naming Convention

1. A libref can hold a maximum of 8 characters.
2. A libref must begin with a letter or _ (underscore); from the second character onward it can contain letters, digits, or _ (underscore).
3. Special characters are not permitted, except _ (underscore).
4. SAS names are case insensitive; they may be written in upper, lower, or proper case.

Library : Temporary to Permanent

Data one;
Set sashelp.class;
Run;

Dataset one is stored in the temporary library "Work". (All content is deleted after the SAS session.)

Libname sk "D:\Users\7032xxxxx\Desktop\SAS CLASS";

Data sk.two;
Set sashelp.class;
Run;

Dataset two is stored in the permanent library "sk". (Content is retained even after the SAS session.)

Proc Print :-
Syntax:-
proc print data=input-data <options>;
Var var1 var2 ... varn;
Id var1 var2 ... varn;
run;

Var: Selects the variables to display in the output window, and their order.

Id: Suppresses the observation (Obs) column; the ID variable becomes the first column in the output window.

Options:-
Noobs:- Suppresses the observation column in the PROC PRINT output.
Double:- Adds a blank line between observations.
Obs=n:- A data set option that returns the first n observations of the data set.

Examples:-
proc print data=sashelp.class; run;
proc print data=sashelp.class; var age sex; run;
proc print data=sashelp.class; id age; run;
proc print data=sashelp.class double; run;
proc print data=sashelp.class (obs=10); run;
Proc Contents:-

Syntax:-
proc contents data=input-data <options>; run;
It displays the descriptor portion of the data set. By default the result is shown in three parts:
1. Attributes
2. Engine/Host
3. List of Variables

Examples:-
proc contents data=sashelp.class; run;
proc contents data=sashelp.class varnum; run;
proc contents data=sashelp._all_; run;
proc contents data=sashelp._all_ nods; run;

Note: the NODS option is used together with _all_; it lists the members of the library without the details of each data set.

Exercise:-
1. Create a permanent library.

Libname sk "D:\Users\7032xxxxx\Desktop\SAS CLASS";

Data one;
Set sashelp.class;
Run;

Data sk.two;
Set sashelp.class;
Run;

2. proc print data=sashelp.class; run;

3. proc contents data=sashelp.class; run;

PDV : Program Data Vector

Before moving on to the PDV, let us explore how SAS actually processes a DATA step.
SAS processes the submitted code in two phases:
1. Compilation Phase
2. Execution Phase

➢ In the compilation phase, SAS checks the syntax of the submitted code. If there is a syntax error, SAS stops further processing. At the end of the compilation phase (that is, once the syntax has been checked and found valid), SAS creates the descriptor portion of the data set.

Compilation Phase :

1. Check for syntax errors
2. Create the PDV
3. Create the descriptor portion

Compilation Phase:
When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable, and determines whether a variable type conversion is necessary for each subsequent reference to a variable. During the compile phase, SAS creates the following three items:
1. Input buffer :
Is a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement. Note that this buffer is created only when the DATA step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data directly into the program data vector.)
2. Program Data Vector (PDV) :
Is a logical area in memory where SAS builds the data set, one observation at a time; it is used for data manipulation. It contains two automatic temporary variables:
   1. _N_ : Represents the number of iterations done by the DATA step, where one iteration usually corresponds to one observation.
   2. _ERROR_ : Signals a data error in the current record. It has two values:
      a) 0 : No error in the current record.
      b) 1 : An error occurred in the current record. It does not indicate how many errors there are in the record.

3. Descriptor Information :
Is information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

Descriptor Portion :
• How do you see the descriptor portion of a data set in SAS? Use PROC CONTENTS, e.g. proc contents data=sashelp.class; run;

Execution Phase:

• The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_
automatic variable is incremented by 1.
• SAS sets the newly created program variables to missing in the program data vector (PDV).
• SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data
vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
• SAS executes any subsequent programming statements for the current record.
• At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system
automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing
in the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing
here.
• SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current
observation.
• The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.
In short, for a DATA step that reads a SAS data set:
1. SAS begins the DATA step and initializes the PDV with missing values.
2. The SET statement reads one observation.
3. An implicit output and an implicit return occur.

PDV (Program Data Vector) :

Input data set test:

Name   Age  Gender
Ram    21   M
Sita   20   F
Radha  19   F

Program (A, B, and C label the three statements):
A: Data one;
B: Set test;
C: run;

➢ 1. Read the DATA statement and initialize the PDV with missing values.

     Name   Age  Gender  _ERROR_  _N_
A           .            0        1

➢ 2. The SET statement reads one observation.

     Name   Age  Gender  _ERROR_  _N_
A           .            0        1
B    Ram    21   M       0        1

➢ 3. Implicit output: the observation is written to the output data set.

     Name   Age  Gender
C    Ram    21   M

➢ 4. Implicit return: control goes back to the top of the DATA step. Variables read with SET keep their values, and _N_ becomes 2.

     Name   Age  Gender  _ERROR_  _N_
A           .            0        1
B    Ram    21   M       0        1
A    Ram    21   M       0        2

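
To watch the PDV change from one iteration to the next, the contents can be written to the log with PUT _ALL_. The sketch below rebuilds the three-row test data set from the walk-through; the label text inside the PUT statements is only there to make the log easier to read.

```sas
/* Rebuild the small input data set used in the walk-through. */
data test;
   input Name $ Age Gender $;
   datalines;
Ram 21 M
Sita 20 F
Radha 19 F
;
run;

data one;
   put 'Before SET: ' _all_;   /* PDV at the top of the iteration */
   set test;
   put 'After SET:  ' _all_;   /* PDV after one observation is read */
run;
```

The log should show _N_ increasing by 1 on each iteration, and the values read by SET still present at the top of the next iteration. The step prints one extra "Before SET" line, because the DATA step ends when SET reaches the end of the input data set.
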
IF/Where/Keep/Drop Statement:

Where each one is specified:
• On input (SET statement): WHERE statement, KEEP=/DROP= data set options on the SET statement
• Inside the DATA step: IF, KEEP, and DROP statements
• On output (DATA statement): KEEP=/DROP= data set options on the DATA statement

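/* Copy every variable and observation. */
Data one;
set sashelp.class;
run;

/* KEEP statement: only Name and age are written to the output. */
Data two;
set sashelp.class;
Keep Name age;
run;

/* KEEP= on the SET statement: only Name and age are read. */
Data three;
set sashelp.class(keep=Name age);
run;

/* KEEP= on the DATA statement: only Name and age are written. */
Data four(keep=Name age);
set sashelp.class;
run;

/* DROP= on input and output, plus a DROP statement. */
Data four1(drop=sex);
set sashelp.class(Drop=Height Weight);
Salary= age*1000;
drop age;
run;

/* Subsetting IF: keep observations with age above 14. */
Data five;
set sashelp.class;
if age > 14;
run;

/* WHERE: the same filter, applied as observations are read. */
Data six;
Set sashelp.class;
where age > 14;
run;
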
IF-THEN / IF-THEN-ELSE Statement :

IF condition/expression THEN statement;

------------------------------------------------------
IF condition/expression THEN statement1;
ELSE IF condition/expression THEN statement2;
ELSE statement3;

(Flowchart on the slide: when the condition is true, the statement is executed; when it is false, the statement is skipped.)

Data six1;
Set sashelp.class;
if age=15 then salary=age*1000;
run;

Data six2;
Set sashelp.class;
if age < 15 then salary=age*1000;
else if age=15 then salary=age*2000;
else salary=age*3000;
run;

IF-THEN-DO Statement :
IF condition/expression THEN do;
statement1;
statement2;
End;
-----------------------------------------------
IF condition/expression THEN do;
statement1;
statement2;
End;
Else IF condition/expression THEN do;
statement1;
statement2;
End;

(Flowchart on the slide: when the condition is true, statement1 and statement2 are both executed; when it is false, both are skipped.)

Data six11;
Set sashelp.class;
if age=15 then do;
Salary=age*1000;
Bonus= age*10;
End;
run;

Sort / Order the dataset

You can also alter the default ascending order by placing DESCENDING before the variable name in the BY statement.

Caution: If you don't specify out=, the original (input) data set is replaced by the sorted version.

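
A minimal sketch of both points, using the sashelp.class sample data; the output data set name work.class_sorted is chosen only for this example.

```sas
/* Sort by sex (ascending, the default) and, within sex, by age in descending order. */
/* OUT= writes the sorted copy to a new data set, so the input is left unchanged.    */
proc sort data=sashelp.class out=work.class_sorted;
   by sex descending age;
run;

proc print data=work.class_sorted; run;
```

Note that DESCENDING applies only to the variable that immediately follows it.
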
nodupkey, nodup & nouniquekey options

(Data sets shown on the slide: Three, Threes, Four)

proc sort data=three out=threes;
by eid; run;

proc sort data=three out=four nodupkey;
by eid;
run;

nodupkey = Keeps the first observation for each value of the key variable (the variable present in the BY statement).

(Data sets shown on the slide: Five, Six)

proc sort data=three out=five nodup dupout=six;
by _all_; run;

nodup = Keeps the first observation when the entire row is duplicated. The removed duplicates can be saved in another data set with dupout=.

(Data sets shown on the slide: Seven, Eight)

proc sort data=three out=seven nouniquekey uniqueout=eight;
by eid; run;

➢ nouniquekey = Eliminates the records whose BY-variable value is unique, keeping only records with duplicated keys.
➢ The unique records removed from the output can be saved in another data set by using the uniqueout= option.

Combining SAS Dataset

Vertically:
• Append (PROC APPEND)
• Concatenation
• Interleaving

Horizontally:
• Multiple SET statements
• SQL join
• Hash object
• MERGE statement

Appending & Interleaving
• Appending joins two data sets vertically.
• List the data set names in a single SET statement to append them.
• Make sure the variable names are the same in each data set; otherwise you will get unwanted results.
• Interleaving combines individually sorted data sets into one sorted data set, ordered by the variable specified in the BY statement.

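
The difference between the two is easiest to see on a small example. The data sets a and b below are made up for illustration; both are already sorted by id, which interleaving requires.

```sas
data a;
   input id name $;
   datalines;
1 Ram
3 Radha
;
run;

data b;
   input id name $;
   datalines;
2 Sita
4 Mohan
;
run;

/* Appending: all rows of a, followed by all rows of b (ids 1, 3, 2, 4). */
data appended;
   set a b;
run;

/* Interleaving: rows from a and b combined in id order (ids 1, 2, 3, 4). */
data interleaved;
   set a b;
   by id;
run;
```

The only difference in the code is the BY statement; without it, SAS simply reads the data sets one after another.
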
Merging
What is needed for a successful merge?
• There should be at least one key variable; otherwise unwanted results will be produced.
• The key variable values should be sorted in both data sets.
• The key variable attributes (format, length, alignment) should be the same in both data sets.

Types of Matched Merging
❑ zero-to-one
❑ one-to-zero
❑ one-to-one
❑ one-to-many
❑ many-to-one
❑ few-to-many
❑ many-to-few
❑ many-to-many

Merging

Different Cases of Merging

• You can do it with the MERGE statement.
• You can also do it with the SET statement.
• Be very careful when the two tables have different numbers of records per key, i.e. in few-to-many or many-to-few cases.

Full Join

data M;
merge A (in = x) B (in = y);
by id;
if x = 1 or y = 1;
run;

Proc sql;
create table M2 as
select coalesce(a.id, b.id) as id, gender, sex
from A full join B on a.id = b.id;
quit;

Inner Join

data M;
merge A (in = x) B (in = y);
by id;
if x = 1 and y = 1;
run;

Proc sql;
create table M2 as
select coalesce(a.id, b.id) as id, gender, sex
from A inner join B on a.id = b.id;
quit;

Left Join

data M;
merge A (in = x) B (in = y);
by id;
if x = 1;
run;

Proc sql;
create table M2 as
select coalesce(a.id, b.id) as id, gender, sex
from A left join B on a.id = b.id;
quit;

Right Join

data M;
merge A (in = x) B (in = y);
by id;
if y = 1;
run;

Proc sql;
create table M2 as
select coalesce(a.id, b.id) as id, gender, sex
from A right join B on a.id = b.id;
quit;

Non Matching From A

data M;
merge A (in = x) B (in = y);
by id;
If x=1 and y = 0;
run;

Proc Sql;
create table Q2 as
select coalesce(a.id, b.id) as id, gender, sex
from A left join B on a.id = b.id
where b.id is null;
quit;

Non Matching From B

data M;
merge A (in = x) B (in = y);
by id;
If x=0 and y = 1;
run;

Proc Sql;
create table Q2 as
select coalesce(a.id, b.id) as id, gender, sex
from A right join B on a.id = b.id
where a.id is null;
quit;

Non Matching From A and B

data M;
merge A (in = x) B (in = y);
by id;
If (x=0 and y = 1) or (y=0 and x = 1);
run;

Proc Sql;
create table Q2 as
select coalesce(a.id, b.id) as id, gender, sex
from A full join B on a.id = b.id
where a.id is null or b.id is null;
quit;

A full join is needed here: a right join would never return the rows that exist only in A.

Many to Many

data combined;
merge dat1 dat2;
by ID;
run;

proc sql;
create table combined2 as
select coalesce(dat1.id, dat2.id) as id, info, info2
from dat1 full join dat2 on dat1.ID = dat2.ID;
quit;

Working with Date/Time/Date Time :

SAS date value
is a value that represents the number of days between January 1, 1960, and a specified date. SAS can perform calculations on dates ranging from A.D. 1582 to A.D. 19,900. Dates before January 1, 1960, are negative numbers; dates after January 1, 1960, are positive numbers.

SAS time value
is a value representing the number of seconds since midnight of the current day. SAS time values are between 0 and 86400.

SAS datetime value
is a value representing the number of seconds between January 1, 1960, and an hour/minute/second within a specified date.

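
These definitions can be checked by writing a few date, time, and datetime constants to the log without a format. This is a small illustrative sketch; the variable names are arbitrary.

```sas
data _null_;
   d0   = '01JAN1960'd;            /* 0: the reference date                      */
   d1   = '02JAN1960'd;            /* 1: one day after the reference date        */
   dneg = '31DEC1959'd;            /* -1: dates before 1960 are negative         */
   t    = '14:45:32't;             /* 53132: seconds since midnight              */
   dt   = '01JAN1960:00:01:00'dt;  /* 60: seconds since midnight, 1 January 1960 */
   put d0= d1= dneg= t= dt=;
run;
```

Because no format is applied, the log shows the underlying numbers that SAS stores.
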


Formats Date/Time/Date Time :

Format : Used to convert standard data to non-standard data (how a stored value is displayed).
Informat : Used to convert non-standard data into standard data (how a raw value is read).

Syntax: Format variable-name <$>formatW.d;
W : Total width
d : Number of decimal places
$ : Indicates a character format
e.g. Format Doj date9.;
(The format can be either a SAS built-in format or a user-defined format created with PROC FORMAT.)

10/09/08 : Non-standard : Informat : MMDDYY8.
$40,000  : Non-standard : Informat : Dollar7.
74,000   : Non-standard : Informat : Comma6.
34555    : Standard     : Format   : ?? (Comma6., MMDDYY8., date9., ...)

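
As a rough sketch of the two directions, the step below reads the three non-standard values with the informats listed above and then writes them back with formats. The variable names doj, pay, and sales are assumptions made for the example. How the two-digit year in 10/09/08 is interpreted depends on the YEARCUTOFF system option.

```sas
data _null_;
   /* Informats: non-standard text -> standard numeric values. */
   doj   = input('10/09/08', mmddyy8.);
   pay   = input('$40,000', dollar7.);
   sales = input('74,000', comma6.);
   put doj= pay= sales=;                              /* unformatted numbers */

   /* Formats: standard numeric values -> readable text. */
   put doj= date9. pay= dollar7. sales= comma6.;
run;
```

The first PUT shows the stored numbers (pay=40000, sales=74000, and a day count for doj); the second shows the same values displayed through formats.
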


Formats Date/Time/Date Time :

There are various date, time, and datetime formats; choose the format that fits your need. In transaction data in particular, the datetime format is often datetime22.

Date formats:

Format      Input   Output
Date7.      0       01JAN60
Date9.      0       01JAN1960
DDMMYY10.   0       01/01/1960
DDMMYY.     18703   17/03/11
DDMMYY10.   18703   17/03/2011
DDMMYYB.    18703   17 03 11
DDMMYYB10.  18703   17 03 2011

Time formats:

Format      Input   Output
HHMM.       53132   14:46
HOUR.       53132   15
MMSS.       53132   885
TIME.       53132   14:45:32
TOD.        53132   14:45:32

Date Functions :

1. Today :- Returns the current date as a SAS date value.
2. Day :- Extracts the day of the month from a SAS date and returns a number from 1 to 31.
3. Weekday :- Returns the day of the week from a SAS date as a number from 1 to 7.
(Sunday=1; Monday=2; Tuesday=3; Wednesday=4; Thursday=5; Friday=6; Saturday=7)
4. Month :- Extracts the month from a SAS date and returns a number from 1 to 12.
(January=1; February=2; March=3; April=4; ... December=12)
5. Year :- Extracts the year from a SAS date and returns the 4-digit year.
6. Qtr :- Extracts the quarter from a SAS date and returns a number from 1 to 4.
(Jan-Mar=1; Apr-Jun=2; Jul-Sep=3; Oct-Dec=4)
7. MDY :- Returns a SAS date value from numeric month, day, and year values.

Function  Input                                       Output
Today     Current system date                         e.g. 21534 (when run on 16DEC2018)
Day       13FEB2019                                   13
Weekday   13FEB2019                                   4
Month     13FEB2019                                   2
Qtr       13FEB2019                                   1
Year      13FEB2019                                   2019
MDY       (02,13,2019) *arguments should be numeric   21593 (days from 1 Jan 1960)

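Date Functions : Example

data test;
set sashelp.air;
A=today();            /* current date */
B=day(date);          /* day of the month */
C=weekday(date);      /* 1 (Sunday) to 7 (Saturday) */
D=month(date);
E=qtr(date);
F=year(date);
G=MDY(1,1,1960);      /* SAS date value 0 */
H=MDY(D,B,F);         /* rebuilds date from its month, day, and year */
I=MDY(month(date),1,year('08JAN1960'd));   /* first day of the same month, in 1960 */
J=date;
format A G H date9. J weekdate24.;
run;
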
Character and Numeric Functions:

Trim(variable) * Removes trailing blanks
Strip(variable) * Removes leading and trailing blanks
Left(variable) * Left-aligns a character string
Right(variable) * Right-aligns a character string
Lowcase(variable) * Converts to lower case
Upcase(variable) * Converts to upper case
Propcase(variable) * Converts the first character of each word to upper case and the remaining characters to lower case
Length(variable) * Returns the length of a character string, excluding trailing blanks
Char(variable,pos) * Returns a single character from the specified position in a character string
Int(variable) * Returns the integer part of a number
Ceil(variable) * Rounds up to the next integer
Floor(variable) * Rounds down to the previous integer
Round(variable) * Rounds to the nearest integer, or to the nearest multiple of an optional second argument
Nmiss(variable list) * Counts missing values across a row, numeric variables only
Cmiss(variable list) * Counts missing values across a row, for both character and numeric variables
N(variable list) * Counts non-missing numeric values
Scan(string,n,delimiter) * Returns the nth word of a character value
Find(string,substring,modifier,start position) * Searches the string for the specified substring and returns the position of its first occurrence (0 if not found)
Cat(string1,string2,...,stringn) * Does not remove leading or trailing blanks before concatenating
Catt(string1,string2,...,stringn) * Removes trailing blanks before concatenating
Cats(string1,string2,...,stringn) * Removes leading and trailing blanks before concatenating
Catx(delimiter,string1,string2,...,stringn) * Removes leading and trailing blanks and adds the delimiter between strings
Tranwrd(source,target,replacement) * Searches and replaces within a character string
Input(source,informat) * Converts a character value to numeric using an informat
Put(source,format) * Converts a value to character using a format
Substr(string,start position,length) * Extracts characters from a string
Compress(source,characters,modifier) * Removes the characters listed in the characters argument from the source
Compbl(string) * Removes multiple blanks from a string by translating each occurrence of two or more consecutive blanks into a single blank

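Character and Numeric Functions: Example 1

data test;
x=' Ram ';
y=' Sita ';
A=trim(x);       /* trailing blanks removed */
B=strip(x);      /* leading and trailing blanks removed */
C=left(x);
D=right(x);
E=upcase(x);
F=lowcase(x);
g=length(x);     /* length excluding trailing blanks */
H=x!!y;          /* !! is the concatenation operator */
I=length(H);
J=char(x,2);     /* character at position 2 */
K=length(x);
L=length(D);
run;

proc contents data=test; run;
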
Character and Numeric Functions: Example 2

data test2;
infile datalines;
input num;
A=int(num);
B=ceil(num);
C=Floor(num);
D=round(num);
E=round(num,5);
F=round(num,11);
G=round(num,.33);
datalines;
10
15
32.5
79
37.9
-10
-23.6
;
run;

If num is positive, int(num) = floor(num).
If num is negative, int(num) = ceil(num).
Round: rounds to the nearest multiple of the second argument (to the nearest integer when the second argument is omitted).

Character and Numeric Functions: Example 3

data test3;
name='Shashi Kumar';
x=substr(name,1,2);             /* 'Sh' */
y=substr(name,3,2);             /* 'as' */
z=substr(name,1,7);             /* 'Shashi ' (includes the blank) */

A=substr('Mohan',3,1);          /* 'h' */
B=substr(upcase(name),3,4);     /* 'ASHI' */
run;

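Character and Numeric Functions: Example 4

data test4;
A='Today is FRIDAY';
B=scan(A,2);              /* second word: 'is' */
C=scan(A,1,'a');          /* first piece when 'a' is the delimiter: 'Tod' */
D=scan(A,-1);             /* last word: 'FRIDAY' */
E=find(A,'a');            /* 4: position of the first lowercase a */
F=find(A,'A');            /* 14: position of the first uppercase A */
G=find(A,'a',7);          /* 0: no lowercase a from position 7 onward */
H=find(A,'a','i',7);      /* 14: the i modifier ignores case */
I=' Ram ';
J=' Sita ';
K=cat(I,J,I);
L=catt(I,J,I);
M=cats(I,J,I);
N=catx('/',I,J,I);
O=catx('0',I,J,I);
P=tranwrd(N,'/','*');
Q=tranwrd(N,'Ram','Sita');
R='$500';
S=0;
T=put(S,date9.);          /* numeric to character: '01JAN1960' */
U=input(R,dollar4.);      /* character to numeric: 500 */
run;
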
Character and Numeric Functions: Compress

data test8;

string='StudySAS Blog! 17752 ';

string1=compress(string,'');       *Removes blanks. This is the default;
string2=compress(string,'','ak');  *Keeps only alphabetic characters;
string3=compress(string,'','d');   *Removes digits;
string4=compress(string,'','l');   *Removes lowercase letters;
string5=compress(string,'','u');   *Removes uppercase letters;
string6=compress(string,'S','k');  *Keeps only the specified character;
string7=compress(string,'!.','P'); *Removes punctuation marks;
string8=compress(string,'s','i');  *Removes the specified character, ignoring case;
string9=compress(string,'','a');   *Removes all alphabetic characters, upper and lower case;
string10=compress(string,'','s');  *Removes space characters;
string11=compress(string,'','kd'); *Keeps only digits;

run;

Descriptive Statistics : Proc Freq

Proc freq data=input-data <options>;
Tables var1 var2 ... varn;
Run;

1. The FREQ procedure produces one-way to n-way frequency tables.
2. The TABLES statement specifies the frequency tables and cross tabulations to produce.
3. An * between variables requests an n-way cross tabulation table.
4. A one-way frequency table shows the frequency, cumulative frequency, percentage, and cumulative percentage.
5. An n-way frequency table shows the frequency, row %, column %, and total %.
Note:- Without the TABLES statement, PROC FREQ produces a frequency table for each variable (character and numeric).

Options that can be placed in the TABLES statement after a / to suppress the display of default statistics:
1. Nocum
2. NoPercent
3. NoFreq
4. Norow
5. Nocol

Options that can be added in the TABLES statement after the / to control the output data set (created with OUT=):
1. Outcum : Includes the cumulative frequency and cumulative percentage in the output data set.
2. Outpct : Includes the column % and row % in the output data set.

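
A short sketch of these options on the sashelp.class sample data; the output data set names work.sex_counts and work.sex_age_counts are made up for the example.

```sas
proc freq data=sashelp.class;
   /* One-way table without the cumulative columns. */
   tables sex / nocum;

   /* Two-way table showing counts only. */
   tables sex*age / nopercent norow nocol;

   /* Save a one-way table, including cumulative statistics, to a data set. */
   tables sex / out=work.sex_counts outcum;

   /* Save a two-way table, including row and column percentages. */
   tables sex*age / out=work.sex_age_counts outpct;
run;
```

Several TABLES statements can appear in one PROC FREQ step; each produces its own table.
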


Descriptive Statistics : Proc means

Proc Means data=input-data <options>;
Var analysis-variables;
Class classification-variables;
Run;

It provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations.
1. The MEANS procedure produces a summary report that displays descriptive statistics.
2. The VAR statement specifies the analysis variables and their order in the results.
3. The CLASS statement identifies the variables whose values define the subgroups for the analysis.
4. By default, the MEANS procedure creates the report with N, mean, standard deviation, minimum, and maximum.
Note: Without the VAR statement, PROC MEANS analyzes all numeric variables in the data set.

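
For example, the following sketch summarizes two numeric variables of sashelp.class separately for each value of sex.

```sas
/* Default statistics (N, mean, std dev, min, max) for height and weight, by sex. */
proc means data=sashelp.class;
   var height weight;
   class sex;
run;
```

Dropping the CLASS statement gives one set of statistics for the whole data set; dropping the VAR statement analyzes every numeric variable.
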


Length/Label/Attrib Statement

1. Length var <$> length;

e.g.
length name $ 5;
Length age 3;

   1. The LENGTH statement defines the length of the variables.
   2. The length of a character variable must be defined before the variable is created in the PDV.

2. Label var1="label 1" var2="label 2" ... varn="label n";

   1. It assigns a descriptive label to the variable name.
   2. Any number of variables can be labeled in a single LABEL statement.
   3. A label can have up to 256 characters.
   4. Using a LABEL statement in the DATA step permanently associates the labels with the variables by storing them in the descriptor portion of the SAS data set.

3. Attrib variable-list attribute-list;

Associates a format, informat, label, and length with one or more variables.

Proc print Options:-

1. Label : By default, PROC PRINT shows the variable names in the output window. To print the labels instead, use the LABEL option in the PROC PRINT statement.
2. Split : Used to split a label across multiple lines.

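
A brief sketch tying these statements together. The data set work.class_labeled and the variables grade and joined are hypothetical names used only for this illustration.

```sas
data work.class_labeled;
   length grade $ 6;                  /* define the length before the variable is first used */
   set sashelp.class;
   if age < 13 then grade = 'Junior';
   else grade = 'Senior';
   label grade  = 'Age*group'
         height = 'Height*(inches)';
   /* ATTRIB sets several attributes of one variable in a single statement. */
   attrib joined length=8 format=date9. label='Joining*date';
   joined = today();
run;

/* SPLIT= prints the labels and breaks each one at the * character. */
proc print data=work.class_labeled(obs=5) split='*';
   var name grade height joined;
run;
```

Without the LENGTH statement, grade would take its length from the first value assigned to it in the program.
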
First dot and last dot (By grouping Processing) :

1. The BY statement in a DATA step enables SAS to process data in groups.
2. In a DATA step, the BY statement creates two temporary variables for each variable listed in the BY statement: FIRST.variable and LAST.variable.
3. The FIRST. variable has a value of 1 for the first observation in the BY group; otherwise it equals 0.
4. The LAST. variable has a value of 1 for the last observation in the BY group; otherwise it equals 0.

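
The sketch below shows the two temporary variables for one BY variable. Because FIRST. and LAST. variables are not written to the output data set, they are copied into ordinary variables here; the names is_first and is_last are arbitrary.

```sas
/* BY-group processing needs the data sorted by the BY variable. */
proc sort data=sashelp.class out=work.class_by_sex;
   by sex;
run;

data work.flags;
   set work.class_by_sex;
   by sex;
   is_first = first.sex;   /* 1 on the first observation of each sex group */
   is_last  = last.sex;    /* 1 on the last observation of each sex group  */
run;

proc print data=work.flags;
   var name sex is_first is_last;
run;
```

A common use is a subsetting condition such as "if first.sex;" to keep one row per group.
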

