
Base SAS - Learning by Example PPT (CH8 - CH10)

SAS

Uploaded by

de19ch007

Performing iterative Processing - Chapter 8

This chapter covers DO groups, DO loops, DO WHILE statements, and DO UNTIL statements.

Do groups : Use a DO group when you need to execute multiple statements on one condition.
• All the statements between DO and END form a DO group. When the IF condition is true, all the statements in the DO group execute.
• The DO group coding is not only more efficient than multiple IF statements, it is also easier to read.

    if Age le 39 then do;
       Agegrp = 'Younger group';
       Grade = .4*Midterm + .6*FinalExam;
    end;
    else if Age gt 39 then do;
       Agegrp = 'Older group';
       Grade = (Midterm + FinalExam)/2;
    end;

The SUM Statement : There are two primary uses for a sum statement:
- one is to accumulate totals such as a month-to-date total, and
- the other is to create a counter, a variable that is incremented by a fixed amount on each iteration of the DATA step.
NOTE : Remember that variables read from raw data or created by assignment statements are initialized to a missing value for each iteration of the DATA step.

    Total = Total + Revenue;   /* Note: this does not work (Total is not initialized) */

Adding a missing value to Revenue results in a missing value, so Total will be missing for every observation.
You can use a RETAIN statement to tell SAS not to do this. A RETAIN statement also enables you to set an initial value for a variable.
The test "if not missing(Revenue) then" is needed so that a missing Revenue is never added to Total.

    data revenue;
       retain Total 0;
       input Day : $3.
             Revenue : dollar6.;
       if not missing(Revenue) then Total = Total + Revenue;
       format Revenue Total dollar8.;
    datalines;
    Mon $1,000
    Tue $1,500
    Wed .
    Thu $2,000
    Fri $3,000
    ;

Easier Way : Use a sum statement.
A sum statement takes the following form:

    variable + increment;

For example, the RETAIN statement and the IF statement in the program above can be replaced by a single statement:

       input Day : $3.
             Revenue : dollar6.;
       Total + Revenue;
       format Revenue Total dollar8.;

This statement does the following:
1. Variable is retained
2. Variable is initialized at 0
3. Missing values (of increment) are ignored.
- Another very common use of a sum statement is to create counters, for example:

    data test;
       input x;
       if missing(x) then MissCounter + 1;
    datalines;
    2
    .
    7
    .
    ;

MissCounter is counting the number of missing values for x.

Iterative DO Loop :

One form of an iterative DO statement follows:
    do index-variable = start to stop by increment;   (Increment defaults to 1.)
    do Year = 1 to 3;

    data compound;
       Interest = .0375;
       Total = 100;
       do Year = 1 to 3;
          Total + Interest*Total;
          output;
       end;
       format Total dollar10.2;
    run;

Other Forms of an Iterative DO Loop :

    do x = 1,2,5,10;
    (values of x are: 1, 2, 5, and 10)
    do month = 'Jan','Feb','Mar';
    (values of month are: 'Jan', 'Feb', and 'Mar')
    do n = 1,3, 5 to 9 by 2, 100 to 200 by 50;
    (values of n are: 1, 3, 5, 7, 9, 100, 150, and 200)
• Without the @ sign, each time SAS executes an INPUT statement, it goes to a new line of data. The single trailing @ sign is an instruction to "hold the line" for another INPUT statement in the DATA step.
• The program shows how to use character variables and nested DO loops.

    data easyway;
       do Group = 'Placebo','Active';
          do Subj = 1 to 5;
             input Score @;
             output;
          end;
       end;
    datalines;
    250 222 230 210 199
    166 183 123 129 234
    ;

DO WHILE and DO UNTIL Statements :
Instead of choosing a stopping value for an iterative DO loop, you can stop a loop when a condition is met (UNTIL) or keep it running while a condition is true (WHILE).

In the DO UNTIL program, the loop runs until the condition is met.
A DO UNTIL loop always executes at least once, because the condition is tested at the bottom of the loop.

    data double;
       Interest = .0375;
       Total = 100;
       do until (Total ge 200);
          Year + 1;
          Total = Total + Interest*Total;
          output;
       end;
       format Total dollar10.2;
    run;

In the DO WHILE program, the loop runs as long as the condition is true.
Unlike DO UNTIL, a DO WHILE block does not execute even once if the condition is false at the start, because the condition is tested at the top of the loop.

    data double;
       Interest = .0375;
       Total = 100;
       do while (Total le 200);
          Year + 1;
          Total = Total + Interest*Total;
          output;
       end;
       format Total dollar10.2;
    run;

It is very important that the condition you place on a DO UNTIL statement becomes true at some point. Otherwise the program may get stuck in an infinite loop.
Press the CTRL and C keys simultaneously to escape from an infinite loop.

One way to prevent infinite loops is to combine a regular DO loop with an UNTIL condition.

    data double;
       Interest = .0375;
       Total = 100;
       do Year = 1 to 100 until (Total gt 200);
          Total = Total + Interest*Total;
          output;
       end;
       format Total dollar10.2;
    run;

There are two advantages to this structure: first, even if the UNTIL condition never becomes true, the loop ends when Year reaches 100, and second, you don't have to assign a value to Year inside the loop.
LEAVE and CONTINUE Statements :

The LEAVE statement in the SAS DATA step is equivalent to the "break" statement in other languages. It provides a way to immediately exit from an iterative loop. The CONTINUE statement in the SAS DATA step skips over any remaining statements in the body of a loop and starts the next iteration.

    do Year = 1 to 100;
       ……………..
       if Year = 5 then continue;
       output;
    end;

The OUTPUT statement is skipped for Year = 5, but the loop continues for the other years.
If you use LEAVE instead, the loop ends as soon as Year reaches 5.
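As a hedged sketch of how LEAVE behaves, the following DATA step reuses the compound-interest loop from earlier in this chapter. The data set name leave_demo and the stopping test are assumptions made for illustration, not part of the original slides.

```sas
data leave_demo;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 100;
      Total = Total + Interest*Total;
      /* LEAVE exits the loop at once; no later values of Year are processed */
      if Total gt 200 then leave;
      /* reached only while Total is 200 or less */
      output;
   end;
   format Total dollar10.2;
run;
```

Replacing LEAVE with CONTINUE here would skip only the OUTPUT statement for those years and let the loop run all the way to Year = 100.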
Working with Dates – Chapter 9
SAS can read dates in almost any form, such as the following:
Date Description :
• 10/21/1950 Month – Day – Year
• 21/10/1950 Day – Month – Year
• 21Oct1950 Day – Month Abbreviation – Year
• 50294 Julian Date

However, SAS does not normally store dates in any of these forms. It converts all of these dates into a single number: the number of days from January 1, 1960. Dates after January 1, 1960, are positive integers; dates before January 1, 1960, are negative integers.

    January 1, 1960    :  0
    January 2, 1960    :  1
    December 31, 1959  : -1
    June 15, 2006      : 16,967
    October 21, 1950   : -3,359
Reading Date Values from Raw Data :

You have a raw data file, as shown here:

    Columns
             1         2         3
    1234567890123456789012345678901234567
    001 10/21/1950 05122003 08/10/65 23Dec2005
    002 01/01/1960 11122009 09/13/02 02Jan1960

The first three dates (read at columns 5, 16, and 26 by the INPUT statement) are in the month-day-year form.
The last date (starting in column 34) starts with the day of the month, a three-letter month abbreviation, and a four-digit year.

Here is the program to read the file :

    data four_dates;
       infile 'c:\books\learning\dates.txt' truncover;
       input @1  Subject   $3.
             @5  DOB       mmddyy10.
             @16 VisitDate mmddyy8.
             @26 TwoDigit  mmddyy8.
             @34 LastDate  date9.;
    run;

You typically want to use either the TRUNCOVER or PAD option when reading raw data files with data in fixed columns (see Chapter 21).

Because the date of birth (DOB) takes up 10 columns, MMDDYY10. is the proper informat to use. The other widths are chosen the same way.
The number of columns used for LastDate is 9, and the informat name is DATE.

SAS converts all of the dates to their corresponding numerical values.
PROC PRINT will show :

    Listing of FOUR_DATES
                       Visit     Two     Last
    Subject     DOB     Date   Digit     Date
    001       -3359    15837    2048    16793
    002           0    18213   15596        1
FLOWOVER, STOPOVER, MISSOVER, and TRUNCOVER :

Test.txt contains:

    22
    333
    4444
    55555

    data test;
       infile "/folders/myfolders/test.txt" flowover;
       input tt 5.;
    run;
    proc print data=test;
       title 'Test DATA Step';
    run;

infile 'your-external-file' flowover;
is the default behavior. When the current record runs out of data, the DATA step simply reads the next record into the input buffer, attempting to find values to assign to the rest of the variable names in the INPUT statement.

infile 'your-external-file' stopover;
causes the DATA step to stop processing if an INPUT statement reaches the end of the current record without finding values for all variables in the statement. Use this option if you expect all of the data in the external file to conform to a given standard and if you want the DATA step to stop when it encounters a data record that does not conform to the standard.

infile 'your-external-file' missover;
prevents the DATA step from going to the next line if it does not find values in the current record for all of the variables in the INPUT statement. Instead, the DATA step assigns a missing value for all variables that do not have values.

infile 'your-external-file' truncover;
causes the DATA step to assign the raw data value to the variable even if the value is shorter than expected by the INPUT statement. If, when the DATA step encounters the end of an input record, there are variables without values, the variables are assigned missing values for that observation.

[Output listings for STOPOVER, MISSOVER, and TRUNCOVER]
You can choose any SAS date format to display these dates properly:

    format DOB VisitDate date9.
           TwoDigit LastDate mmddyy10.;

    data four_dates;
       infile 'c:\books\learning\dates.txt' truncover;
       input @1  Subject   $3.
             @5  DOB       mmddyy10.
             @16 VisitDate mmddyy8.
             @26 TwoDigit  mmddyy8.
             @34 LastDate  date9.;
       format DOB VisitDate date9.
              TwoDigit LastDate mmddyy10.;
    run;

    Listing of FOUR_DATES
    Subject  DOB        VisitDate  TwoDigit    LastDate
    001      21OCT1950  12MAY2003  08/10/1965  12/23/2005
    002      01JAN1960  12NOV2009  09/13/2002  01/02/1960

Look closely at TwoDigit. The raw data supplied only the last two digits of the year, yet the MMDDYY10. format displays a four-digit year: 1965 for the first record and 2002 for the second.
How did SAS figure out whether to make the first two digits 19 or 20?
There is a system option called YEARCUTOFF that enables SAS to compute these values.
The default value for this option is 1920 in SAS 8 and SAS 9.
This value determines the start of a 100-year interval that SAS uses when it encounters a two-digit year.
With a YEARCUTOFF value of 1920, all two-digit years are in the interval from 1920 to 2019.
So 65 gets the prefix 19 and 02 gets the prefix 20, which is why the years are 1965 and 2002.
NOTE : You might consider including a statement such as the following in each of your programs in case the version of SAS you are using changed the default value of this option:
    options yearcutoff=1920;   (Of course, the best practice is to always use four-digit years!)
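The effect of YEARCUTOFF can be checked with a small sketch such as the one below. It reads the same two-digit date under two different cutoff values and writes the result to the log. The second cutoff value (1970) and the use of a DATA _NULL_ step are assumptions chosen only for illustration.

```sas
options yearcutoff=1920;          /* 100-year window: 1920 to 2019 */
data _null_;
   TwoDigit = input('08/10/65', mmddyy8.);
   put TwoDigit= mmddyy10.;       /* 65 falls in the window as 1965 */
run;

options yearcutoff=1970;          /* 100-year window: 1970 to 2069 */
data _null_;
   TwoDigit = input('08/10/65', mmddyy8.);
   put TwoDigit= mmddyy10.;       /* 65 now falls in the window as 2065 */
run;

options yearcutoff=1920;          /* restore the value used in this chapter */
```

Four-digit years are unaffected by this option, which is one more reason to prefer them.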
Computing the Number of Years between Two Dates :

Use the YRDIF function to compute the difference in years between the date of birth and the visit date.

    data ages;
       set four_dates;
       Age = yrdif(DOB,VisitDate,'Actual');
    run;
    title "Listing of AGES";
    proc print data=ages;
       id Subject;
       var DOB VisitDate Age;
    run;

You can specify values other than ACTUAL if you want to use 30-day months and 360-day years (to compute bond interest and other financial calculations, for example).

If you want the Age as of the person's last birthday (dropping any fractional part of a year), you can use the INT (integer) function:
    Age = int(yrdif(DOB,VisitDate,'Actual'));

If you want to round the Age to the nearest year, you can use the ROUND function:
    Age = round(yrdif(DOB,VisitDate,'Actual'));

You may see the following expression in some older programs:

    Age = (VisitDate - DOB) / 365.25;

This expression gives an approximate value for the number of years between two dates (the .25 accounts for a leap year every four years). Because the YRDIF function has the ability to return an exact value, you should use it rather than this expression.
Demonstrating a Date Constant :

How do you compute a person's age as of a certain date, January 1, 2006, for example?

    data ages;
       set four_dates;
       Age = yrdif(DOB,'01Jan2006'd,'Actual');
    run;
    title "Listing of AGES";
    proc print data=ages;
       id Subject;
       var DOB Age;
       format Age 5.1;
    run;

The general form of a date constant is a one- or two-digit day of the month, a three-character month abbreviation, and a two- or four-digit year in single or double quotation marks, followed by an upper- or lowercase d. This is the only form allowed as a date constant. You cannot use '01/01/2006'd, for example.

You could use the expression Age = yrdif(DOB,16802,'Actual'); if you happen to know that January 1, 2006, is 16,802 days after January 1, 1960.
Computing the Current Date :

Suppose you want to compute a quantity based on the current date. The TODAY function returns the current date as a SAS date value.

    data ages;
       set four_dates;
       Age = yrdif(DOB,today(),'Actual');
    run;
Extracting the Day of the Week, Day of the Month, Month, and Year from a SAS Date :

The WEEKDAY function returns the day of the week, with 1 = Sunday, 2 = Monday, and so on, and the DAY function returns the day of the month (a number from 1 to 31).
The MONTH function returns a number from 1 to 12 and the YEAR function returns a four-digit year value.

    data extract;
       set four_dates;
       Day = weekday(DOB);
       DayOfMonth = day(DOB);
       Month = month(DOB);
       Year = year(DOB);
    run;
    title "Listing of EXTRACT";
    proc print data=extract noobs;
       var DOB Day -- Year;
    run;

( -- ) This VAR statement includes all of the variables from Day through Year in the order they are stored in the SAS data set.

Creating a SAS Date from Month, Day, and Year Values :

A very useful function, MDY (month day year), allows you to create a SAS date value by supplying month, day, and year values. This function is especially useful if you have a SAS data set that contains these values but does not contain the corresponding SAS date value, or if you have values for month, day, and year in a raw data file that do not conform to any of the SAS date informats.

    data mdy_example;
       set learn.month_day_year;
       Date = mdy(Month, Day, Year);
       format Date mmddyy10.;
    run;

    Obs  Month  Day  Year  Date
    1    10     21   1950  10/21/1950
    2    1      15   5     01/15/2005
    3    3      .    2005  .
    4    5      7    2000  05/07/2000

In observation 2, the Year value 5 was translated to 2005 (as per the YEARCUTOFF value, in this case 1920). In observation 3, the missing Day value produces a missing Date.
Substituting the 15th of the Month when the Day Value Is Missing :
There are occasions where you have a missing value for the day of the month but still want to compute an approximate date. Many people use the 15th of the month to substitute for a missing Day value.

    data substitute;
       set learn.month_day_year;
       if missing(day) then Date = mdy(Month,15,Year);
       else Date = mdy(Month,Day,Year);
       format Date mmddyy10.;
    run;

Here the MISSING function tests if there is a missing value for the variable Day. If so, the number 15 is used as the second argument to the MDY function.

    Obs  Month  Day  Year  Date
    1    10     21   1950  10/21/1950
    2    1      15   5     01/15/2005
    3    3      .    2005  03/15/2005
    4    5      7    2000  05/07/2000

Using Date Interval Functions :

Two functions, INTCK and INTNX, deal with date intervals (such as months, quarters, years). The INTCK function computes the number of intervals between two dates; the INTNX function computes a date after a given number of intervals.
To understand even the most basic use of these two functions, you must understand that they both deal with interval boundaries. INTCK counts how many boundaries (such as January 1 for years) are crossed between the two dates, not the elapsed time:

    INTCK('year','01Jan2005'd,'31Dec2005'd)  : 0
    INTCK('year','31Dec2005'd,'01Jan2006'd)  : 1
    INTCK('month','01Jan2005'd,'31Jan2005'd) : 0
    INTCK('month','31Jan2005'd,'01Feb2005'd) : 1
    INTCK('qtr','25Mar2005'd,'15Apr2005'd)   : 1
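The slides give examples for INTCK only. As a minimal sketch of the companion function, the step below calls INTNX with a few illustrative dates and writes the results to the log. The variable names and dates are assumptions made for this example.

```sas
data _null_;
   /* default alignment: the result is the start of the target interval */
   NextMonth = intnx('month','31Jan2005'd,1);
   NextYear  = intnx('year','15Jun2005'd,1);
   /* 'sameday' alignment keeps the day of the month where possible */
   SameDay   = intnx('month','15Jan2005'd,1,'sameday');
   put NextMonth= date9. NextYear= date9. SameDay= date9.;
run;
```

With the default alignment, INTNX also works on interval boundaries, so one month after January 31, 2005 is February 1, 2005, and one year after June 15, 2005 is January 1, 2006. The 'sameday' call returns February 15, 2005.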
Subsetting and Combining SAS Data Sets – Chapter 10

Subsetting a SAS Data Set :


Subsetting a SAS data set involves selecting observations from one data set by defining selection criteria, usually in a WHERE or subsetting IF statement. One way to do this is with a WHERE statement, as follows:

    data females;
       set learn.survey;
       where gender = 'F';
    run;

Remember that the variables used in a WHERE statement must all come from a SAS data set.
Variables that are created by reading raw data or from an assignment statement may not be used in this fashion.

If you do not need one or more variables from the input data set (the data set on the SET statement), you can use a DROP= or KEEP= data set option.

    data females;
       set learn.survey(drop=Salary);
       where gender = 'F';
    run;

There is an important difference between using a DROP= data set option on the input data set and placing a DROP statement somewhere in the DATA step.
In this example, the variable Salary is never read into the Program Data Vector (PDV), so it cannot be used anywhere in the DATA step. With a DROP statement, Salary would be available during the DATA step and only left out of the output data set.
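To make the contrast concrete, here is a hedged sketch that uses a DROP statement instead of the DROP= option. The variable HighPaid and the 50000 threshold are hypothetical and are not part of the original slides; only learn.survey, gender, and Salary come from the program above.

```sas
data females;
   set learn.survey;                 /* Salary is read into the PDV */
   where gender = 'F';
   /* Salary can still be used in calculations inside the step */
   HighPaid = (Salary gt 50000);     /* hypothetical flag: 1 = true, 0 = false */
   drop Salary;                      /* Salary is left out of the output only */
run;
```

If set learn.survey(drop=Salary) were used here instead, Salary would not exist in the PDV and the HighPaid calculation would see only a missing value.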

Creating More Than One Subset Data Set in One DATA Step :

You can create multiple SAS data sets from one input data set (something that SQL cannot do).

    data males females;
       set learn.survey;
       if gender = 'F' then output females;
       else if gender = 'M' then output males;
    run;

Notice that you must name the data set following the OUTPUT statement.
If you do not, SAS outputs the observation to all the data sets listed in the DATA statement.
Adding Observations to a SAS Data Set :

Suppose you want to create a single data set from several similar data sets. You can list as many data sets as you want on a SET statement and SAS will add all the observations together to form a single data set.
Each of the data sets contains the same variables.

    data one_two;
       set one two;
    run;

SAS refers to this process as concatenating data sets.

What happens if you use the SET statement on two data sets that don't contain all the same variables?
Data set Three contains a new variable, Gender, and does not contain the variable Weight.

    data one_three;
       set one three;
    run;

The resulting data set contains all the variables from both data sets. Observations that come from a data set lacking a variable have a missing value for that variable.

Check the lengths of character variables when you are combining several data sets, to be sure you will not truncate any values. If necessary, place a LENGTH statement before the SET statement to be sure the resulting length is adequate to hold all of your values.
(Remember that the length of a character variable is determined as soon as that variable enters the PDV; it cannot be changed after that.)
Finally, if you have a variable in two data sets, one character and the other numeric, SAS prints an error message in the log and the program terminates.
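The LENGTH advice can be sketched as follows. The data set names short_names, long_names, and all_names, and the variable Name with its lengths, are hypothetical and chosen only to show the truncation risk.

```sas
data short_names;
   length Name $ 5;
   Name = 'Ann';
run;

data long_names;
   length Name $ 12;
   Name = 'Christopher';
run;

data all_names;
   /* Declare the longer length BEFORE the SET statement. Without this line,
      Name takes length 5 from short_names and 'Christopher' is truncated. */
   length Name $ 12;
   set short_names long_names;
run;
```

Listing long_names first on the SET statement would also avoid the truncation in this small case, but an explicit LENGTH statement does not depend on the order of the data sets.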

Interleaving Data Sets :

If both data sets are already sorted by the same variable, a SET statement followed by a BY statement produces sorted output.
The advantage of this method is that you don't have to sort the resulting data set.
There are times when the resulting data set would be too large to sort conveniently or at all.

    proc sort data=one;
       by ID;
    run;
    proc sort data=two;
       by ID;
    run;
    data interleave;
       set one two;
       by ID;
    run;

Remember, each of the data sets in the SET statement must be in order of the BY variable(s).

When you use PROC SORT to sort a SAS data set, a sort flag is set (you can see this on the first page of output from PROC CONTENTS) and SAS does not re-sort this data set if you attempt to sort it again by the same BY variables. When you interleave data sets, this sort flag is not set (which should not cause you any problems).

Combining Detail and Summary Data :


Suppose you want to express each value of cholesterol in the Blood data set as a percentage of the mean for all subjects.

    proc means data=learn.blood noprint;
       var Chol;
       output out = means(keep=AveChol)
              mean = AveChol;
    run;

To combine this single value with every observation in the Blood data set, you execute a SET statement conditionally. Here's how it works: the second SET statement runs only on the first iteration, and because variables read with a SET statement are retained, AveChol stays in the PDV for every later observation.

    data percent;
       set learn.blood(keep=Subject Chol);
       if _n_ = 1 then set means;
       PerChol = Chol / AveChol;
       format PerChol percent8.;
    run;

Merging Two Data Sets :

You could have an employee data set (Employee) containing ID numbers and names. If you had another data set (Hours) containing ID numbers, along with a job class and the number of hours worked, you might want to add the name from the Employee data set to each observation in the Hours data set.

Here is the program to merge them by ID :

    proc sort data=employee;
       by ID;
    run;
    proc sort data=hours;
       by ID;
    run;
    data combine;
       merge employee hours;
       by ID;
    run;

Controlling Observations in a Merged Data Set :

The IN= data set option creates a temporary variable that is 1 (true) when the data set contributes to the current observation and 0 (false) otherwise. This program prints the IN= values along with the merged variables:

    data new;
       merge employee(in=InEmploy)
             hours (in=InHours);
       by ID;
       file print;
       put ID= InEmploy= InHours= Name=
           JobClass= Hours=;
    run;
You can use IN= variables to control which observations are written to the output data set. For example:

    data combine;
       merge employee(in=InEmploy)
             hours(in=InHours);
       by ID;
       if InEmploy and InHours;
    run;

You can, alternatively, write the subsetting IF statement like this:
    if InEmploy = 1 and InHours = 1;
More Uses for IN= Variables :

This program writes observations found in both data sets to In_both. For Missing_name, you are asking for all IDs that are in the Hours data set and not in the Employee data set.

    data in_both
         missing_name(drop = Name);
       merge employee(in=InEmploy)
             hours(in=InHours);
       by ID;
       if InEmploy and InHours then output in_both;
       else if InHours and not InEmploy then
          output missing_name;
    run;

When Does a DATA Step End? :

When a SET statement tries to read past the end of any data set, it signals the end of the DATA step.
Here is a sample program to illustrate this.

    data short;
       input x;
    datalines;
    1
    2
    ;
    data long;
       input x;
    datalines;
    3
    4
    5
    6
    ;
    data new;
       set short;
       output;
       set long;
       output;
    run;

Data set Short has two observations and data set Long has four. How many observations are in data set New?
Each SET statement keeps a pointer to keep track of which observation it is reading.
In this program, an observation is first read from the Short data set, an observation is written out to the New data set, an observation is read from the Long data set, and another observation is written to the New data set.
You might expect that this would continue until all the observations from both data sets were read.
However, when the end of file on data set Short is encountered, it signals an end to the DATA step, with the result that data set New has only four observations, with values of x equal to 1, 3, 2, and 4 (in that order).

Merging Two Data Sets with Different BY Variable Names :
You will sometimes find yourself trying to merge two data sets where the name of the variable you want to use to join them has a different name in each data set.
For example, one data set may call a variable ID and the other EmpNo.
You can use a RENAME= data set option to rename the variable in one data set to be consistent with the name in the other.
Here is the program, which uses the RENAME= option :

    data sesame;
       merge bert
             ernie(rename=(EmpNo = ID));
       by ID;
    run;

Merging Two Data Sets with Different BY Variable Data Types :

A slightly more complicated situation occurs when the variable you want to use as the BY variable in a merge is a different data type in the two data sets.

You can create a character variable based on the numeric value of SS in the Division1 data set or you can create a numeric variable based on the character variable in the Division2 data set.

Because the two data sets use the same name for the BY variable, first you have to rename this variable and then create a new character variable with the variable name of SS.
The SSN11. format is a built-in SAS format that prints leading 0s and adds dashes as required for Social Security numbers.

    data division1c;
       set division1(rename=(SS = NumSS));
       SS = put(NumSS,ssn11.);
       drop NumSS;
    run;
    data both_divisions;
       ***Note: Both data sets already in
          order of BY variable;
       merge division1c division2;
       by SS;
    run;

You may choose to create a numeric variable from the character value of SS in data set Division2 instead.

    data division2n;
       set division2(rename=(SS = CharSS));
       SS = input(compress(CharSS,'-'),9.);
       /* Alternative (the COMMA informat also removes the dashes):
          SS = input(CharSS,comma11.); */
       drop CharSS;
    run;

One-to-One, One-to-Many, and Many-to-Many Merges : LATER

Updating a Master File from a Transaction File :

If you have two data sets that have some common variables and you perform a data set merge, values in the second data set replace values in the first data set, even if the values in the second data set are missing values.
With an UPDATE statement, a missing value in the second (transaction) data set does not replace the value in the first (master) data set.
This makes the UPDATE statement perfect for updating values in a master data set from new values in a transaction data set.

    proc sort data=prices;
       by ItemCode;
    run;
    proc sort data=new15dec2005;
       by ItemCode;
    run;
    data prices_15dec2005;
       update prices new15dec2005;
       by ItemCode;
    run;

Only nonmissing values of Price in the transaction data set replaced values in the master file.
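A small self-contained sketch can show this behavior. The data set names master_prices, new_prices, and updated, and all item codes and prices, are invented for illustration; only the ItemCode and Price variable names follow the program above.

```sas
data master_prices;                 /* hypothetical master file */
   input ItemCode $ Price;
datalines;
A100 10.50
B200 20.00
;

data new_prices;                    /* hypothetical transaction file */
   input ItemCode $ Price;
datalines;
A100 11.25
B200 .
;

/* Both data sets are already in ItemCode order, so no PROC SORT is needed */
data updated;
   update master_prices new_prices;
   by ItemCode;
run;

proc print data=updated noobs;
run;
```

In the result, A100 takes the new price of 11.25, while B200 keeps its master price of 20.00 because the transaction value is missing. A MERGE statement with the same data sets would instead leave B200 with a missing Price.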
