Base SAS - Learning by Example PPT (CH8 - CH10)
This chapter covers DO groups, DO loops, DO WHILE statements, and DO UNTIL statements.
The LEAVE statement in the SAS DATA step is equivalent to the "break" statement in other languages. It provides a way to
exit an iterative loop immediately. The CONTINUE statement in the SAS DATA step skips any remaining statements in the body
of a loop and starts the next iteration.
do Year = 1 to 100;
   if Year = 5 then continue;
   output;
end;
The output is skipped for Year = 5, but the loop continues for the other years.
If you use LEAVE instead, the loop ends as soon as Year = 5.
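The difference is easiest to see by running the same loop twice. The sketch below is a minimal illustration, not taken from the slides; the data set names continue_demo and leave_demo and the shorter range of 1 to 10 are assumptions chosen to keep the result small.

```sas
/* CONTINUE: skip the rest of the loop body for one iteration */
data continue_demo;
   do Year = 1 to 10;
      if Year = 5 then continue;   /* no OUTPUT for Year 5 only */
      output;
   end;
run;   /* 9 observations: Years 1-4 and 6-10 */

/* LEAVE: exit the loop entirely */
data leave_demo;
   do Year = 1 to 10;
      if Year = 5 then leave;      /* loop stops here */
      output;
   end;
run;   /* 4 observations: Years 1-4 */
```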
Working with Dates – Chapter 9
SAS can read dates in almost any form, such as the following:
Date Description :
• 10/21/1950 Month – Day – Year
• 21/10/1950 Day – Month – Year
• 21Oct1950 Day – Month Abbreviation – Year
• 50294 Julian Date
However, SAS does not normally store dates in any of these forms.
It converts all of these dates into a single number:
the number of days from January 1, 1960. Dates after January 1, 1960, are positive integers; dates
before January 1, 1960, are negative integers.
January 1, 1960 :0
January 2, 1960 :1
December 31, 1959 : -1
June 15, 2006 :16,967
October 21, 1950 :-3,359
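These values can be checked directly in the SAS log. The sketch below writes three dates as date constants and prints them without a format; the variable names are illustrative only.

```sas
data _null_;
   Day0   = '01Jan1960'd;
   Recent = '15Jun2006'd;
   Early  = '21Oct1950'd;
   /* No format is applied, so the log shows the stored day counts: */
   /* Day0=0 Recent=16967 Early=-3359                               */
   put Day0= Recent= Early=;
run;
```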
Reading Date Values from Raw Data :
You have a raw data file, as shown here:
Columns
         1         2         3
1234567890123456789012345678901234567
001 10/21/1950 05122003 08/10/65 23Dec2005
002 01/01/1960 11122009 09/13/02 02Jan1960
The first three dates (starting in columns 5, 16, and 25) are in the month-day-year form.
The last date (starting in column 34) starts with the day of the month, a three-letter month
abbreviation, and a four-digit year.
Here is the program to read the file :
data four_dates;
   infile 'c:\books\learning\dates.txt' truncover;
   input @1  Subject   $3.
         @5  DOB       mmddyy10.
         @16 VisitDate mmddyy8.
         @26 TwoDigit  mmddyy8.
         @34 LastDate  date9.;
run;
You typically want to use either the TRUNCOVER or PAD option when reading raw data files with data in fixed
columns (see Chapter 21).
Because the date of birth (DOB) takes up 10 columns, MMDDYY10. is the proper
informat to use. The same reasoning applies to the other MMDDYY dates.
The number of columns used for LastDate is 9, and the informat name is DATE.
SAS converts all of the dates to their corresponding numerical values.
PROC PRINT will show :
Listing of FOUR_DATES
Subject    DOB    VisitDate    TwoDigit    LastDate
001      -3359        15837        2048       16793
002          0        18213       15596           1
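The same informats can be tried without the external file by passing strings to the INPUT function. This is a small sketch for experimentation, not the slides' method of reading the file; it uses values from the first record.

```sas
data _null_;
   DOB       = input('10/21/1950', mmddyy10.);  /* 10 columns wide */
   VisitDate = input('05122003',   mmddyy8.);   /* 8 columns, no separators */
   LastDate  = input('23Dec2005',  date9.);     /* 9 columns wide */
   /* Unformatted, these print as day counts: */
   /* DOB=-3359 VisitDate=15837 LastDate=16793 */
   put DOB= VisitDate= LastDate=;
run;
```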
FLOWOVER, STOPOVER, MISSOVER, and TRUNCOVER :
infile 'your-external-file' flowover;
is the default behavior. The DATA step simply reads the next record
into the input buffer, attempting to find values to assign to the rest
of the variable names in the INPUT statement.
infile 'your-external-file' stopover;
causes the DATA step to stop processing if an INPUT statement
reaches the end of the current record without finding values for all
variables in the statement. Use this option if you expect all of the
data in the external file to conform to a given standard and if you
want the DATA step to stop when it encounters a data record that
does not conform to the standard.
infile 'your-external-file' missover;
prevents the DATA step from going to the next line if it does not
find values in the current record for all of the variables in the INPUT
statement. Instead, the DATA step assigns a missing value for all
variables that do not have values.
infile 'your-external-file' truncover;
causes the DATA step to assign the raw data value to the variable
even if the value is shorter than expected by the INPUT statement.
If, when the DATA step encounters the end of an input record, there
are variables without values, the variables are assigned missing
values for that observation.
Example, using Test.txt :
22
333
4444
55555
data test;
   infile "/folders/myfolders/test.txt" flowover;
   input tt 5.;
run;
proc print data=test;
   title 'Test DATA Step';
run;
Replace FLOWOVER with STOPOVER, MISSOVER, or TRUNCOVER to compare the results.
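The contrast between MISSOVER and TRUNCOVER shows up as soon as records are shorter than the informat width. The sketch below writes four short records to a temporary file and reads them both ways; the fileref demo and the data set names are assumptions made for this illustration.

```sas
filename demo temp;   /* temporary file; the fileref name is arbitrary */

/* Write four records of 2, 3, 4, and 5 characters */
data _null_;
   file demo;
   put '22' / '333' / '4444' / '55555';
run;

/* MISSOVER: a field shorter than 5 columns becomes missing */
data with_missover;
   infile demo missover;
   input tt 5.;
run;   /* tt = ., ., ., 55555 */

/* TRUNCOVER: the short value is read as-is */
data with_truncover;
   infile demo truncover;
   input tt 5.;
run;   /* tt = 22, 333, 4444, 55555 */
```

This is why TRUNCOVER is usually the safer choice for fixed-column data with variable-length records.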
You can choose any SAS date format to display these dates properly:
data four_dates;
   infile 'c:\books\learning\dates.txt' truncover;
   input @1  Subject   $3.
         @5  DOB       mmddyy10.
         @16 VisitDate mmddyy8.
         @26 TwoDigit  mmddyy8.
         @34 LastDate  date9.;
   format DOB VisitDate date9.
          TwoDigit LastDate mmddyy10.;
run;
Listing of FOUR_DATES
Subject    DOB          VisitDate    TwoDigit      LastDate
001        21OCT1950    12MAY2003    08/10/1965    12/23/2005
002        01JAN1960    12NOV2009    09/13/2002    01/02/1960
Look closely at TwoDigit. The raw data supplied only the last two digits of the year, yet the
MMDDYY10. format displays a four-digit year: 1965 for the first record and 2002 for the second.
How did SAS figure out whether to make the first two digits 19 or 20?
There is a system option called YEARCUTOFF that enables SAS to compute these values.
The default value for this option is 1920 in SAS 8 and SAS®9.
This value determines the start of a 100-year interval
that SAS uses when it encounters a two-digit year.
With a YEARCUTOFF value of 1920, all two-digit years are in the interval from 1920 to 2019.
That is why 65 becomes 1965 and 02 becomes 2002.
NOTE :You might consider including a statement such as the following in each of your programs in case the version of SAS
you are using changed the default value of this option:
options yearcutoff=1920; (Of course, the best practice is to always use four-digit years!)
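The effect of the option can be seen in isolation by reading two-digit years with the INPUT function. This is a small sketch using the two TwoDigit values from the raw data; the variable names are illustrative.

```sas
options yearcutoff=1920;   /* two-digit years fall in 1920-2019 */

data _null_;
   d1 = input('08/10/65', mmddyy8.);
   d2 = input('09/13/02', mmddyy8.);
   /* Log shows d1=08/10/1965 d2=09/13/2002 */
   put d1= mmddyy10. d2= mmddyy10.;
run;
```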
Computing the Number of Years between Two Dates :
Use the YRDIF function to compute the difference in years between
the date of birth and the visit date.
data ages;
   set four_dates;
   Age = yrdif(DOB,VisitDate,'Actual');
run;
title "Listing of AGES";
proc print data=ages;
   id Subject;
   var DOB VisitDate Age;
run;
You can specify values other than ACTUAL if you want to use 30-day
months and 360-day years (to compute bond interest and other financial
calculations, for example).
If you want the Age as of the person’s last birthday (dropping any
fractional part of a year), you can use the INT (integer) function:
Age = int(yrdif(DOB,VisitDate,'Actual'));
If you want to round the Age to the nearest year, you can use the ROUND function:
Age = round(yrdif(DOB,VisitDate,'Actual'));
An older alternative divides the number of days between the two dates by 365.25:
Age = (VisitDate - DOB) / 365.25;
This expression gives an approximate value for the number of years between two dates
(the .25 accounts for a leap year every four years). Because the YRDIF function has the
ability to return an exact value, you should use it rather than this expression.
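The variants can be compared side by side for one subject. The sketch below uses the first record's dates as date constants; the variable names are assumptions, and Approx stands for the divide-by-365.25 approximation that the text alludes to.

```sas
data _null_;
   DOB       = '21Oct1950'd;
   VisitDate = '12May2003'd;
   Exact     = yrdif(DOB, VisitDate, 'Actual');  /* about 52.56 */
   Approx    = (VisitDate - DOB) / 365.25;       /* close, but not exact */
   LastBday  = int(Exact);                       /* 52: age at last birthday */
   Nearest   = round(Exact);                     /* 53: nearest whole year */
   put Exact= Approx= LastBday= Nearest=;
run;
```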
Demonstrating a Date Constant :
How do you compute a person’s age as of a certain date, January 1, 2006, for example?
data ages;
   set four_dates;
   Age = yrdif(DOB,'01Jan2006'd,'Actual');
run;
title "Listing of AGES";
proc print data=ages;
   id Subject;
   var DOB Age;
   format Age 5.1;
run;
The general form of a date constant is a one- or two-digit day of the month, a
three-character month abbreviation, and a two- or four-digit year in single or double
quotation marks, followed by an upper- or lowercase d. This is the only form allowed
as a date constant. You cannot use '01/01/2006'd, for example.
You could use the expression Age = yrdif(DOB,16802,'Actual');
if you happen to know that January 1, 2006, is 16,802 days after January 1, 1960.
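A date constant is just another way of writing the underlying number, which a quick check in the log confirms. The variable names below are illustrative.

```sas
data _null_;
   AsConstant = '01Jan2006'd;
   Same = (AsConstant = 16802);   /* 1 = true */
   put AsConstant= Same=;         /* AsConstant=16802 Same=1 */
   put AsConstant= date9.;        /* AsConstant=01JAN2006 */
run;
```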
Computing the Current Date :
Suppose you want to compute a quantity based on the current date.
The TODAY function returns the current date as a SAS date value.
data ages;
   set four_dates;
   Age = yrdif(DOB,today(),'Actual');
run;
Extracting the Day of the Week, Day of the Month, Month, and Year from a SAS Date :
data extract;
   set four_dates;
   Day = weekday(DOB);
   DayOfMonth = day(DOB);
   Month = month(DOB);
   Year = year(DOB);
run;
title "Listing of EXTRACT";
proc print data=extract noobs;
   var DOB Day -- Year;
run;
The WEEKDAY function returns the day of the week, with 1 = Sunday, 2 = Monday,
and so on, and the DAY function returns the day of the month (a number from 1 to 31).
The MONTH function returns a number from 1 to 12 and the YEAR function returns a four-digit year value.
The double dash ( -- ) in the VAR statement includes all of the variables from
Day through Year in the order they are stored in the SAS data set.
Creating a SAS Date from Month, Day, and Year Values :
A very useful function, MDY (month day year), allows you to create a SAS date value by
supplying month, day, and year values. This function is especially useful if you have a
SAS data set that contains these values but does not contain the corresponding SAS date
value or if you have values for month, day, and year in a raw data file that do not
conform to any of the SAS date informats.
data mdy_example;
   set learn.month_day_year;
   Date = mdy(Month, Day, Year);
   format Date mmddyy10.;
run;
Obs    Month    Day    Year    Date
1         10     21    1950    10/21/1950
2          1     15       5    01/15/2005
3          3      .    2005             .
4          5      7    2000    05/07/2000
The Year value 5 in the second observation was translated to 2005 (per the YEARCUTOFF value, 1920 in this case).
The third observation has a missing Day, so MDY returns a missing date.
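The extraction functions and MDY are inverses of each other, which a round trip makes concrete. The sketch below is illustrative only; it takes a date apart and rebuilds it, using variable names chosen for the example.

```sas
data _null_;
   DOB        = '21Oct1950'd;
   DayOfMonth = day(DOB);        /* 21   */
   Month      = month(DOB);      /* 10   */
   Year       = year(DOB);       /* 1950 */
   Rebuilt    = mdy(Month, DayOfMonth, Year);
   Same       = (Rebuilt = DOB); /* 1 = the dates match */
   put DayOfMonth= Month= Year= Same=;
   put Rebuilt= mmddyy10.;       /* Rebuilt=10/21/1950 */
run;
```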
Substituting the 15th of the Month when the Day Value Is Missing :
There are occasions where you have a missing value for the day of the
month but still want to compute an approximate date. Many people
use the 15th of the month to substitute for a missing Day value.
data substitute;
   set learn.month_day_year;
   if missing(day) then Date = mdy(Month,15,Year);
   else Date = mdy(Month,Day,Year);
   format Date mmddyy10.;
run;
Here the MISSING function tests if there is a missing value for the
variable Day. If so, the number 15 is used as the second argument
to the MDY function.
Obs    Month    Day    Year    Date
1         10     21    1950    10/21/1950
2          1     15       5    01/15/2005
3          3      .    2005    03/15/2005
4          5      7    2000    05/07/2000
Using Date Interval Functions :
Two functions, INTCK and INTNX, deal with date intervals (such as months,
quarters, years). The INTCK function computes the number of intervals
between two dates; the INTNX function computes a date after a given
number of intervals.
To understand even the most basic use of these two functions, you must understand that
they both deal with interval boundaries. INTCK counts how many times a boundary (such as
January 1 for years) is crossed between the two dates, not the elapsed time:
INTCK('year','01Jan2005'd,'31Dec2005'd) : 0
INTCK('year','31Dec2005'd,'01Jan2006'd) : 1
INTCK('month','01Jan2005'd,'31Jan2005'd) : 0
INTCK('month','31Jan2005'd,'01Feb2005'd) : 1
INTCK('qtr','25Mar2005'd,'15Apr2005'd) : 1
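INTNX works on the same boundaries in the other direction. The sketch below is illustrative, not taken from the slides; it pairs one INTCK call from the list with an INTNX call that relies on the default alignment, which returns the first day of the resulting interval.

```sas
data _null_;
   /* One month boundary (February 1) is crossed, so the count is 1 */
   Months = intck('month', '31Jan2005'd, '01Feb2005'd);
   /* Advance one month from mid-January; the default alignment */
   /* returns the start of the next month                       */
   NextMonth = intnx('month', '15Jan2005'd, 1);
   put Months=;              /* Months=1 */
   put NextMonth= date9.;    /* NextMonth=01FEB2005 */
run;
```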
Subsetting and Combining SAS Data Sets – Chapter 10
If you do not need one or more variables from the input data set
(the data set on the SET statement), you can use a DROP= or KEEP=
data set option.
data females;
   set learn.survey(drop=Salary);
   where gender = 'F';
run;
There is an important difference between using a DROP= data set option on the input data set and placing a DROP statement
somewhere in the DATA step. With the data set option, the variable is never read; with a DROP statement, the variable is
available during the DATA step and is only left out of the output data set.
In this example, the variable Salary is not present in the Program Data Vector (PDV).
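The two approaches can be contrasted with a data set that ships with SAS. The sketch below substitutes SASHELP.CLASS for the slides' LEARN.SURVEY, which is an assumption made so that the code runs anywhere; the BMI calculation is only there to show that Weight remains usable.

```sas
/* DROP= data set option: Weight never enters the PDV */
data opt_drop;
   set sashelp.class(drop=Weight);
run;

/* DROP statement: Weight is read, can be used, then is left out */
data stmt_drop;
   set sashelp.class;
   BMI = (Weight / Height**2) * 703;   /* Weight is still available here */
   drop Weight;
run;
```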
Creating More Than One Subset Data Set in One DATA Step :
You can create multiple SAS data sets from one input data set (something that SQL
cannot do).
data males females;
   set learn.survey;
   if gender = 'F' then output females;
   else if gender = 'M' then output males;
run;
Notice that you must name the data set following the OUTPUT statement.
If you do not, SAS outputs the observation to all the data sets listed in the
DATA statement.
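The effect of leaving the data set name off the OUTPUT statement can be seen with a short experiment. The sketch below uses SASHELP.CLASS, where the variable is called Sex, in place of LEARN.SURVEY; the data set names are assumptions made for this illustration.

```sas
/* Named OUTPUT: each observation goes to exactly one data set */
data males females;
   set sashelp.class;
   if Sex = 'F' then output females;
   else if Sex = 'M' then output males;
run;

/* Bare OUTPUT: every observation goes to both data sets */
data copy_a copy_b;
   set sashelp.class;
   output;
run;
```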
Adding Observations to a SAS Data Set :
Suppose you want to create a single data set from several similar data sets. You can list as many data sets as you want on a
SET statement and SAS will add all the observations together to form a single data set.
Each of the data sets contains the same variables.
data one_two;
   set one two;
run;
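A self-contained version of this step needs two small input data sets. In the sketch below, the variables ID and Score and their values are made up for illustration.

```sas
data one;
   input ID Score;
datalines;
1 90
2 85
;

data two;
   input ID Score;
datalines;
3 70
;

/* All observations from ONE, followed by all observations from TWO */
data one_two;
   set one two;
run;   /* 3 observations */
```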
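To combine this single value with every observation in the Blood data set, you execute a SET statement conditionally. Here’s
how it works.
data percent;
   set learn.blood(keep=Subject Chol);
   if _n_ = 1 then set means;
   PerChol = Chol / AveChol;
   format PerChol percent8.;
run;
The second SET statement executes only on the first iteration of the DATA step. Because variables read with a SET
statement are retained, the value of AveChol remains available for every observation.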
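The whole pattern, including the step that creates the one-observation data set, can be sketched with SASHELP.CLASS. The use of Height and the names AveHeight and PerHeight are assumptions standing in for Chol, AveChol, and PerChol.

```sas
/* Step 1: a one-observation data set holding the mean */
proc means data=sashelp.class noprint;
   var Height;
   output out=means(keep=AveHeight) mean=AveHeight;
run;

/* Step 2: read the mean once and use it for every observation */
data percent;
   set sashelp.class(keep=Name Height);
   if _n_ = 1 then set means;        /* AveHeight is retained afterwards */
   PerHeight = Height / AveHeight;
   format PerHeight percent8.;
run;
```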
You can create a character variable based on the numeric value of SS in the Division1
data set or you can create a numeric variable based on the character variable in the
Division2 data set.
Because the two data sets use the same name for the BY variable but store it as different types,
first you have to rename this variable and then create a new character variable with the variable
name of SS.
The SSN11. format is a built-in SAS format that prints leading 0s and adds dashes as
required for Social Security numbers.
data division1c;
   set division1(rename=(SS = NumSS));
   SS = put(NumSS,ssn11.);
   drop NumSS;
run;
data both_divisions;
   ***Note: Both data sets already in order of BY variable;
   merge division1c division2;
   by SS;
run;
Here is the output of the merge : ------
OUTPUT :