SAS Procedures
Procedures
PROC CHART
PROC CONTENTS
PROC CORR
PROC FORMAT
PROC FREQ
PROC MEANS
PROC PLOT
PROC PRINT
PROC REG
PROC UNIVARIATE
* Any syntax enclosed in braces {} is optional in the procedure and may be omitted.
PROC CHART
Application: Graphing data in histograms (bar charts), pie graphs and star charts
Syntax:
proc chart data=filename;
  {title 'Title of Output';}
  {by variables;}
  vbar/hbar/pie/star variables {/ options};
run;
Chart Types:
  VBAR: vertical bar graph
  HBAR: horizontal bar graph
  PIE: pie graph
  STAR: star graph (similar to a vertical bar chart, but with bars radiating from a central point)
Common Chart Options:
  TYPE= : specifies what the bar or section of the chart represents
    FREQ: frequency of a value in the data (default)
    MEAN: mean of the sumvar= variable for observations with the bar's value
    PERCENT: percentage of observations with the bar's value
    SUM: sum of the sumvar= variable for observations with the bar's value
    CFREQ: value frequency plus all previous frequencies (cumulative frequency)
    CPERCENT: value percentage plus all previous percentages (cumulative percentage)
  SUMVAR= variable: specifies the variable used to calculate sums and means
  DISCRETE: used to specify discrete variables, rather than continuous
  GROUP= variable: displays values by group variable (e.g. vbar agegrp / group=sex)
  MIDPOINTS= : defines the midpoints for ranges of values. If not specified, SAS will pick the midpoints.
Discussion: The mainframe version of SAS doesn't have the graphics capabilities of Windows, so the graphing is very crude. I wouldn't use it for presentations, but it is sufficient to get a feel for the data.
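As an illustration of the options above, here is a minimal sketch. It assumes the sample dataset sashelp.class (variables sex, age and height) that ships with SAS is available; the titles are made up for the example.

```sas
/* Vertical bar chart: one bar per sex, bar height = mean of height */
proc chart data=sashelp.class;
  title 'Mean Height by Sex';
  vbar sex / type=mean sumvar=height;
run;

/* Frequency of each age, treating age as discrete rather than continuous */
proc chart data=sashelp.class;
  title 'Number of Students at Each Age';
  hbar age / discrete;
run;
```

Without the discrete option in the second step, SAS would choose midpoints for age rather than giving each age its own bar.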
27-April-2000 SAS Procedures Workshop
PROC CONTENTS
Application: Describing the contents of a SAS dataset
Syntax:
proc contents data=filename;
run;
Discussion: This is a very simple, yet very useful procedure. You should run proc contents on a new dataset to familiarize yourself with it before you begin to work with it.
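For example, a first look at an unfamiliar dataset might be sketched as follows, assuming the sample dataset sashelp.class is available:

```sas
/* List the variables, their types, lengths, formats and labels */
proc contents data=sashelp.class;
run;
```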
PROC CORR
Application: Computing the correlation between variables
Syntax:
proc corr {options} data=filename {outp=/outs=/outk=/outh=newfile};
  {title 'Title of Output';}
  {by variables;}
  var variables;
  {with variables;}
run;
Procedure Options:
  with: used to obtain correlations for specific combinations of variables; each var variable is correlated with each with variable. If no with statement is given, all pairs of var variables are correlated.
  outp= / outs= / outk= / outh= : options on the proc corr statement that output a dataset with Pearson, Spearman, Kendall or Hoeffding statistics respectively.
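A minimal sketch of these options, assuming the sample dataset sashelp.class is available; the output dataset name work.corr_out is a made-up placeholder:

```sas
/* Correlate height and weight with age only, and save the
   Pearson statistics to a dataset (outp= goes on the proc statement) */
proc corr data=sashelp.class outp=work.corr_out;
  title 'Height and Weight against Age';
  var height weight;
  with age;
run;
```

Dropping the with statement would instead produce the full height-by-weight correlation matrix.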
PROC FORMAT
Application: Assigning labels to specific values
Syntax:
* Formats must be defined before any step that uses them; placing this procedure immediately after the libname statements ensures this.
proc format {library=libname};
  value numberfmt low-1='Less than 2'
                  2='Two'
                  3='Three'
                  4-high='Four or more';
  value $charfmt 'A'='Category A'
                 'B'='Category B';
run;
* To create a permanent format, you must include the library=libname option.
To Assign a Format:
format variable numberfmt./$charfmt.;
(use a numeric format such as numberfmt. for a numeric variable and a character format such as $charfmt. for a character variable)
Discussion: This procedure makes variables more descriptive without recoding them and without taking up extra disk space on long text variables.
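Defining and then assigning formats can be sketched as below. This assumes the sample dataset sashelp.class (character variable sex, integer variable age); the format names agefmt and $sexfmt and the age bands are invented for the example.

```sas
/* Define a numeric format for age bands and a character format for sex */
proc format;
  value agefmt low-12  = '12 or younger'
               13-14   = '13 to 14'
               15-high = '15 or older';
  value $sexfmt 'F' = 'Female'
                'M' = 'Male';
run;

/* Apply the formats in a later step; the stored values are unchanged */
proc freq data=sashelp.class;
  tables sex*age / list;
  format age agefmt. sex $sexfmt.;
run;
```

Because the format is applied in the procedure rather than stored in the data, the same variable can be grouped differently in another step by assigning a different format.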
PROC FREQ
Application: Calculating a frequency table for the values of a numeric or character variable, or the unique combinations of two or more variables in a dataset.
Syntax:
proc freq data=filename;
  {title 'Title of Output';}
  {by variables;}
  {weight variable;}
  tables variable{*variable} {/ list noprint out=newfile};
  {format variable numberfmt./$charfmt.;}
run;
Tables Options: The frequencies of unique combinations of two or more variables can be calculated by listing the variables separated by an asterisk (*).
  LIST: prints the frequency table as a list, rather than a cross-tabulation table.
  NOPRINT: suppresses the output of the table. Useful when generating a large output dataset.
  OUT=newfile: outputs the frequency table to a SAS dataset. The new dataset includes the variables listed in the tables statement, as well as COUNT and PERCENT variables for each unique value/combination.
weight Statement: Normally, each observation contributes 1 to the frequency count. With the weight statement, each observation instead contributes the value of its weight variable. Only one variable can be specified in the weight statement.
Discussion: Proc freq is the most useful procedure for generating a count of the unique values of a variable (e.g. the number of males and females in a dataset, or a count of sex by age group).
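The count of sex by age mentioned above might be sketched as follows, assuming the sample dataset sashelp.class; the output dataset name work.sex_age_counts is a placeholder.

```sas
/* Count each sex-by-age combination, suppress the printed table,
   and save the counts to a dataset */
proc freq data=sashelp.class;
  tables sex*age / list noprint out=work.sex_age_counts;
run;

/* The output dataset holds sex, age, COUNT and PERCENT */
proc print data=work.sex_age_counts;
run;
```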
PROC MEANS
Application: Calculating statistics (means, sums, std) for numeric variables.
Syntax:
proc means statistics {nway noprint} data=filename;
  {title 'Title of Output';}
  {class variables;}
  {by variables;}
  {weight variable;}
  var variables;
  {format variable numberfmt./$charfmt.;}
  {output out=newfile statistic1=var1 statistic2=var2;}
run;
Common Statistics:
  N: number of observations with non-missing values
  NMISS: number of observations with missing values
  MEAN: mean of non-missing values
  SUM: sum of non-missing values
  MIN: minimum non-missing value
  MAX: maximum non-missing value
  STD: standard deviation
  STDERR: standard error of the mean
  T: T-test value for the hypothesis mean=0
  PRT: probability of a greater absolute value for the T-test value
BY vs. CLASS: by and class statements both have the effect of generating statistics for analysis variables grouped by the by or class variables. The key differences are the need to sort the dataset before using the by statement and the extra memory used by the class statement. Unless you are grouping the analysis into a large number of levels, it is easiest to use the class statement.
Outputting Datasets: When outputting datasets with a class statement, include the nway option in the proc means statement if you want only the rows for the full combination of class variables; without it, the output dataset also contains overall and subtotal rows (identified by the _TYPE_ variable). To output the statistics generated by proc means, use the output statement to assign a new dataset name and to select the statistics to be output. Not all statistics generated by proc means need to be included in the output dataset. The new variable names listed after each statistic= are matched, in order, to the analysis variables in the var statement.
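To output the sum and mean of two analysis variables, use the syntax:
output out=newfile sum=sumvar1 sumvar2 mean=meanvar1 meanvar2;
(sumvar1, sumvar2, meanvar1 and meanvar2 are placeholder names for the new variables)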
weight Statement: The weight statement is used to generate weighted statistics, where each value x_i contributes w_i*x_i (for example, the weighted mean is the sum of w_i*x_i divided by the sum of w_i). Only one variable can be specified in the weight statement.
Discussion: Proc means is the single most useful SAS procedure. It allows you to generate all the commonly needed statistics without needing to sort the dataset before generating grouped analyses. It also allows you to (relatively) easily generate a dataset with summary statistics.
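A grouped summary dataset of the kind described above can be sketched like this. It assumes the sample dataset sashelp.class; work.class_stats and the sum_/mean_ variable names are placeholders chosen for the example.

```sas
/* Sum and mean of height and weight for each sex; no sort is needed
   because class (not by) is used. nway keeps only the per-sex rows. */
proc means data=sashelp.class nway noprint;
  class sex;
  var height weight;
  output out=work.class_stats
         sum=sum_height sum_weight
         mean=mean_height mean_weight;
run;

/* One row per sex, plus the automatic _TYPE_ and _FREQ_ variables */
proc print data=work.class_stats noobs;
run;
```

Removing nway would add an extra row (with _TYPE_ = 0) holding the statistics for all observations combined.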
PROC PLOT
Application: Plotting one variable against another (e.g. plotting a variable over time)
Syntax:
proc plot data=filename;
  {title 'Title of Output';}
  {by variables;}
  plot varY*varX{='+'} {varY2*varX2} {/ options};
run;
* varY is plotted on the vertical axis; varX is plotted on the horizontal axis
Common Proc Plot Options:
  OVERLAY: overlays all plots on one set of axes
  HAXIS= / VAXIS= : specifies values for the horizontal/vertical axis (e.g. haxis = 0 to 100 by 10)
  HZERO / VZERO: starts the horizontal/vertical axis at zero
  HREVERSE / VREVERSE: reverses the order of values on the horizontal/vertical axis
  HREF= / VREF= : specifies values for horizontal/vertical reference lines
Discussion: Though the graphics capabilities are limited, proc plot allows you to examine correlations between variables or trends over time.
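A minimal sketch of a scatter plot with a custom plotting symbol and horizontal axis, assuming the sample dataset sashelp.class (heights in that dataset fall between 50 and 75):

```sas
/* Weight on the vertical axis, height on the horizontal axis,
   each point drawn with a plus sign */
proc plot data=sashelp.class;
  title 'Weight against Height';
  plot weight*height='+' / haxis=50 to 75 by 5;
run;
```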
PROC PRINT
Application: Viewing the observations in a SAS dataset
Syntax:
proc print {options} data=filename{(obs=100)};
  {title 'Title of Output';}
  {by variables;}
  {id variables;}
  var variables;
  {sum variables;}
  {sumby byvar;}
  {pageby byvar;}
  {format variable numberfmt./$charfmt.;}
run;
Proc Print Options:
  LABEL: prints variable labels as column headings
  NOOBS: suppresses the observation number in the output
  UNIFORM: formats all pages of output uniformly (i.e. lines up columns across pages)
  ROUND: rounds variables to 2 decimal places (values are rounded before summing)
  OBS= : a dataset option, given in parentheses after the dataset name, that specifies the number of observations to print. Useful for printing a subset.
by and id variables: Specifying a by variable causes proc print to separate the output into groups. If an id variable is specified, proc print suppresses the observation number and prints the id variable(s) at the beginning of each line of output. If the by and id variables are the same, proc print formats the output by printing the by variable at the beginning of the group and leaving the remainder of the column blank. (This is for output purposes only; it does not affect the data.)
sum and sumby variables: If a sum variable is specified, proc print totals the values of that variable. If a sumby variable is specified, the sum variable is totalled each time the value of the sumby variable changes. The sumby variable must also be specified in the by statement. sum must be a numeric variable, but sumby may be character or numeric.
pageby variables: If a pageby variable is specified, proc print begins the output on a new page each time the value of the pageby variable changes. The pageby variable must also be specified in the by statement.
Discussion: Proc print allows you to print data generated by another SAS procedure with more control over the output variables and layout.
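The by, id and sum statements might be combined as in the sketch below. It assumes the sample dataset sashelp.class; work.class_sorted is a placeholder name, and the data are sorted first because a by statement requires sorted input.

```sas
/* Sort a copy of the data so that the by statement can be used */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

/* Print the first 10 observations grouped by sex, with sex shown once
   per group (by and id are the same) and weight totalled per group */
proc print data=work.class_sorted(obs=10);
  title 'First 10 Students by Sex';
  by sex;
  id sex;
  var name age height weight;
  sum weight;
run;
```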
PROC REG
Application: Fitting linear regression models by least-squares method
Syntax:
proc reg {options} data=filename;
  {label:} model dependent = independents {/ options};
  {weight variable;}
  {plot dependent*(independents);}
run;
Common Proc Reg Model Options:
  SELECTION= : specifies the method for selecting variables to be included in the model
    NONE: all variables are included in the model
    FORWARD: starts with no variables and adds the variable with the next largest F-value
    BACKWARD: starts with all variables and removes the variable with the smallest F-value
    STEPWISE: similar to FORWARD, but variables don't necessarily stay once added
    MAXR: tries to find the model with maximum R-squared for 1 variable, 2 variables, etc.
    MINR: tries to find the maximum R-squared by removing variables with the smallest contribution to R-squared
    RSQUARE: tries to find subsets of independent variables with the best R-squared
    ADJRSQ: similar to RSQUARE, but uses adjusted R-squared
  BEST= : specifies the maximum number of subset models reported (used with the RSQUARE and ADJRSQ selection methods)
  P: computes predicted values
  CLI: computes a 95% confidence interval for an individual predicted value
  DW: computes the Durbin-Watson statistic
Discussion: This is the most general regression procedure. There are other regression procedures for more specific models (e.g. GLM, logit, probit, etc.). Use the model selection options with caution: models should be based on hypotheses rather than automated selection criteria.
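A minimal sketch of a hypothesis-driven model (no automated selection) with a few of the options above. It assumes the sample dataset sashelp.class and that SAS/STAT, which provides proc reg, is installed; the label full is arbitrary.

```sas
/* Regress weight on height and age; print predicted values,
   95% limits for individual predictions, and the Durbin-Watson statistic */
proc reg data=sashelp.class;
  title 'Weight Regressed on Height and Age';
  full: model weight = height age / p cli dw;
run;
quit;
```

The quit statement ends the procedure, since proc reg otherwise stays active waiting for further model statements.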
PROC UNIVARIATE
Application: Calculating descriptive statistics, particularly details on distribution
Syntax:
proc univariate {options} data=filename;
  {by variables;}
  {weight variable;}
  var variables;
  {output out=newfile statistics;}
run;
Proc Univariate Options:
  FREQ: generates a frequency table for variables specified in the var statement
  NORMAL: tests if variables specified in the var statement are normally distributed
  PLOT: generates a stem-and-leaf plot, box plot and normal probability plot for variables specified in the var statement
Outputting a dataset: Though proc univariate can generate a wide range of statistics, it will only display a fixed set of results. To view custom statistics, it is necessary to output the results to a new dataset. This is most common when generating percentile measures. To generate an output dataset, use the syntax:
output out=newfile {statistics} {PCTLPRE=prefix} {PCTLPTS=percentiles};
Common Output Statistics:
  NOBS: number of observations
  N: number of non-missing values in observations
  NMISS: number of missing values in observations
  MEAN: mean
  STDMEAN: standard error of the mean
  SUM: sum
  STD: standard deviation
  VAR: variance
  MEDIAN: median
  MODE: most frequent value (if there is more than one mode, the smallest value)
  T: Student's T-value
  PROBT: probability of a greater absolute value for Student's T-value
  NORMAL: test statistic for normality
  PCTLPTS: percentile points; 0 <= PCTLPTS <= 100
  PCTLPRE: percentile variable prefix
Example: Outputting Quintile Statistics: To generate quintiles, use the syntax:
output out=newfile PCTLPRE=q PCTLPTS=0 20 40 60 80 100;
This will generate a dataset with the variables q0 q20 q40 q60 q80 q100.
* Note that the combined length of the PCTLPRE prefix and each PCTLPTS value cannot exceed 8 characters.
Discussion: Proc univariate is most useful for generating distributional statistics about a variable. In older SAS releases it is the only procedure that can generate median or percentile statistics (later releases of proc means also accept MEDIAN and percentile keywords such as P25). Its main drawback is the need to create a new dataset to obtain custom percentiles.
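The quintile example can be written out as a complete step, sketched here under the assumption that the sample dataset sashelp.class is available; work.height_quintiles is a placeholder name.

```sas
/* Compute quintile cut points of height and save them to a dataset;
   noprint suppresses the standard printed output */
proc univariate data=sashelp.class noprint;
  var height;
  output out=work.height_quintiles
         pctlpre=q
         pctlpts=0 20 40 60 80 100;
run;

/* One row containing q0 q20 q40 q60 q80 q100 */
proc print data=work.height_quintiles noobs;
run;
```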