SuperLearnerMacro

Usage details

The SAS script containing the SuperLearner macro actually contains four main macros: %SuperLearner, %_SuperLearner, %CVSuperLearner, and %_CVSuperLearner.

0. Installing the macro

Option 1 - run the following two lines in SAS (requires an internet connection in each SAS session in which super learner is used):

FILENAME slgh URL "https://cirl-unc.github.io/SuperLearnerMacro/super_learner_macro.sas";
%INCLUDE slgh;

Option 2 - install from release version (requires initial internet connection):

Navigate to the release page of the super learner macro here
Download the zip/tar.gz file to your computer and open/unzip the file - you should see a folder called SuperLearnerMacro-XXXX, where XXXX is the release number
Run the following two lines in SAS (replacing appropriate path names):

FILENAME slgh "C:/path/to/SuperLearnerMacro-XXXX/super_learner_macro.sas";
%INCLUDE slgh;
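
After either install option, one way to check that the macros were compiled in the current session is the built-in %SYSMACEXIST macro function. This is only a sketch: it repeats the Option 1 lines from above, and it assumes the macros are compiled into the WORK macro catalog, which is the usual behavior of %INCLUDE but is not stated in this document.

```sas
/* Load the macro definitions (Option 1 from above) */
FILENAME slgh URL "https://cirl-unc.github.io/SuperLearnerMacro/super_learner_macro.sas";
%INCLUDE slgh;

/* %SYSMACEXIST resolves to 1 if a compiled macro with that name exists, 0 otherwise */
%PUT NOTE: SuperLearner compiled = %SYSMACEXIST(SuperLearner);
%PUT NOTE: _SuperLearner compiled = %SYSMACEXIST(_SuperLearner);
%PUT NOTE: CVSuperLearner compiled = %SYSMACEXIST(CVSuperLearner);
%PUT NOTE: _CVSuperLearner compiled = %SYSMACEXIST(_CVSuperLearner);
```

A value of 0 for any of the four names suggests the %INCLUDE step did not complete, for example because the file path or the internet connection was not available.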

Some examples of using the %SuperLearner macro are available here

1. Using %SuperLearner macro

Stacking is based on what Wolpert refers to as a set of ‘level-0’ models and a ‘level-1’ model, indexed by parameters $\mathbf{\beta}_m$ and $\mathbf{\alpha}$, respectively, in some study sample S, where

Level-0: $\hat{Y}_{m}=f_m(\mathbf{x};\mathbf{\beta}_m,S)\mbox{ for }m\in\{1,\ldots,M\}$

Level-1: $\hat{Y}_{sl}=f_{sl}(\hat{\mathbf{Y}}_{\bar{m}};\mathbf{\alpha},S)$
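
As a concrete reading of this notation, suppose the level-1 model is a weighted sum of the level-0 predictions. This linear form is an assumption made here for illustration; the non-negativity and sum-to-one conditions correspond to the NN and CC constraints described under the method parameter below.

$$\hat{Y}_{sl}=\sum_{m=1}^{M}\alpha_m\hat{Y}_m,\qquad \alpha_m\ge 0,\qquad \sum_{m=1}^{M}\alpha_m=1$$

For example, with $M=3$ learners, hypothetical coefficients $\mathbf{\alpha}=(0.6, 0.4, 0)$, and level-0 predictions $(10, 12, 20)$ for one observation, the super learner prediction would be $0.6\times 10+0.4\times 12+0\times 20=10.8$. A learner with a coefficient of 0 contributes nothing to the final prediction.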

The parameterization of the macro is based loosely on this notation. Each level-0 model is referred to as a ‘learner’ in the super learner library. A call to super learner is structured as follows:

%SuperLearner(
 Y=,
 X=,
 library=, 
 indata=, 
 preddata=, 
 outdata=sl_out,
 dist=GAUSSIAN,
 method=NNLS,
 id=,
 by=,
 intvars=,
 binary_predictors=,
 ordinal_predictors=,
 nominal_predictors=,
 continuous_predictors=,
 weight=, 
 trtstrat=false, 
 folds=10 
);
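
As a minimal sketch of how the template above might be filled in, the following simulates a small dataset and calls the macro with a continuous outcome. The dataset name train and the variables y, x1, x2, and z are hypothetical; the learner names glm, lasso, and cart are the ones used as an example in the library description below, and it is assumed here (not verified) that each supports dist=GAUSSIAN.

```sas
/* Simulate a small training dataset (hypothetical names: train, y, x1, x2, z) */
DATA train;
  CALL STREAMINIT(1234);
  DO i = 1 TO 500;
    x1 = RAND('NORMAL');
    x2 = RAND('UNIFORM');
    z  = RAND('BERNOULLI', 0.4);
    y  = 1 + 0.5*x1 - x2 + 0.8*z + RAND('NORMAL');
    OUTPUT;
  END;
  DROP i;
RUN;

/* Parameters that are left out keep the defaults shown in the template */
%SuperLearner(
  Y=y,
  X=x1 x2 z,
  library=glm lasso cart,
  indata=train,
  outdata=sl_out,
  dist=GAUSSIAN,
  method=NNLS,
  folds=5
);

/* List the variables in the output dataset to see what the macro added */
PROC CONTENTS DATA=sl_out;
RUN;
```

PROC CONTENTS is used rather than naming specific prediction variables, because the names of the prediction columns are not documented in this section.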

Macro parameters include the following:

Y: [value = variable name] the target variable, or outcome
X: [value = blank, or a space separated list of variable names] predictors of Y on the right side of the level-0 models. Note that this is a convenience parameter that stands in for the individual [coding]_predictors macro variables: the macro will guess whether each predictor in X is continuous, categorical, or binary. (OPTIONAL, but at least one of X or the [coding]_predictors parameters - binary_predictors, ordinal_predictors, nominal_predictors, continuous_predictors - must be specified, as described below.) If X is specified and any one of the [coding]_predictors has a value, the macro will generate an error.
library: [value = a space separated list of learners] the names of the m level-0 models (e.g. glm lasso cart). A single learner can be used here if you only wish to know the cross-validated expected loss (e.g. mean-squared error). See all available default learners here and how to construct new learners here
indata: [value = an existing dataset name] the dataset used for analysis that contains Y and all predictors (and weight variables, if needed)
preddata: [OPTIONAL value = a dataset name] the validation dataset: a dataset that contains all predictors (and possibly Y) and is not used in model fitting, but in which predictions from each learner and from super learner are made
outdata: [value = a dataset name; default: sl_out] an output dataset that will contain all predictions as well as all variables and observations in the indata and preddata datasets
dist: [value = one of: GAUSSIAN, BERNOULLI; default GAUSSIAN] Super learner can be used to make predictions of a continuous (assumed gaussian in some learners) or a binary variable. Use GAUSSIAN for all continuous variables and BERNOULLI for all binary variables. Nominal/categorical variables are currently not supported.
method: [value = one of: NNLS, CCLS, OLS, NNLOGLIK, CCLOGLIK, LOGLIK, CCLAE, NNLAE, LAE, CCRIDGE, NNRIDGE, RIDGE, BNNLS, BCCLS, BOLS, BNNLOGLIK, BCCLOGLIK, BLOGLIK, BCCLAE, BNNLAE, BLAE, BCCRIDGE, BNNRIDGE, BRIDGE, BCCLASSO, BNNLASSO, BLASSO; default NNLS] the method used to estimate the $\mathbf{\alpha}$ coefficients of the level-1 model. Methods are indexed by [prefix][suffix], where the prefix sets constraints and the suffix sets the mean model form
- prefixes: NN, CC, BNN, BCC, B, [none], where
  - B implies “big” and combines with other prefixes. Non-B methods are fit via optimization in the OPTMODEL procedure. This approach is sufficient and robust for many problems, but OPTMODEL may fail due to memory limitations with large datasets. The “B” methods are fit via the HPNLMODEL procedure, which does not have the same memory constraints, but may be less robust for routine use. If you get an error about running out of memory when running the macro, or are using training data with > 500k observations, try switching to a “B” version of the method (e.g. BNNLS rather than NNLS). B methods may eventually become the default if robustness is established. In some difficult models (e.g. with method=LAE), B methods may be faster.
  - NN implies non-negative coefficients that are standardized after fitting to sum to 1.
  - CC implies a convexity constraint where the super learner fit is subject to a constraint that forces the coefficients to fall in [0,1] and sum to 1.0.
  - No prefix implies no constraints (which results in some loss of asymptotic properties such as the oracle property). Note: OLS violates this naming convention, but LS will also be accepted and is equivalent to OLS.
- suffixes: LS, RIDGE, LOGLIK, LAE, LASSO, where
  - LS methods use an L2 loss function (least squares - OLS, NNLS, CCLS, BOLS, BNNLS, BCCLS). This method selects the super learner fit by minimizing the (cross-validated) mean-squared error (GAUSSIAN distribution) or the Brier score (BERNOULLI distribution).
  - RIDGE methods use an L2 penalized L2 loss function (ridge regression - RIDGE, NNRIDGE, CCRIDGE, BRIDGE, BNNRIDGE, BCCRIDGE). Penalization level can be modified in the %_SuperLearner macro via the slridgepen parameter (default 0.3).
  - LOGLIK methods use a loss function corresponding to the binomial likelihood with a logit link function (logistic regression - LOGLIK, NNLOGLIK, CCLOGLIK, BLOGLIK, BNNLOGLIK, BCCLOGLIK)
  - LAE methods [experimental] use an L1 loss function (least absolute error, or median regression - LAE, NNLAE, CCLAE, BLAE, BNNLAE, BCCLAE). This loss will not penalize outliers as much as L2 methods, but it is non-differentiable at the minimum, which may cause computational difficulties.
  - LASSO methods use an L1 penalized L2 loss function (LASSO regression - B methods only: BLASSO, BNNLASSO, BCCLASSO). Penalization level can be modified in the %_SuperLearner macro via the slridgepen parameter (default 0.3).
id: [OPTIONAL value = variable name] a variable that uniquely identifies clusters or individuals within a dataset where there are possibly multiple records per cluster/individual (e.g. discrete hazard estimation).
by: [OPTIONAL value = variable name] a by variable in the usual SAS usage. Separate super learner fits will be made for each level of the by variable (only one by variable is allowed, unlike typical “by” variables).
intvars: [OPTIONAL value = variable name] an intervention variable that is included in the list of predictors. This is a convenience function that will make separate predictions for the intvars variable at 1 or 0 (with all other predictors remaining at their observed levels)
binary_predictors: [value = blank, or a space separated list of variable names] advanced specification of predictors: a space separated list of binary predictors (OPTIONAL but at least one of the X or [coding]_predictors parameters must be specified)
ordinal_predictors: [value = blank, or a space separated list of variable names] advanced specification of predictors: a space separated list of ordinal predictors (OPTIONAL but at least one of the X or [coding]_predictors parameters must be specified)
nominal_predictors: [value = blank, or a space separated list of variable names] advanced specification of predictors: a space separated list of nominal predictors (OPTIONAL but at least one of the X or [coding]_predictors parameters must be specified)
continuous_predictors: [value = blank, or a space separated list of variable names] advanced specification of predictors: a space separated list of continuous predictors (OPTIONAL but at least one of the X or [coding]_predictors parameters must be specified)
weight: [OPTIONAL value = a variable name] a variable containing weights representing the relative contribution of each observation to the fit (a.k.a. case weights). Not all learners will respect non-integer weights, so weights will either be ignored or truncated by some procedures.
trtstrat: [value = true, false; DEFAULT: false] convenience function. If this is set to true and intvars is specified, then all fits will be stratified by levels of intvars. Levels 0,1 only.
folds: [value = integer ; default: 10] number of cross-validation folds to use.
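
The parameters above can be combined as in the following sketch, which uses the [coding]_predictors parameters instead of X (specifying both generates an error), supplies a separate validation dataset through preddata, and names a 0/1 intervention variable through intvars. All dataset and variable names (train, valid, y, a, x1, x2, c) are hypothetical, and the learner names are again the ones given as an example under library.

```sas
/* Hypothetical data: a is a 0/1 intervention, x1 and x2 are continuous, c is nominal */
DATA train valid;
  CALL STREAMINIT(2024);
  DO i = 1 TO 1000;
    a  = RAND('BERNOULLI', 0.5);
    x1 = RAND('NORMAL');
    x2 = RAND('NORMAL');
    c  = RAND('TABLE', 0.3, 0.3, 0.4);
    y  = 2 + a + 0.5*x1 - 0.25*x2 + 0.3*(c = 2) + RAND('NORMAL');
    /* First 800 records are used for fitting, the rest only for prediction */
    IF i <= 800 THEN OUTPUT train;
    ELSE OUTPUT valid;
  END;
  DROP i;
RUN;

/* X is left unspecified because the predictor types are given explicitly */
%SuperLearner(
  Y=y,
  binary_predictors=a,
  continuous_predictors=x1 x2,
  nominal_predictors=c,
  library=glm lasso cart,
  indata=train,
  preddata=valid,
  outdata=sl_out,
  intvars=a,
  dist=GAUSSIAN,
  method=NNLS,
  folds=10
);
```

Because intvars=a is set, the output dataset should also hold predictions with a set to 1 and to 0 for every record, as described under intvars; the names of those prediction variables are not given in this section, so inspect sl_out (for example with PROC CONTENTS) to find them.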

2. %_SuperLearner macro

This is a version of the %SuperLearner macro for advanced users that may be somewhat faster due to reduced error checking, and offers finer level controls. If the %SuperLearner macro completes successfully, it will give some example code that can be run with %_SuperLearner. Of note, there is no checking or correction of parameter syntax, and parameter arguments are case-sensitive, so an argument that works in %SuperLearner may cause an error in %_SuperLearner.

One main difference is that %_SuperLearner will make no guesses about variable types for X, so use of the [coding]_predictors parameters is required for correct specification. See the source code for documentation of additional options.

3. %CVSuperLearner macro

This macro is used to estimate the cross-validated expected loss of super learner itself. It does not produce predictions! This gives an idea about whether super learner is the appropriate learner to use in a given scenario, and allows some choice between parameters of the super learner model, such as the method (e.g. NNLS vs. CCLS).

folds: [value = integer; default: 10] specifies two quantities (which can be individually specified in the %_CVSuperLearner macro):
1. slfolds: number of “inner folds” (the number of folds within each super learner fit); should only be different from cvslfolds in odd cases
2. cvslfolds: number of “outer folds” (the number of folds for cross-validating super learner); should only be different from slfolds in odd cases

Options repeated from %SuperLearner (see definitions given above)

Y, X, binary_predictors, ordinal_predictors, nominal_predictors, continuous_predictors, weight, indata, dist, library, method
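
A sketch of how %CVSuperLearner might be used to compare two level-1 methods on the same data follows. It uses only parameters listed above; the dataset train and its variables are hypothetical, and NNLS and OLS are chosen here simply as two methods from the list given under method. How the macro reports the cross-validated loss is not described in this section, so the sketch stops at the macro calls.

```sas
/* Hypothetical training data with a continuous outcome */
DATA train;
  CALL STREAMINIT(99);
  DO i = 1 TO 500;
    x1 = RAND('NORMAL');
    x2 = RAND('UNIFORM');
    z  = RAND('BERNOULLI', 0.4);
    y  = 1 + 0.5*x1 - x2 + 0.8*z + RAND('NORMAL');
    OUTPUT;
  END;
  DROP i;
RUN;

/* Cross-validated expected loss of super learner with non-negative coefficients */
%CVSuperLearner(
  Y=y,
  X=x1 x2 z,
  library=glm lasso cart,
  indata=train,
  dist=GAUSSIAN,
  method=NNLS,
  folds=5
);

/* Same library and data, but with unconstrained coefficients */
%CVSuperLearner(
  Y=y,
  X=x1 x2 z,
  library=glm lasso cart,
  indata=train,
  dist=GAUSSIAN,
  method=OLS,
  folds=5
);
```

With folds=5, both the inner and the outer cross-validation use 5 folds, so each call fits super learner 5 times, and each of those fits in turn cross-validates every learner in the library; run time grows accordingly.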

4. %_CVSuperLearner macro

This is a version of the %CVSuperLearner macro for advanced users that may be somewhat faster due to reduced error checking, and offers finer level controls. See the source code for further tuning options.

Getting errors?

See the Troubleshooting help

Acknowledgements

This work was only possible with valuable advice and beta testing from the following people: Stephen R Cole, Jessie K Edwards, Katie M O'Brien, Eric Polley, Marie Stoner, Jennifer Winston, and many others.

Super learner macro home page