The super learner macro was designed to give useful error messages when errors can be anticipated ahead of time. Generally, errors come down to typos in the macro call (e.g. typing logti rather than logit in the library) or a missing comma between macro parameters. For troubleshooting errors, try the steps below:
Troubleshooting
SAS ERROR: Procedure OPTMODEL not found
Use one of the "B" methods, e.g. BNNLS rather than NNLS. (PROC OPTMODEL is part of SAS/OR, which your SAS installation may not license; the "B" methods avoid it.)
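As a sketch, the fix is a one-parameter change in the macro call. The macro name and all parameters other than method are illustrative assumptions here, not the exact signature:

```sas
/* Hypothetical call sketch: only method= matters for this fix;
   the other parameter names and values are illustrative. */

/* Original call, which needs PROC OPTMODEL (SAS/OR): */
%SuperLearner(Y=death, X=age sex bmi, indata=mydata, outdata=sl_out,
              library=glm lasso cart, method=NNLS);

/* Replacement: prepend "B" to the method to avoid PROC OPTMODEL. */
%SuperLearner(Y=death, X=age sex bmi, indata=mydata, outdata=sl_out,
              library=glm lasso cart, method=BNNLS);
```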
Super learner gives an error with "super learner" in the note that accompanies it (check the SAS log)
Most of the macro's error text is written to explain the problem and suggest a solution. Follow those instructions and/or check for spelling errors or omitted commas in the call to the macro.
The macro is taking a long time to run
Is the dataset large? Super learner uses multiple, possibly slow, algorithms plus cross validation, which can be computationally intensive. For troubleshooting, try setting folds to a low number (e.g. 2) and keeping the library limited to one or two members (e.g. glm and cart). If it is still much slower than expected (e.g. it takes longer than fitting a glm 6 or 7 times), try one of the steps below. If it speeds up as expected, return folds to the default of 10, add the other library members back in, and wait patiently for results. Some potentially very slow learners are loess and bspline.
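A minimal troubleshooting call might look like the sketch below. The macro name and parameters other than folds and library are assumptions for illustration:

```sas
/* Hypothetical quick timing check: 2 folds, 2 fast learners. */
%SuperLearner(Y=death, X=age sex bmi, indata=mydata, outdata=sl_out,
              library=glm cart, folds=2, method=NNLS);

/* If this completes in roughly the time of a handful of glm fits,
   restore folds=10 and add library members back one at a time. */
```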
Do you have a continuous variable with only a few distinct values? Use continuous_predictors, binary_predictors, ordinal_predictors, and nominal_predictors, rather than X, to specify your predictors. The macro will try to guess variable types, but it can make mistakes if you have many duplicate values of continuous variables and/or if the dataset is sorted by a continuous predictor. In such cases, the macro will think you have a nominal variable and proceed as though every value of the continuous variable should be treated as its own class; this can be very, very slow and will yield poor answers.
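Specifying predictor types explicitly might look like the following sketch. The variable names and parameters other than the four predictor-type parameters are illustrative assumptions:

```sas
/* Hypothetical call: declare predictor types explicitly instead of X=,
   so the macro never has to guess which variables are continuous. */
%SuperLearner(Y=death,
              binary_predictors=sex smoker,
              ordinal_predictors=stage,
              nominal_predictors=region,
              continuous_predictors=age bmi,
              indata=mydata, outdata=sl_out,
              library=glm cart, method=NNLS);
```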
Errors related to running out of memory
Did this happen before super learner completed all of the folds (e.g. at 3 of 10 folds; check the SAS log)? Try reducing the number of learners until you have identified which one caused the error. Some a priori candidates are the R-based learners (since these run in memory) and gam.
Did this happen after reaching the total number of folds (e.g. after 10 of 10 folds; check the SAS log)? This likely means you have a large dataset. Try changing the method parameter by prepending a "B" (e.g. NNLS would become BNNLS). The default optimization routine in super learner can run out of memory, but it has been tested more thoroughly; the "B" (big) methods can handle much larger datasets.
Errors related to the Newton-Raphson algorithm or optimization failing
This can happen with some learners in some datasets. Try reducing the number of learners until you have identified which one caused the error. If you can, find a suitable, similar replacement algorithm (e.g. if this happens with gam, try gampl or bspline). You can also change the predictors you are using; this generally occurs because you are attempting to make predictions in a very sparse region of the data.
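Swapping a problem learner for a similar one is just a change to the library parameter, as in this sketch (macro name and other parameters are illustrative assumptions):

```sas
/* Hypothetical: gam triggers an optimization failure in this dataset. */
%SuperLearner(Y=death, X=age sex bmi, indata=mydata, outdata=sl_out,
              library=glm gam cart, method=NNLS);

/* Replacement with gampl, a similar penalized-likelihood GAM fitter: */
%SuperLearner(Y=death, X=age sex bmi, indata=mydata, outdata=sl_out,
              library=glm gampl cart, method=NNLS);
```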
Other puzzling errors? Send a note to akeil@unc.edu with as much detail as possible, including the code you used to call the macro, the text from the SAS log, and any output produced by the macro.