Efficiently train, tune, and deploy custom ensembles using Amazon SageMaker

Artificial intelligence (AI) has become an important and popular topic in the technology community. As AI has evolved, we have seen different types of machine learning (ML) models emerge. One approach, known as ensemble modeling, has been rapidly gaining traction among data scientists and practitioners. In this post, we discuss what ensemble models are and why their usage can be beneficial. We then provide an example of how you can train, optimize, and deploy your custom ensembles using Amazon SageMaker.

Ensemble learning refers to the use of multiple learning models and algorithms to obtain more accurate predictions than any single learning algorithm could achieve alone. Ensembles have proven effective in diverse applications and learning settings, such as cybersecurity [1], fraud detection, remote sensing, predicting best next steps in financial decision-making, medical diagnosis, and even computer vision and natural language processing (NLP) tasks. We tend to categorize ensembles by the techniques used to train them, their composition, and the way they merge the different predictions into a single inference. These categories include:

Boosting – Trains multiple weak learners sequentially. Samples that earlier learners in the sequence predicted incorrectly are given a higher weight when passed to the next learner, so the sequence as a whole forms a stronger learner. Examples include AdaBoost, Gradient Boosting, and XGBoost.
Bagging – Trains multiple models independently (typically on different random samples of the data) and aggregates their predictions to reduce the variance of a single model. Examples include Random Forest and Extra Trees.
Stacking (blending) – Often uses heterogeneous models, where the predictions of each individual estimator are stacked together and used as input to a final estimator that handles the prediction. This final estimator’s training process often uses cross-validation.

There are multiple methods of combining the individual predictions into the single prediction that the ensemble finally produces: for example, a meta-estimator such as a linear learner, a voting method that makes a prediction based on majority voting across models for classification tasks, or ensemble averaging for regression.
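The two simplest combination rules, majority voting and averaging, can be sketched in a few lines of plain Python. This is a minimal illustration rather than code from this post's repository; the function names majority_vote and average_predictions are hypothetical.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels_per_model):
    """Return, for each sample, the label predicted by most models.

    labels_per_model holds one list of class labels per model, all the same length.
    Ties go to the label that was counted first.
    """
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*labels_per_model)]


def average_predictions(values_per_model):
    """Return, for each sample, the mean of the models' regression outputs."""
    return [mean(sample) for sample in zip(*values_per_model)]


# Three classifiers voting on three samples, and two regressors on two samples.
votes = majority_vote([["a", "b", "b"], ["a", "a", "b"], ["b", "a", "b"]])
averaged = average_predictions([[1.0, 2.0], [3.0, 4.0]])
print(votes, averaged)
```

A meta-estimator replaces these fixed rules with a model that learns how to weigh each member's prediction.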

Although several libraries and frameworks provide implementations of ensemble models, such as XGBoost, CatBoost, or scikit-learn’s random forest, in this post we focus on bringing your own models and using them as a stacking ensemble. However, instead of using dedicated resources for each model (dedicated training and tuning jobs and hosting endpoints per model), we train, tune, and deploy a custom ensemble (multiple models) using a single SageMaker training job and a single tuning job, and deploy to a single endpoint, thereby reducing possible cost and operational overhead.

BYOE: Bring your own ensemble

There are several ways to train and deploy heterogeneous ensemble models with SageMaker: you can train each model in a separate training job and optimize each model separately using Amazon SageMaker Automatic Model Tuning. When hosting these models, SageMaker provides various cost-effective ways to host multiple models on the same tenant infrastructure. Detailed deployment patterns for this kind of setting can be found in Model hosting patterns in Amazon SageMaker, Part 1: Common design patterns for building ML applications on Amazon SageMaker. These patterns include using multiple endpoints (one for each trained model), a single multi-model endpoint, or even a single multi-container endpoint where the containers can be invoked individually or chained in a pipeline. All these solutions include a meta-estimator (for example, in an AWS Lambda function) that invokes each model and implements the blending or voting function.

However, running multiple training jobs might introduce operational and cost overhead, especially if your ensemble requires training on the same data. Similarly, hosting different models on separate endpoints or containers and combining their prediction results for better accuracy requires multiple invocations, and therefore introduces additional management, cost, and monitoring efforts. For example, SageMaker supports ensemble ML models using Triton Inference Server, but this solution requires the models or model ensembles to be supported by the Triton backend. The customer also needs to invest extra effort to set up the Triton server and to learn how the different Triton backends work. Therefore, customers often prefer a more straightforward way to implement solutions, where they only need to send a single invocation to the endpoint and have the flexibility to control how the results are aggregated to generate the final output.

Solution overview

To address these concerns, we walk through an example of ensemble training using a single training job, optimizing the model’s hyperparameters, and deploying it using a single container to a serverless endpoint. We use two models for our ensemble stack: CatBoost and XGBoost (both of which are boosting ensembles). For our data, we use the diabetes dataset [2] from the scikit-learn library. It consists of 10 features (age, sex, body mass index, blood pressure, and six blood serum measurements), and our model predicts the disease progression 1 year after the baseline features were collected (a regression model).

The full code repository can be found on GitHub.

Train multiple models in a single SageMaker job

For training our models, we use SageMaker training jobs in Script mode. With Script mode, you can write custom training (and later inference) code while using SageMaker framework containers. Framework containers enable you to use ready-made environments managed by AWS that include all necessary configuration and modules. To demonstrate how you can customize a framework container, we use the pre-built SKLearn container, which doesn’t include the XGBoost and CatBoost packages. There are two options to add these packages: either extend the built-in container to install CatBoost and XGBoost (and then deploy as a custom container), or use the SageMaker training job script mode feature, which allows you to provide a requirements.txt file when creating the training estimator. The SageMaker training job installs the libraries listed in the requirements.txt file at run time. This way, you don’t need to manage your own Docker image repository, and you gain more flexibility for running training scripts that need additional Python packages.

The following code block shows the code we use to start the training. The entry_point parameter points to our training script. We also use two of the SageMaker SDK API’s compelling features:

First, we specify the local path to our source directory and dependencies in the source_dir and dependencies parameters, respectively. The SDK will compress and upload those directories to Amazon Simple Storage Service (Amazon S3) and SageMaker will make them available on the training instance under the working directory /opt/ml/code.
Second, we use the SDK SKLearn estimator object with our preferred Python and framework version, so that SageMaker will pull the corresponding container. We have also defined a custom training metric, validation:rmse, which will be emitted in the training logs and captured by SageMaker. Later, we use this metric as the objective metric in the tuning job.

from sagemaker.sklearn.estimator import SKLearn

# Passed to the training script as command-line arguments.
hyperparameters = {"num_round": 6, "max_depth": 5}
estimator_parameters = {
    "entry_point": "multi_model_hpo.py",
    "source_dir": "code",
    "dependencies": ["my_custom_library"],
    "instance_type": training_instance_type,
    "instance_count": 1,
    "hyperparameters": hyperparameters,
    "role": role,
    "base_job_name": "xgboost-model",
    "framework_version": "1.0-1",
    "keep_alive_period_in_seconds": 60,
    # Custom metric that SageMaker extracts from the training logs.
    "metric_definitions": [
        {"Name": "validation:rmse", "Regex": "validation-rmse:(.*?);"}
    ],
}
estimator = SKLearn(**estimator_parameters)

Next, we write our training script (multi_model_hpo.py). Our script follows a simple flow: capture hyperparameters with which the job was configured and train the CatBoost model and XGBoost model. We also implement a k-fold cross validation function. See the following code:

if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    # SageMaker-specific arguments. Defaults are set in the environment variables.
    parser.add_argument("--output-data-dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--train", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
    parser.add_argument("--validation", type=str, default=os.environ["SM_CHANNEL_VALIDATION"])
    # ... (hyperparameter arguments, argument parsing, and data loading omitted)

    # Train the CatBoost model
    K = args.k_fold
    catboost_hyperparameters = {
        "max_depth": args.max_depth,
        "eta": args.eta,
    }
    rmse_list, model_catboost = cross_validation_catboost(train_df, K, catboost_hyperparameters)
    # ...

    # Train the XGBoost model
    hyperparameters = {
        "max_depth": args.max_depth,
        "eta": args.eta,
        "objective": args.objective,
        "num_round": args.num_round,
    }

    rmse_list, model_xgb = cross_validation(train_df, K, hyperparameters)
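The k-fold cross-validation helpers are not shown in this post, so the following is only a minimal sketch of the idea, using plain Python and a trivial stand-in model that predicts the mean of the training fold. The names k_fold_indices, rmse, and cross_validate_mean_baseline are hypothetical and are not the repository's functions.

```python
import math
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def cross_validate_mean_baseline(y, k):
    """Stand-in for a real model: predict the training-fold mean for every sample."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        baseline = mean(y[i] for i in train_idx)
        scores.append(rmse([y[i] for i in val_idx], [baseline] * len(val_idx)))
    return scores


rmse_list = cross_validate_mean_baseline([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], k=3)
print(rmse_list)
```

In a real training script, the baseline would be replaced by fitting CatBoost or XGBoost on the training fold, and the per-fold scores would be returned alongside the fitted model.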

After the models are trained, we calculate the mean of the CatBoost and XGBoost predictions. The result, pred_mean, is our ensemble’s final prediction. Then, we compute the root mean squared error (RMSE) of pred_mean against the validation set by calling mean_squared_error with squared=False. The resulting val_rmse is used to evaluate the whole ensemble during training. Notice that we also print the RMSE value in a pattern that fits the regex we used in metric_definitions. Later, SageMaker Automatic Model Tuning uses that to capture the objective metric. See the following code:

pred_mean = np.mean(np.array([pred_catboost, pred_xgb]), axis=0)
val_rmse = mean_squared_error(y_validation, pred_mean, squared=False)
print(f"Final evaluation result: validation-rmse:{val_rmse}")
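To see how a printed log line and a metric regex fit together, the following sketch applies the pattern from the tuning job's metric_definitions to a sample line. It only illustrates the matching with Python's re module; the helper name extract_metric and the RMSE value are made up for the example.

```python
import re

# Pattern copied from the tuning job's metric_definitions in this post.
METRIC_REGEX = r"validation-rmse:([0-9\.]+)"


def extract_metric(log_line, pattern=METRIC_REGEX):
    """Return the captured metric as a float, or None if the line does not match."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


val_rmse = 53.4172  # made-up value, for illustration only
log_line = f"Final evaluation result: validation-rmse:{val_rmse}"
print(extract_metric(log_line))
print(extract_metric("epoch 3 finished"))
```

A pattern like this captures only digits and dots, so a value printed in scientific notation (for example, 1e-05) would be cut short. The print format and the regex need to agree.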

Finally, our script saves both model artifacts to the output folder located at /opt/ml/model.

When a training job is complete, SageMaker packages and copies the content of the /opt/ml/model directory as a single object in compressed TAR format to the S3 location that you specified in the job configuration. In our case, SageMaker bundles the two models in a TAR file and uploads it to Amazon S3 at the end of the training job. See the following code:

model_file_name = "catboost-regressor-model.dump"

# Save the CatBoost model (here, model is the trained CatBoost regressor)
path = os.path.join(args.model_dir, model_file_name)
print("saving model file to {}".format(path))
model.save_model(path)

# ...

# Save the XGBoost model (here, model is the trained XGBoost model)
model_location = args.model_dir + "/xgboost-model"
with open(model_location, "wb") as f:
    pickle.dump(model, f)
logging.info("Stored trained model at {}".format(model_location))
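The packaging step happens inside SageMaker, but its effect can be reproduced locally to check what ends up in the archive. The sketch below uses only the standard library and placeholder files. The helper name package_model_dir is hypothetical, and the archive name model.tar.gz is assumed here to mirror the name SageMaker gives the output artifact.

```python
import os
import tarfile
import tempfile


def package_model_dir(model_dir, archive_path):
    """Bundle every file in model_dir into one gzip-compressed TAR archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            tar.add(os.path.join(model_dir, name), arcname=name)
    return archive_path


with tempfile.TemporaryDirectory() as workdir:
    model_dir = os.path.join(workdir, "model")
    os.makedirs(model_dir)
    # Placeholder bytes stand in for the real serialized models.
    for name in ("catboost-regressor-model.dump", "xgboost-model"):
        with open(os.path.join(model_dir, name), "wb") as f:
            f.write(b"placeholder")

    archive = package_model_dir(model_dir, os.path.join(workdir, "model.tar.gz"))
    with tarfile.open(archive, "r:gz") as tar:
        archive_members = tar.getnames()

print(archive_members)
```

Both artifacts sit at the root of the archive, which is why the inference code can later load them by file name from a single model directory.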

In summary, this procedure downloaded the data one time and trained two models using a single training job.

Automatic ensemble model tuning

Because we’re building a collection of ML models, exploring all of the possible hyperparameter permutations is impractical. SageMaker offers Automatic Model Tuning (AMT), which looks for the best model hyperparameters by focusing on the most promising combinations of values within ranges that you specify (it’s up to you to define the right ranges to explore). SageMaker supports multiple optimization methods for you to choose from.

We start by defining the two parts of the optimization process: the objective metric and hyperparameters we want to tune. In our example, we use the validation RMSE as the target metric and we tune eta and max_depth (for other hyperparameters, refer to XGBoost Hyperparameters and CatBoost hyperparameters):

from sagemaker.tuner import (
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)

hyperparameter_ranges = {
    "eta": ContinuousParameter(0.2, 0.3),
    "max_depth": IntegerParameter(3, 4),
}
metric_definitions = [{"Name": "validation:rmse", "Regex": r"validation-rmse:([0-9\.]+)"}]
objective_metric_name = "validation:rmse"
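To make the idea of searching within ranges concrete, the following sketch draws candidates from the same two ranges and keeps the best one. It implements plain random search, which is only one possible strategy and is not a description of how AMT works internally. The objective function is a made-up stand-in, since a real trial would train the ensemble and report its validation RMSE, and all function names are hypothetical.

```python
import random


def sample_configuration(rng, eta_range=(0.2, 0.3), depth_range=(3, 4)):
    """Draw one candidate: eta from a continuous range, max_depth from an integer range."""
    return {
        "eta": rng.uniform(*eta_range),
        "max_depth": rng.randint(*depth_range),  # randint includes both endpoints
    }


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random candidates and return the one with the lowest objective."""
    rng = random.Random(seed)
    trials = [sample_configuration(rng) for _ in range(n_trials)]
    return min(trials, key=objective)


def toy_objective(config):
    """Made-up stand-in for a validation RMSE; not derived from any real training run."""
    return (config["eta"] - 0.25) ** 2 + 0.01 * config["max_depth"]


best = random_search(toy_objective, n_trials=4)
print(best)
```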

We also need to ensure in the training script that our hyperparameters are not hardcoded and are pulled from the SageMaker runtime arguments:

catboost_hyperparameters = {
    "max_depth": args.max_depth,
    "eta": args.eta,
}
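The argument-parsing part of the training script is elided in this post. As a minimal sketch of how tunable values can arrive as command-line arguments, the following uses argparse with underscore-style names such as --max_depth, so that they map directly to args.max_depth. The default values and the function name parse_hyperparameters are placeholders, not the values or names used by the authors.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse tunable hyperparameters from command-line style arguments."""
    parser = argparse.ArgumentParser()
    # Defaults below are placeholders, not the values used in the post.
    parser.add_argument("--max_depth", type=int, default=5)
    parser.add_argument("--eta", type=float, default=0.3)
    parser.add_argument("--k_fold", type=int, default=5)
    # parse_known_args ignores any extra arguments the platform may add.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_hyperparameters(["--max_depth", "4", "--eta", "0.25", "--other", "x"])
catboost_hyperparameters = {"max_depth": args.max_depth, "eta": args.eta}
print(catboost_hyperparameters)
```

Because every value comes from the arguments, each tuning trial can run the same script with a different combination without any code change.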

SageMaker also writes the hyperparameters to a JSON file, which can be read from /opt/ml/input/config/hyperparameters.json on the training instance.
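As an alternative to command-line arguments, a script could read that file directly. The following sketch simulates the file in a temporary directory and assumes that values are stored as strings, so each one is decoded where possible. The helper name load_hyperparameters and the sample values are illustrative only.

```python
import json
import os
import tempfile


def load_hyperparameters(path):
    """Read a hyperparameters JSON file and decode each string value where possible."""
    with open(path) as f:
        raw = json.load(f)
    decoded = {}
    for key, value in raw.items():
        try:
            decoded[key] = json.loads(value)  # "4" -> 4, "0.25" -> 0.25
        except (TypeError, ValueError):
            decoded[key] = value  # leave anything that is not valid JSON untouched
    return decoded


# Simulate the file locally; on a training instance the path would be
# /opt/ml/input/config/hyperparameters.json.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "hyperparameters.json")
    with open(path, "w") as f:
        json.dump({"max_depth": "4", "eta": "0.25", "objective": "reg:squarederror"}, f)
    hyperparameters = load_hyperparameters(path)

print(hyperparameters)
```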

Like CatBoost, we also capture the hyperparameters for the XGBoost model (notice that objective and num_round aren’t tuned):

hyperparameters = {
    "max_depth": args.max_depth,
    "eta": args.eta,
    "objective": args.objective,
    "num_round": args.num_round,
}

Finally, we launch the hyperparameter tuning job using these configurations:

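tuner = HyperparameterTuner(
    estimator, objective_metric_name, hyperparameter_ranges,
    metric_definitions=metric_definitions, objective_type="Minimize", max_jobs=4)  # four training jobs in total
tuner.fit({"train": train_location, "validation": validation_location}, include_cls_metadata=False)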

When the job is complete, you can retrieve the values for the best training job (with minimal RMSE):

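attached_tuner = HyperparameterTuner.attach(job_name)  # job_name: name of the completed tuning job
best_training_job = attached_tuner.describe()["BestTrainingJob"]  # tuned values and final RMSE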

For more information on AMT, refer to Perform Automatic Model Tuning with SageMaker.


Deployment

To deploy our custom ensemble, we need to provide a script to handle the inference request and configure SageMaker hosting. In this example, we use a single file that includes both the training and inference code (multi_model_hpo.py). SageMaker uses the code under if __name__ == "__main__" for training, and the functions model_fn, input_fn, and predict_fn when deploying and serving the model.

Inference script

As with training, we use the SageMaker SKLearn framework container with our own inference script. The script will implement three methods required by SageMaker.

First, the model_fn method reads our saved model artifact files and loads them into memory. In our case, the method returns our ensemble as all_model, which is a Python list, but you can also use a dictionary with model names as keys.

def model_fn(model_dir):
    # Load the CatBoost model from its native dump file
    catboost_model = CatBoostRegressor()
    catboost_model.load_model(os.path.join(model_dir, model_file_name))
    # Load the pickled XGBoost model
    model_file = "xgboost-model"
    with open(os.path.join(model_dir, model_file), "rb") as f:
        model = pickle.load(f)
    all_model = [catboost_model, model]
    return all_model

Second, the input_fn method deserializes the request input data to be passed to our inference handler. For more information about input handlers, refer to Adapting Your Own Inference Container.

def input_fn(input_data, content_type):
    # Parse the CSV request body into a NumPy array of floats
    payload = StringIO(input_data)
    return np.genfromtxt(payload, delimiter=",")
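An input handler can also validate the content type before parsing. The following is a minimal, standard-library sketch of that idea. It is not the post's handler, and the function name parse_csv_payload is hypothetical. It assumes the request body is CSV text (or UTF-8 bytes) with one sample per line.

```python
import csv
from io import StringIO


def parse_csv_payload(input_data, content_type="text/csv"):
    """Turn a CSV request body into a list of rows of floats."""
    if content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {content_type}")
    if isinstance(input_data, bytes):
        input_data = input_data.decode("utf-8")
    reader = csv.reader(StringIO(input_data))
    # Skip empty lines, such as a trailing newline at the end of the body.
    return [[float(cell) for cell in row] for row in reader if row]


rows = parse_csv_payload("0.03,0.05,0.06\n-0.01,0.02,0.04\n")
print(rows)
```

Rejecting unexpected content types early returns a clear error to the caller instead of a confusing failure inside the model.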

Third, the predict_fn method is responsible for getting predictions from the models. The method takes the model and the data returned from input_fn as parameters and returns the final prediction. In our example, we get the CatBoost result from the first element of the model list (model[0]) and the XGBoost result from the second element (model[1]), and we use a blending function that returns the mean of both predictions:

def predict_fn(input_data, model):
    # model is the list returned by model_fn: [CatBoost, XGBoost]
    predictions_catb = model[0].predict(input_data)
    dtest = xgb.DMatrix(input_data)
    predictions_xgb = model[1].predict(
        dtest,
        ntree_limit=getattr(model[1], "best_ntree_limit", 0),
    )
    # Blend the two predictions by averaging
    return np.mean(np.array([predictions_catb, predictions_xgb]), axis=0)
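If the ensemble is kept in a dictionary keyed by model name rather than a list, the blending step can be written generically and extended with optional weights. The sketch below shows one way to do that with stub models. ConstantModel, blend_predictions, and the weights are hypothetical and only illustrate the structure.

```python
class ConstantModel:
    """Hypothetical stand-in for a trained regressor; always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, rows):
        return [self.value for _ in rows]


def blend_predictions(models, rows, weights=None):
    """Return the (optionally weighted) mean prediction across named models."""
    names = list(models)
    weights = weights or {name: 1.0 for name in names}
    per_model = {name: models[name].predict(rows) for name in names}
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * per_model[name][i] for name in names) / total
        for i in range(len(rows))
    ]


ensemble = {"catboost": ConstantModel(100.0), "xgboost": ConstantModel(200.0)}
rows = [[0.1, 0.2], [0.3, 0.4]]
equal_blend = blend_predictions(ensemble, rows)
weighted_blend = blend_predictions(ensemble, rows, weights={"catboost": 3.0, "xgboost": 1.0})
print(equal_blend, weighted_blend)
```

With equal weights this reduces to the plain mean used in this post. Because the aggregation lives in your own code, you can change it without touching the endpoint configuration.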

Now that we have our trained models and inference script, we can configure the environment to deploy our ensemble.

SageMaker Serverless Inference

Although there are many hosting options in SageMaker, in this example, we use a serverless endpoint. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic. This takes away the undifferentiated heavy lifting of managing servers. This option is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts.

Configuring the serverless endpoint is straightforward because we don’t need to choose instance types or manage scaling policies. We only need to provide two parameters: memory size and maximum concurrency. The serverless endpoint automatically assigns compute resources proportional to the memory you select. If you choose a larger memory size, your container has access to more vCPUs. You should always choose your endpoint’s memory size according to your model size. The second parameter we need to provide is maximum concurrency. For a single endpoint, this parameter can be set up to 200 (as of this writing, the limit for total number of serverless endpoints in a Region is 50). You should note that the maximum concurrency for an individual endpoint prevents that endpoint from taking up all the invocations allowed for your account, because any endpoint invocations beyond the maximum are throttled (for more information about the total concurrency for all serverless endpoints per Region, refer to Amazon SageMaker endpoints and quotas).

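from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig
# Placeholder values: choose the memory size for your model and the concurrency for your traffic.
serverless_config = ServerlessInferenceConfig(memory_size_in_mb=2048, max_concurrency=5)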
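One way to pick the memory size is to take the smallest allowed value that fits the model with some headroom. The sketch below assumes that serverless endpoints accept memory sizes from 1024 MB to 6144 MB in 1 GB steps. Treat that list as an assumption to verify against the current service documentation. The helper name choose_memory_size_mb and the headroom factor are illustrative.

```python
# Assumption: the memory sizes (in MB) that serverless endpoints accept.
ALLOWED_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def choose_memory_size_mb(model_size_mb, headroom=1.0):
    """Return the smallest allowed memory size that fits model_size_mb * headroom."""
    required = model_size_mb * headroom
    for size in ALLOWED_MEMORY_SIZES_MB:
        if size >= required:
            return size
    raise ValueError(f"{required:.0f} MB exceeds the largest allowed size")


small = choose_memory_size_mb(300)
padded = choose_memory_size_mb(1500, headroom=1.5)
print(small, padded)
```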

Now that we have configured the endpoint, we can finally deploy the model that was selected in our hyperparameter optimization job:

estimator = attached_tuner.best_estimator()
predictor = estimator.deploy(serverless_inference_config=serverless_config)

Clean up

Even though serverless endpoints incur no cost while idle, make sure to delete the endpoint when you have finished running this example:

predictor.delete_endpoint()



Conclusion

In this post, we covered one approach to train, optimize, and deploy a custom ensemble. We detailed the process of using a single training job to train multiple models, how to use automatic model tuning to optimize the ensemble hyperparameters, and how to deploy a single serverless endpoint that blends the inferences from multiple models.

Using this method addresses potential cost and operational issues. The cost of a training job is based on the resources you use for the duration of usage. By downloading the data only once to train both models, we halved the job’s data download phase and the storage volume needed to hold the data, compared with training each model in its own job, thereby reducing the training job’s overall cost. Furthermore, the AMT job ran four training jobs, each with the aforementioned reduced time and storage, so the saving is repeated four times. With regard to model deployment on a serverless endpoint, because you also pay for the amount of data processed, invoking the endpoint only once for two models means you pay half of the I/O data charges.
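The download arithmetic behind this comparison can be written out explicitly. The sketch below only counts how many times the training data is downloaded. It assumes that tuning each model separately would also run four jobs per model, and it does not model prices. The function name data_downloads is hypothetical.

```python
def data_downloads(n_models, n_tuning_jobs, single_job_per_ensemble):
    """Count how many times the training data is downloaded during tuning."""
    jobs_per_trial = 1 if single_job_per_ensemble else n_models
    return n_tuning_jobs * jobs_per_trial


separate = data_downloads(n_models=2, n_tuning_jobs=4, single_job_per_ensemble=False)
combined = data_downloads(n_models=2, n_tuning_jobs=4, single_job_per_ensemble=True)
print(separate, combined, combined / separate)
```

Under these assumptions, the ratio stays at one over the number of models, so adding more models to the ensemble widens the gap.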

Although this post only showed the benefits with two models, you can use this method to train, tune, and deploy numerous ensemble models to see an even greater effect.


References

[1] Raj Kumar, P. Arun; Selvakumar, S. (2011). “Distributed denial of service attack detection using an ensemble of neural classifier”. Computer Communications. 34 (11): 1328–1341. doi:10.1016/j.comcom.2011.01.012.

[2] Efron, Bradley; Hastie, Trevor; Johnstone, Iain; Tibshirani, Robert (2004). “Least Angle Regression”. Annals of Statistics (with discussion). 32 (2): 407–499. https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf

About the Authors

Melanie Li, PhD, is a Senior AI/ML Specialist TAM at AWS based in Sydney, Australia. She helps enterprise customers to build solutions leveraging the state-of-the-art AI/ML tools on AWS and provides guidance on architecting and implementing machine learning solutions with best practices. In her spare time, she loves to explore nature outdoors and spend time with family and friends.

Uri Rosenberg is the AI & ML Specialist Technical Manager for Europe, Middle East, and Africa. Based out of Israel, Uri works to empower enterprise customers to design, build, and operate ML workloads at scale. In his spare time, he enjoys cycling, hiking, and minimizing RMSEs.
