Machine Learning Infrastructure with Amazon SageMaker and Terraform — A Case of Fraud Detection

A. Introduction

The goal of this solution is to predict whether a given credit card transaction is fraudulent or not.

B. Solution overview

There are 3 main parts to this solution:

  1. Training a model based on a sample credit card transaction dataset.
  2. Storing the model in an accessible interface.
  3. Using the model to predict whether a generated transaction is fraudulent.
Source: https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a

C. Implementation walkthrough

Let’s follow the implementation based on the above 3 parts.

1. Training a model based on a sample credit card transaction dataset.

To train a model, these components are needed:

1a. Infrastructure

This is where it starts to get technical. If you are not interested in infrastructure, you can skip this.

Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/iam_sagemaker.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/s3_lambda.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/s3_sagemaker.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/sagemaker.tf

1b. Notebook instance init script

For the notebook instance to obtain the Jupyter notebook located in our source code, we add an init script that downloads the notebook from the aws_s3_bucket.fraud_detection_function_bucket above (the notebook was uploaded there using an aws_s3_bucket_object resource). The script is attached through the notebook instance’s lifecycle configuration:

Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/sagemaker.tf
sagemaker_instance_init.sh

1c. Jupyter Notebook Logic for training

These are the main steps in the Jupyter notebook:

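import io
import os

import boto3
import sagemaker
import sagemaker.amazon.common as smac
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

# Convert the NumPy arrays `features` and `labels` (prepared earlier in the
# notebook) to RecordIO-protobuf format and upload them to S3.
buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, features, labels)
buf.seek(0)  # rewind so the upload starts at the beginning of the buffer
bucket = "fraud-detection-end-to-end-demo"
prefix = 'linear-learner'
key = 'recordio-pb-data'
boto3.resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', key)).upload_fileobj(buf)

# Look up the built-in Linear Learner container image for the current region.
container = get_image_uri(boto3.Session().region_name, 'linear-learner')

# Configure and train the estimator (parameter names from SageMaker Python SDK v1).
session = sagemaker.Session()
s3_train_data = 's3://{}/{}/train/{}'.format(bucket, prefix, key)
output_location = 's3://{}/{}/output'.format(bucket, prefix)
linear = sagemaker.estimator.Estimator(container,
                                       get_execution_role(),
                                       train_instance_count=1,
                                       train_instance_type='ml.c4.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=session)
linear.set_hyperparameters(feature_dim=features.shape[1],
                           predictor_type='binary_classifier',
                           mini_batch_size=200)
linear.fit({'train': s3_train_data})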
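
The S3 locations used above follow a simple layout: training data under <prefix>/train/<key> and model artifacts under <prefix>/output. The sketch below is not part of the original notebook. It builds those paths with posixpath.join so that the object key always uses forward slashes, whereas os.path.join would produce backslashes if the code ran on Windows. The helper name training_paths is hypothetical.

```python
import posixpath


def training_paths(bucket, prefix, key):
    """Return the S3 object key and URIs used for training (hypothetical helper)."""
    # S3 keys always use "/" as the separator, regardless of the local OS.
    train_key = posixpath.join(prefix, "train", key)
    return {
        "train_key": train_key,
        "s3_train_data": "s3://{}/{}".format(bucket, train_key),
        "output_location": "s3://{}/{}/output".format(bucket, prefix),
    }


paths = training_paths("fraud-detection-end-to-end-demo", "linear-learner", "recordio-pb-data")
print(paths["s3_train_data"])
# s3://fraud-detection-end-to-end-demo/linear-learner/train/recordio-pb-data
```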

2. Storing the model in an accessible interface.

To store a model and present it to the outside world for prediction later on, these components are required:

linear_predictor = linear.deploy(initial_instance_count=1,
                                 endpoint_name="fraud-detection-endpoint",
                                 instance_type='ml.m4.xlarge')
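
Once deployed, the endpoint can also be called from outside the notebook through the SageMaker runtime API. The following is a minimal sketch, assuming the endpoint accepts text/csv input and returns the JSON shape that the built-in Linear Learner uses for binary classification (a "predictions" list whose entries carry "score" and "predicted_label"). The helper names are illustrative, and runtime_client is expected to be a boto3.client("sagemaker-runtime") created by the caller.

```python
import json


def to_csv_payload(features):
    """Serialize one transaction's feature values as a single CSV line."""
    return ",".join(str(value) for value in features)


def parse_prediction(response_body):
    """Extract (score, is_fraud) from a Linear Learner JSON response (assumed shape)."""
    prediction = json.loads(response_body)["predictions"][0]
    return prediction["score"], prediction["predicted_label"] == 1


def predict(runtime_client, endpoint_name, features):
    """Send one transaction to the endpoint and return (score, is_fraud)."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(features),
    )
    # The response body is a byte stream containing JSON.
    return parse_prediction(response["Body"].read().decode("utf-8"))


# Usage (requires boto3 and AWS credentials; `transaction_features` is a placeholder):
# import boto3
# score, is_fraud = predict(boto3.client("sagemaker-runtime"),
#                           "fraud-detection-endpoint", transaction_features)
```

Passing the client in as an argument keeps the helpers easy to test with a stand-in object, without touching AWS.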

3. Using the model to predict whether a generated transaction is fraudulent.

With the endpoint created, we can now use it on any generated transaction. This involves the following components:

  • An AWS Lambda function, created from the code located at source/fraud_detection/index.py in the code repository. It randomly selects a predefined fraud or non-fraud transaction, sends the transaction to the endpoint to obtain a fraud prediction, and, after some minor processing, sends the result to a Kinesis Data Firehose.
  • A Kinesis Data Firehose that stores the streamed results in S3.
  • Finally, the S3 bucket.
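
The Lambda flow described above can be sketched as follows. This is an illustration, not the code in source/fraud_detection/index.py: the sample transactions, the STREAM_NAME value, the record fields, and the predict callable are all assumptions, and firehose_client is expected to be a boto3.client("firehose") supplied by the caller.

```python
import json
import random

# Hypothetical sample transactions; the real function ships its own predefined data.
FRAUD_SAMPLES = [[1.2, -3.4, 250.0]]
NONFRAUD_SAMPLES = [[0.1, 0.2, 12.5]]
STREAM_NAME = "fraud-detection-firehose-stream"  # assumed name


def process_transaction(predict, firehose_client, rng=random):
    """Pick a sample transaction, score it, and stream the result to Firehose."""
    # 1. Randomly select a predefined fraud or non-fraud transaction.
    source = rng.choice(["fraud", "nonfraud"])
    samples = FRAUD_SAMPLES if source == "fraud" else NONFRAUD_SAMPLES
    features = rng.choice(samples)

    # 2. Ask the endpoint for a prediction; `predict` returns (score, is_fraud).
    score, is_fraud = predict(features)

    # 3. Minor processing: build a newline-terminated JSON record.
    record = {"source": source, "score": score, "predicted_fraud": is_fraud}
    data = (json.dumps(record) + "\n").encode("utf-8")

    # 4. Send the record to the Kinesis Data Firehose delivery stream.
    firehose_client.put_record(DeliveryStreamName=STREAM_NAME, Record={"Data": data})
    return record
```

Terminating each record with a newline is a common choice here, since Firehose concatenates records when it writes them to S3.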
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/cloudwatch_event.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/iam_lambda.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/s3_lambda.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/lambda.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/iam_kinesis.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/s3_kinesis.tf
Complete code: https://github.com/qtangs/tf-fraud-detection-using-machine-learning/blob/master/terraform/kinesis.tf

D. Deployment

The Terraform code is located in the terraform folder (the original CloudFormation code can be found in the cloudformation folder).

terraform_backend.tf.template
terraform.tfvars.template
# Set default AWS profile,
# use 'set' instead of 'export' for Windows.
export AWS_PROFILE=<your desired profile>
terraform init
terraform validate
terraform plan -out=tfplan
terraform apply --auto-approve tfplan
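
If you prefer to script the sequence, the commands above can be wrapped in a small Python helper. This is only a sketch: the function name run_terraform and the dry_run flag are hypothetical, it assumes the terraform binary is on the PATH, and it omits --auto-approve because applying a saved plan file does not prompt for approval.

```python
import subprocess

# The same steps as above, in the order they must run.
TERRAFORM_STEPS = [
    ["terraform", "init"],
    ["terraform", "validate"],
    ["terraform", "plan", "-out=tfplan"],
    ["terraform", "apply", "tfplan"],
]


def run_terraform(workdir="terraform", dry_run=False):
    """Run each step in `workdir`, stopping at the first failure."""
    executed = []
    for command in TERRAFORM_STEPS:
        executed.append(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(command, cwd=workdir, check=True)
    return executed


# Preview the commands without running them.
for line in run_terraform(dry_run=True):
    print(line)
```

Running the steps by hand remains the safer option when you want to review the plan output before applying it.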

Final manual steps

Once all Terraform resources are set up, you need to follow these manual steps as documented by the AWS site:

Run the notebook (Amazon SageMaker console):

  1. In the navigation pane, select Notebook instances.
  2. Select FraudDetectionNotebookInstance.
  3. The notebook instance should already be running.
  4. Select Open Jupyter.
  5. In the Jupyter notebook interface, open the sagemaker_fraud_detection.ipynb file.
  6. In the Cell dropdown menu, select Run All to run the file.

Enable the CloudWatch Events rule (AWS Lambda console):

  1. In the navigation pane, select Functions.
  2. Select the fraud_detection_event_processor Lambda function.
  3. In the diagram in the Designer tab, select CloudWatch Events.
  4. In the CloudWatch Events tab, select <stackname>-ScheduleRule-<id>.
  5. Select Actions > Enable.
  6. Select Enable.

Verify that the Lambda function is processing transactions (AWS Lambda console):

  1. In the navigation pane, select Functions.
  2. Select the fraud_detection_event_processor Lambda function.
  3. Select Monitoring and verify that the Invocations graph shows activity.
  4. After a few minutes, check the results Amazon S3 bucket for processed transactions.
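
The final check on the results bucket can also be done from code. The sketch below lists objects with the S3 list_objects_v2 call and follows its pagination. The helper name is hypothetical, the bucket name in the usage comment is a placeholder, and s3_client is expected to be a boto3.client("s3") created by the caller.

```python
def list_processed_keys(s3_client, bucket, prefix=""):
    """Return the keys of all objects under `prefix` in the results bucket."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when no objects match.
        keys.extend(item["Key"] for item in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # Results are paginated; continue from where the last page ended.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


# Usage (requires boto3 and AWS credentials):
# import boto3
# print(list_processed_keys(boto3.client("s3"), "<your results bucket>"))
```

An empty list a few minutes after enabling the rule suggests that the Lambda function or the Firehose delivery is not running as expected.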

E. Cleanup

Once you are done with the experiment, run the following commands to delete all resources and avoid further costs. Again, remember to review the Terraform plan before applying it.

terraform plan -destroy -out=tfplan
terraform apply tfplan

F. Summary

Amazon SageMaker is a powerful platform, and we have barely scratched its surface.
