RDS Auto Restart Protection

Abstract

Customers needing to keep an Amazon Relational Database Service (Amazon RDS) instance stopped for more than 7 days, look for ways to efficiently re-stop the database after being automatically started by Amazon RDS. If the databas…


This content originally appeared on DEV Community and was authored by 🚀 Vu Dao 🚀

Abstract

  • Customers needing to keep an Amazon Relational Database Service (Amazon RDS) instance stopped for more than 7 days, look for ways to efficiently re-stop the database after being automatically started by Amazon RDS. If the database is started and there is no mechanism to stop it; customers start to pay for the instance’s hourly cost

  • Stopping and starting a DB instance is faster than creating a DB snapshot, and then restoring the snapshot.

  • This blog provides a step-by-step approach to automatically stop an RDS cluster with fully serverless and using Pulumi to create AWS resources

Table Of Contents

🚀 Overview of Pulumi

  • Why Pulumi? Pulumi enables developers to write infrastructure as code in their favorite languages, such as TypeScript, JavaScript, Python, and Go.

  • Here is general steps-by-step to create pulumi project and its stack

  1. Create new project
pulumi new aws-typescript
  1. Set up aws profile
  2. When create/init a stack pulumi stack init the Pulumi.<stack-name>.yaml is not created so we have to set config ourselves
pulumi config set aws:region ap-northeast-2
pulumi config set aws:profile myprofile
  1. Pulumi bash completion
  2. Feel lazy for typing? Setup bashcompletion for pulumi
pulumi gen-completion bash > /etc/bash_completion.d/pulumi
  • Update .bashrc for alias
# add Pulumi to the PATH
export PATH=$PATH:$HOME/.pulumi/bin
alias plm='/home/vudao/.pulumi/bin/pulumi'
complete -F __start_pulumi plm
  1. Import existing resources
  2. For creating new RDS cluster to test the flow, we can import existing Security group or anything to the stack
pulumi import aws:ec2/securityGroup:SecurityGroup vpc_sg sg-13a02c7a
  1. Refresh the stack
  2. If we manually delete the resources which are managed by the stack we can run refresh to update stack resource status
pulumi refresh

🚀 Solution overview

RDS Auto Restart Protection

  • The solution relies on RDS event notifications. Once a stopped RDS instance is started by AWS due to exceeding the maximum time in the stopped state; an event (RDS-EVENT-0154) is generated by RDS.

  • The RDS event is pushed to a dedicated SNS topic sns-rds-event.

  • The Lambda function start-step-func-rds is subscribed to the SNS topic sns-rds-event

    • The function filters messages with event code: RDS-EVENT-0153 (The DB cluster is being started due to it exceeding the maximum allowed time being stopped.), plus the function validates that the RDS instance is tagged with auto-restart-protection and that the tag value is set to yes.
    • Once all conditions are met, the Lambda function starts the AWS Step Functions state machine execution.
  • The AWS Step Functions state machine integrates with two Lambda functions in order to retrieve the instance state, as well as attempt to stop the RDS instance.

    • In case the instance state is not ‘available’, the state machine waits for 5 minutes and then re-checks the state.
    • Finally, when the Amazon RDS instance state is ‘available’; the state machine will attempt to stop the Amazon RDS instance.
  • Note: This blog is for handling RDS cluster with multiple intances, for single instance, catch RDS-EVENT-0154: The DB instance is being started due to it exceeding the maximum allowed time being stopped.

Let's start writing IaC using Pulumi and typescript

🚀 Create RDS cluster with multiple instances

  • Create RDS cluster with one or more instances
  • Using the imported existing VPC (optional)

rds.ts

import * as aws from "@pulumi/aws";

const vpc_sg = new aws.ec2.SecurityGroup("vpc_sg",
    {
        description: "Allows inbound and outbound traffic for all instances in the VPC",
        name: "vpc-sec",
        revokeRulesOnDelete: false,
        tags: {
            Name: "vpc-sec",
        }
    },
    {
        protect: true,
    }
);

export const rds_cluster = new aws.rds.Cluster('SelTestRdsEventSub', {
    //availabilityZones: ['ap-northeast-2a', 'ap-northeast-2c'],
    clusterIdentifier: 'my-test-rds-sub',
    engine: 'aurora-postgresql',
    masterUsername: 'postgres',
    masterPassword: '*****',
    dbSubnetGroupName: 'aws-test',
    databaseName: "mydb",
    skipFinalSnapshot: true,
    vpcSecurityGroupIds: [vpc_sg.id],
    tags: {
        'Name': 'my-test-rds-sub',
        'stack': 'pulumi-rds',
        'auto-restart-protection': 'yes'
    }
});

export const clusterInstances: aws.rds.ClusterInstance[] = [];

for (const range = {value: 0}; range.value < 1; range.value++) {
    clusterInstances.push(new aws.rds.ClusterInstance(`SelRdsClusterInstance-${range.value}`, {
        identifier: `my-test-rds-sub-${range.value}`,
        clusterIdentifier: rds_cluster.id,
        instanceClass: aws.rds.InstanceType.T3_Medium,
        engine: 'aurora-postgresql',
        engineVersion: rds_cluster.engineVersion,
        dbSubnetGroupName: 'aws-test',
        tags: {
            'Name': `my-test-rds-sub-${range.value}`,
            'stack': 'pulumi-rds-instance',
            'auto-restart-protection': 'yes'
        }
    }))
}

🚀 Create SNS topic and subscribe event to the RDS cluster

  • Create a SNS topic to receive events from RDS cluster
  • Create event subscription:
    • Target: the SNS topic
    • Source Type: Clusters (and point to the cluster which created from above step)
    • Specific event categories: notification

index.ts

import * as aws from "@pulumi/aws";
import { state_machine_handler } from "./stepFunc";
import { rds_cluster } from "./rds";


const sns_rds_event = new aws.sns.Topic('SnsRdsEvent', {
    displayName: 'sns-rds-event',
    name: 'sns-rds-event',
    tags: {
        'Name': 'sns-rds-event',
        'stack': 'plumi-sns'
    }
});

const rds_event_sub = new aws.rds.EventSubscription('RdsEventSub', {
    enabled: true,
    name: 'rds-event-sub',
    eventCategories: ['notification'],
    sourceType: 'db-cluster',
    sourceIds: [rds_cluster.id],
    snsTopic: sns_rds_event.arn,
    tags: {
        'Name': 'rds-event-sub',
        'stack': 'pulumi-event'
    }
});

const sns_sub = new aws.sns.TopicSubscription('sns-topic-event-sub', {
    endpoint: state_machine_handler.arn,
    protocol: 'lambda',
    topic: sns_rds_event.arn
});

sns_rds_event.onEvent('sns-lambda-trigger', state_machine_handler, sns_sub)

🚀 Create Lambda function which is subscribe to the SNS topic

  • The lambda function will be triggerd by SNS topic whenever there's event
  • The lambda function parses the event message to filter event ID RDS-EVENT-0153 and checks the RDS cluster tag for key:value auto-restart-protection: yes. If all conditions match, then the lambda function execute Step Functions state machine

  • Create IAM role which is consumed by lambda function

iam-role

export const allowRdsClusterRole = new aws.iam.Role("allow-stop-rds-cluster-role", {
    name: 'lambda-stop-rds-cluster',
    description: 'Role to stop rds cluster base on event',
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Action: "sts:AssumeRole",
            Effect: "Allow",
            Sid: "",
            Principal: {
                Service: "lambda.amazonaws.com",
            },
        }],
    }),
    tags: {
        'Name': 'lambda-stop-rds-cluster',
        'stack': 'pulumi-iam'
    },
});

const rds_policy = new aws.iam.RolePolicy("allow-stop-rds-cluster", {
    role: allowRdsClusterRole,
    policy: {
        Version: "2012-10-17",
        Statement: [
            {
                Sid: "AllowRdsStatement",
                Effect: "Allow",
                Resource: "*",
                Action: [
                    "rds:AddTagsToResource",
                    "rds:ListTagsForResource",
                    "rds:DescribeDB*",
                    "rds:StopDB*"
                ]
            },
            {
                Sid: "AllowSfnStatement",
                Effect: "Allow",
                Resource: "*",
                Action: "states:StartExecution"
            },
            {
                Sid: 'AllowLog',
                Effect: 'Allow',
                Resource: "arn:aws:logs:*:*:*",
                Action: [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
            }
        ]
    },
}, {parent: allowRdsClusterRole});

  • Create lambda function which is subscription of the SNS topic

start-step-func-lambda

export const state_machine_handler = new aws.lambda.Function('RdsSNSEvent',
    {
        code: new pulumi.asset.FileArchive('lambda-code/start-statemachine-execution-lambda/handler.tar.gz'),
        description: 'Lambda function listen to RDS SNS event topic to trigger step function',
        name: 'start-step-func-rds',
        handler: 'app.handler',
        runtime: aws.lambda.Runtime.Python3d8,
        role: handler.allowRdsClusterRole.arn,
        environment: {
            variables: {
                'STEPFUNCTION_ARN': stepFunction.arn
            }
        },
        tags: {
            'Name': 'start-step-func-rds',
            'stack': 'pulumi-lambda'
        }
    },
    {
        dependsOn: [handler.allowRdsClusterRole]
    }
);

  • Create step function state machine with flowing definitions

sfn-rds-event

stepFunc.ts

import * as aws from '@pulumi/aws';
import * as pulumi from '@pulumi/pulumi';
import * as handler from './handler';


export const stepFunction = new aws.sfn.StateMachine('SfnRdsEvent', {
    name: 'sfn-rds-event',
    roleArn: handler.sfn_role.arn,
    tags: {
        'Name': 'sfn-rds-event',
        'stack': 'pulumi-sfn'
    },
    definition: pulumi.all([handler.retrieve_rds_status_handler.arn, handler.stop_rds_cluster_handler.arn, handler.send_slack_handler.arn])
        .apply(([retrieveArn, stopRdsArn, sendSlackArn]) => {
        return JSON.stringify({
            "Comment": "RdsAutoRestartWorkFlow: Automatically shutting down RDS instance after a forced Auto-Restart",
            "StartAt": "retrieveRdsClustertate",
            "States": {
                "retrieveRdsClustertate": {
                    "Type": "Task",
                    "Resource": retrieveArn,
                    "TimeoutSeconds": 5,
                    "Retry": [
                        {
                        "ErrorEquals": [
                            "Lambda.Unknown",
                            "States.TaskFailed"
                        ],
                        "IntervalSeconds": 3,
                        "MaxAttempts": 2,
                        "BackoffRate": 1.5
                        }
                    ],
                    "Catch": [
                        {
                        "ErrorEquals": [
                            "States.ALL"
                        ],
                        "Next": "fallback"
                        }
                    ],
                    "Next": "isRdsClusterAvailable"
                },
                "isRdsClusterAvailable": {
                    "Type": "Choice",
                    "Choices": [
                        {
                        "Variable": "$.readyToStop",
                        "StringEquals": "yes",
                        "Next": "stopRdsCluster"
                        }
                    ],
                    "Default": "waitFiveMinutes"
                },
                "waitFiveMinutes": {
                    "Type": "Wait",
                    "Seconds": 300,
                    "Next": "retrieveRdsClustertate"
                },
                "stopRdsCluster": {
                    "Type": "Task",
                    "Resource": stopRdsArn,
                    "TimeoutSeconds": 5,
                    "Retry": [
                        {
                        "ErrorEquals": [
                            "States.Timeout"
                        ],
                        "IntervalSeconds": 3,
                        "MaxAttempts": 2,
                        "BackoffRate": 1.5
                        }
                    ],
                    "Catch": [
                        {
                        "ErrorEquals": [
                            "States.ALL"
                        ],
                        "Next": "fallback"
                        }
                    ],
                    "Next": "retrieveRdsClustertateStopping"
                },
                "retrieveRdsClustertateStopping": {
                    "Type": "Task",
                    "Resource": retrieveArn,
                    "TimeoutSeconds": 5,
                    "Retry": [
                        {
                        "ErrorEquals": [
                            "States.Timeout"
                        ],
                        "IntervalSeconds": 3,
                        "MaxAttempts": 2,
                        "BackoffRate": 1.5
                        }
                    ],
                    "Catch": [
                        {
                        "ErrorEquals": [
                            "States.ALL"
                        ],
                        "Next": "fallback"
                        }
                    ],
                    "Next": "isRdsClusterStopped"
                },
                "isRdsClusterStopped": {
                    "Type": "Choice",
                    "Choices": [
                        {
                        "Variable": "$.rdsClusterStatus",
                        "StringEquals": "stopped",
                        "Next": "sendSlack"
                        }
                    ],
                    "Default": "waitFiveMinutesStopping"
                },
                "waitFiveMinutesStopping": {
                    "Type": "Wait",
                    "Seconds": 300,
                    "Next": "retrieveRdsClustertateStopping"
                },
                "sendSlack": {
                    "Type": "Task",
                    "Resource": sendSlackArn,
                    "TimeoutSeconds": 5,
                    "End": true
                },
                "fallback": {
                    "Type": "Task",
                    "Resource": sendSlackArn,
                    "TimeoutSeconds": 5,
                    "End": true
                }
            }
        });
    })
});

🚀 Create lambda function to retrieve RDS cluster and instances status

retrieve-rds-status.ts

export const retrieve_rds_status_handler = new aws.lambda.Function('RetrieveRdsStateFunc', {
    code: new pulumi.asset.FileArchive('lambda-code/retrieve-rds-instance-state-lambda/handler.tar.gz'),
    description: 'Lambda function to retrieve rds instance status',
        name: 'get-rds-status',
        handler: 'app.handler',
        runtime: aws.lambda.Runtime.Python3d8,
        role: allowRdsClusterRole.arn,
        tags: {
            'Name': 'get-rds-status',
            'stack': 'pulumi-lambda'
        }
});

🚀 Create lambda function to stop RDS cluster

stop-rds.ts

export const stop_rds_cluster_handler = new aws.lambda.Function('StopRdsClusterFunc', {
    code: new pulumi.asset.FileArchive('lambda-code/stop-rds-instance-lambda/handler.tar.gz'),
    description: 'Lambda function to retrieve rds instance status',
        name: 'stop-rds-cluster',
        handler: 'app.handler',
        runtime: aws.lambda.Runtime.Python3d8,
        role: allowRdsClusterRole.arn,
        tags: {
            'Name': 'stop-rds-cluster',
            'stack': 'pulumi-lambda'
        }
});

🚀 Create lambda function to send slack

send-slack.ts

export const send_slack_handler = new aws.lambda.Function('SendSlackFunc', {
    code: new pulumi.asset.FileArchive('lambda-code/send-slack/handler.tar.gz'),
    description: 'Lambda function to send slack',
        name: 'rds-send-slack',
        handler: 'app.handler',
        runtime: aws.lambda.Runtime.Python3d8,
        role: allowRdsClusterRole.arn,
        tags: {
            'Name': 'rds-send-slack',
            'stack': 'pulumi-lambda'
        }
});

🚀 SFN IAM role to trigger lambda functions

sfn-role.ts

export const sfn_role = new aws.iam.Role('SfnRdsRole', {
    name: 'sfn-rds',
    description: 'Role to trigger lambda functions',
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Action: "sts:AssumeRole",
            Effect: "Allow",
            Sid: "",
            Principal: {
                Service: "states.ap-northeast-2.amazonaws.com",
            },
        }],
    }),
    tags: {
        'Name': 'sfn-rds',
        'stack': 'pulumi-iam'
    }
});

🚀 Pulumi deploy stack

🚀 Conclusion

  • We now can save time and save money with this solution. Plus, we will receive slack message when there're events

  • Although Pulumi Supports Many Clouds and provisioner and can visulize the resources chart within the stack but there're more options such as AWS Cloud Development Kit (CDK)

.ltag__user__id__512906 .follow-action-button { background-color: #000000 !important; color: #62df88 !important; border-color: #000000 !important; }
vumdao image


This content originally appeared on DEV Community and was authored by 🚀 Vu Dao 🚀


Print Share Comment Cite Upload Translate Updates
APA

🚀 Vu Dao 🚀 | Sciencx (2021-10-14T16:13:51+00:00) RDS Auto Restart Protection. Retrieved from https://www.scien.cx/2021/10/14/rds-auto-restart-protection/

MLA
" » RDS Auto Restart Protection." 🚀 Vu Dao 🚀 | Sciencx - Thursday October 14, 2021, https://www.scien.cx/2021/10/14/rds-auto-restart-protection/
HARVARD
🚀 Vu Dao 🚀 | Sciencx Thursday October 14, 2021 » RDS Auto Restart Protection., viewed ,<https://www.scien.cx/2021/10/14/rds-auto-restart-protection/>
VANCOUVER
🚀 Vu Dao 🚀 | Sciencx - » RDS Auto Restart Protection. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2021/10/14/rds-auto-restart-protection/
CHICAGO
" » RDS Auto Restart Protection." 🚀 Vu Dao 🚀 | Sciencx - Accessed . https://www.scien.cx/2021/10/14/rds-auto-restart-protection/
IEEE
" » RDS Auto Restart Protection." 🚀 Vu Dao 🚀 | Sciencx [Online]. Available: https://www.scien.cx/2021/10/14/rds-auto-restart-protection/. [Accessed: ]
rf:citation
» RDS Auto Restart Protection | 🚀 Vu Dao 🚀 | Sciencx | https://www.scien.cx/2021/10/14/rds-auto-restart-protection/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.