This content originally appeared on Level Up Coding - Medium and was authored by Murray Stewart
Rescheduling a Lambda when the target data outside of AWS isn’t available
The Problem:
You want a Lambda to trigger an AWS Glue workflow that ingests data from a source outside of AWS. The data can become available at any time between hours X and Y, but you need to ingest it as soon as possible after it arrives.
In the following example, I will use a table in BigQuery as the data source outside of AWS.
Note: This article assumes you know how to create and configure Lambdas on AWS and will not go through that step by step. It does include the required permissions and a code example of the Lambda.
Example Code Available: https://github.com/MurrayCode/Reschedule-Lambda-Example
The Solution:
To solve this problem, a cron-scheduled EventBridge rule can be used to trigger a Lambda, which then goes down one of two paths using the AWS SDK. Either the Lambda finds that the intended data exists and triggers the Glue workflow, or it creates a new rule that targets the same Lambda so that it is re-triggered after a set number of minutes. Note: In this example the reschedule interval is 5 minutes.
First, create a Lambda with a cron-scheduled EventBridge rule as its trigger, scheduled for the earliest time the data can become available. The Lambda then needs the following permissions to perform all the necessary actions.
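As a rough illustration of the initial schedule, the expression for the cron rule could be built like this. EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year) and are evaluated in UTC. The helper name `dailyCron` and the 06:00 start time are assumptions made for this sketch, not values from the example repository.

```javascript
// Build an EventBridge cron expression that fires once a day at a given UTC time.
// Field order: minutes hours day-of-month month day-of-week year.
// One of day-of-month / day-of-week has to be "?".
function dailyCron(hourUtc, minuteUtc = 0) {
  if (!Number.isInteger(hourUtc) || hourUtc < 0 || hourUtc > 23) {
    throw new RangeError("hourUtc must be an integer from 0 to 23");
  }
  if (!Number.isInteger(minuteUtc) || minuteUtc < 0 || minuteUtc > 59) {
    throw new RangeError("minuteUtc must be an integer from 0 to 59");
  }
  return `cron(${minuteUtc} ${hourUtc} * * ? *)`;
}

// Hypothetical earliest availability time: 06:00 UTC.
console.log(dailyCron(6)); // cron(0 6 * * ? *)
```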
Required IAM Permissions:
“glue:StartWorkflowRun”
“events:PutRule”
“events:PutTargets”
“events:RemoveTargets”
“events:DeleteRule”
“lambda:GetFunctionConfiguration”
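For reference, these permissions could be expressed as a policy document along the following lines. This is only a sketch: it also lists events:PutTargets, which the putTargets call made later in the Lambda needs, and the wildcard resource is a placeholder assumption that you would normally narrow down to your own rule, workflow and function ARNs.

```javascript
// Hypothetical identity-based policy for the Lambda's execution role.
// Resource "*" is a placeholder; scope it to your own ARNs in practice.
const reschedulePolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: [
        "glue:StartWorkflowRun",
        "events:PutRule",
        "events:PutTargets",
        "events:RemoveTargets",
        "events:DeleteRule",
        "lambda:GetFunctionConfiguration",
      ],
      Resource: "*",
    },
  ],
};

// Print the JSON that would be pasted into the IAM console or a template.
console.log(JSON.stringify(reschedulePolicy, null, 2));
```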
Required Resource-based Policies:
A resource-based policy on the Lambda that allows EventBridge to call lambda:InvokeFunction, covering both the initial cron rule and the reschedule rule that the Lambda itself creates.
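One way to set this up at deployment time is through the Lambda addPermission call, once per rule. The sketch below only builds the parameters for that call; the function name, rule ARNs and statement IDs are made-up placeholders, and `invokePermissionParams` is a hypothetical helper rather than something from the example repository.

```javascript
// Build the parameters for lambda.addPermission so that a single EventBridge
// rule is allowed to invoke the function.
function invokePermissionParams(functionName, ruleArn, statementId) {
  return {
    Action: "lambda:InvokeFunction",
    FunctionName: functionName,
    Principal: "events.amazonaws.com", // EventBridge service principal
    SourceArn: ruleArn,                // restricts the grant to this rule
    StatementId: statementId,          // must be unique within the function policy
  };
}

// Placeholder ARNs for the initial cron rule and the reschedule rule.
const account = "123456789012";
const permissions = [
  invokePermissionParams(
    "my-ingest-lambda",
    `arn:aws:events:eu-west-1:${account}:rule/initial-cron-rule`,
    "allow-initial-cron-rule"
  ),
  invokePermissionParams(
    "my-ingest-lambda",
    `arn:aws:events:eu-west-1:${account}:rule/reschedule-rule`,
    "allow-reschedule-rule"
  ),
];

console.log(permissions.map((p) => p.StatementId));
```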
Lambda Code:
Note: The following code sample is an example showing all the SDK calls required for this solution.
Stage 1 — Use the AWS SDK to remove the reschedule rule from the lambda.
The first thing the Lambda does is try to remove the reschedule rule, which will exist if this is not the first time the Lambda has run that day. The rule’s target has to be removed before the rule itself can be deleted. If the rule doesn’t exist yet, we catch and log the error but allow the Lambda to continue.
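A minimal sketch of this stage might look like the following. It assumes `events` behaves like an AWS SDK for JavaScript v2 CloudWatchEvents client (where each call returns a request with a `.promise()` method); the function name and its arguments are illustrative, not taken from the example repository.

```javascript
// `events` is expected to look like `new AWS.CloudWatchEvents()` from the
// AWS SDK for JavaScript v2. It is passed in so the function is easy to test.
async function removeRescheduleRule(events, ruleName, targetId) {
  try {
    // Targets have to be removed before the rule itself can be deleted.
    await events.removeTargets({ Rule: ruleName, Ids: [targetId] }).promise();
    await events.deleteRule({ Name: ruleName }).promise();
    return true;
  } catch (err) {
    // On the first run of the day the rule does not exist yet,
    // so we log the error and let the Lambda carry on.
    console.log(`Could not remove rule ${ruleName}: ${err.message}`);
    return false;
  }
}
```

The return value is only there to make the outcome visible to the caller; the Lambda continues either way.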
Stage 2 — Check if the data exists:
In this example we check that a table which is created daily exists in BigQuery. First we create a BigQuery client through the Google Cloud SDK, then we retrieve the dataset, and within the dataset we get the table.
Within a try/catch we attempt to get the table. If it exists, we return true. If we catch an error, we check whether the error code is 404 (Not Found), which means the data is not available yet, and return false. For any other error we throw a new error.
We then call our function and set up the two paths that the Lambda can go down.
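The check described above could be sketched roughly as follows. It assumes `bigquery` behaves like a client from the `@google-cloud/bigquery` package, whose errors carry a numeric `code`; the function name and the dataset and table IDs are placeholders.

```javascript
// `bigquery` is expected to look like `new BigQuery()` from @google-cloud/bigquery.
// It is passed in so the function can be exercised with a fake client.
async function tableExists(bigquery, datasetId, tableId) {
  try {
    // get() resolves when the table exists and rejects otherwise.
    await bigquery.dataset(datasetId).table(tableId).get();
    return true;
  } catch (err) {
    if (err.code === 404) {
      // Not Found: the daily table has not been created yet.
      return false;
    }
    // Anything else (permissions, network, ...) is a real failure.
    throw new Error(`Unexpected error checking ${datasetId}.${tableId}: ${err.message}`);
  }
}
```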
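The branching itself is small. In the sketch below the three steps are passed in as functions so that the control flow is visible on its own; `checkData`, `startIngestion` and `reschedule` are hypothetical names standing in for the BigQuery check, the Glue call and the rule creation.

```javascript
// Decide which path to take based on whether the data is available.
// Each dependency is an async function supplied by the caller.
async function handleInvocation({ checkData, startIngestion, reschedule }) {
  if (await checkData()) {
    // Path 1: the data is there, so kick off the ingestion.
    await startIngestion();
    return "started";
  }
  // Path 2: the data is not there yet, so try again later.
  await reschedule();
  return "rescheduled";
}
```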
Path 1 — The table exists:
If the table exists, we create a new Glue client and start the intended workflow.
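As a rough sketch, starting the workflow comes down to a single SDK call. The code assumes `glue` behaves like an AWS SDK for JavaScript v2 Glue client; the function name and the workflow name used by the caller are placeholders.

```javascript
// `glue` is expected to look like `new AWS.Glue()` from the AWS SDK for JavaScript v2.
async function startWorkflow(glue, workflowName) {
  const data = await glue.startWorkflowRun({ Name: workflowName }).promise();
  // The response carries the ID of the run that was just started.
  console.log(`Started workflow ${workflowName}, run ${data.RunId}`);
  return data.RunId;
}
```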
Path 2 — The table doesn’t exist:
If the table doesn’t exist, we use the CloudWatchEvents client function putRule to create a new rule with the desired reschedule time (5 minutes in this case) and set its state to “ENABLED”.
We then create a new Lambda client through the SDK and call getFunctionConfiguration, passing in the name we gave the Lambda when we created it on AWS. We extract the Lambda ARN from the result with “data.FunctionArn” and pass it as part of the parameters to the next CloudWatchEvents client SDK call, putTargets, which takes the rule name, the target Lambda ARN and an Id.
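The text does not say which schedule expression is used for the 5 minute delay, so the sketch below makes an assumption: it computes a one-off cron expression for the current UTC time plus five minutes. A `rate(5 minutes)` expression would be another option. `events` is assumed to behave like an AWS SDK for JavaScript v2 CloudWatchEvents client, and both function names are illustrative.

```javascript
// Build a one-off EventBridge cron expression for `minutes` from `now` (UTC).
// Cron rules have one-minute resolution, so seconds are dropped and the rule
// fires at the start of the target minute.
function cronInMinutes(minutes, now = new Date()) {
  const t = new Date(now.getTime() + minutes * 60 * 1000);
  return (
    `cron(${t.getUTCMinutes()} ${t.getUTCHours()} ${t.getUTCDate()} ` +
    `${t.getUTCMonth() + 1} ? ${t.getUTCFullYear()})`
  );
}

// `events` is expected to look like `new AWS.CloudWatchEvents()` (SDK v2).
async function createRescheduleRule(events, ruleName, minutes = 5) {
  const data = await events
    .putRule({
      Name: ruleName,
      ScheduleExpression: cronInMinutes(minutes),
      State: "ENABLED",
    })
    .promise();
  return data.RuleArn;
}

// 18:58 UTC on 24 June 2022 plus 5 minutes is 19:03 UTC the same day.
console.log(cronInMinutes(5, new Date(Date.UTC(2022, 5, 24, 18, 58))));
// cron(3 19 24 6 ? 2022)
```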
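Put together, looking up the ARN and attaching the target might look roughly like this. The sketch assumes `lambda` and `events` behave like AWS SDK for JavaScript v2 Lambda and CloudWatchEvents clients; the function name and the rule, function and target names passed in are placeholders.

```javascript
// `lambda` is expected to look like `new AWS.Lambda()` and `events` like
// `new AWS.CloudWatchEvents()` from the AWS SDK for JavaScript v2.
async function targetLambda(lambda, events, functionName, ruleName, targetId) {
  // Look up the function's own ARN by name.
  const data = await lambda
    .getFunctionConfiguration({ FunctionName: functionName })
    .promise();

  // Point the reschedule rule at the Lambda.
  await events
    .putTargets({
      Rule: ruleName,
      Targets: [{ Id: targetId, Arn: data.FunctionArn }],
    })
    .promise();

  return data.FunctionArn;
}
```

The target Id passed here has to match the one used when the rule’s targets are removed on the next invocation.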
The full example Lambda code is available here:
https://github.com/MurrayCode/Reschedule-Lambda-Example
Murray Stewart | Sciencx (2022-06-24T18:17:17+00:00) Rescheduling a Lambda when target data outside of AWS isn’t currently available. Retrieved from https://www.scien.cx/2022/06/24/rescheduling-a-lambda-when-target-data-outside-of-aws-isnt-currently-available/