This content originally appeared on Level Up Coding - Medium and was authored by Rishabh Manna

Solving multiple executions of @Scheduled tasks across multiple nodes even with ShedLock — Spring Boot

Imagine you have many @Scheduled tasks configured in a Spring Boot application that runs on multiple nodes, and you have added ShedLock to ensure that each task executes on only one node at a time.

Surprisingly, you notice that tasks still run on multiple pods, not at the scheduled time but after a seemingly random delay. After a few days, most of the scheduled tasks suddenly stop running, and they stay stopped until you redeploy the pods.

Scheduled tasks stopped running until re-deployment

To understand what was happening, we need to understand how Spring scheduling and ShedLock work.

How does scheduling work in Spring Boot?

We can schedule tasks in Spring Boot with the @Scheduled annotation, and enable scheduling by adding the @EnableScheduling annotation to a Spring configuration class. Spring uses a ThreadPoolTaskScheduler for scheduled tasks, which internally delegates to a ScheduledExecutorService. By default, the thread-pool size of the ThreadPoolTaskScheduler is 1, which means that all scheduled tasks of the application are executed by a single thread (named scheduling-1).
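The effect of a one-thread pool can be reproduced without Spring. The sketch below uses a plain JDK ScheduledExecutorService with a single thread (an assumption standing in for the default ThreadPoolTaskScheduler; the class and method names are hypothetical). A short task that is due 50 ms after submission only starts once the long task has released the thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SingleThreadSchedulerDemo {

    /**
     * Schedules a long task and a short task on a one-thread scheduler and
     * returns how many milliseconds after submission the short task started.
     */
    static long shortTaskStartDelayMillis(long longTaskMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        CountDownLatch shortTaskStarted = new CountDownLatch(1);
        long[] startedAtNanos = new long[1];
        long submittedAtNanos = System.nanoTime();
        try {
            // The long task is due immediately and occupies the only thread.
            scheduler.schedule(() -> {
                try {
                    Thread.sleep(longTaskMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, 0, TimeUnit.MILLISECONDS);

            // The short task is due 50 ms later but has to wait for a free thread.
            scheduler.schedule(() -> {
                startedAtNanos[0] = System.nanoTime();
                shortTaskStarted.countDown();
            }, 50, TimeUnit.MILLISECONDS);

            shortTaskStarted.await();
            return TimeUnit.NANOSECONDS.toMillis(startedAtNanos[0] - submittedAtNanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long delay = shortTaskStartDelayMillis(300);
        System.out.println("Short task started after about " + delay + " ms");
    }
}
```

With a 300 ms long task, the short task starts roughly 300 ms after submission instead of 50 ms. Scale the numbers up to hours and this is the queueing described later in the article.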

How does ShedLock work?

ShedLock stores information about each scheduled task in persistent storage that is shared by all instances. The storage is accessed through a LockProvider, and ShedLock ships implementations for numerous databases such as MongoDB, PostgreSQL, and MySQL. We used a PostgreSQL database for our LockProvider. The table that ShedLock uses to manage the locks is straightforward. It has only four columns:

  • name: a unique name provided by the user for the scheduled job
  • lock_until: the time until which the current execution of the job is locked
  • locked_at: the timestamp at which the instance acquired the lock
  • locked_by: the identifier of the instance that acquired the lock (the pod name, in our case)

ShedLock creates the row for a scheduled task the first time the job runs. On later executions the same row is updated; it is never deleted or re-created.

@SchedulerLock annotation used for a scheduled task

A task is locked by setting its lock_until column to a time in the future. When a scheduled task fires, every running node tries to update the row for that task, but the update succeeds only when lock_until <= now(), so only one node wins. The winning node writes lock_until, locked_at, and locked_by, setting lock_until to now() + lockAtMostFor, and holds the lock for this execution. When the task finishes, ShedLock updates lock_until on the same row: to now() if the task ran longer than lockAtLeastFor, or to locked_at + lockAtLeastFor if it finished sooner.

We use ShedLock's default integration mode, an AOP proxy around the scheduled method. In this mode ShedLock tries to acquire the lock only at the moment the TaskScheduler is about to execute the task, which may be later than the time the task was scheduled for.
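The locking rule described above can be written down as a small in-memory model. This is only a sketch of the rule as stated in this article, not ShedLock's actual code: the class InMemoryLockRow and its methods are hypothetical, and the real library performs these steps as updates on the database row.

```java
import java.time.Duration;
import java.time.Instant;

/** In-memory stand-in for one row of the lock table (hypothetical model). */
class InMemoryLockRow {
    private Instant lockUntil = Instant.EPOCH;
    private Instant lockedAt = Instant.EPOCH;
    private String lockedBy = "";

    /** Succeeds only when lock_until <= now, then sets lock_until = now + lockAtMostFor. */
    synchronized boolean tryLock(String node, Instant now, Duration lockAtMostFor) {
        if (lockUntil.isAfter(now)) {
            return false; // another node still holds the lock
        }
        lockedAt = now;
        lockedBy = node;
        lockUntil = now.plus(lockAtMostFor);
        return true;
    }

    /** Sets lock_until to the later of now and locked_at + lockAtLeastFor. */
    synchronized void unlock(Instant now, Duration lockAtLeastFor) {
        Instant earliestRelease = lockedAt.plus(lockAtLeastFor);
        lockUntil = now.isAfter(earliestRelease) ? now : earliestRelease;
    }

    synchronized Instant lockUntil() {
        return lockUntil;
    }

    synchronized String lockedBy() {
        return lockedBy;
    }

    public static void main(String[] args) {
        InMemoryLockRow row = new InMemoryLockRow();
        Instant start = Instant.parse("2022-09-15T10:00:00Z");
        System.out.println(row.tryLock("node-B", start, Duration.ofMinutes(30))); // first node wins
        System.out.println(row.tryLock("node-A", start, Duration.ofMinutes(30))); // second node is rejected
        row.unlock(start.plus(Duration.ofMinutes(1)), Duration.ofMinutes(5));     // short run
        System.out.println(row.lockUntil()); // locked_at + lockAtLeastFor
    }
}
```

The 30-minute and 5-minute values are illustrative assumptions. Note that nothing in the rule records when the task was supposed to run: a node that asks late is treated exactly like a node that asks on time.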

ShedLock AOP PROXY_METHOD Mode Integration

What was happening?

So only one scheduler thread was working on each node. We have more than 80 scheduled tasks in our application, and some of them are long-running: they run for 2 to 4 hours and are triggered every 2 to 4 hours. While a long-running task executes, every other task that comes due in that period is queued behind it and runs late.

Suppose we have two nodes (A and B) running and a scheduled task, testJob, locked with @SchedulerLock, that takes about 6 to 7 minutes to run.

testJob is scheduled to execute at 10 AM on both nodes. Suppose the scheduling-1 thread of node A is busy with another long-running task that started at 8 AM and takes 3 hours to complete, while the scheduling-1 thread of node B is idle.

In this scenario, node B acquires the lock for testJob at 10 AM and updates the row in the database.

On node A, testJob has to wait for the earlier task, which completes at 11 AM. At 11 AM, node A's scheduling-1 thread tries to acquire the lock. Node B finished and released the lock long before, so lock_until < now(), node A gets the lock, and the task executes a second time.

This is exactly what was happening in our multi-node system: tasks were executed more often than configured, on multiple nodes. Because we have many long-running tasks, and these were also running more often than required, the queue of pending tasks kept growing. After some days, most of the scheduled tasks appeared to have stopped running.
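The 10 AM and 11 AM timeline can be checked with a few lines of arithmetic. In this sketch all durations are offsets from the scheduled time; the 5-minute lockAtLeastFor is an assumed value chosen for illustration, the class and method names are hypothetical, and lockAtMostFor is assumed to be longer than the run time so that the first node releases the lock normally.

```java
import java.time.Duration;

class DelayedNodeTimeline {

    /**
     * Returns true when a node that reaches the task late can still acquire the lock.
     *
     * @param runTime        how long the task ran on the node that locked first
     * @param lockAtLeastFor minimum time the lock is kept after it was acquired
     * @param otherNodeDelay how long after the scheduled time the other node tries to lock
     */
    static boolean delayedNodeRunsTaskAgain(Duration runTime,
                                            Duration lockAtLeastFor,
                                            Duration otherNodeDelay) {
        // After release, lock_until is the later of completion time and locked_at + lockAtLeastFor.
        Duration lockUntil = runTime.compareTo(lockAtLeastFor) > 0 ? runTime : lockAtLeastFor;
        // The late node gets the lock when lock_until <= its own "now".
        return lockUntil.compareTo(otherNodeDelay) <= 0;
    }

    public static void main(String[] args) {
        // Node B runs for 7 minutes at 10 AM; node A is free one hour later.
        System.out.println(delayedNodeRunsTaskAgain(
                Duration.ofMinutes(7), Duration.ofMinutes(5), Duration.ofMinutes(60))); // true

        // With a lock kept for 90 minutes, the late attempt at 11 AM is rejected.
        System.out.println(delayedNodeRunsTaskAgain(
                Duration.ofMinutes(7), Duration.ofMinutes(90), Duration.ofMinutes(60))); // false
    }
}
```

The second call only illustrates the rule; it is not a recommendation to rely on long locks instead of fixing the queueing.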

Solution!

The fix is to increase the number of scheduler threads, because a single thread cannot keep up with the number of scheduled tasks configured in our application. We configured the scheduler to run tasks in parallel, so that tasks spend as little time as possible waiting in the queue.
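The difference a larger pool makes can be sketched with the JDK scheduler alone. The class below is a hypothetical demo, not our production configuration: it runs the same batch of tasks on pools of different sizes and measures the total elapsed time. In a Spring Boot application the corresponding knob is the pool size of the ThreadPoolTaskScheduler, for example through ThreadPoolTaskScheduler.setPoolSize or the spring.task.scheduling.pool.size property.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SchedulerPoolSizeDemo {

    /** Runs taskCount tasks that are all due immediately and returns the total elapsed milliseconds. */
    static long elapsedMillis(int poolSize, int taskCount, long taskMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch allDone = new CountDownLatch(taskCount);
        long startNanos = System.nanoTime();
        try {
            for (int i = 0; i < taskCount; i++) {
                scheduler.schedule(() -> {
                    try {
                        Thread.sleep(taskMillis); // simulated work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        allDone.countDown();
                    }
                }, 0, TimeUnit.MILLISECONDS);
            }
            allDone.await();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Four 100 ms tasks: one thread runs them back to back, four threads run them side by side.
        System.out.println("pool size 1: " + elapsedMillis(1, 4, 100) + " ms");
        System.out.println("pool size 4: " + elapsedMillis(4, 4, 100) + " ms");
    }
}
```

With one thread the four tasks take about four times as long as a single task; with four threads they finish in roughly the time of one. The right pool size depends on how many tasks can be due at once and how long they run.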

Scheduler configuration

What have we learned?

Always tune the scheduler's thread-pool size to your workload; otherwise you may see unexpected behaviour. Analysing a thread dump can help you choose the pool size.

Another factor that contributed to our issue was improper configuration of @SchedulerLock's lockAtLeastFor and lockAtMostFor. Keep the following points in mind:

  • lockAtMostFor should always be greater than the execution time of the task; otherwise the task can be executed more than once.
  • lockAtLeastFor should always be set for short-running jobs. If the clock difference between nodes is greater than lockAtLeastFor, tasks can be executed more often than configured.

Originally published in Level Up Coding on Medium. Retrieved from Sciencx (2022-09-15): https://www.scien.cx/2022/09/15/solving-multiple-executions-of-scheduled-tasks-across-multiple-nodes-even-with-shedlock-spring/