Building a Workflow for Async Searchkick Reindexing

This content originally appeared on DEV Community and was authored by Chad Wilken

We lean heavily on Elasticsearch at CompanyCam. One of its primary use cases is serving our highly filterable project feed. It is incredibly fast, even when you apply multiple filters to your query and are searching a largish data set. Our primary interface to Elasticsearch is the Searchkick gem. Searchkick is a powerhouse and provides so many features out of the box. One place where we bump up against the edges is when trying to reindex a large collection.

Mo' Projects, Mo' Problems

CompanyCam houses just under 21 million projects, with tens of thousands of new projects added daily. On occasion we change the fields that we want to be available for filtering. To make the new field(s) available, we have to reindex the entire collection of records. Reindexing a collection of this size, where each record also pulls in values from associated records, can be quite slow. Run synchronously, the reindex takes about 10 hours, and that is with eager loading the associations and other optimizations.

Never fear though: Searchkick accounts for this and can use ActiveJob to reindex asynchronously. The one thing that isn't accounted for is how to promote the new index when indexing is complete. You can run the task like reindex(async: { wait: true }), which runs the indexing operation asynchronously, polls periodically until indexing completes, and then promotes the index. This almost works, but it can still take hours, and I can't just sit on a server instance waiting for it to finish. What if I get disconnected or the instance terminates due to a deploy? We decided that it was time to build a small workflow around indexing large collections asynchronously that promotes the new index on its own upon completion.

Enter Tooling

I set out with two goals: I should be able to start a collection indexing operation from our internal admin tool, Dash, and I should be able to monitor its progress. With those two simple goals in mind, this is what I came up with.

I need two jobs: one to enqueue the indexing operation and another to monitor the operation until it is completed. Once those are built, I can use Rails basics to wrap them in a minimalistic UI.

The Jobs

The first job needed to:

Accept a class name and start the reindex for the given class
Store the pending index name for later usage, optimally in Redis
Enqueue another job to monitor the progress of the indexing operation

This is what I came up with:

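module Searchkick
  class PerformAsyncReindexWorker
    include Sidekiq::Worker

    # No retries, so a failed run cannot kick off a second full reindex.
    sidekiq_options retry: 0

    # klass is the model's class name as a String, e.g. the name of a searchkick model.
    def perform(klass)
      # Enqueues the batch jobs and returns a hash containing the new index's name.
      result = klass.constantize.reindex(async: true)
      index_name = result[:index_name]
      # Remember the pending index in Redis so its progress can be shown later.
      Searchkick::AsyncReindexStatus.new.currently_reindexing << index_name
      # Start polling for completion shortly after the batches are enqueued.
      Searchkick::MonitorAsyncReindexWorker.perform_in(5.seconds, klass, index_name)
    end
  end
end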

The second job, as you may have guessed, monitors the indexing progress until completion. The basic functionality needs to check the operation and:

If incomplete, re-enqueue itself to check again in 10 seconds
If complete, promote the index for the given class and remove the index name from the collection in Redis

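module Searchkick
  class MonitorAsyncReindexWorker
    include Sidekiq::Worker

    def perform(klass, index_name)
      # Ask Searchkick whether any reindex batches are still outstanding.
      status = Searchkick.reindex_status(index_name)
      if status[:completed]
        # Point the model's alias at the freshly built index, then stop tracking it.
        klass.constantize.search_index.promote(index_name)
        Searchkick::AsyncReindexStatus.new.currently_reindexing.delete(index_name)
      else
        # Not done yet: check again in 10 seconds.
        self.class.perform_in(10.seconds, klass, index_name)
      end
    end
  end
end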
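
To make the check-then-promote cycle easier to see in isolation, here is a minimal sketch in plain Ruby that runs without Sidekiq or Elasticsearch. FakeReindex and monitor_until_complete are hypothetical names invented for this illustration, the loop stands in for the job re-enqueueing itself, and max_checks is an assumed safety limit that the worker above does not have.

```ruby
# Hypothetical stand-in for a running reindex. Each status call reports the
# batches left and then pretends one more batch has finished.
class FakeReindex
  attr_reader :promoted

  def initialize(batches)
    @batches = batches
    @promoted = false
  end

  # Same shape as the hash the worker reads: :completed plus :batches_left.
  def status
    result = { completed: @batches.zero?, batches_left: @batches }
    @batches -= 1 if @batches.positive?
    result
  end

  def promote!
    @promoted = true
  end
end

# Polls until the reindex reports completion, then promotes the index and
# removes its name from the tracking list. Returns the number of checks made.
# max_checks is an assumption for this sketch, not part of the original worker.
def monitor_until_complete(reindex, tracking, index_name, max_checks: 100)
  checks = 0
  loop do
    checks += 1
    if reindex.status[:completed]
      reindex.promote!
      tracking.delete(index_name)
      return checks
    end
    raise "gave up after #{checks} checks" if checks >= max_checks
  end
end
```

Calling monitor_until_complete(FakeReindex.new(3), tracking, name) checks the status until no batches remain, then promotes once and clears the name from tracking. In the real worker each pass of the loop is a separate Sidekiq job scheduled 10 seconds after the previous one.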

By default the Searchkick::BulkReindexJob uses the same queue as regular async reindexing. During a full reindex, this would block user-generated content from being indexed. So I also patched the Searchkick::BulkReindexJob to use a custom queue we have just for performing full collection indexing operations. In an initializer I simply did:

class Searchkick::BulkReindexJob
  queue_as { 'searchkick_full_reindex' }
end

The Status Object

You may be wondering what Searchkick::AsyncReindexStatus is. It is a simple class that includes the Redis::Objects module so that we can store a list of the indexes that are currently being built. It looks like this:

module Searchkick
  class AsyncReindexStatus
    include Redis::Objects

    def id
      'searchkick-async-reindex-status'
    end

    list :currently_reindexing
  end
end

Note: I opted to use Redis::Objects since it was already in our codebase and it is a bit simpler than interacting with Redis directly using Searchkick.redis.
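
For comparison, this is roughly what the same bookkeeping could look like with plain Redis list commands. It is only a sketch: ReindexTracker, its KEY, and FakeRedis are hypothetical names made up for this example, and FakeRedis is an in-memory stand-in that implements just the three commands used, so the code runs without a Redis server. In an app you would pass a real client instead, assuming one is configured.

```ruby
# Hypothetical wrapper around a Redis list of index names.
# The client only needs to respond to rpush, lrem and lrange.
class ReindexTracker
  KEY = "searchkick-async-reindex-status:currently_reindexing" # assumed key name

  def initialize(redis)
    @redis = redis
  end

  # Append an index name to the end of the list.
  def add(index_name)
    @redis.rpush(KEY, index_name)
  end

  # A count of 0 tells LREM to remove every occurrence of the value.
  def remove(index_name)
    @redis.lrem(KEY, 0, index_name)
  end

  # Read the whole list, from the first element (0) to the last (-1).
  def all
    @redis.lrange(KEY, 0, -1)
  end
end

# In-memory stand-in for a Redis client, for illustration and testing only.
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  def rpush(key, value)
    @lists[key].push(value.to_s).length
  end

  # Ignores the count and always removes all matches, which is what count 0 means.
  def lrem(key, _count, value)
    before = @lists[key].length
    @lists[key].delete(value.to_s)
    before - @lists[key].length
  end

  def lrange(key, start, stop)
    @lists[key][start..stop] || []
  end
end
```

With tracker = ReindexTracker.new(FakeRedis.new), calling add, all, and remove mirrors what the status object's list does with <<, reading the list, and delete.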

How to Kick off the Job

An indexing operation can be kicked off in one of two ways. If you have console access, you can start it from the command line by passing the model's class name as a string, such as Searchkick::PerformAsyncReindexWorker.perform_async(model_class). Otherwise, we built a crude interface into our internal admin tool. The UI allows you to select a model, start the indexing operation, and then track its status until completion.
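
Since the worker calls constantize on whatever string it receives, one option for the admin UI is to check the submitted name against an explicit list before enqueueing. This is a sketch of that idea, not the code from our admin tool: REINDEXABLE_MODELS, its contents, and reindexable_model_name are all hypothetical.

```ruby
# Hypothetical allowlist of model names that may be fully reindexed.
# "Project" is an assumed class name used for illustration.
REINDEXABLE_MODELS = %w[Project].freeze

# Returns the cleaned-up class name if it is on the allowlist, otherwise nil.
def reindexable_model_name(param)
  name = param.to_s.strip
  REINDEXABLE_MODELS.include?(name) ? name : nil
end
```

A controller action could call reindexable_model_name(params[:model]) and only enqueue the worker with the returned string when it is not nil.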

The Code

For the full code that we use, you can look at this gist. I'm always happy to hear about improvements that could be made as well!

Recap

Searchkick is great and has saved us serious time and energy developing features backed by Elasticsearch. By taking a feature that it already offered, async reindexing, and wrapping it in a small bit of workflow we were able to scratch our own itch for truly async indexing operations.
