This content originally appeared on Level Up Coding - Medium and was authored by Emir Vatric
Improve Search Performance with Trigram Indexes in PostgreSQL
TLDR;
- Enable pg_trgm extension
- Add Trigram Index on columns that you are doing a full-text search
class AddTrigramIndex < ActiveRecord::Migration[6.0]
disable_ddl_transaction!
def change
enable_extension :pg_trgm
add_index :table, :column, opclass: :gin_trgm_ops, using: :gin,
algorithm: :concurrently, name: ‘index_trgm’
end
end
And that is it! Now you can leave and be happy that you have made a massive improvement or you can read on and actually understand what happened!
Introduction
What is the best part of software development? Optimizing! Optimizing code, optimizing queries, optimizing data models, fixing google speed score whenever something like this lands in your lap you should be over the moon! You will never do so little (usually) and achieve such a big change for the end user!
Now here is what happened, some time ago you added a search feature to your dashboard, the app was small so you didn’t care about performance.
def search(query)
Post.where("body ILIKE (?)", "%#{query}%")
end
At the time of writing, this was fine there were not too many posts and a sequential scan was quick enough, but fast forward to today this query is too slow and we need to fix it.
What is a Trigram Index?
A trigram index is a type of inverted index. Trigram indexes are used to efficiently search for strings in large tables without providing an exact search term.
Trigram indexes make substring and similarity match efficient by indexing the unique trigrams of a string.
From the definition we are left with two questions:
- What is a trigram?
- What is an inverted index?
A trigram is a group of three consecutive characters in a string.
SELECT show_trgm('trigram');
show_trgm
-------------------------------
{" t"," tr","am ",gra,igr,ram,rig,tri}
Generalized Inverted Index or GIN for short is an index type designed to deal with data types that are subdividable. An index is going to store a key and a reference where that value occurs.
Another way of explaining GIN indexes is that it is like the table of contents in a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for “compensation accelerometers” in this example:
To summarise let’s say that we have an indexed body column in our posts table, and the body of the post is a STRING Trigram Index
SELECT show_trgm('Trigram Index');
-----------------------
{" i"," t"," in"," tr","am ",dex,"ex ",gra,igr,ind,nde,ram,rig,tri}
All of the substrings Trigram Index are going to be pointing to the post record. Once we make a query trying to match a keyword gram database is going to split a keyword into trigrams and do Bitmap Index Scan matching them to the trigrams in the index.
SELECT show_trgm('gram');
-----------------------
{" g"," gr", "am ", gra, ram}
-------------match--match-match
I am sure that there is much more happening than this but as I understand this is the gist of it!
Trigram indexes are much more powerful depending of the comparison used:
Trigram indexes on STRING columns support the following comparison operators:
- equality: =. Note that standard btree secondary indexes may perform better than trigram indexes for equality searches.
- pattern matching (case-sensitive): LIKE
- pattern matching (case-insensitive): ILIKE
- similarity matching: %. This operator returns true if the strings in the comparison have a similarity that meets or exceeds the threshold.
There are a few cons to all of this and one of them is slower writes on large tables!
Trigram index in action
I didn’t want to leave the subject without a quick example, I quickly created 40k users in the database and run a simple ILIKE query on the users' email.
SELECT "users".* FROM "users" WHERE users.email ILIKE ('%emir%');
Even on a dataset this small and as simple as it gets email column difference is more than noticeable. On larger datasets, larger columns ( think article body ) or more complex queries where we try to search in multiple fields results are going to be much more significant!
Conclusion
Enabling the pg_trgm extension and adding a trigram index can be a simple and effective way to improve the performance of LIKE and ILIKE queries in PostgreSQL. By taking advantage of the specialized data structure and algorithms provided by trigram indexes, you can ensure that your queries run quickly and efficiently, providing a better experience for your users.
References
Big thank you to the pganalyze team and their awesome blog!
Level Up Coding
Thanks for being a part of our community! Before you go:
- 👏 Clap for the story and follow the author 👉
- 📰 View more content in the Level Up Coding publication
- 🔔 Follow us: Twitter | LinkedIn | Newsletter
🚀👉 Join the Level Up talent collective and find an amazing job
Simple Trick for Lightning-Fast LIKE and ILIKE Queries was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
This content originally appeared on Level Up Coding - Medium and was authored by Emir Vatric
Emir Vatric | Sciencx (2023-01-10T18:32:51+00:00) Simple Trick for Lightning-Fast LIKE and ILIKE Queries. Retrieved from https://www.scien.cx/2023/01/10/simple-trick-for-lightning-fast-like-and-ilike-queries/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.