Simple Trick for Lightning-Fast LIKE and ILIKE Queries

Improve Search Performance with Trigram Indexes in PostgreSQL

TLDR;

  1. Enable pg_trgm extension
  2. Add Trigram Index on columns that you are doing a full-text search
class AddTrigramIndex < ActiveRecord::Migration[6.0]
disable_ddl_transaction!

def change
enable_extension :pg_trgm

add_index :table, :column, opclass: :gin_trgm_ops, using: :gin,
algorithm: :concurrently, name: ‘index_trgm’
end
end

And that is it! Now you can leave and be happy that you have made a massive improvement or you can read on and actually understand what happened!

Introduction

What is the best part of software development? Optimizing! Optimizing code, optimizing queries, optimizing data models, fixing google speed score whenever something like this lands in your lap you should be over the moon! You will never do so little (usually) and achieve such a big change for the end user!

One obligatory cheesy meme

Now here is what happened, some time ago you added a search feature to your dashboard, the app was small so you didn’t care about performance.

def search(query)
Post.where("body ILIKE (?)", "%#{query}%")
end

At the time of writing, this was fine there were not too many posts and a sequential scan was quick enough, but fast forward to today this query is too slow and we need to fix it.

What is a Trigram Index?

A trigram index is a type of inverted index. Trigram indexes are used to efficiently search for strings in large tables without providing an exact search term.

Trigram indexes make substring and similarity match efficient by indexing the unique trigrams of a string.

From the definition we are left with two questions:

  1. What is a trigram?
  2. What is an inverted index?

A trigram is a group of three consecutive characters in a string.

SELECT show_trgm('trigram');
               show_trgm
-------------------------------
{" t"," tr","am ",gra,igr,ram,rig,tri}

Generalized Inverted Index or GIN for short is an index type designed to deal with data types that are subdividable. An index is going to store a key and a reference where that value occurs.

Another way of explaining GIN indexes is that it is like the table of contents in a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for “compensation accelerometers” in this example:

presentation by Oleg Bartunov and Alexander Korotkov

To summarise let’s say that we have an indexed body column in our posts table, and the body of the post is a STRING Trigram Index

SELECT show_trgm('Trigram Index');

-----------------------

{" i"," t"," in"," tr","am ",dex,"ex ",gra,igr,ind,nde,ram,rig,tri}

All of the substrings Trigram Index are going to be pointing to the post record. Once we make a query trying to match a keyword gram database is going to split a keyword into trigrams and do Bitmap Index Scan matching them to the trigrams in the index.

SELECT show_trgm('gram');

-----------------------

{" g"," gr", "am ", gra, ram}
-------------match--match-match

I am sure that there is much more happening than this but as I understand this is the gist of it!

Trigram indexes are much more powerful depending of the comparison used:

Trigram indexes on STRING columns support the following comparison operators:

  • equality: =. Note that standard btree secondary indexes may perform better than trigram indexes for equality searches.
  • pattern matching (case-sensitive): LIKE
  • pattern matching (case-insensitive): ILIKE
  • similarity matching: %. This operator returns true if the strings in the comparison have a similarity that meets or exceeds the threshold.

There are a few cons to all of this and one of them is slower writes on large tables!

Trigram index in action

I didn’t want to leave the subject without a quick example, I quickly created 40k users in the database and run a simple ILIKE query on the users’ email.

SELECT "users".* FROM "users" WHERE users.email ILIKE ('%emir%');
Left: query plan without index, database doing a sequential scan, total time: ~28.5 ms | Right: query plan uses index, total time: ~1.3 ms

Even on a dataset this small and as simple as it gets email column difference is more than noticeable. On larger datasets, larger columns ( think article body ) or more complex queries where we try to search in multiple fields results are going to be much more significant!

Conclusion

Enabling the pg_trgm extension and adding a trigram index can be a simple and effective way to improve the performance of LIKE and ILIKE queries in PostgreSQL. By taking advantage of the specialized data structure and algorithms provided by trigram indexes, you can ensure that your queries run quickly and efficiently, providing a better experience for your users.

References

Big thank you to the pganalyze team and their awesome blog!

Level Up Coding

Thanks for being a part of our community! Before you go:

🚀👉 Join the Level Up talent collective and find an amazing job


Simple Trick for Lightning-Fast LIKE and ILIKE Queries was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.


This content originally appeared on Level Up Coding - Medium and was authored by Emir Vatric

Improve Search Performance with Trigram Indexes in PostgreSQL

TLDR;

  1. Enable pg_trgm extension
  2. Add Trigram Index on columns that you are doing a full-text search
class AddTrigramIndex < ActiveRecord::Migration[6.0]
disable_ddl_transaction!

def change
enable_extension :pg_trgm

add_index :table, :column, opclass: :gin_trgm_ops, using: :gin,
algorithm: :concurrently, name: ‘index_trgm’
end
end

And that is it! Now you can leave and be happy that you have made a massive improvement or you can read on and actually understand what happened!

Introduction

What is the best part of software development? Optimizing! Optimizing code, optimizing queries, optimizing data models, fixing google speed score whenever something like this lands in your lap you should be over the moon! You will never do so little (usually) and achieve such a big change for the end user!

One obligatory cheesy meme

Now here is what happened, some time ago you added a search feature to your dashboard, the app was small so you didn’t care about performance.

def search(query)
Post.where("body ILIKE (?)", "%#{query}%")
end

At the time of writing, this was fine there were not too many posts and a sequential scan was quick enough, but fast forward to today this query is too slow and we need to fix it.

What is a Trigram Index?

A trigram index is a type of inverted index. Trigram indexes are used to efficiently search for strings in large tables without providing an exact search term.

Trigram indexes make substring and similarity match efficient by indexing the unique trigrams of a string.

From the definition we are left with two questions:

  1. What is a trigram?
  2. What is an inverted index?

A trigram is a group of three consecutive characters in a string.

SELECT show_trgm('trigram');
               show_trgm
-------------------------------
{" t"," tr","am ",gra,igr,ram,rig,tri}

Generalized Inverted Index or GIN for short is an index type designed to deal with data types that are subdividable. An index is going to store a key and a reference where that value occurs.

Another way of explaining GIN indexes is that it is like the table of contents in a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for “compensation accelerometers” in this example:

presentation by Oleg Bartunov and Alexander Korotkov

To summarise let’s say that we have an indexed body column in our posts table, and the body of the post is a STRING Trigram Index

SELECT show_trgm('Trigram Index');

-----------------------

{" i"," t"," in"," tr","am ",dex,"ex ",gra,igr,ind,nde,ram,rig,tri}

All of the substrings Trigram Index are going to be pointing to the post record. Once we make a query trying to match a keyword gram database is going to split a keyword into trigrams and do Bitmap Index Scan matching them to the trigrams in the index.

SELECT show_trgm('gram');

-----------------------

{" g"," gr", "am ", gra, ram}
-------------match--match-match

I am sure that there is much more happening than this but as I understand this is the gist of it!

Trigram indexes are much more powerful depending of the comparison used:

Trigram indexes on STRING columns support the following comparison operators:

  • equality: =. Note that standard btree secondary indexes may perform better than trigram indexes for equality searches.
  • pattern matching (case-sensitive): LIKE
  • pattern matching (case-insensitive): ILIKE
  • similarity matching: %. This operator returns true if the strings in the comparison have a similarity that meets or exceeds the threshold.

There are a few cons to all of this and one of them is slower writes on large tables!

Trigram index in action

I didn’t want to leave the subject without a quick example, I quickly created 40k users in the database and run a simple ILIKE query on the users' email.

SELECT "users".* FROM "users" WHERE users.email ILIKE ('%emir%');
Left: query plan without index, database doing a sequential scan, total time: ~28.5 ms | Right: query plan uses index, total time: ~1.3 ms

Even on a dataset this small and as simple as it gets email column difference is more than noticeable. On larger datasets, larger columns ( think article body ) or more complex queries where we try to search in multiple fields results are going to be much more significant!

Conclusion

Enabling the pg_trgm extension and adding a trigram index can be a simple and effective way to improve the performance of LIKE and ILIKE queries in PostgreSQL. By taking advantage of the specialized data structure and algorithms provided by trigram indexes, you can ensure that your queries run quickly and efficiently, providing a better experience for your users.

References

Big thank you to the pganalyze team and their awesome blog!

Level Up Coding

Thanks for being a part of our community! Before you go:

🚀👉 Join the Level Up talent collective and find an amazing job


Simple Trick for Lightning-Fast LIKE and ILIKE Queries was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.


This content originally appeared on Level Up Coding - Medium and was authored by Emir Vatric


Print Share Comment Cite Upload Translate Updates
APA

Emir Vatric | Sciencx (2023-01-10T18:32:51+00:00) Simple Trick for Lightning-Fast LIKE and ILIKE Queries. Retrieved from https://www.scien.cx/2023/01/10/simple-trick-for-lightning-fast-like-and-ilike-queries/

MLA
" » Simple Trick for Lightning-Fast LIKE and ILIKE Queries." Emir Vatric | Sciencx - Tuesday January 10, 2023, https://www.scien.cx/2023/01/10/simple-trick-for-lightning-fast-like-and-ilike-queries/
HARVARD
Emir Vatric | Sciencx Tuesday January 10, 2023 » Simple Trick for Lightning-Fast LIKE and ILIKE Queries., viewed ,<https://www.scien.cx/2023/01/10/simple-trick-for-lightning-fast-like-and-ilike-queries/>
VANCOUVER
Emir Vatric | Sciencx - » Simple Trick for Lightning-Fast LIKE and ILIKE Queries. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2023/01/10/simple-trick-for-lightning-fast-like-and-ilike-queries/
CHICAGO
" » Simple Trick for Lightning-Fast LIKE and ILIKE Queries." Emir Vatric | Sciencx - Accessed . https://www.scien.cx/2023/01/10/simple-trick-for-lightning-fast-like-and-ilike-queries/
IEEE
" » Simple Trick for Lightning-Fast LIKE and ILIKE Queries." Emir Vatric | Sciencx [Online]. Available: https://www.scien.cx/2023/01/10/simple-trick-for-lightning-fast-like-and-ilike-queries/. [Accessed: ]
rf:citation
» Simple Trick for Lightning-Fast LIKE and ILIKE Queries | Emir Vatric | Sciencx | https://www.scien.cx/2023/01/10/simple-trick-for-lightning-fast-like-and-ilike-queries/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.