Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript

Cuckoo Filters are very useful when you have to deal with big data sets. Especially when you have to check if a new value is already inside the set.Image by Dalle-2, edits by authorWhen dealing with large systems, you would normally use a database to u…


This content originally appeared on Bits and Pieces - Medium and was authored by Fernando Doglio

Cuckoo Filters are very useful when you have to deal with big data sets. Especially when you have to check if a new value is already inside the set.

Image by Dalle-2, edits by author

When dealing with large systems, you would normally use a database to understand if whatever new piece of information you’re trying to save is already there or not.

But how long does that database take? Unless you’ve optimized the table and created some clever indexes, you might be looking at an O(n) type of query or worst, if you’re actually having to join multiple tables to do it.

Probabilistic filters try to help here by giving you a way to check whether or not the information you’re looking to save is already saved, without the performance loss.

How can they do that? They’re called “probabilistic filters” for a reason. They give you either certainty that an element isn’t there, or a “maybe” in the case that they do find some kind of match.

In particular, Cuckoo filters are very efficient for searching and deleting members from the filter, so we’re going to take a look at them today.

When would you want to use a Cuckoo filter?

Cuckoo filters, despite their funky name, are very useful when you have to deal with big data sets. Especially when you have to check if a new value is already inside the set.

You might’ve seen this type of behavior when trying to sign-up for a new popular game, and the system tells you your username is already taken. You then proceed to add your birth year to your name, and somehow, someone already signed up with that combination too.

You see, Cuckoo filters are known as “probabilistic filters”. That means they work based on probability to speed things up. These filters are crazy fast, but they’re also not 100% accurate.

It’s a trade-off kinda thing. If you’re OK with a massive gain in performance at the cost of minimal loss of accuracy, then these filters are a great match for you.

In the example of a popular game system where millions of users might be playing and signing up, the game creators are probably OK with you having a bit of a hard time when picking your username if that means their signup form is able to give you an answer immediately.

Trade-offs!

Implementing your own Cuckoo filter in JavaScript

Cuckoo filters are not hard to implement.

The way they work is by using two different hashing functions for the same value being added. These hashes will act as indexes inside the filter.

We’ll check if the first index (the first hash value) is already taken or not, if it’s not, then we’re good, the value isn’t on the filter already.

If the first index is taken, we’ll use the second index (the value of the second hash). If it’s taken, then we’ll put the value there and re-do the same thing with the value that we previously in there. If the index wasn’t taken then, we would add the new value there and call it a day.

Let’s take a look at how we would implement the code for inserting a value into the filter.

The process is as I mentioned before:

  1. We hash the new value with two different functions.
  2. We check if the first index is free inside the filter, if it is, then we add it here and we’re done.
  3. If the first index is taken, then we move on to using the second index. If it’s not taken (line 17), then we add it there, and we’re done!
  4. Finally, here is the most “complex” part: if both indexes are taken, then we’ll take the value from the second index and we’ll replace it with the new value (line 24). The original value has to go through the same process, so we add it to the head of the original list of values (line 23), and keep iterating over it.

The other thing we need to keep track of, is the hash values. If we try to use one that we already used as the second option, then we have to stop the loop because we’re going in circles.

Now, look at the implementation, we’re either using one check or two to insert the value into the filter. That’s considered constant time (as in O(1)). That’s very good and it’s independent of the number of values inside the filter.

That said, the really interesting part is checking if a value is already inside the filter. So let’s look at that now.

Did you like what you read? Consider subscribing to my FREE newsletter where I share my 2 decades’ worth of wisdom in the IT industry with everyone. Join “The Rambling of an old developer” !

Searching inside a Cuckoo filter

The search process inside a Cuckoo filter is similar to the insert flow, the only difference is that we’re trying to understand if there are matches, without having to move any values around.

In this algorithm you can perfectly see the “probabilistic” part of this filter. You see, hashing functions have a possibility of generating “collisions” meaning that for different values they might generate the same output. This would cause, in our case, false positive matches. In other words, we might end up thinking a value is already there when in reality, it’s not.

Sounds familiar? That’s why it happens that sometimes you get a “username already exists” when you try to use your last name coupled with your birthdate and your pet’s name. It’s not that you already signed up there and don’t remember. It’s the probabilistic filter having a false positive.

Can you solve this? Not entirely.

Clearly, if you carefully design your hashing functions to minimize collisions, then you can keep the uncertainty to a minimum. But if you want to go 100% sure, then you’ll end up implementing something else that is not as efficient as this filter.

Remember: trade-offs!

Here is the implementation for our search function, with hash1 and hash2 being the same 2 functions we already saw.

The first thing to notice here is that there are no loops. We’re simply:

  1. Hashing our value with both hashing functions.
  2. Checking with the first index.
  3. If there is a value there, then we return true .
  4. Otherwise, we check with the second index, is there a value there? Then we return true .
  5. Finally, if both indexes failed (as in, there weren’t any values in those), then we can safely say the value we’re looking for is not inside the filter.

Both steps 3 and 4 can yield potential false positives, but step 5 will always be certain.

So you can be sure that while a Cuckoo filter might incorrectly make you think a value is already there, it will always be true if the result is that there is no match.

Now that we have our Cuckoo filter algorithm created, it would make sense to deploy it as a reusable component on Bit. We’ve learned how to use the Cuckoo filter, so it makes sense that we start using it when the right project calls for it.

Conclusion

Cuckoo filters are really interesting and efficient data structures that can help you gain back the performance that a massive amount of data caused you to lose.

The only thing to remember, is that they’re not 100% accurate, and that you should be fine with that before adding them to your business logic. If you are, then go for it!

If you’re not OK with that, then probabilistic filters are probably not for you.

Build microfrontends with reusable components

Bit’s open-source tool help 250,000+ devs to build apps with components.

Turn any UI, feature, or page into a reusable component — and share it across your applications. It’s easier to collaborate and build faster.

Learn more

Split apps into components to make app development easier, and enjoy the best experience for the workflows you want:

Micro-Frontends

Design System

Code-Sharing and reuse

Monorepo

Learn more


Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript was originally published in Bits and Pieces on Medium, where people are continuing the conversation by highlighting and responding to this story.


This content originally appeared on Bits and Pieces - Medium and was authored by Fernando Doglio


Print Share Comment Cite Upload Translate Updates
APA

Fernando Doglio | Sciencx (2023-01-19T07:22:22+00:00) Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript. Retrieved from https://www.scien.cx/2023/01/19/advanced-data-structures-and-algorithms-build-a-cuckoo-filter-in-javascript/

MLA
" » Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript." Fernando Doglio | Sciencx - Thursday January 19, 2023, https://www.scien.cx/2023/01/19/advanced-data-structures-and-algorithms-build-a-cuckoo-filter-in-javascript/
HARVARD
Fernando Doglio | Sciencx Thursday January 19, 2023 » Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript., viewed ,<https://www.scien.cx/2023/01/19/advanced-data-structures-and-algorithms-build-a-cuckoo-filter-in-javascript/>
VANCOUVER
Fernando Doglio | Sciencx - » Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2023/01/19/advanced-data-structures-and-algorithms-build-a-cuckoo-filter-in-javascript/
CHICAGO
" » Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript." Fernando Doglio | Sciencx - Accessed . https://www.scien.cx/2023/01/19/advanced-data-structures-and-algorithms-build-a-cuckoo-filter-in-javascript/
IEEE
" » Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript." Fernando Doglio | Sciencx [Online]. Available: https://www.scien.cx/2023/01/19/advanced-data-structures-and-algorithms-build-a-cuckoo-filter-in-javascript/. [Accessed: ]
rf:citation
» Advanced Data Structures and Algorithms: Build a Cuckoo Filter in JavaScript | Fernando Doglio | Sciencx | https://www.scien.cx/2023/01/19/advanced-data-structures-and-algorithms-build-a-cuckoo-filter-in-javascript/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.