This content originally appeared on DEV Community and was authored by Saji Wang
One of the most important data structures that is used professionally is without a doubt the Hash Table.
A lot of data structures that are considered important are never actually used directly by most developers. The same can’t be said about Hash Tables. Being able to understand when and how to use a Hash table is a necessary condition for building effective software.
In this article, we’ll go over how Hash tables work and why they matter for the average developer.
How Do Hash Tables Work?
Hash Tables are a way of storing collections of data through key-value pairs. This effectively works as a dictionary.
For example, if we were using a hash table for a list of contacts, it might look like this:
So what does this actually look like at a low-level? Well first consider that our list of values needs to be stored as an array in the computer:
A hashtable works by allowing us to find the correct index for a particular value from a key, which is much more descriptive than a numerical index.
Hash tables achieve this through the use of what’s known as a hash function.
A Hash function is a function that takes in an input that could have any number of possibilities(Such as a name in our contact book example) and returns a number from a finite set of numbers(Our array index). Hash functions are used in a variety of instances but in the context of hash tables, it transforms our key into a given numerical value, and then maps that numerical value onto an array index, which can then be mapped onto our corresponding value.
The important attribute of a hash function is that it works by performing a computation on the input key. This means the speed at which it can convert the key into a hash is not dependent on how many different items we are storing!
For example, a lookup for our hashtable might look this:
“Jane” -> 41523563 -> 1 -> {Phone: 123-555-3344, Company: Big Oil}
Why Use a Hash Table over a Simple Array
It may not be immediately clear why this is so helpful.
Let’s say we’re building a blog platform and we have a user who just selected the article they need to read, and we now have to retrieve the blog article for them from a list of articles.
Now when we are trying to find the article from a list, it is unlikely that we already have its index for the array. The likely scenario is that we have some kind of identifying value for the article, such as the title of the article.
Let’s say we just had all the articles in an unsorted array. How could we find the article? We’d have to go through the array item by item until we find the article that matches our title. This is going to take an incredibly long time, especially if we have a large number of articles to check.
Now what if we instead had it in a hash table. Well now we could just plug our title into our hash function, and get back the correct index. From there we know exactly where in the array to look for our article information. Due to the hash function, this process is going to take the same amount of time no matter how many articles we have!
This makes Hash Tables one of the most important parts of building any piece of software. This dictionary format of storing and accessing data is most conducive to how applications actually work!
Collisions and Separate Chaining
Okay, now I actually lied a bit when I said that the time it takes to search a hash table doesn’t change based on the size of the table.
A key issue that needs to be considered when working with hash tables are what are known as collisions.
Now remember that a hash function is responsible for mapping an infinite amount of possibilities to a finite amount of indices:
As a result, when we are working with enough data points, it might be that our hash function maps two or more keys to the same array index. This is known as a collision.
So does this make Hash tables useless? Of course not, it is just something that needs to be accounted for. There are a number of methods to resolve collisions, but one of the most common is called separate chaining.
Separate Chaining is a process of storing LinkedLists(Lists of elements that point to each other instead of being indexed) of the values, instead of individual values at each index. Additionally, each entry in our hashtable now also needs to store the key.
Thus after passing our key to the hash function and looking up the correct index, we sift through the (hopefully short) linked list to find the entry that corresponds to our key.
While this can make the lookup time of a hash table increase slightly as the amount of key-value pairs grows, it is almost always going to be more efficient than a simple array.
Whats Next?
So how can you get comfortable working with hash tables? One of the best ways is to try and implement a hash table yourself in your favorite language, without the help of any data structure libraries.
Note that Hash Tables aren’t always the best way to organize collections of data. If our data has some natural ordering that makes lookup easy, or if it can be represented well in a tree, hash tables might be unnecessary or even non-optimal.
Being able to think rigorously about how to best represent your data is an important part of software engineering. Being well-versed in data structures and algorithms can help you navigate these kinds of questions.
What questions do you have about Hash Tables, let us know down below and we’ll try our best to answer them!
As important as hash tables are, your code isn’t the only thing you should be optimizing! At Codesphere, we’re building an all-in-one development platform to make developer workflows seamless. Try our platform out and let us know what you think!
This content originally appeared on DEV Community and was authored by Saji Wang
Saji Wang | Sciencx (2022-06-20T14:49:00+00:00) Understanding Hash Tables and Why They are Important. Retrieved from https://www.scien.cx/2022/06/20/understanding-hash-tables-and-why-they-are-important/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.