Data Structure Series: Hash Table

Introduction

We use forks to eat pasta, spoons to eat soup, and chopsticks to eat dumplings. Each silverwares have its advantages/disadvantages, hence working better than the other for the food that it interacts well with. Just like that, …


This content originally appeared on DEV Community and was authored by Chuck Choi

Introduction

data-structure-series-intro

We use forks to eat pasta, spoons to eat soup, and chopsticks to eat dumplings. Each silverwares have its advantages/disadvantages, hence working better than the other for the food that it interacts well with. Just like that, different data structures are better suited and perform better than the others based on the situations/use cases. They each have their pros and cons. Understanding these pros and cons can help you be a better programmer, as it will allow you to choose an appropriate data structure(s) based on the circumstances/goals you have, and it helps to drastically improve the performance of the algorithm being applied. I will be putting these blog series together on well known programming data structures in JavaScript, and link them all in one blog post in the future. Feel free to leave a comment if you have any questions!

Table of Contents

1. What Is Hash Table?
2. Hash Function
3. Implementation in JavaScript
4. Helper Functions
5. Handling Collisions
6. Big O
7. Helpful Resources

1. What is Hash Table?

Hash Table is a data structure of associative array that stores key/value paired data into buckets.

Considered being one of the most important data structures in computing, Hash Table is used in many areas of applications: password verifications, cryptography, compilers, and the list goes on. Due to its efficiency and speed in searching, insertion, and removal of data, it is a widely applicable and preferred data structure in many cases. A Hash Table is a data structure of associative array that stores data as a key/value pair into a bucket.

hash-table

How Hash Table works is that it takes a key and a value as inputs, then runs the key through a hash function that turns it into an index. This process is called hashing. The index is used to map the value of the input into the table's bucket. The hash function is irreversible, which makes it secure and reliable. It is possible for two different keys to result in the same index though, and this is called a collision. A collision can override the previous key's placeholder if not handled. There are various ways to handle hash collisions though -- Separate Chaining being one of them which often uses Linked Lists inside the bucket to store multiple data in the same index. We will get into it later in this post. But first, let's discuss how hash function works in a nutshell.

2. Hash Function

The hashing process in computing is like hashing a potato to make hashed brown.

potato

Hash functions, or hashing algorithms generate a fixed-length result from a given input. This process is called hashing. The fixed-length result is used in Hash Tables as an index to map the input into a hash bucket. The hashing process in computing is like hashing a potato to make hashed brown. You could think of potato as key input, grater as hash function, and shredded potato as index that is hashed out as a result of the hash function. Just like how you can't turn shredded potato back into a whole potato, hash functions are irreversible -- it's a one way algorithm.

Here is an example of a hash function in JavaScript:

function hash (key, size) {
    let hashedKey = 0;
    for (let i = 0; i < key.length; i++) {
        hashedKey += key.charCodeAt(i)
    }
    return hashedKey % size
}

Pseudocode:

  • This function accepts two arguments: string key to hash, and size of hash buckets
  • Initialize a variable named hashedKey as 0 to return at the end
  • Iterate each of the string's characters to sum up their character codes
  • After the iteration, use modulo operation (%) to find the remainder of the hashedKey / size and set it as new hashedKey
  • Return hashedKey

Explanation
In the above algorithm, we are initializing a variable hashedKey as 0. The value of this variable will change based on the string and be returned as a result of this function. We need a way to represent each of the letters into numbers, this way the matching string key that goes through the function will always convert to the same integer. JavaScript's string method charCodeAt() allows us to convert a string character into an integer representing the UTF-16 code unit.

With that being said, we are using a for loop to iterate every character of the key input. For each character being iterated, we are using the charCodeAt() method to convert the character and add it to hashedKey variable we defined at the beginning. Once we sum up all the integers that represents each characters, we execute an modulo operation % using the size of the bucket (second argument of the function) as a divisor. Modulo operation not only guarantees that the resulting integer is in range between 0 to the size of the bucket, but also makes the result irreversible.

This is a very simple and basic hash function that can be improved better. I recommend you to check out this blog post if you are interested in learning about different hash functions designed by mathematicians and computer scientists around the world. Now it is time to implement the hash table in JavaScript!

3. Implementation in JavaScript

class HashTable {
    constructor(size=53) {
        this.size = size
        this.buckets = new Array(size);
    }
    _hash (key) {
        let hashedKey = 0;
        for (let i = 0; i < key.length; i++) {
            hashedKey += key.charCodeAt(i)
        }
        return hashedKey % this.size
    }
}

let table = new HashTable()

console.log(table) // HashTable {size: 53, buckets: Array(53)}

The above Hash Table class has two properties:

  1. size: the number representing the size of the buckets, and we are using prime number 53 as a default value (choosing a prime number for the hash table's size reduces the chances of collisions)
  2. buckets: buckets are the placeholders for each data (key/value pair), and we are using Array class to create an empty array with size of 53 indices

And we have the _hash method similar to what we created earlier, but only difference is that it is not taking in the size as second argument since we are using the size of the object created from the Hash Table class. With this, we can create an object with buckets array that contains default size of 53 indices or a specified size.

Let's go ahead and add some methods to this Hash Table!

4. Helper Functions

set()

// adds key-value pair into hash table's bucket
set(key, value) {
    let index = this._hash(key)
    this.buckets[index] = [key, value];
}

Pseudocode:

  • Accepts a key and a value
  • Hashes the key
  • Stores the key-value pair in the hash bucket

get()

// retrieves the value of the key from its respective bucket
get(key) {
    let index = this._hash(key)
    return this.buckets[index][1] // returns value of the key
}

Pseudocode:

  • Accepts a key
  • Hashes the key
  • Retrieves the key-value pair in the hash bucket

remove()

// removes the key-value pair from the hash table's bucket
remove(key) {
    let index = this._hash(key)
    let deleted = this.buckets[index]
    delete this.buckets[index]
    return deleted
}

Pseudocode:

  • Accepts a key
  • Hashes the key
  • Retrieves the key-value pair in the hash bucket and stores it
  • Delete the key-value pair in the hash bucket (use delete operator to empty the element, doesn't affect the array size)
  • Returns the stored key-value pair

All the helper functions in this data structure is fairly simple -- they all utilize the hash function we defined earlier to retrieve the index that is associated with the key passed, and access the array's element in that index. There's a problem with these methods though. What happens if the hash function returns the same index for two different inputs? Our hash function is fairly simple so this will happen for sure. If so, it will override the bucket that is already occupied or get method will retrieve a wrong value that we aren't looking for. How can we improve these helper methods to handle the collisions?

5. Handling Collisions

collision

As we discussed earlier, it is possible for a hash function to produce collisions: returning the same index for multiple different keys. Unfortunately, even under the best of circumstances, collisions are nearly unavoidable. Any hash function with more inputs than outputs will necessarily have such collisions; the harder they are to find, the more secure the hash function is.

There are multiple ways to handle collisions though, and the two common techniques are Separate Chaining and Linear Probing.

Separate Chaining: If there is only one hash code pointing to an index of array then the value is directly stored in that index. If hash code of second value also points to the same index though, then we replace that index value with a linked list or array and all values pointing to that index are stored in the list. Same logic is applied while retrieving the values, we will have to iterate all the elements inside a bucket if the bucket stores multiple key-value pairs. In short, separate chaining creates a list-like object inside a bucket to store multiple data with collisions.

Linear Probing: Linear Probing technique works on the concept of keep incrementing the hashed index until you find an empty bucket. Thus, Linear Probing takes less space than Separate Chaining , and perform significantly faster than Separate Chaining (since we don't have to loop through lists inside buckets).

Although Separate Chaining is significantly less efficient than Linear Probing, it is easier to implement. Here's how we can improve the helper methods we defined by utilizing Separate Chaining (we will use Array instead of Linked List for simplicity):

set()

// adds key-value pair into hash table's bucket
set(key, value) {
    let index = this._hash(key)
    if(!this.buckets[index]) {
        this.buckets[index] = [];
    }
    this.buckets[index].push([key, value]);
}

Pseudocode:

  • Accepts a key and a value
  • Hashes the key
  • If the hash bucket is empty, set it as an empty array
  • Push the key-value pair in the array inside the bucket

get()

// retrieves the value of the key from its respective bucket
get(key) {
    let index = this._hash(key)
    if(this.buckets[index]) {
        for(let i = 0; i < this.buckets[index].length; i++) {
            if(this.buckets[index][i][0] === key) {
                return this.buckets[index][i][1]
            }
        }
    }
    return undefined
}

Pseudocode:

  • Accepts a key
  • Hashes the key
  • If the bucket is truthy, iterate each key-value pair inside the bucket
  • If the key matches the pair, return the value of the pair
  • return undefined if the bucket is empty

remove()

// removes the key-value pair from the hash table's bucket
remove(key) {
    let index = this._hash(key)
    if(this.buckets[index]) {
        for(let i = 0; i < this.buckets[index].length; i++) {
            if(this.buckets[index][i][0] === key) {
                return this.buckets[index].splice(i, 1)
            }
        }
    }
}

Pseudocode:

  • Accepts a key
  • Hashes the key
  • If the bucket is truthy, iterate each key-value pair inside the bucket
  • If the key matches the pair, remove the pair and return it

6. Big O

big-o

  • Space Complexity:

    • O(n)
    • Space complexity of this data structure is linear: as the size of the buckets increase, so does the space
  • Set/Get/Remove:

    • Average: O(1) Time Complexity
    • Worst Case: O(n) Time Complexity
    • All these helper methods use hash function to look up the indices. Hash function takes constant time, but the time complexity can get linear with buckets with multiple elements due to collisions. More items will mean more time to look inside the bucket, hence taking linear time (O(n))

7. Helpful Resources

Online Course (Udemy Course)
Check out this Udemy course named JavaScript Algorithms and Data Structures Masterclass! It is created by Colt Steele, and I referenced his code for the data structure implementation part of this blog post. Personally, I didn't know where to start with algorithms and data structures especially coming from a non-tech background. This course is very well structured for beginners to build a foundation on these topics.

Visual Animation (VisuAlgo)
Data structures can be difficult to comprehend for some people just by looking at the code/text. The instructor in the course above uses a website named VisuAlgo that has visual representation of algorithms and data structures through animation.

Data Structure Cheat Sheet (Interview Cake)
Also, here's a really well-summarized cheat sheet/visualizations on data structures.

CS50's Hash Tables Lesson (YouTube Video)
I came across this YouTube video thanks to one of the DEV Community users Alex @tinydev ! It's part of Harvard's CS50 course, and they do a great job explaining Hash Tables.


This content originally appeared on DEV Community and was authored by Chuck Choi


Print Share Comment Cite Upload Translate Updates
APA

Chuck Choi | Sciencx (2021-08-16T22:28:31+00:00) Data Structure Series: Hash Table. Retrieved from https://www.scien.cx/2021/08/16/data-structure-series-hash-table/

MLA
" » Data Structure Series: Hash Table." Chuck Choi | Sciencx - Monday August 16, 2021, https://www.scien.cx/2021/08/16/data-structure-series-hash-table/
HARVARD
Chuck Choi | Sciencx Monday August 16, 2021 » Data Structure Series: Hash Table., viewed ,<https://www.scien.cx/2021/08/16/data-structure-series-hash-table/>
VANCOUVER
Chuck Choi | Sciencx - » Data Structure Series: Hash Table. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2021/08/16/data-structure-series-hash-table/
CHICAGO
" » Data Structure Series: Hash Table." Chuck Choi | Sciencx - Accessed . https://www.scien.cx/2021/08/16/data-structure-series-hash-table/
IEEE
" » Data Structure Series: Hash Table." Chuck Choi | Sciencx [Online]. Available: https://www.scien.cx/2021/08/16/data-structure-series-hash-table/. [Accessed: ]
rf:citation
» Data Structure Series: Hash Table | Chuck Choi | Sciencx | https://www.scien.cx/2021/08/16/data-structure-series-hash-table/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.