This content originally appeared on Level Up Coding - Medium and was authored by Loris Occhipinti
An amusing fact about being a software engineer is that there is always something new to learn. Although I am familiar with some common data structures like Lists, Stacks and Trees, I was guilty about not knowing about Tries.
As the name suggests, Tries are essentially a type of Tree. They are optimized for searching and retrieving keys organized in hierarchical structures: each node would store a key value and have a varying - although limited - number of children that represent valid successors in the hierarchy. The type of key values and their respective structures can differ, but a common use case is to store a dictionary of words:
data:image/s3,"s3://crabby-images/10db8/10db8dcb3a717e840f20ce2ee2b8dbb414469ac1" alt=""
Each node is an alphabet letter, and by following any branch it is possible to find all interrelated words in a given language. A possible application would be searching all the words in a dictionary that starts with a common prefix. Why are Tries essential for this task? Well, it becomes quickly self-evident that searching words with shared prefixes is a daunting task when using alternative data structures. For instance, although Hashmaps allow for value retrieval in constant time, there is no way to find all the hashes of related keys. On the other hand, it would be horribly slow to search in a List. So what could we do instead? The superpower of Tries is that searching related values can be done in logarithmic time — sometimes in nearly constant time.
Programming with Dinosaurs
Storing language dictionaries is not the only possible use case for Tries. In fact, I can think of another class of hierarchically organized symbols: taxonomy ranks. To put it simply for non-experts like me, taxonomy ranks are a hierarchical system used by botanists, zoologists and paleontologists to classify any living being. By using this system, it is possible to build a bush-like structure that shows the relationships between different species. Of course, this gives me a perfect excuse to introduce Dinosaurs into the mix while studying Tries! It sounds like a win-win to me.
data:image/s3,"s3://crabby-images/3f067/3f06737775f423ef67f9ef38dc4717db7aabda0a" alt=""
Data structure and operations
Let’s implement our Taxonomy Trie. Any given Rank (aka a Trie node) will have a set of children Ranks that represent the smaller rungs in the Dino taxonomy, and will feature an isSpecies marker that will tell us if we reached a leaf of the tree of life:
Of course, we need an entry point for our Taxonomy structure, providing a root Rank element and some basic operations to create hierarchies and search for values:
Checking the isSpecies flag will allow for searching for exact species matches; in alternative, we can just verify if the given rank is present in the structure.
DinoTries in action
Consider this group of lovely Sauropodomorpha (long-neck dinosaurs related to the Brontosaurus): aardonyx celestae, ammosaurus major, ankylosaurus polyzelous, melanosaurus readi, riojasaurus incertus
Although a quick search in a paleontology reference book would suggest that these dinos are interrelated, it is still difficult to grasp their exact position in the taxonomy. However, a Trie data structure quickly reveals the underlying relations:
Dinosauria
└── Saurischia
└── Sauropodomorpha
└── Prosauropoda
└── Anchisauria
├── Anchisauridae
│ ├── ankylosaurus polyzelous
│ └── ammosaurus major
├── aardonyx celestae
└── Melanorosauridae
├── melanorosaurus readi
└── riojasaurus incertus
An algorithm expedition
Imagine being a paleontologist on the verge of a breakthrough discovery. You just examined a new fossil that changes everything you used to believe about Anchisauria. You only need to perform a quick check about the Anchisauridae family representatives: after all, you don’t want to meet the disdain of your fellow paleontologist for a gross mistake. You quickly insert the query into your field terminal, hoping that the underlying algorithm won’t take ages before giving a satisfactory answer.
For a Taxonomy Trie like ours, searching for all Anchisauridae or any other clade would be implemented like this:
Your fame as a paleontologist is secured as you rapidly find out about the Anchisauridae family members:
Anchisauridae
├── ankylosaurus polyzelous
└── ammosaurus major
References
- DinoTrie repository
- OpenGeology: Free Open Educational Resources for the Geosciences
- kjanjua26/jurassic-park: a dinosaurs dataset
Learning Tries with Dinosaurs was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
This content originally appeared on Level Up Coding - Medium and was authored by Loris Occhipinti
data:image/s3,"s3://crabby-images/02712/02712ed05be9b9b1bd4a40eaf998d4769e8409c0" alt=""
Loris Occhipinti | Sciencx (2022-04-24T23:29:32+00:00) Learning Tries with Dinosaurs. Retrieved from https://www.scien.cx/2022/04/24/learning-tries-with-dinosaurs/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.