This content originally appeared on Level Up Coding - Medium and was authored by Joel Joseph
Natural Language Processing (NLP) is one of the main applications of deep learning. With the help of deep learning, we give machines the ability to make sense of natural language. Using deep learning, we can solve problems such as text translation, text auto-completion, spell checking, and sentiment analysis.
In this article, I give a tutorial on how to build a mail spam classifier using deep learning techniques. We will be using the TensorFlow and Keras deep learning frameworks. Here I only provide a detailed explanation of the ideas; to view the code, kindly refer to my GitHub repo.
Why Deep Learning?
Before we answer this question, let’s ask: why machine learning? You could attack this problem with traditional software engineering techniques. You collect some data containing examples of spam and not-spam (ham) mails. You observe a few things: spam mails contain a lot of unnecessary exclamation marks (“!!!!”) and words like “off, sale, %, urgent, limited-time deal”, etc. Now you can write an algorithm that keeps a list of the words and patterns you found to be common in spam mails. Whenever a new mail contains any of the words from your list, you mark the mail as spam. The problem is that this is not a permanent solution. Say the spammers work out how your spam classifier operates, namely that you are just looking for certain words. They modify their mails to avoid those words, and your algorithm fails. You have to constantly update your system manually.
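The hand-written approach described above can be sketched in a few lines of Python. This is only an illustration, not code from the repo: the function name `is_spam_rule_based` and the exact word and phrase lists are assumptions, built from the examples mentioned in the text.

```python
import re

# Hypothetical rule lists, taken from the examples in the text.
SPAM_WORDS = {"off", "sale", "urgent"}
SPAM_PHRASES = ("limited-time deal",)


def is_spam_rule_based(mail: str) -> bool:
    """Flag a mail as spam if it matches any hand-written rule."""
    text = mail.lower()
    # Rule 1: a run of three or more exclamation marks.
    if re.search(r"!{3,}", text):
        return True
    # Rule 2: a percent sign (e.g. "50% off").
    if "%" in text:
        return True
    # Rule 3: any single word from the list.
    words = set(re.findall(r"[a-z]+", text))
    if words & SPAM_WORDS:
        return True
    # Rule 4: any multi-word phrase from the list.
    return any(phrase in text for phrase in SPAM_PHRASES)


print(is_spam_rule_based("URGENT: 50% off sale, buy now!!!!"))     # True
print(is_spam_rule_based("Are we still meeting at noon?"))          # False
# A spammer who knows the rules only needs to respell the words:
print(is_spam_rule_based("Huge s-a-l-e, everything 0ff today"))     # False
```

The last call shows the weakness: a small change in spelling slips past every rule, and the only fix is to add yet another rule by hand.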
With machine learning, we instead say, “Here are some mails; some are spam and some are not. Please come up with a way to classify them.” Your machine then goes through these emails and learns correlations that separate the two classes. The advantage is that if the spammers change the structure of their mails, you do not rewrite rules by hand: you retrain the model on fresh examples and it picks up the new patterns, because you never programmed the rules explicitly as you did in the first case. Hence the common definition: “Machine learning involves computers discovering how they can perform tasks without being explicitly programmed to do so.”
Now deep learning is a subset of machine learning algorithms. Deep learning is great when it comes to applying machine learning to unstructured data like images, text, audio, etc. Thus to build our spam classifier we use deep learning techniques.
A few additional readings
- TensorFlow: TensorFlow is a deep learning framework developed by Google. I use it to build this classifier; it makes building deep learning models much easier.
- Keras: Keras is a higher-level API built on top of TensorFlow. Initially, Keras and TensorFlow were separate libraries. Since TensorFlow 2.0, Keras is integrated into TensorFlow itself.
- Colab: Google Colab is a free cloud-based environment that gives you access to a free GPU to train your deep learning models. It comes with Python and a Jupyter notebook UI.
- LSTMs: Long short-term memory (LSTM) is a special type of neural network architecture. It builds on a base structure called the recurrent neural network (RNN). If you are not familiar with LSTMs and RNNs, refer to this video.
Solving the problem
“You can apply machine learning to literally anything as long as you can convert it into numbers and program it to find patterns.” This was a comment that I read on a YouTube video. Let’s break it down.
So you can apply machine learning as long as you can convert your inputs to numbers. When you use machine learning or deep learning for computer vision problems, you convert your images into numbers. Computers already store images as matrices of pixel values, so there is not much to worry about there. Text is different: computers store text as characters, not as numbers that a model can work with. You might argue about ASCII values and so on, but that’s not what I mean. What I mean is that your program should see text as numbers and not as native text.
The second part of the statement is “program it to find patterns”. Now that you have converted your data (input/output) to numbers, you need to program it to find patterns, and that’s where deep learning comes into the picture. Machine learning is all about finding correlations and patterns in the data. With the help of these correlations, your program builds an understanding of how the data behaves, so when you give it new data it can make a prediction based on the correlations it learned earlier.
Now that we understand the problem and how to approach it, we can try solving it.
1. Learning Word Embeddings
The very first step is to convert the native text into numbers. This can be done by assigning each distinct word an integer ID (some tokenizers order these IDs by word frequency). For instance, suppose you have a sentence like “I love dogs!”. You can convert this to “1 2 3”, with 1 = I, 2 = love, and 3 = dogs. Now, say you have another sentence like “I love cats!”. You can convert this to “1 2 4”, reusing 1 = I and 2 = love from the example above and adding 4 = cats. Whenever you find a new word, you assign it a new value and move on. This is called tokenizing.
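To make the tokenizing step concrete, here is a minimal pure-Python sketch that reproduces the “1 2 3” and “1 2 4” example. It is not the code from the repo; the helper names `build_vocab` and `encode`, and the choice of 0 as the ID for unknown words, are assumptions made for illustration.

```python
import re


def split_words(sentence):
    """Lowercase a sentence and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())


def build_vocab(sentences):
    """Assign the next free integer ID (starting at 1) to each new word."""
    vocab = {}
    for sentence in sentences:
        for word in split_words(sentence):
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(sentence, vocab, unknown_id=0):
    """Replace each word with its ID; unseen words map to unknown_id (assumed 0)."""
    return [vocab.get(word, unknown_id) for word in split_words(sentence)]


sentences = ["I love dogs!", "I love cats!"]
vocab = build_vocab(sentences)
encoded = [encode(s, vocab) for s in sentences]

print(vocab)    # {'i': 1, 'love': 2, 'dogs': 3, 'cats': 4}
print(encoded)  # [[1, 2, 3], [1, 2, 4]]
```

In practice you would likely use a library tokenizer rather than writing your own, but the idea is the same: a lookup table from words to integers.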
Just converting words to numbers is not enough; your neural network must also capture the meaning of words and the relationships between them. For instance, words like “cats” and “dogs” should be closely related, as they point to similar things, i.e., pets, four-legged animals, etc. Words like “human” and “dogs” shouldn’t be as closely related.
To learn relationships like this, we learn something called word embeddings. An embedding is a vector of numbers for each word, and the vectors for the whole vocabulary are stored as the rows of a matrix. Refer to the video for a detailed explanation.
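One common way to measure how closely two word vectors are related is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings are learned during training and usually have many more dimensions, so the numbers here are assumptions, not learned values.

```python
import math

# Hypothetical, hand-picked embeddings (one row per word).
embeddings = {
    "cats":  [0.9, 0.8, 0.1],
    "dogs":  [0.8, 0.9, 0.2],
    "human": [0.1, 0.3, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cats_dogs = cosine_similarity(embeddings["cats"], embeddings["dogs"])
human_dogs = cosine_similarity(embeddings["human"], embeddings["dogs"])

print(f"cats vs dogs:  {cats_dogs:.2f}")
print(f"human vs dogs: {human_dogs:.2f}")
```

With these toy vectors, “cats” and “dogs” score much higher than “human” and “dogs”, which is the kind of structure a trained embedding matrix is expected to capture.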
There are pre-trained word embeddings that you can use; refer to the Keras docs for details.
Keras has its own embedding layer. Refer to the embedding layer docs.
So The Idea So Far
- The very first thing that we did was load the dataset.
- We then converted the text to numbers.
- Then we generated a matrix that makes sense of these numbers.
- Now we are ready to feed this newly generated matrix to our neural network.
To view the code kindly refer to my GitHub repo.
2. Building the LSTM network
The idea of a simple RNN is to take sequential data and process it one time step at a time. The problem is that a simple RNN suffers from the vanishing gradient problem and does not perform well on long sentences with long-range dependencies.
Consider the sentences, “Nothing’s tech is great” and “Nothing can be done now”. In the first sentence, I’m referring to the Nothing brand and in the second sentence, I’m talking about a situation.
In either case, what the word “nothing” means depends on the context of the sentence, and that context is essential. This information must be preserved and passed along to the later time steps of the network, which the simple RNN model struggles to do.
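A rough way to see why a simple RNN loses this information is to look at a toy case. The sketch below assumes a scalar, linear RNN of the form h_t = w * h_(t-1) + x_t, where the gradient flowing back through each time step is multiplied by w. Real RNNs use weight matrices and nonlinearities, so this is a simplification, and the weight 0.5 is a hypothetical value.

```python
def gradient_through_time(recurrent_weight, steps):
    """Factor by which a gradient is scaled after flowing back `steps` time steps
    in the toy scalar RNN h_t = w * h_(t-1) + x_t."""
    factor = 1.0
    for _ in range(steps):
        factor *= recurrent_weight
    return factor


# With a weight below 1, the signal from early words fades quickly.
for steps in (1, 10, 50):
    print(steps, gradient_through_time(0.5, steps))
```

After 50 steps the factor is vanishingly small, so the first words of a long sentence barely influence the weight updates.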
Thus, we use an LSTM, which computes gate values that control what information about the context of the sentence is kept, updated, or forgotten.
We can use the LSTM layer that TensorFlow provides; refer to the docs here.
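As a rough idea of how the pieces fit together, here is a minimal sketch of an embedding layer followed by an LSTM layer and a single sigmoid output in Keras. This is not the model from the repo: the vocabulary size, embedding dimension, number of LSTM units, and the two tokenized mails are all hypothetical values chosen for illustration, and the model is untrained.

```python
import numpy as np
import tensorflow as tf

# Hypothetical hyperparameters, chosen only for this sketch.
VOCAB_SIZE = 1000    # number of distinct token IDs, including 0
EMBEDDING_DIM = 16   # length of each word vector
LSTM_UNITS = 32      # size of the LSTM hidden state

model = tf.keras.Sequential([
    # Maps each integer token ID to a learned vector.
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBEDDING_DIM),
    # Reads the sequence of vectors and returns its final hidden state.
    tf.keras.layers.LSTM(LSTM_UNITS),
    # One output between 0 and 1: the predicted probability of spam.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Two made-up, already tokenized mails, padded with 0 to the same length.
batch = np.array([[1, 2, 3, 0, 0],
                  [1, 2, 4, 5, 0]])
probabilities = model.predict(batch, verbose=0)
print(probabilities.shape)  # (2, 1)
```

Training would then be a call to `model.fit` on the tokenized mails and their spam/ham labels; until then the predicted probabilities are meaningless.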
To view the code kindly refer to my GitHub repo.
Finally, we have come to the end of the tutorial. I hope you learned something. Thank you for reading!
Build a Mail Spam Classifier Using Tensorflow and Keras was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
Joel Joseph | Sciencx (2022-04-06T13:20:13+00:00) Build a Mail Spam Classifier Using Tensorflow and Keras. Retrieved from https://www.scien.cx/2022/04/06/build-a-mail-spam-classifier-using-tensorflow-and-keras/