This content originally appeared on DEV Community and was authored by Alok Kumar
In this article, we are going to talk about regular expressions, commonly called "regex", in which "Reg" stands for regular and "ex" stands for expressions. I'll use the word regex throughout the article.
So let's first talk about what regex are.
Regex are used to match certain patterns inside a string. Or, in simpler words, with the help of regex, we can search through a text for a specific combination of characters.
Let's understand it with the help of a simple mental picture -
Imagine a line of text as the string and a few test cases as regex. With the help of a regex, we search through the string for a specific combination of characters. If it is found, the result is true, and if not, the result is false.
Now let's understand it with the help of a real example -
Regex- /name/
Text- My name is Alok.
Output- true
We’ll talk about regex structure next, but first let's see what is happening here. The regex /name/ is searched for in the text "My name is Alok.", and since it matches, the result is true.
Apart from this, there is a lot more that we can do with regex, as we'll see later in this article, so stick with it till the end.
Regex Structure
Let's talk about the structure of regex( literal syntax ) -
/pattern/
A pattern is written between two forward slashes ( / ), which is to be matched with the text.
A pattern can be a single character, a word, or tokens using special characters that we’ll see next.
Examples-
A single character
Regex- /n/
Text- My name is Alok.
Output- true
A word
Regex- /Alok/
Text- My name is Alok.
Output- true
Position Anchors
With the help of anchors, we can match the beginning or the end of a text. Some common anchors are ( ^ )caret and ( $ )dollar -
^ matches at the start of the string
Regex- /^My/
Text- My name is Alok.
Output- true ( matches with My at the beginning )
$ matches at the end of the string
Regex- /Alok$/
Text- My name is Alok.
Output- false ( doesn't match, as there is a dot (.) after Alok at the end )
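The two anchor examples above can be tried out with a short snippet. This is only a minimal sketch, assuming a Node.js or browser console; the variable names are illustrative and it uses the test() method that is covered later in this article.

```javascript
const anchorText = "My name is Alok.";

// ^ anchors the pattern to the start of the string.
const startsWithMy = /^My/.test(anchorText);

// $ anchors the pattern to the end of the string.
// The text ends with ".", so "Alok" is not the last thing in it.
const endsWithAlok = /Alok$/.test(anchorText);

console.log(startsWithMy); // true
console.log(endsWithAlok); // false
```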
We'll see some more regex after talking about flags and escaping.
Flags
Flags are used to change how the search behaves.
It can be written as - /pattern/flags
Example - /my/g
Flags can be used separately or together in any order.
There are many flags used in regex, but the most common are -
Flag | Name | Description |
---|---|---|
g | global | finds all matches in the string instead of stopping after the first match |
Example -
Regex- /n/
Text- My name is Aman.
Output- true ( it matches only the first occurrence )
Regex- /n/g
Text- My name is Aman.
Output- true ( now it matches all the n’s in the string )
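Since a true/false result looks the same with or without the g flag, the difference is easier to see by listing the matches. The sketch below assumes the String match() method described later in this article; the variable names are illustrative.

```javascript
const gText = "My name is Aman.";

// Without g, match() stops at the first match and reports where it was found.
const firstOnly = gText.match(/n/);
console.log(firstOnly[0], firstOnly.index); // n 3

// With g, match() returns every match as a plain array.
const allMatches = gText.match(/n/g);
console.log(allMatches); // [ 'n', 'n' ]
```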
Flag | Name | Description |
---|---|---|
i | insensitive | ignores case while matching |
Example -
Regex- /Alok/g
Text- alok Alok
Output- true ( it is case sensitive by default, so only the second word, Alok, matches )
Regex- /Alok/gi ( Here we are using both g and i flags )
Text- alok Alok
Output- true ( now it is case insensitive, so both alok and Alok match )
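To see what the i flag changes, it helps to list the matches rather than just the true/false result. A small sketch, again assuming match() and illustrative variable names:

```javascript
const iText = "alok Alok";

// Case sensitive by default: only the capitalised word matches.
const caseSensitive = iText.match(/Alok/g);

// With the i flag, both spellings match.
const caseInsensitive = iText.match(/Alok/gi);

console.log(caseSensitive);   // [ 'Alok' ]
console.log(caseInsensitive); // [ 'alok', 'Alok' ]
```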
Flag | Name | Description |
---|---|---|
m | multiline | ^ and $ match the start/end of every line |
Example -
Regex- /^My/g
Text-
My name is Alok.
My name is Aman.
Output- true ( only matches in the first line )
Regex- /^My/gm
Text-
My name is Alok.
My name is Aman.
Output- true ( now matches in all lines )
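The multiline behaviour can be checked with a two-line string. This is a sketch under the assumption that the lines are separated by a newline character (\n); the variable names are illustrative.

```javascript
const mText = "My name is Alok.\nMy name is Aman.";

// Without m, ^ only matches at the very start of the whole string.
const withoutM = mText.match(/^My/g);

// With m, ^ also matches right after every line break.
const withM = mText.match(/^My/gm);

console.log(withoutM.length); // 1
console.log(withM.length);    // 2
```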
Escaping
Escaping is used to treat special characters as text.
For example, special characters like ^ or $ have a special meaning in regex, i.e. matching at the beginning or end of the string. But what if we want to match a caret ( ^ ) symbol as text, not as a special character?
For this we use a backslash ( \ ).
We have to put a backslash ( \ ) before any special symbol to treat it as text. Let's understand it with an example -
Example ( if we want to match ^Alok ) -
Regex- /^Alok/g
Text- alok ^Alok
Output- false ( as it doesn't treat ^ as text )
Regex- /\^Alok/g
Text- alok ^Alok
Output- true ( now it treats ^ as text )
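The same escaping example as a runnable sketch (illustrative variable names; the g flag is left out here because it is not needed for a single true/false check):

```javascript
const escText = "alok ^Alok";

// Unescaped: ^ is an anchor, and the text starts with "alok", not "Alok".
const unescaped = /^Alok/.test(escText);

// Escaped: \^ matches a literal caret character.
const escaped = /\^Alok/.test(escText);
const escapedMatch = escText.match(/\^Alok/)[0];

console.log(unescaped);    // false
console.log(escaped);      // true
console.log(escapedMatch); // ^Alok
```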
Some common and important Regex -
Character Classes -
Character classes can be defined using bracket [ ] notation. It matches characters inside the [ ]. Example - /[ab]/ - it matches either a or b, and if used with the g flag, it matches all the a and b in the text.
Examples -
Regex- /[abc]/ ( using /[]/ matches any character inside [ ] )
Text- apricot boy ccd
Output- true ( Matches a, b or c character )
Regex- /[abc]/g ( using /[]/g matches all characters inside [ ] )
Text- apricot boy ccd
Output- true ( Matches a, b and c character )
Using a caret (^) symbol matches any character except the characters inside [ ]. And similarly, when used with the g flag, it matches all except characters inside [ ].
Examples -
Regex- /[^abc]/
Text- apple boy cat
Output- true ( Matches any character except a, b and c character )
Regex- /[^abc]/g ( using /[^]/g matches all except characters inside [ ] )
Text- apple boy cat
Output- true ( Matches every character, including spaces, except a, b and c )
Using a hyphen ( - ) symbol, we can define a range of characters to be matched.
Example -
Regex- /[a-z]/g ( using /[a-z]/g matches all characters in range of a-z )
Text- Alok
Output- true
Similarly, using a caret (^) symbol, we can match any character that is not in the range defined in the character class.
Example -
Regex- /[^a-z]/g ( using /[^a-z]/g matches all except in range of a-z )
Text- Alok
Output- true
We can also combine different ranges together in a character class.
Example -
Regex- /[a-zA-Z]/g ( /[a-zA-Z]/g matches all in range of a-z and A-Z )
Text- Alok
Output- true
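The character class examples above all report true, so the sketch below lists the actual matched characters instead. It assumes the match() method covered later in this article; the variable names are illustrative.

```javascript
// Every a, b or c in the text.
const inClass = "apricot boy ccd".match(/[abc]/g);
console.log(inClass); // [ 'a', 'c', 'b', 'c', 'c' ]

// Everything except a, b and c (spaces included), joined back into one string.
const notInClass = "apple boy cat".match(/[^abc]/g).join("");
console.log(notInClass); // pple oy t

// Ranges: lowercase only, everything but lowercase, and both cases combined.
const lowerOnly = "Alok".match(/[a-z]/g);
const notLower = "Alok".match(/[^a-z]/g);
const anyLetter = "Alok".match(/[a-zA-Z]/g);
console.log(lowerOnly); // [ 'l', 'o', 'k' ]
console.log(notLower);  // [ 'A' ]
console.log(anyLetter); // [ 'A', 'l', 'o', 'k' ]
```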
Alternation -
Alternation can be defined using the pipe ( | ) symbol, and it works like an OR operator. Example - a|b - it matches a or b. If the g flag is not used, only the first occurrence of either a or b is matched, and any later occurrences are ignored.
Examples -
Regex- /a|b/ ( /a|b/ matches first occurrence of a or b )
Text- boy apple
Output- true ( it matches the b in boy first; without the g flag, the a in apple is never reached )
Regex- /a|b/g ( /a|b/g matches a and b both )
Text- apple boy ape
Output- true
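A short sketch of the two cases above, showing which characters are actually matched (illustrative variable names, using match()):

```javascript
// Without g: the search stops at the first alternative found, the b in "boy".
const firstAlt = "boy apple".match(/a|b/)[0];
console.log(firstAlt); // b

// With g: every a and every b is returned, in the order they appear.
const allAlt = "apple boy ape".match(/a|b/g);
console.log(allAlt); // [ 'a', 'b', 'a' ]
```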
Predefined character classes -
Many of the commonly used character classes come with predefined shortcuts. Let's see some of them -
.
Regex- /b.n/g ( . matches any single character except a line break )
Text- ban bin
Output- true
\s
Regex- /\s/g ( \s matches any whitespace character )
Text- apple boy cat
Output- true
Similarly, \S matches all non-whitespace characters.
\d
Regex- /\d/g ( \d matches any digit, equivalent to [0-9] )
Text- Ak47
Output- true
Similarly, \D matches all non-digit characters, equivalent to [^0-9].
\w
Regex- /\w/g ( \w matches any word character, equivalent to [a-zA-Z0-9_] )
Text- Ak-47
Output- true
Similarly, \W matches all non-word characters, equivalent to [^a-zA-Z0-9_].
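The shortcuts above can be compared side by side with their negated versions. A minimal sketch with illustrative variable names:

```javascript
// . matches any single character between b and n.
const dotMatches = "ban bin".match(/b.n/g);
console.log(dotMatches); // [ 'ban', 'bin' ]

// \s matches whitespace: the two spaces between the three words.
const spaceCount = "apple boy cat".match(/\s/g).length;
console.log(spaceCount); // 2

// \d matches digits, \D matches everything else.
const digits = "Ak47".match(/\d/g);
const nonDigits = "Ak47".match(/\D/g);
console.log(digits, nonDigits); // [ '4', '7' ] [ 'A', 'k' ]

// \w matches letters, digits and underscore; \W matches the rest (here the hyphen).
const wordChars = "Ak-47".match(/\w/g);
const nonWordChars = "Ak-47".match(/\W/g);
console.log(wordChars, nonWordChars); // [ 'A', 'k', '4', '7' ] [ '-' ]
```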
Repetition Quantifiers -
Using repetition quantifiers, we can specify how many times the previous token (a character, a character class, or a group) should match. Let's see some of them -
Question Mark ( ? ) -
( ? ) matches the previous token zero or one time.
Example -
Regex- /ba?/g ( /ba?/g matches zero or one of a with b )
Text- a b ba baa baaa
Output- true
Asterisk symbol ( * ) -
( * ) matches the previous token zero or more times.
Example -
Regex- /ba*/g ( /ba*/g matches zero or more of a with b )
Text- a b ba baa baaa
Output- true
Plus symbol ( + ) -
( + ) matches the previous token one or more times.
Example -
Regex- /ba+/g ( /ba+/g matches one or more of a with b )
Text- a b ba baa baaa
Output- true
Curly brackets ( { } ) -
{ } can be used in three ways. Let's see them one by one -
{3} matches the previous token exactly 3 times
Example -
Regex- /a{3}/g ( /a{3}/g matches exactly 3 of a )
Text- a aa aaa aaaa
Output- true
{3,} matches the previous token 3 or more times
Example -
Regex- /a{3,}/g ( /a{3,}/g matches 3 or more of a )
Text- a aa aaa aaaa
Output- true
{2,4} matches the previous token between 2 and 4 times
Example -
Regex- /a{2,4}/g ( /a{2,4}/g matches between 2 and 4 of a )
Text- a aa aaa aaaa aaaaa
Output- true
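Because every quantifier example above reports true, the sketch below prints the matched substrings so the differences are visible. The texts and patterns are the ones from the examples; the variable names are illustrative.

```javascript
const bText = "a b ba baa baaa";

console.log(bText.match(/ba?/g)); // [ 'b', 'ba', 'ba', 'ba' ]
console.log(bText.match(/ba*/g)); // [ 'b', 'ba', 'baa', 'baaa' ]
console.log(bText.match(/ba+/g)); // [ 'ba', 'baa', 'baaa' ]

const exactlyThree = "a aa aaa aaaa".match(/a{3}/g);
const threeOrMore = "a aa aaa aaaa".match(/a{3,}/g);
const twoToFour = "a aa aaa aaaa aaaaa".match(/a{2,4}/g);

console.log(exactlyThree); // [ 'aaa', 'aaa' ]
console.log(threeOrMore);  // [ 'aaa', 'aaaa' ]
console.log(twoToFour);    // [ 'aa', 'aaa', 'aaaa', 'aaaa' ]
```

Note that quantifiers are greedy by default: in the last example, the five a's yield one match of four, and the single leftover a is too short to match.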
Groups
Groups can be used to treat more than one character as a single unit.
To create a group, we use parentheses ( )
For example, when we use /ba+/, the + searches for one or more of a with b as we have seen in the above example, but if we use group, i.e. /(ba)+/ now the + will search for one or more ba together.
Example -
Regex- /t|The/g
Text- The dog jumps over the fence.
Output- true
Regex- /(t|T)he/g
Text- The dog jumps over the fence.
Output- true
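As you can see, without the group the choice is between t and The, so the first regex matches The and a lone t. With the group, he follows either t or T, so the second regex matches both The and the.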
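Here is a small sketch of both group ideas from this section: grouping an alternation, and applying a quantifier to a group. The second text, "ba baba baa", is a made-up sample that is not from the examples above.

```javascript
const sentence = "The dog jumps over the fence.";

// Without a group, the alternatives are "t" and "The".
const ungrouped = sentence.match(/t|The/g);
console.log(ungrouped); // [ 'The', 't' ]

// With a group, "he" must follow either t or T.
const grouped = sentence.match(/(t|T)he/g);
console.log(grouped); // [ 'The', 'the' ]

// + after a single character repeats only that character...
const repeatA = "ba baba baa".match(/ba+/g);
// ...while + after a group repeats the whole group.
const repeatBa = "ba baba baa".match(/(ba)+/g);
console.log(repeatA);  // [ 'ba', 'ba', 'ba', 'baa' ]
console.log(repeatBa); // [ 'ba', 'baba', 'ba' ]
```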
I want to talk about two important constructs that use a group-like syntax - positive and negative lookaheads. Let's talk about them one by one.
Positive Lookaheads (?=...) -
Using positive lookaheads, we can search for texts followed by a specific string.
Example -
Regex- /foo(?=t)/g
Text- food foot
Output- true
In this, only the foo followed by t is matched.
Negative Lookaheads (?!...) -
Using negative lookaheads, we can search for texts not followed by a specific string.
Example -
Regex- /foo(?!t)/g
Text- food foot
Output- true
In this, only the foo not followed by t is matched.
Note - The text inside the lookahead ( ), i.e. t, is only checked; it is not part of the match.
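One way to see which foo each lookahead picks is to replace the match with an uppercase marker. This sketch assumes the String replace() method described later in this article; the marker text FOO and the variable names are only for illustration.

```javascript
const lookText = "food foot";

// Positive lookahead: only the foo that is followed by t is replaced.
const positive = lookText.replace(/foo(?=t)/g, "FOO");
console.log(positive); // food FOOt

// Negative lookahead: only the foo that is NOT followed by t is replaced.
const negative = lookText.replace(/foo(?!t)/g, "FOO");
console.log(negative); // FOOd foot

// The t itself is never part of the match, so it is left untouched.
const matched = lookText.match(/foo(?=t)/g);
console.log(matched); // [ 'foo' ]
```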
How to Use in JS
We have seen how regex work, so now let's see how to use them in JavaScript.
The syntax is very similar to what we have seen so far.
Example -
let regex = /Alok/
let text = "My name is Alok."
let isExisting = regex.test(text)
console.log(isExisting) //gives output true
All you have to do is store the regex and the string in variables and then call the test() method on the regex, passing in the string.
And it'll give output true or false, as we have seen so far in our examples.
We can also use the RegExp() constructor to define regex.
Let's have a look at how it is done -
let regex = new RegExp("Alok");
And it works the same as earlier. But as you can see, the literal syntax is a bit easier and works the same way, so generally, developers prefer using literal syntax only.
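The constructor is mainly useful when the pattern is only known at run time. The sketch below shows two details worth knowing: flags go in a second argument, and backslashes must be doubled inside a string. The variable names and sample texts are illustrative.

```javascript
// Flags are passed as the second argument.
// Inside a string, \d has to be written as \\d.
const literalRe = /\d+/g;
const constructedRe = new RegExp("\\d+", "g");

console.log(constructedRe.source === literalRe.source); // true
console.log("Ak47 and 2021".match(constructedRe));      // [ '47', '2021' ]

// Building a pattern from a variable, which the literal syntax cannot do.
const personName = "Alok";
const dynamicRe = new RegExp(personName, "i");
console.log(dynamicRe.test("my name is alok.")); // true
```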
Different Use Cases
There are many use cases of regex but let's see some important ones -
match() -
The match() method searches the string for matches and returns an array of them, or null if no matches are found.
Example -
let regex = /[A-Z]/g
let text ="My name is Alok."
let found = text.match(regex)
console.log(found) // returns an Array = ["M","A"]
replace() -
The replace() method accepts two arguments. One is the regex to be searched, and the other is the text with which the matches are to be replaced.
Example -
let regex = /[A-Z]/g
let text ="My name is Alok."
let found = text.replace(regex,"S")
console.log(found) // returns - Sy name is Slok.
Note - The method test() that we have seen earlier is a RegExp method that takes a string as a parameter, whereas the methods match() and replace() are String methods that take a regular expression as a parameter.
Applications
Here I'm going to list some applications of regex, or areas in which regex are used, that you can study in depth if you like -
- Validating User Inputs
- File Searching
- Data scraping
- Lexical Analysis in a compiler
- Search and Replace in a text editor
- Parsing
- Search Engines
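As a taste of the first item, validating user input, here is a minimal sketch that combines anchors, a predefined character class and a quantifier from this article. The rule itself (3 to 16 letters, digits or underscores) and the function name isValidUsername are hypothetical, chosen only for illustration.

```javascript
// Hypothetical rule: the whole input must be 3 to 16 word characters.
// ^ and $ make sure nothing else appears before or after.
const usernamePattern = /^\w{3,16}$/;

function isValidUsername(input) {
  return usernamePattern.test(input);
}

console.log(isValidUsername("alok_47"));    // true
console.log(isValidUsername("al"));         // false (too short)
console.log(isValidUsername("alok kumar")); // false (contains a space)
```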
What’s Next?
The applications and types of regex are enormous, far more than one article can cover.
Regex is something that even experienced developers struggle with. So, you don't have to worry if all of this feels a little overwhelming.
Most of the commonly used regex can be easily found on the internet whenever you need them.
Next, I'd suggest you visit regex101.com and practice all we have done so far.
It's a great platform to play around with regex and learn. Also, it comes with a regex library where people post important regex which are commonly used.
Thanks for reading!
If you find this useful then you can share it with others :)
Feel free to drop a Hi and let's chat.
Alok Kumar | Sciencx (2021-08-15T09:26:29+00:00) Regex 101. Retrieved from https://www.scien.cx/2021/08/15/regex-101-2/