Regular expressions in Perl : CodesDope

A regular expression (regex) is a string that describes a pattern. Perl uses regular expressions heavily to check whether a pattern occurs in a string. In practice, we can use a regular expression to validate phone numbers (must be 10 digits), email addresses (must be of the form abc@example.com), etc.

We can check whether a regular expression matches somewhere in a string by using $string =~ /regular_expression/ in Perl.

The simplest regular expression consists only of alphanumeric characters, i.e., /alpha_num/, where 'alpha_num' contains only alphanumeric characters. Alphanumeric characters in a regular expression stand for themselves, so /abc/ matches the string "abc". Let's take an example to see this:

$a = "Hey there. I am Perl";
if ($a =~ /perl/){
  print "YES, perl present\n";
}
elsif ($a =~ /Perl/){
  print "Perl present\n";
}
else{
  print "NO\n";
}

Output

Perl present

You can see that $a =~ /Perl/ checked whether "Perl" is present in $a. Also, "perl" didn't match in $a because matching is case-sensitive by default.

Using i - case-insensitive

We can add the i modifier after the closing delimiter to make the match case-insensitive. Let's see an example:

$a = "Hey there. I am Perl";
if ($a =~ /perl/i){
  print "YES, perl present\n";
}
elsif ($a =~ /Perl/){
  print "Perl present\n";
}
else{
  print "NO\n";
}

Output

YES, perl present

Using m

The default // delimiters for a match can be changed to other delimiters by putting an 'm' in front of the pattern. The pattern is then written between the delimiters that follow m instead of inside //. This is useful when we need to use '/' in the pattern itself. Let's see an example:

if ("/he" =~ m"/h"){
  print "YES\n";
}

Output

YES
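To make the benefit concrete, here is a minimal sketch comparing the default delimiters with two alternatives on a pattern full of slashes. The path and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";   # illustrative path, not from the tutorial

# With the default delimiters every / in the pattern needs a backslash.
my $with_slashes = ($path =~ /\/local\/bin\//) ? 1 : 0;

# With m and another delimiter the same pattern needs no escaping.
my $with_braces  = ($path =~ m{/local/bin/})   ? 1 : 0;
my $with_hashes  = ($path =~ m#/local/bin/#)   ? 1 : 0;

print "slashes: $with_slashes, braces: $with_braces, hashes: $with_hashes\n";
# prints "slashes: 1, braces: 1, hashes: 1"
```

All three patterns are equivalent; the choice of delimiter only affects readability.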

Metacharacters

Some characters can't be used directly in a pattern because they have a special meaning. These characters are called metacharacters: {}[]()^$.|*+?\. To match one of them literally, we put a backslash (\) before it. Let's see an example:

if ("10*5+8" =~ /10\*5/){
  print "YES\n";
}

Output

YES

Notice that we have used '\' before '*' because '*' is a metacharacter.
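When the text to search for is stored in a variable, escaping each metacharacter by hand is easy to get wrong. The sketch below assumes Perl's built-in quotemeta function and the \Q...\E escape, neither of which is covered in this tutorial; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text   = "10*5+8";
my $needle = "10*5";     # contains the metacharacter *

# Interpolated as-is, * acts as a quantifier on the 0, so this does not match.
my $plain   = ($text =~ /$needle/)     ? 1 : 0;

# \Q...\E escapes every metacharacter in the interpolated text.
my $escaped = ($text =~ /\Q$needle\E/) ? 1 : 0;

# quotemeta returns the escaped form as a string.
my $quoted  = quotemeta($needle);

print "plain: $plain, escaped: $escaped, quoted: $quoted\n";
# prints "plain: 0, escaped: 1, quoted: 10\*5"
```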

Using ^ and $

We can use ^ to match at the beginning of a string. This means that the pattern must be present at the beginning of the string.
We can use $ to match at the end of a string, or just before a newline at the end of the string. This means that the pattern must be present at the end of the string.

For example, '^abc' will match in 'abcdef' but '^def' won't. Also, 'def$' will match in 'abcdef' but 'abc$' won't.
'^abcdef$' will match in 'abcdef'. Let's see a working example:

if ("abcdef" =~ /^abc/){
  print "abc in front\n";
}
if ("abcdef" =~ /^def/){
  print "def in front\n";
}
if ("abcdef" =~ /abc$/){
  print "abc at last\n";
}
if ("abcdef" =~ /def$/){
  print "def at last\n";
}

Output

abc in front
def at last
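The "before a newline" behaviour of $ matters when a line still carries its trailing newline, for example after being read from a file. A small sketch follows, assuming the \z assertion (not covered in this tutorial), which matches only at the very end of the string:

```perl
use strict;
use warnings;

my $line = "abcdef\n";   # as if read from a file, newline still attached

# $ matches at the end of the string or just before a final newline.
my $dollar = ($line =~ /def$/)  ? 1 : 0;

# \z matches only at the very end, so the newline gets in the way.
my $strict = ($line =~ /def\z/) ? 1 : 0;

print "dollar: $dollar, strict: $strict\n";
# prints "dollar: 1, strict: 0"
```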

Matching anything

A dot (.) in a regular expression matches any single character except a newline. For example, in the string "Hello" a dot can match any one of the characters H, e, l, and o. We can use /H...o/ to check whether a string contains a five-character sequence which starts with H and ends with o. Similarly, /H.l../ checks for a five-character sequence whose first character is H and whose third character is l. Let's see an example:

if ("Hello" =~ /H...o/){
  print "YES\n";
}
if ("Hello" =~ /H.t../){
  print "Yes word with t as third and H as first letter present.\n";
}
if ("Hello" =~ /H.l../){
  print "YES\n";
}

Output

YES
YES

Using []

[] is used to match any one character from the set of characters inside the brackets. For example, "sas" =~ /[abc]/ matches at 'a'; /[ab][10]/ matches "a1", "b1", "a0" and "b0"; "abc" =~ /[cab]/ matches 'a' because the earliest point at which the regex can match is 'a'.

We can use a hyphen (-) inside [] to match a range of characters. For example, [a-z] can be used to check whether any of the characters from a to z is present in a string. We can use [a-zA-Z0-9] to match any alphanumeric character. Let's see an example:

if ("Hello" =~ /[a-z]/){
  print "lowercase present\n";
}
if ("hello" =~ /[A-Z]/){
  print "uppercase present\n";
}
else{
  print "uppercase absent\n";
}
if ("Hello123" =~ /[0-9]/){
  print "Yes numeric also\n";
}
else{
  print "No numeric\n";
}

Output

lowercase present
uppercase absent
Yes numeric also

^ inside []

The meaning of ^ changes when it is the first character inside []. If the first character inside [] is ^, then the class matches any character except the ones listed inside [].

Thus /[^a]/ will not match 'a'.
[^0-9] will match a non-numeric character.
[^c]ar will not match car. Let's see an example:

if ("car" =~ /[^c]ar/){
  print "It is not a car.\n";
}
else{
  print "It is a car.\n";
}

Output

It is a car.

Perl uses the following abbreviations for common character classes:

Abbreviation	Represents	Equivalent to (for ASCII text)
\d	digit	[0-9]
\s	whitespace	[\ \t\r\n\f]
\w	word character (alphanumeric or _)	[0-9a-zA-Z_]
\D	any character except a digit	[^0-9]
\S	any non-whitespace character	[^\s]
\W	any non-word character	[^\w]
.	any character except a newline	[^\n]
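The abbreviations can be tried out directly. The following is a minimal sketch on a made-up string; the variable names are illustrative.

```perl
use strict;
use warnings;

my $entry = "room 42_b";   # illustrative input

my $has_digit = ($entry =~ /\d/)      ? 1 : 0;   # "4" is a digit
my $has_space = ($entry =~ /\s/)      ? 1 : 0;   # the blank after "room"
my $word_run  = ($entry =~ /\w\w_\w/) ? 1 : 0;   # "42_b": \w covers letters, digits and _
my $non_word  = ($entry =~ /\W/)      ? 1 : 0;   # the blank is not a word character
my $non_digit = ("42" =~ /\D/)        ? 1 : 0;   # "42" has nothing but digits

print "$has_digit $has_space $word_run $non_word $non_digit\n";
# prints "1 1 1 1 0"
```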

Using | for alternation

We can use | for alternation. For example, bike|car will match either bike or car. Perl will try to match the regex at the earliest possible point in the string. At each position, Perl first tries bike, and only if bike does not match there does it try the next alternative, car. Let's see an example:

if ("car" =~ /bike|car/){
  print "Either bike or car found.\n";
}
else{
  print "None found\n";
}

Output

Either bike or car found.
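To see which alternative actually matched, the sketch below wraps the alternation in capturing parentheses and reads back the captured text. Capturing is not covered in this tutorial, so treat it as an assumption made for illustration; the strings are made up.

```perl
use strict;
use warnings;

# In list context a match returns the text captured by the parentheses.

# "car" occurs earlier in the string than "bike", so it wins
# even though bike is listed first in the pattern.
my ($earliest) = "I came by car, not by bike" =~ /(bike|car)/;

# At the same starting position the alternatives are tried left to right.
my ($short) = "carpet" =~ /(car|carpet)/;   # car matches first
my ($long)  = "carpet" =~ /(carpet|car)/;   # carpet is tried first

print "$earliest $short $long\n";
# prints "car car carpet"
```

When one alternative is a prefix of another, putting the longer one first is one way to get the longer match.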

Checking repetitions

The metacharacters ?, *, +, and {} specify how many times the preceding portion of a regex may repeat in a match. See the table given below to understand this:

Character	Matches	Example
?	0 or 1 time	"abc" =~ /ab?c/ matches; "abbc" =~ /ab?c/ doesn't match
*	0 or more times	"abbc" =~ /ab*c/ matches; "ac" =~ /ab*c/ also matches
+	at least once	"abbc" =~ /ab+c/ matches; "ac" =~ /ab+c/ doesn't match
{n,m}	at least n times but not more than m times	"abbc" =~ /ab{1,10}c/ matches; "abbc" =~ /ab{5,10}c/ doesn't match
{n,}	at least n times	"abbc" =~ /ab{1,}c/ matches; "abbc" =~ /ab{10,}c/ doesn't match
{n}	exactly n times	"abbc" =~ /ab{2}c/ matches; "abbc" =~ /ab{10}c/ doesn't match
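Combining a repetition count with ^ and $ gives the 10-digit phone number check mentioned at the start. This is only a sketch: is_phone is a hypothetical helper name, and real phone formats vary.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $s is exactly 10 digits, else 0.
# Note that $ would also accept one trailing newline after the digits.
sub is_phone {
    my ($s) = @_;
    return $s =~ /^[0-9]{10}$/ ? 1 : 0;
}

for my $s ("9876543210", "98765", "98765abcde") {
    printf "%-12s %s\n", $s, is_phone($s) ? "valid" : "invalid";
}
```

Without ^ and $, the pattern /[0-9]{10}/ would also accept longer strings that merely contain 10 digits in a row.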

Search and replace

We can search for a regex and replace what it matches using s///. The syntax is s/regex/replacement/modifiers. Here, 'replacement' replaces the first part of the string matched by 'regex'. Modifiers are optional; for example, g replaces every match instead of only the first. Let's see an example:

$a = "take ride on car";
$a =~ s/car/bike/;
print "$a\n";

Output

take ride on bike
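The effect of modifiers on s/// can be sketched as follows, using the g (global) modifier and the i modifier introduced earlier. The strings and variable names are illustrative.

```perl
use strict;
use warnings;

my $once = "car, Car, car";
my $all  = $once;
my $any  = $once;

$once =~ s/car/bike/;                 # no modifier: only the first match is replaced
my $count = ($all =~ s/car/bike/g);   # g: every match; s/// returns the number of replacements
$any =~ s/car/bike/gi;                # gi: every match, ignoring case

print "$once\n";          # prints "bike, Car, car"
print "$all ($count)\n";  # prints "bike, Car, bike (2)"
print "$any\n";           # prints "bike, bike, bike"
```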

grep

grep is a function in Perl which filters a list, returning the elements that match a regex. Its syntax is grep(regex, ARRAY). Let's see an example which picks all the numbers out of an array.

@a = (1,'a',2,'b',3,'c',4,'d',5,'e');
@num = grep(/\d/,@a);
print "@num\n";

Output

1 2 3 4 5
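Note that /\d/ keeps any element that contains a digit anywhere. A short sketch on a made-up array shows the difference once the pattern is anchored, and also the block form of grep, which this tutorial does not cover:

```perl
use strict;
use warnings;

my @items = (1, 'a', 22, 'b2', 3, 'c');   # illustrative data

my @contains_digit = grep(/\d/, @items);      # any element with a digit in it
my @only_digits    = grep(/^\d+$/, @items);   # elements made up of digits only
my @no_digits      = grep { !/\d/ } @items;   # block form: elements without a digit

print "@contains_digit\n";   # prints "1 22 b2 3"
print "@only_digits\n";      # prints "1 22 3"
print "@no_digits\n";        # prints "a c"
```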
