
Regular Expressions

Regular Expressions Tutorial
  I searched the web far and wide for a good tutorial on PHP regular expressions
  and found a multitude of sites. However, each site had only a little of the
  information I needed, and I ended up jumping between 10 different web pages to find
  what I wanted at any given time. This tutorial is a collation of all those bits of
  information. Some of it is my own work, but it is mostly a collection of material
  from other tutorials out there. To give the authors credit for their work, I have
  included the links to ALL of those pages. If anyone feels that this is an outrage,
  let me know and I will take down the relevant information.
  So here goes...

Basic Syntax of Regular Expressions (as from PHPBuilder.com)
First of all, let's take a look at two special symbols: '^' and '$'. What they do is indicate the
start and the end of a string, respectively, like this:

"^The": matches any string that starts with "The";
"of despair$": matches a string that ends in the substring "of despair";
"^abc$": a string that starts and ends with "abc" -- that could only be "abc" itself!
"notice": a string that has the text "notice" in it.
You can see that if you don't use either of the two characters we mentioned, as in the last example,
you're saying that the pattern may occur anywhere inside the string -- you're not "hooking" it to any of the edges.
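
To see the anchors in action, here is a minimal sketch using preg_match() (covered further down). The sample strings are my own inventions, not from the original tutorial, and each pattern is wrapped in '/' delimiters because the Perl-style functions require them.

```php
<?php
// Hypothetical sample strings, chosen only to exercise each anchor.
$anchorTests = [
    '/^The/'        => 'The quick brown fox',
    '/of despair$/' => 'the depths of despair',
    '/^abc$/'       => 'abc',
    '/notice/'      => 'please take notice of this',
];

foreach ($anchorTests as $pattern => $subject) {
    $result = preg_match($pattern, $subject) ? 'match' : 'no match';
    echo $pattern . ' on "' . $subject . '": ' . $result . "\n";
}

// An anchored pattern fails when the text is not at the edge of the string.
$notAtStart = preg_match('/^The/', 'Not The start');
// The unanchored version finds the same text anywhere in the string.
$anywhere = preg_match('/The/', 'Not The start');
```

All four pairs in the array should report a match; only the anchored check on 'Not The start' fails.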

There are also the symbols '*', '+', and '?', which denote the number of times a character or a sequence of
characters may occur. What they mean is: "zero or more", "one or more", and "zero or one." Here are some examples:

"ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": same, but there's at least one b ("ab", "abbb", etc.);
"ab?": there might be a b or not;
"a?b+$": a possible a followed by one or more b's ending a string.
You can also use bounds, which come inside braces and indicate ranges in the number of occurrences:

"ab{2}": matches a string that has an a followed by exactly two b's ("abb");
"ab{2,}": there are at least two b's ("abb", "abbbb", etc.);
"ab{3,5}": from three to five b's ("abbb", "abbbb", or "abbbbb").
Note that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Also, as you might
have noticed, the symbols '*', '+', and '?' have the same effect as using the bounds "{0,}", "{1,}", and "{0,1}",
respectively.
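
Here is a small sketch of the quantifiers and bounds. Note one assumption: I have added '^' and '$' around every pattern, because an unanchored "ab{2}" would also be found inside "abbb", which hides what the bound is doing. The helper name quantifier_matches() is made up for this example.

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
function quantifier_matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(quantifier_matches('/^ab*$/', 'a'));           // true: zero b's is fine
var_dump(quantifier_matches('/^ab+$/', 'a'));           // false: needs at least one b
var_dump(quantifier_matches('/^ab?$/', 'abb'));         // false: at most one b
var_dump(quantifier_matches('/^ab{2}$/', 'abb'));       // true: exactly two b's
var_dump(quantifier_matches('/^ab{2,}$/', 'abbbb'));    // true: two or more b's
var_dump(quantifier_matches('/^ab{3,5}$/', 'abbbbbb')); // false: six b's is too many

// '+' and '{1,}' are interchangeable.
var_dump(quantifier_matches('/^ab{1,}$/', 'abbb'));     // true
```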



Now, to quantify a sequence of characters, put them inside parentheses:

"a(bc)*": matches a string that has an a followed by zero or more copies of the sequence "bc";
"a(bc){1,5}": one through five copies of "bc."
There's also the '|' symbol, which works as an OR operator:

"hi|hello": matches a string that has either "hi" or "hello" in it;
"(b|cd)ef": a string that has either "bef" or "cdef";
"(a|b)*c": a string that has a sequence of a's and b's, in any order, ending in a c;
A period ('.') stands for any single character:

"a.[0-9]": matches a string that has an a followed by one character and a digit;
"^.{3}$": a string with exactly 3 characters.
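
Grouping, alternation and the period can be tried out the same way. This is only a sketch: the subjects are invented, and a few patterns are anchored with '^' and '$' (my addition) so the expected result is unambiguous.

```php
<?php
// Each row: pattern, hypothetical subject, expected preg_match() result.
$groupChecks = [
    ['/^a(bc)*$/',     'abcbc', 1],  // a followed by two copies of "bc"
    ['/^a(bc){1,5}$/', 'a',     0],  // needs at least one "bc"
    ['/(b|cd)ef/',     'xcdef', 1],  // contains "cdef"
    ['/^(a|b)*c$/',    'abbac', 1],  // any mix of a's and b's, then c
    ['/a.[0-9]/',      'ax7',   1],  // a, any one character, a digit
    ['/^.{3}$/',       'abcd',  0],  // four characters, not three
];

$groupResults = [];
foreach ($groupChecks as [$pattern, $subject, $expected]) {
    $actual = preg_match($pattern, $subject);
    $groupResults[] = $actual;
    printf("%-16s %-6s => %d\n", $pattern, $subject, $actual);
}
```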
Bracket expressions specify which characters are allowed in a single position of a string:

"[ab]": matches a string that has either an a or a b (that's the same as "a|b");
"[a-d]": a string that has one of the lowercase letters 'a' through 'd' (that's equal to "a|b|c|d" and even "[abcd]");
"^[a-zA-Z]": a string that starts with a letter;
"[0-9]%": a string that has a single digit before a percent sign;
",[a-zA-Z0-9]$": a string that ends in a comma followed by an alphanumeric character.
You can also list which characters you DON'T want -- just use a '^' as the first symbol in a bracket expression
(i.e., "%[^a-zA-Z]%" matches a string with a character that is not a letter between two percent signs).
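
A quick sketch of bracket expressions, including the negated form. The subject strings are invented for illustration.

```php
<?php
$startsWithLetter = preg_match('/^[a-zA-Z]/', '9 lives');       // 0: starts with a digit
$digitPercent     = preg_match('/[0-9]%/', 'up 5% today');      // 1: "5%"
$commaAlnumAtEnd  = preg_match('/,[a-zA-Z0-9]$/', 'a,b,c');     // 1: ends in ",c"
$nonLetterBetween = preg_match('/%[^a-zA-Z]%/', 'x%7%y');       // 1: 7 is not a letter
$letterBetween    = preg_match('/%[^a-zA-Z]%/', 'x%q%y');       // 0: q is a letter

var_dump($startsWithLetter, $digitPercent, $commaAlnumAtEnd,
         $nonLetterBetween, $letterBetween);
```

Keep in mind that '^' means "start of string" outside the brackets and "not" only as the first symbol inside them.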

To be taken literally, the characters "^.[$()|*+?{" must be escaped with a backslash ('\'), as
they have special meaning. On top of that, you must escape the backslash character itself in PHP3 strings, so,
for instance, the regular expression "(\$|¥)[0-9]+" would have the function call: ereg("(\\$|¥)[0-9]+", $str)
(what string does that validate?) Note that ereg() itself was removed in PHP 7; preg_match(), described
below, is its replacement.
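
As a sketch of escaping with the Perl-style functions: single-quoted PHP strings pass the backslash through untouched, and preg_quote() will escape a literal string for you. The price pattern and the literal 'a.b*c' are my own examples, not from the original text.

```php
<?php
// Single quotes keep the backslash intact, so the regex engine sees \$.
$pricePattern = '/^\$[0-9]+$/';
$withDollar    = preg_match($pricePattern, '$25');  // 1
$withoutDollar = preg_match($pricePattern, '25');   // 0

// preg_quote() escapes every metacharacter in a literal string;
// the second argument makes it escape the delimiter too.
$literal = 'a.b*c';
$quoted  = preg_quote($literal, '/');
echo $quoted, "\n";  // a\.b\*c

$exact  = preg_match('/^' . $quoted . '$/', 'a.b*c');  // 1: matched literally
$loose  = preg_match('/^' . $quoted . '$/', 'aXbbc');  // 0: '.' and '*' are no longer special
```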

Example 1. Examples of valid patterns

    * /<\/\w+>/

    * |(\d{3})-\d+|Sm

    * /^(?i)php[34]/

    * {^\s+(\s+)?$}

Example 2. Examples of invalid patterns

    * /href='(.*)' - missing ending delimiter

    * /\w+\s*\w+/J - unknown modifier 'J'

    * 1-\d3-\d3-\d4| - missing starting delimiter


Some useful PHP Keywords and their use (php.net man pages)

preg_split

(PHP 3>= 3.0.9, PHP 4 )
preg_split -- Split string by a regular expression
Description
array preg_split ( string pattern, string subject [, int limit [, int flags]])

Returns an array containing substrings of subject split along boundaries matched by pattern.

If limit is specified, then only substrings up to limit are returned, and if limit is -1, it
actually means "no limit", which is useful for specifying the flags.

flags can be any combination of the following flags (combined with bitwise | operator):

PREG_SPLIT_NO_EMPTY
    If this flag is set, only non-empty pieces will be returned by preg_split().

PREG_SPLIT_DELIM_CAPTURE
    If this flag is set, parenthesized expression in the delimiter pattern will be captured and
    returned as well. This flag was added for 4.0.5.

PREG_SPLIT_OFFSET_CAPTURE
    If this flag is set, for every occurring match the corresponding string offset will also be
    returned. Note that this changes the return value into an array where every element is an
    array consisting of the matched string at offset 0 and its string offset into subject
    at offset 1. This flag is available since PHP 4.3.0.

Example 1. preg_split() example : Get the parts of a search string

<?php
// split the phrase by any number of commas or space characters,
// which include " ", \r, \t, \n and \f
$keywords = preg_split ("/[\s,]+/", "hypertext language, programming");
?>

Example 2. Splitting a string into component characters




<?php
$str = 'string';
$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($chars);
?>

Example 3. Splitting a string into matches and their offsets

<?php
$str = 'hypertext language programming';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
print_r($chars);
?>

will yield:

Array
(
    [0] => Array
        (
            [0] => hypertext
            [1] => 0
        )

    [1] => Array
        (
            [0] => language
            [1] => 10
        )

    [2] => Array
        (
            [0] => programming
            [1] => 19
        )

)

    Note: Parameter flags was added in PHP 4 Beta 3.
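
The examples above do not show PREG_SPLIT_DELIM_CAPTURE or a positive limit, so here is a small sketch of both. The arithmetic string is an invented input.

```php
<?php
// Keep the delimiters (the captured + or -) in the result and drop empty pieces.
$tokens = preg_split('/([+-])/', '10+20-5', -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($tokens);    // five elements: 10, +, 20, -, 5

// With a limit of 2, the second element holds the unsplit rest of the string.
$firstTwo = preg_split('/[\s,]+/', 'hypertext language, programming', 2);
print_r($firstTwo);  // "hypertext" and "language, programming"
```

Passing -1 as the limit is what lets you reach the flags argument without restricting the number of pieces.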

preg_match

(PHP 3>= 3.0.9, PHP 4 )
preg_match -- Perform a regular expression match
Description
int preg_match ( string pattern, string subject [, array matches [, int flags]])

Searches subject for a match to the regular expression given in pattern.

If matches is provided, then it is filled with the results of search. $matches[0] will
contain the text that matched the full pattern, $matches[1] will have the text that matched
 the first captured parenthesized subpattern, and so on.

flags can be the following flag:

PREG_OFFSET_CAPTURE
    If this flag is set, for every occurring match the corresponding string offset will also
    be returned. Note that this changes the return value into an array where every element
    is an array consisting of the matched string at offset 0 and its string offset into
    subject at offset 1. This flag is available since PHP 4.3.0.

The flags parameter is available since PHP 4.3.0.

preg_match() returns the number of times pattern matches. That will be either 0 times
(no match) or 1 time, because preg_match() stops searching after the first match.
preg_match_all(), by contrast, continues until it reaches the end of subject.
preg_match() returns FALSE if an error occurred.

    Tip: Do not use preg_match() if you only want to check if one string is contained
    in another string. Use strpos() or strstr() instead as they will be faster.
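
A short sketch of the difference between the two functions, plus the strpos() tip. The subject string is made up for the example.

```php
<?php
$subject = 'PHP 3, PHP 4 and PHP 5';

// preg_match() stops at the first hit, so the count is 0 or 1.
$firstCount = preg_match('/PHP (\d)/', $subject, $first);
// preg_match_all() keeps going; $all[1] collects every captured digit.
$allCount = preg_match_all('/PHP (\d)/', $subject, $all);

echo "preg_match: $firstCount, first version: {$first[1]}\n";
echo "preg_match_all: $allCount, versions: ", implode(', ', $all[1]), "\n";

// For a plain substring test, strpos() avoids the regex engine entirely.
// Compare with !== false, because a hit at offset 0 would otherwise look like "not found".
$hasPhp = strpos($subject, 'PHP') !== false;
```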

Example 1. Find the string of text "php"

<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match ("/php/i", "PHP is the web scripting language of choice.")) {
    print "A match was found.";
} else {
    print "A match was not found.";
}
?>

Example 2. Find the word "web"

<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
 * word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match ("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    print "A match was found.";
} else {
    print "A match was not found.";
}

if (preg_match ("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    print "A match was found.";
} else {
    print "A match was not found.";
}
?>

Example 3. Getting the domain name out of a URL

<?php
// get host name from URL
preg_match("/^(http:\/\/)?([^\/]+)/i",
    "http://www.php.net/index.html", $matches);
$host = $matches[2];

// get last two segments of host name
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>

This example will produce:

domain name is: php.net




Perl Style Delimiters (as from crazygrrl.com)


When using Perl-style matching, the pattern also has to be enclosed by special delimiters.
The default is the forward slash, though you can use others. For example:

     /colou?r/

Usually you'll want to stick with the default, but if you need to use the
forward slash a lot in the actual pattern (especially if you're dealing with
pathnames) you might want to use something else:

     !/root/home/random!

To make a match case-insensitive, all you need to do is append the option
i to the pattern:

     /colou?r/i

Perl-style functions support these extra metacharacters (this is not a full
list):

     \b A word boundary, the spot between word (\w) and non-word (\W) characters.
     \B A non-word boundary.
     \d A single digit character.
     \D A single non-digit character.
     \n The newline character. (ASCII 10)
     \r The carriage return character. (ASCII 13)
     \s A single whitespace character.
     \S A single non-whitespace character.
     \t The tab character. (ASCII 9)
     \w A single word character - alphanumeric and underscore.
     \W A single non-word character.

Example:

     /\bhomer\b/

     Have a donut, Homer                 no match
     A tale of homeric proportions!      no match
     Do you think he can hit a homer?    match
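
Here is a sketch that exercises a few of these metacharacters. The first two inputs are invented; the last two reuse the homer sentences above. preg_replace() is not covered in this tutorial, but it is a standard PCRE function that swaps each match for a replacement string.

```php
<?php
// \d+ pulls out every run of digits.
preg_match_all('/\d+/', 'Order 66 shipped in 3 days', $numbers);
print_r($numbers[0]);  // 66 and 3

// \s+ matches any run of whitespace, including tabs and newlines.
$clean = preg_replace('/\s+/', ' ', "too   many\t\nspaces");
echo $clean, "\n";     // too many spaces

// \b only matches at the edge of a word.
$homerPattern = '/\bhomer\b/';
$wholeWord = preg_match($homerPattern, 'Do you think he can hit a homer?');  // 1
$partOfWord = preg_match($homerPattern, 'A tale of homeric proportions!');   // 0
```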

Corresponding to ereg() is preg_match(). Syntax:

     preg_match(pattern (string), target (string), optional_array);

Example:

     $pattern = "/\b(do(ugh)?nut)\b.*\b(Homer|Fred)\b/i";
     $target = "Have a donut, Homer.";

     // preg_match() fills $matches: index 0 is the whole match, 1..n are the groups
     if (preg_match($pattern, $target, $matches)) {
          print("<P>Match: $matches[0]</P>");
          print("<P>Pastry: $matches[1]</P>");
          print("<P>Variant: $matches[2]</P>");
          print("<P>Name: $matches[3]</P>");
     }
     else {
          print("No match.");
     }

Results:

     Match: donut, Homer
     Pastry: donut
     Variant: [blank because there was no "ugh"]
     Name: Homer

If you use the $target "Doughnut, Frederick?" there will be no match,
since there has to be a word boundary after Fred,
but "Doughnut, fred?" will match since we've specified it to be
case-insensitive.
   


Contributed code which is applicable (and very useful!)
mkr at binarywerks dot dk
A (AFAIK) correct implementation of IPv4 validation. This one supports an optional range
(CIDR notation); it accepts only numbers from 0-255 in the address part, and only 1-32
after the /

<?php

function valid_ipv4($ip_addr)
{
        $num="([0-9]|1?\d\d|2[0-4]\d|25[0-5])";
        $range="([1-9]|1\d|2\d|3[0-2])";

        if(preg_match("/^$num\.$num\.$num\.$num(\/$range)?$/",$ip_addr))
        {
                return 1;
        }

        return 0;
}

$ip_array[] = "127.0.0.1";
$ip_array[] = "127.0.0.256";
$ip_array[] = "127.0.0.1/36";
$ip_array[] = "127.0.0.1/1";

foreach ($ip_array as $ip_addr)
{
        if(valid_ipv4($ip_addr))
        {
                echo "$ip_addr is valid<BR>\n";
        }
        else
        {
                echo "$ip_addr is NOT valid<BR>\n";
        }
}

?>
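
For comparison, here is a sketch of a different technique, not the contributor's method: it hands the address part to PHP's built-in filter_var() and uses a regex only for the prefix length. The function name valid_ipv4_cidr() is made up, and the two approaches can disagree on edge cases such as leading zeros, so treat this as an illustration rather than a drop-in replacement.

```php
<?php
// Hypothetical helper: validates "a.b.c.d" with an optional "/prefix" of 1-32.
function valid_ipv4_cidr(string $input): bool
{
    // Split into the address and, if present, the prefix length.
    $parts = explode('/', $input, 2);

    // filter_var() returns false when the address is not valid IPv4.
    if (filter_var($parts[0], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false;
    }
    if (!isset($parts[1])) {
        return true;  // plain address, no CIDR suffix
    }
    // The prefix must be one or two digits and fall in the range 1-32.
    if (preg_match('/^[0-9]{1,2}$/', $parts[1]) !== 1) {
        return false;
    }
    $prefix = (int) $parts[1];
    return $prefix >= 1 && $prefix <= 32;
}

foreach (['127.0.0.1', '127.0.0.256', '127.0.0.1/36', '127.0.0.1/1'] as $candidate) {
    echo $candidate, valid_ipv4_cidr($candidate) ? " is valid\n" : " is NOT valid\n";
}
```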

plenque at hotmail dot com
I wrote a function that checks if a given regular expression is valid. I think some of
you might find it useful. It changes the error handler and then restores it; I didn't find
any other way to do it.

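// Returns true if $sREGEXP compiles as a regular expression.
function IsRegExp ($sREGEXP)
{
    // Route the warning raised by a bad pattern to TrapError() ...
    set_error_handler ("TrapError");
    preg_match ($sREGEXP, "");
    // ... then put the previous handler back (restore_error_handler() takes no arguments).
    restore_error_handler ();
    return !TrapError ();
}

// Called by PHP with error arguments: counts the error.
// Called with no arguments: returns the count so far and resets it.
function TrapError ()
{
    static $iERRORES = 0;

    if (!func_num_args ())
    {
        $iRETORNO = $iERRORES;
        $iERRORES = 0;
        return $iRETORNO;
    }
    else
    {
        $iERRORES++;
    }
}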
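
A shorter way to get the same answer is sketched below. It relies on preg_match() returning FALSE when the pattern cannot be compiled, with the @ operator silencing the warning. The function name is_valid_regex() is my own, and this is an alternative technique rather than the contributor's code.

```php
<?php
// Hypothetical helper: true when $pattern compiles as a PCRE pattern.
function is_valid_regex(string $pattern): bool
{
    // A compile error makes preg_match() return false (0 just means "no match").
    return @preg_match($pattern, '') !== false;
}

var_dump(is_valid_regex('/^\d{3}$/'));   // bool(true)
var_dump(is_valid_regex('/href=(.*)'));  // bool(false): missing ending delimiter
```

The strict !== comparison matters here, because a valid pattern tested against the empty string usually returns 0.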


PHP Get_title tag code which uses simple regex and nice php string functions
(As from Zend PHP)

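<?php
function get_title_tag($chaine){
    $fp = fopen ($chaine, 'r');
    $contenu = '';
    // read the page until the closing title tag has been seen
    while (! feof ($fp)){
         $contenu .= fgets ($fp, 1024);
         if (stristr($contenu, '</title>' )){
                 break;
                }
         }
    fclose ($fp);
    // NOTE: the original eregi() pattern was lost from this page. The pattern
    // below is a reconstruction that captures the text between the title tags;
    // preg_match() is used because eregi() was removed in PHP 7.
    if (preg_match('/<title>(.*)<\/title>/is', $contenu, $out)) {
        return $out[1];
        }
    else{
        return false;
        }
    }
?>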

My Own 'Visitor Trac' code which uses regex XML parsing methods

<?php
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
// one log file per visitor IP address
$filename = $_SERVER['REMOTE_ADDR'] . '.txt';
//print_r($_SERVER);
if (file_exists($filename)){
    $lastvisit = filectime($filename);
    $currentdate = date('U');
    // age of the log in days (86400 seconds per day)
    $difference = round(($currentdate - $lastvisit)/86400);
    if ($difference > 7)  {
        // start a fresh log after a week
        unlink($filename);
        $fp = fopen($filename, "a");
    }
    else $fp = fopen($filename, "a");
}
    else $fp = fopen($filename, "a");
if (!$referer) $url_test = 'http://dinki.mine.nu/weblog/';
else $url_test = $referer;
$new_title = return_title ($url_test);
//print $new_title;
$new_name = stripslashes("<beg>$new_title\n");
$new_URL = stripslashes("<beg>$referer\n");
fwrite($fp,$new_URL);
fwrite($fp,$new_name);
fclose($fp);

$fp = fopen($filename, "r");
$file = implode('', file ($filename));
$foo = preg_split("/<beg>/",$file);
$number = count($foo);
//print $number;
if ($number > 11) {
    fclose($fp);
    $fp = fopen($filename, "w");
    $count = $number - 10;
    while ($count < $number)  {
        $print1 = $foo[$count];
        $print2 = $foo[$count+1];
        print " <img src = arrow.gif> ";
        print "<a href=$print1>$print2</a>"; //print $count;
        $count += 2;
        $new_name = stripslashes("<beg>$print2");
        $new_URL = stripslashes("<beg>$print1");
        fwrite($fp,$new_URL);
        fwrite($fp,$new_name);
    }
    fclose($fp);
}
//print_r($foo);
else  {
    $count = 1;
    while ($count < $number)  {
        $print1 = $foo[$count];
        $print2 = $foo[$count+1];
        print " <img src = arrow.gif> ";
        print "<a href=$print1>$print2</a>"; //print $count;
        $count += 2;
        }
    fclose($fp);
    }

function return_title($url)  {
    // $filename and $difference are not in scope inside this function
    //print $filename." ".$difference;
    $title = '';
    $array = file ($url);   // one array element per line of the page
    for ($i = 0; $i < count($array); $i++)
    {
        if (preg_match("/<title>(.*)<\/title>/i",$array[$i], $tag_contents))  {
            $title = $tag_contents[1];
            $title = strip_tags($title);
        }
    }
    return $title;
}

?>
