A Few Tips for Speeding Up PHP Code - Pattern Matching Metrics

Started by ganeshbala, Apr 19, 2008, 08:15 PM

Pattern Matching Metrics

This is a hard one to test accurately, because a single pattern match takes so little time that it is difficult to separate from network latency. For example, in testing four different pattern matching functions, I found that none consistently took more than about 40 milliseconds to execute a single pattern match, even within a string of 20,000 or 30,000 characters. It's hard to attribute that to the sluggishness of a given function rather than to network or server load. The following table shows the functions I tested and the time each took for a single match:

<table border="1" cellpadding="3" cellspacing="0">
<tr>
  <td align="left" valign="top"> </td>
  <td align="left" valign="top">ereg()</td>
  <td align="left" valign="top">preg_match()</td>
  <td align="left" valign="top">strstr()</td>
  <td align="left" valign="top">strpos()</td>
</tr>
<tr>
  <td align="left" valign="top">One Run</td>
  <td align="left" valign="top">35.1 ms</td>
  <td align="left" valign="top">35.31 ms</td>
  <td align="left" valign="top">35.58 ms</td>
  <td align="left" valign="top">32.79 ms</td>
</tr>
</table>
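Since a page-load time mixes the function's own cost with network and server overhead, one option is to time the call inside the script itself. A minimal sketch, assuming PHP 7.3 or later for hrtime(); the helper time_single_match() is hypothetical, and a single measurement at this scale is still noisy, so treat the result as indicative only:

```php
<?php
// Hypothetical helper: time one call of $fn, in milliseconds, entirely
// server-side so network latency is excluded. Requires PHP >= 7.3 (hrtime).
function time_single_match(callable $fn): float
{
    $start = hrtime(true);                 // monotonic clock, nanoseconds
    $fn();
    return (hrtime(true) - $start) / 1e6;  // nanoseconds -> milliseconds
}

// Build a haystack of roughly 25,000 characters, in the article's range.
$haystack = '';
for ($j = 0; $j < 2000; $j++) {
    $haystack .= "test " . $j . " -- ";
}

$ms = time_single_match(function () use ($haystack) {
    return strpos($haystack, "st 10");
});
printf("strpos(): %.4f ms\n", $ms);
```

Repeating the measurement several times and taking the median would reduce the noise further.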

They're all pretty close. But when I run many pattern matches within one script, rather than just one, the choice of function does make a difference. Consider the following test code, which reads the function to test from the type query-string parameter:

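<?php
// Note: ereg() and eregi() were deprecated in PHP 5.3 and removed in PHP 7.0,
// so those two cases only run on PHP 5.x.
$types = array("ereg", "eregi", "preg_match", "strstr", "strpos");
$type = isset($_GET["type"]) ? $_GET["type"] : "";
if(!in_array($type, $types, true)){
  print "Must specify type ereg, eregi, preg_match, strstr, or strpos.";
  exit;
}

//Generate a long string to perform matches against.
$x = "";
for($j=0; $j<5000; $j++){
  $x .= "test " . $j . " -- ";
}

//Time the loop inside the script so that network latency is excluded.
$start = microtime(true);
for($i=0; $i<10000; $i++){
  //For 10,000 iterations, run the specified function,
  //searching for the string "st 10".
  switch ($type){
   case "ereg":
    $y=ereg("st 10",$x);
    break;
   case "eregi":
    $y=eregi("st 10",$x);
    break;
   case "preg_match":
    $y=preg_match("/st 10/",$x);
    break;
   case "strstr":
    $y=strstr($x,"st 10");
    break;
   case "strpos":
    $y=strpos($x,"st 10");
    break;
  }
}
printf("%s: %.2f ms", $type, (microtime(true) - $start) * 1000);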

The results of this test are as follows:

<table border="1" cellpadding="3" cellspacing="0">
<tr>
  <td align="left" valign="top"> </td>
  <td align="left" valign="top">ereg()</td>
  <td align="left" valign="top">preg_match()</td>
  <td align="left" valign="top">strstr()</td>
  <td align="left" valign="top">strpos()</td>
</tr>
<tr>
  <td align="left" valign="top">10,000 Iterations</td>
  <td align="left" valign="top">1442.76 ms</td>
  <td align="left" valign="top">252.22 ms</td>
  <td align="left" valign="top">542.96 ms</td>
  <td align="left" valign="top">297.43 ms</td>
</tr>
</table>
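The test code above is driven through the type query-string parameter, so each run is one HTTP request timing one function. A sketch of a command-line variant that times each candidate in the same process is below. It is an illustration under assumptions, not the author's harness: bench_matchers() is a hypothetical name, it needs PHP 7.4 or later for arrow functions, and ereg()/eregi() are left out because they no longer exist in current PHP. Absolute numbers will differ from the 2008 figures.

```php
<?php
// Hypothetical benchmark: run each matcher $iterations times over the same
// haystack and return the elapsed wall-clock time per matcher in ms.
function bench_matchers(string $haystack, string $needle, int $iterations): array
{
    // Escape the needle so it is matched literally by preg_match().
    $pattern = '/' . preg_quote($needle, '/') . '/';
    $matchers = [
        'preg_match' => fn() => preg_match($pattern, $haystack),
        'strstr'     => fn() => strstr($haystack, $needle),
        'strpos'     => fn() => strpos($haystack, $needle),
    ];

    $results = [];
    foreach ($matchers as $name => $match) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $match();
        }
        $results[$name] = (microtime(true) - $start) * 1000; // seconds -> ms
    }
    return $results;
}

// Same haystack as the test above: "test 0 -- test 1 -- ... test 4999 -- "
$x = '';
for ($j = 0; $j < 5000; $j++) {
    $x .= "test " . $j . " -- ";
}

foreach (bench_matchers($x, "st 10", 10000) as $name => $ms) {
    printf("%-13s %8.2f ms\n", $name . '()', $ms);
}
```

Every call goes through a closure, which adds the same small overhead to each candidate, so the comparison between them stays fair even though the totals are slightly inflated.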

This didn't accord at all with my expectations. I had expected preg_match() to be the slowest, because it has to compile and run a Perl-compatible (PCRE) regular expression. I had also read that it was slower than eregi(), which I expected to be the fastest of the functions. I ran these tests a number of times to verify that the discrepancy wasn't merely a fluke, and the results were similar each time. So it appears, oddly, that if you're doing a single quick pattern match, it may not matter which of these functions you use, but if you're doing many searches, preg_match() or checking that strpos() does not return false might be your best bet. This is particularly heartening because preg_match() is more versatile, allowing much more complicated matches than any of the other functions.
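When using strpos() as a match test, note that it returns the integer offset of the first match, or false when there is none. A match at offset 0 is therefore falsy, so the check needs a strict comparison with false rather than a plain truthiness test. A minimal sketch; the helper contains() is hypothetical (PHP 8.0 and later also provide str_contains() for exactly this check):

```php
<?php
// Hypothetical helper: true if $needle occurs anywhere in $haystack.
// strpos() returns an int offset (possibly 0) or false, so compare strictly.
function contains(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

$s = "st 10 -- test 11";

var_dump((bool) strpos($s, "st 10")); // bool(false): a match at offset 0 is falsy
var_dump(contains($s, "st 10"));      // bool(true)
var_dump(contains($s, "st 99"));      // bool(false)
```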

I also tested ereg_replace() and preg_replace() for speed. Again, contrary to my expectations, preg_replace() proved faster both in a single replacement and over 100 iterations of test code very similar to that given above:

<table border="1" cellpadding="3" cellspacing="0">
<tr>
  <td align="left" valign="top"> </td>
  <td align="left" valign="top">ereg_replace()</td>
  <td align="left" valign="top">preg_replace()</td>
</tr>
<tr>
  <td align="left" valign="top">One Run</td>
  <td align="left" valign="top">50.16 ms</td>
  <td align="left" valign="top">37.42 ms</td>
</tr>
<tr>
  <td align="left" valign="top">100 Iterations</td>
  <td align="left" valign="top">1383.75 ms</td>
  <td align="left" valign="top">159.75 ms</td>
</tr>
</table>
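The replacement test code itself was not included in the post. A rough sketch of what a 100-iteration preg_replace() timing loop could look like follows; the haystack, the pattern /st 10/, and the replacement string "ST-10" are assumptions modeled on the matching test, and ereg_replace() is omitted since it is not available in PHP 7.0 and later.

```php
<?php
// Haystack modeled on the matching test (an assumption; the original
// replacement test code was not posted).
$x = '';
for ($j = 0; $j < 5000; $j++) {
    $x .= "test " . $j . " -- ";
}

$iterations = 100;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    // Replace every occurrence of "st 10" with a hypothetical marker "ST-10".
    $y = preg_replace('/st 10/', 'ST-10', $x);
}
$elapsedMs = (microtime(true) - $start) * 1000; // seconds -> ms

printf("preg_replace() x %d: %.2f ms\n", $iterations, $elapsedMs);
```

Unlike the matching test, every iteration here scans the whole string, because preg_replace() replaces all occurrences (those of "test 10", "test 100" to "test 109", and "test 1000" to "test 1099").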

I can't help thinking I've overlooked some condition in my testing, as I've always read that the preg functions were slower than the ereg functions. Based on these metrics, however, I've concluded that the preg functions may be the best to use, given their flexibility and apparently superior speed.
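For anyone moving from the ereg functions to the preg functions, the calls in the test code translate fairly directly (ereg(), eregi(), and ereg_replace() were deprecated in PHP 5.3 and removed in PHP 7.0). A minimal sketch of the equivalents: preg patterns need delimiters, case-insensitive matching uses the i modifier, and preg_quote() escapes a literal needle that may contain regex metacharacters. The sample strings are illustrative only.

```php
<?php
$x = "test 9 -- test 10 -- TEST 11 -- ";

// ereg("st 10", $x)  ->  same pattern, wrapped in delimiters
$m1 = preg_match('/st 10/', $x);

// eregi("st 11", $x)  ->  the 'i' modifier makes the match case-insensitive
$m2 = preg_match('/st 11/i', $x);

// ereg_replace("st 10", "ST-10", $x)  ->  preg_replace() with delimiters
$r = preg_replace('/st 10/', 'ST-10', $x);

// Escape a literal needle containing '.', '(' and ')' before matching.
$needle = "1.5 (approx)";
$m3 = preg_match('/' . preg_quote($needle, '/') . '/', "cost 1.5 (approx)");

var_dump($m1, $m2, $r, $m3);
```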