chillibear.com

mod_rewrite

Some tests run against Apache 2.2.?? with a simple bit of mod_rewrite in a .htaccess file.

RewriteEngine on

# Test rewrite
RewriteRule ^(.+)  junk?_START_&q1=$1&_END_  [NC,QSA]

# Final rewrite
RewriteCond %{IS_SUBREQ}   !true                             [NC]
RewriteCond %{SERVER_NAME} (.+)                              [NC]
RewriteRule (.*)  http://%1/apps/mod_rewrite?_R=Final&URI=$1 [NC,L,QSA,NE,PROXY]

We can now pop some tests through these rules. The mod_rewrite file is a PHP script that dumps out some data for debugging purposes.

Whitespace

Putting the following URI fragments into the rewrites:

  • foo%20bar (space encoded as %20)
  • foo+bar (space encoded as +)
  • foo bar (physical whitespace)
  • foo bar (tab)

Against the following rule:

RewriteRule ^(.+)  junk?_START_&q1=$1&_END_  [NC,QSA]

yields:

/apps/mod_rewrite?_R=Final&URI=junk&_START_&q1=foo

So the any-character dot appears to have fallen over at the space, and we even lose the last query string parameter (_END_). The same happens with the physical whitespace and the tab. The plus character, however, is passed through thus:

/apps/mod_rewrite?_R=Final&URI=junk&_START_&q1=foo+bar&_END_

Most other encoded values pass through okay. An encoded ampersand, however, later confuses PHP, which splits the value into a new variable (as was to be expected).
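The difference between the %20 and + cases, and the ampersand split, can be reproduced outside Apache. The following is only a rough Python model, assuming the path is percent-decoded before the rule sees it and that the receiving script splits the query string on raw ampersands. It is not Apache's or PHP's actual code, and the sample query string is made up for illustration.

```python
from urllib.parse import parse_qsl, unquote

# Percent-decoding a URL path: %20 becomes a space, "+" is left alone.
decoded_percent = unquote("foo%20bar")
decoded_plus = unquote("foo+bar")
print(repr(decoded_percent))  # 'foo bar'
print(repr(decoded_plus))     # 'foo+bar'

# An ampersand that reaches the query string unescaped acts as a separator.
# keep_blank_values=True keeps the valueless _START_/_END_ markers visible.
pairs = parse_qsl("_START_&q1=foo&bar&_END_", keep_blank_values=True)
print(pairs)
# [('_START_', ''), ('q1', 'foo'), ('bar', ''), ('_END_', '')]
```

In this model the plus survives because path decoding does not treat it as a space, while the decoded ampersand turns bar into a parameter of its own.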

Testing against the same rule with a B flag added, which escapes non-alphanumeric characters in the backreferences, yields:

/apps/mod_rewrite?_R=Final&URI=junk&_START_&q1=foo%20bar&_END_

So we can see the space has been passed through as a literal %20. Apache's documentation explains why: mod_rewrite unescapes the URL before matching, and B escapes the non-alphanumeric characters in each backreference again before it is substituted. This also explains why physical whitespace is encoded to a %20 and passed through. The ampersand is decoded in the output and ends up once again as a separator. Adding an NE flag along with the B doesn’t seem to alter the output.
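Apache's documentation describes B as escaping the non-alphanumeric characters in backreferences, because mod_rewrite unescapes the URL before matching. A minimal Python sketch of that idea follows. The helper names escape_backreference and rewrite are hypothetical, the upper-case hex digits are an arbitrary choice, and the exact behaviour of the 2.2 build tested here may differ.

```python
def escape_backreference(value: str) -> str:
    """Percent-encode every byte that is not an ASCII letter or digit."""
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


def rewrite(captured: str, use_b: bool) -> str:
    """Build the substitution for ^(.+), optionally escaping $1 as B would."""
    q1 = escape_backreference(captured) if use_b else captured
    return f"junk?_START_&q1={q1}&_END_"


# The pattern captures the already-decoded path "foo bar" in both cases.
print(rewrite("foo bar", use_b=False))  # junk?_START_&q1=foo bar&_END_
print(rewrite("foo bar", use_b=True))   # junk?_START_&q1=foo%20bar&_END_
```

Without the escaping step a raw space lands in the rewritten URL, which fits the truncated output seen earlier. With it, the space travels on as %20.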

Moving to a different rewrite rule where we try to catch our elusive whitespace:

RewriteRule ^(.+)(\s+)(.+)  junk?_START_&q1=$1&q2=$2&q3=$3&_END_  [NC,QSA]

Matching against our %20 encoded whitespace again fails to properly capture the space and we end up with a result like this:

/apps/mod_rewrite?_R=Final&URI=junk&_START_&q1=foo&q2=

The foo+bar obviously doesn’t match the pattern and misses the rule. Other whitespace characters fail as we would expect. Adding a B flag causes the pattern to be properly processed and we end up with a result like this:

/apps/mod_rewrite?_R=Final&URI=junk&_START_&q1=foo&q2=%20&q3=bar&_END_

So we can see the %20 has correctly been matched against the \s whitespace character class.
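The matching step can be checked in isolation. This sketch runs the same pattern through Python's re module, assuming the pattern is applied to the percent-decoded path and that Python's regex engine agrees with PCRE for an expression this simple. The capture helper is hypothetical.

```python
import re

# re.IGNORECASE mirrors the [NC] flag on the rule.
PATTERN = re.compile(r"^(.+)(\s+)(.+)", re.IGNORECASE)


def capture(path: str):
    """Return the three captured groups, or None if the rule would not fire."""
    match = PATTERN.match(path)
    return match.groups() if match else None


print(capture("foo bar"))  # ('foo', ' ', 'bar')
print(capture("foo+bar"))  # None
```

The decoded space is what \s matches. The foo+bar case contains no whitespace at all, so the rule is skipped.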

Cascading

Now that we have some idea of how whitespace behaves, we should see what happens when it is part of a cascaded query string. Suppose we construct some rules like this:

RewriteRule ^(.+)(\s+)(.+)  junk?_START_&q1=$1&q2=$2&q3=$3&_END_  [NC,QSA,B,NE]

RewriteCond %{QUERY_STRING} q2=(.)   [NC]
RewriteRule ^junk           junk?_S_&e1=%1&_E_  [NC,QSA]

Passing our foo%20bar through these rules, we find the e1 parameter has captured a % character, so we know our space character has ended up URI encoded in the query string.
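The single-character capture is easy to confirm against the query string the first rule is expected to produce. This is a sketch only: the query string is copied from the earlier B-flag output, and the second, wider pattern is a hypothetical alternative that does not appear in the rules above.

```python
import re

# Query string as left behind by the first rule with [B] in effect.
query_string = "_START_&q1=foo&q2=%20&q3=bar&_END_"

# The condition's q2=(.) takes exactly one character after "q2=".
single = re.search(r"q2=(.)", query_string, re.IGNORECASE)
print(single.group(1))  # %

# Hypothetical variant: take a whole %XX escape if present, else one character.
whole = re.search(r"q2=(%[0-9a-f]{2}|.)", query_string, re.IGNORECASE)
print(whole.group(1))  # %20
```

A condition that needs the complete encoded character, rather than just evidence that it was encoded, would have to capture all three characters along these lines.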

Written on 10 Feb 2013 and categorised in Apache and NIX, tagged as pcre, regexp, and uri

site copyright Eric Freeman