Here’s a regular expression that matches URLs based on RFC 3986. The example uses PHP and only accepts the http and https schemes.
[cc lang="php"]
// The patterns are lowercase-only, so lowercase the subject before matching.
$scheme = '(https?)://';
$userinfo = '([a-z0-9\-._~]+(:[a-z0-9\-._~]+)?@)?';
// Bracketed IPv6 literal (optionally ending in an IPv4 address), or a host name / IPv4 address.
$host = '(\[([0-9a-f]{1,4}|:)(:[0-9a-f]{0,4}){1,7}((\d{1,3}\.){3}\d{1,3})?\]|[a-z0-9\-]+(\.[a-z0-9\-]+)*)';
$port = '(:\d{1,5})?';
$path = '(/(([a-z0-9\-._~!$&' . "'()*+,;=:@" . ']|%[0-9a-f]{2})+)?)*';
$query = '(\?(([a-z0-9\-._~!$&' . "'()*+,;=:@" . '/?]|%[0-9a-f]{2})+)?)?';
// The # is escaped because # also serves as the regex delimiter.
$fragment = '(\#(([a-z0-9\-._~!$&' . "'()*+,;=:@" . '/?]|%[0-9a-f]{2})+)?)?';
[/cc]
If a fully-qualified URL is required, you can build the regex like this:
[cc lang="php"]
$regex = '#^' . $scheme . $userinfo . $host . $port . $path . $query . $fragment . '$#';[/cc]
If relative URLs should be accepted as well, you can build the regex like this:
[cc lang="php"]
$regex = '#^' . '(' . $scheme . ')?' . $userinfo . '(' . $host . ')?' . $port . '(\.\.)?' . $path . $query . $fragment . '$#';[/cc]
You can then test for a match like this (preg_match() returns 0 when the subject does not match):
[cc lang="php"]
if (preg_match($regex, strtolower($subject)) == 0) …[/cc]
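Putting the pieces together, here is a minimal end-to-end sketch. The helper names `url_regex()` and `is_url()` are hypothetical and not part of the original snippet, and the shared `$chars` variable is just a shorthand for the repeated character class.

```php
<?php
// Hypothetical helper (an assumption, not from the original post):
// builds the absolute or relative URL pattern from the same parts.
function url_regex(bool $allowRelative = false): string
{
    // unreserved + sub-delims + ":" + "@", as used in path, query and fragment
    $chars    = 'a-z0-9\-._~!$&\'()*+,;=:@';
    $scheme   = '(https?)://';
    $userinfo = '([a-z0-9\-._~]+(:[a-z0-9\-._~]+)?@)?';
    $host     = '(\[([0-9a-f]{1,4}|:)(:[0-9a-f]{0,4}){1,7}((\d{1,3}\.){3}\d{1,3})?\]'
              . '|[a-z0-9\-]+(\.[a-z0-9\-]+)*)';
    $port     = '(:\d{1,5})?';
    $path     = '(/(([' . $chars . ']|%[0-9a-f]{2})+)?)*';
    $query    = '(\?(([' . $chars . '/?]|%[0-9a-f]{2})+)?)?';
    $fragment = '(\#(([' . $chars . '/?]|%[0-9a-f]{2})+)?)?'; // \# because # is the delimiter

    $tail = $path . $query . $fragment;
    if ($allowRelative) {
        return '#^(' . $scheme . ')?' . $userinfo . '(' . $host . ')?' . $port
             . '(\.\.)?' . $tail . '$#';
    }
    return '#^' . $scheme . $userinfo . $host . $port . $tail . '$#';
}

// Hypothetical wrapper: lowercases the subject because the pattern is lowercase-only.
function is_url(string $subject, bool $allowRelative = false): bool
{
    return preg_match(url_regex($allowRelative), strtolower($subject)) === 1;
}

var_dump(is_url('http://example.com/path?x=1#top')); // bool(true)
var_dump(is_url('ftp://example.com'));               // bool(false), only http/https
var_dump(is_url('../images/logo.png', true));        // bool(true), relative allowed
```

Lowercasing the subject keeps the character classes short; adding the `i` modifier after the closing delimiter would be an alternative if the original casing has to be preserved.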
If you’re matching the Referer header field as specified in the HTTP/1.1 header field definitions, a separate regex is needed, since the Referer value may be a relative URI and must not include a fragment.
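One way to get such a regex, sketched here under the assumption that the relative-URL pattern is otherwise acceptable for Referer values, is to build it without the fragment part. The names `referer_regex()` and `is_referer()` are hypothetical.

```php
<?php
// Hypothetical helper (an assumption, not from the original post):
// the relative-URL pattern with the fragment part left out.
function referer_regex(): string
{
    $chars    = 'a-z0-9\-._~!$&\'()*+,;=:@';
    $scheme   = '(https?)://';
    $userinfo = '([a-z0-9\-._~]+(:[a-z0-9\-._~]+)?@)?';
    $host     = '(\[([0-9a-f]{1,4}|:)(:[0-9a-f]{0,4}){1,7}((\d{1,3}\.){3}\d{1,3})?\]'
              . '|[a-z0-9\-]+(\.[a-z0-9\-]+)*)';
    $port     = '(:\d{1,5})?';
    $path     = '(/(([' . $chars . ']|%[0-9a-f]{2})+)?)*';
    $query    = '(\?(([' . $chars . '/?]|%[0-9a-f]{2})+)?)?';

    // No $fragment here, so any "#..." suffix makes the match fail.
    return '#^(' . $scheme . ')?' . $userinfo . '(' . $host . ')?' . $port
         . '(\.\.)?' . $path . $query . '$#';
}

// Hypothetical wrapper: lowercases the value because the pattern is lowercase-only.
function is_referer(string $value): bool
{
    return preg_match(referer_regex(), strtolower($value)) === 1;
}

var_dump(is_referer('http://example.com/page?x=1'));     // bool(true)
var_dump(is_referer('/search?q=php'));                   // bool(true), relative URI
var_dump(is_referer('http://example.com/page#section')); // bool(false), fragment present
```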