I have a website in PHP that uses include() to embed the content into a template. The page to load is given in a GET parameter; I add ".php" to the end of the parameter an
Define an explicit list of your pages in the source code and check the input against it. Yes, it's more work, but it makes it very clear what is allowed and what is not. For example:
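$AVAILABLE_PAGES = array('home', 'news', /* ... */);
// Turn the list into a lookup table keyed by page name.
$AVAILABLE_PAGES = array_fill_keys($AVAILABLE_PAGES, 1);
// A missing parameter is treated as an unknown page.
$page = isset($_GET['page']) ? $_GET['page'] : '';
// is_string() rejects array input such as ?page[]=x before it is used as a key.
if (!is_string($page) || !isset($AVAILABLE_PAGES[$page])) {
    header("HTTP/1.0 404 Not Found");
    die('Page not found.');
}
include "pages/$page.php";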
Don't "sanitize": attacks are specific to the use of data, not the source. Escape values as you output them instead. See also my answer to "What’s the best method for sanitizing user input with PHP?"
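To illustrate "escape as you output", here is a minimal sketch for the HTML case. The helper name e() and the sample value are made up for this example; htmlspecialchars() is the standard PHP function doing the work. Other output contexts (SQL, shell, URLs) need their own escaping.

```php
<?php
// Hypothetical helper: escape a value at the point where it is written into HTML.
function e(string $value): string
{
    // ENT_QUOTES also escapes single quotes, so the result is safe inside attributes.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Stand-in for an untrusted value such as $_GET['page'].
$page = '<script>alert(1)</script>';

// The raw value is kept as-is and escaped only when it is output.
$html = '<p>Requested page: ' . e($page) . '</p>';
echo $html, "\n";
```

The stored or passed-around value is never altered; only the copy that goes into the HTML is escaped.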
$page = preg_replace('/[^-a-zA-Z0-9_]/', '', $_GET['page']);
This is probably the quickest way to sanitize the parameter. It strips every character that is not a letter, digit, underscore, or dash, so dots and slashes (and with them path traversal such as ../) cannot get through.
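A variation on the same idea, as a rough sketch: instead of silently stripping characters, reject any value that does not already match the character class, and confirm the file exists before including it. The function name resolve_page() and the $baseDir parameter are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: map a raw ?page= value to a file under $baseDir, or null.
function resolve_page($raw, string $baseDir): ?string
{
    // Reject missing or non-string input (e.g. ?page[]=x arrives as an array).
    if (!is_string($raw) || $raw === '') {
        return null;
    }
    // Same character class as the preg_replace() above, used to reject
    // rather than strip. The D modifier stops "$" matching before a
    // trailing newline.
    if (preg_match('/^[-a-zA-Z0-9_]+$/D', $raw) !== 1) {
        return null;
    }
    // Only include files that actually exist.
    $path = $baseDir . '/' . $raw . '.php';
    return is_file($path) ? $path : null;
}
```

A caller could send a 404 when the function returns null and include the returned path otherwise. Rejecting instead of stripping means a request like ?page=../home fails outright rather than being quietly rewritten to home.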