Despite using PHP for years, I\'ve never really learnt how to use expressions to truncate strings properly... which is now biting me in the backside!
Can anyone prov
Only use strip_tags()
, that would get rid of the tags and left only the desired text between them
You could use substring in combination with stringpos, eventhough this is not a very nice approach.
Check: PHP Manual - String functions
Another way would be to write a regular expression to match your criteria. But in order to get your problem solved quickly the string functions will do...
EDIT: I underestimated the audience. ;) Go ahead with the regexes... ^^
$str = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $str)
What about something like this, considering you might want to re-use it with other href
s :
$str = '<a href="link.html">text</a>';
$result = preg_replace('#(<a[^>]*>).*?(</a>)#', '$1$2', $str);
var_dump($result);
Which will get you :
string '<a href="link.html"></a>' (length=24)
(I'm considering you made a typo in the OP ? )
If you don't need to match any other href, you could use something like :
$str = '<a href="link.html">text</a>';
$result = preg_replace('#(<a href="link.html">).*?(</a>)#', '$1$2', $str);
var_dump($result);
Which will also get you :
string '<a href="link.html"></a>' (length=24)
As a sidenote : for more complex HTML, don't try to use regular expressions : they work fine for this kind of simple situation, but for a real-life HTML portion, they don't really help, in general : HTML is not quite "regular" "enough" to be parsed by regexes.
Using SimpleHTMLDom:
<?php
// example of how to modify anchor innerText
include('simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('http://www.example.com/');
//set innerText to null for each anchor
foreach($html->find('a') as $e) {
$e->innerText = null;
}
// dump contents
echo $html;
?>
You don't need to capture the tags themselves. Just target the text between the tags and replace it with an empty string. Super simple.
Demo of both techniques
Code:
$string = '<a href="link.html">text</a>';
echo preg_replace('/<a[^>]*>\K[^<]*/', '', $string);
// the opening tag--^^^^^^^^ ^^^^^-match everything before the end tag
// ^^-restart fullstring match
Output:
<a href="link.html"></a>
Or in fringe cases when the link text contains a <
, use this: ~<a[^>]*>\K.*?(?=</a>)~
This avoids the expense of capture groups using a lazy quantifier, the fullstring restarting \K
and a "lookahead".
Older & wiser:
If you are parsing valid html, you should use a dom parser for stability/accuracy. Regex is DOM-ignorant, so if there is a tag attribute value containing a >
, my snippet will fail.
As a narrowly suited domdocument solution to offer some context:
$dom = new DOMDocument;
$dom->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD); // 2nd params to remove DOCTYPE);
$dom->getElementsByTagName('a')[0]->nodeValue = '';
echo $dom->saveHTML();