First part of the question: <p> tags
I have a string containing text with unnecessary line breaks caused by <p> tags, for example:
hi ev
Try using str_replace:
$content = str_replace(array("<p> </p>\n", " <br />\n"), array('', ''), $content);
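To see exactly what the str_replace call does, here is a small sketch run on an assumed sample, since the original example string did not survive. str_replace only removes exact, case-sensitive matches, so the sample is written with each empty paragraph and each space-prefixed <br /> followed by a newline.

```php
<?php
// Hypothetical sample input (the original example string was not preserved).
// Each empty paragraph and each space-prefixed <br /> is followed by "\n",
// because str_replace only removes exact, case-sensitive matches.
$content = "<p>Hello</p>\n<p> </p>\n<p> </p>\n<p>World</p>\nline one <br />\nline two";

// Replace every occurrence of each search string with ''.
$cleaned = str_replace(array("<p> </p>\n", " <br />\n"), array('', ''), $content);

echo $cleaned;
```

Note that dropping " <br />\n" outright glues the surrounding lines together ("line oneline two"), so you may prefer to replace it with a single space or newline instead of ''.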
Or, to do it with a regex:
$content = preg_replace('/((<p\s*\/?>\s*) (<\/p\s*\/?>\s*))+/im', "<p> </p>\n", $content);
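A minimal check of this pattern on a made-up input. Unlike the str_replace version, it does not delete the empty paragraphs: it collapses each run of them into a single "<p> </p>\n".

```php
<?php
// Hypothetical input containing a run of three empty paragraphs.
$content = "<p>Intro</p>\n<p> </p>\n<p> </p>\n<p> </p>\n<p>Body</p>";

// One or more "<p> </p>" pairs, each followed by optional whitespace,
// are replaced together by a single "<p> </p>\n".
$result = preg_replace('/((<p\s*\/?>\s*) (<\/p\s*\/?>\s*))+/im', "<p> </p>\n", $content);

echo $result;
```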
And for <br> tags:
$content = preg_replace('/( (<br\s*\/?>\s*)|(<br\s*\/?>\s*))+/im', "<br />\n", $content);
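The same idea for line breaks, again sketched on a made-up input. The i flag is what lets <BR /> match the lower-case <br in the pattern, and the \s* after each tag absorbs the newline and any spaces before the next tag.

```php
<?php
// Hypothetical input: three consecutive line-break tags in different spellings.
$content = "one<br>\n<br/>\n <BR />\ntwo";

// Any run of <br>, <br/>, <br /> (optionally preceded by a space and followed
// by whitespace) collapses into a single "<br />\n".
$result = preg_replace('/( (<br\s*\/?>\s*)|(<br\s*\/?>\s*))+/im', "<br />\n", $content);

echo $result;
```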
EDIT: Here's why your regex works (hopefully this helps you understand it a bit :) ):
/((\\n\s*))+/im
^   ^  ^^  ^^^
|   |  ||  ||`-- Flags (i and m)
|   |  ||  |`--- Regex end delimiter
|   |  ||  `---- One or more of the preceding group
|   |  |`------- Zero or more of the preceding item
|   |  `-------- Whitespace character (\s)
|   `----------- Newline character (\\n, escaped for PHP)
`--------------- Regex start delimiter
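To see the pattern from the question in action, here is a sketch that assumes it is used with preg_replace and a replacement of "\n" (the question does not show the replacement string). Because \s also matches newlines and spaces, each run of blank lines collapses to one newline, and any indentation at the start of the following line is removed as well.

```php
<?php
// The pattern from the question. The "\n" replacement is an assumption.
$pattern = '/((\\n\s*))+/im';

// Hypothetical input with blank lines (some containing spaces or tabs)
// and an indented last line.
$text = "first line\n\n   \n\t\nsecond line\n    third line";

// Every newline plus the whitespace after it becomes a single "\n".
$result = preg_replace($pattern, "\n", $text);

var_dump($result);
```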
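In PHP, every pattern passed to the preg_* functions must start and end with the same delimiter character (or with a matching pair of brackets such as { and }). Here the delimiter is the forward slash, which is also why every / inside the <p> and <br> patterns above is escaped as \/.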
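A short illustration of delimiters, using an assumed input: the same <br> pattern written with /, # and a {} bracket pair. Only the / version has to escape the slash inside the tag.

```php
<?php
$html = "<p>a</p><br/><br />";

// With "/" as the delimiter, a "/" inside the pattern must be escaped as "\/".
$slashDelimited = preg_match_all('/<br\s*\/?>/i', $html);

// With "#" as the delimiter, "/" is an ordinary character.
$hashDelimited = preg_match_all('#<br\s*/?>#i', $html);

// Bracket-style delimiters use the opening and closing bracket of a pair.
$braceDelimited = preg_match_all('{<br\s*/?>}i', $html);

var_dump($slashDelimited, $hashDelimited, $braceDelimited);
```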
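The ( and ) characters group part of the pattern into a subpattern, so that a quantifier such as + applies to the whole group instead of just the character before it. A group also captures the text it matched, but preg_replace still replaces the entire match. (The doubled parentheses in ((\\n\s*)) are redundant; one pair would do.)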
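A quick comparison of a pattern with and without a group, on an assumed input string. It also shows that preg_replace replaces the whole match, while the group only records (captures) the text of its last repetition.

```php
<?php
$text = "a<br><br><br>b";

// Without a group, + applies only to ">", so each "<br>" is a separate match.
$ungrouped = preg_match_all('/<br>+/', $text);

// With a group, + applies to the whole "<br>", so the three tags form one match.
$grouped = preg_match_all('/(<br>)+/', $text);

// preg_replace replaces the entire match, not just the captured group.
$collapsed = preg_replace('/(<br>)+/', '<br>', $text);

// $m[0] is the whole match; $m[1] holds the group's last repetition.
preg_match('/(<br>)+/', $text, $m);
```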
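The newline character is written \n. The extra backslash in \\n is PHP string escaping, not regex escaping: inside a PHP string literal, \\ becomes a single backslash, so PCRE receives \n. In a single-quoted string, '\n' works just as well, because single quotes leave \n untouched.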
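A sketch showing that the extra backslash is about PHP string literals rather than the regex itself: '\\n' and '\n' are the same two characters in single quotes, and PCRE reads either as a newline. The input string is an assumed example.

```php
<?php
// In single quotes, \\ becomes one backslash and \n is left as-is,
// so both literals hold the same two characters: a backslash and "n".
$doubled = '\\n';
$single  = '\n';
var_dump($doubled === $single); // bool(true)

$text = "one\ntwo"; // double quotes: contains a real newline character

// PCRE receives \n in both cases and treats it as "newline".
$a = preg_match('/\\n/', $text);
$b = preg_match('/\n/', $text);

// In double quotes, "\n" is already a newline, which PCRE matches literally.
$c = preg_match("/\n/", $text);

// Four backslashes in single quotes reach PCRE as \\n: a literal backslash
// followed by "n", which this subject does not contain.
$literal = preg_match('/\\\\n/', $text);
```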
\s matches a single whitespace character (space, tab, newline, carriage return and so on), not a string. The * character means zero or more of the preceding item, so \s* matches a run of whitespace of any length, including none.
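A small demonstration of \s and \s* on assumed inputs, using preg_match_all, which (when called without a $matches argument) just returns the number of matches.

```php
<?php
// \s matches one whitespace character: here a space, a tab and a newline.
$count = preg_match_all('/\s/', "a b\tc\nd");

// \s* allows any amount of whitespace, including none,
// so the same pattern accepts "<br>", "<br/>" and "<br   />".
$tags = preg_match_all('/<br\s*\/?>/', "<br><br/><br   />");

var_dump($count, $tags);
```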
The + symbol matches ONE or more of the preceding expression. Here the preceding expression is the group (\\n\s*), so as long as that group occurs at least once, preg_replace finds a match, and consecutive occurrences are matched together.
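A comparison of a pattern with and without +, plus a check on the question's pattern, using assumed inputs. Since \s already matches "\n", a single \n\s* swallows a whole blank run, so in that particular pattern the outer + makes no difference to the result.

```php
<?php
$text = "a\n\n\nb";

// Without +, each newline is a separate match.
$separate = preg_match_all('/\n/', $text);
// With +, the consecutive newlines form one match.
$together = preg_match_all('/\n+/', $text);

// \n\s* alone already consumes " \n\t\n", so both calls give "a\nb".
$blank = "a\n \n\t\nb";
$withPlus    = preg_replace('/(\n\s*)+/', "\n", $blank);
$withoutPlus = preg_replace('/\n\s*/', "\n", $blank);
```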
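The flags i and m mean case-insensitive (not really needed for a newline pattern, though in the <p> and <br> patterns it lets them also match <P> and <BR>) and multiline. Multiline does not let the pattern span several lines of code; it makes ^ and $ match at the start and end of every line in the subject string, not just at the start and end of the whole string. This pattern uses neither ^ nor $, so m has no effect on it.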
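A sketch of what each flag actually changes, using assumed inputs: i makes <br match <BR>, and m only affects where ^ and $ can match.

```php
<?php
$html = "<BR>\n<br />";

// Without i, "<br" does not match the upper-case "<BR>".
$caseSensitive   = preg_match_all('/<br\s*\/?>/', $html);
$caseInsensitive = preg_match_all('/<br\s*\/?>/i', $html);

// Without m, ^ matches only at the very start of the string ("first").
// With m, ^ also matches right after each "\n" ("first" and "second").
$lines    = "first\nsecond";
$withoutM = preg_match_all('/^\w+/', $lines);
$withM    = preg_match_all('/^\w+/m', $lines);

var_dump($caseSensitive, $caseInsensitive, $withoutM, $withM);
```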