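Use an HTML parser if at all possible; there are many available for Java.
You can also use regex, as many people do, but this is generally not advisable unless your processing is very simple: a regex has no notion of document structure, so nested elements, comments, and quoted attribute values containing ">" easily trip it up.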
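The sketch below contrasts the two approaches. To stay self-contained, it uses the parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator). That parser targets old HTML 3.2, so for real pages you would more likely pick one of the dedicated libraries from the questions below. The class and method names (HtmlTextExtractor, extractText, stripTagsNaively) are hypothetical, made up for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class contrasting parser-based and regex-based processing.
public class HtmlTextExtractor {

    // Collects the text nodes reported by the JDK's built-in (HTML 3.2-era) Swing parser.
    public static String extractText(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate consecutive text chunks with a space.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try (Reader reader = new StringReader(html)) {
            // ignoreCharSet = true: do not react to <meta> charset declarations.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Naive regex stripping: deletes anything from '<' to the next '>'.
    public static String stripTagsNaively(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // The title attribute legitimately contains a '>' inside quotes.
        String html = "<p title=\"a > b\">Hello <b>world</b></p>";
        System.out.println(extractText(html));
        // The regex stops at the '>' inside the attribute, leaving: ' b">Hello world'
        System.out.println(stripTagsNaively(html));
    }
}
```

The parser reads the quoted attribute value as a unit and reports only the text nodes. The regex has no idea it is inside an attribute, so attribute debris leaks into its output.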
Related questions
- Java HTML Parsing
- Which Html Parser is best?
- Any good Java HTML parsers?
- recommendations for a java HTML parser/editor
- What HTML parsing libraries do you recommend in Java
Text extraction:
- Text Extraction from HTML Java
- Text extraction with java html parsers
Tag stripping:
- Stripping HTML tags in Java
- How to strip HTML attributes except “src” and “alt” in JAVA
- Removing HTML from a Java String