How to remove control characters from a Java string?

没有蜡笔的小新 2020-12-15 16:41

I have a string coming from the UI that may contain control characters, and I want to remove all of them except carriage returns and line feeds.

7 Answers
  •  猫巷女王i
    2020-12-15 16:54

    I'm using Selenium to test web screens. I use Hamcrest asserts and matchers to search the page source for different strings based on various conditions.

    String pageSource = browser.getPageSource();
    assertThat("Text not found!", pageSource, containsString(text));
    

    This works just fine with an IE or Firefox driver, but it fails with the HtmlUnitDriver, which formats the page source with tabs, carriage returns, and other control characters. I am using a riff on Nidhish Krishnan's ingenious answer above. If I use Nidhish's solution "out of the box," I am left with extra spaces, so I added a private method named filterTextForComparison:

    String pageSource = filterTextForComparison(browser.getPageSource());
    assertThat("Text not found!", pageSource, 
            containsString(filterTextForComparison(text)));
    

    And the function:

    /**
     * Filter out any characters embedded in the text that will interfere with
     * comparing Strings.
     * 
     * @param text
     *            the text to filter.
     * @return the text with any extraneous character removed.
     */
    private String filterTextForComparison(String text) {
    
        String filteredText = text;
    
        if (filteredText != null) {
            filteredText = filteredText.replaceAll("\\p{Cc}", " ").replaceAll("\\s{2,}", " ");
        }
    
        return filteredText;
    }
    

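    First, the method replaces each control character with a space; then it collapses every run of two or more whitespace characters into a single space. I tried doing everything at once with "\p{Cc}+?", but that did not handle a tab followed by a space ("\t "): only the tab is replaced, so two spaces are left behind instead of one.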
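
    For the original question (strip control characters but keep carriage returns and line feeds), one possible approach is a character-class intersection in the regex. The sketch below is only an illustration: the class name ControlCharFilter and both method names are made up for this example, and it assumes that "control character" means Unicode category Cc. It also repeats the two-step filter from above to show why the single-pass pattern leaves an extra space.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
class ControlCharFilter {

    // Control characters (category Cc) except carriage return and line feed.
    private static final Pattern CONTROL_EXCEPT_CRLF =
            Pattern.compile("[\\p{Cc}&&[^\\r\\n]]");

    // Any control character, as used in filterTextForComparison.
    private static final Pattern CONTROL = Pattern.compile("\\p{Cc}");

    // Two or more whitespace characters in a row.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s{2,}");

    /** Deletes control characters but keeps \r and \n. */
    static String removeControlExceptCrLf(String text) {
        return text == null ? null : CONTROL_EXCEPT_CRLF.matcher(text).replaceAll("");
    }

    /** Two-step filter: control characters to spaces, then collapse whitespace runs. */
    static String filterForComparison(String text) {
        if (text == null) {
            return null;
        }
        String spaced = CONTROL.matcher(text).replaceAll(" ");
        return WHITESPACE_RUN.matcher(spaced).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "a\t b\u0007\r\nc";

        // Tab and bell are removed; \r\n is preserved.
        System.out.println(removeControlExceptCrLf(input).equals("a b\r\nc"));

        // Every control character becomes whitespace, then runs are collapsed.
        System.out.println(filterForComparison(input).equals("a b c"));

        // Single pass: only the tab is replaced, so two spaces remain.
        System.out.println("a\t b".replaceAll("\\p{Cc}+?", " ").equals("a  b"));
    }
}
```

    Note that removeControlExceptCrLf deletes the characters outright, while filterForComparison turns them into spaces, so the first suits cleaning user input and the second suits loose text comparison.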
