Detect EOL type using PHP

南方客 2020-12-05 11:39

Reference: This is a self-answered question. It was meant to share the knowledge, Q&A style.

How do I detect the type of end-of-line character in PHP?

6 answers
  •  长情又很酷
    2020-12-05 12:26

    /**
     * Detects the end-of-line character of a string.
     * @param string $str The string to check.
     * @param string $default Default EOL (if not detected).
     * @return string The detected EOL, or default one.
     */
    function detectEol($str, $default=''){
        // Candidates are raw byte sequences. In a double-quoted PHP string the
        // escape is "\xHH"; "\0x0D" would be a NUL byte followed by the text "x0D".
        // Two-character sequences come first so they win ties against their parts.
        static $eols = array(
            "\x0D\x0A",     // [ASCII] CR+LF: Windows, TOPS-10, RT-11, CP/M, MP/M, DOS, Atari TOS, OS/2, Symbian OS, Palm OS
            "\x0A\x0D",     // [ASCII] LF+CR: BBC Acorn, RISC OS spooled text output.
            "\x0A",         // [ASCII] LF: Multics, Unix, Unix-like, BeOS, Amiga, RISC OS
            "\x0B",         // [ASCII] VT: Vertical Tab, U+000B
            "\x0C",         // [ASCII] FF: Form Feed, U+000C
            "\x0D",         // [ASCII] CR: Commodore 8-bit, BBC Acorn, TRS-80, Apple II, Mac OS <=v9, OS-9
            "\x1E",         // [ASCII] RS: QNX (pre-POSIX)
            //"\x76",       // [?????] NEWLINE: ZX80, ZX81 [DEPRECATED]
            "\x15",         // [EBCDIC] NEL: OS/390, OS/400
            "\xC2\x85",     // [UNICODE, as UTF-8] NEL: Next Line, U+0085
            "\xE2\x80\xA8", // [UNICODE, as UTF-8] LS: Line Separator, U+2028
            "\xE2\x80\xA9", // [UNICODE, as UTF-8] PS: Paragraph Separator, U+2029
        );
        $cur_cnt = 0;
        $cur_eol = $default;
        foreach($eols as $eol){
            if(($count = substr_count($str, $eol)) > $cur_cnt){
                $cur_cnt = $count;
                $cur_eol = $eol;
            }
        }
        return $cur_eol;
    }
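
The counting loop above tallies every candidate independently, so a CR+LF break is also counted as one CR and one LF. As a possible alternative (not part of the original answer), the sketch below counts each break exactly once with a single regular-expression pass. The function name `detectEolSinglePass` is hypothetical, it only covers the CR/LF family, and it assumes PHP 7.3+ for `array_key_first()`.

```php
<?php
/**
 * Hypothetical helper: tallies CR/LF-style line breaks in one pass,
 * so each break contributes to exactly one counter.
 */
function detectEolSinglePass(string $str, string $default = ''): string
{
    // Alternation order matters: two-byte sequences are tried before single bytes.
    if (!preg_match_all('/\r\n|\n\r|\n|\r/', $str, $matches)) {
        return $default; // no line break found (or the regex failed)
    }
    // Map each distinct break sequence to how often it occurred.
    $counts = array_count_values($matches[0]);
    arsort($counts); // most frequent first; tie order is not guaranteed before PHP 8
    return (string) array_key_first($counts);
}

// Two CR+LF breaks and one lone LF: CR+LF is the majority.
echo bin2hex(detectEolSinglePass("one\r\ntwo\r\nthree\n")), "\n"; // prints 0d0a
```

Mixed sequences such as LF followed by CR+LF remain ambiguous here, since the scanner reads the first two bytes as LF+CR.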
    

    Notes:

    • The function still needs to check the encoding of the string, because the byte sequence for a given line break depends on it.
    • It also needs to somehow know that we may be on an exotic system like ZX8x (since ASCII x76 is a regular letter). @radu raised a good point; in my case, it's not worth the effort to handle ZX8x systems nicely.
    • Should I split the function into two: mb_detect_eol() (multibyte) and detect_eol()?
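
One way the encoding check and the multibyte variant from these notes might look is sketched below. It is only an illustration under stated assumptions: the mbstring extension is loaded, the caller already knows the encoding name, and `mbDetectEol` is a hypothetical name, not an existing PHP function. The input is validated, converted to UTF-8 so that each candidate has one known byte sequence, and the winner is converted back to the caller's encoding.

```php
<?php
/**
 * Hypothetical multibyte wrapper: detects the dominant line break of a
 * string in a known encoding and returns it in that same encoding.
 */
function mbDetectEol(string $str, string $encoding = 'UTF-8', string $default = ''): string
{
    // Refuse input that is not valid in the claimed encoding.
    if (!mb_check_encoding($str, $encoding)) {
        return $default;
    }
    // Work in UTF-8 so the candidates below have fixed byte sequences.
    $utf8 = mb_convert_encoding($str, 'UTF-8', $encoding);

    // Reduced candidate list for the sketch; CR+LF first so it wins ties.
    $candidates = array(
        "\r\n",         // CR+LF
        "\n",           // LF
        "\r",           // CR
        "\xC2\x85",     // NEL, U+0085
        "\xE2\x80\xA8", // LS, U+2028
        "\xE2\x80\xA9", // PS, U+2029
    );

    $best = null;
    $bestCount = 0;
    foreach ($candidates as $eol) {
        $count = substr_count($utf8, $eol);
        if ($count > $bestCount) {
            $bestCount = $count;
            $best = $eol;
        }
    }
    if ($best === null) {
        return $default; // no candidate occurred
    }
    // Hand the result back in the caller's encoding.
    return mb_convert_encoding($best, $encoding, 'UTF-8');
}
```

For a UTF-16LE input, for example, a detected CR+LF comes back as the four bytes 0D 00 0A 00, so it can be used directly with the original string.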
