I need to extract the sequence of equal chars in a text.
For example:
The string \"aaaBbbcccccccDaBBBzcc11211\" should be converted to a list of strings
Here a completely generic implementation:
let group xs =
let folder x = function
| [] -> [[x]]
| (h::t)::ta when h = x -> (x::h::t)::ta
| acc -> [x]::acc
Seq.foldBack folder xs []
This function has the type seq<'a> -> 'a list list when 'a : equality, so works not only on strings, but on any (finite) sequence of elements, as long as the element type supports equality comparison.
Used with the input string in the OP, the return value isn't quite in the expected shape:
> group "aaaBbbcccccccDaBBBzcc11211";;
val it : char list list =
[['a'; 'a'; 'a']; ['B']; ['b'; 'b']; ['c'; 'c'; 'c'; 'c'; 'c'; 'c'; 'c'];
['D']; ['a']; ['B'; 'B'; 'B']; ['z']; ['c'; 'c']; ['1'; '1']; ['2'];
['1'; '1']]
Instead of a string list, the return value is a char list list. You can easily convert it to a list of strings using a map:
> group "aaaBbbcccccccDaBBBzcc11211" |> List.map (List.toArray >> System.String);;
val it : System.String list =
["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
This takes advantage of the String constructor overload that takes a char[] as input.
As initially stated, this implementation is generic, so can also be used with other types of lists; e.g. integers:
> group [1;1;2;2;2;3;4;4;3;3;3;0];;
val it : int list list = [[1; 1]; [2; 2; 2]; [3]; [4; 4]; [3; 3; 3]; [0]]