I need to convert fields like this:
{
    "_id" : ObjectId("576fd6e87d33ed2f37a6d526"),
    "phoneme" : "JH OY1 N Z"
}
into an array.
Suppose that the documents in our collection look like this:
{ "phoneme" : "JH OY1 N Z" }
{ "phoneme" : "foobar" }
In MongoDB 3.4+, we can use the $split operator to divide the field value into an array of substrings.
To split a string into an array of characters, we need to apply a $substrCP expression to every character index of the string using the $map operator.
The array of indexes is simply all integers from 0 to the string's length minus one, which we can generate using the $range and $strLenCP operators.
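To make the index-based approach concrete, here is a minimal sketch in plain JavaScript that mirrors what the operators do. It is not MongoDB code: the helper names (range, strLenCP, substrCP, toCharArray, toSubstrArray) are made up for illustration and only imitate the behaviour of the corresponding operators as described above.

```javascript
// Plain JavaScript imitation of the aggregation logic; not MongoDB code.
// All function names here are hypothetical helpers for illustration.

// Imitates $range: integers from start up to, but not including, end.
function range(start, end) {
  const out = [];
  for (let i = start; i < end; i++) out.push(i);
  return out;
}

// Imitates $strLenCP: length in Unicode code points, not UTF-16 units.
function strLenCP(s) {
  return Array.from(s).length;
}

// Imitates $substrCP: `count` code points starting at code point `start`.
function substrCP(s, start, count) {
  return Array.from(s).slice(start, start + count).join("");
}

// Equivalent of $map over $range, taking one code point per index.
function toCharArray(s) {
  return range(0, strLenCP(s)).map((i) => substrCP(s, i, 1));
}

// Equivalent of $split with a single space as the delimiter.
function toSubstrArray(s) {
  return s.split(" ");
}

console.log(toCharArray("JH OY1 N Z"));
console.log(toSubstrArray("JH OY1 N Z"));
```

The point of going through code points rather than bytes is that a multi-byte character still ends up as a single array element.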
We use the $addFields pipeline stage to add the new fields to the initial documents. To persist the result, we can either create a view or overwrite our collection using the $out aggregation pipeline stage.
[
    {
        "$addFields":{
            "arrayOfPhonemeChar":{
                "$map":{
                    "input":{
                        "$range":[
                            0,
                            {
                                "$strLenCP":"$phoneme"
                            }
                        ]
                    },
                    "in":{
                        "$substrCP":[
                            "$phoneme",
                            "$$this",
                            1
                        ]
                    }
                }
            },
            "phonemeSubstrArray":{
                "$split":[
                    "$phoneme",
                    " "
                ]
            }
        }
    }
]
This pipeline yields documents that look like this:
{
    "phoneme" : "JH OY1 N Z",
    "arrayOfPhonemeChar" : ["J", "H", " ", "O", "Y", "1", " ", "N", " ", "Z"],
    "phonemeSubstrArray" : ["JH", "OY1", "N", "Z"]
},
{
    "phoneme" : "foobar",
    "arrayOfPhonemeChar" : ["f", "o", "o", "b", "a", "r"],
    "phonemeSubstrArray" : ["foobar"]
}
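As for making the result persistent, the two options mentioned above could be sketched roughly as follows. This is only an illustration: the collection name "phonemes", the view name "phonemesSplit" and the helper withOut are assumptions, not names from the question, and the shell calls are shown as comments because they need a running MongoDB 3.4+ instance.

```javascript
// The same $addFields stage as in the pipeline above, in compact form.
const addArrays = {
  $addFields: {
    arrayOfPhonemeChar: {
      $map: {
        input: { $range: [0, { $strLenCP: "$phoneme" }] },
        in: { $substrCP: ["$phoneme", "$$this", 1] },
      },
    },
    phonemeSubstrArray: { $split: ["$phoneme", " "] },
  },
};

// Hypothetical helper: append an $out stage so that the aggregation
// result is written to `targetCollection`, replacing its contents.
function withOut(targetCollection) {
  return [addArrays, { $out: targetCollection }];
}

// In the mongo shell, assuming a collection named "phonemes":
//   db.phonemes.aggregate(withOut("phonemes"));               // overwrite the collection
//   db.createView("phonemesSplit", "phonemes", [addArrays]);  // or expose a read-only view
console.log(JSON.stringify(withOut("phonemes"), null, 2));
```

A view leaves the stored documents untouched and computes the arrays on every read, whereas $out replaces the target collection's contents, so it is worth trying it on a copy first.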