Question
I have a large list of phrases such as
"Nola jumped off the cliff"
"Leroy jumped off the cliff"
"Nola jumped off the couch"
"Leroy lept off the couch"
I need to find each position in a phrase where the word differs and add that word to a node, which is a list of the words that can be used at that position in a phrase. So we would end up with:
"Node1(1) Node2(1) off the Node3(1)"
"Node1(2) Node2(1) off the Node3(1)"
...etc
Here node 1 represents the list of names (Nola, Leroy), node 2 represents the list of actions (jumped, lept), and node 3 represents the list of locations (cliff, couch).
The idea is to take a list of phrases and have it automatically create the nodes and fill them with the words that can be used at each node in a phrase.
So, first, how would I generate the list of phrase nodes? I haven't been able to figure out how to compare two sentences and see whether they are exactly alike except for one word.
Second, once I have the nodes set up, what would be the best way to go through all the combinations of the nodes to come up with new phrases? (I hope that makes sense.)
Answer 1:
Nice one, I like it. Since you tagged your question with C#, I wrote the answer in C# as well.
A fast way to get the words that differ between two phrases:
using System.Linq;

string phrase1 = "Nola jumped off the cliff";
string phrase2 = "Juri jumped off the coach";
// Split the phrases into word arrays
var phrase1Words = phrase1.Split(' ');
var phrase2Words = phrase2.Split(' ');
// Find the intersection of the two arrays (the words that appear in both).
// Note: Intersect and Except are set operations, so they ignore word positions and duplicates.
var wordsInPhrase1and2 = phrase1Words.Intersect(phrase2Words);
// The number of words in phrase1 that do not appear in phrase2
int wordDelta = phrase1Words.Count() - wordsInPhrase1and2.Count();
// Find the differing words
var wordsOnlyInPhrase1 = phrase1Words.Except(wordsInPhrase1and2);
var wordsOnlyInPhrase2 = phrase2Words.Except(wordsInPhrase1and2);
Instead of looping over the arrays and checking each element yourself, you can save time by using the built-in LINQ methods such as Intersect and Except.
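Since Intersect and Except treat the phrases as sets, they don't tell you where the phrases differ. If you need "exactly alike except for one word", a position-by-position comparison may be closer to what the question asks for. Here is a minimal sketch of that idea; the class and method names (PhraseDiff, DifferingPositions, DifferByOneWord) are made up for illustration, and it assumes words are separated by single spaces:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PhraseDiff
{
    // Hypothetical helper: returns the zero-based positions at which the two
    // phrases differ, or null when the phrases have different word counts.
    public static List<int> DifferingPositions(string a, string b)
    {
        var wordsA = a.Split(' ');
        var wordsB = b.Split(' ');
        if (wordsA.Length != wordsB.Length) return null;

        return Enumerable.Range(0, wordsA.Length)
            .Where(i => wordsA[i] != wordsB[i])
            .ToList();
    }

    // True when the phrases are identical except for exactly one word.
    public static bool DifferByOneWord(string a, string b)
    {
        var diff = DifferingPositions(a, b);
        return diff != null && diff.Count == 1;
    }
}
```

For example, PhraseDiff.DifferByOneWord("Nola jumped off the cliff", "Nola jumped off the couch") should be true, with the single differing position being index 4.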
For creating phrases at random, please refer to NominSim's answer.
Answer 2:
Yet another LINQ-based solution, which generates all possible combinations:
var phrases = new List<string> {
    "Nola jumped off the cliff",
    "Leroy jumped off the cliff",
    "Nola jumped off the couch",
    "Leroy lept off the couch"
};
// Tag every word with its position, group by position, and keep the distinct words per position
var sets = (from p in phrases
            from indexedWord in p.Split(' ').Select((word, idx) => new { idx, word })
            group indexedWord by indexedWord.idx into g
            select g.Select(e => e.word).Distinct()).ToArray();
// Cross join the five positions to build every possible phrase
var allCombos = from w1 in sets[0]
                from w2 in sets[1]
                from w3 in sets[2]
                from w4 in sets[3]
                from w5 in sets[4]
                select String.Format("{0} {1} {2} {3} {4}.", w1, w2, w3, w4, w5);
Doesn't make for the most readable code, but was fun writing. =)
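The query above is hard-wired to five-word phrases. If the phrase length isn't known up front, the same cross join can be folded over any number of word sets with Aggregate. A rough sketch, not a drop-in replacement; PhraseCombos and AllCombinations are names I invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PhraseCombos
{
    // Hypothetical helper: builds the Cartesian product of any number of
    // word sets and joins each combination into a phrase.
    public static IEnumerable<string> AllCombinations(IEnumerable<IEnumerable<string>> sets)
    {
        // Start with a single empty combination and extend it one position at a time.
        IEnumerable<IEnumerable<string>> seed = new[] { Enumerable.Empty<string>() };

        return sets
            .Aggregate(seed, (acc, set) =>
                from prefix in acc
                from word in set
                select prefix.Concat(new[] { word }))
            .Select(words => string.Join(" ", words));
    }
}
```

Calling PhraseCombos.AllCombinations(sets) with the sets array from the query above should yield one phrase per combination, so the result count is the product of the set sizes.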
Answer 3:
First, to generate the list of nodes, something like this should work:
// Assumes PhraseList holds the phrases and every phrase has exactly phraseLength words
HashSet<String>[] NodeList = new HashSet<String>[phraseLength];
for (int i = 0; i < phraseLength; i++)
{
    NodeList[i] = new HashSet<string>();
}
foreach (String phrase in PhraseList)
{
    string[] phraseStrings = phrase.Split(' ');
    for (int i = 0; i < phraseLength; i++)
    {
        // HashSet.Add ignores duplicates, so no Contains check is needed
        NodeList[i].Add(phraseStrings[i]);
    }
}
Then, when you create your sentences, you can simply traverse the NodeList and pick a String from each node. If you want to pick at random, maybe something like this:
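// Create the Random once: on .NET Framework, instances created in quick succession can share a seed
Random rand = new Random();
List<String> words = new List<String>();
foreach (HashSet<String> Node in NodeList)
{
    words.Add(Node.ToArray()[rand.Next(0, Node.Count)]);
}
// Join with spaces so the words don't run together
String sentence = String.Join(" ", words);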
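I should note that a HashSet probably isn't the best choice if you need to access it randomly: it has no indexer, so the code above has to copy each set with ToArray() before picking a word.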
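One way around that, sketched here under the assumption that words are separated by single spaces, is to keep the distinct words of each node in a List so they can be indexed directly. RandomPhrases, BuildNodes and RandomSentence are hypothetical names used only for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RandomPhrases
{
    // Hypothetical helper: collects the distinct words seen at each position,
    // stored as lists so a word can be picked by index.
    public static List<List<string>> BuildNodes(IEnumerable<string> phrases)
    {
        return phrases
            .SelectMany(p => p.Split(' ').Select((word, idx) => new { idx, word }))
            .GroupBy(x => x.idx, x => x.word)
            .OrderBy(g => g.Key)
            .Select(g => g.Distinct().ToList())
            .ToList();
    }

    // Picks one word per node using the caller's Random instance.
    public static string RandomSentence(List<List<string>> nodes, Random rand)
    {
        return string.Join(" ", nodes.Select(node => node[rand.Next(node.Count)]));
    }
}
```

Passing in a single shared Random keeps repeated calls from restarting the same sequence, and unlike the fixed-length loop above, BuildNodes also tolerates phrases of different lengths.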
Source: https://stackoverflow.com/questions/9523228/find-different-word-between-two-strings