Question
I'm currently trying to create an application that does some text processing. It reads in a text file, and I use a dictionary to build an index of words. The program should read the text file and, for each word, check whether that word is already in the dictionary and what its unique word ID is. It should then print the index number and the total number of appearances for each word it meets, and continue through the entire file, producing something like this: http://pastebin.com/CjtcYchF
Here is an example of the text file I'm inputting: http://pastebin.com/ZRVbhWhV A quick Ctrl-F shows that "not" occurs 2 times and "that" occurs 4 times. What I need to do is index each word and output it like this:
sample input : "that I have not that place sunrise beach like not good dirty beach trash beach"
dictionary :
index word
1 I
2 have
3 not
4 that
5 place
6 sunrise
7 beach
8 like
9 good
10 dirty
11 trash
output.txt / output.dat:
4:2 1:1 2:1 3:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1
I've tried to implement some code to create the dictionary. Here is what I have so far:
    private void bagofword_Click(object sender, EventArgs e)
    {
        //creating dictionary in background
        //Dictionary<string, int> dict = new Dictionary<string, int>();
        string rawinputbow = File.ReadAllText(textBox31.Text);
        //string[] inputbow = rawinputbow.Split(' ');
        var inputbow = rawinputbow.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                                  .ToList();
        var dict = new OrderedDictionary();
        var output = new List<int>();
        foreach (var element in inputbow.Select((word, index) => new { word, index }))
        {
            if (dict.Contains(element.word))
            {
                var count = (int)dict[element.word];
                dict[element.word] = ++count;
                output.Add(GetIndex(dict, element.word));
                //textBoxfile.Text = output.ToString();
                // textBoxfile.Text = inputbow.ToString();
                string result = string.Join(",", output);
                textBoxfile.Text = result.ToString();
            }
            else
            {
                dict[element.word] = 1;
                output.Add(GetIndex(dict, element.word));
                //textBoxfile.Text = dict.ToString();
                string result = string.Join(",", output);
                textBoxfile.Text = result.ToString();
            }
        }
    }
    public int GetIndex(OrderedDictionary dictionary, string key)
    {
        for (int index = 0; index < dictionary.Count; index++)
        {
            if (dictionary[index] == dictionary[key])
                return index; // We found the item
            //textBoxfile.Text = index.ToString();
        }
        return -1;
    }
Does anyone know how to complete that code? Any help is much appreciated!
Answer 1:
Use this code:
    string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
    // A null separator array splits on any whitespace; the cast keeps the
    // call unambiguous on newer .NET versions.
    var wordList = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    // Group identical words, count each group, and sort by ascending count.
    var output = wordList.GroupBy(x => x)
                         .Select(x => new Word { Text = x.Key, Count = x.Count() })
                         .OrderBy(x => x.Count);
    foreach (var item in output)
    {
        textBoxfile.Text += item.Text + " : " + item.Count + Environment.NewLine;
    }
Class for holding the data:
    public class Word
    {
        public string Text { get; set; }
        public int Count { get; set; }
    }
Answer 2:
Splitting on whitespace is not enough. Your text contains tokens such as "temple,", "photos." or "cafes/restaraunts". A better approach would be to use a regex like \w+. The words should also be compared in a case-insensitive way.
My approach would be:
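    // Requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
    var words = Regex.Matches(File.ReadAllText(filename), @"\w+").Cast<Match>()
        .Select((m, pos) => new { Word = m.Value, Pos = pos })           // remember each word's position
        .GroupBy(s => s.Word, StringComparer.CurrentCultureIgnoreCase)   // case-insensitive grouping
        .Select(g => new { Word = g.Key, PosInText = g.Select(z => z.Pos).ToList() })
        .ToList();
    // One line per distinct word, followed by the positions where it occurs.
    foreach (var item in words)
    {
        Console.WriteLine("{0,-15} POS:{1}", item.Word, string.Join(",", item.PosInText));
    }
    // "index:count" pairs, using the word's position in the list as its index.
    for (int i = 0; i < words.Count; i++)
    {
        Console.Write("{0}:{1} ", i, words[i].PosInText.Count);
    }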
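For the output format asked for in the question (a numbered word list plus index:count pairs), the same regex tokenization can feed a single pass over the text. The following is only a sketch and not part of the original answer. It assumes C# 9 or later (top-level statements), the helper name BuildIndex is hypothetical, and IDs are assigned in order of first appearance, which differs from the numbering shown in the question's sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: gives each distinct word a 1-based id in order of
// first appearance and counts its occurrences in a single pass.
static List<(int Id, string Word, int Count)> BuildIndex(string text)
{
    // Maps a word (case-insensitively) to its slot in the entries list,
    // so no linear search is needed to find a word's id.
    var slotByWord = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    var entries = new List<(int Id, string Word, int Count)>();

    foreach (Match m in Regex.Matches(text, @"\w+"))
    {
        if (slotByWord.TryGetValue(m.Value, out int slot))
        {
            // Word seen before: bump its count.
            var entry = entries[slot];
            entries[slot] = (entry.Id, entry.Word, entry.Count + 1);
        }
        else
        {
            // New word: its slot is the current list length, its id is slot + 1.
            slotByWord[m.Value] = entries.Count;
            entries.Add((entries.Count + 1, m.Value, 1));
        }
    }
    return entries;
}

string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
var index = BuildIndex(input);

// Dictionary part: one "id word" line per distinct word.
foreach (var item in index)
    Console.WriteLine($"{item.Id} {item.Word}");

// Output part: "id:count" pairs in order of first appearance.
string summary = string.Join(" ", index.Select(item => $"{item.Id}:{item.Count}"));
Console.WriteLine(summary);
```

With the sample input, the last line printed should be 1:2 2:1 3:1 4:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1. To write the result to output.txt instead of the console, the same strings could be passed to File.WriteAllText or File.WriteAllLines.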
Answer 3:
### Sample code for you to tweak for your needs:
touch test.txt
echo "ravi chandran marappan 30" > test.txt
echo "ramesh kumar marappan 24" >> test.txt
echo "ram lakshman marappan 22" >> test.txt
sed -e 's/ /\n/g' test.txt | sort | uniq | awk '{print "echo """,$1,
"""`grep -wc ",$1," test.txt`"}' | sh
Results:
22 -1
24 -1
30 -1
chandran -1
kumar -1
lakshman -1
marappan -3
ram -1
ramesh -1
ravi -1
Source: https://stackoverflow.com/questions/32362427/count-the-number-of-unique-words-and-occurrence-of-each-word-from-txt-file