Count the number of unique words and the occurrences of each word in a text file

一个人想着一个人 submitted on 2019-12-04 22:10:56

Use this code

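    // Requires: using System; using System.Linq;
    string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
    // A null separator array splits on any whitespace; empty entries are dropped.
    var wordList = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    var output = wordList
        .GroupBy(x => x)
        .Select(x => new Word { Text = x.Key, Count = x.Count() })
        .OrderBy(x => x.Count);
    foreach (var item in output)
    {
        // textBoxfile is assumed to be a TextBox control that shows the result.
        textBoxfile.Text += item.Text + " : " + item.Count + Environment.NewLine;
    }

Class for holding the data:

    public class Word
    {
        public string Text { get; set; }   // the word itself
        public int Count { get; set; }     // how many times it occurs
    }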

Splitting on whitespace is not enough. Your text contains tokens such as "temple,", "photos." or "cafes/restaraunts", where punctuation stays attached to the word. A better approach is to match words with a regex such as \w+. The words should also be compared case-insensitively.

My approach would be:

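// Requires: using System; using System.IO; using System.Linq; using System.Text.RegularExpressions;
// filename is the path of the text file to analyze.
var words = Regex.Matches(File.ReadAllText(filename), @"\w+").Cast<Match>()
            .Select((m, pos) => new { Word = m.Value, Pos = pos })   // pos = index of the word in the text
            .GroupBy(s => s.Word, StringComparer.CurrentCultureIgnoreCase)
            .Select(g => new { Word = g.Key, PosInText = g.Select(z => z.Pos).ToList() })
            .ToList();

// Print each distinct word with the positions at which it occurs.
foreach (var item in words)
{
    Console.WriteLine("{0,-15} POS:{1}", item.Word, string.Join(",", item.PosInText));
}

// Print the index of each distinct word followed by its number of occurrences.
for (int i = 0; i < words.Count; i++)
{
    Console.Write("{0}:{1} ", i, words[i].PosInText.Count);
}
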
### Sample code for you to tweak for your needs:
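# Create a small sample file.
touch test.txt
echo "ravi chandran marappan 30" > test.txt
echo "ramesh kumar marappan 24" >> test.txt
echo "ram lakshman marappan 22" >> test.txt
# Put one word per line, de-duplicate, then generate one echo/grep command
# per word and run the generated commands with sh.
# Note: grep -c counts matching lines, so a word repeated on one line counts once.
sed -e 's/ /\n/g' test.txt | sort | uniq | awk '{print "echo """,$1,
"""`grep -wc ",$1," test.txt`"}' | sh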

Results:
22 -1
24 -1
30 -1
chandran -1
kumar -1
lakshman -1
marappan -3
ram -1
ramesh -1
ravi -1
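
If generating commands and piping them to sh feels fragile, the same counts can probably be obtained with sort and uniq -c alone. The following is only a minimal sketch under a few assumptions: count_words is a hypothetical helper name (not part of the answer above), words are separated by whitespace only, and the output is printed as "word count" rather than in the format shown in the results above.

```shell
# Recreate the sample file used in the answer above.
printf '%s\n' \
  "ravi chandran marappan 30" \
  "ramesh kumar marappan 24" \
  "ram lakshman marappan 22" > test.txt

# count_words FILE: print "word count" for every distinct word in FILE.
# (count_words is a hypothetical helper name chosen for this sketch.)
count_words() {
  tr -s '[:space:]' '\n' < "$1" |   # put one word on each line
    sed '/^$/d' |                   # drop any empty lines
    sort | uniq -c |                # count identical adjacent lines
    awk '{print $2, $1}'            # reorder "count word" to "word count"
}

count_words test.txt
```

Unlike grep -wc, which counts matching lines, this counts every occurrence, so a word that appears twice on one line is counted twice. Replacing the first tr call with tr -cs '[:alnum:]' '\n' would also split on punctuation, along the lines of the \w+ suggestion in the C# answer.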