Combining Lists of Word Frequency Data

早过忘川 提交于 2019-12-05 00:58:10

It sounds like you need GatherBy.

Suppose your two lists are named data1 and data2, then use

{#[[1, 1]], Total[#[[All, 2]]]} & /@ GatherBy[Join[data1, data2], First]

This easily generalizes to any number of lists, not just two.

Try using a hash table, like this. First set things up:

ClearAll[freq];
freq[_] = 0;

Now eg freq["safas"] returns 0. Next, if the lists are defined as

lst1 = {{"the", 42216}, {"of", 24903}, {"and", 18624}, {"n", 
    16850}, {"in", 16164}, {"de", 14930}, {"a", 14660}, {"to", 
    14175}, {"la", 7347}, {"was", 6030}, {"l", 5981}, {"le", 
    5735}, {"abattoir", 1}, {"abattement", 1}, {"abattagen", 
    1}, {"abattage", 1}, {"abated", 1}, {"abandonn", 1}, {"abaiss", 
    1}, {"aback", 1}, {"aase", 1}, {"aaijaut", 1}, {"aaaah", 
    1}, {"aaa", 1}};
lst2 = {{"the", 30419}, {"n", 20414}, {"de", 19956}, {"of", 
    16262}, {"and", 14488}, {"to", 12726}, {"a", 12635}, {"in", 
    11141}, {"la", 10739}, {"et", 9016}, {"les", 8675}, {"le", 
    7748}, {"abattement", 1}, {"abattagen", 1}, {"abattage", 
    1}, {"abated", 1}, {"abandonn", 1}, {"abaiss", 1}, {"aback", 
    1}, {"aase", 1}, {"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}};

you may run this

Scan[(freq[#[[1]]] += #[[2]]) &, lst1]

after which eg

freq["the"]
(*
42216
*)

and then the next list

Scan[(freq[#[[1]]] += #[[2]]) &, lst2]

after which eg

freq["the"]
72635

while still

freq["safas"]
(*
0
*)

Here is a direct Sow/Reap function:

Reap[#2~Sow~# & @@@ data1~Join~data2;, _, {#, Tr@#2} &][[2]]

Here is a concise form of acl's method:

Module[{c},
  c[_] = 0;

  c[#] += #2 & @@@ data1~Join~data2;

  {#[[1, 1]], #2} & @@@ Most@DownValues@c
]

This appears to be a bit faster than Szabolcs code on my system:

data1 ~Join~ data2 ~GatherBy~ First /.
  {{{x_, a_}, {x_, b_}} :> {x, a + b}, {x : {_, _}} :> x}
rcollyer

There's an old saying, "if all you have is a hammer, everything becomes a nail." So, here's my hammer: SelectEquivalents.

This can be done a little quicker using SelectEquivalents:

SelectEquivalents[data1~Join~data2, #[[1]]&, #[[2]]&, {#1, Total[#2]}&]

In order, the first param is obviously just the joined lists, the second one is what they're grouped by (in this case the first element), the third param strips off the string leaving just the count, and the fourth param puts it back together with the string as #1 and the counts in a list as #2.

Try ReplaceRepeated.

Join the lists. Then use

//. {{f1___, {a_, c1_}, f2___, {a_, c2_}, f3___} -> {f1, f2, f3, {a, c1 + c2}}}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!