Vim, word frequency function and French accents

Posted by 筅森魡賤 on 2019-12-14 01:50:24

Question


I have recently discovered the Vim Tip n° 1531 (Word frequency statistics for a file).

As suggested, I put the following code in my .vimrc:

function! WordFrequency() range
  let all = split(join(getline(a:firstline, a:lastline)), '\A\+')
  let frequencies = {}
  for word in all
    let frequencies[word] = get(frequencies, word, 0) + 1
  endfor
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key,value] in items(frequencies)
    call append('$', key."\t".value)
  endfor
  sort i
endfunction
command! -range=% WordFrequency <line1>,<line2>call WordFrequency()

It works fine except for accents and other French-specific characters (the Latin small ligatures æ and œ, etc.).

What am I supposed to add to this function to make it suit my needs?

Thanks in advance


Answer 1:


For 8-bit characters, you can try changing the split pattern from \A\+ to [^[:alpha:]]\+.




Answer 2:


The pattern \A\+ matches one or more consecutive non-alphabetic characters, where \A means anything outside [A-Za-z]. Unfortunately, that includes multibyte characters like our beloved çàéô and friends.

That means your text is split at spaces and punctuation AND at accented characters.

With \A\+, the phrase

Rendez-vous après l'apéritif.

gives:

ap      1
apr     1
l       1
Rendez  1
ritif   1
s       1
vous    1
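
To see the splitting in isolation, the same pattern can be applied directly to the example phrase. This is a minimal sketch: the variable name `phrase` is only illustrative, and it assumes the lines are saved to a file and run with :source so that the accented characters are read in your current encoding.

```vim
" Illustrative only: split the example phrase on runs of non-[A-Za-z] characters.
let phrase = "Rendez-vous après l'apéritif."

" Every accented letter counts as a separator here, so the words break apart.
echo split(phrase, '\A\+')
```

The echoed list should contain the same fragments as the table above (Rendez, vous, apr, s, l, ap, ritif), in the order they appear in the phrase rather than sorted.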

If you are sure your text doesn't include fancy spaces, you could replace this pattern with \s\+, which matches whitespace only, but it's probably too liberal.

With this pattern, \s\+, the same phrase gives:

après       1
l'apéritif. 1
Rendez-vous 1

which, I think, is closer to what you want.

Some customization may be needed to exclude punctuation.
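
One possible way to do that customization is to keep the whitespace split and then trim punctuation from both ends of each token. The sketch below is only an assumption about what "excluding punctuation" could look like; the function name SplitWordsTrimmed is hypothetical, and [:punct:] is expected to cover ASCII punctuation only, so typographic characters such as « » would likely need to be added by hand.

```vim
" Hypothetical helper (not part of the original tip): split on whitespace,
" then strip leading and trailing ASCII punctuation from each token.
function! SplitWordsTrimmed(text)
  let words = []
  for token in split(a:text, '\s\+')
    " Remove punctuation runs anchored at the start or at the end only,
    " so the dash in Rendez-vous and the apostrophe in l'apéritif survive.
    let word = substitute(token, '^[[:punct:]]\+\|[[:punct:]]\+$', '', 'g')
    " Skip tokens that consisted of punctuation only.
    if word !=# ''
      call add(words, word)
    endif
  endfor
  return words
endfunction

echo SplitWordsTrimmed("Rendez-vous après l'apéritif.")
```

With the example phrase, this should keep Rendez-vous and après as they are and drop only the final period from l'apéritif. To use it in WordFrequency(), the split() call there could be replaced with SplitWordsTrimmed(join(getline(a:firstline, a:lastline))).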




Answer 3:


function! WordFrequency() range
  " Whitespace and all punctuation characters except dash and single quote
  let wordSeparators = '[[:blank:],.;:!?%#*+^@&/~_|=<>\[\](){}]\+'
  let all = split(join(getline(a:firstline, a:lastline)), wordSeparators)
  "...
endfunction

If all punctuation characters should be word separators, the expression shortens to:

let wordSeparators = '[[:blank:][:punct:]]\+'
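
For convenience, here is a sketch of how the whole function might look once the separator pattern is swapped in. Everything except the wordSeparators line is taken unchanged from the function in the question; the choice of the longer pattern (which keeps dashes and single quotes inside words) is an assumption about what suits French text, and the shorter [[:blank:][:punct:]]\+ pattern could be substituted on the same line.

```vim
function! WordFrequency() range
  " Whitespace and all punctuation characters except dash and single quote
  let wordSeparators = '[[:blank:],.;:!?%#*+^@&/~_|=<>\[\](){}]\+'
  let all = split(join(getline(a:firstline, a:lastline)), wordSeparators)

  " Count how many times each word occurs.
  let frequencies = {}
  for word in all
    let frequencies[word] = get(frequencies, word, 0) + 1
  endfor

  " Show the result in a scratch buffer, one "word<Tab>count" pair per line.
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key, value] in items(frequencies)
    call append('$', key . "\t" . value)
  endfor
  sort i
endfunction
command! -range=% WordFrequency <line1>,<line2>call WordFrequency()
```

After sourcing this, :WordFrequency counts the whole buffer, and a visual selection followed by :WordFrequency counts only the selected lines.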


Source: https://stackoverflow.com/questions/7525839/vim-word-frequency-function-and-french-accents
