Get the number of tokens using a specific parsing character

Posted by 北慕城南 on 2020-01-06 04:55:14

Question


Consider the following toy string:

my first name is Pearly, and my surname is Spencer

Is there an out-of-the-box way in Stata (Mata included) to get the number of tokens based on a user-specified parsing character? In this particular example, there are two tokens separated by a comma.

Solutions like the macro extended function for word count parse on spaces, and I would like to avoid writing a program for this.


Answer 1:


The number of tokens is the number of parsing characters PLUS 1.

That being so, using commas as example parsing characters,

gen ntokens = 1 + strlen(strvar) - strlen(subinstr(strvar, ",", "", .))  

See https://www.stata-journal.com/sjpdf.html?articlenum=dm0056 for a write-up of this simple trick.
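As a quick check of the trick, here is a minimal sketch on a small made-up dataset. The variable name strvar follows the answer; the three observations, the str60 width, and the names ntokens and ntokens_and are assumptions chosen only for illustration. The last command extends the same idea to a multi-character separator, which the answer above does not cover: the difference in lengths is divided by the length of the separator.

```stata
* Hypothetical example data: the toy string plus two extra test cases
clear
input str60 strvar
"my first name is Pearly, and my surname is Spencer"
"a,b,c"
"no separator here"
end

* One more token than there are commas
gen ntokens = 1 + strlen(strvar) - strlen(subinstr(strvar, ",", "", .))

* Assumed extension: for a multi-character separator, divide by its length
gen ntokens_and = 1 + (strlen(strvar) - strlen(subinstr(strvar, " and ", "", .))) / strlen(" and ")

list strvar ntokens ntokens_and
```

With these data, ntokens should be 2, 3 and 1. Note that the formula counts separators rather than non-empty pieces, so an empty string still yields 1.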



Source: https://stackoverflow.com/questions/51062401/get-the-number-of-tokens-using-a-specific-parsing-character
