Stata Nested foreach loop substring comparison

Submitted by 无人久伴 on 2019-12-25 01:41:41

Question


I have just started learning Stata and I'm having a hard time. My problem is this: I have two different variables, ATC and A, where A is potentially a substring of ATC. Now I want to mark all the observations in which A is a substring of ATC with OK = 1.

I tried this using a simple nested loop:

foreach x in ATC {
    foreach j in A {
        replace OK = 1 if strpos(`x',`j')!=0
    }
}

However, whenever I run this loop no changes are being made even though there should be plenty. I feel like I should probably give an index specifying which OK is being changed (the one belonging to the ATC/x), but I have no idea how to do this. This is probably really simple but I've been struggling with it for some time.


I should have clarified: my A list is separate from the main list (simply appended to it) and contains only the unique keys I use to identify the ATCs I want. So I have ~120 A keys and a couple of million ATC keys. What I wanted to do was check every ATC key against every single A key and mark the ATC keys that contain one.

That means I don't have complete tuples of (ATC,A,OK) but instead separate lists of different sizes. For example: I have

ATC    OK  A 
ABCD   0   .
EFGH   0   .
...   ...  ...
.     .    AB
.     .    ET

and want "ABCD" to end up with OK = 1 (it contains "AB") while "EFGH" remains at 0.


Answer 1:


We can separate your question into two parts. Your title implies a problem with loops, but your loops are just equivalent to

  replace OK = 1 if strpos(ATC, A)!=0

so the use of looping appears irrelevant. That leaves the substring comparison.

Let's supply an example:

. set obs 3 
obs was 0, now 3

. gen OK = 0 

. gen A = cond(_n == 1, "42", "something else")  

. gen ATC = "answer is 42"

. replace OK = 1 if strpos(ATC, A) != 0 
(1 real change made)

. list 

    +------------------------------------+
    | OK                A            ATC |
    |------------------------------------|
 1. |  1               42   answer is 42 |
 2. |  0   something else   answer is 42 |
 3. |  0   something else   answer is 42 |
    +------------------------------------+

So it works fine; and you really need to give a reproducible example if you think you have something different.

As for specifying where the variable should be changed: your code does precisely that, as again the example above shows.
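One way to see why the loops in the question collapse to a single replace is to display what the loop macros actually hold. The sketch below is only an illustration (the counter n is a name introduced here, not part of the original code): foreach ... in treats ATC and A as literal one-word lists, so each loop runs once and the macros hold the variable names, not the values in each observation.

```stata
* Count the passes through the nested loops from the question.
* n is a hypothetical helper macro used only for this illustration.
local n = 0
foreach x in ATC {
    foreach j in A {
        local ++n
        * x and j hold the names "ATC" and "A", not observation values
        display "pass `n': x is `x', j is `j'"
    }
}
display "total passes: `n'"
```

With a single pass, the body amounts to replace OK = 1 if strpos(ATC, A) != 0, which Stata evaluates observation by observation.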


The update makes the problem clear. With the syntax you gave, Stata looks for a matching substring only within the same observation. A variable in Stata is a field in a dataset, not a free-standing list. To cycle over a set of values, something like this should suffice:

 gen byte OK = 0 
 levelsof A, local(Avals) 

 quietly foreach A of local Avals { 
     replace OK = 1 if strpos(ATC, `"`A'"') > 0 
 } 

Notes:

  1. Specifying byte cuts down storage.

  2. You may need an if or in restriction on levelsof.

  3. quietly cuts out messages about changed values. When debugging, it is often better left out.

  4. > 0 could be omitted as a positive result from strpos() is automatically treated as true in logical comparisons. See this FAQ.
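As a rough check, the approach above can be tried on a toy dataset shaped like the one in the question. This is a sketch under assumptions: the four observations and the storage types str4 and str2 are made up for illustration, and the loop macro is named a here only to keep it visually distinct from the variable A.

```stata
clear
* Toy data laid out like the question: ATC keys first, A keys appended below
input str4 ATC str2 A
"ABCD" ""
"EFGH" ""
""     "AB"
""     "ET"
end

gen byte OK = 0

* levelsof collects the distinct nonmissing values of A into a local macro
levelsof A, local(Avals)

* Compare every ATC value against each A key in turn
foreach a of local Avals {
    replace OK = 1 if strpos(ATC, `"`a'"') > 0
}

list
```

On this toy data only the "ABCD" observation should end up with OK = 1, since "AB" occurs in it while "ET" does not occur in "EFGH". With ~120 keys the loop makes ~120 passes over the data, one replace per key.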



Source: https://stackoverflow.com/questions/27337523/stata-nested-foreach-loop-substring-comparison
