How to parse inside HTML tags with REBOL?

谁说我不能喝 submitted on 2019-12-04 12:32:48
HostileFork

When mess is processed with LOAD/MARKUP you get this (and I've formatted + commented with the types):

[
    ; string!
    "^/" 

    ; tag! string! tag!
    <td> "Bob Sockaway" </td>

    ; string!
    "^/"

    ; tag! tag!
    ;     string!
    ; tag! tag!
    <td> <a href=mailto:bsockaway@example.com>
        "bsockaway@example.com"
    </a> </td>

    ; (Note: you didn't put the anchor's href in quotes above...)

    ; string!
    "^/"

    ; tag! string! tag!
    <td> "9999" </td> 

    ; string!
    "^/"
]

Your output pattern matches series of the form [<td> string! </td>] but not things of the form [<td> tag! string! tag! </td>]. Sidestepping the question posed in your title, you could solve this particular dilemma several ways. One might be to keep a counter of how many TD tags you are currently inside and print any string found while the counter is non-zero:

rules: [
    (td-count: 0)
    some [
        ; if we see an open TD tag, increment a counter
        <td> (++ td-count)
        |
        ; if we see a close TD tag, decrement a counter
        </td> (-- td-count)
        |
        ; capture parse position in s if we find a string
        ; and if counter is > 0 then print the first element at
        ; the parse position (e.g. the string we just found) 
        s: string! (if td-count > 0 [print s/1])
        |
        ; if we find any non-TD tags, match them so the
        ; parser will continue along but don't run any code
        tag!
    ]
]

This produces the output you asked for:

Bob Sockaway
bsockaway@example.com
9999
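If you would rather keep the cell contents than print them, the same counting idea can append each string to a block. This is only a sketch: the contents of mess are reconstructed from the LOAD/MARKUP output shown earlier, and the names cells, td-depth, str and collect-rule are made up for illustration.

```rebol
; Assumed input, reconstructed from the LOAD/MARKUP output shown earlier
mess: {
<td>Bob Sockaway</td>
<td><a href=mailto:bsockaway@example.com>bsockaway@example.com</a></td>
<td>9999</td>
}

cells: copy []      ; hypothetical name: collects the strings found inside TD tags
td-depth: 0         ; hypothetical name: how many TD tags are currently open

collect-rule: [
    some [
        ; entering a TD tag
        <td> (td-depth: td-depth + 1)
        |
        ; leaving a TD tag
        </td> (td-depth: td-depth - 1)
        |
        ; keep the string only when we are inside a TD tag
        set str string! (if td-depth > 0 [append cells str])
        |
        ; skip any other tag without running code
        tag!
    ]
]

parse load/markup mess collect-rule
probe cells
```

With the sample input this should leave cells holding the same three strings that were printed above, ready for further processing.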

But you also wanted to know, essentially, whether you can transition into string parsing from block parsing in the same set of rules (without jumping into open code). I looked into it, and "mixed parsing" looks like it may be a feature addressed in Rebol 3. Still, I couldn't get it to work in practice, so I asked a question of my own:

How to mix together string parsing and block parsing in the same rule?

I think I found a pretty good solution. It may have to be generalized if you had lots of different tags whose attributes you need.

I was looking for the id attribute of the query tag!:

<query id="5">

In the parse rule for tag!, I did this:

  | set t tag! (
    p: make block! t 
    if p/1 = 'query [_qid: to-integer p/3]
  )

With more tags to look at, I'd use case. And maybe this would be a better way to set _qid:

to-integer select p 'id=
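To see why looking the attribute up by name may be safer than using a fixed position, here is a small sketch. The block literals stand in for what make block! is described as producing from a tag (an assumption taken from the snippet above), and the second tag with an extra name attribute is hypothetical.

```rebol
; Stand-in for the result of make block! on <query id="5">
p: [query id= "5"]
print p/3                        ; positional: the third element
print to-integer select p 'id=   ; by name: the value following id=

; Hypothetical tag with another attribute in front of id
p2: [query name= "x" id= "5"]
print p2/3                       ; the position now holds the name value
print to-integer select p2 'id=  ; the lookup by name still finds the id
```

The positional form only works while id is the first attribute, whereas select keeps working if attributes are added or reordered.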

I ended up needing to parse another tag, and this is a nice general pattern:

switch p/1 [
  field [_fid: to-integer p/id= _field_type: p/field_type=]
  query [_qid: to-integer p/id=]
]