About a Prolog tokenizer

Submitted by ﹥>﹥吖頭↗ on 2019-12-25 03:29:23

Question


One of my assignments asks us to build a Prolog tokenizer. So far I have written a predicate that replaces spaces and tabs with newlines, but I don't know how to integrate it into the main program.

The replace part looks like this:

replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]):- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).

The main program has a predicate called removewhite(List1, List2).

So how can I let removewhite execute replace?


Answer 1:


You are a bit 'off trail' for a tokenizer: removewhite/2 isn't going to buy you any useful functionality. Instead, consider a DCG (provided your Prolog supports them):

tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.

code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].

Despite its simplicity, this is a fairly efficient scanner that is easy to extend. In SWI-Prolog, which has (non-ISO-compliant) extensions for efficient string handling, it can be called from the top level like this:

?- tokenize(`123  4 567  `, L).
L = [123, 4, 567]

or

?- atom_codes('123  4 567  ',Cs), tokenize(Cs, L).
Cs = [49, 50, 51, 32, 32, 52, 32, 53, 54|...],
L = [123, 4, 567] 
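
To illustrate the "easily extensible" claim, here is a minimal sketch that adds one more token type to the scanner above. The word//1 rule and the word(W) token shape are hypothetical names introduced for this example, and the code assumes SWI-Prolog (code_type/2 and back-quoted code lists):

```prolog
% Sketch only: the scanner from this answer plus a hypothetical word//1 token.
% Assumes SWI-Prolog; word//1 and the word(W) wrapper are not in the original.

tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
% New clause: one extra line per token type.
tokenize([word(W)|Tokens]) --> word(W), tokenize(Tokens).

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N, [C|Cs])}.
% A word is a non-empty run of alphabetic codes, returned as an atom.
word(W) --> code_types(alpha, [C|Cs]), {atom_codes(W, [C|Cs])}.

% Greedily collect the longest run of codes of the given type.
code_types(Type, [C|Cs]) --> [C], {code_type(C, Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].
```

With these definitions, the first answer to `?- tokenize(`foo 42 bar`, L).` should be `L = [word(foo), 42, word(bar)]`. Input containing characters that no rule covers (punctuation, for example) makes the query fail, so further token types would be added the same way.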

Btw, in SWI-Prolog, number//1 is predefined (with much more functionality, of course) in library(dcg/basics).
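
As a rough sketch of what that could look like, the scanner can be rewritten on top of blanks//0 and number//1 from library(dcg/basics). The names tokenize_basics/2 and numbers//1 are made up for this example; only the two library rules are real:

```prolog
:- use_module(library(dcg/basics)).

% Sketch only: tokenize_basics/2 and numbers//1 are hypothetical names.
% blanks//0 skips zero or more white-space codes;
% number//1 parses an integer or a float.

tokenize_basics(Codes, Tokens) :- phrase(numbers(Tokens), Codes).

numbers([N|Ns]) --> blanks, number(N), !, numbers(Ns).
numbers([])     --> blanks.
```

Calling `?- tokenize_basics(`123  4 567  `, L).` should give `L = [123, 4, 567]`, the same result as the hand-written version. Note that the library's number//1 also accepts signs and floats, so the two scanners are not equivalent on every input.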

Anyway, about your question

how can I let removewhite execute replace?

I feel you're really 'barking up the wrong tree': removing a space, which is actually a separator, will mess up your input...




Answer 2:


You can write a more "powerful" predicate that replaces every element found in a list of candidates:

replace_all(_, _, [], []).
replace_all(L, R, [X|T], [R|T2]):- 
    member(X, L),
    replace_all(L, R, T, T2).

replace_all(L, R, [X|T], [X|T2]) :- 
    \+ member(X, L),
    replace_all(L, R, T, T2).

Then you can define:

removewhite(List1, List2) :-
    replace_all([' ', '\t'], '\n', List1, List2).
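
For a quick check, here is a self-contained sketch of the two predicates together with a sample query. It assumes the input is a list of characters (as produced by atom_chars/2), since the separators are written as the atoms ' ' and '\t':

```prolog
% Sketch only: assumes the input is a list of one-character atoms.

% replace_all(+Olds, +New, +ListIn, -ListOut):
% every element of ListIn that occurs in Olds is replaced by New.
replace_all(_, _, [], []).
replace_all(L, R, [X|T], [R|T2]) :-
    member(X, L),
    replace_all(L, R, T, T2).
replace_all(L, R, [X|T], [X|T2]) :-
    \+ member(X, L),
    replace_all(L, R, T, T2).

% Replace spaces and tabs with newlines.
removewhite(List1, List2) :-
    replace_all([' ', '\t'], '\n', List1, List2).
```

A query such as `?- atom_chars('a b\tc', Cs), removewhite(Cs, Out).` should bind `Out = [a, '\n', b, '\n', c]`. If the input is a list of character codes instead, the separators would have to be given as codes (32 and 9, replaced by 10).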


Source: https://stackoverflow.com/questions/29044086/about-a-prolog-tokenizer
