Creating the shortest Turing-complete interpreter [closed]

强颜欢笑 提交于 2019-12-17 21:47:11

问题


I've just tried to create the smallest possible language interpreter. Would you like to join and try?

Rules of the game:

  • You should specify a programming language you're interpreting. If it's a language you invented, it should come with a list of commands in the comments.
  • Your code should start with example program and data assigned to your code and data variables.
  • Your code should end with output of your result. It's preferable that there are debug statements at every intermediate step.
  • Your code should be runnable as written.
  • You can assume that data are 0 and 1s (int, string or boolean, your choice) and output is a single bit.
  • The language should be Turing-complete in the sense that for any algorithm written on a standard model, such as Turing machine, Markov chains, or similar of your choice, it's reasonably obvious (or explained) how to write a program that after being executred by your interpreter performs the algorithm.
  • The length of the code is defined as the length of the code after removal of input part, output part, debug statements and non-necessary whitespaces. Please add the resulting code and its length to the post.
  • You can't use functions that make compiler execute code for you, such as eval(), exec() or similar.

This is a Community Wiki, meaning neither the question nor answers get the reputation points from votes. But vote anyway!


回答1:


Python, 250 bytes.

This one is longer than ilya n.'s Python solution, but the language is a little easier to grasp. Each command in this language (I call it Kaputt, the German word for "broken") is one byte. The following 6 commands are defined:

  • 0: Push a zero onto the stack
  • 1: Push a one onto the stack
  • I: (if) Pop a value off the stack (which must be zero or one). Run the following block of code (until "i") if it's a one; skip the block if it's a zero.
  • i: (endif) Ends the if block and pushes a one if the block was not run (because the "I" popped a zero). See below for an explanation of the latter.
  • D: (def) Pops the name of the to-be-defined function off the stack (see below) and binds the following block (until "d") to that name.
  • d: (enddef) Ends the function definition.
  • Any other byte: Check if there is a function by this (one-byte; but see the edit below) name. If so, run this function; if not, push this byte onto the stack for immediate use by D.

Nested ifs are legal; nested function definitions are not. The fact that i (endif) pushes a one if and only if the corresponding if block was not run allows for the following idiom resembling an if/else/endif structure:

# [code that left a one or zero on the stack]
I    # abbreviated "(" below
# [code to be run if it was a one]
0iI  # abbreviated "/" below
# [code to be run if it was a zero]
1iIi # abbreviated ")" below
# [continuing...]

Note that comments, linebreaks, spaces etc. are not actually allowed.

Here are some examples of common functions. These make use of the abbreviations ( / ) mentioned above.

<D(/)d

defines the function < (pop) that pops a value off the stack without using it for anything.

&D((1/0)/<0)d

defines the function & (and) that pops two values of the stack and pushes a one if both values are ones, pushes a zero otherwise.

~D((11/10)/(01/00))d

defines the function ~ (swap) that swaps the top two values on the stack.

RD(R/<)d

defines the function R (remove) that recursively removes all ones from the top of the stack, and then removes two more values (the top one of which should be zero).

The following interpreter function expects the program p, which is a string (or any other iterable of bytes), and the input stack S, which is a (possibly empty) list of bytes. After the interpreter has run, this list contains the output stack.

def i(p,S,F=0):
 A=S.append
 F=F or{}
 C=0
 k=[]
 for c in p:
  h=0in k
  if c=="d":C=0
  elif C!=0:C+=[c]
  elif c=="I":k+=[int(h or S.pop())]
  elif c=="i":k.pop()or A("1")
  elif 1-h:
   if c=="D":C=F[S.pop()]=[]
   else:i(F.get(c)or A(c)or"",S,F)

Obviously, there was no room for error checking, so i() expects legal Kaputt code. Test cases:

script = "<D(/)d"                 # < = pop
script += "&D((1/0)/<0)d"         # & = and
script += "~D((11/10)/(01/00))d"  # ~ = swap
script += "RD(R/<)d"              # R = remove
script += "|D(<1/(1/0))d"         # | = or
script=script.replace("(","I")   
script=script.replace("/","0iI")
script=script.replace(")","1iIi") # ( and / and ) are no real commands of the language.

S=[]
i(script+"1111010111111111R",S)
assert S == ["1","1","1","1","0"]

S=[]
i(script+"00&",S)
assert S == ["0"]

S=[]
i(script+"01~",S)
assert S == ["1","0"]

S=[]
i(script+"00|",S)
assert S == ["0"]

S=[]
i(script+"01|",S)
assert S == ["1"]

Happy coding :-)

Edit: There is nothing inherent in the interpreter that forces tokens to be one byte only. Requiring this was more for consistency with the built-in commands (which are one-byte because that makes the interpreter shorter). If you pass a list of strings instead of a string, you can write more readable Kaputt code like this:

script = """
inc D
    (
        (
            0 0
        /
            1 0
        )
    /
        1
    )
d
""".replace("(","I").replace("/","0 i I").replace(")","1 i I i").split()

This defines a function inc that increments the two-bit number on top of the stack (LSB on top).

Test:

for n in xrange(4):
    S=[]
    i(script + [str((n & 2)/2), str(n & 1), "inc"], S)
    assert S == [str(((n+1) & 2)/2), str((n+1) & 1)]

Let's call multi-byte function names a language extension :-)




回答2:


Python program that interprets a language I just created:

from random import randint
z = [randint(0,1), None]  # example: input data is 0 or 1
x = '_b_ed_a_i____d_ai'   # this program will set p = data[1] = not data[0]

# input x[i], data z[k]
# jumper j
# p is +1 or -1 each step
# commands:
#    a   set data = 0 or 1
#    b   j += 0 or +-9 depending on data
#    d   move data left or right
#    e   perform jump left or right
#    j   set jumper to 0
#    i   end program
# output: p or some data[x], both possible

g = globals()
p = i = 1
a = b = d = j = e = k = 0
while i:
    h = x[i]
    g[h] = p = -p
    i += 1 + j*e
    if a: z[k] = i % 2
    if z[k]: j += 9*b
    k += d
    g[h] = 0        
#    print(a, b, d, e, i, j, k, h, z)

print('Input:', z, 'Output:', p > 0)

Optimized version:

g=globals()
p=i=1
a=b=d=j=e=k=0
while i:
 h=x[i]
 g[h]=p=-p
 i+=1+j*e
 if a:z[k]=i%2
 if z[k]:j+=9*b
 k+=d
 g[h]=0

114 bytes

Update: I want to add that the point of my program is not to create any practical language, and not even to somehow have the 'best' (=='shortest') interpreter, but rather to demonstrate interesting programming tricks. For example, I am relying on direct access to global variables through globals(), so that I never even test for j command, saving precious bytes :)




回答3:


#include "/dev/tty"



回答4:


Assemble the code below using A86 to get a 150 byte BF interpreter!

    add dh,10h
    push dx
    add dh,10h
    push dx
    mov bl,80h
    lea dx,[bx+2]
    add bl,byte ptr [bx]
    mov byte ptr [bx+1],al
    mov ah,3dh
    int 21h
    pop ds,es
    jc l14
    mov bx,ax
    mov ah,3fh  
    mov cx,di
    xor dx,dx
    int 21h
    jc l14
    mov bx,ax
    xor ax,ax
    rep stosw
    xor di,di
    xor si,si
    inc ch
l1:
    cmp si,bx
    jae l14
    lodsb
    mov bp,8
    push l1
l2: 
    dec bp
    js ret
    cmp al,[bp+l15]
    jne l2
l3:
    mov cl,[bp+8+l15]
    jmp cx
l4:
    inc di  
    ret
l5:
    dec di
    ret
l6:
    es inc byte ptr [di]
    ret
l7:
    es dec byte ptr [di]
    ret
l8:
    mov ah,2
    es mov dl,[di]
    int 21h
    ret
l9:
    mov ah,8
    int 21h
    es mov [di],al
    ret
l10:
    es cmp byte ptr [di],dh
    jne ret
l11:    
    cmp si,bx
    jae l14
    lodsb
    cmp al,']'
    jne l11
    ret
l12:
    es cmp byte ptr [di],dh
    je ret
l13:    
    dec si
    jz l14
    cmp byte ptr [si-1],'['
    jne l13
l14:
    ret
l15:
    db '>'
    db '<'
    db '+'
    db '-'
    db '.'
    db ','
    db '['
    db ']'
    db offset l4-100h
    db offset l5-100h
    db offset l6-100h
    db offset l7-100h
    db offset l8-100h
    db offset l9-100h
    db offset l10-100h
    db offset l12-100h

Specify a filename on the command line, no need for double quotes, as the BF source.




回答5:


Not mine, but take a look at the 210 bit binary lambda calculus self-interpreter here.




回答6:


Written in Perl, 140 characters long, including the shell invocation and flags:

perl -ne'push@a,split;if(eof){$i=0;while($i>=0){($a,$b,$c)=@a[$i..$i+2];$b>-1?$a[$b]-=$a[$a]:print chr$a[$a];$i=$b>-1&&$a[$b]<=0?$c:$i+3;}}'

Readable version:

#!/usr/bin/perl -n
push @prog, split /\s+/, $_;
if(eof) {
  $i = 0;
  while($i >= 0) {
    ($a, $b, $c) = @prog[$i .. $i + 2];
    if($b > -1) {
      $prog[$b] -= $prog[$a];
    } else {
      print chr $prog[$a];
    }
    if($b > -1 and $prog[$b] <= 0) {
      $i = $c;
    } else {
      $i + 3;
    }
  }
}

The above is an interpreter for pseudo-machine code for a One Instruction Set Computer using the subleq instruction. Basically, the source code should just be a bunch of numbers separated by whitespace. A simple test program to verify my results (should print "Hi" and a Unix newline):

0 0 6 72 105 10 3 -1 9 4 -1 12 5 -1 15 0 0 -1

Readable version of the test input (works just as well):

0    0     6
72   105   10
3   -1     9
4   -1     12
5   -1     15
0    0    -1



回答7:


My own entry, implementation of OISC in Ruby. It's 85 bytes long, and I'm sure one could squeeze a few more with some neat tricks. Programs have to contain data in code (line of space-separated numbers). At the moment I'm unable to provide working program written in OISC, but I will do it later.

p,m=0,gets.chomp.split.map(:to_i)
while p>0
p=(m[m[b]]-=m[m[a]])>0?p+3:c
end
$><<m[0]

The code is quite straightforward. m is a "memory" containing program and data. First line initializes m with provided code, and p - memory pointer. Main loop is subleq operation, written with ternary operator. Last line is smart way to output number contained in memory.




回答8:


Building on my previous code-golf entry, here is a (slightly generalized for IO) OISC emulator in

Fortran77

Obsfucated and without the loading scaffold (655 characters):

  subroutine o(n,m)
  integer m(n)              
  l=1;                      
  do while (l.ne.0)
     i=m(l)
     j=m(l+1)
     k=m(l+2)
     mi=mg(n,m,i)
     mj=mg(n,m,j)
     if(j.eq.(n+2)) then
        write(6,*)mj-mi
     else
        m(l+1)=mj-mi
     endif
     if (m(l+1).lt.0) then
        l=mg(n,m,k)
     else
        l=l+3
     endif
  end do
  return
  end
  function mg(n,m,i)
  integer m(n)
  if (i.eq.n+2) then
     read(5,*)mg
  elseif (i.eq.n+1) then
     mg=0
  else
     mg=m(i)
  endif
  return
  end

with comments, loading scaffold and so on (2435 characters):

      program c
      parameter (n=1024)        ! The size to use for memeory
      integer m(n)              ! represent the memory
c     Load a program into memory
      i=1
 1    read(5,*,end=2)l
c     write (6,*) "Load ",l," into location ",i
      m(i)=l
      i=i+1
      goto 1
c     Run the computer
 2    call o(n,m)
      stop
      end

      subroutine o(n,m)
c     Simulate a simple computer that supports only a single
c     instruction. Memory is a fixed size array of integers.
c
c     The supported instruction is subtract-branch-negative which we
c     give the assembly syntax
c
c     sbn i j k
c
c     and acts by subtracting the value in memeory location i from that
c     in location j and storing the result in j. If the result is
c     negative, the PC is set to k. Because there is only one opcode, it
c     is not represented, so a program consists simply of a series of
c     triplet (i,j,k), and the PC is generally incremented by 3. The
c     program counter starts at 1
c
c     Povisions for IO and halting are by way of special memory
c     location:
c
c     * PC=0 means halt
c     * writes (opcode argument j) to memory location n+1 send
c       output, reads from n+1 evaluate as 0
c     * reads (opcode argument i, j, k) from memory location n+2 fetch
c       input, writes to n+2 are forbidden
c     * All others reads and writes outside of memeory are forbidden

c     n                         ! the size of the computers memory 
      integer m(n)              ! an array representing the computers memory
      l=1;                      ! program counter
      do while (l.ne.0)
         i=m(l)
         j=m(l+1)
         k=m(l+2)
c        write (6,*) "Executing from PC=",l,": (",i,", ",j,", ", k,")"
c        handle the write special case for output
         mi=mg(n,m,i)
         mj=mg(n,m,j)
         if(j.eq.(n+1)) then
            write(6,*)mj-mi
         else
c           write (6,*) "Setting location ",j," to ",mj-mi
            m(l+1)=mj-mi
         endif
         if (m(l+1).lt.0) then
            l=mg(n,m,k)
         else
            l=l+3
         endif
      end do
      return
      end

c     Handle the various read special cases
      function mg(n,m,i)
      integer m(n)
      if (i.eq.n+2) then
         read(5,*)mg
      elseif (i.eq.n+1) then
         mg=0
      else
         mg=m(i)
      endif
c     write (6,*) "Read ",mg," from location ",i
      return
      end

Sample program:

13
1025
0

14 
1025
0

14
15
0

0
0
0

-1
1
0

resulting in output:

$ cat trivial.oisc | ./a.out 
           1
          -1

which is as expected.

It really is exceedingly straightforward code. The point here is not really how tightly I can code it, but how simple a language you need for Turing completeness.




回答9:


106-byte solution was posted to codegolf.com contest. It is written in perl and interprets Brainfuck. There is no way to review it at this point, afaik.

It is not my solution, but I believe it is shorter than current entries. Possible solution should be shorter, as there is no need to use Brainfuck as Turing-complete language.




回答10:


URM interpreter in CoffeeScript, 143 bytes (167 with new lines).

This version of URM has consists of an unbounded amount of registers, with zero, successor and jump operators. It is well known that this is turing-complete.

The URM program is written in the array c (commands), and the inputs are in the array r (registers). After the calculation the output is at r[0] and this value is displayed.

The interpreter, with sample program and input that computes 32+13 (and indeed outputs 45):

# Addition program, similar to http://planetmath.org/examplesofunlimitedregistermachines
c = [['j', 1, 2, 4], ['s', 0], ['s', 2], ['j', 1, 1, 0]]

# Input: 32, 13, thus the desired output is: 45
r = [32, 13]

k = 0
while k < c.length
    t = c[k]
    k += 1
    if t[0] == 'z'
        r[t[1]] = 0
    if t[0] == 's'
        if !r[t[1]]?
            r[t[1]] = 0
        r[t[1]] += 1
    if t[0] == 'j' && r[t[1]] == r[t[2]]
        k = t[3]
alert r[0]

Minified version:

k=0
while k<c.length
 t=c[k]
 k+=1
 if t[0]=='z'
  r[t[1]]=0
 if t[0]=='s'
  if !r[t[1]]?
   r[t[1]]=0
  r[t[1]]+=1
 if t[0]=='j'&&r[t[1]]==r[t[2]]
  k=t[3]
alert r[0]

What I really like in this code is that is really straightforward and easy to understand.




回答11:


Custom language: SLA
Keywords:
i - Interpret SLB q - Quit

Custom Language: SLB
Keywords:
cp("text") - Interpret text as C program

Example:
cp("#include \nint main() { puts(\"Hi!\n\");return 0}")

Interpreter (Written in SLA): i q

3 characters!



来源:https://stackoverflow.com/questions/1053931/creating-the-shortest-turing-complete-interpreter

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!