String search using symbol tables in Robert Sedgewick's book

Question

I am reading Algorithms in C++ by Robert Sedgewick. The following text is taken from chapter 12, on symbol tables.

This program assumes that Item.cxx defines a char* data representation for string keys in items, an overloaded operator< that uses strcmp, an overloaded operator== that uses strncmp, and a conversion operator from Item to char* (see text). The main program reads a text string from a specified file and uses a symbol table to build an index from the strings defined by starting at each character in the text string. Then, it reads query strings from standard input, and prints the position where the query is found in the text (or prints not found). With a BST symbol-table implementation, the search is fast, even for huge strings.

#include <iostream.h>
#include <fstream.h>
#include "Item.cxx"
#include "ST.cxx"
static char text[maxN];
int main(int argc, char *argv[])
  { int N = 0; char t;
    ifstream corpus; corpus.open(*++argv);
    while (N < maxN && corpus.get(t)) text[N++] = t;
    text[N] = 0;
    ST<Item, Key> st(maxN);
    for (int i = 0; i < N; i++) st.insert(&text[i]);
    char query[maxQ]; Item x, v(query);
    while (cin.getline(query, maxQ))
      if ((x = st.search(v.key())).null())
           cout << "not found: " << query << endl;
      else cout << x-text << ": " << query << endl; // Question here.
  }

Above program reads a series of queries from standard input, uses search to determine whether each query is in the text, and prints out the text position of the first occurrence of the query. If the symbol table is implemented with BSTs, then we expect that the search will involve about 2N ln N comparisons. For example, once the index is built, we could find any phrase in a text consisting of about 1 million characters (such as Moby Dick) with about 30 string comparisons. This application is the same as indexing, because C string pointers are indices into a character array: If x points to text[i], then the difference between the two pointers, x-text, is equal to i.

My questions on the above text are:

1. How can the above program find a string in the symbol table when the author appears to store only a char? Is there a bug in the program?
2. The commentary says, "If x points to text[i], then the difference between the two pointers, x-text, is equal to i." How did the author conclude this?
3. The text says, "we could find any phrase in a text consisting of about 1 million characters (such as Moby Dick) with about 30 string comparisons." How did the author conclude that a phrase can be found with about 30 string comparisons when the search is said to need about 2N ln N comparisons?
4. If anyone has access to the book: how did the author draw the BST in "Figure 12.11. Example of indexing a text string", and how is it linked to the above program?

Thanks!

Answer 1:

I do not have the text, but I accessed most of the relevant parts through Google Books.

It is true that text is an array of characters. However, when the search tree st is built, what is inserted are character pointers (st.insert(&text[i])). You can think of char* as the type of a C string: each pointer &text[i] denotes the null-terminated string that starts at position i and runs to the end of the text. So the search tree operates on strings, not characters.
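To make this concrete, here is a minimal sketch of the same idea. It is not the book's code: std::set stands in for the book's ST, and the names CStrLess and SuffixIndex are made up for illustration. Ordering uses strcmp and the final match check uses strncmp, mirroring the operators the book describes for Item.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>

// Orders C strings by their characters (strcmp), not by pointer value.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Hypothetical stand-in for the book's ST: an ordered set of suffix pointers.
class SuffixIndex {
public:
    explicit SuffixIndex(const char* text) : text_(text) {
        assert(text != nullptr);
        // Insert a pointer to every position; each one denotes the
        // null-terminated string that starts there.
        for (const char* p = text; *p != '\0'; ++p) suffixes_.insert(p);
    }

    // Returns the offset of one occurrence of query, or -1 if there is none.
    std::ptrdiff_t find(const char* query) const {
        // Smallest suffix that is not less than the query.
        auto it = suffixes_.lower_bound(query);
        // It matches only if the query is a prefix of that suffix.
        if (it != suffixes_.end() &&
            std::strncmp(query, *it, std::strlen(query)) == 0)
            return *it - text_;
        return -1;
    }

private:
    const char* text_;
    std::set<const char*, CStrLess> suffixes_;
};
```

For the text "abracadabra", find("cad") returns 4 and find("xyz") returns -1. Note that find("bra") returns 8 rather than 1: this sketch reports the lexicographically smallest matching suffix, which is not necessarily the earliest occurrence.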

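That x pointing to text[i] implies x - text == i follows from C pointer arithmetic: subtracting two pointers into the same array yields the difference of their indices.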

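#include <cassert>
#include <cstddef>

int main() {
    char text[1];
    int i = 0;
    char* x = &text[i];            // x points to text[i]
    std::ptrdiff_t ii = x - text;  // pointer difference, counted in elements
    assert(i == ii);
}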
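In the book's program x is an Item, not a raw pointer, so the subtraction relies on the conversion from Item to char* that the quoted passage mentions. Below is a rough sketch of that step, assuming a much simplified Item; the struct layout and the helper offsetIn are hypothetical, and the conversion is written out explicitly here rather than left implicit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for the book's Item: it only wraps a char* key.
struct Item {
    char* key;
    operator char*() const { return key; }  // conversion from Item to char*
};

// Offset of the character an Item points at, relative to the start of text.
std::ptrdiff_t offsetIn(char* text, const Item& x) {
    assert(text != nullptr && x.key != nullptr);
    // Convert the Item to char*, then subtract the array's base address.
    return static_cast<char*>(x) - text;
}
```

If text holds "moby dick" and the Item wraps &text[5], offsetIn returns 5, which is the position the program prints.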

"2N ln N" is a typo. If you refer back to Property 12.6, you will see that it should read "2 ln N", which for N equal to one million is 27.63, or about 30.
I could not access Figure 12.11 through Google Books, so I cannot answer this one.
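As a quick check of the "about 30" figure, the 2 ln N estimate can be evaluated directly. The function name below is made up for illustration, and the formula is the corrected reading of Property 12.6 given above, not something verified against the printed book.

```cpp
#include <cassert>
#include <cmath>

// Estimated comparisons per search in a BST built from n random keys,
// using the 2 ln n approximation.
double expectedSearchComparisons(double n) {
    assert(n >= 1.0);
    return 2.0 * std::log(n);  // std::log is the natural logarithm
}
```

For n = 1,000,000 this evaluates to roughly 27.63, which the book rounds to about 30 string comparisons.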

Source: https://stackoverflow.com/questions/20973625/string-search-using-symbol-tables-in-robert-sedwick-book

Tags

c++

algorithm

binary-search-tree