I have a large database for solving crossword puzzles, where each entry consists of a word and a description. My application allows searching for words with a specific length and specific characters.
Since you are using a database, create a Suffixes table.
For example:
Suffix         | WordID | SN
---------------+--------+----
StackOverflow  | 10     | 1
tackOverflow   | 10     | 2
ackOverflow    | 10     | 3
ckOverflow     | 10     | 4
kOverflow      | 10     | 5
...
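A minimal sketch of how those rows could be generated, in Python. The function name `suffix_rows` is hypothetical, and SN is taken to be the 1-based position at which the suffix starts, as in the table:

```python
def suffix_rows(word_id, word):
    """Return (suffix, word_id, sn) tuples for every suffix of `word`.

    sn is the 1-based position in `word` where the suffix starts.
    """
    return [(word[i:], word_id, i + 1) for i in range(len(word))]


for row in suffix_rows(10, "StackOverflow")[:3]:
    print(row)
# ('StackOverflow', 10, 1)
# ('tackOverflow', 10, 2)
# ('ackOverflow', 10, 3)
```

A word of length n produces n rows, which is where the extra storage of this approach comes from.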
Here SN is the position in the word at which the suffix starts. With that table it's easy to get all words that have a particular character at a specific position, like this:
SELECT WordID FROM suffixes
WHERE suffix >= 't' AND suffix < 'u' AND SN = 2  -- all words that contain 't' at position 2
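To see the range query in action, here is a small self-contained sketch using Python's built-in `sqlite3` module with an in-memory database. The sample words and the `idx_sn_suffix` index are assumptions for illustration. The condition `suffix >= 't' AND suffix < 'u'` matches exactly the suffixes that start with 't', and because it is a range rather than a function call, an index on (SN, suffix) can serve it directly.

```python
import sqlite3

# Hypothetical sample data; WordID 10 matches the example table.
words = {10: "StackOverflow", 11: "Stack", 12: "Apple"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suffixes (suffix TEXT, WordID INTEGER, SN INTEGER)")
# Composite index: equality on SN, then a range on suffix.
conn.execute("CREATE INDEX idx_sn_suffix ON suffixes (SN, suffix)")

for word_id, word in words.items():
    # One row per suffix: (suffix, WordID, 1-based start position).
    conn.executemany(
        "INSERT INTO suffixes VALUES (?, ?, ?)",
        [(word[i:], word_id, i + 1) for i in range(len(word))],
    )

# Words whose character at position 2 is 't'.
rows = conn.execute(
    "SELECT WordID FROM suffixes "
    "WHERE suffix >= 't' AND suffix < 'u' AND SN = 2 "
    "ORDER BY WordID"
).fetchall()
print([r[0] for r in rows])  # [10, 11]
```

StackOverflow (10) and Stack (11) both have 't' at position 2; Apple (12) has 'p' there and is not returned.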
Update: if you want to save space at the cost of some speed, you can use a suffix array.
You can store all the words in a single string (array) with a separator between them, e.g. $, and build a suffix array over it: an array of pointers (offsets) to every character, sorted by the suffix that starts there. Given a character c, you can then find all occurrences of c rather quickly, because they form one contiguous block of the suffix array. You'll still have to check whether each occurrence is at the right position in its word (by checking how far it is from the preceding $).
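A rough in-memory sketch of the suffix-array variant, in Python. The helper names are hypothetical, and the construction simply sorts all suffix offsets, which is far too slow for a large dictionary; a real implementation would use a proper suffix-array construction algorithm. What it illustrates is the lookup: a binary search finds the block of suffixes starting with c, and each hit is kept only if its distance from the preceding $ matches the wanted position.

```python
from bisect import bisect_right


def build(words):
    """Join words with '$' separators and build a naive suffix array."""
    text = "$" + "$".join(words) + "$"
    # Offset at which each word starts (the character right after a '$').
    starts, offset = [], 1
    for w in words:
        starts.append(offset)
        offset += len(w) + 1
    # Naive construction: sort all suffix start offsets lexicographically.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, starts, sa


def lower_bound(text, sa, pred):
    """Binary search: first index in sa whose suffix's first char satisfies pred."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(text[sa[mid]]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def words_with_char_at(words, text, starts, sa, c, pos):
    """Return the words that have character c (not '$') at 1-based position pos."""
    lo = lower_bound(text, sa, lambda ch: ch >= c)
    hi = lower_bound(text, sa, lambda ch: ch > c)
    result = set()
    for p in sa[lo:hi]:  # every occurrence of c in the text
        w = bisect_right(starts, p) - 1  # index of the word containing offset p
        if p - starts[w] + 1 == pos:  # distance from the preceding '$'
            result.add(words[w])
    return sorted(result)


words = ["StackOverflow", "Stack", "Apple"]
text, starts, sa = build(words)
print(words_with_char_at(words, text, starts, sa, "t", 2))  # ['Stack', 'StackOverflow']
```

The suffix array stores one integer offset per character instead of a full suffix string per character, which is where the space saving over the table comes from.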
With the above technique, the search will probably be around 10 times faster than scanning all the words as your original program does.
Update 2: I originally used the database approach in one of my utilities, where I needed to locate suffixes such as "ne", and I forgot to adapt (optimize) it for this specific problem.
For this problem you can just store a single character instead of the whole suffix:
Suffix | WordID | SN
-------+--------+----
S      | 10     | 1
t      | 10     | 2
a      | 10     | 3
c      | 10     | 4
k      | 10     | 5
...
which saves a lot of space. The query then becomes:
SELECT WordID FROM suffixes
WHERE suffix = 't' AND SN = 2
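The single-character variant can be sketched the same way with Python's `sqlite3` module. The sample data and the `idx_sn_suffix` index are assumptions for illustration. Each row now holds one character, so the lookup is a plain equality match:

```python
import sqlite3

# Hypothetical sample data; WordID 10 matches the example table.
words = {10: "StackOverflow", 11: "Stack", 12: "Apple"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suffixes (suffix TEXT, WordID INTEGER, SN INTEGER)")
# Composite index so both equality conditions can be served by one lookup.
conn.execute("CREATE INDEX idx_sn_suffix ON suffixes (SN, suffix)")

for word_id, word in words.items():
    # One row per character: (character, WordID, 1-based position).
    conn.executemany(
        "INSERT INTO suffixes VALUES (?, ?, ?)",
        [(ch, word_id, i) for i, ch in enumerate(word, start=1)],
    )

rows = conn.execute(
    "SELECT WordID FROM suffixes WHERE suffix = 't' AND SN = 2 ORDER BY WordID"
).fetchall()
print([r[0] for r in rows])  # [10, 11]
```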