I\'m trying to sort a list of titles, but currently there\'s a giant block of titles which start with \'The \'. I\'d like the \'The \' to be ignored, and the sort to work o
Ways that will only remove the first The
:
=SUBSTITUTE(A1,"The ","",1) OR more reliably:
=IF(IF(LEFT(A1,4)="The ",TRUE)=TRUE,RIGHT(A1,(LEN(A1)-4)),A1)
Second one is basically saying if the first left digit equals The
, then check how many digits are in the cell, and show only the the right hand digits excluding The
.
For SQLite
ORDER BY CASE WHEN LOWER(SUBSTR(title,1,4)) = 'the ' THEN SUBSTR(title,5) ELSE title END ASC
You can certainly arrange dynamically strip off 'The', though you'll soon find that you have to deal with 'A' and 'An' (except for the special case of titles like "A is for Alibi"). When "foreign" films enter the mix, you'll need to cope with "El" and "La" (except for that pesky edge case, "LA Story"). Then mix in some German films, and you'll need to cope with 'Der' and 'Die' (except for that pesky set of 'Die Hard' edge cases). See the pattern? You're headed down a path that keeps getting longer and more pitted with special cases.
The way forward on this that avoids an ever-growing set of special cases is to store the title as you want it display and store the title as you want it sorted.
Something like:
ORDER BY IF(LEFT(title,2) = "A ",
SUBSTRING(title FROM 3),
IF(LEFT(title,3) = "An ",
SUBSTRING(title FROM 4),
IF(LEFT(title,4) = "The ",
SUBSTRING(title FROM 5),
title)))
But given the overhead of doing this more than a few times, you're really better off storing the title sort value in another column...
Best is to have a computed column to do this, so that you can index the computed column and order by that. Otherwise, the sort will be a lot of work.
So then you can have your computed column as:
CASE WHEN title LIKE 'The %' THEN stuff(title,1,4,'') + ', The' ELSE title END
Edit: If STUFF isn't available in MySQL, then use RIGHT or SUBSTRING to remove the leading 4 characters. But still try to use a computed column if possible, so that indexing can be better. The same logic should be applicable to rip out "A " and "An ".
Rob
I think you could do something like
ORDER BY REPLACE(TITLE, 'The ', '')
although this would replace any occurrence of 'The ' with '', not just the first 'The ', although I don't think this would affect very much.