TSQL: Find a date with varying characters in a string

不问归期 提交于 2019-12-11 07:49:09

问题


I need to find a continuous date in a string from column name Filename. The string has other numbers in it with dashes(or another character, like an underscore), but I only need the continuous number

The Date needs to be extracted from the filename. (I know the data is just wow, multiple vendors, multiple file naming formats is the cause.)

This question is similar to this question, but it's looking for something different with a different requirement: TSQL: Find a continuous number in a string

Desired result:

Actual Result:

Test Code:

DROP TABLE #dob

CREATE TABLE #dob (
 FILENAME VARCHAR(MAX)
,StudentID INT
)

INSERT INTO #dob
( FILENAME  )
VALUES
 ('Smith John D, 11-23-1980, 1234567.pdf')
,('Doe Jane, _01_22_1980_123456.pdf')
,('John Doe, 567891.pdf' )

--This is what I tried.

SELECT FILENAME
, substring(FileName, patindex('%[0-9][%-%][%_%][0-9][0-9][0-9][0-9][0-9]%', FileName), 8) AS dob
FROM #dob

回答1:


I don't think you have the pattern quite right. Also, you can use a CASE expression to return NULL:

SELECT FILENAME,
       (CASE WHEN FileName LIKE '%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%'
             THEN substring(FileName, patindex('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', FileName), 10)
        END) AS dob
FROM #dob;

You can also dispense with the CASE and use NULLIF():

   substring(FileName, NULLIF(patindex('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', FileName), 0), 10) as dob



回答2:


Try it like this:

DROP TABLE #StuID
GO
CREATE TABLE #StuID (
 FILENAME VARCHAR(MAX)
,StudentID INT
)

INSERT INTO #StuID
( FILENAME  )
VALUES
 ('Smith John D, 11-23-1980, 1234567.pdf')
,('Doe Jane, _01_22_1980_123456.pdf')
,('John Doe, 567891.pdf' );

WITH Casted([FileName],ToXml) AS
(
    SELECT [FILENAME] 
          ,CAST('<x>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE([FILENAME],'.',' '),'-',' '),'_',' '),',',' '),' ','</x><x>') + '</x>' AS XML)
    FROM #StuID
)
SELECT [FileName] 
      ,numberFragments.value('/x[.>=1 and .<=31][1]','int') AS MonthFragment --using <=12 might bring back the second fragment twice...
      ,numberFragments.value('/x[.>=1 and .<=31][2]','int') AS DayFragment
      ,numberFragments.value('/x[.>=1960 and .<=2050][1]','int') AS YearFragment
      ,numberFragments.value('/x[.>=100000 and .<=10000000][1]','int') AS StudId
FROM Casted
CROSS APPLY (SELECT ToXml.query('/x[not(empty(. cast as xs:int?))]')) A(numberFragments);

The idea in short:

As in the previous answer we will break the string to a XML and filter for fragments castable to int.
The magic ist the XQuery-filtering:

  • We pick the first fragment between 1 and 31, which is the month hopefully
  • We pick the second fragment between 1 and 31 which is the day hopefully
  • We pick the first fragment between 1960 and 2050 which is the year hopefully
  • And we pick the student's id, which is the first fragment between 100000 and 10000000.

Hint: It looks like a nice idea to use <=12 for the month fragment, but I'd use the same filter for day and month to make sure, that we pick the first and the second fragment of the same value region...




回答3:


Another method would be (after using PATINDEX to find the date) is force to string's format to MM/dd/yyyy and then use an explicit style for the conversion:

SELECT *,
       TRY_CONVERT(date,STUFF(STUFF(SUBSTRING(d.FILENAME,V.I, 10),3,1,'/'),6,1,'/'),101)
FROM #dob d
     CROSS APPLY (VALUES(NULLIF(PATINDEX('%[0-9][0-9]_[0-9][0-9]_[0-9][0-9][0-9][0-9]%',d.[FILENAME]),0))) V(I);


来源:https://stackoverflow.com/questions/57374764/tsql-find-a-date-with-varying-characters-in-a-string

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!