Question
This is a follow-up to my previous question, for which uzi provided an excellent answer. However, I have since noticed that a new company, Company3, also uses single data points, such as account 6000. This does not follow the pattern of the previous companies and makes uzi's recursive CTE inapplicable.
I believe this complication warrants a new question rather than an edit to my previous one, because it has a large impact on the solution.
I need to read data from an Excel workbook, where data is stored in this manner:
Company Accounts
Company1 (#3000...#3999)
Company2 (#4000..#4019)+(#4021..#4024)
Company3 (#5000..#5001)+#6000+(#6005..#6010)
Because some companies, such as Company3, have single account values such as #6000, I believe that in this step I need to create a result set that looks like this:
Company FirstAcc LastAcc
Company1 3000 3999
Company2 4000 4019
Company2 4021 4024
Company3 5000 5001
Company3 6000 NULL
Company3 6005 6010
I will then use this table and JOIN it with a table of only integers to get the appearance of the final table such as the one in my linked question.
Does anyone have any ideas?
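The final join described above, once the FirstAcc/LastAcc table exists, might look like the following sketch. The table variable @ranges and the ad hoc numbers CTE are hypothetical stand-ins (a permanent table of integers would serve equally well), and a NULL LastAcc is treated as a one-account range.

```sql
-- Hypothetical range table shaped like the desired result set above
declare @ranges table (Company varchar(100), FirstAcc int, LastAcc int null);
insert @ranges values
    ('Company2', 4000, 4019),
    ('Company2', 4021, 4024),
    ('Company3', 5000, 5001),
    ('Company3', 6000, null),
    ('Company3', 6005, 6010);

-- Ad hoc numbers 0..9999 built by cross joining four digit sets (an assumption;
-- replace with your own table of integers)
with digits as (
    select d from (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(d)
),
numbers as (
    select a.d + 10 * b.d + 100 * c.d + 1000 * e.d as n
    from digits a
    cross join digits b
    cross join digits c
    cross join digits e
)
select r.Company, n.n as Account
from @ranges r
join numbers n
    -- a NULL LastAcc means the range holds only FirstAcc
    on n.n between r.FirstAcc and isnull(r.LastAcc, r.FirstAcc)
order by r.Company, n.n;
```

With this join, the single account 6000 yields exactly one row, while each real range yields one row per account number it covers.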
Answer 1:
A good T-SQL splitter function makes this quite simple; I suggest DelimitedSplit8K. It will also perform significantly better than a recursive CTE. First, the sample data:
-- your sample data
if object_id('tempdb..#yourtable') is not null drop table #yourtable;
create table #yourtable (company varchar(100), accounts varchar(8000));
insert #yourtable values ('Company1','(#3000...#3999)'),
('Company2','(#4000..#4019)+(#4021..#4024)'),('Company3','(#5000..#5001)+#6000+(#6005..#6010)');
and the solution:
select
company,
firstAcc = max(case when split2.item not like '%)' then clean.Item end),
lastAcc = max(case when split2.item like '%)' then clean.Item end)
from #yourtable t
cross apply dbo.delimitedSplit8K(accounts, '+') split1
cross apply dbo.delimitedSplit8K(split1.Item, '.') split2
cross apply (values (replace(replace(split2.Item,')',''),'(',''))) clean(item)
where split2.item > ''
group by split1.Item, company;
Results:
company firstAcc lastAcc
--------- ---------- --------------
Company1 #3000 #3999
Company2 #4000 #4019
Company2 #4021 #4024
Company3 #6000 NULL
Company3 #5000 #5001
Company3 #6005 #6010
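Note that the values above still carry the # prefix and are strings. If integer columns are needed, a variant along the following lines may work. This is only a sketch: it swaps DelimitedSplit8K for the built-in STRING_SPLIT, which assumes SQL Server 2016 or later (database compatibility level 130 or higher), and the TRY_CAST clean-up step is my own addition rather than part of the original answer.

```sql
-- Sample data copied from the answer above
if object_id('tempdb..#yourtable') is not null drop table #yourtable;
create table #yourtable (company varchar(100), accounts varchar(8000));
insert #yourtable values
    ('Company1', '(#3000...#3999)'),
    ('Company2', '(#4000..#4019)+(#4021..#4024)'),
    ('Company3', '(#5000..#5001)+#6000+(#6005..#6010)');

select
    t.company,
    -- the item without a closing parenthesis is the start of a range (or a single account)
    FirstAcc = max(case when s2.value not like '%)' then clean.acc end),
    -- the item ending in ')' is the end of a range
    LastAcc  = max(case when s2.value like '%)' then clean.acc end)
from #yourtable t
cross apply string_split(t.accounts, '+') s1   -- one row per group
cross apply string_split(s1.value, '.') s2     -- split each group on the dots
cross apply (values (try_cast(
    replace(replace(replace(s2.value, '(', ''), ')', ''), '#', '') as int))) clean(acc)
where s2.value > ''                            -- drop the empty items between dots
group by t.company, s1.value
order by t.company, FirstAcc;
```

As in the original query, a single account must appear without parentheses (for example #6000) to end up in FirstAcc with a NULL LastAcc.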
Answer 2:
I assume that a list such as (#6005..#6010) is actually represented as #6005#6006#6007#6008#6009#6010 in your Excel file. If that is true and the ranges contain no gaps, try this query:
with cte as (
select
company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts
from
(values ('company 1','#3000#3001#3002#3003'),('company 2','(#4000#4001)+(#4021#4022)'),('company 3','(#5000#5001)+#6000+(#6005#6006)')) data(company, accounts)
)
, rcte as (
select
company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
from
cte
cross apply (select charindex('#', accounts) ind1) ca
cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
union all
select
company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
from
rcte
cross apply (select charindex('#', acc) ind1) ca
cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
where
len(acc)>1
)
select
company, min(accounts) FirstAcc, case when max(accounts) =min(accounts) then null else max(accounts) end LastAcc
from (
select
company, accounts, accounts - row_number() over (partition by company order by accounts) group_
from
rcte
) t
group by company, group_
option (maxrecursion 0)
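The grouping step in the query above relies on what is commonly called the gaps-and-islands technique: subtracting a row number from each account gives the same value for every member of a consecutive run. A minimal standalone sketch, using made-up inline values rather than the parsed Excel data:

```sql
-- Consecutive numbers share the same (n - row_number) value, so that
-- difference can serve as a group key for each run
select
    min(n) as FirstAcc,
    nullif(max(n), min(n)) as LastAcc   -- NULL when the run holds a single account
from (
    select n, n - row_number() over (order by n) as grp
    from (values (5000), (5001), (6000), (6005), (6006)) v(n)
) t
group by grp
order by FirstAcc;
-- grp values: 5000 and 5001 -> 4999, 6000 -> 5997, 6005 and 6006 -> 6001
-- rows returned: (5000, 5001), (6000, NULL), (6005, 6006)
```

Here NULLIF plays the same role as the CASE expression in the full query.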
Answer 3:
I made a few edits to @uzi's solution from the other question: I added three more CTEs and used window functions such as LEAD() and ROW_NUMBER() to solve the problem. I don't know whether there is a simpler solution, but I think this one works well.
with cte as (
select
company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts
from
(values ('company 1','#3000..#3999'),('company 2','(#4000..#4019)+(#4021..#4024)'),('company 3','(#5000..#5001)+#6000+(#6005..#6010)')) data(company, accounts)
)
, rcte as (
select
company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
from
cte
cross apply (select charindex('#', accounts) ind1) ca
cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
union all
select
company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
from
rcte
cross apply (select charindex('#', acc) ind1) ca
cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
where
len(acc)>1
) ,cte2 as (
select company, accounts as accounts_raw, Replace( accounts,'..','') as accounts,
LEAD(accounts) OVER(Partition by company ORDER BY accounts) ld,
ROW_NUMBER() OVER(ORDER BY accounts) rn
from rcte
) , cte3 as (
Select company,accounts,ld ,rn
from cte2
WHERE ld not like '%..'
) , cte4 as (
select * from cte3 where accounts not in (select ld from cte3 t1 where t1.rn < cte3.rn)
)
SELECT company,accounts,ld from cte4
UNION
SELECT DISTINCT company,ld,NULL from cte3 where accounts not in (select accounts from cte4 t1)
option (maxrecursion 0)
Answer 4:
It looks like you tagged SSIS, so I will provide a solution for that using a Script Component. All the other examples require loading into a staging table.
- Use your normal reader (Excel probably) and load
- Add a script transformation component
- Edit Component
- Input Columns - Check both Company and Accounts
- Input and Output - Add a new Output and call it CompFirstLast
- Add three columns to it - Company string, First int, and Last int
Open Script and paste the following code
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Create an array for each group to create rows out of by splitting on '+'
    string[] SplitForRows = Row.Accounts.Split('+'); // Note single quotes denoting char

    // Deal with each group and create the new Output
    for (int i = 0; i < SplitForRows.Length; i++) // Loop each split column
    {
        CompFirstLastBuffer.AddRow();
        CompFirstLastBuffer.Company = Row.Company; // This is static for each incoming row

        // Clean up the string getting rid of (). and leaving a delimited list of #
        string accts = SplitForRows[i].Replace("(", String.Empty).Replace(")", String.Empty).Replace(".", String.Empty).Substring(1);

        // Split into Array
        string[] accounts = accts.Split('#');

        // Write out first and last and handle null
        CompFirstLastBuffer.First = int.Parse(accounts[0]);
        if (accounts.Length == 1)
            CompFirstLastBuffer.Last_IsNull = true;
        else
            CompFirstLastBuffer.Last = int.Parse(accounts[1]);
    }
}
Make sure you use the right output.
Source: https://stackoverflow.com/questions/47972115/find-lowest-and-highest-values-split-into-rows-from-a-single-string-of-concatena