问题
I am creating a script that for "merging" and deleting duplicate rows from a table. The table contains address information, and uses an integer field for storing information about the email as bit flags (column name lngValue). For example, lngValue & 1 == 1 means its the primary address.
There are instances of the same email being entered twice, but sometimes with different lngValues. To resolve this, I need to take the lngValue from all duplicates and assign them to one surviving record and delete the rest.
My biggest headache so far as been with the "merging" of the records. What I want to do is bitwise or all lngValues of duplicate records together. Here is what I have so far, which only finds the value of all lngValues bitwise or'ed together.
Warning: messy code ahead
declare @duplicates table
(
lngInternetPK int,
lngContactFK int,
lngValue int
)
insert into @duplicates (lngInternetPK, lngContactFK, lngValue)
(
select tblminternet.lngInternetPK, tblminternet.lngContactFK, tblminternet.lngValue from tblminternet inner join
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On tblminternet.strAddress = secondemail.strAddress and
tblminternet.lngcontactfk = secondemail.lngcontactfk
where count > 1 and tblminternet.strAddress is not null and tblminternet.lngValue & 256 <> 256 --order by lngContactFK, strAddress
)
update @duplicates set lngValue = t.val
from
(select (sum(dupes.lngValue) & 65535) as val from
(select here.lngInternetPK, here.lngContactFK, here.lngValue from tblminternet here inner join
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On here.strAddress = secondemail.strAddress and
here.lngcontactfk = secondemail.lngcontactfk
where count > 1 and here.strAddress is not null and here.lngValue & 256 <> 256) dupes, tblminternet this
where this.lngContactFK = dupes.lngContactFK
) t
where lngInternetPK in (select lngInternetPK from @duplicates)
Edit:
As requested here is some sample data:
Table Name: tblminternet
Column Names:
lngInternetPK
lngContactFK
lngValue
strAddress
Example row 1:
lngInternetPK: 1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"
Example row 2:
lngInternetPK: 2
lngContactFK: 1
lngValue: 40
strAddress: "me@myaddress.com"
If these two were merged here is the desired result:
lngInternetPK: 1
lngContactFK: 1
lngValue: 41
strAddress: "me@myaddress.com"
Other necessary rules:
Each contact can have multiple emails, but each email row must be distinct ( each email can only appear as one row).
回答1:
SQL Server
lacks native bitwise aggregates, that's why we need to emulate them.
The main idea here is to generate a set of bits from 0
to 15
, for each bit apply the bitmask to the value and select MAX
(which will give us an OR
for a given bit), then select the SUM
(which will merge the bit masks).
The we just update the first lngInternetPK
for any given (lngContactFK, strValue)
with the new value of lngValue
, and delete all duplicates.
;WITH bits AS
(
SELECT 0 AS b
UNION ALL
SELECT b + 1
FROM bits
WHERE b < 15
),
v AS
(
SELECT i.*,
(
SELECT SUM(value)
FROM (
SELECT MAX(lngValue & POWER(2, b)) AS value
FROM tblmInternet ii
CROSS JOIN
bits
WHERE ii.lngContactFK = i.lngContactFK
AND ii.strAddress = i.strAddress
GROUP BY
b
) q
) AS lngNewValue
FROM (
SELECT ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
FROM tblmInternet ii
) i
WHERE rn = 1
)
UPDATE v
SET lngValue = lngNewValue;
;WITH v AS
(
SELECT ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
FROM tblmInternet ii
)
DELETE v
WHERE rn > 1
See this article in my blog for more detailed explanations:
- SQL Server: aggregate bitwise OR
回答2:
I believe the following query gets you what you want. This routine assumes a max of two duplicate addresses per contact. If there's more than one dup per contact, the query will have to be modified. I hope this helps.
Declare @tblminternet
Table
( lngInternetPK int,
lngContactFK int,
lngValue int,
strAddress varchar(255)
)
Insert Into @tblminternet
select 1, 1, 33, 'me@myaddress.com'
union
select 2, 1, 40, 'me@myaddress.com'
union
select 3, 2, 33, 'me@myaddress2.com'
union
select 4, 2, 40, 'me@myaddress2.com'
union
select 5, 3, 2, 'me@myaddress3.com'
--Select * from @tblminternet
Select Distinct
A.lngContactFK ,
A.lngValue | B.lngValue as 'Bitwise OR',
A.strAddress
From @tblminternet A, @tblminternet B
Where A.lngContactFK = B.lngContactFK
And A.strAddress = B.strAddress
And A.lngInternetPK != B.lngInternetPK
回答3:
You can create SQL Server Aggregate functions in .NET that you can then implement in SQL server inline. I think this requires a minimum of SQL server 2005 and Visual Studio 2010. I did one using Visual Studio 2013 Community Edition (free even for commercial use) for use with .NET 2 and SQL Server 2005.
See the MSDN article: https://msdn.microsoft.com/en-us/library/91e6taax(v=vs.90).aspx
First you'll need to enable the CLR feature in SQL server: https://msdn.microsoft.com/en-us/library/ms131048.aspx
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO
- Create a SQL Server -> SQL Server Database Project
- Right-click on the new project and select Properties
- Configure the targeted SQL Server version under Project Settings
- Configure the targeted CLR language under SQL CLR (such as VB)
- Right-click on the new project and select Add -> New Item...
- When the dialog pops up, select SQL Server -> SQL CLR VB -> SQL CLR VB Aggregate
Now you can write your bitwise code in VB:
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
<Serializable()> _
<Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)> _
Public Structure AggregateBitwiseOR
Private CurrentAggregate As SqlTypes.SqlInt32
Public Sub Init()
CurrentAggregate = 0
End Sub
Public Sub Accumulate(ByVal value As SqlTypes.SqlInt32)
'Perform Bitwise OR against aggregate memory
CurrentAggregate = CurrentAggregate OR value
End Sub
Public Sub Merge(ByVal value as AggregateBitwiseOR)
Accumulate(value.Terminate())
End Sub
Public Function Terminate() As SqlInt32
Return CurrentAggregate
End Function
End Structure
Now deploy it: https://msdn.microsoft.com/en-us/library/dahcx0ww(v=vs.90).aspx
- Build the project using the menu bar: Build -> Build ProjectName (if the build fails with error 04018 then download a new version of the data tools @ http://msdn.microsoft.com/en-US/data/hh297027 or by going to the menu bar: Tools -> Extensions And Updates, then under updates select update for Microsoft SQL Server Update For Database Tooling)
- Copy your compiled DLL to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn and to C:\
Register the DLL:
CREATE ASSEMBLY [CLRTools] FROM ‘c:CLRTools.dll’ WITH PERMISSION_SET = SAFE
Create the aggregate in SQL:
CREATE AGGREGATE [dbo].[AggregateBitwiseOR](@value INT) RETURNS INT EXTERNAL NAME [CLRTools].[CLRTools.AggregateBitwiseOR];
If you get the error "Incorrect syntax near 'EXTERNAL'" then change the database compatibility level using following commands:
For SQL Server 2005: EXEC sp_dbcmptlevel 'DatabaseName', 90
For SQL Server 2008: EXEC sp_dbcmptlevel 'DatabaseName', 100
Test your code:
SELECT dbo.AggregateBitwiseOR(Foo) AS Foo FROM Bar
I found this article helpful: http://www.codeproject.com/Articles/37377/SQL-Server-CLR-Functions
来源:https://stackoverflow.com/questions/1112382/update-column-to-be-different-aggregate-values