Addresses stored in SQL Server have many small variations (errors)

一曲冷凌霜 submitted on 2019-12-08 07:06:30

You need to use subqueries in the SELECT statement. Try this query:

    -- For each CompanyCode, pick the most frequent value of every column.
    select CompanyCode,
        (select top 1 CompanyName from Table1 where CompanyCode=X.CompanyCode
         group by CompanyName order by count(*) desc) CompanyName,
        (select top 1 Addr1 from Table1 where CompanyCode=X.CompanyCode
         group by Addr1 order by count(*) desc) Addr1,
        (select top 1 City from Table1 where CompanyCode=X.CompanyCode
         group by City order by count(*) desc) City,
        (select top 1 State from Table1 where CompanyCode=X.CompanyCode
         group by State order by count(*) desc) State,
        (select top 1 Zip from Table1 where CompanyCode=X.CompanyCode
         group by Zip order by count(*) desc) Zip
    from Table1 X
    group by CompanyCode
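
To see what this kind of query returns, here is a small sketch that runs an equivalent query against an in-memory SQLite database from Python. It is not the original T-SQL: SQLite has no `TOP 1`, so `LIMIT 1` stands in for it, only two of the columns are included, and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: several spellings for one company code.
rows = [
    ("A1", "Acme Corp", "12 Main St"),
    ("A1", "Acme Corp", "12 Main Street"),
    ("A1", "Acme Corp.", "12 Main St"),
    ("B2", "Bolt Ltd", "9 Oak Ave"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (CompanyCode TEXT, CompanyName TEXT, Addr1 TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)

# Same shape as the query above; LIMIT 1 replaces T-SQL's TOP 1.
query = """
SELECT CompanyCode,
    (SELECT CompanyName FROM Table1 WHERE CompanyCode = X.CompanyCode
     GROUP BY CompanyName ORDER BY COUNT(*) DESC LIMIT 1) AS CompanyName,
    (SELECT Addr1 FROM Table1 WHERE CompanyCode = X.CompanyCode
     GROUP BY Addr1 ORDER BY COUNT(*) DESC LIMIT 1) AS Addr1
FROM Table1 X
GROUP BY CompanyCode
ORDER BY CompanyCode
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that each column is chosen independently, so the returned row can be a combination of values that never appeared together in any stored row. When two values are equally frequent, which one wins is not defined unless you add a second ORDER BY term.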

You are going to struggle. Personally, I'd consider having a process that goes through the data in the database and corrects it.

You could change the system that inputs the data (or, if that's not possible, have an external process that runs over the data once it's in the db) so that it does something like the following:

  1. Checks values against known lists of things like towns/states/countries etc. to catch typos.
  2. Checks for known regular mistakes and abbreviations, e.g. "Avenue"/"Ave." or "Street"/"St.", and normalises the values.
  3. Does this kind of validation at input time and/or provides the users with an address search/validation UI that allows them to search for an address given some known values (zip/postal code etc.). You can buy data like this from various suppliers depending on where you are in the world.
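
The first two steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ABBREVIATIONS` and `KNOWN_CITIES` tables, the function names, and the 0.8 similarity cutoff are all made up here, and a real process would load its reference data from a purchased or curated source.

```python
import difflib

# Hypothetical lookup tables; real reference data would be far larger.
ABBREVIATIONS = {"ave": "Avenue", "st": "Street", "rd": "Road"}
KNOWN_CITIES = ["Springfield", "Portland", "Chicago"]

def normalise_street(addr):
    """Expand known abbreviations word by word, e.g. 'St.' -> 'Street'."""
    words = []
    for word in addr.split():
        key = word.rstrip(".,").lower()
        words.append(ABBREVIATIONS.get(key, word))
    return " ".join(words)

def correct_city(city, cutoff=0.8):
    """Return the closest known city, or None to flag the value for review."""
    candidate = city.strip().title()
    matches = difflib.get_close_matches(candidate, KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(normalise_street("12 Main St."))
print(correct_city("Sprngfield"))
print(correct_city("Gotham"))
```

Values for which `correct_city` returns `None` are the ones to route to a person rather than auto-correct. A word-level map like this is also naive: "St" can mean "Saint" as well as "Street", so ambiguous abbreviations are better flagged than rewritten.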

If this all works (I doubt you'll get 100% unless you provide a mechanism whereby those things that can't be auto-corrected are flagged for human intervention), then your reporting is as simple as SELECT DISTINCT...

Is it a one-time job, I hope? It's impossible unless you can explain (in SQL terms) why the first record is the one you need. As a temporary solution I'd suggest the following query:

    -- Keep the rows whose CompanyName is the alphabetically smallest one for their CompanyCode.
    select C1.*
    from Company C1
    join (select CompanyCode, min(CompanyName) as CompanyNameSelected
          from Company
          group by CompanyCode) C2
      on C1.CompanyCode = C2.CompanyCode
     and C1.CompanyName = C2.CompanyNameSelected;

You could use any aggregate function instead of min (as long as it returns a CompanyName), or even write your own stored function. The one thing that is required is that you express in the query language why record #1 is better than record #2.
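
As a quick check of how the min-based rule behaves, the sketch below runs the same query shape against an in-memory SQLite database from Python. The `Addr1` column and the sample rows are assumptions made for the example; only `CompanyCode` and `CompanyName` come from the query above.

```python
import sqlite3

# Hypothetical sample rows; the Company table layout is assumed.
rows = [
    ("A1", "Acme Corp.", "12 Main Street"),
    ("A1", "Acme Corp", "12 Main St"),
    ("B2", "Bolt Ltd", "9 Oak Ave"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (CompanyCode TEXT, CompanyName TEXT, Addr1 TEXT)")
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", rows)

# The rule "smallest CompanyName wins" is stated explicitly via MIN.
query = """
SELECT C1.*
FROM Company C1,
     (SELECT CompanyCode, MIN(CompanyName) AS CompanyNameSelected
      FROM Company
      GROUP BY CompanyCode) C2
WHERE C1.CompanyCode = C2.CompanyCode
  AND C1.CompanyName = C2.CompanyNameSelected
ORDER BY C1.CompanyCode
"""
selected = conn.execute(query).fetchall()
for row in selected:
    print(row)
```

If several rows for one company share the selected name, all of them come back, so the rule may need a further tie-breaker before it yields exactly one record per company.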
