Compare columns of unequal length for matches and differences

Submitted by 懵懂的女人 on 2019-12-12 01:23:13

Question


I will explain this in Excel terms, since that will probably be clearer.

I have an Excel sheet with 2 columns.

Column A has 69,000 rows. Column B has 49,000 rows.

Column A has our complete product list. Column B has the product list from Manufacturer 1.

Only some rows are common to both columns. Also, column B is not a subset of column A: column A has extra entries, and so does column B.

I need to know which rows from column B are also in column A, and which rows from column B are not in column A.

How would I achieve this? I am trying Excel, but the VLOOKUP is taking forever and hanging. Are there any other Windows/Office utilities which can help me? If it's a macro, can you please give me scripts and suggestions for executing it?

I have access to a Linux machine as well, and I am familiar with those tools.

I can transfer this data to a text file or files. Can I run some sed or awk script to print the output?

Any help would be great.


Answer 1:


Use the MATCH() function. It gives you a number if there is a match, and #N/A if there isn't.

I always work in Tables in Excel 2007 and newer, but will give both syntaxes:

Assuming you have a table with the things to compare in columns "Column1" and "Column2", this checks whether the value in Column2 is present in Column1:

=ISNUMBER(MATCH(Table1[[#This Row],[Column2]],[Column1],0))

Or, if you have an old-school array with data in columns A and B, looking for the value from B in A:

=ISNUMBER(MATCH(Sheet1!$B2,Sheet1!$A$2:$A$11,0))

What's going on: you look for an exact match (the 0 parameter) of the current row's value from one column in the other column, then check whether you get a numeric value (there is a match) or not (no match).




Answer 2:


This is dead simple on Unix or Linux. Start by putting all of your company's products in one file, and all of the other company's products in another. I'll call them FileA and FileB.

Sort them.

$ sort -u FileA > temp_file
$ mv temp_file FileA

$ sort -u FileB > temp_file
$ mv temp_file FileB

The products that are common to both files . . .

$ comm -12 FileA FileB

The products that are unique to FileB . . .

$ comm -13 FileA FileB
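
The question also asks about awk. As a hedged alternative to sort and comm, the sketch below tags each line of FileB as common or not, without sorting, so FileB's original row order is preserved. The sample data, the temporary directory, and the function name compare_b_against_a are assumptions made for illustration. It also strips a trailing carriage return from each line, on the assumption that the files were exported from Excel on Windows.

```shell
# Hypothetical sample data standing in for the two exported columns.
workdir=$(mktemp -d)
printf 'widget\ngadget\nsprocket\n' > "$workdir/FileA"
printf 'gadget\ndoohickey\nwidget\n' > "$workdir/FileB"

# Usage: compare_b_against_a FileA FileB
# Pass 1 (NR == FNR) loads every line of the first file into a lookup table.
# Pass 2 tags each line of the second file, in its original order.
# Note: the NR == FNR idiom assumes the first file is not empty.
compare_b_against_a() {
    awk '
        { sub(/\r$/, "") }               # drop a Windows carriage return, if any
        NR == FNR { seen[$0] = 1; next } # first file: remember the line
        {
            tag = ($0 in seen) ? "common" : "only_in_B"
            print tag "\t" $0
        }
    ' "$1" "$2"
}

compare_b_against_a "$workdir/FileA" "$workdir/FileB"
```

Each output line is a tag, a tab, and the product, so the two groups can be split afterwards with grep or a spreadsheet filter. The lookup table holds all of FileA in memory, which should be fine for lists of this size.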



Answer 3:


I'm surprised the VLOOKUP is slow or unreliable; 70,000 rows is nothing. Are you sure you've got the formulas correct?

Seeing as you have Excel, you might have MS Access. Loading the columns into an Access table and resolving with JOINs would be very quick.




Answer 4:


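Sort the two lists and use an approximate VLOOKUP (last argument TRUE). This will be extremely fast because it uses a binary search, which is also why the lookup column must be sorted in ascending order. An approximate lookup returns the nearest value at or below the one you searched for, so you need to handle the no-match case by comparing the result with the original value. Something like this in column C:

=IF(B1=VLOOKUP(B1,$A$1:$A$69000,1,TRUE),"Match","NoMatch")

Then copy it down.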



Source: https://stackoverflow.com/questions/4747058/compare-columns-of-unequal-length-for-matches-and-differences
