Excel VBA Performance - 1 million rows - Delete rows containing a value, in less than 1 min

匿名 (未验证) 提交于 2019-12-03 01:34:02

问题:

I am trying to find a way to filter large data and remove rows in a worksheet, in less than one minute

The goal:

  • Find all records containing specific text in column 1, and delete the entire row
  • Keep all cell formatting (colors, font, borders, column widths) and formulas as they are

.

Test Data:

:

.

How the code works:

  1. It starts by turning all Excel features Off
  2. If the workbook is not empty and the text value to be removed exists in column 1

    • Copies the used range of column 1 to an array
    • Iterates over every value in array backwards
    • When it finds a match:

      • Appends the cell address to a tmp string in the format "A11,A275,A3900,..."
      • If the tmp variable length is close to 255 characters
      • Deletes rows using .Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
      • Resets tmp to empty and moves on to the next set of rows
  3. At the end, it turns all Excel features back On

.

The main issue is the Delete operation, and total duration time should be under one minute. Any code-based solution is acceptable as long as it performs under 1 minute.

This narrows the scope to very few acceptable answers. The answers already provided are also very short and easy to implement. One performs the operation in about 30 seconds, so there is at least one answer that provides an acceptable solution, and other may find it useful as well

.

My main initial function:

Sub DeleteRowsWithValuesStrings()     Const MAX_SZ As Byte = 240      Dim i As Long, j As Long, t As Double, ws As Worksheet     Dim memArr As Variant, max As Long, tmp As String      Set ws = Worksheets(1)     max = GetMaxCell(ws.UsedRange).Row     FastWB True:    t = Timer      With ws         If max > 1 Then             If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then                 memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2                 For i = max To 1 Step -1                      If memArr(i, 1) = "Test String" Then                         tmp = tmp & "A" & i & ","                         If Len(tmp) > MAX_SZ Then                            .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp                            tmp = vbNullString                          End If                     End If                  Next                 If Len(tmp) > 0 Then                     .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp                 End If                 .Calculate             End If         End If     End With     FastWB False:   InputBox "Duration: ", "Duration", Timer - t End Sub

Helper functions (turn Excel features off and on):

Public Sub FastWB(Optional ByVal opt As Boolean = True)     With Application         .Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic)         .DisplayAlerts = Not opt         .DisplayStatusBar = Not opt         .EnableAnimations = Not opt         .EnableEvents = Not opt         .ScreenUpdating = Not opt     End With     FastWS , opt End Sub  Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _                   Optional ByVal opt As Boolean = True)     If ws Is Nothing Then         For Each ws In Application.ActiveWorkbook.Sheets             EnableWS ws, opt         Next     Else         EnableWS ws, opt     End If End Sub  Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean)     With ws         .DisplayPageBreaks = False         .EnableCalculation = Not opt         .EnableFormatConditionsCalculation = Not opt         .EnablePivotTable = Not opt     End With End Sub

Finds last cell with data (thanks @ZygD - now I tested it in several scenarios):

Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range      'Returns the last cell containing a value, or A1 if Worksheet is empty      Const NONEMPTY As String = "*"     Dim lRow As Range, lCol As Range      If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange     If WorksheetFunction.CountA(rng) = 0 Then         Set GetMaxCell = rng.Parent.Cells(1, 1)     Else         With rng             Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _                                         After:=.Cells(1, 1), _                                         SearchDirection:=xlPrevious, _                                         SearchOrder:=xlByRows)             If Not lRow Is Nothing Then                 Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _                                             After:=.Cells(1, 1), _                                             SearchDirection:=xlPrevious, _                                             SearchOrder:=xlByColumns)                  Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)             End If         End With     End If End Function

Returns the index of a match in the array, or 0 if a match is not found:

Public Function IndexOfValInRowOrCol( _                                     ByVal searchVal As String, _                                     Optional ByRef ws As Worksheet = Nothing, _                                     Optional ByRef rng As Range = Nothing, _                                     Optional ByRef vertical As Boolean = True, _                                     Optional ByRef rowOrColNum As Long = 1 _                                     ) As Long      'Returns position in Row or Column, or 0 if no matches found      Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long      result = CVErr(9999) '- generate custom error      Set usedRng = GetUsedRng(ws, rng)     If Not usedRng Is Nothing Then         If rowOrColNum 

.

Update:

Tested 6 solutions (3 tests each): Excel Hero's solution is the fastest so far (removes formulas)

.

Here are the results, fastest to the slowest:

.

Test 1. Total of 100,000 records, 10,000 to be deleted:

1. ExcelHero()                    - 1.5 seconds  2. DeleteRowsWithValuesNewSheet() - 2.4 seconds  3. DeleteRowsWithValuesStrings()  - 2.45 minutes 4. DeleteRowsWithValuesArray()    - 2.45 minutes 5. QuickAndEasy()                 - 3.25 minutes 6. DeleteRowsWithValuesUnion()    - Stopped after 5 minutes

.

Test 2. Total of 1 million records, 100,000 to be deleted:

1. ExcelHero()                    - 16 seconds (average)  2. DeleteRowsWithValuesNewSheet() - 33 seconds (average)  3. DeleteRowsWithValuesStrings()  - 4 hrs 38 min (16701.375 sec) 4. DeleteRowsWithValuesArray()    - 4 hrs 37 min (16626.3051757813 sec) 5. QuickAndEasy()                 - 5 hrs 40 min (20434.2104492188 sec) 6. DeleteRowsWithValuesUnion()    - N/A

.

Notes:

  1. ExcelHero method: easy to implement, reliable, extremely fast, but removes formulas
  2. NewSheet method: easy to implement, reliable, and meets the target
  3. Strings method: more effort to implement, reliable, but doesn't meet requirement
  4. Array method: similar to Strings, but ReDims an array (faster version of Union)
  5. QuickAndEasy: easy to implement (short, reliable and elegant), but doesn't meet requirement
  6. Range Union: implementation complexity similar to 2 and 3, but too slow

I also made the test data more realistic by introducing unusual values:

  • empty cells, ranges, rows, and columns
  • special characters, like =[`~!@#$%^&*()_-+{}[]\|;:'",./?, separate and multiple combinations
  • blank spaces, tabs, empty formulas, border, font, and other cell formatting
  • large and small numbers with decimals (=12.9999999999999 + 0.00000000000000001)
  • hyperlinks, conditional formatting rules
  • empty formatting inside and outside data ranges
  • anything else that might cause data issues

回答1:

I'm providing the first answer as a reference

Others may find it useful, if there are no other options available

  • Fastest way to achieve the result is not to use the Delete operation
  • Out of 1 million records it removes 100,000 rows in an average of 33 seconds

.

Sub DeleteRowsWithValuesNewSheet()  '100K records   10K to delete                                     'Test 1:        2.40234375 sec                                     'Test 2:        2.41796875 sec                                     'Test 3:        2.40234375 sec                                     '1M records     100K to delete                                     'Test 1:        32.9140625 sec                                     'Test 2:        33.1484375 sec                                     'Test 3:        32.90625   sec     Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long     Dim wsName As String, t As Double, oldUsedRng As Range      FastWB True:    t = Timer      Set oldWs = Worksheets(1)     wsName = oldWs.Name      Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange))      If oldUsedRng.Rows.Count > 1 Then                           'If sheet is not empty         Set newWs = Sheets.Add(After:=oldWs)                    'Add new sheet         With oldUsedRng             .AutoFilter Field:=1, Criteria1:="Test String"             .Copy                                               'Copy visible data         End With         With newWs.Cells             .PasteSpecial xlPasteColumnWidths             .PasteSpecial xlPasteAll                            'Paste data on new sheet             .Cells(1, 1).Select                                 'Deselect paste area             .Cells(1, 1).Copy                                   'Clear Clipboard         End With         oldWs.Delete                                            'Delete old sheet         newWs.Name = wsName     End If     FastWB False:   InputBox "Duration: ", "Duration", Timer - t End Sub

.

At high level:

  • It creates a new worksheet, and keeps a reference to the initial sheet
  • AutoFilters column 1 on the searched text: .AutoFilter Field:=1, Criteria1:="Test String"
  • Copies all (visible) data from initial sheet
  • Pastes column widths, formats, and data to the new sheet
  • Deletes initial sheet
  • Renames the new sheet to the old sheet name

It uses the same helper functions posted in the question

The 99% of the duration is used by the AutoFilter

.

There are a couple limitations I found so far, the first can be addressed:

  1. If there are any hidden rows on the initial sheet, it unhides them

    • A separate function is needed to hide them back
    • Depending on implementation, it might significantly increase duration
  2. VBA related:

    • It changes the Code Name of the sheet; other VBA referring to Sheet1 will be broken (if any)
    • It deletes all VBA code associated with the initial sheet (if any)

.

A few notes about using large files like this:

  • The binary format (.xlsb) reduce file size dramatically (from 137 Mb to 43 Mb)
  • Unmanaged Conditional Formatting rules can cause exponential performance issues

    • The same for Comments, and Data validation
  • Reading file or data from network is much slower than working with a locall file



回答2:

A significant gain in speed can be achieved if the source data do not contain formulas, or if the scenario would allow (or want) the formulas to be converted into hard values during the conditional row deletions.

With the above as a caveat, my solution uses the AdvancedFilter of the range object. It's about twice as fast as DeleteRowsWithValuesNewSheet().

Public Sub ExcelHero()     Dim t#, crit As Range, data As Range, ws As Worksheet     Dim r&, fc As Range, lc As Range, fr1 As Range, fr2 As Range     FastWB True     t = Timer          Set fc = ActiveSheet.UsedRange.Item(1)         Set lc = GetMaxCell         Set data = ActiveSheet.Range(fc, lc)         Set ws = Sheets.Add         With data             Set fr1 = data.Worksheet.Range(fc, fc.Offset(, lc.Column))             Set fr2 = ws.Range(ws.Cells(fc.Row, fc.Column), ws.Cells(fc.Row, lc.Column))             With fr2                 fr1.Copy                 .PasteSpecial xlPasteColumnWidths: .PasteSpecial xlPasteAll                 .Item(1).Select             End With             Set crit = .Resize(2, 1).Offset(, lc.Column + 1)             crit = [{"Column 1";"Test String"}]             .AdvancedFilter xlFilterCopy, crit, fr2             .Worksheet.Delete         End With      FastWB False     r = ws.UsedRange.Rows.Count     Debug.Print "Rows: " & r & ", Duration: " & Timer - t & " seconds" End Sub


回答3:

On my elderly Dell Inspiron 1564 (Win 7 Office 2007) this:

Sub QuickAndEasy()     Dim rng As Range     Set rng = Range("AA2:AA1000001")     Range("AB1") = Now     Application.ScreenUpdating = False         With rng             .Formula = "=If(A2=""Test String"",0/0,A2)"             .Cells.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete             .Clear         End With     Application.ScreenUpdating = True     Range("AC1") = Now End Sub

took about 10 seconds to run. I am assuming that column AA is available.

EDIT#1:

Please note that this code does not set Calculation to Manual. Performance will improve if the Calculation mode is set to Manual after the "helper" column is allowed to calculate.



回答4:

I know I'm incredibly late with my answer here however future visitors may find it very useful.

Please Note: My approach requires an index column for the rows to end up in the original order, however if you do not mind the rows being in a different order then an index column isn't needed and the additional line of code can be removed.

My approach: My approach was to simply select all the rows in the selected range (column), sort them in ascending order using Range.Sort and then collecting the first and last index of "Test String" within the selected range (column). I then create a range from the first and last indices and use Range.EntrieRow.Delete to remove all the rows which contain "Test String".

Pros:
- It is blazing fast.
- It doesn't remove formatting, formulas, charts, pictures or anything like the method which copies to a new sheet.

Cons:
- A decent size of code to implement however it is all straight-forward.

Test Range Generation Sub:

Sub DevelopTest()     Dim index As Long     FastWB True     ActiveSheet.UsedRange.Clear     For index = 1 To 1000000 '1 million test         ActiveSheet.Cells(index, 1).Value = index         If (index Mod 10) = 0 Then             ActiveSheet.Cells(index, 2).Value = "Test String"         Else             ActiveSheet.Cells(index, 2).Value = "Blah Blah Blah"         End If     Next index     Application.StatusBar = ""     FastWB False End Sub

Filter And Delete Rows Sub:

Sub DeleteRowFast()     Dim curWorksheet As Worksheet 'Current worksheet vairable      Dim rangeSelection As Range   'Selected range     Dim startBadVals As Long      'Start of the unwanted values     Dim endBadVals As Long        'End of the unwanted values     Dim strtTime As Double        'Timer variable     Dim lastRow As Long           'Last Row variable     Dim lastColumn As Long        'Last column variable     Dim indexCell As Range        'Index range start     Dim sortRange As Range        'The range which the sort is applied to     Dim currRow As Range          'Current Row index for the for loop     Dim cell As Range             'Current cell for use in the for loop      On Error GoTo Err         Set rangeSelection = Application.InputBox("Select the (N=) range to be checked", "Get Range", Type:=8)    'Get the desired range from the user         Err.Clear      M1 = MsgBox("This is recommended for large files (50,000 or more entries)", vbYesNo, "Enable Fast Workbook?") 'Prompt the user with an option to enable Fast Workbook, roughly 150% performace gains... Recommended for incredibly large files     Select Case M1         Case vbYes             FastWB True  'Enable fast workbook         Case vbNo             FastWB False 'Disable fast workbook     End Select      strtTime = Timer     'Begin the timer      Set curWorksheet = ActiveSheet     lastRow = CLng(rangeSelection.SpecialCells(xlCellTypeLastCell).Row)     lastColumn = curWorksheet.Cells(1, 16384).End(xlToLeft).Column      Set indexCell = curWorksheet.Cells(1, 1)      On Error Resume Next      If rangeSelection.Rows.Count > 1 Then 'Check if there is anything to do          lastVisRow = rangeSelection.Rows.Count          Set sortRange = curWorksheet.Range(indexCell, curWorksheet.Cells(curWorksheet.Rows(lastRow).Row, 16384).End(xlToLeft)) 'Set the sort range          sortRange.Sort Key1:=rangeSelection.Cells(1, 1), Order1:=xlAscending, Header:=xlNo 'Sort by values, lowest to highest          startBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, MatchCase:=False).Row         endBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, SearchDirection:=xlPrevious, MatchCase:=False).Row          curWorksheet.Range(curWorksheet.Rows(startBadVals), curWorksheet.Rows(endBadVals)).EntireRow.Delete 'Delete uneeded rows, deleteing in continuous range blocks is quick than seperated or individual deletions.          sortRange.Sort Key1:=indexCell, Order1:=xlAscending, Header:=xlNo 'Sort by index instead of values, lowest to highest     End If      Application.StatusBar = ""                    'Reset the status bar      FastWB False                                  'Disable fast workbook      MsgBox CStr(Round(Timer - strtTime, 2)) & "s" 'Display duration of task  Err:     Exit Sub  End Sub

THIS CODE USES FastWB, FastWS AND EnableWS BY Paul Bica!

Times at 100K entries (10k to be removed, FastWB True):
1. 0.2 seconds.
2. 0.2 seconds.
3. 0.21 seconds.
Avg. 0.2 seconds.

Times at 1 million entries (100k to be removed, FastWB True):
1. 2.3 seconds.
2. 2.32 seconds.
3. 2.3 seconds.
Avg. 2.31 seconds.

Running on: Windows 10, iMac i3 11,2 (From 2010)

EDIT
This code was originally designed with the purpose of filtering out numeric values outside of a numeric range and has been adapted to filter out "Test String" so some of the code may be redundant.



回答5:

Your use of arrays in calculating the used range and row count may effect the performance. Here's another approach which in testing proves efficient across 1m+ rows of data - between 25-30 seconds. It doesn't use filters so will delete rows even if hidden. Deleting a whole row won't effect formatting or column widths of the other remaining rows.

  1. First, check if the ActiveSheet has "Test String". Since you're only interested in Column 1 I used this:

    TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String") If TCount > 0 Then
  2. Instead of using your GetMaxCell() function I simply used Cells.SpecialCells(xlCellTypeLastCell).Row to get the last row:

    EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row
  3. Then loop through the rows of data:

    While r 
  4. To test if the cell in Column 1 is equal to "Test String":

    If sht.Cells(r, 1).Text) = "Test String" Then
  5. To delete the row:

    Rows(r).Delete Shift:=xlUp

Putting it all together full code below. I've set ActiveSheet to a variable Sht and added turned of ScreenUpdating to improve efficiency. Since it's a lot of data I make sure to clear variables at the end.

Sub RowDeleter()     Dim sht As Worksheet     Dim r As Long     Dim EndRow As Long     Dim TCount As Long     Dim s As Date     Dim e As Date      Application.ScreenUpdating = True     r = 2       'Initialise row number     s = Now     'Start Time     Set sht = ActiveSheet     EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row      'Check if "Test String" is found in Column 1     TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String")     If TCount > 0 Then          'loop through to the End row         While r  0 Then                 sht.Rows(r).Delete Shift:=xlUp                 r = r - 1             End If             r = r + 1         Wend     End If     e = Now  'End Time     D = (Hour(e) * 360 + Minute(e) * 60 + Second(e)) - (Hour(s) * 360 + Minute(s) * 60 + Second(s))     Application.ScreenUpdating = True     DurationTime = TimeSerial(0, 0, D)     MsgBox Format(DurationTime, "hh:mm:ss") End Sub


标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!