What is the best way to remove the elements that are repeated from an array? For example, from the array
a = [4, 3, 3, 1, 6, 6]
I need to get [4, 1].
arr = [4, 3, 3, 1, 6, 6]
arr.
group_by {|e| e }.
map {|e, es| [e, es.length] }.
reject {|e, count| count > 1 }.
map(&:first)
# [4, 1]
a = [4, 3, 3, 1, 6, 6]
a.select{|b| a.count(b) == 1}
#=> [4, 1]
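Since a.count(b) rescans the whole array for every element, the select version above does quadratic work. On Ruby 2.7 or later the counts can be collected once with Enumerable#tally. A minimal sketch, where the method name singletons is made up for illustration:

```ruby
# Hypothetical helper: keep only the elements that occur exactly once.
# Assumes Ruby 2.7+ for Enumerable#tally.
def singletons(array)
  counts = array.tally                 # one pass: {4=>1, 3=>2, 1=>1, 6=>2}
  array.select { |e| counts[e] == 1 }  # keeps the original order
end

p singletons([4, 3, 3, 1, 6, 6]) # prints [4, 1]
```

The hash lookup replaces the repeated count call, so each element is visited a constant number of times.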
A more complicated but faster solution (O(n log n) because of the sort, I believe :)):
a = [4, 3, 3, 1, 6, 6]
ar = []
add = proc{|to, from| to << from[1] if from.uniq.size == from.size }
a.sort!.each_cons(3){|b| add.call(ar, b)}
ar << a[0] if a[0] != a[1]; ar << a[-1] if a[-1] != a[-2]
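The same sort-then-compare-neighbours idea can also be written without mutating a and without the separate checks for the first and last element. This is only a sketch of an alternative, not the answer's own code; the name singletons_sorted is hypothetical, and it relies on Enumerable#chunk_while (Ruby 2.3+):

```ruby
# Hypothetical helper: sort a copy, group equal neighbours into runs,
# and keep the values whose run has length one.
def singletons_sorted(array)
  array.sort
       .chunk_while { |x, y| x == y }   # e.g. [[1], [3, 3], [4], [6, 6]]
       .select { |run| run.size == 1 }
       .map(&:first)
end

a = [4, 3, 3, 1, 6, 6]
p singletons_sorted(a) # prints [1, 4]
p a                    # prints [4, 3, 3, 1, 6, 6]
```

Note that the result comes back in sorted order rather than in the order of the original array.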
I needed something like this, so I tested a few different approaches. These all return an array of the items that are duplicated in the original array:
module Enumerable
  def dups
    inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
  end

  def only_duplicates
    duplicates = []
    self.each {|each| duplicates << each if self.count(each) > 1}
    duplicates.uniq
  end

  def dups_ej
    inject(Hash.new(0)) {|h,v| h[v] += 1; h}.reject{|k,v| v==1}.keys
  end

  def dedup
    duplicates = self.dup
    self.uniq.each { |v| duplicates[self.index(v)] = nil }
    duplicates.compact.uniq
  end
end
Benchmark results for 100,000 iterations, first with an array of integers, then with an array of strings. Performance will vary depending on the number of duplicates found, but these tests use a fixed number of duplicates (about half of the array entries are duplicates):
test_benchmark_integer
                                  user    system     total        real
Enumerable.dups               2.560000  0.040000  2.600000 (  2.596083)
Enumerable.only_duplicates    6.840000  0.020000  6.860000 (  6.879830)
Enumerable.dups_ej            2.300000  0.030000  2.330000 (  2.329113)
Enumerable.dedup              1.700000  0.020000  1.720000 (  1.724220)
test_benchmark_strings
                                  user    system     total        real
Enumerable.dups               4.650000  0.030000  4.680000 (  4.722301)
Enumerable.only_duplicates   47.060000  0.150000 47.210000 ( 47.478509)
Enumerable.dups_ej            4.060000  0.030000  4.090000 (  4.123402)
Enumerable.dedup              3.290000  0.040000  3.330000 (  3.334401)
..
Finished in 73.190988 seconds.
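The harness that produced these numbers isn't shown, so the following is only a guess at how a comparison like this might be set up with the standard Benchmark module. The test data, the iteration count, and the choice to write two of the approaches as plain methods rather than Enumerable extensions are all assumptions made for illustration:

```ruby
require 'benchmark'

# Counting approach, as in dups_ej above, written as a plain method.
def dups_ej(a)
  a.inject(Hash.new(0)) { |h, v| h[v] += 1; h }.reject { |_, c| c == 1 }.keys
end

# Blank out the first occurrence of each value; whatever is left is repeated.
def dedup(a)
  rest = a.dup
  a.uniq.each { |v| rest[a.index(v)] = nil }
  rest.compact.uniq
end

data = [1, 2, 3, 4, 5, 1, 2, 3, 6, 7] # assumed sample data
iterations = 10_000                   # assumed; the run above used 100,000

Benchmark.bm(10) do |bm|
  bm.report('dups_ej') { iterations.times { dups_ej(data) } }
  bm.report('dedup')   { iterations.times { dedup(data) } }
end
```

Timings from a sketch like this will depend heavily on the array size and the share of duplicates, so they shouldn't be expected to match the figures above.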
So of these approaches, the Enumerable.dedup algorithm seems to be the fastest.
If only (array - array.uniq) worked! It doesn't: it removes everything.
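The reason the subtraction comes back empty is that Array#- removes every occurrence of each value found in the right-hand array, and array.uniq contains every distinct value. A short illustration, with a tally-based alternative (Ruby 2.7+) shown as one possible way to get the repeated values instead:

```ruby
array = [4, 3, 3, 1, 6, 6]

# Every value in array also appears in array.uniq, so nothing survives.
p array - array.uniq # prints []

# One possible alternative: count occurrences and keep values seen more than once.
p array.tally.select { |_, count| count > 1 }.keys # prints [3, 6]
```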
Without introducing the need for a separate copy of the original array and using inject:
[4, 3, 3, 1, 6, 6].inject({}) {|s,v| s[v] ? s.merge({v=>s[v]+1}) : s.merge({v=>1})}.select {|k,v| k if v==1}.keys
=> [4, 1]
Here's my spin on a solution used by Perl programmers using a hash to accumulate counts for each element in the array:
ary = [4, 3, 3, 1, 6, 6]
ary.inject({}) { |h,a|
h[a] ||= 0
h[a] += 1
h
}.select{ |k,v| v == 1 }.keys # => [4, 1]
It could be on one line, if that's at all important, by judicious use of semicolons between the lines in the inject block.
A little different way is:
ary.inject({}) { |h,a| h[a] ||= 0; h[a] += 1; h }.map{ |k,v| k if (v==1) }.compact # => [4, 1]
It replaces the select{...}.keys with map{...}.compact, so it's not really an improvement and, to me, is a bit harder to understand.