In Java, using the following function for a huge matrix X to print its column-distinct elements:
// create the list of distinct values
List va
What would be much more efficient would be to use a Set instead of a list, for example the HashSet implementation. The contains method will run in O(1) instead of O(n) with a list. And you could save one call by only calling the add method.
As for your specific question, I would just create a new Set at each loop - object creation is not that expensive, probably less than clearing the set (as confirmed by the benchmark at the bottom - see the most efficient version in EDIT 2):
for (int j = 0, x; j < m; j++) {
Set values = new HashSet();
for (int i = 0; i < n; i++) {
x = X[i][j];
if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
System.out.println(x);
}
}
However, the only way to know which is quicker (new object vs. clear) is to profile that portion of your code and check the performance of both versions.
EDIT
I ran a quick benchmark and the clear version seems a little faster than creating a set at each loop (by about 20%). You should still check on your dataset / use case which one is better. Faster code with my dataset:
Set values = new HashSet();
for (int j = 0, x; j < m; j++) {
for (int i = 0; i < n; i++) {
x = X[i][j];
if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
System.out.println(x);
}
values.clear();
}
EDIT 2
An actually even faster version of the code is obtained by creating a new set of the right size at each loop:
for (int j = 0, x; j < m; j++) {
Set values = new HashSet(n, 1); //right size from the beginning
for (int i = 0; i < n; i++) {
x = X[i][j];
if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
System.out.println(x);
}
}
Summary of result
After JVM warm up + JIT:
Set values = new HashSet(n, 1); =====> 280 ms
values.clear(); =====> 380 ms
Set values = new HashSet(); =====> 450 ms