I wrote a test that attempts to test two things:
Firstly ...
Are ArrayLists more than twice as slow as arrays?
As a generalization, no. For operations that potentially involve "changing" the length of the list / array, an ArrayList
will be faster than an array ... unless you use a separate variable to represent the array's logical size.
For other operations, the ArrayList
is likely to be slower, though the performance ratio will most likely depend on the operation and the JVM implementation. Also note that you have only tested one operation / pattern.
Why is ArrayList so much slower?
Because an ArrayList has a distinct array object inside of it.
Operations typically involve extra indirections (e.g. to fetch the list's size and inner array) and there are extra bounds checks (e.g. checking the list's size
and the array's length). A typical JIT compiler is (apparently) not able to optimize these away. (And in fact, you would NOT want to optimize away the inner array because that's what allows an ArrayList to grow.)
For array's of primitives, the corresponding list types involve wrapped primitive types / objects, and that adds an overhead. For example your result += ...
involves unboxing, in the "list" cases.
Is my benchmark written well? In other words, are my results accurate?
There's nothing wrong with it technically. But it is not sufficient to demonstrate your point. For a start, you are only measuring one kind of operation: array element fetching and its equivalent. And you are only measuring for primitive types.
Finally, this largely misses the point of using List
types. We use them because they are almost always easier to use than plain arrays. A difference in performance of (say) 2 is typically not important to overall application performance.
If you store millions of objects, then the Add or Contains functions will be super slow. Best way is to split it using hashMap of Arrays. Although similar algorithms can be used for other types of objects, this is how I improved 1000 times faster the processing of 10 million strings (the memory taken is 2-3 times more)
public static class ArrayHashList {
private String temp1, temp2;
HashMap allKeys = new HashMap();
ArrayList curKeys;
private int keySize;
public ArrayHashList(int keySize) {
this.keySize = keySize;
}
public ArrayHashList(int keySize, String fromFileName) {
this.keySize = keySize;
String line;
try{
BufferedReader br1 = new BufferedReader(new FileReader(fromFileName));
while ((line = br1.readLine()) != null)
addString(line);
br1.close();
}catch(Exception e){
e.printStackTrace();
}
}
public boolean addString(String strToAdd) {
if (strToAdd.length()<keySize)
temp1 = strToAdd;
else
temp1 = strToAdd.substring(0,keySize);
if (!allKeys.containsKey(temp1))
allKeys.put(temp1,new ArrayList());
curKeys = (ArrayList)allKeys.get(temp1);
if (!curKeys.contains(strToAdd)){
curKeys.add(strToAdd);
return true;
}
return false;
}
public boolean haveString(String strCheck) {
if (strCheck.length()<keySize)
temp1 = strCheck;
else
temp1 = strCheck.substring(0,keySize);
if (!allKeys.containsKey(temp1))
allKeys.put(temp1,new ArrayList());
curKeys = (ArrayList)allKeys.get(temp1);
return curKeys.contains(strCheck);
}
}
to init and use it:
ArrayHashList fullHlist = new ArrayHashList(3, filesPath+"\\allPhrases.txt");
ArrayList pendingList = new ArrayList();
BufferedReader br1 = new BufferedReader(new FileReader(filesPath + "\\processedPhrases.txt"));
while ((line = br1.readLine()) != null) {
wordEnc = StrUtil.GetFirstToken(line,",~~~,");
if (!fullHlist.haveString(wordEnc))
pendingList.add(wordEnc);
}
br1.close();
Keep in mind that in using ArrayList, you are actually calling a function, which in the case of get()
actually makes two other function calls. (One of which is a range check, which I suspect may be part of the delay).
The important thing with ArrayList, is not so much how much faster or slower it is compared with straight arrays, but that it's access time always constant (like arrays). In the real world, you'll almost always find that the added delay is negligible. Especially if you have an application that even thinks about connecting to a database. :)
In short, I think your test (and the results) are legit.
These results don't surprise me. List.get(int)
involves a cast, which is slow. Java's generics are implemented via type erasure, meaning that any type of List<T>
is actually a List<Object>
, and the only reason you get your type out is because of a cast. The source for ArrayList
looks like this:
public E get(int index) {
rangeCheck(index);
return elementData(index);
}
// snip...
private void rangeCheck(int index) {
if (index >= size)
throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}
// snip...
@SuppressWarnings("unchecked")
E elementData(int index) {
return (E) elementData[index];
}
The rangeCheck
and the overhead of the function calls are trivial, it's that cast to E
that kills you.