Before anyone questions my use of String.intern() at all, let me say that I need it in my particular application for memory and performance reasons.
The most likely reason for the performance difference: String.intern() is a native method, and calling a native method incurs significant overhead.
So why is it a native method? Probably because it uses the constant pool, which is a low-level VM construct.
This article discusses the implementation of String.intern(). In Java 6 and 7, the implementation used a fixed-size hash table with 1009 buckets, so as the number of entries grew, lookups degraded toward O(n). The table size can be changed using -XX:StringTableSize=N. Apparently, in Java 8 the default size is larger, but the issue remains.
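To see why a fixed bucket count hurts, here is a toy chained hash table in Java with the same fixed capacity of 1009 buckets. It is a hypothetical illustration of the idea, not HotSpot's actual StringTable: the point is only that the average chain a lookup must scan is entries / buckets, so it grows linearly once the bucket count stops growing.

```java
import java.util.LinkedList;

// Toy chained hash table with a fixed number of buckets (hypothetical; not HotSpot's StringTable).
public class FixedTableDemo {
    static final int BUCKETS = 1009;

    @SuppressWarnings({"unchecked", "rawtypes"})
    private final LinkedList<String>[] table = new LinkedList[BUCKETS];

    FixedTableDemo() {
        for (int i = 0; i < BUCKETS; i++) {
            table[i] = new LinkedList<>();
        }
    }

    // Returns the pooled instance equal to s, adding s if no equal string is present.
    String intern(String s) {
        LinkedList<String> chain = table[Math.floorMod(s.hashCode(), BUCKETS)];
        for (String existing : chain) {   // linear scan of a single chain
            if (existing.equals(s)) {
                return existing;
            }
        }
        chain.add(s);
        return s;
    }

    // Average chain length = entries / buckets: roughly what each lookup has to walk.
    double averageChainLength() {
        long total = 0;
        for (LinkedList<String> chain : table) {
            total += chain.size();
        }
        return (double) total / BUCKETS;
    }

    public static void main(String[] args) {
        FixedTableDemo pool = new FixedTableDemo();
        for (int n = 1; n <= 100_900; n++) {
            pool.intern("key" + n);
            if (n % 20_180 == 0) {
                System.out.println(n + " entries -> avg chain length " + pool.averageChainLength());
            }
        }
    }
}
```

Raising the bucket count (which is what -XX:StringTableSize does for the real table) shortens the chains, but with any fixed size the chain length still grows in proportion to the number of entries.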
I can't speak from any great experience with it, but from the String docs:
"When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned."
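A small Java program makes that documented contract concrete. It relies only on String.intern() and the language rule that string literals are interned; the class name is just for illustration.

```java
// Demonstrates the documented contract of String.intern().
public class InternContractDemo {
    public static void main(String[] args) {
        String literal = "hello";                 // string literals are interned automatically
        String built = new String("hello");       // a distinct object with equal contents

        System.out.println(built == literal);          // false: different objects
        System.out.println(built.equals(literal));     // true: equal contents
        System.out.println(built.intern() == literal); // true: intern() returns the pooled instance

        // Built at runtime, so this particular object is not the pooled one.
        String runtime = new StringBuilder("wor").append("ld").toString();
        String canonical = runtime.intern();
        System.out.println(canonical == "world");      // true: both refer to the pooled "world"
    }
}
```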
When dealing with large numbers of objects, a solution that uses hashing will generally outperform one that doesn't. I think you're just seeing the result of misusing a Java language feature. Interning isn't there to act as a Map of strings for your use; you should use a Map for that (or a Set, as appropriate). The string table is an optimization at the language level, not the application level.
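As a sketch of the "use a Map" suggestion, here is a hypothetical application-level interner backed by an ordinary HashMap. It assumes single-threaded use (HashMap is not thread-safe) and maps each string to itself, because a plain Set can tell you a string is present but cannot hand back the instance already stored.

```java
import java.util.HashMap;
import java.util.Map;

// Application-level interner backed by an ordinary HashMap (hypothetical helper, not a JDK API).
// Not thread-safe: intended for single-threaded use.
public class MapInterner {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance equal to s, registering s if none exists yet.
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = interner.intern(new String("config.key"));
        String b = interner.intern(new String("config.key"));
        System.out.println(a == b);            // true: same canonical instance
        System.out.println(interner.size());   // 1
    }
}
```

One design benefit of this approach: the pool is owned by the application, so it can be sized up front, scoped to one component, and discarded (and garbage collected) when it is no longer needed.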
@Michael Borgwardt said this in a comment:
intern() is not synchronized, at least at the Java language level.
I think you mean that the String.intern() method is not declared as synchronized in the source code of the String class. And indeed, that is true.
However, declaring intern() as synchronized would lock only the current String instance, because intern() is an instance method, not a static method. So the string pool's synchronization could not be implemented that way.
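The difference in lock scope can be checked directly with Thread.holdsLock. The Pool class below is hypothetical: a synchronized instance method holds only the monitor of its own receiver, so calls on two different objects (like intern() on two different String objects) would not exclude each other, whereas a static synchronized method holds the single Pool.class monitor.

```java
// Shows which monitor a synchronized instance method holds versus a static synchronized one.
// Pool is a hypothetical class used only for illustration.
public class LockScopeDemo {
    static class Pool {
        // Instance method: the lock is this particular object only.
        synchronized boolean instanceHolds(Object other) {
            return Thread.holdsLock(this)
                    && !Thread.holdsLock(other)        // another instance is not locked
                    && !Thread.holdsLock(Pool.class);  // neither is the class-wide monitor
        }

        // Static method: the lock is the Pool.class object, shared by all callers.
        static synchronized boolean staticHoldsClassLock() {
            return Thread.holdsLock(Pool.class);
        }
    }

    public static void main(String[] args) {
        Pool p1 = new Pool();
        Pool p2 = new Pool();
        // Holding p1's monitor says nothing about p2: two threads could run on p1 and p2 concurrently.
        System.out.println(p1.instanceHolds(p2));        // true
        System.out.println(Pool.staticHoldsClassLock()); // true
    }
}
```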
If you step back and think about it, the string pool has to perform some kind of internal synchronization. If it didn't, it would be unusable in a multi-threaded application, because there is simply no practical way for all code that uses the intern() method to synchronize externally.
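Here is a minimal model of "internal synchronization", assuming the simplest possible design: one shared lock around one shared map. GlobalLockPool is hypothetical and says nothing about how the JVM's string table is actually locked; it only shows that callers on any thread get a consistent canonical instance without locking anything themselves, and that every call funnels through the same lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A pool that synchronizes internally on one shared lock, so callers need no external locking.
// GlobalLockPool is hypothetical; it models the idea, not the JVM's string table.
public class GlobalLockPool {
    private static final Object LOCK = new Object();
    private static final Map<String, String> POOL = new HashMap<>();

    public static String intern(String s) {
        synchronized (LOCK) { // every caller, on every thread, contends for this one lock
            String existing = POOL.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            // Each task builds a fresh, equal string and interns it.
            results.add(executor.submit(() -> intern(new String("shared"))));
        }
        String first = results.get(0).get();
        boolean allSame = true;
        for (Future<String> f : results) {
            allSame &= (f.get() == first);
        }
        executor.shutdown();
        System.out.println(allSame); // true: every task got the same canonical instance
    }
}
```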
So the internal synchronization that the string pool performs could be a bottleneck in a multi-threaded application that uses intern() heavily.
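If that contention turns out to matter, one application-level option, in line with the earlier suggestion to use a Map, is an interner built on ConcurrentHashMap. This is a hypothetical sketch, not a drop-in replacement for intern(): ConcurrentHashMap retrievals generally do not block and the map never locks the entire table, so it avoids the single global lock of the previous model. Whether it actually beats intern() on a given JVM is something to measure, not assume.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Application-level interner on ConcurrentHashMap (hypothetical helper, not a JDK API).
public class ConcurrentInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    public String intern(String s) {
        String existing = pool.get(s);      // fast path: non-blocking read for already-pooled strings
        if (existing != null) {
            return existing;
        }
        existing = pool.putIfAbsent(s, s);  // atomic: exactly one thread's instance wins
        return existing != null ? existing : s;
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentInterner interner = new ConcurrentInterner();
        String[] seen = new String[4];
        Thread[] threads = new Thread[seen.length];
        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> seen[slot] = interner.intern(new String("token")));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() makes each thread's write to seen[] visible here
        }
        boolean allSame = true;
        for (String s : seen) {
            allSame &= (s == seen[0]);
        }
        System.out.println(allSame); // true: all threads received one canonical instance
    }
}
```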