Java - What is the best way to find the first duplicate character in a string

I have written the code below to detect the first duplicate character in a string.

public static int detectDuplicate(String source) {
    boolean found = false;


                      
谎友^, 2020-12-19 09:33
                                                                       
O(1) Algorithm

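Your solution is O(n^2) because of the two nested loops.

The fastest algorithm to do this is technically O(1) (constant time). A char has only 65,536 possible values, so by the pigeonhole principle a repeat must appear by index 65,536 at the latest, and the loop never runs longer than that no matter how long the string is: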

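public static int detectDuplicate(String source) {
    // One flag per possible char value (0..65535).
    boolean[] foundChars = new boolean[Character.MAX_VALUE + 1];
    // No explicit cap on i is needed: with only 65,536 distinct values,
    // a repeat is guaranteed by index 65,536, so the loop is bounded.
    for (int i = 0; i < source.length(); i++) {
        char currentChar = source.charAt(i);
        if (foundChars[currentChar]) return i; // index of the repeated occurrence
        foundChars[currentChar] = true;
    }
    return -1; // no character occurs twice
}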


However, this is only fast in big-O terms: for short strings, the fixed cost of allocating the 65,536-entry array dominates.
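To make the constant bound concrete, here is a minimal sketch, not a benchmark. The class and the `worstCaseInput` helper are hypothetical names introduced only for illustration. The helper builds a string that contains every possible char value exactly once, followed by filler, and the scan stops at the same index however much filler follows.

```java
final class PigeonholeBoundDemo {
    // Same boolean-table scan as in the answer, repeated so the sketch compiles on its own.
    static int firstRepeatIndex(String source) {
        boolean[] seen = new boolean[Character.MAX_VALUE + 1];
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (seen[c]) return i;
            seen[c] = true;
        }
        return -1;
    }

    // Hypothetical worst case: all 65,536 char values once, then extraLength filler characters.
    static String worstCaseInput(int extraLength) {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c <= Character.MAX_VALUE; c++) sb.append((char) c);
        for (int i = 0; i < extraLength; i++) sb.append('x');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatIndex(worstCaseInput(1)));         // prints 65536
        System.out.println(firstRepeatIndex(worstCaseInput(1_000_000))); // prints 65536
    }
}
```

The second call scans a string more than a million characters long, yet it inspects no more characters than the first call does.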
                                                                        
                                                        
            
            
              
                
余生分开走, 2020-12-19 09:34
                                                                       
As others have mentioned, your algorithm is O(n^2). Here is an O(n) algorithm: HashSet#add runs in constant time, assuming the hash function disperses the elements properly among the buckets. Note that I size the HashSet to its maximum size up front to avoid resizing/rehashing:

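import java.util.HashSet;
import java.util.Set;

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    // Capacity chars.length with load factor 1, so the set should not need to resize.
    Set<Character> uniqueChars = new HashSet<Character>(chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        // add() returns false when the character is already in the set.
        if (!uniqueChars.add(chars[i])) return i;
    }
    return -1; // no duplicate found
}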


Note: this returns the index of the first duplicate, i.e. the index of the first character that repeats an earlier character. To return the index of the first appearance of that character instead, you would need to store the indices in a Map<Character, Integer> (Map#put is also O(1) in this case):

import java.util.HashMap;
import java.util.Map;

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Map<Character, Integer> uniqueChars = new HashMap<Character, Integer>(chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        // put() returns the index stored earlier for this character, or null if it is new.
        Integer previousIndex = uniqueChars.put(chars[i], i);
        if (previousIndex != null) {
            return previousIndex;
        }
    }
    return -1; // no duplicate found
}
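The difference between the two return values is easiest to see side by side. The sketch below restates both variants under hypothetical names (`repeatIndex` and `firstOccurrenceIndex` are not from the answer) so that it compiles on its own.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DuplicateIndexDemo {
    // Index of the repeated occurrence (Set-based variant).
    static int repeatIndex(String s) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            if (!seen.add(s.charAt(i))) return i;
        }
        return -1;
    }

    // Index of the first appearance of the duplicated character (Map-based variant).
    static int firstOccurrenceIndex(String s) {
        Map<Character, Integer> firstSeenAt = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            Integer previous = firstSeenAt.put(s.charAt(i), i);
            if (previous != null) return previous;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(repeatIndex("abcbd"));          // prints 3
        System.out.println(firstOccurrenceIndex("abcbd")); // prints 1
        System.out.println(repeatIndex("abc"));            // prints -1
    }
}
```

For "abcbd" the repeated character is 'b': the Set variant reports where the repeat was detected (index 3), the Map variant where 'b' first appeared (index 1).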

                                                                        
                                                        
            
            
              
                
旧时难觅i, 2020-12-19 09:35
                                                                       
This is O(n^2), not O(n). Consider the case abcdefghijklmnopqrstuvwxyzz. outerIndex will range from 0 to 25 before the procedure terminates, and each time it increments, innerIndex will have ranged from outerIndex to 26.
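The quadratic growth can be checked by counting comparisons. The question's code is truncated, so the loop shape below is an assumption: each character is compared with every later character until a match is found. The class and method names are hypothetical.

```java
final class NestedLoopCost {
    // Assumed shape of the question's nested loops (the original is truncated).
    // Returns how many character comparisons run before the first match, or in total if none.
    static long comparisonsUntilDuplicate(String source) {
        long comparisons = 0;
        for (int outerIndex = 0; outerIndex < source.length(); outerIndex++) {
            for (int innerIndex = outerIndex + 1; innerIndex < source.length(); innerIndex++) {
                comparisons++;
                if (source.charAt(outerIndex) == source.charAt(innerIndex)) {
                    return comparisons;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String worstCase = "abcdefghijklmnopqrstuvwxyzz"; // 27 characters
        System.out.println(comparisonsUntilDuplicate(worstCase)); // prints 351
    }
}
```

With n = 27 the count is 26 + 25 + ... + 1 = n(n-1)/2 = 351, which grows quadratically in n.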

To get to O(n), you need to make a single pass over the string and do O(1) work at each position. Since the job at each position is to check whether the character has been seen before (and if so, where), you need an O(1) map implementation. A hash table gives you that; so does an array indexed by the character code.

assylias shows how to do it with hashing, so here's how to do it with an array (just for laughs, really):

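import java.util.Arrays;

public static int detectDuplicate(String source) {
    // One slot per possible char value (1 << 16 = 65,536); -1 means "not seen yet".
    int[] firstOccurrence = new int[1 << Character.SIZE];
    Arrays.fill(firstOccurrence, -1);
    for (int i = 0; i < source.length(); i++) {
        char ch = source.charAt(i);
        // Seen before: return the index of its first appearance.
        if (firstOccurrence[ch] != -1) return firstOccurrence[ch];
        else firstOccurrence[ch] = i;
    }
    return -1; // no duplicate found
}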

                                                                        
                                                        
            
            
              
                
渐次进展, 2020-12-19 09:42
                                                                       
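You could try the following. It returns the first character that occurs again later in the string, or a space if no character repeats. Each lastIndexOf call scans the string, so it is still O(n^2) in the worst case:

public static char firstRecurringChar(String s) {
    char x = ' '; // returned unchanged when no character recurs
    System.out.println("STRING : " + s);
    for (int i = 0; i < s.length(); i++) {
        // Debug tracing of each step.
        System.out.println("CHAR AT " + i + " = " + s.charAt(i));
        System.out.println("Last index of CHAR AT " + i + " = " + s.lastIndexOf(s.charAt(i)));
        // A later occurrence exists if the last index lies beyond i.
        if (s.lastIndexOf(s.charAt(i)) > i) {
            x = s.charAt(i);
            break;
        }
    }
    return x;
}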

                                                                        
                                                        
            
            
              
                
时光取名叫无心, 2020-12-19 09:47
                                                                       
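I can substantially improve your algorithm, as long as only adjacent duplicates matter. The version below compares each character with its predecessor and plants a sentinel in the last position so that the loop needs no bounds check:

public static int detectDuplicate(StringBuffer source) {
    if (source.length() < 2) return -1; // nothing to compare
    int xLastChar = source.length() - 1;
    char charLast = source.charAt(xLastChar);
    // Sentinel: temporarily copy the previous character into the last slot,
    // which guarantees that the loop below terminates.
    source.setCharAt(xLastChar, source.charAt(xLastChar - 1));
    int i = 1;
    while (true) {
        if (source.charAt(i) == source.charAt(i - 1)) break;
        i += 1;
    }
    source.setCharAt(xLastChar, charLast); // restore the original last character
    // The loop stopped on the sentinel and the real last character is not a match.
    if (i == xLastChar && source.charAt(xLastChar - 1) != charLast) return -1;
    return i;
}

For a large string this algorithm is probably twice as fast as yours, but it will not detect duplicates that are not next to each other.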
                                                                        
                                                        
            
            
              
                
我寻月下人不归, 2020-12-19 09:48
                                                                       
The complexity is roughly O(M^2), where M is the smaller of the length of the string and the size K of the set of possible characters.

You can get it down to O(M) time with O(K) memory by simply recording the position where you first encounter each unique character.
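A minimal sketch of that idea follows. The `alphabetSize` parameter is a hypothetical stand-in for K, and every character in the input is assumed to be smaller than it. Note that filling the table costs O(K) by itself, so the total is closer to O(K + M).

```java
import java.util.Arrays;

final class FirstPositionTable {
    // alphabetSize is a hypothetical parameter standing in for K;
    // every char in source is assumed to be smaller than alphabetSize.
    static int firstDuplicate(String source, int alphabetSize) {
        int[] firstSeenAt = new int[alphabetSize]; // O(K) memory
        Arrays.fill(firstSeenAt, -1);              // -1 means "not seen yet"
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (firstSeenAt[c] != -1) return firstSeenAt[c]; // first position of the duplicate
            firstSeenAt[c] = i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // 128 covers plain ASCII input.
        System.out.println(firstDuplicate("stack overflow", 128)); // prints 6
        System.out.println(firstDuplicate("abc", 128));            // prints -1
    }
}
```

In "stack overflow" the first character to repeat is 'o', which first appears at index 6.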
                                                                        
                                                        
            
            
              
                