Find the first un-repeated character in a string

猫巷女王i 2020-11-27 18:29

What is the quickest way to find the first character which only appears once in a string?

30 answers
  • 2020-11-27 19:09

    I keep two strings, 'unique' and 'repeated'. A character seen for the first time is appended to 'unique'. When it appears a second time, it is removed from 'unique' and added to 'repeated'. This way 'unique' always holds exactly the characters seen once so far, in order of first appearance. Complexity is O(n) for a fixed-size alphabet, since each string holds at most one copy of each distinct character.

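    public void firstUniqueChar(String str){
        String unique = "";
        String repeated = "";
        str = str.toLowerCase();
        for (int i = 0; i < str.length(); i++) {
            String ch = str.substring(i, i + 1);
            if (repeated.contains(ch)) {
                continue; // already known to be repeated
            }
            if (unique.contains(ch)) {
                // Second occurrence: move the character from 'unique' to 'repeated'.
                // replace() treats ch literally; replaceAll() would parse it as a regex.
                unique = unique.replace(ch, "");
                repeated = repeated + ch;
            } else {
                unique = unique + ch;
            }
        }
        // Guard against input that has no unique character.
        if (unique.isEmpty()) {
            System.out.println("No unique character found");
        } else {
            System.out.println(unique.charAt(0));
        }
    }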
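
    The same 'unique'/'repeated' bookkeeping can be sketched in Python with an insertion-ordered dict and a set, so that each membership check is O(1) on average regardless of alphabet size. This is only an illustration of the idea above, not the poster's code; the name first_unique_char is hypothetical, and it assumes Python 3.7+ (where dicts preserve insertion order).

```python
def first_unique_char(text: str):
    """Return the first character that occurs exactly once, or None."""
    unique = {}        # keys are characters seen once so far, in first-seen order
    repeated = set()   # characters seen two or more times
    for ch in text.lower():   # mirrors the toLowerCase() call in the Java version
        if ch in repeated:
            continue
        if ch in unique:
            # Second occurrence: move the character from 'unique' to 'repeated'.
            del unique[ch]
            repeated.add(ch)
        else:
            unique[ch] = None
    # The first remaining key is the first un-repeated character, if any.
    return next(iter(unique), None)
```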
    
  • 2020-11-27 19:12

    I see that people have posted some delightful answers below, so I'd like to offer something more in-depth.

    An idiomatic solution in Ruby

    We can find the first un-repeated character in a string like so:

    def first_unrepeated_char(string)
      # &. returns nil instead of raising when no character occurs exactly once
      string.each_char.tally.find { |_, n| n == 1 }&.first
    end
    

    How does Ruby accomplish this?

    Reading Ruby's source

    Let's break down the solution and consider what algorithms Ruby uses for each step.

    First we call each_char on the string. This creates an enumerator which allows us to visit the string one character at a time. This is complicated by the fact that Ruby handles Unicode characters, so each value we get from the enumerator can be a variable number of bytes. If we know our input is ASCII or similar, we could use each_byte instead.

    The each_char method is implemented like so:

    rb_str_each_char(VALUE str)
    {
        RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
        return rb_str_enumerate_chars(str, 0);
    }
    

    In turn, rb_str_enumerate_chars is implemented as:

    rb_str_enumerate_chars(VALUE str, VALUE ary)
    {
        VALUE orig = str;
        long i, len, n;
        const char *ptr;
        rb_encoding *enc;
    
    
        str = rb_str_new_frozen(str);
        ptr = RSTRING_PTR(str);
        len = RSTRING_LEN(str);
        enc = rb_enc_get(str);
    
    
        if (ENC_CODERANGE_CLEAN_P(ENC_CODERANGE(str))) {
            for (i = 0; i < len; i += n) {
                n = rb_enc_fast_mbclen(ptr + i, ptr + len, enc);
                ENUM_ELEM(ary, rb_str_subseq(str, i, n));
            }
        }
        else {
            for (i = 0; i < len; i += n) {
                n = rb_enc_mbclen(ptr + i, ptr + len, enc);
                ENUM_ELEM(ary, rb_str_subseq(str, i, n));
            }
        }
        RB_GC_GUARD(str);
        if (ary)
            return ary;
        else
            return orig;
    }
    

    From this we can see that it calls rb_enc_mbclen (or its fast version) to get the length in bytes of the next character in the string, so that it knows how far to advance for the next step. By lazily iterating over the string, reading just one character at a time, we end up doing just one full pass over the input as tally consumes the enumerator.
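
    To make the byte-length step concrete, here is a minimal Python sketch, assuming the input is valid UTF-8. The names utf8_char_len and each_char are hypothetical, and this is not how rb_enc_mbclen is implemented (Ruby dispatches on the string's encoding). It only illustrates reading a character's length from its lead byte and advancing by that many bytes.

```python
def utf8_char_len(lead_byte: int) -> int:
    """Byte length of a UTF-8 character, judged from its first byte."""
    if lead_byte < 0x80:
        return 1              # 0xxxxxxx: ASCII
    if lead_byte >> 5 == 0b110:
        return 2              # 110xxxxx
    if lead_byte >> 4 == 0b1110:
        return 3              # 1110xxxx
    if lead_byte >> 3 == 0b11110:
        return 4              # 11110xxx
    raise ValueError("not a valid UTF-8 lead byte")


def each_char(data: bytes):
    """Lazily yield one character at a time from valid UTF-8 bytes."""
    i = 0
    while i < len(data):
        n = utf8_char_len(data[i])          # how many bytes this character uses
        yield data[i:i + n].decode("utf-8")
        i += n                              # advance to the next character
```

    Because each_char is a generator, no list of characters is built up front; a consumer pulls one character per step, much like tally consuming the enumerator.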

    Tally is then implemented like so:

    static void
    tally_up(VALUE hash, VALUE group)
    {
        VALUE tally = rb_hash_aref(hash, group);
        if (NIL_P(tally)) {
            tally = INT2FIX(1);
        }
        else if (FIXNUM_P(tally) && tally < INT2FIX(FIXNUM_MAX)) {
            tally += INT2FIX(1) & ~FIXNUM_FLAG;
        }
        else {
            tally = rb_big_plus(tally, INT2FIX(1));
        }
        rb_hash_aset(hash, group, tally);
    }
    
    
    static VALUE
    tally_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, hash))
    {
        ENUM_WANT_SVALUE();
        tally_up(hash, i);
        return Qnil;
    }
    

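    Here, tally_i is the block callback invoked once for each element of the enumeration (RB_BLOCK_CALL_FUNC_ARGLIST is the macro that declares its parameter list). Each call hands the element to tally_up, which updates the tally hash.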
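
    The bookkeeping in tally_up can be sketched in Python with a plain dict. This is an illustration only, not a translation of Ruby's internals: the Python function names simply mirror the C ones, and because Python integers have arbitrary precision there is no counterpart to the Fixnum/Bignum branch.

```python
def tally_up(counts: dict, item) -> None:
    """Increment the count for item, starting at 1 when it is first seen."""
    counts[item] = counts.get(item, 0) + 1


def tally(iterable) -> dict:
    """Count occurrences of each element in a single pass."""
    counts = {}
    for item in iterable:
        tally_up(counts, item)
    return counts
```

    Like a Ruby Hash, a Python dict (3.7+) keeps keys in insertion order, which is what lets a later scan of the tally find the first character whose count is 1.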

    Rough time & memory analysis

    The each_char method doesn't allocate an array to eagerly hold the characters of the string, so it has a small constant memory overhead. When we tally the characters, we allocate a hash and put our tally data into it, which in the worst case can take up as much memory as the input string times some constant factor.

    Time-wise, tally does a full scan of the string, and calling find to locate the first non-repeated character scans the hash again. Each of these scans is O(n) in the worst case.

    However, tally also updates a hash on every iteration. A single hash update can degrade to O(n) in the worst case (for example, if every key collides), so the worst-case complexity of this Ruby solution is perhaps O(n^2).

    Under reasonable assumptions, though, updating a hash is O(1), so we can expect the amortized average case to be O(n).


    My old accepted answer in Python

    You can't know that the character is un-repeated until you've processed the whole string, so my suggestion would be this:

    def first_non_repeated_character(string):
      chars = []
      repeated = []
      for character in string:
        if character in chars:
          chars.remove(character)
          repeated.append(character)
        elif character not in repeated:
          chars.append(character)
      if chars:
        return chars[0]
      return False
    

    Edit: originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computer™.

  • 2020-11-27 19:12

    The following is a Ruby implementation of finding the first nonrepeated character of a string:

    def first_non_repeated_character(string)
      string1 = string.split('')
      string2 = string.split('')
    
      string1.each do |let1|
        counter = 0
        # Count how many times let1 occurs in the whole string.
        string2.each do |let2|
          counter += 1 if let1 == let2
        end
        return let1 if counter == 1
      end
      nil # no character occurs exactly once
    end
    
    p first_non_repeated_character('dont doddle in the forest')
    

    And here is a JavaScript implementation in the same style:

    var first_non_repeated_character = function (string) {
      var string1 = string.split('');
      var string2 = string.split('');
    
      for (var i = 0; i < string1.length; i++) {
        var count = 0;
        // Count how many times string1[i] occurs in the whole string.
        for (var x = 0; x < string2.length; x++) {
          if (string1[i] === string2[x]) {
            count++;
          }
        }
        if (count === 1) {
          return string1[i];
        }
      }
    };
    
    console.log(first_non_repeated_character('dont doddle in the forest'));
    console.log(first_non_repeated_character('how are you today really?'));
    

    In both cases I used a counter: if a letter is not matched anywhere else in the string, it occurs exactly once, so I just count its occurrences.

  • 2020-11-27 19:12

    I read through the answers but did not see any like mine. I think this answer is very simple and fast; am I wrong?

    def first_unique(s):
        repeated = []
    
        while s:
            if s[0] not in s[1:] and s[0] not in repeated:
                return s[0]
            else:
                repeated.append(s[0])
                s = s[1:]
        return None
    

    test

    (first_unique('abdcab') == 'd', first_unique('aabbccdad') == None, first_unique('') == None, first_unique('a') == 'a')
    
  • 2020-11-27 19:13

    Counter requires Python 2.7+ or Python 3.1+

    >>> from collections import Counter
    >>> def first_non_repeated_character(s):
    ...     counts = Counter(s)
    ...     for c in s:
    ...         if counts[c]==1:
    ...             return c
    ...     return None
    ... 
    >>> first_non_repeated_character("aaabbbcffffd")
    'c'
    >>> first_non_repeated_character("aaaebbbcffffd")
    'e'
    
  • I think this should do it in C. It runs in O(n) time, with no ambiguity about the order of insertion and deletion operations. It uses the counting step of a counting sort (the simplest form of a bucket sort, which itself is a simple form of a radix sort).

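    #include <string.h>
    
    unsigned char find_first_unique(unsigned char *string)
    {
        int chars[256];
        int i;
        memset(chars, 0, sizeof(chars));
    
        /* First pass: count occurrences of each byte value. */
        for (i = 0; string[i]; i++)
        {
            chars[string[i]]++;
        }
    
        /* Second pass: return the first byte whose count is exactly 1. */
        for (i = 0; string[i]; i++)
        {
            if (chars[string[i]] == 1) return string[i];
        }
        return 0; /* no unique character */
    }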
    