Find the first un-repeated character in a string


猫巷女王i

What is the quickest way to find the first character which only appears once in a string?


30 answers

无人及你

2020-11-27 19:09

I keep two strings, 'unique' and 'repeated'. A character seen for the first time is appended to 'unique'. When it is seen a second time, it is removed from 'unique' and added to 'repeated'; later occurrences are ignored. That way 'unique' always holds, in order of first appearance, the characters seen exactly once so far. Complexity is O(n) for a fixed-size alphabet, since neither helper string can grow beyond the alphabet size.

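public void firstUniqueChar(String str){
    String unique = "";
    String repeated = "";
    str = str.toLowerCase();
    for (int i = 0; i < str.length(); i++) {
        String ch = str.substring(i, i + 1);
        if (repeated.contains(ch)) {
            continue; // already known to be a repeat
        }
        if (unique.contains(ch)) {
            // Second occurrence: move it from unique to repeated.
            // replace() is literal; replaceAll() would treat ch as a regex.
            unique = unique.replace(ch, "");
            repeated = repeated + ch;
        } else {
            unique = unique + ch;
        }
    }
    // unique is empty when every character repeats
    if (unique.isEmpty()) {
        System.out.println("no unique character");
    } else {
        System.out.println(unique.charAt(0));
    }
}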
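
As a rough sketch of the same bookkeeping in Python: a dict (which preserves insertion order in Python 3.7+) can stand in for 'unique' and a set for 'repeated', so membership checks no longer scan a string. The function name `first_unique_char` is made up for this illustration, and it keeps the original's choice to lower-case the input.

```python
def first_unique_char(text):
    """Return the first character that occurs once (case-insensitive), or None."""
    unique = {}       # insertion-ordered: characters seen exactly once so far
    repeated = set()  # characters seen two or more times
    for ch in text.lower():
        if ch in repeated:
            continue              # third or later occurrence: nothing to do
        if ch in unique:
            del unique[ch]        # second occurrence: no longer unique
            repeated.add(ch)
        else:
            unique[ch] = True     # first occurrence
    # The first remaining key is the earliest character seen only once.
    return next(iter(unique), None)
```

Calling `first_unique_char("Swiss")` walks through s, w, i, s, s and leaves w and i in the dict, so it returns "w".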


故里飘歌

2020-11-27 19:12
I see that people have posted some delightful answers below, so I'd like to offer something more in-depth.

An idiomatic solution in Ruby

We can find the first un-repeated character in a string like so:
```
def first_unrepeated_char string
   # &. returns nil instead of raising when every character repeats
   string.each_char.tally.find { |_, n| n == 1 }&.first
end
```
How does Ruby accomplish this?

Reading Ruby's source

Let's break down the solution and consider what algorithms Ruby uses for each step.

First we call each_char on the string. This creates an enumerator which allows us to visit the string one character at a time. This is complicated by the fact that Ruby handles Unicode characters, so each value we get from the enumerator can be a variable number of bytes. If we know our input is ASCII or similar, we could use each_byte instead.

The each_char method is implemented like so:
```
rb_str_each_char(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_chars(str, 0);
}
```
In turn, rb_str_enumerate_chars is implemented as:
```
rb_str_enumerate_chars(VALUE str, VALUE ary)
{
    VALUE orig = str;
    long i, len, n;
    const char *ptr;
    rb_encoding *enc;


    str = rb_str_new_frozen(str);
    ptr = RSTRING_PTR(str);
    len = RSTRING_LEN(str);
    enc = rb_enc_get(str);


    if (ENC_CODERANGE_CLEAN_P(ENC_CODERANGE(str))) {
        for (i = 0; i < len; i += n) {
            n = rb_enc_fast_mbclen(ptr + i, ptr + len, enc);
            ENUM_ELEM(ary, rb_str_subseq(str, i, n));
        }
    }
    else {
        for (i = 0; i < len; i += n) {
            n = rb_enc_mbclen(ptr + i, ptr + len, enc);
            ENUM_ELEM(ary, rb_str_subseq(str, i, n));
        }
    }
    RB_GC_GUARD(str);
    if (ary)
        return ary;
    else
        return orig;
}
```
From this we can see that it calls rb_enc_mbclen (or its fast version) to get the length in bytes of the next character in the string, so that it knows how far to advance on each step. Because the string is iterated lazily, one character at a time, we end up doing just one full pass over the input string as tally consumes the enumerator.
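
To make the variable-width stepping concrete, here is a simplified Python sketch that handles UTF-8 only and assumes the input is valid. Ruby's rb_enc_mbclen works across many encodings, so this is an illustration of the idea rather than a port; `utf8_char_len` and `each_char` are names invented for the example.

```python
def utf8_char_len(lead_byte):
    """Byte length of a UTF-8 character, judged from its first byte."""
    if lead_byte < 0x80:
        return 1                      # 0xxxxxxx: ASCII
    if lead_byte >> 5 == 0b110:
        return 2                      # 110xxxxx
    if lead_byte >> 4 == 0b1110:
        return 3                      # 1110xxxx
    if lead_byte >> 3 == 0b11110:
        return 4                      # 11110xxx
    raise ValueError("not a valid UTF-8 lead byte")


def each_char(data):
    """Lazily yield one decoded character at a time from UTF-8 bytes."""
    i = 0
    while i < len(data):
        n = utf8_char_len(data[i])    # how many bytes this character spans
        yield data[i:i + n].decode("utf-8")
        i += n                        # advance by a variable number of bytes
```

Because `each_char` is a generator, nothing is decoded until a consumer asks for the next character, which mirrors the single lazy pass described above.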

Tally is then implemented like so:
```
static void
tally_up(VALUE hash, VALUE group)
{
    VALUE tally = rb_hash_aref(hash, group);
    if (NIL_P(tally)) {
        tally = INT2FIX(1);
    }
    else if (FIXNUM_P(tally) && tally < INT2FIX(FIXNUM_MAX)) {
        tally += INT2FIX(1) & ~FIXNUM_FLAG;
    }
    else {
        tally = rb_big_plus(tally, INT2FIX(1));
    }
    rb_hash_aset(hash, group, tally);
}


static VALUE
tally_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, hash))
{
    ENUM_WANT_SVALUE();
    tally_up(hash, i);
    return Qnil;
}
```
Here, tally_i is the block callback (RB_BLOCK_CALL_FUNC_ARGLIST is the macro that declares its parameter list). It is invoked once per element and calls tally_up, which updates the tally hash on every iteration.

Rough time & memory analysis

The each_char method doesn't allocate an array to eagerly hold the characters of the string, so it has a small constant memory overhead. When we tally the characters, we allocate a hash and put our tally data into it which in the worst case scenario can take up as much memory as the input string times some constant factor.

Time-wise, tally does a full scan of the string, and calling find to locate the first non-repeated character scans the hash again. Each of these passes has O(n) worst-case complexity.

However, tally also updates a hash on every iteration. Updating the hash on every character can be as slow as O(n) again, so the worst case complexity of this Ruby solution is perhaps O(n^2).

Under reasonable assumptions, though, updating a hash is O(1), so we can expect the amortized average case to look like O(n).
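
The two passes can be sketched in Python for comparison. This is not Ruby's implementation: it assumes Python 3.7+, where dicts keep insertion order, just as Ruby hashes do, and the function names are chosen only to mirror the Ruby ones.

```python
def tally(items):
    """Count occurrences, keeping keys in order of first appearance."""
    counts = {}
    for item in items:                            # pass 1: one scan of the input
        counts[item] = counts.get(item, 0) + 1    # O(1) expected per update
    return counts


def first_unrepeated_char(string):
    """Return the first character with a count of 1, or None."""
    for char, n in tally(string).items():         # pass 2: one scan of the dict
        if n == 1:
            return char
    return None
```

The second pass visits at most one entry per distinct character, so its cost is bounded by the size of the alphabet actually used, not by the length of the string.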

My old accepted answer in Python

You can't know that the character is un-repeated until you've processed the whole string, so my suggestion would be this:
```
def first_non_repeated_character(string):
  chars = []     # characters seen exactly once so far, in order
  repeated = []  # characters seen more than once
  for character in string:
    if character in chars:
      chars.remove(character)
      repeated.append(character)
    elif character not in repeated:
      chars.append(character)
  if chars:
    return chars[0]
  return False
Edit: originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computer™.

逝去的感伤

2020-11-27 19:12

The following is a Ruby implementation of finding the first nonrepeated character of a string:

def first_non_repeated_character(string)
  string1 = string.split('')
  string2 = string.split('')

  string1.each do |let1|
    counter = 0
    string2.each do |let2|
      counter += 1 if let1 == let2
    end
    # return already exits the method, so no break is needed
    return let1 if counter == 1
  end
  nil # every character repeats
end

p first_non_repeated_character('dont doddle in the forest')

And here is a JavaScript implementation of the same style function:

var first_non_repeated_character = function (string) {
  var string1 = string.split('');
  var string2 = string.split('');

  for (var i = 0; i < string1.length; i++) {
    var count = 0;
    for (var x = 0; x < string2.length; x++) {
      if (string1[i] === string2[x]) {
        count++;
      }
    }
    if (count === 1) {
      return string1[i];
    }
  }
};

console.log(first_non_repeated_character('dont doddle in the forest'));
console.log(first_non_repeated_character('how are you today really?'));

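In both cases I used a counter: each letter is compared against every letter in the string, itself included, so a letter that appears nowhere else ends up with a count of exactly 1. The first letter with that count is the answer. The nested loops make this O(n^2) in the length of the string.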
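
For comparison, the same count-every-letter idea can be sketched in a few lines of Python. `str.count` hides the inner loop but still rescans the whole string for each character, so the cost remains quadratic; the function below is an illustration, not a translation of either snippet.

```python
def first_non_repeated_character(string):
    """Return the first character that occurs exactly once, or None."""
    for ch in string:
        # count() walks the entire string, like the inner loop above
        if string.count(ch) == 1:
            return ch
    return None
```

With the two sample inputs used above, this returns "l" for 'dont doddle in the forest' and "h" for 'how are you today really?'.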


一整个雨季

2020-11-27 19:12

I read through the answers but did not see any like mine. I think this answer is very simple and fast; am I wrong?

def first_unique(s):
    repeated = []

    while s:
        if s[0] not in s[1:] and s[0] not in repeated:
            return s[0]
        else:
            repeated.append(s[0])
            s = s[1:]
    return None

test

(first_unique('abdcab') == 'd', first_unique('aabbccdad') == None, first_unique('') == None, first_unique('a') == 'a')


闹比i

2020-11-27 19:13

Counter requires Python 2.7 or Python 3.1 and later.

>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     counts = Counter(s)
...     for c in s:
...         if counts[c]==1:
...             return c
...     return None
... 
>>> first_non_repeated_character("aaabbbcffffd")
'c'
>>> first_non_repeated_character("aaaebbbcffffd")
'e'


不要未来只要你来

2020-11-27 19:13
I think this should do it in C. It runs in O(n) time with no ambiguity about the order of insertion and deletion operations. It uses the counting step of a counting sort (the simplest form of a bucket sort, which is itself a simple form of a radix sort): tally every byte value, then rescan the string for the first byte whose count is 1.
```
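#include <string.h>

unsigned char find_first_unique(unsigned char *string)
{
    int chars[256];
    int i;
    memset(chars, 0, sizeof(chars));

    /* First pass: count every byte, starting at index 0.
       (Incrementing i in the loop condition skipped the first character.) */
    for (i = 0; string[i]; i++)
    {
        chars[string[i]]++;
    }

    /* Second pass: return the first byte whose count is 1. */
    for (i = 0; string[i]; i++)
    {
        if (chars[string[i]] == 1) return string[i];
    }
    return 0; /* no unique character */
}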
```
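
The counting-array idea carries over to other languages. Below is a minimal Python sketch that works on bytes, so that 256 counters are enough, just as in the C version. The name `find_first_unique_byte` and the choice to return None (where the C code returns 0) are assumptions made for the example.

```python
def find_first_unique_byte(data):
    """Return the first byte value that occurs exactly once in data, or None."""
    counts = [0] * 256        # one counter per possible byte value
    for b in data:            # first pass: count occurrences
        counts[b] += 1
    for b in data:            # second pass: first byte with a count of 1
        if counts[b] == 1:
            return b
    return None
```

Iterating over a `bytes` object yields integers, so `find_first_unique_byte(b"aabbcdd")` returns 99, the byte value of "c".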
