Interleave an array of email addresses avoiding items with same domain to be consecutive

前提是你 提交于 2019-12-22 10:33:55

问题


I'm looking for an efficient way of sorting an array of email addresses to avoid items with the same domain to be consecutive, in C#.

Email addresses inside the array are already distinct and all of them are lower case.

Example:

Given an array with the following entries:

john.doe@domain1.com
jane_doe@domain1.com
patricksmith@domain2.com
erick.brown@domain3.com

I would like to obtain something similar to the following:

john.doe@domain1.com    
patricksmith@domain2.com
jane_doe@domain1.com
erick.brown@domain3.com

回答1:


With the help of an extension method (stolen from https://stackoverflow.com/a/27533369/172769), you can go like this:

List<string> emails = new List<string>();
emails.Add("john.doe@domain1.com");
emails.Add("jane_doe@domain1.com");
emails.Add("patricksmith@domain2.com");
emails.Add("erick.brown@domain3.com");

var q = emails.GroupBy(m => m.Split('@')[1]).Select(g => new List<string>(g)).Interleave();

The Interleave method is defined as:

public static IEnumerable<T> Interleave<T>(this IEnumerable<IEnumerable<T>> source )
{
    var queues = source.Select(x => new Queue<T>(x)).ToList();
    while (queues.Any(x => x.Any())) {
        foreach (var queue in queues.Where(x => x.Any())) {
            yield return queue.Dequeue();
        }
    }
}

So basically, we create groups based on the domain part of the email adresses, project (or Select) each group into a List<string>, and then "Interleave" those lists.

I have tested against your sample data, but more thorough testing might be needed to find edge cases.

DotNetFiddle snippet

Cheers




回答2:


This will distribute them semi-evenly and attempt to avoid matching domains next to each other (although in certain lists that may be impossible). This answer will use OOP and Linq.

DotNetFiddle.Net Example

using System;
using System.Linq;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        var seed = new List<string>()
        {
            "1@a.com",
            "2@a.com",
            "3@a.com",
            "4@a.com",
            "5@a.com",
            "6@a.com",
            "7@a.com",
            "8@a.com",
            "9@a.com",
            "10@a.com",

            "1@b.com",
            "2@b.com",
            "3@b.com",

            "1@c.com",
            "4@b.com",
            "2@c.com",
            "3@c.com",
            "4@c.com"
        };

        var work = seed
            // Create a list of EmailAddress objects
            .Select(s => new EmailAddress(s)) // s.ToLowerCase() ?
            // Group the list by Domain
            .GroupBy(s => s.Domain)
            // Create a List<EmailAddressGroup>
            .Select(g => new EmailAddressGroup(g))
            .ToList();

        var currentDomain = string.Empty;
        while(work.Count > 0)
        {
            // this list should not be the same domain we just used
            var noDups = work.Where(w => w.Domain != currentDomain);
            // if none exist we are done, or it can't be solved
            if (noDups.Count() == 0)
            {
                break;
            }
            // find the first group with the most items
            var workGroup = noDups.First(w => w.Count() == noDups.Max(g => g.Count()));
            // get the email address and remove it from the group list
            var workItem = workGroup.Remove();

            // if the group is empty remove it from *work*
            if (workGroup.Count() == 0)
            {
                work.Remove(workGroup);
                Console.WriteLine("removed: " + workGroup.Domain);
            }

            Console.WriteLine(workItem.FullEmail);

            // last domain looked at.
            currentDomain = workItem.Domain;
        }

        Console.WriteLine("Cannot disperse email addresses affectively, left overs:");

        foreach(var workGroup in work)
        {
            while(workGroup.Count() > 0)
            {
                var item = workGroup.Remove();
                Console.WriteLine(item.FullEmail);
            }
        }


    }

    public class EmailAddress
    {
        public EmailAddress(string emailAddress)
        {
            // Additional Email Address Validation

            var result = emailAddress.Split(new char[] {'@'}, StringSplitOptions.RemoveEmptyEntries)
                .ToList();

            if (result.Count() != 2)
            {
                new ArgumentException("emailAddress");
            }

            this.FullEmail = emailAddress;
            this.Name = result[0];
            this.Domain = result[1];
        }

        public string Name { get; private set; }
        public string Domain { get; private set; }
        public string FullEmail { get; private set; }
    }

    public class EmailAddressGroup
    {
        private List<EmailAddress> _emails;

        public EmailAddressGroup(IEnumerable<EmailAddress> emails)
        {
            this._emails = emails.ToList();
            this.Domain = emails.First().Domain;
        }

        public int Count()
        {
            return _emails.Count();
        }

        public string Domain { get; private set; }

        public EmailAddress Remove()
        {
            var result = _emails.First();
            _emails.Remove(result);
            return result;
        }
    }
}

Output:

1@a.com

1@b.com

2@a.com

1@c.com

3@a.com

2@b.com

4@a.com

2@c.com

5@a.com

3@b.com

6@a.com

3@c.com

7@a.com

removed: b.com

4@b.com

8@a.com

removed: c.com

4@c.com

9@a.com

Cannot disperse email addresses affectively, left overs:

10@a.com




回答3:


Something like this will spread them equally, but you will have the problems (=consecutive elements) at the end of the new list...

        var list = new List<string>();
        list.Add("john.doe@domain1.com");
        list.Add("jane_doe@domain1.com");
        list.Add("patricksmith@domain2.com");
        list.Add("erick.brown@domain3.com");

        var x = list.GroupBy(content => content.Split('@')[1]);

        var newlist = new List<string>();
        bool addedSomething=true;
        int i = 0;
        while (addedSomething) {
            addedSomething = false;
            foreach (var grp in x) {
                if (grp.Count() > i) {
                    newlist.Add(grp.ElementAt(i));
                    addedSomething = true;
                }
            }
            i++;
        }



回答4:


Edit: Added a high level description :)

What this code does is group each element by the domain, sort the groups by size in descending order (largest group first), project the elements of each group into a stack, and pop them off of each stack (always pop the next element off the largest stack with a different domain). If there is only a single stack left, then its contents are yielded.

This should make sure that all domains distributed as evenly as possible.

MaxBy extension method from: https://stackoverflow.com/a/31560586/969962

private IEnumerable<string> GetNonConsecutiveEmails(List<string> list)
{
    var emailAddresses = list.Distinct().Select(email => new EmailAddress { Email = email, Domain = email.Split('@')[1]}).ToArray();
    var groups = emailAddresses
                .GroupBy(addr => addr.Domain)
                .Select (group => new { Domain = group.Key, EmailAddresses = new Stack<EmailAddress>(group)})
                .ToList();

    EmailAddress lastEmail = null;
    while(groups.Any(g => g.EmailAddresses.Any()))
    {
        // Try and pick from the largest stack.
        var stack = groups
            .Where(g => (g.EmailAddresses.Any()) && (lastEmail == null ? true : lastEmail.Domain != g.Domain))
            .MaxBy(g => g.EmailAddresses.Count);

        // Null check to account for only 1 stack being left.
        // If so, pop the elements off the remaining stack.   
        lastEmail = (stack ?? groups.First(g => g.EmailAddresses.Any())).EmailAddresses.Pop();
        yield return lastEmail.Email;
    }
}

class EmailAddress
{
    public string Domain;
    public string Email;
}

public static class Extensions
{
    public static T MaxBy<T,U>(this IEnumerable<T> data, Func<T,U> f) where U:IComparable
    {
        return data.Aggregate((i1, i2) => f(i1).CompareTo(f(i2))>0 ? i1 : i2);
    }
}



回答5:


What I am trying to do here is to sort them first. Then I re-arrange from a different end. I'm sure there're more efficient ways to do this but this is one easy way to do it.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication4
{
    class Program
    {
        static void Main(string[] args)
        {
            String[] emails = { "john.doe@domain1.com", "jane_doe@domain1.com", "patricksmith@domain2.com", "erick.brown@domain3.com" };
            var result = process(emails);
        }

        static String[] process(String[] emails)
        {
            String[] result = new String[emails.Length];
            var comparer = new DomainComparer();
            Array.Sort(emails, comparer);
            for (int i = 0, j = emails.Length - 1, k = 0; i < j; i++, j--, k += 2)
            {
                if (i == j)
                    result[k] = emails[i];
                else
                {
                    result[k] = emails[i];
                    result[k + 1] = emails[j];
                }
            }

            return result;
        }
    }

    public class DomainComparer : IComparer<string>
    {
        public int Compare(string left, string right)
        {
            int at_pos = left.IndexOf('@');
            var left_domain = left.Substring(at_pos, left.Length - at_pos);
            at_pos = right.IndexOf('@');
            var right_domain = right.Substring(at_pos, right.Length - at_pos);
            return String.Compare(left_domain, right_domain);
        }
    }
}


来源:https://stackoverflow.com/questions/33849021/interleave-an-array-of-email-addresses-avoiding-items-with-same-domain-to-be-con

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!