F# How to Percentile Rank An Array of Doubles?

Question

I am trying to take a numeric array in F#, and rank all the elements so that ties get the same rank. Basically I'm trying to replicate the algorithm I have below in C#, but just for an array of doubles. Help?

rankMatchNum = 0; rankMatchSum = 0; previousScore = -999999999;

        for (int i = 0; i < factorStocks.Count; i++)
        {
            //The 1st time through it won't ever match the previous score...
            if (factorStocks[i].factors[factorName + "_R"] == previousScore)
            {
                rankMatchNum = rankMatchNum + 1;     //The count of matching ranks
                rankMatchSum = rankMatchSum + i + 1; //The rank itself...
                for (int j = 0; j <= rankMatchNum; j++)
                {
                    factorStocks[i - j].factors[factorName + "_WR"] = rankMatchSum / (rankMatchNum + 1);
                }
            }
            else
            {
                rankMatchNum = 0;
                rankMatchSum = i + 1;
                previousScore = factorStocks[i].factors[factorName + "_R"];
                factorStocks[i].factors[factorName + "_WR"] = i + 1;
            }
        }

Answer 1:

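Here's how I would do it, although this isn't a direct translation of your code. I've done things in a functional style, piping results from one transformation to another. Note that this version returns the ranks in ascending order of the values, not in the order of the input.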

let rank seq =
  seq
  |> Seq.countBy (fun x -> x)     // count repeated numbers
  |> Seq.sortBy (fun (k,v) -> k)  // order by key
  |> Seq.fold (fun (r,l) (_,n) -> // accumulate the number of items seen and the list of grouped average ranks
      let r'' = r + n             // get the rank after this group is processed
      let avg = List.averageBy float [r+1 .. r''] // average ranks for this group
      r'', ([for _ in 1 .. n -> avg]) :: l)       // add a list with avg repeated
      (0,[])                      // seed the fold with rank 0 and an empty list
  |> snd                          // get the final list component, ignoring the component storing the final rank
  |> List.rev                     // reverse the list
  |> List.collect (fun l -> l)    // merge individual lists into final list

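Or, to copy Mehrdad's style (this version keeps the input order, but it rescans the whole array for every element, so it is O(n²)):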

let rank arr =
  let lt item = arr |> Seq.filter (fun x -> x < item) |> Seq.length
  let lte item = arr |> Seq.filter (fun x -> x <= item) |> Seq.length
  let avgR item = [(lt item) + 1 .. (lte item)] |> List.averageBy float
  Seq.map avgR arr

Answer 2:

I think that you'll probably find this problem far easier to solve in F# if you rewrite the above in a declarative manner rather than in an imperative manner. Here's my off-the-top-of-my-head approach to rewriting the above declaratively:

First we need a wrapper class to decorate our items with a property carrying the rank.

class Ranked<T> {
    public T Value { get; private set; }
    public double Rank { get; private set; }
    public Ranked(T value, double rank) {
        this.Value = value;
        this.Rank = rank;
    }
}

Here, then, is your algorithm in a declarative manner. Note that elements is your input sequence and the resulting sequence is in the same order as elements. The delegate func is the value that you want to rank elements by.

static class IEnumerableExtensions {
    public static IEnumerable<Ranked<T>> Rank<T, TRank>(
        this IEnumerable<T> elements,
        Func<T, TRank> func
    ) {
        var groups = elements.GroupBy(x => func(x));
        var ranks = groups.OrderBy(g => g.Key)
                          .Aggregate(
                              (IEnumerable<double>)new List<double>(),
                              (x, g) =>
                                  x.Concat(
                                      Enumerable.Repeat(
                                          Enumerable.Range(x.Count() + 1, g.Count()).Sum() / (double)g.Count(),
                                          g.Count()
                                      )
                                  )
                    )
                    .GroupBy(r => r)
                    .Select(r => r.Key)
                    .ToArray();

        // GroupBy yields groups in order of first occurrence, so order them by key
        // again here; otherwise index i would not line up with ranks[i].
        var dict = groups.OrderBy(g => g.Key)
                         .Select((g, i) => new { g.Key, Index = i })
                         .ToDictionary(x => x.Key, x => ranks[x.Index]);

        foreach (T element in elements) {
            yield return new Ranked<T>(element, dict[func(element)]);
        }
    }
}

Usage:

class MyClass {
    public double Score { get; private set; }
    public MyClass(double score) { this.Score = score; }
}

List<MyClass> list = new List<MyClass>() {
    new MyClass(1.414),
    new MyClass(2.718),
    new MyClass(2.718),
    new MyClass(2.718),
    new MyClass(1.414),
    new MyClass(3.141),
    new MyClass(3.141),
    new MyClass(3.141),
    new MyClass(1.618)
};
foreach(var item in list.Rank(x => x.Score)) {
    Console.WriteLine("Score = {0}, Rank = {1}", item.Value.Score, item.Rank);
}

Output:

Score = 1.414, Rank = 1.5
Score = 2.718, Rank = 5
Score = 2.718, Rank = 5
Score = 2.718, Rank = 5
Score = 1.414, Rank = 1.5
Score = 3.141, Rank = 8
Score = 3.141, Rank = 8
Score = 3.141, Rank = 8
Score = 1.618, Rank = 3

Note that I do not require the input sequence to be ordered. The resulting code is simpler if you enforce such a requirement on the input sequence. Note further that we do not mutate the input sequence, nor do we mutate the input items. This makes F# happy.

From here you should be able to rewrite this in F# easily.
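
As a rough illustration of what such a rewrite might look like, here is a minimal F# sketch of the same declarative idea. The names Ranked and rankBy are hypothetical and only mirror the C# above. It assumes the key type supports comparison and equality, and that ties should share the average of the positions they occupy.

```fsharp
// Hypothetical record mirroring the C# Ranked<T> wrapper.
type Ranked<'T> = { Value: 'T; Rank: float }

// Sketch: rank elements by a key, giving tied keys their average rank.
// The result is in the same order as the input.
let rankBy (key: 'T -> 'Key) (elements: seq<'T>) : Ranked<'T> list =
    let items = List.ofSeq elements
    let rankOfKey =
        items
        |> List.map key
        |> List.sort
        |> List.mapi (fun i k -> k, float (i + 1))   // 1-based position of each key
        |> List.groupBy fst                          // collect the positions of tied keys
        |> List.map (fun (k, positions) -> k, positions |> List.averageBy snd)
        |> dict
    items |> List.map (fun x -> { Value = x; Rank = rankOfKey.[key x] })

let scores = [1.414; 2.718; 2.718; 2.718; 1.414; 3.141; 3.141; 3.141; 1.618]
rankBy id scores |> List.map (fun r -> r.Rank) |> printfn "%A"
// [1.5; 5.0; 5.0; 5.0; 1.5; 8.0; 8.0; 8.0; 3.0]
```

Nothing is mutated here either; the lookup is built once from the sorted keys and then applied to the untouched input.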

Answer 3:

This is not a very efficient algorithm (O(n²)), but it's quite short and readable:

let percentile arr =
   let rank item = ((arr |> Seq.filter (fun i -> i < item) 
                         |> Seq.length |> float) + 1.0) 
                   / float (Array.length arr) * 100.0
   Array.map rank arr

You can adjust the expression fun i -> i < item (or the + 1.0 term) to get the tie handling you want. With the version above:

let arr = [|1.0;2.0;2.0;4.0;3.0;3.0|]
percentile arr |> printfn "%A";;

[|16.66666667; 33.33333333; 33.33333333; 100.0; 66.66666667; 66.66666667|]
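
One possible adjustment, sketched here under the assumption that tied values should share the average of the ranks they occupy (as in the question's C# code), is to count both the items below and the items at or below each element. The name percentileMidRank is made up for this example.

```fsharp
// Sketch: percentile based on the average rank of tied values. Still O(n²).
let percentileMidRank (arr: float[]) : float[] =
    let n = float arr.Length
    let rank item =
        let below = arr |> Array.filter (fun i -> i < item) |> Array.length
        let atOrBelow = arr |> Array.filter (fun i -> i <= item) |> Array.length
        // Ties occupy ranks below+1 .. atOrBelow; their mean is (below + 1 + atOrBelow) / 2.
        float (below + 1 + atOrBelow) / 2.0 / n * 100.0
    Array.map rank arr

let sample = [| 1.0; 2.0; 2.0; 4.0; 3.0; 3.0 |]
percentileMidRank sample |> printfn "%A"
```

For this sample the two 2.0 values share rank 2.5 and the two 3.0 values share rank 4.5, so they come out at roughly 41.67 and 75.0 instead of 33.33 and 66.67.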

Answer 4:

Mehrdad's solution is very nice but a bit slow for my purposes. The sorting only needs to be done once. Rather than traversing the list each time to count the items < or <= the target, we can keep a running counter. This is more imperative (a fold would also work), and it returns a dictionary from each distinct value to its average rank:

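let GetRanks2 ( arr ) =
    // distinct values in ascending order, each paired with its number of occurrences
    let tupleList = arr |> Seq.countBy( fun x -> x ) |> Seq.sortBy( fun (x,count) -> x )
    // keyed by float (not int) so that an array of doubles can be ranked
    let map = new System.Collections.Generic.Dictionary<float,float>()
    let mutable index = 1   // rank of the first item in the current group
    for (item, count) in tupleList do
        let avgRank = 
            let mutable s = 0
            for i = index to index + count - 1 do 
                s <- s + i
            float s / float count
        map.Add( item, avgRank )
        index <- index + count
    map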
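
For comparison, the fold variant mentioned above could look roughly like the following sketch. The function name averageRankLookup and the sample data are assumptions made for this example. It relies on the fact that ranks k+1 .. k+c average to k + (c + 1) / 2, so no inner loop is needed.

```fsharp
open System.Collections.Generic

// Sketch: lookup from each distinct value to its average rank, built with a fold.
let averageRankLookup (arr: float[]) : IDictionary<float, float> =
    // Fold step: `seen` is the number of items ranked so far.
    let step (seen: int, acc) (value: float, count: int) =
        // Ranks seen+1 .. seen+count average to seen + (count + 1) / 2.
        let avg = float seen + float (count + 1) / 2.0
        seen + count, (value, avg) :: acc
    arr
    |> Array.countBy id       // distinct values with their multiplicities
    |> Array.sortBy fst       // ascending by value
    |> Array.fold step (0, [])
    |> snd
    |> dict

let scores = [| 1.0; 2.0; 2.0; 4.0; 3.0; 3.0 |]
let lookup = averageRankLookup scores
// Map the ranks back onto the input order.
let ranks = scores |> Array.map (fun x -> lookup.[x])
printfn "%A" ranks   // [|1.0; 2.5; 2.5; 6.0; 4.5; 4.5|]
```

Sorting happens once, and each element is then ranked by a single dictionary lookup.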

Source: https://stackoverflow.com/questions/2239778/f-how-to-percentile-rank-an-array-of-doubles
