Question
I am trying to take a numeric array in F#, and rank all the elements so that ties get the same rank. Basically I'm trying to replicate the algorithm I have below in C#, but just for an array of doubles. Help?
rankMatchNum = 0; rankMatchSum = 0; previousScore = -999999999;
for (int i = 0; i < factorStocks.Count; i++)
{
    //The 1st time through it won't ever match the previous score...
    if (factorStocks[i].factors[factorName + "_R"] == previousScore)
    {
        rankMatchNum = rankMatchNum + 1;     //The count of matching ranks
        rankMatchSum = rankMatchSum + i + 1; //The rank itself...
        for (int j = 0; j <= rankMatchNum; j++)
        {
            factorStocks[i - j].factors[factorName + "_WR"] = rankMatchSum / (rankMatchNum + 1);
        }
    }
    else
    {
        rankMatchNum = 0;
        rankMatchSum = i + 1;
        previousScore = factorStocks[i].factors[factorName + "_R"];
        factorStocks[i].factors[factorName + "_WR"] = i + 1;
    }
}
Answer 1:
Here's how I would do it, although this isn't a direct translation of your code. I've done things in a functional style, piping results from one transformation to another.
let rank seq =
    seq
    |> Seq.countBy (fun x -> x)      // count repeated numbers
    |> Seq.sortBy (fun (k,v) -> k)   // order by key
    |> Seq.fold (fun (r,l) (_,n) ->  // accumulate the number of items seen and the list of grouped average ranks
            let r'' = r + n                              // get the rank after this group is processed
            let avg = List.averageBy float [r+1 .. r'']  // average ranks for this group
            r'', ([for _ in 1 .. n -> avg]) :: l)        // add a list with avg repeated
        (0,[])                       // seed the fold with rank 0 and an empty list
    |> snd                           // get the final list component, ignoring the component storing the final rank
    |> List.rev                      // reverse the list
    |> List.collect (fun l -> l)     // merge individual lists into final list
Or to copy Mehrdad's style:
let rank arr =
    let lt item = arr |> Seq.filter (fun x -> x < item) |> Seq.length
    let lte item = arr |> Seq.filter (fun x -> x <= item) |> Seq.length
    let avgR item = [(lt item) + 1 .. (lte item)] |> List.averageBy float
    Seq.map avgR arr
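One thing worth seeing side by side: the fold-based version hands back the ranks in sorted order, while the filter-based version keeps them lined up with the input. Below is a small sketch that restates both compactly so it runs on its own. The names `rankSorted` and `rankInPlace` and the sample data are made up for illustration, and the input is assumed to be plain floats with no NaN values.

```fsharp
// Hypothetical restatement of the fold-based version: ranks come back in SORTED order.
let rankSorted (xs: seq<float>) : float list =
    // seen = number of items ranked so far, acc = per-group rank lists (newest first)
    let step (seen, acc) (_, n) =
        let avg = List.averageBy float [seen + 1 .. seen + n]
        seen + n, List.replicate n avg :: acc
    xs
    |> Seq.countBy id
    |> Seq.sortBy fst
    |> Seq.fold step (0, [])
    |> snd
    |> List.rev
    |> List.concat

// Hypothetical restatement of the filter-based version: ranks stay in INPUT order.
let rankInPlace (xs: seq<float>) : float list =
    let lt x = xs |> Seq.filter (fun y -> y < x) |> Seq.length
    let lte x = xs |> Seq.filter (fun y -> y <= x) |> Seq.length
    xs |> Seq.map (fun x -> List.averageBy float [lt x + 1 .. lte x]) |> Seq.toList

let sample = [1.0; 2.0; 2.0; 4.0; 3.0; 3.0]
printfn "%A" (rankSorted sample)   // [1.0; 2.5; 2.5; 4.5; 4.5; 6.0]
printfn "%A" (rankInPlace sample)  // [1.0; 2.5; 2.5; 6.0; 4.5; 4.5]
```

If you need each rank next to its original element, the second shape is the one to start from; the first is cheaper but loses the pairing.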
Answer 2:
I think that you'll probably find this problem far easier to solve in F# if you rewrite the above in a declarative manner rather than in an imperative manner. Here's my off-the-top-of-my-head approach to rewriting the above declaratively:
First we need a wrapper class to decorate our items with a property carrying the rank.
class Ranked<T> {
    public T Value { get; private set; }
    public double Rank { get; private set; }
    public Ranked(T value, double rank) {
        this.Value = value;
        this.Rank = rank;
    }
}
Here, then, is your algorithm in a declarative manner. Note that elements is your input sequence and the resulting sequence is in the same order as elements. The delegate func selects the value that you want to rank elements by.
using System;
using System.Collections.Generic;
using System.Linq;
static class IEnumerableExtensions {
    public static IEnumerable<Ranked<T>> Rank<T, TRank>(
        this IEnumerable<T> elements,
        Func<T, TRank> func
    ) {
        // Group equal values and order the groups by value, so that the
        // i-th group lines up with the i-th distinct rank computed below.
        var groups = elements.GroupBy(x => func(x))
                             .OrderBy(g => g.Key)
                             .ToList();
        var ranks = groups
            .Aggregate(
                (IEnumerable<double>)new List<double>(),
                (x, g) =>
                    x.Concat(
                        Enumerable.Repeat(
                            // average of the positions this group occupies
                            Enumerable.Range(x.Count() + 1, g.Count()).Sum() / (double)g.Count(),
                            g.Count()
                        )
                    )
            )
            .GroupBy(r => r)
            .Select(r => r.Key)
            .ToArray();
        // Map each distinct value to its averaged rank.
        var dict = groups.Select((g, i) => new { g.Key, Index = i })
                         .ToDictionary(x => x.Key, x => ranks[x.Index]);
        foreach (T element in elements) {
            yield return new Ranked<T>(element, dict[func(element)]);
        }
    }
}
Usage:
class MyClass {
    public double Score { get; private set; }
    public MyClass(double score) { this.Score = score; }
}
List<MyClass> list = new List<MyClass>() {
    new MyClass(1.414),
    new MyClass(2.718),
    new MyClass(2.718),
    new MyClass(2.718),
    new MyClass(1.414),
    new MyClass(3.141),
    new MyClass(3.141),
    new MyClass(3.141),
    new MyClass(1.618)
};
foreach(var item in list.Rank(x => x.Score)) {
    Console.WriteLine("Score = {0}, Rank = {1}", item.Value.Score, item.Rank);
}
Output:
Score = 1.414, Rank = 1.5
Score = 2.718, Rank = 5
Score = 2.718, Rank = 5
Score = 2.718, Rank = 5
Score = 1.414, Rank = 1.5
Score = 3.141, Rank = 8
Score = 3.141, Rank = 8
Score = 3.141, Rank = 8
Score = 1.618, Rank = 3
Note that I do not require the input sequence to be ordered. The resulting code is simpler if you enforce such a requirement on the input sequence. Note further that we do not mutate the input sequence, nor do we mutate the input items. This makes F# happy.
From here you should be able to rewrite this in F# easily.
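For what it's worth, here is one way the same group-then-look-up idea might come out in F#. This is a rough sketch, not a line-by-line port of the C# above: `rankWithTies` is a name invented here, it works directly on a `float[]` (which is what the question asks about), and it assumes the array contains no NaN values.

```fsharp
// Hypothetical helper: tie-averaged ranks, returned in the same order as the input.
let rankWithTies (values: float[]) : float[] =
    // Sort once, tag every element with its 1-based position,
    // then give each distinct value the mean of the positions it occupies.
    let rankOf =
        values
        |> Array.sort
        |> Array.mapi (fun i v -> v, float (i + 1))
        |> Array.groupBy fst
        |> Array.map (fun (v, positions) -> v, positions |> Array.averageBy snd)
        |> dict
    // Look each original element up, so neither the input nor its order is touched.
    values |> Array.map (fun v -> rankOf.[v])

let scores = [| 1.414; 2.718; 2.718; 2.718; 1.414; 3.141; 3.141; 3.141; 1.618 |]
printfn "%A" (rankWithTies scores)
// [|1.5; 5.0; 5.0; 5.0; 1.5; 8.0; 8.0; 8.0; 3.0|]
```

Sorting once and looking ranks up in a dictionary keeps this at roughly O(n log n), and nothing is mutated along the way.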
Answer 3:
This is not a very efficient algorithm (O(n²)), but it's quite short and readable:
let percentile arr =
    let rank item =
        // number of elements strictly below this one
        let below = arr |> Seq.filter (fun i -> i < item) |> Seq.length
        (float below + 1.0) / float (Array.length arr) * 100.0
    Array.map rank arr
You might tweak the predicate fun i -> i < item (or the + 1.0 term) to get the ranking behavior you want:
let arr = [|1.0;2.0;2.0;4.0;3.0;3.0|]
percentile arr |> printfn "%A";; // print_any in older F# releases
[|16.66666667; 33.33333333; 33.33333333; 100.0; 66.66666667; 66.66666667|]
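As an example of that kind of tweak, here is a sketch of one variant: switching the comparison to `<=` and dropping the `+ 1.0` gives the percentage of values at or below each item, so tied values share the highest percentile in their group rather than the lowest. The name `percentileInclusive` is made up for this illustration.

```fsharp
// Hypothetical variant: percentage of values less than or equal to each item.
let percentileInclusive (arr: float[]) : float[] =
    let n = float (Array.length arr)
    let rank item =
        // count elements at or below this one (still O(n) per item, O(n^2) overall)
        let atOrBelow = arr |> Seq.filter (fun i -> i <= item) |> Seq.length
        float atOrBelow / n * 100.0
    Array.map rank arr

let data = [| 1.0; 2.0; 2.0; 4.0; 3.0; 3.0 |]
printfn "%A" (percentileInclusive data)
// roughly 16.7, 50, 50, 100, 83.3, 83.3
```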
Answer 4:
Mehrdad's solution is very nice but a bit slow for my purposes. The sorting only needs to be done once. Rather than traversing the list each time to count the items < or <= the target, we can keep running counters. This is more imperative (I could have used a fold):
let GetRanks2 ( arr ) =
    // count duplicates, then order the distinct values ascending
    let tupleList = arr |> Seq.countBy( fun x -> x ) |> Seq.sortBy( fun (x,count) -> x )
    // key type is inferred from the input, so an array of doubles works
    let map = new System.Collections.Generic.Dictionary<_,float>()
    let mutable index = 1 // 1-based rank of the first item in the current group
    for (item, count) in tupleList do
        let c = count
        let avgRank =
            let mutable s = 0
            for i = index to index + c - 1 do
                s <- s + i
            float s / float c
        map.Add( item, avgRank )
        index <- index + c
    map
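For comparison, the fold mentioned above might look something like the sketch below. Two liberties are taken, both assumptions of this sketch rather than part of the answer: it returns an immutable `Map` instead of a `Dictionary`, and it replaces the inner summing loop with the closed form for the mean of consecutive integers, index + (count - 1) / 2. The names `getRanksFold` and `step` are made up.

```fsharp
// Hypothetical fold-based variant: distinct value -> tie-averaged rank.
let getRanksFold (arr: seq<float>) : Map<float, float> =
    // index = 1-based rank of the first item in the current group
    let step (index, ranks) (item, count) =
        // mean of the consecutive integers index .. index + count - 1
        let avgRank = float index + float (count - 1) / 2.0
        index + count, Map.add item avgRank ranks
    arr
    |> Seq.countBy id
    |> Seq.sortBy fst
    |> Seq.fold step (1, Map.empty)
    |> snd

let ranks = getRanksFold [1.0; 2.0; 2.0; 4.0; 3.0; 3.0]
printfn "%A" (Map.toList ranks)
// [(1.0, 1.0); (2.0, 2.5); (3.0, 4.5); (4.0, 6.0)]
```

To get ranks back in input order, map the original array through the result, e.g. `arr |> Array.map (fun x -> ranks.[x])`.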
Source: https://stackoverflow.com/questions/2239778/f-how-to-percentile-rank-an-array-of-doubles