Deedle Frame.mapRows: how to use it properly and how to construct an ObjectSeries properly

℡╲_俬逩灬. Submitted on 2019-12-11 09:11:55

Question


I noticed something strange about Deedle's Frame.mapRows function that I can't explain:

open Deedle

let col1 = Series.ofObservations [ 1 => 10.0; 2 => nan; 3 => nan; 4 => 10.0; 5 => nan; 6 => 10.0 ]
let col2 = Series.ofObservations [ 1 => 9.0; 2 => 5.5; 3 => nan; 4 => 9.0; 5 => nan; 6 => 9.0 ]
let f1 = Frame.ofColumns [ "c1" => col1; "c2" => col2 ]
// Identity mapping: every row is returned untouched.
let f2 = f1 |> Frame.mapRows (fun k r -> r) |> Frame.ofRows
// Same mapping, but both values are read from the row before it is returned.
let f3 =
    f1
    |> Frame.mapRows (fun k r ->
        let x = r.Get("c1")
        let y = r.Get("c2")
        r)
    |> Frame.ofRows


val f1 : Frame<int,string> =

      c1        c2        
 1 -> 10        9         
 2 -> <missing> 5.5       
 3 -> <missing> <missing> 
 4 -> 10        9         
 5 -> <missing> <missing> 
 6 -> 10        9         

 val f2 : Frame<int,string> =

      c1        c2        
 1 -> 10        9         
 2 -> <missing> 5.5       
 3 -> <missing> <missing> 
 4 -> 10        9         
 5 -> <missing> <missing> 
 6 -> 10        9         

 val f3 : Frame<int,string> =

      c1        c2        
 1 -> 10        9         
 2 -> <missing> <missing> 
 3 -> <missing> <missing> 
 4 -> 10        9         
 5 -> <missing> <missing> 
 6 -> 10        9         

How can f3 have a different value than f2? All I did in f3 was read values from the ObjectSeries (row 2 loses its c2 value of 5.5).

I am trying to use mapRows for row-based processing: each row produces an ObjectSeries, and the results are assembled into a new frame with the same row keys. The processing has to be row based because each column value needs to be updated based on its own value and a neighboring value.

The calculation can't be done column to column directly, because the calculation changes depending on the row's values.

I would appreciate any advice.

Update

Since posting the original question, I have used Deedle from C#. To my surprise, row-based calculation is very easy in C#, and the way Frame.Rows handles missing values in C# is very different from the F# mapRows function. The following is a very basic example I used to test the logic. It might be useful to anyone searching for a similar application.

Two things to pay attention to: 1. Rows did not remove the row in which both columns' values are missing. 2. Mean is smart enough to calculate the mean from the data points that are available.

using System;                      // DateTime
using System.Collections.Generic;  // KeyValuePair
using Deedle;

namespace TestDeedleRowProcessWithMissingValues
{
    class Program
    {
        static void Main(string[] args)
        {
            var s1 = new SeriesBuilder<DateTime, double>(){
                 {DateTime.Today.Date.AddDays(-5),10.0},
                 {DateTime.Today.Date.AddDays(-4),9.0},
                 {DateTime.Today.Date.AddDays(-3),8.0},
                 {DateTime.Today.Date.AddDays(-2),double.NaN},
                 {DateTime.Today.Date.AddDays(-1),6.0},
                 {DateTime.Today.Date.AddDays(-0),5.0}
             }.Series;

            var s2 = new SeriesBuilder<DateTime, double>(){
                 {DateTime.Today.Date.AddDays(-5),10.0},
                 {DateTime.Today.Date.AddDays(-4),double.NaN},
                 {DateTime.Today.Date.AddDays(-3),8.0},
                 {DateTime.Today.Date.AddDays(-2),double.NaN},
                 {DateTime.Today.Date.AddDays(-1),6.0}                 
             }.Series;

            var f = Frame.FromColumns(new KeyValuePair<string, Series<DateTime, double>>[] { 
                KeyValue.Create("s1",s1),
                KeyValue.Create("s2",s2)
            });

            s1.Print();
            f.Print();


            f.Rows.Select(kvp => kvp.Value).Print();

//            29/05/2015 12:00:00 AM -> series [ s1 => 10; s2 => 10]
//            30/05/2015 12:00:00 AM -> series [ s1 => 9; s2 => <missing>]
//            31/05/2015 12:00:00 AM -> series [ s1 => 8; s2 => 8]
//            1/06/2015 12:00:00 AM  -> series [ s1 => <missing>; s2 => <missing>]
//            2/06/2015 12:00:00 AM  -> series [ s1 => 6; s2 => 6]
//            3/06/2015 12:00:00 AM  -> series [ s1 => 5; s2 => <missing>]


            f.Rows.Select(kvp => kvp.Value.As<double>().Mean()).Print();

//            29/05/2015 12:00:00 AM -> 10
//            30/05/2015 12:00:00 AM -> 9
//            31/05/2015 12:00:00 AM -> 8
//            1/06/2015 12:00:00 AM  -> <missing>
//            2/06/2015 12:00:00 AM  -> 6
//            3/06/2015 12:00:00 AM  -> 5


            //Console.ReadLine();
        }
    }
}

Answer 1:


The reason why f3 differs follows from the way mapRows handles missing values.

When you access a value using r.Get("c1"), you either get the value or you get a MissingValueException. The mapRows function handles this exception and marks the entire row as missing. If you write just:

let f3 = f1 |> Frame.mapRows (fun k r -> 
  let x = r.Get("c1"); 
  let y = r.Get("c2");  
  r)

Then the result will be:

1 -> series [ c1 => 10; c2 => 9] 
2 -> <missing>                   
3 -> <missing>                   
4 -> series [ c1 => 10; c2 => 9] 
5 -> <missing>                   
6 -> series [ c1 => 10; c2 => 9] 
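
The difference between reading a value that may be missing with Get and with TryGet can be checked on a single row. The following is only a minimal sketch: it assumes an F# script with the Deedle NuGet package available, and the two-row frame f is a hypothetical cut-down version of f1 from the question. TryGet returns an OptionalValue that can be inspected without raising an exception.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical two-row frame; row 2 has a missing value in c1.
let col1 = Series.ofObservations [ 1 => 10.0; 2 => nan ]
let col2 = Series.ofObservations [ 1 => 9.0; 2 => 5.5 ]
let f = Frame.ofColumns [ "c1" => col1; "c2" => col2 ]

// Take the row with key 2 as an ObjectSeries<string>.
let row = f.Rows.[2]

// TryGet wraps the result in OptionalValue instead of throwing.
let c1 = row.TryGet("c1")
let c2 = row.TryGet("c2")

printfn "c1 present: %b, c2 present: %b" c1.HasValue c2.HasValue
```

Calling row.Get("c1") on the same row is the case that raises the exception described above.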

If you want to write a function that returns the frame as it was (reading the data from original rows and producing new rows), you could do something like:

f1 
|> Frame.mapRows (fun k r -> 
  [ "X" => OptionalValue.asOption(r.TryGet("c1")); 
    "Y" => OptionalValue.asOption(r.TryGet("c2")) ] 
  |> Series.ofOptionalObservations )
|> Frame.ofRows
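
The same pattern can be extended to a row-based calculation like the one the question asks for. The sketch below is an illustration under stated assumptions, not a definitive implementation: it assumes an F# script with the Deedle NuGet package, and the rule itself (keep c1 when it is present, otherwise fall back to the c2 value in the same row) is hypothetical and only stands in for the real calculation. Values are read with TryGetAs<float>, so a missing value never marks the whole row as missing.

```fsharp
#r "nuget: Deedle"
open Deedle

// Same data as f1 in the question; NaN values are treated as missing.
let col1 = Series.ofObservations [ 1 => 10.0; 2 => nan; 3 => nan; 4 => 10.0; 5 => nan; 6 => 10.0 ]
let col2 = Series.ofObservations [ 1 => 9.0; 2 => 5.5; 3 => nan; 4 => 9.0; 5 => nan; 6 => 9.0 ]
let f1 = Frame.ofColumns [ "c1" => col1; "c2" => col2 ]

// Hypothetical rule: c1 keeps its own value when present,
// otherwise it takes the neighboring c2 value from the same row.
let filled =
    f1
    |> Frame.mapRows (fun _ r ->
        // Read both values as float options without throwing on missing data.
        let c1 = r.TryGetAs<float>("c1") |> OptionalValue.asOption
        let c2 = r.TryGetAs<float>("c2") |> OptionalValue.asOption
        [ "c1" => (c1 |> Option.orElse c2)
          "c2" => c2 ]
        |> Series.ofOptionalObservations)
    |> Frame.ofRows

filled.Print()
```

Because every row returns a series with the same keys, the resulting frame keeps the original row keys and the original column names c1 and c2.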


Source: https://stackoverflow.com/questions/26049661/deedle-frame-maprows-how-to-properly-use-it-and-how-to-construct-objectseries-pr
