Plotting aggregated data with sub-columns in dc.js

Submitted by 空扰寡人 on 2020-06-23 12:36:38

Question


I have data in the form:

data = [..., {id:X,..., turnover:[[2015,2017,2018],[2000000,3000000,2800000]]}, ...];

My goal is to plot the year on the x-axis against the average turnover, on the y-axis, of all companies currently selected via crossfilter.

The years recorded vary from company to company, but there should always be three of them.

If it would help, I can reorganise the data to be in the form:

data = [..., {id:X,..., turnover:{2015:2000000, 2017:3000000, 2018:2800000}}, ...];
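If you do switch to the object form, the conversion is short. A minimal sketch, assuming every row has the `turnover: [[years], [values]]` shape shown above; `toYearMap` and the sample row are hypothetical:

```javascript
// Hypothetical helper: convert turnover: [[years], [values]] into {year: value}.
function toYearMap([years, values]) {
  // Pair each year with the value at the same index.
  return Object.fromEntries(years.map((year, i) => [year, values[i]]));
}

// Hypothetical sample row in the original format.
const row = {id: 'X', turnover: [[2015, 2017, 2018], [2000000, 3000000, 2800000]]};
const reshaped = {...row, turnover: toYearMap(row.turnover)};
// Object keys are always strings, so the years become '2015', '2017', '2018'.
console.log(reshaped.turnover);
```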

Had I been able to reorganise the data further to look like:

[...{id:X, ..., year:2015, turnover:2000000},{id:X,...,year:2017,turnover:3000000},{id:X,...,year:2018,turnover:2800000}]; 

Then this question would provide a solution.

But splitting the companies into separate rows doesn't make sense with everything else I'm doing.
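For comparison, the flattened layout would come from something like this sketch (`flattenByYear` and the sample companies are hypothetical). It shows the trade-off: each company becomes three rows, so every other dimension would see it three times.

```javascript
// Hypothetical helper: expand each company into one row per year.
function flattenByYear(data) {
  return data.flatMap(({turnover: [years, values], ...rest}) =>
    years.map((year, i) => ({...rest, year, turnover: values[i]})));
}

// Hypothetical sample data in the original format.
const data = [
  {id: 'A', turnover: [[2015, 2017, 2018], [2000000, 3000000, 2800000]]},
  {id: 'B', turnover: [[2016, 2017, 2018], [500000, 700000, 900000]]},
];
const flat = flattenByYear(data);
console.log(flat.length); // 6: two companies times three years
```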


Answer 1:


Unless I'm mistaken, you have what I call a "tag dimension", aka a dimension with array keys.

You want each row to be recorded once for each year it contains, but you only want it to affect this dimension. You don't want to observe the row multiple times in the other dimensions, which is why you don't want to flatten.

With your original data format, your dimension definition would look something like:

var turnoverYearsDim = cf.dimension(d => d.turnover[0], true); // true: the key function returns an array (tag dimension)

The key function for a tag dimension should return an array, here of years.

This feature is still fairly new as crossfilter features go, and a couple of minor bugs were found this year, but they should be easy to avoid. The feature has gotten a lot of use, and no major bugs have been found.

Be careful with tag dimensions: because each row is counted once per tag, aggregations across all bins add up to more than 100% (in your case, 300%). But if you are averaging across companies within each year, this is not a problem.
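To see where the 300% comes from without pulling in crossfilter, here is a plain-JavaScript sketch of how a tag dimension bins rows: each row lands in the bin of every tag its key function returns. `countPerTag` is a hypothetical stand-in, not crossfilter's API, and the sample companies are made up.

```javascript
// Hypothetical stand-in for a tag dimension's group().reduceCount():
// each row is counted once under every tag its key function returns.
function countPerTag(rows, tagsOf) {
  const counts = {};
  for (const row of rows) {
    for (const tag of tagsOf(row)) {
      counts[tag] = (counts[tag] || 0) + 1;
    }
  }
  return counts;
}

const rows = [
  {id: 'A', turnover: [[2015, 2017, 2018], [2000000, 3000000, 2800000]]},
  {id: 'B', turnover: [[2016, 2017, 2018], [500000, 700000, 900000]]},
];
const counts = countPerTag(rows, d => d.turnover[0]);
const total = Object.values(counts).reduce((a, b) => a + b, 0);
console.log(total); // 6, i.e. 300% of the 2 rows
```

A per-year average divides each year's total by that same year's count, so the over-counting across bins never enters it.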

Pairs of tags and values

What's unique about your problem is that you not only have multiple keys per row, you also have multiple values associated with those keys.

Although the crossfilter tag dimension feature is handy, it gives you no way to know which tag you are looking at when you reduce. Further, the most powerful and general group reduction method, group.reduce(), doesn't tell you which key you are reducing.

But there is an even more powerful way to reduce across the entire crossfilter at once: dimension.groupAll().

A groupAll object behaves like a group, except that it is fed all of the rows and returns only one bin. If you use dimension.groupAll(), you get a groupAll object that observes all filters except those on that dimension. You can also use crossfilter.groupAll() if you want a groupAll that observes all filters.

Here is a set of reduction functions for groupAll.reduce() (using ES6 syntax for brevity) that reduce all of the rows into a single object mapping year => {count, total}.
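As a rough mental model, a groupAll folds every row that passes the relevant filters into a single value, and dimension.groupAll() leaves out that dimension's own filter. This sketch recomputes from scratch for clarity, whereas crossfilter updates the value incrementally through add/remove; the `groupAllValue` helper, the filter map, and the `sector`/`size` fields are all hypothetical.

```javascript
// Hypothetical model of groupAll: fold all rows that pass the filters into one bin.
// `filters` maps a dimension name to a predicate; `ignore` names a dimension whose
// filter is skipped, mimicking dimension.groupAll().
function groupAllValue(rows, filters, reducer, ignore = null) {
  const active = Object.entries(filters)
    .filter(([name]) => name !== ignore)
    .map(([, pred]) => pred);
  return rows
    .filter(row => active.every(pred => pred(row)))
    .reduce((p, v) => reducer.add(p, v), reducer.init());
}

const countReducer = {add: p => p + 1, init: () => 0};
const rows = [
  {sector: 'retail', size: 10},
  {sector: 'retail', size: 200},
  {sector: 'energy', size: 50},
];
const filters = {sector: d => d.sector === 'retail', size: d => d.size > 20};

console.log(groupAllValue(rows, filters, countReducer));           // 1: like crossfilter.groupAll()
console.log(groupAllValue(rows, filters, countReducer, 'sector')); // 2: like a hypothetical sectorDim.groupAll()
```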

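// Reduces every row into one object mapping key => {count, total}.
// idTag names the field holding the array of keys (e.g. years);
// valTag names the field holding the parallel array of values.
function avg_paired_tag_reduction(idTag, valTag) {
  return {
    // Called when a row enters the filter: add each of its key/value pairs.
    add(p, v) {
      v[idTag].forEach((id, i) => {
        p[id] = p[id] || {count: 0, total: 0};
        ++p[id].count;
        p[id].total += v[valTag][i];
      });
      return p;
    },
    // Called when a row leaves the filter: undo what add() did for that row.
    remove(p, v) {
      v[idTag].forEach((id, i) => {
        console.assert(p[id]);
        --p[id].count;
        p[id].total -= v[valTag][i];
      });
      return p;
    },
    // Start from an empty object; keys are created on first add.
    init() {
      return {};
    }
  };
}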

It is fed every row, loops over the keys and values in that row, and accumulates a count and a total for every key. It assumes that the key array and the value array have the same length.
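With the question's original `turnover: [[years], [values]]` format, the years and values are not separate top-level fields, so the field-name version can't reach them directly. One option is a variant that takes accessor functions instead of field names. This is a sketch of that variant (the name `avgPairedReductionBy` and the sample data are hypothetical), exercised by hand the way crossfilter would call it when rows enter and leave the filter:

```javascript
// Hypothetical variant of avg_paired_tag_reduction that takes accessor functions,
// so keys and values can live anywhere in the row (e.g. d.turnover[0] and d.turnover[1]).
function avgPairedReductionBy(keysOf, valuesOf) {
  return {
    add(p, v) {
      const values = valuesOf(v);
      keysOf(v).forEach((key, i) => {
        p[key] = p[key] || {count: 0, total: 0};
        p[key].count += 1;
        p[key].total += values[i];
      });
      return p;
    },
    remove(p, v) {
      const values = valuesOf(v);
      keysOf(v).forEach((key, i) => {
        p[key].count -= 1;
        p[key].total -= values[i];
      });
      return p;
    },
    init() {
      return {};
    },
  };
}

const red = avgPairedReductionBy(d => d.turnover[0], d => d.turnover[1]);
const a = {id: 'A', turnover: [[2015, 2017, 2018], [2000000, 3000000, 2800000]]};
const b = {id: 'B', turnover: [[2016, 2017, 2018], [500000, 700000, 900000]]};

let p = red.init();
p = red.add(p, a);
p = red.add(p, b);
console.log(p[2017]); // { count: 2, total: 3700000 }
p = red.remove(p, b); // company B filtered out
console.log(p[2017]); // { count: 1, total: 3000000 }
```

With crossfilter, the three functions would be passed to groupAll().reduce() in the same way as in the usage below.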

Then we can use a "fake group" to turn the object on demand into the array of {key,value} pairs that dc.js charts expect:

function groupall_map_to_group(groupAll) {
  return {
    all() {
      return Object.entries(groupAll.value())
        .map(([key, value]) => ({key,value}));
    }
  };
}
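Since `Object.entries` always yields string keys, the years come back as `'2015'`, not `2015`. (Integer-like keys are enumerated in ascending order, so they do come out sorted.) If the chart needs numeric keys, for example with a linear x scale, a hypothetical variant could convert them while mapping. Because it only calls `groupAll.value()`, it can be tried against a stub:

```javascript
// Hypothetical variant of groupall_map_to_group that converts keys back to numbers,
// since Object.entries always returns string keys.
function groupall_map_to_numeric_group(groupAll) {
  return {
    all() {
      return Object.entries(groupAll.value())
        .map(([key, value]) => ({key: Number(key), value}));
    }
  };
}

// Hypothetical stub standing in for a crossfilter groupAll object.
const stubGroupAll = {
  value: () => ({2017: {count: 2, total: 3700000}, 2015: {count: 1, total: 2000000}}),
};

const bins = groupall_map_to_numeric_group(stubGroupAll).all();
console.log(bins.map(kv => kv.key)); // [ 2015, 2017 ]
```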

Use these functions like this:

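// This call reads the keys from each row's `id` field and the values from its `val` field.
// With the question's format, where `id` is the company id and both arrays live inside
// `turnover`, copy turnover[0] and turnover[1] into two new top-level fields and pass their names.
const red = avg_paired_tag_reduction('id', 'val');
const avgPairedTagGroup = turnoverYearsDim.groupAll().reduce(
  red.add, red.remove, red.init
);
console.log(groupall_map_to_group(avgPairedTagGroup).all());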

Although it's possible to compute a running average, it's more efficient to calculate a count and total, as above, and then let the chart compute the average in its value accessor:

chart.dimension(turnoverYearsDim)
  .group(groupall_map_to_group(avgPairedTagGroup))
  .valueAccessor(kv => kv.value.total / kv.value.count); // average = total / count
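One thing to watch: the remove function never deletes a year, so once every company with that year is filtered out, its bin remains with `count: 0` and the accessor above yields `0 / 0`, i.e. `NaN`. A minimal sketch of a guarded accessor (the name `avgOrZero` and the choice to report 0 are assumptions; you could instead drop empty bins in the fake group's `all()`):

```javascript
// Hypothetical guarded value accessor: average turnover, or 0 for an empty bin.
const avgOrZero = kv => (kv.value.count ? kv.value.total / kv.value.count : 0);

// Hypothetical bins as produced by the fake group.
const bins = [
  {key: '2017', value: {count: 2, total: 3700000}},
  {key: '2016', value: {count: 0, total: 0}}, // every company with 2016 filtered out
];
console.log(bins.map(avgOrZero)); // [ 1850000, 0 ]
console.log(bins.map(kv => kv.value.total / kv.value.count)); // [ 1850000, NaN ]
```

It would then be passed as `.valueAccessor(avgOrZero)`.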

Demo fiddle.



Source: https://stackoverflow.com/questions/58132895/plotting-aggregated-data-with-sub-columns-in-dc-js
