Count the number of null values per column with Pentaho

Posted by 萝らか妹 on 2019-12-12 02:49:30

Question


I've got a CSV file that contains more than 60 columns and 2,000,000 lines. I'm trying to count the number of null values per variable (per column), and then sum that new row to get the total number of null values in the entire CSV. For example, if we get this file as input:

We expect this other file in output:

I know how to count the number of null values per line, but I haven't figured out how to count the number of null values per column.


Answer 1:


There has to be a better way to do this, but I wrote a really nasty JavaScript step which does the job.

It has some problems with different column types, because it doesn't set the type of the output columns. (It should set all columns to integer, but I don't know if that is possible from JavaScript.)

You have to run the Identify last row in a stream step first and save its result to a column named last (or change the script).

var nulls;
var seen;

if (!seen) {
    // Initialize array
    seen = 1;
    nulls = [];
    for (var i = 0; i < getInputRowMeta().size(); i++) {
        nulls[i] = 0;
    }
}

for (var i = 0; i < getInputRowMeta().size(); i++) {
    if (row[i] == null) {
        nulls[i] += 1;
    }
    // Hack to find empty strings (value type 2 is String in Kettle)
    else if (getInputRowMeta().getValueMeta(i).getType() == 2 && row[i].length() == 0) {
        nulls[i] += 1;
    }
}

// Don't store any values
trans_Status = SKIP_TRANSFORMATION;

// Only store the nulls at the last row
if (last == true) {
    putRow(nulls);
}
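
The accumulate-then-emit-on-the-last-row pattern used above can also be tried outside Pentaho. The following is only a minimal sketch in plain JavaScript, with no Kettle APIs: makeNullCounter, processRow and the sample rows are hypothetical names and made-up data, and the total is added because the question asks for it.

```javascript
// Plain-JavaScript sketch of the accumulate-then-emit pattern (no Kettle APIs).
// makeNullCounter and processRow are hypothetical names used for illustration.
function makeNullCounter(columnCount) {
    // One counter per column, kept alive between calls by the closure
    var nulls = [];
    for (var i = 0; i < columnCount; i++) {
        nulls[i] = 0;
    }

    // Call once per row. Returns null until isLast is true,
    // then returns the per-column counts plus their total.
    return function processRow(row, isLast) {
        for (var i = 0; i < columnCount; i++) {
            var value = row[i];
            // Treat null, undefined and empty strings as "null"
            if (value === null || value === undefined || value === "") {
                nulls[i] += 1;
            }
        }
        if (!isLast) {
            return null;
        }
        var total = 0;
        for (var j = 0; j < columnCount; j++) {
            total += nulls[j];
        }
        return { perColumn: nulls.slice(), total: total };
    };
}

// Made-up sample rows with three columns each
var rows = [
    ["a", null, 1],
    ["", "x", null],
    ["b", null, 3]
];
var processRow = makeNullCounter(3);
var result = null;
for (var r = 0; r < rows.length; r++) {
    result = processRow(rows[r], r === rows.length - 1);
}
console.log(result.perColumn, result.total);
```

For the sample rows, the per-column counts are 1, 2 and 1, and the total is 4. Only the running counters are kept in memory, so the approach does not depend on the number of lines in the file.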



Answer 2:


Please drag and drop the steps below onto the canvas.

Step 1: Add constants: create one field called constant with value = 1.

Step 2: Filter rows: you have to filter the null values of all columns.

Step 3: Group by: group by the constant field and, in the Aggregates section, specify the remaining columns (for example ct_inc) with the type Number of Values (N).
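
One possible reading of these steps, sketched in plain JavaScript rather than Pentaho: it assumes that Number of Values (N) counts only the non-null values of a field (the answer does not say so explicitly), so the null count of a column is the group's row count minus N. The function nullCountsViaGroupBy, the name column and the sample rows are hypothetical, and the Filter rows step is left out because the answer does not spell out its condition.

```javascript
// Sketch of "constant key + Group by" in plain JavaScript (no Pentaho APIs).
// ASSUMPTION: "Number of Values (N)" counts only non-null values.
function nullCountsViaGroupBy(rows, columnNames) {
    var CONSTANT = 1;   // Step 1: the same key is added to every row
    var groups = {};    // Step 3: one entry per distinct key (here only one)

    rows.forEach(function (row) {
        if (!groups[CONSTANT]) {
            groups[CONSTANT] = { rowCount: 0, nonNull: {} };
        }
        var group = groups[CONSTANT];
        group.rowCount += 1;
        columnNames.forEach(function (name) {
            var isNull = row[name] === null || row[name] === undefined;
            group.nonNull[name] = (group.nonNull[name] || 0) + (isNull ? 0 : 1);
        });
    });

    // Null count per column = rows in the group minus non-null values
    var only = groups[CONSTANT] || { rowCount: 0, nonNull: {} };
    var nullCounts = {};
    columnNames.forEach(function (name) {
        nullCounts[name] = only.rowCount - (only.nonNull[name] || 0);
    });
    return nullCounts;
}

// Made-up sample rows; ct_inc comes from the answer, name is hypothetical
var sampleRows = [
    { ct_inc: 10, name: null },
    { ct_inc: null, name: null },
    { ct_inc: 5, name: "x" }
];
var counts = nullCountsViaGroupBy(sampleRows, ["ct_inc", "name"]);
console.log(counts);
```

Because every row carries the same constant, the grouping produces a single output row, which is what makes one set of counts for the whole file possible.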

If you have any doubts, feel free to ask.

skype_id : panabakavenkatesh



Source: https://stackoverflow.com/questions/35368635/count-the-number-of-null-value-per-column-with-pentaho
