Complex transformations and filters with Apache NiFi

佐手、 submitted on 2021-02-11 06:54:41

Question


I have a JSON array:

[ {
  "account_login" : "some_mail@gmail.com",
  "view_id" : 11313231,
  "join_id" : "utm_campaign=toyota&utm_content=multiformat_sites&utm_medium=cpc&utm_source=mytarget",
  "start_date" : "2020-08-01",
  "end_date" : "2020-08-31"
}, {
  "account_login" : "another_mail@lab.net",
  "view_id" : 19556319183,
  "join_id" : "utm_campaign=mazda&utm_content=keywords_social-networks&utm_medium=cpc&utm_source=facebook",
  "start_date" : "2020-12-22",
  "end_date" : "2020-12-23"
}, {
...
} ]

For each join_id I need to do the following:

  1. Split the string into key-value pairs: utm_campaign, toyota; utm_content, multiformat_sites; etc.
  2. Filter them (Java code below).
  3. Convert the keys to another format using a table from the database (Java code below).

My main goal is to reproduce this Java code:

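import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.stream.Collectors;

public class GaUtmFactoryService {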

    private static final String INVALID_MACRO_FOOTPRINTS = "^.*[{\\[%]+.+[}\\]%].*$";

    public Map<String, String> extractUtmMarks(String utmMarks) {
        if (utmMarks == null || utmMarks.isBlank()) {
            return Collections.emptyMap();
        }
        return Arrays.stream(utmMarks.split("\\s*&\\s*"))
                .map(s -> s.trim().split("\\s*=\\s*"))
                .filter(this::isUtmMarksValid)
                .collect(Collectors.toMap(
                        key -> convertCsUtmMarkToGa(key[0]),
                        value -> value[1],
                        (val1, val2) -> val2)
                );
    }

    
    private boolean isUtmMarksValid(String[] utmMarks) {
        return utmMarks.length == 2
                && !convertCsUtmMarkToGa(utmMarks[0]).isBlank()
                && !utmMarks[1].isBlank()
                && Arrays.stream(utmMarks).noneMatch(this::isUtmMarkContainsInvalidChars);
    }

    private boolean isUtmMarkContainsInvalidChars(String utmMark) {
        return utmMark.matches(INVALID_MACRO_FOOTPRINTS)
                || !StandardCharsets.US_ASCII.newEncoder().canEncode(utmMark);
    }

   
    private String convertCsUtmMarkToGa(String utmMark) {
       switch (utmMark) {
            case "utm_medium":
                return "ga:medium";
            case "utm_campaign":
                return "ga:campaign";
            case "utm_source":
                return "ga:source";
            case "utm_content":
                return "ga:adContent";
            case "utm_term":
                return "ga:keyword";
            case "utm_target":
            case "utm_a":
                return "";
            default:
                // unknown keys pass through unchanged
                return utmMark;
        }
    }

}

Usage from outside:

public Map<String, String> getConvertedMarks() {
    GaUtmFactoryService gaUtmFactoryService = new GaUtmFactoryService();
    String utmMarks = "utm_campaign=toyota&utm_content=multiformat_sites&utm_medium=cpc&utm_source=facebook";
    Map<String, String> converted = gaUtmFactoryService.extractUtmMarks(utmMarks);
    // should be:
    // {ga:campaign=toyota, ga:adContent=multiformat_sites, ga:medium=cpc, ga:source=facebook}
    return converted;
}

Is this even possible with NiFi? Or, if it is hard, should I just create a REST microservice with some endpoints for this task?

UPDATE

I added EvaluateJsonPath and SplitJson. Now each JSON FlowFile has an attribute: utm.marks = utm_campaign=toyota&utm_content=multiformat_sites&utm_medium=cpc&utm_source=mytarget

I need to split these attributes and get something like this:

campaign.key = ga:campaign

campaign.value = toyota

content.key = ga:adContent

content.value = multiformat_sites

etc.


Answer 1:


The ExecuteGroovyScript script for this transformation could look like this:

import groovy.json.*
//get file from session
def ff=session.get()
if(!ff)return
//read stream, convert to reader, parse to list/objects
def data=ff.read().withReader("UTF-8"){r-> new JsonSlurper().parse(r) }

//transform json
data.each{ i->
    i.join_id = i.join_id
        .split("\\s*&\\s*")  //# to array
        .collectEntries{ 
                //# convert each item to map entry
                String[] kv = it.split("\\s*=\\s*")
                kv[0] = [
                    "utm_medium"   : "ga:medium",
                    "utm_campaign" : "ga:campaign",
                    "utm_source"   : "ga:source",
                    "utm_content"  : "ga:adContent",
                    "utm_term"     : "ga:keyword",
                ].get( kv[0] )
                kv
            }
        .findAll{ k,v-> k } //# filter out empty/null keys
}

//write back to file
ff.write("UTF-8"){w-> new JsonBuilder(data).writeTo(w)}
//transfer to success
REL_SUCCESS<<ff
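
Note that the script above maps the keys and drops any key that is not in its table, but it does not apply the validity checks from the question's isUtmMarksValid (step 2). As a rough illustration of what that step rejects, here is a minimal Java sketch of the check on its own. The class name UtmFilterSketch, the method names, and the sample values are hypothetical; the regex is copied from the question, and the key-conversion part of the original check is left out for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

class UtmFilterSketch {

    // Same pattern as INVALID_MACRO_FOOTPRINTS in the question:
    // an opening {, [ or %, at least one character, then a closing }, ] or %.
    private static final String INVALID_MACRO_FOOTPRINTS = "^.*[{\\[%]+.+[}\\]%].*$";

    // True when the text looks like an unexpanded macro or is not pure ASCII.
    static boolean containsInvalidChars(String text) {
        return text.matches(INVALID_MACRO_FOOTPRINTS)
                || !StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    // True when a split "key=value" pair has exactly two non-blank, clean parts.
    static boolean isValidPair(String[] pair) {
        return pair.length == 2
                && !pair[0].isBlank()
                && !pair[1].isBlank()
                && !containsInvalidChars(pair[0])
                && !containsInvalidChars(pair[1]);
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from the question's data.
        List<String> samples = List.of(
                "utm_campaign=toyota",
                "utm_campaign={campaign_id}",   // unexpanded macro
                "utm_term=%keyword%",           // unexpanded macro
                "utm_source=caf\u00e9",         // contains a non-ASCII character
                "utm_medium=");                 // missing value
        for (String raw : samples) {
            String[] pair = raw.trim().split("\\s*=\\s*");
            System.out.println(raw + " -> " + (isValidPair(pair) ? "kept" : "dropped"));
        }
    }
}
```

With these samples only the first pair should be kept. The same predicate could be ported into the Groovy closure as an extra findAll step, if the filter is needed inside NiFi.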



Answer 2:


A solution based on daggett's answer, for a single JSON object (not an array):

import groovy.json.*
//get file from session
def ff=session.get()
if(!ff)return
//read stream, convert to reader, parse to list/objects

def data=ff.read().withReader("UTF-8"){r-> new JsonSlurper().parse(r) }
def builder = new JsonBuilder(data)

builder.content.join_id = builder.content.join_id.split("\\s*&\\s*")  //# to array
        .collectEntries{ 
                //# convert each item to map entry
                String[] kv = it.split("\\s*=\\s*")
                kv[0] = [
                    "utm_medium"   : "ga:medium",
                    "utm_campaign" : "ga:campaign",
                    "utm_source"   : "ga:source",
                    "utm_content"  : "ga:adContent",
                    "utm_term"     : "ga:keyword",
                ].get( kv[0] )
                kv
            }
        .findAll{ k,v-> k } //# filter out empty/null keys
ff.write("UTF-8"){w-> builder.writeTo(w)}
//transfer to success
REL_SUCCESS<<ff
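
Both scripts rewrite join_id inside the FlowFile content. If the attribute layout from the question's UPDATE (campaign.key, campaign.value, and so on) is wanted instead, the same split can produce a flat map of attribute names to values. The following is a minimal sketch in plain Java, not a NiFi processor. The class name UtmAttributeSketch and the method toAttributes are hypothetical, and it assumes the attribute prefix is the UTM key with "utm_" removed. The key table is copied from the question's convertCsUtmMarkToGa, so utm_content maps to ga:adContent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UtmAttributeSketch {

    // Key mapping copied from the question's convertCsUtmMarkToGa.
    private static final Map<String, String> GA_KEYS = Map.of(
            "utm_medium", "ga:medium",
            "utm_campaign", "ga:campaign",
            "utm_source", "ga:source",
            "utm_content", "ga:adContent",
            "utm_term", "ga:keyword");

    // Turns "utm_campaign=toyota&..." into campaign.key / campaign.value entries.
    static Map<String, String> toAttributes(String utmMarks) {
        Map<String, String> attributes = new LinkedHashMap<>();
        if (utmMarks == null || utmMarks.isBlank()) {
            return attributes;
        }
        for (String part : utmMarks.trim().split("\\s*&\\s*")) {
            String[] kv = part.split("\\s*=\\s*");
            if (kv.length != 2 || !GA_KEYS.containsKey(kv[0])) {
                continue; // skip malformed pairs and unmapped keys
            }
            // Assumption: the attribute prefix is the UTM key without "utm_".
            String prefix = kv[0].substring("utm_".length());
            attributes.put(prefix + ".key", GA_KEYS.get(kv[0]));
            attributes.put(prefix + ".value", kv[1]);
        }
        return attributes;
    }

    public static void main(String[] args) {
        String marks = "utm_campaign=toyota&utm_content=multiformat_sites"
                + "&utm_medium=cpc&utm_source=mytarget";
        toAttributes(marks).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Inside NiFi, a map like this could be attached to the FlowFile with ProcessSession.putAllAttributes(flowFile, attributes); how that call is wired into a script or custom processor depends on the flow and is not shown here.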


Source: https://stackoverflow.com/questions/65448320/complex-transformations-and-filters-with-apache-nifi
