Java how to infer type from data coming from multiple sources

陌路散爱 posted on 2020-01-16 06:55:06

Question


I'm developing a Java server that collects data from multiple sensors. These sensors usually return the queried values as strings. A header indicates the data type that the server should use to convert each received value. The values can be integer, boolean, double, float, or long.

Sometimes the sensors don't provide this "data type description", so I want a way to infer the data type by analyzing the received string.

I was thinking about using a regex, but maybe there are better ways to do it. Any suggestions?


Answer 1:


There are several approaches to this. One is to try parsing the value with the standard Java types in a suitable order, i.e.

Boolean.parseBoolean(s)
Integer.parseInt(s)
Long.parseLong(s)
... 
(and so on)

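Catch the exception at each step and move on to the next type. Note that Boolean.parseBoolean never throws: it returns false for anything other than "true" (ignoring case), so that step needs an explicit check of the text instead.

The second approach is to use the Apache Commons Lang library, which has helpers for detecting types, e.g.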

BooleanUtils.toBooleanObject(s)   // returns null when s is not a recognized boolean string
StringUtils.isNumeric(s)
StringUtils.isAlpha(s)
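The first approach can be sketched with the standard library alone. This is a minimal illustration, not a definitive implementation: the class name ParseOrderSketch and the method guessType are hypothetical, and the sketch assumes that leading and trailing whitespace should be ignored and that Double is the widest numeric type of interest.

```java
final class ParseOrderSketch {

    /** Returns the name of the first type that accepts the trimmed input (hypothetical helper). */
    static String guessType(String raw) {
        String s = raw.trim();

        // Boolean.parseBoolean never throws, so compare the text explicitly.
        if (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false")) {
            return "Boolean";
        }
        try {
            Integer.parseInt(s);
            return "Integer";
        } catch (NumberFormatException e) {
            // not an int, try the next wider type
        }
        try {
            Long.parseLong(s);
            return "Long";
        } catch (NumberFormatException e) {
            // not a long either
        }
        try {
            Double.parseDouble(s);
            return "Double";
        } catch (NumberFormatException e) {
            // not numeric at all
        }
        return "String";
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"true", "42", "3000000000", "3.14", "abc"}) {
            System.out.println(sample + " -> " + guessType(sample));
        }
    }
}
```

The order matters: every int is also a valid long and a valid double, so the narrower parsers have to run first.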



Answer 2:


I would create a data validator as a chain of responsibility, where each element attempts to parse the input data, ordered from the most to the least restrictive type:

boolean
integer
long
float
double
String

If one parser fails, the chain passes the data on to the next parser; if every parser fails, you either throw an exception or treat the value as a String.
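A rough sketch of such a chain, under some assumptions: the names ParserChainSketch, Link, numeric, and parse are hypothetical, the fallback is to return the original String rather than throw, and the float link rejects values that overflow to infinity so that they can reach the double link.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class ParserChainSketch {

    /** One link of the chain: returns a value if it can handle the input, otherwise empty. */
    interface Link {
        Optional<Object> tryParse(String s);
    }

    /** Adapts a throwing parser such as Integer::valueOf into a link. */
    static Link numeric(Function<String, Object> parser) {
        return s -> {
            try {
                return Optional.of(parser.apply(s));
            } catch (NumberFormatException e) {
                return Optional.empty(); // let the next link try
            }
        };
    }

    /** Accepts only the literal words true/false, ignoring case. */
    static final Link BOOLEAN = s -> {
        if (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false")) {
            return Optional.<Object>of(Boolean.valueOf(s));
        }
        return Optional.empty();
    };

    /** Links ordered from the most to the least restrictive type. */
    static final List<Link> CHAIN = Arrays.asList(
        BOOLEAN,
        numeric(Integer::valueOf),
        numeric(Long::valueOf),
        numeric(s -> {
            Float f = Float.valueOf(s);
            // Float.valueOf overflows to Infinity instead of throwing
            if (f.isInfinite()) throw new NumberFormatException("out of float range");
            return f;
        }),
        numeric(Double::valueOf));

    /** Walks the chain; falls back to the original String if no link accepts the input. */
    static Object parse(String raw) {
        String s = raw.trim();
        for (Link link : CHAIN) {
            Optional<Object> result = link.tryParse(s);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return raw;
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"true", "42", "3000000000", "2.5", "1e99", "hello"}) {
            Object value = parse(sample);
            System.out.println(sample + " -> " + value.getClass().getSimpleName());
        }
    }
}
```

Adding a new type then only means adding a link to the list, without touching the loop in parse.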




Answer 3:


I was inspired by this post to write my own. It's really easy to use. String#trim() is used to remove leading and trailing whitespace, so the following work fine:

jshell> Typifier.typify(" 23  \t\n")
$206 ==> String[2] { "Byte", "23" }

jshell> Typifier.typify("\r\n 3.4")
$207 ==> String[2] { "Float", "3.4" }

But if the user enters only whitespace, that's fine, too:

jshell> Typifier.typify(" ")
$298 ==> String[2] { "String", " " }

Various representations of true/false are used to determine Boolean-ness:

jshell> Typifier.typify(" F ")
$208 ==> String[2] { "Boolean", "false" }

jshell> Typifier.typify(" 1 ")
$209 ==> String[2] { "Boolean", "true" }

jshell> Typifier.typify(" TRUE ")
$210 ==> String[2] { "Boolean", "true" }

Ranges of Byte, Short, and Float are used to box the value in the narrowest type available:

jshell> Typifier.typify(" 2 ")
$212 ==> String[2] { "Byte", "2" }

jshell> Typifier.typify(" 200 ")
$213 ==> String[2] { "Short", "200" }

jshell> Typifier.typify(" 2e9 ")
$214 ==> String[2] { "Float", "2.0E9" }

jshell> Typifier.typify(" 2e99 ")
$215 ==> String[2] { "Double", "2.0E99" }
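The Float-versus-Double split relies on the fact that Float.parseFloat does not throw on overflow; it returns an infinite value. A minimal sketch of just that check, where the class and method names are hypothetical and not part of Typifier:

```java
final class FloatRangeSketch {

    /**
     * Returns "Float" when the value fits in a float, "Double" when it only
     * fits in a double, and "String" when it overflows both.
     * Non-numeric input still throws NumberFormatException.
     */
    static String narrowestFloatingType(String s) {
        float f = Float.parseFloat(s);   // overflows to Infinity instead of throwing
        if (!Float.isInfinite(f)) {
            return "Float";
        }
        double d = Double.parseDouble(s);
        return Double.isInfinite(d) ? "String" : "Double";
    }
}
```

This is why " 2e9 " comes back as a Float while " 2e99 " comes back as a Double.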

The default type is String, but if the input is an expression that the JavaScript ScriptEngine can parse, it will be evaluated and the type of the result will be returned:

jshell> Typifier.typify("var a = 3; var b = 6; a*b")
$230 ==> String[2] { "Float", "18.0" }

jshell> Typifier.typify("2*(2.4e2 + 34.8)")
$231 ==> String[2] { "Float", "549.6" }

If the input string has length 1 and is not Boolean or Byte, it will be assigned the Character type:

jshell> Typifier.typify("4")
$232 ==> String[2] { "Byte", "4" }

jshell> Typifier.typify("-")
$233 ==> String[2] { "Character", "-" }

jshell> Typifier.typify("a")
$234 ==> String[2] { "Character", "a" }

Possible extensions might include putting a flag for the formula evaluation, or a flag which would restrict returned types to "common" types (Boolean, Integer, Double, String). This code can also be found at gist.github. Anyway, here it is:


import javax.script.ScriptEngineManager;
import javax.script.ScriptEngine;
import javax.script.ScriptException;

import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Typifier {

  public Typifier() {
    // nothing special to do here
  }

  public static String[] typify (String data) {

    String s = data.trim();

    // -1. if the input data is only whitespace, return String
    if (s.length() == 0) return new String[]{"String", data};

    // 0. check if the data is Boolean (true or false)
    if (Arrays.asList("0", "f", "F", "false", "False", "FALSE").contains(s))
      return new String[]{"Boolean", "false"};
    else if (Arrays.asList("1", "t", "T", "true",  "True",  "TRUE" ).contains(s))
      return new String[]{"Boolean", "true"};

    // 1. check if data is a Byte (1-byte integer with range [-(2^7) = -128, ((2^7)-1) = 127])
    try {
      Byte b = Byte.parseByte(s);
      return new String[]{"Byte", b.toString()}; // if we make it to this line, the data parsed fine as a Byte
    } catch (java.lang.NumberFormatException ex) {
      // okay, guess it's not a Byte
    }

    // 2. check if data is a Short (2-byte integer with range [-(2^15) = -32768, ((2^15)-1) = 32767])
    try {
      Short h = Short.parseShort(s);
      return new String[]{"Short", h.toString()}; // if we make it to this line, the data parsed fine as a Short
    } catch (java.lang.NumberFormatException ex) {
      // okay, guess it's not a Short
    }

    // 3. check if data is an Integer (4-byte integer with range [-(2^31), (2^31)-1])
    try {
      Integer i = Integer.parseInt(s);
      return new String[]{"Integer", i.toString()}; // if we make it to this line, the data parsed fine as an Integer
    } catch (java.lang.NumberFormatException ex) {
      // okay, guess it's not an Integer
    }

    String s_L_trimmed = s;

    // 4. check if data is a Long (8-byte integer with range [-(2^63), (2^63)-1])

    //    ...first, see if the last character of the string is "L" or "l"
    if (Arrays.asList("L", "l").contains(s.substring(s.length() - 1)) && s.length() > 1)
      s_L_trimmed = s.substring(0, s.length() - 1);

    try {
      Long l = Long.parseLong(s_L_trimmed);
      return new String[]{"Long", l.toString()}; // if we make it to this line, the data parsed fine as a Long
    } catch (java.lang.NumberFormatException ex) {
      // okay, guess it's not a Long
    }

    // 5. check if data is a Float (32-bit IEEE 754 floating point with approximate extents +/- 3.4028235e38)
    try {
      Float f = Float.parseFloat(s);

      if (!f.isInfinite()) // if it's beyond the range of Float, maybe it's not beyond the range of Double
        return new String[]{"Float", f.toString()}; // if we make it to this line, the data parsed fine as a Float and is finite

    } catch (java.lang.NumberFormatException ex) {
      // okay, guess it's not a Float
    }

    // 6. check if data is a Double (64-bit IEEE 754 floating point with approximate extents +/- 1.797693134862315e308 )
    try {
      Double d = Double.parseDouble(s);

      if (!d.isInfinite())
        return new String[]{"Double", d.toString()}; // if we make it to this line, the data parsed fine as a Double
      else // if it's beyond the range of Double, just return a String and let the user decide what to do
        return new String[]{"String", s};

    } catch (java.lang.NumberFormatException ex) {
      // okay, guess it's not a Double
    }

    // 7. revert to String by default, with caveats...

    //   a. if string has length 1, it is a single character
    if (s.length() == 1) return new String[]{"Character", s};

    //   b. if string contains any of {+, -, /, *, =}, attempt to parse equation
    Pattern pattern = Pattern.compile("[-+/*=]"); // '-' goes first so it is a literal, not a range like "+-/"
    Matcher matcher = pattern.matcher(s);

    //   ...evaluate the equation and send the result back to typify() to get the type
    if (matcher.find()) {
      ScriptEngineManager manager = new ScriptEngineManager();
      ScriptEngine engine = manager.getEngineByName("JavaScript"); // null when the runtime has no JavaScript engine

      if (engine != null) {
        try {
          Object result = engine.eval(s); // may be null if the script yields no value
          if (result != null) return typify(result.toString());
        } catch (javax.script.ScriptException ex) {
          // okay, guess it's not an equation
        }
      }
    }

    // ...if we've made it all the way to here without returning, give up and return "String"

    return new String[]{"String", s};

  }

}
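The "common types" restriction mentioned above could be sketched as a mapping over the type names that typify returns. This is only an illustration under assumptions: CommonTypesSketch and toCommonType are hypothetical names, and Long is left unchanged here because it does not fit in an Integer, which is a choice the original list of common types does not settle.

```java
final class CommonTypesSketch {

    /** Maps a narrow type name to a "common" one (hypothetical helper). */
    static String toCommonType(String typeName) {
        switch (typeName) {
            case "Byte":
            case "Short":
                return "Integer";
            case "Float":
                return "Double";
            case "Character":
                return "String";
            default:
                // Boolean, Integer, Long, Double and String pass through unchanged
                return typeName;
        }
    }
}
```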


Source: https://stackoverflow.com/questions/13314215/java-how-to-infer-type-from-data-coming-from-multiple-sources
