ElasticSearch + Kibana to display business data

Submitted by 这一生的挚爱 on 2019-12-11 16:50:03

Question


I have visitor data captured over the past several years: more than 14 million records. I also have form data from the same period. The two data sets share a common ID.

Right now I'm attempting to learn ElasticSearch + Kibana using the visitor data. The data is fairly simple but not very well formatted: it is PHP's $_REQUEST and $_SERVER data. Here's an example from a Googlebot visit:

{u'Entrance Time': 1407551587.7385,
 u'domain': u'############',
 u'pages': {u'6818555600ccd9880bf7acef228c5d47': {u'REQUEST': [],
   u'SERVER': {u'DOCUMENT_ROOT': u'/var/www/####/',
    u'Entrance Time': 1407551587.7385,
    u'GATEWAY_INTERFACE': u'CGI/1.1',
    u'HTTP_ACCEPT': u'*/*',
    u'HTTP_ACCEPT_ENCODING': u'gzip,deflate',
    u'HTTP_CONNECTION': u'Keep-alive',
    u'HTTP_FROM': u'googlebot(at)googlebot.com',
    u'HTTP_HOST': u'############',
    u'HTTP_IF_MODIFIED_SINCE': u'Fri, 13 Jun 2014 20:26:33 GMT',
    u'HTTP_USER_AGENT': u'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    u'PATH': u'/usr/local/bin:/usr/bin:/bin',
    u'PHP_SELF': u'/index.php',
    u'QUERY_STRING': u'',
    u'REDIRECT_SCRIPT_URI': u'http://############/',
    u'REDIRECT_SCRIPT_URL': u'############',
    u'REDIRECT_STATUS': u'200',
    u'REDIRECT_URL': u'############',
    u'REMOTE_ADDR': u'############',
    u'REMOTE_PORT': u'46271',
    u'REQUEST_METHOD': u'GET',
    u'REQUEST_TIME': u'1407551587',
    u'REQUEST_URI': u'############',
    u'SCRIPT_FILENAME': u'/var/www/PIAN/index.php',
    u'SCRIPT_NAME': u'/index.php',
    u'SCRIPT_URI': u'http://############/',
    u'SCRIPT_URL': u'/############/',
    u'SERVER_ADDR': u'############',
    u'SERVER_ADMIN': u'admin@############',
    u'SERVER_NAME': u'############',
    u'SERVER_PORT': u'80',
    u'SERVER_PROTOCOL': u'HTTP/1.1',
    u'SERVER_SIGNATURE': u'<address>Apache/2.2.22 (Ubuntu) Server at ############ Port 80</address>\n',
    u'SERVER_SOFTWARE': u'Apache/2.2.22 (Ubuntu)',
    u'uniqID': u'bbc398716f4703cfabd761cc8d4101a1'},
   u'SESSION': {u'Entrance Time': 1407551587.7385,
    u'uniqID': u'bbc398716f4703cfabd761cc8d4101a1'}}},
 u'uniqID': u'bbc398716f4703cfabd761cc8d4101a1'}

I use the Python package elasticsearch-py as my interface. I create my index like this:

es.indices.create(
    index=Visit_to_ElasticSearch.INDEX,
    body={
        'settings': {
            'number_of_shards': 5,
            'number_of_replicas': 1,
        }
    },
    # ignore already existing index
    ignore=400
)

And this is my mapping:

# Create mappings of a visit
time_date_mapping = { 'type': 'date_time' }
str_not_analyzed = { 'type': 'string'} # This used to include 'index': 'not_analyzed'

visit_mapping = {
    'properties': {
        'uniqID': str_not_analyzed,
        'pages': str_not_analyzed,
        'domain': str_not_analyzed,
        'Srvr IP': str_not_analyzed,
        'Visitor IP': str_not_analyzed,
        'Agent': { 'type': 'string', 'index': 'not_analyzed' },
        'Referrer': { 'type': 'string' },
        'Entrance Time': time_date_mapping,
        'Request Time': time_date_mapping,
        'Raw': { 'type': 'string', 'index': 'not_analyzed' },
        'Pages': { 'type': 'string', 'index': 'not_analyzed' },
    },
}

The actual mapping that ES reports:

'visits': {
  'mappings': {
    'visit': {
      'properties': {
        'Agent': {'type': 'string'},
        'Entrance Time': {'format': 'dateOptionalTime', 'type': 'date'},
        'Pages': {'type': 'string'},
        'Raw': {
          'properties': {
            'Entrance Time': {'type': 'double'},
            'domain': {'type': 'string'},
            'uniqID': {'type': 'string'}
          }
        },
        'Referrer': {'type': 'string'},
        'Request Time': {'format': 'dateOptionalTime', 'type': 'date'},
        'Srvr IP': {'type': 'string'},
        'Visitor IP': {'type': 'string'},
        'domain': {'type': 'string'},
        'uniqID': {'type': 'string'}
      }
    }
  }
}

When I load my trial data into ES and view it in Kibana 4, there are problems. On the Discover tab, the "Quick Count" shows the top 5 Agents, each as a truncated version of the full string. However, when I create a visualization (Visualize->Pie Chart->From a new search->Split Slices) using Terms as the aggregation and Agent as the field, the top 5 come back as single words: mozilla, 5.0, compatible, http, 2.0.

Kibana warns me that the Agent field is analyzed, even though my mapping tells ES not to analyze it.

I'm brand new to this. Am I wrong to assume that if Agent were not analyzed, the counts would be done on the full Agent string? Replacing spaces with underscores did not fix this. So how do I fix it? Is there a way to put the Agent string into ES so that it is only considered as a whole?
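The difference the question asks about can be illustrated without a cluster. The sketch below is not ES code: `rough_tokens` is a hypothetical, crude stand-in for an analyzer (lowercasing and splitting on punctuation), and the bingbot string is an added sample value. It only shows why counting terms on an analyzed field yields single words, while counting on an unanalyzed field yields whole strings.

```python
import re
from collections import Counter


def rough_tokens(text):
    """Crude approximation of an analyzer (an assumption, not the real ES
    standard analyzer): lowercase, then keep runs of letters/digits,
    allowing dots inside a run."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())


# Sample Agent values; the first comes from the visit shown above,
# the bingbot one is added purely for illustration.
agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
]

# Analyzed field: each document contributes its individual tokens,
# so a terms count is a count of documents per token.
analyzed_counts = Counter(tok for a in agents for tok in set(rough_tokens(a)))

# Not-analyzed field: each document contributes exactly one term,
# the whole original string.
not_analyzed_counts = Counter(agents)

print(analyzed_counts.most_common(5))
print(not_analyzed_counts.most_common(5))
```

With the analyzed variant the most frequent terms are fragments such as `mozilla` and `5.0`; with the unanalyzed variant each bucket is a complete user-agent string.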

Thank you

Full mapping code can be found at this question.

------- Mapping after cURL --------

I altered the mapping with this cURL command:

curl --request PUT 'http://127.0.0.1:9200/visits/_mapping/visit?ignore_conflicts=true' --data '{"visit" : { "properties" : { "Agent" : { "type" : "string", "index" : "not_analyzed" } } } }'

This is the new mapping:

{
  "visits" : {
    "mappings" : {
      "visit" : {
        "properties" : {
          "Agent" : {
            "type" : "string",
            "norms" : {
              "enabled" : false
            }
          },
          "Entrance Time" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "Pages" : {
            "type" : "string"
          },
          "Raw" : {
            "properties" : {
              "Entrance Time" : {
                "type" : "double"
              },
              "domain" : {
                "type" : "string"
              },
              "uniqID" : {
                "type" : "string"
              }
            }
          },
          "Referrer" : {
            "type" : "string"
          },
          "Request Time" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "Srvr IP" : {
            "type" : "string"
          },
          "Visitor IP" : {
            "type" : "string"
          },
          "domain" : {
            "type" : "string"
          },
          "uniqID" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

Answer 1:


This is the same issue as this other issue. It doesn't work because the mapping visit_mapping was never installed via put_mapping. As a result, ES created its own mapping based on the contents of the visit documents it was sent.
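One way to confirm this diagnosis is to inspect the mapping ES reports and check whether the field carries `'index': 'not_analyzed'`. The helper below is a minimal sketch: `field_is_not_analyzed` is a hypothetical name, and it assumes the response has the same shape as the mapping shown in the question (index name, then `mappings`, then the document type).

```python
def field_is_not_analyzed(mapping_response, index, doc_type, field):
    """Return True if the reported mapping marks `field` as not_analyzed.

    Assumes the response layout shown in the question:
    {index: {'mappings': {doc_type: {'properties': {...}}}}}
    """
    props = mapping_response[index]["mappings"][doc_type]["properties"]
    return props.get(field, {}).get("index") == "not_analyzed"


# Trimmed copy of the mapping ES reported in the question.
reported = {
    "visits": {
        "mappings": {
            "visit": {
                "properties": {
                    "Agent": {"type": "string"},
                    "Referrer": {"type": "string"},
                }
            }
        }
    }
}

# What the mapping would need to look like for whole-string counts.
intended = {
    "visits": {
        "mappings": {
            "visit": {
                "properties": {
                    "Agent": {"type": "string", "index": "not_analyzed"},
                }
            }
        }
    }
}

print(field_is_not_analyzed(reported, "visits", "visit", "Agent"))
print(field_is_not_analyzed(intended, "visits", "visit", "Agent"))
```

Against a live cluster, the response would presumably come from something like `es.indices.get_mapping(index='visits', doc_type='visit')` in the elasticsearch-py versions of that era; that call is not shown in the question and is an assumption here.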

To solve this, call put_mapping with your mapping before indexing your first visit document.
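The ordering can be sketched as follows. This is an illustration under assumptions, not the asker's actual code: `install_mapping_then_index` is a hypothetical helper, `es` is expected to be an elasticsearch-py client of the same generation as the question (one that still accepts `doc_type`), and only the Agent field is mapped to keep the example short.

```python
# Minimal mapping for illustration; only Agent is shown.
visit_mapping = {
    "properties": {
        "Agent": {"type": "string", "index": "not_analyzed"},
    }
}


def install_mapping_then_index(es, index, doc_type, mapping, docs):
    """Create the index, install the mapping, and only then index documents.

    `es` is assumed to be an elasticsearch-py client from the ES 1.x/2.x
    era; newer client versions dropped the doc_type argument.
    """
    # Create the index; ignore=400 skips the error if it already exists,
    # as in the question's own create call.
    es.indices.create(index=index, ignore=400)

    # Install the explicit mapping BEFORE any document is indexed, so ES
    # does not derive a dynamic mapping from the first document.
    es.indices.put_mapping(index=index, doc_type=doc_type,
                           body={doc_type: mapping})

    # Only now send the documents.
    for doc in docs:
        es.index(index=index, doc_type=doc_type, body=doc)
```

A call might look like `install_mapping_then_index(es, 'visits', 'visit', visit_mapping, visits)`, where `visits` is whatever iterable of visit documents is being loaded. Because the mapping of an existing field generally cannot be switched from analyzed to not_analyzed in place, data already indexed under the dynamic mapping would likely need to go into a fresh index.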



Source: https://stackoverflow.com/questions/32469772/elasticsearch-kibana-to-display-business-data
