Getting Started with Elasticsearch

Posted by 我怕爱的太早我们不能终老 on 2020-03-11 12:58:42

Installing and starting Elasticsearch

macOS

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-darwin-x86_64.tar.gz
tar -xzvf elasticsearch-7.6.1-darwin-x86_64.tar.gz
cd elasticsearch-7.6.1
./bin/elasticsearch

Verifying that Elasticsearch is running

Send the following request to check:

curl http://127.0.0.1:9200

A response like the following means the node is up:

{
  "name" : "xxx",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "rEAVwisyREqY5TYkmUhpgA",
  "version" : {
    "number" : "7.6.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "aa751e09be0a5072e8570670309b1f12348f023b",
    "build_date" : "2020-02-29T00:15:25.529771Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
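
The same check can be scripted. The sketch below is not from the original setup; it only parses a root-endpoint response and pulls out the version number. The function name `es_version` and the shortened sample string are made up for illustration.

```python
import json

def es_version(body: str) -> str:
    """Return the version number from a root-endpoint response body."""
    info = json.loads(body)
    return info["version"]["number"]

# Abbreviated copy of the response shown above.
sample = (
    '{"name": "xxx", "cluster_name": "elasticsearch", '
    '"version": {"number": "7.6.1", "lucene_version": "8.4.0"}, '
    '"tagline": "You Know, for Search"}'
)
print(es_version(sample))
```

In a real script the body would come from an HTTP GET to http://127.0.0.1:9200 instead of a hard-coded string.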

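Installing a Chinese analysis plugin

March 10, 2020
The Elasticsearch tutorials recommend the SmartCN plugin for Chinese text analysis, although the IK analysis plugin is also worth trying. This post uses SmartCN.

Installing SmartCN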

sudo bin/elasticsearch-plugin install analysis-smartcn

After the installation finishes, restart the current node.

Testing SmartCN tokenization

curl --location --request POST 'http://127.0.0.1:9200/_analyze' \
--header 'Content-Type: application/json' \
--data-raw '{"analyzer":"smartcn","text":"你好,我是SmartCN"}'

Result:

{
    "tokens": [
        {
            "token": "你好",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "我",
            "start_offset": 3,
            "end_offset": 4,
            "type": "word",
            "position": 2
        },
        {
            "token": "是",
            "start_offset": 4,
            "end_offset": 5,
            "type": "word",
            "position": 3
        },
        {
            "token": "smartcn",
            "start_offset": 5,
            "end_offset": 12,
            "type": "word",
            "position": 4
        }
    ]
}

The tokenization is reasonably good.
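
To compare analyzers quickly, it helps to reduce the _analyze response to just the token strings. The following is a minimal sketch under that assumption; `analyze_body` and `token_texts` are hypothetical helper names, and the sample is a trimmed copy of the response above with the offsets left out.

```python
import json

def analyze_body(text, analyzer="smartcn"):
    """Serialize the JSON body for a POST to /_analyze."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)

def token_texts(response):
    """Pull just the token strings out of an _analyze response."""
    return [t["token"] for t in response["tokens"]]

# Trimmed version of the response above (offsets and types omitted).
sample = {"tokens": [
    {"token": "你好", "position": 0},
    {"token": "我", "position": 2},
    {"token": "是", "position": 3},
    {"token": "smartcn", "position": 4},
]}

print(analyze_body("你好,我是SmartCN"))
print(token_texts(sample))
```

Note that the positions jump from 0 to 2. Presumably position 1 belonged to the comma, which does not appear among the returned tokens.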

Index

List all indices:

curl -X GET 'http://localhost:9200/_cat/indices?v'

Create an index:

curl -X PUT 'localhost:9200/weather'

Response:

{
  "acknowledged":true,
  "shards_acknowledged":true
}

"acknowledged": true means the index was created successfully.

Delete an index:

curl -X DELETE 'localhost:9200/weather'

Retrieve the mappings of all indices:

curl 'localhost:9200/_mapping?pretty=true'

Creating an index with mappings

curl --location --request PUT 'localhost:9200/accounts' \
--header 'Content-Type: application/json' \
--data-raw '{
  "mappings": {
    "properties": {
      "user": {
        "type": "text",
        "analyzer": "smartcn",
        "search_analyzer": "smartcn"
      },
      "title": {
        "type": "text",
        "analyzer": "smartcn",
        "search_analyzer": "smartcn"
      },
      "desc": {
        "type": "text",
        "analyzer": "smartcn",
        "search_analyzer": "smartcn"
      }
    }
  }
}'

Creating the index this way also defines its mapping: the fields it contains, along with each field's data type and analyzer.
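
Since all three fields share the same settings, the request body can be generated rather than typed out. This is only a sketch; `text_mapping` is a hypothetical helper, and it assumes every field should be analyzed text with the same analyzer for indexing and searching, as in the request above.

```python
import json

def text_mapping(fields, analyzer="smartcn"):
    """Build an index-creation body that maps each field as analyzed text."""
    props = {
        name: {
            "type": "text",
            "analyzer": analyzer,
            "search_analyzer": analyzer,
        }
        for name in fields
    }
    return {"mappings": {"properties": props}}

body = text_mapping(["user", "title", "desc"])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after --data-raw in the curl command for creating the accounts index.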

Inserting a JSON document

curl --location --request POST 'localhost:9200/accounts/_doc/' \
--header 'Content-Type: application/json' \
--data-raw '{
  "user": "李四",
  "title": "工程师",
  "desc": "系统管理"
}'

Result:

{
    "_index": "accounts",
    "_type": "_doc",
    "_id": "ZW86w3AB8yT7_DRZEOIa",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 3,
    "_primary_term": 1
}

Updating a JSON document

curl --location --request PUT 'localhost:9200/accounts/_doc/1' \
--header 'Content-Type: application/json' \
--data-raw '{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}'

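In other words, send the insert request as a PUT with the document ID in the URL. If no document with that ID exists yet, it is created; otherwise the stored document is replaced.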
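
The difference between the two requests comes down to the HTTP method and whether the URL carries an ID. Here is a rough sketch using Python's standard urllib; `doc_request` and `BASE` are names invented for this example, and the requests are only built, not sent, so the code runs without a node.

```python
import json
import urllib.request

BASE = "http://localhost:9200"  # assumed local node, as in the curl examples

def doc_request(index, doc, doc_id=None):
    """Build (but do not send) a request that indexes a document.

    Without doc_id: POST /<index>/_doc/ and Elasticsearch generates the ID.
    With doc_id:    PUT  /<index>/_doc/<id>, creating or replacing that document.
    """
    if doc_id is None:
        url, method = f"{BASE}/{index}/_doc/", "POST"
    else:
        url, method = f"{BASE}/{index}/_doc/{doc_id}", "PUT"
    data = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )

insert = doc_request("accounts", {"user": "李四", "title": "工程师", "desc": "系统管理"})
update = doc_request("accounts", {"user": "张三", "title": "工程师", "desc": "数据库管理"}, doc_id=1)
print(insert.get_method(), insert.full_url)
print(update.get_method(), update.full_url)
```

Passing either object to urllib.request.urlopen would actually send it to the node.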

Retrieving all JSON documents

curl --location --request GET 'localhost:9200/accounts/_search?pretty=true'

Result:

{
  "took": 59,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "accounts",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "user": "张三",
          "title": "工程师",
          "desc": "数据库管理"
        }
      },
      {
        "_index": "accounts",
        "_type": "_doc",
        "_id": "ZW86w3AB8yT7_DRZEOIa",
        "_score": 1.0,
        "_source": {
          "user": "李四",
          "title": "工程师",
          "desc": "系统管理"
        }
      }
    ]
  }
}

Retrieving a JSON document by ID

curl --location --request GET 'localhost:9200/accounts/_doc/1'

Take the document URL with the ID and send it as a GET request, without a body.

Deleting a JSON document by ID

curl --location --request DELETE 'localhost:9200/accounts/_doc/1'

Likewise, send the same document URL as a DELETE request.

Full-text search

Now for the main part: Elasticsearch's Query DSL defines many ways to run a full-text query.

curl --location --request GET 'localhost:9200/accounts/_search?pretty=true' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query" : {
        "match" : { "desc" : "数据库" }
    }
}'

This runs a full-text search against a single field. The result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.6931471,
    "hits": [
      {
        "_index": "accounts",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931471,
        "_source": {
          "user": "张三",
          "title": "工程师",
          "desc": "数据库管理"
        }
      }
    ]
  }
}
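
In application code, the two pieces that usually matter are the query body going in and the hits coming out. The sketch below shows both; `match_query` and `hit_sources` are hypothetical helpers, and the sample is a shortened copy of the response above.

```python
def match_query(field, text):
    """Build the body of a single-field match query."""
    return {"query": {"match": {field: text}}}

def hit_sources(response):
    """Return (id, score, source) for every hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"])
        for hit in response["hits"]["hits"]
    ]

# Shortened copy of the search response shown above.
sample = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 0.6931471,
        "hits": [
            {
                "_index": "accounts",
                "_id": "1",
                "_score": 0.6931471,
                "_source": {"user": "张三", "title": "工程师", "desc": "数据库管理"},
            }
        ],
    }
}

print(match_query("desc", "数据库"))
for doc_id, score, source in hit_sources(sample):
    print(doc_id, score, source["desc"])
```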

AND / OR queries

Only the request body needs to change:

OR

{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "desc": "管理"
                    }
                },
                {
                    "match": {
                        "desc": "数据库"
                    }
                }
            ]
        }
    }
}

AND

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "desc": "管理"
                    }
                },
                {
                    "match": {
                        "desc": "数据库"
                    }
                }
            ]
        }
    }
}

These requests use Elasticsearch's bool query syntax: clauses under must are combined with AND, and clauses under should are combined with OR.
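
Because the two bodies differ only in the clause name, one small function can produce either. This is a minimal sketch; `bool_match` and its `operator` parameter are made up for illustration and are not part of any Elasticsearch client.

```python
import json

def bool_match(field, terms, operator="and"):
    """Build a bool query with one match clause per term.

    operator="and" puts the clauses under "must" (all must match);
    operator="or" puts them under "should" (any may match).
    """
    clause = {"and": "must", "or": "should"}[operator]
    return {
        "query": {
            "bool": {
                clause: [{"match": {field: term}} for term in terms]
            }
        }
    }

or_body = bool_match("desc", ["管理", "数据库"], operator="or")
and_body = bool_match("desc", ["管理", "数据库"], operator="and")
print(json.dumps(or_body, ensure_ascii=False, indent=4))
print(json.dumps(and_body, ensure_ascii=False, indent=4))
```

One caveat: should only behaves as a plain OR when the bool query has no must or filter clause. Once either is present, the should clauses become optional and only affect scoring unless minimum_should_match is set.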

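Summary

The Elasticsearch documentation is very detailed. Once you understand how an inverted index works and are familiar with indices, documents, analysis plugins, the search API, the query syntax, and the other query types, you should be able to use Elasticsearch with ease.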
