whoosh

Learning whoosh (2)

一笑奈何 submitted on 2019-12-09 14:29:53
hello world

    #!/usr/bin/env
    #coding:utf-8
    from whoosh.fields import *
    from whoosh.index import create_in
    from whoosh.index import open_dir
    from whoosh.qparser import QueryParser

    def createIndexs(dirName):
        schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
        ix = create_in(dirName, schema)
        writer = ix.writer()
        writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!")
        writer.add_document(title=u"Second document", path=u"/b", content=u"The second one is even more interesting!")
        writer.add_document(title=u"edc document",

django + haystack + whoosh + word-segmentation library = search site

时间秒杀一切 submitted on 2019-12-09 14:27:43
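Preface

Few people have written about using haystack with whoosh, and there is no good working example either. So I have open-sourced this complete example (with an example site), in the hope that anyone interested can exchange ideas.

Problems solved:

Per-field priority, for example Title ranking higher than Content. (haystack's whoosh backend does not support this by default.)
Related-content search (more_like_this).
Works together with a MySQL database.
Chinese word segmentation (using either my own yaha segmenter or jieba).
A better ChineseAnalyzer, with a test page.
Uses whoosh 2.5.1. By default haystack uses whoosh 2.4 and raises errors with 2.5.
Fixes the spelling-correction feature for haystack + whoosh 2.5.1. By default haystack uses the old Spelling API, which whoosh 2.5.1 does not support.

Also note: if you use the default ChineseAnalyzer from jieba, it works better with the code changed as follows:

    def ChineseAnalyzer(stoplist=STOP_WORDS, minsize=1, stemfn=stem, cachesize=50000):
        return ChineseTokenizer() | LowercaseFilter() | StopFilter(stoplist=stoplist, minsize=minsize) \
            | StemFilter(stemfn=stemfn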

Learning whoosh (1)

天大地大妈咪最大 submitted on 2019-12-09 14:25:11
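Background: The current project needs full-text search. It is awkward to implement with redis, and MySQL is too slow for it.

Search engine options: pylucene, or whoosh (which seems more popular and has the most complete documentation).

Why whoosh: It is implemented in pure Python, which avoids the tedium of compiling binary packages. Python code is easier to read than Java and more convenient to use. (Translator's note: this claim tends to start flame wars.) In many cases ease of use matters more than raw speed.

whoosh workflow: create the schema, build the index, query the index.

Source: oschina Link: https://my.oschina.net/u/2351685/blog/603063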

Language Model through whoosh in Information Retrieval

浪尽此生 submitted on 2019-12-08 08:06:00
Question: I am working in IR. Can anyone guide me on how to implement a language model in whoosh? I have already applied TF-IDF and BM25. I am new to IR. For example, the simplest form of language model simply throws away all conditioning context and estimates each term independently. Such a model is called a unigram language model: P_{uni}(t_1 t_2 t_3 t_4) = P(t_1)P(t_2)P(t_3)P(t_4) There are many more complex kinds of language models, such as bigram language models, which condition on the previous
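To make the unigram formula in the question concrete, here is a small sketch in plain Python. It does not use whoosh and is not a whoosh scoring plugin; it only estimates each term's probability by relative frequency (no smoothing, which is an assumption of this sketch) and multiplies the per-term probabilities. The function names are hypothetical.

```python
from collections import Counter


def unigram_probabilities(tokens):
    """Estimate P(t) for each term as count(t) / total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def sequence_probability(sequence, probabilities):
    """P_uni(t_1 ... t_n) = P(t_1) * ... * P(t_n); unseen terms get 0."""
    result = 1.0
    for term in sequence:
        result *= probabilities.get(term, 0.0)
    return result


if __name__ == "__main__":
    probs = unigram_probabilities("a a b c".split())
    print(sequence_probability(["a", "b"], probs))
```

Because an unseen term drives the whole product to zero, retrieval systems that rank by query likelihood usually add some form of smoothing on top of this estimate.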

Django haystack+whoosh error

匆匆过客 submitted on 2019-12-08 06:54:34
Question: I'm trying to add search to my Django app using haystack and whoosh, but I ran into some trouble. First, when I run rebuild_index or update_index, I get the error shown below. Second, when I type a query and search, I get 0 results. I assume that fixing the rebuild_index error will also solve the search problem. Can anyone help me with these errors? /usr/local/lib/python2.7/dist-packages/django/db/models/fields/__init__.py:903: RuntimeWarning:

Django-Haystack returns no results in search form

自闭症网瘾萝莉.ら submitted on 2019-12-08 06:19:53
Question: I am using Django-Haystack with the Whoosh backend. When I run a query I get no results. I tried the debugging steps suggested in the Haystack docs by typing the following into a Django shell, and I can see that all the text I want has been indexed.

    from haystack.query import SearchQuerySet
    sqs = SearchQuerySet().all()
    sqs.count()
    sqs[0].text

My search.html page has the following section (copied straight from the documentation):

    {% for result in page.object_list %}
    <p>
    <a href="{{ result.object

Django 1.9/Haystack 2.4.1 “Model could not be found for SearchResult”

自古美人都是妖i submitted on 2019-12-07 17:58:39
Let me first say that I have tried the fixes here: Haystack says “Model could not be found for SearchResult” and I'm still getting Model could not be found for SearchResult '<SearchResult: dictionary.termentry (pk=u'10')>'. I'm on Django 1.9 and Haystack 2.4.1 with Whoosh. I've determined that the SearchQuerySet is filtering just fine (when I print queryset I get a list of SearchResult objects). I didn't touch anything beyond the SearchIndex definitions, so this is out-of-the-box stuff. Just for reference, here are the relevant bits of code: in search_indexes.py: class TermIndex(indexes

Learning whoosh (3)

情到浓时终转凉″ submitted on 2019-12-07 17:41:08
Before using whoosh, you need an index object. The first time you create an index, you must define its schema, which lists all the fields of the index. A field holds one piece of information about each indexed document, such as its title or content. Fields can be searched or sorted on.

For example, a schema with two fields:

    from whoosh.fields import Schema, TEXT
    schema = Schema(title=TEXT, content=TEXT)

Field types:

    whoosh.fields.ID
    whoosh.fields.STORED
    whoosh.fields.KEYWORD
    whoosh.fields.TEXT
    whoosh.fields.NUMERIC
    whoosh.fields.BOOLEAN
    whoosh.fields.DATETIME
    whoosh.fields.NGRAM and whoosh.fields.NGRAMWORDS

DEMO: create an index object

    from whoosh.fields import Schema, STORED, ID, KEYWORD, TEXT
    import os.path
    from whoosh.index import create_in
    schema = Schema(title=TEXT(stored=True), content=TEXT, path=ID(stored=True

Haystack whoosh models() not narrowing models

独自空忆成欢 submitted on 2019-12-07 11:35:01
Question: I have the following query: locations = SearchQuerySet().filter_or(content__in=words).models(Location) but it returns other models as well; I only want to see Location instances. I am using Haystack 2.1.0 and whoosh 2.5. Any ideas?

Answer 1: My current workaround is to use filter(django_ct='app_name.model')

Answer 2: I ran into the same issue with model filtering being ignored. I was able to get .models() working by downgrading to Haystack 2.0.0 and Whoosh 2.4.1.

Answer 3: This is based partly on James

Whoosh Principles and Practice 1--Introduction to Whoosh, a Python Search Framework

风格不统一 submitted on 2019-12-07 04:20:29
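Whoosh is a search framework written in pure Python, similar to Lucene. It is fairly simple and lets you build site search quickly. You can also build a search engine on top of it, but you have to add your own crawler (spider) and Chinese word-segmentation component. For details on Whoosh, see http://www.oschina.net/p/whoosh

I recently planned a personal website and intend to develop it in Python, mainly because Python allows rapid development. (Admittedly I am not very proficient in Python, so this is also a Python learning project, and development may not actually be fast.) The site will initially include news, blog, and community sections, all three of which need site search. There is also an information section that needs a topic-focused search engine. All of these will use Whoosh as the foundation, with my own spider and Chinese word segmentation built on top.

The series will cover, step by step:
1. Whoosh principles and practice
2. Design and development of Python Chinese word segmentation
3. Design and development of the spider

Since I am writing as I learn, this is really just my Python study notes. Some of the content may be inaccurate or poorly reasoned, and corrections are welcome.

Source: oschina Link: https://my.oschina.net/u/220491/blog/88685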