whoosh | 易学教程

Learning whoosh (2)

hello world

#!/usr/bin/env python
#coding:utf-8
from whoosh.fields import *
from whoosh.index import create_in
from whoosh.index import open_dir
from whoosh.qparser import QueryParser

def createIndexs(dirName):
    schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
    ix = create_in(dirName, schema)
    writer = ix.writer()
    writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!")
    writer.add_document(title=u"Second document", path=u"/b", content=u"The second one is even more interesting!")
    writer.add_document(title=u"edc document",

django + haystack + whoosh + word segmentation library = search site

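Preface

Few people write about haystack together with whoosh, and good examples are even rarer. So I have open-sourced this complete example, and I hope interested readers will join the discussion. Example site.

Problems solved:
- Per-field priority, for example Title ranking higher than Content. (haystack's whoosh backend does not support this by default.)
- Relevance search (more_like_this).
- Works together with a mysql database.
- Chinese word segmentation (using either my own yaha segmenter or the jieba segmenter).
- A better ChineseAnalyzer, with a test address for it.
- Uses whoosh 2.5.1. By default haystack uses whoosh 2.4 and raises errors with 2.5.
- Fixed search-term correction for haystack + whoosh 2.5.1. By default haystack uses the old Spelling API, which whoosh 2.5.1 does not support.

Also note: if you use the default ChineseAnalyzer that ships with jieba, it works better with the following change:

def ChineseAnalyzer(stoplist=STOP_WORDS,minsize=1,stemfn=stem,cachesize=50000):
    return ChineseTokenizer()|LowercaseFilter()|StopFilter(stoplist=stoplist,minsize=minsize)\
        |StemFilter(stemfn=stemfn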

Learning whoosh (1)

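Background

The current project needs full-text search. It is awkward to implement with redis, and mysql is too inefficient.

Search engine options:
- pylucene
- whoosh (seems to be more popular and has the most complete documentation)

Why choose whoosh:
- It is a pure Python implementation, which avoids the tedious process of compiling binary packages.
- Python code is easier to read than Java and more convenient to use. (Translator's note: this claim tends to start arguments.)
- In many cases ease of use matters more than raw speed.

whoosh workflow:
- Create a schema
- Build the index
- Query the index

Source: oschina Link: https://my.oschina.net/u/2351685/blog/603063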
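The three workflow steps (schema, index, query) can be illustrated without whoosh itself. The sketch below is a toy inverted index in plain Python, offered only to show the idea behind each step. It is not whoosh's API or implementation, and the names `build_index` and `search` as well as the sample records are made up for this illustration.

```python
from collections import defaultdict


def build_index(documents, field):
    """Map each lowercased word of `field` to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(documents):
        for word in doc[field].lower().split():
            index[word].add(doc_id)
    return index


def search(index, documents, query):
    """Return the documents that contain every word of the query (AND semantics)."""
    words = query.lower().split()
    if not words:
        return []
    matching = set.intersection(*(index.get(w, set()) for w in words))
    return [documents[doc_id] for doc_id in sorted(matching)]


# Step 1: the "schema" here is just the agreed set of keys in each record.
documents = [
    {"title": "First document", "content": "This is the first document"},
    {"title": "Second document", "content": "The second one is more interesting"},
]
# Step 2: build the index over the content field.
index = build_index(documents, "content")
# Step 3: query the index and show the stored titles of the hits.
for hit in search(index, documents, "first document"):
    print(hit["title"])
```

A real engine such as whoosh adds typed fields, text analysis, on-disk storage, and relevance scoring on top of this basic word-to-documents mapping.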

Language Model through whoosh in Information Retrieval

Question

I am working in IR. Can anyone guide me on how to implement a language model in whoosh? I have already applied TF-IDF and BM25. I am new to IR. For example, the simplest form of language model simply throws away all conditioning context, and estimates each term independently. Such a model is called a unigram language model: P_{uni}(t_1t_2t_3t_4) = P(t_1)P(t_2)P(t_3)P(t_4) There are many more complex kinds of language models, such as bigram language models, which condition on the previous
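To make the unigram formula concrete, here is a minimal sketch in plain Python. It does not use whoosh's scoring API. It only assumes whitespace tokenization and maximum-likelihood estimates P(t) = count(t) / total, and the function names and sample text are hypothetical.

```python
from collections import Counter
from math import prod


def unigram_model(tokens):
    """Estimate P(t) for each term by maximum likelihood: count(t) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def sequence_probability(model, tokens):
    """P_uni(t_1 ... t_n) = P(t_1) * ... * P(t_n); unseen terms contribute 0."""
    return prod(model.get(term, 0.0) for term in tokens)


# Hypothetical toy document, tokenized by whitespace.
document = "whoosh is a search library and whoosh is pure python".split()
model = unigram_model(document)
print(sequence_probability(model, ["whoosh", "search"]))
```

Because a single unseen term drives the whole product to zero, retrieval systems usually smooth these estimates against collection-wide statistics. That step is left out here to keep the sketch aligned with the formula quoted in the question.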

Django haystack+whoosh error

Question

I'm trying to add search to my Django app using haystack and whoosh, but I ran into some trouble. First, when I run rebuild_index or update_index, it gives me the error shown below. Second, when I type a query and search, I get 0 results. My guess is that fixing the rebuild_index error will also solve the search problem. Can anyone please help me with these errors: /usr/local/lib/python2.7/dist-packages/django/db/models/fields/__init__.py:903: RuntimeWarning:

Django-Haystack returns no results in search form

Question

I am using Django-Haystack with the Whoosh backend. When I do a query I get no results. I tried the debugging steps suggested in the Haystack docs by typing the following into a Django shell, and I can see that all the text I want has been indexed.

from haystack.query import SearchQuerySet
sqs = SearchQuerySet().all()
sqs.count()
sqs[0].text

My search.html page has the following section (copied straight from the documentation):

{% for result in page.object_list %}
<p> <a href="{{ result.object

Django 1.9/Haystack 2.4.1 “Model could not be found for SearchResult”

Let me just first say, I have tried the fixes here: Haystack says “Model could not be found for SearchResult” and I'm still getting Model could not be found for SearchResult '<SearchResult: dictionary.termentry (pk=u'10')>'. I'm on Django 1.9 & Haystack 2.4.1 with Whoosh. I've determined that the SearchQuerySet is filtering just fine (when I print queryset I get a list of SearchResult objects). I didn't touch anything beyond the SearchIndex definitions, so this is out-of-the-box stuff. Just for reference, here's the relevant bits of code: in search_indexes.py: class TermIndex(indexes

Learning whoosh (3)

Before using whoosh, you need an index object. The first time you create an index, you must define its schema, which lists all of the index's fields. A field holds one piece of information about each indexed document, such as its title or content. Fields can be used for searching or sorting. For example, a schema with two fields:

from whoosh.fields import Schema, TEXT
schema = Schema(title=TEXT, content=TEXT)

Field types:
- whoosh.fields.ID
- whoosh.fields.STORED
- whoosh.fields.KEYWORD
- whoosh.fields.TEXT
- whoosh.fields.NUMERIC
- whoosh.fields.BOOLEAN
- whoosh.fields.DATETIME
- whoosh.fields.NGRAM and whoosh.fields.NGRAMWORDS

DEMO: creating an index object

from whoosh.fields import Schema, STORED, ID, KEYWORD, TEXT
import os.path
from whoosh.index import create_in
schema = Schema(title=TEXT(stored=True), content=TEXT, path=ID(stored=True

Haystack whoosh models() not narrowing models

Question

I have the following query

locations = SearchQuerySet().filter_or(content__in=words).models(Location)

but it's returning other models as well. I only want to see Location instances. Using Haystack 2.1.0 and whoosh 2.5. Any ideas?

Answer 1: My current workaround is to use filter(django_ct='app_name.model')

Answer 2: I ran into the same issue with model filtering being ignored. I was able to get .models() working by downgrading to Haystack 2.0.0 and Whoosh 2.4.1

Answer 3: This is based partly on James

Whoosh Principles and Practice 1: Introduction to Whoosh, a Python Search Framework

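Whoosh is a search framework written in pure Python, similar to Lucene. It is fairly simple and lets you build site search quickly. You can also build a search engine on top of it, but you have to add your own crawler (Spider) and Chinese word segmentation components. For details on Whoosh, see http://www.oschina.net/p/whoosh

I have recently been planning a personal website and intend to develop it in Python, mainly because Python makes it quick to build things. (Admittedly I don't know Python well, so this is also a Python learning project, and development won't necessarily be fast.) The site will initially have news, blog, and community sections, all three of which need site search. It will also have an information section that needs a topic-focused search engine. All of these use Whoosh as the foundation, with my own Spider and Chinese word segmentation built on top.

The series will cover, step by step:
1. Whoosh principles and practice
2. Design and development of Chinese word segmentation in Python
3. Spider design and development

Since I am writing while I learn, these can only count as my Python study notes. Some of the content may be inaccurate or poorly reasoned, and corrections are welcome.

Source: oschina Link: https://my.oschina.net/u/220491/blog/88685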