Fuzzy

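"Programmer's Life" series: the P0 incident that nearly got Ao Bing fired

帅比萌擦擦* submitted on 2019-12-06 06:30:40
The more you know, the more you realize you don't know. Like before you read, and make it a habit. GitHub https://github.com/JavaFamily already collects mind maps of interview topics from top-tier companies, my contact details, and a technical discussion group; stars and feedback are welcome. Preface: This is a true story. As you all know, many companies classify failures by severity level, and this is the P0-level failure that Ao Bing took the blame for at his company. Ao Bing was nearly fired over it; the whole thing was nerve-racking, and my heart condition almost flared up again. Incident levels mainly apply to the production environment, and the criteria are similar to bug severity levels. P0 is the highest level: for example crashes, pages that cannot be accessed, a broken main flow, missing main functionality, or anything with a very wide impact (even if the bug itself is not serious). P1 is a high-level incident, generally affecting branches of main features, secondary flows, core secondary features, and so on. After that come P2, P3, etc., mostly defined according to each company's actual situation. Main text: Ao Bing was previously also responsible for the company's product search business. Because business volume grew so fast, the product table quickly reached tens of millions of rows, the query RT (response time) kept climbing, and the product manager said we needed to query products by more dimensions. Until then we had only queried by product name, but e-commerce sites actually search products across many dimensions. For example, when you search on Taobao you will find that product name, color, tags, and many other dimensions can all lead you to the same item. In the search shown in the figure below, I only searched for 【帅丙】, and you will find that items whose names do not contain the two characters 帅丙 consecutively, but do contain 帅 and 丙, showed up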

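Learning the Elasticsearch DSL syntax

为君一笑 submitted on 2019-12-05 01:49:54
Learning the DSL syntax: (1) term and terms queries; (2) match queries: match_all matches all documents, multi_match can target multiple fields, match_phrase is a phrase-matching query; (3) range queries; (4) wildcard queries: allow the wildcards * and ? in the query, where * stands for zero or more characters and ? stands for any single character; (5) fuzzy queries: value is the search keyword, boost is the weight of the query (default 1.0); (6) highlight: highlights results, configured via fields; (7) bool queries: must lists conditions that have to match (AND), should lists conditions that may or may not match (OR), must_not lists conditions that must not match (NOT); (8) aggregation queries: sum is the total, avg is the average, count is the number of values, cardinality is the count of distinct values. Querying with GET: GET /_search { "query": {"term": {"user": "kimchy"}} } Querying documents: # sort by age in descending order POST /pigg/_search { "query": {"match_all": {}}, "sort": [ {"age": {"order": "desc"}} ] } # fetch the first 2 documents; from starts at 0 POST /pigg/_search { "query": {"match_all": {}}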
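To make item (5) concrete, here is a minimal sketch that builds a fuzzy query body as a plain Python dict and prints it as JSON, ready to be sent with POST /pigg/_search. The helper name `build_fuzzy_query`, the field `user`, and the misspelled value `kimchi` are assumptions for illustration. `fuzziness` is a parameter of the Elasticsearch fuzzy query that the excerpt does not list.

```python
import json


def build_fuzzy_query(field, value, boost=1.0, fuzziness="AUTO"):
    """Return a request body for a fuzzy query (hypothetical helper)."""
    return {
        "query": {
            "fuzzy": {
                field: {
                    "value": value,          # the keyword to search for
                    "boost": boost,          # weight of this query, default 1.0
                    "fuzziness": fuzziness,  # allowed edit distance (assumed setting)
                }
            }
        }
    }


# "kimchi" is a deliberate misspelling of the "kimchy" used in the term query.
body = build_fuzzy_query("user", "kimchi")
print(json.dumps(body, indent=2))
```

Building the body as a dict keeps the nesting visible and avoids the brace mistakes that are easy to make when typing the JSON by hand.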

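MySQL checkpoints

橙三吉。 submitted on 2019-12-04 13:46:43
Preface: When data in a database is inserted, deleted, or updated, the change is first made in the buffer pool. To make transactions more efficient, data in the buffer pool is not written to disk immediately, so the data in memory and the data on disk can be inconsistent. If a failure occurs before the buffer pool data has been persisted, the disk and the buffer pool end up inconsistent. To guard against modified in-memory data being lost to a failure before it reaches disk, the redo log is written first. After a crash and restart the redo log can be replayed, which guarantees transaction durability. However, redo log space cannot grow without bound, so data that has been modified in memory but not yet written to disk, i.e. "dirty pages", also has to be written to disk. Handling these in-memory dirty pages is the job of the checkpoint, which writes dirty pages to disk under certain conditions. Checkpoints mainly address the following problems: shortening database recovery time; flushing dirty pages to disk when the buffer pool runs out of space; flushing dirty pages when the redo log is unavailable. During crash recovery only the redo log written after the checkpoint has to be replayed, which shortens recovery time. When the buffer pool runs short, an LRU algorithm selects some dirty pages to flush to disk. Checkpoint types: there are two kinds of checkpoint. Sharp checkpoint: when the database is shut down, all dirty pages in the buffer pool are flushed to disk. Fuzzy checkpoint: while the database is running normally, dirty pages are written to disk at various moments, a portion at a time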
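The difference between the two checkpoint types can be sketched with a toy model. This is a heavily simplified illustration, not how InnoDB is implemented. The class `BufferPool`, its methods, and the use of a plain integer as the log sequence number (LSN) are all assumptions made for the example.

```python
class BufferPool:
    """Toy buffer pool that tracks dirty pages by their oldest unflushed LSN."""

    def __init__(self):
        self.dirty = {}      # page_id -> LSN of the oldest unflushed change
        self.next_lsn = 1    # LSN the next redo record will receive
        self.flushed = []    # page ids written to "disk", in flush order

    def modify(self, page_id):
        """Record a change: assign a redo LSN and mark the page dirty."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.dirty.setdefault(page_id, lsn)  # keep the oldest LSN per page
        return lsn

    def checkpoint_lsn(self):
        """Redo records below this LSN are no longer needed for recovery."""
        return min(self.dirty.values(), default=self.next_lsn)

    def fuzzy_checkpoint(self, batch_size):
        """Flush only the oldest batch_size dirty pages."""
        oldest = sorted(self.dirty, key=self.dirty.get)[:batch_size]
        for page_id in oldest:
            self.flushed.append(page_id)
            del self.dirty[page_id]
        return self.checkpoint_lsn()

    def sharp_checkpoint(self):
        """Flush every dirty page, as happens at shutdown."""
        return self.fuzzy_checkpoint(len(self.dirty))


pool = BufferPool()
for page in ["A", "B", "A", "C"]:
    pool.modify(page)

after_fuzzy = pool.fuzzy_checkpoint(batch_size=2)  # flushes A and B only
after_sharp = pool.sharp_checkpoint()              # flushes the remaining page C
print(after_fuzzy, after_sharp, pool.flushed)
```

In this model the checkpoint LSN advances as old dirty pages are flushed, which is why recovery only has to replay the redo log from the checkpoint onward.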

Matching fuzzy strings

时间秒杀一切 submitted on 2019-12-03 12:49:32
I have two tables that I need to merge together in PostgreSQL, on the common variable "company name." Unfortunately many of the company names don't match exactly (e.g. MICROSOFT in one table, MICROSFT in the other). I've tried removing common words from both columns such as "corporation" or "inc" or "ltd" in order to try to standardize names across both tables, but I'm having trouble thinking of additional strategies. Any ideas? Thanks. Also, if necessary I can do this in R. Have you considered the fuzzystrmatch module? You can use soundex , difference , levenshtein , metaphone and dmetaphone
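To show what the levenshtein metric measures, here is a small pure-Python sketch of edit distance applied to the MICROSOFT / MICROSFT example. It illustrates the metric only and is not the fuzzystrmatch implementation. The helper `best_match`, its `max_distance` threshold of 2, and the extra candidate name ORACLE are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def best_match(name, candidates, max_distance=2):
    """Return the closest candidate, or None if nothing is close enough.

    Assumes candidates is non-empty.
    """
    distance, candidate = min((levenshtein(name, c), c) for c in candidates)
    return candidate if distance <= max_distance else None


print(levenshtein("MICROSOFT", "MICROSFT"))             # → 1
print(best_match("MICROSFT", ["MICROSOFT", "ORACLE"]))  # → MICROSOFT
```

A distance threshold keeps near-misses like a dropped letter while rejecting unrelated names. The right cutoff depends on how long and how noisy the names are.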

Clang for fuzzy parsing C++

萝らか妹 submitted on 2019-12-03 10:13:18
Is it at all possible to parse C++ with incomplete declarations with clang, using its existing libclang API? I.e. parse a .cpp file without including all the headers, deducing declarations on the fly. So, for example, the following text: A B::Foo(){return stuff();} will detect unknown symbol A, call my callback that deduces A is a class using my magic heuristic, then call this callback the same way with B and Foo and stuff. In the end I want to be able to infer that I saw a member Foo of class B returning A, and stuff is a function. Or something to that effect. Context: I wanna see if I can do sensible

Find similar ASCII character in Unicode

一曲冷凌霜 submitted on 2019-12-01 01:32:51
Question: Does someone know an easy way to find characters in Unicode that are similar to ASCII characters? An example is the "CYRILLIC SMALL LETTER DZE (ѕ)". I'd like to do a search and replace for similar characters. By similar I mean similar to a human reader: you can't see a difference by looking at it. Answer 1: As noted by other commenters, Unicode normalisation ("compatibility characters") isn't going to help you here as you aren't looking for official equivalences but for similarities in glyphs (letter shapes).
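The point made in the answer can be checked directly. The sketch below first shows that NFKC normalisation leaves CYRILLIC SMALL LETTER DZE untouched, then replaces look-alikes using an explicit mapping. The `LOOKALIKES` table is a tiny, hand-picked assumption for illustration. A real solution would need a much larger table, for example one derived from the confusables data published by the Unicode Consortium.

```python
import unicodedata

# Hypothetical, hand-picked look-alike table; far from complete.
LOOKALIKES = {
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}
TABLE = str.maketrans(LOOKALIKES)


def to_ascii_lookalikes(text):
    """Replace characters from LOOKALIKES with their ASCII twins."""
    return text.translate(TABLE)


dze = "\u0455"
# Normalisation does not help: the character has no compatibility mapping.
unchanged = unicodedata.normalize("NFKC", dze) == dze
# The middle "ss" here is written with two Cyrillic DZE characters.
cleaned = to_ascii_lookalikes("pa\u0455\u0455word")
print(unicodedata.name(dze), unchanged, cleaned)
```

Because the mapping is based on glyph shape rather than on any official equivalence, it has to be curated for the fonts and scripts you actually expect to see.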

Create a unique ID by fuzzy matching of names (via agrep using R)

左心房为你撑大大i submitted on 2019-11-30 09:23:39
Using R, I am trying to match on people's names in a dataset structured by year and city. Due to some spelling mistakes, exact matching is not possible, so I am trying to use agrep() to fuzzy match names. A sample chunk of the dataset is structured as follows: df <- data.frame(matrix( c("1200013","1200013","1200013","1200013","1200013","1200013","1200013","1200013", "1996","1996","1996","1996","2000","2000","2004","2004","AGUSTINHO FORTUNATO FILHO","ANTONIO PEREIRA NETO","FERNANDO JOSE DA COSTA","PAULO CEZAR FERREIRA DE ARAUJO","PAULO CESAR FERREIRA DE ARAUJO","SEBASTIAO BOCALOM RODRIGUES","JOAO
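The idea of giving one ID to all spellings of the same name can be sketched as follows. Note the substitution: this uses Python's `difflib.SequenceMatcher` similarity ratio instead of R's agrep(), so the matching rule is not the same. The function `assign_ids` and the 0.9 threshold are assumptions for illustration. The names are taken from the sample data above.

```python
from difflib import SequenceMatcher


def assign_ids(names, threshold=0.9):
    """Map each name to an ID shared by all sufficiently similar spellings."""
    representatives = []  # first spelling seen for each ID
    ids = {}
    for name in names:
        for index, rep in enumerate(representatives):
            if SequenceMatcher(None, name, rep).ratio() >= threshold:
                ids[name] = index + 1  # reuse the ID of the similar name
                break
        else:
            representatives.append(name)  # no match: start a new ID
            ids[name] = len(representatives)
    return ids


names = [
    "AGUSTINHO FORTUNATO FILHO",
    "ANTONIO PEREIRA NETO",
    "PAULO CEZAR FERREIRA DE ARAUJO",
    "PAULO CESAR FERREIRA DE ARAUJO",
]
ids = assign_ids(names)
for name, person_id in ids.items():
    print(person_id, name)
```

Each name is compared only against the first spelling seen for every ID, so the result can depend on input order. For a real dataset it would be safer to match within each city before assigning IDs.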

django fuzzy string translation not showing up

蹲街弑〆低调 submitted on 2019-11-30 03:42:48
Why do I sometimes get a fuzzy item in the django.po language file? Actually, I have checked that the fuzzy string item is totally unique in my project. #: .\users\views.py:81 .\users\views.py:101 #, fuzzy msgid "username or email" msgstr "9988" It is OK for it to be fuzzy, but my translation of the fuzzy item is not showing up on the page; only the English version shows up. It is totally odd. Martin v. Löwis: msgmerge marks strings as fuzzy if the old catalog had a translation for a string with a similar-looking msgid . It also carries over strings marked as fuzzy from an old catalog to a new one. msgfmt excludes fuzzy
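Since fuzzy entries are the ones that need review, a first step is to list them. The sketch below scans PO-format text for entries carrying the fuzzy flag. The function `fuzzy_msgids` is a hypothetical helper that only handles single-line msgid values, and the second entry in `SAMPLE` is invented for contrast.

```python
def fuzzy_msgids(po_text):
    """Return the msgid strings of entries flagged fuzzy in PO-format text."""
    found = []
    is_fuzzy = False
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("#,"):
            # A flag comment looks like "#, fuzzy" or "#, fuzzy, python-format".
            flags = [flag.strip() for flag in line[2:].split(",")]
            is_fuzzy = "fuzzy" in flags
        elif line.startswith("msgid "):
            if is_fuzzy:
                found.append(line[len("msgid "):].strip('"'))
            is_fuzzy = False  # flags apply to one entry only
    return found


# Raw string so the Windows-style paths keep their backslashes.
SAMPLE = r'''
#: .\users\views.py:81 .\users\views.py:101
#, fuzzy
msgid "username or email"
msgstr "9988"

#: .\users\views.py:120
msgid "password"
msgstr "1234"
'''

print(fuzzy_msgids(SAMPLE))
```

Once an entry has been reviewed and its translation confirmed, removing the `#, fuzzy` line and recompiling the catalog is the usual way to get the translation picked up.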

clustering and matlab

走远了吗. submitted on 2019-11-29 04:12:13
I'm trying to cluster some data I have from the KDD Cup 1999 dataset. The output from the file looks like this: 0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal. with 48 thousand different records in that format. I have cleaned the data up and removed the text, keeping only the numbers. The output looks like this now: I created a comma-delimited file in Excel and saved it as a CSV file, then created a data source from the CSV file in MATLAB. I've tried running it through the fcm toolbox in MATLAB

django fuzzy string translation not showing up

非 Y 不嫁゛ submitted on 2019-11-28 22:28:36
Question: Why do I sometimes get a fuzzy item in the django.po language file? Actually, I have checked that the fuzzy string item is totally unique in my project. #: .\users\views.py:81 .\users\views.py:101 #, fuzzy msgid "username or email" msgstr "9988" It is OK for it to be fuzzy, but my translation of the fuzzy item is not showing up on the page; only the English version shows up. It is totally odd. Answer 1: msgmerge marks strings as fuzzy if the old catalog had a translation for a string with a similar-looking msgid . It also