word

Using the text similarity algorithms provided by word分词 to help memorize English words

99封情书 submitted on 2019-11-27 14:25:16
Implementation code for this article: Using the text similarity algorithms provided by word分词 to help memorize English words

The English words used in this article cover nearly all syllabus vocabulary, 18,123 words in total:

    /**
     * Syllabus vocabulary
     * @return
     */
    public static Set<Word> getSyllabusVocabulary(){
        return get("/word_primary_school.txt",
                   "/word_junior_school.txt",
                   "/word_senior_school.txt",
                   "/word_university.txt",
                   "/word_new_conception.txt",
                   "/word_ADULT.txt",
                   "/word_CET4.txt",
                   "/word_CET6.txt",
                   "/word_TEM4.txt",
                   "/word_TEM8.txt",
                   "/word_CATTI.txt",
                   "/word_GMAT.txt",
                   "/word_GRE.txt",
                   "/word_SAT.txt",
                   "/word_BEC.txt",
                   "/word_MBA.txt",
                   "/word_IELTS.txt",
                   "/word_TOEFL.txt",
                   "/word_TOEIC.txt",
                   "/word_考 研.txt");
    }

After the program starts, the console prints: -------------------------------

Using word分词 to find related words by computing word contexts

北战南征 submitted on 2019-11-27 14:25:02
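How can we find related words by computing a word's context?

Context is defined as follows: within a piece of text, the context of any word consists of the N words before it and the N words after it.

Related words are defined as follows: the more similar the contexts of two words are, the more similar, and therefore the more related, the two words are.

The algorithm has two steps:

1. Compute the context of every word from a large corpus, and represent each context as a word vector. (Implementation code)
2. Turn the problem of computing the similarity of two words into the problem of computing the similarity of their contexts. The context similarity gives the word similarity, and the more similar two words are, the more related they are. (Implementation code)

For similarity computation, word分词 also provides many other algorithms; see here.

Usage:

1. To use the corpus built into word分词, run the script demo-word-vector-corpus.bat or demo-word-vector-corpus.sh in the word分词 project root directory.
2. To use your own text, run the script demo-word-vector-file.bat or demo-word-vector-file.sh in the word分词 project root directory.

Because the corpus is large, startup takes a long time, so please be patient. Here is an example: suppose we want to find the words related to 兰州 (Lanzhou). We run the script demo-word-vector-corpus.sh, and once it has started the command line prompts:

Starting model initialization
Model initialization complete
You can specify the similarity algorithm by entering the command sa=cos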
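The two steps described above (build a context vector for each word, then compare contexts) can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the word分词 implementation: the class name `ContextVectorSketch`, its methods, the whitespace-style token list, and the choice of plain co-occurrence counts with cosine similarity are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of the word分词 API.
final class ContextVectorSketch {

    // Step 1: for every word, count the words seen within `window`
    // positions before or after each of its occurrences.
    static Map<String, Map<String, Integer>> contexts(List<String> tokens, int window) {
        Map<String, Map<String, Integer>> result = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            Map<String, Integer> ctx =
                    result.computeIfAbsent(tokens.get(i), k -> new HashMap<>());
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    ctx.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return result;
    }

    // Step 2: cosine similarity between two sparse context vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Toy corpus: "lanzhou" and "xining" appear in identical surroundings.
        List<String> tokens = Arrays.asList(
                "i", "visit", "lanzhou", "today", "i", "visit", "xining", "today");
        Map<String, Map<String, Integer>> ctx = contexts(tokens, 1);
        System.out.println(cosine(ctx.get("lanzhou"), ctx.get("xining"))); // close to 1
        System.out.println(cosine(ctx.get("lanzhou"), ctx.get("visit")));  // no shared context words
    }
}
```

With a window of 1, the two city names share exactly the same neighbours in this toy corpus, so their similarity is close to 1, while a word with no neighbours in common scores 0. A real corpus would need proper word segmentation and a much larger window of data.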

Dynamically filling a Word document in Java and uploading it to the server

大憨熊 submitted on 2019-11-27 12:16:00
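I. Background

In some special scenarios, the customer wants a document to be generated on the server and filled with data at the same time, without the client page opening and displaying the document. Generating documents on the server, however, puts a heavy load on it. There are currently two main ways to generate documents on the server. The first is jacob, but it is limited to the Windows platform, and many Java programs run on other operating systems, so it is not discussed here. The second is POI. Its Excel handling is passable, but its Word module is still limited to reading the text content of Word files, and its support for writing Word files is weaker still. Another serious problem is that the classes for handling the doc format and the docx format are almost completely different, so separate code must be written for each format. This means that if a user uploads a docx file with a doc extension, the program crashes immediately. In my opinion POI is also poorly structured and fairly complex to code against, so development consumes a great deal of time and effort.

PageOffice provides the FileMakerCtrl component. FileMakerCtrl generates the document on the client and uploads it to the server, without displaying the Word document in the web page. Generating Word files with FileMakerCtrl therefore has two advantages:

1. The Word document is generated on the client, so it places no load on the server.
2. The generated document is in the standard Word document format.

II. Core code

1. Create the template: open the Word template file and insert the bookmarks PO_company, PO_year, and PO_number, as shown in the figure below:

2. Dynamically fill the Word document and upload it to the server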

What are the sizes of tword, oword and yword operands?

江枫思渺然 submitted on 2019-11-27 12:07:20
What are the sizes of tword, oword and yword operands, as used in the NASM / YASM manual? And on a related note, is there a trick or underlying idea to these names? Is there a way by which bigger word sizes are given logical names? I know that while word sizes may differ between systems, a NASM word is 2 bytes, dword is double that (4 bytes), qword is a quad word (8 bytes), but... is tword a triple word (6 bytes)? And for oword and yword I can't even think of a plausible meaning. Note that it is probably an easy question, but I couldn't find an answer. In the NASM and YASM manuals these

Java API for plural forms of English words

给你一囗甜甜゛ submitted on 2019-11-27 08:02:21
Are there any Java API(s) which will provide the plural form of English words (e.g. cacti for cactus)?

Meng Lu: Wolfram|Alpha returns a list of inflected forms for a given word. See this as an example: http://www.wolframalpha.com/input/?i=word+cactus+inflected+forms And here is their API: http://products.wolframalpha.com/api/

Sławek: Check Evo Inflector, which implements an English pluralization algorithm based on Damian Conway's paper "An Algorithmic Approach to English Pluralization". The library is tested against data from Wiktionary and reports 100% success rate for 1000 most used English words and

Counting word occurrences in a table column

柔情痞子 submitted on 2019-11-27 07:52:08
Question

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurrences of each word in a group of rows from this table. If there are 2 rows with these fields:

"I like to eat bananas"
"I don't like to eat like a monkey"

I want to get

word | count()
---------------
like   3
eat    2
to     2
i      2
a      1

Any idea? I am using MySQL 5.2.

Answer 1:

@Elad Meidar, I like your question and I found a solution:

SELECT SUM(total_count) as total, value FROM ( SELECT count(*) AS
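The SQL in the answer above is cut off, so it is not reconstructed here. As an alternative illustration only, the same per-word count can be computed in application code once the rows have been fetched. The sketch below is in Java; the class name `WordCountSketch`, the method `countWords`, and the decision to lower-case and split on whitespace are assumptions, not part of the original answer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: count word occurrences across a group of row values.
final class WordCountSketch {

    static Map<String, Integer> countWords(List<String> rows) {
        // LinkedHashMap keeps words in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String row : rows) {
            // Case-insensitive, whitespace-separated; punctuation is left untouched.
            for (String word : row.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "I like to eat bananas",
                "I don't like to eat like a monkey");
        countWords(rows).forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```

For the two sample rows this yields like 3, eat 2, to 2, i 2 and a 1, matching the counts the question asks for, plus one each for the remaining words.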

Word comparison algorithm

天涯浪子 submitted on 2019-11-27 07:19:48
I am building a CSV import tool for the project I'm working on. The client needs to be able to enter the data in Excel, export it as CSV and upload it to the database. For example I have this CSV record: 1, John Doe, ACME Comapny (the typo is on purpose) Of course, the companies are kept in a separate table and linked with a foreign key, so I need to discover the correct company ID before inserting. I plan to do this by comparing the company names in the database with the company names in the CSV. The comparison should return 0 if the strings are exactly the same, and return some value that
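One common measure with the property described above (0 for identical strings, a larger value the more they differ) is the Levenshtein edit distance. The sketch below is one possible way to compare company names, not necessarily what the asker ended up using; the class name `EditDistanceSketch` is hypothetical.

```java
// Hypothetical sketch: Levenshtein edit distance between two strings.
final class EditDistanceSketch {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn `a` into `b`.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("ACME Company", "ACME Company")); // 0
        System.out.println(levenshtein("ACME Comapny", "ACME Company")); // 2
    }
}
```

The deliberate typo from the question scores 2 because plain Levenshtein counts a swapped pair of letters as two substitutions. Picking the database name with the smallest distance, subject to a threshold, is one way to resolve the company ID.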

Javascript: find word in string

隐身守侯 submitted on 2019-11-27 07:06:31
Question

Does JavaScript have a built-in function to see if a word is present in a string? I'm not looking for something like indexOf(), but rather:

find_word('test', 'this is a test.') -> true
find_word('test', 'this is a test') -> true
find_word('test', 'I am testing this out') -> false
find_word('test', 'test this out please') -> true
find_word('test', 'attest to that if you would') -> false

Essentially, I'd like to know if my word appears, but not as part of another word. It wouldn't be too hard
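The behaviour asked for above is what a regular expression with word-boundary anchors (`\b`) gives. The sketch below shows the idea in Java, to stay consistent with the other examples on this page; JavaScript regular expressions support the same `\b` anchor. The class name `WholeWordSketch` and the method `findWord` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: whole-word search using word-boundary anchors.
final class WholeWordSketch {

    // True if `word` occurs in `text` as a whole word, not inside a longer word.
    // Assumes `word` starts and ends with a letter, digit or underscore,
    // since \b only marks transitions between word and non-word characters.
    static boolean findWord(String word, String text) {
        // Pattern.quote keeps any regex metacharacters in `word` literal.
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b")
                .matcher(text)
                .find();
    }

    public static void main(String[] args) {
        System.out.println(findWord("test", "this is a test."));             // true
        System.out.println(findWord("test", "I am testing this out"));       // false
        System.out.println(findWord("test", "attest to that if you would")); // false
    }
}
```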

Android Word-Wrap EditText text

我是研究僧i submitted on 2019-11-27 06:56:53
I have been trying to get my EditText box to word wrap, but can't seem to do it. I have dealt with much more complicated issues while developing Android applications, and this seems like it should be a straightforward process. However, the issue remains, and I have a large text box that is only allowing me to enter text on one line, continuing straight across, scrolling horizontally as I enter text. Here is the XML code for the EditText object from my layout file.

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
    android:id="@+id/myWidget48"
    android:layout_width="fill_parent"
    android:layout

Using word分词 to compute text similarity

懵懂的女人 submitted on 2019-11-27 01:11:16
word分词 provides several ways to compute text similarity:

Method 1: cosine similarity, which evaluates how similar two vectors are by computing the cosine of the angle between them.

Implementation class: org.apdplat.word.analysis.CosineTextSimilarity

Usage:

    String text1 = "我爱购物"; // "I love shopping"
    String text2 = "我爱读书"; // "I love reading"
    String text3 = "他是黑客"; // "He is a hacker"
    TextSimilarity textSimilarity = new CosineTextSimilarity();
    double score1pk1 = textSimilarity.similarScore(text1, text1);
    double score1pk2 = textSimilarity.similarScore(text1, text2);
    double score1pk3 = textSimilarity.similarScore(text1, text3);
    double score2pk2 = textSimilarity.similarScore(text2, text2);
    double score2pk3 = textSimilarity.similarScore(text2, text3);
    double score3pk3 = textSimilarity.similarScore