crf

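Skipping the GIMR installation when installing Oracle 12c RAC

断了今生、忘了曾经 submitted on 2019-12-26 18:40:27
During the 12c GI installation, the installer asks whether to install MGMTDB, the database behind GIMR, and you may well choose NO. In 12.1.0.1 the GIMR installation was optional, but in 12.1.0.2 and 12.2 it became mandatory: even if you choose NO here, the only difference between YES and NO is whether MGMTDB is stored in the OCR ASM disk group or in a separately created ASM disk group. In 12c R1 the GIMR location is determined by the OCR path. MGMTDB is simply a complete database environment, one CDB containing one PDB, and normally needs no manual maintenance. It stores the GIMR data, that is, the operating-system-level load metrics generated by Cluster Health Monitor, and keeps historical information for performance analysis and problem diagnosis; it is fully integrated with EM 12c. For more on what GIMR stores, see the documentation here. Cluster Health Monitor can be stopped manually with the following commands:
    $ crsctl stop res ora.crf -init
    $ crsctl delete res ora.crf -init
To skip the MGMTDB installation, add the command-line option -J-Doracle.install.mgmtDB=false when installing GI. Source: oschina Link: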

Learning NER using category list

狂风中的少年 submitted on 2019-12-25 05:19:17
Question: In the template for training CRF++, how can I include a custom dictionary.txt file for listed companies, another for popular European foods, for example, or for just about any category, and then provide sample training data for each category so that it learns how those specific named entities are used in context for that category? In this way I, as well as the system, can be sure it correctly understood how certain named entities are structured in a text, whether a tweet or a Pulitzer prize winning

NER CRF, Exception in thread “main” java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory [duplicate]

蹲街弑〆低调 submitted on 2019-12-25 04:44:05
Question: I have downloaded the latest version of NER from this link. After extracting it, I ran this command: java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop This does not work, and I get the following exception: CRFClassifier invoked on Mon Jul 25 06:56:22 EDT 2016 with arguments: -prop austen.prop Exception in thread "main" java.lang

Training Stanford-NER-CRF, control number of iterations and regularisation (L1,L2) parameters

China☆狼群 submitted on 2019-12-24 23:15:53
Question: I was looking through the StanfordNER documentation/FAQ but I can't find anything about specifying the maximum number of iterations in training or the values of the regularisation parameters L1 and L2. I saw an answer suggesting to set, for instance, maxIterations=10 in the properties file, but that did not have any effect. Is it possible to set these parameters? Answer 1: I had to dig in the code but found it, so basically StanfordNER supports many different numerical

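Comparing the loss implementations of CRF/Seq2Seq/CTC

本秂侑毒 submitted on 2019-12-16 08:05:34
Comparing the objective functions of CRF/Seq2Seq/CTC (CRF loss explained). Based on the TensorFlow implementations, this post summarizes the objective functions of three kinds of sequence task.
1. Definition and training of sequence tasks. Both the input and the output are sequences. First, how the three tasks differ:
CRF: usually used for sequence labeling, for example BiLSTM+CRF or IDCNN+CRF. Its defining property is that inputs and outputs correspond one to one. The semantic model first produces a "score" for each character from the input (the negative log of the posterior probability), which serves as the reverse observation probability during decoding.
Seq2Seq: usually used for generative question answering, summarization, machine translation, and so on. It is generally an encoder-decoder architecture, and its defining property is that input and output lengths need not match.
CTC decoding: a speech recognition method whose input is speech and whose output is text. Its defining property is that one output may correspond to multiple correct paths. For CTC, see https://distill.pub/2017/ctc/
All three are decoding problems, and because their properties differ, so do their objective functions. For CRF, the objective has two parts: loss = unary potential + pairwise potential = ClassifyLoss + TransitionLoss (the names are my own; I find them easier to understand), after which the true sentence length is used as a mask. ClassifyLoss (unary potential): the semantic model produces a score for each character, which is in fact the predicted tag's probability in the tag dictionary, or rather that probability's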
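The decomposition described above (a unary term plus a pairwise term, masked by the true sentence length) can be sketched for a single sentence in plain Python. This is an illustration under stated assumptions, not the TensorFlow code the post refers to: the function names and the toy numbers are hypothetical, scores are treated as log-potentials where higher is better, and the sketch includes the log-partition term computed with the forward algorithm, which a full linear-chain CRF loss needs in addition to the gold-path score.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_neg_log_likelihood(unary, transitions, tags, length):
    """Negative log-likelihood of one tag sequence under a linear-chain CRF.

    unary[t][k]       score of tag k at position t (hypothetical encoder output)
    transitions[i][j] score of moving from tag i to tag j
    tags              gold tag ids, padded to len(unary)
    length            true sentence length (>= 1); later positions are masked out
    """
    num_tags = len(transitions)

    # Gold-path score: unary part plus pairwise (transition) part,
    # summed over the real positions only. This is the length mask.
    gold = sum(unary[t][tags[t]] for t in range(length))
    gold += sum(transitions[tags[t - 1]][tags[t]] for t in range(1, length))

    # Log partition function over all tag paths, via the forward algorithm.
    alpha = list(unary[0])
    for t in range(1, length):
        alpha = [
            unary[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            for j in range(num_tags)
        ]
    log_z = logsumexp(alpha)

    return log_z - gold


# Toy example: 2 tags, 3 padded positions, true length 2.
unary = [[0.5, 1.0], [0.2, -0.3], [9.0, 9.0]]
transitions = [[0.1, -0.2], [0.4, 0.0]]
loss = crf_neg_log_likelihood(unary, transitions, tags=[1, 0, 0], length=2)
print(loss)
```

The padded third position carries large scores on purpose: because both sums stop at `length`, it has no effect on the result.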

Stanford CRFClassifier performance evaluation output

做~自己de王妃 submitted on 2019-12-11 18:33:43
Question: I'm following this FAQ https://nlp.stanford.edu/software/crf-faq.shtml to train my own classifier, and I noticed that the performance evaluation output does not match the results (or at least not in the way I expect). Specifically, this section:
    CRFClassifier tagged 16119 words in 1 documents at 13824.19 words per second.
    Entity   P       R       F1      TP   FP  FN
    MYLABEL  1.0000  0.9961  0.9980  255  0   1
    Totals   1.0000  0.9961  0.9980  255  0   1
I expect TP to be all instances where the predicted label matched the golden
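As a sanity check on the numbers in that output, the P, R, and F1 columns can be recomputed from the TP, FP, and FN columns with the standard definitions. The sketch below is only arithmetic on the quoted row; the function name is hypothetical, and it assumes (which the excerpt does not confirm) that the counts are per entity rather than per token, which would explain why they are far smaller than the 16119 tagged words.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the MYLABEL row quoted above.
p, r, f1 = precision_recall_f1(tp=255, fp=0, fn=1)
print(f"{p:.4f} {r:.4f} {f1:.4f}")  # 1.0000 0.9961 0.9980
```

The recomputed values agree with the reported row, so the three ratio columns are at least internally consistent with the three count columns.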

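Can CRF be applied to numeric data?

萝らか妹 submitted on 2019-12-10 11:54:39
I have recently been working on vehicle trajectory prediction using an LSTM seq2seq network. Training shows that the overall loss curve converges, but the predicted trajectory values jump around. I learned that a CRF can learn the dependencies between labels, so I would like to ask: can a CRF model be applied to numeric data? Has anyone run into a similar problem? Source: CSDN Author: 国服程咬金 Link: https://blog.csdn.net/weixin_40548480/article/details/103470766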

How to represent gazetteers or dictionaries as features in CRF++?

不羁岁月 submitted on 2019-12-10 09:24:23
Question: How can I use gazetteers or dictionaries as features in CRF++? To elaborate: suppose I want to do NER on person names and I have a gazetteer (or dictionary) containing commonly seen person names. I want to use this gazetteer as an input to CRF++; how can I do that? I am using the conditional random field package CRF++ to perform named entity recognition tasks. I know how to represent some commonly used features in CRF++. For example, if we want to use capitalization as a feature, we can
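One common way to approach the question above is to precompute the gazetteer lookup as an extra column in the CRF++ training file and then reference that column from the template. The sketch below is a minimal illustration under assumptions: the gazetteer contents, the sentence, the Y/N flag values, and the template IDs are all hypothetical, and only single-token matches are handled (multi-word names would need phrase matching). It relies on the CRF++ conventions that columns are whitespace-separated, the last column is the answer tag, and `%x[0,1]` refers to column 1 of the current token.

```python
def add_gazetteer_column(rows, gazetteer):
    """Turn (token, gold_tag) pairs into CRF++-style lines with a gazetteer flag.

    The output columns are: token, flag ("Y" if the lowercased token is in the
    gazetteer, else "N"), gold tag. The gold tag stays in the last column.
    """
    lines = []
    for token, tag in rows:
        flag = "Y" if token.lower() in gazetteer else "N"
        lines.append(f"{token}\t{flag}\t{tag}")
    return lines


# Hypothetical gazetteer and training sentence.
PERSON_NAMES = {"john", "mary"}
SENTENCE = [("John", "B-PER"), ("called", "O"), ("Mary", "B-PER")]

# Template lines: column 0 is the word, column 1 is the gazetteer flag.
TEMPLATE_LINES = ["U00:%x[0,0]", "U10:%x[0,1]"]

training_lines = add_gazetteer_column(SENTENCE, PERSON_NAMES)
print("\n".join(training_lines))
print("\n".join(TEMPLATE_LINES))
```

The same column must be produced for test data, since CRF++ expects the identical column layout at training and tagging time.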

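HMM role tagging in the Chinese NLP toolkit HanLP explained in detail

喜欢而已 submitted on 2019-12-10 01:39:18
This article explains how to train a word segmentation model with HanLP, covering the corpus format, corpus preprocessing, the training interface, and the output format. HanLP's built-in training interface is currently designed for first-order HMM-NGram; it also ships with a general-purpose corpus loading tool that, with a small amount of code, can export the specific formats required by other training tools (such as CRF++).
Corpus format: The input corpus uses the People's Daily segmentation corpus format. The format has no formal specification, but it generally satisfies the following points:
1. A word and its part of speech are separated by "/", as in 华尔街/nsf, and every word must have a part-of-speech tag, including punctuation.
2. Words are separated by spaces, as in 美国/nsf 华尔街/nsf 股市/n.
3. Multiple words can be merged into one compound word with [], as in [纽约/nsf 时报/n]/nz; compound words must also follow rules 1 and 2.
You can refer to OpenCorpus/pku98/199801.txt (the author does not hold the copyright; please do not ask about it).
Corpus preprocessing: Corpus preprocessing means loading the corpus into memory and adding, removing, or modifying some of its words as needed. In HanLP this is done through CorpusLoader.walk:
    CorpusLoader.walk("path/to/your/corpus", new CorpusLoader.Handler() {
        @Override
        public void handle(Document document) {
            System.out.println(document);
        }
    });
where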
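The three format rules above are enough to write a small standalone parser, which may help when checking a corpus before handing it to a trainer. The following is a sketch in Python, not HanLP's own loader: the regular expression, the function names, and the returned structure are all assumptions made for illustration. Simple words become `(word, pos)` pairs, and compound words become `([(word, pos), ...], pos)`.

```python
import re

# One token is either a compound "[w1/p1 w2/p2]/tag" or a simple "word/tag".
TOKEN = re.compile(r"\[([^\]]+)\]/(\S+)|(\S+)/(\S+)")


def parse_simple(token):
    """Split "word/pos" on the last slash, so words may themselves contain "/"."""
    word, _, pos = token.rpartition("/")
    return word, pos


def parse_sentence(line):
    """Parse one space-separated line in the People's Daily style format."""
    result = []
    for m in TOKEN.finditer(line):
        if m.group(1) is not None:
            # Compound word: parse the inner words, keep the outer tag.
            inner = [parse_simple(t) for t in m.group(1).split()]
            result.append((inner, m.group(2)))
        else:
            result.append((m.group(3), m.group(4)))
    return result


parsed = parse_sentence("美国/nsf [纽约/nsf 时报/n]/nz 股市/n")
for item in parsed:
    print(item)
```

Nested brackets and malformed tokens are not handled; a production loader would need to report those rather than skip them silently.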

How to make a template file of CRF++?

亡梦爱人 submitted on 2019-12-09 10:29:22
Question: I'm new to CRF++ and am teaching myself from its manual: http://crfpp.googlecode.com/svn/trunk/doc/index.html?source=navbar#templ I don't understand what this means: This is a template to describe unigram features. When you give a template "U01:%x[0,1]", CRF++ automatically generates a set of feature functions (func1 ... funcN) like:
    func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
    func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
    func3 = if
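The passage quoted from the manual can be made concrete with a small simulation of template expansion. This is a sketch under assumptions, not CRF++'s implementation: the helper names, the sample rows, the label set, and the `_PAD` placeholder for out-of-range positions are illustrative (CRF++ uses its own boundary tokens). It shows the two steps the manual describes: the macro `%x[row,col]` is replaced by the value at a relative row and absolute column, and the resulting string is paired with every possible output label to give one binary feature function per label.

```python
import re

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, rows, pos):
    """Replace each %x[row,col] with the token value relative to position pos."""
    def substitute(match):
        row = pos + int(match.group(1))  # row is relative to the current token
        col = int(match.group(2))        # col is an absolute column index
        if 0 <= row < len(rows):
            return rows[row][col]
        return "_PAD"                    # illustrative placeholder only
    return MACRO.sub(substitute, template)


def feature_functions(feature, labels):
    """One binary feature function per output label, written as text."""
    return [
        f'if (output = {label} and feature="{feature}") return 1 else return 0'
        for label in labels
    ]


# Hypothetical input: column 0 is the word, column 1 is the POS tag.
ROWS = [["He", "PRP"], ["reckons", "VBZ"], ["the", "DT"], ["current", "JJ"]]
LABELS = ["B-NP", "I-NP", "O"]

feature = expand("U01:%x[0,1]", ROWS, pos=2)  # current token is "the"
print(feature)
for i, func in enumerate(feature_functions(feature, LABELS), start=1):
    print(f"func{i} = {func}")
```

With three labels, the single expanded string "U01:DT" yields three functions; over a whole corpus the total is the number of labels times the number of distinct strings the template expands to.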