lr

Dynamically adjusting the learning rate in PyTorch

最后都变了 - Submitted on 2019-12-03 11:49:34
https://blog.csdn.net/bc521bc/article/details/85864555 This blog post explains it in detail, but exactly how to use it in code was still a bit fuzzy to me, so I tried it myself and am noting it down here. It is actually very simple: define the scheduler right after the optimizer, then call step once per epoch. My initial mistake was putting the scheduler step right after T_optimizer.step(), so after one epoch the learning rate had shrunk too small to be noticeable.

T_optimizer = SGD(net.parameters(), lr=LR, weight_decay=0.0005, momentum=0.9)
scheduler = lr_scheduler.StepLR(T_optimizer, step_size=30, gamma=0.5)
for epoch in range(startepoch, startepoch + EPOCH):
    for anc, pos, neg in Triplet_data:
        net.zero_grad()
        anc_feat = net(anc.to(device))
        pos_feat = net(pos.to(device))
        neg_feat = net(neg.to(device))
        tri_loss = T_loss(anc_feat, pos_feat, neg_feat)
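For reference, here is a minimal runnable sketch of the scheduler placement described above, with a dummy linear model and random tensors standing in for the net, T_loss and Triplet_data of the excerpt; it is an illustration of the idea, not the original training script.

# Minimal runnable sketch: step the scheduler once per epoch, not once per batch.
import torch
from torch import nn
from torch.optim import SGD, lr_scheduler

net = nn.Linear(8, 4)                                   # stand-in for the embedding network
T_loss = nn.TripletMarginLoss(margin=1.0)
T_optimizer = SGD(net.parameters(), lr=0.01, weight_decay=0.0005, momentum=0.9)
scheduler = lr_scheduler.StepLR(T_optimizer, step_size=30, gamma=0.5)

for epoch in range(100):
    for _ in range(5):                                  # stand-in for iterating Triplet_data
        anc, pos, neg = (torch.randn(16, 8) for _ in range(3))
        T_optimizer.zero_grad()
        loss = T_loss(net(anc), net(pos), net(neg))
        loss.backward()
        T_optimizer.step()                              # weight update: once per batch
    scheduler.step()                                    # LR decay: once per epoch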

Example of an LR grammar that cannot be represented by LL?

痞子三分冷 - Submitted on 2019-12-03 10:09:47
All LL grammars are LR grammars, but not the other way around, yet I still struggle with the distinction. I'm curious about small examples, if any exist, of LR grammars which do not have an equivalent LL representation. Chris Dodd: Well, as far as grammars are concerned, it's easy -- any simple left-recursive grammar is LR (probably LR(1)) and not LL. So a list grammar like: list ::= list ',' element | element is LR(1) (assuming the production for element is suitable) but not LL. Such grammars can be fairly easily converted into LL grammars by left-factoring and such, so this is not too
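To make the left-factoring mentioned at the end concrete, here is a small sketch (not part of the original answer): the left-recursive rule can be rewritten as list ::= element (',' element)*, which a hand-written LL(1)-style recursive-descent parser handles directly. For illustration, an element is assumed to be a single identifier token.

# Recursive-descent (LL(1)-style) parser for the rewritten rule
#   list ::= element (',' element)*
def parse_list(tokens):
    pos = 0

    def element():
        nonlocal pos
        tok = tokens[pos]
        if not tok.isidentifier():
            raise SyntaxError(f"expected identifier, got {tok!r}")
        pos += 1
        return tok

    items = [element()]
    while pos < len(tokens) and tokens[pos] == ',':   # one token of lookahead decides
        pos += 1                                      # consume ','
        items.append(element())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return items

print(parse_list(['a', ',', 'b', ',', 'c']))          # ['a', 'b', 'c']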

Personalized Ranking Algorithms in Practice (4): GBDT + LR

[亡魂溺海] - Submitted on 2019-12-03 09:53:13
Essentially, GBDT+LR is a binary classifier built on a stacking idea, so it can be used for binary classification problems. The approach comes from Facebook's 2014 paper Practical Lessons from Predicting Clicks on Ads at Facebook. Its most widely used scenario is CTR (click-through rate) prediction, i.e., predicting whether an ad pushed to a user will be clicked. CTR models are typically trained on hundreds of millions of samples; with data at that scale, the fast-to-train LR model is commonly used. But LR is a linear model with limited learning capacity, which makes feature engineering especially important. Existing feature-engineering work mostly focuses on finding discriminative features and feature combinations, and all that effort does not necessarily improve results. GBDT happens to be well suited to discovering discriminative features and feature combinations, reducing the manual cost of feature engineering. Idea: GBDT+LR consists of two parts: GBDT extracts features from the training set to form new training inputs, and LR acts as the classifier over those new inputs. GBDT is first trained on the original training data to obtain a binary classifier (grid search is used here as well to find the best parameter combination). Unlike the usual setup, when the trained GBDT makes predictions it does not output the final binary class probability; instead, for each tree in the model, the leaf node that the prediction falls into is marked as 1, and this constructs the new training data. Suppose the GBDT has two weak learners, shown in blue and red, where the blue one has 3 leaf nodes and the red one has 2
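As a rough sketch of the leaf-index encoding described above, the following uses scikit-learn on toy data (an illustration of the idea, not the Facebook implementation from the paper): each sample is mapped to the leaf it falls into in every tree, those leaf indices are one-hot encoded, and LR is trained on the resulting 0/1 features.

# GBDT leaf indices -> one-hot features -> LR, on toy data
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_gbdt, X_lr, y_gbdt, y_lr = train_test_split(X, y, test_size=0.5, random_state=0)

# 1) Train GBDT on one half of the data.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X_gbdt, y_gbdt)

# 2) Record which leaf each sample falls into in every tree, then one-hot encode.
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(gbdt.apply(X_gbdt).reshape(X_gbdt.shape[0], -1))
X_lr_leaves = enc.transform(gbdt.apply(X_lr).reshape(X_lr.shape[0], -1))

# 3) Train LR on the leaf-index features using the other half of the data.
lr = LogisticRegression(max_iter=1000)
lr.fit(X_lr_leaves, y_lr)
print("LR on GBDT leaf features, accuracy:", lr.score(X_lr_leaves, y_lr))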

How to use the Spatial Pyramid Layer in caffe in proto files?

Anonymous (unverified) - Submitted on 2019-12-03 09:06:55
Question: Hi, I would like to know how to use the SPP layer in a proto file. Maybe someone could explain to me how to read the Caffe docs, as it is sometimes hard for me to understand them directly. My attempt is based on this protofile, but I think it differs from the current version? I defined the layer like this:

layers {
  name: "spatial_pyramid_pooling"
  type: "SPP"
  bottom: "conv2"
  top: "spatial_pyramid_pooling"
  spatial_pyramid_pooling_param {
    pool: MAX
    spatial_bin: 1
    spatial_bin: 2
    spatial_bin: 3
    spatial_bin: 6
    scale: 1
  }
}

When I try to start

Why are there LR(0) parsers but not LL(0) parsers?

帅比萌擦擦* - Submitted on 2019-12-03 09:04:40
Question: I've been reading about both on Wikipedia, and noticed that although LR(0) parsers exist, there's no such thing as an LL(0) parser. From what I read, I understand that the k in LL(k)/LR(k) means how many characters the parser can see beyond the character it's currently working on. So my question is, why is there no such thing as an LL(0) parser even though LR(0) exists? Answer 1: The difference has to do with what the k means in LR(k) versus LL(k). In LL(k), the parser maintains information
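A toy illustration of why a top-down parser needs at least one token of lookahead (this sketch is added here for clarity and is not part of the answer excerpt): an LL parser must choose a production before consuming the input that the production derives, so with zero lookahead it has no basis for choosing between two alternatives; peeking at one token resolves it.

# Predicting between S -> 'a' A | 'b' B requires inspecting the next token.
def parse_S(tokens, pos=0):
    lookahead = tokens[pos]          # the "1" in LL(1): one token of lookahead
    if lookahead == 'a':
        return parse_A(tokens, pos + 1)
    elif lookahead == 'b':
        return parse_B(tokens, pos + 1)
    raise SyntaxError(f"cannot predict a production for {lookahead!r}")

def parse_A(tokens, pos):
    return ('A', tokens[pos:])

def parse_B(tokens, pos):
    return ('B', tokens[pos:])

print(parse_S(['a', 'x']))           # ('A', ['x'])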

How do Java, C++, C#, etc. get around this particular syntactic ambiguity with < and >?

老子叫甜甜 - Submitted on 2019-12-03 03:10:12
Question: I used to think C++ was the "weird" one with all the ambiguities around < and >, but after trying to implement a parser I think I found an example which breaks just about every language that uses < and > for generic types: f(g<h, i>(j)); This could syntactically be interpreted either as a generic method call (g), or as passing f the results of two comparisons. How do such languages (especially Java, which I thought was supposed to be LALR(1)-parsable?) get around this

Why are there LR(0) parsers but not LL(0) parsers?

Anonymous (unverified) - Submitted on 2019-12-03 02:51:02
Question: I've been reading about both on Wikipedia, and noticed that although LR(0) parsers exist, there's no such thing as an LL(0) parser. From what I read, I understand that the k in LL(k)/LR(k) means how many characters the parser can see beyond the character it's currently working on. So my question is, why is there no such thing as an LL(0) parser even though LR(0) exists? Answer 1: The difference has to do with what the k means in LR(k) versus LL(k). In LL(k), the parser maintains information about a top-down, left-to-right parse that traces

How should “BatchNorm” layer be used in caffe?

Anonymous (unverified) - Submitted on 2019-12-03 01:26:01
Question: I am a little confused about how I should use/insert the "BatchNorm" layer in my models. I see several different approaches, for instance: ResNets: "BatchNorm" + "Scale" (no parameter sharing), where the "BatchNorm" layer is followed immediately by a "Scale" layer:

layer {
  bottom: "res2a_branch1"
  top: "res2a_branch1"
  name: "bn2a_branch1"
  type: "BatchNorm"
  batch_norm_param { use_global_stats: true }
}
layer {
  bottom: "res2a_branch1"
  top: "res2a_branch1"
  name: "scale2a_branch1"
  type: "Scale"
  scale_param { bias_term: true }
}

cifar10 example: only

How to discretize continuous features, why discretize them, and why this is common in logistic regression models

Anonymous (unverified) - Submitted on 2019-12-03 00:27:02
Reposted from: Discretizing continuous features for better results, and engineering methods for feature selection. Discretization of continuous features: under what circumstances does discretizing a continuous feature lead to better results? Q: In CTR prediction, LR is generally used and the features are all discrete. Why must discrete features be used? What are the benefits? A: In industry, continuous values are rarely fed directly into a logistic regression model as features; instead, continuous features are discretized into a series of 0/1 features before being handed to the model. The advantages are as follows: 0. Discrete features are easy to add and remove, which makes rapid model iteration easy. (Adding or removing discrete features does not require changing the model; retraining is still necessary, but iteration is faster than with Bayesian inference methods or tree models.) 1. Inner products of sparse vectors are fast to compute, the results are convenient to store, and the approach scales easily. 2. Discretized features are very robust to abnormal data: for example, a feature "age > 30" is 1, otherwise 0. Without discretization, an abnormal record "age = 300" would heavily disturb the model; after discretization, age 300 corresponds to only one weight, and if the feature "age-300" never appears in the training data, its weight in the LR model is 0, so even if "age-300" appears in the test data, it does not affect the prediction. As for the discretization process, take feature A: used as a continuous feature, A corresponds to a single weight w in the LR model; after discretization, A is expanded into features A-1, A-2, A-3, ..., each with its own weight, and if feature A-4 never appears in the training samples, then the trained model, for A
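As a rough sketch of the bucket-then-one-hot idea above (an assumed scikit-learn workflow on toy data, not part of the original post): the continuous age feature is cut into quantile bins, one-hot encoded, and fed to LR; an outlier such as age = 300 simply lands in the highest bucket instead of dragging a single continuous weight around.

# Discretize a continuous feature into 0/1 bucket indicators, then fit LR on them.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
age = rng.integers(18, 70, size=(1000, 1)).astype(float)   # one continuous feature: age
y = (age[:, 0] > 30).astype(int)                            # toy target correlated with age

disc = KBinsDiscretizer(n_bins=5, encode='onehot-dense', strategy='quantile')
age_binned = disc.fit_transform(age)                        # shape (1000, 5), 0/1 columns

lr = LogisticRegression()
lr.fit(age_binned, y)

print(disc.transform([[300.0]]))                # outlier maps to the last bucket only
print(lr.predict(disc.transform([[300.0]])))    # prediction unaffected beyond that bucket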