softmax

How best to deal with “None of the above” in Image Classification?

南笙酒味 submitted on 2019-12-09 10:58:52
Question: This seems to be a fundamental question which some of you out there must have an opinion on. I have an image classifier implemented in CNTK with 48 classes. If the image does not match any of the 48 classes very well, then I'd like to be able to conclude that it was not among these 48 image types. My original idea was simply that if the highest output of the final Softmax layer was low, I would be able to conclude that the test image matched none well. While I occasionally see this occur, in…
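A common starting point (a sketch, not taken from the post, which itself suggests that this kind of thresholding is not always reliable): reject the prediction as "none of the above" when the top softmax probability falls below a cut-off. The threshold value here is an assumption to be tuned on held-out data.

import numpy as np

def classify_with_reject(probs, threshold=0.5):
    # probs: softmax output over the known classes
    top = int(np.argmax(probs))
    if probs[top] < threshold:
        return None  # no known class matches well enough
    return top

print(classify_with_reject(np.array([0.30, 0.25, 0.45])))  # None
print(classify_with_reject(np.array([0.05, 0.90, 0.05])))  # 1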

Argmax on a tensor and ceiling in Tensorflow

帅比萌擦擦* submitted on 2019-12-08 12:20:45
Question: Suppose I have a tensor in Tensorflow whose values are like:

A = [[0.7, 0.2, 0.1], [0.1, 0.4, 0.5]]

How can I change this tensor into the following:

B = [[1, 0, 0], [0, 0, 1]]

In other words, I want to keep only the maximum and replace it with 1. Any help would be appreciated.

Answer 1: I think that you can solve it with a one-liner:

import tensorflow as tf
import numpy as np

x_data = [[0.7, 0.2, 0.1], [0.1, 0.4, 0.5]]
# I am using hard-coded dimensions for simplicity
x = tf.placeholder(dtype=tf…
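The excerpt cuts off mid-snippet; below is a minimal self-contained sketch of the usual one-liner (argmax per row, then one-hot encode), which is not necessarily the answer's exact code:

import tensorflow as tf  # TF 1.x style

x = tf.constant([[0.7, 0.2, 0.1], [0.1, 0.4, 0.5]])
# index of the row-wise maximum, re-expanded to a 0/1 matrix
b = tf.one_hot(tf.argmax(x, axis=1), depth=3)

with tf.Session() as sess:
    print(sess.run(b))  # [[1. 0. 0.]
                        #  [0. 0. 1.]]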

TensorFlow Error Notes

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-08 10:40:19
ValueError: No gradients provided for any variable
Explanation: there is no path in the graph connecting the trainable variables to the loss function. Cause: very likely sess.run() or x.eval() was called before sess.run(train_step). Fix: do not run anything before training; restructure the code so that all ops are executed in the final session.

Training produces nan outputs
The exact cause is unclear; I fixed my case by changing the earlier tf.nn.softmax(x) to tf.nn.log_softmax(x).

ValueError: setting an array element with a sequence
Usually because an array is expected where a list was passed, or a list is expected where an array was passed; start debugging from there.

Optimizer errors: GradientDescentOptimizer runs fine, but RMSPropOptimizer and AdamOptimizer raise errors
AdamOptimizer and RMSPropOptimizer create new internal variables (slots), so tf.initialize_all_variables() must be run after the optimizer is defined. See the sketch below.
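A minimal sketch of the last point (a hypothetical graph; the variable names are illustrative, not from the notes):

import tensorflow as tf  # TF 1.x

x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

# Adam creates extra slot variables here, so the initializer must be
# built *after* minimize(); otherwise the slots stay uninitialized.
train_step = tf.train.AdamOptimizer(0.01).minimize(loss)
init = tf.global_variables_initializer()  # newer name for tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    sess.run(train_step, feed_dict={x: [[1.0]], y: [[2.0]]})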

Neural Network using Softmax with strange outputs

こ雲淡風輕ζ submitted on 2019-12-07 16:33:15
Question: I'm trying to build a tensorflow neural network using a sigmoid activation hidden layer and a softmax output layer with 3 classes. The outputs are mostly very bad, and I believe it is because I am making a mistake in my model construction, because I've built a similar model in Matlab and the results were good. The data is normalized. The results look like this:

[9.2164397e-01 1.6932052e-03 7.6662831e-02]
[3.4100169e-01 2.2419590e-01 4.3480241e-01]
[2.3466848e-06 1.3276369e-04 9…

Softmax function of a numpy array by row

穿精又带淫゛_ submitted on 2019-12-07 12:43:13
Question: I am trying to apply a softmax function to a numpy array, but I am not getting the desired results. This is the code I have tried:

import numpy as np
x = np.array([[1001, 1002], [3, 4]])
softmax = np.exp(x - np.max(x)) / np.sum(np.exp(x - np.max(x)))
print softmax

I think the x - np.max(x) code is not subtracting the max of each row. The max needs to be subtracted from x to prevent very large numbers. This is supposed to output

np.array([[0.26894142, 0.73105858], [0.26894142, 0.73105858]])

But I…
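The excerpt is truncated; the usual fix (a sketch, not necessarily the accepted answer) is to take both the max and the sum per row with axis=1 and keepdims=True:

import numpy as np

def softmax_rows(x):
    # subtract each row's own max for numerical stability
    z = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(z)
    # normalize within each row
    return e / np.sum(e, axis=1, keepdims=True)

x = np.array([[1001, 1002], [3, 4]])
print(softmax_rows(x))
# [[0.26894142 0.73105858]
#  [0.26894142 0.73105858]]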

Notes on the ArcFace Algorithm

回眸只為那壹抹淺笑 submitted on 2019-12-06 21:06:38
Paper: ArcFace: Additive Angular Margin Loss for Deep Face Recognition
Paper link: https://arxiv.org/abs/1801.07698
Code link: https://github.com/deepinsight/insightface

This paper proposes a new loss function for face recognition, the additive angular margin loss, and the face recognition algorithm trained with it is called ArcFace (the open-source code names the algorithm insightface; the two mean the same thing, and ArcFace is used below). The idea behind ArcFace (additive angular margin) shares some common ground with SphereFace and the more recent CosineFace (additive cosine margin). The key point is that ArcFace maximizes the classification margin directly in angular space, whereas CosineFace maximizes it in cosine space; this is also why the paper is called ArcFace, since "arc" carries the same meaning as "angular". Beyond the loss function, the authors also cleaned the public MS-Celeb-1M dataset and emphasized the effect of clean data on the experimental results, and they additionally tuned the network architecture and parameters. Overall, the ArcFace paper runs many experiments to validate the additive angular…
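A minimal numpy sketch of the additive angular margin idea (illustrative only; the function name and the scale s / margin m defaults are assumptions, not the authors' implementation):

import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    # L2-normalize features and class weights so each logit equals cos(theta)
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos_theta = x @ w                                  # (batch, num_classes)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    # add the angular margin m only on the ground-truth class
    margin = np.zeros_like(cos_theta)
    margin[np.arange(len(labels)), labels] = m
    return s * np.cos(theta + margin)  # feed into softmax cross-entropy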

softmax

帅比萌擦擦* submitted on 2019-12-06 12:36:23
1. What? The softmax function is an activation function that turns numbers, a.k.a. logits, into probabilities that sum to one. It outputs a vector that represents the probability distribution over a list of potential outcomes.

2. How? Two components: the special number e and a sum.

3. Why not just divide each logit by the sum of the logits? Why do we need exponents? When some logits are negative, simply adding them together does not give a correct normalization; exponentiating the logits makes them all positive. See the supplementary note on logits.

4. Python implementation:

import numpy as np
def softmax(logits):
    ## base e; list…
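The snippet above is cut off; here is a hedged completion under the same definition (the stability shift by the max is a standard addition, not necessarily in the original):

import numpy as np

def softmax(logits):
    # exponentiate with base e, then normalize so the outputs sum to one
    exps = np.exp(logits - np.max(logits))  # shift by the max for stability
    return exps / np.sum(exps)

print(softmax([2.0, 1.0, 0.1]))  # [0.65900114 0.24243297 0.09856589]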

Model Distillation (Distil) in Practice on MNIST

假如想象 submitted on 2019-12-05 23:44:21
Conclusion: distillation is a good method.

Model compression/distillation is introduced in the papers "Model Compression" and "Distilling the Knowledge in a Neural Network"; below, the latter is described and tested on the MNIST dataset with Keras.

Distillation: use a small model to imitate the generalization behavior of a large model. Normally, when training on MNIST, the target is the class label; in distillation, the teacher model's output probability distribution is used as the "soft target" instead. That is, the loss is the cross-entropy between the student network's and the teacher network's outputs (this follows the strategy in the DistilBERT paper, which differs from the original one). Once the teacher network is trained, the class labels are no longer needed; only the output distributions of the two networks are compared. Of course, the student's own classification loss can be added to the total as well; the paper mentions this as a further optimization.

As shown in the paper's figure, the softmax formula is slightly modified (the logits are scaled down) so that the outputs are smaller and the post-softmax distribution is smoother. (Figure: the paper's loss definition.) The loss used in the code here is the cross-entropy between p and q; see the sketch below.

Code and testing:

1. Teacher network, test accuracy 99.46%, already quite good, with 858,618 trainable parameters.

# Teacher network
inputs = Input((28, 28, 1))
x = Conv2D(64, 3)(inputs)
x = BatchNormalization(center=True, scale=False)(x)
x = Activation('relu')(x)
x = Conv2D(64, 3, strides=2)(x)
x…
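A minimal numpy sketch of the temperature-softened cross-entropy described above (the temperature value T and the helper names are assumptions, not the post's Keras code):

import numpy as np

def soft_targets(logits, T=5.0):
    # divide the logits by the temperature T before softmax,
    # which smooths the resulting distribution
    z = logits / T
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=5.0):
    p = soft_targets(teacher_logits, T)  # teacher distribution (soft target)
    q = soft_targets(student_logits, T)  # student distribution
    return -np.mean(np.sum(p * np.log(q + 1e-12), axis=1))  # H(p, q)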

[Repost] MNIST Machine Learning for Beginners

亡梦爱人 submitted on 2019-12-05 09:05:04
MNIST Machine Learning for Beginners

Reposted from: http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_beginners.html?plg_nld=1&plg_uin=1&plg_auth=1&plg_nld=1&plg_usr=1&plg_vkey=1&plg_dev=1

This tutorial is aimed at readers who are new to both machine learning and TensorFlow. If you already know about MNIST and softmax regression, you can read the quick-start tutorial instead.

When we start learning to program, the first thing we do is often print "Hello World". Just as programming has Hello World, machine learning has MNIST.

MNIST is an entry-level computer vision dataset consisting of images of handwritten digits. It also includes a label for each image telling us which digit it is; for example, the labels of the four example images shown in the original tutorial are 5, 0, 4, and 1.

In this tutorial we will train a machine learning model to predict the digit in an image. The goal is not to design a world-class, complex model (although source code for a first-rate prediction model is given later) but to introduce how to use TensorFlow. So we start with a very simple mathematical model called Softmax Regression. The implementation code for this tutorial is very short…
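A minimal sketch of the tutorial's Softmax Regression model (TF 1.x style, reconstructed from the standard MNIST beginner tutorial rather than copied from the repost):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 images
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)        # softmax regression
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})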