Softmax

The softmax algorithm

Softmax can be roughly understood as pushing the probability of the correct class as close to 1 as possible; as a result, any change in the scores changes the final loss.
The softmax function is computed as
$$p_k = \frac{e^{f_k + \log C}}{\sum_j e^{f_j + \log C}} = \frac{e^{f_k}}{\sum_j e^{f_j}}$$
where log C is chosen as -max(f), so that every exponent is <= 0; this prevents numerical overflow when exponentiating.
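A minimal sketch of this stability trick (the function name stable_softmax is just for illustration); shifting the scores by max(f) leaves the probabilities unchanged but keeps the exponentials bounded:

import numpy as np

def stable_softmax(f):
    # shift so the largest score is 0; exp() can then never overflow
    shifted = f - np.max(f)
    exp_f = np.exp(shifted)
    return exp_f / np.sum(exp_f)

print(stable_softmax(np.array([123.0, 456.0, 789.0])))  # no overflow, sums to 1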
The loss for one example is therefore
$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right) = -f_{y_i} + \log\sum_j e^{f_j}$$
Remember to subtract max(f) from the scores here as well.

The gradient is computed as:
$$\frac{\partial L_i}{\partial W_{:,j}} = \bigl(p_j - \mathbb{1}(j = y_i)\bigr)\, x_i, \qquad p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}$$
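A quick way to convince yourself of this formula is a finite-difference check on a toy example (a sketch with made-up data, not part of the assignment):

import numpy as np

np.random.seed(0)
x = np.random.randn(5)        # one example with 5 features
W = np.random.randn(5, 3)     # 3 classes
y = 1                         # index of the correct class

def loss(W):
    f = x.dot(W)
    f = f - f.max()                         # numerical stability
    p = np.exp(f) / np.sum(np.exp(f))
    return -np.log(p[y])

# analytic gradient from the formula above
f = x.dot(W)
f = f - f.max()
p = np.exp(f) / np.sum(np.exp(f))
indicator = np.zeros(3)
indicator[y] = 1.0
dW_analytic = np.outer(x, p - indicator)    # shape (5, 3); column j is (p_j - 1[j == y]) * x

# numerical gradient
h = 1e-5
dW_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += h
        Wm = W.copy(); Wm[i, j] -= h
        dW_numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * h)

print(np.max(np.abs(dW_analytic - dW_numeric)))  # should be tiny, around 1e-9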

Naive loss function

loss = 0.0
dW = np.zeros_like(W)                # gradient accumulator, same shape as W
dWeach = np.zeros_like(W)            # per-example gradient
trainNum = X.shape[0]
classNum = W.shape[1]
scores = X.dot(W)                    # an N x C matrix of class scores
maxScore = np.reshape(np.max(scores, axis=1), (trainNum, 1))  # N x 1
probability = np.exp(scores - maxScore) / np.sum(np.exp(scores - maxScore), axis=1, keepdims=True)  # N x C; entry (i, j) is the probability that image i belongs to class j
correctClass = np.zeros_like(probability)
correctClass[np.arange(trainNum), y] = 1.0  # one-hot (Kronecker delta) matrix: 1 at each example's correct class
for i in range(trainNum):
    for j in range(classNum):
        loss += -correctClass[i, j] * np.log(probability[i, j])  # only the correct label contributes, so the one-hot matrix is convenient
        dWeach[:, j] = -(correctClass[i, j] - probability[i, j]) * X[i, :]  # column j of the per-example gradient, from the formula above
    dW += dWeach
loss /= trainNum
loss += 0.5 * reg * np.sum(W * W)
dW /= trainNum
dW += reg * W

Question 1

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Your answer:
With randomly initialized (small) weights, every class gets roughly equal probability, so the expected loss is close to -log(1/C); with C = 10 classes that is -log(0.1).
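A quick numerical sanity check of that value (a toy sketch with made-up random data, not the notebook's CIFAR-10 split):

import numpy as np

np.random.seed(0)
N, D, C = 500, 3073, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
W = np.random.randn(D, C) * 0.0001   # small random weights

scores = X.dot(W)
scores -= scores.max(axis=1, keepdims=True)
p = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
loss = -np.log(p[np.arange(N), y]).mean()

print(loss, -np.log(0.1))            # both around 2.3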

Vectorized loss function

Starting from the same formulas, the two loops can be replaced by matrix operations.

loss = 0.0
dW = np.zeros_like(W)
trainNum = X.shape[0]
scores = X.dot(W)
maxScore = np.reshape(np.max(scores, axis=1), (trainNum, 1))  # N x 1
probability = np.exp(scores - maxScore) / np.sum(np.exp(scores - maxScore), axis=1, keepdims=True)  # N x C; entry (i, j) is the probability that image i belongs to class j
correctClass = np.zeros_like(probability)
correctClass[np.arange(trainNum), y] = 1.0  # one-hot (Kronecker delta) matrix: 1 at each example's correct class

loss += -np.sum(correctClass * np.log(probability))  # the double loop collapses to an elementwise product and a sum
dW += -np.dot(X.T, correctClass - probability)       # merging the correct / incorrect cases turns the gradient into one matrix product

loss /= trainNum
loss += 0.5 * reg * np.sum(W * W)
dW /= trainNum
dW += reg * W
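To verify the vectorized version, the notebook compares it against the naive one; a sketch of that check, assuming the CS231n template's softmax_loss_naive and softmax_loss_vectorized functions and its X_dev/y_dev development split:

loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
loss_vec, grad_vec = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)

print('loss difference:', np.abs(loss_naive - loss_vec))                         # should be ~0
print('gradient difference:', np.linalg.norm(grad_naive - grad_vec, ord='fro'))  # should be ~0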

Hyperparameter tuning

Same approach as for the SVM.

part = 5  # split each range into `part` equal segments, giving part + 1 grid points
for i in range(part + 1):
    for j in range(part + 1):
        rate = i * (learning_rates[1] - learning_rates[0]) / part + learning_rates[0]  # current learning rate
        strength = j * (regularization_strengths[1] - regularization_strengths[0]) / part + regularization_strengths[0]  # current regularization strength
        softmax = Softmax()
        softmax.train(X_train, y_train, learning_rate=rate, reg=strength, num_iters=500)
        # accuracy on the training set
        yTrainpred = softmax.predict(X_train)
        trainAccuracy = np.mean(y_train == yTrainpred)
        # accuracy on the validation set
        yValpred = softmax.predict(X_val)
        valAccuracy = np.mean(y_val == yValpred)
        results[(rate, strength)] = (trainAccuracy, valAccuracy)
        # keep the model that does best on the validation set
        if best_val < valAccuracy:
            best_val = valAccuracy
            best_softmax = softmax
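The manual interpolation can also be written with np.linspace, which reads a little more cleanly (an equivalent sketch, not what the notebook requires):

for rate in np.linspace(learning_rates[0], learning_rates[1], part + 1):
    for strength in np.linspace(regularization_strengths[0], regularization_strengths[1], part + 1):
        # train a Softmax on (rate, strength) and record accuracies, exactly as above
        ...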

Result with the best hyperparameters

35.4%, which is acceptable.

Question 2

It’s possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.

Your answer:
True
Your explanation:
The softmax loss keeps pushing the probability of the correct class toward 1 and is never exactly zero, so any added datapoint contributes a positive loss and changes the optimum. The SVM hinge loss, by contrast, is exactly zero for a datapoint whose margins are already satisfied, so such a point leaves the total loss unchanged.
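A small worked example of that difference, with toy scores I made up (SVM margin delta = 1): a datapoint whose correct class already wins every margin adds zero hinge loss but still adds a positive softmax loss.

import numpy as np

scores = np.array([10.0, 1.0, 0.5])  # correct class (index 0) wins by a large margin
y = 0
delta = 1.0

# SVM hinge loss: zero, because every margin is already satisfied
svm_loss = np.sum(np.maximum(0, np.delete(scores, y) - scores[y] + delta))

# softmax loss: small but strictly positive
p = np.exp(scores - scores.max()) / np.sum(np.exp(scores - scores.max()))
softmax_loss = -np.log(p[y])

print(svm_loss)      # 0.0
print(softmax_loss)  # ~0.0002, still > 0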

Visualizing the learned weights

Still hard to make out much of anything, except that the car template has a faint outline of a car.
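For reference, the visualization reshapes each column of the learned W back into a 32x32x3 image template, along the lines of the notebook's plotting cell (a sketch; best_softmax.W with a trailing bias row and the CIFAR-10 class list are assumed from the notebook):

import numpy as np
import matplotlib.pyplot as plt

w = best_softmax.W[:-1, :]                 # strip the bias row
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # rescale the weights to 0..255 so they can be shown as an image
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()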