Support Vector Machines
The SVM Algorithm
The rough idea of the SVM is that the correct class should score higher than every other class by at least some fixed margin; once it clears that margin, the loss does not care by how much.
The SVM loss for the i-th example is:

$$L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + \Delta)$$

where $y_i$ is the correct class, $j$ ranges over the other classes, $s_j$ is the score for class $j$, and $\Delta$ is a constant margin.
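As a quick worked example (the scores are made up for illustration, not taken from the assignment): take a 3-class problem with scores $s = (3.2,\ 5.1,\ -1.7)$, correct class $y_i = 0$, and $\Delta = 1$. Then

$$L_i = \max(0,\ 5.1 - 3.2 + 1) + \max(0,\ -1.7 - 3.2 + 1) = 2.9 + 0 = 2.9,$$

so only classes whose scores come within the margin of the correct class contribute to the loss.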
The SVM gradient is computed as:

$$\nabla_{w_{y_i}} L_i = -\left(\sum_{j \neq y_i} \mathbb{1}\left(s_j - s_{y_i} + \Delta > 0\right)\right) x_i$$

$$\nabla_{w_j} L_i = \mathbb{1}\left(s_j - s_{y_i} + \Delta > 0\right) x_i \qquad (j \neq y_i)$$
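Continuing the made-up example above: only class $j = 1$ has a positive margin, so the indicator sum equals 1. The update is therefore $-x_i$ on the correct-class column $w_0$ and $+x_i$ on column $w_1$, while $w_2$ is untouched. The inner loop of the naive implementation below does exactly this, one positive margin at a time.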
Naive loss computation
With two nested loops, compute every class score for each example, compare each against the correct class's score, and accumulate the result.
import numpy as np

def svm_loss_naive(W, X, y, reg):
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                # This is the part I added, following the gradient formula:
                # each positive margin subtracts x_i from the correct-class
                # column and adds x_i to column j.
                dW[:, y[i]] += -X[i, :]
                dW[:, j] += X[i, :]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss.
    # Regularization really matters: when I scaled the parameters up by 2x
    # without it, the loss surprisingly overflowed.
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W

    return loss, dW
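As a quick sanity check, one can compare the analytic gradient against a centered numerical difference at a few random entries of W. This is only a sketch: the shapes and data below are placeholders, not the assignment's CIFAR-10 batches.

np.random.seed(0)
W = np.random.randn(3073, 10) * 1e-4  # CIFAR-10-style shape: 3072 pixels plus a bias row
X = np.random.randn(5, 3073)
y = np.random.randint(10, size=5)

loss, dW = svm_loss_naive(W, X, y, reg=0.0)
h = 1e-5
for _ in range(3):
    i = np.random.randint(W.shape[0])
    j = np.random.randint(W.shape[1])
    W[i, j] += h
    loss_plus, _ = svm_loss_naive(W, X, y, reg=0.0)
    W[i, j] -= 2 * h
    loss_minus, _ = svm_loss_naive(W, X, y, reg=0.0)
    W[i, j] += h  # restore the entry
    numerical = (loss_plus - loss_minus) / (2 * h)
    print('numerical: %f analytic: %f' % (numerical, dW[i, j]))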
Question 1
It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: the SVM loss function is not, strictly speaking, differentiable.
Your Answer:
A numerical gradient can fail to match when the evaluation point lands exactly on a kink of the max, where the loss is not differentiable. Shrinking the finite-difference step makes the numerical estimate more accurate. (In this run the discrepancies happened to be small, so apparently no kink was hit.)
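To make the hint concrete, here is a minimal one-dimensional sketch (my own illustration) of a centered difference going wrong near the kink of f(x) = max(0, x):

import numpy as np

f = lambda x: np.maximum(0.0, x)

# Put the evaluation point closer to the kink at 0 than the step size,
# so the centered difference straddles the non-differentiable point.
x, h = 1e-6, 1e-5
numerical = (f(x + h) - f(x - h)) / (2 * h)
print(numerical)  # ~0.55, while the analytic (sub)gradient at x > 0 is 1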
Vectorized loss computation
Vectorized computation really is much faster.
def svm_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    numTrain = X.shape[0]
    numClasses = W.shape[1]
    scores = X.dot(W)

    # Loss computation
    correctClassScore = scores[np.arange(numTrain), y]  # pull out each row's correct-class score; a 1-D array
    correctClassScore = np.reshape(correctClassScore, (numTrain, -1))  # reshape it into a column vector
    margins = scores - correctClassScore + 1  # broadcast the column vector against the score matrix, then add delta = 1
    margins = np.maximum(0, margins)
    margins[np.arange(numTrain), y] = 0  # the correct-class entries contribute no loss
    loss += np.sum(margins)
    loss /= numTrain
    loss += 0.5 * reg * np.sum(W * W)

    # Gradient computation
    margins[margins > 0] = 1  # per the formula, entries with a positive margin count as 1
    rowSum = np.sum(margins, axis=1)  # number of positive margins per row, used for the correct-class gradient
    margins[np.arange(numTrain), y] = -rowSum
    dW += np.dot(X.T, margins)
    dW /= numTrain
    dW += reg * W

    return loss, dW
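As a sanity check (a sketch, reusing the placeholder W, X, y from the check above), the two implementations should agree to numerical precision:

loss_naive, grad_naive = svm_loss_naive(W, X, y, 5e1)
loss_vec, grad_vec = svm_loss_vectorized(W, X, y, 5e1)
print('loss difference: %e' % abs(loss_naive - loss_vec))
print('gradient difference: %e' % np.linalg.norm(grad_naive - grad_vec, ord='fro'))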
Training function with gradient descent
def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
          batch_size=200, verbose=False):
    num_train, dim = X.shape
    num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
    if self.W is None:
        # lazily initialize W
        self.W = 0.001 * np.random.randn(dim, num_classes)

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
        # Randomly sample a small batch of indices from the training set,
        # with replacement; sampling with replacement is reportedly faster.
        index = np.random.choice(num_train, batch_size, replace=True)
        X_batch = X[index, :]
        y_batch = y[index]

        # evaluate loss and gradient
        loss, grad = self.loss(X_batch, y_batch, reg)
        loss_history.append(loss)

        # perform parameter update
        self.W += -learning_rate * grad

        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))

    return loss_history
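Hypothetical usage (X_train and y_train are the assignment's CIFAR-10 splits; the hyperparameter values here are placeholders): train a LinearSVM and plot the loss curve to confirm it decreases.

svm = LinearSVM()
loss_history = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                         num_iters=1500, verbose=True)

import matplotlib.pyplot as plt
plt.plot(loss_history)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()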
Prediction function
def predict(self, X):
    # Multiply by the weights and take the argmax over classes.
    # (The first piece of code I came up with entirely on my own!)
    y_pred = np.argmax(X.dot(self.W), axis=1)
    return y_pred
Hyperparameter tuning
Tune the learning rate and the regularization strength.
part = 5  # split each range into `part` equal pieces, giving part + 1 grid points
for i in range(part + 1):
    for j in range(part + 1):
        rate = i * (learning_rates[1] - learning_rates[0]) / part + learning_rates[0]  # current learning rate
        strength = j * (regularization_strengths[1] - regularization_strengths[0]) / part + regularization_strengths[0]  # current regularization strength
        svm = LinearSVM()
        svm.train(X_train, y_train, learning_rate=rate, reg=strength, num_iters=500)

        # How well does it do on the training set?
        yTrainpred = svm.predict(X_train)
        trainAccuracy = np.mean(y_train == yTrainpred)

        # How well does it do on the validation set?
        yValpred = svm.predict(X_val)
        valAccuracy = np.mean(y_val == yValpred)

        results[(rate, strength)] = (trainAccuracy, valAccuracy)

        # Keep whichever model does best on the validation set
        if best_val < valAccuracy:
            best_val = valAccuracy
            best_svm = svm
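Afterwards, the whole grid and the best validation accuracy can be printed (a small sketch; results and best_val are the variables filled in above):

for rate, strength in sorted(results):
    train_accuracy, val_accuracy = results[(rate, strength)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        rate, strength, train_accuracy, val_accuracy))
print('best validation accuracy achieved during cross-validation: %f' % best_val)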
Testing the tuned hyperparameters
36.4% on the test set; not bad.
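That number comes from evaluating best_svm on the held-out test split (a sketch, assuming the assignment's X_test and y_test):

y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)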
Question 2
Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.
Your answer:
The templates are blurry and hard to make out. Each class's weight vector is learned from many different training images of that class, so it ends up as a blended average of all of them.
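For reference, the weights can be rendered as one template image per class, which is where the blurry pictures come from (a sketch assuming 32x32x3 CIFAR-10 inputs with a bias dimension appended to W):

import matplotlib.pyplot as plt

w = best_svm.W[:-1, :]        # strip out the bias row
w = w.reshape(32, 32, 3, 10)  # one 32x32x3 template per class
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights into 0..255 so they display as an image
    wimg = 255.0 * (w[:, :, :, i] - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()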