Support Vector Machine

The SVM Algorithm

The rough idea of the SVM is that the score of the correct class should beat every other class's score by at least some fixed margin; beyond that margin, how much higher it is doesn't matter.
The SVM loss for one example is:
$$L_i = \sum_{j \neq y_i} \max\bigl(0,\; w_j^T x_i - w_{y_i}^T x_i + \Delta\bigr)$$
where $y_i$ is the correct class for example $x_i$, $j$ ranges over the other classes, and $\Delta$ is a constant margin.

The SVM gradients are computed as:
$$\nabla_{w_{y_i}} L_i = -\Bigl(\sum_{j \neq y_i} \mathbb{1}\bigl(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\bigr)\Bigr) x_i$$
$$\nabla_{w_j} L_i = \mathbb{1}\bigl(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\bigr)\, x_i \qquad (j \neq y_i)$$
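
To make the formulas concrete, here is a tiny hand-computed sanity check for a single example with three classes; the scores below are made up purely for illustration:

import numpy as np

# made-up scores for one example over 3 classes; class 0 is the correct one
scores = np.array([3.2, 5.1, -1.7])
y_i = 0
delta = 1.0

# hinge loss: sum over the wrong classes of max(0, s_j - s_{y_i} + delta)
margins = np.maximum(0, scores - scores[y_i] + delta)
margins[y_i] = 0
loss_i = np.sum(margins)  # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9 + 0 = 2.9

# indicator of which wrong classes contribute to the gradient: [0., 1., 0.]
contributes = (margins > 0).astype(float)
print(loss_i, contributes)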

Naive loss computation

Two nested loops compute every example's class scores, compare each score against the correct class's score, and accumulate the loss.

import numpy as np

def svm_loss_naive(W, X, y, reg):
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin
        # gradient contribution from this margin, derived from the formulas above
        dW[:, y[i]] += -X[i, :]
        dW[:, j] += X[i, :]

  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train
  # Add regularization to the loss.
  # Regularization really matters! When I scaled the parameters up by 2x, the loss mysteriously overflowed...
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W

  return loss, dW

Question 1

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: the SVM loss function is not strictly speaking differentiable

Your Answer:
The numerical gradient can disagree with the analytic one when the check happens to land on a kink of max(0, ·), where the loss is not differentiable. Using a smaller step dx makes the numerical estimate more accurate. (In this run the discrepancies were small, so apparently we didn't hit such a point.)
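
Since this question is about gradient checking, here is a minimal centered-difference check sketch; it assumes svm_loss_naive as defined above, and the data shapes are made up for illustration:

import numpy as np

np.random.seed(0)
X = np.random.randn(10, 5)            # 10 examples, 5 features
y = np.random.randint(0, 3, size=10)  # 3 classes
W = 0.001 * np.random.randn(5, 3)
reg = 0.0

loss, grad = svm_loss_naive(W, X, y, reg)

# centered-difference numerical gradient on a few random entries of W
h = 1e-5
for _ in range(5):
    ix = tuple(np.random.randint(d) for d in W.shape)
    W[ix] += h
    loss_plus, _ = svm_loss_naive(W, X, y, reg)
    W[ix] -= 2 * h
    loss_minus, _ = svm_loss_naive(W, X, y, reg)
    W[ix] += h  # restore the original value
    grad_numeric = (loss_plus - loss_minus) / (2 * h)
    print(ix, grad_numeric, grad[ix])  # occasional mismatches come from landing on a kink of max(0, .)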

Vectorized loss computation

Vectorized computation really is much faster.

def svm_loss_vectorized(W, X, y, reg):
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero

  numTrain = X.shape[0]
  numClasses = W.shape[1]
  scores = X.dot(W)

  # Loss computation
  correctClassScore = scores[np.arange(numTrain), y] # pick each row's correct-class score; this is a 1-D array
  correctClassScore = np.reshape(correctClassScore, (numTrain, -1)) # reshape into a column vector so broadcasting works row-wise
  margins = scores - correctClassScore + 1 # scores is a matrix; broadcast-subtract the column vector and add the margin (delta = 1)
  margins = np.maximum(0, margins)
  margins[np.arange(numTrain), y] = 0 # zero out the correct-class entries
  loss += np.sum(margins)
  loss /= numTrain
  loss += 0.5 * reg * np.sum(W * W)

  # Gradient computation
  margins[margins > 0] = 1 # per the gradient formula, every class with a positive margin contributes 1
  rowSum = np.sum(margins, axis = 1) # number of positive-margin classes per example, used for the y_i gradient
  margins[np.arange(numTrain), y] = -rowSum
  dW += np.dot(X.T, margins)
  dW /= numTrain
  dW += reg * W

  return loss, dW
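
As a quick sanity check (assuming both functions above are defined), the two implementations should agree up to floating-point error, and the vectorized one should be much faster on realistic shapes; the data below is random and the shapes are only illustrative:

import time
import numpy as np

np.random.seed(0)
X = np.random.randn(500, 3073)          # CIFAR-10-like shape with a bias column; made up for the test
y = np.random.randint(0, 10, size=500)
W = 0.001 * np.random.randn(3073, 10)

t0 = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X, y, reg=5e4)
t1 = time.time()
loss_vec, grad_vec = svm_loss_vectorized(W, X, y, reg=5e4)
t2 = time.time()

print('loss difference:', abs(loss_naive - loss_vec))
print('gradient difference:', np.linalg.norm(grad_naive - grad_vec, ord='fro'))
print('naive: %.3fs, vectorized: %.3fs' % (t1 - t0, t2 - t1))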

Training function with gradient descent

def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
          batch_size=200, verbose=False):
  num_train, dim = X.shape
  num_classes = np.max(y) + 1 # assume y takes values 0...K-1 where K is number of classes
  if self.W is None:
    # lazily initialize W
    self.W = 0.001 * np.random.randn(dim, num_classes)

  # Run stochastic gradient descent to optimize W
  loss_history = []
  for it in range(num_iters):
    X_batch = None
    y_batch = None

    # Sample a mini-batch of indices from the training set, with replacement; replace=True is reportedly faster.
    index = np.random.choice(num_train, batch_size, replace = True)
    X_batch = X[index,:]
    y_batch = y[index]

    # evaluate loss and gradient
    loss, grad = self.loss(X_batch, y_batch, reg)
    loss_history.append(loss)

    # perform parameter update
    self.W += -learning_rate * grad

    if verbose and it % 100 == 0:
      print('iteration %d / %d: loss %f' % (it, num_iters, loss))

  return loss_history
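
A minimal usage sketch, assuming a LinearSVM class that wraps these methods (it appears in the tuning code below) and that X_train / y_train are already loaded; the learning rate and regularization values here are just placeholders:

import matplotlib.pyplot as plt

svm = LinearSVM()
loss_history = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                         num_iters=1500, verbose=True)

# the loss should trend downward over the iterations
plt.plot(loss_history)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.show()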

Prediction function

def predict(self, X):
  y_pred = np.zeros(X.shape[0])
  # Multiply X by W and take the class with the highest score as the prediction
  # The first piece of code I came up with entirely on my own :)
  y_pred = np.argmax(X.dot(self.W), axis = 1)
  return y_pred

Hyperparameter tuning

Tune the learning rate and the regularization strength.

part = 5 # split each range into equal parts, giving part + 1 grid points
for i in range(part + 1):
    for j in range(part + 1):
        rate = i * (learning_rates[1] - learning_rates[0]) / part + learning_rates[0] # current learning rate
        strength = j * (regularization_strengths[1] - regularization_strengths[0]) / part + regularization_strengths[0] # current regularization strength
        svm = LinearSVM()
        svm.train(X_train, y_train, learning_rate = rate, reg = strength, num_iters = 500)
        # check accuracy on the training set
        yTrainpred = svm.predict(X_train)
        trainAccuracy = np.mean(y_train == yTrainpred)
        # check accuracy on the validation set
        yValpred = svm.predict(X_val)
        valAccuracy = np.mean(y_val == yValpred)
        results[(rate, strength)] = (trainAccuracy, valAccuracy)
        # keep the model with the best validation accuracy
        if best_val < valAccuracy:
            best_val = valAccuracy
            best_svm = svm
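
To see the grid at a glance, the results dictionary can be printed sorted by validation accuracy; this is a small sketch that assumes the results dict and best_val filled in above:

for (rate, strength), (train_acc, val_acc) in sorted(
        results.items(), key=lambda kv: kv[1][1], reverse=True):
    print('lr %e reg %e  train: %.3f  val: %.3f' % (rate, strength, train_acc, val_acc))

print('best validation accuracy: %.3f' % best_val)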

Testing the tuned hyperparameters

36.4% on the test set; not bad.
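
That number comes from evaluating the best model on the test set, roughly like the sketch below; X_test and y_test are assumed to be loaded the same way as the training data:

import numpy as np

y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels, test set accuracy: %f' % test_accuracy)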

Question 2

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.

Your answer:
The weight images are blurry and hard to make out. Each class's weights end up looking like an averaged template over all the training images of that class, so no single clear object is visible.
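
For reference, the visualization reshapes each column of W back into an image; the sketch below assumes CIFAR-10 inputs (32x32x3 pixels, with a bias row appended as the last row of W) and the usual CIFAR-10 class names:

import numpy as np
import matplotlib.pyplot as plt

w = best_svm.W[:-1, :]            # drop the assumed bias row; shape (3072, 10)
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # rescale the weights to 0..255 so they can be shown as an image
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()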