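Caffe Study Notes 01
1. Dataset
The CIFAR-10 dataset consists of 60,000 color images of size 32x32 in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
2. Preparation
First, you need to download the data from the CIFAR-10 website and convert its format. To do this, simply run the following commands:
(1) Download the dataset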
cd $CAFFE_ROOT/data/cifar10
./get_cifar10.sh
CAFFE_ROOT is the root directory of Caffe.
(2) Convert the data format
cd $CAFFE_ROOT
./examples/cifar10/create_cifar10.sh
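At this point the dataset has been downloaded to the $CAFFE_ROOT/data/cifar10 folder, and the converted dataset files and the dataset mean are in the $CAFFE_ROOT/examples/cifar10 folder.
3. Model
The model is defined in the prototxt files in the $CAFFE_ROOT/examples/cifar10 folder. We take cifar10_quick_train_test.prototxt as an example.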
name: "CIFAR10_quick"
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
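The above is our network structure. It can also be viewed graphically with a visualization tool, which lets us inspect the network we defined. This simple network consists of a data input layer, convolution layers, pooling layers, ReLU layers and fully connected layers. To change the network structure, we only need to edit the prototxt file.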
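To see where the layer sizes come from, here is a minimal Python sketch that traces the blob shape and the parameter count of each layer from the values in the prototxt above. It assumes Caffe's usual rounding (down for convolution, up for pooling with no padding). The helper names conv_out, pool_out and trace are made up for this illustration and are not part of Caffe.

```python
import math


def conv_out(size, kernel, pad, stride):
    # Convolution output size, rounded down.
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    # Pooling output size without padding, rounded up (so 32 -> 16 for 3x3, stride 2).
    return math.ceil((size - kernel) / stride) + 1


# (name, kind, num_output) in prototxt order. The ReLU layers run in place
# and do not change shapes, so they are left out.
LAYERS = [
    ("conv1", "conv", 32), ("pool1", "pool", None),
    ("conv2", "conv", 32), ("pool2", "pool", None),
    ("conv3", "conv", 64), ("pool3", "pool", None),
    ("ip1", "ip", 64), ("ip2", "ip", 10),
]


def trace(channels=3, size=32):
    """Return per-layer (channels, height, width) and learnable parameter counts."""
    shapes, params = {}, {}
    for name, kind, num_output in LAYERS:
        if kind == "conv":
            # 5x5 kernels over all input channels, plus one bias per output map.
            params[name] = 5 * 5 * channels * num_output + num_output
            channels, size = num_output, conv_out(size, kernel=5, pad=2, stride=1)
        elif kind == "pool":
            size = pool_out(size, kernel=3, stride=2)
        else:
            # InnerProduct flattens its input: weights plus one bias per output.
            params[name] = channels * size * size * num_output + num_output
            channels, size = num_output, 1
        shapes[name] = (channels, size, size)
    return shapes, params


if __name__ == "__main__":
    shapes, params = trace()
    for name, _, _ in LAYERS:
        print(name, shapes[name], params.get(name, 0))
    print("total parameters:", sum(params.values()))
```

The conv1 shape computed this way, 32 x 32 x 32 per image, agrees with the "Top shape: 100 32 32 32" line that Caffe prints for a batch of 100 during training.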
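4. Training and Testing
Once the network definition protobuf file and the solver parameter file are written (in fact the example code has already written them for us), simply run train_quick.sh.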
cd $CAFFE_ROOT
./examples/cifar10/train_quick.sh
train_quick.sh is a simple script that prints its progress as it runs.
It shows, in order:
(1) The parameters we set in the solver
I0316 03:04:56.309629 31444 solver.cpp:48] Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.001
display: 100
max_iter: 4000
lr_policy: "fixed"
momentum: 0.9
weight_decay: 0.004
snapshot: 4000
snapshot_prefix: "examples/cifar10/cifar10_quick"
solver_mode: GPU
device_id: 0
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
(2) The network structure defined in the prototxt file
name: "CIFAR10_quick"
state {
  phase: TRAIN
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
(3) The order in which the layers are created and connected, the memory they require, and which layers need backward computation.
I0316 03:04:56.311331 31444 layer_factory.hpp:77] Creating layer cifar
I0316 03:04:56.311914 31444 net.cpp:91] Creating Layer cifar
I0316 03:04:56.311936 31444 net.cpp:399] cifar -> data
I0316 03:04:56.311974 31444 net.cpp:399] cifar -> label
I0316 03:04:56.311990 31444 data_transformer.cpp:25] Loading mean file from: examples/cifar10/mean.binaryproto
I0316 03:04:56.313457 31446 db_lmdb.cpp:35] Opened lmdb examples/cifar10/cifar10_train_lmdb
I0316 03:04:56.344117 31444 data_layer.cpp:41] output data size: 100,3,32,32
I0316 03:04:56.348085 31444 net.cpp:141] Setting up cifar
I0316 03:04:56.348131 31444 net.cpp:148] Top shape: 100 3 32 32 (307200)
I0316 03:04:56.348142 31444 net.cpp:148] Top shape: 100 (100)
I0316 03:04:56.348150 31444 net.cpp:156] Memory required for data: 1229200
I0316 03:04:56.348161 31444 layer_factory.hpp:77] Creating layer conv1
I0316 03:04:56.348196 31444 net.cpp:91] Creating Layer conv1
I0316 03:04:56.348206 31444 net.cpp:425] conv1 <- data
I0316 03:04:56.348223 31444 net.cpp:399] conv1 -> conv1
I0316 03:04:56.758509 31444 net.cpp:141] Setting up conv1
I0316 03:04:56.758564 31444 net.cpp:148] Top shape: 100 32 32 32 (3276800)
I0316 03:04:56.758575 31444 net.cpp:156] Memory required for data: 14336400
I0316 03:04:56.758605 31444 layer_factory.hpp:77] Creating layer pool1
I0316 03:04:56.758627 31444 net.cpp:91] Creating Layer pool1
...
I0316 03:04:56.773228 31444 net.cpp:156] Memory required for data: 31978800 <-- note: this is the memory required
I0316 03:04:56.773244 31444 layer_factory.hpp:77] Creating layer loss
I0316 03:04:56.773257 31444 net.cpp:91] Creating Layer loss
I0316 03:04:56.773267 31444 net.cpp:425] loss <- ip2
I0316 03:04:56.773275 31444 net.cpp:425] loss <- label
I0316 03:04:56.773288 31444 net.cpp:399] loss -> loss
I0316 03:04:56.773317 31444 layer_factory.hpp:77] Creating layer loss
I0316 03:04:56.773861 31444 net.cpp:141] Setting up loss
I0316 03:04:56.773887 31444 net.cpp:148] Top shape: (1)
I0316 03:04:56.773896 31444 net.cpp:151] with loss weight 1
I0316 03:04:56.773921 31444 net.cpp:156] Memory required for data: 31978804
<--------------- note: these are the layers that need backward computation ------------------->
I0316 03:04:56.773931 31444 net.cpp:217] loss needs backward computation.
I0316 03:04:56.773941 31444 net.cpp:217] ip2 needs backward computation.
I0316 03:04:56.773948 31444 net.cpp:217] ip1 needs backward computation.
I0316 03:04:56.773957 31444 net.cpp:217] pool3 needs backward computation.
I0316 03:04:56.773964 31444 net.cpp:217] relu3 needs backward computation.
I0316 03:04:56.773973 31444 net.cpp:217] conv3 needs backward computation.
I0316 03:04:56.773983 31444 net.cpp:217] pool2 needs backward computation.
I0316 03:04:56.773991 31444 net.cpp:217] relu2 needs backward computation.
I0316 03:04:56.773999 31444 net.cpp:217] conv2 needs backward computation.
I0316 03:04:56.774008 31444 net.cpp:217] relu1 needs backward computation.
I0316 03:04:56.774016 31444 net.cpp:217] pool1 needs backward computation.
I0316 03:04:56.774024 31444 net.cpp:217] conv1 needs backward computation.
I0316 03:04:56.774034 31444 net.cpp:219] cifar does not need backward computation.
I0316 03:04:56.774044 31444 net.cpp:261] This network produces output loss
I0316 03:04:56.774061 31444 net.cpp:274] Network initialization done.
...
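The "Memory required for data" figures in the log can be reproduced by hand: each one is the running total of top-blob elements so far, times 4 bytes per float. The sketch below is only an illustration under two assumptions: the pooled sizes (16, 8, 4) are derived from the prototxt rather than read from the elided part of the log, and the in-place ReLU tops are counted again, which is the only way the totals match the log. TOPS and memory_after are hypothetical names, not Caffe identifiers.

```python
BATCH = 100          # batch_size of the TRAIN data layer
BYTES_PER_FLOAT = 4  # blobs hold 32-bit floats

# Per-image element count of every top blob, in creation order (TRAIN phase).
TOPS = [
    ("data", 3 * 32 * 32), ("label", 1),
    ("conv1", 32 * 32 * 32), ("pool1", 32 * 16 * 16), ("relu1", 32 * 16 * 16),
    ("conv2", 32 * 16 * 16), ("relu2", 32 * 16 * 16), ("pool2", 32 * 8 * 8),
    ("conv3", 64 * 8 * 8), ("relu3", 64 * 8 * 8), ("pool3", 64 * 4 * 4),
    ("ip1", 64), ("ip2", 10),
]


def memory_after(top_name):
    """Running total in bytes once the named top blob has been set up."""
    total = 0
    for name, per_image in TOPS:
        total += BATCH * per_image * BYTES_PER_FLOAT
        if name == top_name:
            return total
    raise KeyError(top_name)


if __name__ == "__main__":
    print(memory_after("label"))    # after the cifar layer (data + label)
    print(memory_after("conv1"))    # after conv1
    print(memory_after("ip2"))      # just before the loss layer
    print(memory_after("ip2") + 4)  # plus the scalar loss blob
```

These four values correspond to the 1229200, 14336400, 31978800 and 31978804 lines in the log above.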
(4) Finally, the progress and results of the optimization are printed
I0316 03:07:11.809412 31705 sgd_solver.cpp:106] Iteration 4700, lr = 0.0001
I0316 03:07:14.348947 31705 solver.cpp:228] Iteration 4800, loss = 0.427255
I0316 03:07:14.349012 31705 solver.cpp:244] Train net output #0: loss = 0.427255 (* 1 = 0.427255 loss)
I0316 03:07:14.349023 31705 sgd_solver.cpp:106] Iteration 4800, lr = 0.0001
I0316 03:07:17.086114 31705 solver.cpp:228] Iteration 4900, loss = 0.469861
I0316 03:07:17.086187 31705 solver.cpp:244] Train net output #0: loss = 0.469861 (* 1 = 0.469861 loss)
I0316 03:07:17.086199 31705 sgd_solver.cpp:106] Iteration 4900, lr = 0.0001
I0316 03:07:20.278740 31705 solver.cpp:464] Snapshotting to HDF5 file examples/cifar10/cifar10_quick_iter_5000.caffemodel.h5
I0316 03:07:20.289602 31705 sgd_solver.cpp:283] Snapshotting solver state to HDF5 file examples/cifar10/cifar10_quick_iter_5000.solverstate.h5
I0316 03:07:20.296205 31705 solver.cpp:317] Iteration 5000, loss = 0.593591
I0316 03:07:20.296250 31705 solver.cpp:337] Iteration 5000, Testing net (#0)
I0316 03:07:21.401197 31705 solver.cpp:404] Test net output #0: accuracy = 0.7573
I0316 03:07:21.401386 31705 solver.cpp:404] Test net output #1: loss = 0.751382 (* 1 = 0.751382 loss)
I0316 03:07:21.401397 31705 solver.cpp:322] Optimization Done.
I0316 03:07:21.401404 31705 caffe.cpp:222] Optimization Done.
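Here, the training lr (learning rate) and loss (training loss) are displayed every 100 iterations, and the network is tested every 500 iterations, printing the accuracy and the loss (test loss).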
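If you want these numbers in a form you can plot, a small script can pull them out of the log text. The following is a rough sketch using only Python's re module. The regular expressions are written against the log lines shown above and may need adjusting for other Caffe versions. parse_log and SAMPLE_LOG are names invented for this example.

```python
import re

# Patterns matched against the solver log lines shown in these notes.
NUMBER = r"([\d.]+(?:e[-+]?\d+)?)"
TRAIN_RE = re.compile(r"Iteration (\d+), loss = " + NUMBER)
TESTING_RE = re.compile(r"Iteration (\d+), Testing net")
ACCURACY_RE = re.compile(r"Test net output #\d+: accuracy = " + NUMBER)

SAMPLE_LOG = [
    "I0316 03:07:14.348947 31705 solver.cpp:228] Iteration 4800, loss = 0.427255",
    "I0316 03:07:14.349023 31705 sgd_solver.cpp:106] Iteration 4800, lr = 0.0001",
    "I0316 03:07:20.296205 31705 solver.cpp:317] Iteration 5000, loss = 0.593591",
    "I0316 03:07:20.296250 31705 solver.cpp:337] Iteration 5000, Testing net (#0)",
    "I0316 03:07:21.401197 31705 solver.cpp:404] Test net output #0: accuracy = 0.7573",
    "I0316 03:07:21.401386 31705 solver.cpp:404] Test net output #1: loss = 0.751382 (* 1 = 0.751382 loss)",
]


def parse_log(lines):
    """Return ([(iteration, train_loss)], [(iteration, test_accuracy)])."""
    train, test = [], []
    test_iteration = None  # iteration of the most recent "Testing net" line
    for line in lines:
        match = TESTING_RE.search(line)
        if match:
            test_iteration = int(match.group(1))
            continue
        match = TRAIN_RE.search(line)
        if match:
            train.append((int(match.group(1)), float(match.group(2))))
            continue
        match = ACCURACY_RE.search(line)
        if match:
            test.append((test_iteration, float(match.group(1))))
    return train, test


if __name__ == "__main__":
    train, test = parse_log(SAMPLE_LOG)
    print(train)
    print(test)
```

To use it on a real run, redirect the output of train_quick.sh to a file and pass the file's lines to parse_log.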
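After 5000 iterations the accuracy is about 0.75 (0.7573 in the log above). Because the solver sets snapshot_format: HDF5, the model parameters are saved in HDF5 format in examples/cifar10/cifar10_quick_iter_5000.caffemodel.h5, as the snapshot line in the log shows.
The model can then be run on new data. The trained model is shown in the figure below.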
5. GPU or CPU
You can choose whether to train the model on the CPU or the GPU by setting solver_mode in the cifar10*solver.prototxt files.
# solver mode: CPU or GPU
solver_mode: CPU
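Editing the file by hand is enough, but if you switch often, the change can be scripted. Below is a minimal sketch that rewrites the solver_mode line in the text of a solver file. set_solver_mode is a hypothetical helper written for these notes, not something shipped with Caffe, and it works on plain text rather than parsing the protobuf.

```python
import re


def set_solver_mode(solver_text, mode):
    """Return solver_text with its solver_mode line set to 'CPU' or 'GPU'."""
    if mode not in ("CPU", "GPU"):
        raise ValueError("mode must be 'CPU' or 'GPU'")
    # Only touch real settings: comment lines start with '#', so they do not match.
    new_text, count = re.subn(
        r"(?m)^(\s*solver_mode:\s*)\w+", r"\g<1>" + mode, solver_text
    )
    if count == 0:
        # No solver_mode line yet: append one.
        new_text = solver_text.rstrip("\n") + "\nsolver_mode: " + mode + "\n"
    return new_text


if __name__ == "__main__":
    text = "# solver mode: CPU or GPU\nsolver_mode: GPU\n"
    print(set_solver_mode(text, "CPU"))
```

Reading the solver file, passing its contents through this function and writing the result back would switch the mode. Keep a copy of the original file if you try this.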
The contents of train_quick.sh and cifar10_quick_solver.prototxt are attached below.
train_quick.sh
#!/usr/bin/env sh
TOOLS=./build/tools
$TOOLS/caffe train \
--solver=examples/cifar10/cifar10_quick_solver.prototxt
# reduce learning rate by factor of 10 after 8 epochs
$TOOLS/caffe train \
--solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt \
--snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
cifar10_quick_solver.prototxt
# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10
# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 4000
# snapshot intermediate results
snapshot: 4000
snapshot_format: HDF5
snapshot_prefix: "examples/cifar10/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: GPU
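The "8 epochs (4000 iters)" comment and the test_iter setting follow from simple arithmetic on the dataset and batch sizes. As a quick check, here is a small Python sketch. The numbers are copied from these notes, and schedule_summary is just an illustrative name.

```python
# Values copied from the notes: dataset sizes, the batch_size of both Data
# layers, and two fields of cifar10_quick_solver.prototxt.
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
BATCH_SIZE = 100
SOLVER = {"test_iter": 100, "max_iter": 4000}


def schedule_summary(solver=SOLVER):
    """Relate solver iterations to passes over the data."""
    iterations_per_epoch = TRAIN_IMAGES // BATCH_SIZE
    return {
        # One epoch = one pass over all training images.
        "iterations_per_epoch": iterations_per_epoch,
        # How many epochs the first training stage covers.
        "epochs": solver["max_iter"] / iterations_per_epoch,
        # Images seen in one test pass: test_iter batches of BATCH_SIZE.
        "images_per_test": solver["test_iter"] * BATCH_SIZE,
    }


if __name__ == "__main__":
    for key, value in schedule_summary().items():
        print(key, value)
```

With these settings one epoch is 500 iterations, so max_iter: 4000 is 8 epochs, and each test pass covers all 10,000 test images exactly once.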