Reshape and Transpose are operators that every major deep learning framework ships:
- TensorFlow: tf.reshape(...) and tf.transpose(...)
- PyTorch: torch.reshape(...) and torch.transpose(...)
- MindSpore: mindspore.ops.Reshape(...) and mindspore.ops.Transpose(...)
- and so on.
### 1. The problem I ran into
Image data read with np.array(Image.open("image_path"), dtype='uint8') comes back in HWC format (assuming a color image, so C=3: at each of the height x width pixel positions, the array stores that pixel's R, G and B values together along the last dimension). But typical neural network architectures expect the data flowing through the network to be in NCHW format (N is the batch_size). This comes up especially often in MindSpore (mainly because MindSpore is the framework I use the most).
In practice, if the HWC --> NCHW or NCHW --> HWC conversion is done incorrectly, you run into some very unpleasant situations. Below I walk through a few examples of where the conversion is needed and what goes wrong when it is misused.
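The HWC layout described above is easy to verify with a tiny in-memory image; a minimal sketch (the 6x4 red image is just a stand-in for a real file on disk):

```python
import numpy as np
from PIL import Image

# A tiny in-memory 6x4 red image stands in for Image.open("image_path").
img = Image.new("RGB", (6, 4), color=(255, 0, 0))   # PIL size is (width, height)
arr = np.array(img, dtype='uint8')

print(arr.shape)   # (4, 6, 3): HWC -- height, width, channel last
print(arr[0, 0])   # [255 0 0]: one pixel's R, G, B values sit together
```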
### 2. Scenario 1: saving image data for visualization
The create_BRDNetDataset function used below comes from src.dataset.py at https://gitee.com/mindspore/mindspore/tree/r1.2/model_zoo/official/cv/brdnet ; it adds noise to a given set of images and returns the noisy images together with the originals.
Note: by default, create_BRDNetDataset only looks for images with the "bmp" suffix; if your images are in another format, adjust accordingly. In this example the images under the Kodak24 folder are all "png", so I changed line 31 of src.dataset.py to:
```python
file_dictory = glob.glob(data_path+'*.png') #notice the data format
```
The original goal of this example:
After wrapping create_BRDNetDataset into a dataset iterator, we can pull out, one at a time, each image from the specified dataset together with its noisy counterpart. But a plain print(Tensor) only shows raw pixel values, which is not very intuitive, so I want to save the data back out as image files to see the effect of the added noise.
Note: the image data obtained by iterating the data_loader below is in NCHW format, with N=1, C=3.
The naive (wrong) version:
```python
from src.dataset import create_BRDNetDataset
from mindspore.ops import composite as C
import PIL.Image as Image
import os
from mindspore import context

if __name__ == '__main__':
    device_id = int(os.getenv('DEVICE_ID', '6'))
    context.set_context(mode=context.GRAPH_MODE, device_id=device_id, device_target="Ascend", save_graphs=False)
    out_dir = "./outputs/"
    dataset, _ = create_BRDNetDataset(r'./Test/Kodak24/', 75, 3, 1, 1, 0, shuffle=False)
    data_loader = dataset.create_dict_iterator()
    for i, data in enumerate(data_loader):
        img_test = data["image"]
        img_test = C.clip_by_value(img_test, 0, 1)
        img_out = data["label"]
        img_out = C.clip_by_value(img_out, 0, 1)

        img_test = img_test.asnumpy()
        img_test = img_test.reshape((500, 500, 3))
        img_test = Image.fromarray((img_test * 255).astype('uint8'))
        img_test.save(out_dir + 'noise_' + str(i) + '_sigma75.png')

        img_out = img_out.asnumpy()
        img_out = img_out.reshape((500, 500, 3))
        img_out = Image.fromarray((img_out * 255).astype('uint8'))
        img_out.save(out_dir + 'label_' + str(i) + '_sigma75.png')
```
In the code above, img_test and img_out both come out with shape (1, 3, 500, 500). Since PIL expects HWC data when building and saving an image, I reshaped img_test and img_out directly to (500, 500, 3), and the result left me baffled.
The original image:

The saved original:

The saved noisy image:

Why did the saved image break up into 9 tiles? Each row of tiles looks different: what is their relation to the R, G, B channels? The tiles within a row look identical: how are they related to each other?
(I had planned to come back and fill in this part after finishing the article, but after writing up the 3 examples it was already far too long, so I moved that discussion into a separate post: http://luxuff.cn/archives/reshape操作对nchw格式的图片数据的影响 )
The correct version:
```python
from src.dataset import create_BRDNetDataset
from mindspore.ops import composite as C
import PIL.Image as Image
import os
from mindspore import context

if __name__ == '__main__':
    device_id = int(os.getenv('DEVICE_ID', '6'))
    context.set_context(mode=context.GRAPH_MODE, device_id=device_id, device_target="Ascend", save_graphs=False)
    out_dir = "./outputs2/"
    dataset, _ = create_BRDNetDataset(r'./Test/Kodak24/', 75, 3, 1, 1, 0, shuffle=False)
    data_loader = dataset.create_dict_iterator()
    for i, data in enumerate(data_loader):
        img_test = data["image"]
        img_test = C.clip_by_value(img_test, 0, 1)
        img_out = data["label"]
        img_out = C.clip_by_value(img_out, 0, 1)

        img_test = img_test.asnumpy()
        img_test = img_test.squeeze(0).transpose((1, 2, 0))
        img_test = Image.fromarray((img_test * 255).astype('uint8'))
        img_test.save(out_dir + 'noise_' + str(i) + '_sigma75.png')

        img_out = img_out.asnumpy()
        img_out = img_out.squeeze(0).transpose((1, 2, 0))
        img_out = Image.fromarray((img_out * 255).astype('uint8'))
        img_out.save(out_dir + 'label_' + str(i) + '_sigma75.png')
```
The saved original:

The saved noisy image:

Now the images look right, and the noise effect is clearly visible.
So what is the difference between this version and the previous one?
The former uses:
```python
img_test=img_test.reshape((500,500,3))
```
The latter uses:
```python
img_test=img_test.squeeze(0).transpose((1, 2, 0))
```
img_test starts out with shape (1, 3, 500, 500). squeeze(0) drops the leading dimension, giving (3, 500, 500); in effect it throws away the outermost pair of brackets of the 4-D array. That (3, 500, 500) array represents CHW data; to turn it into HWC we need to move H to axis 0, W to axis 1 and C to axis 2. transpose((1, 2, 0)) permutes the axes so that the new array's axes 0, 1, 2 correspond to the original H (axis 1), W (axis 2) and C (axis 0).
Only squeeze(0) followed by transpose((1, 2, 0)) does the job correctly; a plain reshape((500, 500, 3)) cannot. So why does reshape produce the effect shown above, and what is its internal logic? I will come back to that later.
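The difference is easy to see on a tiny array. Below is a minimal numpy sketch (a 1x3x2x2 stand-in for the real 1x3x500x500 tensor) showing that reshape scrambles which values end up in a "pixel", while squeeze plus transpose keeps each pixel's channels together:

```python
import numpy as np

# 1x3x2x2 stand-in for the real (1, 3, 500, 500) NCHW tensor.
x = np.arange(12).reshape(1, 3, 2, 2)

wrong = x.reshape(2, 2, 3)                  # keeps the flat order, scrambles pixels
right = x.squeeze(0).transpose(1, 2, 0)     # CHW -> HWC, pixels kept intact

# The three channel values of the top-left pixel live at x[0, :, 0, 0]:
print(x[0, :, 0, 0])   # [0 4 8]
print(right[0, 0])     # [0 4 8]  -- same pixel, channels moved to the last axis
print(wrong[0, 0])     # [0 1 2]  -- three values taken from ONE channel plane
```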
### 3. Scenario 2: computing the peak signal-to-noise ratio (PSNR) between images
The original code comes from: https://github.com/hellloxiaotian/BRDNet/blob/master/colorimage/mainimprovementcolor.py
```python
import argparse
import logging
import os, time, glob
import PIL.Image as Image
import numpy as np
import pandas as pd
#from keras import backend as K
import tensorflow as tf
from keras.callbacks import CSVLogger, ModelCheckpoint, LearningRateScheduler
from keras.models import load_model
from keras.optimizers import Adam
from skimage.measure import compare_psnr, compare_ssim
import models
from multiprocessing import Pool
import random

## Params
parser = argparse.ArgumentParser()
parser.add_argument('--model', default='BRDNet', type=str, help='choose a type of model')
parser.add_argument('--batch_size', default=20, type=int, help='batch size') #128
parser.add_argument('--train_data', default='./data/3859waterloo5050step40color1/', type=str, help='path of train data') #201807081928tcw
parser.add_argument('--test_dir', default='./data/Test/Kodak24', type=str, help='directory of test dataset')
parser.add_argument('--sigma', default=75, type=int, help='noise level')
parser.add_argument('--epoch', default=50, type=int, help='number of train epoches')
parser.add_argument('--lr', default=1e-3, type=float, help='initial learning rate for Adam')
parser.add_argument('--save_every', default=5, type=int, help='save model at every x epoches')
parser.add_argument('--pretrain', default=None, type=str, help='path of pre-trained model')
parser.add_argument('--only_test', default=False, type=bool, help='train and test or only test')
args = parser.parse_args()

if not args.only_test:
    save_dir = './snapshot/save_' + args.model + '_' + 'sigma' + str(args.sigma) + '_' + time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + '/'
    if not os.path.exists(save_dir):
        os.mkdir(save_dir)
    # log
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                        datefmt='%Y %H:%M:%S',
                        filename=save_dir + 'info.log',
                        filemode='w')
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(name)-6s: %(levelname)-6s %(message)s')
    console.setFormatter(formatter)
    logging.getLogger('').addHandler(console)
    logging.info(args)
else:
    save_dir = '/'.join(args.pretrain.split('/')[:-1]) + '/'

def step_decay(epoch):
    initial_lr = args.lr
    if epoch < 30: #tcw
        #lr = initial_lr/10 #tcw
        lr = initial_lr
    else:
        lr = initial_lr / 10
    return lr

def train_datagen(y_, batch_size=8): #201807081925tcw
    while True:
        for i in range(0, len(y_), batch_size):
            img1 = []
            for j in range(i, min(i + batch_size, len(y_))): # read batch_size images for training
                img = np.array(Image.open(y_[j]), dtype='uint8') #dtype='float32')/255.0#.convert('L') #tcw 2018041
                img1.append(img)
            get_batch_y = img1
            get_batch_y = np.array(get_batch_y)
            get_batch_y = get_batch_y.astype('float32') / 255.0
            get_batch_y = get_batch_y.reshape(get_batch_y.shape[0], get_batch_y.shape[1], get_batch_y.shape[2], 3) # last parameter 1 for gray, 3 for color image 201807082123tcw
            # np.random.shuffle(get_batch_y)
            noise = np.random.normal(0, args.sigma / 255.0, get_batch_y.shape) # noise
            get_batch_x = get_batch_y + noise # input image = clean image + noise
            yield get_batch_x, get_batch_y

#201807081928tcw
def load_images(data_path):
    images = []
    file_dictory1 = glob.glob(args.train_data + '*.bmp') #notice the data format
    for file in file_dictory1:
        #print file
        images.append(file)
    random.shuffle(images)
    return images

def train():
    images = load_images(args.train_data)
    # model selection
    if args.pretrain:
        model = load_model(args.pretrain, compile=False)
    else:
        if args.model == 'BRDNet':
            model = models.BRDNet() #original format tcw 20180429
    # compile the model
    model.compile(optimizer=Adam(), loss=['mse'])
    # use callback functions
    ckpt = ModelCheckpoint(save_dir + '/model_{epoch:02d}.h5', monitor='val_loss',
                           verbose=0, period=args.save_every)
    csv_logger = CSVLogger(save_dir + '/log.csv', append=True, separator=',')
    lr = LearningRateScheduler(step_decay)
    history = model.fit_generator(train_datagen(images, batch_size=args.batch_size),
                                  steps_per_epoch=len(images) // args.batch_size, epochs=args.epoch, verbose=1,
                                  callbacks=[ckpt, csv_logger, lr])
    return model

def test(model):
    print('Start to test on {}'.format(args.test_dir))
    out_dir = save_dir + args.test_dir.split('/')[-1] + '/'
    if not os.path.exists(out_dir):
        os.mkdir(out_dir)
    name = []
    psnr = []
    ssim = []
    file_list = glob.glob('{}/*'.format(args.test_dir)) #notice: it is easy to generate an error $201804101000tcw; may need to change the format
    for file in file_list:
        # read image
        img_clean = np.array(Image.open(file), dtype='float32') / 255.0
        np.random.seed(0) #obtain the same random data in the test phase tcw201804151350
        img_test = img_clean + np.random.normal(0, args.sigma / 255.0, img_clean.shape)
        img_test = img_test.astype('float32')
        # predict
        x_test = img_test.reshape(1, img_test.shape[0], img_test.shape[1], 3) # last parameter 1 for gray, 3 for color 201807082123tcw
        y_predict = model.predict(x_test) #tcw
        # calculate numeric metrics
        img_out = y_predict.reshape(img_clean.shape)
        img_out = np.clip(img_out, 0, 1)
        psnr_noise, psnr_denoised = compare_psnr(img_clean, img_test, True), compare_psnr(img_clean, img_out, True)
        ssim_noise, ssim_denoised = compare_ssim(img_clean, img_test, multichannel=True), compare_ssim(img_clean, img_out, multichannel=True)
        psnr.append(psnr_denoised)
        ssim.append(ssim_denoised)
        # save images
        filename = file.split('/')[-1].split('.')[0] # get the name of the image file
        name.append(filename)
        img_test = Image.fromarray((img_test * 255).astype('uint8'))
        img_test.save(out_dir + filename + '_sigma' + '{}_psnr{:.2f}.png'.format(args.sigma, psnr_noise))
        img_out = Image.fromarray((img_out * 255).astype('uint8'))
        img_out.save(out_dir + filename + '_psnr{:.2f}.png'.format(psnr_denoised))
        # print psnr_denoised
        # print len(psnr)
        #print sum(psnr)
    psnr_avg = sum(psnr) / len(psnr)
    ssim_avg = sum(ssim) / len(ssim)
    name.append('Average')
    psnr.append(psnr_avg)
    ssim.append(ssim_avg)
    print('Average PSNR = {0:.2f}, SSIM = {1:.2f}'.format(psnr_avg, ssim_avg))
    pd.DataFrame({'name': np.array(name), 'psnr': np.array(psnr), 'ssim': np.array(ssim)}).to_csv(out_dir + '/metrics.csv', index=True)

if __name__ == '__main__':
    if args.only_test:
        model = load_model(args.pretrain, compile=False)
        test(model)
    else:
        model = train()
        test(model)
```
When I rewrote the test function above in MindSpore, I wasn't paying attention and translated it statement by statement, missing one point: that model's input and output are both NHWC, while the data format I use in MindSpore is NCHW. As a result, the reshape calls go wrong, and the PSNR values I computed came out badly off.
The wrong code:
Just focus on how the test function handles the images.
```python
import datetime
import argparse
import os
import time
import glob
import pandas as pd
import numpy as np
import PIL.Image as Image
import mindspore
import mindspore.nn as nn
from mindspore import context
from mindspore.common import set_seed
from mindspore.ops import composite as C
from mindspore.ops import operations as P
from mindspore.common.tensor import Tensor
from mindspore import load_checkpoint, load_param_into_net
from src.logger import get_logger
from src.models import BRDNet

## Params
parser = argparse.ArgumentParser()
parser.add_argument('--test_dir', default='./Test/Kodak24/',
                    type=str, help='directory of test dataset')
parser.add_argument('--sigma', default=75, type=int, help='noise level')
parser.add_argument('--channel', default=3, type=int,
                    help='image channel, 3 for color, 1 for gray')
parser.add_argument('--pretrain_path', default="./", type=str, help='path of pre-trained model')
parser.add_argument('--ckpt_name', default="channel_3_sigma_75_rank_0-120_546720.ckpt", type=str, help='ckpt_name')
parser.add_argument('--use_modelarts', type=int, default=0,
                    help='1 for True, 0 for False; when set True, we should load the dataset from obs with moxing')
parser.add_argument('--train_url', type=str, default='train_url/',
                    help='needed by modelarts, but we do not use it because the name is ambiguous')
parser.add_argument('--data_url', type=str, default='data_url/',
                    help='needed by modelarts, but we do not use it because the name is ambiguous')
parser.add_argument('--output_path', type=str, default='./output/',
                    help='output path; when use_modelarts is set True, it will be cache/output/')
parser.add_argument('--outer_path', type=str, default='s3://output/',
                    help='obs path, to store e.g. ckpt files')
parser.add_argument('--device_target', type=str, default='Ascend',
                    help='device where the code will be implemented. (Default: Ascend)')
set_seed(1)
args = parser.parse_args()

save_dir = os.path.join(args.output_path, 'sigma_' + str(args.sigma) + \
           '_' + datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
if not args.use_modelarts and not os.path.exists(save_dir):
    os.makedirs(save_dir)

def test(model_path):
    args.logger.info('Start to test on {}'.format(args.test_dir))
    out_dir = os.path.join(save_dir, args.test_dir.split('/')[-2]) # args.test_dir must end with '/'
    if not args.use_modelarts and not os.path.exists(out_dir):
        os.makedirs(out_dir)
    model = BRDNet(args.channel)
    args.logger.info('load test weights from ' + str(model_path))
    load_param_into_net(model, load_checkpoint(model_path))
    name = []
    psnr = []   #after denoise
    ssim = []   #after denoise
    psnr_b = [] #before denoise
    ssim_b = [] #before denoise
    file_list = glob.glob(args.test_dir + '*') # args.test_dir must end with '/'
    model.set_train(False)
    cast = P.Cast()
    reshape = P.Reshape()
    transpose = P.Transpose()
    expand_dims = P.ExpandDims()
    compare_psnr = nn.PSNR()
    compare_ssim = nn.SSIM()
    args.logger.info("start testing....")
    start_time = time.time()
    for file in file_list:
        suffix = file.split('.')[-1]
        # read image
        if args.channel == 3:
            img_clean = np.array(Image.open(file), dtype='float32') / 255.0
        else:
            img_clean = np.expand_dims(np.array(Image.open(file).convert('L'), dtype='float32') / 255.0, axis=2)
        np.random.seed(0) #obtain the same random data in the test phase
        img_test = img_clean + np.random.normal(0, args.sigma / 255.0, img_clean.shape)
        img_clean = Tensor(img_clean, mindspore.float32) #HWC
        img_test = Tensor(img_test, mindspore.float32)   #HWC
        img_clean = reshape(img_clean, (1, 3, img_clean.shape[0], img_clean.shape[1]))
        img_test = reshape(img_test, (1, 3, img_test.shape[0], img_test.shape[1]))
        y_predict = model(img_test) #NCHW
        # calculate numeric metrics
        img_out = C.clip_by_value(y_predict, 0, 1)
        psnr_noise, psnr_denoised = compare_psnr(img_clean, img_test), compare_psnr(img_clean, img_out)
        ssim_noise, ssim_denoised = compare_ssim(img_clean, img_test), compare_ssim(img_clean, img_out)
        psnr.append(psnr_denoised.asnumpy()[0])
        ssim.append(ssim_denoised.asnumpy()[0])
        psnr_b.append(psnr_noise.asnumpy()[0])
        ssim_b.append(ssim_noise.asnumpy()[0])
        # save images
        filename = file.split('/')[-1].split('.')[0] # get the name of the image file
        name.append(filename)
        if not args.use_modelarts:
            # Image.save first checks for an existing file of the same name,
            # which is not allowed on modelarts
            img_test = cast(img_test * 255, mindspore.uint8).asnumpy()
            img_test = img_test.squeeze(0).transpose((1, 2, 0)) #turn into HWC to save as an image
            img_test = Image.fromarray(img_test)
            img_test.save(os.path.join(out_dir, filename + '_sigma' + '{}_psnr{:.2f}.'
                          .format(args.sigma, psnr_noise.asnumpy()[0]) + str(suffix)))
            img_out = cast(img_out * 255, mindspore.uint8).asnumpy()
            img_out = img_out.squeeze(0).transpose((1, 2, 0)) #turn into HWC to save as an image
            img_out = Image.fromarray(img_out)
            img_out.save(os.path.join(out_dir, filename + '_psnr{:.2f}.'.format(psnr_denoised.asnumpy()[0]) + str(suffix)))
    psnr_avg = sum(psnr) / len(psnr)
    ssim_avg = sum(ssim) / len(ssim)
    psnr_avg_b = sum(psnr_b) / len(psnr_b)
    ssim_avg_b = sum(ssim_b) / len(ssim_b)
    name.append('Average')
    psnr.append(psnr_avg)
    ssim.append(ssim_avg)
    psnr_b.append(psnr_avg_b)
    ssim_b.append(ssim_avg_b)
    args.logger.info('Before denoise: Average PSNR_b = {0:.2f}, SSIM_b = {1:.2f}; '
                     'After denoise: Average PSNR = {2:.2f}, SSIM = {3:.2f}'
                     .format(psnr_avg_b, ssim_avg_b, psnr_avg, ssim_avg))
    args.logger.info("testing finished....")
    time_used = time.time() - start_time
    args.logger.info("time cost:" + str(time_used) + " seconds!")
    if not args.use_modelarts:
        pd.DataFrame({'name': np.array(name), 'psnr_b': np.array(psnr_b),
                      'psnr': np.array(psnr), 'ssim_b': np.array(ssim_b),
                      'ssim': np.array(ssim)}).to_csv(out_dir + '/metrics.csv', index=True)

if __name__ == '__main__':
    device_id = int(os.getenv('DEVICE_ID', '0'))
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, device_id=device_id, save_graphs=False)
    args.logger = get_logger(save_dir, "BRDNet", 0)
    args.logger.save_args(args)
    test(os.path.join(args.pretrain_path, args.ckpt_name))
```
With the code above, the PSNR of the network output falls far short of the paper's result (the paper reports PSNR = 27.49; I could only get 17.79). The frustrating part was that the network structure is not very complicated and the input and output data look normal (training uses the correct NCHW format), yet the inference results after training were consistently disappointing.
The problem is that the code reshapes the input images straight from HWC to NCHW (the two reshape calls on img_clean and img_test in the test function), which actually produces pixel arrays in the wrong order. The network was trained to process pixel arrays in the right order, so it cannot denoise effectively, hence the very low PSNR. Naturally, visualizing its outputs also gives a complete mess, because after the reshape the data is simply not arranged in the order it should be.
In hindsight, this was purely the result of careless reading plus a wrong mental model of reshape and transpose, and it cost me a good deal of hair.
The correct code:
Again, only the image handling inside the test function matters.
```python
import datetime
import argparse
import os
import time
import glob
import pandas as pd
import numpy as np
import PIL.Image as Image
import mindspore
import mindspore.nn as nn
from mindspore import context
from mindspore.common import set_seed
from mindspore.ops import composite as C
from mindspore.ops import operations as P
from mindspore.common.tensor import Tensor
from mindspore import load_checkpoint, load_param_into_net
from src.logger import get_logger
from src.models import BRDNet

## Params
parser = argparse.ArgumentParser()
parser.add_argument('--test_dir', default='./Test/Kodak24/',
                    type=str, help='directory of test dataset')
parser.add_argument('--sigma', default=75, type=int, help='noise level')
parser.add_argument('--channel', default=3, type=int,
                    help='image channel, 3 for color, 1 for gray')
parser.add_argument('--pretrain_path', default="./", type=str, help='path of pre-trained model')
parser.add_argument('--ckpt_name', default="channel_3_sigma_75_rank_0-120_546720.ckpt", type=str, help='ckpt_name')
parser.add_argument('--use_modelarts', type=int, default=0,
                    help='1 for True, 0 for False; when set True, we should load the dataset from obs with moxing')
parser.add_argument('--train_url', type=str, default='train_url/',
                    help='needed by modelarts, but we do not use it because the name is ambiguous')
parser.add_argument('--data_url', type=str, default='data_url/',
                    help='needed by modelarts, but we do not use it because the name is ambiguous')
parser.add_argument('--output_path', type=str, default='./output/',
                    help='output path; when use_modelarts is set True, it will be cache/output/')
parser.add_argument('--outer_path', type=str, default='s3://output/',
                    help='obs path, to store e.g. ckpt files')
parser.add_argument('--device_target', type=str, default='Ascend',
                    help='device where the code will be implemented. (Default: Ascend)')
set_seed(1)
args = parser.parse_args()

save_dir = os.path.join(args.output_path, 'sigma_' + str(args.sigma) + \
           '_' + datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
if not args.use_modelarts and not os.path.exists(save_dir):
    os.makedirs(save_dir)

def test(model_path):
    args.logger.info('Start to test on {}'.format(args.test_dir))
    out_dir = os.path.join(save_dir, args.test_dir.split('/')[-2]) # args.test_dir must end with '/'
    if not args.use_modelarts and not os.path.exists(out_dir):
        os.makedirs(out_dir)
    model = BRDNet(args.channel)
    args.logger.info('load test weights from ' + str(model_path))
    load_param_into_net(model, load_checkpoint(model_path))
    name = []
    psnr = []   #after denoise
    ssim = []   #after denoise
    psnr_b = [] #before denoise
    ssim_b = [] #before denoise
    file_list = glob.glob(args.test_dir + '*') # args.test_dir must end with '/'
    model.set_train(False)
    cast = P.Cast()
    transpose = P.Transpose()
    expand_dims = P.ExpandDims()
    compare_psnr = nn.PSNR()
    compare_ssim = nn.SSIM()
    args.logger.info("start testing....")
    start_time = time.time()
    for file in file_list:
        suffix = file.split('.')[-1]
        # read image
        if args.channel == 3:
            img_clean = np.array(Image.open(file), dtype='float32') / 255.0
        else:
            img_clean = np.expand_dims(np.array(Image.open(file).convert('L'), dtype='float32') / 255.0, axis=2)
        np.random.seed(0) #obtain the same random data in the test phase
        img_test = img_clean + np.random.normal(0, args.sigma / 255.0, img_clean.shape)
        img_clean = Tensor(img_clean, mindspore.float32) #HWC
        img_test = Tensor(img_test, mindspore.float32)   #HWC
        # predict
        img_clean = expand_dims(transpose(img_clean, (2, 0, 1)), 0) #NCHW
        img_test = expand_dims(transpose(img_test, (2, 0, 1)), 0)   #NCHW
        y_predict = model(img_test) #NCHW
        # calculate numeric metrics
        img_out = C.clip_by_value(y_predict, 0, 1)
        psnr_noise, psnr_denoised = compare_psnr(img_clean, img_test), compare_psnr(img_clean, img_out)
        ssim_noise, ssim_denoised = compare_ssim(img_clean, img_test), compare_ssim(img_clean, img_out)
        psnr.append(psnr_denoised.asnumpy()[0])
        ssim.append(ssim_denoised.asnumpy()[0])
        psnr_b.append(psnr_noise.asnumpy()[0])
        ssim_b.append(ssim_noise.asnumpy()[0])
        # save images
        filename = file.split('/')[-1].split('.')[0] # get the name of the image file
        name.append(filename)
        if not args.use_modelarts:
            # Image.save first checks for an existing file of the same name,
            # which is not allowed on modelarts
            img_test = cast(img_test * 255, mindspore.uint8).asnumpy()
            img_test = img_test.squeeze(0).transpose((1, 2, 0)) #turn into HWC to save as an image
            img_test = Image.fromarray(img_test)
            img_test.save(os.path.join(out_dir, filename + '_sigma' + '{}_psnr{:.2f}.'
                          .format(args.sigma, psnr_noise.asnumpy()[0]) + str(suffix)))
            img_out = cast(img_out * 255, mindspore.uint8).asnumpy()
            img_out = img_out.squeeze(0).transpose((1, 2, 0)) #turn into HWC to save as an image
            img_out = Image.fromarray(img_out)
            img_out.save(os.path.join(out_dir, filename + '_psnr{:.2f}.'.format(psnr_denoised.asnumpy()[0]) + str(suffix)))
    psnr_avg = sum(psnr) / len(psnr)
    ssim_avg = sum(ssim) / len(ssim)
    psnr_avg_b = sum(psnr_b) / len(psnr_b)
    ssim_avg_b = sum(ssim_b) / len(ssim_b)
    name.append('Average')
    psnr.append(psnr_avg)
    ssim.append(ssim_avg)
    psnr_b.append(psnr_avg_b)
    ssim_b.append(ssim_avg_b)
    args.logger.info('Before denoise: Average PSNR_b = {0:.2f}, SSIM_b = {1:.2f}; '
                     'After denoise: Average PSNR = {2:.2f}, SSIM = {3:.2f}'
                     .format(psnr_avg_b, ssim_avg_b, psnr_avg, ssim_avg))
    args.logger.info("testing finished....")
    time_used = time.time() - start_time
    args.logger.info("time cost:" + str(time_used) + " seconds!")
    if not args.use_modelarts:
        pd.DataFrame({'name': np.array(name), 'psnr_b': np.array(psnr_b),
                      'psnr': np.array(psnr), 'ssim_b': np.array(ssim_b),
                      'ssim': np.array(ssim)}).to_csv(out_dir + '/metrics.csv', index=True)

if __name__ == '__main__':
    device_id = int(os.getenv('DEVICE_ID', '0'))
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, device_id=device_id, save_graphs=False)
    args.logger = get_logger(save_dir, "BRDNet", 0)
    args.logger.save_args(args)
    test(os.path.join(args.pretrain_path, args.ckpt_name))
```
In this version, to convert the input images from HWC to NCHW without disturbing the original data order (structure), I first turn HWC into CHW with transpose(img_clean, (2, 0, 1)), then add a dimension with expand_dims(img_clean, 0); informally, that wraps one more pair of brackets around the 3-D array, yielding NCHW with N=1, C=3.
On the Kodak24 dataset with sigma=75 noise, denoising now raises the PSNR from 10.63 to 27.41, essentially matching the paper's figure and confirming that the pipeline works.
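The HWC -> CHW -> NCHW conversion can be sketched in plain numpy (a 2x2x3 stand-in for the real 500x500x3 image; the axis permutations behave the same way as the MindSpore operators used above):

```python
import numpy as np

# 2x2x3 HWC stand-in for a real 500x500x3 test image.
hwc = np.arange(12).reshape(2, 2, 3)

# HWC -> CHW via transpose, then add the batch dim: NCHW with N=1.
nchw = np.expand_dims(hwc.transpose(2, 0, 1), 0)
print(nchw.shape)   # (1, 3, 2, 2)

# The round trip used when saving images recovers the data exactly:
back = nchw.squeeze(0).transpose(1, 2, 0)
print(np.array_equal(back, hwc))   # True
```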
### 4. Scenario 3: computing the cross-entropy loss in semantic segmentation
I once lost a good chunk of hair rewriting FastSCNN in MindSpore.
The rewritten network looked fine from every angle, except that the loss simply would not drop; at best it wobbled somewhere in the fourth decimal place. You couldn't say it didn't change at all, but it was about as good as not changing. After checking everything from dataset loading to backward gradient computation, I finally began to suspect the loss function.
I describe the loss function implementation in this post: http://luxuff.cn/archives/如何用mindspore的nnsoftmaxcrossentropywithlogits算子完成pytorch中nncrossentropyloss算子的ignoreindex功能 , so here I only look at the reshape and transpose.
The wrong version:
```python
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore.common.tensor import Tensor
from mindspore.nn import SoftmaxCrossEntropyWithLogits

__all__ = ['MixSoftmaxCrossEntropyLoss']

class MixSoftmaxCrossEntropyLoss(nn.Cell):
    '''MixSoftmaxCrossEntropyLoss'''
    def __init__(self, args, ignore_label=-1, aux=True, aux_weight=0.4, \
                 sparse=True, reduction='none', one_d_length=2*768*768, **kwargs):
        super(MixSoftmaxCrossEntropyLoss, self).__init__()
        self.ignore_label = ignore_label
        self.weight = aux_weight if aux else 1.0
        self.select = ops.Select()
        self.reduceSum = ops.ReduceSum(keep_dims=False)
        self.div_no_nan = ops.DivNoNan()
        self.mul = ops.Mul()
        self.reshape = ops.Reshape()
        self.cast = ops.Cast()
        self.transpose = ops.Transpose()
        self.zero_tensor = Tensor([0]*one_d_length, mindspore.float32)
        self.SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits(sparse=sparse, reduction="none")
        args.logger.info('using MixSoftmaxCrossEntropyLoss....')
        args.logger.info('self.ignore_label:' + str(self.ignore_label))
        args.logger.info('self.aux:' + str(aux))
        args.logger.info('self.weight:' + str(self.weight))
        args.logger.info('one_d_length:' + str(one_d_length))

    def construct(self, *inputs, **kwargs):
        '''construct'''
        preds, target = inputs[:-1], inputs[-1]
        target = self.reshape(target, (-1,))
        valid_flag = target != self.ignore_label
        num_valid = self.reduceSum(self.cast(valid_flag, mindspore.float32))
        loss = self.SoftmaxCrossEntropyWithLogits(self.reshape(preds[0], (-1, 19)), target)
        loss = self.select(valid_flag, loss, self.zero_tensor)
        loss = self.reduceSum(loss)
        loss = self.div_no_nan(loss, num_valid)
        for i in range(1, len(preds)):
            aux_loss = self.SoftmaxCrossEntropyWithLogits(self.reshape(preds[i], (-1, 19)), target)
            aux_loss = self.select(valid_flag, aux_loss, self.zero_tensor)
            aux_loss = self.reduceSum(aux_loss)
            aux_loss = self.div_no_nan(aux_loss, num_valid)
            loss += self.mul(self.weight, aux_loss)
        return loss
```
The cross-entropy loss in semantic segmentation is computed as follows:
The network prediction is an NCHW result, here 2x19x768x768 (in this example preds is a tuple of length 3; preds[0], preds[1] and preds[2] are all prediction results): batch_size = 2, 19 classes, image height and width 768. All values lie in the interval [0, 1); within each 19x768x768 result, the values represent **the probability the network assigns to each of the 768x768 pixels of belonging to each class (0-18)**. target is the label, here 2x768x768; it records **the annotated class (0-18) of each of those 768x768 pixels**.
SoftmaxCrossEntropyWithLogits takes two arguments: prediction data in ZxC form (2-D, Z = NxHxW, C the number of classes) and label data of length Z (1-D, length NxHxW). It returns a loss array of length Z, giving the loss between the predicted and actual result for each pixel.
So where did it go wrong?
When wrapping the dataset loading class, I converted image to NCHW (most operators inside the network require that format); but a label read from a single image is in HW format (768x768 here), and with a batch dimension it becomes NHW.
NCHW data and NHW data differ in their storage **order**.
```python
def create_CitySegmentation(args, data_path='../dataset/', split='train', mode=None, \
                            transform=None, base_size=1024, crop_size=(512, 1024), \
                            batch_size=2, device_num=1, rank=0, shuffle=True):
    '''create_CitySegmentation'''
    dataset = CitySegmentation(args, root=data_path, split=split, mode=mode, \
                               base_size=base_size, crop_size=crop_size)
    dataset_len = len(dataset)
    distributed_sampler = DistributedSampler(dataset_len, device_num, rank, shuffle=shuffle)
    data_set = ds.GeneratorDataset(dataset, column_names=["image", "label"], num_parallel_workers=8, \
                                   shuffle=shuffle, sampler=distributed_sampler)
    # general resize, normalize and toTensor
    if transform is not None:
        data_set = data_set.map(input_columns=["image"], operations=transform, num_parallel_workers=8)
    else:
        hwc_to_chw = CV.HWC2CHW()
        data_set = data_set.map(input_columns=["image"], operations=hwc_to_chw, num_parallel_workers=8)
    data_set = data_set.batch(batch_size, drop_remainder=True)
    return data_set, dataset_len
```
When computing the cross-entropy, self.reshape(target, (-1,)) flattens target into 1-D, which is entirely unproblematic: **just tear the NHW array open in its existing order and lay it out in a row**. But to get the ZxC array that the cross-entropy expects, I once again failed to engage my brain and reshaped the NCHW preds[i] predictions straight into ZxC form (C = 19 here).
What does that give you?
The program trains away happily, except the loss never drops; and of course the segmentation results are a disaster, worse than random guessing.
Argh!!!
So why doesn't the loss drop?
Through the examples in sections 1 to 3, I have been gradually introducing the notion of **order**. Whether the format is HW, NHW, CHW, HWC or NCHW, the data is ultimately stored in an array, and data stored in an array has a definite **order**.
The essence of reshape: keep the data's existing storage order and mold it into a new shape.
The essence of transpose: rearrange the existing data into the **dimension order** we need.
NHW (2x768x768) flattened into 1-D looks like this (the point is that it unrolls in the existing order; **molding** it into other shapes works the same way):

And what does NCHW become when molded into ZxC?

In the figure above, the left column is the ZxC result we hope to mold, here (2x768x768) x 19; on the right is its data source, of size 2x19x768x768 (NCHW).
Picture a scenario: for the 4-D array 2x19x768x768, imagine a rope that starts at [0,0,0,0] and threads all the data together along the path [0,0,0,1], [0,0,0,2] ... [0,0,0,767], [0,0,1,0], [0,0,1,1] ... [0,0,1,767] ... [1,18,767,767]. When reshaping, this rope of data is first pulled straight into 1-D, then chopped into pieces of the target row size and dropped into the target array piece by piece. When the reshape finishes, the last piece lands exactly in the last target slot (reshape never changes the amount of data), and the new array still preserves the original **storage order**.
In this example, the data at [0,0,0,0:19] is taken out and put in row 1, the data at [0,0,0,19:38] in row 2, and so on, which accomplishes the whole reshape.
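The rope rule is easy to confirm in numpy; a minimal sketch with a tiny 2x3x2x2 stand-in for the real 2x19x768x768 tensor:

```python
import numpy as np

# Tiny 2x3x2x2 stand-in for the 2x19x768x768 prediction tensor.
x = np.arange(2 * 3 * 2 * 2).reshape(2, 3, 2, 2)

# reshape = pull the "rope" (the flattened data) straight, then chop it
# into rows of the requested length:
y = x.reshape(-1, 3)
print(np.array_equal(y, x.ravel().reshape(-1, 3)))   # True
print(y[0])   # [0 1 2] -- the first three values along the rope
```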
Let's verify that in practice:
```python
z = self.reshape(preds[0], (-1, 19))
```
Pulling the reshape out of the loss function above, let's inspect how the data in preds[0] and z are related (right-click to view the image for a clearer look).

Since z is the reshaped array, I printed both its row 0 and row 1; preds[0] itself is still a 2x19x768x768 4-D array, so I printed only its **first** 40 values. The figure shows the data layout matches the **rope rule** exactly.
So now we know the reshape was wrong, but why exactly doesn't the loss drop?
Recall the flattened 1-D target array: what does it represent? It records the class ids of these 1179648 (i.e. 2x768x768) pixels.
And the ZxC array? Its 1179648x19 cells are supposed to mean: 1179648 pixels, each with a probability of belonging to each of the 19 classes (0-18). In other words, the 19 cells in each row should all belong to the same pixel.
But after the reshape, each row's values actually come from different pixels (they may happen to belong to the same class; where a row straddles the data, they may also belong to different classes). The cross-entropy computed this way is simply meaningless, so no matter how the network back-propagates gradients and optimizes parameters, the loss computation comes out scrambled again on the next pass.
So: to compute the cross-entropy we do need a ZxC-shaped array, but its data must be arranged in the order we intend, which means rearranging it first.
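That rearrangement is exactly transpose-then-reshape. A minimal numpy sketch (N, C, H, W = 2, 4, 3, 3 standing in for the real 2, 19, 768, 768):

```python
import numpy as np

# N, C, H, W = 2, 4, 3, 3 stand in for the real 2, 19, 768, 768.
N, C, H, W = 2, 4, 3, 3
preds = np.arange(N * C * H * W, dtype=float).reshape(N, C, H, W)

bad = preds.reshape(-1, C)                           # rows mix values from different pixels
good = preds.transpose(0, 2, 3, 1).reshape(-1, C)    # NCHW -> NHWC, then cut into (Z, C)

# Row 0 of `good` is pixel (0, 0) of sample 0 across all C classes,
# i.e. preds[0, :, 0, 0]; row 0 of `bad` is just the first C values
# along the rope, all taken from channel 0.
print(good[0])   # equals preds[0, :, 0, 0]
print(bad[0])    # equals preds[0, 0, 0, :] flattened first
print(good.shape)   # (18, 4): Z = N*H*W rows, C columns
```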
The correct code:
```python
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore.common.tensor import Tensor
from mindspore.nn import SoftmaxCrossEntropyWithLogits

__all__ = ['MixSoftmaxCrossEntropyLoss']

class MixSoftmaxCrossEntropyLoss(nn.Cell):
    '''MixSoftmaxCrossEntropyLoss'''
    def __init__(self, args, ignore_label=-1, aux=True, aux_weight=0.4, \
                 sparse=True, reduction='none', one_d_length=2*768*768, **kwargs):
        super(MixSoftmaxCrossEntropyLoss, self).__init__()
        self.ignore_label = ignore_label
        self.weight = aux_weight if aux else 1.0
        self.select = ops.Select()
        self.reduceSum = ops.ReduceSum(keep_dims=False)
        self.div_no_nan = ops.DivNoNan()
        self.mul = ops.Mul()
        self.reshape = ops.Reshape()
        self.cast = ops.Cast()
        self.transpose = ops.Transpose()
        self.zero_tensor = Tensor([0]*one_d_length, mindspore.float32)
        self.SoftmaxCrossEntropyWithLogits = \
            SoftmaxCrossEntropyWithLogits(sparse=sparse, reduction="none")
        args.logger.info('using MixSoftmaxCrossEntropyLoss....')
        args.logger.info('self.ignore_label:' + str(self.ignore_label))
        args.logger.info('self.aux:' + str(aux))
        args.logger.info('self.weight:' + str(self.weight))
        args.logger.info('one_d_length:' + str(one_d_length))

    def construct(self, *inputs, **kwargs):
        '''construct'''
        preds, target = inputs[:-1], inputs[-1]
        target = self.reshape(target, (-1,))
        valid_flag = target != self.ignore_label
        num_valid = self.reduceSum(self.cast(valid_flag, mindspore.float32))
        z = self.transpose(preds[0], (0, 2, 3, 1)) # move the C-dim to the last, then reshape.
        # This operation is vital, or the data would be spoiled.
        loss = self.SoftmaxCrossEntropyWithLogits(self.reshape(z, (-1, 19)), target)
        loss = self.select(valid_flag, loss, self.zero_tensor)
        loss = self.reduceSum(loss)
        loss = self.div_no_nan(loss, num_valid)
        for i in range(1, len(preds)):
            z = self.transpose(preds[i], (0, 2, 3, 1))
            aux_loss = self.SoftmaxCrossEntropyWithLogits(self.reshape(z, (-1, 19)), target)
            aux_loss = self.select(valid_flag, aux_loss, self.zero_tensor)
            aux_loss = self.reduceSum(aux_loss)
            aux_loss = self.div_no_nan(aux_loss, num_valid)
            loss += self.mul(self.weight, aux_loss)
        return loss
```
Before the reshape, we use a transpose to convert the NCHW data into NHWC form. Then, during the reshape, each rope segment (of length 19) picks up exactly the data we want and puts it exactly where we want it.
Only then is the loss computed correctly; with the fix in place the loss converged quickly, getting down to a fraction of one in no time. Bliss!!!
### 5. Conclusion
reshape and transpose are fundamentally different operations: the former molds the original array into the shape we want while following the **rope rule**, whereas the latter rearranges the existing data into the **dimension order** we need.
Especially when handling image data, pay attention to the HWC and CHW storage layouts, and be clear about when to use reshape and when to use transpose.
When the source and target storage layouts agree (e.g. NCHW ---> NCHW), reshape can be used to add or remove dimensions.
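For example (a minimal numpy sketch), adding or dropping a size-1 N dimension with reshape leaves the storage order untouched:

```python
import numpy as np

chw = np.arange(12).reshape(3, 2, 2)   # a CHW array

# Source and target layouts agree (CHW -> NCHW just wraps one more bracket),
# so reshape is safe here and equivalent to expand_dims / squeeze:
nchw = chw.reshape(1, 3, 2, 2)
print(np.array_equal(nchw, np.expand_dims(chw, 0)))   # True
print(np.array_equal(nchw.reshape(3, 2, 2), chw))     # True
```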
I haven't looked at reshape's underlying implementation, but this account should be close to the mark.

The difference between reshape and transpose