A PyTorch training run where the loss would not go down no matter how I changed the learning rate

2022-05-10

Here is the model code first:

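import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Classification head: 768 -> 512 -> 256 -> 13 classes
        self.fc1 = nn.Linear(768, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 13)
        # Pretrained Chinese BERT loaded from a local directory
        self.bert_model = BertModel.from_pretrained('../bert-base-chinese/')
        # Fine-tune all BERT weights
        for param in self.bert_model.parameters():
            param.requires_grad_(True)

    def forward(self, input_ids, attention_mask, token_type_ids):
        out = self.bert_model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        )
        # Take the [CLS] vector; training=self.training turns dropout off in eval mode
        out = F.dropout(out.last_hidden_state[:, 0], p=0.2, training=self.training)
        out = self.fc1(out)
        out = F.relu(out)
        out = self.fc2(out)
        out = F.relu(out)
        out = self.fc3(out)
        # Turn the 13 logits into probabilities
        out = out.softmax(dim=1)
        return out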

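A very simple model that fine-tunes BERT for text classification, but its results were always mediocre. Then today, while working on an NER task, I used a similar setup and the loss simply stopped moving, stuck at a barely acceptable level.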

Only later did I find out that this one line in the model was the cause:

out = out.softmax(dim=1)

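The model output already goes through softmax, and PyTorch's cross-entropy loss (nn.CrossEntropyLoss) applies its own softmax (a log_softmax, to be precise) when computing the loss. So the model effectively ends with two softmaxes!! The first softmax squeezes every value into [0, 1], so the second one can only produce a nearly uniform distribution. The loss then has very little room to decrease, and the model cannot converge to its best result.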
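To make the effect concrete, here is a minimal sketch (my own illustration, not code from the original training script) that feeds the same made-up logits to the loss with and without the extra softmax. The 13 classes match the model above; the batch size, the random logits, and the variable names are assumptions. It also checks that the best possible loss under double softmax, reached when the model outputs a perfect one-hot probability vector, is ln(e + 12) - 1, roughly 1.69, rather than 0.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 13

# Made-up logits for 4 samples; pretend the model is already confidently right.
logits = torch.randn(4, num_classes) * 5
targets = logits.argmax(dim=1)

# Correct usage: cross_entropy receives raw logits.
loss_on_logits = F.cross_entropy(logits, targets)

# cross_entropy is log_softmax followed by nll_loss, so this is the same value.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Buggy usage: probabilities go in, and softmax is applied a second time inside.
loss_on_probs = F.cross_entropy(logits.softmax(dim=1), targets)

# Best case for the buggy setup: a perfect one-hot "probability" vector.
one_hot = F.one_hot(targets, num_classes).float()
floor = F.cross_entropy(one_hot, targets)
expected_floor = math.log(math.e + num_classes - 1) - 1

print(f"loss on logits:        {loss_on_logits.item():.4f}")
print(f"loss on probabilities: {loss_on_probs.item():.4f}")
print(f"double-softmax floor:  {floor.item():.4f}")
```

With the extra softmax the loss can never drop below that floor, which would be consistent with a loss curve that flattens out no matter which learning rate is used.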

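It is scary to think about: all my previous models were trained this way... Sigh. In my tests, removing that line improved the text classification task by nearly 20%, and the NER task by nearly 20% as well.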
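For reference, a minimal sketch of the corrected pattern: the network returns raw logits, nn.CrossEntropyLoss consumes them directly, and softmax is applied only when probabilities are needed at prediction time. ClassifierHead is a hypothetical stand-in for the fc1/fc2/fc3 layers, and the random tensor stands in for BERT's [CLS] vector so that the snippet runs without downloading a model; neither is taken from the original code.

```python
import torch
import torch.nn as nn


class ClassifierHead(nn.Module):
    """Hypothetical stand-in for the fc1/fc2/fc3 head; BERT itself is omitted."""

    def __init__(self, hidden=768, num_classes=13):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(0.2),  # nn.Dropout is disabled automatically in eval mode
            nn.Linear(hidden, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, cls_vec):
        # Return raw logits: no softmax here.
        return self.net(cls_vec)


torch.manual_seed(0)
head = ClassifierHead()
criterion = nn.CrossEntropyLoss()

# Random stand-ins for out.last_hidden_state[:, 0] and the labels (assumptions).
cls_vec = torch.randn(8, 768)
labels = torch.randint(0, 13, (8,))

# Training step: the loss applies log_softmax to the logits internally.
head.train()
loss = criterion(head(cls_vec), labels)
loss.backward()

# Prediction: apply softmax explicitly, outside the loss computation.
head.eval()
with torch.no_grad():
    probs = head(cls_vec).softmax(dim=1)
    preds = probs.argmax(dim=1)
```

Keeping softmax out of forward means the same module serves both training and inference, and it becomes hard to apply softmax twice by accident.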

From now on I need to check my code much more carefully!

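Permalink: https://www.yeahchen.cn/2022/05/10/记一次pytorch训练过程中无论怎么修改学习率loss也不下降的经历/
Copyright notice: Unless otherwise stated, all posts on this blog are reposts, kept only as my personal study notes and not for any other use! If anything here infringes on your rights, please contact me!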

Adam Chen

A Runner