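It started when Peng Min reported a kernel panic on a production server. I logged in and found that the panic was in
md_seq_show() --> mddev_unlock()
at the md_wakeup_thread(mddev->thread) call, with mddev->thread reported as NULL. The machine runs ali_kernel, which is based on 2.6.32-220. After some digging, it turned out this problem is already fixed in 2.6.32-279; the upstream patch is "md: Avoid waking up a thread after it has been freed".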

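So I asked Peng Min to upgrade the kernel to 2.6.32-279. Sadly, 279 turned out to be even less reliable: after creating a raid10 array, it panicked as soon as mkfs.ext4 started. The panic location was even stranger, a BUG_ON in drivers/scsi/scsi_lib.c: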

        /*
         * Filesystem requests must transfer data.
         */
        BUG_ON(!req->nr_phys_segments);

        cmd = scsi_get_cmd_from_req(sdev, req);
        if (unlikely(!cmd))
                return BLKPREP_DEFER;

        memset(cmd->cmnd, 0, BLK_MAX_CDB);
        return scsi_init_io(cmd, GFP_ATOMIC);
}

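I had no choice but to step through the code bit by bit, and eventually found the cause: make_request(), the function in raid10 that handles I/O, had been given a careless, verbatim backport of an upstream patch. Upstream no longer has BIO_FLUSH and BIO_FUA, only REQ_FLUSH and REQ_FUA. Whoever did the backport evidently did not know this and simply used REQ_FLUSH in place of BIO_FLUSH, but in the 2.6.32 kernel these are two completely different values. As a result, make_request() dropped the FLUSH flag when cloning the request, and by the time it reached the SCSI layer: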

static int sd_prep_fn(struct request_queue *q, struct request *rq)
{
        ....

        /*
         * Discard request come in as REQ_TYPE_FS but we turn them into
         * block PC requests to make life easier.
         */
        if (rq->cmd_flags & REQ_DISCARD) {
                ret = sd_setup_discard_cmnd(sdp, rq);
                goto out;
        } else if (rq->cmd_flags & REQ_WRITE_SAME) {
                ret = sd_setup_write_same_cmnd(sdp, rq);
                goto out;
        } else if (rq->cmd_flags & REQ_FLUSH) {
                ret = scsi_setup_flush_cmnd(sdp, rq);
                goto out;
        } else if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
                ret = scsi_setup_blk_pc_cmnd(sdp, rq);
                goto out;
        } else if (rq->cmd_type != REQ_TYPE_FS) {
                ret = BLKPREP_KILL;
                goto out;
        }
        ret = scsi_setup_fs_cmnd(sdp, rq);

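the REQ_FLUSH check no longer matched, so the request never reached scsi_setup_flush_cmnd and fell straight through to scsi_setup_fs_cmnd. An empty flush carries no data, so it tripped the BUG_ON(!req->nr_phys_segments) shown earlier, and that was the panic. Upgrading to 279 is clearly not the answer; better to apply the "md: Avoid waking up a thread after it has been freed" patch to 220 instead.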
Thanks to Peng Min for supporting the software RAID work.

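I told Brother Tao about this bug in 279.
Brother Tao: Wasn't merging BIO_FLUSH and REQ_FLUSH into one upstream Tejun Heo's doing? Didn't he go to Red Hat? He made this mess, so why didn't he backport it to 279 for rhel6?
Me: ....I guess at Red Hat he only does upstream work and doesn't have to do the grunt work of backporting.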