[Originally written in June 2012]

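To monitor newly deployed kernels, we backported Google's netoops to our own kernel. When a kernel panic happens in production, the panic stack trace is sent to a log server, which makes debugging and fixing easier.
The day before yesterday, our colleague Hongchuan reported that netoops in production had always used a bond slave interface as the dev for sending messages, but once the new 2.6.32-220 kernel went live, netoops failed to start and the system reported: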

"eth0 is a slave device, aborting."

I looked through Red Hat's changes between 2.6.32-131 and 2.6.32-220 and found this patch from WANG Cong:

commit 0c1ad04aecb975f2a2014e1bc5a2fa23923ecbd9
Author: WANG Cong
Date: Thu Jun 9 00:28:13 2011 -0700

netpoll: prevent netpoll setup on slave devices

In commit 8d8fc29d02a33e4bd5f4fa47823c1fd386346093
(netpoll: disable netpoll when enslave a device), we automatically
disable netpoll when the underlying device is being enslaved,
we also need to prevent people from setuping netpoll on
devices that are already enslaved.

Signed-off-by: WANG Cong
Signed-off-by: David S. Miller

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 2d7d6d4..42ea4b0 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -792,6 +792,12 @@ int netpoll_setup(struct netpoll *np)
 		return -ENODEV;
 	}
 
+	if (ndev->master) {
+		printk(KERN_ERR "%s: %s is a slave device, aborting.\n",
+		       np->name, np->dev_name);
+		return -EBUSY;
+	}
+
 	if (!netif_running(ndev)) {
 		unsigned long atmost, atleast;

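With this change, netpoll can no longer be set up on a slave device (netoops is built on netpoll). I was puzzled that something which used to work had stopped working, so I emailed WANG Cong to ask why slave devices are no longer allowed. His answer was: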

"Because slave devices have no IP address, http://wangcong.org/blog/archives/1657"

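WANG Cong also ran into the same problem while working on netconsole at Red Hat, and had to switch to the master interface. Our netoops has to follow the same rule and use bond0 as its dev.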
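
Since the dev handed to netpoll must now be the master rather than a slave, a startup script or helper can resolve the master before configuring netoops. Below is a minimal userspace sketch, not part of netoops itself. It assumes the bonding driver exposes a `master` symlink under `/sys/class/net/<slave>/`; the function names `netdev_master` and `netpoll_dev` are made up for illustration, and the sysfs root is a parameter only so the lookup can be exercised against a fake directory tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: look up the master of `dev` by reading the
 * "master" symlink under <sysfs_net>/<dev>/ (assumed layout; normally
 * sysfs_net would be "/sys/class/net").
 * Returns 1 and stores the master name in `master` if dev is enslaved,
 * 0 if the link is missing or unreadable, -1 if a path or name does
 * not fit in the buffers.
 */
static int netdev_master(const char *sysfs_net, const char *dev,
                         char *master, size_t len)
{
    char path[PATH_MAX], target[PATH_MAX];
    const char *base;
    ssize_t n;
    int r;

    r = snprintf(path, sizeof(path), "%s/%s/master", sysfs_net, dev);
    if (r < 0 || (size_t)r >= sizeof(path))
        return -1;

    n = readlink(path, target, sizeof(target) - 1);
    if (n < 0)
        return 0;               /* no "master" link: treat as not a slave */
    target[n] = '\0';           /* readlink does not NUL-terminate */

    /* The link points at the master's directory; keep the last component. */
    base = strrchr(target, '/');
    base = base ? base + 1 : target;
    if (strlen(base) >= len)
        return -1;
    strcpy(master, base);
    return 1;
}

/*
 * Hypothetical helper: pick the device name to give to netpoll.
 * Returns the master's name (stored in buf) if dev is a slave,
 * otherwise dev itself.
 */
static const char *netpoll_dev(const char *sysfs_net, const char *dev,
                               char *buf, size_t len)
{
    return netdev_master(sysfs_net, dev, buf, len) == 1 ? buf : dev;
}
```

On a machine where eth0 is enslaved to bond0, `netpoll_dev("/sys/class/net", "eth0", buf, sizeof(buf))` would be expected to return `bond0`, which is the name to use as the netoops dev; for an interface that is not enslaved it returns the name unchanged.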