[Troubleshooting] Ceph mon clock skew



2021-04-08 10:01:50  Author: Jiaozn  Category: K8s

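These are my earlier notes on how I resolved a clock drift problem, along with what I learned. I am sharing them here.

Symptoms

Check the cluster status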

# ceph -s
    cluster f01fb68c-58c6-4707-8adb-b7ac88172340
     health HEALTH_WARN
            clock skew detected on mon.xs734
            Monitor clock skew detected
...

The message makes it clear that this is a clock drift problem.
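When several monitors are involved, it can help to pull the affected monitor names out of the status text. The following is a minimal sketch, not part of the original procedure. The function name `skewed_mons` is made up for illustration, and the pattern assumes the warning wording shown in the output above.

```shell
# Print the names of monitors that a status report flags for clock skew.
# Reads `ceph -s` or `ceph -w` text on stdin, one monitor name per output line.
skewed_mons() {
    # Keep only the comma-separated monitor list that follows the warning text.
    sed -n 's/.*clock skew detected on \([^;]*\).*/\1/p' \
        | tr ',' '\n' \
        | sed 's/^ *mon\.//' \
        | sort -u
}

# Usage on a cluster node:
#   ceph -s | skewed_mons
```

An empty result means the text contained no "clock skew detected on" line.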

Configuring NTP

My first reaction was to synchronize the clocks with NTP, so I configured it:

yum install ntp ntpdate -y
ntpdate pool.ntp.org
systemctl restart ntpdate.service
systemctl restart ntpd.service
systemctl enable ntpdate.service
systemctl enable ntpd.service

Repeat the steps above on every machine.
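Before comparing wall-clock times, it may be worth confirming that `ntpd` has actually selected a time source on each node, which can take a few minutes after a restart. The sketch below assumes the classic `ntpq -pn` peer table, in which the selected peer is marked with a leading `*` and the ninth column holds the offset in milliseconds. `ntp_offset_ms` is a hypothetical helper name, not something from the original post.

```shell
# Print the offset (in milliseconds) of the peer ntpd is currently synced to.
# Reads `ntpq -pn` output on stdin; exits non-zero if no peer is selected yet.
ntp_offset_ms() {
    awk '/^\*/ { print $9; found = 1 } END { exit found ? 0 : 1 }'
}

# Usage from an admin node (host names taken from the article):
#   for x in xs732 xs733 xs734; do
#       printf '%s: ' "$x"
#       ssh "$x" 'ntpq -pn' | ntp_offset_ms || echo 'no peer selected yet'
#   done
```

Offsets that stay well below 50 ms on every node suggest the monitors should fit within the default limit.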

Now check the time on each node in the cluster:

# for x in xs732 xs733 xs734; do ssh $x 'date'; done
Thu May 05 17:51:26 CST 2017
Thu May 05 17:51:27 CST 2017
Thu May 05 17:51:27 CST 2017

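This shows that the clocks are roughly in sync, at least at the one-second resolution that `date` prints. Normally that would be the end of the problem, but in this case it was not.

Tracking the cluster status further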

# ceph -s
    cluster f01fb68c-58c6-4707-8adb-b7ac88172340
     health HEALTH_WARN
            clock skew detected on mon.xs734
            Monitor clock skew detected
...

Here is the catch: the cluster is still in the WARN state. Dig further to see where the problem lies:

# ceph -w
...
2017-05-26 10:27:14.325856 mon.0 [WRN] mon.2 10.34.57.27:6789/0 clock skew 0.0556591s > max 0.05s
2017-05-26 10:27:44.292273 mon.0 [INF] HEALTH_WARN; clock skew detected on mon.xs733, mon.xs734; Monitor clock skew detected

The cause: 0.0556591s > max 0.05s. The measured skew is slightly above the default limit of 0.05 seconds.

A skew above 0.05 seconds like this is uncommon.
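To see how far each monitor is over the limit, the warning lines can be reduced to a short summary. This is only a sketch that assumes the `[WRN] ... clock skew Xs > max Ys` wording from the log above. The function name `skew_report` and its output layout are invented for illustration.

```shell
# Summarise clock-skew warnings from `ceph -w` output read on stdin.
# Prints the monitor, the measured skew, the limit, and the excess in seconds.
skew_report() {
    sed -n 's/.*\[WRN\] \(mon\.[^ ]*\) .* clock skew \([0-9.]*\)s > max \([0-9.]*\)s.*/\1 \2 \3/p' \
        | awk '{ printf "%s skew=%ss max=%ss over_by=%.4fs\n", $1, $2, $3, $2 - $3 }'
}

# Usage on a cluster node (stop with Ctrl-C):
#   ceph -w | skew_report
```

For the warning shown above, the excess is only a few milliseconds, which is why a modest increase of the limit is enough here.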

Changing the configuration

Edit the configuration file

Add the following to /etc/ceph/ceph.conf:

[mon]
mon_clock_drift_allowed = 0.10
mon clock drift warn backoff = 10

I tested this myself: the parameter names work with or without underscores.
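Because both spellings are accepted, a quick check of the file is easier if the names are normalised first. The sketch below rewrites spaces and dashes in the option name to underscores so the result can be compared with what `config show` prints. `clock_options` is a hypothetical helper, and it only looks at options whose names start with "mon clock".

```shell
# List the mon clock options from a ceph.conf read on stdin,
# with the option names normalised to the underscore spelling.
clock_options() {
    awk -F= '/^[ \t]*mon[ _-]clock/ {
        key = $1; val = $2
        gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the option name
        gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the value
        gsub(/[ -]/, "_", key)             # spaces and dashes become underscores
        print key "=" val
    }'
}

# Usage:
#   clock_options < /etc/ceph/ceph.conf
```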

After the change, push the configuration to every machine in the cluster:

ceph-deploy --overwrite-conf config push xs732 xs733 xs734

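About these two parameters

mon clock drift allowed

Description:	The clock drift allowed between monitors.
Type:	Float
Default:	.050

mon clock drift warn backoff

Description:	Exponential backoff for clock drift warnings.
Type:	Float
Default:	5

Reference: http://docs.ceph.org.cn/rados/configuration/mon-config-ref/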

Check the configuration that Ceph is actually running with:

root@xs732:~/my-custer# ceph daemon mon.xs732 config show | grep clock
    "mon_clock_drift_allowed": "0.05",
    "mon_clock_drift_warn_backoff": "5",
    "clock_offset": "0",

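The values are still the defaults, 0.05 and 5, so the change has not taken effect at all: a running monitor reads ceph.conf only when it starts.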
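The author's fix is the restart described next. As an alternative sketch, the same admin socket used for `config show` also accepts `config set`, which changes a value in the running daemon. Whether the monitor honours a runtime change of these particular options may depend on the Ceph release, so treat this as an assumption to verify. A value set this way is lost on restart, so the ceph.conf edit is still needed. `set_clock_settings` is a hypothetical wrapper that only prints the commands unless `DRY_RUN=0` is set.

```shell
# Apply both clock settings to one running monitor through its admin socket.
# Arguments: monitor id, allowed drift, warn backoff.
# Must run on the host where that monitor lives.
set_clock_settings() {
    mon=$1
    allowed=$2
    backoff=$3
    for kv in "mon_clock_drift_allowed $allowed" "mon_clock_drift_warn_backoff $backoff"; do
        # $kv is left unquoted on purpose so it splits into option name and value.
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo ceph daemon "mon.$mon" config set $kv
        else
            ceph daemon "mon.$mon" config set $kv
        fi
    done
}

# Usage on xs732 (prints the commands; prefix with DRY_RUN=0 to execute them):
#   set_clock_settings xs732 0.10 10
```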

Restart the monitor

systemctl restart ceph-mon@xs732.service

Note: restart the monitor on every machine, using that machine's own monitor name in the unit name.
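The per-host restarts can be scripted from an admin node. This is a minimal sketch that assumes, as in this cluster, that each monitor id equals its host name and that passwordless ssh is set up. `restart_mons` is a made-up helper, and the 10-second pause is an arbitrary guess at how long a monitor needs to rejoin the quorum.

```shell
# Restart the monitors one host at a time so the quorum is never lost.
# By default it only prints the commands; set DRY_RUN=0 to execute them.
restart_mons() {
    for host in "$@"; do
        cmd="systemctl restart ceph-mon@$host.service"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "ssh $host $cmd"
        else
            ssh "$host" "$cmd" || return 1
            sleep 10   # arbitrary pause before touching the next monitor
        fi
    done
}

# Usage:
#   restart_mons xs732 xs733 xs734            # preview
#   DRY_RUN=0 restart_mons xs732 xs733 xs734  # actually restart
```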

Check again, and the new values are now in effect:

root@xs732:~/my-custer# ceph daemon mon.xs732 config show | grep clock
    "mon_clock_drift_allowed": "0.10",
    "mon_clock_drift_warn_backoff": "10",
    "clock_offset": "0",
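To check the running value on every monitor without reading the output by eye, the relevant line can be reduced to just its value. A small sketch, assuming the `"name": "value",` layout that `config show` prints above. `config_value` is a hypothetical helper name.

```shell
# Print the value of one option from `ceph daemon ... config show` output.
# $1 is the option name (underscore spelling); the JSON text is read on stdin.
config_value() {
    sed -n "s/^ *\"$1\": \"\([^\"]*\)\",\{0,1\}\$/\1/p"
}

# Usage on a monitor host:
#   ceph daemon mon.xs732 config show | config_value mon_clock_drift_allowed
```

If the printed value differs from the one in ceph.conf, the monitor on that host has probably not been restarted yet.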

Check the Ceph cluster status once more:

# ceph -s
    cluster f01fb68c-58c6-4707-8adb-b7ac88172340
     health HEALTH_OK
     monmap e2: 3 mons at {xs732=10.34.57.25:6789/0,xs733=10.34.57.26:6789/0,xs734=10.34.57.27:6789/0}
            election epoch 12, quorum 0,1,2 xs732,xs733,xs734
        mgr active: xs733 standbys: xs734, xs732
     osdmap e50: 9 osds: 9 up, 9 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v113: 256 pgs, 1 pools, 0 bytes data, 0 objects
            46397 MB used, 4423 GB / 4469 GB avail
                 256 active+clean

OK, problem solved!

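Unless otherwise noted, the article "[Troubleshooting] Ceph mon clock skew" published on "Jiaozn's Blog" is copyright Jiaozn. When reposting, please credit the source as: reposted from "Jiaozn's Blog", original address https://www.jiaozn.com/reed/669.html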
