a. Check which watchdog timer module this Raspberry Pi provides
The following command shows that the Raspberry Pi 5 is equipped with a Broadcom BCM2835 watchdog timer:
raspberry@rpi5-01:~ $ sudo dmesg | grep wdt
[ 0.475696] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
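When several boards need checking, the same lookup can be scripted. The sketch below is a minimal example, assuming the kernel log names watchdog drivers with a "-wdt" or "_wdt" suffix as in the output above. The function name wdt_drivers is a hypothetical helper, not part of any package.

```shell
# Print the unique watchdog driver names found in kernel log text read from stdin.
# Assumption: driver names end in "-wdt" or "_wdt", as in the dmesg line above.
wdt_drivers() {
    grep -oE '[A-Za-z0-9_]+[-_]wdt' | sort -u
}

# Usage (on the Raspberry Pi itself):
#   sudo dmesg | wdt_drivers
```

An empty result would suggest that no watchdog driver has logged a message since boot.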
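Based on the output above, bcm2835-wdt is the kernel module that drives the hardware watchdog on the Raspberry Pi 5. Install the watchdog daemon, load the module with modprobe (no reboot is needed), and then edit the daemon's configuration file: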
sudo apt update
sudo apt install watchdog
sudo modprobe bcm2835_wdt
sudo nano /etc/watchdog.conf
# ====================================================================
# Configuration for the watchdog daemon. For more information on the
# parameters in this file use the command 'man watchdog.conf'
# ====================================================================
# =================== The hardware timer settings ====================
# For this daemon to be effective it really needs some hardware timer
# to back up any reboot actions. If you have a server then see if it
# has IPMI support. Otherwise for Intel-based machines try the iTCO_wdt
# module, otherwise (or if that fails) then see if any of the following
# module load and work:
# it87_wdt it8712f_wdt w83627hf_wdt w83877f_wdt w83977f_wdt
# If all else fails then 'softdog' is better than no timer at all!
# Or work your way through the modules listed under:
# /lib/modules/`uname -r`/kernel/drivers/watchdog/
# To see if they load, present /dev/watchdog, and are capable of
# resetting the system on time-out.
# Uncomment this to use the watchdog device driver access "file".
watchdog-device = /dev/watchdog
# Uncomment and edit this line for hardware timeout values that differ
# from the default of one minute.
watchdog-timeout = 60
# If your watchdog trips by itself when the first timeout interval
# elapses then try uncommenting the line below and changing the
# value to 'yes'.
#watchdog-refresh-use-settimeout = auto
# If you have a buggy watchdog device (e.g. some IPMI implementations)
# try uncommenting this line and setting it to 'yes'.
#watchdog-refresh-ignore-errors = no
# ====================== Other system settings ========================
# Interval between tests. Should be a couple of seconds shorter than
# the hardware time-out value.
interval = 10
# The number of intervals skipped before a log message is written (i.e.
# a multiplier for 'interval' in terms of syslog messages)
#logtick = 1
# Directory for log files (probably best not to change this)
#log-dir = /var/log/watchdog
# Email address for sending the reboot reason. This needs sendmail to
# be installed and properly configured. Maybe you should just enable
# syslog forwarding instead?
#admin = root
# Lock the daemon in to memory as a real-time process. This greatly
# decreases the chance that watchdog won't be scheduled before your
# machine is really loaded.
realtime = yes
priority = 1
# ====================== How to handle errors =======================
# If you have a custom binary/script to handle errors then uncomment
# this line and provide the path. For 'v1' test binary files they also
# handle error cases.
#repair-binary = /usr/sbin/repair
#repair-timeout = 60
# The retry-timeout and repair limit are used to handle errors in a
# more robust manner. Errors must persist for longer than this to
# action a repair or reboot, and if repair-maximum attempts are
# made without the test passing a reboot is initiated anyway.
#retry-timeout = 60
#repair-maximum = 1
# Configure the delay on reboot from sending SIGTERM to all processes
# and to following up with SIGKILL for any that are ignoring the polite
# request to stop.
#sigterm-delay = 5
# ====================== User-specified tests ========================
# Specify the directory for auto-added 'v1' test programs (any executable
# found in the 'test-directory should be listed).
#test-directory = /etc/watchdog.d
# Specify any v0 custom tests here. Multiple lines are permitted, but
# having any 'v1' programs/scripts discovered in the 'test-directory' is
# the better way.
#test-binary =
# Specify the time-out value for a test error to be reported.
#test-timeout = 60
# ====================== Typical tests ===============================
# Specify any IPv4 numeric addresses to be probed.
# NOTE: You should check you have permission to ping any machine before
# using it as a test. Also remember if the target goes down then this
# machine will reboot as a result!
#ping =
#ping =
# Use Google's public DNS IP for the ping test
ping =
# Set the number of ping attempts in each 'interval' of time. Default
# is 3 and it completes on the first successful ping.
# NOTE: Round-trip delay has to be less than 'interval' / 'ping-count'
# for test success, but this is unlikely to be exceeded except possibly
# on satellite links (very unlikely case!).
ping-count = 3
# Specify any network interface to be checked for activity.
#interface = eth0
# Specify any files to be checked for presence, and if desired, checked
# that they have been updated more recently than 'change' seconds.
#file = /var/log/syslog
#change = 1407
# Uncomment to enable load average tests for 1, 5 and 15 minute
# averages. Setting one of these values to '0' disables it. These
# values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher
# than 25 in most cases).
max-load-1 = 24
max-load-5 = 18
max-load-15 = 12
# Check available memory on the machine.
# The min-memory check is a passive test from reading the file
# /proc/meminfo and computed from MemFree + Buffers + Cached
# If this is below a few tens of MB you are likely to have problems.
# The allocatable-memory is an active test checking it can be paged
# in to use.
# Maximum swap should be based on normal use, probably a large part of
# available swap but paging 1GB of swap can take tens of seconds.
# NOTE: This is the number of pages, to get the real size, check how
# large the pagesize is on your machine (typically 4kB for x86 hardware).
min-memory = 1
#allocatable-memory = 1
#max-swap = 0
# Check for over-temperature. Typically the temperature-sensor is a
# 'virtual file' under /sys and it contains the temperature in
# milli-Celsius. Usually these are generated by the 'sensors' package,
# but take care as device enumeration may not be fixed.
temperature-sensor = /sys/class/thermal/thermal_zone0/temp
max-temperature = 95000
# Check for a running process/daemon by its PID file. For example,
# check if rsyslogd is still running by enabling the following line:
#pidfile = /var/run/rsyslogd.pid
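Because most of the file is commented out, it can help to list only the lines the daemon will actually act on. The following is a minimal sketch; active_settings is a hypothetical helper name, and the default path is the one edited above.

```shell
# Print only the active lines of a watchdog configuration file,
# skipping comment lines and blank (or whitespace-only) lines.
active_settings() {
    conf="${1:-/etc/watchdog.conf}"
    grep -Ev '^[[:space:]]*(#|$)' "$conf"
}

# Usage:
#   active_settings                 # reads /etc/watchdog.conf
#   active_settings ./watchdog.conf # reads a local copy
```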
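- watchdog-device = /dev/watchdog : The watchdog device file, usually /dev/watchdog. This is the interface to the hardware or software timer; the daemon writes to it periodically to keep the system from being reset.
- watchdog-timeout = 60 : The hardware timer timeout in seconds. The default is 60 seconds, meaning the daemon must feed the watchdog at least once within that period to prevent a reset. Note that the bcm2835 watchdog driver is generally reported to accept a maximum timeout of about 15 seconds, so a value of 60 may be rejected; check the daemon's log after starting it.
- interval = 10 : The time in seconds between two rounds of tests. It should be somewhat shorter than the hardware timeout so that the daemon can run its checks and feed the watchdog before the timer expires.
- realtime = yes and priority = 1 : Run the daemon as a real-time process, locked in memory, with the given scheduling priority so that it still gets scheduled under heavy load.
- ping = : Uses Google's public DNS as the ping target. In other words, if the connection to Google fails, the external network is treated as unreachable and the daemon triggers a system reboot.
- ping-count = 3 : The number of ping attempts within each interval. The default is 3.
- max-load-1 = 24, max-load-5 = 18, max-load-15 = 12 : Check the system load (1-minute, 5-minute and 15-minute averages). If the load exceeds a configured value, the system is rebooted.
- min-memory = 1 : Checks whether the available memory has dropped below the given value (in pages).
- temperature-sensor = /sys/class/thermal/thermal_zone0/temp and max-temperature = 95000 : Check the system temperature (in milli-Celsius). Exceeding the configured 95 °C triggers a system reboot.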
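The relationship between interval and watchdog-timeout described above can be checked before the daemon is started. This is a rough sketch under stated assumptions: both keys are set explicitly in the file, their values are whole seconds, and the helper names conf_value and check_timing are hypothetical.

```shell
# Print the value of key $1 in the watchdog.conf-style file $2.
# Commented-out lines are ignored; if the key appears twice, the last one is used.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p" "$2" | tail -n 1
}

# Verify that the test interval is shorter than the hardware timeout.
# Returns 0 if it is, 1 if it is not, and 2 if either key is missing.
check_timing() {
    conf="${1:-/etc/watchdog.conf}"
    interval=$(conf_value interval "$conf")
    timeout=$(conf_value watchdog-timeout "$conf")
    if [ -z "$interval" ] || [ -z "$timeout" ]; then
        echo "interval or watchdog-timeout is not set in $conf"
        return 2
    fi
    if [ "$interval" -lt "$timeout" ]; then
        echo "ok: interval ${interval}s < watchdog-timeout ${timeout}s"
    else
        echo "warning: interval ${interval}s is not shorter than watchdog-timeout ${timeout}s"
        return 1
    fi
}

# Usage:
#   check_timing /etc/watchdog.conf
```

With the values shown in the configuration above (interval = 10, watchdog-timeout = 60) the check passes. It would need rerunning if the timeout is lowered to suit the hardware.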
sudo systemctl start watchdog
sudo systemctl status watchdog
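The status output is fairly long; for a quick check, the two facts that matter most are whether the service is running now and whether it will start at boot. The sketch below wraps the systemctl is-active and is-enabled queries. The function name watchdog_service_report is a hypothetical helper, and the unit name watchdog is assumed from the commands above.

```shell
# Report whether the watchdog service is running now and enabled at boot.
# The unit name defaults to "watchdog", matching the commands above.
watchdog_service_report() {
    unit="${1:-watchdog}"
    printf 'active: %s\n' "$(systemctl is-active "$unit" 2>/dev/null)"
    printf 'enabled: %s\n' "$(systemctl is-enabled "$unit" 2>/dev/null)"
}

# Usage:
#   watchdog_service_report
```

If the service is reported as not enabled, sudo systemctl enable watchdog makes it start at boot, and journalctl -u watchdog shows the daemon's log messages, including any error about setting the timeout.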