
feat: add watch-based reload for config.toml #1452

Open
0x0034 wants to merge 1 commit into flashcatcloud:main from 0x0034:feature/config-reload-on-change

0x0034 (Contributor) commented Jun 8, 2026

No description provided.

Signed-off-by: ruochen <wanxialianwei@gmail.com>

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds config.toml hot-reload support (watch + debounce) and refactors the startup/reload flow so the agent and writers can be rebuilt and restarted when the configuration changes.

Changes:

  • Introduces global.reload_on_change and a filesystem watcher (pkg/reloadwatcher) to trigger reloads when config.toml changes.
  • Adds a new agentRuntime that coordinates config reload, writer rebuild/apply, agent restart, and watcher lifecycle.
  • Refactors writer initialization to support rebuilding and applying writer maps safely, plus adds unit tests for reload behavior.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| writer/writers.go | Refactors writer initialization/reload; adds BuildWriters, ApplyWriters, and improves concurrent access in WriteTimeSeries. |
| writer/writers_test.go | Adds a unit test ensuring writer reload replaces the writer map while preserving the queue. |
| pkg/reloadwatcher/reloadwatcher.go | Adds a debounced fsnotify-based watcher for a single target file. |
| pkg/reloadwatcher/reloadwatcher_test.go | Adds debounce and target-filtering tests for the watcher. |
| main.go | Switches runtime control from agent.Agent to agentRuntime and routes SIGHUP to runtime reload. |
| main_windows.go | Updates Windows service/interactive execution to use agentRuntime. |
| main_posix.go | Updates the POSIX run loop to use agentRuntime. |
| main_other.go | Updates the non-Windows/non-Linux run loop to use agentRuntime. |
| go.mod | Makes github.com/fsnotify/fsnotify a direct dependency. |
| config/config.go | Adds ReloadOnChange, factors out LoadConfig, and makes InitConfig safer about updating the global Config. |
| config/config_test.go | Adds a test ensuring a failed config reload does not overwrite the current config. |
| conf/config.toml | Documents the new reload_on_change option. |
| agent_runtime.go | Adds the new runtime responsible for reload orchestration and config.toml watching. |
| agent_runtime_test.go | Adds unit tests for runtime reload success/failure behavior. |


Comment thread agent_runtime.go
Comment on lines +103 to +118
```go
currentConfig := config.Config
currentAgent := rt.agent
rt.deps.applyConfig(nextConfig)

nextAgent, err := rt.deps.newAgent()
if err != nil {
	rt.deps.applyConfig(currentConfig)
	return err
}

currentAgent.Stop()
rt.deps.applyWriters(nextWriters)
rt.agent = nextAgent
rt.deps.initLog(nextConfig.Log.FileName)
rt.agent.Start()
rt.reconcileWatcherLocked()
```
Comment thread agent_runtime.go

```go
import (
	"log"
	"path"
```
Comment on lines +99 to +103
```go
case _, ok := <-w.watcher.Errors:
	if !ok {
		return
	}
case <-timerC:
```

kongfei605 (Collaborator) commented

Thanks for the PR, @0x0034. The overall direction is valuable: adding a runtime wrapper and a watcher to support config.toml reload is a good starting point.

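However, after reading through the implementation, I don't think the current version is ready to merge yet. There are several lifecycle and consistency issues that need to be addressed first: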

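1. The reload flow replaces the global config before stopping the old agent.
   As a result, the old agent's Stop() can read the new config. For example, if the old config enables ibex and the new one disables it, IbexAgent.Stop() may see that ibex is disabled in the new config and return immediately, so the old ibex tasks are never actually stopped.

   A safer approach is to make sure the old agent still uses the old config while it stops, or to ensure that no module's Stop() depends on mutable global config.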

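2. config.Config becomes a global pointer that is replaced at runtime, with no synchronization.

   Many goroutines, including writer, api, heartbeat, inputs, and logs, read config.Config directly. Swapping this pointer at runtime can introduce data races, and different modules may observe inconsistent config state at the same moment.

   Before config.Config can support runtime reload, there needs to be a concurrency-safe model for accessing and updating the configuration.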

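3. api and heartbeat are not managed by the new runtime.
   This PR restarts the agent modules and replaces the writers, but api.Start() and heartbeat.Work() are still started separately in main and are not restarted by the runtime. As a result, changes to the http server and heartbeat settings are not fully applied.

   Either bring api and heartbeat into the reload lifecycle as well, or state explicitly that these settings do not support hot reload yet.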

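4. Writer reload only replaces writerMap and keeps the old queue.

   This design is acceptable, but it means settings such as writer_opt.chan_size do not take effect after a reload. Either document clearly which settings currently support hot reload, or handle such settings explicitly in the implementation.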
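Points 1 to 3 can be addressed with one lifecycle model. The following is a minimal sketch, not the project's code, built on hypothetical `Config`, `Component` and `Runtime` types plus toy `ibex` and `httpServer` components (none of these are the PR's `agentRuntime` or the real `IbexAgent`). Every module, including api and heartbeat, receives its config explicitly at Start and keeps it; the published config lives in an `atomic.Pointer` (Go 1.19+; `errors.Join` needs Go 1.20+) so concurrent readers never race with a swap; and Reload stops the old components before the new config becomes visible.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical, trimmed-down stand-in for config.Config.
type Config struct {
	IbexEnabled bool
	HTTPAddr    string
}

// Component is a hypothetical lifecycle for agent modules, api and heartbeat.
// Start receives the config explicitly, so Stop never reads a mutable global.
type Component interface {
	Start(cfg *Config) error
	Stop()
}

// Runtime owns the published config and every restartable component.
type Runtime struct {
	mu         sync.Mutex             // serializes Start/Reload (SIGHUP vs. watcher)
	cfg        atomic.Pointer[Config] // race-free reads from other goroutines
	components []Component            // started in order, stopped in reverse
}

// Config returns the current snapshot; callers must treat it as read-only.
func (rt *Runtime) Config() *Config { return rt.cfg.Load() }

func (rt *Runtime) startAll(cfg *Config) error {
	for i, c := range rt.components {
		if err := c.Start(cfg); err != nil {
			for j := i - 1; j >= 0; j-- { // undo a partial start
				rt.components[j].Stop()
			}
			return err
		}
	}
	return nil
}

// Start publishes the initial config and starts every component.
func (rt *Runtime) Start(cfg *Config) error {
	rt.mu.Lock()
	defer rt.mu.Unlock()
	rt.cfg.Store(cfg)
	return rt.startAll(cfg)
}

// Reload stops every component while the OLD config is still published, then
// swaps in next and restarts; if that fails it restores and restarts prev.
func (rt *Runtime) Reload(next *Config) error {
	rt.mu.Lock()
	defer rt.mu.Unlock()
	prev := rt.cfg.Load()
	for i := len(rt.components) - 1; i >= 0; i-- {
		rt.components[i].Stop()
	}
	rt.cfg.Store(next)
	if err := rt.startAll(next); err != nil {
		rt.cfg.Store(prev)
		// errors.Join drops a nil rollback error, so a clean rollback reports only err.
		return errors.Join(fmt.Errorf("reload failed, kept previous config: %w", err), rt.startAll(prev))
	}
	return nil
}

// ibex is a toy component: Stop uses the config captured at Start.
type ibex struct {
	cfg     *Config
	running bool
}

func (b *ibex) Start(cfg *Config) error {
	b.cfg, b.running = cfg, cfg.IbexEnabled
	return nil
}

func (b *ibex) Stop() {
	if b.cfg != nil && b.cfg.IbexEnabled {
		b.running = false
	}
}

// httpServer is a toy stand-in for the api server; it rejects an empty address.
type httpServer struct{ addr string }

func (h *httpServer) Start(cfg *Config) error {
	if cfg.HTTPAddr == "" {
		return errors.New("http: empty listen address")
	}
	h.addr = cfg.HTTPAddr
	return nil
}

func (h *httpServer) Stop() { h.addr = "" }
```

With this ordering, disabling ibex in the new config still stops the tasks the old config started, because `ibex.Stop` consults the config it captured rather than the global. The trade-off is a short window between stop and restart in which no component runs. Goroutines that need a consistent view should call `Config()` once and keep using that snapshot instead of re-reading it field by field.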

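Suggested next steps:
 + Reorder the reload steps so that the old agent still uses the old config while it stops.
 + Avoid replacing config.Config at runtime without synchronization, or introduce a concurrency-safe way to hold and access the config.
 + Decide whether api and heartbeat should be brought into the reload lifecycle.
 + Define and document which sections and fields in config.toml support hot reload and which do not.
 + Add tests for key scenarios, such as enabling/disabling ibex, changing the http/heartbeat settings, and changing writer_opt.chan_size.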


Labels: None yet

Projects: None yet

3 participants