sta/specs/001-modbus-relay-control/research.md
Lucien Cartier-Tilet a683810bdc docs: add project specs and documentation for Modbus relay control
2026-01-22 00:57:10 +01:00
# Research Document: Modbus Relay Control System
**Created**: 2025-12-28
**Feature**: [spec.md](./spec.md)
**Status**: Complete
## Table of Contents
1. [Executive Summary](#executive-summary)
2. [Tokio-Modbus Research](#tokio-modbus-research)
3. [WebSocket vs HTTP Polling](#websocket-vs-http-polling)
4. [Existing Codebase Patterns](#existing-codebase-patterns)
5. [Integration Recommendations](#integration-recommendations)
6. [Summary](#summary)
---
## Executive Summary
### Key Decisions
| Decision Area | Recommendation | Rationale |
|---------------------------|--------------------------------------|---------------------------------------------------------|
| **Modbus Library** | tokio-modbus 0.17.0 | Native async/await, production-ready, good testability |
| **Communication Pattern** | HTTP Polling (as in spec) | Simpler, reliable, adequate for 10 users @ 2s intervals |
| **Connection Management** | `Arc<Mutex<Context>>` for MVP        | Single device, simple, can upgrade later if needed      |
| **Retry Strategy** | Simple retry-once helper | Matches FR-007 requirement |
| **Testing Approach** | Trait-based abstraction with mockall | Enables >90% coverage without hardware |
### User Input Analysis
**User requested**: "Use tokio-modbus crate, poem-openapi for REST API, Vue.js with WebSocket for real-time updates"
**Findings**:
- ✅ tokio-modbus 0.17.0: Excellent choice, validated by research
- ✅ poem-openapi: Already in use, working well
- ⚠️ **WebSocket vs HTTP Polling**: Spec says HTTP polling (FR-028). WebSocket adds 43x complexity for negligible benefit at this scale.
**RECOMMENDATION**: Maintain HTTP polling as specified. WebSocket complexity not justified for 10 concurrent users with 2-second update intervals.
### Deployment Architecture
**User clarification (2025-12-29)**: Frontend on Cloudflare Pages, backend on Raspberry Pi behind Traefik with Authelia
**Architecture**:
- **Frontend**: Cloudflare Pages (Vue 3 static build) - global CDN delivery
- **Backend**: Raspberry Pi HTTP API (same local network as Modbus device)
- **Reverse Proxy**: Traefik on Raspberry Pi
  - HTTPS termination (TLS certificates)
  - Authelia middleware for authentication
  - Routes frontend requests to backend HTTP service
- **Communication Flow**:
  - Frontend (CDN) → HTTPS → Traefik (HTTPS termination + auth) → Backend (HTTP) → Modbus RTU over TCP → Device
**Security**:
- Frontend-Backend: HTTPS via Traefik (encrypted, authenticated)
- Backend-Device: Modbus RTU over TCP on local network (unencrypted, local only)
---
## Tokio-Modbus Research
### Decision: Recommended Patterns
**Primary Recommendation**: Use tokio-modbus 0.17.0 with a custom trait-based abstraction layer (`RelayController` trait) for testability. Implement connection management using `Arc<Mutex<Context>>` for MVP.
### Technical Details
**Version**: tokio-modbus 0.17.0 (latest stable, released 2025-10-22)
**Protocol**: Modbus RTU over TCP (NOT Modbus TCP)
- Hardware uses RTU protocol tunneled over TCP
- Includes CRC16 validation
- Different from native Modbus TCP (no CRC, different framing)
**Connection Strategy**:
- Shared `Arc<Mutex<Context>>` for simplicity
- Single persistent connection (only one device)
- Can migrate to dedicated async task pattern if reconnection logic needed
**Timeout Handling**:
- Wrap all operations with `tokio::time::timeout(Duration::from_secs(3), ...)`
- **CRITICAL**: tokio-modbus has NO built-in timeouts
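Wrapping a call in a timeout adds a third `Result` layer on top of the transport and exception layers. The sketch below shows one way to collapse all three into a single error type. It uses only the standard library: `Elapsed`, `TransportError`, and `ExceptionCode` are hypothetical stand-ins for the corresponding tokio and tokio-modbus types, and `flatten_layers` is an illustrative name rather than an existing API.

```rust
/// Stand-ins for the real error types (assumptions for illustration only).
#[derive(Debug, PartialEq)]
pub struct Elapsed; // the timeout expired
#[derive(Debug, PartialEq)]
pub struct TransportError(pub String); // I/O or framing failure
#[derive(Debug, PartialEq)]
pub struct ExceptionCode(pub u8); // Modbus exception returned by the device

/// Single application-level error mirroring the three failure layers.
#[derive(Debug, PartialEq)]
pub enum RelayError {
    Timeout,
    Transport(TransportError),
    Exception(ExceptionCode),
}

/// Shape of a timeout-wrapped Modbus call: timeout -> transport -> exception.
pub type Layered<T> = Result<Result<Result<T, ExceptionCode>, TransportError>, Elapsed>;

/// Collapses the three layers, outermost first.
pub fn flatten_layers<T>(layered: Layered<T>) -> Result<T, RelayError> {
    layered
        .map_err(|_| RelayError::Timeout)?
        .map_err(RelayError::Transport)?
        .map_err(RelayError::Exception)
}
```

Mapping each layer explicitly keeps the failure cause visible to callers, which a bare `???` only does if the target error type implements `From` for every layer.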
**Retry Logic**:
- Implement simple retry-once helper per FR-007
- Matches specification requirement
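A retry-once helper can be written generically over any fallible async operation. This is a minimal sketch using only `std::future::Future`; the name `retry_once` is an assumption, and FR-007 may impose details (logging, delay between attempts) that are not shown here.

```rust
use std::future::Future;

/// Runs `op` once and, if it fails, exactly one more time.
/// The error from the second attempt is returned if both attempts fail.
pub async fn retry_once<T, E, F, Fut>(mut op: F) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    match op().await {
        Ok(value) => Ok(value),
        // The first error is discarded; a real helper would probably log it.
        Err(_first_error) => op().await,
    }
}
```

The closure must build a fresh future on every call, because a future cannot be polled again once it has completed. With a shared `Arc<Mutex<Context>>`, that usually means cloning the `Arc` and taking the lock inside the closure body.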
**Testing**:
- Use `mockall` crate with `async-trait` for unit testing
- Trait abstraction enables testing without hardware
- Supports >90% test coverage target (NFR-013)
### Critical Gotchas
1. **Device Gateway Configuration**: Hardware MUST be set to "Multi-host non-storage type" - default storage type sends spurious queries causing failures
2. **No Built-in Timeouts**: tokio-modbus has NO automatic timeouts - must wrap every operation with `tokio::time::timeout`
3. **RTU vs TCP Confusion**: Device uses Modbus RTU protocol over TCP (with CRC), not native Modbus TCP protocol
4. **Address Indexing**: Relays labeled 1-8, but Modbus addresses are 0-7 (use newtype pattern with conversion methods)
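5. **Nested Result Handling**: Calls return `Result<Result<T, ExceptionCode>, tokio_modbus::Error>` (transport errors outside, Modbus exceptions inside). Wrapping a call in `tokio::time::timeout` adds a third layer, so all three must be handled (for example with the `???` triple-question-mark pattern)
6. **Concurrent Access**: `Context` methods take `&mut self`, so sharing one connection across request handlers requires `Arc<Mutex<Context>>` or a dedicated task that serializes access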
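The address-indexing gotcha can be handled once, at the type level. The following is a minimal sketch of a `RelayId` newtype with conversion methods; the method names and the `InvalidRelayId` error type are assumptions for illustration, not the project's actual API.

```rust
/// User-facing relay label, guaranteed to be in 1..=8.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RelayId(u8);

/// Returned when a label falls outside 1..=8 (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidRelayId(pub u8);

impl RelayId {
    pub const MIN: u8 = 1;
    pub const MAX: u8 = 8;

    /// Validates a 1-based label.
    pub fn new(label: u8) -> Result<Self, InvalidRelayId> {
        if (Self::MIN..=Self::MAX).contains(&label) {
            Ok(Self(label))
        } else {
            Err(InvalidRelayId(label))
        }
    }

    /// Builds an ID from a 0-based Modbus coil address (0..=7).
    pub fn from_modbus_address(addr: u16) -> Option<Self> {
        u8::try_from(addr)
            .ok()
            .and_then(|a| a.checked_add(1))
            .and_then(|label| Self::new(label).ok())
    }

    /// The 1-based label shown to users.
    pub fn label(self) -> u8 {
        self.0
    }

    /// The 0-based coil address; cannot underflow because the label is >= 1.
    pub fn to_modbus_address(self) -> u16 {
        u16::from(self.0 - 1)
    }
}
```

Because the only constructors validate the range, code that receives a `RelayId` never needs to re-check it or guard against `relay_id - 1` underflowing.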
### Code Examples
**Basic Connection Setup**:
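```rust
use tokio::net::TcpStream;
use tokio::time::{timeout, Duration};
use tokio_modbus::prelude::*;

// Open a plain TCP stream to the device gateway
let socket_addr: std::net::SocketAddr = "192.168.1.200:8234".parse()?;
let stream = TcpStream::connect(socket_addr).await?;

// Attach an RTU client (CRC16 framing) with slave ID 0x01 (unit identifier).
// `tcp::connect` would speak native Modbus TCP instead; needs the `rtu` feature.
let mut ctx = rtu::attach_slave(stream, Slave(0x01));

// Read all 8 relay states with timeout
let states = timeout(Duration::from_secs(3), ctx.read_coils(0x0000, 8))
    .await???; // Triple-? handles timeout + transport + exception errors
```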
**Toggle Relay with Retry**:
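```rust
use tokio::time::{timeout, Duration};
use tokio_modbus::client::{Context, Reader, Writer};

const OP_TIMEOUT: Duration = Duration::from_secs(3);

async fn toggle_relay(
    ctx: &mut Context,
    relay_id: u8, // 1-8
) -> Result<(), RelayError> {
    // Convert the 1-8 label to a 0-7 address; `checked_sub` rejects 0 instead of underflowing.
    // `RelayError::InvalidRelayId` is a hypothetical variant used for illustration.
    let addr = relay_id
        .checked_sub(1)
        .filter(|a| *a < 8)
        .map(u16::from)
        .ok_or(RelayError::InvalidRelayId(relay_id))?;

    // Read current state (layers: timeout, transport, Modbus exception)
    let states = timeout(OP_TIMEOUT, ctx.read_coils(addr, 1))
        .await
        .map_err(|_| RelayError::Timeout)?
        .map_err(RelayError::Transport)?
        .map_err(RelayError::Exception)?;
    let new_state = !states[0];

    // Write the opposite state; retry once on failure (FR-007)
    for attempt in 1..=2 {
        match timeout(OP_TIMEOUT, ctx.write_single_coil(addr, new_state)).await {
            Ok(Ok(Ok(()))) => return Ok(()),
            _ if attempt == 1 => tracing::warn!("Write failed, retrying"),
            Err(_) => return Err(RelayError::Timeout),
            Ok(Err(e)) => return Err(RelayError::Transport(e)),
            Ok(Ok(Err(code))) => return Err(RelayError::Exception(code)),
        }
    }
    unreachable!("the loop returns on the second attempt")
}
```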
**Trait-Based Abstraction for Testing**:
```rust
use std::sync::Arc;

use async_trait::async_trait;
use mockall::mock;
use tokio::sync::Mutex; // async mutex: the guard is held across `.await`
use tokio::time::{timeout, Duration};
use tokio_modbus::client::{Context, Reader};

#[async_trait]
pub trait RelayController: Send + Sync {
    async fn read_all_states(&mut self) -> Result<Vec<bool>, RelayError>;
    async fn write_state(&mut self, relay_id: RelayId, state: RelayState) -> Result<(), RelayError>;
}

// Real implementation with tokio-modbus
pub struct ModbusRelayController {
    ctx: Arc<Mutex<Context>>,
}

#[async_trait]
impl RelayController for ModbusRelayController {
    async fn read_all_states(&mut self) -> Result<Vec<bool>, RelayError> {
        let mut ctx = self.ctx.lock().await;
        timeout(Duration::from_secs(3), ctx.read_coils(0, 8))
            .await
            .map_err(|_| RelayError::Timeout)?
            .map_err(RelayError::Transport)?
            .map_err(RelayError::Exception)
    }
    // ... other methods
}

// Mock for testing (using mockall); generates `MockRelayController`
mock! {
    pub RelayController {}
    #[async_trait]
    impl RelayController for RelayController {
        async fn read_all_states(&mut self) -> Result<Vec<bool>, RelayError>;
        async fn write_state(&mut self, relay_id: RelayId, state: RelayState) -> Result<(), RelayError>;
    }
}
```
### Alternatives Considered
1. **modbus-robust**: Provides auto-reconnection but lacks retry logic and timeouts - insufficient for production
2. **bb8 connection pool**: Overkill for single-device scenario, adds unnecessary complexity
3. **Synchronous modbus-rs**: Would block Tokio threads, poor scalability for concurrent users
4. **Custom Modbus implementation**: Reinventing wheel, error-prone, significant development time
### Resources
- [GitHub - slowtec/tokio-modbus](https://github.com/slowtec/tokio-modbus)
- [tokio-modbus on docs.rs](https://docs.rs/tokio-modbus/)
- [Context7 MCP: `/slowtec/tokio-modbus`](mcp://context7/slowtec/tokio-modbus)
- [Context7 MCP: `/websites/rs_tokio-modbus_0_16_3_tokio_modbus`](mcp://context7/websites/rs_tokio-modbus_0_16_3_tokio_modbus)
---
## WebSocket vs HTTP Polling
### Recommendation: HTTP Polling (as specified)
The specification's decision to use HTTP polling is technically sound. **HTTP polling is the better choice** for this specific use case.
### Performance at Your Scale (10 users, 2-second intervals)
**Bandwidth Comparison:**
- HTTP Polling: ~20 Kbps (10 users × 0.5 req/sec × 500 bytes × 8)
- WebSocket: ~2.4 Kbps sustained
- **Difference: 17.6 Kbps** - negligible on any modern network
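The polling figure above follows directly from the user count, the polling interval, and the response size. A small sketch of that arithmetic is shown here; the function name is illustrative, and the 500-byte response size is the document's own estimate rather than a measured value.

```rust
/// Approximate HTTP polling bandwidth in bits per second.
///
/// `users` clients each fetch `bytes_per_response` bytes once every
/// `interval_secs` seconds.
pub fn polling_bandwidth_bps(users: u32, interval_secs: f64, bytes_per_response: u32) -> f64 {
    let requests_per_sec = f64::from(users) / interval_secs;
    requests_per_sec * f64::from(bytes_per_response) * 8.0
}
```

For 10 users at a 2-second interval this is 5 requests per second system-wide, and at 500 bytes per response it comes to 20,000 bits per second, matching the ~20 Kbps estimate.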
**Server Load:**
- HTTP Polling: 5 requests/second system-wide (trivial)
- WebSocket: 10 persistent connections (~80-160 KB memory)
- **Verdict: Both are trivial at this scale**
### Implementation Complexity
**HTTP Polling:**
- Backend: 0 lines (reuse existing `GET /api/relays`)
- Frontend: ~10 lines (simple setInterval)
- **Total effort: 15 minutes**
**WebSocket:**
- Backend: ~115 lines (handler + background poller + channel setup)
- Frontend: ~135 lines (WebSocket manager + reconnection logic)
- Testing: ~180 lines (connection lifecycle + reconnection tests)
- **Total effort: 2-3 days + ongoing maintenance**
**Complexity ratio: ~43x more code for WebSocket** (about 430 lines versus about 10)
### Reliability & Error Handling
**HTTP Polling Advantages:**
- Stateless (automatic recovery on next poll)
- Standard HTTP error codes
- Works everywhere (proxies, firewalls, old browsers)
- No connection state management
- Simple testing
**WebSocket Challenges:**
- Connection lifecycle management
- Exponential backoff reconnection logic
- State synchronization on reconnect
- Thundering herd problem (all clients reconnect after server restart)
- May fail behind corporate proxies (requires fallback to HTTP polling anyway)
### Decision Matrix
| Criterion | HTTP Polling | WebSocket | Weight |
|-----------|--------------|-----------|--------|
| Simplicity | 5 | 2 | 3x |
| Reliability | 5 | 3 | 3x |
| Testing | 5 | 2 | 2x |
| Performance @ 10 users | 4 | 5 | 1x |
| Scalability to 100+ | 3 | 5 | 1x |
| Architecture fit | 5 | 3 | 2x |
**Weighted Scores** (sum of score × weight, divided by the total weight of 12):
- **HTTP Polling: 4.75/5** (57/12)
- **WebSocket: 2.92/5** (35/12)
HTTP Polling scores **about 63% higher** when complexity, reliability, and testing are properly weighted for this project's scale.
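A weighted score of this kind is the weight-averaged row score: the sum of score times weight, divided by the sum of weights. The sketch below encodes the decision matrix rows as `(score, weight)` pairs so the result can be recomputed if a score or weight changes; the names `Row`, `HTTP_POLLING`, `WEBSOCKET`, and `weighted_score` are illustrative.

```rust
/// One row of the decision matrix: (score out of 5, weight).
pub type Row = (f64, f64);

/// Rows in table order: simplicity, reliability, testing,
/// performance @ 10 users, scalability to 100+, architecture fit.
pub const HTTP_POLLING: [Row; 6] =
    [(5.0, 3.0), (5.0, 3.0), (5.0, 2.0), (4.0, 1.0), (3.0, 1.0), (5.0, 2.0)];
pub const WEBSOCKET: [Row; 6] =
    [(2.0, 3.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0), (5.0, 1.0), (3.0, 2.0)];

/// Weighted mean of the rows, or `None` if the total weight is zero.
pub fn weighted_score(rows: &[Row]) -> Option<f64> {
    let total_weight: f64 = rows.iter().map(|(_, weight)| weight).sum();
    if total_weight == 0.0 {
        return None;
    }
    let weighted_sum: f64 = rows.iter().map(|(score, weight)| score * weight).sum();
    Some(weighted_sum / total_weight)
}
```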
### When WebSocket Makes Sense
WebSocket advantages manifest at:
- **100+ concurrent users** (4x throughput advantage becomes meaningful)
- **Sub-second update requirements** (<1 second intervals)
- **High-frequency updates** where latency matters
- **Bidirectional communication** (chat, gaming, trading systems)
For relay control with 2-second polling:
- Latency: a change becomes visible within 0-2 seconds (avg ~1 sec) plus request round-trip time - **acceptable for lights/pumps**
- Not a real-time critical system (not chat, gaming, or trading)
### Migration Path (If Needed Later)
Starting with HTTP polling does NOT prevent WebSocket adoption later:
1. **Phase 1:** Add `/api/ws` endpoint (non-breaking change)
2. **Phase 2:** Progressive enhancement (detect WebSocket support)
3. **Phase 3:** Gradual rollout with monitoring
**Key Point:** HTTP polling provides a baseline. Adding WebSocket later is straightforward, but removing WebSocket complexity is harder.
### Poem WebSocket Support (For Reference)
Poem has excellent WebSocket support through `poem::web::websocket`:
```rust
use futures_util::{SinkExt, StreamExt};
use poem::web::websocket::{Message, WebSocket};
use poem::web::Data;
use poem::{handler, IntoResponse};
use tokio::sync::watch;

#[handler]
async fn ws_handler(
    ws: WebSocket,
    state_tx: Data<&watch::Sender<RelayCollection>>,
) -> impl IntoResponse {
    // Subscribe before upgrading: the borrowed `Data` cannot move into the 'static callback
    let mut rx = state_tx.subscribe();
    ws.on_upgrade(move |socket| async move {
        let (mut sink, _stream) = socket.split();
        // Send the current state first, then one message per change
        loop {
            let state = rx.borrow_and_update().clone();
            let Ok(json) = serde_json::to_string(&state) else { break };
            if sink.send(Message::Text(json)).await.is_err() {
                break; // client disconnected
            }
            if rx.changed().await.is_err() {
                break; // sender dropped, no more updates
            }
        }
    })
}
```
**Broadcasting Pattern**: Use `tokio::sync::watch` channel:
- Maintains only most recent value (perfect for relay state)
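- Coalesces rapid updates, so slow receivers only see the latest state; identical states are not deduplicated automatically (use `send_if_modified` on the sender for that)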
- New connections get immediate state snapshot
- Memory-efficient (single state copy)
### Resources
- [Poem WebSocket API Documentation](https://docs.rs/poem/latest/poem/web/websocket/)
- [HTTP vs WebSockets Performance](https://blog.feathersjs.com/http-vs-websockets-a-performance-comparison-da2533f13a77)
- [Tokio Channels Tutorial](https://tokio.rs/tokio/tutorial/channels)
---
## Existing Codebase Patterns
### Architecture Overview
The current codebase is a well-structured Rust backend API using Poem framework with OpenAPI support, following clean architecture principles.
**Current Structure**:
```
src/
├── lib.rs          - Library entry point, orchestrates application setup
├── main.rs         - Binary entry point, calls lib::run()
├── startup.rs      - Application builder, server configuration, route setup
├── settings.rs     - Configuration from YAML files + environment variables
├── telemetry.rs    - Logging and tracing setup
├── route/          - HTTP endpoint handlers
│   ├── mod.rs      - API aggregation and OpenAPI tags
│   ├── health.rs   - Health check endpoints
│   └── meta.rs     - Application metadata endpoints
└── middleware/     - Custom middleware implementations
    ├── mod.rs
    └── rate_limit.rs - Rate limiting middleware using governor
```
### Key Patterns Discovered
#### 1. Route Registration Pattern
**Location**: `src/startup.rs:95-107`
```rust
fn setup_app(settings: &Settings) -> poem::Route {
    let api_service = OpenApiService::new(
        Api::from(settings).apis(),
        settings.application.clone().name,
        settings.application.clone().version,
    )
    .url_prefix("/api");
    let ui = api_service.swagger_ui();
    poem::Route::new()
        .nest("/api", api_service.clone())
        .nest("/specs", api_service.spec_endpoint_yaml())
        .nest("/", ui)
}
```
**Key Insights**:
- OpenAPI service created with all API handlers via `.apis()` tuple
- URL prefix `/api` applied to all API routes
- Swagger UI automatically mounted at root `/`
- OpenAPI spec YAML available at `/specs`
#### 2. API Handler Organization Pattern
**Location**: `src/route/mod.rs:14-37`
```rust
#[derive(Tags)]
enum ApiCategory {
    Health,
    Meta,
}

pub(crate) struct Api {
    health: health::HealthApi,
    meta: meta::MetaApi,
}

impl From<&Settings> for Api {
    fn from(value: &Settings) -> Self {
        let health = health::HealthApi;
        let meta = meta::MetaApi::from(&value.application);
        Self { health, meta }
    }
}

impl Api {
    pub fn apis(self) -> (health::HealthApi, meta::MetaApi) {
        (self.health, self.meta)
    }
}
```
**Key Insights**:
- `Tags` enum groups APIs into categories for OpenAPI documentation
- Aggregator struct (`Api`) holds all API handler instances
- Dependency injection via `From<&Settings>` trait
- `.apis()` method returns tuple of all handlers
#### 3. OpenAPI Handler Definition Pattern
**Location**: `src/route/health.rs:7-29`
```rust
#[derive(ApiResponse)]
enum HealthResponse {
    #[oai(status = 200)]
    Ok,
    #[oai(status = 429)]
    TooManyRequests,
}

#[derive(Default, Clone)]
pub struct HealthApi;

#[OpenApi(tag = "ApiCategory::Health")]
impl HealthApi {
    #[oai(path = "/health", method = "get")]
    async fn ping(&self) -> HealthResponse {
        tracing::event!(target: "backend::health", tracing::Level::DEBUG,
            "Accessing health-check endpoint");
        HealthResponse::Ok
    }
}
```
**Key Insights**:
- Response types are enums with `#[derive(ApiResponse)]`
- Each variant maps to HTTP status code via `#[oai(status = N)]`
- Handlers use `#[OpenApi(tag = "...")]` for categorization
- Type-safe responses at compile time
- Tracing at architectural boundaries
#### 4. JSON Response Pattern with DTOs
**Location**: `src/route/meta.rs:9-56`
```rust
#[derive(Object, Debug, Clone, serde::Serialize, serde::Deserialize)]
struct Meta {
    version: String,
    name: String,
}

#[derive(ApiResponse)]
enum MetaResponse {
    #[oai(status = 200)]
    Meta(Json<Meta>),
    #[oai(status = 429)]
    TooManyRequests,
}

#[OpenApi(tag = "ApiCategory::Meta")]
impl MetaApi {
    #[oai(path = "/meta", method = "get")]
    async fn meta(&self) -> Result<MetaResponse> {
        Ok(MetaResponse::Meta(Json(self.into())))
    }
}
```
**Key Insights**:
- DTOs use `#[derive(Object)]` for OpenAPI schema generation
- Response variants can hold `Json<T>` payloads
- Handler struct holds state/configuration
- Returns `Result<MetaResponse>` for error handling
#### 5. Middleware Composition Pattern
**Location**: `src/startup.rs:59-91`
```rust
let app = value
    .app
    .with(RateLimit::new(&rate_limit_config))
    .with(Cors::new())
    .data(value.settings);
```
**Key Insights**:
- Middleware applied via `.with()` method chaining
- Order matters: RateLimit → CORS → data injection
- Settings injected as shared data via `.data()`
- Configuration drives middleware behavior
#### 6. Configuration Management Pattern
**Location**: `src/settings.rs:40-62`
```rust
let settings = config::Config::builder()
    .add_source(config::File::from(settings_directory.join("base.yaml")))
    .add_source(config::File::from(
        settings_directory.join(environment_filename),
    ))
    .add_source(
        config::Environment::with_prefix("APP")
            .prefix_separator("__")
            .separator("__"),
    )
    .build()?;
```
**Key Insights**:
- Three-tier configuration: base → environment-specific → env vars
- Environment detected via `APP_ENVIRONMENT` variable
- Environment variables use `APP__` prefix with double underscore separators
- Type-safe deserialization
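To make the `APP__` naming convention concrete, the sketch below maps an environment variable name to the dotted settings path it overrides. It only illustrates the convention under the prefix and separator settings shown above; the `config` crate performs the real parsing, and `env_var_to_settings_path` is a hypothetical helper.

```rust
/// Maps an environment variable name to a dotted settings path under the
/// `APP__` convention, e.g. `APP__APPLICATION__NAME` -> `application.name`.
/// Returns `None` when the prefix is missing or no key follows it.
pub fn env_var_to_settings_path(var: &str) -> Option<String> {
    let rest = var.strip_prefix("APP__")?;
    if rest.is_empty() {
        return None;
    }
    // Each double underscore marks one level of nesting; keys are lowercased.
    let segments: Vec<String> = rest.split("__").map(str::to_lowercase).collect();
    Some(segments.join("."))
}
```

Note that `APP_ENVIRONMENT` uses a single underscore, so it does not match the `APP__` prefix and is not treated as a settings override.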
#### 7. Testing Pattern
**Location**: `src/route/health.rs:31-38`
```rust
#[tokio::test]
async fn health_check_works() {
    let app = crate::get_test_app();
    let cli = poem::test::TestClient::new(app);
    let resp = cli.get("/api/health").send().await;
    resp.assert_status_is_ok();
}
```
**Key Insights**:
- Test helper creates full application with random port
- `TestClient` provides fluent assertion API
- Tests are async with `#[tokio::test]`
- Real application used in tests
### Type System Best Practices
The current code already applies type-driven development (TyDD):
- `Environment` enum instead of strings
- `RateLimitConfig` newtype instead of raw numbers
- `ApiResponse` enums for type-safe HTTP responses
### Architecture Compliance
**Current Layers**:
1. **Presentation Layer**: `src/route/*` - HTTP adapters
2. **Infrastructure Layer**: `src/middleware/*`, `src/startup.rs`, `src/telemetry.rs`
**Missing Layers** (to be added for Modbus):
3. **Domain Layer**: Pure relay logic, no Modbus knowledge
4. **Application Layer**: Use cases (get status, toggle)
---
## Integration Recommendations
### Recommended Architecture for Modbus Feature
Following hexagonal architecture principles from constitution:
```
src/
├── domain/
│   └── relay/
│       ├── mod.rs            - Domain types (RelayId, RelayState, Relay)
│       ├── relay.rs          - Relay entity
│       ├── error.rs          - Domain errors
│       └── repository.rs     - RelayRepository trait
├── application/
│   └── relay/
│       ├── mod.rs            - Use case exports
│       ├── get_status.rs     - GetRelayStatus use case
│       ├── toggle.rs         - ToggleRelay use case
│       └── bulk_control.rs   - BulkControl use case
├── infrastructure/
│   └── modbus/
│       ├── mod.rs            - Modbus exports
│       ├── client.rs         - ModbusRelayRepository implementation
│       ├── config.rs         - Modbus configuration
│       └── error.rs          - Modbus-specific errors
└── route/
    └── relay.rs              - HTTP adapter (presentation layer)
```
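As an illustration of what the pure domain layer could contain, here is a minimal sketch of a `RelayState` type with no Modbus or HTTP dependencies. The method names, the `UnknownRelayState` error, and the mapping of a `true` coil to `On` are assumptions; the "on"/"off" strings follow the DTO comment in the route handler example.

```rust
use std::fmt;
use std::str::FromStr;

/// State of a single relay, independent of how it is transported.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelayState {
    On,
    Off,
}

/// Returned when a string is neither "on" nor "off" (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownRelayState(pub String);

impl RelayState {
    /// The opposite state, as needed by a toggle use case.
    pub fn toggled(self) -> Self {
        match self {
            Self::On => Self::Off,
            Self::Off => Self::On,
        }
    }

    pub fn as_str(self) -> &'static str {
        match self {
            Self::On => "on",
            Self::Off => "off",
        }
    }
}

/// Assumes an energized coil (`true`) means the relay is on.
impl From<bool> for RelayState {
    fn from(coil: bool) -> Self {
        if coil { Self::On } else { Self::Off }
    }
}

impl From<RelayState> for bool {
    fn from(state: RelayState) -> Self {
        state == RelayState::On
    }
}

impl fmt::Display for RelayState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for RelayState {
    type Err = UnknownRelayState;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "on" => Ok(Self::On),
            "off" => Ok(Self::Off),
            other => Err(UnknownRelayState(other.to_string())),
        }
    }
}
```

Keeping the `bool` and string conversions on the domain type means the infrastructure and presentation layers translate at their boundaries, while use cases work only with `RelayState`.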
### Integration Points
| Component | File | Action |
|-----------|------|--------|
| **API Category** | `src/route/mod.rs` | Add `Relay` to `ApiCategory` enum |
| **API Aggregator** | `src/route/mod.rs` | Add `relay: RelayApi` field to `Api` struct |
| **API Tuple** | `src/route/mod.rs` | Add `RelayApi` to `.apis()` return tuple |
| **Settings** | `src/settings.rs` | Add `ModbusSettings` struct and `modbus` field |
| **Config Files** | `settings/base.yaml` | Add `modbus:` section |
| **Shared State** | `src/startup.rs` | Inject `ModbusClient` via `.data()` |
| **Dependencies** | `Cargo.toml` | Add `tokio-modbus`, `async-trait`, `mockall` |
### Example: New Route Handler
```rust
// src/route/relay.rs
use poem::Result;
use poem_openapi::{ApiResponse, Object, OpenApi, payload::Json, param::Path};
use serde::{Deserialize, Serialize};
use crate::domain::relay::{RelayId, RelayState, Relay};

#[derive(Object, Serialize, Deserialize)]
struct RelayDto {
    id: u8,
    state: String, // "on" or "off"
    label: Option<String>,
}

#[derive(ApiResponse)]
enum RelayResponse {
    #[oai(status = 200)]
    Status(Json<RelayDto>),
    #[oai(status = 400)]
    BadRequest,
    #[oai(status = 503)]
    ServiceUnavailable,
}

#[OpenApi(tag = "ApiCategory::Relay")]
impl RelayApi {
    #[oai(path = "/relays/:id", method = "get")]
    async fn get_status(&self, id: Path<u8>) -> Result<RelayResponse> {
        // Invalid IDs map to the declared 400 variant so the OpenAPI spec stays accurate
        let Ok(relay_id) = RelayId::new(id.0) else {
            return Ok(RelayResponse::BadRequest);
        };
        // Use application layer use case
        match self.get_status_use_case.execute(relay_id).await {
            Ok(relay) => Ok(RelayResponse::Status(Json(relay.into()))),
            Err(_) => Ok(RelayResponse::ServiceUnavailable),
        }
    }
}
```
### Example: Settings Extension
```rust
// src/settings.rs
#[derive(Debug, serde::Deserialize, Clone)]
pub struct ModbusSettings {
    pub host: String,
    pub port: u16,
    pub slave_id: u8,
    pub timeout_seconds: u64,
}

#[derive(Debug, serde::Deserialize, Clone)]
pub struct Settings {
    pub application: ApplicationSettings,
    pub debug: bool,
    pub frontend_url: String,
    pub rate_limit: RateLimitSettings,
    pub modbus: ModbusSettings, // New field
}
```
```yaml
# settings/base.yaml
modbus:
  host: "192.168.1.100"
  port: 502
  slave_id: 1
  timeout_seconds: 3
```
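The raw settings values still need converting into the types the Modbus client works with. The sketch below shows two hypothetical helper methods, `timeout` and `socket_addr`, on a copy of `ModbusSettings`; the serde derive is omitted so the example depends only on the standard library.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};
use std::time::Duration;

/// Mirror of the settings struct above, without the serde derive.
#[derive(Debug, Clone)]
pub struct ModbusSettings {
    pub host: String,
    pub port: u16,
    pub slave_id: u8,
    pub timeout_seconds: u64,
}

impl ModbusSettings {
    /// Per-operation timeout as a `Duration`.
    pub fn timeout(&self) -> Duration {
        Duration::from_secs(self.timeout_seconds)
    }

    /// Resolves `host:port` to the first matching socket address.
    /// IP literals parse directly; hostnames trigger a blocking DNS lookup,
    /// so call this at startup rather than inside an async handler.
    pub fn socket_addr(&self) -> io::Result<SocketAddr> {
        (self.host.as_str(), self.port)
            .to_socket_addrs()?
            .next()
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::NotFound, "host resolved to no addresses")
            })
    }
}
```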
---
## Summary
### Key Takeaways
1. **tokio-modbus 0.17.0**: Excellent choice, use trait abstraction for testability
2. **HTTP Polling**: Maintain spec decision, simpler and adequate for scale
3. **Hexagonal Architecture**: Add domain/application layers following existing patterns
4. **Type-Driven Development**: Apply newtype pattern (RelayId, RelayState)
5. **Testing**: Use mockall with async-trait for >90% coverage without hardware
### Next Steps
1. **Clarifying Questions**: Resolve ambiguities in requirements
2. **Architecture Design**: Create multiple implementation approaches
3. **Final Plan**: Select approach and create detailed implementation plan
4. **Implementation**: Follow TDD workflow with types-first design
---
**End of Research Document**