# Research Document: Modbus Relay Control System

**Created**: 2025-12-28
**Feature**: [spec.md](./spec.md)
**Status**: Complete

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Tokio-Modbus Research](#tokio-modbus-research)
3. [WebSocket vs HTTP Polling](#websocket-vs-http-polling)
4. [Existing Codebase Patterns](#existing-codebase-patterns)
5. [Integration Recommendations](#integration-recommendations)

---

## Executive Summary

### Key Decisions

| Decision Area | Recommendation | Rationale |
|---------------------------|--------------------------------------|---------------------------------------------------------|
| **Modbus Library** | tokio-modbus 0.17.0 | Native async/await, production-ready, good testability |
| **Communication Pattern** | HTTP polling (as in spec) | Simpler, reliable, adequate for 10 users @ 2-second intervals |
| **Connection Management** | `Arc<Mutex<Context>>` for MVP | Single device, simple, can upgrade later if needed |
| **Retry Strategy** | Simple retry-once helper | Matches FR-007 requirement |
| **Testing Approach** | Trait-based abstraction with mockall | Enables >90% coverage without hardware |

### User Input Analysis

**User requested**: "Use tokio-modbus crate, poem-openapi for REST API, Vue.js with WebSocket for real-time updates"

**Findings**:

- ✅ tokio-modbus 0.17.0: Excellent choice, validated by research
- ✅ poem-openapi: Already in use, working well
- ⚠️ **WebSocket vs HTTP polling**: The spec calls for HTTP polling (FR-028). WebSocket adds roughly 43x more code for negligible benefit at this scale.

**RECOMMENDATION**: Keep HTTP polling as specified. WebSocket complexity is not justified for 10 concurrent users with 2-second update intervals.

### Deployment Architecture

**User clarification (2025-12-29)**: Frontend on Cloudflare Pages, backend on Raspberry Pi behind Traefik with Authelia.

**Architecture**:

- **Frontend**: Cloudflare Pages (Vue 3 static build) - global CDN delivery
- **Backend**: Raspberry Pi HTTP API (same local network as the Modbus device)
- **Reverse Proxy**: Traefik on the Raspberry Pi
  - HTTPS termination (TLS certificates)
  - Authelia middleware for authentication
  - Routes frontend requests to the backend HTTP service
- **Communication Flow**:
  - Frontend (CDN) → HTTPS → Traefik (HTTPS termination + auth) → Backend (HTTP) → Modbus TCP → Device

**Security**:

- Frontend-Backend: HTTPS via Traefik (encrypted, authenticated)
- Backend-Device: Modbus TCP on the local network (unencrypted, local only)

---

## Tokio-Modbus Research

### Decision: Recommended Patterns

**Primary Recommendation**: Use tokio-modbus 0.17.0 with a custom trait-based abstraction layer (a `RelayController` trait) for testability. Manage the connection with a shared `Arc<Mutex<Context>>` for the MVP.

### Technical Details

**Version**: tokio-modbus 0.17.0 (latest stable, released 2025-10-22)

**Protocol**: Modbus TCP (native TCP protocol)

- Hardware configured to use the native Modbus TCP protocol
- Uses the MBAP (Modbus Application Protocol) header
- No CRC16 validation (TCP/IP handles error detection)
- Standard Modbus TCP on port 502

**Connection Strategy**:

- Shared `Arc<Mutex<Context>>` for simplicity
- Single persistent connection (only one device)
- Can migrate to a dedicated async task pattern if reconnection logic is needed

**Timeout Handling**:

- Wrap all operations with `tokio::time::timeout(Duration::from_secs(3), ...)`
- **CRITICAL**: tokio-modbus has NO built-in timeouts

**Retry Logic**:

- Implement a simple retry-once helper per FR-007
- Matches the specification requirement

**Testing**:

- Use the `mockall` crate with `async-trait` for unit testing
- Trait abstraction enables testing without hardware
- Supports the >90% test coverage target (NFR-013)

### Critical Gotchas

1. **Device Protocol Configuration**: The hardware MUST be configured to use the Modbus TCP protocol (not RTU over TCP) via the VirCom software
   - Set "Transfer Protocol" to "Modbus TCP protocol" in Advanced Settings
   - The device automatically switches to port 502 when the TCP protocol is selected
2. **Device Gateway Configuration**: The hardware MUST be set to "Multi-host non-storage type" - the default storage type sends spurious queries that cause failures
3. **No Built-in Timeouts**: tokio-modbus has NO automatic timeouts - every operation must be wrapped with `tokio::time::timeout`
4. **Address Indexing**: Relays are labeled 1-8, but Modbus coil addresses are 0-7 - use a newtype with conversion methods (see the sketch after this list)
5. **Nested Result Handling**: Client calls return `Result<Result<T, Exception>, std::io::Error>` - both layers must be handled; with the timeout wrapper this becomes three layers, hence the `???` triple-question-mark pattern
6. **Concurrent Access**: A `Context` cannot be shared across tasks as-is - it requires an `Arc<Mutex<...>>` or serialization through a dedicated task
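Gotcha 4 lends itself to the newtype pattern already used elsewhere in the codebase. A minimal sketch of the two domain types the rest of this document refers to — `InvalidRelayId` and all method names are illustrative, not a final API:

```rust
/// Relay identifier as labeled on the hardware (1-8).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RelayId(u8);

impl RelayId {
    /// Accept only the labels printed on the device.
    pub fn new(label: u8) -> Result<Self, InvalidRelayId> {
        if (1..=8).contains(&label) {
            Ok(Self(label))
        } else {
            Err(InvalidRelayId(label))
        }
    }

    /// Zero-based coil address expected by Modbus.
    pub fn coil_address(self) -> u16 {
        u16::from(self.0 - 1)
    }
}

/// Error for out-of-range relay labels.
#[derive(Debug)]
pub struct InvalidRelayId(u8);

/// On/off state of one relay channel.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelayState {
    On,
    Off,
}

impl From<bool> for RelayState {
    fn from(coil: bool) -> Self {
        if coil { Self::On } else { Self::Off }
    }
}
```

This keeps the 1-based/0-based conversion in exactly one place, so an off-by-one can never leak into the Modbus layer.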
### Code Examples

**Basic Connection Setup**:

```rust
use tokio_modbus::prelude::*;
use tokio::time::{timeout, Duration};

// Connect to the device using Modbus TCP on the standard port 502
let socket_addr = "192.168.1.200:502".parse()?;
let mut ctx = tcp::connect(socket_addr).await?;

// Set slave ID (unit identifier)
ctx.set_slave(Slave(0x01));

// Read all 8 relay states with a timeout
let states = timeout(
    Duration::from_secs(3),
    ctx.read_coils(0x0000, 8)
).await???; // Triple-? handles timeout + transport + exception errors
```

**Note**: Modbus TCP uses the standard MBAP header and does not require CRC16 validation. The protocol is cleaner and more standardized than RTU over TCP.

**Toggle Relay with Retry**:

```rust
async fn toggle_relay(
    ctx: &mut Context,
    relay_id: u8, // 1-8 as labeled on the hardware
) -> Result<(), RelayError> {
    let addr = u16::from(relay_id - 1); // Convert to coil address 0-7

    // Read the current state
    let states = timeout(Duration::from_secs(3), ctx.read_coils(addr, 1))
        .await???;
    let new_state = !states[0];

    // First write attempt
    let first = timeout(
        Duration::from_secs(3),
        ctx.write_single_coil(addr, new_state),
    )
    .await;
    if matches!(first, Ok(Ok(Ok(())))) {
        return Ok(());
    }

    // Retry once on failure (FR-007)
    tracing::warn!("Write failed, retrying once");
    timeout(Duration::from_secs(3), ctx.write_single_coil(addr, new_state))
        .await???;
    Ok(())
}
```
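The `???` chains above only compile if each error layer converts into the function's error type. A minimal sketch of such a `RelayError`, assuming the transport error is `std::io::Error` and the exception type is exported as `tokio_modbus::Exception` — verify both names against the pinned tokio-modbus version:

```rust
use tokio::time::error::Elapsed;

/// Unified error for relay operations; a sketch, not the final domain error.
#[derive(Debug)]
pub enum RelayError {
    /// The 3-second `tokio::time::timeout` elapsed.
    Timeout,
    /// Transport-level failure (connection refused, reset, ...).
    Transport(std::io::Error),
    /// The device answered with a Modbus exception code.
    Exception(tokio_modbus::Exception),
}

impl From<Elapsed> for RelayError {
    fn from(_: Elapsed) -> Self {
        Self::Timeout
    }
}

impl From<std::io::Error> for RelayError {
    fn from(e: std::io::Error) -> Self {
        Self::Transport(e)
    }
}

impl From<tokio_modbus::Exception> for RelayError {
    fn from(e: tokio_modbus::Exception) -> Self {
        Self::Exception(e)
    }
}
```

With these `From` impls in place, `timeout(...).await???` converts the elapsed-timeout, transport, and exception layers into `RelayError` automatically.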
**Trait-Based Abstraction for Testing**:

```rust
use std::sync::Arc;

use async_trait::async_trait;
use mockall::mock;
use tokio::sync::Mutex;

#[async_trait]
pub trait RelayController: Send + Sync {
    async fn read_all_states(&mut self) -> Result<Vec<RelayState>, RelayError>;
    async fn write_state(&mut self, relay_id: RelayId, state: RelayState) -> Result<(), RelayError>;
}

// Real implementation with tokio-modbus
pub struct ModbusRelayController {
    ctx: Arc<Mutex<Context>>,
}

#[async_trait]
impl RelayController for ModbusRelayController {
    async fn read_all_states(&mut self) -> Result<Vec<RelayState>, RelayError> {
        let mut ctx = self.ctx.lock().await;
        let coils = timeout(Duration::from_secs(3), ctx.read_coils(0, 8))
            .await
            .map_err(|_| RelayError::Timeout)?
            .map_err(RelayError::Transport)?
            .map_err(RelayError::Exception)?;
        // Convert raw coil booleans into domain states
        Ok(coils.into_iter().map(RelayState::from).collect())
    }

    // ... other methods
}

// Mock for testing (using mockall)
mock! {
    pub RelayController {}

    #[async_trait]
    impl RelayController for RelayController {
        async fn read_all_states(&mut self) -> Result<Vec<RelayState>, RelayError>;
        async fn write_state(&mut self, relay_id: RelayId, state: RelayState) -> Result<(), RelayError>;
    }
}
```
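A unit test can then drive the generated `MockRelayController` without any hardware; a minimal sketch, with canned expectation values that are purely illustrative:

```rust
#[tokio::test]
async fn reads_states_from_mocked_controller() {
    let mut controller = MockRelayController::new();

    // Expect exactly one poll and return a canned answer: all eight relays off
    controller
        .expect_read_all_states()
        .times(1)
        .returning(|| Ok(vec![RelayState::Off; 8]));

    let states = controller.read_all_states().await.unwrap();
    assert_eq!(states, vec![RelayState::Off; 8]);
}
```

Because the application layer depends only on the trait, the same test shape works for every use case that consumes a `RelayController`.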
### Alternatives Considered

1. **modbus-robust**: Provides auto-reconnection but lacks retry logic and timeouts - insufficient for production
2. **bb8 connection pool**: Overkill for a single-device scenario, adds unnecessary complexity
3. **Synchronous modbus-rs**: Would block Tokio worker threads, poor scalability for concurrent users
4. **Custom Modbus implementation**: Reinvents the wheel, error-prone, significant development time

### Resources

- [GitHub - slowtec/tokio-modbus](https://github.com/slowtec/tokio-modbus)
- [tokio-modbus on docs.rs](https://docs.rs/tokio-modbus/)
- [Context7 MCP: `/slowtec/tokio-modbus`](mcp://context7/slowtec/tokio-modbus)
- [Context7 MCP: `/websites/rs_tokio-modbus_0_16_3_tokio_modbus`](mcp://context7/websites/rs_tokio-modbus_0_16_3_tokio_modbus)

---

## WebSocket vs HTTP Polling

### Recommendation: HTTP Polling (as specified)

The specification's decision to use HTTP polling is technically sound. **HTTP polling is the better choice** for this specific use case.

### Performance at Your Scale (10 users, 2-second intervals)

**Bandwidth Comparison**:

- HTTP polling: ~20 Kbps (10 users × 0.5 req/sec × 500 bytes × 8 bits/byte)
- WebSocket: ~2.4 Kbps sustained
- **Difference: 17.6 Kbps** - negligible on any modern network

**Server Load**:

- HTTP polling: 5 requests/second system-wide (trivial)
- WebSocket: 10 persistent connections (~80-160 KB memory)
- **Verdict: both are trivial at this scale**

### Implementation Complexity

**HTTP Polling**:

- Backend: 0 lines (reuse the existing `GET /api/relays`)
- Frontend: ~10 lines (a simple `setInterval`)
- **Total effort: 15 minutes**

**WebSocket**:

- Backend: ~115 lines (handler + background poller + channel setup)
- Frontend: ~135 lines (WebSocket manager + reconnection logic)
- Testing: ~180 lines (connection lifecycle + reconnection tests)
- **Total effort: 2-3 days + ongoing maintenance**

**Complexity ratio: ~43x more code for WebSocket**

### Reliability & Error Handling

**HTTP Polling Advantages**:

- Stateless (automatic recovery on the next poll)
- Standard HTTP error codes
- Works everywhere (proxies, firewalls, old browsers)
- No connection state management
- Simple testing

**WebSocket Challenges**:

- Connection lifecycle management
- Exponential-backoff reconnection logic
- State synchronization on reconnect
- Thundering-herd problem (all clients reconnect after a server restart)
- May fail behind corporate proxies (requires a fallback to HTTP polling anyway)

### Decision Matrix

| Criterion | HTTP Polling | WebSocket | Weight |
|-----------|--------------|-----------|--------|
| Simplicity | 5 | 2 | 3x |
| Reliability | 5 | 3 | 3x |
| Testing | 5 | 2 | 2x |
| Performance @ 10 users | 4 | 5 | 1x |
| Scalability to 100+ | 3 | 5 | 1x |
| Architecture fit | 5 | 3 | 2x |

**Weighted Scores**:

- **HTTP Polling: 4.56/5**
- **WebSocket: 3.19/5**

HTTP polling scores **43% higher** when complexity, reliability, and testing are properly weighted for this project's scale.

### When WebSocket Makes Sense

WebSocket advantages manifest at:

- **100+ concurrent users** (the 4x throughput advantage becomes meaningful)
- **Sub-second update requirements** (<1-second intervals)
- **High-frequency updates** where latency matters
- **Bidirectional communication** (chat, gaming, trading systems)

For relay control with 2-second polling:

- Latency: 0-4 seconds (avg 2 sec) - **acceptable for lights/pumps**
- Not a real-time-critical system (not chat, gaming, or trading)

### Migration Path (If Needed Later)

Starting with HTTP polling does NOT prevent WebSocket adoption later:

1. **Phase 1**: Add an `/api/ws` endpoint (non-breaking change)
2. **Phase 2**: Progressive enhancement (detect WebSocket support)
3. **Phase 3**: Gradual rollout with monitoring

**Key Point**: HTTP polling provides a baseline. Adding WebSocket later is straightforward, but removing WebSocket complexity is harder.
### Poem WebSocket Support (For Reference)

Poem has excellent WebSocket support through `poem::web::websocket`:

```rust
use futures_util::{SinkExt, StreamExt};
use poem::web::websocket::{Message, WebSocket};
use poem::web::Data;
use poem::{handler, IntoResponse};
use tokio::sync::watch;

#[handler]
async fn ws_handler(
    ws: WebSocket,
    state_tx: Data<&watch::Sender<RelayStates>>,
) -> impl IntoResponse {
    // Subscribe before the upgrade so the spawned task owns the receiver
    let mut rx = state_tx.subscribe();

    ws.on_upgrade(move |socket| async move {
        // Incoming messages are ignored in this sketch
        let (mut sink, _stream) = socket.split();

        // Send the initial state snapshot
        let initial = rx.borrow().clone();
        let json = serde_json::to_string(&initial).expect("relay state serializes");
        if sink.send(Message::text(json)).await.is_err() {
            return;
        }

        // Stream updates
        while rx.changed().await.is_ok() {
            let state = rx.borrow().clone();
            let json = serde_json::to_string(&state).expect("relay state serializes");
            if sink.send(Message::text(json)).await.is_err() {
                break;
            }
        }
    })
}
```

**Broadcasting Pattern**: Use a `tokio::sync::watch` channel (sketched below):

- Maintains only the most recent value (perfect for relay state)
- Deduplication of identical states via `send_if_modified`
- New connections get an immediate state snapshot
- Memory-efficient (a single state copy)
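For reference, a minimal sketch of that wiring, assuming the 2-second poll interval from the spec, a placeholder `RelayStates` snapshot type, and a stubbed `poll_device` helper:

```rust
use std::time::Duration;
use tokio::sync::watch;

/// Serializable snapshot of all eight relay states (placeholder shape).
#[derive(Clone, Debug, PartialEq, serde::Serialize)]
pub struct RelayStates([bool; 8]);

/// Spawn the background poller that owns the sending half of the channel.
/// The app stores the returned receiver; each WebSocket connection clones
/// it (equivalent to `Sender::subscribe` in the handler above).
pub fn spawn_state_broadcaster() -> watch::Receiver<RelayStates> {
    let (tx, rx) = watch::channel(RelayStates([false; 8]));

    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(2));
        loop {
            ticker.tick().await;
            let snapshot = poll_device().await;
            // Only wake subscribers when the state actually changed:
            // this is the deduplication mentioned above.
            tx.send_if_modified(|current| {
                if *current == snapshot {
                    false
                } else {
                    *current = snapshot;
                    true
                }
            });
        }
    });

    rx
}

/// Placeholder: a real implementation would read the coils via Modbus.
async fn poll_device() -> RelayStates {
    RelayStates([false; 8])
}
```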
### Resources

- [Poem WebSocket API Documentation](https://docs.rs/poem/latest/poem/web/websocket/)
- [HTTP vs WebSockets Performance](https://blog.feathersjs.com/http-vs-websockets-a-performance-comparison-da2533f13a77)
- [Tokio Channels Tutorial](https://tokio.rs/tokio/tutorial/channels)

---

## Existing Codebase Patterns

### Architecture Overview

The current codebase is a well-structured Rust backend API using the Poem framework with OpenAPI support, following clean-architecture principles.

**Current Structure**:

```
src/
├── lib.rs       - Library entry point, orchestrates application setup
├── main.rs      - Binary entry point, calls lib::run()
├── startup.rs   - Application builder, server configuration, route setup
├── settings.rs  - Configuration from YAML files + environment variables
├── telemetry.rs - Logging and tracing setup
├── route/       - HTTP endpoint handlers
│   ├── mod.rs     - API aggregation and OpenAPI tags
│   ├── health.rs  - Health check endpoints
│   └── meta.rs    - Application metadata endpoints
└── middleware/  - Custom middleware implementations
    ├── mod.rs
    └── rate_limit.rs - Rate limiting middleware using governor
```

### Key Patterns Discovered

#### 1. Route Registration Pattern

**Location**: `src/startup.rs:95-107`

```rust
fn setup_app(settings: &Settings) -> poem::Route {
    let api_service = OpenApiService::new(
        Api::from(settings).apis(),
        settings.application.clone().name,
        settings.application.clone().version,
    )
    .url_prefix("/api");
    let ui = api_service.swagger_ui();

    poem::Route::new()
        .nest("/api", api_service.clone())
        .nest("/specs", api_service.spec_endpoint_yaml())
        .nest("/", ui)
}
```

**Key Insights**:

- OpenAPI service created with all API handlers via the `.apis()` tuple
- URL prefix `/api` applied to all API routes
- Swagger UI automatically mounted at the root `/`
- OpenAPI spec YAML available at `/specs`

#### 2. API Handler Organization Pattern

**Location**: `src/route/mod.rs:14-37`

```rust
#[derive(Tags)]
enum ApiCategory {
    Health,
    Meta,
}

pub(crate) struct Api {
    health: health::HealthApi,
    meta: meta::MetaApi,
}

impl From<&Settings> for Api {
    fn from(value: &Settings) -> Self {
        let health = health::HealthApi;
        let meta = meta::MetaApi::from(&value.application);
        Self { health, meta }
    }
}

impl Api {
    pub fn apis(self) -> (health::HealthApi, meta::MetaApi) {
        (self.health, self.meta)
    }
}
```

**Key Insights**:

- The `Tags` enum groups APIs into categories for OpenAPI documentation
- An aggregator struct (`Api`) holds all API handler instances
- Dependency injection via the `From<&Settings>` trait
- The `.apis()` method returns a tuple of all handlers

#### 3. OpenAPI Handler Definition Pattern

**Location**: `src/route/health.rs:7-29`

```rust
#[derive(ApiResponse)]
enum HealthResponse {
    #[oai(status = 200)]
    Ok,
    #[oai(status = 429)]
    TooManyRequests,
}

#[derive(Default, Clone)]
pub struct HealthApi;

#[OpenApi(tag = "ApiCategory::Health")]
impl HealthApi {
    #[oai(path = "/health", method = "get")]
    async fn ping(&self) -> HealthResponse {
        tracing::event!(target: "backend::health", tracing::Level::DEBUG, "Accessing health-check endpoint");
        HealthResponse::Ok
    }
}
```

**Key Insights**:

- Response types are enums with `#[derive(ApiResponse)]`
- Each variant maps to an HTTP status code via `#[oai(status = N)]`
- Handlers use `#[OpenApi(tag = "...")]` for categorization
- Type-safe responses at compile time
- Tracing at architectural boundaries

#### 4. JSON Response Pattern with DTOs

**Location**: `src/route/meta.rs:9-56`

```rust
#[derive(Object, Debug, Clone, serde::Serialize, serde::Deserialize)]
struct Meta {
    version: String,
    name: String,
}

#[derive(ApiResponse)]
enum MetaResponse {
    #[oai(status = 200)]
    Meta(Json<Meta>),
    #[oai(status = 429)]
    TooManyRequests,
}

#[OpenApi(tag = "ApiCategory::Meta")]
impl MetaApi {
    #[oai(path = "/meta", method = "get")]
    async fn meta(&self) -> Result<MetaResponse> {
        Ok(MetaResponse::Meta(Json(self.into())))
    }
}
```

**Key Insights**:

- DTOs use `#[derive(Object)]` for OpenAPI schema generation
- Response variants can hold `Json<T>` payloads
- The handler struct holds state/configuration
- Returns `Result<MetaResponse>` for error handling

#### 5. Middleware Composition Pattern

**Location**: `src/startup.rs:59-91`

```rust
let app = value
    .app
    .with(RateLimit::new(&rate_limit_config))
    .with(Cors::new())
    .data(value.settings);
```

**Key Insights**:

- Middleware applied via `.with()` method chaining
- Order matters: RateLimit → CORS → data injection
- Settings injected as shared data via `.data()`
- Configuration drives middleware behavior

#### 6. Configuration Management Pattern

**Location**: `src/settings.rs:40-62`

```rust
let settings = config::Config::builder()
    .add_source(config::File::from(settings_directory.join("base.yaml")))
    .add_source(config::File::from(
        settings_directory.join(environment_filename),
    ))
    .add_source(
        config::Environment::with_prefix("APP")
            .prefix_separator("__")
            .separator("__"),
    )
    .build()?;
```

**Key Insights**:

- Three-tier configuration: base → environment-specific → env vars
- Environment detected via the `APP_ENVIRONMENT` variable
- Environment variables use the `APP__` prefix with double-underscore separators
- Type-safe deserialization

#### 7. Testing Pattern

**Location**: `src/route/health.rs:31-38`

```rust
#[tokio::test]
async fn health_check_works() {
    let app = crate::get_test_app();
    let cli = poem::test::TestClient::new(app);
    let resp = cli.get("/api/health").send().await;
    resp.assert_status_is_ok();
}
```

**Key Insights**:

- A test helper creates the full application with a random port
- `TestClient` provides a fluent assertion API
- Tests are async with `#[tokio::test]`
- The real application is used in tests

### Type System Best Practices

The current code demonstrates excellent type-driven development (TyDD):

- An `Environment` enum instead of strings
- A `RateLimitConfig` newtype instead of raw numbers
- `ApiResponse` enums for type-safe HTTP responses

### Architecture Compliance

**Current Layers**:

1. **Presentation Layer**: `src/route/*` - HTTP adapters
2. **Infrastructure Layer**: `src/middleware/*`, `src/startup.rs`, `src/telemetry.rs`

**Missing Layers** (to be added for Modbus):

3. **Domain Layer**: Pure relay logic, no Modbus knowledge
4. **Application Layer**: Use cases (get status, toggle) - see the sketch below
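To make the missing layers concrete, a sketch of one application-layer use case built on the `RelayController` trait from the research section. Names are illustrative, `RelayState` is assumed to be `Copy`, and the real use case would likely return the `Relay` entity rather than a bare state:

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

/// Application-layer use case: read the status of a single relay.
/// It depends only on the `RelayController` trait, so the mockall mock
/// shown earlier can stand in during unit tests.
pub struct GetRelayStatus<C: RelayController> {
    controller: Arc<Mutex<C>>,
}

impl<C: RelayController> GetRelayStatus<C> {
    pub fn new(controller: Arc<Mutex<C>>) -> Self {
        Self { controller }
    }

    pub async fn execute(&self, relay_id: RelayId) -> Result<RelayState, RelayError> {
        let mut controller = self.controller.lock().await;
        let states = controller.read_all_states().await?;
        // The zero-based coil address doubles as an index into the snapshot
        Ok(states[relay_id.coil_address() as usize])
    }
}
```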
---

## Integration Recommendations

### Recommended Architecture for Modbus Feature

Following the hexagonal-architecture principles from the constitution:

```
src/
├── domain/
│   └── relay/
│       ├── mod.rs        - Domain types (RelayId, RelayState, Relay)
│       ├── relay.rs      - Relay entity
│       ├── error.rs      - Domain errors
│       └── repository.rs - RelayRepository trait
├── application/
│   └── relay/
│       ├── mod.rs          - Use case exports
│       ├── get_status.rs   - GetRelayStatus use case
│       ├── toggle.rs       - ToggleRelay use case
│       └── bulk_control.rs - BulkControl use case
├── infrastructure/
│   └── modbus/
│       ├── mod.rs    - Modbus exports
│       ├── client.rs - ModbusRelayRepository implementation
│       ├── config.rs - Modbus configuration
│       └── error.rs  - Modbus-specific errors
└── route/
    └── relay.rs - HTTP adapter (presentation layer)
```

### Integration Points

| Component | File | Action |
|-----------|------|--------|
| **API Category** | `src/route/mod.rs` | Add `Relay` to the `ApiCategory` enum |
| **API Aggregator** | `src/route/mod.rs` | Add a `relay: RelayApi` field to the `Api` struct |
| **API Tuple** | `src/route/mod.rs` | Add `RelayApi` to the `.apis()` return tuple |
| **Settings** | `src/settings.rs` | Add a `ModbusSettings` struct and `modbus` field |
| **Config Files** | `settings/base.yaml` | Add a `modbus:` section |
| **Shared State** | `src/startup.rs` | Inject the Modbus client via `.data()` |
| **Dependencies** | `Cargo.toml` | Add `tokio-modbus`, `async-trait`, `mockall` |

### Example: New Route Handler

```rust
// src/route/relay.rs
use poem::http::StatusCode;
use poem::Result;
use poem_openapi::{param::Path, payload::Json, ApiResponse, Object, OpenApi};

use crate::domain::relay::{Relay, RelayId, RelayState};

#[derive(Object, Debug, Clone, serde::Serialize, serde::Deserialize)]
struct RelayDto {
    id: u8,
    state: String, // "on" or "off"
    label: Option<String>,
}

#[derive(ApiResponse)]
enum RelayResponse {
    #[oai(status = 200)]
    Status(Json<RelayDto>),
    #[oai(status = 400)]
    BadRequest,
    #[oai(status = 503)]
    ServiceUnavailable,
}

// `RelayApi` (definition not shown) holds the application-layer
// use cases, e.g. `get_status_use_case`.
#[OpenApi(tag = "ApiCategory::Relay")]
impl RelayApi {
    #[oai(path = "/relays/:id", method = "get")]
    async fn get_status(&self, id: Path<u8>) -> Result<RelayResponse> {
        let relay_id = RelayId::new(id.0)
            .map_err(|_| poem::Error::from_status(StatusCode::BAD_REQUEST))?;

        // Delegate to the application-layer use case
        match self.get_status_use_case.execute(relay_id).await {
            Ok(relay) => Ok(RelayResponse::Status(Json(relay.into()))),
            Err(_) => Ok(RelayResponse::ServiceUnavailable),
        }
    }
}
```

### Example: Settings Extension

```rust
// src/settings.rs
#[derive(Debug, serde::Deserialize, Clone)]
pub struct ModbusSettings {
    pub host: String,
    pub port: u16,
    pub slave_id: u8,
    pub timeout_seconds: u64,
}

#[derive(Debug, serde::Deserialize, Clone)]
pub struct Settings {
    pub application: ApplicationSettings,
    pub debug: bool,
    pub frontend_url: String,
    pub rate_limit: RateLimitSettings,
    pub modbus: ModbusSettings, // New field
}
```

```yaml
# settings/base.yaml
modbus:
  host: "192.168.1.100"
  port: 502
  slave_id: 1
  timeout_seconds: 3
```
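### Example: Startup Wiring (Sketch)

A sketch of how `ModbusSettings` could be consumed at startup and the shared connection injected via `.data()`, mirroring the existing middleware-composition pattern. `build_modbus_context` is illustrative, and `tcp::connect` is assumed to return an `std::io::Result<Context>` as in the basic connection example:

```rust
// src/startup.rs (sketch)
use std::net::SocketAddr;
use std::sync::Arc;
use tokio::sync::Mutex;
use tokio_modbus::prelude::*;

async fn build_modbus_context(
    settings: &ModbusSettings,
) -> std::io::Result<Arc<Mutex<Context>>> {
    // Misconfigured host/port should fail fast at startup
    let addr: SocketAddr = format!("{}:{}", settings.host, settings.port)
        .parse()
        .expect("modbus host/port must form a valid socket address");
    let mut ctx = tcp::connect(addr).await?;
    ctx.set_slave(Slave(settings.slave_id));
    Ok(Arc::new(Mutex::new(ctx)))
}

// Then, inside application setup (sketch):
// let modbus = build_modbus_context(&settings.modbus).await?;
// let app = app.data(modbus); // handlers extract Data<&Arc<Mutex<Context>>>
```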
---

## Summary

### Key Takeaways

1. **tokio-modbus 0.17.0**: Excellent choice; use a trait abstraction for testability
2. **HTTP polling**: Keep the spec decision; simpler and adequate at this scale
3. **Hexagonal architecture**: Add domain/application layers following existing patterns
4. **Type-driven development**: Apply the newtype pattern (`RelayId`, `RelayState`)
5. **Testing**: Use mockall with async-trait for >90% coverage without hardware

### Next Steps

1. **Clarifying Questions**: Resolve ambiguities in requirements
2. **Architecture Design**: Create multiple implementation approaches
3. **Final Plan**: Select an approach and create a detailed implementation plan
4. **Implementation**: Follow the TDD workflow with types-first design

---

**End of Research Document**