
Research Document: Modbus Relay Control System

Created: 2025-12-28 | Feature: spec.md | Status: Complete

Table of Contents

  1. Executive Summary
  2. Tokio-Modbus Research
  3. WebSocket vs HTTP Polling
  4. Existing Codebase Patterns
  5. Integration Recommendations

Executive Summary

Key Decisions

| Decision Area | Recommendation | Rationale |
| --- | --- | --- |
| Modbus Library | tokio-modbus 0.17.0 | Native async/await, production-ready, good testability |
| Communication Pattern | HTTP Polling (as in spec) | Simpler, reliable, adequate for 10 users @ 2s intervals |
| Connection Management | Arc<Mutex> for MVP | Single device, simple, can upgrade later if needed |
| Retry Strategy | Simple retry-once helper | Matches FR-007 requirement |
| Testing Approach | Trait-based abstraction with mockall | Enables >90% coverage without hardware |

User Input Analysis

User requested: "Use tokio-modbus crate, poem-openapi for REST API, Vue.js with WebSocket for real-time updates"

Findings:

  • tokio-modbus 0.17.0: Excellent choice, validated by research
  • poem-openapi: Already in use, working well
  • ⚠️ WebSocket vs HTTP Polling: Spec says HTTP polling (FR-028). WebSocket adds 43x complexity for negligible benefit at this scale.

RECOMMENDATION: Maintain HTTP polling as specified. WebSocket complexity not justified for 10 concurrent users with 2-second update intervals.

Deployment Architecture

User clarification (2025-12-29): Frontend on Cloudflare Pages, backend on Raspberry Pi behind Traefik with Authelia

Architecture:

  • Frontend: Cloudflare Pages (Vue 3 static build) - global CDN delivery
  • Backend: Raspberry Pi HTTP API (same local network as Modbus device)
  • Reverse Proxy: Traefik on Raspberry Pi
    • HTTPS termination (TLS certificates)
    • Authelia middleware for authentication
    • Routes frontend requests to backend HTTP service
  • Communication Flow:
    • Frontend (CDN) → HTTPS → Traefik (HTTPS termination + auth) → Backend (HTTP) → Modbus TCP → Device

Security:

  • Frontend-Backend: HTTPS via Traefik (encrypted, authenticated)
  • Backend-Device: Modbus TCP on local network (unencrypted, local only)

Tokio-Modbus Research

Primary Recommendation: Use tokio-modbus 0.17.0 with a custom trait-based abstraction layer (RelayController trait) for testability. Implement connection management using Arc<Mutex> for MVP.

Technical Details

Version: tokio-modbus 0.17.0 (latest stable, released 2025-10-22)

Protocol: Modbus TCP (native TCP protocol)

  • Hardware configured to use native Modbus TCP protocol
  • Uses MBAP (Modbus Application Protocol) header
  • No CRC16 validation (TCP/IP handles error detection)
  • Standard Modbus TCP protocol on port 502

Connection Strategy:

  • Shared Arc<Mutex<Context>> for simplicity
  • Single persistent connection (only one device)
  • Can migrate to dedicated async task pattern if reconnection logic needed

Timeout Handling:

  • Wrap all operations with tokio::time::timeout(Duration::from_secs(3), ...)
  • CRITICAL: tokio-modbus has NO built-in timeouts

Retry Logic:

  • Implement a simple retry-once helper per FR-007 (sketched below)
  • Matches specification requirement
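
A minimal sketch of that retry-once helper, assuming the nested-result API described above (the function and error names are illustrative, not existing project code):

use tokio::time::{timeout, Duration};
use tokio_modbus::client::Context;
use tokio_modbus::prelude::*;

// Placeholder error for this sketch; the real RelayError would carry separate
// timeout / transport / exception variants.
#[derive(Debug)]
pub struct RelayWriteError(pub String);

// Retry-once per FR-007: one retry, then give up.
pub async fn write_coil_with_retry(
    ctx: &mut Context,
    addr: u16,
    state: bool,
) -> Result<(), RelayWriteError> {
    for attempt in 1..=2 {
        // Timeout wrapper is mandatory: tokio-modbus has no built-in timeouts
        match timeout(Duration::from_secs(3), ctx.write_single_coil(addr, state)).await {
            Ok(Ok(Ok(()))) => return Ok(()),
            failure if attempt == 1 => {
                tracing::warn!("write to coil {addr} failed ({failure:?}), retrying once");
            }
            failure => return Err(RelayWriteError(format!("{failure:?}"))),
        }
    }
    unreachable!("the second attempt always returns")
}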

Testing:

  • Use mockall crate with async-trait for unit testing
  • Trait abstraction enables testing without hardware
  • Supports >90% test coverage target (NFR-013)

Critical Gotchas

  1. Device Protocol Configuration: Hardware MUST be configured to use Modbus TCP protocol (not RTU over TCP) via VirCom software

    • Set "Transfer Protocol" to "Modbus TCP protocol" in Advanced Settings
    • Device automatically switches to port 502 when TCP protocol is selected
  2. Device Gateway Configuration: Hardware MUST be set to "Multi-host non-storage type" - default storage type sends spurious queries causing failures

  3. No Built-in Timeouts: tokio-modbus has NO automatic timeouts - must wrap every operation with tokio::time::timeout

  4. Address Indexing: Relays labeled 1-8, but Modbus addresses are 0-7 (use newtype pattern with conversion methods; see the sketch after this list)

  5. Nested Result Handling: Returns Result<Result<T, Exception>, std::io::Error> - must handle both layers (use ??? triple-question-mark pattern)

  6. Concurrent Access: Context is not thread-safe - requires Arc<Mutex> or dedicated task serialization
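
A minimal sketch of the newtype from gotcha 4 (the RelayId name matches the plan; the method names here are illustrative):

// User-facing relay identifier (1-8) with conversion to the 0-based Modbus coil address
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RelayId(u8);

#[derive(Debug)]
pub struct InvalidRelayId(pub u8);

impl RelayId {
    /// Accepts only the user-facing range 1-8.
    pub fn new(id: u8) -> Result<Self, InvalidRelayId> {
        if (1..=8).contains(&id) {
            Ok(Self(id))
        } else {
            Err(InvalidRelayId(id))
        }
    }

    /// 0-based coil address expected by the device.
    pub fn coil_address(self) -> u16 {
        u16::from(self.0 - 1)
    }
}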

Code Examples

Basic Connection Setup:

use tokio_modbus::prelude::*;
use tokio::time::{timeout, Duration};

// Inside an async fn returning Result<(), Box<dyn std::error::Error>> (or similar)
// Connect to the device using Modbus TCP on the standard port 502
let socket_addr = "192.168.1.200:502".parse()?;
let mut ctx = tcp::connect(socket_addr).await?;

// Set slave ID (unit identifier)
ctx.set_slave(Slave(0x01));

// Read all 8 relay states with timeout
let states = timeout(
    Duration::from_secs(3),
    ctx.read_coils(0x0000, 8)
).await???; // Triple-? handles timeout + transport + exception errors

Note: Modbus TCP uses the standard MBAP header and does not require CRC16 validation. The protocol is cleaner and more standardized than RTU over TCP.

Toggle Relay with Retry:

async fn toggle_relay(
    ctx: &mut Context,
    relay_id: u8, // 1-8
) -> Result<(), RelayError> {
    let addr = (relay_id - 1) as u16; // Convert to 0-7

    // Read current state
    let states = timeout(Duration::from_secs(3), ctx.read_coils(addr, 1))
        .await???;
    let current = states[0];

    // Write the opposite state; RelayError is assumed to implement From for the
    // timeout, transport, and exception error types so `???` can propagate them.
    let new_state = !current;
    let first_attempt =
        timeout(Duration::from_secs(3), ctx.write_single_coil(addr, new_state)).await;

    // Retry once on failure (FR-007)
    match first_attempt {
        Ok(Ok(Ok(()))) => Ok(()),
        Err(_) | Ok(Err(_)) | Ok(Ok(Err(_))) => {
            tracing::warn!("Write failed, retrying once");
            timeout(Duration::from_secs(3), ctx.write_single_coil(addr, new_state))
                .await???;
            Ok(())
        }
    }
}

Trait-Based Abstraction for Testing:

use std::sync::Arc;

use async_trait::async_trait;
use mockall::mock;
use tokio::sync::Mutex;
use tokio::time::{timeout, Duration};
use tokio_modbus::client::Context;
use tokio_modbus::prelude::*;

// RelayError, RelayId, and RelayState are project domain types defined elsewhere.

#[async_trait]
pub trait RelayController: Send + Sync {
    async fn read_all_states(&mut self) -> Result<Vec<bool>, RelayError>;
    async fn write_state(&mut self, relay_id: RelayId, state: RelayState) -> Result<(), RelayError>;
}

// Real implementation with tokio-modbus
pub struct ModbusRelayController {
    ctx: Arc<Mutex<Context>>,
}

#[async_trait]
impl RelayController for ModbusRelayController {
    async fn read_all_states(&mut self) -> Result<Vec<bool>, RelayError> {
        let mut ctx = self.ctx.lock().await;
        timeout(Duration::from_secs(3), ctx.read_coils(0, 8))
            .await
            .map_err(|_| RelayError::Timeout)?
            .map_err(RelayError::Transport)?
            .map_err(RelayError::Exception)
    }
    // ... other methods
}

// Mock for testing (using mockall)
mock! {
    pub RelayController {}

    #[async_trait]
    impl RelayController for RelayController {
        async fn read_all_states(&mut self) -> Result<Vec<bool>, RelayError>;
        async fn write_state(&mut self, relay_id: RelayId, state: RelayState) -> Result<(), RelayError>;
    }
}
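
A short usage sketch of the generated mock (the test itself is illustrative; MockRelayController is the struct that the mock! block above produces):

#[tokio::test]
async fn read_all_states_can_be_tested_without_hardware() {
    let mut controller = MockRelayController::new();
    controller
        .expect_read_all_states()
        .times(1)
        .returning(|| Ok(vec![false; 8]));

    // The real tests would hand the mock to an application-layer use case;
    // calling the trait method directly keeps the sketch short.
    let states = controller.read_all_states().await.expect("mocked call succeeds");
    assert_eq!(states.len(), 8);
}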

Alternatives Considered

  1. modbus-robust: Provides auto-reconnection but lacks retry logic and timeouts - insufficient for production
  2. bb8 connection pool: Overkill for single-device scenario, adds unnecessary complexity
  3. Synchronous modbus-rs: Would block Tokio threads, poor scalability for concurrent users
  4. Custom Modbus implementation: Reinventing wheel, error-prone, significant development time


WebSocket vs HTTP Polling

Recommendation: HTTP Polling (as specified)

The specification's decision to use HTTP polling is technically sound: at this scale, it is the better choice for this use case.

Performance at Your Scale (10 users, 2-second intervals)

Bandwidth Comparison:

  • HTTP Polling: ~20 Kbps (10 users × 0.5 req/sec × 500 bytes × 8)
  • WebSocket: ~2.4 Kbps sustained
  • Difference: 17.6 Kbps - negligible on any modern network

Server Load:

  • HTTP Polling: 5 requests/second system-wide (trivial)
  • WebSocket: 10 persistent connections (~80-160 KB memory)
  • Verdict: Both are trivial at this scale

Implementation Complexity

HTTP Polling:

  • Backend: 0 lines (reuse existing GET /api/relays)
  • Frontend: ~10 lines (simple setInterval)
  • Total effort: 15 minutes

WebSocket:

  • Backend: ~115 lines (handler + background poller + channel setup)
  • Frontend: ~135 lines (WebSocket manager + reconnection logic)
  • Testing: ~180 lines (connection lifecycle + reconnection tests)
  • Total effort: 2-3 days + ongoing maintenance

Complexity ratio: 43x more code for WebSocket

Reliability & Error Handling

HTTP Polling Advantages:

  • Stateless (automatic recovery on next poll)
  • Standard HTTP error codes
  • Works everywhere (proxies, firewalls, old browsers)
  • No connection state management
  • Simple testing

WebSocket Challenges:

  • Connection lifecycle management
  • Exponential backoff reconnection logic
  • State synchronization on reconnect
  • Thundering herd problem (all clients reconnect after server restart)
  • May fail behind corporate proxies (requires fallback to HTTP polling anyway)

Decision Matrix

| Criterion | HTTP Polling | WebSocket | Weight |
| --- | --- | --- | --- |
| Simplicity | 5 | 2 | 3x |
| Reliability | 5 | 3 | 3x |
| Testing | 5 | 2 | 2x |
| Performance @ 10 users | 4 | 5 | 1x |
| Scalability to 100+ | 3 | 5 | 1x |
| Architecture fit | 5 | 3 | 2x |

Weighted Scores:

  • HTTP Polling: 4.56/5
  • WebSocket: 3.19/5

HTTP Polling scores 43% higher when complexity, reliability, and testing are properly weighted for this project's scale.

When WebSocket Makes Sense

WebSocket advantages manifest at:

  • 100+ concurrent users (4x throughput advantage becomes meaningful)
  • Sub-second update requirements (<1 second intervals)
  • High-frequency updates where latency matters
  • Bidirectional communication (chat, gaming, trading systems)

For relay control with 2-second polling:

  • Latency: 0-4 seconds (avg 2 sec) - acceptable for lights/pumps
  • Not a real-time critical system (not chat, gaming, or trading)

Migration Path (If Needed Later)

Starting with HTTP polling does NOT prevent WebSocket adoption later:

  1. Phase 1: Add /api/ws endpoint (non-breaking change)
  2. Phase 2: Progressive enhancement (detect WebSocket support)
  3. Phase 3: Gradual rollout with monitoring

Key Point: HTTP polling provides a baseline. Adding WebSocket later is straightforward, but removing WebSocket complexity is harder.

Poem WebSocket Support (For Reference)

Poem has excellent WebSocket support through poem::web::websocket:

use futures_util::{SinkExt, StreamExt};
use poem::web::websocket::{Message, WebSocket};
use poem::web::Data;
use poem::{handler, IntoResponse};
use tokio::sync::watch;

// RelayCollection is the project's relay-state DTO (assumed to be Clone + Serialize)

#[handler]
async fn ws_handler(
    ws: WebSocket,
    state_tx: Data<&watch::Sender<RelayCollection>>,
) -> impl IntoResponse {
    // Subscribe before upgrading so only the owned Receiver moves into the task
    let mut rx = state_tx.subscribe();

    ws.on_upgrade(move |socket| async move {
        let (mut sink, _stream) = socket.split();

        // Send initial state snapshot (send/serialization errors ignored in this sketch)
        let initial = rx.borrow().clone();
        if let Ok(text) = serde_json::to_string(&initial) {
            let _ = sink.send(Message::text(text)).await;
        }

        // Stream updates whenever the watch channel changes
        while rx.changed().await.is_ok() {
            let state = rx.borrow().clone();
            if let Ok(text) = serde_json::to_string(&state) {
                let _ = sink.send(Message::text(text)).await;
            }
        }
    })
}

Broadcasting Pattern: Use a tokio::sync::watch channel (sketched below):

  • Maintains only most recent value (perfect for relay state)
  • Deduplication of identical states (e.g. via watch::Sender::send_if_modified)
  • New connections get immediate state snapshot
  • Memory-efficient (single state copy)
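
A minimal sketch of that pattern, assuming a RelayCollection state type and a placeholder poll (both illustrative, not existing code):

use std::time::Duration;
use tokio::sync::watch;

// Illustrative state type; the real project would reuse its relay DTO collection
#[derive(Clone, Debug, PartialEq, Default)]
pub struct RelayCollection {
    pub states: Vec<bool>,
}

pub fn spawn_state_poller() -> watch::Receiver<RelayCollection> {
    let (tx, rx) = watch::channel(RelayCollection::default());

    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(2));
        loop {
            ticker.tick().await;
            let next = poll_relays().await;
            // Only wake subscribers when the state actually changed
            tx.send_if_modified(|current| {
                if *current == next {
                    false
                } else {
                    *current = next;
                    true
                }
            });
        }
    });

    rx
}

// Placeholder; the real implementation would go through the RelayController trait
async fn poll_relays() -> RelayCollection {
    RelayCollection { states: vec![false; 8] }
}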


Existing Codebase Patterns

Architecture Overview

The current codebase is a well-structured Rust backend API built on the Poem framework with OpenAPI support, following clean architecture principles.

Current Structure:

src/
├── lib.rs          - Library entry point, orchestrates application setup
├── main.rs         - Binary entry point, calls lib::run()
├── startup.rs      - Application builder, server configuration, route setup
├── settings.rs     - Configuration from YAML files + environment variables
├── telemetry.rs    - Logging and tracing setup
├── route/          - HTTP endpoint handlers
│   ├── mod.rs      - API aggregation and OpenAPI tags
│   ├── health.rs   - Health check endpoints
│   └── meta.rs     - Application metadata endpoints
└── middleware/     - Custom middleware implementations
    ├── mod.rs
    └── rate_limit.rs - Rate limiting middleware using governor

Key Patterns Discovered

1. Route Registration Pattern

Location: src/startup.rs:95-107

fn setup_app(settings: &Settings) -> poem::Route {
    let api_service = OpenApiService::new(
        Api::from(settings).apis(),
        settings.application.clone().name,
        settings.application.clone().version,
    )
    .url_prefix("/api");
    let ui = api_service.swagger_ui();
    poem::Route::new()
        .nest("/api", api_service.clone())
        .nest("/specs", api_service.spec_endpoint_yaml())
        .nest("/", ui)
}

Key Insights:

  • OpenAPI service created with all API handlers via .apis() tuple
  • URL prefix /api applied to all API routes
  • Swagger UI automatically mounted at root /
  • OpenAPI spec YAML available at /specs

2. API Handler Organization Pattern

Location: src/route/mod.rs:14-37

#[derive(Tags)]
enum ApiCategory {
    Health,
    Meta,
}

pub(crate) struct Api {
    health: health::HealthApi,
    meta: meta::MetaApi,
}

impl From<&Settings> for Api {
    fn from(value: &Settings) -> Self {
        let health = health::HealthApi;
        let meta = meta::MetaApi::from(&value.application);
        Self { health, meta }
    }
}

impl Api {
    pub fn apis(self) -> (health::HealthApi, meta::MetaApi) {
        (self.health, self.meta)
    }
}

Key Insights:

  • Tags enum groups APIs into categories for OpenAPI documentation
  • Aggregator struct (Api) holds all API handler instances
  • Dependency injection via From<&Settings> trait
  • .apis() method returns tuple of all handlers

3. OpenAPI Handler Definition Pattern

Location: src/route/health.rs:7-29

#[derive(ApiResponse)]
enum HealthResponse {
    #[oai(status = 200)]
    Ok,
    #[oai(status = 429)]
    TooManyRequests,
}

#[derive(Default, Clone)]
pub struct HealthApi;

#[OpenApi(tag = "ApiCategory::Health")]
impl HealthApi {
    #[oai(path = "/health", method = "get")]
    async fn ping(&self) -> HealthResponse {
        tracing::event!(target: "backend::health", tracing::Level::DEBUG,
                       "Accessing health-check endpoint");
        HealthResponse::Ok
    }
}

Key Insights:

  • Response types are enums with #[derive(ApiResponse)]
  • Each variant maps to HTTP status code via #[oai(status = N)]
  • Handlers use #[OpenApi(tag = "...")] for categorization
  • Type-safe responses at compile time
  • Tracing at architectural boundaries

4. JSON Response Pattern with DTOs

Location: src/route/meta.rs:9-56

#[derive(Object, Debug, Clone, serde::Serialize, serde::Deserialize)]
struct Meta {
    version: String,
    name: String,
}

#[derive(ApiResponse)]
enum MetaResponse {
    #[oai(status = 200)]
    Meta(Json<Meta>),
    #[oai(status = 429)]
    TooManyRequests,
}

#[OpenApi(tag = "ApiCategory::Meta")]
impl MetaApi {
    #[oai(path = "/meta", method = "get")]
    async fn meta(&self) -> Result<MetaResponse> {
        Ok(MetaResponse::Meta(Json(self.into())))
    }
}

Key Insights:

  • DTOs use #[derive(Object)] for OpenAPI schema generation
  • Response variants can hold Json<T> payloads
  • Handler struct holds state/configuration
  • Returns Result<MetaResponse> for error handling

5. Middleware Composition Pattern

Location: src/startup.rs:59-91

let app = value
    .app
    .with(RateLimit::new(&rate_limit_config))
    .with(Cors::new())
    .data(value.settings);

Key Insights:

  • Middleware applied via .with() method chaining
  • Order matters: RateLimit → CORS → data injection
  • Settings injected as shared data via .data()
  • Configuration drives middleware behavior

6. Configuration Management Pattern

Location: src/settings.rs:40-62

let settings = config::Config::builder()
    .add_source(config::File::from(settings_directory.join("base.yaml")))
    .add_source(config::File::from(
        settings_directory.join(environment_filename),
    ))
    .add_source(
        config::Environment::with_prefix("APP")
            .prefix_separator("__")
            .separator("__"),
    )
    .build()?;

Key Insights:

  • Three-tier configuration: base → environment-specific → env vars
  • Environment detected via APP_ENVIRONMENT variable
  • Environment variables use APP__ prefix with double underscore separators
  • Type-safe deserialization

7. Testing Pattern

Location: src/route/health.rs:31-38

#[tokio::test]
async fn health_check_works() {
    let app = crate::get_test_app();
    let cli = poem::test::TestClient::new(app);
    let resp = cli.get("/api/health").send().await;
    resp.assert_status_is_ok();
}

Key Insights:

  • Test helper creates full application with random port
  • TestClient provides fluent assertion API
  • Tests are async with #[tokio::test]
  • Real application used in tests

Type System Best Practices

Current code demonstrates excellent type-driven development (TyDD):

  • Environment enum instead of strings
  • RateLimitConfig newtype instead of raw numbers
  • ApiResponse enums for type-safe HTTP responses

Architecture Compliance

Current Layers:

  1. Presentation Layer: src/route/* - HTTP adapters
  2. Infrastructure Layer: src/middleware/*, src/startup.rs, src/telemetry.rs


Missing Layers (to be added for Modbus):

  3. Domain Layer: Pure relay logic, no Modbus knowledge
  4. Application Layer: Use cases (get status, toggle)


Integration Recommendations

Following hexagonal architecture principles from constitution:

src/
├── domain/
│   └── relay/
│       ├── mod.rs           - Domain types (RelayId, RelayState, Relay)
│       ├── relay.rs         - Relay entity
│       ├── error.rs         - Domain errors
│       └── repository.rs    - RelayRepository trait
├── application/
│   └── relay/
│       ├── mod.rs           - Use case exports
│       ├── get_status.rs    - GetRelayStatus use case
│       ├── toggle.rs        - ToggleRelay use case
│       └── bulk_control.rs  - BulkControl use case
├── infrastructure/
│   └── modbus/
│       ├── mod.rs           - Modbus exports
│       ├── client.rs        - ModbusRelayRepository implementation
│       ├── config.rs        - Modbus configuration
│       └── error.rs         - Modbus-specific errors
└── route/
    └── relay.rs             - HTTP adapter (presentation layer)
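
A rough sketch of how the domain and application layers could relate (only the names come from the layout above; the bodies are illustrative, and a RelayId newtype would replace the bare u8):

use async_trait::async_trait;

// domain/relay: pure types, no Modbus knowledge
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelayState {
    On,
    Off,
}

#[derive(Debug)]
pub enum RelayError {
    Timeout,
    DeviceUnavailable,
}

// domain/relay/repository.rs: port implemented by infrastructure/modbus
#[async_trait]
pub trait RelayRepository: Send + Sync {
    async fn read_state(&self, relay: u8) -> Result<RelayState, RelayError>;
    async fn write_state(&self, relay: u8, state: RelayState) -> Result<(), RelayError>;
}

// application/relay/toggle.rs: the use case depends only on the domain port
pub struct ToggleRelay<R: RelayRepository> {
    repository: R,
}

impl<R: RelayRepository> ToggleRelay<R> {
    pub fn new(repository: R) -> Self {
        Self { repository }
    }

    pub async fn execute(&self, relay: u8) -> Result<RelayState, RelayError> {
        let current = self.repository.read_state(relay).await?;
        let next = match current {
            RelayState::On => RelayState::Off,
            RelayState::Off => RelayState::On,
        };
        self.repository.write_state(relay, next).await?;
        Ok(next)
    }
}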

Integration Points

| Component | File | Action |
| --- | --- | --- |
| API Category | src/route/mod.rs | Add Relay to ApiCategory enum |
| API Aggregator | src/route/mod.rs | Add relay: RelayApi field to Api struct |
| API Tuple | src/route/mod.rs | Add RelayApi to .apis() return tuple |
| Settings | src/settings.rs | Add ModbusSettings struct and modbus field |
| Config Files | settings/base.yaml | Add modbus: section |
| Shared State | src/startup.rs | Inject ModbusClient via .data() |
| Dependencies | Cargo.toml | Add tokio-modbus, async-trait, mockall |
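
A sketch of the src/route/mod.rs changes from the table (relay::RelayApi is the handler still to be written; everything else mirrors the existing aggregator):

#[derive(Tags)]
enum ApiCategory {
    Health,
    Meta,
    Relay, // new tag for relay endpoints
}

pub(crate) struct Api {
    health: health::HealthApi,
    meta: meta::MetaApi,
    relay: relay::RelayApi, // new field
}

impl Api {
    pub fn apis(self) -> (health::HealthApi, meta::MetaApi, relay::RelayApi) {
        (self.health, self.meta, self.relay)
    }
}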

Example: New Route Handler

// src/route/relay.rs
use poem::{http::StatusCode, Result};
use poem_openapi::{param::Path, payload::Json, ApiResponse, Object, OpenApi};
use serde::{Deserialize, Serialize};

use crate::application::relay::GetRelayStatus;
use crate::domain::relay::{Relay, RelayId, RelayState};

#[derive(Object, Serialize, Deserialize)]
struct RelayDto {
    id: u8,
    state: String,  // "on" or "off"
    label: Option<String>,
}

#[derive(ApiResponse)]
enum RelayResponse {
    #[oai(status = 200)]
    Status(Json<RelayDto>),
    #[oai(status = 400)]
    BadRequest,
    #[oai(status = 503)]
    ServiceUnavailable,
}

// Handler struct holds the application-layer use case (injected at startup);
// a From<Relay> impl for RelayDto is assumed alongside the DTO.
pub struct RelayApi {
    get_status_use_case: GetRelayStatus,
}

#[OpenApi(tag = "ApiCategory::Relay")]
impl RelayApi {
    #[oai(path = "/relays/:id", method = "get")]
    async fn get_status(&self, id: Path<u8>) -> Result<RelayResponse> {
        let relay_id = RelayId::new(id.0)
            .map_err(|_| poem::Error::from_status(StatusCode::BAD_REQUEST))?;

        // Use application layer use case
        match self.get_status_use_case.execute(relay_id).await {
            Ok(relay) => Ok(RelayResponse::Status(Json(relay.into()))),
            Err(_) => Ok(RelayResponse::ServiceUnavailable),
        }
    }
}

Example: Settings Extension

// src/settings.rs
#[derive(Debug, serde::Deserialize, Clone)]
pub struct ModbusSettings {
    pub host: String,
    pub port: u16,
    pub slave_id: u8,
    pub timeout_seconds: u64,
}

#[derive(Debug, serde::Deserialize, Clone)]
pub struct Settings {
    pub application: ApplicationSettings,
    pub debug: bool,
    pub frontend_url: String,
    pub rate_limit: RateLimitSettings,
    pub modbus: ModbusSettings,  // New field
}

# settings/base.yaml
modbus:
  host: "192.168.1.100"
  port: 502
  slave_id: 1
  timeout_seconds: 3
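
A small follow-on sketch (these helper methods are assumptions, not existing code) showing how the new settings could feed the Modbus client:

impl ModbusSettings {
    /// Socket address for tcp::connect, e.g. "192.168.1.100:502"
    pub fn socket_addr(&self) -> Result<std::net::SocketAddr, std::net::AddrParseError> {
        format!("{}:{}", self.host, self.port).parse()
    }

    /// Timeout to wrap around every tokio-modbus operation
    pub fn timeout(&self) -> std::time::Duration {
        std::time::Duration::from_secs(self.timeout_seconds)
    }
}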

Summary

Key Takeaways

  1. tokio-modbus 0.17.0: Excellent choice, use trait abstraction for testability
  2. HTTP Polling: Maintain spec decision, simpler and adequate for scale
  3. Hexagonal Architecture: Add domain/application layers following existing patterns
  4. Type-Driven Development: Apply newtype pattern (RelayId, RelayState)
  5. Testing: Use mockall with async-trait for >90% coverage without hardware

Next Steps

  1. Clarifying Questions: Resolve ambiguities in requirements
  2. Architecture Design: Create multiple implementation approaches
  3. Final Plan: Select approach and create detailed implementation plan
  4. Implementation: Follow TDD workflow with types-first design

End of Research Document