Expand user guide documentation and storage example with REPL#288
Expand user guide documentation and storage example with REPL#288
Conversation
This replaces the gorumsexample with storageexample and removes the obsolete tutorial proto.
- Remove auto-serve from NewLocalSystems; callers must now call sys.Serve() after registering services, fixing a racy RegisterHandler-after-Serve pattern. - Remove dead server functions (startServer2, startServerOnListener, startServer) and simplify runServer to use NewSystem directly. - Move terminal creation into newRepl(), removing the awkward NewTerminal inline struct from main() and the *term.Terminal parameter threading. - Add -interceptors CLI flag for comma-separated interceptor selection at startup (logging, nofoo, metadata, delayed). - Add doc/issue-dynamic-interceptors.md for future consideration of runtime interceptor toggling.
To avoids attempting to call the local node in nested calls.
This should prevent most issues where servers emit output after the client finishes a nested quorum call, multicast, or unicast calls that does not wait for a response.
Previously, creating a sub-configuration using cfg commands in the storage repl (e.g., cfg 0,2) failed with an "address already in use by node" error. This could happened because parseConfiguration used gorums.WithNodeList with the addresses of the selected nodes. WithNodeList is designed to instantiate new nodes with automatically generated IDs, which caused a conflict since those addresses were already registered in the manager's node registry under different IDs. This commit resolves the issue by directly extracting the existing pb.Node instances from the manager, sorting them by ID (as NewConfiguration normally would), and casting the resulting slice to pb.Configuration. This correctly reuses the existing nodes and their identifiers without attempting to register duplicate connections.
This replaces the "archived" shlex.Split dependency with a simple splitQuoted custom implementation that supports what we need for our simple repl implementation, namely quoted arguments. We do not need full Posix command line parsing supported by shlex and replacement libraries. This also updates the dependencies and the go version since we are already touching go.mod.
I believe this has been fixed in 2024 with this commit: golang/term@f867b76
Add a new section explaining how to use the interactive REPL tool included in examples/storage. The section covers how to start the storage example (both locally and against remote servers), lists every supported command in a reference table, and includes an example session showing reads, writes, unicast/multicast, correctable, and nested quorum-call operations. A forward reference to the nested-QC and ClientConfig sections is included. These sections will be added in separate follow up commits.
Add a new section explaining the nested quorum call pattern where a server handler issues its own quorum calls to peer nodes using ServerCtx.Config(). The section covers how to enable peer tracking with WithConfig, the importance of calling ctx.Release() before outbound nested calls to avoid blocking inbound message processing, handler examples for both ReadNestedQC and WriteNestedMulticast, and a sequence diagram illustrating the client-to-server-to-peers call flow.
This covers the addition of the new 'Reverse Direction Calls with ServerCtx.ClientConfig' section, which documents the firewall use case, WithClientConfig() server setup, anonymous client registration via Register, the handler pattern using ctx.ClientConfig(), the comparison table, and the sequence diagram.
Remove the peerCfg field from storageServer and the manual self-exclusion loop in main.go. NewOutboundConfig now uses the node list stored by NewLocalSystems, so callers no longer need to filter out their own address or pass an explicit node list. ReadNestedQC and WriteNestedMulticast always use ctx.Config() directly. System.NewOutboundConfig is updated to compose the base options (stored node list and optional request handler) before any caller-supplied options.
|
|
Overall Grade |
Security Reliability Complexity Hygiene |
Code Review Summary
| Analyzer | Status | Updated (UTC) | Details |
|---|---|---|---|
| Go | Mar 14, 2026 3:54p.m. | Review ↗ | |
| Shell | Mar 14, 2026 3:54p.m. | Review ↗ |
There was a problem hiding this comment.
Pull request overview
This PR updates Gorums documentation and examples to better cover all call types (unicast, multicast, quorum calls, correctable streaming) and symmetric/nested call patterns, while adding a core helper (NewLocalSystems) to simplify local multi-node setups and outbound configuration.
Changes:
- Expand
doc/user-guide.mdanddoc/dev-guide.mdwith updated guidance, diagrams, interceptor docs, and REPL usage. - Upgrade
examples/storageto cover additional call types (including correctable + nested calls) and extend the REPL accordingly; remove the legacyexamples/gorumsexampleproto. - Add
gorums.NewLocalSystemsand updateSystem.NewOutboundConfigto reuse a stored node list when available.
Reviewed changes
Copilot reviewed 16 out of 19 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| system.go | Adds NewLocalSystems, stores a node list on System, and changes NewOutboundConfig defaults. |
| go.work | Bumps workspace Go version to 1.26.0. |
| go.sum | Updates dependency checksums after module/version bumps. |
| go.mod | Bumps module Go version to 1.26.0 and updates dependencies. |
| examples/storage/server.go | Refactors server startup via gorums.NewSystem and adds new handlers (unicast, multicast, correctable, nested calls). |
| examples/storage/repl.go | Adds new REPL commands and replaces shlex with a local quoted-splitting implementation. |
| examples/storage/proto/storage_gorums.pb.go | Regenerated Gorums bindings for new call types. |
| examples/storage/proto/storage.proto | Extends the Storage service with unicast, multicast, nested quorum calls, and correctable streaming quorum call. |
| examples/storage/proto/storage.pb.go | Regenerated protobuf bindings for updated storage proto. |
| examples/storage/main.go | Uses gorums.NewLocalSystems, adds interceptor flag parsing, and sets up local systems. |
| examples/storage/client.go | Minor cleanup while keeping configuration creation for the REPL client. |
| examples/gorumsexample/storage.proto | Removes legacy tutorial proto (directory deprecation). |
| examples/go.sum | Updates example-module checksums (including removal of google/shlex). |
| examples/go.mod | Bumps example module Go version to 1.26.0 and updates deps. |
| doc/user-guide.md | Major expansion: call types, diagrams, interceptors, REPL docs, nested call patterns. |
| doc/dev-guide.md | Clarifies “static code” definition in dev/ and expands Makefile targets table. |
| AGENTS.md | Adds conventional-commit requirements; removes a design-doc reference. |
| .vscode/gorums.txt | Adds new words/commands to workspace dictionary. |
| .gitignore | Removes ignores for legacy tutorial generated files (since tutorial example is removed). |
Comments suppressed due to low confidence (1)
doc/user-guide.md:123
- This section says the runnable example is
examples/storage, but the sequence diagram and proto snippet still use method names like ReadOrdered/ReadQCStream that don’t exist inexamples/storage/proto/storage.proto(which uses ReadRPC/WriteRPC/ReadCorrectable, etc.). Please update the diagram/snippet to match the actual storage example API, or clarify that it’s a separate tutorial proto.
Note over Client,Node B: Correctable (Streaming)
Client->>Node A: ReadQCStream(Req)
Client->>Node B: ReadQCStream(Req)
Node A-->>Client: ReadResponse(v1, t1)
Node B-->>Client: ReadResponse(v1, t1)
Note over Client: Client reads first quorum result
Node A-->>Client: ReadResponse(v2, t2)
Node B-->>Client: ReadResponse(v2, t2)
Note over Client: Client reads updated quorum result
Configuring Call Types in Protobuf
The file storage.proto should have the following content, illustrating the different call types:
edition = "2024";
package proto;
option go_package = "github.com/relab/gorums/examples/storage/proto";
option features.field_presence = IMPLICIT;
import "google/protobuf/empty.proto";
import "gorums.proto";
// Storage service defines the RPCs for a simple key-value storage system.
service Storage {
// ReadOrdered is a FIFO-ordered RPC to a single node.
rpc ReadOrdered(google.protobuf.Empty) returns (State) {}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add isOption() to ServerOption so it satisfies the Option interface, allowing NewSystem and NewLocalSystems to accept a single ...Option variadic instead of ...ServerOption. NewLocalSystems auto-creates an outbound Configuration per system when ManagerOptions (e.g. dial options) are provided, accessible via System.OutboundConfig(). NewSystem does the same when a NodeListOption is present. The internal createOutboundConfig helper combines the node list, self-routing request handler, and manager options. NewOutboundConfig is retained for asymmetric setups where the node list differs per system; it no longer requires a stored nodeList field, which is removed from the System struct. NewConfig now returns a clear error when a ServerOption is passed to it by mistake instead of the generic unknown-type error. TestSystems is simplified to wrap NewLocalSystems directly, removing the callback-based setup and duplicate listener pre-allocation. All symmetric test helpers in system_test.go are updated to use sys.OutboundConfig(). The storage example is simplified to pass dial options directly to NewLocalSystems.
- remove WithClientConfig and always initialize inbound peer management - track only peer-capable clients (gorums-node-id: 0) as dynamic peers - keep regular clients without gorums-node-id out of ClientConfig - add Server.NewConfig to create outbound configs with back-channel handler support - update inbound manager logic and tests for revised client/peer behavior
Extract splitOptions to DRY up option parsing in NewSystem and NewLocalSystems, and guard against typed-nil options via reflect.ValueOf.IsNil. Rename createOutboundConfig to newOutboundConfig with variadic ManagerOption signature; always include withRequestHandler unconditionally rather than conditionally on inboundManager. Reject NodeListOption in NewLocalSystems and always create the outbound Configuration. Extract allocateListeners as a standalone helper function. Remove the NewOutboundConfig public method; asymmetric setups should use Server.NewConfig directly instead. Rename system_test.go tests for clarity. Simplify createClientServerSystems to use a plain *Server as the client instead of a full System with its own listener. Add opts_test.go with table-driven tests covering typed-nil handling in splitOptions.
A client must explicitly announce NodeID=0 in its connection metadata to be assigned a dynamic node ID by the server and to appear in ClientConfig() for reverse-direction calls. Update the "Connecting as an Anonymous Client" section to: - Lead with the NodeID=0 announcement as the enabling mechanism - Replace the stale myID reference with the current node ID wording - Link the announcement explicitly to enabling reverse-direction calls - Note that clientSrv.NewConfig without WithConfig sends NodeID=0 automatically
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 24 out of 27 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
doc/user-guide.md:123
- The guide states that the runnable example and proto definition live in
examples/storage/proto/storage.proto, but the proto snippet and diagrams below still use the older API shape (e.g., ReadOrdered/ReadQCStream/State) which does not matchexamples/storage/proto/storage.protoin this PR (ReadRPC/WriteRPC/ReadCorrectable/etc.). Please either update the snippet/diagrams to match the actual storage example, or clarify that this section uses a simplified, separate proto that is not the same as the runnable example.
The runnable version of this example lives in `examples/storage`.
The protobuf definition for that example lives in `examples/storage/proto/storage.proto`.
If you are starting from scratch, you can still create your own module and use the same proto structure.
```shell
mkdir $HOME/storageexample
cd $HOME/storageexample
go mod init storageexample
go get github.com/relab/gorums
Call Types
Gorums offers several call types including quorum calls, and one-way unicast and multicast communication.
To select a call type for a Protobuf service method, specify one of the following options (they cannot be combined):
| Call type | Gorums option | Description |
|---|---|---|
| Ordered RPC | no option | FIFO-ordered synchronous RPC to a single node. |
| Unicast | gorums.unicast |
FIFO-ordered one-way asynchronous unicast. |
| Multicast | gorums.multicast |
FIFO-ordered one-way asynchronous multicast. |
| Quorum Call | gorums.quorumcall |
FIFO-ordered asynchronous quorum call on a configuration of nodes. |
| Synchronous Quorum Call | gorums.quorumcall |
Synchronous variant; chain a terminal method (.First(), .Majority(), .All(), .Threshold(n)) to block until the quorum threshold is met. |
| Correctable | gorums.quorumcall + stream |
Streaming quorum call; use .Correctable(n) to observe progressive updates as nodes respond. |
The generated API is similar to unary gRPC, unless the stream keyword is used in the proto definition.
Server streaming is only supported for quorum calls.
Client streaming is not supported.
sequenceDiagram
participant Client
participant Node A
participant Node B
Note over Client,Node B: Multicast (One-way)
Client->>Node A: WriteMulticast(Req)
Client->>Node B: WriteMulticast(Req)
Note over Client: Client continues immediately
Note over Client,Node B: Quorum Call (Synchronous)
Client->>Node A: ReadQC(Req)
Client->>Node B: ReadQC(Req)
Node A-->>Client: ReadResponse(Value, Time)
Node B-->>Client: ReadResponse(Value, Time)
Note over Client: Client aggregates responses
Note over Client,Node B: Correctable (Streaming)
Client->>Node A: ReadQCStream(Req)
Client->>Node B: ReadQCStream(Req)
Node A-->>Client: ReadResponse(v1, t1)
Node B-->>Client: ReadResponse(v1, t1)
Note over Client: Client reads first quorum result
Node A-->>Client: ReadResponse(v2, t2)
Node B-->>Client: ReadResponse(v2, t2)
Note over Client: Client reads updated quorum result
Configuring Call Types in Protobuf
The file storage.proto should have the following content, illustrating the different call types:
edition = "2024";
package proto;
option go_package = "github.com/relab/gorums/examples/storage/proto";
option features.field_presence = IMPLICIT;
import "google/protobuf/empty.proto";
import "gorums.proto";
// Storage service defines the RPCs for a simple key-value storage system.
service Storage {
// ReadOrdered is a FIFO-ordered RPC to a single node.
rpc ReadOrdered(google.protobuf.Empty) returns (State) {}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
System.Stop relied solely on grpc.Server.Stop to close the listener, but gRPC only closes listeners registered via Serve. If Stop was called before Serve (e.g. in the NewLocalSystems error-unwind path or via the returned stop closure), the pre-allocated net.Listener leaked a file descriptor. Fix by always calling s.lis.Close() in System.Stop after s.srv.Stop; a harmless error is discarded when gRPC already closed it. Add tests to verify the listener is released in both scenarios.
Allow interceptors and other options to be passed to runServer.
We removed the clientPeers argument in an earlier commit; this fixes the documentation accordingly.
Update method names, message types, and code snippets throughout the user guide to match examples/storage/proto/storage.proto. Replace ReadOrdered/ReadQCStream/State with ReadRPC/ReadCorrectable/ ReadRequest/ReadResponse/WriteRequest. Add a note that the proto snippet is a partial representation showing one RPC per call type. Use AsTime().After() for timestamp comparisons to match the API used in the actual client and server code.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 28 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
We should not return a response when there is an error.
Documentation and Expanded Storage Example
Significantly expands the user guide and developer guide documentation, and upgrades the
examples/storageexample to cover all call types — including unicast, multicast, correctable reads, and nested (symmetric) quorum calls usingServerCtx.Config(). AddsNewLocalSystemsto the core library for pre-allocating local test clusters, and removes the legacygorumsexampledirectory in favor of the richerstorageexample.Changes
Documentation (
doc/)ctx.Release()usage, handler code, and a sequence diagram.gorumsexampletostorage/proto.dev/(non-zorums_.gofiles excluding test files).Storage example (
examples/storage/)WriteUnicast(unicast)WriteMulticast(multicast)ReadNestedQC(quorum call with server-side fan-out viaServerCtx.Config())WriteNestedMulticast(quorum call wrapping a nested multicast)ReadCorrectable(streaming quorum call)WriteUnicast,ReadCorrectable,ReadNestedQC, andWriteNestedMulticasthandlers.rawWriterhelper that translates\nto\r\nfor correct output in raw terminal mode.startServerhelper withgorums.NewSystem-based setup.newStorageServerto acceptio.Writerand a label for logging.gorums.NewLocalSystemsfor local cluster setup.--interceptorsflag supportinglogging,nofoo,metadata,delayed.sys.NewOutboundConfig(no explicit node list needed — stored byNewLocalSystems).ucast/unicastcommand for one-way unicast writes.creadsub-command for correctable reads (streams progressive updates).nreadandnwritesub-commands for nested quorum call / nested multicast.github.com/google/shlexwith a self-containedsplitQuotedimplementation (removes external dependency).parseConfigurationintoparseConfiguration+parseIndices(error-returning, no implicitfmt.Println).cfgrange syntax to allowstop == numNodes(exclusive end).delayOutputconstant for short post-call delay so server log lines appear before the prompt.Core library
NewLocalSystems(n int, opts ...ServerOption) ([]*System, func(), error)— pre-allocates all listeners before creating any system (solves the chicken-and-egg address problem), configures each system withWithConfig(node IDs 1..n), and returns astopclosure.nodeListfield toSystemto store the node list set byNewLocalSystems.NewOutboundConfigto use the stored node list automatically, simplifying call sites.Testing
All existing tests continue to pass. The storage example builds and runs end-to-end with the new REPL commands and local cluster setup. Storage example has been tested manually.
Related
Closes #285