Light Clients: Independent Verification

grantkee · August 19, 2024, 8:51pm

The network needs to support trustless clients that verify consensus output independently. These “execution nodes” or “light clients” are used by operators who want to track the canonical chain but do not want to store historic data.

All execution results are stored in memory and garbage collected over time.

One use case is creating attestations for bridging.

grantkee · August 20, 2024, 5:05pm

Clarifying scope

There is an important distinction to make between “light clients” and “execution nodes”.

Light Clients

Light clients rely on execution results from other nodes. The goal for light clients is to verify execution results without independently executing the blocks themselves. These clients are still secure and are much faster/lighter than “execution nodes”.

Light clients receive and validate consensus output using validator public keys and asymmetric cryptography. If the light client considers the output valid, it makes a request to get the latest header from a full validator node that matches the “parent_beacon_block_root” and “nonce”. These header values correspond to TN consensus output digest and consensus sequence number for the round.
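A minimal sketch of the header check described above, assuming the consensus output has already passed signature verification. The `ConsensusOutput` and `Header` types and the `header_matches_output` helper are hypothetical names for illustration; only the `parent_beacon_block_root` and `nonce` fields come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsensusOutput:
    # Hypothetical container for an already-verified consensus output.
    digest: bytes          # TN consensus output digest
    sequence_number: int   # consensus sequence number for the round


@dataclass(frozen=True)
class Header:
    # Only the two fields named in the post are required; state_root is illustrative.
    parent_beacon_block_root: bytes
    nonce: int
    state_root: bytes


def header_matches_output(header: Header, output: ConsensusOutput) -> bool:
    """Accept a fetched header only if it is bound to the verified output."""
    return (
        header.parent_beacon_block_root == output.digest
        and header.nonce == output.sequence_number
    )


output = ConsensusOutput(digest=b"\x11" * 32, sequence_number=42)
fetched = Header(parent_beacon_block_root=b"\x11" * 32, nonce=42, state_root=b"\xaa" * 32)
accepted = header_matches_output(fetched, output)
```

A header that fails this check would be discarded and requested again from another validator.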

Using a fetched block from a known validator (verifiable on-chain) allows the light client to quickly and efficiently obtain the latest state of Telcoin Network without having to execute every transaction itself.

Execution Nodes

Execution nodes are stateless clients that independently execute output from consensus. The execution results are not persisted and only stored in memory. The execution node clears up memory over time to prevent exhausting resources. These nodes must download all worker blocks that reach quorum and verify state transitions are valid according to consensus.
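One way to picture the in-memory storage with cleanup is a bounded cache that drops the oldest round once a limit is reached. This is only a sketch: the `ExecutionResultCache` name, the `max_rounds` limit, and eviction by insertion order are assumptions, not a description of the actual node.

```python
from collections import OrderedDict
from typing import Optional


class ExecutionResultCache:
    """Keeps only the most recently inserted execution results, in memory."""

    def __init__(self, max_rounds: int) -> None:
        if max_rounds < 1:
            raise ValueError("max_rounds must be at least 1")
        self._max_rounds = max_rounds
        self._results: "OrderedDict[int, bytes]" = OrderedDict()

    def insert(self, round_number: int, state_root: bytes) -> None:
        self._results[round_number] = state_root
        self._results.move_to_end(round_number)
        # Evict the least recently inserted rounds until we are within budget.
        while len(self._results) > self._max_rounds:
            self._results.popitem(last=False)

    def get(self, round_number: int) -> Optional[bytes]:
        return self._results.get(round_number)


cache = ExecutionResultCache(max_rounds=2)
for round_number in (1, 2, 3):
    cache.insert(round_number, bytes([round_number]) * 32)
```

After the third insert, round 1 is no longer held, which is the "not persisted" behavior in miniature.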

By downloading worker blocks as they reach quorum, execution nodes likely have all the blocks they need to execute consensus output. However, the nodes must also have access to download any blocks they’re missing in order to fully execute the output. The missing blocks should be available from nodes that specialize in data availability to limit the number of requests from peers outside the network of nodes that reach consensus.
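The "download whatever is missing" step could look roughly like the sketch below. The `fetch_from_da_node` callable stands in for a request to a data availability node and is hypothetical, as are the function names.

```python
from typing import Callable, Dict, Iterable, List


def missing_block_digests(required: Iterable[bytes], local: Dict[bytes, bytes]) -> List[bytes]:
    """Return required digests not held locally, in order and without duplicates."""
    seen = set()
    missing = []
    for digest in required:
        if digest not in local and digest not in seen:
            seen.add(digest)
            missing.append(digest)
    return missing


def fill_missing(
    required: Iterable[bytes],
    local: Dict[bytes, bytes],
    fetch_from_da_node: Callable[[bytes], bytes],
) -> None:
    """Fetch each missing worker block once and add it to the local store."""
    for digest in missing_block_digests(required, local):
        local[digest] = fetch_from_da_node(digest)
```

Only the blocks that were not already collected at quorum time trigger a request, which keeps traffic to the data availability nodes small.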

Ethereum plans to support “stateless clients” eventually, but Verkle trees are a prerequisite for their implementation. Telcoin Network may support these later, but chooses to prioritize a separate node type for data availability.

Validators waiting to participate in consensus must receive output from consensus, download all worker blocks that reach quorum, independently execute all transactions, and store all data. These “non-voting validators” (NVVs) act as an important buffer for data availability and reduce the burden on “committee voting validators” (CVVs), which are responsible for reaching consensus and extending the canonical chain. NVVs support the network as archives for the epochs in which they are not voting. They also support epoch transitions by ensuring validators are online and ready to participate. The network’s integrity could benefit from NVVs attesting to CVV output, but that is outside the scope of this improvement proposal.

Careful consideration is needed to balance the burden on CVVs of reliably broadcasting sealed artifacts to peers within the voting committee against broadcasting sealed artifacts to NVVs for wider network propagation. This is a known scalability issue, but it is disregarded for now in order to prioritize protocol features over optimization.

grantkee · August 20, 2024, 5:17pm

Security hypothetical: Fake Bridge Transaction

Scenario

A malicious actor creates a fake committee with BLS keys that they control. The validators’ workers are deployed and seemingly running the protocol successfully to produce blocks. However, the network was constructed with an invalid genesis state that benefits the attacker.

In this scenario, the attacker is trying to trick the bridge into migrating funds.

The attacker has funded execution layer accounts (secp256k1) with a generous amount of TEL and they want to bridge off Telcoin Network to Ethereum.

Light client experience

A new validator joins the Axelar network to attest that bridging transactions are legitimate from Telcoin Network to Ethereum. They use a discovery mechanism that has been hijacked by the attackers. They see the malicious committee as valid.

The malicious committee transitions state to show a transaction bridging TEL to Ethereum.

Outcome

The transactions would not succeed because the locked TEL on Ethereum would not match the request. Axelar would not be able to unlock the amount of funds allocated by the attacker to their account.

grantkee · September 10, 2024, 6:11pm

Bridging as safe as consensus (BASAC)

Telcoin Network is fundamentally tied to the bridging process. Without a secure, successful, efficient bridging solution, Telcoin Network can’t survive. Bridging is so important, one could argue it’s as important as consensus itself.

Built-in light client support

Because bridging is arguably as important as consensus, validators should fully support efficient ways to verify state changes. Ideally, anyone can use light clients but the scope of this proposal should prioritize bridging attestation clients.

Validators already produce full blocks for full execution clients, but they should also create “light blocks” with all the information light clients need to verify the next block.

Signed messages

Validator public keys are available on-chain, so any light client with a known block can use that to verify committees and signatures. Once a light client has its “genesis” root (the well-known block it uses as a root to verify future blocks - could also be TN genesis block, but doesn’t have to match), it collects gossiped light blocks to track state changes. All nodes must gossip light blocks to support data availability for bridging services.

Light clients simply collect light blocks from a quorum of validators for the round of consensus.

Technical flow

Light client starts with well-known genesis block that the node operator is responsible for manually verifying. For this flow, assume the light client’s well-known block is also the genesis block from epoch 0.
Light client reaches out to RPC for handshake (limited to only NVVs?)
- Validator adds light client to peer list. This list is used to gossip light blocks for new rounds of consensus.
- Validator may or may not validate light client credentials. If for some reason merkle proofs bog down the network, it may be necessary to prioritize connections for light clients involved with bridging. For now, focus on happy path: Validator simply acknowledges that they are now peers and replies with its version of the public committee information that the light client can use to discover other validators.
- Light clients must reach out to validators and introduce themselves?
CVV commits new round of consensus and creates consensus output and signs a light block.
- Gossip light block to NVVs
- NVVs verify and gossip light block
- Light clients verify and gossip light block
Client receives light block through gossip network or directly from peer validator
- Client verifies the integrity of the message
- Client reaches out to start downloading block data?
  - At this point, would it be better for protocol to support the data bridging needs through a special RPC endpoint?
    - CVV uses worker to sign bridge requests for account info, any relevant logs?
    - How much of this should be baked into protocol vs EVM agnostic solution?
    - I think lean towards EVM agnostic for now. It places more execution burden on the light client, but reduces the lift for protocol devs in the short term. Execution code already exists, but protocol devs would need to develop special RPC endpoints to support custom bridging info requests.
    - If only creating a “bridging client”, then the validators could create a “bridge block” instead of a “light block” that contains specific account info and logs needed for bridging.
Client tracks messages for latest round. Once enough messages (2f+1) are collected from CVVs for the round, the client considers the round validated.
- For example, in a 4 node network for “Round X”:
  - Light client needs at least 3 signed messages for round x with matching data
- The light client starts downloading data for the round from a peer once it receives the first signed message
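The round-tracking step can be sketched as a small tally keyed by round and light block digest. This assumes message signatures were already verified before `record` is called; `RoundTracker` and `quorum_threshold` are hypothetical names, and the committee is modeled as a set of validator identifiers.

```python
from collections import defaultdict
from typing import Set


def quorum_threshold(committee_size: int) -> int:
    """2f + 1, where f = (n - 1) // 3 is the number of tolerated faulty nodes."""
    if committee_size < 1:
        raise ValueError("committee must not be empty")
    f = (committee_size - 1) // 3
    return 2 * f + 1


class RoundTracker:
    def __init__(self, committee: Set[str]) -> None:
        self._committee = set(committee)
        self._threshold = quorum_threshold(len(self._committee))
        # round -> light block digest -> validators that signed that digest
        self._signers = defaultdict(lambda: defaultdict(set))

    def record(self, round_number: int, digest: bytes, validator: str) -> bool:
        """Record a verified message; return True if this digest now has quorum."""
        if validator in self._committee:
            # A set ignores repeated messages from the same validator.
            self._signers[round_number][digest].add(validator)
        return len(self._signers[round_number][digest]) >= self._threshold


tracker = RoundTracker({"v1", "v2", "v3", "v4"})
```

For the 4 node example, `quorum_threshold(4)` is 3, so the round is considered validated on the third matching message from distinct CVVs.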

Outstanding considerations

Is eth_getProof rpc call sufficient for gossiped blocks without additional light block type?
If committed to light blocks, should light block messages be signed by the primary or is worker sufficient?
- If worker signs, then this could be used for follow-up exchanges
  - Light block only contains roots for merkle proof, but bridging light clients are looking for on-chain events and account balances.
  - Light client could verify any request with worker signature for special rpc endpoint, BUT this creates the need for light client verification to prevent DoS attacks by having workers constantly sign data
TN might include a new rpc method for finding block data by round (aka block nonce)
- Consensus round could produce multiple execution blocks
TN as a “friendly network” where peer connections for workers are included in worker blocks so bridging clients have more support
- Protocol could propagate bridging peers to better support discoverability
  - Beneficial for supporting bridge-specific clients, but less ideal for trustless light client approach
Verifying a single block requires all data from the round
- To re-execute a block, a client would need only the parent execution block, but then all of the data from consensus output for the round
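For the first question above, this is roughly what a light client would send when using the standard eth_getProof JSON-RPC method (EIP-1186), whose parameters are the account address, a list of storage keys, and a block number or tag. The helper name and the example address and storage key are placeholders, and the sketch only builds the request body without sending it.

```python
import json
from typing import List


def eth_get_proof_request(address: str, storage_keys: List[str], block: int, request_id: int = 1) -> str:
    """Build the JSON-RPC body for eth_getProof at a specific block number."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getProof",
        # Block number is hex-encoded as a JSON-RPC quantity.
        "params": [address, storage_keys, hex(block)],
    }
    return json.dumps(payload)


# Placeholder account and storage slot, for illustration only.
body = eth_get_proof_request("0x" + "11" * 20, ["0x" + "00" * 32], block=42)
```

The response carries account and storage proofs against the block's state root, so it covers balances and storage but not event logs, which is part of why the light block question remains open.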

robriks · September 13, 2024, 6:24pm

Agreed on aiming to keep our bridging infra as EVM agnostic as possible, since that is how Axelar’s infra is designed to be used. Introducing new RPC endpoints and baking more complexity into the protocol for bridging purposes increases attack surface area and can introduce more tech debt than we’d want to take on, especially considering the likelihood of Axelar pushing updates to their protocol, which could include breaking changes that require us to reorganize TN at the protocol layer.

The boilerplate voting-verifier setup which we will be replacing with the light client uses the following components:

Ampd is the daemon that attaches to the Axelar network and listens for events (such as those reported by a relay tx). It also connects to Telcoin Network via RPC, where it can verify events on the TN side.

Tofnd is a dependency for Ampd, providing signing capabilities for transactions and batches.

Ampd checks the TN rpc endpoint to verify TN event emission when it is informed of a new bridging request (kickstarted by the start of a “signing session”, ie a vote). If the bridge request is valid as confirmed by the RPC, then Ampd uses Tofnd to “vote yea” by signing a transaction and submitting it to the voting-verifier. This is the functionality that we will need to connect to the light client’s RPC.

I am still developing my understanding of these things but it really seems like the light client approach is very similar to the voting-verifier, with the main difference being the voting-verifier collects signatures from multiple Ampd+Tofnd instances by weight and uses them to construct a multisig signature on the multisig-prover when the weighted threshold is reached.
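To make "by weight" concrete, here is a toy tally where each verifier instance carries a weight and the poll passes once the yes-weight meets a threshold. The function name, the weights, and the threshold value are made up for illustration and are not taken from Axelar's contracts.

```python
from typing import Dict


def weighted_quorum_reached(votes: Dict[str, bool], weights: Dict[str, int], threshold: int) -> bool:
    """Sum the weight of verifiers that voted yes and compare to the threshold."""
    # Verifiers without a registered weight contribute nothing.
    yes_weight = sum(weights.get(verifier, 0) for verifier, vote in votes.items() if vote)
    return yes_weight >= threshold


# Hypothetical verifier set and threshold.
weights = {"verifier-a": 5, "verifier-b": 3, "verifier-c": 2}
passed = weighted_quorum_reached({"verifier-a": True, "verifier-b": True}, weights, threshold=7)
```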

robriks · September 16, 2024, 9:37pm

Grant and I just spoke to Ben and Stephen @ Axelar to clarify some of the bridging architecture and its relevance to TN

For security, simplicity, and forward-compatibility with potential Axelar changes we will use all of their architecture’s default components, listed below

TN-specific deployments:

external gateway (on TN)
internal gateway (on Axelar)
voting verifier
multisig prover

Axelar general components:

router (routes verified messages between internal gateways)
governance (used to natively integrate TN with Axelar and handles upgrades of our external gateway)

Major takeaways that we will need to tackle:

implement relayer for outbound (from TN) bridge txs. this relayer must quasi-reconstruct consensus by collecting 2f+1 signatures from validators that a bridge event happened
implement verifier. this verifier(s) must verify the above signatures and vote until quorum is reached for the bridge message to be forwarded to the router
decide on number of verifier instances & their rewards schemas (for verifying messages from the relayer). These verifier instances will also serve to sign incoming messages from other chains via the multisig prover (and be compensated accordingly)

Other smaller miscellaneous takeaways:

users pay their own gas, fronting enough gas for all axelar internal costs and source chain costs
only support ethereum ↔ TN to start for simplicity & security

Here is a helpful screenshot for clarifying mental models of the bridging process:

Diagram Breakdown:

User locks $TEL in external gateway on TN, emitting ContractCall() event. These events are special occurrences for TN validators, who sign relevant data to commit that they executed this bridging state change and provide it to the quasi-consensus relayer
Once the quasi-consensus relayer has received 2f+1 validator-signed commits of the bridging event, it initiates a tx on Axelar chain calling verify_messages() on Axelar’s internal gateway (which is TN-specific)
Axelar internal TNGateway then kicks off verification by calling the TN voting-verifier who in turn commences a poll
verifier instances monitor the voting-verifier and notice the start of a new poll, leading them to initiate vote txs to Axelar chain. They should only vote yes if they find 2f+1 valid signatures originated by validators for the bridge message
poll ends
voting-verifier responds to the internal gateway on whether quorum of yea votes for the bridge message was reached
Once a message has been validated, a tx can be initiated on internal_gateway::route_messages() to forward the bridge message to the router
router performs additional check to ensure the forwarded message is recorded as verified by the internal gateway (seems extraneous, I am probably missing something here)
router looks at the destination chain member of the verified bridge message to identify which destination internal gateway to route to (ethereum in this case) and instructs the destination internal gateway to store the routed message
external entity initiates an Axelar transaction calling the multisig-prover for the destination chain destination_multisig_prover::construct_proof() which first fetches the routed bridge message from the destination_internal_gateway
Multisig prover initiates a signing session, returning a session id
multisig_prover::construct_proof() pings the destination chain’s multisig contract to alert the destination chain’s verifiers that a signing session for session id has begun
destination chain’s verifiers notice session id has started and submit voting transactions on the validity of the bridge message
signing session completes, emitting an event
multisig contract informs the multisig prover whether session id reached quorum of verifier votes. If successful, the bridge message is finalized and stored as a proof
relayer for the destination chain (ethereum) can now query destination_multisig_prover::getProof() and use it to execute the bridged msg at the destination chain
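The verifier's voting rule from the breakdown above can be sketched as follows. The sketch reads the signature threshold as a BFT quorum of 2f + 1 out of n = 3f + 1 validators. Signature checking is injected through a `verify` callable because the actual scheme and key handling are not specified here, and all names are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple


def should_vote_yes(
    message: bytes,
    signatures: Iterable[Tuple[str, bytes]],
    committee: Set[str],
    verify: Callable[[str, bytes, bytes], bool],
) -> bool:
    """Vote yes only with valid signatures from a quorum of distinct validators."""
    if not committee:
        return False
    f = (len(committee) - 1) // 3
    needed = 2 * f + 1
    valid_signers = set()
    for validator, signature in signatures:
        # Ignore signers outside the committee and signatures that fail to verify.
        if validator in committee and verify(validator, message, signature):
            valid_signers.add(validator)
    return len(valid_signers) >= needed
```

Counting distinct signers rather than signatures matters here, since a relayer could otherwise replay one validator's commit to inflate the count.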

robriks · September 23, 2024, 11:34pm

Update on bridging architecture for the Telcoin-Network (EVM) side

Gateway Contracts:

AxelarAmplifierGateway and AxelarAmplifierGatewayProxy contracts are deployed with CREATE3 because it provides deterministic canonical contract addresses that are not dependent on constructor configurations (WeightedSigners, etc) which are liable to change in the near future

RWTEL Module:

The Recoverable Wrapped Telcoin rwTEL contract serves as TN’s bridge architecture entry and exit point WRT execution. In simpler terms, this module performs the actual delivery of inbound $TEL from Ethereum “ethTEL” and locks outbound $TEL “tnTEL”

RWTEL.sol combines Circle Research’s recoverable wrapper utility with the required Axelar parent contract, AxelarGMPExecutable. The parent contract provides the bridge setup’s execution interface, _execute(), which must be implemented to support Axelar bridge infra because the Axelar gateway calls it.

RWTEL Rundown

Rather than revert execution of failing bridge transfers, _execute() emits an ExecutionFailed event. This is inspired by ERC4337’s handling of failed UserOperations and ensures that failed bridge transfers don’t reach an invalid state where the message is marked as BaseAxelarAmplifierGateway::MESSAGE_APPROVED but can never be executed to reach BaseAxelarAmplifierGateway::MESSAGE_EXECUTED. Such a state would only confuse relayers watching these states who might repeatedly try to re-execute failures.
Instead, failed bridge transfers will emit ExecutionFailed and still be marked as executed even when the destination address rejects a native tnTEL transfer. The onus is then on the user/bridger to resubmit a valid, non-reverting bridge transfer which succeeds.
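The non-reverting behavior can be modeled outside Solidity as a tiny state transition. This is a Python model of the idea, not the contract code: `execute_message` and the `deliver` callable are hypothetical, and only the status names and the ExecutionFailed event name come from the description above.

```python
from enum import Enum
from typing import Callable, List, Tuple


class MessageStatus(Enum):
    APPROVED = "MESSAGE_APPROVED"
    EXECUTED = "MESSAGE_EXECUTED"


def execute_message(status: MessageStatus, deliver: Callable[[], None]) -> Tuple[MessageStatus, List[str]]:
    """Attempt delivery; mark the message executed whether or not delivery succeeds."""
    if status is not MessageStatus.APPROVED:
        raise ValueError("only approved messages can be executed")
    events: List[str] = []
    try:
        deliver()
    except Exception:
        # Stand-in for a failed native transfer: record the failure instead of reverting.
        events.append("ExecutionFailed")
    return MessageStatus.EXECUTED, events
```

Either way the message leaves the approved state, so a relayer watching statuses has nothing left to retry.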
Because ethTEL is the canonical Telcoin token which will be used as native currency on Telcoin Network, a way for incoming ethTEL ERC20 tokens to be converted to a non-ERC20 base layer currency must be implemented at the bridge entrypoint (RWTEL module). If not, incoming ethTEL from Ethereum mainnet would be delivered as ERC20 wTEL and could not be unwrapped to native tnTEL; even worse, no native currency would exist so no entity would even be able to transact on TN as there would be no currency to pay gas with. There are two viable approaches to achieve this in the RWTEL module:
1. The total supply of ethTEL is “minted” to the RWTEL module as native tnTEL at genesis, and exists there but cannot be accessed by anyone in any way other than bridging from Ethereum.
  - In this case, the tnTEL native currency exists from network genesis but is locked and can only be unlocked when a corresponding amount of ethTEL ERC20 token is locked in the bridge gateway contract on Ethereum.
  - The security posture for this case is sandboxed to ECDSA integrity: the total supply of tnTEL is inaccessible to anyone unless they brute force a private key for the RWTEL address or both:
  A. possess ethTEL on Ethereum, and
  B. submit a bridge message by locking ethTEL on Ethereum.
2. The RWTEL module is granted special authority to access the TN db directly using a syscall or similar mechanism which is used to mint new native tnTEL each time a valid bridge request is processed and received to the RWTEL contract
  - In this case, tnTEL doesn’t exist until it is bridged from Ethereum
  - The security posture for this case is manifold, since RWTEL would retain access to the chain’s most sensitive component:
  A. ECDSA integrity of the RWTEL (brute force its private key)
  B. exploitation of the RWTEL module’s contract logic (solidity side)
  C. exploitation of the RWTEL module’s syscall logic (rust side)

In both cases 1 and 2 above, the locked status of tokens on Ethereum must be independently verified by a quorum of Axelar verifiers before that bridge request can be carried out on TN.
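The accounting behind approach 1 can be illustrated with a toy ledger in which the full supply starts locked and only moves on a verified inbound bridge message. The class and method names are hypothetical, and the `verified_by_quorum` flag stands in for the Axelar verification described above.

```python
from typing import Dict


class GenesisLockedSupply:
    """Toy model: all native tnTEL exists at genesis, locked in the RWTEL module."""

    def __init__(self, total_supply: int) -> None:
        self.locked = total_supply
        self.balances: Dict[str, int] = {}

    def deliver_inbound(self, recipient: str, amount: int, verified_by_quorum: bool) -> None:
        """Release locked tnTEL for a bridge message verified by the verifier quorum."""
        if not verified_by_quorum:
            raise PermissionError("bridge message was not verified by a quorum")
        if amount <= 0 or amount > self.locked:
            raise ValueError("amount must be positive and within the locked supply")
        self.locked -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

    def lock_outbound(self, sender: str, amount: int) -> None:
        """Lock tnTEL again when a user bridges out of TN."""
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("amount must be positive and within the sender balance")
        self.balances[sender] -= amount
        self.locked += amount

    def circulating(self) -> int:
        return sum(self.balances.values())
```

In this model `locked + circulating()` always equals the genesis supply, so nothing is ever minted after genesis, which is the property that makes approach 1 simpler to reason about.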

Currently, the implementations in this PR are designed around #1, which is simpler and more secure as well as easier to ship quickly. This could change and discussion on the topic is encouraged

robriks · November 19, 2024, 2:08am

Update on Bridging Architecture: Relayers

The Telcoin-Network bridging architecture will make use of three different kinds of relayers. These are called

Subscribers
Listeners
Includers

Each type of relayer serves a specific hooking role, listening for events to be emitted by a contract and acting upon noticing them. The Subscribers and Includers are close to completion but I am waiting on API onboarding from Axelar and to confirm that the Listeners are handled by their GMP API.

The Subscriber

Subscriber spec:

The Subscriber’s job is to guarantee that every protocol event on the source chain is detected and either successfully published to the GMP API or directly relayed to the Axelar blockchain. The Subscriber detects outgoing GMP messages from the chain to the AVM and passes them along so that they can be verified, routed, signed, and delivered to the destination chain.

subscribe to external eth gateway, hooking into onchain events
filter for ContractCall(address indexed sender, string destinationChain, string destinationContractAddress, bytes32 indexed payloadHash, bytes payload)
pass payload on to voting verifier
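The filtering step might look like the sketch below, which works on logs that some RPC library has already decoded. The decoded log shape (`address`, `event`, `args` keys) is an assumption made for illustration; only the ContractCall event name and its field names come from the spec above.

```python
from typing import Dict, Iterable, Iterator

# Field names from the ContractCall event signature.
CONTRACT_CALL_FIELDS = (
    "sender",
    "destinationChain",
    "destinationContractAddress",
    "payloadHash",
    "payload",
)


def select_contract_calls(decoded_logs: Iterable[Dict], gateway_address: str) -> Iterator[Dict]:
    """Yield ContractCall events emitted by the external gateway, ignoring all others."""
    gateway = gateway_address.lower()
    for log in decoded_logs:
        if log.get("address", "").lower() != gateway:
            continue
        if log.get("event") != "ContractCall":
            continue
        yield {field: log["args"][field] for field in CONTRACT_CALL_FIELDS}
```

Each yielded dictionary is what the Subscriber would pass along for verification.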

The Includer

Includer spec:

The Includer’s job is to guarantee that some payload (termed task by Axelar in the GMP API) gets included in a transaction in a block on the destination chain. The Includer receives incoming GMP messages from the AVM either directly or through the GMP API and then executes them by writing the transaction payloads to the external gateway contract in a block on the destination chain.

if API: poll GMP Task API for new tasks
if not API: listen for AVM event
check whether new tasks are already executed (ie by another includer)
translate task payload into transaction
sign transaction and publish to TN via RPC
monitor transaction & adjust gas params if necessary
if API: inform GMP Task API of successful execution
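One pass of the Includer steps for the API case could be sketched like this. Every callable is a hypothetical stand-in (task polling, execution check, payload translation, signing and publishing, reporting), and gas monitoring is left out to keep the sketch short.

```python
from typing import Callable, Iterable, List


def run_includer_once(
    poll_tasks: Callable[[], Iterable[str]],
    is_executed: Callable[[str], bool],
    to_transaction: Callable[[str], str],
    sign_and_publish: Callable[[str], str],
    report_success: Callable[[str, str], None],
) -> List[str]:
    """Process each pending task once and return the published transaction hashes."""
    included = []
    for task in poll_tasks():
        # Skip tasks another includer has already executed.
        if is_executed(task):
            continue
        tx_hash = sign_and_publish(to_transaction(task))
        report_success(task, tx_hash)
        included.append(tx_hash)
    return included
```

Injecting the steps as callables keeps the loop the same whether tasks arrive from the GMP Task API or from an AVM event listener.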

To show the role of each relayer type and their connections between networks, I’ve outlined the high level flow of bridge data from end to end below
