Original author: @Web3 Mario
Introduction: With Notcoin, the largest game in the TON ecosystem, listing on Binance, and the strong wealth effect created by its fully circulating token model, TON has attracted a great deal of attention in a short time. After chatting with a friend, I learned that TON has a relatively high technical barrier to entry and that its DApp development paradigm differs considerably from that of mainstream public chains. I therefore spent some time studying the topic in depth and have some takeaways to share. In short, TON's core design idea is to rebuild the traditional blockchain protocol from the bottom up, pursuing high concurrency and high scalability at the cost of giving up interoperability.
The core design concept of TON - high concurrency and high scalability
It is fair to say that every complex technology choice in TON stems from the pursuit of high concurrency and high scalability. This is easy to understand given the background of its birth. TON, or The Open Network, is a decentralized computing network consisting of an L1 blockchain and multiple components. TON was originally developed by Telegram co-founder Nikolai Durov and his team, and it is now supported and maintained by a community of independent contributors around the world. Its origins date back to 2017, when the Telegram team began exploring blockchain solutions for its own use. Since no existing L1 blockchain could support Telegram's nine-digit user base at the time, they decided to design their own blockchain, then called Telegram Open Network. To obtain the resources needed to build TON, Telegram launched the sale of Gram tokens (later renamed Toncoin) in the first quarter of 2018. In 2020, due to regulatory issues, the Telegram team withdrew from the TON project. Subsequently, a small group of open source developers and Telegram competition winners took over the TON codebase, renamed the project The Open Network, and continue to actively develop the blockchain to this day, following the principles outlined in the original TON white paper.
Since the design goal was a decentralized execution environment for Telegram, TON naturally faces two problems: highly concurrent requests and massive data volumes. As we know, even Solana, which claims the highest TPS, advertises a peak of only about 65,000 TPS, which is clearly not enough for a Telegram ecosystem that would require millions of TPS. At the same time, the amount of data generated by Telegram at scale is enormous, and a conventional blockchain, as an extremely redundant distributed system, requires every node in the network to store a complete copy of the data, which is also unrealistic.
Therefore, in order to solve the above two problems, TON has made two optimizations to the mainstream blockchain protocol:
By adopting the Infinite Sharding Paradigm to design the system, the data redundancy problem is solved, so that it can carry big data while alleviating performance bottlenecks;
By introducing a fully parallel execution environment based on the Actor model, the network TPS is greatly improved;
A blockchain of blockchains - with unlimited sharding, each account gets its own account chain
We know that sharding has become the mainstream solution for most blockchain protocols to improve performance and reduce costs, and TON has taken this to the extreme and proposed an infinite sharding paradigm, which means that the blockchain is allowed to dynamically increase or decrease the number of shards according to the network load. This paradigm enables TON to handle large-scale transactions and smart contract operations while maintaining high performance. In theory, TON can establish an exclusive account chain for each account and ensure the consistency between these chains through certain rules.
To put it abstractly, there are four layers of chain structure in TON:
Account Chain: This layer of chain represents a chain of transactions related to a certain account. The reason why transactions can form a chain structure is that for a state machine, as long as the execution rules are consistent, the state machine will get the same result after receiving the same order of instructions. Therefore, all blockchain distributed systems need to sort transactions in a chain, and TON is no exception. The account chain is the most basic component unit in the TON network. Usually, the account chain is a virtual concept, and it is unlikely that an independent account chain will actually exist.
Shard Chain: In most contexts, the shard chain is the actual component unit of TON. The so-called shard chain is a collection of account chains.
WorkChain: It can also be called a set of shard chains with custom rules, such as creating an EVM-based workchain and running Solidity smart contracts on it. In theory, everyone in the community can create their own workchain. In reality, building it is a rather complex task, and before that you have to pay the (expensive) fee for creating it and get 2/3 of the votes of the validators to approve the creation of your workchain.
MasterChain: Finally, there is a special chain in TON called the master chain, which is responsible for bringing finality to all shard chains. Once the hash of a shard chain's block is included in a master chain block, that shard chain block and all of its parent blocks are considered final, which means they can be treated as fixed, immutable content and referenced by subsequent blocks of all shard chains.
By adopting this paradigm, the TON network has the following three characteristics:
Dynamic Sharding: TON can automatically split and merge shard chains to adapt to changes in load. This means that new blocks are always generated quickly and transactions do not incur long waiting times.
Highly scalable: Through the infinite sharding paradigm, TON can support an almost unlimited number of shards: in theory up to 2^32 workchains, each of which can be split into up to 2^60 shard chains.
Adaptability: When the load on a part of the network increases, that part can be subdivided into more shards to handle the increased transaction volume. Conversely, when the load decreases, shards can be merged to improve efficiency.
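The split-and-merge behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TON's actual algorithm: shards are identified by binary prefixes, and the function name next_layout and the thresholds split_above and merge_below are hypothetical.

```python
def next_layout(loads: dict[str, int], split_above: int, merge_below: int) -> set[str]:
    """Return the shard prefixes for the next round.

    loads maps a shard's binary prefix (e.g. "10") to its observed load.
    split_above and merge_below are hypothetical thresholds.
    """
    if merge_below > split_above:
        raise ValueError("merge_below must not exceed split_above")
    layout: set[str] = set()
    absorbed: set[str] = set()  # siblings already folded into a merged parent
    for prefix, load in loads.items():
        if prefix in absorbed:
            continue
        if load > split_above:
            # Overloaded shard: split it into two children, one bit longer.
            layout.update((prefix + "0", prefix + "1"))
            continue
        if prefix:
            # The sibling shares every bit except the last one.
            sibling = prefix[:-1] + ("1" if prefix[-1] == "0" else "0")
            if sibling in loads and load + loads[sibling] < merge_below:
                # Two quiet siblings: merge them back into their parent.
                layout.add(prefix[:-1])
                absorbed.add(sibling)
                continue
        layout.add(prefix)
    return layout


# Shard "0" is overloaded and splits; shards "10" and "11" are quiet and merge.
print(sorted(next_layout({"0": 100, "10": 5, "11": 3}, split_above=50, merge_below=10)))
```

Identifying shards by prefix means a split or merge only changes the prefix length, so every account still maps to exactly one shard before and after the change.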
Such a multi-chain system must first solve the problem of cross-chain communication, and unlimited sharding makes this harder: once the number of shards in the network grows large, routing information between chains becomes a difficult task. Imagine a network of 4 nodes, each responsible for maintaining an independent workchain. A link between two chains means that, in addition to ordering transactions in its own workchain, a node also needs to monitor and process state changes in the linked chain. In TON, this is achieved by monitoring the messages in the output queue.
Assume that account A in workchain 1 wants to send a message to account C in workchain 3. A route for the message then has to be chosen. If the four chains are linked in a ring (1-2-3-4-1), there are two routing paths: workchain 1 -> workchain 2 -> workchain 3, and workchain 1 -> workchain 4 -> workchain 3.
When faced with more complex situations, an efficient and low-cost routing algorithm is needed to quickly complete message communication. TON chose the so-called hypercube routing algorithm to achieve cross-chain message communication routing discovery. The so-called hypercube structure refers to a special network topology. An n-dimensional hypercube is composed of 2^n vertices, each of which can be uniquely identified by an n-bit binary number. In this structure, any two vertices are adjacent if they differ by only one bit in the binary representation. For example, in a 3-dimensional hypercube, vertex 000 and vertex 001 are adjacent because they differ only in the last bit. The above example is a 2-dimensional hypercube.
In the hypercube routing protocol, the process of routing messages from the source chain to the target chain is performed by comparing the binary representation of the source and target chain addresses. The routing algorithm finds the minimum distance between the two addresses (i.e., the number of different bits in the binary representation) and forwards the information step by step through adjacent chains until it reaches the target chain. This method ensures that data packets are transmitted along the shortest path, thereby improving the communication efficiency of the network.
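The bit-comparison idea can be sketched as follows. This is a textbook hypercube route under the assumption that each chain is labeled with a plain integer; it is not TON's production routing code, and the function name hypercube_route is made up for the illustration.

```python
def hypercube_route(src: int, dst: int) -> list[int]:
    """Return the vertices visited on a shortest path from src to dst.

    Each hop flips exactly one bit in which the current vertex still
    differs from dst, so the hop count equals the number of differing bits.
    """
    path = [src]
    current = src
    diff = src ^ dst  # bits set here are the ones that must be flipped
    bit = 0
    while diff:
        if diff & 1:
            current ^= 1 << bit  # move to the neighbor that differs in this bit
            path.append(current)
        diff >>= 1
        bit += 1
    return path


# 3-dimensional example: 000 -> 001 -> 101 takes two hops.
print([format(v, "03b") for v in hypercube_route(0b000, 0b101)])
```

Flipping the differing bits in a different order gives another path of the same length, which is why the four-chain example has two equally short routes.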
Of course, to simplify this process, TON also proposes an optimistic scheme. When a user can provide a valid proof of a routing path, usually a Merkle trie root, a node can directly accept the message submitted by the user as credible. This is also called instant hypercube routing.
Therefore, we can see that addresses in TON differ significantly from those in other blockchain protocols. Most mainstream protocols use a hash of the public key from an elliptic-curve key pair as the address, because the address only needs to be unique and does not have to support routing. An address in TON consists of two parts, (workchain_id, account_id), and this pair also serves as the routing address for the hypercube routing algorithm; the details will not be elaborated here.
There is another point that may raise doubts. You may have noticed that the master chain is linked to every workchain, so all cross-chain information could in principle be relayed through the master chain, just like in Cosmos. In TON's design, however, the master chain is used only for the most critical task, namely maintaining the finality of the many workchains. Routing messages through the master chain is not impossible, but the resulting fees would be very expensive.
Finally, let me briefly mention the consensus algorithm. TON uses BFT plus PoS, meaning that any staker has the opportunity to take part in block production. TON's election governance contract periodically selects a validator group at random from all stakers. The selected nodes, called validators, produce blocks through the BFT algorithm. If they include incorrect information or act maliciously, their staked tokens are confiscated; otherwise they receive block rewards. This is a fairly common choice, so I won't go into it here.
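As a small illustration of the two-part address, the sketch below splits a raw address string into its components. It assumes the raw text form `workchain_id:account_id_hex`, a workchain_id that fits in a signed 32-bit integer, and a 256-bit account_id; these are assumptions of the example, and parse_raw_address is a hypothetical helper, not a function from any TON SDK.

```python
def parse_raw_address(raw: str) -> tuple[int, bytes]:
    """Split 'workchain_id:account_id_hex' into (workchain_id, account_id)."""
    workchain_text, sep, account_hex = raw.partition(":")
    if not sep or len(account_hex) != 64:
        # Assumption: account_id is 256 bits, i.e. 64 hex characters.
        raise ValueError("expected '<workchain_id>:<64 hex chars>'")
    workchain_id = int(workchain_text)
    if not -2**31 <= workchain_id < 2**31:
        # Assumption: workchain_id is a signed 32-bit integer.
        raise ValueError("workchain_id out of range")
    return workchain_id, bytes.fromhex(account_hex)


workchain_id, account_id = parse_raw_address("0:" + "ab" * 32)
print(workchain_id, len(account_id))
```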
Actor-based smart contracts and fully parallel execution environment
Another difference between TON and mainstream blockchain protocols is its smart contract execution environment. In order to break through the TPS limitations of mainstream blockchain protocols, TON adopted a bottom-up design approach and reconstructed smart contracts and their execution methods using the Actor model, enabling them to have full parallel execution capabilities.
We know that most mainstream blockchain protocols use a single-threaded, serial execution environment. Take Ethereum as an example: its execution environment, the EVM, is a state machine that takes transactions as input. Once the block producer has ordered the transactions by packaging a block, the EVM executes them in that order. The whole process is serial and single-threaded, meaning only one transaction executes at a time. The advantage is that once the transaction order is fixed, the execution result is identical on every node of a widely distributed cluster. And because only one transaction runs at a time, no other transaction can modify state that the running transaction is about to read, which is what makes interoperability between smart contracts possible. For example, when we use USDT to buy ETH through Uniswap, the liquidity in the trading pair is a fixed value while the transaction executes, so the result follows from the pool's pricing formula. If that were not the case, and other LPs could add liquidity while the bonding-curve calculation was running, the result would be based on stale data, which is obviously unacceptable.
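To make the stale-data problem concrete, here is a minimal sketch using the constant-product formula x * y = k, with fees ignored and integer amounts. The reserve figures are invented for the illustration, and swap_output is a hypothetical helper rather than any protocol's API.

```python
def swap_output(reserve_in: int, reserve_out: int, amount_in: int) -> int:
    """Output amount of a constant-product pool, ignoring fees.

    Derived from (reserve_in + amount_in) * (reserve_out - out) = reserve_in * reserve_out.
    """
    return reserve_out * amount_in // (reserve_in + amount_in)


# Quote computed against the reserves the transaction expects to see.
quote = swap_output(1_000, 1_000, 100)

# If an LP doubled the liquidity in the middle of the calculation,
# the correct answer would differ from the quote.
actual = swap_output(2_000, 2_000, 100)

print(quote, actual)  # the two results differ
```

Serial execution rules this situation out by construction, because the reserves cannot change between reading them and writing the result.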
However, this architecture also has obvious limitations, that is, the TPS bottleneck, and this bottleneck seems very outdated under the current multi-core processors. It is like using the latest PC to play some old computer games, such as Red Alert. When the number of combat units reaches a certain level, you will still find that it is stuck. This is a problem with the software architecture.
You may hear that some protocols have already paid attention to this issue and proposed their own parallel solutions. For example, Solana, which is currently known to have the highest TPS, also has the ability to execute in parallel. However, its design concept is different from TON. In Solana, the core idea is to divide all transactions into several groups according to execution dependencies, and no state data is shared between different groups. That is, there is no identical dependency, so transactions in different groups can be executed in parallel without worrying about conflicts. For transactions in the same group, the traditional serial method is still used for execution.
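The grouping idea described above can be sketched with a union-find structure: transactions that touch a common account end up in the same group, and different groups share no state. This is a simplified illustration, not Solana's actual scheduler; the function name and the transaction and account labels are hypothetical, and every transaction is assumed to touch at least one account.

```python
def group_by_shared_state(txs: dict[str, set[str]]) -> list[list[str]]:
    """Group transaction ids so that no account is shared across groups."""
    parent: dict[str, str] = {}

    def find(account: str) -> str:
        parent.setdefault(account, account)
        while parent[account] != account:
            parent[account] = parent[parent[account]]  # path halving
            account = parent[account]
        return account

    # Accounts touched by the same transaction belong to one component.
    for accounts in txs.values():
        ordered = sorted(accounts)
        for other in ordered[1:]:
            parent[find(ordered[0])] = find(other)

    groups: dict[str, list[str]] = {}
    for tx_id, accounts in txs.items():
        groups.setdefault(find(min(accounts)), []).append(tx_id)
    return list(groups.values())


txs = {"t1": {"A", "B"}, "t2": {"B", "C"}, "t3": {"D"}}
print(group_by_shared_state(txs))  # t1 and t2 share account B; t3 is independent
```

Each returned group could be handed to a separate worker, while the transactions inside a group still run one after another.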
In TON, however, the serial execution architecture is completely abandoned, and a development paradigm designed for parallelism, the Actor model, is adopted to reconstruct the execution environment. The so-called Actor model was first proposed by Carl Hewitt in 1973, with the aim of solving the complexity of shared state in traditional concurrent programs through message passing. Each Actor has its own private state and behavior, and does not share any state information with other Actors. The Actor model is a computing model for concurrent computing that implements parallel computing through message passing. In this model, Actor is the basic unit of work, which can process received messages, create new Actors, send more messages, and decide how to respond to the next message. The Actor model needs to have the following characteristics:
Encapsulation and Independence: Each Actor is completely independent in processing messages and can process messages in parallel without interfering with each other.
Message passing: Actors interact only by sending and receiving messages, and message passing is asynchronous.
Dynamic structure: Actors can create more Actors at runtime. This dynamism enables the Actor model to expand the system as needed.
TON uses this architecture for its smart contract model: in TON, each smart contract is an Actor with completely independent storage that does not rely on any external data. Calls to the same smart contract are still executed in the order of the messages in its receiving queue, so transactions in TON can be executed efficiently in parallel without worrying about conflicts.
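A minimal sketch of this model in Python: each actor owns its state and an inbox, messages to one actor are handled in arrival order, and different actors are drained on separate threads. The class CounterActor and the function run_all are invented for the illustration and say nothing about how the TON virtual machine is implemented.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class CounterActor:
    """An actor with private state; it is only ever changed by its own messages."""

    def __init__(self) -> None:
        self.total = 0
        self.history: list[int] = []
        self.inbox: deque[int] = deque()

    def send(self, amount: int) -> None:
        # Asynchronous from the sender's point of view: just enqueue.
        self.inbox.append(amount)

    def drain(self) -> int:
        # Messages to the same actor are processed strictly in queue order.
        while self.inbox:
            amount = self.inbox.popleft()
            self.total += amount
            self.history.append(amount)
        return self.total


def run_all(actors: list[CounterActor]) -> list[int]:
    # No state is shared between actors, so each can be drained on its own thread.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda actor: actor.drain(), actors))


a, b = CounterActor(), CounterActor()
a.send(1)
a.send(2)
b.send(10)
print(run_all([a, b]))
```

No lock is needed anywhere, because each actor's state is touched by exactly one worker.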
However, this design also brings some new impacts. For DApp developers, their accustomed development paradigm will be broken, as follows:
1. Asynchronous calls between smart contracts: It is impossible to call external contracts or access external contract data atomically within TON’s smart contracts. We know that in Solidity, calling function 2 of contract B from function 1 of contract A, or accessing certain state data through read-only function 3 of contract C, the whole process is atomic and executed in one transaction, which is a very easy thing. However, in TON, this will not be possible. Any interaction with external smart contracts will be executed asynchronously by packaging new transactions. Such transactions initiated by smart contracts are also called internal messages. And the execution process cannot be blocked to obtain the execution result.
For example, if we develop a DEX following the common EVM paradigm, there is usually a single router contract that manages trade routing, while each Pool separately manages the LP data of one trading pair. Suppose there are currently two pools, USDT-DAI and DAI-ETH. When a user wants to buy ETH directly with USDT, the router contract can call the two pools one after another within a single transaction to complete the trade atomically. This is not so easy to achieve in TON, and we need to think about a new development paradigm. If we reused this paradigm anyway, the information flow might look like this: the request would consist of one external message initiated by the user and three internal messages (note that this is only meant to illustrate the difference; in real development, even the ERC-20 paradigm needs to be redesigned).
2. The error-handling flow for cross-contract calls must be considered carefully, with a corresponding bounce function designed for each inter-contract call. We know that in the mainstream EVM, when a problem occurs while a transaction is executing, the entire transaction is rolled back, that is, reset to the state at the start of execution. This is easy to understand in the serial, single-threaded model. In TON, however, inter-contract calls are executed asynchronously, so when an error occurs at a later step, the transactions that already executed successfully have been confirmed and are not rolled back, which may cause problems. TON therefore defines a special message type called a bounce message: when an error occurs in the execution triggered by an internal message, a bounce message is returned to the sending contract, which can then use its reserved bounce function to reset the affected state.
3. In some complex cases, the transaction received first may not be executed first, so this ordering cannot be assumed. In a system of asynchronous, parallel smart contract calls, it can be hard to define the order in which operations are processed. This is why every message in TON carries a logical time, a Lamport timestamp (hereinafter lt). It is used to determine which event triggered another and what the validators need to process first. In the simple model, the message received first must be executed first.
In this model, two messages msg1 and msg2 are sent from smart contract A to smart contract B, and the ordering guarantee is: if msg1_lt < msg2_lt, then tx1_lt < tx2_lt.
In more complex cases, however, this rule breaks down. The official documentation gives an example. Suppose we have three contracts A, B and C. In one transaction, A sends two internal messages, msg1 and msg2: one to B and the other to C. Although they are created in a definite order (msg1 first, then msg2), we cannot be sure that msg1 will be processed before msg2, because the routes from A to B and from A to C may differ in length and in validator set. If these contracts are in different shard chains, one of the messages may take several blocks to reach its target contract. That is, there are two possible transaction orders: msg1 is processed before msg2, or msg2 before msg1.
4. In TON, the persistent storage of a smart contract is a directed acyclic graph whose unit is the Cell. Data is packed compactly into Cells according to the encoding rules and extends downward as a directed acyclic graph. This differs from the hashmap-based organization of state data in the EVM. Because the data access algorithms differ, TON charges different Gas for processing data at different depths: the deeper the Cell, the more Gas is required. This opens the door to a DoS attack pattern in TON: malicious users send a large number of spam messages to occupy all the shallow Cells in a smart contract, so that storage becomes more and more expensive for honest users. In the EVM, hashmap lookups have O(1) complexity and cost the same Gas, so no similar problem arises. TON DApp developers should therefore avoid unbounded data types in smart contracts, and when they do appear, split them up through sharding.
5. Some other features are less unusual but still require developers' attention: smart contracts must pay rent for storage, smart contracts in TON are natively upgradeable, and account abstraction is native, meaning every wallet address in TON is a smart contract, which may simply not be initialized yet.
These are some of my takeaways from studying TON-related technologies during this period, and I wanted to share them with you. If there are any mistakes, I hope you will correct me. I also believe that, with Telegram's huge traffic resources, the TON ecosystem will bring new applications to Web3. Friends who are interested in TON DApp development are also welcome to contact me to discuss.
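The asynchronous call and bounce flow from points 1 and 2 can be simulated in a few lines. This is a toy model under stated assumptions, not TON's message format or fee logic: the Message fields, the Wallet class and the deliver_all loop are all hypothetical names introduced for the sketch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    src: str
    dst: str
    amount: int
    bounce: bool = True    # ask for a bounced message if processing fails
    bounced: bool = False  # set on the message that travels back


class Wallet:
    def __init__(self, address: str, balance: int, accepting: bool = True) -> None:
        self.address = address
        self.balance = balance
        self.accepting = accepting

    def transfer(self, dst: str, amount: int, queue: deque) -> None:
        # The debit is committed now; the credit happens later, asynchronously.
        self.balance -= amount
        queue.append(Message(self.address, dst, amount))

    def on_message(self, msg: Message) -> None:
        if msg.bounced:
            # Bounce handler: undo the debit made when the message was sent.
            self.balance += msg.amount
            return
        if not self.accepting:
            raise RuntimeError("receiver rejected the message")
        self.balance += msg.amount


def deliver_all(contracts: dict[str, Wallet], queue: deque) -> None:
    while queue:
        msg = queue.popleft()
        try:
            contracts[msg.dst].on_message(msg)
        except RuntimeError:
            if msg.bounce and not msg.bounced:
                # Return the message to its sender instead of rolling back the sender.
                queue.append(Message(msg.dst, msg.src, msg.amount, bounce=False, bounced=True))


a = Wallet("A", 100)
b = Wallet("B", 0, accepting=False)
queue: deque = deque()
a.transfer("B", 30, queue)
print(a.balance)  # already debited before B has processed anything
deliver_all({"A": a, "B": b}, queue)
print(a.balance, b.balance)  # the bounce handler restored A's balance
```

The point of the sketch is that the sender's state change is never rolled back automatically; it is the sender's own bounce handler that has to compensate for it.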
X Links: https://x.com/web3_mario
Telegram Handle: @MarioChin Web3