Each article covers one big segment of our real-world AMM (read: Uniswap) pool behaviour analysis and IXS mitigation-model stress testing, going into the details and giving simplified explanations of the work performed. The detailed report, codebase, and simulation data are open-sourced and continuously updated on our GitHub at https://github.com/IX-Swap/data-science.
The financial market is a place of abrupt changes and risky decisions, where traders act according to the general behaviour pattern of other participants and are exposed to emotions in decision-making, regardless of their resistance to stress and negative factors.
Crypto markets carry even higher risks: market changes are tremendous, there are little to no guarantees, and regulation is lacking.
Crypto financial markets also suffer from a lack of information on most assets, which, together with the previously identified factors, causes trader behaviour to depend on the currently observed market situation. This behaviour can be described as a snowball that grows in volume as it rolls.
Also, the lack of regulatory structures, centralisation, and monitoring services gives traders the opportunity to conduct fraudulent transactions in the market for profit, which is difficult or impossible in traditional financial markets.
For these reasons, in order to conduct effective tests of the crypto market, it is necessary to rely on two types of trader behaviour simulation: simulation built on mathematical models, and simulation based on the real history of trading.
Since working with the real history of trading is a complex topic, a separate article will be devoted to it; within this article, we'll focus on the mathematical approach and its two aspects: transaction frequency and transaction value.
Two problems must be solved to simulate trader behaviour efficiently with a mathematical approach. Transaction frequency defines how many transactions happen per specified time interval; the simulated frequency must be unstable, with the transaction count per interval deviating from the mean. Transaction value defines how many tokens a trader requests per exchange transaction; the simulated values must resemble the real transaction distributions. The solutions to these two problems are described below.
In the case of transaction values, the models were constrained to generate only positive values, since negative transaction values are impossible. This constraint allows tuning the models for each token of each pool individually. The best models for simulating trader behaviour turned out to be the Cauchy and Log-Normal ones. For these two models, scripts were written to find the parameters that minimise a harmonic-mean error. The parameter-picking algorithms, and the reasons why the Log-Normal and Cauchy distributions were chosen, are demonstrated and explained below.
The number of operations in a set time interval is unstable, causing deviations. To simulate the behaviour of traders, a mathematical model is needed that takes into account the average number of transactions over a time interval together with an unstable deviation in the transaction count. Also, transactions must have randomly arranged timestamps within the specified time interval. To solve these problems, it was decided to use the Poisson distribution. The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events happening in a fixed time interval with a constant mean rate, independently of the time since the last event (it can also be applied to other metrics, like distance). The formula for the Poisson distribution is:

P(x) = (λ^x · e^(−λ)) / x!
where e represents Euler’s number, x represents the number of event occurrences, and λ equals the expected value of x and also its variance. This distribution is responsible for generating the number of transactions in the specified time interval, but each transaction must also be timestamped. For this, it was decided to take the initial time of the considered interval and add to it a randomly determined offset within the interval. This technique generates a time for each transaction independently of the others.
This structure simulates a transaction rate close to the real one, with unstable placement of timestamps and an unstable transaction count around the specified average value. Using the Poisson distribution, the transaction frequency can be conveniently adjusted to tune the degree of trader activity.
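As a sketch, this frequency model takes only a few lines of NumPy; the parameter values below are illustrative, not taken from the actual simulator:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_timestamps(mean_tx_per_interval, interval_seconds, start_time=0.0):
    """Draw the transaction count for the interval from a Poisson
    distribution, then give each transaction a uniformly random
    offset inside the interval."""
    n_tx = rng.poisson(mean_tx_per_interval)
    offsets = rng.uniform(0.0, interval_seconds, size=n_tx)
    return np.sort(start_time + offsets)

# e.g. an average of 5 transactions per hour
timestamps = simulate_timestamps(mean_tx_per_interval=5, interval_seconds=3600.0)
```

Each call yields a different transaction count around the mean, with timestamps scattered independently across the interval.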
The Poisson distribution effectively solves the issue of trader activity, but one more aspect remains: the volume of exchanged assets. For each transaction, the volume of the transferred asset must be established. This task requires a comparative analysis of several mathematical models, a description of the principles of their work, and the situations in which they can be used.
The normal distribution is important in statistics and is often used in the natural and social sciences. According to the central limit theorem, under some conditions the average of many samples of a random variable with finite mean and variance is itself a random variable whose distribution converges to a normal one as the number of samples increases. The probability density function of this distribution is shown below:

ψ(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
where μ is the mean or expectation of the distribution, σ is the standard deviation, and e is Euler’s constant. The probability density function of this distribution is also denoted ψ(x).
The presented distribution can be used to generate the values of traders’ transactions, since setting the mean value of the distribution and its standard deviation makes it easy to manipulate the distribution. This makes it possible to easily simulate different market situations.
The problem with this distribution is that it generates negative values, which are impossible for transactions. To resolve this issue, the truncated normal distribution must be used:

f(x) = ψ(x) / (Φ(b) − Φ(a)) for a ≤ x ≤ b, and f(x) = 0 otherwise,
where ψ(x) represents the probability density function of the “parent” general normal distribution with its mean and variance, and the truncation interval is given by a and b. One more symbol requires explanation: Φ.
Imagine a situation where we need to determine the probability that a distribution will generate a value less than a specified value of x. The calculation of this probability will be according to the following function:
which is called the cumulative distribution function and can be written as the integral of the “parent” general normal density, F(x) = Φ(x) = ∫ from −∞ to x of ψ(t) dt.
These formulas allow using a truncated normal distribution to generate positive transaction values that would correspond to a normal distribution over a limited range of values.
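A minimal way to obtain such truncated samples is rejection sampling: draw from the parent normal and redraw anything outside [a, b]. The sketch below uses illustrative parameters; the repository's implementation may sample differently:

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_normal(mean, std, lower, upper, size):
    """Sample the parent normal distribution and redraw any value
    that falls outside [lower, upper] (simple rejection sampling)."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        draws = rng.normal(mean, std, size - filled)
        keep = draws[(draws >= lower) & (draws <= upper)]
        out[filled:filled + len(keep)] = keep
        filled += len(keep)
    return out

# Illustrative token-amount parameters, not fitted to any pool
values = truncated_normal(mean=100.0, std=50.0, lower=0.0, upper=1000.0, size=10000)
```

Rejection sampling is efficient here because most of the parent distribution's mass lies inside the truncation interval.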
The log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed. According to this distribution, a generated value X can be described by the formula:

X = e^(μ + σZ)
where Z is a standard normal variable, μ represents the distribution mean and σ the standard deviation. Considering that traders’ activity has extreme rises and drops, such cases must be taken into account, and this type of distribution covers them.
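Generating log-normal transaction values from this formula takes one line of NumPy; μ and σ below are illustrative placeholders, not fitted pool parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# X = exp(mu + sigma * Z), Z ~ N(0, 1): values are always positive
mu, sigma = 4.0, 1.2
z = rng.standard_normal(10000)
values = np.exp(mu + sigma * z)
```

The heavy right tail of the log-normal captures the occasional very large trade, while the median stays at e^μ.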
The Pareto distribution is used for generating values conforming to the famous “80 to 20” rule, meaning that 80% of the taken samples fall within 20% of the value interval. This is similar to the real trading distribution principle, under which most transactions are performed with small values (matching the distribution of wealth in society that Pareto originally studied). This distribution has the following formula:

f(x) = a · xm^a / x^(a + 1) for x ≥ xm
where xm is the minimal possible value of x (also called the scale parameter) and a is the shape parameter. Compared to the previous distributions, it is harder to simulate different market situations with Pareto because its variables are harder to interpret.
In the case of the normal distributions, unstable market situations can be covered with a bigger standard deviation, and traders’ desire to get rid of a token leads to a bigger mean value. In the Pareto case, an unstable market situation can be covered with a bigger shape parameter, and traders’ desire to get rid of a token leads to a bigger scale parameter. Pareto is able to generate even higher values than Cauchy (reviewed next), meaning that it requires a mechanism that “truncates” the distribution at a specified upper bound.
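One simple truncation mechanism is to redraw Pareto samples that exceed the upper bound; the sketch below assumes that approach and illustrative parameters. Note that NumPy's `pareto` draws Lomax values, so the classical Pareto is `scale * (1 + draw)`:

```python
import numpy as np

rng = np.random.default_rng(2)

def truncated_pareto(scale, shape, upper, size):
    """Classical Pareto samples (minimum value = scale), redrawing
    any value above the upper bound."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        # np.random Lomax draws shifted into the classical Pareto form
        draws = scale * (1.0 + rng.pareto(shape, size - filled))
        keep = draws[draws <= upper]
        out[filled:filled + len(keep)] = keep
        filled += len(keep)
    return out

values = truncated_pareto(scale=10.0, shape=1.5, upper=10000.0, size=5000)
```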
The Cauchy distribution has the following probability density function:

f(x) = 1 / (πγ · [1 + ((x − x0) / γ)²])
where x0 is the location parameter setting the position of the distribution peak and γ is the scale parameter specifying the half-width at half-maximum. γ is also equal to half of the interquartile range and is sometimes called the probable error. The problem with the presented distribution is that transaction values can only be positive, so to generate values similar to the real transaction values the following formula is used:

X = |µ + σ · tan(π(U − 1/2))|, with U uniformly distributed on (0, 1),
where µ is the location parameter and σ is the scale parameter. One problem remains with Cauchy: it can give unrealistically big transaction values, meaning there is a small chance of an anomalous value that does not correspond to any real-world case. This problem was solved via a value “mapping” mechanism, a graphical representation of which can be understood from the given example:
where the generated value represents the original Cauchy output and the limit is the upper bound of possible values. This algorithm keeps the original Cauchy distribution almost unchanged (without breaking the probabilities) while producing values only up to the specified limit.
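Modulo folding is one plausible reading of this “mapping” mechanism (the original implementation may differ); a sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

def mapped_cauchy(location, scale, limit, size):
    """Absolute Cauchy samples; any value above `limit` is folded
    back into [0, limit) via modulo, an assumed mapping rule."""
    draws = np.abs(location + scale * rng.standard_cauchy(size))
    return np.where(draws > limit, draws % limit, draws)

values = mapped_cauchy(location=50.0, scale=20.0, limit=5000.0, size=10000)
```

Unlike rejection, folding keeps every draw, so the bulk of the distribution below the limit is left essentially untouched.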
So far, four approaches to generating transaction values and one approach to generating transaction frequency with random timestamps have been specified. These approaches must be connected together in one module able to simulate real-world transactions. The Normal, Log-normal, Pareto and Cauchy distributions are called (and implemented as) transaction value generators, and the Poisson distribution as the transaction frequency generator. All transaction value generators are implemented following a similar approach in order to make them easy to integrate into one module and to allow switching between them when required. The Monte Carlo simulator first generates the number of transactions per specified time interval and defines a timestamp for each transaction; at the second stage it assigns transaction values according to the chosen probability distribution function. With all the performed work, two problems still require a solution: determining which distribution matches the real transaction value distribution best, and automatically picking the best parameters for it.
The best correlations with the real distribution are achieved by the log-normal and Cauchy distributions, as can be seen in the distributions presented below:
From left to right: the log-normal distribution, the Cauchy distribution, and the real transaction value distribution. Considering that these distributions are able to match the real-life one, an algorithm is required that automatically picks the best parameters for a specified distribution.
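Putting the pieces together, the two-stage Monte Carlo simulator described earlier (a Poisson frequency generator followed by a pluggable value generator) might look like this sketch; all names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_transactions(mean_tx, interval_s, value_generator):
    """Stage 1: Poisson transaction count with uniform random timestamps.
    Stage 2: one value per transaction from the chosen generator."""
    n = int(rng.poisson(mean_tx))
    timestamps = np.sort(rng.uniform(0.0, interval_s, n))
    values = value_generator(n)
    return timestamps, values

# A log-normal value generator; the truncated normal, Pareto or Cauchy
# generators could be swapped in behind the same one-argument interface.
lognormal_gen = lambda n: np.exp(4.0 + 1.2 * rng.standard_normal(n))
ts, vals = simulate_transactions(mean_tx=20, interval_s=3600.0,
                                 value_generator=lognormal_gen)
```

Keeping the value generator as a plain callable is what makes switching between the four distributions trivial.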
Considering that the best distributions are the log-normal and Cauchy ones, it was decided to write parameter-picking algorithms able to find the best parameter combination. The first problem to solve is how the algorithm picks the best combination: every probability-distribution simulation generates different values, so one launch can by chance perform better than another, and checking overall efficiency requires multiple simulation runs (creating an average picture). Another question is how the algorithm checks whether one distribution is “similar to” or “matches” another. The harmonic mean formula:

H = 2·x1·x2 / (x1 + x2)
works for two numbers, so a two-error harmonic formula can be used to define the best possible parameter combination. Taking into account the previously mentioned “80 to 20” rule that applies to social and economic processes, the first quartile and the median are the most important elements of the distributions. The compared distributions are the simulated one and the real one. So the final representation of the error based on the harmonic mean formula for two numbers is:

E = 2·e1·e2 / (e1 + e2), with e1 the first-quartile error and e2 the median error between the simulated and real distributions,
and the model picks as best parameters those whose average error harmonic mean over all simulation launches is minimal. It is possible to add a version that finds the harmonic mean of three or more numbers, but in that case the formula changes and more calculations are required. It would match the simulated distribution to the real one even better, since the 25th and 75th percentiles could then be compared along with the medians of both distributions. The search iterates through a range of parameters, incrementing each parameter from its lower bound to its upper bound with a step. All intermediate results (each parameter set and its average harmonic error) are saved, and the parameter set with the smallest average harmonic-mean error is chosen. Below is an example of a distribution of the error over parameter values; it shows two smallest error values, and with higher resolution (in other words, more iterations) the error becomes smaller.
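The error computation described above can be sketched as follows, assuming the error at each statistic is the relative deviation of the simulated quantile from the real one (the exact error definition in the original work may differ):

```python
import numpy as np

def harmonic_error(simulated, real):
    """Harmonic mean of the relative errors at the first quartile
    and the median of the simulated vs. real distributions."""
    errors = []
    for q in (25, 50):
        s, r = np.percentile(simulated, q), np.percentile(real, q)
        errors.append(abs(s - r) / r)
    e1, e2 = errors
    return 2.0 * e1 * e2 / (e1 + e2) if (e1 + e2) > 0 else 0.0
```

Identical distributions give an error of 0, and the score grows as the quartile and median drift apart.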
The parameter search algorithm checks a specified range of values with a specified parameter step. It works best coarse-to-fine: first take a big range with a big step, then a smaller range with smaller steps and a higher simulation count.
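The coarse-to-fine search can be sketched like this; the grids, ranges and the synthetic “real” data are illustrative, and the scoring follows the quartile/median harmonic-error idea described above:

```python
import numpy as np

rng = np.random.default_rng(7)

def avg_error(real, make_samples, mu, sigma, n_runs=5):
    """Average harmonic-mean error of Q1 and median over several runs."""
    errs = []
    for _ in range(n_runs):
        sim = make_samples(mu, sigma)
        e1 = abs(np.percentile(sim, 25) - np.percentile(real, 25)) / np.percentile(real, 25)
        e2 = abs(np.median(sim) - np.median(real)) / np.median(real)
        errs.append(2 * e1 * e2 / (e1 + e2) if e1 + e2 > 0 else 0.0)
    return float(np.mean(errs))

def grid_search(real, make_samples, mu_grid, sigma_grid):
    """Exhaustive pass over the grid, keeping the lowest-error combination."""
    return min(((mu, s) for mu in mu_grid for s in sigma_grid),
               key=lambda p: avg_error(real, make_samples, *p))

# Synthetic "real" data from a log-normal with known parameters
real = np.exp(4.0 + 1.0 * rng.standard_normal(5000))
make = lambda mu, sigma: np.exp(mu + sigma * rng.standard_normal(5000))

# Coarse pass over a wide range, then a fine pass around the coarse winner
mu_c, sig_c = grid_search(real, make, np.arange(2.0, 6.1, 1.0), np.arange(0.5, 2.1, 0.5))
mu_f, sig_f = grid_search(real, make,
                          np.arange(mu_c - 0.5, mu_c + 0.6, 0.25),
                          np.arange(max(sig_c - 0.25, 0.05), sig_c + 0.3, 0.25))
```

The second pass reuses the same scoring function on a narrower, denser grid, which is exactly the big-step-then-small-step procedure described above.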
Using the aforementioned mathematical models, it is possible to perform various market simulations and demonstrate the behaviour of different groups of traders. The combination of these models, together with the algorithms for finding optimal parameters, makes the simulations closer to real trader behaviour.
Despite the possibility of simulating different market behaviour depending on various factors, the generated transactions will not be able to demonstrate or cover fraudulent transactions; however, with the correct parameters they can perfectly demonstrate attacks on the market.
While simulations make it possible to test the strength of the market in exact accordance with the indicated market situations, the real history of transactions will allow testing various market-controlling mechanisms that prevent fraudulent activities, which, besides negatively affecting certain groups of traders, may also affect the market as a whole.