Building a DeFi arbitrage path finder (1)
How to find triangular arbitrage paths on Uniswap forked DEXs
Created on June 16, 2023.
Table of Contents
Starting out
I’ve been working on my DEX arbitrage bot for about a week now. As a result, I did manage to pull off my first few trades with a 5% success ratio - a very disappointing rate for most traders. But I only needed to check if this strategy had any potential. And if it did show any promise, I was going to give it the time it deserved and pursue it further.
The first bot I built was a proof of concept bot that simply used quotes to recognize price spreads across multiple pools on Arbitrum Uniswap V3, and placed multihop swap orders accordingly.
The execution layer wasn’t as difficult as it seemed at first, because there were numerous examples out there showing us exactly how we can send out order transactions onto the blockchain.
The below Github repos are among such examples:
- Rusty Sando
- Artemis: Paradigmxyz’s Opensea/Sudoswap arbitrage bot
- Degenbot: Uniswap V2/V3 arbitrage bot
- Subway: Uniswap V2 sandwich MEV bot written in TX
The harder part was in understanding the atomicity of blockchain transactions, and why to ensure this I needed to use smart contracts to perform arbitrage strategies.
When I first grasped the concept of flashloans, it was eye opening, because there was no equivalent in legacy finance. But as revolutionizing as the tech is, I needed to learn to deploy smart contracts to ensure transaction atomicity. So I put this on hold for a while and moved on to another important aspect of arbitrage strategies - which I would like to talk about in this blog post today.
Arbitrage opportunity searching
The more difficult part of DEX arbitrage, for me, was in opportunity searching.
This, in my opinion, is the more important aspect of building profitable bots, but isn’t touched on that much. Building an execution engine will take a couple of days tops to study and also to build - but if “searching” isn’t done right, finding profitable paths will take an infinite amount of time.
When you are attempting arbitrage trades on one DEX with a couple of liquidity abundant tokens like WETH, USDC, USDT, the 3-way pairs you can create are quite limited. So you can simply hardcode the 3-way paths and retrieve price quotes to update the profitability of each path.
The difficult part comes when you are attempting a 4-way, 5-way, n-way path swaps in multiple different DEXs. On top of this, try adding lending, derivatives, or options trading in DeFi to the equation, your path finding will become impossible to do by hand - or even with a program.
So, before I began developing my execution engine any further, I decided to build a simple system that finds n-way arbitrage paths using multiple DEXs. This was to:
- figure out if simple DEX arbitrage strategies were still profitable,
- and if they weren’t anymore, under what conditions price spreads widened,
- and to build a prototype simulation engine written in vectors to speed up my arbitrage path finding.
This project is still a work in progress, but I did release my scripts on Github to keep track of my work. Here’s the link to the repo:
How to use Path Finder
The project structure is pretty simple as it retrieves data from Subgraphs and bulk loads information about Uniswap fork pools.
First, clone the Github repo:
git clone https://github.com/solidquant/defi_path_finder/tree/main
Then, open up examples.py, where you will find the example code of using defi_path_finder.
from defi_path_finder import Preprocessor
from defi_path_finder import make_triangular_paths
if __name__ == '__main__':
p = Preprocessor()
pools, reserves = p.load_data(500)
pairs, paths = make_triangular_paths(pools, [], 6)
The above code is everything that this project does.
You create a Preprocessor instance that collects subgraph data using the Collector instance. (preprocessor.py)
The important function here is the load_data(self, pool_cnt: int = 500) method call:
def load_data(self, pool_cnt: int = 500):
# 1. Collect raw indexed data
sushi = self.polygon_sushiswap_v2_pools(pool_cnt)
mesh = self.polygon_meshswap_v2_pools()
# 2. Create a numpy array by mapping token, exchange data to integers
_arr = []
for pools_df in [sushi, mesh]:
pools = np.zeros((len(pools_df), 3), dtype=int)
pools[:] = pools_df[['token0', 'token1', 'exchange']].values
_arr.append(pools)
pools_array = np.concatenate(_arr, axis=0)
# 3. Create a numpy array containing reserves data for all existing pools
reserves_array = np.zeros((
len(self.TOKENS),
len(self.TOKENS),
len(self.EXCHANGES),
2
), dtype=float)
for pools_df in [sushi, mesh]:
for _, row in pools_df.iterrows():
t0 = int(row['token0'])
t1 = int(row['token1'])
e = int(row['exchange'])
r0 = row['reserve0']
r1 = row['reserve1']
reserves_array[t0, t1, e, :] = [r0, r1]
return pools_array, reserves_array
As seen from the code, it retrieves Sushiswap V2 pools data and Meshswap pools data.
It then creates a pools_array numpy array that looks like this:
array([[ 0, 1, 0],
[ 2, 3, 0],
[ 4, 1, 0],
...,
[ 2, 449, 1],
[ 10, 450, 1],
[413, 451, 1]])
This is a mappings of (token0, token1, exchange) from both the exchanges.
The first column represents: token0. The other two are: token1, exchange.
So basically, the first row of pools_array will be: pool where token0 is 0, token1 is 1, and the exchange is 0. This maps to: USDC-KLIMA pool in Sushiswap V2 at 0x5786b267d35f9d011c4750e0b0ba584e1fdbead1
Also, check a debug breakpoint somewhere within that function and try out the below commands:
>> self.TOKENS
{'0x2791bca1f2de4661ed88a30c99a7a9449aa84174': {'id': 0, 'symbol': 'USDC', 'decimals': '6'},
'0x4e78011ce80ee02d2c3e649fb657e45898257815': {'id': 1, 'symbol': 'KLIMA', 'decimals': '9'},
... }
>> self.EXCHANGES
{'sushiswap_v2': 0, 'meshswap': 1}
>> self.POOLS['0_1_0']
'0x5786b267d35f9d011c4750e0b0ba584e1fdbead1'
You can go check this out at: https://polygonscan.com/address/0x5786b267d35f9d011c4750e0b0ba584e1fdbead1#code
Now that I have a mapper of all the pools in both the exchanges, what you want to do is call make_triangular_paths() function in utils.py. (utils.py)
This function will run in 2 steps.
- make_3_pairs,
- make_paths_array.
We will look at what each of these functions do.
1. make_3_pairs
This function will create a combination of trangular pairs for all the tokens that you are tracking. This means that, for example, if you had tokens: WETH, WMATIC, WBTC, USDC, USDT, you should be able to make pairs that look like:
- WETH, WMATIC, WBTC
- WETH, WMATIC, USDC
- WETH, WMATIC, USDT
- WMATIC, WBTC, USDC
- WMATIC, WBTC, USDT
- …
so on and so forth.
This can be easily done with Python as below:
tokens = np.unique(data[:, :2]) # "data" is the "pools_array" from above
combinations_list = list(combinations(tokens, 3))
Try debuggin these values:
>> tokens
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
... ])
>> combinations_list
[(0, 1, 2), (0, 1, 3), ... ]
combinations_list has all the possible 3-token combinations we can use to perform triangular arbitrage.
The next step is to run _make_3_pairs_process function to get pairs data that looks like the below:
>> pairs.keys()
dict_keys([(395, 396, 397), (395, 396, 398), ... ])
# the array here has information on: token0, token1, exchange, token_in, token_out
>> pairs[(395, 396, 397)]
array([[395, 396, 1, 395, 396],
[397, 396, 1, 397, 396],
[395, 397, 1, 395, 397],
[395, 396, 1, 396, 395],
[397, 396, 1, 396, 397],
[395, 397, 1, 397, 395]])
>> pairs[(0, 1, 2)]
array([[0, 1, 0, 0, 1],
[2, 0, 0, 2, 0],
[2, 1, 0, 2, 1],
[2, 0, 1, 2, 0],
[0, 1, 0, 1, 0],
[2, 0, 0, 0, 2],
[2, 1, 0, 1, 2],
[2, 0, 1, 0, 2]])
Let’s take a look at pairs[(0, 1, 2)]. As you can see, you only need to look at the upper half of the array for unique pools information. The first 4 pools are:
- 0, 1, 0
- 2, 0, 0
- 2, 1, 0
- 2, 0, 1
pools from both exchanges 0 and 1. The latter two columns represent token_in, token_out. The additional columns are added to simplify the process of path building in the next step.
This function call will be run in parallel, because if you run it using one process, it will take an hour to finish. But using 6 processes will let you finish in 10 minutes. Also, the complexity of this process is directly related to how many combinations there are. So when you are doing 4-way, 5-way, n-way path findings, the time you have to input will grow exponentially.
The current method will work fine for triangular examples but will need improvements for other more complicated tasks.
2. make_paths_array
Now comes the most important part. Whereas the above function took a really long time to run, this process luckily takes a split second to finish.
Upon running:
paths = make_paths_array(pairs)
You will get an output like this:
>> paths
array([[[395, 396, 1, 395, 396],
[397, 396, 1, 396, 397],
[395, 397, 1, 397, 395]],
[[395, 397, 1, 395, 397],
[397, 396, 1, 397, 396],
[395, 396, 1, 396, 395]],
... ]])
This is the 3-way path you are searching for. A close up would tell us more:
>> paths[0]
array([[395, 396, 1, 395, 396],
[397, 396, 1, 396, 397],
[395, 397, 1, 397, 395]])
Let’s look at the last 2 columns - which are token_in, token_out.
This swap path is as follows:
swap 395 to 396
swap 396 to 397
swap 397 to 395
You can also get the pool address by:
>> p.POOLS['395_396_1']
'0x2bab2b10e423491c12c449d99ad8aaed5c46fe4a'
>> p.POOLS['397_396_1']
'0x35456fd93f9e95fb059da27a18d6cc1266a1b3fb'
>> p.POOLS['395_397_1']
'0x3ee2a6db97f493f0221bb285f15e3021cdcaeccd'
What to do next?
Finding a 3-way path was straightforward. But the more difficult part is in:
- simulating the swap paths,
- extending the framework to work on n-way paths
In the next post, I would like to create a simple swap path simulation engine that accounts for market price impact.
Previous
I decided to build my own MEV bot. Here's how I'm going to do it.