Building a DeFi arbitrage path finder (1)

How to find triangular arbitrage paths on Uniswap forked DEXs

MEV arbitrage

Created on June 16, 2023.

Starting out

Flashbots

I’ve been working on my DEX arbitrage bot for about a week now. As a result, I did manage to pull off my first few trades with a 5% success ratio - a very disappointing rate for most traders. But I only needed to check if this strategy had any potential. And if it did show any promise, I was going to give it the time it deserved and pursue it further.

The first bot I built was a proof of concept bot that simply used quotes to recognize price spreads across multiple pools on Arbitrum Uniswap V3, and placed multihop swap orders accordingly.

The execution layer wasn’t as difficult as it seemed at first, because there were numerous examples out there showing us exactly how we can send out order transactions onto the blockchain.

The below Github repos are among such examples:

The harder part was in understanding the atomicity of blockchain transactions, and why to ensure this I needed to use smart contracts to perform arbitrage strategies.

When I first grasped the concept of flashloans, it was eye opening, because there was no equivalent in legacy finance. But as revolutionizing as the tech is, I needed to learn to deploy smart contracts to ensure transaction atomicity. So I put this on hold for a while and moved on to another important aspect of arbitrage strategies - which I would like to talk about in this blog post today.

Arbitrage opportunity searching

The more difficult part of DEX arbitrage, for me, was in opportunity searching.

This, in my opinion, is the more important aspect of building profitable bots, but isn’t touched on that much. Building an execution engine will take a couple of days tops to study and also to build - but if “searching” isn’t done right, finding profitable paths will take an infinite amount of time.

When you are attempting arbitrage trades on one DEX with a couple of liquidity abundant tokens like WETH, USDC, USDT, the 3-way pairs you can create are quite limited. So you can simply hardcode the 3-way paths and retrieve price quotes to update the profitability of each path.

The difficult part comes when you are attempting a 4-way, 5-way, n-way path swaps in multiple different DEXs. On top of this, try adding lending, derivatives, or options trading in DeFi to the equation, your path finding will become impossible to do by hand - or even with a program.

So, before I began developing my execution engine any further, I decided to build a simple system that finds n-way arbitrage paths using multiple DEXs. This was to:

figure out if simple DEX arbitrage strategies were still profitable,
and if they weren’t anymore, under what conditions price spreads widened,
and to build a prototype simulation engine written in vectors to speed up my arbitrage path finding.

This project is still a work in progress, but I did release my scripts on Github to keep track of my work. Here’s the link to the repo:

👉 DeFi Path Finder Repo

How to use Path Finder

The project structure is pretty simple as it retrieves data from Subgraphs and bulk loads information about Uniswap fork pools.

First, clone the Github repo:

git clone https://github.com/solidquant/defi_path_finder/tree/main

Then, open up examples.py, where you will find the example code of using defi_path_finder.

from defi_path_finder import Preprocessor
from defi_path_finder import make_triangular_paths

if __name__ == '__main__':
    p = Preprocessor()
    pools, reserves = p.load_data(500)
    pairs, paths = make_triangular_paths(pools, [], 6)

The above code is everything that this project does.

You create a Preprocessor instance that collects subgraph data using the Collector instance. (preprocessor.py)

The important function here is the load_data(self, pool_cnt: int = 500) method call:

def load_data(self, pool_cnt: int = 500):
    # 1. Collect raw indexed data
    sushi = self.polygon_sushiswap_v2_pools(pool_cnt)
    mesh = self.polygon_meshswap_v2_pools()

    # 2. Create a numpy array by mapping token, exchange data to integers
    _arr = []

    for pools_df in [sushi, mesh]:
        pools = np.zeros((len(pools_df), 3), dtype=int)
        pools[:] = pools_df[['token0', 'token1', 'exchange']].values
        _arr.append(pools)

    pools_array = np.concatenate(_arr, axis=0)

    # 3. Create a numpy array containing reserves data for all existing pools
    reserves_array = np.zeros((
        len(self.TOKENS),
        len(self.TOKENS),
        len(self.EXCHANGES),
        2
    ), dtype=float)

    for pools_df in [sushi, mesh]:
        for _, row in pools_df.iterrows():
            t0 = int(row['token0'])
            t1 = int(row['token1'])
            e = int(row['exchange'])
            r0 = row['reserve0']
            r1 = row['reserve1']

            reserves_array[t0, t1, e, :] = [r0, r1]

    return pools_array, reserves_array

As seen from the code, it retrieves Sushiswap V2 pools data and Meshswap pools data.

It then creates a pools_array numpy array that looks like this:

array([[  0,   1,   0],
       [  2,   3,   0],
       [  4,   1,   0],
       ...,
       [  2, 449,   1],
       [ 10, 450,   1],
       [413, 451,   1]])

This is a mappings of (token0, token1, exchange) from both the exchanges.

The first column represents: token0. The other two are: token1, exchange.

So basically, the first row of pools_array will be: pool where token0 is 0, token1 is 1, and the exchange is 0. This maps to: USDC-KLIMA pool in Sushiswap V2 at 0x5786b267d35f9d011c4750e0b0ba584e1fdbead1

Also, check a debug breakpoint somewhere within that function and try out the below commands:

>> self.TOKENS
{'0x2791bca1f2de4661ed88a30c99a7a9449aa84174': {'id': 0, 'symbol': 'USDC', 'decimals': '6'},
'0x4e78011ce80ee02d2c3e649fb657e45898257815': {'id': 1, 'symbol': 'KLIMA', 'decimals': '9'},
... }

>> self.EXCHANGES
{'sushiswap_v2': 0, 'meshswap': 1}

>> self.POOLS['0_1_0']
'0x5786b267d35f9d011c4750e0b0ba584e1fdbead1'

You can go check this out at: https://polygonscan.com/address/0x5786b267d35f9d011c4750e0b0ba584e1fdbead1#code

Now that I have a mapper of all the pools in both the exchanges, what you want to do is call make_triangular_paths() function in utils.py. (utils.py)

This function will run in 2 steps.

make_3_pairs,
make_paths_array.

We will look at what each of these functions do.

1. make_3_pairs

This function will create a combination of trangular pairs for all the tokens that you are tracking. This means that, for example, if you had tokens: WETH, WMATIC, WBTC, USDC, USDT, you should be able to make pairs that look like:

WETH, WMATIC, WBTC
WETH, WMATIC, USDC
WETH, WMATIC, USDT
WMATIC, WBTC, USDC
WMATIC, WBTC, USDT
…

so on and so forth.

This can be easily done with Python as below:

  tokens = np.unique(data[:, :2]) # "data" is the "pools_array" from above
  combinations_list = list(combinations(tokens, 3))

Try debuggin these values:

>> tokens
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        ... ])

>> combinations_list
[(0, 1, 2), (0, 1, 3), ... ]

combinations_list has all the possible 3-token combinations we can use to perform triangular arbitrage.

The next step is to run _make_3_pairs_process function to get pairs data that looks like the below:

>> pairs.keys()
dict_keys([(395, 396, 397), (395, 396, 398), ... ])

# the array here has information on: token0, token1, exchange, token_in, token_out
>> pairs[(395, 396, 397)]
array([[395, 396,   1, 395, 396],
       [397, 396,   1, 397, 396],
       [395, 397,   1, 395, 397],
       [395, 396,   1, 396, 395],
       [397, 396,   1, 396, 397],
       [395, 397,   1, 397, 395]])

>> pairs[(0, 1, 2)]
array([[0, 1, 0, 0, 1],
       [2, 0, 0, 2, 0],
       [2, 1, 0, 2, 1],
       [2, 0, 1, 2, 0],
       [0, 1, 0, 1, 0],
       [2, 0, 0, 0, 2],
       [2, 1, 0, 1, 2],
       [2, 0, 1, 0, 2]])

Let’s take a look at pairs[(0, 1, 2)]. As you can see, you only need to look at the upper half of the array for unique pools information. The first 4 pools are:

0, 1, 0
2, 0, 0
2, 1, 0
2, 0, 1

pools from both exchanges 0 and 1. The latter two columns represent token_in, token_out. The additional columns are added to simplify the process of path building in the next step.

This function call will be run in parallel, because if you run it using one process, it will take an hour to finish. But using 6 processes will let you finish in 10 minutes. Also, the complexity of this process is directly related to how many combinations there are. So when you are doing 4-way, 5-way, n-way path findings, the time you have to input will grow exponentially.

The current method will work fine for triangular examples but will need improvements for other more complicated tasks.

2. make_paths_array

Now comes the most important part. Whereas the above function took a really long time to run, this process luckily takes a split second to finish.

Upon running:

paths = make_paths_array(pairs)

You will get an output like this:

>> paths
array([[[395, 396,   1, 395, 396],
        [397, 396,   1, 396, 397],
        [395, 397,   1, 397, 395]],

       [[395, 397,   1, 395, 397],
        [397, 396,   1, 397, 396],
        [395, 396,   1, 396, 395]],

        ... ]])

This is the 3-way path you are searching for. A close up would tell us more:

>> paths[0]
array([[395, 396,   1, 395, 396],
       [397, 396,   1, 396, 397],
       [395, 397,   1, 397, 395]])

Let’s look at the last 2 columns - which are token_in, token_out.

This swap path is as follows:

swap 395 to 396
       swap 396 to 397
              swap 397 to 395

You can also get the pool address by:

>> p.POOLS['395_396_1']
'0x2bab2b10e423491c12c449d99ad8aaed5c46fe4a'

>> p.POOLS['397_396_1']
'0x35456fd93f9e95fb059da27a18d6cc1266a1b3fb'

>> p.POOLS['395_397_1']
'0x3ee2a6db97f493f0221bb285f15e3021cdcaeccd'

What to do next?

Finding a 3-way path was straightforward. But the more difficult part is in:

simulating the swap paths,
extending the framework to work on n-way paths

In the next post, I would like to create a simple swap path simulation engine that accounts for market price impact.

I decided to build my own MEV bot. Here's how I'm going to do it.