Tiling GPUs get a significant performance boost by executing many draws against "tiles" of the render target held in fast internal memory. Yet many GL usage patterns seen in games would force a naive GL driver to split up batches (tile passes), moving render target data out to system memory and later back into the tile buffer. These include mid-batch updates to UBOs, textures, and other resources, as well as unnecessary FBO switches.
I have been working on batch/resource dependency tracking, plus resource shadowing, in freedreno to avoid splitting batches and reduce the number of tile passes, which so far has led to an fps boost of 10-20% or more in many games. But it is also important that the additional bookkeeping is lightweight, so that it does not itself introduce significant overhead.
I will cover a few different approaches that I tried, go into more detail about the approach I eventually settled on, and present benchmark results.