I was having almost exactly the same problem.
I just managed to port some Atmel 32UC3 code that uses UDP. This code floods the local net with UDP packets. A few seconds generates hundreds of 1K packets (a minute is about 13MB in the Wireshark pcap dump).
There is not a lot online about lwIP or the low-level HAL_ETH. Most of the stuff is PDFs that simply harvest the Doxygen comments and list the calling functions. Most of the other sites are me-too posts, broken links, obsolete examples, or else phishing nets that have harvested old messages and perform necropsy on them. These show up as new (in the last month or 24 hours), then hit you with a browser redirect. I have learned not to click on anything foreign.
That said, the best advice was to tear apart lwIP and use only the parts that one needs. There are two ways of handling packet reception: polling and interrupt. I chose the latter as it was an option in the CubeMX setup. One still has to poll the DMA system to see whether the next packet is available.
This is where Wireshark comes in. Capture what is being sent, and see what is being missed, or where the endian order is swapped.
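As an illustration of the endian point, here is a minimal sketch of reading the UDP header fields byte by byte, so the result does not depend on the byte order of the CPU. The names UdpHeader, readBe16 and parseUdpHeader are made up for this example and are not from lwIP or the HAL; the field layout is the standard 8-byte UDP header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read a 16-bit big-endian (network order) value.
// Assembling it from individual bytes works on any host byte order.
inline uint16_t readBe16(const uint8_t* p) {
    return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// The four 16-bit fields of a UDP header, stored here in host order.
struct UdpHeader {
    uint16_t srcPort;
    uint16_t dstPort;
    uint16_t length;    // header plus payload, in bytes
    uint16_t checksum;
};

// Parse the first 8 bytes of a UDP datagram as captured on the wire.
UdpHeader parseUdpHeader(const uint8_t* bytes) {
    UdpHeader h;
    h.srcPort  = readBe16(bytes);
    h.dstPort  = readBe16(bytes + 2);
    h.length   = readBe16(bytes + 4);
    h.checksum = readBe16(bytes + 6);
    return h;
}
```

If the bytes 0x13 0x88 in the capture come out as port 34835 instead of 5000 in the firmware, the field was read with the byte order swapped.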
What the mid-level driver does is allocate a chain of linked buffers. When HAL_ETH receives a packet, it is in DMA memory. When I put a breakpoint at the ISR (it took me several days of searching to find where the ISR was instantiated deep in the driver code), I found that the buffers pointed to many of the packets even though I only wanted one. Only the first DMA buffer was being served.
Deconstructing lwIP showed how the packet is copied into the buffer chain, after which the DMA buffer needs to be released. Then, on the next call, the next buffer is issued.
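To make the copy-then-release idea concrete, here is a small host-side simulation. It is only a sketch: RxDesc, RxRing, dmaReceive, cpuRead and runTraffic are invented for this example, and the real STM32 descriptors and HAL_ETH calls look different. The only thing modeled is ownership: a buffer that is never handed back to the DMA can never be refilled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one RX DMA descriptor and its buffer.
struct RxDesc {
    bool     ownedByDma;  // true: DMA may fill it; false: CPU holds a frame
    uint16_t length;      // valid bytes in data[]
    uint8_t  data[64];
};

constexpr std::size_t kRingSize = 4;
constexpr uint16_t    kFrameLen = 32;

struct RxRing {
    RxDesc      desc[kRingSize];
    std::size_t dmaIndex = 0;  // next descriptor the simulated DMA fills
    std::size_t cpuIndex = 0;  // next descriptor the CPU reads
    unsigned    dropped  = 0;  // frames lost because no descriptor was free
    RxRing() { for (auto& d : desc) { d.ownedByDma = true; d.length = 0; } }
};

// DMA side: store a frame only if the next descriptor has been released.
bool dmaReceive(RxRing& ring, const uint8_t* frame, uint16_t len) {
    RxDesc& d = ring.desc[ring.dmaIndex];
    if (!d.ownedByDma || len > sizeof d.data) { ++ring.dropped; return false; }
    std::memcpy(d.data, frame, len);
    d.length = len;
    d.ownedByDma = false;  // hand the frame over to the CPU
    ring.dmaIndex = (ring.dmaIndex + 1) % kRingSize;
    return true;
}

// CPU side: copy one frame out, then (optionally) give the buffer back.
std::size_t cpuRead(RxRing& ring, uint8_t* out, std::size_t cap, bool release) {
    RxDesc& d = ring.desc[ring.cpuIndex];
    if (d.ownedByDma) return 0;  // nothing waiting
    std::size_t n = d.length < cap ? d.length : cap;
    std::memcpy(out, d.data, n);
    if (release) d.ownedByDma = true;  // the step that is easy to forget
    ring.cpuIndex = (ring.cpuIndex + 1) % kRingSize;
    return n;
}

// Push `count` frames through the ring and return how many were dropped.
unsigned runTraffic(bool release, unsigned count) {
    RxRing  ring;
    uint8_t frame[kFrameLen] = {0};
    uint8_t out[64];
    for (unsigned i = 0; i < count; ++i) {
        frame[0] = static_cast<uint8_t>(i);
        if (dmaReceive(ring, frame, kFrameLen)) {
            cpuRead(ring, out, sizeof out, release);
        }
    }
    return ring.dropped;
}
```

With release set to true, runTraffic never drops a frame. With release set to false, the ring accepts one frame per descriptor and drops everything after that, which is roughly the symptom of a receiver that goes quiet after the first few packets.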
Oh the wonders of object oriented programming and data hiding

I have not been much of a fan of C++ on embedded for this reason. But if one wants to use Arduino-type C++ code, then one has to pay the devil.
The nice thing is that I wanted to change the display from a 4-line x 20 SPI character display to a TFT ILI9341. I imported the Print class into my C++ project, then attached it to the Adafruit Arduino library. This gave me access to the Print functions. The original code was calling C ASF functions to control the SPI display. I wanted to use the display controller on the STM32F429 Discovery board, but there is a hardware bug on the chip that makes it impossible to use the Ethernet MAC and the TFT display at the same time.
I wound up using the 407 Discovery and the FSMC LCD controller with a DP83848 PHY. I have some ENC28J60s, which I think are SPI based. I have not tried those yet. They should work on a system without a MAC, like the blue or whatevercoloryouwantpill. I have not seen whether there is an lwIP implementation for them yet. Last year I got the low level to work from the YouTube tutorial.
By using object-oriented modules, I can now switch between the different hardware configurations as needed.
As for the problem at hand:
There were provisions in the code to count packets and errors, then report them to the display. All I had to do was replace calls like DisplayLine() with a class called Display and a call like display->Line(). The debug print code was already written; it built the message with vsprintf. All I had to do was enable it. Saved a week of frustration that way. That is the whole reason for object-oriented programming: high learning curve, then easy re-use.
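A rough sketch of what such a Display class could look like follows. Only the names Display and Line() come from the code described above; the (row, text) signature, the Printf helper, BufferDisplay and reportStats are assumptions made up for illustration. It uses vsnprintf rather than vsprintf so the message cannot overrun the buffer.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Assumed interface: each hardware back end implements Line().
class Display {
public:
    virtual ~Display() = default;
    virtual void Line(int row, const char* text) = 0;

    // printf-style helper: build the message in a bounded buffer,
    // then pass it to whichever back end is attached.
    void Printf(int row, const char* fmt, ...) {
        char buf[64];
        va_list args;
        va_start(args, fmt);
        std::vsnprintf(buf, sizeof buf, fmt, args);
        va_end(args);
        Line(row, buf);
    }
};

// Stand-in back end that keeps the rows in memory. A character LCD or
// TFT back end would talk to its own driver here instead.
class BufferDisplay : public Display {
public:
    explicit BufferDisplay(std::size_t rows) : rows_(rows) {}
    void Line(int row, const char* text) override {
        if (row >= 0 && static_cast<std::size_t>(row) < rows_.size()) {
            rows_[static_cast<std::size_t>(row)] = text;
        }
    }
    const std::string& Row(std::size_t row) const { return rows_.at(row); }
private:
    std::vector<std::string> rows_;
};

// Report the packet and error counters without knowing the hardware.
void reportStats(Display& display, unsigned packets, unsigned errors) {
    display.Printf(0, "RX packets: %u", packets);
    display.Printf(1, "RX errors:  %u", errors);
}
```

Swapping the character display for the TFT then means writing one more class derived from Display; reportStats and the rest of the debug code stay untouched.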
I found that my direct calls to HAL_ETH were not releasing the buffer once it was copied. I also found that lwIP needs a call to free the pbuf once one is done with it.
Most of the lwIP examples are for TCP rather than UDP, where web pages are being served and nothing is real time. I often wonder why this code is called lightweight. Personally, I think it should be called mwip, as it is more middleweight than lightweight. I shudder to think what a heavyweight TCP stack would look like.