Is there a way to limit the number of concurrent requests from one IP with Gunicorn?

2020-02-03 python nginx flask gunicorn

Basically I'm running a Flask web server that crunches a bunch of data and sends it back to the user. We aren't expecting many users ~60, but I've noticed what could be an issue with concurrency. Right now, if I open a tab and send a request to have some data crunched, it takes about 30s, for our application that's ok.

If I open another tab and send the same request at the same time, unicorn will do it concurrently, this is great if we have two seperate users making two seperate requests. But what happens if I have one user open 4 or 8 tabs and send the same request? It backs up the server for everyone else, is there a way I can tell Gunicorn to only accept 1 request at a time from the same IP?


This is probably NOT best handled at the flask level. But if you had to do it there, then it turns out someone else already designed a flask plugin to do just this:

If a request takes at least 30s then make your limit by address for one request every 30s. This will solve the issue of impatient users obsessively clicking instead of waiting for a very long process to finish.

This isn't exactly what you requested, since it means that longer/shorter requests may overlap and allow multiple requests at the same time, which doesn't fully exclude the behavior you describe of multiple tabs, etc. That said, if you are able to tell your users to wait 30 seconds for anything, it sounds like you are in the drivers seat for setting UX expectations. Probably a good wait/progress message will help too if you can build an asynchronous server interaction.

A better solution to the answer by @jon would be limiting the access by your web server instead of the application server. A good way would always be to have separation between the responsibilities to be carried out by the different layers of your application. Ideally, the application server, flask should not have any configuration for the limiting or anything to do with from where the requests are coming. The responsibility of the web server, in this case nginx is to route the request based on certain parameters to the right client. The limiting should be done at this layer.

Now, coming to the limiting, you could do it by using the limit_req_zone directive in the http block config of nginx

http {
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;


server {


    location / {
        limit_req zone=one burst=5;
        proxy_pass ...

where, binary_remote_addris the IP of the client and not more than 1 request per second at an average is allowed, with bursts not exceeding 5 requests.

Pro-tip: Since the subsequent requests from the same IP would be held in a queue, there is a good chance of nginx timing out. Hence, it would be advisable to have a better proxy_read_timeout and if the reports take longer then also adjusting the timeout of gunicorn

Documentation of limit_req_zone

A blog post by nginx on rate limiting can be found here