Retries on gateway timeout
Some requests may take a long time (e.g. a CSV export), but the WSGI server or the reverse proxy often has a timeout to avoid keeping stalled requests around. When the timeout is raised by a proxy, this results in a 504 Gateway Timeout. When it is raised by the WSGI server (like gunicorn), it is usually a 502 Bad Gateway because the worker is killed. But with a 504, the worker often keeps working on the request and only fails when it tries to write the response.
The idea would be to be able to resume such timed-out requests.
For that, the client could include a unique key for the request in a request header (I do not think we should rely on the JSON-RPC id, but on a UUID). When the client receives a 504, it waits for the delay given by Retry-After and sends a resume request with the UUID. The client keeps retrying until it receives an answer; after n retries, the client should ask the user whether it should keep retrying.
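A minimal sketch of the client loop described above. It assumes a hypothetical transport `send(request_id, resume)` that puts the UUID in a request header (the header name is left to the transport) and either sends the original request (`resume=False`) or calls the resume entrypoint (`resume=True`). It also assumes the resume entrypoint answers 504 while the response is not yet available. `call_with_resume`, `confirm_retry` and `max_retries` are illustrative names, not an existing API.

```python
import time
import uuid
from typing import Callable, Mapping, Optional, Tuple

# (status, headers, body) as returned by the hypothetical transport
Response = Tuple[int, Mapping[str, str], Optional[bytes]]


def retry_delay(headers: Mapping[str, str], default: float) -> float:
    """Read Retry-After as a number of seconds, falling back to `default`."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return float(value)
    except ValueError:
        # The HTTP-date form of Retry-After is not handled in this sketch
        return default


def call_with_resume(
    send: Callable[[str, bool], Response],
    confirm_retry: Callable[[int], bool],
    max_retries: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Tuple[int, Optional[bytes]]]:
    """Return (status, body), or None if the user gives up resuming."""
    # One UUID per logical request, reused by every resume call
    request_id = str(uuid.uuid4())
    status, headers, body = send(request_id, False)
    resumes = 0
    while status == 504:
        # After every max_retries failed resumes, ask the user whether to continue
        if resumes and resumes % max_retries == 0 and not confirm_retry(resumes):
            return None
        sleep(retry_delay(headers, default_wait))
        status, headers, body = send(request_id, True)
        resumes += 1
    return status, body
```

Asking the user periodically, instead of capping the retries, keeps long exports working. It also gives the user a way out if the response is never going to arrive.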
On the server side, a configuration option stores the proxy timeout. When the response takes longer than this timeout, it is stored in a temporary table under the UUID from the header. This table is used by the resume entrypoint. Once the response has been consumed, it is deleted from the table.
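A rough sketch of the server side, using an in-memory SQLite table to stand in for the temporary table. The table `pending_response`, the `PROXY_TIMEOUT` value and the functions `run_request` and `resume` are hypothetical. Storing the response based only on elapsed time follows the description above; a real implementation would hook into the request dispatching.

```python
import sqlite3
import time
from typing import Callable, Optional

PROXY_TIMEOUT = 60.0  # hypothetical configuration value, in seconds


def create_store(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_response ("
        " request_id TEXT PRIMARY KEY,"
        " body BLOB NOT NULL,"
        " created REAL NOT NULL)"
    )


def run_request(conn, request_id: Optional[str], handler: Callable[[], bytes],
                proxy_timeout: float = PROXY_TIMEOUT,
                clock: Callable[[], float] = time.monotonic) -> bytes:
    start = clock()
    body = handler()
    if request_id is not None and clock() - start > proxy_timeout:
        # The proxy has probably already answered 504: keep the response for resume
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO pending_response VALUES (?, ?, ?)",
                (request_id, body, time.time()),
            )
    return body


def resume(conn, request_id: str) -> Optional[bytes]:
    """Return and delete the stored response, or None if it is not available."""
    with conn:
        row = conn.execute(
            "SELECT body FROM pending_response WHERE request_id = ?", (request_id,)
        ).fetchone()
        if row is None:
            # Not finished yet (or lost): the entrypoint would answer 504 + Retry-After
            return None
        conn.execute("DELETE FROM pending_response WHERE request_id = ?", (request_id,))
    return row[0]
```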
A cron task will clean up older responses that have not been read after a defined period.
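The cleanup task could be as simple as deleting the rows whose creation time is older than the retention period. This sketch assumes the same hypothetical `pending_response` table (schema repeated here), with `created` stored as a Unix timestamp. `clean_pending_responses` and `max_age` are illustrative names.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema of the temporary table holding unread responses
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS pending_response ("
    " request_id TEXT PRIMARY KEY,"
    " body BLOB NOT NULL,"
    " created REAL NOT NULL)"
)


def clean_pending_responses(conn: sqlite3.Connection, max_age: float,
                            now: Optional[float] = None) -> int:
    """Delete responses stored more than max_age seconds ago; return the count."""
    if now is None:
        now = time.time()
    with conn:
        cursor = conn.execute(
            "DELETE FROM pending_response WHERE created < ?", (now - max_age,)
        )
    return cursor.rowcount
```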
One drawback arises when the WSGI worker is killed while processing the request after the proxy has already answered with a 504. In this case, the client may try to resume the request indefinitely. This is why I think that after n retries, the client must ask the user whether they want to keep resuming.