Concurrency issues with sale process() (and purchase)
On a production server we've found a sale where:
- There are two shipments: one in "Done" state and the other in "Waiting"
- There are two invoices with exactly the same quantities
Checking the create/write dates of the two invoices, we see that the second invoice was created just one second after the first one. We also see that neither of them was modified by users. Everything suggests that the sale's process() call was responsible for the duplication.
Thinking about it, I believe this can actually happen because the right locks are not in place.
Consider two concurrent processes, (a) and (b), both of which execute the aforementioned process() method.
This method locks the whole sale table, but the lock is not taken at the very beginning of the transaction; it is taken a little later.
What can happen is the following:
- (a) BEGIN;
- (b) BEGIN;
- (a) Succeeds in locking the table.
- (a) Computes process and creates new invoices.
- (a) COMMIT;
- (b) Succeeds in locking the table, because (a) has already committed.
- (b) Computes process and creates new invoices again, because its transaction started BEFORE (a) had finished, so it does not see the invoice lines created by (a).
- (b) COMMIT;
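To make the interleaving concrete, here is a minimal, self-contained simulation. It is a toy model, not trytond or PostgreSQL code: `ToyDatabase`, `ToyTransaction` and its `process()` are hypothetical names. The snapshot is modelled as being taken at BEGIN (in PostgreSQL a REPEATABLE READ snapshot is frozen at the transaction's first query, which in a request happens before the lock), and the table lock is reduced to a simple holder field.

```python
import copy


class ToyDatabase:
    """Hypothetical toy MVCC store (not trytond or PostgreSQL code)."""

    def __init__(self):
        self.committed = {"invoice_lines": []}  # rows visible after COMMIT
        self.sale_lock_holder = None  # transaction currently holding the table lock


class ToyTransaction:
    def __init__(self, db, name):
        self.db = db
        self.name = name
        # Assumption: the snapshot is frozen at BEGIN, so later commits are invisible.
        self.snapshot = copy.deepcopy(db.committed)
        self.new_lines = []

    def lock_sale_table(self):
        # The lock can only be taken when no other transaction holds it.
        if self.db.sale_lock_holder not in (None, self.name):
            raise RuntimeError(f"({self.name}) cannot lock the sale table")
        self.db.sale_lock_holder = self.name

    def process(self, sale_qty):
        # Hypothetical stand-in for Sale.process(): invoice whatever quantity is
        # not yet invoiced *according to this transaction's snapshot*.
        self.lock_sale_table()
        invoiced = sum(self.snapshot["invoice_lines"])
        if sale_qty > invoiced:
            self.new_lines.append(sale_qty - invoiced)

    def commit(self):
        # Publish the new rows and release the table lock.
        self.db.committed["invoice_lines"].extend(self.new_lines)
        if self.db.sale_lock_holder == self.name:
            self.db.sale_lock_holder = None


db = ToyDatabase()
a = ToyTransaction(db, "a")  # (a) BEGIN
b = ToyTransaction(db, "b")  # (b) BEGIN: snapshot taken before (a) writes anything
a.process(sale_qty=10)       # (a) locks the table and creates invoice lines
a.commit()                   # (a) COMMIT releases the lock
b.process(sale_qty=10)       # (b) gets the lock, but its snapshot shows no invoice lines
b.commit()                   # (b) COMMIT
print(db.committed["invoice_lines"])  # [10, 10]: the sale was invoiced twice
```

The lock itself works correctly here; the duplication comes purely from (b) reading a snapshot that predates (a)'s commit.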
It seems the problem would still exist even if Tryton used the SERIALIZABLE isolation level instead of REPEATABLE READ.
I've been thinking about it, and the only solution I have found so far is a mechanism by which trytond would open a transaction (1), force the lock (probably using lock_id), and keep that transaction open until the end of the RPC request. Then it would start a second transaction (2) "in parallel" to the first one. This second transaction is the standard one used to process the whole request. Once the request finishes, transaction (2) is committed first, and then transaction (1) is committed.
Given that the lock would be acquired before the transaction that reads the data is started, we can be sure that the second call to process() will either fail (because (1) cannot lock) or read the right data (because (2) is started after the lock has been acquired).
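The proposed ordering can be sketched with a similar toy model. This is an illustration under stated assumptions, not trytond code: `Request.lock()` plays the role of transaction (1) taking the lock (for example via lock_id), `begin_work()` starts transaction (2) and freezes its snapshot only after the lock is held, and `finish()` commits (2) before releasing the lock with (1). Failing immediately when the lock is taken (rather than waiting), and issuing (b) again once (a) has finished, are assumptions chosen to show both outcomes; all names are hypothetical.

```python
import copy


class ToyDatabase:
    """Hypothetical toy MVCC store (not trytond or PostgreSQL code)."""

    def __init__(self):
        self.committed = {"invoice_lines": []}
        self.lock_holder = None  # request whose transaction (1) holds the lock


class Request:
    """One RPC request using the two-transaction scheme."""

    def __init__(self, db, name):
        self.db = db
        self.name = name
        self.snapshot = None
        self.new_lines = []

    def lock(self):
        # Transaction (1): take the lock before anything is read.
        if self.db.lock_holder is not None:
            raise RuntimeError(f"({self.name}) cannot lock")
        self.db.lock_holder = self.name

    def begin_work(self):
        # Transaction (2): its snapshot is taken only after the lock is held.
        self.snapshot = copy.deepcopy(self.db.committed)

    def process(self, sale_qty):
        # Hypothetical stand-in for Sale.process(): invoice the remaining quantity.
        invoiced = sum(self.snapshot["invoice_lines"])
        if sale_qty > invoiced:
            self.new_lines.append(sale_qty - invoiced)

    def finish(self):
        # Commit (2) first, then commit (1), which releases the lock.
        self.db.committed["invoice_lines"].extend(self.new_lines)
        self.db.lock_holder = None


db = ToyDatabase()
a, b = Request(db, "a"), Request(db, "b")
a.lock()
a.begin_work()
try:
    b.lock()                # (a) still holds the lock, so (b) fails
except RuntimeError as exc:
    print(exc)              # (b) cannot lock
a.process(sale_qty=10)
a.finish()
b.lock()                    # (b) issued again after (a) finished
b.begin_work()              # snapshot now includes (a)'s invoice lines
b.process(sale_qty=10)      # nothing left to invoice
b.finish()
print(db.committed["invoice_lines"])  # [10]
```

The essential point is the ordering: because the snapshot of (2) is taken after (1) holds the lock, (b) can no longer act on data that predates (a)'s commit.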