The Tryton server is vulnerable to a malicious DoS / DDoS that exploits
the get_login method.
Exploit:
The provided script can cause a DoS / DDoS.
Discussion:
The current Tryton server stores unsuccessful login attempts in
a database table.
Currently, the function applies a timeout to failed login attempts,
but only for logins already present in the failed-logins table.
If the attacker uses random, non-existent account ids, the
failed-login timeout never applies to them.
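To make the gap concrete, here is a minimal Python sketch of a per-login back-off. It assumes the delay is 2**n - 1 seconds, where n is the number of recorded failures for that exact login; the class and method names are hypothetical and this is not trytond's code. A login the table has never seen gets no delay at all.

```python
# Hypothetical sketch of a per-login exponential back-off (not trytond code).
# Assumption: delay = 2**n - 1 seconds, n = recorded failures for that login.
from collections import Counter


class FailedLogins:
    """In-memory stand-in for the failed-logins table."""

    def __init__(self) -> None:
        self._failures = Counter()  # login -> number of recorded failures

    def delay_for(self, login: str) -> int:
        # A Counter returns 0 for unseen keys, so an unknown login waits 2**0 - 1 = 0 s.
        return 2 ** self._failures[login] - 1

    def record_failure(self, login: str) -> None:
        self._failures[login] += 1


attempts = FailedLogins()
for _ in range(3):
    attempts.record_failure("admin")

print(attempts.delay_for("admin"))         # 7: the repeated target is slowed down
print(attempts.delay_for("random-a8f3k"))  # 0: a fresh random login is not
```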
Solution:
Apply the attached patch, which no longer stores failed logins in a
database table (res_user_login_attempt).
It also applies a fixed timeout of 3 seconds to any invalid
login attempt, and removes the existing exponential (2^n) timeout.
There will be a configuration parameter (failed_login_timeout)
that defines the timeout in seconds. The default will be 3
seconds.
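A minimal sketch of the proposed behaviour, assuming the option lives in a [session] section read with Python's configparser. The section name, check_login and the plain-text user dict are hypothetical; only failed_login_timeout and the 3-second default come from the report. Sleeping is passed in as a parameter so the delay can be observed without actually waiting.

```python
# Hypothetical sketch of the proposed fixed delay (not the attached patch).
# Assumptions: option read from a [session] section; plain-text passwords
# only to keep the example short (real code must compare hashes).
import configparser
import time
from typing import Callable, Dict

DEFAULT_FAILED_LOGIN_TIMEOUT = 3.0  # seconds, the default proposed above


def failed_login_timeout(config: configparser.ConfigParser) -> float:
    # fallback= is returned when the section or the option is missing.
    return config.getfloat("session", "failed_login_timeout",
                           fallback=DEFAULT_FAILED_LOGIN_TIMEOUT)


def check_login(login: str, password: str, users: Dict[str, str],
                config: configparser.ConfigParser,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    if users.get(login) == password:
        return True  # success: answered without any delay
    # Same fixed delay for every invalid attempt, known login or not.
    sleep(failed_login_timeout(config))
    return False


config = configparser.ConfigParser()
config.read_string("[session]\nfailed_login_timeout = 5\n")
print(failed_login_timeout(config))  # 5.0
```

Note that in this sketch a successful login returns without any delay; that asymmetry is what the later replies in this thread object to.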
As we discussed yesterday, I don't think there is anything Tryton can do to prevent a (D)DoS. This should be handled by the sysadmin using iptables / ip-whatever.
Nor do I think there is a real exploit, especially because there is no amplification factor for the stored data (we store only the login used).
Also, I don't think this is a good reason to drop the good brute-force protection in place, especially because a successful login would no longer wait if there were previous failed attempts for the same login.
That said, it is true that the LoginAttempt table is not cleared for invalid logins, and there is no real benefit in keeping those records. So I think a good new feature would be a cron task that cleans LoginAttempt records older than session.timeout.
So unless anybody disagrees, I propose to turn this issue into a feature request.
I see no danger in publishing the script, because anyone running a publicly accessible trytond should monitor its traffic and logs anyway, to detect such bad behaviour and take action long before it could do harm by filling the disk.
Also, anybody can put a query such as this one in their cron:
DELETE FROM res_user_login_attempt WHERE (NOW() - create_date) > '1 days'::INTERVAL
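The same cleanup can be sketched as a small scheduled function. This sketch assumes an in-memory SQLite database standing in for PostgreSQL, create_date stored as an ISO-8601 UTC string, and a hypothetical purge_login_attempts helper; it computes the cutoff once instead of evaluating the age of every row.

```python
# Hypothetical cleanup task for res_user_login_attempt (SQLite stand-in).
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_login_attempts(conn: sqlite3.Connection, max_age: timedelta,
                         now: datetime) -> int:
    """Delete attempts older than max_age and return the number of rows removed."""
    cutoff = (now - max_age).isoformat()
    # Comparing create_date to a precomputed cutoff keeps the filter index-friendly.
    cur = conn.execute(
        "DELETE FROM res_user_login_attempt WHERE create_date < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE res_user_login_attempt (login TEXT, create_date TEXT)")
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [("admin", now - timedelta(days=2)), ("bob", now - timedelta(minutes=5))]
conn.executemany("INSERT INTO res_user_login_attempt VALUES (?, ?)",
                 [(login, ts.isoformat()) for login, ts in rows])
print(purge_login_attempts(conn, timedelta(days=1), now))  # 1
```

In PostgreSQL the equivalent filter is create_date < NOW() - INTERVAL '1 day', which, unlike computing NOW() - create_date per row, lets an index on create_date be used.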
Dear all
This is a real security issue, and it's critical. Anyone, from anywhere, can take down a Tryton server in seconds / minutes by running the script in parallel.
Being able to write anonymously to a Tryton table is not a good idea and is asking for trouble.
I am attaching some very basic metrics that I collected by briefly running a couple of instances of the program, just so we have a better idea of the impact / scale of the issue.
Please DO NOT release the PoC exploit script to the public, at least until we apply the attached patch to the server. That would be irresponsible. I sent this in confidence to be discussed among the security team only.
There are many Tryton and GNU Health implementations out there and I take this very seriously. I know you do too, so thank you for your understanding!
For me, a DDoS on a web application is not a security issue, because no data are corrupted or leaked.
Tryton, as a web application, cannot do anything to prevent a DDoS; this has to be managed at the network level and/or at the web server level.
Also, your patch doesn't fix any DDoS at all. An attacker will just open a new connection instead of waiting the 3 seconds.
As for your metrics, I don't see any useful ones. At what rate is disk space filled? How long would it take to fill a standard-size disk? I have the feeling that the log file grows faster than the database, because it is plain text with much more data.
I'm pretty sure that running the SQL query I gave when a disk-space alert is raised is more than enough.
Traceback (most recent call last):
  File "/trytond/protocols/jsonrpc.py", line 162, in _marshaled_dispatch
    response['result'] = dispatch_method(method, params)
  File "/trytond/protocols/jsonrpc.py", line 191, in _dispatch
    res = dispatch(*args)
  File "/trytond/protocols/dispatcher.py", line 186, in dispatch
    Cache.resets(database_name)
  File "/trytond/cache.py", line 112, in resets
    where=table.name == name))
  File "/trytond/backend/postgresql/database.py", line 294, in execute
    return self.cursor.execute(sql, params)
TransactionRollbackError: could not serialize access due to concurrent update
This discussion is taking way longer than it took me to detect the issue, create the PoC and write the patch... so hopefully we can finally move on to other things soon.
Cedric, I'm not talking about web applications in general. I am focused on a very specific problem, in a very specific context, that I have already explained.
The current design of the get_login method has problems: it allows system resources to be rapidly exhausted, because it permits anonymous writes to a database system table, with no timeout on wrong logins.
As I said previously, the sample metrics I collected are just that, a sample. Not scientific, but enough to give you an idea of the magnitude.
For instance, with just a few parallel processes we can easily generate 50 records per second, which amounts to 4,320,000 records per day (50 × 86,400 seconds): over 4 million needless, costly DB operations such as writes and deletes. Doing so also generates a very high, sustained system load. So it's not just about "filling the disk", but also about the resources consumed in doing so.
Please don't tell me that you want to mitigate this with a cron job that deletes the records...
Nico: as we discussed yesterday, of course using ipfilter / iptables as a complement will help, as would changing the default port, etc. But those are *additional* security measures; they do not replace security in our own Tryton server design, in the same way that we don't rely entirely on ipfilter / iptables for the correct functioning of sshd.
The patch that I have attached solves the critical issue in this specific context. Of course it does not eliminate all possible DoS scenarios, but it does a pretty good job of fixing this issue.
Just apply the patch, and run the same PoC exploit scripts. Compare the system load and the responsiveness of the system before and after the patch. You will see the difference, which backs up my position.
Finally, we need to deliver the solution for this issue to the Tryton and GNU Health installations. Some of these installations are mission critical (i.e., they deal with people's health and lives) and need to run 24x7.
I would love to see it implemented in the standard Tryton server. That would be the best solution. We will always have time to improve security features for upcoming releases. We need Tryton to be a solid application from all points of view, and we will make it.
Hi, just for the record: before applying the patch, the Tryton client was very slow, and the TransactionRollbackError was displayed too often.
After applying the patch, Tryton noticeably improved in speed, and I couldn't reproduce the error anymore.
Luis, we cannot accept your patch because it reduces the security level of Tryton. With your patch, an attacker has no need to wait more than a few milliseconds (and certainly not the 3 seconds) for the login answer, because after a few milliseconds he can guess that the login was wrong. He will start a new query immediately without waiting the 3 seconds.
This means the 3-second sleep is useless, and there will no longer be any protection against brute-force attacks on the login mechanism.
This was already explained in msg24650 and msg24615.
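The timing argument can be modelled without any network code. This sketch assumes that the patch sleeps only after a wrong password (as described in this thread), that the existing back-off is applied before the password is checked, and that an undelayed answer takes about 5 ms; all function names and numbers are hypothetical.

```python
# Hypothetical latency model (no real sleeping, no trytond code).
FAST = 0.005  # assumed latency, in seconds, of an undelayed answer


def latency_delay_on_failure(success: bool, delay: float = 3.0) -> float:
    # Patch as described in the thread: sleep only after a wrong password.
    return FAST if success else FAST + delay


def latency_delay_before_check(failures_so_far: int) -> float:
    # Back-off applied before the password is checked: the wait does not
    # depend on whether this particular guess is right.
    return FAST + (2 ** failures_so_far - 1)


def attacker_view(success: bool, latency: float,
                  client_timeout: float = 0.05) -> str:
    """What a client that stops waiting after client_timeout sees."""
    if latency <= client_timeout:
        return "right" if success else "wrong"
    return "no answer"


for policy, right, wrong in [
    ("delay on failure", latency_delay_on_failure(True),
     latency_delay_on_failure(False)),
    ("back-off before check", latency_delay_before_check(3),
     latency_delay_before_check(3)),
]:
    print(policy, "|", attacker_view(True, right), "|", attacker_view(False, wrong))
# delay on failure | right | no answer
# back-off before check | no answer | no answer
```

Under the first policy "no answer" can only mean a wrong guess, so the attacker can move on after 50 ms; under the second, both outcomes look identical until the full delay has passed.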
If you care about high availability, you must not rely on trytond alone for protection. You must add standard solutions that protect web services against DoS and monitor the health of your services.
I will provide, in a feature request, a patch that adds a task to clean the login attempts regularly.
Very sorry to hear this reaction and the lack of solid arguments, and lack of good alternatives.
Not applying the patch (or an acceptable alternative) certainly makes Tryton vulnerable, which forces us to create a security advisory for GNU Health, along with the currently proposed patch. Not the ideal situation, but I have the moral duty to protect the GNU Health community.
I hope we can come to an agreement on this before Thursday. Of course, I'm open to suggestions.
I don't see how you can say we did not provide solid arguments, when we provided them many times and they were never countered.
Now, if you want us to name one good alternative, I would say fail2ban [1]. It would be great if someone could provide a good set of rules for Tryton. We will be happy to publish them.
I warn you one last time not to apply your patch to GNU Health, because it will weaken Tryton's protection against brute-force attacks (as explained many times).
Finally, I would like to say that by default trytond has a limit on concurrent connections (inherited from PostgreSQL), and this limit can easily be reached and cause a DoS. This is not specifically linked to the login method but applies to any RPC call. So I recommend that anyone run trytond on a private network or use external protection. It is not Tryton's goal to write such security tools, especially when good ones exist, and because this cannot be correctly managed at the application level, only at the OS level.
Maybe we should add a paragraph in the documentation to recommend the use of such protection.
>Very sorry to hear this reaction and the lack of solid arguments, and
>lack of good alternatives.
I'll try to sum up the arguments because everybody accuses everybody
of not having solid arguments and that's getting on my nerves.
To sum up your argument: a table is filled with data coming from
unsuccessful attempts to authenticate a user in the system. This will
fill up the disk and thus make the system unusable.
To sum up our argument: DoS cannot be mitigated at the application
level but should be managed at the IP level. Moreover, the patch
supplied does not fix anything at the DoS level, because a smart attacker
would send an authentication request and drop the connection WITHOUT
waiting for the 3 seconds, and do that a thousand times.
So we're talking about a network kind of DoS while you're talking
about another kind of DoS attack.
About this second kind of DoS attack (the filling the disk one), my
opinion is that it's not a real problem.
Either the attacker tries to go unnoticed and fills the database with
records using a script that generates gigabytes of data over a few hours;
this will not work, because the 'add' method of the LoginAttempt object
removes every stale record older than 'delay()'
(which is the session timeout).
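A minimal sketch of that purge-on-insert behaviour, assuming in-memory (login, timestamp) rows that arrive in time order and a delay() of 600 seconds (the 10-minute default given in the next paragraph); LoginAttempts and its fields are hypothetical, not trytond's model. However long the attack runs, the stored rows stay bounded by roughly rate × delay().

```python
# Hypothetical purge-on-insert sketch (not trytond's LoginAttempt model).
from collections import deque

SESSION_TIMEOUT = 600  # seconds; the 10-minute default quoted in the thread


class LoginAttempts:
    def __init__(self, delay: int = SESSION_TIMEOUT) -> None:
        self.delay = delay
        self.rows = deque()  # (login, timestamp) pairs, oldest on the left

    def add(self, login: str, now: int) -> None:
        # Timestamps are assumed non-decreasing, so stale rows sit at the left.
        while self.rows and now - self.rows[0][1] > self.delay:
            self.rows.popleft()
        self.rows.append((login, now))


# Simulate 50 failed logins per second (the rate reported above) for one hour.
attempts = LoginAttempts()
for second in range(3600):
    for i in range(50):
        attempts.add(f"random-{second}-{i}", second)

print(len(attempts.rows))  # 30050: set by rate and delay(), not by duration
```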
Or the attacker wants to fill the disk in less than 'delay()' (by
default 10 minutes), so he will have to generate several gigabytes per
minute. He will have to hit the server hard. And this is exactly the
same as the networking DoS that we are talking about. You can only
prevent that by monitoring your system and using (for example)
fail2ban. Moreover, keep in mind that the number of concurrent requests
to Tryton is limited by the number of concurrent connections to the
database, so even a distributed attack will hit this wall and won't
be able to multiply its attack vector.
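The "fill the disk within delay()" case reduces to one division. In this sketch every input is a hypothetical parameter to replace with measured values: free disk space, bytes stored per attempt (row plus index overhead), and the purge window. It also ignores PostgreSQL dead-tuple bloat between VACUUM runs.

```python
# Back-of-the-envelope model; every input is a hypothetical parameter.


def required_rate(free_bytes: float, bytes_per_attempt: float,
                  window_s: float) -> float:
    """Failed logins per second needed to fill free_bytes when rows older
    than window_s are purged on insert (so at most rate * window_s rows
    are stored at any time)."""
    if bytes_per_attempt <= 0 or window_s <= 0:
        raise ValueError("bytes_per_attempt and window_s must be positive")
    return free_bytes / (bytes_per_attempt * window_s)


# Made-up inputs: 10 GiB free, 200 bytes per stored attempt, 600 s window.
print(round(required_rate(10 * 1024**3, 200, 600)))  # 89478
```

With these made-up inputs that is over a thousand times the 50 records per second quoted earlier in the thread; real figures should come from measuring row size on the actual database.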
>Not applying the patch (or an acceptable alternative) certainly makes
>Tryton vulnerable, which forces us to create a security advisory for
>GNU Health, along with the currently proposed patch. Not the ideal
>situation, but I have the moral duty to protect the GNU Health
>community.
Of course, we do understand your duty to protect GNU Health from what
you perceive as a DoS attack vector. But I don't think Tryton is
vulnerable to the attack you're describing (as I explain in the
paragraphs above), and moreover the proposed patch softens the brute-force
attack mitigation in place.