Server vulnerability in get_login

Patch submitted

Hello Luis,

As we discussed yesterday, I don't think there is anything Tryton can do to prevent a (D)DoS. This should be handled by the sysadmin using iptables / ip-whatever.

made the issue visible to everyone

closed

I don't think there is a real exploit either, especially because there is no amplification factor for the data stored (we store only the login used).
Also, I don't think it is a good reason to drop the good brute-force protection in place, especially the fact that a successful login does not have to wait even if there were previous failed attempts for the same login.

But indeed, it is true that the LoginAttempt table is not cleared of invalid logins and there is no real benefit in keeping them. So I think it would be a good new feature to add a cron task that cleans the LoginAttempt records that are older than the session.timeout.

So unless anybody disagrees, I propose to change this issue into a feature request.

I see no danger in publishing the script, because anyone running a publicly accessible trytond should anyway monitor its traffic and its logs to detect such bad behaviour and take measures long before it could hurt by filling the disk.

Also, anybody can put a query such as this one in their cron:

DELETE FROM res_user_login_attempt WHERE (NOW() - create_date) > '1 days'::INTERVAL
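
The same age-based cleanup can be sketched in Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in for the PostgreSQL one, the table is reduced to two columns, and the function name `prune_login_attempts` is hypothetical rather than part of trytond.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_login_attempts(conn, max_age, now):
    """Delete login-attempt rows older than max_age; return the rows removed.

    Hypothetical helper, not trytond code.
    """
    cutoff = (now - max_age).isoformat(sep=" ")
    cur = conn.execute(
        "DELETE FROM res_user_login_attempt WHERE create_date < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# SQLite stand-in for the real table (assumption: only these two columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE res_user_login_attempt (login TEXT, create_date TEXT)")

now = datetime(2016, 3, 9, 12, 0, 0)
rows = [
    ("admin", now - timedelta(days=2)),   # older than one day
    ("admin", now - timedelta(hours=1)),  # recent
]
conn.executemany(
    "INSERT INTO res_user_login_attempt VALUES (?, ?)",
    [(login, ts.isoformat(sep=" ")) for login, ts in rows])

removed = prune_login_attempts(conn, timedelta(days=1), now)
remaining = conn.execute(
    "SELECT COUNT(*) FROM res_user_login_attempt").fetchone()[0]
print(removed, remaining)
```

Timestamps are stored as ISO strings of identical format, so the string comparison in the WHERE clause orders them chronologically.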

made the issue confidential

reopened

Dear all
This is a real security issue, and it's critical. Anyone, from anywhere, can take down a Tryton server in seconds / minutes by running the script in parallel.

Being able to write anonymously to a Tryton table is not a good idea and is asking for trouble.

I am attaching some very basic metrics that I took briefly running a couple of instances of the program, just so we have a better idea of the impact / scale of the issue.

Please DO NOT release the PoC exploit script to the public, at least until we apply the attached patch to the server. That would be irresponsible. I sent this in confidence to be discussed among the security team only.

There are many Tryton and GNU Health implementations out there and I take this very seriously. I know you do too, so thank you for your understanding!

All the best,
Luis

For me, performing a DDoS on a web application is not a security issue because no data is corrupted or leaked.

Tryton as a web application cannot do anything to prevent a DDoS; this has to be managed at the network level and/or at the web server level.

Also, your patch doesn't fix any DDoS at all. An attacker will start a new connection instead of waiting the 3 seconds.

About your metrics, I see no useful metrics. What is the rate at which disk space is filled? In how much time will a standard-size disk be filled? I have the feeling that the log file is growing faster than the database because it is plain text with much more data.
I'm pretty sure that running the SQL query I gave when an alert about disk space is raised is quite enough.

Running the script I get

Traceback (most recent call last):
  File "/trytond/protocols/jsonrpc.py", line 162, in _marshaled_dispatch
    response['result'] = dispatch_method(method, params)
  File "/trytond/protocols/jsonrpc.py", line 191, in _dispatch
    res = dispatch(*args)
  File "/trytond/protocols/dispatcher.py", line 186, in dispatch
    Cache.resets(database_name)
  File "/trytond/cache.py", line 112, in resets
    where=table.name == name))
  File "/trytond/backend/postgresql/database.py", line 294, in execute
    return self.cursor.execute(sql, params)
TransactionRollbackError: could not serialize access due to concurrent update

Can we do something to prevent this error?

Dear all

This discussion is taking way longer than it took me to detect the issue, create the PoC and generate the patch... so hopefully we can finally move on to other things soon.

Cedric, I'm not talking about web applications in general. I am focused on a very specific problem, in a very specific context, that I have already explained.

The current design of the get_login method has problems: it allows system resources to be rapidly exhausted, because it allows anonymous writes to a database system table, with no timeout on a wrong login.

As I said previously, the sample metrics I collected are just that, a sample. Not scientific, but enough to give you an idea of the magnitude.

For instance, with just a few parallel processes, we can easily generate 50 records / second, which amounts to 4,320,000 records per day (over 4 million needless, hard DB operations such as writes and deletes) and, in doing so, generates a very high, sustained system load. So it's not just about "filling the disk", but also about the resources consumed in doing so.
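
The daily figure follows directly from the sampled rate. A quick check, taking the 50 records per second from the sample above as given (it is a rough measurement, not a benchmark):

```python
# Rate taken from the sample measurement quoted in the message above.
records_per_second = 50
seconds_per_day = 24 * 60 * 60

records_per_day = records_per_second * seconds_per_day
print(records_per_day)
```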

Please don't tell me that you want to mitigate this with a cron job that deletes the records...

Nico: As we discussed yesterday, of course using ipfilter / iptables as a complement will help, as would changing the default port, etc., but those are *additional* security measures that are not meant to replace our own Tryton server design in terms of security, in the same way that we don't rely completely on ipfilter / iptables for the proper functioning of sshd.

The patch that I have attached solves the critical issue in this specific context. Of course it does not eliminate all possible DoS scenarios, but it does a pretty good job of fixing this issue.

Just apply the patch, and run the same PoC exploit scripts. Compare the system load and the responsiveness of the system before and after the patch. You will see the difference, which backs up my position.

Finally, we need to deliver the solution for this issue to the Tryton and GNU Health installations. Some of these installations are mission critical (i.e., they deal with people's health and lives) and need to be available 24x7.
I would love to see it implemented in the standard Tryton server. That will be the best solution. We will always have time to improve security features for upcoming releases. We need Tryton to be a solid application from all points of view, and we will make it.

Thank you again.

Hi, just for the record: before applying the patch, using the Tryton client was very slow, and the TransactionRollbackError was displayed too often.

After applying the patch, Tryton's speed noticeably improved, and I couldn't reproduce the error anymore.

Regards !

Luis, we cannot accept your patch because it reduces the security level of Tryton. With your patch an attacker has no need to wait more than a few milliseconds (and certainly not the 3 seconds) for the login answer, because he can guess after a few milliseconds that the login was wrong. He will start a new query directly without waiting the 3 seconds.
This means that the 3-second sleep is useless, so there will no longer be any protection against brute-force attacks on the login mechanism.
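
This argument can be put in rough numbers. The sketch below is a simplified model, not a measurement: it assumes the server delays only failed logins and that the attacker abandons any request that has not answered within a short client-side timeout. The function name and the 10 ms timeout are hypothetical.

```python
def guesses_per_second(server_delay_s, client_timeout_s, connections=1):
    """Password guesses per second under the simplified model above.

    Each connection spends min(server delay, client timeout) per guess,
    because an attacker who stops waiting is not slowed by the sleep.
    """
    wait = min(server_delay_s, client_timeout_s)
    return connections / wait


# Attacker who politely waits for the 3-second answer.
patient = guesses_per_second(server_delay_s=3.0, client_timeout_s=3.0)

# Attacker who treats "no answer within 10 ms" as a failed login.
impatient = guesses_per_second(server_delay_s=3.0, client_timeout_s=0.01)

print(patient, impatient)
```

Under these assumptions the sleep only slows down clients that choose to wait for it.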

This was already explained in msg24650 and msg24615.

If you care about high availability, you must not rely on trytond alone for protection. You must add standard solutions that protect web services against DoS and monitor the health of your services.

I will provide, in a feature request, a patch that adds a task to clean the login attempts regularly.

Very sorry to hear this reaction and the lack of solid arguments, and lack of good alternatives.

Not applying the patch (or an acceptable alternative) certainly makes Tryton vulnerable, which forces us to create a security advisory for GNU Health, along with the currently proposed patch. Not the ideal situation, but I have the moral duty to protect the GNU Health community.

I hope we can come to an agreement on this before Thursday. Of course, I'm open to suggestions.

I don't see how you can say we did not provide solid arguments when we provided them many times and they were never countered.
Now, if you want us to name one good alternative, I will say fail2ban [1]. It would be great if someone could provide a good set of rules for Tryton. We will be happy to publish them.
I will warn you one last time not to apply your patch to GNU Health, because it will weaken the protection Tryton has against brute-force attacks (as explained many times).
Finally, I would like to say that trytond has by default a limit on concurrent connections (inherited from PostgreSQL), and this limit can easily be reached and provoke a DoS. It is not necessarily linked to the login method but to any RPC call. So I recommend that everyone run trytond in a private network or use external protection. It is not the goal of Tryton to write such security tools, especially when good ones exist and because this cannot be correctly managed at the application level but only at the OS level.

Maybe we should add a paragraph in the documentation to recommend the usage of such protection.

[1] https://en.wikipedia.org/wiki/Fail2ban

* Luis Falcon [2016-03-09 00:24 +0100]:

>Very sorry to hear this reaction and the lack of solid arguments, and
>lack of good alternatives.

I'll try to sum up the arguments because everybody accuses everybody
of not having solid arguments and that's getting on my nerves.

To sum up your argument: a table is filled with data coming from
unsuccessful attempts to authenticate a user in the system. This will
fill up the disk and thus make the system unusable.

To sum up our argument: DoS can not be mitigated at the application
level but should be managed on the IP level. Moreover the patch
supplied does not fix anything on the DoS level because a smart attacker
would send an authentication request and drop the connection WITHOUT
waiting for the 3 seconds and do that a thousand times.

So we're talking about a network kind of DoS while you're talking
about another kind of DoS attack.

About this second kind of DoS attack (the filling the disk one), my
opinion is that it's not a real problem.

Either the attacker tries to go unnoticed and fills the database with
records using a script that generates gigabytes of data over a few hours;
this will not work because the 'add' method of the LoginAttempt object
removes every stale record that is older than 'delay()'
(which is the session timeout).

Or the attacker wants to fill the disk in less than 'delay()' (by
default 10 minutes), so he will have to generate several gigabytes per
minute. He will have to hit the server hard. And this is exactly the
same as the networking DoS that we are talking about. You can only
prevent that by monitoring your system and using (for example)
fail2ban. Moreover, keep in mind that the number of concurrent requests
on Tryton is limited by the number of concurrent requests to the
database, so even a distributed attack will hit this wall and won't
be able to multiply its attack vector.
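
The prune-on-add behaviour described above can be illustrated with a toy model. This is not the trytond implementation: the class `LoginAttemptLog` is hypothetical, it keeps records in memory instead of a table, and it assumes that pruning applies to all logins at once. The 50 records per second and the 10-minute delay are the figures quoted in this thread.

```python
from collections import deque


class LoginAttemptLog:
    """Toy stand-in for the LoginAttempt table (hypothetical, not trytond code)."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self.records = deque()  # (timestamp, login), oldest first

    def add(self, login, now):
        # Drop records older than the delay before storing the new one.
        while self.records and now - self.records[0][0] > self.delay_s:
            self.records.popleft()
        self.records.append((now, login))


# One hour of sustained failed logins at 50 per second, 10-minute delay.
log = LoginAttemptLog(delay_s=600)
for second in range(3600):
    for _ in range(50):
        log.add("admin", now=second)

size = len(log.records)
print(size)
```

In this model the number of stored records stays close to rate times delay, however long the attack lasts, which is the point made about the slow variant of the attack.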

>Not applying the patch (or an acceptable alternative) certainly makes
>Tryton vulnerable, which forces us to create a security advisory for
>GNU Health, along with the currently proposed patch. Not the ideal
>situation, but I have the moral duty to protect the GNU Health
>community.

Of course, we do understand your duty to protect GNU Health from what
you perceive as a DoS attack vector. But I don't think Tryton is
vulnerable to the attack you're describing (as I explain in the
paragraphs above) and moreover the proposed patch softens the brute
force attack mitigation in place.

Dear Cedric, dear all

Just to summarize and keep the focus:

* We all agree that external applications / firewall rules are
excellent *additional* security measures, and that they should be in
place, but that is not the subject.

* This vulnerability is a design flaw that allows millions of records
to be written anonymously in a very short time, quickly
consuming the host resources (CPU, DB ...). I've provided a solution.
If you come up with a better one, we will be the happiest, but it has
to be a solid one, that can not rely on exposing the server to
anonymous writes to a DB table, or on having a cron job to periodically
"clean up the mess".

And that is about this vulnerability. Believe me, I am the first one
that does not want to create a separate solution from the standard
Tryton. It's not good to any of our communities, and I am doing
my best to avoid it, so please let's work together. That said, a
solution must be delivered ASAP. I also think we need to discuss this with
other core members, and get their feedback and suggestions. Up to now
it's been just Nico, you, Sebas and I who have been discussing it.

Off-topic / unrelated issues to this vulnerability:

* You have brought the topic of password guessing by brute force
attacks : This is about implementing good policies, and being
consistent from bottom up. Let me explain: The first place where that
should be imposed is at server level. Currently, I don't see that you
have taken any measure to avoid brute force attacks in one of the most
important resources, the server super user password (the owner of all
databases). No failed login timeout, not good password
enforcement, nothing.

In GNU Health, we have for years imposed good password
enforcement using cracklib with the serverpass utility[1]. I have
recommended its use for the standard Tryton installation but, again,
it has not been adopted. For GNU Health 3.2, we will include this type
of measure also at user level, along with password expiry rules,
password minimum length, and so on. I hope this time it can be part
of the standard.

* Integrating Trytond with PAM[2]. As I mentioned to Nico, for upcoming
versions, we could investigate integrating some aspects of Trytond
with PAM, as sshd, PostgreSQL or Apache do. There even seems to be a
python_pam module.

I'll be happy to schedule a conference within this week to find a
solution to this immediate issue, as well as to create a team dedicated
to security both in Tryton and in GNU Health, that can put time into
this very important area.

All the best,

1.- https://en.wikibooks.org/wiki/GNU_Health/Security
2.- http://tldp.org/HOWTO/User-Authentication-HOWTO/x115.html

On Wed, 09 Mar 2016 09:28:07 +0100
Cédric Krier wrote:

> Cédric Krier <cedric.krier@b2ck.com> added the comment:
>
> I don't see how you can say we did not provide solid arguments when
> we provided many times them which were never countered. Now, if you
> want us to name one good alternatives,I will say fail2ban [1]. It
> will be great if someone could provide a good set of rules for
> Tryton. We will be happy to publish them. I will warn you a last time
> to not apply your patch on GNU Health because it will weaken the
> protection Tryton has against brute force attack (as explained many
> times). Finally, I would like to say that trytond has by default a
> limitation on concurrent connections (inherited from PostgreSQL) and
> this limitation can easily be reached and provoke a DoS. It is not
> necessary linked to the login method but to any RPC calls. So I
> recommend to anyone to run trytond in a private network or to use
> external protection. It is not the goal of Tryton to write such
> security tools especially when good one exists and because this can
> not be correctly managed at the application level but only at the OS
> level.
>
> Maybe we should add a paragraph in the documentation to recommend the
> usage of such protection.
>
> [1] https://en.wikipedia.org/wiki/Fail2ban
>
> _______________________________________________
> Tryton issue tracker @tryton.org>
> <https://bugs.tryton.org/#5375>
> _______________________________________________
>

> To sum up your argument: a table is filled with data coming from
> unsuccessful attempts to authenticate a user in the system. This will
> fill up the disk and thus make the system unusable.
YES. That is the issue. BTW, it's not just millions of records "filling
up the disk", but also the huge system load on it.

If you keep mixing password guessing using brute force with
resource exhaustion/DoS, we will get nowhere.

Putting the security of Tryton just in third party solutions is like
leaving your house open because there is police in Belgium. It is the same
as telling PostgreSQL, SAP or any other solution to avoid using their
own security protection methods... it makes no sense.

In other words: "Take care of yourself, and I will take care of you."

Finally.... Is it that hard to admit that there is a design issue that
must be fixed ? It happens in all systems, all the time. We ALL make
mistakes. Admitting the vulnerability, and getting away from the state
of denial that you seem to be immersed in, is the first step to get out of
this crisis. I've done all I could to produce a positive outcome, but we
all have a limit, and enough is enough.

We should be happy and grateful that a vulnerability has been detected
and that can be fixed before someone else makes a huge hole up your
"backdoor" (in computer security terms ).

Have a good one !

Hi All,

After carefully reading all the issue, here is my opinion.

I don't think it's in the scope of trytond to fight against DDoS attacks, so for me there is no issue at all. I want to propose an nginx reverse proxy [1] as another option to fight against DDoS attacks.

The proposed patch does not solve the DDoS attack, and it gives an easier way to brute-force a user password (explained in msg24650), which makes it easier for the attacker to obtain a valid user/password. Once the attacker has a valid user/password, he can consume the host resources more quickly, for example by creating invoices, and this will collapse the system more easily.

[1] https://www.nginx.com/blog/mitigating-ddos-attacks-with-nginx-and-nginx-plus/

* Luis Falcon [2016-03-09 13:33 +0100]:
>
>Luis Falcon <falcon@gnu.org> added the comment:
>
>> To sum up your argument: a table is filled with data coming from
>> unsuccessful attempts to authenticate a user in the system. This will
>> fill up the disk and thus make the system unusable.
>YES. That is the issue. BTW, it's not just millions of records "filling
>up the disc", the also but the huge system load on it.
>
>If you keep mixing password guessing using brute force with
>resource exhaustion/DoS, we will get nowhere.

I don't think we mix this up. My summary made very clear the two
different kinds of DoS and, IMHO, that there is nothing to fear.

Where we mix the two concepts is when we're commenting on your patch,
because it softens the brute force mitigation procedure put in place
AND does not bring resource exhaustion/DoS protection.

>Putting the security of Tryton just in third party solutions is like
>leaving your house open because there is police in Belgium. Is the same
>of telling postgresql, SAP or any other solution to avoid using their
>own security protection methods.... makes no sense.

Tryton is responsible for the security of the system it creates. So
indeed we should protect our user passwords, prevent intra-Tryton
privilege escalation, etc. And in my opinion we do more than enough
to prevent the attack you're describing.

But there are parts of the system that Tryton can not control: the
network and the thousands of IP packets reaching it are not within the
scope of what Tryton can control. Another system must be used to do
so.

>Finally.... Is it that hard to admit that there is a design issue that
>must be fixed ? It happens in all systems, all the time. We ALL make
>mistakes.

It's not that hard to admit it, and in fact we've fixed numerous times
the pitfalls and design errors that we made in Tryton (the latest
being the backend rewrite I made, but there has been the workflow
rewrite and probably others).

>Admitting the vulnerability, and getting away from the state of
>denial that you seem to be immersed is the first step to get out of
>this crisis.

This kind of comment is very rude. I am very sad that you say such
a thing about my mental state.

And btw the vulnerability is admitted, it's just that the system in
place mitigates it as it should (as explained in msg24677).

>We should be happy and grateful that a vulnerability has been
>detected and that can be fixed before someone else makes a huge hole
>up your "backdoor" (in computer security terms ).

Well this issue has resulted in #5377 (closed), so in the end it resulted
in something useful for Tryton.

Since some summaries were posted, but none was really exhaustive, I am proposing
this one to get more structure into the discussion. It would probably be an advantage
to discuss some points in public once this issue is disclosed.

Topic 1: Subject of this issue: Server vulnerability in get_login

It was claimed, that the server is vulnerable by running multiple login
attempts. As a consequence of multiple login attempts it was assumed that
1) the login table could fill up and the database could run out of
available space,
2) the logs could fill up and the (Tryton) server could run out of available
space,
3) the server could undergo heavy load as far as being unresponsive under
(D)DOS.

ad 1)
The table res_user_login_attempt with respect to field sizes mainly
consists of the field "login character varying". The maximal length of
character varying in e.g. PostgreSQL is assumed to be ca. 1GB [1]. So indeed
there could possibly be an attack vector here to make the database unresponsive, given the
size of the login is not limited by other measures. The impact of this issue
should be clarified in a separate discussion.
While it could be that the database goes unresponsive, there will be no leakage
of data or information from the side of the server.

ad 2)
The logs could fill up much more quickly than the database table in every day
scenarios. While this is true, the server itself comes with no logging enabled
and logging has to be enabled by the user/admin.
The attack surface of a vanilla trytond instance should be non-existent.

ad 3)
The server could be subject to (D)DOS by exhausting the connection pool with
multiple running connections. IIRC this issue was already discussed in the early
times of the project (especially when comparing to the login attempt delay,
see below) and is a long known 'secret'. It was shown that the connection
limit of the server is mainly defined by the connection limit of the database.
Depending on the system resources and the configuration of the database
backend the system could indeed be subject to DOS attacks. While this is true,
it was also noted that the nature of those attacks can/should better be
handled on OS level than on application level.

Further discussion of this topic is needed
- whether, and if so what, could possibly be done on the server side (e.g. [2][3])
- how a trytond installation could be hardened with the use of 3rd-party tools
like iptables, fail2ban etc. (best practices)

Personal opinion:
There are for sure different opinions how and if the (D)DOS attack surface
should be classified as vulnerability. With reference to this issue I am
inclined to rather classify it as non-security, because
- there is no data leak from the application
- (D)DOS is a general problem of each web application
- finally the issue is well known since a long time

Topic 2: Login delay
While there is practically no login delay for new sessions, there is a rapidly
increasing delay for login attempts in the same session.
It was shown, that the attack surface via new connections is much bigger
than the danger to break in via repeated login trials. It was suggested to
reduce the 'punishment' caused by the waiting time until a new login attempt can
be made. Given that the attack surface of this repeated login is relatively low
compared to the surface exposed by new connections, considerably lower
(and/or configurable) waiting delays were proposed. Thus the current delay
increase could be too conservative.

I propose to discuss this topic separately. As it is not relevant to the
original topic, it doesn't add to the security relevance of this very issue
(Server vulnerability in get_login).

Topic 3: Password security
Several concerns relating to password security were added.
Especially the security of the admin password was put in question.

Currently trytond comes with no password configured at all. It is at the
discretion of the user/admin to impose/configure those according to the
guidelines of the enterprise.

There currently seems to be no immediate danger caused by trytond. Again I propose to
discuss the need for additional security measures like expiry, length, etc.
separately if needed; at least this topic doesn't add to the security relevance
of this very issue.

Hopefully I didn't miss any valuable points, please add others if adequate.

[1] http://www.postgresql.org/docs/current/static/datatype-character.html
[2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-4450
[3] https://nodejs.org/en/blog/vulnerability/http-server-pipeline-flood-dos/

Hello Sergi
On Wed, 09 Mar 2016 13:45:59 +0100
Sergi Almacellas Abellana wrote:

> Sergi Almacellas Abellana <sergi@koolpi.com> added the comment:
>
> Hi All,
>
> After carefully reading all the issue, here is my opinion.
>
> I don't think it's the scope of trytond to fight against DDoS attack,
> so for me no issue at all. I want to propose Reverse nginx proxy [1]
> as another option to fight against DDoS attacks.
>
I agree with you, but this is another topic. As I said, I am the
first to suggest *complementary* tools to harden security, but
that does not mean we can neglect our own server's internal security
methods.

> About the proposed patch it does not solve the DDoS attack
It solves *this* vulnerability, because it eliminates the problem of
allowing anonymous, unprotected writes of invalid login
information at DB level, of not being able to delete those records,
and of doing a search on that table at each login... all of which takes a lot of
resources (CPU, DB, IO, ...).

> as it give an easier way to brute force some user password [...] which will
> ease the attacker the possibility to obtain a valid user/password.

We should discuss authentication methods in another thread. It's
not related to this vulnerability. I already made some comments /
suggestions on hardening the server authentication on
https://bugs.tryton.org/msg24678

Thanks

On Wed, 09 Mar 2016 14:18:27 +0100
Nicolas Évrard wrote:

> >Admitting the vulnerability, and getting away from the state of
> >denial that you seem to be immersed is the first step to get out of
> >this crisis.
>
> This kind of comment is very rude. I am very sad that you say such
> thing about my mental state.

It was never my intention to offend you. But I am sorry if the
comment was misinterpreted, please accept my apologies.

> And btw the vulnerability is admitted

Thank you !

>
> >We should be happy and grateful that a vulnerability has been
> >detected and that can be fixed before someone else makes a huge hole
> >up your "backdoor" (in computer security terms ).
>
> Well this issue has resulted in #5377 (closed), so in the end it resulted
> in something useful for Tryton.
>

Thanks ! I looked at it. We still have the vulnerability, since Tryton
still allows anonymous writes into a DB table. That
takes a lot of IO, CPU, online redo logs, and all the overhead
that storing info in an RDBMS object implies. A very high price
that we should avoid.

BTW, if we want to use a DB object to store volatile / ephemeral
information, with heavy IO, we could use unlogged postgres tables.

Bests,

Hi Mathias !

On Wed, 09 Mar 2016 14:36:06 +0100
Mathias Behrle wrote:

> Mathias Behrle <mbehrle@m9s.biz> added the comment:
>
> Since there were some summaries posted, but none was really
> exhaustive, I am proposing this one to get more structure into the
> discussion. Probably it would be an advantage to discuss some points
> in the public once this issue will be disclosed.

Thanks !

>
> Topic 1: Subject of this issue: Server vulnerability in get_login
>
> It was claimed, that the server is vulnerable by running multiple
> login attempts. As a consequence of multiple login attempts it was
> assumed that 1) the login table could fill up and the database could
> run out of available space,
> 2) the logs could fill up and the (Tryton) server could run out of
> available space,
> 3) the server could undergo heavy load as far as being unresponsive
> under (D)DOS.
>
> ad 1)
> The table res_user_login_attempt with respect to field sizes mainly
> consists of the field "login character varying". The maximal length of
> character varying in e.g. postgresql is assumed to be ca. 1GB [1]. So
> indeed here could evtl. be an attack vector to make the database
> unresponsive given the size of the login is not limited by other
> measures. The impact of this issue should be clarified in a separate
> discussion. While it could be that the database goes unresponsive,
> there will be no leakage of data or information from the side of the
> server.
>

Agree

> ad2)
> The logs could fill up much more quickly than the database table in
> every day scenarios. While this is true, the server itself comes with
> no logging enabled and logging has to be enabled by the user/admin.
> The attack surface of a vanilla trytond instance should be
> non-existent.
>
In 24x7 scenarios, standby databases / log shipping, and when
point-in-time recovery is needed, the archive log would/should be
active.

Currently, if transaction log archiving is active, all the write /
delete operations involved in this vulnerability would consume a very
large amount of disk space, disk IO, network resources, etc.

To minimize the impact, I propose not using a DB table at all for
storing this type of ephemeral information.

If in the end we opt to use a DB object to store volatile info
such as in LoginAttempt, then I propose using an UNLOGGED table, which
would not use WAL / transaction logs, minimizing the impact.
Let me know your thoughts.

> ad3)
> The server could be subject to (D)DOS by exhausting the connection
> pool with running multiple connections. IIRC this issue was already
> discussed in the early times of the project (especially when
> comparing to the login attempt delay, see below) and is a long known
> 'secret'. It was shown that the connection limit of the server is
> mainly defined by the connection limit of the database. Depending on
> the system ressources and the configuration of the database backend
> the system could indeed be subject to DOS attacks. While this is true
> it was also noted, that the nature of those attacks can/should better
> be handled on OS level than on application level.
>

Agree. And that is a separate issue from the vulnerability I presented.

In the context of this vulnerability, when I talk about exhausting the
resources, I am not referring to the connection pool. I am referring to
the IO / DB / CPU resources. In fact, the high system load and
concurrency errors Sebas was referring to[1] were reproduced using the
PoC exploit.

> Further discussion of this topic is needed
> - if and what something could evtl. be done on the server side (e.g.
> [2][3])
> - how a trytond installation could be hardened with the use of
> 3rd-party tools like iptables, fail2ban etc. (best practices)
>
> Personal opinion:
> There are for sure different opinions how and if the (D)DOS attack
> surface should be classified as vulnerability. With reference to this
> issue I am inclined to rather classify it as non-security, because
> - there is no data leak from the application

True

> - (D)DOS is a general problem of each web application

I don't consider this vulnerability related to web application
server DoS in the general connection / socket exhaustion sense. In this
vulnerability the highest level of impact is generated at the DB /
IO subsystem, *not* at the Tryton application server.

In fact, just about 20 processes are enough to generate a very high load that
would make the system unusable (even without log archive mode). So the
number of connections is not really the problem for this
vulnerability, although it's always a contributing factor.

> - finally the issue is well known since a long time
>

True for the classical DoS on web application servers.

>
> Topic 2: Login delay
> While there is practically no login delay for new sessions, there is
> a rapidly increasing delay for login attempts in the same session.
> It was shown, that the attack surface via new connections is much
> bigger than the danger to break in via repeated login trials. It was
> suggested to reduce the 'punishment' caused by the waiting time until
> a new login attempt can be made. Given that the attack surface of
> this repeated login is relatively low compared to the surface exposed
> by new connections, considerably lower and/or configurable waiting
> delays were proposed. Thus the current delay increase could be too
> conservative.

Agree

>
> I propose to discuss this topic separately. As it is not relevant to
> the original topic, it doesn't add to the security relevance of this
> very issue (Server vulnerability in get_login).

Agree

>
>
> Topic 3: Password security
> Several concerns were raised relating to password security.
> Especially the security of the admin password was put in question.
>
> Currently trytond comes with no password configured at all. It is at
> the discretion of the user/admin to impose/configure those according
> to the guidelines of the enterprise.
>
> There currently seems to be no immediate danger caused by trytond. Again I
> propose to discuss the need for additional security measures like
> expiry, length, etc. separately if needed, at least this topic
> doesn't add to the security relevance of this very issue.
>

Agree. It's a very important issue, but not related to this
vulnerability. Happy to discuss and collaborate in another thread.

> Hopefully I didn't miss any valuable points, please add others if
> adequate.
>

Thanks a lot for shedding light and constructive feedback !

I propose we keep collecting and discussing the suggestions / proposals
directly related to the vulnerability.

As with any constructive discussion, the outcome will be a more robust
Tryton server and community.

All the best,
Luis

1.- https://bugs.tryton.org/msg24652

* Luis Falcon: " [#5375 (closed)] Server vulnerability in get_login" (Wed, 09 Mar
2016 18:12:26 +0100):

> Luis Falcon <falcon@gnu.org> added the comment:
>
> Hi Mathias !
>
> On Wed, 09 Mar 2016 14:36:06 +0100
> Mathias Behrle @tryton.org> wrote:
>
> > Mathias Behrle <mbehrle@m9s.biz> added the comment:
> >
> > Since there were some summaries posted, but none was really
> > exhaustive, I am proposing this one to get more structure into the
> > discussion. Probably it would be an advantage to discuss some points
> > in the public once this issue will be disclosed.
>
> Thanks !
>
> >
> > Topic 1: Subject of this issue: Server vulnerability in get_login
> >
> > It was claimed, that the server is vulnerable by running multiple
> > login attempts. As a consequence of multiple login attempts it was
> > assumed that 1) the login table could fill up and the database could
> > run out of available space,
> > 2) the logs could fill up and the (Tryton) server could run out of
> > available space,
> > 3) the server could undergo heavy load as far as being unresponsive
> > under (D)DOS.
> >
> > ad 1)
> > The table res_user_login_attempt with respect to field sizes mainly
> > consists of the field "login character varying". The maximal length of
> > character varying in e.g. postgresql is assumed to be ca. 1GB [1]. So
> > there could indeed be an attack vector here to make the database
> > unresponsive given the size of the login is not limited by other
> > measures. The impact of this issue should be clarified in a separate
> > discussion. While it could be that the database goes unresponsive,
> > there will be no leakage of data or information from the side of the
> > server.
> >
>
> Agree
>
> > ad2)
> > The logs could fill up much more quickly than the database table in
> > every day scenarios. While this is true, the server itself comes with
> > no logging enabled and logging has to be enabled by the user/admin.
> > The attack surface of a vanilla trytond instance should be
> > non-existent.
> >
> In 24x7 scenarios, standby databases / log shipping, and when
> point-in-time recovery is needed, the archive log would/should be
> active.

In this paragraph I was originally referring to the logging of trytond.

> Currently, if transaction log archiving is active, all the write /
> delete operations involved in this vulnerability would generate a very
> large amount of space, disk IO, network resources, etc ..

There is also the impact on memory, because logins go into the cache. This
impact should be mitigated by the planned size constraint for logins.

> To minimize the impact, I propose not using at all a DB table for
> storing this type of ephemeral information.

I would agree with that. I am not sure whether using only the cache would be
sufficient, since AFAICS LoginAttempt is primarily used by User to get the
count of login attempts. As I am also not a fan of the increasing delay, it
seems sufficient to me to know whether a login was already tried (i.e. is in
the cache) and then apply a fixed delay (in case the login attempt was
unsuccessful in the first place), thus completely avoiding the use of a
database object.
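
A minimal sketch of this cache-only idea, assuming an in-process, size-bounded
store. The names FailedLoginCache, max_entries and max_login_length are
hypothetical and not part of trytond; the login length cap stands in for the
planned size constraint mentioned above.

```python
from collections import OrderedDict


class FailedLoginCache:
    """Remember recently failed logins in memory only (no database writes)."""

    def __init__(self, delay=3.0, max_entries=1000, max_login_length=512):
        self.delay = delay                        # fixed delay in seconds
        self.max_entries = max_entries            # bound on memory use
        self.max_login_length = max_login_length  # hypothetical size cap
        self._failed = OrderedDict()

    def _key(self, login):
        # Truncate so that an oversized login cannot bloat the cache.
        return login[:self.max_login_length]

    def record_failure(self, login):
        key = self._key(login)
        self._failed[key] = True
        self._failed.move_to_end(key)
        # Evict the oldest entries once the bound is exceeded.
        while len(self._failed) > self.max_entries:
            self._failed.popitem(last=False)

    def record_success(self, login):
        self._failed.pop(self._key(login), None)

    def delay_for(self, login):
        """Return the fixed delay if this login failed before, else 0."""
        return self.delay if self._key(login) in self._failed else 0.0
```

Because the store is bounded, a flood of distinct logins only evicts older
entries instead of growing memory or generating database writes.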

> If at the end we will opt to use a db object to store volatile info
> such as in LoginAttempt, then I propose using an UNLOGGED table, that
> would not use WAL / transaction logs, minimizing the impact.
> Let me know your thoughts.
>
> > ad3)
> > The server could be subject to (D)DOS by exhausting the connection
> > pool with running multiple connections. IIRC this issue was already
> > discussed in the early times of the project (especially when
> > comparing to the login attempt delay, see below) and is a long known
> > 'secret'. It was shown that the connection limit of the server is
> > mainly defined by the connection limit of the database. Depending on
> > the system resources and the configuration of the database backend
> > the system could indeed be subject to DOS attacks. While this is true
> > it was also noted, that the nature of those attacks can/should better
> > be handled on OS level than on application level.
> >
>
> Agreed. And that is a separate issue from the vulnerability I presented.
>
> In the context of this vulnerability, when I talk about exhausting the
> resources, I am not referring to the connection pool. I am referring to
> the IO / DB / CPU resources. In fact, the high system load and
> concurrency errors Sebas was referring to [1] were reproduced using the
> PoC exploit.
>
> > Further discussion of this topic is needed
> > - whether and what could possibly be done on the server side (e.g.
> > [2][3])
> > - how a trytond installation could be hardened with the use of
> > 3rd-party tools like iptables, fail2ban etc. (best practices)
> >
> > Personal opinion:
> > There are for sure different opinions how and if the (D)DOS attack
> > surface should be classified as vulnerability. With reference to this
> > issue I am inclined to rather classify it as non-security, because
> > - there is no data leak from the application
>
> True
>
> > - (D)DOS is a general problem of each web application
>
> I don't consider this vulnerability to be related to the general web
> application server DoS, i.e. connection / socket exhaustion. In this
> vulnerability the highest impact is generated at the DB / IO
> subsystem, *not* at the Tryton application server.
>
> In fact, just about 20 processes are enough to generate a very high load
> that would make the system unusable (even without log archive mode). So
> the number of connections is not really the problem in this
> vulnerability, although it's always a contributing factor.

Well, then the point remains to configure the database system to accept only
the number of connections it can handle. How should the system handle the load
of heavily used connections if it cannot handle the same number of login
requests?

> > - finally the issue is well known since a long time
> >
>
> True for the classical DoS on web application servers.
>
> >
> > Topic 2: Login delay
> > While there is practically no login delay for new sessions, there is
> > a rapidly increasing delay for login attempts in the same session.
> > It was shown, that the attack surface via new connections is much
> > bigger than the danger to break in via repeated login trials. It was
> > suggested to reduce the 'punishment' caused by the waiting time until
> > a new login attempt can be made. Given that the attack surface of
> > this repeated login is relatively low compared to the surface exposed
> > by new connections, considerably lower and/or configurable waiting
> > delays were proposed. Thus the current delay increase could be too
> > conservative.
>
> Agree
>
> >
> > I propose to discuss this topic separately. As it is not relevant to
> > the original topic, it doesn't add to the security relevance of this
> > very issue (Server vulnerability in get_login).
>
> Agree
>
> >
> >
> > Topic 3: Password security
> > Several concerns were raised relating to password security.
> > Especially the security of the admin password was put in question.
> >
> > Currently trytond comes with no password configured at all. It is at
> > the discretion of the user/admin to impose/configure those according
> > to the guidelines of the enterprise.
> >
> > There currently seems to be no immediate danger caused by trytond. Again I
> > propose to discuss the need for additional security measures like
> > expiry, length, etc. separately if needed, at least this topic
> > doesn't add to the security relevance of this very issue.
> >
>
> Agree. It's a very important issue, but not related to this
> vulnerability. Happy to discuss and collaborate in another thread.
>
> > Hopefully I didn't miss any valuable points, please add others if
> > adequate.
> >
>
> Thanks a lot for shedding light and constructive feedback !
>
> I propose we keep collecting and discussing the suggestions / proposals
> directly related to the vulnerability.
>
> As with any constructive discussion, the outcome will be a more robust
> Tryton server and community.
>
> All the best,
> Luis
>
> 1.- https://bugs.tryton.org/msg24652

On Thu, 10 Mar 2016 00:52:27 +0100
Mathias Behrle @tryton.org> wrote:

> > > ad2)
> > > The logs could fill up much more quickly than the database table
> > > in every day scenarios. While this is true, the server itself
> > > comes with no logging enabled and logging has to be enabled by
> > > the user/admin. The attack surface of a vanilla trytond instance
> > > should be non-existent.
> > >
> > In 24x7 scenarios, standby databases / log shipping, and when
> > point-in-time recovery is needed, the archive log would/should be
> > active.
>
> In this paragraph I was originally referring to the logging of
> trytond.
>

Trytond log generation will be affected in this context, and the logs
could get quite large. But this can be controlled (log rotation,
filtering by event type, ...), in the same way that the logs of other OS
services are managed. So I don't see much of a problem here.

I see a larger problem with the high rate of DB transaction log
generation, since the operations involved are writes and deletes.

> > Currently, if transaction log archiving is active, all the write /
> > delete operations involved in this vulnerability would generate a
> > very large amount of space, disk IO, network resources, etc ..
>
> There is also the impact on memory, because logins go into the cache.
> This impact should be mitigated by the planned size constraint for
> logins.

We found that putting the server under stress with the exploit led
to many concurrency errors when resetting the cache [1].

After applying the patch, we could not reproduce the error.

> > To minimize the impact, I propose not using at all a DB table for
> > storing this type of ephemeral information.
>
> I would agree with that. I am not sure whether using only the cache
> would be sufficient, since AFAICS LoginAttempt is primarily used by
> User to get the count of login attempts. As I am also not a fan of the
> increasing delay, it seems sufficient to me to know whether a login was
> already tried (i.e. is in the cache) and then apply a fixed delay (in
> case the login attempt was unsuccessful in the first place), thus
> completely avoiding the use of a database object.

Actually, that's what the patch does:

* Removes write operations to the DB table (LoginAttempt methods).
  This fixes the main vulnerability.

* Sets a default timeout for *any* invalid login attempt, limiting
  brute-force attacks.

The default timeout (3 seconds) can be modified in a future Tryton
version via a configuration parameter (failed_login_timeout). If you
prefer a more conservative default value (4 or 5 seconds), we can raise
it.
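
For illustration only (this is not the attached patch), the fixed-delay
behaviour described above could look roughly like the sketch below. The
function name try_login and the verify / sleep parameters are hypothetical;
only the 3-second default and the failed_login_timeout name come from this
message.

```python
import time

FAILED_LOGIN_TIMEOUT = 3.0  # seconds; the default value mentioned above


def try_login(login, password, verify, timeout=FAILED_LOGIN_TIMEOUT,
              sleep=time.sleep):
    """Check credentials without writing anything to the database.

    `verify` is a caller-supplied callable (hypothetical) returning True
    when the credentials are valid. Every invalid attempt waits the same
    fixed time, whatever the login or the number of previous failures.
    """
    if verify(login, password):
        return True
    sleep(timeout)  # fixed penalty for *any* failed attempt
    return False
```

Passing sleep as a parameter is only there so the delay can be replaced in
tests; a real implementation would read the timeout from the configuration.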

> > > - (D)DOS is a general problem of each web application
> >
> > I don't consider this vulnerability to be related to the general web
> > application server DoS, i.e. connection / socket exhaustion. In this
> > vulnerability the highest impact is generated at the DB / IO
> > subsystem, *not* at the Tryton application server.
> >
> > In fact, just about 20 processes are enough to generate a very high
> > load that would make the system unusable (even without log archive
> > mode). So the number of connections is not really the problem in this
> > vulnerability, although it's always a contributing factor.
>
> Well, then the point remains to configure the database system to
> accept only the number of connections it can handle. How should the
> system handle the load of heavily used connections if it cannot
> handle the same number of login requests?

Yes. In fact, that is part of what happens in this case. The PoC exploit
consumes a lot of resources (IO, processes, memory...), leading to
contention and finally to a saturation point, where accepting more of
the available DB connections will actually make things worse [2].

1.- https://bugs.tryton.org/msg24652
2.- https://wiki.postgresql.org/wiki/Number_Of_Database_Connections

Hi there !

After all these days, I just noticed that the patch was not applied to Trytond 3.8.4. Unfortunate :/

Have a great week.

I think we can close this issue. Even though no core developer really considered it a security issue, the discussion led to some improvements like #5381 (closed), #5377 (closed) and #5435 (closed), which all improve the availability of trytond under some DoS attacks.
We will keep the login attempt computation as-is; the reasons can be found in msg24702.
Of course, for complex database setups, users will have to decide how they plan to manage the LoginAttempt table (as with many other tables). But as with many other optimisations, such as indexes, the policy in Tryton is to leave as much freedom of choice as possible to the user.
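
As a rough sketch of what such table management might look like, the periodic cleanup suggested earlier in this thread can be wrapped in a small function. The example below uses an in-memory SQLite database as a stand-in for the real backend and assumes create_date is stored as a "YYYY-MM-DD HH:MM:SS" string; the function name purge_login_attempts is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_login_attempts(conn, max_age, now):
    """Delete login attempts older than max_age; return rows removed."""
    cutoff = (now - max_age).isoformat(sep=" ", timespec="seconds")
    cursor = conn.execute(
        "DELETE FROM res_user_login_attempt WHERE create_date < ?",
        (cutoff,))
    conn.commit()
    return cursor.rowcount


# Demonstration against an in-memory SQLite stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE res_user_login_attempt (login TEXT, create_date TEXT)")
now = datetime(2016, 3, 10, 12, 0, 0)
conn.executemany(
    "INSERT INTO res_user_login_attempt VALUES (?, ?)",
    [("old", "2016-03-08 12:00:00"), ("recent", "2016-03-10 11:30:00")])
removed = purge_login_attempts(conn, timedelta(days=1), now)
```

With a one-day maximum age, only the row older than the cutoff is removed; the retention period is a deployment decision left to the administrator.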

made the issue visible to everyone

added 1 deleted label and removed 1 deleted label

closed

mentioned in issue #5381 (closed)

mentioned in issue #7111 (closed)

mentioned in issue #9386 (closed)

Download	Creator	Timestamp	Type
tryton_login_exploit.py	@meanmicio	2016-03-08 14:37:33 UTC	text/plain
tryton_user_login.patch	@meanmicio	2016-03-08 14:38:31.293000 UTC	text/plain
tryton_login_exploit_metrics.pdf	@meanmicio	2016-03-08 16:47:18.843000 UTC	application/pdf
