In my case the model account.account.party uses a table query with the usual Min(line.id) AS id to set the id of the generated rows.
But the join is over millions of lines (and few accounts), which results in more than a million rows.
On this table query we then browse some ids. Unfortunately PostgreSQL cannot optimize the table query correctly because of the aggregate function, so the whole big query is run for every read of a batch of ids.
The proposed review fixes it by defining a specific domain_id on the table query model and adding support for it in ModelSQL.
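For context, a condensed sketch of the kind of table query at stake (the
column list is trimmed and the names are only illustrative; the real
definition lives in the account module):

```
from sql import Null
from sql.aggregate import Min

from trytond.model import ModelSQL
from trytond.pool import Pool


class AccountParty(ModelSQL):
    "Condensed sketch of account.account.party"
    __name__ = 'account.account.party'

    @classmethod
    def table_query(cls):
        pool = Pool()
        account = pool.get('account.account').__table__()
        line = pool.get('account.move.line').__table__()
        # One row per (account, party) pair, identified by Min(line.id)
        return line.join(
            account, condition=account.id == line.account
            ).select(
                Min(line.id).as_('id'),
                line.account.as_('account'),
                line.party.as_('party'),
                where=account.party_required & (line.party != Null),
                group_by=[line.account, line.party])
```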
I do not much like that you need to inspect the query internals and modify them in `domain_id`.
I wonder if it would not be cleaner to use a context key filled with a "domain" that `ModelSQL.table_query` could use to simplify its query. This could also be used in `ModelSQL.search`.
Now, using a full domain is maybe too complicated, unless we have a tool similar to `domain_inversion` (maybe it can be reused) that could be queried to get, if possible, a SQL clause for a specific field. In the case of `account.account.party`, it could be queried for a strict clause on the `id` column, but also for a clause on the `company` column, etc.
>I do not much like that you need to inspect the query internals and modify
>them in `domain_id`.
I don't like it much either, but I prefer it to the context solution.
For two reasons:
- context is a code smell, it's just a global state and should be the last
  resort
- the query definition is a known fact for someone writing the domain_id
  method; we have to use some internal properties because python-sql lacks
  a way to properly access those objects.
So it made me think (again) about python-sql and its inability to give an easy
way to access the components of a query. That's why I don't like my code. If
we could write something like

    line = table_query._tables['line']
    table_query.where &= Operator(line.id, value)

then I wouldn't mind so much those accesses to the internal properties.
Thinking out loud:
What if we add a __getitem__ to python-sql queries which would resolve a
dotted path?
Something like:

    table_query['join.left']       → the left part of the join
    table_query['with.left.right'] → the right part of the left part of
                                     the with part of the query
    table_query['0']               → first expr (of the select, insert, update)
    table_query['group_by.0']      → first expr of the group_by
    table_query['0.0']             → first expr of the first expr (for
                                     example if it's a subquery)

If the path is not resolvable, it raises KeyError.
If we have __getitem__, do we want __setitem__ and __delitem__?
Maybe table_query['join', 'left'] is better? (I prefer the dotted notation,
but it's almost a matter of taste.)
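To make the idea concrete, here is a rough sketch of how such a __getitem__
could look; the mixin itself is hypothetical, only attribute names such as
columns or group_by exist in python-sql:

```
class DottedPathMixin:
    "Hypothetical __getitem__ for python-sql query objects."

    def __getitem__(self, path):
        node = self
        for part in str(path).split('.'):
            try:
                if part.isdigit():
                    index = int(part)
                    # A numeric segment indexes a plain list such as
                    # group_by, or the expression list of a query.
                    if isinstance(node, (list, tuple)):
                        node = node[index]
                    else:
                        node = node.columns[index]
                else:
                    node = getattr(node, part)
            except (AttributeError, IndexError, TypeError) as error:
                raise KeyError(path) from error
        return node
```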
>I wonder if it would not be cleaner to use a context key filled with a
>"domain" that `ModelSQL.table_query` could use to simplify its query. This
>could also be used in `ModelSQL.search`.
If I am not mistaken, if the context modifies the internal query, the
table scan will still occur (though on a much smaller table indeed) because
the domain will still be there (I don't think we would modify the domain in
table_query).
The trick of returning Literal(True) is that it prevents this (usually small)
table scan. And since we're trying to preserve performance, saving those
cycles is a gain anyway.
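For the record, my guess at the general shape of the domain_id hook under
discussion; the signature and the way the query is navigated are assumptions
on my part, only reduce_ids (trytond.tools) and Literal (python-sql) are
existing helpers:

```
from sql import Literal
from trytond.model import ModelSQL
from trytond.tools import reduce_ids


class AccountParty(ModelSQL):
    __name__ = 'account.account.party'

    @classmethod
    def domain_id(cls, domain, tables):
        table, _ = tables[None]  # the table query itself
        _, _, ids = domain       # e.g. an ('id', 'in', ids) leaf
        # Reach into the query internals to filter account_move_line
        # early; this is the access being criticized.
        join, = table.from_
        line = join.left
        table.where &= reduce_ids(line.id, ids)
        # Literal(True) tells the caller the leaf is fully handled, so
        # no extra id filter (and no scan of the subquery) is added.
        return Literal(True)
```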
But another thing I don't like with my patch is the special case it
introduces in read() when considering the domain_id. It's not such a big
deal, but still.
>Now, using a full domain is maybe too complicated, unless we have a tool
>similar to `domain_inversion` (maybe it can be reused) that could be queried
>to get, if possible, a SQL clause for a specific field.
Indeed the domain_inversion could simplify the domain on the 'id' column, but
the result would still need to be parsed into its leaves and processed, while
the domain_id solution handles a single leaf already.
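To illustrate the difference (values made up):

```
# What domain_id handles: a single simple leaf.
leaf = ('id', 'in', [10, 42, 77])

# What a simplified full domain can still look like: a tree that must
# be walked down to its leaves before any SQL clause can be built.
domain = [
    ('id', 'in', [10, 42, 77]),
    ['OR',
        ('company', '=', 1),
        ('company', '=', None),
        ],
    ]
```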
>In the case of `account.account.party`, it could be queried for a strict
>clause on the `id` column, but also for a clause on the `company` column,
>etc.
Wouldn't the trytond part of the patch be useful anyway, as we don't handle
domain_id at all in read() (and probably elsewhere; I checked the usages
of \.id, but I had in mind the idea of a table query, so I skipped write(),
create() and delete())?
On 2021-03-19 18:49, Nicolas Évrard wrote:
> * Cédric Krier [2021-03-19 17:02 +0100]:
>
> >I do not much like that you need to inspect the query internals and modify
> >them in `domain_id`.
>
> I don't like it much either, but I prefer it to the context solution.
>
> For two reasons:
> - context is a code smell, it's just a global state and should be the last
> resort
Another option is to add an optional argument to table_query.
> What if we add a __getitem__ to python-sql queries which would resolve a
> dotted path?
I do not think it will help, because it presupposes a specific
structure of the query.
python-sql was designed to make queries composable, not to introspect them.
> >I wonder if it would not be cleaner to use a context key filled with a
> >"domain" that `ModelSQL.table_query` could use to simplify its query. This
> >could also be used in `ModelSQL.search`.
>
> If I am not mistaken, if the context modifies the internal query, the
> table scan will still occur (though on a much smaller table indeed) because
> the domain will still be there (I don't think we would modify the domain in
> table_query).
But for me it would be wrong to assume whether the table_query will be
optimized for the domain or not.
> >Now, using a full domain is maybe too complicated, unless we have a tool
> >similar to `domain_inversion` (maybe it can be reused) that could be
> >queried to get, if possible, a SQL clause for a specific field.
>
> Indeed the domain_inversion could simplify the domain on the 'id' column,
> but the result would still need to be parsed into its leaves and processed,
> while the domain_id solution handles a single leaf already.
I do not understand.
> >In the case of `account.account.party`, it could be queried for a strict
> >clause on the `id` column, but also for a clause on the `company` column,
> >etc.
>
> Wouldn't the trytond part of the patch be useful anyway, as we don't handle
> domain_id at all in read() (and probably elsewhere; I checked the usages
> of \.id, but I had in mind the idea of a table query, so I skipped write(),
> create() and delete())?
I do not understand that either.
* Cédric Krier [2021-03-19 18:59 +0100]:
>On 2021-03-19 18:49, Nicolas Évrard wrote:
>> * Cédric Krier [2021-03-19 17:02 +0100]:
>>
>> >I do not much like that you need to inspect the query internals and
>> >modify them in `domain_id`.
>>
>> I don't like it much either, but I prefer it to the context solution.
>>
>> For two reasons:
>> - context is a code smell, it's just a global state and should be the last
>> resort
>
>Another option is to add an optional argument to table_query.
That might work, although the ids are used in a loop, which makes it a bit
more complicated.
>> What if we add a __getitem__ to python-sql queries which would resolve a
>> dotted path?
>
>I do not think it will help, because it presupposes a specific
>structure of the query.
Of course, but this structure is known to the developer using it.
>python-sql was designed to make queries composable, not to introspect them.
I disagree. I always had in mind that the result of table_query(),
and thus the python-sql object returned, could be used and modified by a
class inheriting from it.
Also, composable queries should be introspectable: if we have a query q
and we want to extend its where clause, we need access to its
components.
But if it's just a matter of writing SQL, python-sql lacks readability
compared to pure SQL. So it has to aim for more.
>> >I wonder if it would not be cleaner to use a context key filled with a
>> >"domain" that `ModelSQL.table_query` could use to simplify its query. This
>> >could also be used in `ModelSQL.search`.
>>
>> If I am not mistaken, if the context modifies the internal query, the
>> table scan will still occur (though on a much smaller table indeed) because
>> the domain will still be there (I don't think we would modify the domain in
>> table_query).
>
>But for me it would be wrong to assume whether the table_query will be
>optimized for the domain or not.
Nothing prevents returning that either. I just thought it made a nice
optimization (though I doubt the query planner will ever be able to optimize
this type of query).
>> >Now, using a full domain is maybe too complicated, unless we have a tool
>> >similar to `domain_inversion` (maybe it can be reused) that could be
>> >queried to get, if possible, a SQL clause for a specific field.
>>
>> Indeed the domain_inversion could simplify the domain on the 'id' column,
>> but the result would still need to be parsed into its leaves and processed,
>> while the domain_id solution handles a single leaf already.
>
>I do not understand.
Passing the full domain and using domain_inversion to get only the part
relevant to the id column will sometimes result in a domain that still needs
to be parsed, which makes the code more difficult.
On the other hand, domain_id deals only with leaves, which is easier.
>> >In the case of `account.account.party`, it could be queried for a strict
>> >clause on the `id` column, but also for a clause on the `company` column,
>> >etc.
>>
>> Wouldn't the trytond part of the patch be useful anyway, as we don't handle
>> domain_id at all in read() (and probably elsewhere; I checked the usages
>> of \.id, but I had in mind the idea of a table query, so I skipped write(),
>> create() and delete())?
>
>I do not understand that either.
The patch has two parts:
- in account, the addition of domain_id
- in trytond, the support for using domain_id.
I wonder if the second part would not be useful anyway.
On 2021-03-19 19:45, Nicolas Évrard wrote:
> >On 2021-03-19 18:49, Nicolas Évrard wrote:
> >> * Cédric Krier [2021-03-19 17:02 +0100]:
> >>
> >> >I do not much like that you need to inspect the query internals and
> >> >modify them in `domain_id`.
> >>
> >> I don't like it much either, but I prefer it to the context solution.
> >>
> >> For two reasons:
> >> - context is a code smell, it's just a global state and should be the last
> >> resort
> >
> >Another option is to add an optional argument to table_query.
>
> That might work, although the ids are used in a loop, which makes it a bit
> more complicated.
It is just a matter of calling __table__ in the loop with the proper
argument.
But I think we should try to allow any generic domain as the parameter, not
only ids, because search may benefit from it too.
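A sketch of how that could look; the domain parameter of table_query and of
__table__ is the hypothetical part (today neither takes an argument), while
grouped_slice and reduce_ids are existing trytond.tools helpers:

```
from sql import Null
from sql.aggregate import Min

from trytond.model import ModelSQL
from trytond.pool import Pool
from trytond.tools import grouped_slice, reduce_ids


class AccountParty(ModelSQL):
    "Sketch only; the join with account is left out for brevity."
    __name__ = 'account.account.party'

    @classmethod
    def table_query(cls, domain=None):
        line = Pool().get('account.move.line').__table__()
        where = line.party != Null
        if domain:
            # e.g. the ('id', 'in', ids) leaf handed down by read()
            _, _, ids = domain
            where &= reduce_ids(line.id, ids)
        return line.select(
            Min(line.id).as_('id'),
            line.account.as_('account'),
            line.party.as_('party'),
            where=where,
            group_by=[line.account, line.party])


# "Calling __table__ in the loop with the proper argument":
ids = [10, 42, 77]
for sub_ids in grouped_slice(ids):
    # The batch then selects from this pre-filtered table query.
    table = AccountParty.__table__(domain=('id', 'in', list(sub_ids)))
```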
> >> What if we add a __getitem__ to python-sql queries which would resolve a
> >> dotted path?
> >
> >I do not think it will help, because it presupposes a specific
> >structure of the query.
>
> Of course, but this structure is known to the developer using it.
Except with our modular design, where any module can wrap or extend the
query.
> >python-sql was designed to make queries composable, not to introspect them.
>
> I disagree. I always had in mind that the result of table_query(),
> and thus the python-sql object returned, could be used and modified by a
> class inheriting from it.
In practice it is not manageable. That's why we use hook functions to
compose the query.
> Also, composable queries should be introspectable: if we have a query q
> and we want to extend its where clause, we need access to its
> components.
That's hardly practical.
> But if it's just a matter of writing SQL, python-sql lacks readability
> compared to pure SQL. So it has to aim for more.
It improves a lot on plain text SQL. Mainly by making the
construction composable. Introspection would be exactly as difficult as
parsing a plain text SQL query or the Python AST. It is practical for
low-level stuff only.
> >> >I wonder if it would not be cleaner to use a context key filled with a
> >> >"domain" that `ModelSQL.table_query` could use to simplify its query.
> >> >This could also be used in `ModelSQL.search`.
> >>
> >> If I am not mistaken, if the context modifies the internal query, the
> >> table scan will still occur (though on a much smaller table indeed)
> >> because the domain will still be there (I don't think we would modify
> >> the domain in table_query).
> >
> >But for me it would be wrong to assume whether the table_query will be
> >optimized for the domain or not.
>
> Nothing prevents returning that either. I just thought it made a nice
> optimization (though I doubt the query planner will ever be able to optimize
> this type of query).
I do not know what type of query you are talking about.
> >> >Now, using a full domain is maybe too complicated, unless we have a tool
> >> >similar to `domain_inversion` (maybe it can be reused) that could be
> >> >queried to get, if possible, a SQL clause for a specific field.
> >>
> >> Indeed the domain_inversion could simplify the domain on the 'id' column,
> >> but the result would still need to be parsed into its leaves and
> >> processed, while the domain_id solution handles a single leaf already.
> >
> >I do not understand.
>
> Passing the full domain and using domain_inversion to get only the part
> relevant to the id column will sometimes result in a domain that still needs
> to be parsed, which makes the code more difficult.
We do not need to support all cases in this inversion. Also, we already
have the inversion code written.
> On the other hand, domain_id deals only with leaves, which is easier.
But it prevents having the optimization on search.
> >> >In the case of `account.account.party`, it could be queried for a strict
> >> >clause on the `id` column, but also for a clause on the `company`
> >> >column, etc.
> >>
> >> Wouldn't the trytond part of the patch be useful anyway, as we don't
> >> handle domain_id at all in read() (and probably elsewhere; I checked the
> >> usages of \.id, but I had in mind the idea of a table query, so I skipped
> >> write(), create() and delete())?
> >
> >I do not understand that either.
>
> The patch has two parts:
> - in account, the addition of domain_id
> - in trytond, the support for using domain_id.
>
> I wonder if the second part would not be useful anyway.
I do not understand the point of having only the trytond part.
Oops, it seems I added similar remarks on the review before reading the full comments on the issue.
If the problem is just about returning a proper expression for the query, why not add a new attribute to ModelSQL which returns table.id by default, but which the developer can customize to return another expression for table queries. Something like:

```
def where_table_id(cls, table):
    return table.id
```

and for account_party we would return:

```
def where_table_id(cls, table):
    return line.id
```

Another option is to allow table_query to return a tuple, where the first value is the query and the second the expression to filter on the id. But I am not sure that returning different shapes is a good API.
I thought about this again and indeed I think we do not need any new mechanism, just a better index and query.
The current query: https://explain.dalibo.com/plan/pFa
We need to avoid a "Seq Scan" on account_move_line. For that, the part of the query before the join must come from an index. So we could CREATE INDEX ON account_move_line (party, account, id) (it is important that id is the last column, but the order of the other columns does not seem to have an impact; I guess PG can reorder the GROUP BY). We could add a WHERE party IS NOT NULL to reduce the size of the index.
The result query: https://explain.dalibo.com/plan/YNm
Now we also need to avoid the "Sort" for the "GroupAggregate". For that we need to group before the join, in a subquery (sketched below); as we have an index including the id, no sort will be needed to get the MIN.
The result query: https://explain.dalibo.com/plan/GCJ
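In python-sql terms, the regrouping looks roughly like this (columns trimmed,
same illustrative names as earlier in the thread; not the committed query):

```
from sql import Null
from sql.aggregate import Min

from trytond.pool import Pool

pool = Pool()
account = pool.get('account.account').__table__()
line = pool.get('account.move.line').__table__()

# Group on account_move_line alone, so the (party, account, id) index
# can feed the GroupAggregate without a Sort.
grouped = line.select(
    Min(line.id).as_('id'),
    line.account.as_('account'),
    line.party.as_('party'),
    where=line.party != Null,
    group_by=[line.account, line.party])

# Join the small pre-aggregated result with account afterwards.
query = grouped.join(
    account, condition=account.id == grouped.account
    ).select(
        grouped.id, grouped.account, grouped.party,
        where=account.party_required)
```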
Finally we could CREATE INDEX ON account_account (party_required).
The result query: https://explain.dalibo.com/plan/MxG
When I test such a query as a subquery with a clause on id, I get: https://explain.dalibo.com/plan/FBV. We see that the filter on id is applied back at the "GroupAggregate", so before the join. I do not think it is possible to benefit from an index there, except if we transform the subquery into a materialized view with an index.
I tested that on a not-so-big database, but I can already measure that it is 85% faster.
>I tested that on a not-so-big database, but I can already measure that it is
>85% faster.
I have the same measurements with my bigger database when not all the columns
are included in the query. When all the columns are included it's even better,
probably because the group by is smaller, but its impact is not relevant in
the improved query.
On 2021-03-22 09:15, Nicolas Évrard wrote:
> * Cédric Krier [2021-03-19 23:11 +0100]:
>
> >I tested that on a not-so-big database, but I can already measure that it
> >is 85% faster.
>
> I have the same measurements with my bigger database when not all the
> columns are included in the query. When all the columns are included it's
> even better, probably because the group by is smaller, but its impact is not
> relevant in the improved query.
>> >> What if we add a __getitem__ to python-sql queries which would resolve a
>> >> dotted path?
>> >
>> >I do not think it will help, because it presupposes a specific
>> >structure of the query.
>>
>> Of course, but this structure is known to the developer using it.
>
>Except with our modular design, where any module can wrap or extend the query.
If someone modifies the query, then it is their responsibility to check that
they do not break everything.
>> >python-sql was designed to make queries composable, not to introspect them.
>>
>> I disagree. I always had in mind that the result of table_query(),
>> and thus the python-sql object returned, could be used and modified by a
>> class inheriting from it.
>
>In practice it is not manageable. That's why we use hook functions to compose
>the query.
In practice sometimes some hooks are absent.
>> Also, composable queries should be introspectable: if we have a query q
>> and we want to extend its where clause, we need access to its
>> components.
>
>That's hardly practical.
I doubt it.
>> But if it's just a matter of writing SQL, python-sql lacks readability
>> compared to pure SQL. So it has to aim for more.
>
>It improves a lot on plain text SQL.
It's better but not much better.
>Mainly by making the construction composable.
There are two points where python-sql is currently better than plain SQL:
- handling dialects
- looping over items to create queries
Readability is not among them. Take for example the table_query for the aged
balance: except, as I said above, for the *columns part, I find plain SQL
more readable.
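For reference, the kind of loop meant by the *columns part (illustrative
only, not the actual aged balance code):

```
import datetime as dt

from sql import Table
from sql.aggregate import Sum
from sql.conditionals import Case

line = Table('account_move_line')
today = dt.date.today()
terms = {'term_30': 30, 'term_60': 60, 'term_90': 90}

# One aggregated column per configurable term.
amount = line.debit - line.credit
columns = [
    Sum(Case(
            (line.maturity_date <= today - dt.timedelta(days=days), amount),
            else_=0)).as_(name)
    for name, days in terms.items()]
query = line.select(line.party, *columns, group_by=[line.party])
```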
When I have to write a complicated query, I always write it in SQL first and
then translate it to Python.
>Introspection would be exactly as difficult as parsing a plain text SQL query
>or the Python AST.
I don't think so. While implementing it yesterday I noticed that in fact the
attribute accessors are already there and there is no need for the __getitem__.
So you can add that to the two points above.
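To show what those existing accessors look like in practice (a standalone
example, not code from the patch):

```
from sql import Null, Table

table = Table('account_move_line')
query = table.select(table.id, where=table.party != Null)

# The components are plain attributes that can be read back...
from_ = query.from_  # the FROM part, a list of tables and joins
where = query.where  # the WHERE expression

# ...and extended, which is what the __getitem__ was aiming at.
query.where &= table.account == 42
```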