Tryton - Issues

 

Issue5757

Title Make harder to create an index
Priority feature Status chatting
Superseder Nosy List albertca, ced, jcavallo, nicoe, pokoli, yangoon
Type performance Components
Assigned To Keywords
Reviews

Created on 2016-07-27.19:58:53 by ced, last changed by yangoon.

Files
File name Uploaded Type Edit Remove
unnamed albertca, 2016-07-29.17:03:20 text/plain
unnamed albertca, 2016-07-29.17:07:18 text/plain
Messages
msg27459 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2016-07-29.20:10:20
On 2016-07-29 19:26, Sergi Almacellas Abellana wrote:
> Why not adding a Python object like Contraint[1] to represent indexes, so anybody can add it's own indexes types if needed?

As I said index creation is not standardized by any SQL standard. So they
are very specific to the database. Also it is only the "method" (B-tree,
hash, GiST, and GIN for PostgreSQL) that could be missing.
But as Albert said, creating tailored index is the job of DB
administrator depending on the usage and the setup. There is no
restriction from trytond to create manually such indexes as far as the
name is not in collision with the trytond schema name.
Also it is still possible to execute custom SQL query in ModelStorage.__register__
msg27458 (view) Author: [hidden] (pokoli) (Tryton committer) Date: 2016-07-29.19:26:13
Why not adding a Python object like Contraint[1] to represent indexes, so anybody can add it's own indexes types if needed?

I think it's the simpliest and more flexible solution. 

[1] http://hg.tryton.org/trytond/file/241eb36cdb45/trytond/model/modelsql.py#l28
msg27457 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2016-07-29.17:35:05
On 2016-07-29 17:03, Albert Cervera i Areny wrote:
> 2016-07-28 22:51 GMT+02:00 Cédric Krier <issue_tracker@tryton.org>:
> > Cédric Krier <cedric.krier@b2ck.com> added the comment:
> >
> > I propose instead of overriding __register__ and use TableHandler to add
> > the indexes, we could have an attribute on the ModelSQL to list the indexes
> > '_sql_indexes' just like we have '_sql_constraint'.
> > The syntax could be:
> >
> > _sql_indexes = [
> >     ['column1', 'column2'],
> >     ['column1'],
> >     …
> >     ]
> >
> 
> This misses the type of index which is important, and I guess it will be
> even more important in the future. The storage part of PostgreSQL is
> becoming more and more extensible and I think we can expect more index
> types to be added in the future.

Indeed there is no standard for index. So I think we should just keep it
simple for now. The TableHandler could be improved if needed to guess
the best index depending on the column types. It should be there because
the index method depend on the backend.

> Also, we should see how you'd manage more complex indexes. For example one
> can create an index on an expression, not only a field.

I think this is out of the scope of the general design of model.
But I guess the TableHandler could be improved to allow using
expression.

> My feeling is that indexes should be something that is not part of the
> module and is more a DBA job.

Yes but Tryton must come with a good default setup for standard usage.

> If we feel we should help our users, we could create an automated tool that
> assists on that though I guess there is already something available.

This is out of the scope of Tryton. I'm pretty sure it already exist for
major database backend.
msg27456 (view) Author: [hidden] (albertca) (Tryton committer) (Tryton translator) Date: 2016-07-29.17:07:19
2016-07-29 9:46 GMT+02:00 Jean CAVALLO <issue_tracker@tryton.org>:

>
> Jean CAVALLO <jean.cavallo@coopengo.com> added the comment:
>
> Agreed. We could also add some warnings if there are possible duplicate
> indexes, for instance in your example, the index only on column1 may
> not be necessary since the composite index can be used for queries on
> column1.
>
> @ Albert :
> One question though, I consider that almost all "reverse" fields of O2M
> fields should be indexed since every read on the "main" record will
> become slow after a time (e.g. account.move->lines). What do you think ?
>

I agree with Cédric here: it depends. My point is that "thinking" may not
bring us anywhere and only analysis will do. And as that depends on the
amount of data on each table, it can only happen outside of the module.

>
> ----------
> nosy: +jcavallo
>
> _______________________________________________
> Tryton issue tracker <issue_tracker@tryton.org>
> <https://bugs.tryton.org/issue5757>
> _______________________________________________
>
msg27455 (view) Author: [hidden] (albertca) (Tryton committer) (Tryton translator) Date: 2016-07-29.17:03:21
2016-07-28 22:51 GMT+02:00 Cédric Krier <issue_tracker@tryton.org>:

>
> Cédric Krier <cedric.krier@b2ck.com> added the comment:
>
> I propose instead of overriding __register__ and use TableHandler to add
> the indexes, we could have an attribute on the ModelSQL to list the indexes
> '_sql_indexes' just like we have '_sql_constraint'.
> The syntax could be:
>
> _sql_indexes = [
>     ['column1', 'column2'],
>     ['column1'],
>     …
>     ]
>

This misses the type of index which is important, and I guess it will be
even more important in the future. The storage part of PostgreSQL is
becoming more and more extensible and I think we can expect more index
types to be added in the future.

Also, we should see how you'd manage more complex indexes. For example one
can create an index on an expression, not only a field.

My feeling is that indexes should be something that is not part of the
module and is more a DBA job.

If we feel we should help our users, we could create an automated tool that
assists on that though I guess there is already something available.

>
> _______________________________________________
> Tryton issue tracker <issue_tracker@tryton.org>
> <https://bugs.tryton.org/issue5757>
> _______________________________________________
>
msg27451 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2016-07-29.12:35:06
On 2016-07-29 11:41, Jean CAVALLO wrote:
> They may be useful in some cases, I just wanted to point out that in the
> particular case you gave they do not.
> 
> See http://dba.stackexchange.com/questions/27481/is-a-composite-index-also-good-for-queries-on-the-first-field
> and http://dba.stackexchange.com/questions/6115/working-of-indexes-in-postgresql
> 
> The scheduler will use the composite index for the first column and it
> will be as efficient as having an extra index only on this column.
> This allows to create less indexes and save some space :)

Not necessary, such composite index could be bigger and so more
expensive to load. So as my previous link shows, there are legitimate
use case to have "duplicate" indexes so a warning will be annoying.

> Regarding the reverse, I agree it should not be a rule, but IMHO in
> 90-95% of the case it should be set. Even on small tables, where the
> cost of adding the index is minimal.

No, index on small table is just a waste of resources.
msg27450 (view) Author: [hidden] (jcavallo) Date: 2016-07-29.11:41:06
They may be useful in some cases, I just wanted to point out that in the
particular case you gave they do not.

See http://dba.stackexchange.com/questions/27481/is-a-composite-index-also-good-for-queries-on-the-first-field
and http://dba.stackexchange.com/questions/6115/working-of-indexes-in-postgresql

The scheduler will use the composite index for the first column and it
will be as efficient as having an extra index only on this column.
This allows to create less indexes and save some space :)

Regarding the reverse, I agree it should not be a rule, but IMHO in
90-95% of the case it should be set. Even on small tables, where the
cost of adding the index is minimal.

For active / company, I think they are not often used since those are
not very discriminative columns. Usually there is only one company,
and almost all records are active...
msg27439 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2016-07-29.09:59:20
Indeed "duplicate" indexes can be useful, see https://www.postgresql.org/docs/current/static/indexes-bitmap-scans.html

For the reverse, I do not think we can make a rule because it depends on the size of the table. For small table, indexes are always useless.

Another thing we should think about is the two fields: active and company. They are added automatically to filter clauses but as far as I see indexes on those columns are almost never used.
msg27438 (view) Author: [hidden] (jcavallo) Date: 2016-07-29.09:46:43
Agreed. We could also add some warnings if there are possible duplicate
indexes, for instance in your example, the index only on column1 may
not be necessary since the composite index can be used for queries on
column1.

@ Albert :
One question though, I consider that almost all "reverse" fields of O2M
fields should be indexed since every read on the "main" record will
become slow after a time (e.g. account.move->lines). What do you think ?
msg27437 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2016-07-28.22:51:44
I propose instead of overriding __register__ and use TableHandler to add the indexes, we could have an attribute on the ModelSQL to list the indexes '_sql_indexes' just like we have '_sql_constraint'.
The syntax could be:

_sql_indexes = [
    ['column1', 'column2'],
    ['column1'],
    …
    ]
msg27429 (view) Author: [hidden] (albertca) (Tryton committer) (Tryton translator) Date: 2016-07-28.01:16:59
+1 in general.

Yet I would only keep very, very few and very obvious indexes (date on account moves is the only one that comes to mind right now). Not even following the statistics given by some installs. Because, just like it happens with most modules, on each install we should tend to add indexes but not remove them. The same we try that fields should be added by modules but not removed.

For me, indexes should be very install-specific and other tools can be used to determine which are necessary.
msg27428 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2016-07-27.19:58:51
Indexes on table have a cost on update. On postgresql, all indexes must be updated when a row is updated even if the update does not change the indexed columns (see [1]). I think we have made the creation of index (with select=True) to easy and so we probably have too much indexes.
My proposal is to remove the attribute 'select' on the fields and to force developer to write code in __setup__ using the TableHandler.
This change will force us to review all the current index (and probably remove many) but also it will make us think if we do not need a more complex indexes (instead of simple column).

Later we could improve the TableHandler.index_action to allow to specify which kind of indexes (btree, trigram etc.) depending on the backend.

For the decision about keeping an index, it will be good to have statistics of production databases. Such statistics can be retrieved from PostgreSQL using this query:

    SELECT * FROM pg_stat_all_indexes;


[1] https://eng.uber.com/mysql-migration/
History
Date User Action Args
2017-04-19 16:31:38yangoonsetnosy: + yangoon
2016-07-29 20:10:24cedsetmessages: + msg27459
2016-07-29 19:26:14pokolisetmessages: + msg27458
2016-07-29 17:35:05cedsetmessages: + msg27457
2016-07-29 17:10:37nicoesetnosy: + nicoe
2016-07-29 17:07:19albertcasetfiles: + unnamed
messages: + msg27456
2016-07-29 17:03:22albertcasetfiles: + unnamed
messages: + msg27455
2016-07-29 12:35:06cedsetmessages: + msg27451
2016-07-29 11:41:06jcavallosetmessages: + msg27450
2016-07-29 09:59:20cedsetmessages: + msg27439

Showing 10 items. Show all history (warning: this could be VERY long)