Make better index

+1 in general.

Yet I would only keep very, very few and very obvious indexes (date on account moves is the only one that comes to mind right now). Not even following the statistics given by some installs. Because, just like it happens with most modules, on each install we should tend to add indexes but not remove them. The same we try that fields should be added by modules but not removed.

For me, indexes should be very install-specific and other tools can be used to determine which are necessary.

added 1 deleted label and removed 1 deleted label

I propose instead of overriding __register__ and use TableHandler to add the indexes, we could have an attribute on the ModelSQL to list the indexes '_sql_indexes' just like we have '_sql_constraint'.
The syntax could be:

_sql_indexes = [
['column1', 'column2'],
['column1'],
…
]

Agreed. We could also add some warnings if there are possible duplicate
indexes, for instance in your example, the index only on column1 may
not be necessary since the composite index can be used for queries on
column1.

@ Albert :
One question though, I consider that almost all "reverse" fields of O2M
fields should be indexed since every read on the "main" record will
become slow after a time (e.g. account.move->lines). What do you think ?

Indeed "duplicate" indexes can be useful, see https://www.postgresql.org/docs/current/static/indexes-bitmap-scans.html

For the reverse, I do not think we can make a rule because it depends on the size of the table. For small table, indexes are always useless.

Another thing we should think about is the two fields: active and company. They are added automatically to filter clauses but as far as I see indexes on those columns are almost never used.

They may be useful in some cases, I just wanted to point out that in the
particular case you gave they do not.

See http://dba.stackexchange.com/questions/27481/is-a-composite-index-also-good-for-queries-on-the-first-field
and http://dba.stackexchange.com/questions/6115/working-of-indexes-in-postgresql

The scheduler will use the composite index for the first column and it
will be as efficient as having an extra index only on this column.
This allows to create less indexes and save some space :)

Regarding the reverse, I agree it should not be a rule, but IMHO in
90-95% of the case it should be set. Even on small tables, where the
cost of adding the index is minimal.

For active / company, I think they are not often used since those are
not very discriminative columns. Usually there is only one company,
and almost all records are active...

On 2016-07-29 11:41, Jean CAVALLO wrote:
> They may be useful in some cases, I just wanted to point out that in the
> particular case you gave they do not.
>
> See http://dba.stackexchange.com/questions/27481/is-a-composite-index-also-good-for-queries-on-the-first-field
> and http://dba.stackexchange.com/questions/6115/working-of-indexes-in-postgresql
>
> The scheduler will use the composite index for the first column and it
> will be as efficient as having an extra index only on this column.
> This allows to create less indexes and save some space :)

Not necessary, such composite index could be bigger and so more
expensive to load. So as my previous link shows, there are legitimate
use case to have "duplicate" indexes so a warning will be annoying.

> Regarding the reverse, I agree it should not be a rule, but IMHO in
> 90-95% of the case it should be set. Even on small tables, where the
> cost of adding the index is minimal.

No, index on small table is just a waste of resources.

2016-07-28 22:51 GMT+02:00 Cédric Krier @tryton.org>:

>
> Cédric Krier <cedric.krier@b2ck.com> added the comment:
>
> I propose instead of overriding __register__ and use TableHandler to add
> the indexes, we could have an attribute on the ModelSQL to list the indexes
> '_sql_indexes' just like we have '_sql_constraint'.
> The syntax could be:
>
> _sql_indexes = [
> ['column1', 'column2'],
> ['column1'],
> …
> ]
>

This misses the type of index which is important, and I guess it will be
even more important in the future. The storage part of PostgreSQL is
becoming more and more extensible and I think we can expect more index
types to be added in the future.

Also, we should see how you'd manage more complex indexes. For example one
can create an index on an expression, not only a field.

My feeling is that indexes should be something that is not part of the
module and is more a DBA job.

If we feel we should help our users, we could create an automated tool that
assists on that though I guess there is already something available.

>
> _______________________________________________
> Tryton issue tracker @tryton.org>
> <https://bugs.tryton.org/#5757>
> _______________________________________________
>

2016-07-29 9:46 GMT+02:00 Jean CAVALLO @tryton.org>:

>
> Jean CAVALLO <jean.cavallo@coopengo.com> added the comment:
>
> Agreed. We could also add some warnings if there are possible duplicate
> indexes, for instance in your example, the index only on column1 may
> not be necessary since the composite index can be used for queries on
> column1.
>
> @ Albert :
> One question though, I consider that almost all "reverse" fields of O2M
> fields should be indexed since every read on the "main" record will
> become slow after a time (e.g. account.move->lines). What do you think ?
>

I agree with Cédric here: it depends. My point is that "thinking" may not
bring us anywhere and only analysis will do. And as that depends on the
amount of data on each table, it can only happen outside of the module.

>
> ----------
> nosy: +jcavallo
>
> _______________________________________________
> Tryton issue tracker @tryton.org>
> <https://bugs.tryton.org/#5757>
> _______________________________________________
>

On 2016-07-29 17:03, Albert Cervera i Areny wrote:
> 2016-07-28 22:51 GMT+02:00 Cédric Krier @tryton.org>:
> > Cédric Krier <cedric.krier@b2ck.com> added the comment:
> >
> > I propose instead of overriding __register__ and use TableHandler to add
> > the indexes, we could have an attribute on the ModelSQL to list the indexes
> > '_sql_indexes' just like we have '_sql_constraint'.
> > The syntax could be:
> >
> > _sql_indexes = [
> > ['column1', 'column2'],
> > ['column1'],
> > …
> > ]
> >
>
> This misses the type of index which is important, and I guess it will be
> even more important in the future. The storage part of PostgreSQL is
> becoming more and more extensible and I think we can expect more index
> types to be added in the future.

Indeed there is no standard for index. So I think we should just keep it
simple for now. The TableHandler could be improved if needed to guess
the best index depending on the column types. It should be there because
the index method depend on the backend.

> Also, we should see how you'd manage more complex indexes. For example one
> can create an index on an expression, not only a field.

I think this is out of the scope of the general design of model.
But I guess the TableHandler could be improved to allow using
expression.

> My feeling is that indexes should be something that is not part of the
> module and is more a DBA job.

Yes but Tryton must come with a good default setup for standard usage.

> If we feel we should help our users, we could create an automated tool that
> assists on that though I guess there is already something available.

This is out of the scope of Tryton. I'm pretty sure it already exist for
major database backend.

Why not adding a Python object like Contraint[1] to represent indexes, so anybody can add it's own indexes types if needed?

I think it's the simpliest and more flexible solution.

[1] http://hg.tryton.org/trytond/file/241eb36cdb45/trytond/model/modelsql.py#l28

On 2016-07-29 19:26, Sergi Almacellas Abellana wrote:
> Why not adding a Python object like Contraint[1] to represent indexes, so anybody can add it's own indexes types if needed?

As I said index creation is not standardized by any SQL standard. So they
are very specific to the database. Also it is only the "method" (B-tree,
hash, GiST, and GIN for PostgreSQL) that could be missing.
But as Albert said, creating tailored index is the job of DB
administrator depending on the usage and the setup. There is no
restriction from trytond to create manually such indexes as far as the
name is not in collision with the trytond schema name.
Also it is still possible to execute custom SQL query in ModelStorage.__register__

Here is a post about finding missing indexes and unused indexes: https://www.geekytidbits.com/performance-tuning-postgres/#index-tuning

Maybe we could have a script that will allow us to collect statistics about Tryton databases which will allow us to better tune the default indexes.

Checking several customer databases shows that the "type" fields in sale.line, purchase.line, and account.invoice.line are never used.

Some more indexes we found unused (or almost unused):

account_move_state_index
purchase_request_uom_index
sale_sale_origin_index
party_contact_mechanism_active_index

I have made the proposal in the review which is kind of a mix of all the ideas in this issue.

Definition of the index in a specific attribute in __setup__
Allowing to define index on expression (works in postgres, will fail silently in SQLite)
Allow to specify some backend specific customization of the index

I submit the code review before reviewing all the modules. It would be nice to have some numbers about the index that are used in the real world also.

assigned to @nicoe

review361251002 updated at https://codereview.tryton.org/361251002/#ps341541002

I don't seem to find a way to easily remove an index created by one of the core modules. Am I right?

I think we need a mechanism to allow removing the index or that the trytond-admin --all does not rebuild it.

I don't seem to find a way to easily remove an index created by one of the core modules. Am I right?

I agree with this.

I think index creation should be delayed until the end of the upgrade process, so that a sub-module can disable an index before it is actually created in the database.

On 2021-02-26 23:02, Albert Cervera i Areny wrote:
> I don't seem to find a way to easily remove an index created by one of the core modules. Am I right?
>
> I think we need a mechanism to allow removing the index or that the trytond-admin --all does not rebuild it.

I do not think so. Just like we can not remove required, there is no
point to remove an index. Even less if we create more carefully default
indexes.

I think we need a mechanism to allow removing the index or that the trytond-admin --all does not rebuild it.

I do not think so. Just like we can not remove required, there is no point to remove an index. Even less if we create more carefully default indexes.

The use case for this would be, as I mentionned here[1], when a custom module adds a composite index that can be used in place of a standard index (for instance, a move / account index on account.move.line can replace a "pure" move index).

Though the standard index may still be useful (because it will be somehow lighter, so faster to load), we also may have good reasons to see it gone (for improved update / create performance, or disk on the DB side).

I am pointing this out because that is something some of our customers' DBAs asked us about, and I think it is a valid use case for being able to remove a core index.

[1] https://bugs.tryton.org/#5757#msg27438

On 2021-02-26 23:23, Jean CAVALLO wrote:
> I am pointing this out because that is something some of our customers' DBAs asked us about, and I think it is a valid use case for being able to remove a core index.

This is too specific to require special management that breaks the
modularity of Tryton.
For this case I would just edit the existing index.

On 2021-02-26 18:59, Nicolas Évrard wrote:
> I have made the proposal in the review which is kind of a mix of all the ideas in this issue.
>
> - Definition of the index in a specific attribute in \_\_setup\_\_

I do not think it is correct to store a backend specific object on the
Model.
Indeed by looking at syntax of both backends they share almost the same
syntax/parameters.
I think that some of the PG parameter are too specific to require our
support like:

- CONCURRENTLY: not really useful for trytond-admin as it is not
supposed to be concurrent
- ONLY: not supported by all PG version and only useful for
partitioned table which could exist only if the DBA create the
schema and so he could create the index also
- INCLUDE: could be general argument that is ignored by sqlite
- WITH: it seems to me that it is very specific and only DBA could
choose them.

So I think we could have an object in modelsql.py like the Constraint
that store the index definition to be passed to the table handler.
The only missing point is to index method to use. We could probably use
the PG name and just make SQL support only btree and hash.

> - Allowing to define index on expression (works in postgres, will fail silently in SQLite)

Normally SQLite support some expression: https://sqlite.org/expridx.html
But I think index creation is just a best effort.

> - Allow to specify some backend specific customization of the index

For me as long as a DBA can modify (drop/create using the same name) it
for fine tuning, I think we should not need to support too much
specific.
I would say that only the method is a specific that is good to support
but it is mainly related to the kind of operation for which the index it
created.

* Cédric Krier [2021-02-27 00:24 +0100]:
>
>Cédric Krier <cedric.krier@b2ck.com> added the comment:
>
>On 2021-02-26 18:59, Nicolas Évrard wrote:
>> I have made the proposal in the review which is kind of a mix of all the ideas in this issue.
>>
>> - Definition of the index in a specific attribute in \_\_setup\_\_
>
>I do not think it is correct to store a backend specific object on
>the Model.

I don't see a problem with that.
It's not like we could change from one backend to another easily.

>Indeed by looking at syntax of both backends they share almost the
>same syntax/parameters.

SQLite is like the minimum of index creation.
I also looked at MariaDB and Oracle's version of CREATE INDEX (that's
why 'WHERE' is in the kwargs instead of the arguments). The similarity
of the syntax is kind of an artefact of SQLite simplicity.

>I think that some of the PG parameter are too specific to require our
>support like:
>
> - CONCURRENTLY: not really useful for trytond-admin as it is not
> supposed to be concurrent
> - ONLY: not supported by all PG version and only useful for
> partitioned table which could exist only if the DBA create the
> schema and so he could create the index also
> - INCLUDE: could be general argument that is ignored by sqlite
> - WITH: it seems to me that it is very specific and only DBA could
> choose them.
>
>So I think we could have an object in modelsql.py like the Constraint
>that store the index definition to be passed to the table handler.

First I think that the TableHandler class is already way too big. It's
why I chose to make another class.

Then the thing is that if we ever support another database there might
by another parameter that become useful (eg FULLTEXT for MariaDB) so I
opted for the strict minimum in the arguments and the rest in the
kwargs rather than a kind of rosetta stone that tries to cover
some use cases. Mainly because having the kwargs prefixed with the
database name allows to know which feature is enabled for which
backend (instead of not supporting it somehow silently).

>The only missing point is to index method to use. We could probably
>use the PG name and just make SQL support only btree and hash.

Methods are there for PG. SQLite do not support those AFAIK.

>> - Allowing to define index on expression (works in postgres, will
>> fail silently in SQLite)
>
>Normally SQLite support some expression: https://sqlite.org/expridx.html

Indeed, you're right.

>But I think index creation is just a best effort.

>> - Allow to specify some backend specific customization of the index
>
>For me as long as a DBA can modify (drop/create using the same name) it
>for fine tuning, I think we should not need to support too much
>specific.

To be honest I thought that only USING / WHERE might be used sometimes
but since it was not difficult to add the others I thought that I
would do so so that we can decide.

>I would say that only the method is a specific that is good to
>support but it is mainly related to the kind of operation for which
>the index it created.

I don't understand. Are you saying that you don't think the people
should specify the method but they should rather define the operation?

On 2021-02-27 11:12, Nicolas Évrard wrote:
> >> I have made the proposal in the review which is kind of a mix of all the ideas in this issue.
> >>
> >> - Definition of the index in a specific attribute in \_\_setup\_\_
> >
> >I do not think it is correct to store a backend specific object on
> >the Model.
>
> I don't see a problem with that.
> It's not like we could change from one backend to another easily.

It just makes the object unpractical.
The Model definition can not depend on the backend. It must be the same
no matter the backend because it is an abstraction. Otherwise a model is
linked to a specific backend.

> >Indeed by looking at syntax of both backends they share almost the
> >same syntax/parameters.
>
> SQLite is like the minimum of index creation.
> I also looked at MariaDB and Oracle's version of CREATE INDEX (that's
> why 'WHERE' is in the kwargs instead of the arguments). The similarity
> of the syntax is kind of an artefact of SQLite simplicity.

There is no point to have arguments that depend on the backend. They can
never be used when defining a Model because it does not depend on the
backend.

> >I think that some of the PG parameter are too specific to require our
> >support like:
> >
> > - CONCURRENTLY: not really useful for trytond-admin as it is not
> > supposed to be concurrent
> > - ONLY: not supported by all PG version and only useful for
> > partitioned table which could exist only if the DBA create the
> > schema and so he could create the index also
> > - INCLUDE: could be general argument that is ignored by sqlite
> > - WITH: it seems to me that it is very specific and only DBA could
> > choose them.
> >
> >So I think we could have an object in modelsql.py like the Constraint
> >that store the index definition to be passed to the table handler.
>
> First I think that the TableHandler class is already way too big. It's
> why I chose to make another class.

I do not think it simplify anything to move two methods from one class
to another in the same file.

> Then the thing is that if we ever support another database there might
> by another parameter that become useful (eg FULLTEXT for MariaDB) so I
> opted for the strict minimum in the arguments and the rest in the
> kwargs rather than a kind of rosetta stone that tries to cover
> some use cases. Mainly because having the kwargs prefixed with the
> database name allows to know which feature is enabled for which
> backend (instead of not supporting it somehow silently).

The goal is to have an abstraction on index definition. This is not an
abstraction if you have to care about which parameter are used by which
backend etc.

> >The only missing point is to index method to use. We could probably
> >use the PG name and just make SQL support only btree and hash.
>
> Methods are there for PG. SQLite do not support those AFAIK.

SQLite supports only one type of index. I think it is a kind of btree so
it works for equality and sort.
But I really think it is not good to have to define the method to use
but we must define which properties we need for the index and the
backend should choose the type that works the best.

> >> - Allow to specify some backend specific customization of the index
> >
> >For me as long as a DBA can modify (drop/create using the same name) it
> >for fine tuning, I think we should not need to support too much
> >specific.
>
> To be honest I thought that only USING / WHERE might be used sometimes
> but since it was not difficult to add the others I thought that I
> would do so so that we can decide.

We must work the other way, define a minimal requirement and then
expand.

> >I would say that only the method is a specific that is good to
> >support but it is mainly related to the kind of operation for which
> >the index it created.
>
> I don't understand. Are you saying that you don't think the people
> should specify the method but they should rather define the operation?

Indeed, I can see this kind of operations:

- equality
- range (<, >, etc)
- begin pattern
- 2d or vector
- multi (inverted)

For postgresql it would result in:

['equality'] -> HASH
['equality', 'range'] -> BTREE
['begin pattern'] -> BTREE
['vector'] -> GIST
['multi'] -> GIN

Now if we have ['range', 'vector'], this may be difficult to choose. We
have three options:

- create no index
- create multi index for each group of cases
- create one index as best effort

I think we should do the last one, just find the type of index that
provide the most of operations (+ internal preferences for equality ex:
we prefer BTREE over HASH).

Missatge de Cédric Krier <bugs@tryton.org> del dia dv., 26 de febr. 2021 a
les 23:11:

>
> Cédric Krier <cedric.krier@b2ck.com> added the comment:
>
> On 2021-02-26 23:02, Albert Cervera i Areny wrote:
> > I don't seem to find a way to easily remove an index created by one of
> the core modules. Am I right?
> >
> > I think we need a mechanism to allow removing the index or that the
> trytond-admin --all does not rebuild it.
>
> I do not think so. Just like we can not remove required, there is no
> point to remove an index. Even less if we create more carefully default
> indexes.
>

"Required" is very different from "Index" because the former is part of the
business logic and constraints. Tryton may even have undesirable behaviour
if a required constraint is removed.

Index influences performance "only" which can differ from one database to
another, simply because of its size or the amount of records on one table
or another.

>
> ______________________________________
> Tryton issue tracker <bugs@tryton.org>
> <https://bugs.tryton.org/#5757>
> ______________________________________
>

--

<https://www.nan-tic.com>

Albert Cervera

Tel. (+34) 935 531 803

<https://www.linkedin.com/company/nan-tic/>

Pot consultar la informació sobre el tractament que realitzem de les dades
de caràcter personal de conformitat amb l’article 11 LOPD aquí
<https://www.nan-tic.com/lopd>.

* Cédric Krier [2021-02-27 11:45 +0100]:
>
>Cédric Krier <cedric.krier@b2ck.com> added the comment:
>
>On 2021-02-27 11:12, Nicolas Évrard wrote:
>> >> I have made the proposal in the review which is kind of a mix of
>> >> all the ideas in this issue.
>> >>
>> >> - Definition of the index in a specific attribute in \_\_setup\_\_
>> >
>> >I do not think it is correct to store a backend specific object on
>> >the Model.
>>
>> I don't see a problem with that.
>> It's not like we could change from one backend to another easily.
>
>It just makes the object unpractical.
>The Model definition can not depend on the backend.

From my point of view the Model definition does not depend on the
backend even with this addition.

Because it is only when the definition is realized (ie executed), that
the index it contains turn into their implementations.

>It must be the same no matter the backend because it is an
>abstraction. Otherwise a model is linked to a specific backend.

The model is linked to the specific backend only when the code is
executed. It's a transient state.

>> >Indeed by looking at syntax of both backends they share almost the
>> >same syntax/parameters.
>>
>> SQLite is like the minimum of index creation.
>> I also looked at MariaDB and Oracle's version of CREATE INDEX (that's
>> why 'WHERE' is in the kwargs instead of the arguments). The similarity
>> of the syntax is kind of an artefact of SQLite simplicity.
>
>There is no point to have arguments that depend on the backend. They can
>never be used when defining a Model because it does not depend on the
>backend.

In a perfect world there would be no index information in the models
and this work would be left to DBAs.

Indexes are not modeling information just performance enhancers. So
since we are compromising with the purity of the model we might as
well make this explicit.

>> >So I think we could have an object in modelsql.py like the Constraint
>> >that store the index definition to be passed to the table handler.
>>
>> First I think that the TableHandler class is already way too big. It's
>> why I chose to make another class.
>
>I do not think it simplify anything to move two methods from one class
>to another in the same file.

At least it's a start. Having those backend information structured
instead of having everything packed in the same class seems like a
good idea.

>> Then the thing is that if we ever support another database there might
>> by another parameter that become useful (eg FULLTEXT for MariaDB) so I
>> opted for the strict minimum in the arguments and the rest in the
>> kwargs rather than a kind of rosetta stone that tries to cover
>> some use cases. Mainly because having the kwargs prefixed with the
>> database name allows to know which feature is enabled for which
>> backend (instead of not supporting it somehow silently).
>
>The goal is to have an abstraction on index definition. This is not an
>abstraction if you have to care about which parameter are used by which
>backend etc.
>
>> >The only missing point is to index method to use. We could probably
>> >use the PG name and just make SQL support only btree and hash.
>>
>> Methods are there for PG. SQLite do not support those AFAIK.
>
>SQLite supports only one type of index. I think it is a kind of btree so
>it works for equality and sort.
>But I really think it is not good to have to define the method to use
>but we must define which properties we need for the index and the
>backend should choose the type that works the best.

I don't think that creating another abstraction will help. It will
just add another indirection and the mapping between our "langage" and
the different indexes implementation will be awkward.

Moreover it will add another point where beginners will have to learn
how we do stuffs and make the learning curve steeper.

>> >I would say that only the method is a specific that is good to
>> >support but it is mainly related to the kind of operation for which
>> >the index it created.
>>
>> I don't understand. Are you saying that you don't think the people
>> should specify the method but they should rather define the operation?
>
>Indeed, I can see this kind of operations:
>
> - equality
> - range (<, >, etc)
> - begin pattern
> - 2d or vector
> - multi (inverted)
>
>For postgresql it would result in:
>
> ['equality'] -> HASH
> ['equality', 'range'] -> BTREE
> ['begin pattern'] -> BTREE
> ['vector'] -> GIST
> ['multi'] -> GIN

I think this is trying to be more clever than what the feature
requires.

>Now if we have ['range', 'vector'], this may be difficult to choose.
>We have three options:
>
> - create no index
> - create multi index for each group of cases
> - create one index as best effort
>
>I think we should do the last one, just find the type of index that
>provide the most of operations (+ internal preferences for equality
>ex: we prefer BTREE over HASH).

This is precisely the kind of difficulty I want to avoid so that we
don't have to make a guess. If we let the people decide according to
their knowledge of their Model than this question is not there at all.

On 2021-02-27 13:31, Nicolas Évrard wrote:
> >On 2021-02-27 11:12, Nicolas Évrard wrote:
> >> >> I have made the proposal in the review which is kind of a mix of
> >> >> all the ideas in this issue.
> >> >>
> >> >> - Definition of the index in a specific attribute in \_\_setup\_\_
> >> >
> >> >I do not think it is correct to store a backend specific object on
> >> >the Model.
> >>
> >> I don't see a problem with that.
> >> It's not like we could change from one backend to another easily.
> >
> >It just makes the object unpractical.
> >The Model definition can not depend on the backend.
>
> From my point of view the Model definition does not depend on the
> backend even with this addition.

It is as soon as you store on it an object that are different depending
on the backend.

> Because it is only when the definition is realized (ie executed), that
> the index it contains turn into their implementations.

No because the class which store the index object is different per
backend. It is not the execution that is different.

> >It must be the same no matter the backend because it is an
> >abstraction. Otherwise a model is linked to a specific backend.
>
> The model is linked to the specific backend only when the code is
> executed. It's a transient state.

It is not when the code is executed, it is directly at the __setup__.
So yes it is a transient state on the model and this is precisely why it
is a bad design. Model must be static.

> >> >Indeed by looking at syntax of both backends they share almost the
> >> >same syntax/parameters.
> >>
> >> SQLite is like the minimum of index creation.
> >> I also looked at MariaDB and Oracle's version of CREATE INDEX (that's
> >> why 'WHERE' is in the kwargs instead of the arguments). The similarity
> >> of the syntax is kind of an artefact of SQLite simplicity.
> >
> >There is no point to have arguments that depend on the backend. They can
> >never be used when defining a Model because it does not depend on the
> >backend.
>
> In a perfect world there would be no index information in the models
> and this work would be left to DBAs.

I do not agree with this statement. The DBA is the last resort but
developer create queries that must be optimized by indexes. So it is the
developer who knows what index must be created.
The DBA is there for the unattended queries.

> So since we are compromising with the purity of the model we might as
> well make this explicit.

I do not want to compromise on purity. Model must never depend directly
on the backend.

> >> >So I think we could have an object in modelsql.py like the Constraint
> >> >that store the index definition to be passed to the table handler.
> >>
> >> First I think that the TableHandler class is already way too big. It's
> >> why I chose to make another class.
> >
> >I do not think it simplify anything to move two methods from one class
> >to another in the same file.
>
> At least it's a start. Having those backend information structured
> instead of having everything packed in the same class seems like a
> good idea.

I see no improvement there, just backward compatibility breakage.
Also with this separation we loose the optimisation that prevent to run
index creation query if the index already exist. Of course there is the
"IF NOT EXISTS" but it is an extra query. (Now we could argue that
retrieving the indexes can be complex).
But as index is indeed a manipulation of the table (index can not be
cross-table), it is logical that the table handler handles them.

> >But I really think it is not good to have to define the method to use
> >but we must define which properties we need for the index and the
> >backend should choose the type that works the best.
>
> I don't think that creating another abstraction will help. It will
> just add another indirection and the mapping between our "langage" and
> the different indexes implementation will be awkward.

So with such reasoning let's just throw away the ORM, it is just an
indirection to SQL.
This is precisely the added value of the framework to provide a good
abstract to such thing.

> Moreover it will add another point where beginners will have to learn
> how we do stuffs and make the learning curve steeper.

I do not see why a beginner will know better the specific type of
indexes of PostgreSQL and in which case which one should be used.
The framework must add value!

> >> >I would say that only the method is a specific that is good to
> >> >support but it is mainly related to the kind of operation for which
> >> >the index it created.
> >>
> >> I don't understand. Are you saying that you don't think the people
> >> should specify the method but they should rather define the operation?
> >
> >Indeed, I can see this kind of operations:
> >
> > - equality
> > - range (<, >, etc)
> > - begin pattern
> > - 2d or vector
> > - multi (inverted)
> >
> >For postgresql it would result in:
> >
> > ['equality'] -> HASH
> > ['equality', 'range'] -> BTREE
> > ['begin pattern'] -> BTREE
> > ['vector'] -> GIST
> > ['multi'] -> GIN
>
> I think this is trying to be more clever than what the feature
> requires.

This is the added value of the framework.
Probably the naming and definition must be refined. We could also have
good default value depending of the type column used.

> >Now if we have ['range', 'vector'], this may be difficult to choose.
> >We have three options:
> >
> > - create no index
> > - create multi index for each group of cases
> > - create one index as best effort
> >
> >I think we should do the last one, just find the type of index that
> >provide the most of operations (+ internal preferences for equality
> >ex: we prefer BTREE over HASH).
>
> This is precisely the kind of difficulty I want to avoid so that we
> don't have to make a guess. If we let the people decide according to
> their knowledge of their Model than this question is not there at all.

Again this is the added value otherwise we just let developer create the
indexes themselves.

* Cédric Krier [2021-02-27 14:32 +0100]:

>> >The Model definition can not depend on the backend.
>>
>> From my point of view the Model definition does not depend on the
>> backend even with this addition.
>
>It is as soon as you store on it an object that are different
>depending on the backend.

What I mean by definition is what you read.
What you mean is what is executed.

>> Because it is only when the definition is realized (ie executed), that
>> the index it contains turn into their implementations.
>
>No because the class which store the index object is different per
>backend. It is not the execution that is different.

The class is different as soon as the python interpreter has executed
the __setup__. From my POV, it is the execution that is different.

>> >> >Indeed by looking at syntax of both backends they share almost the
>> >> >same syntax/parameters.
>> >>
>> >> SQLite is like the minimum of index creation.
>> >> I also looked at MariaDB and Oracle's version of CREATE INDEX (that's
>> >> why 'WHERE' is in the kwargs instead of the arguments). The similarity
>> >> of the syntax is kind of an artefact of SQLite simplicity.
>> >
>> >There is no point to have arguments that depend on the backend. They can
>> >never be used when defining a Model because it does not depend on the
>> >backend.
>>
>> In a perfect world there would be no index information in the models
>> and this work would be left to DBAs.
>
>I do not agree with this statement. The DBA is the last resort but
>developer create queries that must be optimized by indexes. So it is
>the developer who knows what index must be created.

It would be a priori knowledge that have to be confronted to reality.

So if we really want purity we shouldn't rely on those and let the
reality speak because in the end we don't know.

>> So since we are compromising with the purity of the model we might as
>> well make this explicit.
>
>I do not want to compromise on purity. Model must never depend directly
>on the backend.

We compromise on purity on a lot of things: using lists instead of
sets in a lot of places, the TableHandler class which could be split,
the 2 phase commit protocol, in the GTK client we do not adhere to
GTK's way of doing stuffs in some places, etc.

And it's not an issue, if we have good reasons to do so. And btw
purity could be restored by clearing the list after it has done its
job (although I don't see the point in doing it so I consider it a bad
idea).

>> >> First I think that the TableHandler class is already way too big. It's
>> >> why I chose to make another class.
>> >
>> >I do not think it simplify anything to move two methods from one class
>> >to another in the same file.
>>
>> At least it's a start. Having those backend information structured
>> instead of having everything packed in the same class seems like a
>> good idea.
>
>I see no improvement there, just backward compatibility breakage.

Yeah sometimes people update their code when there is a release. It's
not a big deal.

Moreover it will break anyway because we can not keep the same
signature for the function handling the index.

>Also with this separation we loose the optimisation that prevent to run
>index creation query if the index already exist. Of course there is the
>"IF NOT EXISTS" but it is an extra query. (Now we could argue that
>retrieving the indexes can be complex).

The database is probably way better than us at managing this also.

>But as index is indeed a manipulation of the table (index can not be
>cross-table), it is logical that the table handler handles them.

I don't say that it does something else than manipulating the table.
What I say is that there's something like 20 public methods it could
be way more simple.

>> >But I really think it is not good to have to define the method to use
>> >but we must define which properties we need for the index and the
>> >backend should choose the type that works the best.
>>
>> I don't think that creating another abstraction will help. It will
>> just add another indirection and the mapping between our "langage" and
>> the different indexes implementation will be awkward.
>
>So with such reasoning let's just throw away the ORM, it is just an
>indirection to SQL.

You're comparing apples and pears. The ORM is a central piece of the
framework but indeed it's not a perfect mapping and that's why there
is python-sql.

>This is precisely the added value of the framework to provide a good
>abstract to such thing.

What is the added value of such abstraction? What use case does it
solve?

>> Moreover it will add another point where beginners will have to learn
>> how we do stuffs and make the learning curve steeper.
>
>I do not see why a beginner will know better the specific type of
>indexes of PostgreSQL and in which case which one should be used.

Because he might know already SQL but not Tryton.

>> >> >I would say that only the method is a specific that is good to
>> >> >support but it is mainly related to the kind of operation for which
>> >> >the index it created.
>> >>
>> >> I don't understand. Are you saying that you don't think the people
>> >> should specify the method but they should rather define the operation?
>> >
>> >Indeed, I can see this kind of operations:
>> >
>> > - equality
>> > - range (<, >, etc)
>> > - begin pattern
>> > - 2d or vector
>> > - multi (inverted)
>> >
>> >For postgresql it would result in:
>> >
>> > ['equality'] -> HASH
>> > ['equality', 'range'] -> BTREE
>> > ['begin pattern'] -> BTREE
>> > ['vector'] -> GIST
>> > ['multi'] -> GIN
>>
>> I think this is trying to be more clever than what the feature
>> requires.
>
>This is the added value of the framework.

As said, I don't think it's the role of Tryton to add value wrt the
indexes. Our strength is somewhere else.

>Probably the naming and definition must be refined. We could also have
>good default value depending of the type column used.

And it's even more code for something that could be simple. With more
code comes more bugs. Moreover if there is a default how will it work
when it will be an index on multiple columns each with different
defaults?

I don't think we could / should do this.

>> >Now if we have ['range', 'vector'], this may be difficult to choose.
>> >We have three options:
>> >
>> > - create no index
>> > - create multi index for each group of cases
>> > - create one index as best effort
>> >
>> >I think we should do the last one, just find the type of index that
>> >provide the most of operations (+ internal preferences for equality
>> >ex: we prefer BTREE over HASH).
>>
>> This is precisely the kind of difficulty I want to avoid so that we
>> don't have to make a guess. If we let the people decide according to
>> their knowledge of their Model than this question is not there at all.
>
>Again this is the added value otherwise we just let developer create the
>indexes themselves.

Indeed it's somehow the idea: allow to create the indexes themselves
while at the same time being kind of database independent.

For me it is a no go as long as the index definition is not designed like the Constraint in ModelSQL.

review361251002 updated at https://codereview.tryton.org/361251002/#ps371671002

review361251002 updated at https://codereview.tryton.org/361251002/#ps373491003

review361251002 updated at https://codereview.tryton.org/361251002/#ps352681002

review361251002 updated at https://codereview.tryton.org/361251002/#ps365911002

review361251002 updated at https://codereview.tryton.org/361251002/#ps387331002

review361251002 updated at https://codereview.tryton.org/361251002/#ps378241002

review361251002 updated at https://codereview.tryton.org/361251002/#ps386021002

review361251002 updated at https://codereview.tryton.org/361251002/#ps386211002

review361251002 updated at https://codereview.tryton.org/361251002/#ps419151003

review361251002 updated at https://codereview.tryton.org/361251002/#ps413311003

review361251002 updated at https://codereview.tryton.org/361251002/#ps418231008

I'm working on removing the select=True in all modules and here are the rules I try to follow when define an Index:

for SQL queries that are build in the module (using as most expression as possible)
for name, code, number and reference for Similarity
for MPTT left and right for Range

also I always try to set first the column/expression which have the most different values.

I do not define indexes for:

active nor company fields (because they are not enough discriminatory)
on boolean nor selection field (except if it is a partial)
for xxx2Many because they are defined automatically

assigned to @ced and unassigned @nicoe

added 1 deleted label and removed 1 deleted label

review361251002 updated at https://codereview.tryton.org/361251002/#ps424011003

In the last version, I ensured that the index defined on a ModelSQL is always created on its table by using the TableHandler to create the indexes (like we do for constraint). Also now any former index with the pattern idx_* or *_index are dropped if they are no more defined. This will ease the maintenance and ensure to always have the latest setup of indexes. But it will still be possible to customize the database by creating custom indexes manually and give them a different name.

Now it is ready for testing.

added 1 deleted label and removed 1 deleted label

review361251002 updated at https://codereview.tryton.org/361251002/#ps418291003

Download	Creator	Timestamp	Type
unnamed	@albertca	2016-07-29 15:03:20.798000 UTC	text/plain
unnamed	@albertca	2016-07-29 15:07:18.506000 UTC	text/plain

Make better index

Files

Child items 0

Activity