Tryton - Issues



Title: HTML richtext encoding detection
Priority: bug
Status: resolved
Superseder:
Nosy List: adrien.benduc, ced, jcavallo, reviewbot, roundup-bot
Type: behavior
Components: tryton
Assigned To: ced
Keywords: review
Reviews: 39141002

Created on 2015-12-15.10:40:06 by adrien.benduc, last changed by roundup-bot.

New changeset 36c65f90c38f by Cédric Krier in branch 'default':
Try some standard encoding and fallback to ascii for HTML deserialisation
review39141002 updated at
review39141002 updated at
msg38295 Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2018-02-10.19:41:59
I introduced chardet for msg22484, but it now seems that this was not a good idea, as chardet is not 100% reliable.
The GTK documentation says that we receive guint8 data but says nothing about its encoding. Normally, strings in GTK are UTF-8, but msg22384 shows that we can at least receive UTF-16 (from the clipboard). So I propose to try decoding with the system encoding, then with each of the UTF encodings, and finally as ASCII, replacing undecodable bytes.
This way, the process is deterministic, and we can extend it if we find systems that use other encodings.
So here is review39141002
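The decoding order proposed above can be sketched in Python as follows. This is a minimal sketch, not the code from changeset 36c65f90c38f: the helper name `decode_html_data`, its `system_encoding` parameter, and the assumption that the GTK deserialize callback hands over a `bytes` object are all hypothetical.

```python
import codecs
import locale

# UTF encodings tried after the system encoding, in a fixed order.
UTF_ENCODINGS = ('utf-8', 'utf-16', 'utf-32')


def decode_html_data(data, system_encoding=None):
    """Hypothetical helper: decode raw bytes received by a deserialize callback."""
    if system_encoding is None:
        system_encoding = locale.getpreferredencoding(False)
    tried = set()
    for name in (system_encoding,) + UTF_ENCODINGS:
        try:
            # Normalise aliases such as 'UTF-8' so no codec is tried twice.
            name = codecs.lookup(name).name
        except LookupError:
            continue  # encoding name unknown to Python, skip it
        if name in tried:
            continue
        tried.add(name)
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: never fail, replace undecodable bytes with U+FFFD.
    return data.decode('ascii', errors='replace')
```

Because the order is fixed, the same input always yields the same text, unlike a statistical detector. A UTF-16 buffer starting with a byte order mark fails as UTF-8 (0xFF and 0xFE are never valid UTF-8 bytes) and is picked up by `'utf-16'`. Try-in-order decoding has limits, though: a single-byte system encoding such as ISO-8859-1 accepts every byte sequence, so later candidates are never reached, and ASCII-only UTF-16 without a BOM decodes as UTF-8 with embedded NUL characters.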
msg34580 Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2017-07-13.09:32:02
msg24413 Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2016-02-28.20:36:22
It should be reported to chardet.
msg23359 Author: [hidden] (adrien.benduc) Date: 2015-12-15.11:25:46
Right, I'll know next time.
msg23358 Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2015-12-15.11:23:03
Please don't CC me unless I ask.
msg23357 Author: [hidden] (adrien.benduc) Date: 2015-12-15.10:40:05
The chardet module used in the deserialize function (from to detect the text encoding returns ISO-XXX[...] when an accented character is added.
The only way to make chardet detect that the text is UTF-8 encoded is to have at least 4 accented characters (such as 'é') in the HTML text, as declared in the example in the
For now, I force the 'UTF-8' encoding to work around my problem.
Date                 User           Action  Args
2018-02-19 19:10:39  roundup-bot    set     status: testing -> resolved
                                            nosy: + roundup-bot
                                            messages: + msg38477
2018-02-11 01:27:41  reviewbot      set     messages: + msg38298
2018-02-10 19:55:13  reviewbot      set     nosy: + reviewbot
                                            messages: + msg38296
2018-02-10 19:41:59  ced            set     status: deferred -> testing
                                            reviews: 39141002
                                            messages: + msg38295
                                            keyword: + review
                                            assignedto: ced
2017-07-13 09:32:02  ced            set     messages: + msg34580
2016-03-19 15:41:00  ced            set     component: + tryton
2016-02-28 20:36:23  ced            set     status: chatting -> deferred
                                            nosy: + ced
                                            messages: + msg24413
2015-12-15 11:25:46  adrien.benduc  set     messages: + msg23359
2015-12-15 11:23:12  ced            set     nosy: - ced
2015-12-15 11:23:03  ced            set     status: unread -> chatting
                                            nosy: ced, jcavallo, adrien.benduc
                                            messages: + msg23358
