Title HTML richtext encoding detection
Created on 2015-12-15.10:40:06

New changeset 36c65f90c38f by Cédric Krier in branch 'default':
Try some standard encoding and fallback to ascii for HTML deserialisation
Date: 2018-02-10.19:41:59
I introduced chardet for msg22484 but it now seems that it was not a good idea, as it is not 100% reliable.
The GTK documentation says that we receive a guint8 but says nothing about the encoding. Normally in GTK, strings are in UTF-8, but msg22384 shows that we can at least receive UTF-16 (from the clipboard). So I propose to try decoding with the system encoding, then with all the UTF encodings, and finally as ASCII with replacement on error.
This way, we have a deterministic process and we can extend it if we find systems with other encodings.
So here is review39141002
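For readers of this issue, the fallback proposed above can be sketched roughly as follows. This is only an illustration and not the code from the review: the function name `decode_html`, its `encodings` parameter and the exact list of UTF codecs are assumptions.

```python
import locale


def decode_html(data, encodings=None):
    """Decode raw bytes by trying a fixed list of encodings in order.

    Hypothetical helper: the name and the default list are assumptions.
    """
    if encodings is None:
        # System encoding first, then the UTF family.
        encodings = [
            locale.getpreferredencoding(False), 'utf-8', 'utf-16', 'utf-32']
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Last resort: ASCII, with undecodable bytes replaced by U+FFFD.
    return data.decode('ascii', errors='replace')


# Passing the list explicitly makes the result independent of the locale.
print(decode_html('é'.encode('utf-16'), ['utf-8', 'utf-16', 'utf-32']))
```

The order of the list matters: a single-byte encoding such as ISO-8859-1 accepts any byte sequence, so if the system encoding is one of those, the later candidates are never tried.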
Date: 2017-07-13.09:32:02
Date: 2016-02-28.20:36:22
It should be reported to chardet.
Date: 2015-12-15.11:25:46
Right, I'll know next time
Date: 2015-12-15.11:23:03
Please don't put me in copy unless I ask.
Date: 2015-12-15.10:40:05
The chardet module used in the deserialize function to detect the text encoding returns ISO-XXX[...] when an accented character is added.
The only way to make chardet detect that the text should be UTF-8 encoded is to have at least 4 accented characters ('é' for instance) in the HTML text declared as example into the
For now, I force the use of the 'UTF-8' encoding to solve my problem.
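A small illustration of why a single accented character is ambiguous when only the bytes are available: the same bytes are valid in both UTF-8 and ISO-8859-1. This sketch does not use chardet and makes no claim about its internal heuristics. The variable names are illustrative only.

```python
# UTF-8 bytes of a single accented character.
raw = 'é'.encode('utf-8')

# Decoding as ISO-8859-1 never fails, but yields two wrong characters.
as_latin1 = raw.decode('iso-8859-1')

# Forcing UTF-8, as in the workaround above, gives the expected text.
as_utf8 = raw.decode('utf-8')

print(repr(raw), repr(as_latin1), repr(as_utf8))
```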
