Tryton - Issues

 

Issue8279

Title: UnicodeError when importing statement with accents
Priority: bug
Status: resolved
Superseder:
Nosy List: ced, pokoli, reviewbot, roundup-bot
Type: crash
Components: account_statement_aeb43
Assigned To: pokoli
Keywords: review
Reviews: 253341002

Created on 2019-04-15.10:00:14 by pokoli, last changed by roundup-bot.

Messages
New changeset f5b5196daa71 by Sergi Almacellas Abellana in branch 'default':
Use iso-8859-1 as default encoding
https://hg.tryton.org/tryton-env/rev/f5b5196daa71
New changeset 252fa5d54f56 by Sergi Almacellas Abellana in branch 'default':
Use iso-8859-1 as default encoding
https://hg.tryton.org/modules/account_statement_aeb43/rev/252fa5d54f56
msg49039 (view) Author: [hidden] (pokoli) (Tryton committer) (Tryton translator) Date: 2019-04-15.13:42:44
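See https://downloads.tryton.org/standars/aeb43.pdf, page 6, section 1.2 "Soporte en formato ASCII" ("ASCII format medium"), which says:

 "Código ASCII (en mayúsculas) (carácter 165=Ñ) (tabla recomendada T1000850) (Personal computer: multilingual)"

 In English: "ASCII code (in upper case) (character 165=Ñ) (recommended table T1000850) (Personal computer: multilingual)"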
msg49038 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2019-04-15.13:36:43
Could you point to the documentation for the record?
msg49037 (view) Author: [hidden] (pokoli) (Tryton committer) (Tryton translator) Date: 2019-04-15.13:00:41
The files follow the standard. The problem is that we misunderstood the standard, because the documentation is not very clear.

It says "ASCII (with 165 as Ñ)", but ASCII only covers the code points 0 to 127.

It should say ISO-8859-1, which is the encoding in which Ñ has the numeric value 165.
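
The code points discussed in this message can be checked with Python's built-in codecs. The sketch below is an illustration only and is not part of the patch. It assumes that the "T1000850 (Personal computer: multilingual)" table named in the AEB43 document corresponds to IBM code page 850 (`cp850` in Python); the helper names are hypothetical.

```python
# Illustration only: compare how the byte values discussed in this issue
# decode under two codecs from the Python standard library.
# Assumption: "T1000850" in the AEB43 document refers to IBM code page 850.
CODECS = ("cp850", "iso-8859-1")


def decode_byte(value, codec):
    """Decode a single byte value (0-255) with the given codec."""
    return bytes([value]).decode(codec)


def code_point_table(values=(165, 0xD1, 0xD3)):
    """Map each byte value to the character it represents in each codec."""
    return {
        value: {codec: decode_byte(value, codec) for codec in CODECS}
        for value in values
    }


if __name__ == "__main__":
    for value, chars in code_point_table().items():
        print(value, hex(value), chars)
```

Under that assumption, byte 165 decodes to Ñ with `cp850`, whereas with `iso-8859-1` byte 165 is ¥ and Ñ is byte 0xD1 (209). Byte 0xd3 from the traceback decodes to Ó with `iso-8859-1`, which is consistent with the bank files being ISO-8859-1 in practice.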
msg49036 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2019-04-15.12:48:13
There is a standard that says the encoding is ASCII. Why do you receive non-ASCII files?
chardet is not 100% reliable.

"It works correctly" is not a proper way to solve it. We must first understand what is going on. Why are those files not following the standard?
msg49034 (view) Author: [hidden] (pokoli) (Tryton committer) (Tryton translator) Date: 2019-04-15.12:33:46
I do not understand the first two questions of msg49032.

I used chardet to detect the encoding of files from two different Spanish banks, and it detected iso-8859-1. I also tested the file that crashes with ascii using iso-8859-1, and it is imported correctly.
msg49032 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2019-04-15.12:06:57
So who is creating non-conformant files? Why are they doing that? How do you know their encoding?
msg49031 (view) Author: [hidden] (pokoli) (Tryton committer) (Tryton translator) Date: 2019-04-15.11:43:00
The official documentation says ASCII but with Ñ as character 165, which is not pure ASCII but iso-8859-1 (extended ASCII).

As ASCII is a subset of iso-8859-1, it works well when no accents are used, but whenever you import a file with accents (or Ñ, which is also used in Spain) it crashes with the posted traceback.
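
The subset relationship described above can be illustrated with a short sketch. The sample strings and the helper name are hypothetical and are not taken from a real statement file.

```python
# Illustration only: ASCII is a subset of ISO-8859-1, so pure-ASCII data
# decodes identically with both codecs, while any byte >= 0x80 is rejected
# by the ascii codec.
def decodes_as_ascii(data):
    """Return True if every byte in data is valid ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample records: one ASCII only, one with accented letters.
plain = b"TRANSFERENCIA RECIBIDA"
accented = "DEVOLUCI\u00d3N ESPA\u00d1A".encode("iso-8859-1")

# Every ASCII byte value maps to the same character in both codecs.
all_ascii = bytes(range(128))
assert all_ascii.decode("ascii") == all_ascii.decode("iso-8859-1")

print(decodes_as_ascii(plain))      # pure ASCII input
print(decodes_as_ascii(accented))   # input containing bytes 0xD3 and 0xD1
print(accented.decode("iso-8859-1"))
```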
msg49030 (view) Author: [hidden] (ced) (Tryton committer) (Tryton translator) Date: 2019-04-15.10:24:09
I do not understand the rationale. What is the official standard encoding for aeb43? Why was it ASCII by default?
review253341002 updated at https://codereview.tryton.org/253341002/#ps261331002
msg49028 (view) Author: [hidden] (pokoli) (Tryton committer) (Tryton translator) Date: 2019-04-15.10:00:13
When importing an aeb43 statement file that contains accents, I get the following traceback:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd3 in position 1486: ordinal not in range(128)
  File "trytond/protocols/dispatcher.py", line 176, in _dispatch
    result = rpc.result(meth(*c_args, **c_kwargs))
  File "trytond/wizard/wizard.py", line 287, in execute
    return wizard._execute(state_name)
  File "trytond/wizard/wizard.py", line 313, in _execute
    do_result = do(action)
  File "trytond/modules/account_statement/statement.py", line 1052, in do_import_
    statements = list(getattr(self, 'parse_%s' % self.start.file_format)())
  File "trytond/modules/account_statement_aeb43/statement.py", line 37, in parse_aeb43
    file_ = file_.decode(encoding)

That's because the file uses the iso-8859-1 encoding.
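
A minimal sketch of the failure and of the approach named in the changeset title ("Use iso-8859-1 as default encoding") follows. It is not the module's actual code: `decode_statement`, `DEFAULT_ENCODING` and the sample record are hypothetical names and data used only for illustration.

```python
# Illustration only, not the code of account_statement_aeb43.
# Assumption: the default encoding follows the changeset title.
DEFAULT_ENCODING = "iso-8859-1"


def decode_statement(data, encoding=DEFAULT_ENCODING):
    """Decode raw statement bytes; text input is returned unchanged."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Hypothetical record containing byte 0xd3, as in the traceback.
raw = b"CONCEPTO: DEVOLUCI\xd3N"

# Decoding with ascii reproduces the UnicodeDecodeError.
ascii_failed = False
try:
    decode_statement(raw, encoding="ascii")
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("ascii fails:", exc.reason)

# Decoding with the default encoding succeeds.
print(decode_statement(raw))
```

Keeping the encoding as a parameter lets a caller override it for files that use another code page.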
History
Date                 User         Action  Args
2019-04-16 13:26:44  roundup-bot  set     messages: + msg49073
2019-04-16 13:26:39  roundup-bot  set     status: testing -> resolved; nosy: + roundup-bot; messages: + msg49072
2019-04-15 13:42:45  pokoli       set     messages: + msg49039
2019-04-15 13:36:44  ced          set     messages: + msg49038
2019-04-15 13:00:41  pokoli       set     messages: + msg49037
2019-04-15 12:48:13  ced          set     messages: + msg49036
2019-04-15 12:33:46  pokoli       set     messages: + msg49034
2019-04-15 12:06:57  ced          set     messages: + msg49032
2019-04-15 11:43:00  pokoli       set     messages: + msg49031
2019-04-15 10:24:09  ced          set     nosy: + ced; messages: + msg49030
