For now, Tryton uses a format quite close to HTML, but it is still not valid HTML.
In order to develop the richtext widget in sao, we must converge on a common standard for both clients. I think the best option is a minimal subset of HTML that supports the current features and matches what browsers generate using execCommand.
- For alignment: we use <div align="...">...</div>
So we will have one div per line (the browser doesn't necessarily create one for the first line).
- For bold, underline and italic: we use <b> <u> and <i> tags
- For the size: we use <font size="1-7">...</font>
- For the font family, we use <font face="...">...</font> and restrict to common families ('normal', 'sans', 'serif', 'monospace').
- For the foreground color: we use <font color="#123">...</font>
Unfortunately there is no HTML standard for the background color except via CSS.
I think we can just drop this feature for now until someone proposes a patch.
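The subset listed above could be checked with a small whitelist, sketched here in Python. The names ALLOWED, SIZES, FAMILIES and check_subset are hypothetical, allowing br is an assumption based on the markup shown later in this issue, and the sketch only accepts markup that is already well-formed XML (so <br/>, not <br>).

```python
import xml.etree.ElementTree as ET

# Tags and attributes taken from the list above.
# 'br' is an assumption: it is not in the list but appears in generated markup.
ALLOWED = {
    'div': {'align'},
    'b': set(),
    'u': set(),
    'i': set(),
    'font': {'size', 'face', 'color'},
    'br': set(),
}
SIZES = {str(n) for n in range(1, 8)}
FAMILIES = {'normal', 'sans', 'serif', 'monospace'}


def check_subset(markup):
    """Return a list of problems (empty if markup fits the subset)."""
    # Wrap in a hypothetical root element so several top-level divs parse.
    root = ET.fromstring('<markup>%s</markup>' % markup)
    problems = []
    for element in root.iter():
        if element is root:
            continue
        if element.tag not in ALLOWED:
            problems.append('tag not allowed: %s' % element.tag)
            continue
        for name, value in element.attrib.items():
            if name not in ALLOWED[element.tag]:
                problems.append(
                    'attribute not allowed: %s.%s' % (element.tag, name))
            elif name == 'size' and value not in SIZES:
                problems.append('font size out of range: %s' % value)
            elif name == 'face' and value not in FAMILIES:
                problems.append('font family not allowed: %s' % value)
    return problems
```

An empty list means the markup stays within the subset. A check like this would only report problems, it does not repair anything.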
I tested the WYSIWYG review in the GTK client. Here I describe 4 scenarios with errors:
= Scenario 1 =
1- In form view, click in Text box (with richtext widget).
2.A - Resize the window from its corner (increase/decrease the window size).
2.B - Save with no data (None) in the richtext field.
2.C - Copy text from LibreOffice and paste it into the richtext field.
Traceback (most recent call last):
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 218, in deserialize
text, tags = parse_markup(normalize_markup(text, method='xml'))
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 85, in normalize_markup
root = ET.fromstring(_markup(_replace_br(_strip_newline(markup_text))))
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
raise err
ParseError: not well-formed (invalid token): line 1, column 19
= Scenario 2 =
1- In form view, click in Text box (with richtext widget).
2- Type a line (only a few characters, without the Enter key or a new line) and press the Tab key.
Traceback (most recent call last):
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 218, in deserialize
text, tags = parse_markup(normalize_markup(text, method='xml'))
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 85, in normalize_markup
root = ET.fromstring(_markup(_replace_br(_strip_newline(markup_text))))
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
raise err
ParseError: not well-formed (invalid token): line 1, column 24
= Scenario 3 =
1- In form view, click in Text box (with richtext widget).
2- Copy text from tryton.org and paste in richtext field
Traceback (most recent call last):
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 218, in deserialize
text, tags = parse_markup(normalize_markup(text, method='xml'))
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 85, in normalize_markup
root = ET.fromstring(_markup(_replace_br(_strip_newline(markup_text))))
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 45, in _strip_newline
return u''.join(text.splitlines())
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
= Scenario 4 =
Color tool.
1- Select some characters and apply a colour.
2- Select more characters and apply a new colour.
The GTK client crashes (is killed) with this error:
/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py:320: GtkWarning: Invalid text buffer iterator: either the iterator is uninitialized, or the characters/pixbufs/widgets in the buffer have been modified since the iterator was created.
You must use marks, character numbers, or line numbers to preserve a position across buffer modifications.
You can apply tags and insert marks without invalidating your iterators,
but any mutation that affects 'indexable' buffer contents (contents that can be referred to by character offset)
will invalidate all outstanding iterators
tags.update(iter_.get_tags())
/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py:321: GtkWarning: Invalid text buffer iterator: either the iterator is uninitialized, or the characters/pixbufs/widgets in the buffer have been modified since the iterator was created.
You must use marks, character numbers, or line numbers to preserve a position across buffer modifications.
You can apply tags and insert marks without invalidating your iterators,
but any mutation that affects 'indexable' buffer contents (contents that can be referred to by character offset)
will invalidate all outstanding iterators
iter_.forward_char()
Segmentation fault (core dumped)
1. div or paragraph?
Currently, each line generates a new div.
Example:
It is a demo
and new line
It is a new paragraph
and new line
HTML generated is:
<div>It is a demo</div><div>and new line</div><div><br></div><div>It is a new paragraph</div><div>and new line</div>
Why not?
<p>It is a demo<br/>and new line</p><p>It is a new paragraph<br/>and new line</p>
2- Strong or bold?
Example:
It is a <b>demo</b>
Why not?
It is a <strong>demo</strong>
3. List icon
A list option to create lists is missing:
<ul>
<li>Demo</li>
</ul>
4. Extra
Some WYSIWYG editors let you configure which toolbar options (bold, align, color...) are active, with permissions per user. For example, disable the color tool for some users (by group or by user) because color is not formatted by the WYSIWYG editor (color is defined by CSS).
On 2015-09-15 10:00, Raimon Esteve wrote:
> = Scenario 1 =
>
> 1- In form view, click in Text box (with richtext widget).
> 2.A - Resize the window from its corner (increase/decrease the window size).
> 2.B - Save with no data (None) in the richtext field.
> 2.C - Copy text from LibreOffice and paste it into the richtext field.
>
> Traceback (most recent call last):
> File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 218, in deserialize
> text, tags = parse_markup(normalize_markup(text, method='xml'))
> File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 85, in normalize_markup
> root = ET.fromstring(_markup(_replace_br(_strip_newline(markup_text))))
> File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
> parser.feed(text)
> File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
> self._raiseerror(v)
> File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
> raise err
> ParseError: not well-formed (invalid token): line 1, column 19
On 2015-09-15 10:10, Raimon Esteve wrote:
> Hi,
>
> 1. div or paragraph?
> Currently, each line generates a new div.
>
> Example:
>
> It is a demo
> and new line
>
> It is a new paragraph
> and new line
>
> HTML generated is:
>
> <div>It is a demo</div><div>and new line</div><div><br></div><div>It is a new paragraph</div><div>and new line</div>
>
> Why not?
>
> <p>It is a demo<br/>and new line</p><p>It is a new paragraph<br/>and new line</p>
Because I'm not developing an HTML editor. So it uses the minimal
common part of all clients.
Moreover, you are expecting us to infer the semantics from plain text.
> 2- Strong or bold?
>
> Example:
>
> It is a <b>demo</b>
>
> Why not?
>
> It is a <strong>demo</strong>
Because it is basic HTML, which is what browsers generate using
execCommand.
> 3. List icon
>
> A list option to create lists is missing:
>
> <ul>
> <li>Demo</li>
> </ul>
Patch is welcome.
> 4. Extra
>
> Some WYSIWYG editors let you configure which toolbar options (bold, align, color...) are active, with permissions per user. For example, disable the color tool for some users (by group or by user) because color is not formatted by the WYSIWYG editor (color is defined by CSS).
On 2015-09-15 10:00, Raimon Esteve wrote:
> = Scenario 3 =
>
> 1- In form view, click in Text box (with richtext widget).
> 2- Copy text from tryton.org and paste in richtext field
>
> Traceback (most recent call last):
> File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 218, in deserialize
> text, tags = parse_markup(normalize_markup(text, method='xml'))
> File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 85, in normalize_markup
> root = ET.fromstring(_markup(_replace_br(_strip_newline(markup_text))))
> File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 45, in _strip_newline
> return u''.join(text.splitlines())
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
On 2015-09-15 10:00, Raimon Esteve wrote:
> = Scenario 2 =
>
> 1- In form view, click in Text box (with richtext widget).
> 2- Type a line (only a few characters, without the Enter key or a new line) and press the Tab key.
>
> Traceback (most recent call last):
> File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 218, in deserialize
> text, tags = parse_markup(normalize_markup(text, method='xml'))
> File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 85, in normalize_markup
> root = ET.fromstring(_markup(_replace_br(_strip_newline(markup_text))))
> File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
> parser.feed(text)
> File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
> self._raiseerror(v)
> File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
> raise err
> ParseError: not well-formed (invalid token): line 1, column 24
>> Scenario 1
> Please provide the HTML copied from LibreOffice.
It's a basic ODT with a title and a paragraph, but I attach it for testing. Beyond that, and related to scenario 3, users could copy complex ODT formatting (styles, centering, lists, images?...) and paste it into the richtext field.
>> Scenario 3
> It should be fixed with the last version.
When copying text from a website, the clipboard contains HTML characters that can't be decoded. You can reproduce it by copying text from the home page www.tryton.org
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 219, in deserialize
text = text.decode('utf-8')
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
'utf8' codec can't decode byte 0xff in position 0: invalid start byte
>> Scenario 2
> It should be fixed in last patchset.
On 2015-09-15 10:00, Raimon Esteve wrote:
> = Scenario 4 =
>
> Color tool.
>
> 1- Select some characters and apply a colour.
> 2- Select more characters and apply a new colour.
>
> The GTK client crashes (is killed) with this error:
>
> /home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py:320: GtkWarning: Invalid text buffer iterator: either the iterator is uninitialized, or the characters/pixbufs/widgets in the buffer have been modified since the iterator was created.
> You must use marks, character numbers, or line numbers to preserve a position across buffer modifications.
> You can apply tags and insert marks without invalidating your iterators,
> but any mutation that affects 'indexable' buffer contents (contents that can be referred to by character offset)
> will invalidate all outstanding iterators
> tags.update(iter_.get_tags())
> /home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py:321: GtkWarning: Invalid text buffer iterator: either the iterator is uninitialized, or the characters/pixbufs/widgets in the buffer have been modified since the iterator was created.
> You must use marks, character numbers, or line numbers to preserve a position across buffer modifications.
> You can apply tags and insert marks without invalidating your iterators,
> but any mutation that affects 'indexable' buffer contents (contents that can be referred to by character offset)
> will invalidate all outstanding iterators
> iter_.forward_char()
> Segmentation fault (core dumped)
On 2015-09-15 13:11, Raimon Esteve wrote:
>
> Raimon Esteve <resteve@zikzakmedia.com> added the comment:
>
> >> Scenario 1
> > Please provide the HTML copied from LibreOffice.
>
> It's a basic ODT with a title and a paragraph, but I attach it for testing. Beyond that, and related to scenario 3, users could copy complex ODT formatting (styles, centering, lists, images?...) and paste it into the richtext field.
I don't need the ODT; what matters is only the text received by deserialize.
> >> Scenario 3
> > It should be fixed with the last version.
>
> When copying text from a website, the clipboard contains HTML characters that can't be decoded. You can reproduce it by copying text from the home page www.tryton.org
>
> File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 219, in deserialize
> text = text.decode('utf-8')
> File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
> return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
>
> 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
I don't know what to do. Apparently we receive UTF-16 encoded data but
nothing tells me how I can know that. Normally in GTK everything should
be str encoded in UTF-8.
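One possible heuristic, given only as a sketch: byte 0xff at position 0 is consistent with a UTF-16 little-endian byte order mark (FF FE), so the BOM could be used to pick the codec. The function name decode_clipboard is hypothetical, the code is written for Python 3, and it does not help when the sender omits the BOM.

```python
import codecs


def decode_clipboard(data):
    """Decode raw clipboard bytes to text, guessing UTF-16 from a BOM."""
    # A UTF-16 byte order mark is FF FE (little endian) or FE FF (big endian).
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf-16' codec reads the BOM, picks the byte order and drops it.
        return data.decode('utf-16')
    # Assumption: without a BOM the data is UTF-8, as GTK normally provides.
    return data.decode('utf-8')
```

This is a guess based on the first bytes, not something the clipboard announces. A UTF-32 little-endian BOM also starts with FF FE and would be misread here.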
ERROR:tryton.common.common:Traceback (most recent call last):
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 222, in deserialize
text, tags = parse_markup(normalize_markup(text, method='xml'))
File "/home/resteve/virtualenv/try37richeditor/tryton/tryton/common/htmltextbuffer.py", line 85, in normalize_markup
root = ET.fromstring(_markup(_replace_br(_strip_newline(markup_text))))
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
raise err
ParseError: not well-formed (invalid token): line 1, column 10
not well-formed (invalid token): line 1, column 10
I don't want to fix such input data because it is not valid HTML: you cannot have a div outside body. Also, it is not a full HTML document that can be interpreted, just a subset of HTML.
> I don't want to fix such input data because it is not valid HTML: you cannot have a div outside body. Also, it is not a full HTML document that can be interpreted, just a subset of HTML.
When users copy text from a website, they don't know whether it is HTML (a subset or a complete document) or not.
Sometimes users copy a fragment from a FAQ, from bank info, etc., not a whole HTML document (I also tested a complete HTML document and got the same error when pasting into the richtext field).
One idea is to strip the HTML tags and paste the raw text into the text box.
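A minimal sketch of that idea, assuming Python 3's html.parser (the module is named HTMLParser in Python 2). TextExtractor and strip_tags are hypothetical names and this is not the widget's code. Turning br, div and p into newlines and dropping script/style content are illustrative choices.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text content, dropping every tag."""

    # Illustrative choice: the content of these elements is never kept.
    SKIPPED = {'script', 'style'}

    def __init__(self):
        HTMLParser.__init__(self, convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == 'br':
            self.parts.append('\n')

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(self._skip_depth - 1, 0)
        elif tag in ('div', 'p'):
            # End of a block element: start a new line.
            self.parts.append('\n')

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html):
    """Return the plain text of an HTML fragment or document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip()
```

Because html.parser is lenient, this accepts both fragments and full documents, including markup that is not well-formed XML.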
On 2015-09-15 21:26, Raimon Esteve wrote:
> > I don't want to fix such input data because it is not valid HTML: you cannot have a div outside body. Also, it is not a full HTML document that can be interpreted, just a subset of HTML.
>
> When users copy text from a website, they don't know whether it is HTML (a subset or a complete document) or not.
> Sometimes users copy a fragment from a FAQ, from bank info, etc., not a whole HTML document (I also tested a complete HTML document and got the same error when pasting into the richtext field).
>
> One idea is to strip the HTML tags and paste the raw text into the text box.
This is already what the widget does, but we cannot clean something that
is not HTML.
For the copy/paste from LibreOffice, I checked and the content is indeed real text/html starting with DOCTYPE. But ElementTree cannot parse such raw HTML because it is not valid XML, so I think I should use HTMLParser to normalize the HTML into XML parsable by ET (this will replace _replace_br).
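A rough sketch of what such a normalization could look like, assuming Python 3's html.parser (HTMLParser in Python 2). It is not the actual patch: XMLNormalizer, html_to_xml, the markup wrapper element and the VOID, DROPPED and UNWRAPPED sets are all hypothetical. It self-closes void elements such as br, quotes attribute values, closes elements left open and drops the head.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

VOID = {'br', 'hr', 'img', 'meta', 'link', 'input'}  # never closed in HTML
DROPPED = {'head', 'title', 'script', 'style'}  # removed with their content
UNWRAPPED = {'html', 'body'}  # tag removed, content kept


class XMLNormalizer(HTMLParser):
    """Re-emit lenient HTML as well-formed XML."""

    def __init__(self):
        HTMLParser.__init__(self, convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED:
            self.skip_depth += 1
        if self.skip_depth or tag in UNWRAPPED:
            return
        attributes = ''.join(
            ' %s=%s' % (name, quoteattr(value or ''))
            for name, value in attrs)
        if tag in VOID:
            self.out.append('<%s%s/>' % (tag, attributes))
        else:
            self.out.append('<%s%s>' % (tag, attributes))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROPPED:
            self.skip_depth = max(self.skip_depth - 1, 0)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        # Close any element left open inside the one being closed.
        while self.open_tags:
            opened = self.open_tags.pop()
            self.out.append('</%s>' % opened)
            if opened == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))

    def close(self):
        HTMLParser.close(self)
        while self.open_tags:
            self.out.append('</%s>' % self.open_tags.pop())


def html_to_xml(html):
    """Return html rewritten as XML inside a single root element."""
    parser = XMLNormalizer()
    parser.feed(html)
    parser.close()
    return '<markup>%s</markup>' % ''.join(parser.out)
```

The result can then be passed to ET.fromstring. Duplicate attributes or attribute names that are not valid XML names are not handled in this sketch.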