Firefox is getting more intelligent (than us)

Trihexagonal

Son of Beastie

Reaction score: 2,357
Messages: 2,977

I know that if you use the right-click option "View Page Source" in Firefox-ESR, it will highlight in red the XHTML errors I've made: the ones the W3C validator points out but that I can't readily see by glancing over the page of markup.
 

memreflect

Well-Known Member

Reaction score: 221
Messages: 257

Well, that's certainly interesting. Neither the server nor the HTML declares a character encoding for the document, so nobody can blame Firefox for incorrectly guessing it's ISO-8859-2 when you "Repair Text Encoding". In my opinion, the right option would have been to retain the menu and add the new menu item for old pages like this.
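
As a rough illustration of what "declares a character encoding" means here, the following Python sketch checks the two usual places: the HTTP Content-Type header and a meta tag near the top of the markup. The helper name `declared_charset` is hypothetical, and this is not Firefox's actual detection logic, only an approximation of where a declaration would normally be found.

```python
import re

# Hypothetical helper (not Firefox's algorithm): return the declared
# encoding, or None when neither the server nor the HTML declares one.
def declared_charset(content_type, body):
    # 1. A charset parameter on the HTTP Content-Type header, if any.
    if content_type:
        m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.I)
        if m:
            return m.group(1).lower()
    # 2. A <meta> declaration near the start of the document.
    #    Assumption: scanning the first 1024 bytes is enough.
    head = body[:1024].decode("ascii", errors="replace")
    m = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    if m:
        return m.group(1).lower()
    # 3. Nothing declared: the browser is left to guess.
    return None


print(declared_charset("text/html", b"<html><title>x</title></html>"))
```

When the function returns None, as it would for a page like the one discussed, any choice of encoding is a heuristic.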
 

grahamperrin

Son of Beastie

Reaction score: 835
Messages: 2,693


Treated as UTF-8:

1632015621727.png

There's the menu option – Repair Text Encoding – however I don't expect it to be a panacea in cases such as this:

1632015990852.png


1632016386647.png

used to replace an incoming character whose value is unknown or unrepresentable in Unicode

… the right option would have been to retain the menu …

Which encoding would you have chosen for <http://gost.isi.edu/publications/kerberos-neuman-tso.html>?
 

memreflect

Well-Known Member

Reaction score: 221
Messages: 257

ISO-8859-1, ISO-8859-15, or Windows-1252. The text is in English, and many HTML pages written in English were published in one of those three encodings prior to the ubiquity of UTF-8 from what I've experienced. On my system, Firefox's "Repair Text Encoding" happened to choose ISO-8859-2 instead, rendering © as Š in the Copyright line. That's why I feel the menu should have been kept—in case Firefox guesses incorrectly. On the other hand, if it works for 95% of pages, and newer pages/servers declare the character encoding, then I could see why the menu might have been removed, so perhaps those pages with no character encoding should simply be considered incompatible with the modern web. After all, there ain't no such thing as plain text.
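
To make the © versus Š mix-up concrete, here is a minimal Python sketch. It assumes the copyright sign on the page is stored as the single byte 0xA9, which is how the three Western encodings listed above encode it; the actual bytes of the page were not inspected for this example.

```python
# Assumption: the copyright sign in the page is the single byte 0xA9.
raw = b"\xa9"

# Decode the same byte under each candidate encoding.
candidates = ["iso-8859-1", "iso-8859-15", "windows-1252", "iso-8859-2"]
decoded = {name: raw.decode(name) for name in candidates}

# A lone 0xA9 is not a valid UTF-8 sequence, so decoding as UTF-8 with
# errors="replace" substitutes the replacement character U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

for name, text in decoded.items():
    print(f"{name:13} -> {text!r}")
print(f"{'utf-8':13} -> {as_utf8!r}")
```

The three Western encodings agree on this byte, while ISO-8859-2 maps it to Š, so a wrong guess changes the text without producing any visible error.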
 

grahamperrin

Son of Beastie

Reaction score: 835
Messages: 2,693

Mozilla bug 1731482 - Repair Text Encoding: page(s) not properly repaired (compared to e.g. Firefox ESR)

Incidentally:

ISO-8859-1, ISO-8859-15, or Windows-1252.

– off-topic from Firefox, none of those have the required effect, for the given page, in Falkon.

Code:
% pkg info -x falkon ; uname -aKU ; freebsd-version -kru
falkon-3.1.0_1
FreeBSD mowa219-gjp4-8570p-freebsd 14.0-CURRENT FreeBSD 14.0-CURRENT #109 main-n249408-ff33e5c83fa: Thu Sep 16 01:11:04  2021     root@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG  amd64 1400033 1400033
14.0-CURRENT
14.0-CURRENT
14.0-CURRENT
%
 

Trihexagonal

Son of Beastie

Reaction score: 2,357
Messages: 2,977

Well, that's certainly interesting. Neither the server nor the HTML declares a character encoding for the document, so nobody can blame Firefox for incorrectly guessing it's ISO-8859-2 when you "Repair Text Encoding". In my opinion, the right option would have been to retain the menu and add the new menu item for old pages like this.
I deleted the closing bracket for the title of the index.html page on my site, then opened it as a file in Firefox-ESR and clicked the "View Page Source" option.

It highlights the beginning of the error in red and the metatag underneath the error is highlighted in red. Encoding is charset=utf-8 and it's valid XHTML 1.0 Transitional:


view_src.png
 
OP

hruodr

Aspiring Daemon

Reaction score: 283
Messages: 896

I sent an e-mail to the developer of the extension for Thunderbird.
Thanks.

I never use add-ons; I do not trust them. I wonder how an elementary function
disappeared while so much bloat remains and keeps growing.

It is terrible that there are few alternative browsers.
 

memreflect

Well-Known Member

Reaction score: 221
Messages: 257

Incidentally:
ISO-8859-1, ISO-8859-15, or Windows-1252.
– off-topic from Firefox, none of those have the required effect, for the given page, in Falkon.
I just tried Falkon and Otter Browser, and changing the character encoding does not affect the rendering for me on any web pages I've tried. While they both appear to refresh the page view, the page info still shows UTF-8 or unknown while the encoding menu indicates the character encoding I selected is active.

Selecting "Western" in www/firefox-esr renders the pages correctly, and the pages also display correctly in a terminal emulator with W3M when I change the encoding (= key to view page info where the character encoding of the page can be changed).

I took a look at the pages on a Chromebook because I don't feel like waiting for Chromium to build, and the pages rendered incorrectly there as well. That was easily fixed by installing the Set Character Encoding extension and selecting one of the encodings I mentioned (ISO-8859-1 is noticeably missing, but it was succeeded by ISO-8859-15 and Windows-1252 anyway).
 

memreflect

Well-Known Member

Reaction score: 221
Messages: 257

I deleted the closing bracket for the title of the index.html page on my site, then opened it as a file in Firefox-ESR and clicked the "View Page Source" option.

It highlights the beginning of the error in red and the metatag underneath the error is highlighted in red. Encoding is charset=utf-8 and it's valid XHTML 1.0 Transitional:
In SGML definitions of HTML (anything before XHTML 1.0 and "ISO HTML"), that would be an error as well, so I'm not sure what your point is here. Are you suggesting that invalid markup is the cause of the character encoding trouble being discussed in this thread?

Off-topic:
You could shorten things to <title>Your title here</><meta ...> and it would still be valid HTML, but most HTML parsers would have trouble with that and such usage is discouraged by the W3C and the W3C HTML validator anyway. The shortest valid HTML 4.01 Strict document (if you ignore the lack of a doctype) is <title//<p>. For more information about these SGML features that few browsers (if any) have implemented, SGML - Markup minimization (Wikipedia) and Understanding HTML and SGML (W3C) are two useful resources. I am glad XML, and consequently XHTML, simplified things significantly by dropping crazy features like those!
 

astyle

Daemon

Reaction score: 674
Messages: 1,507

If Firefox were in fact intelligent, it wouldn't be so bloated that just one tab takes up 900MB. I'm really grateful that FreeBSD forums are not riddled with ads like other sites often are.
 

grahamperrin

Son of Beastie

Reaction score: 835
Messages: 2,693

This thread is a point of reference in the bug report. It might help to keep things on topic: text encoding edge cases that are not properly repaired by the repair feature.
 

astyle

Daemon

Reaction score: 674
Messages: 1,507

It is not possible to repair anything, it is only heuristics. At best they should bring back the menu.
Sometimes, they just hide the menu under some cute-looking button. 😩 Happens during nearly every update, and I have to play hide-and-seek all over again.
 

Trihexagonal

Son of Beastie

Reaction score: 2,357
Messages: 2,977

In SGML definitions of HTML (anything before XHTML 1.0 and "ISO HTML"), that would be an error as well, so I'm not sure what your point is here.
Let me try to explain it so you can understand, memreflect.

The topic of the Thread is "Firefox is getting more intelligent (than us)".

I followed that up in the second post of this thread with:

I know that if you use the right-click option "View Page Source" in Firefox-ESR, it will highlight in red the XHTML errors I've made: the ones the W3C validator points out but that I can't readily see by glancing over the page of markup.

My point was to show how it was getting more intelligent: the "View Page Source" option, which shows the raw XHTML markup, can identify markup errors I might not readily see.

You followed that up with:

Well, that's certainly interesting. Neither the server nor the HTML declares a character encoding for the document, so nobody can blame Firefox for incorrectly guessing it's ISO-8859-2 when you "Repair Text Encoding". In my opinion, the right option would have been to retain the menu and add the new menu item for old pages like this.
That was an erroneous statement on your part. The page does declare a character encoding of "utf-8", both in the XML declaration preceding the DOCTYPE and in the meta tag shown in my "View Page Source" screenshot.

Code:
<?xml version='1.1' encoding='utf-8'?>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8" />

You are the one who said it couldn't be blamed for incorrectly guessing ISO-8859-2 as the character encoding. From what exactly did you draw the conclusion that it incorrectly "guessed" the character encoding?

Are you suggesting that invalid markup is the cause of the character encoding trouble being discussed in this thread?
I purposely deleted the closing bracket of the Title of my index.html page, loaded it in Firefox-ESR as a file and took a screen shot to document the claim I made of the ability of Firefox-ESR to "highlight in red XHTML errors".

Is that clear to you now? All my markup is valid XHTML 1.0 Transitional, and FYI, my CSS is valid CSS level 3 + SVG.

You could shorten things to <title>Your title here</><meta ...> and it would still be valid HTML, but most HTML parsers would have trouble with that and such usage is discouraged by the W3C and the W3C HTML validator anyway. The shortest valid HTML 4.01 Strict document (if you ignore the lack of a doctype) is <title//<p>.
It would not be valid XHTML (and if it's not valid XHTML, it's not considered XHTML at all). Firefox's ability to flag such errors was my addition to the thread topic of how Firefox is getting more intelligent (than us).


For more information about these SGML features that few browsers (if any) have implemented, SGML - Markup minimization (Wikipedia) and Understanding HTML and SGML (W3C) are two useful resources. I am glad XML, and consequently XHTML, simplified things significantly by dropping crazy features like those!
For more information on the differences, see XHTML Versus HTML.
 