dvl@
Developer
I'm trying to parse packagesite.yaml for reasons and I'm looking for coding help please.
I keep running into encoding issues. I've tried:
- latin-1
- ascii
- utf-8
- ISO-8859-1
My proof-of-concept script is:
Python:
#!/usr/local/bin/python

import yaml
import io
import sys

line = sys.stdin.readline()
while line:
    docs = yaml.load_all(line, Loader=yaml.FullLoader)
    for doc in docs:
        print(doc['name'], doc['version'])
    line = sys.stdin.readline()
Using https://pkg.freebsd.org/FreeBSD:12:amd64/latest/packagesite.txz as the source file (see how I got it) as sample input:
Code:
$ head -1 packagesite.yaml | ~/bin/yaml-test-packages.stdin.all.line.by.line
py37-pyasn1-modules 0.2.7
To encounter one of these encoding issues:
Code:
$ head -14074 packagesite.yaml | tail -1 | ~/bin/yaml-test-packages.stdin.all.line.by.line
Traceback (most recent call last):
  File "/usr/home/dan/bin/yaml-test-packages.stdin.all.line.by.line", line 10, in <module>
    for doc in docs:
  File "/usr/local/lib/python3.7/site-packages/yaml/__init__.py", line 127, in load_all
    loader = Loader(stream)
  File "/usr/local/lib/python3.7/site-packages/yaml/loader.py", line 24, in __init__
    Reader.__init__(self, stream)
  File "/usr/local/lib/python3.7/site-packages/yaml/reader.py", line 74, in __init__
    self.check_printable(stream)
  File "/usr/local/lib/python3.7/site-packages/yaml/reader.py", line 144, in check_printable
    'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #xdcbc: special characters are not allowed
  in "<unicode string>", position 1421
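The character in the error, #xdcbc, falls in the lone-surrogate range that Python's `surrogateescape` error handler uses for bytes it cannot decode (byte 0xNN becomes U+DCNN). A small sketch of that mapping, assuming (the traceback does not confirm this) that stdin was decoded with that handler before the text reached PyYAML:

```python
# How a raw 0xBC byte becomes U+DCBC under the surrogateescape handler.
raw = b"\xbc"
decoded = raw.decode("utf-8", errors="surrogateescape")
print(hex(ord(decoded)))

# The mapping is reversible: encoding with the same handler restores the byte.
restored = decoded.encode("utf-8", errors="surrogateescape")
print(restored == raw)
```

If that assumption holds, the offending input would be a raw 0xBC byte that is not valid UTF-8, rather than a real Unicode character in the file.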
That's the line for zh-auto-cn-l10n:
Code:
$ grep -hn zh-auto-cn-l10n packagesite.yaml
14074:{"name":"zh-auto-cn-l10n","origin":"chinese/auto-cn-l10n","version":"1.1_3","comment":"The automatic localization for Simplified Chinese zh_CN.eucCN locale","maintainer":"ports@FreeBSD.org","www":"UNKNOWN","abi":"FreeBSD:12:amd64","arch":"freebsd:12:x86:64","prefix":"/usr/local","sum":"7d87b8636a0a77528b79cad0172eab1a10da472320b9873e0f3ba8942dc1b155","flatsize":19656,"path":"All/zh-auto-cn-l10n-1.1_3.txz","repopath":"All/zh-auto-cn-l10n-1.1_3.txz","licenselogic":"single","pkgsize":7496,"desc":"Simplified Chinese (GB2312 encoding) zh_CN.eucCN automatic localization\nInstall this port and you will have a Simplified Chinese FreeBSD system","deps":{"relaxconf":{"origin":"sysutils/relaxconf","version":"1.1.1_3"},"wqy-fonts":{"origin":"x11-fonts/wqy","version":"20100803_10,1"},"zh-scim-pinyin":{"origin":"chinese/scim-pinyin","version":"0.5.92_4"},"zh-scim-tables":{"origin":"chinese/scim-tables","version":"0.5.10_1"}},"categories":["chinese"],"options":{"FCITX":"off","FIREFLYTTF":"off","MINICHINPUT":"off","RELAXCONF":"on","SCIM":"on","WQY":"on"},"annotations":{"FreeBSD_version":"1201000"},"messages":[{"message":"English Instructions:\n Please tell your users to merge their old dotfiles with the new ones, in\n /usr/local/share/skel/zh_CN.eucCN/dot.*\n\n For future adduser\n # adduser -k /usr/local/share/skel/zh_CN.eucCN\n\n**************************************************************************\n\n????????˵??:\n ??????????û??Ƚ????ǵ??¾?????,????\n /usr/local/share/skel/zh_CN.eucCN/dot.*\n\n ????Ժ???Ҫ?????û?,???????????µķ?ʽ:\n # adduser -k /usr/local/share/skel/zh_CN.eucCN","type":"install"},{"message":"===> NOTICE:\n\nThe zh-auto-cn-l10n port currently does not have a maintainer. As a result, it is\nmore likely to have unresolved issues, not be up-to-date, or even be removed in\nthe future. To volunteer to maintain this port, please create an issue at:\n\nhttps://bugs.freebsd.org/bugzilla\n\nMore information about port maintainership is available at:\n\nhttps://www.freebsd.org/doc/en/articles/contributing/ports-contributing.html#maintain-port"}]}
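Each line of the file, like the one above, looks like a single JSON object, so one possible way around the text-decoding step is to read raw bytes and decode them explicitly. A minimal sketch, with several assumptions: it uses the standard `json` module instead of PyYAML, it treats the input as UTF-8 and replaces undecodable bytes with U+FFFD (which loses the original GB2312 text), and the function name `parse_package_lines` and the sample records are hypothetical:

```python
import io
import json


def parse_package_lines(stream, encoding="utf-8"):
    """Yield (name, version) for each JSON line in a binary stream.

    Undecodable bytes are replaced with U+FFFD instead of raising,
    which is an assumption about what is acceptable here.
    """
    for raw in stream:
        text = raw.decode(encoding, errors="replace").strip()
        if not text:
            continue  # skip blank lines
        doc = json.loads(text)
        yield doc["name"], doc["version"]


# Hypothetical sample: the second record carries a raw 0xBC byte,
# which is not valid UTF-8 on its own.
sample = io.BytesIO(
    b'{"name":"py37-pyasn1-modules","version":"0.2.7"}\n'
    b'{"name":"zh-auto-cn-l10n","version":"1.1_3","desc":"\xbc"}\n'
)
results = list(parse_package_lines(sample))
for name, version in results:
    print(name, version)
```

In a script fed from a pipe, `sys.stdin.buffer` could be passed in place of the sample stream so that Python never decodes the input implicitly.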