UTF8 Trick (cats!)

From: "Robert A. Kelly III" 
------------------------------------------------------
The real purpose of Unicode is so we can exchange plain text cats on the
internet! 😸

On 06/26/2013 09:33 AM, James Nylen wrote:
> 
> UTF-8 encoded sample plain-text file
> ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
> 
> Markus Kuhn [ˈmaʳkʊs kuːn]  — 2002-07-25
> 
> 
> The ASCII compatible UTF-8 encoding used in this plain-text file
> is defined in Unicode, ISO 10646-1, and RFC 2279.
> 
> 
> Using Unicode/UTF-8, you can write in emails and source code things such as
> 
> Mathematics and sciences:
> 
>   ∮ E⋅da = Q,  n → ∞, ∑ f(i) = ∏ g(i)
> 
>   ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β)
> 
>   ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ
> 
>   ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (⟦A⟧ ⇔ ⟪B⟫)
> 
>   2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm
> 
> Linguistics and dictionaries:
> 
>   ði ıntəˈnæʃənəl fəˈnɛtık əsoʊsiˈeıʃn
>   Y [ˈʏpsilɔn], Yen [jɛn], Yoga [ˈjoːgɑ]
> 
> APL:
> 
>   ((V⍳V)=⍳⍴V)/V←,V    ⌷←⍳→⍴∆∇⊃‾⍎⍕⌈
> 
> Nicer typography in plain text files:
> 
>   ╔══════════════════════════════════════════╗
>   ║                                          ║
>   ║   • ‘single’ and “double” quotes         ║
>   ║                                          ║
>   ║   • Curly apostrophes: “We’ve been here” ║
>   ║                                          ║
>   ║   • Latin-1 apostrophe and accents: '´`  ║
>   ║                                          ║
>   ║   • ‚deutsche‘ „Anführungszeichen“       ║
>   ║                                          ║
>   ║   • †, ‡, ‰, •, 3–4, —, −5/+5, ™, …      ║
>   ║                                          ║
>   ║   • ASCII safety test: 1lI|, 0OD, 8B     ║
>   ║                      ╭─────────╮         ║
>   ║   • the euro symbol: │ 14.95 € │         ║
>   ║                      ╰─────────╯         ║
>   ╚══════════════════════════════════════════╝
> 
> Combining characters:
> 
>   STARGΛ̊TE SG-1, a = v̇ = r̈, a⃑ ⊥ b⃑
> 
> Greek (in Polytonic):
> 
>   The Greek anthem:
> 
>   Σὲ γνωρίζω ἀπὸ τὴν κόψη
>   τοῦ σπαθιοῦ τὴν τρομερή,
>   σὲ γνωρίζω ἀπὸ τὴν ὄψη
>   ποὺ μὲ βία μετράει τὴ γῆ.
> 
>   ᾿Απ᾿ τὰ κόκκαλα βγαλμένη
>   τῶν ῾Ελλήνων τὰ ἱερά
>   καὶ σὰν πρῶτα ἀνδρειωμένη
>   χαῖρε, ὦ χαῖρε, ᾿Ελευθεριά!
> 
>   From a speech of Demosthenes in the 4th century BC:
> 
>   Οὐχὶ ταὐτὰ παρίσταταί μοι γιγνώσκειν, ὦ ἄνδρες ᾿Αθηναῖοι,
>   ὅταν τ᾿ εἰς τὰ πράγματα ἀποβλέψω καὶ ὅταν πρὸς τοὺς
>   λόγους οὓς ἀκούω· τοὺς μὲν γὰρ λόγους περὶ τοῦ
>   τιμωρήσασθαι Φίλιππον ὁρῶ γιγνομένους, τὰ δὲ πράγματ᾿
>   εἰς τοῦτο προήκοντα,  ὥσθ᾿ ὅπως μὴ πεισόμεθ᾿ αὐτοὶ
>   πρότερον κακῶς σκέψασθαι δέον. οὐδέν οὖν ἄλλο μοι δοκοῦσιν
>   οἱ τὰ τοιαῦτα λέγοντες ἢ τὴν ὑπόθεσιν, περὶ ἧς βουλεύεσθαι,
>   οὐχὶ τὴν οὖσαν παριστάντες ὑμῖν ἁμαρτάνειν. ἐγὼ δέ, ὅτι μέν
>   ποτ᾿ ἐξῆν τῇ πόλει καὶ τὰ αὑτῆς ἔχειν ἀσφαλῶς καὶ Φίλιππον
>   τιμωρήσασθαι, καὶ μάλ᾿ ἀκριβῶς οἶδα· ἐπ᾿ ἐμοῦ γάρ, οὐ πάλαι
>   γέγονεν ταῦτ᾿ ἀμφότερα· νῦν μέντοι πέπεισμαι τοῦθ᾿ ἱκανὸν
>   προλαβεῖν ἡμῖν εἶναι τὴν πρώτην, ὅπως τοὺς συμμάχους
>   σώσομεν. ἐὰν γὰρ τοῦτο βεβαίως ὑπάρξῃ, τότε καὶ περὶ τοῦ
>   τίνα τιμωρήσεταί τις καὶ ὃν τρόπον ἐξέσται σκοπεῖν· πρὶν δὲ
>   τὴν ἀρχὴν ὀρθῶς ὑποθέσθαι, μάταιον ἡγοῦμαι περὶ τῆς
>   τελευτῆς ὁντινοῦν ποιεῖσθαι λόγον.
> 
>   Δημοσθένους, Γ´ ᾿Ολυνθιακὸς
> 
> Georgian:
> 
>   From a Unicode conference invitation:
> 
>   გთხოვთ ახლავე გაიაროთ რეგისტრაცია Unicode-ის მეათე საერთაშორისო
>   კონფერენციაზე დასასწრებად, რომელიც გაიმართება 10-12 მარტს,
>   ქ. მაინცში, გერმანიაში. კონფერენცია შეჰკრებს ერთად მსოფლიოს
>   ექსპერტებს ისეთ დარგებში როგორიცაა ინტერნეტი და Unicode-ი,
>   ინტერნაციონალიზაცია და ლოკალიზაცია, Unicode-ის გამოყენება
>   ოპერაციულ სისტემებსა, და გამოყენებით პროგრამებში, შრიფტებში,
>   ტექსტების დამუშავებასა და მრავალენოვან კომპიუტერულ სისტემებში.
> 
> Russian:
> 
>   From a Unicode conference invitation:
> 
>   Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
>   Unicode, которая состоится 10-12 марта 1997 года в Майнце в Германии.
>   Конференция соберет широкий круг экспертов по  вопросам глобального
>   Интернета и Unicode, локализации и интернационализации, воплощению и
>   применению Unicode в различных операционных системах и программных
>   приложениях, шрифтах, верстке и многоязычных компьютерных системах.
> 
> Thai (UCS Level 2):
> 
>   Excerpt from a poetry on The Romance of The Three Kingdoms (a Chinese
>   classic 'San Gua'):
> 
>   [----------------------------|------------------------]
>     ๏ แผ่นดินฮั่นเสื่อมโทรมแสนสังเวช  พระปกเกศกองบู๊กู้ขึ้นใหม่
>   สิบสองกษัตริย์ก่อนหน้าแลถัดไป       สององค์ไซร้โง่เขลาเบาปัญญา
>     ทรงนับถือขันทีเป็นที่พึ่ง           บ้านเมืองจึงวิปริตเป็นนักหนา
>   โฮจิ๋นเรียกทัพทั่วหัวเมืองมา         หมายจะฆ่ามดชั่วตัวสำคัญ
>     เหมือนขับไสไล่เสือจากเคหา      รับหมาป่าเข้ามาเลยอาสัญ
>   ฝ่ายอ้องอุ้นยุแยกให้แตกกัน          ใช้สาวนั้นเป็นชนวนชื่นชวนใจ
>     พลันลิฉุยกุยกีกลับก่อเหตุ          ช่างอาเพศจริงหนาฟ้าร้องไห้
>   ต้องรบราฆ่าฟันจนบรรลัย           ฤๅหาใครค้ำชูกู้บรรลังก์ ฯ
> 
>   (The above is a two-column text. If combining characters are handled
>   correctly, the lines of the second column should be aligned with the
>   | character above.)
> 
> Ethiopian:
> 
>   Proverbs in the Amharic language:
> 
>   ሰማይ አይታረስ ንጉሥ አይከሰስ።
>   ብላ ካለኝ እንደአባቴ በቆመጠኝ።
>   ጌጥ ያለቤቱ ቁምጥና ነው።
>   ደሀ በሕልሙ ቅቤ ባይጠጣ ንጣት በገደለው።
>   የአፍ ወለምታ በቅቤ አይታሽም።
>   አይጥ በበላ ዳዋ ተመታ።
>   ሲተረጉሙ ይደረግሙ።
>   ቀስ በቀስ፥ ዕንቁላል በእግሩ ይሄዳል።
>   ድር ቢያብር አንበሳ ያስር።
>   ሰው እንደቤቱ እንጅ እንደ ጉረቤቱ አይተዳደርም።
>   እግዜር የከፈተውን ጉሮሮ ሳይዘጋው አይድርም።
>   የጎረቤት ሌባ፥ ቢያዩት ይስቅ ባያዩት ያጠልቅ።
>   ሥራ ከመፍታት ልጄን ላፋታት።
>   ዓባይ ማደሪያ የለው፥ ግንድ ይዞ ይዞራል።
>   የእስላም አገሩ መካ የአሞራ አገሩ ዋርካ።
>   ተንጋሎ ቢተፉ ተመልሶ ባፉ።
>   ወዳጅህ ማር ቢሆን ጨርስህ አትላሰው።
>   እግርህን በፍራሽህ ልክ ዘርጋ።
> 
> Runes:
> 
>   ᚻᛖ ᚳᚹᚫᚦ ᚦᚫᛏ ᚻᛖ ᛒᚢᛞᛖ ᚩᚾ ᚦᚫᛗ ᛚᚪᚾᛞᛖ ᚾᚩᚱᚦᚹᛖᚪᚱᛞᚢᛗ ᚹᛁᚦ ᚦᚪ ᚹᛖᛥᚫ
> 
>   (Old English, which transcribed into Latin reads 'He cwaeth that he
>   bude thaem lande northweardum with tha Westsae.' and means 'He said
>   that he lived in the northern land near the Western Sea.')
> 
> Braille:
> 
>   ⡌⠁⠧⠑ ⠼⠁⠒  ⡍⠜⠇⠑⠹⠰⠎ ⡣⠕⠌
> 
>   ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠙⠑⠁⠙⠒ ⠞⠕ ⠃⠑⠛⠔ ⠺⠊⠹⠲ ⡹⠻⠑ ⠊⠎ ⠝⠕ ⠙⠳⠃⠞
>   ⠱⠁⠞⠑⠧⠻ ⠁⠃⠳⠞ ⠹⠁⠞⠲ ⡹⠑ ⠗⠑⠛⠊⠌⠻ ⠕⠋ ⠙⠊⠎ ⠃⠥⠗⠊⠁⠇ ⠺⠁⠎
>   ⠎⠊⠛⠝⠫ ⠃⠹ ⠹⠑ ⠊⠇⠻⠛⠹⠍⠁⠝⠂ ⠹⠑ ⠊⠇⠻⠅⠂ ⠹⠑ ⠥⠝⠙⠻⠞⠁⠅⠻⠂
>   ⠁⠝⠙ ⠹⠑ ⠡⠊⠑⠋ ⠍⠳⠗⠝⠻⠲ ⡎⠊⠗⠕⠕⠛⠑ ⠎⠊⠛⠝⠫ ⠊⠞⠲ ⡁⠝⠙
>   ⡎⠊⠗⠕⠕⠛⠑⠰⠎ ⠝⠁⠍⠑ ⠺⠁⠎ ⠛⠕⠕⠙ ⠥⠏⠕⠝ ⠰⡡⠁⠝⠛⠑⠂ ⠋⠕⠗ ⠁⠝⠹⠹⠔⠛ ⠙⠑
>   ⠡⠕⠎⠑ ⠞⠕ ⠏⠥⠞ ⠙⠊⠎ ⠙⠁⠝⠙ ⠞⠕⠲
> 
>   ⡕⠇⠙ ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲
> 
>   ⡍⠔⠙⠖ ⡊ ⠙⠕⠝⠰⠞ ⠍⠑⠁⠝ ⠞⠕ ⠎⠁⠹ ⠹⠁⠞ ⡊ ⠅⠝⠪⠂ ⠕⠋ ⠍⠹
>   ⠪⠝ ⠅⠝⠪⠇⠫⠛⠑⠂ ⠱⠁⠞ ⠹⠻⠑ ⠊⠎ ⠏⠜⠞⠊⠊⠥⠇⠜⠇⠹ ⠙⠑⠁⠙ ⠁⠃⠳⠞
>   ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲ ⡊ ⠍⠊⠣⠞ ⠙⠁⠧⠑ ⠃⠑⠲ ⠔⠊⠇⠔⠫⠂ ⠍⠹⠎⠑⠇⠋⠂ ⠞⠕
>   ⠗⠑⠛⠜⠙ ⠁ ⠊⠕⠋⠋⠔⠤⠝⠁⠊⠇ ⠁⠎ ⠹⠑ ⠙⠑⠁⠙⠑⠌ ⠏⠊⠑⠊⠑ ⠕⠋ ⠊⠗⠕⠝⠍⠕⠝⠛⠻⠹
>   ⠔ ⠹⠑ ⠞⠗⠁⠙⠑⠲ ⡃⠥⠞ ⠹⠑ ⠺⠊⠎⠙⠕⠍ ⠕⠋ ⠳⠗ ⠁⠝⠊⠑⠌⠕⠗⠎
>   ⠊⠎ ⠔ ⠹⠑ ⠎⠊⠍⠊⠇⠑⠆ ⠁⠝⠙ ⠍⠹ ⠥⠝⠙⠁⠇⠇⠪⠫ ⠙⠁⠝⠙⠎
>   ⠩⠁⠇⠇ ⠝⠕⠞ ⠙⠊⠌⠥⠗⠃ ⠊⠞⠂ ⠕⠗ ⠹⠑ ⡊⠳⠝⠞⠗⠹⠰⠎ ⠙⠕⠝⠑ ⠋⠕⠗⠲ ⡹⠳
>   ⠺⠊⠇⠇ ⠹⠻⠑⠋⠕⠗⠑ ⠏⠻⠍⠊⠞ ⠍⠑ ⠞⠕ ⠗⠑⠏⠑⠁⠞⠂ ⠑⠍⠏⠙⠁⠞⠊⠊⠁⠇⠇⠹⠂ ⠹⠁⠞
>   ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲
> 
>   (The first couple of paragraphs of "A Christmas Carol" by Dickens)
> 
> Compact font selection example text:
> 
>   ABCDEFGHIJKLMNOPQRSTUVWXYZ /0123456789
>   abcdefghijklmnopqrstuvwxyz £©µÀÆÖÞßéöÿ
>   –—‘“”„†•…‰™œŠŸž€ ΑΒΓΔΩαβγδω АБВГДабвгд
>   ∀∂∈ℝ∧∪≡∞ ↑↗↨↻⇣ ┐┼╔╘░►☺♀ fi�⑀₂ἠḂӥẄɐː⍎אԱა
> 
> Greetings in various languages:
> 
>   Hello world, Καλημέρα κόσμε, コンニチハ
> 
> Box drawing alignment tests:                                          █
>                                                                       ▉
>   ╔══╦══╗  ┌──┬──┐  ╭──┬──╮  ╭──┬──╮  ┏━━┳━━┓  ┎┒┏┑   ╷  ╻ ┏┯┓ ┌┰┐    ▊ ╱╲╱╲╳╳╳
>   ║┌─╨─┐║  │╔═╧═╗│  │╒═╪═╕│  │╓─╁─╖│  ┃┌─╂─┐┃  ┗╃╄┙  ╶┼╴╺╋╸┠┼┨ ┝╋┥    ▋ ╲╱╲╱╳╳╳
>   ║│╲ ╱│║  │║   ║│  ││ │ ││  │║ ┃ ║│  ┃│ ╿ │┃  ┍╅╆┓   ╵  ╹ ┗┷┛ └┸┘    ▌ ╱╲╱╲╳╳╳
>   ╠╡ ╳ ╞╣  ├╢   ╟┤  ├┼─┼─┼┤  ├╫─╂─╫┤  ┣┿╾┼╼┿┫  ┕┛┖┚     ┌┄┄┐ ╎ ┏┅┅┓ ┋ ▍ ╲╱╲╱╳╳╳
>   ║│╱ ╲│║  │║   ║│  ││ │ ││  │║ ┃ ║│  ┃│ ╽ │┃  ░░▒▒▓▓██ ┊  ┆ ╎ ╏  ┇ ┋ ▎
>   ║└─╥─┘║  │╚═╤═╝│  │╘═╪═╛│  │╙─╀─╜│  ┃└─╂─┘┃  ░░▒▒▓▓██ ┊  ┆ ╎ ╏  ┇ ┋ ▏
>   ╚══╩══╝  └──┴──┘  ╰──┴──╯  ╰──┴──╯  ┗━━┻━━┛  ▗▄▖▛▀▜   └╌╌┘ ╎ ┗╍╍┛ ┋  ▁▂▃▄▅▆▇█
>                                                ▝▀▘▙▄▟
> 
> 
> 
> On Wed, Jun 26, 2013 at 8:15 AM, Mike Harrison wrote:
> 
> 
>     I've been doing this for a few months now, watching it run, and I've
>     been amazed by the accuracy of this trick. The people I would normally
>     want to talk to at this address don't send mail with UTF-8
>     characters. I run this before pulling up pine via shell to read my mail.
> 
>     A simple shell script in my user directory:
> 
>     grep --color='auto' -P -n "[\x80-\xFF]" Maildir/cur/*
>     grep  -P "[\x80-\xFF]" Maildir/cur/* >bad
>     perl ./nukebad.pl 
> 
>     Nukebad is simple:
> 
>     #!/usr/bin/perl
>     open(IN,"bad") ; while(<IN>){
>      $filename = substr($
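
The quoted Perl is cut off above, so what nukebad.pl does with each file
name is not shown here. As a rough sketch of the same idea in plain shell,
the functions below list the messages that contain any byte in the range
0x80-0xFF and move them aside rather than deleting them. The function names
and the quarantine directory are made up for illustration and are not from
the original post. Two assumptions are worth stating: GNU grep with -P
support is available, and LC_ALL=C is set so that grep compares raw bytes
instead of decoded UTF-8 characters. Using grep -l prints only file names,
which avoids parsing "file:match" output; that matters because file names
in Maildir/cur usually contain a colon themselves.

```shell
#!/bin/sh
# Sketch only: function names and the quarantine directory are hypothetical.

# Print the names of files in directory $1 that contain at least one
# byte in the range 0x80-0xFF.
find_high_bit_files() {
    # -l: print only names of matching files
    # -P: Perl-style regex, needed for the \xHH escapes (GNU grep)
    # LC_ALL=C: match raw bytes instead of decoded UTF-8 characters
    LC_ALL=C grep -l -P '[\x80-\xFF]' "$1"/* 2>/dev/null
}

# Move every such file from directory $1 into directory $2.
quarantine_high_bit_files() {
    maildir_cur=$1
    quarantine=$2
    mkdir -p "$quarantine"
    find_high_bit_files "$maildir_cur" | while IFS= read -r f; do
        mv -- "$f" "$quarantine"/
    done
}

# Example call (left commented out so nothing is moved by accident):
# quarantine_high_bit_files "$HOME/Maildir/cur" "$HOME/Maildir-quarantine"
```

Moving instead of deleting leaves room to recover a false positive. Note
that a byte-level test like this only catches messages sent as raw 8-bit
text; mail whose non-ASCII content is wrapped in quoted-printable or base64
is pure ASCII on disk and would not match.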

===============================================================
From: Mike Harrison
------------------------------------------------------
That that worked, while literally in a green screen, with black background,
SSH'd into a server and using PINE for email is impressive. But then, I've
been cutting and pasting French and Arabic into a system lately... Building
a system for Morocco.