n.b. Any UTF-16 or UTF-32 file (UTF-8 doesn't need one) should start with a BOM, U+FEFF, encoded the same way as every other code point in the file. RFC 2781 recommends that, if the BOM is missing, big-endian should be assumed. But...
A lot of software assumes little-endian, because for a while Windows programmers had not seen anything else. Sigh.
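Here is a minimal Python sketch of the RFC 2781 rule. It checks for a BOM and, if there isn't one, falls back to big-endian instead of relying on whatever default a particular library or platform uses. `decode_utf16` is a hypothetical helper name, not a standard API, and the sketch handles only UTF-16, not UTF-32.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, honouring a leading BOM if there is one.

    Hypothetical helper: with no BOM, follow RFC 2781 and assume big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # b'\xfe\xff'
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # b'\xff\xfe'
        return data[2:].decode("utf-16-le")
    # No BOM: RFC 2781 says big-endian should be assumed.
    return data.decode("utf-16-be")


if __name__ == "__main__":
    print(decode_utf16(b"\xff\xfe\xa3\x00"))  # LE with BOM
    print(decode_utf16(b"\x00\xa3"))          # no BOM, read as BE
```

Given the bytes 0x00, 0xA3 with no BOM, this returns "£". A decoder that assumes little-endian would read the same two bytes as U+A300.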
Character | Unicode | UTF-8 encoding | UTF-16LE encoding | Notes |
---|---|---|---|---|
(invisible) | U+FEFF | 0xEF, 0xBB, 0xBF | 0xFF, 0xFE | BOM (Byte Order Mark); also zero-width no-break space |
£ | U+00A3 | 0xC2, 0xA3 | 0xA3, 0x00 | GBP |
€ | U+20AC | 0xE2, 0x82, 0xAC | 0xAC, 0x20 | Euro |
¥ | U+00A5 | 0xC2, 0xA5 | 0xA5, 0x00 | Yen |
₹ | U+20B9 | 0xE2, 0x82, 0xB9 | 0xB9, 0x20 | Rupee |
¬ | U+00AC | 0xC2, 0xAC | 0xAC, 0x00 | Logical NOT in languages such as PL/I |
« | U+00AB | 0xC2, 0xAB | 0xAB, 0x00 | Opening French quotation mark (guillemet) |
» | U+00BB | 0xC2, 0xBB | 0xBB, 0x00 | Closing French quotation mark (guillemet) |
μ | U+03BC | 0xCE, 0xBC | 0xBC, 0x03 | Greek mu |
µ | U+00B5 | 0xC2, 0xB5 | 0xB5, 0x00 | Micro (similar to mu) |
Ω | U+03A9 | 0xCE, 0xA9 | 0xA9, 0x03 | Greek capital Omega |
Ω | U+2126 | 0xE2, 0x84, 0xA6 | 0x26, 0x21 | Ohm sign "Resistance is futile!" (similar to Greek capital Omega) |
π | U+03C0 | 0xCF, 0x80 | 0xC0, 0x03 | Greek Small Pi (maths constant π=3.14159265358979323846...) |
𓂸 | U+130B8 | 0xF0, 0x93, 0x82, 0xB8 | 0x0C, 0xD8, 0xB8, 0xDC | Egyptian hieroglyph: penis |
👨 | U+1F468 | 0xF0, 0x9F, 0x91, 0xA8 | 0x3D, 0xD8, 0x68, 0xDC | Man |
🚀 | U+1F680 | 0xF0, 0x9F, 0x9A, 0x80 | 0x3D, 0xD8, 0x80, 0xDE | Rocket |
(invisible) | U+200D | 0xE2, 0x80, 0x8D | 0x0D, 0x20 | Zero Width Joiner (ZWJ); joins adjacent emoji into one glyph where supported |
👨‍🚀 | U+1F468 U+200D U+1F680 | 0xF0, 0x9F, 0x91, 0xA8, 0xE2, 0x80, 0x8D, 0xF0, 0x9F, 0x9A, 0x80 | 0x3D, 0xD8, 0x68, 0xDC, 0x0D, 0x20, 0x3D, 0xD8, 0x80, 0xDE | Male astronaut (Man + ZWJ + Rocket) |
🍷 | U+1F377 | 0xF0, 0x9F, 0x8D, 0xB7 | 0x3C, 0xD8, 0x77, 0xDF | Wine glass |
🧸 | U+1F9F8 | 0xF0, 0x9F, 0xA7, 0xB8 | 0x3E, 0xD8, 0xF8, 0xDD | Teddy bear |
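The rows above can be reproduced and checked with a short Python sketch. `utf16_surrogates` shows how the four-byte UTF-16 entries are derived: subtract 0x10000, then split the 20-bit remainder into a high and a low surrogate. `row` uses Python's built-in `str.encode` to produce the first four columns in the same `0x..` format. The helper names are illustrative and do not come from any library.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    v = cp - 0x10000               # 20-bit value
    high = 0xD800 + (v >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low


def hex_bytes(b: bytes) -> str:
    """Format bytes the way the table does: '0xF0, 0x9F, ...'."""
    return ", ".join(f"0x{x:02X}" for x in b)


def row(text: str) -> str:
    """Build the first four table columns for a character or sequence."""
    cps = " ".join(f"U+{ord(c):04X}" for c in text)
    utf8 = hex_bytes(text.encode("utf-8"))
    utf16le = hex_bytes(text.encode("utf-16-le"))  # no BOM is emitted
    return f"{text} | {cps} | {utf8} | {utf16le}"


if __name__ == "__main__":
    high, low = utf16_surrogates(0x130B8)
    print(f"U+130B8 -> {high:04X} {low:04X}")
    for s in ["\u00a3", "\u20ac", "\U000130b8", "\U0001F468\u200d\U0001F680"]:
        print(row(s))
```

For U+130B8 the pair is D80C DCB8, which gives the little-endian bytes 0x0C, 0xD8, 0xB8, 0xDC. In the astronaut row, the ZWJ contributes 0x0D, 0x20 between the two surrogate pairs.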