RedGrittyBrick

Some Common Character-Sets

Here is a comparison of some character sets that are, or have been, commonly used in English speaking countries such as the USA and UK.

They can be grouped into three categories: 7-bit character sets like ASCII, with numeric values (code points) running from 0 to 127; 8-bit character sets like “Windows Latin–1” and “ISO 8859–1 Latin–1”, with code points from 0 to 255; and multi-byte character sets such as the Unicode character set.

Note that Unicode has code points running from 0 up to 1,114,111 (hexadecimal 10FFFF), so 16 bits is insufficient to represent all Unicode characters. Typically Unicode is encoded in UTF–8, which uses a variable number of bytes (one to four) to represent each character. The tables below only show the first 256 characters.
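As a rough illustration of the variable-length encoding, the following Python sketch encodes a few sample characters as UTF–8 and reports how many bytes each one needs. The sample characters and the helper name `utf8_length` are chosen here for illustration only.

```python
# Sample characters: one from ASCII, one from Latin-1, and two from further
# up the Unicode range. These are illustrative choices, not from the tables.
samples = ["A", "é", "€", "😀"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the byte count, and the bytes in hexadecimal.
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_length(ch)} byte(s) -> {encoded.hex(' ')}")
```

Characters in the ASCII range take one byte each, which is why plain English text is the same size in ASCII and in UTF–8.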

Key

Column Description
ASC ASCII
PC IBM PC ROM
437 Code Page 437
850 Code Page 850
Win Windows Latin–1
Lat1 ISO 8859–1 Latin–1
Uni Unicode / ISO 10646

Characters

Dec Hex ASC PC 437 850 Win Lat1 Uni
0 0000 NUL  NUL NUL NUL NUL NUL
1 0001 SOH SOH SOH SOH SOH SOH
2 0002 STX STX STX STX STX STX
3 0003 ETX ETX ETX ETX ETX ETX
4 0004 EOT EOT EOT EOT EOT EOT
5 0005 ENQ ENQ ENQ ENQ ENQ ENQ
6 0006 ACK ACK ACK ACK ACK ACK
7 0007 BEL BEL BEL BEL BEL BEL
8 0008 BS BS BS BS BS BS
9 0009 HT HT HT HT HT HT
10 000A LF LF LF LF LF LF
11 000B VT VT VT VT VT VT
12 000C FF FF FF FF FF FF
13 000D CR CR CR CR CR CR
14 000E SO SO SO SO SO SO
15 000F SI SI SI SI SI SI
16 0010 DLE DLE DLE DLE DLE DLE
17 0011 DC1 DC1 DC1 DC1 DC1 DC1
18 0012 DC2 DC2 DC2 DC2 DC2 DC2
19 0013 DC3 DC3 DC3 DC3 DC3 DC3
20 0014 DC4 DC4 DC4 DC4 DC4 DC4
21 0015 NAK § NAK NAK NAK NAK NAK
22 0016 SYN SYN SYN SYN SYN SYN
23 0017 ETB ETB ETB ETB ETB ETB
24 0018 CAN CAN CAN CAN CAN CAN
25 0019 EM EM EM EM EM EM
26 001A SUB SUB SUB SUB SUB SUB
27 001B ESC ESC ESC ESC ESC ESC
28 001C FS FS FS FS FS FS
29 001D GS GS GS GS GS GS
30 001E RS RS RS RS RS RS
31 001F US US US US US US

Dec Hex ASC PC 437 850 Win Lat1 Uni
32 0020              
33 0021 ! ! ! ! ! ! !
34 0022 " " " " " " "
35 0023 # # # # # # #
36 0024 $ $ $ $ $ $ $
37 0025 % % % % % % %
38 0026 & & & & & & &
39 0027 ' ' ' ' ' ' '
40 0028 ( ( ( ( ( ( (
41 0029 ) ) ) ) ) ) )
42 002A * * * * * * *
43 002B + + + + + + +
44 002C , , , , , , ,
45 002D - - - - - - -
46 002E . . . . . . .
47 002F / / / / / / /
48 0030 0 0 0 0 0 0 0
49 0031 1 1 1 1 1 1 1
50 0032 2 2 2 2 2 2 2
51 0033 3 3 3 3 3 3 3
52 0034 4 4 4 4 4 4 4
53 0035 5 5 5 5 5 5 5
54 0036 6 6 6 6 6 6 6
55 0037 7 7 7 7 7 7 7
56 0038 8 8 8 8 8 8 8
57 0039 9 9 9 9 9 9 9
58 003A : : : : : : :
59 003B ; ; ; ; ; ; ;
60 003C < < < < < < <
61 003D = = = = = = =
62 003E > > > > > > >
63 003F ? ? ? ? ? ? ?

Dec Hex ASC PC 437 850 Win Lat1 Uni
64 0040 @ @ @ @ @ @ @
65 0041 A A A A A A A
66 0042 B B B B B B B
67 0043 C C C C C C C
68 0044 D D D D D D D
69 0045 E E E E E E E
70 0046 F F F F F F F
71 0047 G G G G G G G
72 0048 H H H H H H H
73 0049 I I I I I I I
74 004A J J J J J J J
75 004B K K K K K K K
76 004C L L L L L L L
77 004D M M M M M M M
78 004E N N N N N N N
79 004F O O O O O O O
80 0050 P P P P P P P
81 0051 Q Q Q Q Q Q Q
82 0052 R R R R R R R
83 0053 S S S S S S S
84 0054 T T T T T T T
85 0055 U U U U U U U
86 0056 V V V V V V V
87 0057 W W W W W W W
88 0058 X X X X X X X
89 0059 Y Y Y Y Y Y Y
90 005A Z Z Z Z Z Z Z
91 005B [ [ [ [ [ [ [
92 005C \ \ \ \ \ \ \
93 005D ] ] ] ] ] ] ]
94 005E ^ ^ ^ ^ ^ ^ ^
95 005F _ _ _ _ _ _ _

Dec Hex ASC PC 437 850 Win Lat1 Uni
96 0060 ` ` ` ` ` ` `
97 0061 a a a a a a a
98 0062 b b b b b b b
99 0063 c c c c c c c
100 0064 d d d d d d d
101 0065 e e e e e e e
102 0066 f f f f f f f
103 0067 g g g g g g g
104 0068 h h h h h h h
105 0069 i i i i i i i
106 006A j j j j j j j
107 006B k k k k k k k
108 006C l l l l l l l
109 006D m m m m m m m
110 006E n n n n n n n
111 006F o o o o o o o
112 0070 p p p p p p p
113 0071 q q q q q q q
114 0072 r r r r r r r
115 0073 s s s s s s s
116 0074 t t t t t t t
117 0075 u u u u u u u
118 0076 v v v v v v v
119 0077 w w w w w w w
120 0078 x x x x x x x
121 0079 y y y y y y y
122 007A z z z z z z z
123 007B { { { { { { {
124 007C | | | | | | |
125 007D } } } } } } }
126 007E ~ ~ ~ ~ ~ ~ ~
127 007F DEL DEL DEL DEL DEL

Dec Hex ASC PC 437 850 Win Lat1 Uni
128 0080 Ç Ç Ç XXX
129 0081 ü ü ü XXX
130 0082 é é é BPH
131 0083 â â â ƒ NBH
132 0084 ä ä ä IND
133 0085 à à à NEL
134 0086 å å å SSA
135 0087 ç ç ç ESA
136 0088 ê ê ê ˆ HTS
137 0089 ë ë ë HTJ
138 008A è è è Š VTS
139 008B ï ï ï PLD
140 008C î î î Œ PLU
141 008D ì ì ì RI
142 008E Ä Ä Ä Ž SS2
143 008F Å Å Å SS3
144 0090 É É É DCS
145 0091 æ æ æ PU1
146 0092 Æ Æ Æ PU2
147 0093 ô ô ô STS
148 0094 ö ö ö CCH
149 0095 ò ò ò MW
150 0096 û û û SPA
151 0097 ù ù ù EPA
152 0098 ÿ ÿ ÿ ˜ SOS
153 0099 Ö Ö Ö XXX
154 009A Ü Ü Ü š SCI
155 009B ¢ ¢ ø CSI
156 009C £ £ £ œ ST
157 009D ¥ ¥ Ø OSC
158 009E × ž PM
159 009F ƒ ƒ ƒ Ÿ APC

Dec Hex ASC PC 437 850 Win Lat1 Uni
160 00A0 á á á     NBSP
161 00A1 í í í ¡ ¡ ¡
162 00A2 ó ó ó ¢ ¢ ¢
163 00A3 ú ú ú £ £ £
164 00A4 ñ ñ ñ ¤ ¤ ¤
165 00A5 Ñ Ñ Ñ ¥ ¥ ¥
166 00A6 ª ª ª ¦ ¦ ¦
167 00A7 º º º § § §
168 00A8 ¿ ¿ ¿ ¨ ¨ ¨
169 00A9 ® © © ©
170 00AA ¬ ¬ ¬ ª ª ª
171 00AB ½ ½ ½ « « «
172 00AC ¼ ¼ ¼ ¬ ¬ ¬
173 00AD ¡ ¡ ¡ ­ ­ SHY
174 00AE « « « ® ® ®
175 00AF » » » ¯ ¯ ¯
176 00B0 ° ° °
177 00B1 ± ± ±
178 00B2 ² ² ²
179 00B3 ³ ³ ³
180 00B4 ´ ´ ´
181 00B5 Á µ µ µ
182 00B6 Â
183 00B7 À · · ·
184 00B8 © ¸ ¸ ¸
185 00B9 ¹ ¹ ¹
186 00BA º º º
187 00BB » » »
188 00BC ¼ ¼ ¼
189 00BD ¢ ½ ½ ½
190 00BE ¥ ¾ ¾ ¾
191 00BF ¿ ¿ ¿

Dec Hex ASC PC 437 850 Win Lat1 Uni
192 00C0 À À À
193 00C1 Á Á Á
194 00C2 Â Â Â
195 00C3 Ã Ã Ã
196 00C4 Ä Ä Ä
197 00C5 Å Å Å
198 00C6 ã Æ Æ Æ
199 00C7 Ã Ç Ç Ç
200 00C8 È È È
201 00C9 É É É
202 00CA Ê Ê Ê
203 00CB Ë Ë Ë
204 00CC Ì Ì Ì
205 00CD Í Í Í
206 00CE Î Î Î
207 00CF ¤ Ï Ï Ï
208 00D0 ð Ð Ð Ð
209 00D1 Ð Ñ Ñ Ñ
210 00D2 Ê Ò Ò Ò
211 00D3 Ë Ó Ó Ó
212 00D4 È Ô Ô Ô
213 00D5 ı Õ Õ Õ
214 00D6 Í Ö Ö Ö
215 00D7 Î × × ×
216 00D8 Ï Ø Ø Ø
217 00D9 Ù Ù Ù
218 00DA Ú Ú Ú
219 00DB Û Û Û
220 00DC Ü Ü Ü
221 00DD ¦ Ý Ý Ý
222 00DE Ì Þ Þ Þ
223 00DF ß ß ß

Dec Hex ASC PC 437 850 Win Lat1 Uni
224 00E0 α α Ó à à à
225 00E1 β β ß á á á
226 00E2 Γ Γ Ô â â â
227 00E3 π π Ò ã ã ã
228 00E4 Σ Σ õ ä ä ä
229 00E5 σ σ Õ å å å
230 00E6 μ μ µ æ æ æ
231 00E7 τ τ þ ç ç ç
232 00E8 Φ Φ Þ è è è
233 00E9 Θ Θ Ú é é é
234 00EA Ω Ω Û ê ê ê
235 00EB δ δ Ù ë ë ë
236 00EC ý ì ì ì
237 00ED φ φ Ý í í í
238 00EE ε ε ¯ î î î
239 00EF ´ ï ï ï
240 00F0 ­ ð ð ð
241 00F1 ± ± ± ñ ñ ñ
242 00F2 ò ò ò
243 00F3 ¾ ó ó ó
244 00F4 ô ô ô
245 00F5 § õ õ õ
246 00F6 ÷ ÷ ÷ ö ö ö
247 00F7 ¸ ÷ ÷ ÷
248 00F8 ° ° ° ø ø ø
249 00F9 ¨ ù ù ù
250 00FA · · · ú ú ú
251 00FB ¹ û û û
252 00FC ² ² ³ ü ü ü
253 00FD ² ý ý ý
254 00FE þ þ þ
255 00FF - - NBSP. ÿ ÿ ÿ
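The differences between the columns can be checked by decoding the same byte value under several encodings. The sketch below uses Python's built-in codecs; the mapping from column names to the codec names `cp437`, `cp850`, `cp1252` and `latin-1` is an assumption about which Python codec best matches each column, and the helper `decode_everywhere` is hypothetical.

```python
# Assumed correspondence between the table's columns and Python codec names.
CODECS = {
    "437": "cp437",
    "850": "cp850",
    "Win": "cp1252",
    "Lat1": "latin-1",
}


def decode_everywhere(value: int) -> dict:
    """Decode one byte value under each codec; undefined bytes become U+FFFD."""
    raw = bytes([value])
    return {
        column: raw.decode(codec, errors="replace")
        for column, codec in CODECS.items()
    }


# 0x41 is in the ASCII range, so every codec agrees on it.
# 0xE9 is in the upper half, where the code pages diverge.
for value in (0x41, 0xE9):
    print(f"{value:#04x}", decode_everywhere(value))
```

For byte 0xE9 this gives a Greek capital theta under code page 437, Ú under code page 850, and é under both Windows Latin–1 and ISO 8859–1, matching row 233 of the table.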

Notes

Glossary of Control Characters

There are a large number of control characters which originally were used to organise transmission of data and to provide control over the movement of parts in mechanical printers. Few of these are still used for their original purpose.
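To see which of the first 256 Unicode code points are classed as control characters, one option is Python's standard `unicodedata` module, which reports the general category `Cc` for them. This is only a sketch; the helper name `is_control` is made up for the example.

```python
import unicodedata


def is_control(code_point: int) -> bool:
    """True if Unicode puts this code point in general category Cc (control)."""
    return unicodedata.category(chr(code_point)) == "Cc"


# Collect the control characters among the first 256 code points.
controls = [cp for cp in range(256) if is_control(cp)]

print(len(controls), "control characters")
print("first:", hex(controls[0]), "last:", hex(controls[-1]))
```

The result covers 0 to 31, 127, and 128 to 159. NBSP (160) and SHY (173) are not in category `Cc`, although they are listed in the glossary below for convenience.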

NUL
Padding character.
SOH
Start of Header.
STX
Start of Text. See ETX.
ETX
End of Text. See STX. Also known as Ctrl+C and used to interrupt a program.
EOT
End of Transmission. Also known as Ctrl+D and used to signal end of file or end of data.
ENQ
Enquiry. See ACK, NAK.
ACK
Acknowledgement. See ENQ, NAK.
BEL
Bell. Audible signal. Also known as Ctrl+G.
BS
Backspace. Moves print position one place left. Originally used on printers for overprinting one character on top of another, e.g. “O BS /” makes Ø. Also known as Ctrl+H.
HT
Horizontal Tab. Advances print position rightwards, typically to the next column-position that is a multiple of 8 character-widths.
LF
Line Feed. Advances print position down one line. See CR.
VT
Vertical Tab.
FF
Form Feed. Advance print position to top of next sheet/page.
CR
Carriage Return. Move print position back to left margin.
SO
Shift Out. Switches to an alternate character set, often one used for graphics characters rather than for alphabetic characters. See SI.
SI
Shift In. Returns to normal character set. See SO.
DLE
Data Link Escape. Following octets are data, not characters.
DC1
Device Control 1. Also known as X-On. Resume operation. See DC3/X-Off.
DC2
Device Control 2.
DC3
Device Control 3. Also known as X-Off. Suspend operation. Used for communications flow control: it tells the other end to temporarily stop sending data until this end has caught up. See DC1/X-On.
DC4
Device Control 4.
NAK
Negative Acknowledgement. Not ready or error in prior block. See ENQ, ACK.
SYN
Synchronous Idle. Used in synchronous transmission.
ETB
End of Transmission Block.
CAN
Cancel. Previously sent data is erroneous and should be discarded.
EM
End of Medium. For example, end of paper tape.
SUB
Substitute. Used to mark the end of useful data in the last block where fixed-size blocks are used. Sometimes used as a filler for unused space in a block. Also known as Ctrl+Z, which is used to signal a program to suspend execution.
ESC
Escape. Often used as the start of an “Escape Sequence”. See CSI.
FS
File Separator. One of the hierarchical data separators. See GS, RS, US.
GS
Group Separator. One of the hierarchical data separators. See FS, RS, US.
RS
Record Separator. One of the hierarchical data separators. See FS, GS, US.
US
Unit Separator. One of the hierarchical data separators. See FS, GS, RS.
XXX
Unused?
BPH
Break Permitted Here. Indicates that a line-break is allowed here.
NBH
No Break Here. Follows a character that is not to be separated by a line-break from the following character.
IND
Index. Deprecated. Moves active position down one line. Compare LF.
NEL
Next Line. Compare CR, LF.
SSA
Start of Selected Area. See ESA.
ESA
End of Selected Area. See SSA.
HTS
Character Tabulation Set. Sets a tab-position at current location. See HT.
HTJ
Horizontal Tabulation with Justification. Moves preceding text (from prior tab position) to right-align with next tab position.
VTS
Line Tabulation Set. Sets a vertical tabulation stop at current position. See VT.
PLD
Partial Line Down/forward. Used for subscripts or to end PLU.
PLU
Partial Line Up/backward. Used for superscripts or to end PLD.
RI
Reverse Line Feed. See LF. Moves to prior line, same horizontal position.
SS2
Single Shift Two. Switch to character set G2.
SS3
Single Shift Three. Switch to character set G3.
DCS
Device Control String. Ended by ST. Used to send commands or a status report.
PU1
Private Use One.
PU2
Private Use Two.
STS
Set Transmit State. Indicates that data is available for transmission.
CCH
Cancel Character. Ignore previous character. Destructive backspace. See BS.
MW
Message Waiting.
SPA
Start of Guarded Area. Following characters not to be altered, transmitted or (optionally) erased. See EPA.
EPA
End of Guarded Area. See SPA.
SOS
Start of String.
SCI
Single Character Introducer. Purpose not fully defined.
CSI
Control Sequence Introducer. Used to indicate the start of a group of characters that are not to be printed or displayed but which are to be used to move the cursor, change colours or perform some other change to the display state. See ESC.
ST
String Terminator. See APC, DCS, OSC, PM, SOS.
OSC
Operating System Command.
PM
Privacy Message.
APC
Application Program Command.
NBSP
No-Break Space. Used to join two words that should not be split across the end of a line.
SHY
Soft Hyphen. Indicates where a word can be broken across a line with a hyphen used to indicate the break.
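Two details from the glossary can be sketched in a few lines of Python: the Ctrl+letter aliases given for ETX, EOT, BEL, BS and SUB, and the "next multiple of 8" rule described for HT. The sketch assumes the conventional terminal behaviour in which Ctrl+letter yields the letter's code with only its low five bits kept, and it counts columns from 0. The function names `ctrl` and `next_tab_stop` are hypothetical.

```python
def ctrl(letter: str) -> int:
    """Code produced by Ctrl+<letter>, assuming the usual low-five-bits mapping."""
    return ord(letter.upper()) & 0x1F


def next_tab_stop(column: int, width: int = 8) -> int:
    """Next column that is a multiple of `width`, counting columns from 0."""
    return (column // width + 1) * width


# Ctrl aliases mentioned in the glossary, shown with their decimal codes.
for letter, name in [("C", "ETX"), ("D", "EOT"), ("G", "BEL"), ("H", "BS"), ("Z", "SUB")]:
    print(f"Ctrl+{letter} -> {ctrl(letter):2d} ({name})")

# A tab typed at column 5 moves the print position to column 8.
print(next_tab_stop(5))

# str.expandtabs applies the same rule when converting tabs to spaces.
print(repr("ab\tc".expandtabs(8)))
```

The decimal codes printed for the Ctrl aliases line up with the Dec column of the first table.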