QR code encoding

The format information records two things: the error correction level and the mask pattern used for the symbol. Masking is used to break up patterns in the data area that might confuse a scanner, such as large blank areas or misleading features that look like the locator marks.

The mask patterns are defined on a grid that is repeated as necessary to cover the whole symbol. Modules corresponding to the dark areas of the mask are inverted. The format information is protected from errors with a BCH code[1], and two complete copies are included in each QR symbol.
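The inversion step described above can be sketched in a few lines. This is a minimal illustration rather than a full implementation: the condition used in `mask_bit` (dark where row plus column is even) is assumed here as one of the standard mask patterns, and the `is_data` grid is a hypothetical helper marking which modules belong to the data area.

```python
def mask_bit(row, col):
    """Assumed mask condition: the mask is dark where (row + col) is even."""
    return (row + col) % 2 == 0

def apply_mask(modules, is_data):
    """Return a copy of the grid with data modules inverted under dark mask cells.

    modules: list of rows, each a list of 0/1 values (1 = dark module).
    is_data: same shape, True where the module belongs to the data area.
    """
    return [
        [bit ^ 1 if is_data[r][c] and mask_bit(r, c) else bit
         for c, bit in enumerate(row)]
        for r, row in enumerate(modules)
    ]

blank = [[0] * 4 for _ in range(4)]          # a large blank area
all_data = [[True] * 4 for _ in range(4)]    # hypothetical: every module is data
masked = apply_mask(blank, all_data)
print(masked[0], masked[1])                  # [1, 0, 1, 0] [0, 1, 0, 1]
```

Because masking is a plain XOR, applying the same mask a second time restores the original modules, which is how a reader undoes it.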

The message data is placed from right to left in a zigzag pattern, as shown in Figure 2. In larger symbols, this is complicated by the presence of the alignment patterns and the use of multiple interleaved error-correction blocks.

Figure 1 - Meaning of format information

In Figure 1, the format information is protected by a (15,5) BCH code, which can correct up to 3 bit errors. The code is 15 bits long: 5 data bits (2 for the EC level and 3 for the mask pattern) plus 10 extra bits for error correction. The format mask for these 15 bits is [101010000010010]. Note that the figure maps the masked values directly to their meanings, in contrast to image 4 "Levels & Masks", where the mask pattern numbers are the result of putting the 3rd to 5th mask bits, [101], over the 3rd to 5th format info bits of the QR code.
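The construction of the 15-bit format word can be sketched as follows. The format mask is the one quoted above; the generator polynomial `0b10100110111` is not given in the text and is assumed here as the usual generator for this (15,5) code. The function names are illustrative only, and the decoder is a brute-force nearest-codeword search chosen for brevity, not an algebraic BCH decoder.

```python
FORMAT_GENERATOR = 0b10100110111      # assumed generator polynomial (degree 10)
FORMAT_MASK = 0b101010000010010       # format mask quoted in the text

def encode_format(ec_bits, mask_pattern):
    """Build the 15-bit masked format word from 2 EC bits and 3 mask bits."""
    data = (ec_bits << 3) | mask_pattern          # 5 data bits
    remainder = data << 10                        # make room for 10 check bits
    for shift in range(4, -1, -1):                # polynomial division over GF(2)
        if remainder & (1 << (shift + 10)):
            remainder ^= FORMAT_GENERATOR << shift
    return ((data << 10) | remainder) ^ FORMAT_MASK

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def decode_format(word):
    """Return (ec_bits, mask_pattern) of the nearest valid word, or None."""
    best = min(range(32), key=lambda d: hamming(encode_format(d >> 3, d & 7), word))
    if hamming(encode_format(best >> 3, best & 7), word) > 3:
        return None                               # more than 3 bit errors
    return best >> 3, best & 7

word = encode_format(0b01, 0b000)
print(format(word, "015b"))                       # 111011111000100
damaged = word ^ 0b100000100000001                # flip 3 of the 15 bits
print(decode_format(damaged))                     # (1, 0)
```

With only 32 valid words, trying every candidate is cheap, and it makes the "up to 3 bit errors" limit easy to see: any word within distance 3 of a valid one decodes to it.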

Figure 2 - Message placement within a QR symbol

The message is encoded using a (255,249) Reed-Solomon code (shortened to a (24,18) code by using "padding"), which can correct up to 3 byte errors.

Figure 3 - Larger symbol illustrating interleaved blocks

The message has 26 data bytes and is encoded using two Reed-Solomon code blocks. Each block is a (255,233) Reed-Solomon code (shortened to a (35,13) code) containing 13 data bytes followed by 22 "parity" bytes, and can correct up to 11 byte errors in a single burst. The two 35-byte blocks are interleaved, for a total of 70 code bytes, so that the symbol can correct up to 22 byte errors in a single burst. The symbol achieves level H error correction.
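The effect of interleaving on burst errors can be checked with a small sketch. It does not perform any Reed-Solomon arithmetic; each byte is simply labelled with the block it belongs to, and the helper names are illustrative. It assumes equal-length blocks taken one byte at a time in turn.

```python
def interleave(blocks):
    """Take one byte from each block in turn (blocks must have equal length)."""
    return [block[i] for i in range(len(blocks[0])) for block in blocks]

# Label each byte with its block instead of using real data.
block_a = ["A"] * 35
block_b = ["B"] * 35
stream = interleave([block_a, block_b])       # A B A B ...

def worst_case_hits(stream, burst_len):
    """Largest number of bytes any single block loses to one burst."""
    worst = 0
    for start in range(len(stream) - burst_len + 1):
        burst = stream[start:start + burst_len]
        worst = max(worst, burst.count("A"), burst.count("B"))
    return worst

print(len(stream))                    # 70
print((255 - 233) // 2)               # 11 correctable byte errors per block
print(worst_case_hits(stream, 22))    # 11, still within each block's limit
print(worst_case_hits(stream, 23))    # 12, one more than a block can correct
```

A burst of 22 consecutive bytes is split evenly between the two blocks, so neither block sees more than the 11 errors it can correct.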

In general, a QR encoding is a sequence of segments, each introduced by a 4-bit indicator, with a payload whose length depends on the indicated mode (e.g. the payload length in byte encoding depends on the first byte).[2]

| Mode indicator | Description | Typical structure '[ type : sizes in bits ]' |
| --- | --- | --- |
| 0001 | Numeric | [0001 : 4] [ Character Count Indicator : variable ] [ Data Bit Stream : 3 1⁄3 × charcount ] |
| 0010 | Alphanumeric | [0010 : 4] [ Character Count Indicator : variable ] [ Data Bit Stream : 5 1⁄2 × charcount ] |
| 0100 | Byte encoding | [0100 : 4] [ Character Count Indicator : variable ] [ Data Bit Stream : 8 × charcount ] |
| 1000 | Kanji encoding | [1000 : 4] [ Character Count Indicator : variable ] [ Data Bit Stream : 13 × charcount ] |
| 0011 | Structured append | [0011 : 4] [ Symbol Position : 4 ] [ Total Symbols : 4 ] [ Parity : 8 ] |
| 0111 | ECI | [0111 : 4] [ ECI Assignment number : variable ] |
| 0101 | FNC1 in first position | [0101 : 4] [ Numeric/Alphanumeric/Byte/Kanji payload : variable ] |
| 1001 | FNC1 in second position | [1001 : 4] [ Application Indicator : 8 ] [ Numeric/Alphanumeric/Byte/Kanji payload : variable ] |
| 0000 | End of message | [0000 : 4] |

Note:

  • The size of the Character Count Indicator depends on the encoding mode and on the symbol version (which determines how many modules the QR code has).
  • ECI Assignment number Size:
    • 8 × 1 bits if ECI Assignment Bitstream starts with '0'
    • 8 × 2 bits if ECI Assignment Bitstream starts with '10'
    • 8 × 3 bits if ECI Assignment Bitstream starts with '110'
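The prefix rule for the ECI Assignment number can be expressed as a small helper. This is a sketch under the assumption that the first 8 bits of the assignment bitstream are available as an integer; the function name is hypothetical.

```python
def eci_assignment_bits(first_byte):
    """Size in bits of the ECI Assignment number, from its leading bits."""
    if (first_byte & 0b10000000) == 0b00000000:   # starts with '0'
        return 8 * 1
    if (first_byte & 0b11000000) == 0b10000000:   # starts with '10'
        return 8 * 2
    if (first_byte & 0b11100000) == 0b11000000:   # starts with '110'
        return 8 * 3
    raise ValueError("prefix is not one of '0', '10' or '110'")

print(eci_assignment_bits(0b00011010))   # 8
print(eci_assignment_bits(0b10000001))   # 16
print(eci_assignment_bits(0b11000001))   # 24
```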

Four-bit indicators are used to select the encoding mode and convey other information.

Encoding modes

| Indicator | Meaning |
| --- | --- |
| 0001 | Numeric encoding (10 bits per 3 digits) |
| 0010 | Alphanumeric encoding (11 bits per 2 characters) |
| 0100 | Byte encoding (8 bits per character) |
| 1000 | Kanji encoding (13 bits per character) |
| 0011 | Structured append (used to split a message across multiple QR symbols) |
| 0111 | Extended Channel Interpretation[3] (select alternate character set or encoding) |
| 0101 | FNC1 in first position (see Code 128[4] for more information) |
| 1001 | FNC1 in second position |
| 0000 | End of message (Terminator) |
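As an illustration of the "10 bits per 3 digits" rule for numeric encoding, the sketch below packs a digit string into a bit string. The handling of a final group of two digits (7 bits) or one digit (4 bits) is not stated above and is an assumption about how leftover digits are stored; the function name is hypothetical, and the mode indicator and length field are not included.

```python
def encode_numeric(digits):
    """Pack decimal digits into bits: 10 bits for each group of 3 digits."""
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("numeric mode accepts only the digits 0-9")
    widths = {3: 10, 2: 7, 1: 4}      # assumed widths for the shorter final group
    bits = []
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]
        bits.append(format(int(group), "0{}b".format(widths[len(group)])))
    return "".join(bits)

print(encode_numeric("01234567"))
# 012 -> 0000001100, 345 -> 0101011001, 67 -> 1000011
```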

Encoding modes can be mixed as needed within a QR symbol (e.g., for a URL that contains a long string of alphanumeric characters).

[ Mode Indicator][ Mode bitstream ] --> [ Mode Indicator][ Mode bitstream ] --> etc... --> [ 0000 End of message (Terminator) ]

After every indicator that selects an encoding mode is a length field that tells how many characters are encoded in that mode. The number of bits in the length field depends on the encoding and the symbol version.

Number of bits in a length field
(Character Count Indicator)

| Encoding | Ver. 1–9 | Ver. 10–26 | Ver. 27–40 |
| --- | --- | --- | --- |
| Numeric | 10 | 12 | 14 |
| Alphanumeric | 9 | 11 | 13 |
| Byte | 8 | 16 | 16 |
| Kanji | 8 | 10 | 12 |
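The table translates directly into a lookup. The sketch below also shows how a mode indicator and its length field combine into the start of a segment; the dictionary and function names are hypothetical, while the values are taken from the tables above.

```python
# Length-field widths per version range: (1-9, 10-26, 27-40).
COUNT_BITS = {
    "numeric": (10, 12, 14),
    "alphanumeric": (9, 11, 13),
    "byte": (8, 16, 16),
    "kanji": (8, 10, 12),
}
MODE_INDICATOR = {
    "numeric": "0001",
    "alphanumeric": "0010",
    "byte": "0100",
    "kanji": "1000",
}

def char_count_bits(mode, version):
    """Number of bits in the length field for a mode and symbol version."""
    if not 1 <= version <= 40:
        raise ValueError("symbol version must be between 1 and 40")
    column = 0 if version <= 9 else 1 if version <= 26 else 2
    return COUNT_BITS[mode][column]

def segment_header(mode, version, count):
    """Mode indicator followed by the character count, as a bit string."""
    width = char_count_bits(mode, version)
    if not 0 <= count < (1 << width):
        raise ValueError("character count does not fit in the length field")
    return MODE_INDICATOR[mode] + format(count, "0{}b".format(width))

print(char_count_bits("byte", 10))         # 16
print(segment_header("byte", 1, 17))       # 010000010001
```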

Alphanumeric encoding mode stores a message more compactly than the byte mode can, but cannot store lower-case letters and has only a limited selection of punctuation marks, which are sufficient for rudimentary web addresses[5]. Two characters are coded in an 11-bit value by this formula:

V = 45 × C1 + C2

The exception is the last character of an alphanumeric string with odd length, which is encoded on its own as a 6-bit value.

Alphanumeric character codes

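| Code | Character | Code | Character | Code | Character | Code | Character | Code | Character |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 00 | 0 | 09 | 9 | 18 | I | 27 | R | 36 | Space |
| 01 | 1 | 10 | A | 19 | J | 28 | S | 37 | $ |
| 02 | 2 | 11 | B | 20 | K | 29 | T | 38 | % |
| 03 | 3 | 12 | C | 21 | L | 30 | U | 39 | * |
| 04 | 4 | 13 | D | 22 | M | 31 | V | 40 | + |
| 05 | 5 | 14 | E | 23 | N | 32 | W | 41 | - |
| 06 | 6 | 15 | F | 24 | O | 33 | X | 42 | . |
| 07 | 7 | 16 | G | 25 | P | 34 | Y | 43 | / |
| 08 | 8 | 17 | H | 26 | Q | 35 | Z | 44 | : |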
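Putting the formula and the character table together, alphanumeric packing can be sketched as below. The position of each character in the `ALPHANUMERIC` string equals its code in the table; the function name is hypothetical, and the mode indicator and length field are left out.

```python
# Index in this string = character code from the table (0-44).
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def encode_alphanumeric(text):
    """Pack characters pairwise into 11-bit values; a final odd one uses 6 bits."""
    codes = [ALPHANUMERIC.index(ch) for ch in text]   # ValueError if unsupported
    bits = []
    for i in range(0, len(codes) - 1, 2):
        value = 45 * codes[i] + codes[i + 1]          # V = 45 × C1 + C2
        bits.append(format(value, "011b"))
    if len(codes) % 2 == 1:
        bits.append(format(codes[-1], "06b"))         # last character of an odd-length string
    return "".join(bits)

print(encode_alphanumeric("AC-42"))
# AC -> 45*10 + 12 = 462  -> 00111001110
# -4 -> 45*41 + 4  = 1849 -> 11100111001
# 2  ->                      000010
```

Lower-case input such as "ac" raises `ValueError`, reflecting the restriction that this mode cannot store lower-case letters.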

[1] https://en.wikipedia.org/wiki/BCH_code
[2] ISO/IEC 18004:2006(E) § 6.4 Data encoding; Table 3 – Number of bits in character count indicator for QR Code 2005
[3] https://en.wikipedia.org/wiki/Extended_Channel_Interpretation
[4] https://en.wikipedia.org/wiki/Code_128
[5] https://en.wikipedia.org/wiki/URL

Source: Wikipedia.com
