Base64 Algorithm Explained - Step-by-Step Encoding Process

Step-by-step breakdown of how the Base64 encoding algorithm converts binary data to text

How Does the Base64 Encoding Algorithm Work?

The Base64 algorithm converts binary data to printable text in 4 steps: (1) read 3 input bytes (24 bits), (2) split the 24 bits into four 6-bit groups, (3) map each 6-bit value (0-63) to a character from the Base64 alphabet, (4) output 4 characters. The algorithm repeats this process for every 3-byte block in the input data.

The algorithm is defined in RFC 4648 Section 4. It uses an alphabet of 64 printable ASCII characters: A-Z (indices 0-25), a-z (indices 26-51), 0-9 (indices 52-61), + (index 62), and / (index 63). The equals sign (=) serves as the padding character when the input length is not a multiple of 3. For the complete character-to-index mapping, see the Base64 character table.
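
The alphabet described above can be built in index order with a few lines of Python. This is a minimal sketch; the names `ALPHABET`, `PAD`, and `INDEX_OF` are illustrative and do not come from any library.

```python
import string

# Standard Base64 alphabet (RFC 4648 Section 4), in index order:
# A-Z (0-25), a-z (26-51), 0-9 (52-61), "+" (62), "/" (63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="

# Reverse lookup of the kind a decoder needs: character -> 6-bit index.
INDEX_OF = {char: index for index, char in enumerate(ALPHABET)}

print(len(ALPHABET))                   # 64
print(ALPHABET[19], INDEX_OF["u"])     # T 46
```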

The algorithm is stateless: each 3-byte block is processed independently, with no information carried between blocks. This property makes the algorithm simple to implement and allows parallel processing of large inputs. Both encoding and decoding follow the same block-based structure, making the process fully reversible. Try encoding text with the Base64 text encoder to see the algorithm in action.
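
The block-by-block behavior can be sketched in Python as follows. The helper names `encode_block` and `encode_full_blocks` are hypothetical, and the sketch assumes the input length is a multiple of 3, so padding is not handled here.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_block(block: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters."""
    if len(block) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer.
    bits = (block[0] << 16) | (block[1] << 8) | block[2]
    # Extract four 6-bit groups, most significant group first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

def encode_full_blocks(data: bytes) -> str:
    """Encode input whose length is a multiple of 3, one block at a time."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch does not handle padding")
    # Each block is encoded on its own; nothing is carried between blocks.
    return "".join(encode_block(data[i:i + 3]) for i in range(0, len(data), 3))

print(encode_block(b"Man"))            # TWFu
print(encode_full_blocks(b"ManMan"))   # TWFuTWFu
# Cross-check against the standard library.
print(encode_full_blocks(b"ManMan") == base64.b64encode(b"ManMan").decode("ascii"))
```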

What Is the Step-by-Step Base64 Encoding Process?

The following example traces the encoding of the string "Man" through every step of the Base64 algorithm. The input consists of 3 ASCII characters, which is exactly one complete block.

Step 1: Determine the ASCII decimal value of each input character.

Step 2: Convert each decimal value to its 8-bit binary representation.

Step 3: Concatenate all binary values into a single 24-bit string.

Step 4: Split the 24-bit string into four 6-bit groups.

Step 5: Convert each 6-bit group to its decimal value (the Base64 index).

Step 6: Map each index to the corresponding Base64 character.

Input Char	ASCII Decimal	Binary (8-bit)	6-bit Groups	Index	Base64 Char
M	77	`01001101`	`010011`	19	T
a	97	`01100001`	`010110`	22	W
n	110	`01101110`	`000101`	5	F
			`101110`	46	u

Binary concatenation: 01001101 + 01100001 + 01101110 = 010011 010110 000101 101110

Result: "Man" encodes to "TWFu". The 3 input bytes (24 bits) produce exactly 4 output characters (4 x 6 = 24 bits). No padding is needed because the input length divides evenly by 3. For more encoding examples, see the Base64 encoding guide.
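
The six steps can be reproduced literally in Python, using strings of 0s and 1s so that each intermediate value matches the table. This is a teaching sketch rather than an efficient implementation, and the variable names are illustrative.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

text = "Man"

# Step 1: ASCII decimal value of each character.
decimals = [ord(char) for char in text]

# Step 2: 8-bit binary representation of each value.
binary = [format(value, "08b") for value in decimals]

# Step 3: concatenate into a single 24-bit string.
bitstring = "".join(binary)

# Step 4: split into four 6-bit groups.
groups = [bitstring[i:i + 6] for i in range(0, len(bitstring), 6)]

# Step 5: convert each group to its decimal value (the Base64 index).
indices = [int(group, 2) for group in groups]

# Step 6: map each index to its Base64 character.
encoded = "".join(ALPHABET[i] for i in indices)

print(decimals)   # [77, 97, 110]
print(groups)     # ['010011', '010110', '000101', '101110']
print(indices)    # [19, 22, 5, 46]
print(encoded)    # TWFu
```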

How Does Base64 Padding Work?

Base64 padding handles input whose byte count is not divisible by 3. The algorithm adds zero-bits to complete the final 6-bit group, then appends = characters to make the output length a multiple of 4. The number of padding characters tells the decoder how many bytes the final group holds: one = means 2 bytes, and two = characters mean 1 byte.

Padding Case 1: Two Remaining Bytes

When 2 bytes remain after processing all complete 3-byte blocks, the algorithm pads the binary with 2 zero-bits, produces 3 Base64 characters, and appends one =.

Example: encoding "Ma"

Step	Data	Value
Input characters	M, a	2 bytes
Decimal values	77, 97
Binary	`01001101 01100001`	16 bits
Padded binary	`01001101 01100001 00`	18 bits (2 zeros added)
6-bit groups	`010011 010110 000100`	3 groups
Indices	19, 22, 4
Base64 characters	T, W, E	3 characters
Result with padding	`TWE=`

Padding Case 2: One Remaining Byte

When 1 byte remains, the algorithm pads the binary with 4 zero-bits, produces 2 Base64 characters, and appends two = characters.

Example: encoding "M"

Step	Data	Value
Input character	M	1 byte
Decimal value	77
Binary	`01001101`	8 bits
Padded binary	`01001101 0000`	12 bits (4 zeros added)
6-bit groups	`010011 010000`	2 groups
Indices	19, 16
Base64 characters	T, Q	2 characters
Result with padding	`TQ==`

Padding Summary

Input Length mod 3	Remaining Bytes	Zero-bits Added	Base64 Chars	Padding
0	0 (complete block)	0	4	None
1	1 byte	4	2	`==`
2	2 bytes	2	3	`=`

Use the Base64 validator to check whether a string has correct padding and valid characters.
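
Both padding cases can be folded into one encoder. The sketch below is one possible approach, not a reference implementation: it fills a short final chunk with zero bytes, which has the same effect on the kept characters as adding 2 or 4 zero-bits, then replaces the characters that carry no data with =. The function name `b64encode_sketch` is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_sketch(data: bytes) -> str:
    """Encode bytes to padded Base64 text."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)          # 0, 1, or 2 bytes short of a block
        # Fill the chunk to 24 bits with zeros on the right.
        bits = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Keep the characters that carry data; pad the rest with "=".
        out.append("".join(chars[:4 - missing]) + "=" * missing)
    return "".join(out)

print(b64encode_sketch(b"Man"))   # TWFu
print(b64encode_sketch(b"Ma"))    # TWE=
print(b64encode_sketch(b"M"))     # TQ==
```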

How Does Base64 Decoding Work?

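Base64 decoding reverses the encoding process in 4 steps: (1) map each Base64 character to its 6-bit index value using the alphabet lookup table, (2) concatenate all 6-bit values into a continuous binary string, (3) split the binary string into 8-bit groups (bytes), (4) output each 8-bit group as one byte of the original data. Padding characters (=) carry no data bits: the decoder skips them in the lookup step and uses their count to drop the zero-bits added during encoding, so a final group ending in = yields 2 bytes and one ending in == yields 1 byte.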

Example: decoding "TWFu" back to "Man"

Step	Data	Value
Input characters	T, W, F, u	4 Base64 characters
Look up indices	19, 22, 5, 46	4 indices
6-bit binary	`010011 010110 000101 101110`	24 bits
Split into 8-bit groups	`01001101 01100001 01101110`	3 bytes
Decimal values	77, 97, 110
ASCII characters	M, a, n
Result	`Man`

The decoding process is deterministic and lossless. The same Base64 input always produces the same binary output. Decode any Base64 string using the Base64 text decoder.
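
A decoder along these lines can be sketched in Python. The function name `b64decode_sketch` is hypothetical, and validation is deliberately minimal: the sketch rejects a bad length or more than two = characters in a group, but it does not check that padding appears only at the end of the input.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
INDEX_OF = {char: index for index, char in enumerate(ALPHABET)}

def b64decode_sketch(text: str) -> bytes:
    """Decode padded Base64 text back to bytes."""
    if len(text) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        data_chars = quad.rstrip("=")
        pad = len(quad) - len(data_chars)
        if pad > 2:
            raise ValueError("at most two padding characters per group")
        # Look up each character and concatenate the 6-bit values.
        bits = 0
        for char in data_chars:
            bits = (bits << 6) | INDEX_OF[char]
        # Shift so the group occupies a full 24 bits, then split into bytes.
        bits <<= 6 * pad
        # Drop the trailing bytes that exist only because of padding.
        out += bits.to_bytes(3, "big")[:3 - pad]
    return bytes(out)

print(b64decode_sketch("TWFu"))   # b'Man'
print(b64decode_sketch("TWE="))   # b'Ma'
print(b64decode_sketch("TQ=="))   # b'M'
```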

What Is the Mathematical Formula for Base64 Output Length?

The encoded output length follows a formula based on the input byte count. The formula accounts for the 3-byte-to-4-character ratio and the padding requirement that output length must be a multiple of 4.

Encoded length = ceil(input_bytes / 3) * 4

Input Bytes	Calculation	Output Characters	Overhead
1	ceil(1/3) x 4	4	300%
2	ceil(2/3) x 4	4	100%
3	ceil(3/3) x 4	4	33%
10	ceil(10/3) x 4	16	60%
100	ceil(100/3) x 4	136	36%
1,000	ceil(1000/3) x 4	1,336	33.6%
1,000,000	ceil(1000000/3) x 4	1,333,336	33.3%

As input size increases, the overhead ratio approaches the theoretical minimum of 33.3% (the 4/3 ratio). For small inputs (1-2 bytes), the overhead is disproportionately large due to the padding requirement. Calculate exact sizes for your data using the Base64 size calculator.
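
The formula and the overhead column can be checked with a short Python sketch. The function names are illustrative; integer arithmetic stands in for ceil() so the result stays exact for large inputs.

```python
import base64

def encoded_length(input_bytes: int) -> int:
    """ceil(input_bytes / 3) * 4, computed with integer arithmetic."""
    return (input_bytes + 2) // 3 * 4

def overhead_percent(input_bytes: int) -> float:
    """Size increase of the encoded output relative to the input."""
    return (encoded_length(input_bytes) - input_bytes) / input_bytes * 100

for size in (1, 2, 3, 10, 100, 1000, 1_000_000):
    print(size, encoded_length(size), round(overhead_percent(size), 1))

# Cross-check the formula against the standard library for small inputs.
print(all(len(base64.b64encode(bytes(n))) == encoded_length(n) for n in range(50)))
```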

How Does the Base64 Algorithm Handle Different Data Types?

The Base64 algorithm operates on raw bytes and does not distinguish between data types. Text, images, audio, video, and any other file format are all treated as sequences of bytes. The algorithm reads bytes, splits bits, maps to characters, and outputs text regardless of what the bytes represent.

The difference between encoding text and encoding binary files lies in the pre-processing step. Text must first be converted to bytes using a character encoding such as UTF-8 or ASCII. Binary files (images, PDFs, executables) are already byte sequences and require no pre-processing. The Base64 algorithm receives the same type of input in both cases: an array of bytes.

Data Type	Pre-processing	Base64 Input
ASCII text	Direct byte mapping (1 byte per char)	Byte array
UTF-8 text	UTF-8 encoding (1-4 bytes per char)	Byte array
PNG image	None (already bytes)	Byte array
PDF document	None (already bytes)	Byte array
JSON string	UTF-8 encoding	Byte array

Encode images using the Base64 image encoder or encode other file types with the Base64 file encoder. Both tools apply the same underlying algorithm to different input sources.
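
The pre-processing difference can be illustrated with Python's standard base64 module. The sample inputs are chosen only for illustration: a short UTF-8 string containing one non-ASCII character, and the 8-byte signature that begins every PNG file standing in for a binary file.

```python
import base64

# Text must be converted to bytes first; "é" takes 2 bytes in UTF-8.
text = "café"
utf8_bytes = text.encode("utf-8")
encoded_text = base64.b64encode(utf8_bytes).decode("ascii")

# Binary data is already bytes, so it goes straight to the encoder.
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
encoded_png = base64.b64encode(png_signature).decode("ascii")

print(len(text), len(utf8_bytes))   # 4 5
print(encoded_text)
print(encoded_png)

# Decoding returns bytes; text needs the matching character decoding step.
print(base64.b64decode(encoded_text).decode("utf-8") == text)
```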

What Is the Time Complexity of Base64 Encoding?

The Base64 encoding and decoding algorithms run in O(n) time, where n is the number of input bytes. Each byte requires a constant number of operations: bit extraction via shifting, array lookup for character mapping, and character output. No comparisons, sorting, or recursive operations are involved.

The space complexity is also O(n). The output size is always ceil(n/3) * 4 characters, which is linearly proportional to the input size. Streaming implementations process fixed-size chunks (3 bytes at minimum, in practice a larger multiple of 3) and require only O(1) additional memory beyond the input and output buffers.

Property	Encoding	Decoding
Time complexity	O(n)	O(n)
Space complexity	O(n)	O(n)
Operations per byte	Constant (bit shift + lookup)	Constant (lookup + bit shift)
Parallelizable	Yes (independent blocks)	Yes (independent blocks)
Streaming capable	Yes (3-byte chunks)	Yes (4-character chunks)

The linear time complexity means doubling the input size doubles the processing time. A 1 MB file encodes in approximately the same time as two sequential 500 KB files. Implementations optimized with SIMD instructions can encode Base64 at speeds exceeding 1 GB per second on modern hardware. For a broader understanding of Base64 and its properties, see What is Base64 Encoding.
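
A streaming encoder can be sketched with the standard library as follows. The function name `encode_stream` and the default chunk size are assumptions for illustration. The sketch also assumes that `reader.read(n)` returns fewer than n bytes only at the end of the data, which holds for `io.BytesIO` and buffered files but not for every stream type.

```python
import base64
import io

def encode_stream(reader, writer, chunk_size: int = 3 * 1024) -> None:
    """Encode reader to writer using a fixed-size buffer."""
    if chunk_size % 3 != 0:
        # A multiple of 3 keeps padding out of every chunk except the last.
        raise ValueError("chunk_size must be a multiple of 3")
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        writer.write(base64.b64encode(chunk))

data = bytes(range(256)) * 50
source, sink = io.BytesIO(data), io.BytesIO()
encode_stream(source, sink)
print(sink.getvalue() == base64.b64encode(data))
```

Memory use depends on the chunk size rather than the input size, which is the O(1) property described above.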

Frequently Asked Questions

Why does Base64 use 6-bit groups?

2 raised to the power of 6 equals 64, which allows each 6-bit group to map to exactly one of 64 printable ASCII characters. Using 6 bits provides a good balance between alphabet size and encoding efficiency, producing only 33% overhead compared to the original binary data. Fewer bits per group (like Base32's 5 bits) would increase overhead. More bits (7 or 8) would require an alphabet larger than the available printable ASCII characters.
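
The trade-off can be tabulated with a few lines of Python. This is a rough sketch that ignores padding and assumes each output character occupies 8 bits; the function name `overhead_percent` is illustrative.

```python
# Printable ASCII runs from space (0x20) through tilde (0x7E).
printable_count = sum(1 for code in range(128) if chr(code).isprintable())

def overhead_percent(bits_per_char: int) -> float:
    """Size increase when each 8-bit output character carries bits_per_char bits."""
    return (8 / bits_per_char - 1) * 100

print(printable_count)   # 95
for bits in (4, 5, 6, 7):
    alphabet_size = 2 ** bits
    fits = alphabet_size <= printable_count
    print(bits, alphabet_size, round(overhead_percent(bits), 1), fits)
```

Six bits is the largest group size whose alphabet (64 characters) still fits within the printable ASCII characters; seven bits would need 128.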

Why does Base64 process 3 bytes at a time?

3 bytes equal 24 bits, which is the least common multiple of 8 (bits per input byte) and 6 (bits per Base64 character). This means 24 bits divide evenly into exactly 4 groups of 6 bits, with no leftover bits and no wasted space within the encoding block. A block of 1 or 2 bytes would leave remainder bits that complicate the algorithm, and larger multiples of 3 bytes divide evenly but offer no advantage over the smallest such block.
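
The arithmetic is easy to verify in Python. The constant and variable names below are illustrative.

```python
from math import gcd

BITS_PER_BYTE = 8
BITS_PER_CHAR = 6

# Least common multiple of 8 and 6, computed via the greatest common divisor.
block_bits = BITS_PER_BYTE * BITS_PER_CHAR // gcd(BITS_PER_BYTE, BITS_PER_CHAR)
print(block_bits, block_bits // BITS_PER_BYTE, block_bits // BITS_PER_CHAR)   # 24 3 4

# Bits left over after forming whole 6-bit groups, by block size in bytes.
leftover = {n: (n * BITS_PER_BYTE) % BITS_PER_CHAR for n in range(1, 7)}
print(leftover)   # {1: 2, 2: 4, 3: 0, 4: 2, 5: 4, 6: 0}
```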

Is the Base64 algorithm the same for all implementations?

Yes, the core algorithm (3-byte to 4-character conversion using 6-bit grouping) is identical across all implementations as defined in RFC 4648. Variants differ mainly in the character alphabet: standard Base64 uses A-Z, a-z, 0-9, +, / while the URL-safe variant replaces + with - and / with _. Some contexts also omit the trailing = padding (common with the URL-safe alphabet) or insert line breaks into the output (MIME wraps lines at 76 characters). The bit-splitting logic remains unchanged.
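
The alphabet difference shows up with input chosen to produce indices 62 and 63. The sketch below uses Python's standard base64 module; the sample bytes are picked only to hit those indices.

```python
import base64

# 0xFB 0xFF 0xFE splits into the 6-bit values 62, 63, 63, 62.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data)
url_safe = base64.urlsafe_b64encode(data)

print(standard)   # b'+//+'
print(url_safe)   # b'-__-'

# Both variants decode back to the same bytes.
print(base64.b64decode(standard) == base64.urlsafe_b64decode(url_safe) == data)
```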

Can Base64 encoding be parallelized?

Yes, Base64 encoding can be parallelized because each 3-byte input block is processed independently. No state is carried between blocks, so multiple blocks can be encoded simultaneously on separate CPU cores. The output blocks are concatenated in order after parallel processing completes. The same applies to decoding, where each 4-character block is independent.
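
The split-and-concatenate idea can be sketched with a thread pool from the standard library. The function name `parallel_b64encode` and its defaults are hypothetical. The sketch demonstrates that chunks cut on 3-byte boundaries encode independently and join correctly; it makes no claim about speed, since threads in CPython may not run this work on separate cores.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

def parallel_b64encode(data: bytes, chunk_size: int = 3 * 4096, workers: int = 4) -> bytes:
    """Encode independent chunks concurrently and join them in order."""
    if chunk_size % 3 != 0:
        # Only the final chunk may need padding, so cut on 3-byte boundaries.
        raise ValueError("chunk_size must be a multiple of 3")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so concatenation is safe.
        return b"".join(pool.map(base64.b64encode, chunks))

data = bytes(range(256)) * 200
print(parallel_b64encode(data) == base64.b64encode(data))
```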

What makes Base64 different from encryption?

Base64 is a deterministic, reversible encoding with a public algorithm and no secret keys. Anyone can decode a Base64 string instantly using built-in functions like JavaScript's atob() or Python's base64.b64decode(). Encryption uses secret keys and mathematical algorithms (AES, RSA, ChaCha20) to make data unreadable without the correct key. Base64 provides format conversion for data transport, not data protection. Use the Base64 text decoder to verify that any Base64 string decodes instantly.
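
As a quick illustration in Python, decoding needs nothing but the encoded string itself. The sample credential is made up for this example.

```python
import base64

# No key or secret is involved at any point.
print(base64.b64decode("TWFu"))   # b'Man'

# Anything that looks obscured after encoding is recovered just as easily.
encoded = base64.b64encode(b"user:secret-password")
recovered = base64.b64decode(encoded)
print(recovered == b"user:secret-password")
```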