What is Base64 Encoding? Complete Guide to Base64 Format

A comprehensive guide to the Base64 encoding format, algorithm, and practical applications

What Is Base64 Encoding?

Base64 is a binary-to-text encoding scheme defined in RFC 4648 that represents binary data using 64 printable ASCII characters. The algorithm converts every 3 input bytes (24 bits) into 4 output characters, producing a 33.3% size overhead compared to the original binary data.

The name "Base64" refers to the 64-character alphabet used for encoding. Each character represents exactly 6 bits of data (2⁶ = 64). This encoding exists because many transport protocols and storage systems only support text data. Email (SMTP), JSON, XML, HTML, and URL query strings all require text-safe representations of binary content. Base64 bridges this gap by transforming arbitrary bytes into a string that passes through text-only channels without corruption.

The encoding process is stateless and deterministic: the same input always produces the same output. Base64 does not compress data, does not encrypt data, and does not add error-checking capabilities. Its sole purpose is format conversion from binary to text.

Use the Base64 text encoder to convert any string to Base64, or the Base64 image encoder to encode image files directly in your browser.

How Does the Base64 Algorithm Work?

The Base64 algorithm processes input data in groups of 3 bytes (24 bits), splitting each group into 4 segments of 6 bits. Each 6-bit segment maps to one of 64 characters in the Base64 alphabet. When the input length is not a multiple of 3, padding characters (=) fill the remaining positions.

Step-by-Step Encoding Process

The algorithm follows 5 steps for each 3-byte group:

Read 3 bytes from the input stream (24 bits total).
Concatenate the 3 bytes into a single 24-bit binary number.
Split the 24-bit number into 4 groups of 6 bits each.
Map each 6-bit value (0-63) to the corresponding character in the Base64 alphabet.
Append padding (=) if the final group had fewer than 3 bytes.
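The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a library encoder; the names ALPHABET and encode_base64 are chosen here for the example and do not come from any standard.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_base64(data: bytes) -> str:
    """Encode bytes to standard Base64, one 3-byte group at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]                 # step 1: read up to 3 bytes
        missing = 3 - len(chunk)              # 0, 1, or 2 bytes short
        # step 2: pack into one 24-bit number, zero-filling a short final group
        number = int.from_bytes(chunk + b"\x00" * missing, "big")
        # steps 3 and 4: split into four 6-bit values and map each to a character
        chars = [ALPHABET[(number >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # step 5: replace the characters that carry no input data with padding
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


print(encode_base64(b"Man"))  # TWFu
```

In practice the standard library function base64.b64encode does the same job and should be preferred; the sketch only makes the bit manipulation visible.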

Encoding Example: "Man"

The word "Man" consists of 3 ASCII bytes: M (77), a (97), n (110). The table below traces each step of the encoding process.

Step	Data	Value
Input characters	M, a, n	3 ASCII characters
Decimal values	77, 97, 110	3 bytes
Binary (8-bit each)	`01001101 01100001 01101110`	24 bits
Split into 6-bit groups	`010011 010110 000101 101110`	4 groups
Decimal index values	19, 22, 5, 46	4 indices (0-63)
Base64 characters	T, W, F, u	4 output characters
Result	`TWFu`

The 3-byte input "Man" becomes the 4-character output "TWFu". This 3:4 ratio applies to every group, producing the characteristic 33.3% size increase. For a detailed walkthrough of the algorithm with additional examples, see the Base64 algorithm reference.
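The rows of the table can be reproduced with a short Python trace. The helper name trace_group is hypothetical and only handles a single complete 3-byte group.

```python
def trace_group(group: bytes) -> list[str]:
    """Return the four 6-bit strings for one 3-byte group."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    bits = "".join(f"{byte:08b}" for byte in group)   # 24-bit string
    return [bits[i:i + 6] for i in range(0, 24, 6)]   # four 6-bit slices


sextets = trace_group(b"Man")
print(sextets)                       # ['010011', '010110', '000101', '101110']
print([int(s, 2) for s in sextets])  # [19, 22, 5, 46]
```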

What Characters Does Base64 Use?

The standard Base64 alphabet defined in RFC 4648 Section 4 contains 64 characters: 26 uppercase letters (A-Z), 26 lowercase letters (a-z), 10 digits (0-9), plus sign (+), and forward slash (/). The equals sign (=) serves as the padding character. Each character maps to a specific 6-bit index value from 0 to 63.

Index Range	Characters	Count
0 - 25	`A B C D E F G H I J K L M N O P Q R S T U V W X Y Z`	26
26 - 51	`a b c d e f g h i j k l m n o p q r s t u v w x y z`	26
52 - 61	`0 1 2 3 4 5 6 7 8 9`	10
62	`+`	1
63	`/`	1
Padding	`=`	1

For the complete index-to-character mapping with binary values, see the Base64 character table.
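As a sketch, the alphabet in the table can be assembled from Python's string constants, which makes the index ranges easy to check. The names STANDARD_ALPHABET and INDEX_OF are illustrative.

```python
import string

# Indices 0-25, 26-51, 52-61, then 62 and 63, in the order given by the table.
STANDARD_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

# Reverse lookup used when decoding: character -> 6-bit index.
INDEX_OF = {char: index for index, char in enumerate(STANDARD_ALPHABET)}

print(len(STANDARD_ALPHABET))   # 64
print(STANDARD_ALPHABET[19])    # T
print(INDEX_OF["u"])            # 46
```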

URL-Safe Base64 Variant

RFC 4648 Section 5 defines a URL-safe variant that replaces 2 characters from the standard alphabet. The plus sign (+) becomes a hyphen (-), and the forward slash (/) becomes an underscore (_). These substitutions prevent conflicts with URL syntax, where + can be decoded as a space in query strings and / is a path separator. The URL-safe variant is used in JWTs, filename encoding, and URL query parameters.

Index	Standard (RFC 4648 §4)	URL-Safe (RFC 4648 §5)
62	`+`	`-`
63	`/`	`_`

Use the URL-safe Base64 encoder to generate URL-compatible Base64 strings.
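The difference between the two alphabets only shows up when index 62 or 63 occurs. The following Python sketch uses a 2-byte input chosen specifically to produce both characters; the variable names are illustrative.

```python
import base64

raw = bytes([0xFB, 0xFF])  # bit pattern that maps to indices 62, 63, 60

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

# Converting between the variants is a plain two-character substitution.
converted = standard.translate(str.maketrans("+/", "-_"))

print(standard)   # +/8=
print(url_safe)   # -_8=
```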

How Does Base64 Padding Work?

Base64 padding uses the = character to ensure the encoded output length is always a multiple of 4 characters. Padding is required when the input byte count is not evenly divisible by 3. The number of padding characters depends on the remainder of the input length divided by 3.

Input Bytes mod 3	Remaining Bytes	Padding Added	Output Characters (n = number of complete 3-byte groups)
0	0 (exact multiple of 3)	None	4n characters
1	1 byte (8 bits)	`==`	4n + 4 characters
2	2 bytes (16 bits)	`=`	4n + 4 characters

Padding Examples

Input	Byte Count	mod 3	Base64 Output	Padding
`Man`	3	0	`TWFu`	None
`Ma`	2	2	`TWE=`	1 pad
`M`	1	1	`TQ==`	2 pads
`Base64`	6	0	`QmFzZTY0`	None
`Hello`	5	2	`SGVsbG8=`	1 pad
`A`	1	1	`QQ==`	2 pads

When 1 input byte remains, the algorithm produces 2 Base64 characters plus ==. When 2 input bytes remain, it produces 3 Base64 characters plus =. Padding allows the decoder to determine the exact number of original bytes without external metadata. Use the Base64 validator to check whether a string has correct padding.
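The padding rule reduces to a one-line formula. The function name padding_length below is hypothetical; the check against base64.b64encode is included only to show that the formula agrees with a real encoder.

```python
import base64


def padding_length(byte_count: int) -> int:
    """Number of '=' characters for an input of byte_count bytes (0, 1, or 2)."""
    return (3 - byte_count % 3) % 3


for text in (b"Man", b"Ma", b"M", b"Hello"):
    encoded = base64.b64encode(text).decode("ascii")
    print(text, encoded, padding_length(len(text)))
```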

What Are the Common Use Cases for Base64?

Base64 encoding is used in 6 primary contexts where binary data must travel through text-based systems. Each use case exploits the same property: Base64 output consists entirely of printable ASCII characters that survive text processing without corruption.

Data URIs for Web Images

Data URIs embed Base64-encoded images directly in HTML and CSS, eliminating separate HTTP requests. The format is data:image/png;base64,[encoded data]. This technique reduces latency for small images (under 10KB) by avoiding network round-trips. Larger images should remain as external files because Base64 adds 33% overhead and prevents browser caching.

Convert images to data URIs using the Base64 image encoder or generate ready-to-use HTML and CSS embed code with the Base64 embed code generator.

Email Attachments (MIME)

MIME (Multipurpose Internet Mail Extensions), defined in RFC 2045, uses Base64 to encode email attachments. SMTP (Simple Mail Transfer Protocol) was originally designed for 7-bit ASCII text, so raw binary data cannot pass through it reliably without extensions. MIME Base64 limits encoded lines to at most 76 characters, separated by CRLF line endings. This format allows binary files (images, PDFs, archives) to travel through email infrastructure without corruption.
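The line-wrapping rule can be sketched as follows. The function mime_base64 is a hypothetical helper written for this example; Python's own base64.encodebytes performs similar wrapping but separates lines with a bare newline rather than CRLF.

```python
import base64


def mime_base64(data: bytes, line_length: int = 76) -> bytes:
    """Base64-encode data and wrap it into CRLF-terminated lines."""
    encoded = base64.b64encode(data)
    lines = [
        encoded[i:i + line_length]
        for i in range(0, len(encoded), line_length)
    ]
    return b"\r\n".join(lines) + b"\r\n"


wrapped = mime_base64(bytes(range(256)))
print(wrapped.decode("ascii"))
```

A lenient decoder such as base64.b64decode with default settings skips the line breaks, so the wrapped form decodes to the original bytes.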

JWT Tokens

JSON Web Tokens (JWT), defined in RFC 7519, use URL-safe Base64 (base64url) to encode the header and payload segments. A JWT consists of 3 base64url-encoded parts separated by periods: header.payload.signature. The URL-safe variant avoids conflicts with URL special characters. Convert between standard and URL-safe formats using the URL-safe Base64 tool.
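The following Python sketch builds an unsigned example token and reads its payload back. It assumes the unpadded base64url form, so the padding has to be restored before decoding. The claim values, the helper names, and the placeholder signature are made up for illustration, and nothing here verifies a signature.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """base64url without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


# Hypothetical header and claims; the third part is NOT a real signature.
header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "user-123"}).encode())
token = f"{header}.{payload}.signature-placeholder"

header_part, payload_part, _signature = token.split(".")
claims = json.loads(b64url_decode(payload_part))
print(claims)  # {'sub': 'user-123'}
```

Reading claims this way is only suitable for inspection; a real application must verify the signature with a JWT library before trusting them.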

API Payloads

REST APIs frequently transmit binary data (images, documents, certificates) as Base64-encoded strings within JSON payloads. JSON does not support raw binary data, so Base64 provides the text representation. The Content-Transfer-Encoding: base64 header belongs to MIME email and is not used by HTTP, so APIs usually document the encoding in their schema instead. In OpenAPI 3.0 (Swagger), a Base64-encoded field is declared as type: string with format: byte.
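A minimal Python sketch of carrying binary data in a JSON payload follows. The field names filename and data and both helper functions are assumptions made for this example, not part of any particular API.

```python
import base64
import json


def to_json_payload(filename: str, content: bytes) -> str:
    """Wrap binary content in a JSON document as a Base64 string."""
    return json.dumps({
        "filename": filename,
        "data": base64.b64encode(content).decode("ascii"),
    })


def from_json_payload(payload: str) -> bytes:
    """Recover the original bytes from the JSON document."""
    return base64.b64decode(json.loads(payload)["data"], validate=True)


body = to_json_payload("pixel.bin", b"\x00\xff\x10\x80")
print(body)
print(from_json_payload(body))
```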

CSS Embedding

CSS files embed small images (icons, backgrounds, patterns) as Base64 data URIs in background-image properties. This bundles image data with the stylesheet, reducing the total number of HTTP requests. The embed code generator produces ready-to-use CSS snippets with data URIs.

Database and File Storage

Databases that lack binary column types store encoded data as text strings. Configuration files (JSON, YAML, XML) embed binary content as Base64 values. Source code embeds small binary resources (certificates, keys, icons) as Base64 string literals to avoid external file dependencies.

What Is the Difference Between Base64, Base32, and Base16?

Base64, Base32, and Base16 are all binary-to-text encoding schemes defined in RFC 4648. They differ in alphabet size, bits per character, size overhead, and intended use cases. Base64 provides the most compact output, Base16 provides the most human-readable output, and Base32 balances readability with compactness.

Property	Base64	Base32	Base16 (Hex)
Alphabet size	64 characters	32 characters	16 characters
Bits per character	6 bits	5 bits	4 bits
Size overhead	33% (4:3 ratio)	60% (8:5 ratio)	100% (2:1 ratio)
RFC	RFC 4648 §4	RFC 4648 §6	RFC 4648 §8
Padding character	`=`	`=`	None
Case sensitive	Yes	No (A-Z, 2-7)	No (0-9, A-F)
Common use	Email, data URIs, APIs	TOTP shared secrets, case-insensitive identifiers	Hex dumps, checksums, colors

For converting between Base64 and hexadecimal (Base16), use the Base64 to HEX converter.
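The overhead figures in the table can be checked with Python's base64 module, which implements all three RFC 4648 encodings. The 15-byte input is chosen here because it divides evenly into both 3-byte and 5-byte groups, so no padding distorts the ratios.

```python
import base64

data = bytes(range(15))  # 15 bytes: a multiple of both 3 and 5

sizes = {
    "base64": len(base64.b64encode(data)),
    "base32": len(base64.b32encode(data)),
    "base16": len(base64.b16encode(data)),
}

for name, size in sizes.items():
    overhead = (size - len(data)) / len(data) * 100
    print(f"{name}: {size} characters, {overhead:.0f}% overhead")
```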

What Are Data URIs and How Do They Use Base64?

A data URI is an inline data scheme defined in RFC 2397 that embeds file content directly in HTML, CSS, or JavaScript using the format data:[mediatype][;base64],<data>. When the ;base64 token is present, the data portion contains Base64-encoded binary content.

Data URI Structure

data:[<MIME type>][;base64],<encoded data>

Examples:
data:image/png;base64,iVBORw0KGgo...
data:image/svg+xml;base64,PHN2ZyB4...
data:text/plain;base64,SGVsbG8gV29ybGQ=
data:application/pdf;base64,JVBERi0x...

Data URI Components

Component	Description	Example
`data:`	URI scheme identifier	`data:`
MIME type	Media type of the encoded data	`image/png`
`;base64`	Encoding declaration	`;base64`
`,`	Separator between metadata and data	`,`
Encoded data	Base64-encoded binary content	`iVBORw0KGgo...`

Data URIs eliminate HTTP requests for small resources, but each embedded resource adds its original size plus the 33% Base64 overhead to the HTML/CSS file. Images under 10KB typically benefit from data URI embedding. Images above 10KB should remain as external files for browser caching and CDN delivery. For a complete guide, see the data URI reference. Generate data URIs using the image encoder or the embed code generator.

What Are the Limitations of Base64 Encoding?

Base64 encoding has 6 primary limitations: size overhead, lack of security, absence of compression, increased memory consumption, no error detection, and cache inefficiency. Understanding these constraints prevents misuse in production systems.

33% size increase: Every 3 bytes of input produce 4 bytes of output. A 1MB file becomes approximately 1.33MB after encoding. MIME encoding adds further overhead from line breaks every 76 characters. For a detailed comparison, see Base64 vs binary.
Not encryption: Base64 provides zero confidentiality. Any program can decode a Base64 string instantly. Sensitive data requires AES, RSA, or other cryptographic algorithms.
Not compression: Base64 always increases data size. It does not analyze or reduce redundancy in the input. Compress data (gzip, brotli, zstd) before encoding to minimize the final size.
Increased memory usage: Encoders and decoders that process data in a single pass buffer the full input and output in memory. Processing a 100MB file this way requires approximately 233MB of memory (100MB input + 133MB encoded output). Streaming encoders reduce peak memory usage by processing fixed-size chunks.
No error detection: Base64 does not include checksums or parity bits. A corrupted character that is still part of the Base64 alphabet decodes to silently incorrect output rather than an error; a decoder can only reject characters outside the alphabet or malformed padding. Use the Base64 validator to check that a string is well-formed.
Cache inefficiency: Base64-embedded images in HTML or CSS cannot be cached independently by browsers. A change to any part of the page invalidates the entire cached resource, including all embedded images.
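The streaming approach mentioned under memory usage can be sketched as below. The key detail is that every chunk except the last must be a multiple of 3 bytes, so that padding appears only at the very end. The function name encode_stream and the default chunk size are choices made for this example, and the sketch assumes a buffered stream whose read() returns full chunks until end of input.

```python
import base64
import io
from typing import BinaryIO


def encode_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 3 * 1024) -> None:
    """Base64-encode src into dst without holding the whole input in memory."""
    if chunk_size <= 0 or chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a positive multiple of 3")
    # Assumes read() returns exactly chunk_size bytes until the final chunk,
    # which holds for io.BytesIO and buffered files opened in binary mode.
    while chunk := src.read(chunk_size):
        dst.write(base64.b64encode(chunk))


source = io.BytesIO(bytes(range(256)) * 50)
target = io.BytesIO()
encode_stream(source, target)
print(len(target.getvalue()))
```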

Is Base64 Encoding Secure?

Base64 is not encryption and provides zero security. It is a reversible encoding scheme that anyone can decode without a key, password, or secret. Treating Base64 as a security measure is a common and dangerous mistake.

The Base64 algorithm is deterministic and public. Every programming language includes built-in Base64 decoding functions (atob() in JavaScript, base64.b64decode() in Python, Base64.getDecoder() in Java). An attacker who intercepts a Base64-encoded string can decode it in under 1 millisecond.

What Base64 Does Not Provide

Confidentiality: Encoded data is readable by anyone who has the string.
Integrity: No mechanism detects whether the encoded data has been tampered with.
Authentication: No verification that the data came from a trusted source.
Non-repudiation: No proof that a specific sender produced the encoded data.

To demonstrate how easily Base64 decodes, paste any encoded string into the Base64 text decoder. The original content appears instantly.

When Base64 Appears in Security Contexts

Base64 appears in JWTs, TLS certificates (PEM format), and SSH keys. In these cases, Base64 is the transport encoding, not the security layer. The actual security comes from cryptographic primitives such as signatures and message authentication codes (HMAC, RSA, ECDSA) or from encryption. Removing the Base64 layer only exposes whatever was encoded: for a signed but unencrypted JWT that is the readable JSON header and claims, while for encrypted data it is ciphertext.
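The division of labor between the cryptographic layer and the transport encoding can be illustrated with an HMAC tag that is Base64-encoded for transmission. This is a sketch using Python's standard hmac and hashlib modules; the function names, key, and message are made up for the example.

```python
import base64
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag and encode it as Base64 text for transport."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return base64.b64encode(tag).decode("ascii")


def verify(message: bytes, key: bytes, encoded_tag: str) -> bool:
    """Decode the transported tag and compare it in constant time."""
    try:
        received = base64.b64decode(encoded_tag, validate=True)
    except ValueError:
        return False  # not well-formed Base64
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received)


tag = sign(b"transfer 10 EUR", b"example-secret-key")
print(verify(b"transfer 10 EUR", b"example-secret-key", tag))   # True
print(verify(b"transfer 99 EUR", b"example-secret-key", tag))   # False
```

Anyone can decode the Base64 tag, but producing a valid tag for a different message requires the key; the protection comes entirely from the HMAC.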

What RFC Standards Define Base64?

Four RFC documents define Base64 encoding and its applications. RFC 4648 is the primary specification. The other 3 RFCs define Base64 usage within specific protocols: email (MIME), JSON Web Signatures, and OpenPGP.

RFC	Title	Year	Scope
RFC 4648	The Base16, Base32, and Base64 Data Encodings	2006	Primary specification. Defines standard Base64 (Section 4), URL-safe Base64 (Section 5), Base32 (Section 6), and Base16 (Section 8).
RFC 2045	MIME Part One: Format of Internet Message Bodies	1996	Defines Base64 as a Content-Transfer-Encoding for email. Specifies 76-character line wrapping with CRLF line endings.
RFC 7515	JSON Web Signature (JWS)	2015	Uses base64url encoding (RFC 4648 Section 5) for JWT header and payload segments. Omits padding by default.
RFC 4880	OpenPGP Message Format	2007	Uses Radix-64 encoding (a Base64 variant) with a 24-bit CRC checksum appended after the encoded data.
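For the OpenPGP row, the 24-bit checksum can be sketched in Python. The initial value 0xB704CE and generator 0x1864CFB are the constants given in RFC 4880; the function names are chosen for this example, and the sketch covers only the checksum line, not full ASCII armor.

```python
import base64


def crc24(data: bytes) -> int:
    """CRC-24 as used by OpenPGP ASCII armor."""
    crc = 0xB704CE                      # initial value from RFC 4880
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:         # bit 24 set: reduce by the generator
                crc ^= 0x1864CFB
    return crc & 0xFFFFFF


def armor_checksum(data: bytes) -> str:
    """Checksum line: '=' followed by the 3 CRC bytes in Base64."""
    return "=" + base64.b64encode(crc24(data).to_bytes(3, "big")).decode("ascii")


print(armor_checksum(b"Hello"))
```

The checksum is computed over the binary data before encoding, which gives OpenPGP armor the error detection that plain Base64 lacks.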

RFC 4648 obsoleted RFC 3548 (2003) and is the authoritative general-purpose reference for Base64 encoding. RFC 2045 (1996) remains in force for MIME and still governs Base64 in email. The URL-safe variant (Section 5) was already defined in RFC 3548 and carried forward into RFC 4648; it addresses the incompatibility of + and / with URLs and filenames.

Frequently Asked Questions

Is Base64 encoding the same as encryption?

No. Base64 is a reversible encoding scheme, not encryption. Any person or program can decode a Base64 string back to the original binary data without a key. Base64 provides zero confidentiality. For data protection, use encryption algorithms such as AES-256 or RSA.

Why does Base64 increase file size by 33%?

Base64 represents every 3 input bytes (24 bits) as 4 output characters (6 bits each). The ratio 4/3 equals approximately 1.333, producing a 33.3% size increase. Additional overhead comes from padding characters and, in MIME encoding (RFC 2045), line breaks every 76 characters.
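The arithmetic can be written out as a short Python sketch. The function names are hypothetical, and the MIME figure assumes every line, including the last, is terminated by a 2-byte CRLF.

```python
def base64_size(byte_count: int) -> int:
    """Encoded length in characters, including padding."""
    return 4 * ((byte_count + 2) // 3)   # 4 characters per started 3-byte group


def mime_base64_size(byte_count: int, line_length: int = 76) -> int:
    """Encoded length plus one CRLF per line of at most line_length characters."""
    encoded = base64_size(byte_count)
    lines = (encoded + line_length - 1) // line_length
    return encoded + 2 * lines


for size in (3, 1_000, 1_000_000):
    plain = base64_size(size)
    print(size, plain, mime_base64_size(size), f"{plain / size:.4f}")
```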

Can Base64 encode any type of file?

Yes. Base64 operates on raw bytes, so it encodes any binary data: images (PNG, JPEG, GIF, WebP), PDFs, audio files, video files, executables, ZIP archives, and plain text. The encoded output is always a printable ASCII string regardless of the input format. Encode files using the image encoder or text encoder.

What is the maximum size for Base64 encoding?

No theoretical maximum exists in the Base64 specification (RFC 4648). Practical limits depend on the application and vary by implementation and version: JavaScript engines cap the maximum string length (roughly 512MB in some browsers), browsers limit the size of data URIs (approximately 2MB is commonly cited for Chrome), and email providers often cap attachments at around 25MB.

Is Base64 encoding reversible?

Yes. Base64 encoding is fully reversible and lossless. Decoding a Base64 string always produces the exact original binary data, byte for byte. The process is deterministic: the same input always produces the same output, and decoding always recovers the same input. Test this using the Base64 text decoder.
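A round trip on random bytes shows the lossless property in practice. This is a minimal Python check using the standard base64 module; the 1024-byte sample size is arbitrary.

```python
import base64
import os

original = os.urandom(1024)                       # arbitrary binary data
encoded = base64.b64encode(original)              # printable ASCII bytes
decoded = base64.b64decode(encoded, validate=True)

print(decoded == original)                        # True
```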