Editorial guide · Updated 2026-03-09

Zero-width characters: what they are and why they break copied text

Zero-width characters are invisible, but they are not harmless. They slip through copy and paste from chat tools, PDFs, and rich editors, then quietly break exact-match search, URLs, hashtags, filenames, and form validation.

Key takeaways

  • The most common troublemakers are U+200B zero-width space, U+200C zero-width non-joiner, U+200D zero-width joiner, U+2060 word joiner, and U+FEFF zero-width no-break space, better known as the byte-order mark.
  • These characters often arrive from chat apps, browser editors, PDF extraction, OCR, and multilingual rendering systems.
  • You should remove them when they are accidental, but verify shaping-sensitive scripts before flattening everything.
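
To make the list above concrete, here is a minimal Python sketch that reports where those five code points occur in a string. The function name find_zero_width and the ZERO_WIDTH table are illustrative names chosen for this example, not part of any library.

```python
# Names for the five characters listed above, keyed by the character itself.
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE (BOM)",
}


def find_zero_width(text):
    """Return (index, 'U+XXXX', name) for each zero-width character in text."""
    return [
        (i, f"U+{ord(ch):04X}", ZERO_WIDTH[ch])
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


# Escape sequences keep the invisible characters visible in the source code.
sample = "\ufeffzero\u200bwidth"
for hit in find_zero_width(sample):
    print(hit)
```

Writing the characters as escape sequences rather than pasting them keeps the example itself free of hidden characters.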

Why invisible characters cause real problems

The main trap is that zero-width characters do not announce themselves. A word can look perfect on screen and still fail an exact keyword match because the system is not comparing the visible glyphs; it is comparing the underlying code points.
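
A short Python illustration of that mismatch, using a made-up word with one U+200B inserted in the middle. Both strings render the same on most screens, yet they differ by one code point.

```python
visible = "keyword"
pasted = "key\u200bword"  # looks the same on screen, one extra code point

print(visible == pasted)          # False: the code points differ
print(len(visible), len(pasted))  # 7 8
print("keyword" in pasted)        # False: substring search fails too

# Listing the code points exposes the hidden character.
print([f"U+{ord(c):04X}" for c in pasted])
```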

That matters in practical workflows: hiring portals can miss a keyword, markdown links can fail, analytics filters can split what should be one token, and filenames can stop matching the value you thought you copied.
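
The token-splitting case can be reproduced with a very simple tokenizer. This is only a sketch: real analytics or search pipelines tokenize in their own ways, and the tokenize helper below is a hypothetical stand-in built on Python's standard re module.

```python
import re


def tokenize(text):
    # \w matches letters, digits, and underscore; U+200B is none of these,
    # so it acts as a token boundary.
    return re.findall(r"\w+", text)


print(tokenize("keyword"))        # ['keyword']
print(tokenize("key\u200bword"))  # ['key', 'word']
```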

Where they usually come from

They are common in text copied from AI assistants, messaging apps, design-heavy editors, PDF viewers, and OCR layers. Some tools insert them deliberately: U+200B marks a place where a line may break, and U+200C and U+200D control how adjacent letters join in complex scripts.

The problem is not the original rendering context. The problem starts when the text leaves that context and lands in a stricter system that expects plain spacing and predictable tokens.

  • Chat and email drafts copied into CMS, CRM, ATS, or ticketing tools.
  • PDF and OCR output where hidden layout markers survive the paste.
  • Usernames, URLs, product codes, and hashtags that must match exactly.

When to remove them and when to be careful

Remove them aggressively in plain-text workflows, structured fields, filenames, and anything that depends on exact matching. Those are the cases where invisible characters create bugs faster than they create value.
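
For those exact-match cases, a blunt removal pass is usually enough. The sketch below deletes the five code points named in this guide with str.translate; the name strip_zero_width is illustrative, and the character set is an assumption you may want to extend. It also shows that the built-in str.strip does not help, because Python does not treat these characters as whitespace.

```python
# Map each code point to None so that str.translate deletes it.
_STRIP_TABLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def strip_zero_width(text):
    """Delete every zero-width character in the table, wherever it appears."""
    return text.translate(_STRIP_TABLE)


raw = "\ufeffreport\u200b_final.pdf"
print(raw.strip() == "report_final.pdf")            # False
print(strip_zero_width(raw) == "report_final.pdf")  # True
```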

Be more careful with writing systems that intentionally rely on joiners or non-joiners for shaping, such as Persian, which uses U+200C inside words, and several Indic scripts. In those cases the right move is not blind deletion; it is a quick human check after cleanup to confirm the rendered word still looks correct.
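
One way to support that human check is a cleanup pass that always removes U+200B, U+2060, and U+FEFF but keeps a joiner or non-joiner when it sits between non-ASCII letters or combining marks, and reports where it kept one. This is a rough heuristic under stated assumptions, not a shaping-aware algorithm: the neighbor test and the names careful_clean and _shaping_neighbor are invented for this sketch, and it does not handle emoji sequences that use U+200D.

```python
import unicodedata

ALWAYS_REMOVE = {"\u200b", "\u2060", "\ufeff"}
JOINERS = {"\u200c", "\u200d"}


def _shaping_neighbor(ch):
    # Heuristic (an assumption): a non-ASCII letter or combining mark
    # suggests the joiner may be doing real shaping work.
    if not ch or ch.isascii():
        return False
    return unicodedata.category(ch)[0] in ("L", "M")


def careful_clean(text):
    """Return (cleaned_text, indexes_of_kept_joiners) for human review."""
    out, kept = [], []
    for i, ch in enumerate(text):
        if ch in ALWAYS_REMOVE:
            continue
        if ch in JOINERS:
            prev = text[i - 1] if i > 0 else ""
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if not (_shaping_neighbor(prev) and _shaping_neighbor(nxt)):
                continue  # joiner next to ASCII or at an edge: treat as accidental
            kept.append(len(out))  # position of the kept joiner in the output
        out.append(ch)
    return "".join(out), kept


# Persian word with an intentional U+200C between two letters.
persian = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
print(careful_clean(persian) == (persian, [2]))  # True: the joiner is kept
print(careful_clean("hash\u200dtag\u200b"))      # ('hashtag', [])
```

The returned indexes give a reviewer the exact spots to inspect in the rendered text instead of rereading the whole field.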