HTML to Markdown: Best Tools and Methods
While converting Markdown to HTML is the more common direction, the reverse (converting HTML to Markdown) is increasingly important. Whether you're migrating a blog, cleaning up copied web content, extracting documentation from a CMS, or you simply prefer writing in Markdown, knowing how to strip HTML back to its Markdown essence is a valuable skill.
This guide explores the use cases, manual techniques, automated tools, and best practices for HTML to Markdown conversion.
Why Convert HTML to Markdown?
📖 Migrating Content
Moving from WordPress, Medium, or any HTML-based CMS to a static site generator (Hugo, Jekyll, Astro) requires converting HTML posts to Markdown. This is often the first step in any CMS migration project.
📋 Cleaning Copied Content
When you copy text from a webpage, it comes with HTML formatting that can mess up your documents. Converting to Markdown first strips the noise and gives you clean, editable text.
🔧 Documentation Workflow
Many teams are switching from HTML documentation to Markdown-based systems (GitBook, Notion, Confluence with Markdown). Converting existing HTML docs to Markdown enables this transition.
📱 Content Reuse
Markdown is the lingua franca for developer content. Converting HTML to Markdown lets you reuse web content in README files, wikis, chat messages, and documentation platforms.
Manual Conversion Techniques
For small amounts of HTML, manual conversion is often faster than finding and configuring a tool. Here's a systematic approach:
Headings
<h1>Title</h1> → # Title
<h2>Subtitle</h2> → ## Subtitle
<h3>Section</h3> → ### Section
Text Formatting
<strong>bold</strong> → **bold**
<em>italic</em> → *italic*
<del>strike</del> → ~~strike~~
<code>inline</code> → `inline`
Links and Images
<a href="url">text</a> → [text](url)
<img src="src" alt="alt"> → ![alt](src)
Lists
<ul><li>Item</li></ul> → - Item
<ol><li>First</li></ol> → 1. First
Tables
<table>
<tr><th>A</th><th>B</th></tr>
<tr><td>1</td><td>2</td></tr>
</table>
→
| A | B |
|---|---|
| 1 | 2 |
Tip: For repetitive patterns, use your editor's regex find-and-replace. For example, replace <strong>(.*?)</strong> with **$1** to batch-convert bold tags.
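The tag-to-syntax mappings above can also be applied in bulk from a script. The following is a minimal sketch using only Python's standard `re` module; the `RULES` table and the `simple_html_to_markdown` name are illustrative, not part of any library. It assumes flat, well-formed snippets with attributes written exactly as in the examples above. Nested or messy HTML is better served by the parser-based tools in the next section.

```python
import re

# Ordered (pattern, replacement) pairs for the simple mappings listed above.
# Headings run first so inline tags inside them are converted by later rules.
RULES = [
    (r"<h([1-6])[^>]*>(.*?)</h\1>",
     lambda m: "#" * int(m.group(1)) + " " + m.group(2)),
    (r"<(strong|b)>(.*?)</\1>", r"**\2**"),
    (r"<(em|i)>(.*?)</\1>", r"*\2*"),
    (r"<del>(.*?)</del>", r"~~\1~~"),
    (r"<code>(.*?)</code>", r"`\1`"),
    (r'<a href="([^"]*)">(.*?)</a>', r"[\2](\1)"),
    (r'<img src="([^"]*)" alt="([^"]*)"\s*/?>', r"![\2](\1)"),
]


def simple_html_to_markdown(html: str) -> str:
    """Apply each rule in order; tags not covered by a rule are left untouched."""
    for pattern, replacement in RULES:
        html = re.sub(pattern, replacement, html, flags=re.IGNORECASE | re.DOTALL)
    return html


print(simple_html_to_markdown("<h2>Sub <strong>bold</strong></h2>"))
```

Lists and tables are deliberately left out: they span multiple elements, and regular expressions handle that structure poorly.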
Automated Tools and Libraries
Online Converters
The fastest approach for one-off conversions. Paste your HTML and get Markdown back. Our HTML to Markdown converter handles complex HTML including tables, nested lists, and inline formatting, producing clean, standards-compliant Markdown.
Command-Line: Pandoc
Pandoc is the gold standard for document conversion and supports HTML to Markdown:
# Basic conversion
pandoc input.html -o output.md -t gfm
# With table support stated explicitly (gfm already enables pipe tables, so this is equivalent)
pandoc input.html -o output.md -t gfm+pipe_tables
# Convert a directory of HTML files
for f in *.html; do
    pandoc "$f" -o "${f%.html}.md" -t gfm --wrap=none
done
The -t gfm flag targets GitHub Flavored Markdown, one of the most widely supported variants. --wrap=none stops Pandoc from hard-wrapping paragraphs (by default it wraps at 72 columns), so each paragraph stays on a single line.
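For migrations where the HTML files sit in nested folders, the same Pandoc call can be driven from a script. This is a sketch that assumes `pandoc` is installed and on the PATH; `pandoc_command` and `convert_tree` are hypothetical helper names, and the flags are the ones used in the shell loop above.

```python
import subprocess
from pathlib import Path


def pandoc_command(src: Path) -> list[str]:
    """Build the pandoc invocation for one HTML file (same flags as the shell loop)."""
    return ["pandoc", str(src), "-o", str(src.with_suffix(".md")),
            "-t", "gfm", "--wrap=none"]


def convert_tree(root: Path) -> list[Path]:
    """Convert every .html file under root, including subdirectories."""
    converted = []
    for src in sorted(root.rglob("*.html")):
        # check=True raises CalledProcessError if pandoc exits with an error
        subprocess.run(pandoc_command(src), check=True)
        converted.append(src.with_suffix(".md"))
    return converted
```

Calling `convert_tree(Path("export"))` would write each `.md` file next to its source; `export` here stands in for whatever folder holds your HTML.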
Python: html2text or markdownify
# Using html2text
import html2text
h = html2text.HTML2Text()
h.ignore_links = False
h.body_width = 0 # No wrapping
markdown = h.handle('<h1>Hello</h1><p>World</p>')
# Using markdownify
from markdownify import markdownify as md
markdown = md('<h1>Hello</h1><p>World</p>', heading_style="ATX")
JavaScript: Turndown
const TurndownService = require('turndown');
const turndown = new TurndownService({
  headingStyle: 'atx',
  codeBlockStyle: 'fenced'
});
const markdown = turndown.turndown(`
<h1>Hello</h1>
<p>This is <strong>HTML</strong> content.</p>
`);
// Output: # Hello\n\nThis is **HTML** content.
Turndown is the most popular JavaScript HTML-to-Markdown library with over 4,000 GitHub stars. It's highly configurable and supports custom rules for handling non-standard HTML.
Browser Extensions
For converting web pages directly in your browser:
- MarkDownload (Firefox/Chrome) — One-click Markdown conversion of any webpage
- Copy as Markdown (Chrome) — Copy selected content as Markdown
- Markdown Viewer (Chrome) — View Markdown files in the browser, can extract Markdown from pages
Handling Complex HTML
Nested Tables
Markdown tables are flat—no nested tables, merged cells, or colspan/rowspan. When converting complex tables, consider simplifying the structure or using HTML tables within Markdown (most parsers support this).
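Before converting, it can help to flag which tables will not survive as pipe tables. The sketch below uses Python's standard `html.parser`; `TableComplexityChecker` and `table_problems` are illustrative names, and the check covers only the three cases named above (nested tables, colspan, rowspan).

```python
from html.parser import HTMLParser


class TableComplexityChecker(HTMLParser):
    """Flags features that a flat Markdown pipe table cannot represent."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # current <table> nesting depth
        self.reasons = set()  # why the table is too complex

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.reasons.add("nested table")
        elif tag in ("td", "th"):
            for name, value in attrs:
                # A span of 1 is the default and is harmless
                if name in ("colspan", "rowspan") and value not in (None, "", "1"):
                    self.reasons.add(name)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def table_problems(table_html: str) -> set[str]:
    """Return the set of reasons a table should stay as raw HTML (empty if none)."""
    checker = TableComplexityChecker()
    checker.feed(table_html)
    checker.close()
    return checker.reasons
```

A table that returns an empty set is a candidate for conversion; anything else can be kept as an HTML block in the Markdown file or simplified by hand.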
Styled Content
HTML classes, inline styles, and CSS-driven layouts have no Markdown equivalent. During conversion, focus on the content structure (headings, paragraphs, lists) and accept that visual styling will be lost. Re-apply styles in your Markdown rendering layer.
JavaScript and Interactive Elements
Scripts, forms, iframes, and interactive widgets cannot be represented in Markdown. Document these as code blocks or notes in the converted output.
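One way to leave such a note is to swap each element for a short placeholder paragraph before running the converter. This is a rough sketch with the standard `re` module: `replace_interactive` and the placeholder wording are assumptions, and the regex approach assumes the elements are properly closed and not nested inside one another.

```python
import re

# Matches a whole <script>, <iframe>, or <form> element, including its content.
INTERACTIVE = re.compile(
    r"<(script|iframe|form)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def replace_interactive(html: str) -> tuple[str, list[str]]:
    """Replace interactive elements with a note; also return the tags removed."""
    removed = []

    def note(match):
        tag = match.group(1).lower()
        removed.append(tag)
        return f"<p>[{tag} omitted during conversion]</p>"

    return INTERACTIVE.sub(note, html), removed
```

The returned list of removed tags can feed directly into a migration log, so nothing disappears silently.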
Images with Complex Captions
HTML figure elements with figcaptions, lightbox links, and responsive classes need special handling. Convert to Markdown image syntax with the caption as alt text, or use a custom syntax supported by your platform.
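The first option (caption as alt text) might look like the sketch below, built on Python's standard `html.parser`. `FigureExtractor` and `figure_to_markdown` are hypothetical names. The sketch keeps only the first image, falls back to the image's own alt text when there is no figcaption, and drops the lightbox link and CSS classes.

```python
from html.parser import HTMLParser


class FigureExtractor(HTMLParser):
    """Collects the first image and the caption text from a <figure>."""

    def __init__(self):
        super().__init__()
        self.src = ""
        self.alt = ""
        self.caption_parts = []
        self._in_caption = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and not self.src:
            self.src = attrs.get("src") or ""
            self.alt = attrs.get("alt") or ""
        elif tag == "figcaption":
            self._in_caption = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False

    def handle_data(self, data):
        if self._in_caption:
            self.caption_parts.append(data)


def figure_to_markdown(figure_html: str) -> str:
    parser = FigureExtractor()
    parser.feed(figure_html)
    parser.close()
    # Collapse whitespace; inline tags inside the caption are flattened to text
    caption = " ".join("".join(parser.caption_parts).split())
    text = (caption or parser.alt).replace("[", r"\[").replace("]", r"\]")
    return f"![{text}]({parser.src})"
```

If the full-size image behind the lightbox link matters, that link would need to be captured separately, for example by wrapping the image in a Markdown link.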
Best Practices
- Validate output: After conversion, render the Markdown back to HTML and compare with the original to catch lost content.
- Preserve structure first, formatting second: A heading hierarchy is more important than preserving every bold tag.
- Use consistent conventions: Choose ATX headings (#) or Setext headings (===) and stick with one style.
- Test in your target platform: Different Markdown renderers support different features. Test your output in the platform where it will be used.
- Keep a conversion log: For large migrations, document which HTML patterns needed manual attention.
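A lightweight stand-in for the validation step is to compare words rather than rendered HTML. The sketch below is a simplification of the render-and-compare approach described above, not a replacement for it: it checks that every word visible in the original HTML still appears somewhere in the Markdown. `missing_words` and its helpers are illustrative names, and only the standard library is used.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects all text nodes (note: this includes script and style content)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def words(text: str) -> set[str]:
    # Letters and digits only, so Markdown punctuation (*, _, #, `) is ignored
    return set(re.findall(r"[^\W_]+", text.lower()))


def missing_words(original_html: str, markdown: str) -> set[str]:
    """Return words present in the HTML text but absent from the Markdown."""
    extractor = TextExtractor()
    extractor.feed(original_html)
    extractor.close()
    return words(" ".join(extractor.parts)) - words(markdown)
```

An empty result does not prove the conversion is faithful (word order and structure are not checked), but a non-empty one is a quick pointer to content that was dropped and belongs in the conversion log.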
Convert HTML to Markdown Now
Our free converter handles complex HTML including tables, lists, and code blocks.