HTML to Markdown: Best Tools and Methods

By Risetop Team · March 5, 2025 · 9 min read

While converting Markdown to HTML is the more common direction, the reverse—converting HTML to Markdown—is increasingly important. Whether you're migrating a blog, cleaning up copied web content, extracting documentation from a CMS, or simply prefer writing in Markdown, knowing how to strip HTML back to its Markdown essence is a valuable skill.

This guide explores the use cases, manual techniques, automated tools, and best practices for HTML to Markdown conversion.

Why Convert HTML to Markdown?

📖 Migrating Content

Moving from WordPress, Medium, or any HTML-based CMS to a static site generator (Hugo, Jekyll, Astro) requires converting HTML posts to Markdown. This conversion is usually the first step in such a migration.

📋 Cleaning Copied Content

When you copy text from a webpage, it comes with HTML formatting that can mess up your documents. Converting to Markdown first strips the noise and gives you clean, editable text.

🔧 Documentation Workflow

Many teams are moving from HTML documentation to Markdown-based or Markdown-friendly tools (GitBook, Notion, Confluence via its Markdown support). Converting existing HTML docs to Markdown enables this transition.

📱 Content Reuse

Markdown is the lingua franca for developer content. Converting HTML to Markdown lets you reuse web content in README files, wikis, chat messages, and documentation platforms.

Manual Conversion Techniques

For small amounts of HTML, manual conversion is often faster than finding and configuring a tool. Here's a systematic approach:

Headings

<h1>Title</h1>        →  # Title
<h2>Subtitle</h2>      →  ## Subtitle
<h3>Section</h3>       →  ### Section
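Because the heading level is just the digit in the tag, this rule is easy to automate. Here is a minimal Python sketch using the standard re module. convert_headings is an illustrative name, and the pattern assumes a heading never contains another heading tag:

```python
import re

# Illustrative helper, not a library API.
# Matches <h1>...</h1> through <h6>...</h6>; [^>]* tolerates attributes such as id="..."
HEADING_RE = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def convert_headings(html: str) -> str:
    """Replace HTML headings with ATX-style Markdown headings."""

    def to_atx(match: re.Match) -> str:
        level = int(match.group(1))
        # Collapse internal whitespace and newlines so the heading stays on one line
        text = " ".join(match.group(2).split())
        return "#" * level + " " + text

    return HEADING_RE.sub(to_atx, html)


print(convert_headings("<h1>Title</h1>\n<h2 id='sub'>Subtitle</h2>"))
```

This prints "# Title" and "## Subtitle" on separate lines. Inline tags inside a heading are left untouched for a later pass.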

Text Formatting

<strong>bold</strong>    →  **bold**
<em>italic</em>        →  *italic*
<del>strike</del>      →  ~~strike~~
<code>inline</code>     →  `inline`

Links and Images

<a href="url">text</a>              →  [text](url)
<img src="src" alt="alt">          →  ![alt](src)
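The same approach works for links and images. Below is a minimal sketch with re, assuming double-quoted attributes. convert_links_and_images is a hypothetical helper. Images are converted first, so an image wrapped in a link ends up in the valid nested form [![alt](src)](url):

```python
import re

# Hypothetical helper for illustration; assumes attributes use double quotes.
IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')
LINK_RE = re.compile(r'<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def convert_img(match):
    # alt and src may appear in either order, so read all attributes into a dict
    attrs = dict(ATTR_RE.findall(match.group(0)))
    return f"![{attrs.get('alt', '')}]({attrs.get('src', '')})"


def convert_links_and_images(html):
    # Images first, so a linked image becomes [![alt](src)](url)
    html = IMG_RE.sub(convert_img, html)
    return LINK_RE.sub(lambda m: f"[{m.group(2)}]({m.group(1)})", html)
```
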

Lists

<ul><li>Item</li></ul>     →  - Item
<ol><li>First</li></ol>    →  1. First
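Lists get harder once they nest, because each nested list must be indented to the content column of its parent item. The following is a minimal sketch using Python's built-in html.parser. ListConverter and convert_lists are illustrative names. Inline tags inside items are reduced to plain text, and any text that follows a nested list inside the same <li> is attached to the last nested item:

```python
from html.parser import HTMLParser


class ListConverter(HTMLParser):
    """Illustrative sketch: turn <ul>/<ol>/<li> markup into Markdown list lines (text only)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one dict per open list
        self.items = []  # [prefix, raw_text] for each <li> seen so far

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            # A nested list is indented to the content column of its parent item
            indent = self.stack[-1]["child_indent"] if self.stack else ""
            self.stack.append({"kind": tag, "n": 1, "indent": indent, "child_indent": indent})
        elif tag == "li" and self.stack:
            level = self.stack[-1]
            if level["kind"] == "ol":
                marker = f"{level['n']}."
                level["n"] += 1
            else:
                marker = "-"
            # Item content starts after the marker and one space
            level["child_indent"] = level["indent"] + " " * (len(marker) + 1)
            self.items.append([level["indent"] + marker + " ", ""])

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.items:
            self.items[-1][1] += data

    def markdown(self):
        # Collapse runs of whitespace, as a browser does when rendering
        return "\n".join(prefix + " ".join(text.split()) for prefix, text in self.items)


def convert_lists(html):
    parser = ListConverter()
    parser.feed(html)
    parser.close()
    return parser.markdown()
```

For example, `convert_lists("<ul><li>Fruit<ol><li>Apple</li><li>Pear</li></ol></li><li>Veg</li></ul>")` returns "- Fruit", "  1. Apple", "  2. Pear", and "- Veg" on four lines. The nested items are indented two spaces to line up under "Fruit".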

Tables

<table>
  <tr><th>A</th><th>B</th></tr>
  <tr><td>1</td><td>2</td></tr>
</table>

→
| A | B |
|---|---|
| 1 | 2 |
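Rebuilding a pipe table by hand becomes tedious beyond a few rows. Here is a minimal sketch with html.parser. It assumes a simple, well-formed table whose first row is the header. TableRows and html_table_to_markdown are hypothetical names. Pipes inside cells are escaped as \| so they do not split the cell:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Hypothetical helper: collect the text of each <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.cell = None  # text buffer for the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.cell = ""

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            self.rows[-1].append(" ".join(self.cell.split()))
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell += data


def html_table_to_markdown(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]  # skip rows with no cells
    width = max(len(row) for row in rows)

    def line(cells):
        # Escape pipes and pad short rows to the full column count
        cells = [c.replace("|", "\\|") for c in cells] + [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, body = rows[0], rows[1:]
    separator = "|" + "---|" * width
    return "\n".join([line(header), separator] + [line(row) for row in body])
```

Given the two-row table above, it returns exactly the three-line pipe table shown.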
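
💡 Tip: For manual conversion, use your editor's find-and-replace with regex. For example, replace <strong>(.*?)</strong> with **$1** to batch-convert bold tags. Backreference syntax varies by editor ($1 in VS Code, \1 in many others). In most regex flavors . does not match line breaks by default, so tags that wrap across lines or carry attributes will be skipped.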
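The same batch replacement can also be scripted. This is a minimal sketch with Python's re module that covers the four inline rules listed earlier. Like the editor regex, it assumes tags without attributes and no nesting of the same tag. It uses re.DOTALL so that formatting spanning line breaks is still converted. convert_inline is an illustrative name:

```python
import re

# Illustrative helper, not a library API.
# Python writes the backreference as \1 where many editors use $1;
# re.DOTALL lets . match newlines, so tags that wrap across lines still convert.
RULES = [
    (re.compile(r"<strong>(.*?)</strong>", re.DOTALL), r"**\1**"),
    (re.compile(r"<em>(.*?)</em>", re.DOTALL), r"*\1*"),
    (re.compile(r"<del>(.*?)</del>", re.DOTALL), r"~~\1~~"),
    (re.compile(r"<code>(.*?)</code>", re.DOTALL), r"`\1`"),
]


def convert_inline(html):
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html
```
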

Automated Tools and Libraries

Online Converters

The fastest approach for one-off conversions. Paste your HTML and get Markdown back. Our HTML to Markdown converter handles complex HTML including tables, nested lists, and inline formatting, producing clean, standards-compliant Markdown.

Command-Line: Pandoc

Pandoc is the gold standard for document conversion and supports HTML to Markdown:

# Basic conversion
pandoc input.html -o output.md -t gfm

# Tables: gfm already emits pipe tables by default; +pipe_tables only makes that explicit
pandoc input.html -o output.md -t gfm+pipe_tables

# Convert a directory of HTML files
for f in *.html; do
  pandoc "$f" -o "${f%.html}.md" -t gfm --wrap=none
done

The -t gfm flag targets GitHub Flavored Markdown, one of the most widely supported variants. --wrap=none keeps each paragraph on a single line. Without it, pandoc wraps output at 72 columns.
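If the batch job is part of a larger Python script, the same loop can be driven with subprocess. The sketch below assumes pandoc is installed and on PATH. pandoc_command and convert_directory are illustrative names, not part of any library, and the flags are exactly those used in the shell loop above:

```python
import subprocess
from pathlib import Path


def pandoc_command(src: Path) -> list:
    """Illustrative helper: the shell loop's pandoc invocation for one file."""
    return ["pandoc", str(src), "-o", str(src.with_suffix(".md")), "-t", "gfm", "--wrap=none"]


def convert_directory(directory: str) -> None:
    """Illustrative helper: convert every *.html file in a directory."""
    for src in sorted(Path(directory).glob("*.html")):
        # check=True raises CalledProcessError if pandoc fails on a file
        subprocess.run(pandoc_command(src), check=True)
```

Building the argument list separately keeps it easy to inspect, and passing a list instead of a shell string avoids quoting problems with unusual file names.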

Python: html2text or markdownify

# Using html2text
import html2text
h = html2text.HTML2Text()
h.ignore_links = False
h.body_width = 0  # No wrapping
markdown = h.handle('<h1>Hello</h1><p>World</p>')

# Using markdownify
from markdownify import markdownify as md
markdown = md('<h1>Hello</h1><p>World</p>', heading_style="ATX")

JavaScript: Turndown

const TurndownService = require('turndown');
const turndown = new TurndownService({
  headingStyle: 'atx',
  codeBlockStyle: 'fenced'
});

const markdown = turndown.turndown(`
  <h1>Hello</h1>
  <p>This is <strong>HTML</strong> content.</p>
`);
// Output: # Hello\n\nThis is **HTML** content.

Turndown is one of the most widely used JavaScript HTML-to-Markdown libraries. It's highly configurable and supports custom rules (added with addRule) for handling non-standard HTML.

Browser Extensions

For converting web pages directly in your browser, extensions such as MarkDownload can clip the current page and save it as a Markdown file.

Handling Complex HTML

Nested Tables

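Markdown tables are flat: no nested tables, merged cells, or colspan/rowspan. When converting complex tables, consider simplifying the structure or keeping the table as raw HTML inside the Markdown file. CommonMark and GitHub Flavored Markdown pass HTML blocks through, but some platforms sanitize or disable raw HTML, so check your renderer.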
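Before deciding whether to simplify a table or keep it as HTML, it helps to know which tables are affected. Here is a minimal sketch with html.parser that flags nested tables and colspan/rowspan, the features named above that pipe tables cannot express. table_problems is a hypothetical helper. An empty result only means none of these features were found:

```python
from html.parser import HTMLParser


class TableComplexity(HTMLParser):
    """Hypothetical checker: flag table features that pipe tables cannot express."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # number of <table> elements currently open
        self.problems = set()

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.problems.add("nested table")
        elif tag in ("td", "th"):
            # html.parser lowercases attribute names; a span of "1" is harmless
            for name, value in attrs:
                if name in ("colspan", "rowspan") and value not in (None, "1"):
                    self.problems.add(name)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def table_problems(html):
    checker = TableComplexity()
    checker.feed(html)
    checker.close()
    return sorted(checker.problems)
```
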

Styled Content

HTML classes, inline styles, and CSS-driven layouts have no Markdown equivalent. During conversion, focus on the content structure (headings, paragraphs, lists) and accept that visual styling will be lost. Re-apply styles in your Markdown rendering layer.

JavaScript and Interactive Elements

Scripts, forms, iframes, and interactive widgets cannot be represented in Markdown. Document these as code blocks or notes in the converted output.
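One way to follow this advice for two common cases is sketched below with re. Inline scripts become fenced code blocks, and iframes become a one-line note that keeps the embed URL. document_interactive and the "Note:" wording are illustrative assumptions. Forms and external scripts (src="...") are not handled; an external script would produce an empty code block:

```python
import re

# Illustrative helper; the note format is an assumption, not a standard.
FENCE = "`" * 3  # three backticks, built indirectly so this snippet can sit inside a Markdown fence

SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)
IFRAME_RE = re.compile(r'<iframe\b[^>]*\bsrc="([^"]*)"[^>]*>.*?</iframe>', re.IGNORECASE | re.DOTALL)


def document_interactive(html):
    # Inline scripts become fenced code blocks so the logic stays visible to readers
    html = SCRIPT_RE.sub(lambda m: f"{FENCE}js\n{m.group(1).strip()}\n{FENCE}", html)
    # Iframes become a one-line note pointing at the embedded URL
    return IFRAME_RE.sub(lambda m: f"> Note: embedded content from {m.group(1)}", html)
```
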

Images with Complex Captions

HTML figure elements with figcaptions, lightbox links, and responsive classes need special handling. Convert to Markdown image syntax with the caption as alt text, or use a custom syntax supported by your platform.
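The first option above can be sketched with html.parser. Each <figure> becomes one Markdown image, with the figcaption text as alt text, falling back to the img alt attribute when there is no caption. Lightbox <a> wrappers and class attributes are simply ignored. FigureConverter and convert_figures are illustrative names, and the function returns only the converted figures from a fragment:

```python
from html.parser import HTMLParser


class FigureConverter(HTMLParser):
    """Illustrative sketch: turn <figure><img><figcaption> into ![caption](src)."""

    def __init__(self):
        super().__init__()
        self.images = []  # one Markdown image per <figure>
        self.src = None
        self.alt = ""
        self.caption = ""
        self.in_caption = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self.src, self.alt, self.caption = None, "", ""
        elif tag == "img":
            # Wrapping lightbox links and responsive classes are not carried over
            self.src = attrs.get("src") or ""
            self.alt = attrs.get("alt") or ""
        elif tag == "figcaption":
            self.in_caption = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self.in_caption = False
        elif tag == "figure" and self.src is not None:
            # Prefer the visible caption; fall back to the original alt text
            text = " ".join(self.caption.split()) or self.alt
            self.images.append(f"![{text}]({self.src})")

    def handle_data(self, data):
        if self.in_caption:
            self.caption += data


def convert_figures(html):
    parser = FigureConverter()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.images)
```
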

Best Practices

Frequently Asked Questions

Why convert HTML to Markdown instead of keeping HTML?
Markdown is more readable, easier to edit, version-control friendly, and portable. It's ideal for documentation, wikis, READMEs, and any content that humans need to read and edit regularly.
Can all HTML be converted to Markdown?
Not perfectly. Complex HTML with JavaScript, forms, iframes, and deeply nested tables may lose information. Markdown covers common formatting (headings, lists, links, images, code, tables) but can't represent everything HTML can.
How do I convert HTML from a webpage to Markdown?
Use browser extensions like MarkDownload, command-line tools like pandoc, or online converters. For bulk conversion of saved HTML pages, pandoc with a batch script works well.

Convert HTML to Markdown Now

Our free converter handles complex HTML including tables, lists, and code blocks.
