Day 5: Creating a Dynamic Blog with MDX and Migrating Legacy Content

📢 If you haven't read the previous parts, check out Day 1, Day 2, Day 3 and Day 4!
📚 You can also explore all posts from the 7 Days Website Revamp series!
Why We Chose MDX for Our Blog
We needed a blogging system that was:
- Developer-friendly
- SEO-optimized
- Flexible enough to embed custom components
- Easy to manage content without building a CMS
MDX (Markdown + JSX) was the perfect choice — allowing us to write content like Markdown, but embed React components when needed.
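For example, a post can drop from plain Markdown into a React component mid-document. A quick illustration (the `Callout` component and its import path here are hypothetical):

```mdx
import Callout from "@/components/Callout";

## A normal Markdown heading

Regular **Markdown** syntax works as usual, but React is one tag away:

<Callout type="info">
  This box is a React component rendered inline with the prose.
</Callout>
```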
Setting Up MDX in Next.js
We followed the official Next.js MDX guide.
First, we installed the required packages:
```bash
npm install @next/mdx @mdx-js/loader
```
Then, we updated `next.config.js` to handle `.mdx` pages:

```js
import createMDX from "@next/mdx";

// Wrap the Next.js config so the bundler knows how to compile .md/.mdx files.
const withMDX = createMDX({
  extension: /\.mdx?$/,
});

const nextConfig = withMDX({
  // Your existing Next.js config
  pageExtensions: ["js", "jsx", "ts", "tsx", "mdx"],
});

export default nextConfig;
```
Now, our app can render `.mdx` files just like any other React page.
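For instance, a standalone MDX page can now live directly in the route tree. A minimal illustration (hypothetical path, assuming the App Router):

```mdx
{/* src/app/hello/page.mdx: served at /hello with no extra wiring */}

# Hello from MDX

This file needs no layout or page component around it; the MDX itself is the page.
```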
Organizing Blog Content
We’ve organized our blog content in the following directory: `/src/content/blog/`
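On disk this looks roughly like the following (file names are illustrative):

```text
src/content/blog/
├── day-1-wix-to-nextjs.mdx
├── day-5-nextjs-mdx.mdx
└── ...
```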
Each blog post is an `.mdx` file that includes:
- Frontmatter metadata at the top (enclosed within `---`)
- The body content written in Markdown/JSX
Example frontmatter:
```yaml
---
title: "Building a Dynamic Blog with MDX"
author: "Tiến Nguyễn Hữu Anh"
date: "May 3, 2025"
categories: ["Technology"]
featured: true
draft: false
image: "/images/blog/day-5-nextjs-mdx/day5-mdx.webp"
subtitle: "Using MDX to power our content system."
---
```
Parsing and Reading Blog Data
To manage our blog posts dynamically, we created a utility module that handles:
- Reading MDX files from the `/src/content/blog/` directory
- Parsing frontmatter metadata (title, date, author, subtitle, etc.)
- Sorting posts by publication date
- Returning content and metadata for rendering
This allowed us to generate blog indexes, tag pages, and individual post pages easily at build time.
Utility functions include:
- `getBlogPosts()` → Fetch all blog posts
- `getBlogPost(slug)` → Fetch a single blog post by its slug
- `getBlogCategories()` → List all categories and post counts
- `sortByPublicationDate()` → Sort posts newest to oldest
Example Utility Functions
```ts
import fs from "fs";
import path from "path";

type Metadata = {
  title: string;
  subtitle: string;
  image: string;
  author: string;
  date: string;
  categories: string[];
  featured: boolean;
  draft: boolean;
};

export type MDXBlog = {
  metadata: Metadata;
  slug: string;
  content: string;
};

// parseFrontmatter is referenced but not shown in the original post; this is
// a minimal stand-in that reads the "---"-delimited block at the top of the
// file and treats everything after it as the post body.
function parseFrontmatter(filePath: string, rawContent: string) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(rawContent);
  if (!match) throw new Error(`Missing frontmatter in ${filePath}`);
  const metadata: Record<string, unknown> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const i = line.indexOf(":");
    if (i === -1) continue;
    const key = line.slice(0, i).trim();
    const raw = line.slice(i + 1).trim();
    if (raw === "true" || raw === "false") {
      metadata[key] = raw === "true"; // booleans: featured, draft
    } else if (raw.startsWith("[")) {
      // Flow-style arrays, e.g. categories: ["Technology"]
      metadata[key] = raw
        .slice(1, -1)
        .split(",")
        .map((s) => s.trim().replace(/^["']|["']$/g, ""))
        .filter(Boolean);
    } else {
      metadata[key] = raw.replace(/^["']|["']$/g, ""); // quoted strings
    }
  }
  return { metadata: metadata as Metadata, content: rawContent.slice(match[0].length) };
}

function readMDXFile(filePath: string): MDXBlog {
  const rawContent = fs.readFileSync(filePath, "utf-8");
  const { metadata, content } = parseFrontmatter(filePath, rawContent);
  const slug = path.basename(filePath, ".mdx");
  return { metadata, slug, content };
}

function getMDXFiles(dir: string): string[] {
  return fs.readdirSync(dir).filter((file) => file.endsWith(".mdx"));
}

function getMDXData(dir: string): MDXBlog[] {
  const mdxFiles = getMDXFiles(dir);
  return mdxFiles.map((file) => readMDXFile(path.join(dir, file)));
}

export function getBlogPosts(): MDXBlog[] {
  const blogDir = path.join(process.cwd(), "src", "content", "blog");
  return getMDXData(blogDir);
}

export function getBlogPost(slug: string): MDXBlog | null {
  const blogDir = path.join(process.cwd(), "src", "content", "blog");
  const filePath = path.join(blogDir, `${slug}.mdx`);
  if (!fs.existsSync(filePath)) {
    return null; // Return null if the file does not exist
  }
  return readMDXFile(filePath);
}

// ...and other utility functions
```
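The category and sorting helpers named in the list above aren't shown in the post; here is one plausible reconstruction over the same `MDXBlog` data:

```ts
// Reconstruction of the helpers named above (not the original source).
export function sortByPublicationDate(posts: MDXBlog[]): MDXBlog[] {
  // Dates like "May 3, 2025" parse directly with the Date constructor.
  return [...posts].sort(
    (a, b) => new Date(b.metadata.date).getTime() - new Date(a.metadata.date).getTime()
  );
}

export function getBlogCategories(): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const post of getBlogPosts()) {
    for (const category of post.metadata.categories) {
      counts[category] = (counts[category] ?? 0) + 1;
    }
  }
  return counts;
}
```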
Example: Rendering Blog Posts
On the frontend, we used the parsed blog data like this:
```tsx
import { getBlogPosts } from "@/lib/utils";

// This component runs on the server (at build or request time), so the
// fs-based utilities above are safe to call here.
export default function BlogListPage() {
  const blogs = getBlogPosts();
  return (
    <div>
      {blogs.map((post) => (
        <div key={post.slug}>
          <h2>{post.metadata.title}</h2>
          <p>{post.metadata.date}</p>
        </div>
      ))}
    </div>
  );
}
```
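The post stops short of the individual post route. As one sketch, the raw content string returned by `getBlogPost()` could be rendered with the `next-mdx-remote` package (the route path, App Router usage, and the `next-mdx-remote` dependency are assumptions on our part, not something the post confirms):

```tsx
// src/app/blog/[slug]/page.tsx (hypothetical route for illustration)
import { notFound } from "next/navigation";
import { MDXRemote } from "next-mdx-remote/rsc"; // assumed extra dependency
import { getBlogPost, getBlogPosts } from "@/lib/utils";

// Pre-render a static page for every post at build time.
export function generateStaticParams() {
  return getBlogPosts().map((post) => ({ slug: post.slug }));
}

// Next.js 14-style params; Next.js 15 passes params as a Promise instead.
export default function BlogPostPage({ params }: { params: { slug: string } }) {
  const post = getBlogPost(params.slug);
  if (!post) notFound();
  return (
    <article>
      <h1>{post.metadata.title}</h1>
      <MDXRemote source={post.content} />
    </article>
  );
}
```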
Migrating Legacy Blog Posts
We had a lot of older blog posts on our previous Wix website (see why we switched from Wix to Next.js in our Day 1 article).
To streamline the process, we created a small Python crawler to automate the migration:
- Crawled the old site
- Extracted blog titles, dates, categories, and content
- Converted the HTML to Markdown using libraries like `html2text`
- Saved each blog as a clean `.mdx` file in `/src/content/blog/`
This approach minimized manual errors and ensured our blog history was preserved accurately.
Example Python Crawler for Blog Migration
To automate the migration of our legacy blog posts, we built a lightweight crawler using Python, Selenium, and BeautifulSoup.
Below is a simplified structure of the crawler:
```python
from selenium import webdriver
from bs4 import BeautifulSoup
from markdownify import markdownify
import os

def crawl_blog(url, output_dir):
    # Render the page in a real browser so JavaScript-driven content loads.
    driver = webdriver.Chrome()
    driver.get(url)
    html = driver.page_source
    driver.quit()

    # Extract the article body and convert its HTML to Markdown.
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")
    content = markdownify(str(article)) if article else ""

    metadata = {
        "title": soup.title.string if soup.title else "Untitled",
        "author": "Unknown",
        "date": "2025-01-01",
        "categories": ["Uncategorized"],
    }

    # Assemble the .mdx file: frontmatter block followed by the body.
    mdx_content = f"""---
title: "{metadata['title']}"
author: "{metadata['author']}"
date: "{metadata['date']}"
categories: {metadata['categories']}
---

{content}
"""

    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "post.mdx"), "w", encoding="utf-8") as f:
        f.write(mdx_content)

crawl_blog("https://example.com/blog-post", "output")
```
Challenges and Lessons Learned
- **Frontmatter Consistency:** Every blog post needed clean and accurate frontmatter to ensure proper rendering, SEO, and linking.
- **Handling Images:** Old blog images were downloaded and referenced locally to maintain performance and eliminate dependency on external URLs.
- **Markdown Cleanup:** Some artifacts from old HTML formatting needed manual fixing after automatic conversion.
- **SEO Considerations:** Preserving original publication dates and metadata ensured better continuity for Google indexing.
Final Thoughts
Using Next.js + MDX combined with simple utilities allowed us to create a fast, scalable, and flexible blogging platform without the complexity of a traditional CMS.
Migrating our old content automatically also saved time, reduced errors, and kept the project moving forward efficiently.
In the next post, we'll explore how we built a dynamic Careers Page using Markdown and AWS Lightsail databases!
You can now continue with Day 6: Building a Careers Page with Markdown and Lightsail Database!
📚 You can also explore all posts from the 7 Days Website Revamp series!