article-extractor

Extract full article text and metadata from web pages

What is it?

A Claude Code skill for extracting full article text and metadata from web pages. It strips away navigation, ads, sidebars, and other non-content elements to deliver clean, readable article text. Ideal for content research, archiving, and building knowledge bases from web sources.

How to use it?

When you provide a URL, the skill fetches the web page, identifies the main article content, and extracts clean text along with metadata such as title, author, publication date, and description. It handles a range of website layouts and content management systems.

The extracted content can be used for research, summarization, or further processing within your Claude workflow.
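
The fetch, identify, and extract steps described above can be illustrated with a minimal sketch. This is not the skill's actual implementation, which the page does not show. It is a standard-library approximation that assumes clutter lives in tags such as nav, aside, and footer, and that article text sits in paragraph tags. The names ArticleParser, extract_article, fetch_html, SKIP_TAGS, and META_KEYS are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumption of this sketch: these tags hold navigation, ads, and other clutter.
SKIP_TAGS = {"nav", "aside", "header", "footer", "script", "style", "form"}
# Assumption: metadata is exposed through these common <meta> keys.
META_KEYS = {"author", "description", "article:published_time"}


class ArticleParser(HTMLParser):
    """Collects <title>, selected <meta> values, and <p> text outside clutter tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self.paragraphs = []
        self._skip_depth = 0   # > 0 while inside any SKIP_TAGS element
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key in META_KEYS and attrs.get("content"):
                self.metadata[key] = attrs["content"]
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace so the output reads as clean text.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = (self.metadata.get("title", "") + data).strip()
        elif self._in_p and self._skip_depth == 0:
            self._buf.append(data)


def extract_article(html: str) -> dict:
    """Return metadata and paragraph text extracted from an HTML string."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"metadata": parser.metadata, "text": "\n\n".join(parser.paragraphs)}


def fetch_html(url: str) -> str:
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Under these assumptions, calling extract_article(fetch_html(url)) returns a dictionary with a "metadata" entry and a "text" entry. Real pages vary widely, so a production extractor would need content-scoring heuristics rather than a fixed tag list.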

Key Features

  • Clean text extraction from web articles, removing ads, navigation, and clutter
  • Metadata extraction including title, author, date, and description
  • Handles various website layouts and CMS platforms
  • Integrates with other Tapestry skills for content processing pipelines
  • Preserves article structure and formatting

License: MIT
Version: 1.0.0