t1.1: scrape one forum via ViewComments.cfm POST pagination
Spider fetches ViewComments.cfm?GdocForumID=N with vPerPage=500, generates all page requests from page-1 metadata, and parses each div.Cbox for comment_id, author, date, title, text, reg_title, reg_desc. Handles span-wrapped comment text. Fixes UTF-8/windows-1251 meta-tag encoding mismatch. 9083 items, 15 empty-text (0.17%).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
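A minimal stdlib-only sketch of two ideas in the message above: deriving every page request from a total seen on page 1, and decoding a body whose meta tag cannot be trusted. This is not the spider's actual code. The form field name `vPage`, the helper names, and the assumption that page 1 exposes a total comment count are all hypothetical; only `vPerPage=500` comes from the commit message. The message also does not say which encoding is declared and which is real, so the decoder simply tries UTF-8 first and falls back to windows-1251.

```python
import math

PER_PAGE = 500  # vPerPage value named in the commit message


def page_payloads(total_comments, per_page=PER_PAGE, page_field="vPage"):
    """Return one POST form dict per results page.

    `page_field` is a hypothetical name; the real form field is not
    given in the commit message.
    """
    # Always request at least one page, even for an empty forum.
    pages = max(1, math.ceil(total_comments / per_page))
    return [
        {"vPerPage": str(per_page), page_field: str(n)}
        for n in range(1, pages + 1)
    ]


def decode_body(raw):
    """Decode response bytes without trusting the <meta> charset.

    Assumption: the body is either valid UTF-8 or windows-1251.
    Strict UTF-8 is tried first because windows-1251 Cyrillic text
    is almost never valid UTF-8, so the fallback is unambiguous.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("windows-1251")
```

With the 9083 items reported above and 500 per page, `page_payloads(9083)` yields 19 payloads. In a Scrapy spider each payload would typically be sent as the form data of a POST request to ViewComments.cfm.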
4
.gitignore
vendored
@@ -22,5 +22,9 @@ env/
 archive/
 
 
+# --- scrapy ---
+.scrapy/
+output/
+
 # --- misc ---
 .DS_Store