Spider fetches ViewComments.cfm?GdocForumID=N with vPerPage=500, generates all page requests from page-1 metadata, and parses each div.Cbox for comment_id, author, date, title, text, reg_title, reg_desc. Handles span-wrapped comment text. Fixes UTF-8/windows-1251 meta-tag encoding mismatch. 9083 items, 15 empty-text (0.17%).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
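The pagination step in the commit message (read the total from page 1, then generate every page request up front) could look roughly like the sketch below. It is not the spider's actual code: the host, the `vPage` parameter name, and the `page_urls` helper are assumptions made for illustration; only `ViewComments.cfm`, `GdocForumID`, and `vPerPage=500` come from the message.

```python
from math import ceil
from urllib.parse import urlencode

# Placeholder host (assumption); only the ViewComments.cfm path is from the commit message.
BASE_URL = "https://example.invalid/ViewComments.cfm"
PER_PAGE = 500  # vPerPage value named in the commit message


def page_urls(forum_id: int, total_comments: int, per_page: int = PER_PAGE) -> list[str]:
    """Build one URL per result page from the total count found on page 1."""
    # Always request at least one page, even when the forum has no comments.
    n_pages = max(1, ceil(total_comments / per_page))
    urls = []
    for page in range(1, n_pages + 1):
        query = urlencode({
            "GdocForumID": forum_id,
            "vPerPage": per_page,
            "vPage": page,  # hypothetical page-number parameter
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls
```

For a forum with, say, 1200 comments, `page_urls(7, 1200)` returns three URLs. Generating them all at once lets the crawler schedule the pages concurrently instead of following "next" links one at a time.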
30 lines
276 B
Plaintext
# --- python bytecode ---
__pycache__/
*.py[cod]
*$py.class

# --- environment files ---
.env
.env.*
*.local
.venv/
venv/
env/

# --- emacs ---
*~
\#*\#
.\#*
*.elc

# --- project private data ---
/private/
archive/


# --- scrapy ---
.scrapy/
output/

# --- misc ---
.DS_Store