vath/README.md
2026-05-05 11:35:19 -04:00


Table of Contents

  1. Project Goals
  2. Architecture
    1. Scraper
    2. Storage
    3. Analysis
  3. Roadmap

Project Goals

  1. Document and analyze sentiment of public comments on Virginia law, to determine:
    1. the utility of this forum as a mechanism for public comment, and
    2. the impact of this forum on Virginia regulation.
  2. Make data and insights broadly available.
  3. Generalize to other public comment tools.

Architecture

  1. Scrape/Parse: Scrapy for downloading comments
  2. Storage: JSONL (JSON Lines), one file per forum/bill
  3. Sentiment analysis: Claude Haiku
  4. Display: TBD

Scraper

Scrapy provides a simple mechanism for browsing and downloading comments. The site has three page types:

  1. Forums listing page: `Forums.cfm` - lists all open forums with agency, reg title, action type, brief description, closing date, comment count
  2. Comment listing page: `comments.cfm?GDocForumID=X` or `comments.cfm?stageid=X` or `comments.cfm?petitionid=X` - lists comments with title, author, date
  3. Individual comment page: `viewcomments.cfm?commentid=X` - shows regulation title + brief description at the top, plus the comment
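
The three page types above can be told apart by file name and query parameter alone. The following is a minimal sketch using only the Python standard library; the function name `classify_url`, the shape of the returned tuple, and the example host are assumptions for illustration, not part of the project's code.

```python
from urllib.parse import urlparse, parse_qs

# Query parameters that identify a comment listing, per the list above.
LISTING_KEYS = ("gdocforumid", "stageid", "petitionid")


def classify_url(url):
    """Return (page_type, id_key, id_value) for a forum URL, or None if unrecognized."""
    parsed = urlparse(url)
    page = parsed.path.rsplit("/", 1)[-1].lower()
    # Lower-case parameter names so GDocForumID and gdocforumid both match.
    query = {k.lower(): v[0] for k, v in parse_qs(parsed.query).items()}

    if page == "forums.cfm":
        return ("forum_list", None, None)
    if page == "comments.cfm":
        for key in LISTING_KEYS:
            if key in query:
                return ("comment_list", key, query[key])
    if page == "viewcomments.cfm" and "commentid" in query:
        return ("comment", "commentid", query["commentid"])
    return None


# Hypothetical host, used only to show the call.
print(classify_url("https://example.org/comments.cfm?stageid=7"))
```

A Scrapy spider could use a helper like this to route each response to the matching parse callback, though how the real spider dispatches is not specified here.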

Storage

One JSONL file per forum/bill.
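
As a rough sketch of that layout, the helpers below append one JSON object per line to a per-forum file and read it back. The function names, the `<forum_id>.jsonl` naming scheme, and the record fields (`comment_id`, `title`, `author`, `date`, `text`) are assumptions for illustration; the actual schema is not fixed yet.

```python
import json
from pathlib import Path


def append_comment(root, forum_id, comment):
    """Append one comment dict as a single JSON line to the forum's file."""
    path = Path(root) / f"{forum_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        f.write(json.dumps(comment, ensure_ascii=False) + "\n")
    return path


def read_comments(root, forum_id):
    """Load every comment stored for a forum, skipping blank lines."""
    path = Path(root) / f"{forum_id}.jsonl"
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending line by line means a scrape that is interrupted partway still leaves a readable file, and a forum can be re-analyzed without touching the others.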

Analysis

Google and Amazon both return generic sentiment (tone of writing: positive/negative), not stance (for/against the regulation). For example, a tone classifier labels "I strongly believe the government should NOT interfere" as negative, but the label we need is the stance: "against" the regulation. To give the model that context, we will pass the forum/bill title along with each comment and cache the full text of the proposed change, perhaps as a fallback.
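
To make the tone-versus-stance distinction concrete, here is a minimal sketch of how a prompt might pair the forum/bill title with a comment and how a one-word reply might be validated. The prompt wording and the names `build_prompt`, `parse_stance`, and `STANCES` are assumptions, and the call to the model itself is deliberately left out.

```python
STANCES = ("for", "against", "neutral")


def build_prompt(title, comment):
    """Compose a stance prompt that names the regulation the comment responds to."""
    return (
        "You are labeling public comments on a proposed regulation.\n"
        f"Regulation title: {title}\n"
        f"Comment: {comment}\n"
        "Is the commenter for, against, or neutral toward the regulation? "
        "Judge the position taken, not the tone of the writing. "
        "Answer with exactly one word: for, against, or neutral."
    )


def parse_stance(reply):
    """Return the stance if the reply is exactly one allowed word, else None."""
    label = reply.strip().strip(".!\"'").lower()
    return label if label in STANCES else None
```

Rejecting anything other than a single allowed word keeps ambiguous replies out of the dataset; those comments could be retried with the cached text of the proposed change as extra context.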

| Tool | Output | Context | Sarcasm | Context window | Cost/1k comments |
|---|---|---|---|---|---|
| Google NL API | -1→+1, magnitude | No/generic | Poorly | No | ~$1-2 |
| Amazon Comprehend | Pos/Neg/Neutral/Mixed | No/generic | Poorly | No | ~$0.10 |
| Claude Haiku | Prompted → for/against/neutral | Yes | Yes, with prompt | Yes | ~$0.10-0.30 |
| GPT-4o-mini | Prompted → same | Yes | Yes | Yes | ~$0.05-0.15 |

Roadmap

  1. Scrape one forum
  2. Compare sentiment models
  3. Display
  4. Scrape all data
  5. Scale?