vath/README.md
2026-05-05 11:38:57 -04:00

Table of Contents

  1. Project Goals
    1. Document and analyze sentiment
    2. Make data available
    3. Generalize
  2. Architecture
    1. Scraper
    2. Storage
    3. Analysis
  3. Roadmap

Project Goals

  1. Document and analyze sentiment of public comments on Virginia law, to determine:
    1. the utility of this forum as a mechanism for public comment, and
    2. the impact of this forum on Virginia regulation.
  2. Make data and insights broadly available.
  3. Generalize to other public comment tools.

Document and analyze sentiment

  • Scrape the data, parse, clean, and store. Clearly separate scraper from sentiment analyzer for maximum auditability.
  • Build tests for identifying abuse, such as spam and account fraud.
  • Identify any patterns connecting measured sentiment to VA decisions.

Make data available

  • Pick a good visualization tool

Generalize

  • Identify scalable ways to apply this toolset to similar problems

Architecture

  1. Scrape/Parse: Scrapy for downloading comments
  2. Storage: JSONL
  3. Sentiment analysis: Claude Haiku
  4. Display: TBD

Scraper

Scrapy provides a simple mechanism for browsing and downloading the three page types we need:

  1. Forums listing page: `Forums.cfm` - lists all open forums with agency, reg title, action type, brief description, closing date, comment count
  2. Comment listing page: `comments.cfm?GDocForumID=X` or `comments.cfm?stageid=X` or `comments.cfm?petitionid=X` - lists comments with title, author, date
  3. Individual comment page: `viewcomments.cfm?commentid=X` - shows regulation title + brief description at the top, plus the comment
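The three page types above can be told apart from the URL alone. The following is a minimal sketch, using only the Python standard library, of how a scraper might route a URL to the right parser. The names `PageRef` and `classify_url` are hypothetical, and matching parameter names case-insensitively is an assumption about how the site treats them.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlparse


class PageRef(NamedTuple):
    page_type: str            # "forums", "comment_list", or "comment"
    id_param: Optional[str]   # lowercased query parameter carrying the ID
    id_value: Optional[str]   # raw ID value, kept as a string


# ID parameters listed for comments.cfm, lowercased for matching.
COMMENT_LIST_PARAMS = ("gdocforumid", "stageid", "petitionid")


def classify_url(url: str) -> Optional[PageRef]:
    """Map a URL to one of the three page types, or None if unrecognized."""
    parsed = urlparse(url)
    page = parsed.path.rsplit("/", 1)[-1].lower()
    # Lowercase parameter names so GDocForumID and gdocforumid both match
    # (assumption: the site does not distinguish them).
    query = {k.lower(): v[0] for k, v in parse_qs(parsed.query).items()}

    if page == "forums.cfm":
        return PageRef("forums", None, None)
    if page == "comments.cfm":
        for param in COMMENT_LIST_PARAMS:
            if param in query:
                return PageRef("comment_list", param, query[param])
        return None
    if page == "viewcomments.cfm" and "commentid" in query:
        return PageRef("comment", "commentid", query["commentid"])
    return None
```

In a Scrapy spider, a helper like this could decide which callback handles each followed link. Keeping it free of Scrapy imports makes it easy to test on its own.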

Storage

One JSONL file per forum/bill.
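As a rough illustration of the one-file-per-forum layout, the sketch below appends each scraped comment as one JSON object per line and reads a file back. The function names, the `forum_key` naming scheme, and the record fields are assumptions for illustration, not a settled schema.

```python
import json
from pathlib import Path


def append_comment(data_dir: Path, forum_key: str, comment: dict) -> Path:
    """Append one comment as a single JSON line to the forum's file."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / f"{forum_key}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        fh.write(json.dumps(comment, ensure_ascii=False) + "\n")
    return path


def read_comments(path: Path) -> list[dict]:
    """Load every comment from a JSONL file, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending line by line means a partial scrape still leaves a valid file, and the analyzer can read the stored data without importing any scraper code.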

Analysis

Google and Amazon both return generic sentiment (the tone of the writing: positive/negative), not stance (for/against the regulation). For example, "I strongly believe the government should NOT interfere" reads as negative in tone, but what we need to record is that the commenter is "against" the regulation. To give the model that context, we will pass the forum/bill title along with each comment and cache the full text of the proposed change, perhaps as a fallback.
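One way to ask for stance rather than tone is to put the forum/bill title in the prompt and constrain the answer to a single label. The sketch below is illustrative only: the prompt wording, the `unknown` fallback label, and the `ask_model` callable (a stand-in for whichever LLM client is chosen) are all assumptions.

```python
from typing import Callable

STANCES = ("for", "against", "neutral")


def build_prompt(forum_title: str, comment_text: str) -> str:
    """Ask for stance toward the regulation, not the tone of the writing."""
    return (
        "You are labeling public comments on a proposed regulation.\n"
        f"Regulation: {forum_title}\n"
        f"Comment: {comment_text}\n"
        "Is the commenter for or against the regulation, or neutral? "
        "Judge their position, not how positive or negative the wording is. "
        "Answer with exactly one word: for, against, or neutral."
    )


def parse_stance(reply: str) -> str:
    """Normalize a model reply to one of STANCES, or 'unknown' if it does not match."""
    word = reply.strip().lower().rstrip(".")
    return word if word in STANCES else "unknown"


def classify_stance(
    forum_title: str, comment_text: str, ask_model: Callable[[str], str]
) -> str:
    """Build the prompt, send it through the supplied model callable, parse the label."""
    return parse_stance(ask_model(build_prompt(forum_title, comment_text)))
```

Passing the model call in as a function keeps the prompt and parsing logic testable without network access, and makes it straightforward to swap models when comparing them.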

| Tool | Output | Context | Sarcasm | Context window | Cost/1k comments |
|---|---|---|---|---|---|
| Google NL API | -1→+1, magnitude | No/generic | Poorly | No | ~$12 |
| Amazon Comprehend | Pos/Neg/Neutral/Mixed | No/generic | Poorly | No | ~$0.10 |
| Claude Haiku | Prompted → for/against/neutral | Yes | Yes, with prompt | Yes | ~$0.10-0.30 |
| GPT-4o-mini | Prompted → same | Yes | Yes | Yes | ~$0.05-0.15 |
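To see what the per-1k figures imply at scale, the following sketch multiplies them out for a given number of comments. It reads the two-number cost entries for Claude Haiku and GPT-4o-mini as low-high ranges, which is an interpretation. The 50,000-comment volume in the usage line is a made-up figure, not a measured count.

```python
# Approximate cost per 1,000 comments in USD as (low, high), taken from the comparison table.
COST_PER_1K = {
    "Google NL API": (12.00, 12.00),
    "Amazon Comprehend": (0.10, 0.10),
    "Claude Haiku": (0.10, 0.30),
    "GPT-4o-mini": (0.05, 0.15),
}


def estimate_cost(tool: str, n_comments: int) -> tuple[float, float]:
    """Scale the per-1k cost range linearly to n_comments, rounded to cents."""
    low, high = COST_PER_1K[tool]
    scale = n_comments / 1000
    return (round(low * scale, 2), round(high * scale, 2))


# Hypothetical volume of 50,000 comments.
print(estimate_cost("Claude Haiku", 50_000))   # (5.0, 15.0)
print(estimate_cost("Google NL API", 50_000))  # (600.0, 600.0)
```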

Roadmap

  1. Scrape one forum
  2. Compare sentiment models
  3. Display
  4. Scrape all data
  5. Scale?