# Table of Contents

1. [Project Goals](#org863a759)
2. [Architecture](#orgcd91fd0)
    1. [Scraper](#org3256ad3)
    2. [Storage](#org7a9a92c)
    3. [Analysis](#org6ed72dc)
3. [Roadmap](#org416f14d)

<a id="org863a759"></a>

# Project Goals

1. Document and analyze sentiment of public comments on Virginia law, to determine:
    1. the utility of this forum as a mechanism for public comment, and
    2. the impact of this forum on Virginia regulation.
2. Make data and insights broadly available.
3. Generalize to other public comment tools.

<a id="orgcd91fd0"></a>

# Architecture

1. Scrape/Parse: **Scrapy** for downloading comments
2. Storage: JSON Lines (JSONL)
3. Sentiment analysis: Claude Haiku
4. Display: TBD

<a id="org3256ad3"></a>

## Scraper

Scrapy provides a simple mechanism for browsing and parsing the three page types the scraper needs:

1. Forums listing page: `Forums.cfm` - lists all open forums with agency, regulation title, action type, brief description, closing date, and comment count
2. Comment listing page: `comments.cfm?GDocForumID=X`, `comments.cfm?stageid=X`, or `comments.cfm?petitionid=X` - lists comments with title, author, and date
3. Individual comment page: `viewcomments.cfm?commentid=X` - shows the regulation title and brief description at the top, followed by the comment
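
The three page types above can be told apart from the URL alone, which is one way a spider callback might decide how to handle a response. The following is a minimal sketch using only the Python standard library. The function name `classify_page` and the returned labels are hypothetical, and the example assumes the file names and query keys appear exactly as listed above (matching is case-insensitive to be safe).

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Query-string keys that identify a comment listing, per the page list above.
LISTING_KEYS = ("GDocForumID", "stageid", "petitionid")


def classify_page(url: str) -> Tuple[str, Optional[str]]:
    """Return (page_type, identifier) for a forum URL.

    page_type is one of "forums", "comment_list", "comment", or "unknown".
    """
    parsed = urlparse(url)
    # Keep only the file name, so absolute and relative URLs behave the same.
    page = parsed.path.rsplit("/", 1)[-1].lower()
    # Lower-case the query keys so "commentid" and "CommentID" both match.
    query = {k.lower(): v[0] for k, v in parse_qs(parsed.query).items()}

    if page == "forums.cfm":
        return ("forums", None)
    if page == "comments.cfm":
        for key in LISTING_KEYS:
            if key.lower() in query:
                return ("comment_list", f"{key}={query[key.lower()]}")
        return ("unknown", None)
    if page == "viewcomments.cfm" and "commentid" in query:
        return ("comment", query["commentid"])
    return ("unknown", None)
```

For example, `classify_page("viewcomments.cfm?commentid=42")` returns `("comment", "42")`, and a listing URL keeps its key so that forum, stage, and petition identifiers are not confused with one another.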

<a id="org7a9a92c"></a>

## Storage

One JSONL file per forum/bill.
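
As a rough illustration of that layout, the sketch below appends each comment as one JSON object per line to a file named after its forum. The `Comment` fields, the helper names, and the file naming scheme are all assumptions made for this example, not a settled schema.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Comment:
    # Hypothetical record layout, based on the fields the scraper pages expose.
    comment_id: str
    forum_id: str
    title: str
    author: str
    date: str
    body: str


def forum_path(data_dir: Path, forum_id: str) -> Path:
    """One JSONL file per forum/bill, named after the forum identifier."""
    return data_dir / f"{forum_id}.jsonl"


def append_comment(data_dir: Path, comment: Comment) -> Path:
    """Append one comment as a single JSON line to its forum's file."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = forum_path(data_dir, comment.forum_id)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(comment), ensure_ascii=False) + "\n")
    return path


def read_comments(path: Path) -> Iterator[Comment]:
    """Yield the comments stored in one forum file, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield Comment(**json.loads(line))
```

Appending line by line means a crawl can be interrupted and resumed without rewriting a file, and each forum can be analyzed or re-scraped independently of the others.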

<a id="org6ed72dc"></a>

## Analysis

Google NL API and Amazon Comprehend both return generic sentiment (the tone of the writing: positive/negative), not stance (for/against the regulation). For example, "I strongly believe the government should NOT interfere" scores as negative in tone, but what we need to know is that the commenter is against the regulation. We will supply the forum/bill title as context for each comment and cache the entirety of the proposed change, perhaps as a fallback.
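
To make the tone-versus-stance distinction concrete, here is a minimal sketch of how a prompted model could be asked for stance, with the forum/bill title as context and the proposed change as optional extra context. The prompt wording, the function names, and the three-label scheme are assumptions for illustration. No model API is called here, so the sketch covers only building the prompt and reading back a one-word reply.

```python
from typing import Optional

STANCES = ("for", "against", "neutral")


def build_stance_prompt(forum_title: str, comment_text: str, proposal_text: str = "") -> str:
    """Assemble a prompt that asks for stance toward the regulation, not tone."""
    parts = [
        "You are classifying a public comment on a proposed regulation.",
        f"Regulation title: {forum_title}",
    ]
    # The full proposed change is optional context, included only when given.
    if proposal_text:
        parts.append(f"Proposed change: {proposal_text}")
    parts += [
        f"Comment: {comment_text}",
        "Is the commenter for or against the regulation itself, regardless of tone?",
        "Answer with exactly one word: for, against, or neutral.",
    ]
    return "\n\n".join(parts)


def parse_stance(reply: str) -> Optional[str]:
    """Map a model reply onto one of STANCES, or None if it does not fit."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!\"'") if words else ""
    return first if first in STANCES else None
```

Returning `None` for an unexpected reply, rather than guessing a label, keeps unparseable answers visible so they can be counted when the models are compared.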

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Tool</th>
<th scope="col" class="org-left">Output</th>
<th scope="col" class="org-left">Context</th>
<th scope="col" class="org-left">Sarcasm</th>
<th scope="col" class="org-left">Context window</th>
<th scope="col" class="org-left">Cost/1k comments</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">Google NL API</td>
<td class="org-left">-1→+1, magnitude</td>
<td class="org-left">No/generic</td>
<td class="org-left">Poorly</td>
<td class="org-left">No</td>
<td class="org-left">~$1–2</td>
</tr>
<tr>
<td class="org-left">Amazon Comprehend</td>
<td class="org-left">Pos/Neg/Neutral/Mixed</td>
<td class="org-left">No/generic</td>
<td class="org-left">Poorly</td>
<td class="org-left">No</td>
<td class="org-left">~$0.10</td>
</tr>
<tr>
<td class="org-left">Claude Haiku</td>
<td class="org-left">Prompted → for/against/neutral</td>
<td class="org-left">Yes</td>
<td class="org-left">Yes, with prompt</td>
<td class="org-left">Yes</td>
<td class="org-left">~$0.10–0.30</td>
</tr>
<tr>
<td class="org-left">GPT-4o-mini</td>
<td class="org-left">Prompted → same</td>
<td class="org-left">Yes</td>
<td class="org-left">Yes</td>
<td class="org-left">Yes</td>
<td class="org-left">~$0.05–0.15</td>
</tr>
</tbody>
</table>

<a id="org416f14d"></a>

# Roadmap

1. Scrape one forum
2. Compare sentiment models
3. Display
4. Scrape all data
5. Scale?