initial commit
README.md
@@ -0,0 +1,133 @@
# Table of Contents

1. [Project Goals](#org863a759)
2. [Architecture](#orgcd91fd0)
    1. [Scraper](#org3256ad3)
    2. [Storage](#org7a9a92c)
    3. [Analysis](#org6ed72dc)
3. [Roadmap](#org416f14d)

<a id="org863a759"></a>

# Project Goals

1. Document and analyze sentiment of public comments on Virginia law, to determine:
    1. the utility of this forum as a mechanism for public comment, and
    2. the impact of this forum on Virginia regulation.
2. Make data and insights broadly available.
3. Generalize to other public comment tools.

<a id="orgcd91fd0"></a>

# Architecture

1. Scrape/Parse: **Scrapy** for downloading comments
2. Storage: JSONL
3. Sentiment analysis: Claude Haiku
4. Display: TBD

<a id="org3256ad3"></a>

## Scraper

Scrapy provides a simple mechanism for browsing and parsing the three kinds of pages involved:

1. Forums listing page: `Forums.cfm` - lists all open forums with agency, regulation title, action type, brief description, closing date, and comment count
2. Comment listing page: `comments.cfm?GDocForumID=X`, `comments.cfm?stageid=X`, or `comments.cfm?petitionid=X` - lists comments with title, author, and date
3. Individual comment page: `viewcomments.cfm?commentid=X` - shows the regulation title and brief description at the top, followed by the comment

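The three page types above can be told apart from the URL alone. The sketch below is one possible way to route a URL to a page type before handing it to a parser. It uses only the standard library rather than Scrapy itself, and the function name `classify_url` and the labels `forum_list`, `comment_list`, and `comment` are illustrative assumptions, not names from the project.

```python
from urllib.parse import urlparse, parse_qs

# Query parameters that identify a comment listing (taken from the list above).
LISTING_KEYS = ("GDocForumID", "stageid", "petitionid")


def classify_url(url):
    """Return (page_type, id_key, id_value) for a URL, or None if unrecognized.

    The page_type labels are hypothetical names chosen for this sketch.
    """
    parsed = urlparse(url)
    page = parsed.path.rsplit("/", 1)[-1].lower()
    # Lower-case parameter names so matching is case-insensitive.
    query = {k.lower(): v[0] for k, v in parse_qs(parsed.query).items()}

    if page == "forums.cfm":
        return ("forum_list", None, None)
    if page == "comments.cfm":
        for key in LISTING_KEYS:
            if key.lower() in query:
                return ("comment_list", key, query[key.lower()])
    if page == "viewcomments.cfm" and "commentid" in query:
        return ("comment", "commentid", query["commentid"])
    return None
```

In a Scrapy spider, a function like this could decide which callback handles each followed link; the host name is deliberately left out because only the page name and query string matter here.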
<a id="org7a9a92c"></a>

## Storage

One JSONL file per forum/bill.

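As a minimal sketch of the one-file-per-forum layout, the helpers below append each comment as one JSON object per line and read the file back. The directory layout, the `<forum_id>.jsonl` file naming, and the record fields used in the usage example are assumptions made for illustration; the actual schema is not specified here.

```python
import json
from pathlib import Path


def append_comment(data_dir, forum_id, comment):
    """Append one comment dict as a single JSON line to the forum's file."""
    path = Path(data_dir) / f"{forum_id}.jsonl"  # hypothetical naming scheme
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(comment, ensure_ascii=False) + "\n")
    return path


def read_comments(path):
    """Yield each comment dict from a JSONL file, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

For example, `append_comment("data", "stage-42", {"commentid": "99", "title": "Example"})` would create `data/stage-42.jsonl` if needed and add one line to it. Appending keeps a partially completed scrape usable, since every line is a complete record.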
<a id="org6ed72dc"></a>

## Analysis

Google and Amazon both return generic sentiment (the tone of the writing: positive/negative), not stance (for/against the regulation). For example, "I strongly believe the government should NOT interfere" is negative in tone, but the label we need is its stance: "against" the regulation. We will run the analysis with the forum/bill title as context and cache the entirety of the proposed change, perhaps as a fallback.

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Tool</th>
<th scope="col" class="org-left">Output</th>
<th scope="col" class="org-left">Context</th>
<th scope="col" class="org-left">Handles sarcasm</th>
<th scope="col" class="org-left">Context window</th>
<th scope="col" class="org-left">Cost per 1k comments (USD)</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">Google NL API</td>
<td class="org-left">-1→+1, magnitude</td>
<td class="org-left">No/generic</td>
<td class="org-left">Poorly</td>
<td class="org-left">No</td>
<td class="org-left">~$1–2</td>
</tr>
<tr>
<td class="org-left">Amazon Comprehend</td>
<td class="org-left">Pos/Neg/Neutral/Mixed</td>
<td class="org-left">No/generic</td>
<td class="org-left">Poorly</td>
<td class="org-left">No</td>
<td class="org-left">~$0.10</td>
</tr>
<tr>
<td class="org-left">Claude Haiku</td>
<td class="org-left">Prompted → for/against/neutral</td>
<td class="org-left">Yes</td>
<td class="org-left">Yes, with prompt</td>
<td class="org-left">Yes</td>
<td class="org-left">~$0.10–0.30</td>
</tr>
<tr>
<td class="org-left">GPT-4o-mini</td>
<td class="org-left">Prompted → same</td>
<td class="org-left">Yes</td>
<td class="org-left">Yes</td>
<td class="org-left">Yes</td>
<td class="org-left">~$0.05–0.15</td>
</tr>
</tbody>
</table>

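To make the tone-versus-stance distinction concrete, here is a rough sketch of how a prompted model could be asked for a stance and how its reply could be normalized to for/against/neutral. The prompt wording and the function names `build_stance_prompt` and `parse_stance` are assumptions for illustration; no model API is called, so the sketch applies equally to any of the prompted tools in the table.

```python
STANCES = ("for", "against", "neutral")


def build_stance_prompt(regulation_title, comment_text):
    """Build a prompt that asks for the commenter's stance, not the tone."""
    return (
        "You are classifying a public comment on a proposed regulation.\n"
        f"Regulation: {regulation_title}\n"
        f"Comment: {comment_text}\n"
        "Does the commenter support or oppose the regulation? "
        "Judge the position taken, not the tone of the writing. "
        "Answer with exactly one word: for, against, or neutral."
    )


def parse_stance(reply):
    """Map a model reply to a stance label, or None if it is not recognized."""
    words = reply.strip().lower().split()
    if not words:
        return None
    # Only the first word counts; drop surrounding punctuation such as "Against."
    first = words[0].strip(".,:;!\"'")
    return first if first in STANCES else None
```

Returning `None` for an unrecognized reply, rather than guessing, leaves room to retry with more context, such as the cached text of the proposed change.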
<a id="org416f14d"></a>

# Roadmap

1. Scrape one forum
2. Compare sentiment models
3. Display
4. Scrape all data
5. Scale?