update readme

2026-05-05 11:38:57 -04:00
parent cd3543bd0f
commit 02964312cb

@@ -1,16 +1,19 @@
# Table of Contents # Table of Contents
1. [Project Goals](#org863a759) 1. [Project Goals](#org5acb669)
2. [Architecture](#orgcd91fd0) 1. [Document and analyze sentiment](#org9291576)
1. [Scraper](#org3256ad3) 2. [Make data available](#org8054421)
2. [Storage](#org7a9a92c) 3. [Generalize](#orgdda4b6f)
3. [Analysis](#org6ed72dc) 2. [Architecture](#org1d6bc40)
3. [Roadmap](#org416f14d) 1. [Scraper](#org4298028)
2. [Storage](#org1cd413c)
3. [Analysis](#orgaea450e)
3. [Roadmap](#org6b7660d)
<a id="org863a759"></a> <a id="org5acb669"></a>
# Project Goals # Project Goals
@@ -21,7 +24,30 @@
3. Generalize to other public comment tools. 3. Generalize to other public comment tools.
<a id="orgcd91fd0"></a> <a id="org9291576"></a>
## Document and analyze sentiment
- Scrape the data, parse, clean, and store. Clearly separate scraper from sentiment analyzer for maximum auditability.
- Build tests for identifying abuse, such as spam and account fraud.
- Identify any patterns connecting measured sentiment to VA decisions.
<a id="org8054421"></a>
## Make data available
- Pick a good visualization tool
<a id="orgdda4b6f"></a>
## Generalize
- Identify scalable ways to apply this toolset to similar problems
<a id="org1d6bc40"></a>
# Architecture # Architecture
@@ -31,7 +57,7 @@
4. Display: TBD 4. Display: TBD
<a id="org3256ad3"></a> <a id="org4298028"></a>
## Scraper ## Scraper
@@ -42,14 +68,14 @@ Scrapy provides a simple mechanism for browsing and
3. Individual comment page: \`viewcomments.cfm?commentid=X\` - shows regulation title + brief description at the top, plus the comment 3. Individual comment page: \`viewcomments.cfm?commentid=X\` - shows regulation title + brief description at the top, plus the comment
<a id="org7a9a92c"></a> <a id="org1cd413c"></a>
## Storage ## Storage
One JSONL file per forum/bill. One JSONL file per forum/bill.
<a id="org6ed72dc"></a> <a id="orgaea450e"></a>
## Analysis ## Analysis
@@ -121,7 +147,7 @@ Google and Amazon both return generic sentiment (tone of writing: positive/negat
</table> </table>
<a id="org416f14d"></a> <a id="org6b7660d"></a>
# Roadmap # Roadmap