adding image

2026-05-07 18:00:51 -04:00
parent bdab3c5e21
commit eaaefb66f2
2 changed files with 20 additions and 16 deletions


@@ -1,18 +1,18 @@
# Table of Contents # Table of Contents
1. [Project Goals](#orgf37a106) 1. [Project Goals](#org2da6874)
1. [Research questions](#orgec50d46) 1. [Research questions](#org1a2b8b3)
2. [Architecture](#org7a5389e) 2. [Architecture](#orgfabfcd9)
1. [Scraper](#org7771df2) 1. [Scraper](#org2c5c7a2)
2. [Analysis](#org16a9e36) 2. [Analysis](#org72990f4)
3. [Storage](#org7341391) 3. [Storage](#org58a5b72)
3. [Instructions](#org692b2f6) 3. [Instructions](#org24fe465)
1. [Roadmap](#org9f21934) 1. [Roadmap](#org5739d49)
<a id="orgf37a106"></a> <a id="org2da6874"></a>
## Project Goals ## Project Goals
@@ -23,7 +23,7 @@
3. Generalize to other public comment tools. 3. Generalize to other public comment tools.
<a id="orgec50d46"></a> <a id="org1a2b8b3"></a>
### Research questions ### Research questions
@@ -38,7 +38,7 @@
(I anticipate this will not be measurable from currently available data) (I anticipate this will not be measurable from currently available data)
<a id="org7a5389e"></a> <a id="orgfabfcd9"></a>
## Architecture ## Architecture
@@ -47,8 +47,10 @@
3. Display: streamlit 3. Display: streamlit
4. Storage: jsonl, csv, parquet 4. Storage: jsonl, csv, parquet
![img](//pipeline-v1.2.3.svg)
<a id="org7771df2"></a> <a id="org2c5c7a2"></a>
### Scraper ### Scraper
@@ -59,7 +61,7 @@ Scrapy provides a simple mechanism for retrieving, parsing, and saving content f
3. Individual comment page: \`viewcomments.cfm?commentid=X\` - shows regulation title + brief description at the top, plus the comment 3. Individual comment page: \`viewcomments.cfm?commentid=X\` - shows regulation title + brief description at the top, plus the comment
<a id="org16a9e36"></a> <a id="org72990f4"></a>
### Analysis ### Analysis
@@ -101,7 +103,7 @@ We selected gpt-5.4-mini for a good balance of quality, cost, and time.
\`\`\` \`\`\`
<a id="org7341391"></a> <a id="org58a5b72"></a>
### Storage ### Storage
@@ -120,7 +122,7 @@ We selected gpt-5.4-mini for a good balance of quality, cost, and time.
- Once complete, the cleanup script saves \`review.csv\`, \`review.pqt\`, and \`review.sqlite\` in this folder. - Once complete, the cleanup script saves \`review.csv\`, \`review.pqt\`, and \`review.sqlite\` in this folder.
<a id="org692b2f6"></a> <a id="org24fe465"></a>
## Instructions ## Instructions
@@ -144,7 +146,7 @@ We selected gpt-5.4-mini for a good balance of quality, cost, and time.
\`python analysis/openai_batch.py submit\` \`python analysis/openai_batch.py submit\`
<a id="org9f21934"></a> <a id="org5739d49"></a>
# Roadmap # Roadmap


@@ -26,6 +26,8 @@
3. Display: streamlit 3. Display: streamlit
4. Storage: jsonl, csv, parquet 4. Storage: jsonl, csv, parquet
[[file://./pipeline-v1.2.3.svg]]
*** Scraper *** Scraper
Scrapy provides a simple mechanism for retrieving, parsing, and saving content from the forums. Scrapy provides a simple mechanism for retrieving, parsing, and saving content from the forums.
1. Forums listing page: `Forums.cfm` - lists all open forums with agency, reg title, action type, brief description, closing date, comment count 1. Forums listing page: `Forums.cfm` - lists all open forums with agency, reg title, action type, brief description, closing date, comment count