From 7593754866c58390c92a943fbe702e385eebe28f Mon Sep 17 00:00:00 2001
From: ben
Date: Thu, 7 May 2026 21:42:08 -0400
Subject: [PATCH] Update README.md fixed display

---
 README.md | 76 +++++++++++++++++++++++++------------------------------
 1 file changed, 35 insertions(+), 41 deletions(-)

diff --git a/README.md b/README.md
index 1baab11..773a8d2 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,3 @@
-
 # Table of Contents
 1. [Project Goals](#org2da6874)
@@ -56,9 +55,9 @@ Scrapy provides a simple mechanism for retrieving, parsing, and saving content form the forums.
-1. Forums listing page: \`Forums.cfm\` - lists all open forums with agency, reg title, action type, brief description, closing date, comment count
-2. Comment listing page: \`comments.cfm?GDocForumID=X\` or \`comments.cfm?stageid=X\` or \`comments.cfm?petitionid=X\` - lists comments with title, author, date
-3. Individual comment page: \`viewcomments.cfm?commentid=X\` - shows regulation title + brief description at the top, plus the comment
+1. Forums listing page: `Forums.cfm` lists all open forums with agency, reg title, action type, brief description, closing date, comment count
+2. Comment listing page: `comments.cfm?GDocForumID=X` or `comments.cfm?stageid=X` or `comments.cfm?petitionid=X` lists comments with title, author, date
+3. Individual comment page: `viewcomments.cfm?commentid=X` shows regulation title + brief description at the top, plus the comment
@@ -74,14 +73,12 @@ Then, the batch processing scripts uses the \`report.json\` to create multiple j
 We selected gpt-5.4-mini for a good balance of quality, cost, and time.
 1. Prompt
-
- \`\`\`
+ ```
 You are an expert policy analyst classifying public comments submitted to the Virginia Town Hall regulatory comment system.
 You will be given the text of a proposed regulation and a single public comment.
 Return ONLY a JSON object — no other text.
 Definitions:
-
 - stance: the commenter's position on whether the regulation should be adopted.
 "support" = wants it approved (as-is or with changes);
 "oppose" = wants it rejected or substantially weakened;
@@ -93,57 +90,54 @@ We selected gpt-5.4-mini for a good balance of quality, cost, and time.
 "neutral" = matter-of-fact, procedural, or informational;
 "mixed" = contains both positive and negative emotional content;
 "unclear" = tone cannot be determined (e.g., a one-word comment).
- - stanceconfidence: float 0.0-1.0, your confidence in the stance label.
- - stancerationale: 1-3 sentences explaining the key evidence; quote specific phrases where possible.
+ - stance_confidence: float 0.0-1.0, your confidence in the stance label.
+ - stance_rationale: 1-3 sentences explaining the key evidence; quote specific phrases where possible.
 - tags: up to 5 short topic labels relevant to the comment's specific concerns (e.g. "parental rights", "student safety", "privacy", "religious freedom", "LGBTQ+ inclusion", "bullying prevention", "school sports", "bathroom access"). Empty array if none apply.
- Return exactly these keys: stance, stanceconfidence, stancerationale, tone, tags.
- \`\`\`
+ Return exactly these keys: stance, stance_confidence, stance_rationale, tone, tags.
+ ```
 ### Storage
-- Each scraped forum is saved to \`output/.jsonl\`
-- Each report (forum + prompt) is saves to \`reports/.json\`
-- Each job is saved to \`analysis/jobs//:
- └─\`forum.jsonl\` is a copy of the scraped forum for convenience
- └─\`prompt.txt\` is a copy of the prompt used
- └─\`report.json\` is a copy of the report used
- └─\`status.json\` contains metadata about the job
+- Each scraped forum is saved to `output/.jsonl`
+- Each report (forum + prompt) is saved to `reports/.json`
+- Each job is saved to `analysis/jobs/`:
+ └─`forum.jsonl` is a copy of the scraped forum for convenience
+ └─`prompt.txt` is a copy of the prompt used
+ └─`report.json` is a copy of the report used
+ └─`status.json` contains metadata about the job
 For each batch in the job, four files are created:
- └─\`jobN-input.jsonl\` contains the exact queries sent to the API, for troubleshooting
- └─\`jobN-output-raw.jsonl\` contains the exact response from the API
- └─\`jobN-output.jsonl\` contains the exact response from the API
- └─\`jobN-output-errors.jsonl\` when errors are returned (this file may not exist)
-- Once complete, the cleanup script saves \`review.csv\`, \`review.pqt\`, and \`review.sqlite\` in this folder.
+ └─`jobN-input.jsonl` contains the exact queries sent to the API, for troubleshooting
+ └─`jobN-output-raw.jsonl` contains the exact response from the API
+ └─`jobN-output.jsonl` contains the exact response from the API
+ └─`jobN-output-errors.jsonl` when errors are returned (this file may not exist)
+- Once complete, the cleanup script saves `review.csv`, `review.pqt`, and `review.sqlite` in this folder.
 ## Instructions
-1. Scrape the forum.
- \`python
-2. Run model report.
- \`python analysis/tokenizer.py –prompt \`
-3. To run a realtime subset:
- \`python analysis/openairealtime.py –prompt –model –limit \`
- \`python analysis/openairealtime.py output/f452.jsonl –prompt prompt-1.txt –model gpt-4o-mini –limit 10\`
-4. To create and run the whole thing in batches, first create the batch jobs from the report:
- \`python analysis/openaibatch.py create –model \`
- \`python analysis/openaibatch.py create ./reports/f452-1.json –model gpt-5.4-mini\`
-5. Then, run the jobs sequentially. Don't submit more than one at a time, if the model fills up the batch will fail and resubmission is not implemented.
- \`python analysis/openaibatch.py submit\`
-
- \`python analysis/openaibatch.py status\`
-
- \`python analysis/openaibatch.py download\`
-
- \`python analysis/openaibatch.py submit\`
+1. Scrape the forum.
+ `python`
+2. Run model report.
+ `python analysis/tokenizer.py --prompt `
+3. To run a realtime subset:
+ `python analysis/openai_realtime.py --prompt --model --limit `
+ `python analysis/openai_realtime.py output/f452.jsonl --prompt prompt-1.txt --model gpt-4o-mini --limit 10`
+4. To create and run the whole thing in batches, first create the batch jobs from the report:
+ `python analysis/openai_batch.py create --model `
+ `python analysis/openai_batch.py create ./reports/f452-1.json --model gpt-5.4-mini`
+5. Then, run the jobs sequentially. Don't submit more than one at a time: if the model's batch queue fills up, the batch will fail, and resubmission is not implemented.
+ `python analysis/openai_batch.py submit`
+ `python analysis/openai_batch.py status`
+ `python analysis/openai_batch.py download`
+ `python analysis/openai_batch.py submit`
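The prompt in this patch asks the model to return exactly five keys: `stance`, `stance_confidence`, `stance_rationale`, `tone`, and `tags`. Below is a minimal sketch of a structural check on one model response. `validate_classification` is a hypothetical helper, not one of the repository's scripts. Because this patch does not show the complete lists of allowed stance and tone labels, the sketch checks only key names, value types, the 0.0-1.0 confidence range, and the five-tag limit.

```python
import json

# Keys the prompt requires ("Return exactly these keys").
EXPECTED_KEYS = {"stance", "stance_confidence", "stance_rationale", "tone", "tags"}
MAX_TAGS = 5  # "tags: up to 5 short topic labels"


def validate_classification(raw: str) -> list[str]:
    """Return the problems found in one model response; an empty list means it passed.

    Only structure and types are checked: the full sets of allowed stance and
    tone labels are not shown in this patch, so label values are not validated.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems: list[str] = []
    keys = set(obj)
    if keys != EXPECTED_KEYS:
        missing = sorted(EXPECTED_KEYS - keys)
        extra = sorted(keys - EXPECTED_KEYS)
        problems.append(f"key mismatch: missing={missing}, extra={extra}")

    for key in ("stance", "stance_rationale", "tone"):
        if key in obj and not isinstance(obj[key], str):
            problems.append(f"{key} is not a string")

    if "stance_confidence" in obj:
        conf = obj["stance_confidence"]
        # bool is a subclass of int, so reject it explicitly; NaN fails the range test.
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not (is_number and 0.0 <= conf <= 1.0):
            problems.append("stance_confidence is not a number between 0.0 and 1.0")

    if "tags" in obj:
        tags = obj["tags"]
        if not (isinstance(tags, list) and len(tags) <= MAX_TAGS
                and all(isinstance(t, str) for t in tags)):
            problems.append(f"tags is not a list of at most {MAX_TAGS} strings")

    return problems
```

Returning a list of problems instead of raising lets a caller tally failures across a whole batch rather than stopping at the first malformed response.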
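The Storage section of this patch lists the files each job folder should contain, so a job folder can be checked against it. The sketch below is hypothetical: `check_job_folder` is not one of the repository's scripts. It assumes batch files are named `job1-...`, `job2-...`, and so on, meaning that `N` in `jobN` is a plain integer. It reports missing per-job and per-batch files and treats `jobN-output-errors.jsonl` as optional, because the README says that file may not exist.

```python
import re
from pathlib import Path

# Per-job files listed in the README's Storage section.
JOB_FILES = ["forum.jsonl", "prompt.txt", "report.json", "status.json"]
# Per-batch files that should exist once a batch has been downloaded.
BATCH_SUFFIXES = ["-input.jsonl", "-output-raw.jsonl", "-output.jsonl"]
# Written only when the API returns errors, so its absence is not a problem.
ERRORS_SUFFIX = "-output-errors.jsonl"
# Assumption: batch prefixes are "job" followed by an integer (job1, job2, ...).
BATCH_INPUT = re.compile(r"(job\d+)-input\.jsonl")


def check_job_folder(job_dir: str) -> dict:
    """Report which expected files are missing from one analysis/jobs/ folder."""
    root = Path(job_dir)
    missing = [name for name in JOB_FILES if not (root / name).is_file()]

    # Discover batches from their input files, then sort numerically (job2 before job10).
    batches = []
    for path in root.iterdir():
        match = BATCH_INPUT.fullmatch(path.name)
        if match:
            batches.append(match.group(1))
    batches.sort(key=lambda name: int(name[len("job"):]))

    for batch in batches:
        for suffix in BATCH_SUFFIXES:
            if not (root / f"{batch}{suffix}").is_file():
                missing.append(f"{batch}{suffix}")

    with_errors = [b for b in batches if (root / f"{b}{ERRORS_SUFFIX}").is_file()]
    return {"batches": batches, "missing": missing, "batches_with_errors": with_errors}
```

Call it with the path of one folder under `analysis/jobs/`. The README does not spell out that folder's name, so substitute whichever folder the batch script created. Batches that have been submitted but not yet downloaded will show their output files as missing.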