Compare commits

2 Commits

Author SHA1 Message Date
c8017c908d updated 1.2 for gpt4o mini 2026-05-05 14:10:26 -04:00
dfc3faffc3 added reqts 2026-05-05 14:10:14 -04:00
2 changed files with 21 additions and 6 deletions

@@ -27,17 +27,32 @@ Comments are hydrated in backend via js-cued button (AJAX?).
- Date parsing: _parse_date() normalizes whitespace, upper-cases, parses "%m/%d/%y %I:%M %p" → ISO 8601; falls back to raw string on failure.
** evidence
- commit: beb5cf4 (AC1-2), <commit> (AC3-6)
- commit: beb5cf4 (AC1-2), e7df0b2 (AC3-6)
- tests: 8 passing (`python -m pytest tests -q`) or (`python -m pytest tests/`)
- `scrapy crawl forum -a forum_id=452 -s LOG_LEVEL=WARNING 2>&1`
- retrieved 9083 comments
- datetime: 2026-05-05
* [ ] t1.2: initial analysis pipeline
Write a simple pipeline for both - prefer non-concurrent/async from scraping run. Should be run manually, separate from scraper. You may use scrapy, but are not required to.
* [ ] t1.2: initial 4o sentiment
Write a simple manual pipeline for gpt-4o that reads one scraped forum jsonl file and produces a separate analyzed jsonl file. This step must not mutate scraper output. Analysis should classify each comment for regulatory stance, generic tone/sentiment, confidence, and enough rationale/evidence to support later dashboard drilldown.
Should be run manually, separate from scraper. You may use scrapy, but are not required to.
- Sentiment is derived, not scraped - keep separate from raw comments.
- keep jsonl as interchange/audit format
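
One way to keep derived output separate from raw comments, sketched under assumptions: `read_jsonl` and `write_jsonl` are hypothetical helper names, not functions from this repo. The input is opened read-only, and the output is created in exclusive mode so an earlier analysis run is never overwritten.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line; the file is opened read-only."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def write_jsonl(records: Iterable[dict], out_path: Path) -> int:
    """Write records to a new file and return how many were written."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    # Mode "x" raises FileExistsError instead of overwriting an existing file.
    with out_path.open("x", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Because the writer only ever creates new files, pointing it at the scraper's output path by mistake fails loudly rather than mutating the raw data.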
** acceptance criteria
1. run manual sentiment analysis of selected file against haiku
2. run manual sentiment analysis of selected file against gpt-4o
1. input scraped jsonl doc by filename/path, e.g. "./output/forum452_comments_<datetime>.jsonl"
- handle mixed item types, e.g., forum + comment items
2. output new analysis file, e.g., "analysis/forum452_<datetime>_<model>_<datetime>.jsonl"
- one analysis record per comment
- include run_id, forum_id, comment_id, analyzed_at, model, prompt_version
3. capture stance toward proposed regulation/guidance:
- `stance`: support, oppose, neutral, unknown
- `confidence`: 0-1
- short rationale, if provided by model
4. capture generic sentiment/tone separately from stance: `tone`=positive, negative, neutral, mixed, unclear
5. capture issue/topic tags for later grouping, may be empty
6. use .env for api key management
7. document the exact prompt version used; prompt text may live in code or docs, but must have a version string/hash in output records
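
A hedged sketch of how criteria 1-5 and 7 could shape one analysis record. The model call and `.env` loading are left out. The `item_type` discriminator field, the `topics` and `rationale` key names, the input keys on the comment item, and `PROMPT_TEXT` are all assumptions for illustration; only the output field list and the allowed stance/tone values come from the criteria above.

```python
import hashlib
from datetime import datetime, timezone

STANCES = {"support", "oppose", "neutral", "unknown"}
TONES = {"positive", "negative", "neutral", "mixed", "unclear"}

# Hypothetical prompt text; the version string embeds a short hash of it.
PROMPT_TEXT = "Classify the comment's stance, tone, confidence, and topics."
PROMPT_VERSION = "v1-" + hashlib.sha256(PROMPT_TEXT.encode("utf-8")).hexdigest()[:12]


def is_comment(item: dict, type_field: str = "item_type") -> bool:
    """True for comment items; `item_type` is an assumed field name."""
    return item.get(type_field) == "comment"


def build_record(comment: dict, result: dict, run_id: str, model: str) -> dict:
    """Combine a comment and a parsed model result into one analysis record."""
    stance = result.get("stance")
    tone = result.get("tone")
    try:
        confidence = float(result.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    return {
        "run_id": run_id,
        "forum_id": comment.get("forum_id"),
        "comment_id": comment.get("comment_id"),
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt_version": PROMPT_VERSION,
        # Values outside the allowed sets fall back to the catch-all label.
        "stance": stance if isinstance(stance, str) and stance in STANCES else "unknown",
        "confidence": min(1.0, max(0.0, confidence)),  # clamp to 0-1
        "rationale": result.get("rationale"),  # may be None if the model gave none
        "tone": tone if isinstance(tone, str) and tone in TONES else "unclear",
        "topics": list(result.get("topics") or []),  # may be empty
    }
```

Hashing the prompt text means any edit to the prompt changes `prompt_version` in every output record, which may make later audits of mixed runs easier.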
** notes

requirements.txt (binary)

Binary file not shown.