Compare commits

2 Commits

Author SHA1 Message Date
8f1d9e7723 added forum metadata for later use 2026-05-09 00:36:30 -04:00
181477bce7 streamlit > local docker 2026-05-09 00:25:27 -04:00
4 changed files with 21 additions and 18 deletions

@@ -110,32 +110,30 @@ We selected gpt-5.4-mini for a good balance of quality, cost, and time.
## Instructions
1. Clone repo and install dependencies:
`python -m pip install -r requirements.txt`
2. Scrape the forum based on the ID in the URL.
`scrapy crawl forum -a forum_id=<forum_id> -s LOG_LEVEL=WARNING 2>&1`
3. Run model report.
1. Scrape the forum.
`python`
2. Run model report.
`python analysis/tokenizer.py <input> --prompt <prompt>`
4. To run a realtime subset:
3. To run a realtime subset:
`python analysis/openai_realtime.py <input> --prompt <prompt> --model <model> --limit <N comments>`
`python analysis/openai_realtime.py output/f452.jsonl --prompt prompt-1.txt --model gpt-4o-mini --limit 10`
5. To create and run the whole thing in batches, first create the batch jobs from the report:
4. To create and run the whole thing in batches, first create the batch jobs from the report:
`python analysis/openai_batch.py create <report> --model <model>`
`python analysis/openai_batch.py create ./reports/f452-1.json --model gpt-5.4-mini`
6. Then, run the jobs sequentially. Don't submit more than one at a time, if the model fills up the batch will fail and resubmission is not implemented.
`python analysis/openai_batch.py</sub> submit`
`python analysis/openai_batch.py</sub> status`
`python analysis/openai_batch.py</sub> download`
`python analysis/openai_batch.py</sub> submit`
5. Then, run the jobs sequentially. Don't submit more than one at a time, if the model fills up the batch will fail and resubmission is not implemented.
`python analysis/openai<sub>batch.py</sub> submit`
`python analysis/openai<sub>batch.py</sub> status`
`python analysis/openai<sub>batch.py</sub> download`
`python analysis/openai<sub>batch.py</sub> submit`
<a id="org5739d49"></a>
# Roadmap
1. /Done/ Scrape one forum, check sentiment, display
2. Test different models
3. Build batch runner
1. Scrape one forum
2. Compare sentiment models
3. Display
4. Scrape all data
5. Scale?

@@ -354,8 +354,9 @@ data pulls entirely from the job; goal is to point viz/streamlit.py at any job/
- tests: from root dir, `streamlit run viz/streamlit.py <job-dir>`
- datetime: [2026-05-08 Fri 23:44]
* [ ] t1.6 host streamlit via dockerfile
* +[ ] t1.6 host streamlit via dockerfile+
planning to deploy manually, get cert, etc etc. probably dont care about https?
+using streamlit.app instead+
** acceptance criteria
1. write dockerfile with slim image

@@ -5,6 +5,8 @@ class ForumItem(scrapy.Item):
forum_id = scrapy.Field()
reg_title = scrapy.Field()
reg_desc = scrapy.Field()
scraped_at = scrapy.Field()
forum_url = scrapy.Field()
class CommentItem(scrapy.Item):

@@ -63,6 +63,8 @@ class ForumSpider(scrapy.Spider):
forum_id=self.forum_id,
reg_title=reg_title,
reg_desc=reg_desc,
scraped_at=datetime.utcnow().isoformat(),
forum_url=_view_url(self.forum_id),
)
for page in range(2, last_page + 1):
yield scrapy.FormRequest(
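
Review note on the `scraped_at` line in the hunk above: `datetime.utcnow()` returns a naive datetime with no UTC offset in the resulting string, and it is deprecated as of Python 3.12. A minimal sketch of a timezone-aware alternative follows. The helper name `utc_timestamp` is hypothetical and not part of this repo, and whether downstream consumers expect an offset suffix is an assumption to check before adopting it.

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Return the current UTC time as a timezone-aware ISO 8601 string.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # datetime.now(timezone.utc) yields an aware datetime, so isoformat()
    # includes an explicit "+00:00" offset in the output.
    return datetime.now(timezone.utc).isoformat()


stamp = utc_timestamp()
# Round-trip to confirm the offset survives parsing.
parsed = datetime.fromisoformat(stamp)
```

In the spider this would replace the `datetime.utcnow().isoformat()` expression; the stored string then carries its own offset, so readers of the scraped metadata do not need to assume UTC.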