Compare commits

25a17cb691...master (2 commits)
| Author | SHA1 | Date |
|---|---|---|
| | 8f1d9e7723 | |
| | 181477bce7 | |
32
README.md
@@ -110,32 +110,30 @@ We selected gpt-5.4-mini for a good balance of quality, cost, and time.
| Old | New |
|---|---|
| ## Instructions | ## Instructions |
| 1. Clone repo and install dependencies: | 1. Scrape the forum. |
| `python -m pip install -r requirements.txt` | `python` |
| 2. Scrape the forum based on the ID in the URL. | 2. Run model report. |
| `scrapy crawl forum -a forum_id=<forum_id> -s LOG_LEVEL=WARNING 2>&1` | |
| 3. Run model report. | |
| `python analysis/tokenizer.py <input> --prompt <prompt>` | `python analysis/tokenizer.py <input> --prompt <prompt>` |
| 4. To run a realtime subset: | 3. To run a realtime subset: |
| `python analysis/openai_realtime.py <input> --prompt <prompt> --model <model> --limit <N comments>` | `python analysis/openai_realtime.py <input> --prompt <prompt> --model <model> --limit <N comments>` |
| `python analysis/openai_realtime.py output/f452.jsonl --prompt prompt-1.txt --model gpt-4o-mini --limit 10` | `python analysis/openai_realtime.py output/f452.jsonl --prompt prompt-1.txt --model gpt-4o-mini --limit 10` |
| 5. To create and run the whole thing in batches, first create the batch jobs from the report: | 4. To create and run the whole thing in batches, first create the batch jobs from the report: |
| `python analysis/openai_batch.py create <report> --model <model>` | `python analysis/openai_batch.py create <report> --model <model>` |
| `python analysis/openai_batch.py create ./reports/f452-1.json --model gpt-5.4-mini` | `python analysis/openai_batch.py create ./reports/f452-1.json --model gpt-5.4-mini` |
| 6. Then, run the jobs sequentially. Don't submit more than one at a time, if the model fills up the batch will fail and resubmission is not implemented. | 5. Then, run the jobs sequentially. Don't submit more than one at a time, if the model fills up the batch will fail and resubmission is not implemented. |
| `python analysis/openai_batch.py</sub> submit` | `python analysis/openai<sub>batch.py</sub> submit` |
| `python analysis/openai_batch.py</sub> status` | `python analysis/openai<sub>batch.py</sub> status` |
| `python analysis/openai_batch.py</sub> download` | `python analysis/openai<sub>batch.py</sub> download` |
| `python analysis/openai_batch.py</sub> submit` | `python analysis/openai<sub>batch.py</sub> submit` |
| `<a id="org5739d49"></a>` | `<a id="org5739d49"></a>` |
| # Roadmap | # Roadmap |
| 1. /Done/ Scrape one forum, check sentiment, display | 1. Scrape one forum |
| 2. Test different models | 2. Compare sentiment models |
| 3. Build batch runner | 3. Display |
| | 4. Scrape all data |
| | 5. Scale? |
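The batch step in the README warns that jobs must run one at a time. A batch that overfills the model's queue fails, and nothing resubmits it. The following is a minimal Python sketch of that discipline, not the project's runner. It rests on these assumptions:

- `submit`, `status` and `download` are hypothetical callables standing in for `openai_batch.py submit|status|download`. The script's internals are not shown in this diff.
- The terminal status names mirror the OpenAI Batch API (`completed`, `failed`, `expired`, `cancelled`).
- The 30-second poll interval is arbitrary.

```python
import time
from collections.abc import Callable, Iterable

# Assumed terminal states, mirroring OpenAI Batch API status values.
TERMINAL = {"completed", "failed", "expired", "cancelled"}


def run_sequentially(
    jobs: Iterable[str],
    submit: Callable[[str], str],      # hypothetical: submits one job, returns a batch id
    status: Callable[[str], str],      # hypothetical: returns the batch's current status
    download: Callable[[str], None],   # hypothetical: fetches a finished batch's results
    poll_seconds: float = 30.0,
) -> list[str]:
    """Run batch jobs strictly one at a time; return the batch ids that were downloaded.

    Stops at the first job that does not finish as "completed", because this
    sketch has no resubmission logic (the README notes the tool lacks it too).
    """
    downloaded: list[str] = []
    for job in jobs:
        batch_id = submit(job)
        state = status(batch_id)
        # Block until this batch reaches a terminal state before submitting the next one.
        while state not in TERMINAL:
            time.sleep(poll_seconds)
            state = status(batch_id)
        if state != "completed":
            break  # leave the remaining jobs unsubmitted for a manual retry
        download(batch_id)
        downloaded.append(batch_id)
    return downloaded
```

The sketch stops at the first job that does not complete instead of skipping ahead. That follows the README's reasoning: if the queue is full, the next submission would probably hit the same limit.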
@@ -354,8 +354,9 @@ data pulls entirely from the job; goal is to point viz/streamlit.py at any job/
| Old | New |
|---|---|
| - tests: from root dir, `streamlit run viz/streamlit.py <job-dir>` | - tests: from root dir, `streamlit run viz/streamlit.py <job-dir>` |
| - datetime: [2026-05-08 Fri 23:44] | - datetime: [2026-05-08 Fri 23:44] |
| * [ ] t1.6 host streamlit via dockerfile | * +[ ] t1.6 host streamlit via dockerfile+ |
| planning to deploy manually, get cert, etc etc. probably dont care about https? | planning to deploy manually, get cert, etc etc. probably dont care about https? |
| | +using streamlit.app instead+ |
| ** acceptance criteria | ** acceptance criteria |
| 1. write dockerfile with slim image | 1. write dockerfile with slim image |
@@ -5,6 +5,8 @@ class ForumItem(scrapy.Item):
| Old | New |
|---|---|
| `forum_id = scrapy.Field()` | `forum_id = scrapy.Field()` |
| `reg_title = scrapy.Field()` | `reg_title = scrapy.Field()` |
| `reg_desc = scrapy.Field()` | `reg_desc = scrapy.Field()` |
| | `scraped_at = scrapy.Field()` |
| | `forum_url = scrapy.Field()` |
| `class CommentItem(scrapy.Item):` | `class CommentItem(scrapy.Item):` |
@@ -63,6 +63,8 @@ class ForumSpider(scrapy.Spider):
| Old | New |
|---|---|
| `forum_id=self.forum_id,` | `forum_id=self.forum_id,` |
| `reg_title=reg_title,` | `reg_title=reg_title,` |
| `reg_desc=reg_desc,` | `reg_desc=reg_desc,` |
| | `scraped_at=datetime.utcnow().isoformat(),` |
| | `forum_url=_view_url(self.forum_id),` |
| `)` | `)` |
| `for page in range(2, last_page + 1):` | `for page in range(2, last_page + 1):` |
| `yield scrapy.FormRequest(` | `yield scrapy.FormRequest(` |
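The spider change stamps each forum item with `scraped_at=datetime.utcnow().isoformat()`. That produces a naive timestamp with no UTC offset, and `datetime.utcnow()` is deprecated as of Python 3.12. Below is a minimal sketch of a timezone-aware alternative. The helper name `scraped_at_now` is hypothetical and not part of the project.

```python
from datetime import datetime, timezone


def scraped_at_now() -> str:
    """Current UTC time as an ISO 8601 string with an explicit +00:00 offset.

    Hypothetical drop-in for datetime.utcnow().isoformat(), which returns a
    naive timestamp (no offset) and is deprecated since Python 3.12.
    """
    return datetime.now(timezone.utc).isoformat()


if __name__ == "__main__":
    stamp = scraped_at_now()
    # fromisoformat() keeps the offset, so the parsed value is timezone-aware.
    print(stamp, datetime.fromisoformat(stamp).tzinfo)
```

Inside the spider this would read `scraped_at=scraped_at_now(),`. One caveat applies to data already scraped with the naive format. Python raises `TypeError` on ordering comparisons between naive and aware datetimes, so old timestamps would need normalizing before they are sorted or compared alongside new ones.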