added forum metadata for later use

streamlit > local docker
updated readme
2026-05-09 00:36:30 -04:00 · 2026-05-09 00:25:27 -04:00 · 2026-05-09 00:02:24 -04:00 · 2026-05-09 00:00:59 -04:00 · 2026-05-08 23:57:46 -04:00 · 2026-05-08 23:33:55 -04:00
24 changed files with 50975 additions and 43 deletions
--- a/README.md
+++ b/README.md
@@ -1,17 +1,3 @@
 # Table of Contents
 1.  [Project Goals](#org2da6874)
    1.  [Research questions](#org1a2b8b3)
    2.  [Architecture](#orgfabfcd9)
        1.  [Scraper](#org2c5c7a2)
        2.  [Analysis](#org72990f4)
        3.  [Storage](#org58a5b72)
    3.  [Instructions](#org24fe465)
 1.  [Roadmap](#org5739d49)
 <a id="org2da6874"></a>
 ## Project Goals
@@ -21,8 +7,9 @@
 2.  Make data and insights broadly available.
 3.  Generalize to other public comment tools.
 Take a look at https://vatownhall.streamlit.app
 ![img](./docs/streamlit-snapshot.png)
 <a id="org1a2b8b3"></a>
 ### Research questions
@@ -66,9 +53,9 @@ Scrapy provides a simple mechanism for retrieving, parsing, and saving content f
 Google and Amazon both return generic sentiment (tone of writing: positive/negative), not stance (for/against the regulation): "I strongly believe the government should NOT interfere" is negative in tone, but its stance ("against" the regulation) can only be read with the regulation in mind.  We add the proposed change as context to the model.
-Before sending the comments for sentiment analysis, \`tokenizer.py\` receives the forum to be processed and prompt as inputs, then generates a \`report.json\` estimating tokens (tiktoken), cost, and time to run for multiple models.
+Before sending the comments for sentiment analysis, `tokenizer.py` receives the forum to be processed and prompt as inputs, then generates a `report.json` estimating tokens (tiktoken), cost, and time to run for multiple models.
-Then, the batch processing scripts uses the \`report.json\` to create multiple jobs, with subcommands to download and check their status. 
+Then, the batch processing scripts use the `report.json` to create multiple jobs, with subcommands to download and check their status.
 We selected gpt-5.4-mini for a good balance of quality, cost, and time.
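The README describes what `tokenizer.py` produces, but its internals are not in this diff. As a hedged sketch of the arithmetic behind such an estimate, the function below counts prompt and comment tokens and multiplies by per-model prices. `estimate_job`, `ModelPrice`, any prices you plug in, and the 150-token reply size are hypothetical. The token counter is passed in, so it could wrap tiktoken (e.g. `len(enc.encode(text))` with an encoding from `tiktoken.get_encoding`) or, for a quick test, a whitespace split. Time estimates are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class ModelPrice:
    """Hypothetical per-model prices in USD per 1M tokens (fill in from the provider's price list)."""
    input_per_m: float
    output_per_m: float


def estimate_job(
    prompt: str,
    comments: Iterable[str],
    prices: Dict[str, ModelPrice],
    count_tokens: Callable[[str], int],
    output_tokens_per_comment: int = 150,  # assumed average size of one JSON reply
) -> dict:
    """Rough token and cost estimate, assuming one request per comment (prompt + comment)."""
    prompt_tokens = count_tokens(prompt)
    n_comments = 0
    input_tokens = 0
    for text in comments:
        n_comments += 1
        # the prompt is resent with every comment, so it is counted once per request
        input_tokens += prompt_tokens + count_tokens(text)
    output_tokens = n_comments * output_tokens_per_comment
    costs = {
        name: round(input_tokens / 1e6 * p.input_per_m + output_tokens / 1e6 * p.output_per_m, 6)
        for name, p in prices.items()
    }
    return {
        "comments": n_comments,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "est_cost_usd": costs,
    }
```

Passing `count_tokens=lambda s: len(s.split())` gives a crude word-count stand-in for testing; a real report would use the model's own tokenizer.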
@@ -107,15 +94,15 @@ We selected gpt-5.4-mini for a good balance of quality, cost, and time.
 -   Each scraped forum is saved to `output/<forum-id>.jsonl`
 -   Each report (forum + prompt) is saved to `reports/<forum-id-N>.json`
 -   Each job is saved to `analysis/jobs/<report-id>`:
-     └─`forum.jsonl` is a copy of the scraped forum for convenience
+     └─`forum.jsonl` is a copy of the scraped forum for convenience  
-     └─`prompt.txt` is a copy of the prompt used
+     └─`prompt.txt` is a copy of the prompt used  
-     └─`report.json` is a copy of the report used
+     └─`report.json` is a copy of the report used  
-     └─`status.json` contains metadata about the job
+     └─`status.json` contains metadata about the job  
-    For each batch in the job, four files are created:
+    For each batch in the job, up to four files are created:  
-     └─`jobN-input.jsonl` contains the exact queries sent to the API, for troubleshooting
+     └─`jobN-input.jsonl` contains the exact queries sent to the API, for troubleshooting  
-     └─`jobN-output-raw.jsonl` contains the exact response from the API
+     └─`jobN-output-raw.jsonl` contains the exact response from the API  
-     └─`jobN-output.jsonl` contains the exact response from the API
+     └─`jobN-output.jsonl` contains the exact response from the API  
-     └─`jobN-output-errors.jsonl` when errors are returned (this file may not exist)
+     └─`jobN-output-errors.jsonl` contains any errors returned by the API (this file may not exist)  
 -   Once complete, the cleanup script saves `review.csv`, `review.pqt`, and `review.sqlite` in this folder.
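The layout above can be checked mechanically before the cleanup script runs. A minimal sketch, assuming batches are numbered from 1 and named `job{N}-...` exactly as listed; `expected_files` and `missing_files` are hypothetical helpers, not part of the repo. The optional `jobN-output-errors.jsonl` is deliberately not required.

```python
from pathlib import Path

# Per-job files from the storage layout.
JOB_FILES = ["forum.jsonl", "prompt.txt", "report.json", "status.json"]
# Per-batch files; jobN-output-errors.jsonl is optional and therefore not listed.
BATCH_SUFFIXES = ["input.jsonl", "output-raw.jsonl", "output.jsonl"]


def expected_files(job_dir: Path, n_batches: int) -> list[Path]:
    """Required files for a job whose batches are numbered 1..n_batches (numbering assumed)."""
    paths = [job_dir / name for name in JOB_FILES]
    for n in range(1, n_batches + 1):
        paths += [job_dir / f"job{n}-{suffix}" for suffix in BATCH_SUFFIXES]
    return paths


def missing_files(job_dir: Path, n_batches: int) -> list[Path]:
    """Required files that are not on disk yet."""
    return [p for p in expected_files(job_dir, n_batches) if not p.exists()]
```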
--- a/analysis/jobs/f452-1/review.xlsx
+++ b/analysis/jobs/f452-1/review.xlsx
--- a/analysis/prompt-1.txt
+++ b/analysis/prompt-1.txt
@@ -1,6 +1,4 @@
-You are an expert policy analyst classifying public comments submitted to the Virginia Town Hall
+You are an expert policy analyst classifying public comments submitted to the Virginia Town Hall regulatory comment system. You will be given the text of a proposed regulation and a single public comment. Return ONLY a JSON object — no other text.
 regulatory comment system. You will be given the text of a proposed regulation and a single
 public comment. Return ONLY a JSON object — no other text.
 Definitions:
 - stance: the commenter's position on whether the regulation should be adopted.
@@ -16,8 +14,6 @@ Definitions:
  "unclear"  = tone cannot be determined (e.g., a one-word comment).
 - stance_confidence: float 0.0-1.0, your confidence in the stance label.
 - stance_rationale: 1-3 sentences explaining the key evidence; quote specific phrases where possible.
- tags: up to 5 short topic labels relevant to the comment's specific concerns (e.g.
+- tags: up to 5 short topic labels relevant to the comment's specific concerns (e.g. "parental rights", "student safety", "privacy", "religious freedom", "LGBTQ inclusion", "bullying prevention", "school sports", "bathroom access"). Empty array if none apply.
  "parental rights", "student safety", "privacy", "religious freedom", "LGBTQ+ inclusion",
  "bullying prevention", "school sports", "bathroom access"). Empty array if none apply.
 Return exactly these keys: stance, stance_confidence, stance_rationale, tone, tags.
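The prompt ends with a strict output contract. A minimal sketch of checking one reply against it before it reaches `review.csv`; `parse_classification` is hypothetical, and the allowed label sets are assumptions, since the full definitions are elided from this hunk: the stance set is taken from the `stance_signed` backlog item and the tone set from the tone axis in `viz/streamlit.py`.

```python
import json

# Assumed label sets (see lead-in); adjust to match the full prompt definitions.
STANCES = {"support", "oppose", "neutral", "unknown"}
TONES = {"positive", "negative", "neutral", "mixed", "unclear"}
KEYS = {"stance", "stance_confidence", "stance_rationale", "tone", "tags"}


def parse_classification(raw: str) -> dict:
    """Parse one model reply and check it against the prompt's contract; raise ValueError if not."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict) or set(obj) != KEYS:
        raise ValueError(f"expected exactly keys {sorted(KEYS)}")
    if obj["stance"] not in STANCES:
        raise ValueError(f"bad stance: {obj['stance']!r}")
    if obj["tone"] not in TONES:
        raise ValueError(f"bad tone: {obj['tone']!r}")
    conf = obj["stance_confidence"]
    # bool is a subclass of int, so reject it explicitly
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError(f"stance_confidence out of range: {conf!r}")
    if not isinstance(obj["stance_rationale"], str):
        raise ValueError("stance_rationale must be a string")
    tags = obj["tags"]
    if not isinstance(tags, list) or len(tags) > 5 or not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be a list of at most 5 strings")
    return obj
```

Rejected replies could be routed to the same place as `jobN-output-errors.jsonl` for a retry rather than silently dropped.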
--- a/docs/streamlit-snapshot.png
+++ b/docs/streamlit-snapshot.png
--- a/docs/tasks.org
+++ b/docs/tasks.org
@@ -280,10 +280,10 @@ python analysis/create_csv.py output/f452.jsonl analysis/jobs/f452-1/ --parquet
 #+end_src
 ** evidence
- commit:
+- commit: 28d6d22
 - tests: passing (pytest tests/create_csv.py tests/encoding.py)
 - csv: analysis/jobs/f452-1/review.csv
- datetime: [2026-05-07 Thu]
+- datetime: [2026-05-07 Thu 17:23]
 * [X] t1.1.1: text encoding cleanup
 fix mojibake in scraped text before analysis/reporting, especially curly quotes showing as â€™.
@@ -309,24 +309,74 @@ fix mojibake in scraped text before analysis/reporting, especially curly quotes
 - Spider: DEFAULT_RESPONSE_ENCODING=utf-8 remains. If a future forum genuinely sends cp1252, change to 'cp1252' and apply ftfy post-decode in the item pipeline.
 ** evidence
- commit:
+- commit: 1ea696d
 - tests: passing (pytest tests/encoding.py)
 - before/after sample: N/A — f452.jsonl is clean; tests cover synthetic mojibake patterns
- datetime: [2026-05-07 Thu]
+- datetime: [2026-05-07 Thu 17:00]
-* === Backlog ===
+
-* [ ] X: first dash explorer
+* [X] t1.4: graph data prototype
-create a local dash app for exploring one forum analysis dataset.
+create ./viz/prototype_charts.py to generate individual plotly charts for exploration, to be embedded in streamlit or dash later
 ** acceptance criteria
-1. load parquet/csv review dataset
+2. create graph for Stance/Share
-2. show stance counts, tone counts, tag counts, and confidence histogram
+   - stacked h-bar with % support/oppose/neutral/unknown + raw totals, e.g. 63% (5720) / 37% (3320) / 0.09% (8) / 0.37% (34)
-3. provide filters for stance, tone, confidence, tag, and text search
+   - later, consider centered diverging h-bar: oppose ← | neutral/unknown | → support
-4. show filtered comment table
+3. create graph for Stance/Time: 
   - cumulative support/oppose % over time
 4. create graph for Stance/Tone (heatmap count)
 5. create graph for Confidence/Stance (boxplot or histogram)
 ** notes
 - prototyped in plotly
 - initial streamlit  
 ** evidence
 - commit: 3fb424d
 - tests: see viz/proto and viz/chart_tests
 - datetime: [2026-05-08 Fri 08:38]
 * [X] t1.5: streamlit
 create organized webpage displaying useful information from completed job and analysis
 ** acceptance criteria
 1. display total stance breakdown
 2. display centered horiz-bar with absolute stances
 3. show daily comment stances and cumulative
 4. show comment table with filters for stance (filter tone?)
 5. clicking/selecting a comment shows full text and model rationale
 6. app runs locally with one command
 ** notes
 data pulls entirely from the job; goal is to point viz/streamlit.py at any job/ folder and have everything it needs
 ** evidence
 - commit: cc16acb
 - tests: from root dir, `streamlit run viz/streamlit.py -- --jobs-dir <job-dir>`
 - datetime: [2026-05-08 Fri 23:44]
 * +[ ] t1.6 host streamlit via dockerfile+
 planning to deploy manually, get cert, etc. probably don't care about https?
 +using streamlit.app instead+
 ** acceptance criteria
 1. write dockerfile with slim image
 ** notes
 * === Backlog ===
 - add forum_url, forum_collected_date to scraper (to add to viz)
 * [ ] X: complete proposal information
 Ensure we capture as much useful information as possible about the actual proposal (contact information, etc.) and what the state actually says about what was posted.
 ** acceptance criteria
 1. Item: `Forum` stores id, url, proposal title, description, open/close date, number of comments, agency, board, guidance document id
   - add details for guidanceDoc, publication date, comments, guidance docs - eg: https://www.townhall.virginia.gov/L/GDocForum.cfm?GDocForumID=452
 2. Item: `Comment` stores forum_id, comment_id, author, title, text, date, url
 * [ ] X: add helper data to create_csv
 1. in create_csv.py, create helper columns:
   - stance_signed = {"support":1, "oppose":-1, "neutral":0, "unknown":0}
   - stance_weighted = stance_signed * stance_confidence
   - is_support_oppose = stance in ["support", "oppose"]
   - date_day
   - date_hour
   - text_norm
   - text_hash
   - confidence_bucket = 'low' <.7 | 'med' .7 to <.9 | 'high' >=.9
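The backlog item above lists helper columns without pinning down every detail. A hedged per-row sketch in plain Python; `helper_columns`, `normalize_text`, and `confidence_bucket` are hypothetical names, and these choices are assumptions: `date` is an ISO 8601 string that `datetime.fromisoformat` accepts, `date_hour` is the timestamp truncated to the hour, `text_norm` is NFKC + lowercase + collapsed whitespace, and `text_hash` is the first 16 hex digits of SHA-256 over `text_norm`. 'med' is read as 0.7 <= confidence < 0.9 so values such as 0.895 are not left out.

```python
import hashlib
import re
import unicodedata
from datetime import datetime

STANCE_SIGNED = {"support": 1, "oppose": -1, "neutral": 0, "unknown": 0}


def confidence_bucket(conf: float) -> str:
    """'low' < .7 <= 'med' < .9 <= 'high'."""
    if conf < 0.7:
        return "low"
    return "med" if conf < 0.9 else "high"


def normalize_text(text: str) -> str:
    """Assumed normalization: NFKC, collapse whitespace, lowercase."""
    return re.sub(r"\s+", " ", unicodedata.normalize("NFKC", text)).strip().lower()


def helper_columns(row: dict) -> dict:
    """Derive the backlog's helper columns for one review row (hypothetical helper)."""
    stance = row.get("stance") or "unknown"
    conf = float(row.get("stance_confidence") or 0.0)
    when = datetime.fromisoformat(row["date"])  # assumes an ISO 8601 date string
    text_norm = normalize_text(row.get("text") or "")
    signed = STANCE_SIGNED.get(stance, 0)
    return {
        "stance_signed": signed,
        "stance_weighted": signed * conf,
        "is_support_oppose": stance in ("support", "oppose"),
        "date_day": when.date().isoformat(),
        "date_hour": when.strftime("%Y-%m-%d %H:00"),  # truncated to the hour for hourly bins
        "text_norm": text_norm,
        "text_hash": hashlib.sha256(text_norm.encode("utf-8")).hexdigest()[:16],
        "confidence_bucket": confidence_bucket(conf),
    }
```

In create_csv.py the same rules could be written as vectorized pandas column expressions, which would be faster than a per-row function on large forums.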
--- a/requirements.txt
+++ b/requirements.txt
--- a/scraper/items.py
+++ b/scraper/items.py
@@ -5,6 +5,8 @@ class ForumItem(scrapy.Item):
    forum_id  = scrapy.Field()
    reg_title = scrapy.Field()
    reg_desc  = scrapy.Field()
    scraped_at = scrapy.Field()
    forum_url = scrapy.Field()
 class CommentItem(scrapy.Item):
--- a/scraper/spiders/forum.py
+++ b/scraper/spiders/forum.py
@@ -63,6 +63,8 @@ class ForumSpider(scrapy.Spider):
                forum_id=self.forum_id,
                reg_title=reg_title,
                reg_desc=reg_desc,
                scraped_at=datetime.utcnow().isoformat(),
                forum_url=_view_url(self.forum_id),
            )
            for page in range(2, last_page + 1):
                yield scrapy.FormRequest(
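The spider now stores `forum_url` via `_view_url`, whose body is not in this diff, while `viz/streamlit.py` builds a Comments.cfm link by hand and the backlog cites a GDocForum.cfm link. A hypothetical helper that collects both URL patterns seen in this commit in one place; whether `_view_url` returns either of them is an assumption.

```python
from urllib.parse import urlencode

BASE = "https://www.townhall.virginia.gov/L/"


def forum_urls(forum_id: int) -> dict:
    """Hypothetical helper: the forum page and its comments page for one GDocForumID."""
    query = urlencode({"GDocForumID": forum_id})
    return {
        "forum": f"{BASE}GDocForum.cfm?{query}",
        "comments": f"{BASE}Comments.cfm?{query}",
    }
```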
--- a/viz/chart_tests/confidence_by_stance.html
+++ b/viz/chart_tests/confidence_by_stance.html
--- a/viz/chart_tests/cumulative_stance_area.html
+++ b/viz/chart_tests/cumulative_stance_area.html
--- a/viz/chart_tests/cumulative_stance_share.html
+++ b/viz/chart_tests/cumulative_stance_share.html
--- a/viz/chart_tests/stance_diverging_bar.html
+++ b/viz/chart_tests/stance_diverging_bar.html
--- a/viz/chart_tests/stance_over_time.html
+++ b/viz/chart_tests/stance_over_time.html
--- a/viz/chart_tests/stance_share.html
+++ b/viz/chart_tests/stance_share.html
--- a/viz/chart_tests/stance_tone_counts.html
+++ b/viz/chart_tests/stance_tone_counts.html
--- a/viz/chart_tests/stance_tone_heatmap.html
+++ b/viz/chart_tests/stance_tone_heatmap.html
--- a/viz/chart_tests/stance_tone_rowpct.html
+++ b/viz/chart_tests/stance_tone_rowpct.html
--- a/viz/proto/confidence_by_stance.html
+++ b/viz/proto/confidence_by_stance.html
--- a/viz/proto/stance_over_time.html
+++ b/viz/proto/stance_over_time.html
--- a/viz/proto/stance_share.html
+++ b/viz/proto/stance_share.html
--- a/viz/proto/stance_tone_heatmap.html
+++ b/viz/proto/stance_tone_heatmap.html
--- a/viz/prototype_charts.py
+++ b/viz/prototype_charts.py
@@ -0,0 +1,134 @@
 '''
    prototype_charts.py
    generate test charts for later addition to streamlit
 '''
 from pathlib import Path
 import pandas as pd
 import plotly.express as px
 import numpy as np
inp = Path("analysis/jobs/f452-1/review.csv")  # relative to the repo root, like the streamlit scripts
 out = Path("viz/")
 out.mkdir(parents=True, exist_ok=True)
 stance_order = ["support", "oppose", "neutral", "unknown"]
 # tone_order = ["positive", "negative", "neutral", "mixed", "unknown", "unclear"]
 # default order was actually better - unclear/negative/neutral/mixed/positive vs unknown/oppose/neutral/support
 # same for pct w/in stance
 df = pd.read_csv(inp)
 df["date"] = pd.to_datetime(df["date"], errors="coerce")
 df["date_day"] = df["date"].dt.date
 df["stance"] = df["stance"].fillna("unknown")
 df["tone"] = df["tone"].fillna("unknown")
 # 1. stance share
 counts = df["stance"].value_counts().reindex(stance_order, fill_value=0).reset_index()
 counts.columns = ["stance", "count"]
 fig = px.bar(counts, x="count", y="stance", orientation="h", text="count")
 fig.write_html(out / "stance_share.html")
 # 2. stance over time
 daily = df.groupby(["date_day", "stance"]).size().reset_index(name="count")
 fig = px.bar(daily, x="date_day", y="count", color="stance", category_orders={"stance": stance_order})
 fig.write_html(out / "stance_over_time.html")
 # 3. stance x tone
 heat = df.groupby(["stance", "tone"]).size().reset_index(name="count")
 fig = px.density_heatmap(heat, x="tone", y="stance", z="count", category_orders={"stance": stance_order})
 fig.write_html(out / "stance_tone_heatmap.html")
 # 4. confidence by stance
 fig = px.box(df, x="stance", y="stance_confidence", category_orders={"stance": stance_order}, points="outliers")
 fig.write_html(out / "confidence_by_stance.html")
 # 5. cumulative stance and share over time
 daily = (
    df.groupby(["date_day", "stance"])
      .size()
      .unstack(fill_value=0)
      .reindex(columns=stance_order, fill_value=0)
      .sort_index()
 )
 cum = daily.cumsum()
 cum_long = cum.reset_index().melt(id_vars="date_day", var_name="stance", value_name="cumulative_count")
 fig = px.area(
    cum_long,
    x="date_day",
    y="cumulative_count",
    color="stance",
    category_orders={"stance": stance_order},
    title="cumulative comments by stance over time",
 )
 fig.write_html(out / "cumulative_stance_area.html")
 cum_pct = cum.div(cum.sum(axis=1), axis=0).reset_index().melt(
    id_vars="date_day", var_name="stance", value_name="cumulative_share"
 )
 fig = px.line(
    cum_pct,
    x="date_day",
    y="cumulative_share",
    color="stance",
    category_orders={"stance": stance_order},
    title="cumulative stance share over time",
 )
 fig.update_yaxes(tickformat=".0%")
 fig.write_html(out / "cumulative_stance_share.html")
 # 7. diverging h-bar
 stance_counts = df["stance"].value_counts().reindex(stance_order, fill_value=0)
 div = pd.DataFrame({
    "stance": ["oppose", "support", "neutral", "unknown"],
    "count": [
        -stance_counts.get("oppose", 0),
         stance_counts.get("support", 0),
         stance_counts.get("neutral", 0),
         stance_counts.get("unknown", 0),
    ],
 })
 fig = px.bar(
    div,
    x="count",
    y="stance",
    orientation="h",
    text=div["count"].abs(),
    title="support vs oppose",
 )
 fig.update_xaxes(title="comments", zeroline=True)
 fig.update_traces(textposition="outside")
 fig.write_html(out / "stance_diverging_bar.html")
 # 8. Stance x Tone labels
heat = pd.crosstab(df["stance"], df["tone"]).reindex(
    index=stance_order,
    fill_value=0,
 )  # tone columns keep crosstab's default sorted order (tone_order is commented out above)
 fig = px.imshow(
    heat,
    text_auto=True,
    aspect="auto",
    title="stance x tone, count",
 )
 fig.write_html(out / "stance_tone_counts.html")
 rowpct = heat.div(heat.sum(axis=1).replace(0, np.nan), axis=0)
 fig = px.imshow(
    rowpct,
    text_auto=".0%",
    aspect="auto",
    title="stance x tone, percent within stance",
 )
 fig.write_html(out / "stance_tone_rowpct.html")
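Criterion 2 of t1.4 asks for labels like `63% (5720)`, while chart #1 here labels bars with raw counts only. A minimal sketch of a label formatter that reproduces the example in that criterion; `share_labels` is hypothetical, and the rule of switching to two decimals below 1% is inferred from the example values (0.09%, 0.37%) rather than stated anywhere.

```python
from typing import Dict


def share_labels(counts: Dict[str, int]) -> Dict[str, str]:
    """Label each stance as 'pct% (count)'; shares under 1% keep two decimals so they don't show as 0%."""
    total = sum(counts.values())
    labels = {}
    for stance, n in counts.items():
        pct = 100 * n / total if total else 0.0
        labels[stance] = f"{pct:.2f}% ({n})" if pct < 1 else f"{pct:.0f}% ({n})"
    return labels
```

The resulting dict could be mapped onto the `counts` frame (e.g. `counts["stance"].map(labels)`) and passed as `text=` to `px.bar`.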
--- a/viz/prototype_streamlit.py
+++ b/viz/prototype_streamlit.py
@@ -0,0 +1,28 @@
 # streamlit run viz/prototype_streamlit.py
 from datetime import datetime
 import pandas as pd
 import plotly.graph_objects as go
 import plotly.express as px
 import streamlit as st
 df = pd.read_csv(r"analysis/jobs/f452-1/review.csv")
 st.set_page_config(layout="wide")
 stance = st.multiselect("Filter stance", sorted(df["stance"].dropna().unique()), default=sorted(df["stance"].dropna().unique()))
 q = st.text_input("Search comment text")
 dff = df[df["stance"].isin(stance)]
 if q:
    dff = dff[dff["text"].fillna("").str.contains(q, case=False, regex=False)]
 st.dataframe(dff[["comment_id", "title", "stance", "stance_confidence", "tone"]], width="stretch")
 st.write("Showing " + str(len(dff))+ " comments")
 if dff.empty:
    st.stop()  # nothing matches the filters, so there is no comment to show
 cid = st.selectbox("comment", dff["comment_id"].astype(str))
 row = dff[dff["comment_id"].astype(str) == cid].iloc[0]
 st.subheader(row["title"])
 st.write(row["text"])
 st.write(f'{row["author"]}, {str(row["date"])[:10]}')
 st.write("**model:** " + str(row["model"]))
 st.markdown("**stance:** " + str(row["stance"]) + "  \n**confidence:** " + str(row["stance_confidence"]) + "  \n**tone:** " + str(row["tone"]))
 st.write("**analysis:** " + str(row["stance_rationale"]))
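The prototype filters by stance, then runs a literal, case-insensitive substring search with `str.contains(q, case=False, regex=False)`. Roughly the same rule in plain Python, as a hypothetical `filter_comments` over row dicts (for example from `csv.DictReader`); it also searches the title, which is what the later explorer's "Search comment title and text" label describes. This illustrates the filtering logic and is not code from the repo.

```python
from typing import Iterable, List, Set


def filter_comments(rows: Iterable[dict], stances: Set[str], query: str = "") -> List[dict]:
    """Keep rows whose stance is selected and whose title or text contains query (case-insensitive, literal)."""
    q = query.lower()
    kept = []
    for row in rows:
        if row.get("stance") not in stances:
            continue
        # missing title/text are treated as empty strings
        haystack = f"{row.get('title') or ''} {row.get('text') or ''}".lower()
        if q and q not in haystack:
            continue
        kept.append(row)
    return kept
```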
--- a/viz/streamlit.py
+++ b/viz/streamlit.py
@@ -0,0 +1,189 @@
 # streamlit run viz/streamlit.py -- --jobs-dir analysis/jobs/f452-1
 import argparse
 from pathlib import Path
 from datetime import datetime as dt
 import pandas as pd
 import plotly.graph_objects as go
 import plotly.express as px
 import streamlit as st
 parser = argparse.ArgumentParser()
 parser.add_argument("--jobs-dir", default="analysis/jobs/f452-1", type=Path,
                    help="Job directory containing review.csv, forum.jsonl, and prompt.txt")
 args, _ = parser.parse_known_args()  # parse_known_args: ignore Streamlit's own argv entries
 workdir = args.jobs_dir
 df = pd.read_csv(workdir/"review.csv")
 df['date_dt'] = pd.to_datetime(df.date)
 df["date_day"] = df["date_dt"].dt.date
 forum = pd.read_json(workdir/"forum.jsonl", lines=True).iloc[0].to_dict()
 prompt = (workdir/"prompt.txt").read_text(encoding="utf-8")
 stance_colors = {'oppose':'#ffa15a', 'neutral':'#e377c2','support':'#19d3f3','unknown':'#000000'}
 stance_order = ["oppose", "mixed", "unknown", "neutral", "support"]
 st.set_page_config(layout="wide")
 st.title("Virginia Townhall Explorer",anchor=None)
 st.caption("Explore data collected from Virginia's public comment system. Source code at https://github.com/eulaly/vath")
 st.subheader("Proposal",anchor=None,divider="gray")
 st.markdown(f"**{forum.get('reg_title')}**")
 st.text(forum.get('reg_desc'))
 st.caption(f'Comments posted from {dt.strftime(min(df.date_dt),"%D")}—{dt.strftime(max(df.date_dt),"%D")} at https://www.townhall.virginia.gov/L/Comments.cfm?GDocForumID={forum.get("forum_id")}')
 st.subheader("Comment Summary",anchor=False,divider="gray")
 summary_left, summary_right = st.columns([1,2])
 with summary_left:
 # Summary Table
    summary_stats = (
    df.groupby("stance").size()
      .reindex(stance_order, fill_value=0)
      .reset_index(name="count")
      .assign(percent=lambda d: (d["count"] / d["count"].sum()).map("{:.1%}".format))
 )
    st.dataframe(summary_stats, hide_index=True, width="stretch")
 with summary_right:
 # Stance div-h
    counts = df["stance"].value_counts()
    stance_divh = go.Figure()
    stance_divh.add_bar(y=["stance"], x=[-counts.get("oppose",0)], name="oppose", orientation="h", marker_color=stance_colors.get('oppose'), text=[counts.get("oppose",0)], textposition="inside")
    stance_divh.add_bar(y=["stance"], x=[counts.get("neutral",0)], name="neutral", orientation="h", marker_color=stance_colors.get('neutral'), text=[counts.get("neutral",0)], textposition="inside")
    stance_divh.add_bar(y=["stance"], x=[counts.get("unknown",0)], name="unknown", orientation="h", marker_color=stance_colors.get('unknown'), text=[counts.get("unknown",0)], textposition="inside")
    stance_divh.add_bar(y=["stance"], x=[counts.get("support",0)], name="support", orientation="h", marker_color=stance_colors.get('support'), text=[counts.get("support",0)], textposition="inside")
    stance_divh.update_yaxes(title_text="",showticklabels=False)
    stance_divh.update_layout(barmode="relative", title="", height=180, margin=dict(l=0,r=0,t=0,b=0),xaxis_title="", yaxis_title="",legend=dict(orientation="v",y=0.12))
    st.plotly_chart(stance_divh,width='stretch')
 # Daily Comments Breakdown, 3 Tabs
 daily_wide = (
    df.groupby(["date_day", "stance"])
      .size()
      .unstack(fill_value=0)
      .reindex(columns=stance_order, fill_value=0)
      .sort_index()
 )
 daily_long = (
    daily_wide.reset_index()
      .melt(id_vars="date_day", var_name="stance", value_name="count")
 )
 cum_wide = daily_wide.cumsum()
 cum_long = (
    cum_wide.reset_index()
      .melt(id_vars="date_day", var_name="stance", value_name="cumulative_count")
 )
 cum_total = cum_wide.sum(axis=1)
 cum_share = cum_wide.div(cum_total.where(cum_total > 0), axis=0)
 cum_share_long = (
    cum_share.reset_index()
      .melt(id_vars="date_day", var_name="stance", value_name="cumulative_share")
 )
 tab_daily, tab_area, tab_share = st.tabs([
    "Daily",
    "Cumulative",
    "Cumulative Share",
 ])
 with tab_daily:
    fig = px.bar(
        daily_long,
        x="date_day",
        y="count",
        color="stance",
        category_orders={"stance": stance_order},
        color_discrete_map=stance_colors,
    )
    fig.update_layout(barmode="stack", height=420, legend_orientation="v")
    st.plotly_chart(fig, width="stretch")
 with tab_area:
    fig = px.area(
        cum_long,
        x="date_day",
        y="cumulative_count",
        color="stance",
        category_orders={"stance": stance_order},
        color_discrete_map=stance_colors,
    )
    fig.update_layout(height=420, legend_orientation="v")
    st.plotly_chart(fig, width="stretch")
 with tab_share:
    fig = px.line(
        cum_share_long,
        x="date_day",
        y="cumulative_share",
        color="stance",
        category_orders={"stance": stance_order},
        color_discrete_map=stance_colors,
    )
    fig.update_yaxes(tickformat=".0%", range=[0, 1])
    fig.update_layout(height=420, legend_orientation="v")
    st.plotly_chart(fig, width="stretch")
 st.subheader("Comment Explorer",anchor=False,divider="gray") 
 # comment explorer
 cex_left, cex_right = st.columns([1,1])
 with cex_left:
    filter_stance = st.multiselect("Filter stance", sorted(df["stance"].dropna().unique()), default=sorted(df["stance"].dropna().unique()))
    filter_tone = st.multiselect("Filter tone", sorted(df["tone"].dropna().unique()), default=sorted(df["tone"].dropna().unique()))
    dff = df[df["stance"].isin(filter_stance) & df["tone"].isin(filter_tone)]
 with cex_right:
    q = st.text_input("Search comment title and text")
    if q:
        haystack = dff["title"].fillna("") + " " + dff["text"].fillna("")  # search both, as the label says
        dff = dff[haystack.str.contains(q, case=False, regex=False)]
    st.text(""); st.text("")
    st.text("Showing " + str(len(dff))+ " comments",text_alignment="right", width="stretch")
 st.dataframe(dff[["comment_id", "title", "text", "stance", "stance_confidence", "tone"]], width="stretch")
 if dff.empty:
    st.info("No comments match the current filters.")
    st.stop()
 cid = st.selectbox("Select comment to view:", dff["comment_id"].astype(str))
 row = dff[dff["comment_id"].astype(str) == cid].iloc[0]
 st.markdown(f'**{row["title"]}**')
 st.text(row["text"])
 st.write(str(row["author"]) + ", " + row["date_dt"].strftime("%D"))
 st.divider()
 st.subheader('Analysis')
 cexs_left, cexs_right = st.columns([1,1])
 with cexs_left:
    st.write(f"**stance:** {row['stance']}")
    st.write(f"**stance_confidence:** {row['stance_confidence']:.2f}")
    st.write(f"**tone:** {row['tone']}")
    st.write("**analysis:** " + str(row["stance_rationale"]))
 with cexs_right:
    x_order = ["unknown","oppose","mixed","neutral","support"]  # includes mixed even if absent; harmless zero column
    y_order = ["positive","neutral","mixed","negative","unclear"]
    tab = pd.crosstab(df["tone"], df["stance"]).reindex(index=y_order, columns=x_order, fill_value=0)
    pct = tab.div(tab.sum(axis=1).replace(0, float("nan")), axis=0).fillna(0)  # NaN keeps the row totals float
    tone_stance = px.imshow(
        pct,
        x=x_order, y=y_order,
        text_auto=".0%",
        aspect="auto",
        color_continuous_scale="Greens",
    )
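    tone_stance.update_traces(
        text=(tab.astype(str) + " / " + (pct*100).round(0).astype(int).astype(str) + "%").to_numpy(),
        texttemplate="%{text}",  # text_auto set texttemplate to "%{z:.0%}", which ignores text
    )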
    tone_stance.add_scatter(x=[row["stance"]],y=[row["tone"]],mode="markers",marker=dict(size=15,color="yellow",symbol="cross",line=dict(width=1, color="red")),showlegend=False)
    tone_stance.update_layout(height=420, xaxis_title="stance", yaxis_title="tone")
    st.plotly_chart(tone_stance, width='stretch')
    st.caption("Tone by stance, % within tone", text_alignment="right",width="stretch")
 st.divider()
 st.write("**model:** " + str(row["model"]))
 with st.expander("Prompt", expanded=False):
    st.code(prompt, language="text")
 tone_conf = px.box(df,x="stance",y="stance_confidence",color="stance",category_orders={"stance":stance_order},color_discrete_map=stance_colors,points="outliers",title="Comment Stance Classification Confidence")
 tone_conf.update_yaxes(range=[0,1.02])
 tone_conf.update_layout(height=430, legend_orientation="v")
 st.plotly_chart(tone_conf,width="stretch")
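The Cumulative Share tab divides running counts by the running total and masks zero totals with `cum_total.where(cum_total > 0)`, so a day before any comment has arrived yields NaN rather than a 0/0 division. A plain-Python sketch of that calculation, using a hypothetical `cumulative_shares` over per-day count dicts given in date order; the pandas version in the app is the one actually used.

```python
from typing import Dict, List, Optional


def cumulative_shares(daily: List[Dict[str, int]], stances: List[str]) -> List[Dict[str, Optional[float]]]:
    """Running share of each stance after each day; None while no comments have arrived yet."""
    running = {s: 0 for s in stances}
    shares = []
    for day in daily:
        for s in stances:
            running[s] += day.get(s, 0)  # a stance with no comments that day adds 0
        total = sum(running.values())
        # mirrors cum_total.where(cum_total > 0): undefined share instead of dividing by zero
        shares.append({s: (running[s] / total if total else None) for s in stances})
    return shares
```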
Author	SHA1	Message	Date
eulaly	8f1d9e7723	added forum metadata for later use	2026-05-09 00:36:30 -04:00
eulaly	181477bce7	streamlit > local docker	2026-05-09 00:25:27 -04:00
eulaly	771f11fd3c	updated readme	2026-05-09 00:02:24 -04:00
eulaly	f42183eeda	added streamlit link	2026-05-09 00:00:59 -04:00
eulaly	92706bafb5	updated tasks and deps	2026-05-08 23:57:46 -04:00
eulaly	723b353db8	lol	2026-05-08 23:33:55 -04:00
eulaly	67cd96a523	updated readme.md	2026-05-08 23:32:44 -04:00
eulaly	cc16acbb12	added argparse for job dir, added tone filter	2026-05-08 23:28:13 -04:00
eulaly	afd5b8c60e	full local streamlit support	2026-05-08 21:57:04 -04:00
eulaly	3fb424da3c	added streamlit v1	2026-05-08 17:22:33 -04:00
eulaly	c3f2911563	updated reqts	2026-05-07 21:55:00 -04:00