full local streamlit support

This commit is contained in:
2026-05-08 21:57:04 -04:00
parent 3fb424da3c
commit afd5b8c60e
3 changed files with 119 additions and 94 deletions

BIN
docs/streamlit-snapshot.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

View File

@@ -314,8 +314,59 @@ fix mojibake in scraped text before analysis/reporting, especially curly quotes
- before/after sample: N/A — f452.jsonl is clean; tests cover synthetic mojibake patterns - before/after sample: N/A — f452.jsonl is clean; tests cover synthetic mojibake patterns
- datetime: [2026-05-07 Thu 17:00] - datetime: [2026-05-07 Thu 17:00]
* [ ] t1.4: graph data prep * [X] t1.4: graph data prototype
create a script ./viz/prototype_charts.py generating individual plotly charts for exploring graphs to embed into streamlit or dash later create ./viz/prototype_charts.py generating individual plotly charts for exploring graphs to embed into streamlit or dash later
** acceptance criteria
2. create graph for Stance/Share
- stacked h-bar with % support/oppose/neutral/unknown + raw totals, eg 63% (5720) / 37% (3320) / 0.09% (8) / 0.37% (34)
- later, consider centered diverging h-bar: oppose ← | neutral/unknown | → support
3. create graph for Stance/Time:
- cumulative support/oppose % over time
4. create graph for Stance/Tone (heatmap count)
5. create graph for Confidence/Stance (boxplot or histogram)
** notes
- prototyped in plotly
- initial streamlit
** evidence
- commit: 3fb424d
- tests: see viz/proto and viz/chart_tests
- datetime: [2026-05-08 Fri 08:38]
* [ ] t1.5: streamlit
create organized webpage displaying useful information from completed job and analysis
** acceptance criteria
1. display total stance breakdown
2. display centered horiz-bar with absolute stances
3. show daily comment stances and cumulative
4. show comment table with filters for stance (filter tone?)
5. clicking/selecting a comment shows full text and model rationale
6. app runs locally with one command
** notes
data pulls entirely from the job; goal is to point viz/streamlit.py at any job/ folder and have everything it needs
** evidence
- commit:
- tests: from root dir, `streamlit run viz/streamlit.py`
7. add forum_url, forum_collected_date to scraper
* [ ] t1.6 host streamlit
figure out how to host this, locally or via streamlit servers
* === Backlog ===
* [ ] X: complete proposal information
Ensure we capture as much useful information as possible about the actual proposal - contact information, etc. what the state actually says about what was posted.
** acceptance criteria
1. Item: `Forum` stores id, url, proposal title, description, open/close date, number of comments, agency, board, guidance document id
- add details for guidanceDoc, publication date, comments, guidance docs - eg: https://www.townhall.virginia.gov/L/GDocForum.cfm?GDocForumID=452
2. Item: `Comment` stores forum_id, comment_id, author, title, text, date, url
* [ ] X: add helper data to create_csv
1. in create_csv.py, create helper columns: 1. in create_csv.py, create helper columns:
- stance_signed = {"support":1, "oppose":-1, "neutral":0, "unknown":0} - stance_signed = {"support":1, "oppose":-1, "neutral":0, "unknown":0}
- stance_weighted = stance_signed * stance_confidence - stance_weighted = stance_signed * stance_confidence
@@ -325,38 +376,3 @@ create a script ./viz/prototype_charts.py generating individual plotly charts fo
- text_norm - text_norm
- text_hash - text_hash
- confidence_bucket = 'low' <.7 | 'med' .7-.89 | 'high' >=.9 - confidence_bucket = 'low' <.7 | 'med' .7-.89 | 'high' >=.9
2. add forum_url, forum_collected_date to scraper
2. create graph for Stance/Share
- stacked h-bar with % support/oppose/neutral/unknown + raw totals, eg 63% (5720) / 37% (3320) / 0.09% (8) / 0.37% (34)
- later, consider centered diverging h-bar: oppose ← | neutral/unknown | → support
3. create graph for Stance/Time:
- cumulative support/oppose % over time
4. create graph for Stance/Tone (heatmap count)
5. create graph for Confidence/Stance (boxplot or histogram)
** acceptance criteria
1. load parquet/csv review dataset
2. show stance counts, tone counts, tag counts, and confidence histogram
3. provide filters for stance, tone, confidence, tag, and text search
4. show filtered comment table
5. clicking/selecting a comment shows full text and model rationale
6. app runs locally with one command
** notes
** evidence
- commit:
- tests:
- datetime:
* === Backlog ===
* [ ] X: complete proposal information
Ensure we capture as much useful information as possible about the actual proposal - contact information, etc. what the state actually says about what was posted.
** acceptance criteria
1. Item: `Forum` stores id, url, proposal title, description, open/close date, number of comments, agency, board, guidance document id
- add details for guidanceDoc, publication date, comments, guidance docs - eg: https://www.townhall.virginia.gov/L/GDocForum.cfm?GDocForumID=452
2. Item: `Comment` stores forum_id, comment_id, author, title, text, date, url

View File

@@ -1,6 +1,7 @@
# streamlit run analysis/viz/comment_streamlit2.py # streamlit run analysis/viz/streamlit.py
from datetime import datetime as dt import argparse
from pathlib import Path from pathlib import Path
from datetime import datetime as dt
import pandas as pd import pandas as pd
import plotly.graph_objects as go import plotly.graph_objects as go
import plotly.express as px import plotly.express as px
@@ -15,25 +16,21 @@ forum = pd.read_json(workdir/"forum.jsonl", lines=True).iloc[0].to_dict()
prompt = (workdir/"prompt.txt").read_text(encoding="utf-8") prompt = (workdir/"prompt.txt").read_text(encoding="utf-8")
stance_colors = {'oppose':'#ffa15a', 'neutral':'#e377c2','support':'#19d3f3','unknown':'#000000'} stance_colors = {'oppose':'#ffa15a', 'neutral':'#e377c2','support':'#19d3f3','unknown':'#000000'}
#stance_colors = {'oppose':'orange', 'neutral':'green','support':'blue','unknown':'gray','mixed':'violet'} stance_order = ["oppose", "mixed", "unknown", "neutral", "support"]
stance_order = ["oppose", "neutral", "unknown", "support"]
st.set_page_config(layout="wide") st.set_page_config(layout="wide")
st.title("Virginia Townhall Explorer") st.title("Virginia Townhall Explorer",anchor=None)
st.divider() st.caption("Explore data collected from Virginia's public comment system. Source code at https://github.com/eulaly/vath")
st.subheader(forum.get('reg_title'))
st.subheader("Proposal",anchor=None,divider="gray")
st.markdown(f"**{forum.get('reg_title')}**")
st.text(forum.get('reg_desc')) st.text(forum.get('reg_desc'))
st.caption(f"Link: https://www.townhall.virginia.gov/L/Comments.cfm?GDocForumID={forum.get('forum_id')}") st.caption(f'Comments posted from {dt.strftime(min(df.date_dt),"%D")}{dt.strftime(max(df.date_dt),"%D")} at https://www.townhall.virginia.gov/L/Comments.cfm?GDocForumID={forum.get("forum_id")}')
st.write(f'Comments posted from {dt.strftime(min(df.date_dt),"%D")}{dt.strftime(max(df.date_dt),"%D")}') st.subheader("Comment Summary",anchor=False,divider="gray")
st.write('Data collected on _')
st.subheader("Comment Summary")
# summary
summary_left, summary_right = st.columns([1,2]) summary_left, summary_right = st.columns([1,2])
with summary_left: with summary_left:
# summary table # Summary Table
#summary_stats = df.groupby("stance").size().reindex(stance_order,fill_value=0).reset_index(name="count")
summary_stats = ( summary_stats = (
df.groupby("stance").size() df.groupby("stance").size()
.reindex(stance_order, fill_value=0) .reindex(stance_order, fill_value=0)
@@ -43,7 +40,7 @@ with summary_left:
st.dataframe(summary_stats, hide_index=True, width="stretch") st.dataframe(summary_stats, hide_index=True, width="stretch")
with summary_right: with summary_right:
# stance div-h # Stance div-h
counts = df["stance"].value_counts() counts = df["stance"].value_counts()
stance_divh = go.Figure() stance_divh = go.Figure()
stance_divh.add_bar(y=["stance"], x=[-counts.get("oppose",0)], name="oppose", orientation="h", marker_color=stance_colors.get('oppose'), text=[counts.get("oppose",0)], textposition="inside") stance_divh.add_bar(y=["stance"], x=[-counts.get("oppose",0)], name="oppose", orientation="h", marker_color=stance_colors.get('oppose'), text=[counts.get("oppose",0)], textposition="inside")
@@ -52,14 +49,7 @@ with summary_right:
stance_divh.add_bar(y=["stance"], x=[counts.get("support",0)], name="support", orientation="h", marker_color=stance_colors.get('support'), text=[counts.get("support",0)], textposition="inside") stance_divh.add_bar(y=["stance"], x=[counts.get("support",0)], name="support", orientation="h", marker_color=stance_colors.get('support'), text=[counts.get("support",0)], textposition="inside")
stance_divh.update_yaxes(title_text="",showticklabels=False) stance_divh.update_yaxes(title_text="",showticklabels=False)
stance_divh.update_layout(barmode="relative", title="", height=180, margin=dict(l=0,r=0,t=0,b=0),xaxis_title="", yaxis_title="",legend=dict(orientation="v",y=0.12)) stance_divh.update_layout(barmode="relative", title="", height=180, margin=dict(l=0,r=0,t=0,b=0),xaxis_title="", yaxis_title="",legend=dict(orientation="v",y=0.12))
#legend_orientation="v")
st.plotly_chart(stance_divh,width='stretch') st.plotly_chart(stance_divh,width='stretch')
# stance_time
#stance_order = ["oppose", "neutral","unknown","support"]
#daily = df.groupby(["date_day", "stance"]).size().reset_index(name="count")
#stance_time = px.bar(daily, x="date_day", y="count", color="stance", category_orders={"stance": stance_order},color_discrete_map=stance_colors,title="")
#st.plotly_chart(stance_time, width='stretch')
# Daily Comments Breakdown, 3 Tabs # Daily Comments Breakdown, 3 Tabs
daily_wide = ( daily_wide = (
@@ -90,6 +80,7 @@ cum_share_long = (
.melt(id_vars="date_day", var_name="stance", value_name="cumulative_share") .melt(id_vars="date_day", var_name="stance", value_name="cumulative_share")
) )
tab_daily, tab_area, tab_share = st.tabs([ tab_daily, tab_area, tab_share = st.tabs([
"Daily", "Daily",
"Cumulative", "Cumulative",
@@ -132,46 +123,64 @@ with tab_share:
fig.update_yaxes(tickformat=".0%", range=[0, 1]) fig.update_yaxes(tickformat=".0%", range=[0, 1])
fig.update_layout(height=420, legend_orientation="v") fig.update_layout(height=420, legend_orientation="v")
st.plotly_chart(fig, width="stretch") st.plotly_chart(fig, width="stretch")
st.subheader("Comment Explorer") st.subheader("Comment Explorer",anchor=False,divider="gray")
# stance/tone heatmap
# TODO add raw values
# TODO OPT add button to swap between pct/tone <> pct/stance
x_order = ["unknown","oppose","mixed","neutral","support"] # includes mixed even if absent; harmless zero column
y_order = ["positive","neutral","mixed","negative","unclear"]
tab = pd.crosstab(df["tone"], df["stance"]).reindex(index=y_order, columns=x_order, fill_value=0)
pct = tab.div(tab.sum(axis=1).replace(0, pd.NA), axis=0).fillna(0)
fig = px.imshow(
pct,
x=x_order, y=y_order,
text_auto=".0%",
aspect="auto",
color_continuous_scale="Greens",
title="tone by stance, percent within tone",
)
fig.update_traces(text=tab.astype(str) + " / " + (pct*100).round(0).astype(int).astype(str) + "%")
fig.update_layout(height=420, xaxis_title="stance", yaxis_title="tone")
st.plotly_chart(fig, width='stretch')
# comment explorer # comment explorer
stance = st.multiselect("Filter stance", sorted(df["stance"].dropna().unique()), default=sorted(df["stance"].dropna().unique())) cex_left, cex_right = st.columns([1,1])
q = st.text_input("Search comment text") with cex_left:
dff = df[df["stance"].isin(stance)] stance = st.multiselect("Filter stance", sorted(df["stance"].dropna().unique()), default=sorted(df["stance"].dropna().unique()))
if q: q = st.text_input("Search comment title and text")
dff = dff[dff["text"].fillna("").str.contains(q, case=False, regex=False)] dff = df[df["stance"].isin(stance)]
if q:
dff = dff[dff["text"].fillna("").str.contains(q, case=False, regex=False)]
with cex_right:
filter_tone = st.multiselect("Filter tone", sorted(df["tone"].dropna().unique()), default=sorted(df["tone"].dropna().unique()))
st.text(""); st.text("")
st.text("Showing " + str(len(dff))+ " comments",text_alignment="right", width="stretch")
st.dataframe(dff[["comment_id", "title", "text", "stance", "stance_confidence", "tone"]], width="stretch") st.dataframe(dff[["comment_id", "title", "text", "stance", "stance_confidence", "tone"]], width="stretch")
st.write("Showing " + str(len(dff))+ " comments")
cid = st.selectbox("comment", dff["comment_id"].astype(str)) cid = st.selectbox("Select comment to view:", dff["comment_id"].astype(str))
row = dff[dff["comment_id"].astype(str) == cid].iloc[0] row = dff[dff["comment_id"].astype(str) == cid].iloc[0]
st.subheader(row["title"]) st.markdown(f'**{row["title"]}**')
st.write(row["text"]) st.text(row["text"])
st.write(row["author"] + ", " + row["date"][:10]) st.write(row["author"] + ", " + row["date_dt"].strftime("%D"))
st.markdown(f"**stance:** {row['stance']} \t|\t **confidence:** {row['stance_confidence']:.2f} \t|\t **tone:** {row['tone']}")
st.write("**analysis:** "+ row["stance_rationale"]) st.divider()
st.subheader('Analysis')
cexs_left, cexs_right = st.columns([1,1])
with cexs_left:
st.write(f"**stance:** {row['stance']}")
st.write(f"**stance_confidence:** {row['stance_confidence']:.2f}")
st.write(f"**tone:** {row['tone']}")
st.write("**analysis:** "+ row["stance_rationale"])
with cexs_right:
x_order = ["unknown","oppose","mixed","neutral","support"] # includes mixed even if absent; harmless zero column
y_order = ["positive","neutral","mixed","negative","unclear"]
tab = pd.crosstab(df["tone"], df["stance"]).reindex(index=y_order, columns=x_order, fill_value=0)
pct = tab.div(tab.sum(axis=1).replace(0, pd.NA), axis=0).fillna(0)
tone_stance = px.imshow(
pct,
x=x_order, y=y_order,
text_auto=".0%",
aspect="auto",
color_continuous_scale="Greens",
)
tone_stance.update_traces(text=tab.astype(str) + " / " + (pct*100).round(0).astype(int).astype(str) + "%")
tone_stance.add_scatter(x=[row["stance"]],y=[row["tone"]],mode="markers",marker=dict(size=15,color="yellow",symbol="cross",line=dict(width=1, color="red")),showlegend=False)
tone_stance.update_layout(height=420, xaxis_title="stance", yaxis_title="tone")
st.plotly_chart(tone_stance, width='stretch')
st.caption("Tone by stance, % within tone", text_alignment="right",width="stretch")
st.divider()
st.write("**model:** " + str(row["model"])) st.write("**model:** " + str(row["model"]))
with st.expander("Prompt", expanded=False): with st.expander("Prompt", expanded=False):
st.code(prompt, language="text") st.code(prompt, language="text")
tone_conf = px.box(df,x="stance",y="stance_confidence",color="stance",category_orders={"stance":stance_order},color_discrete_map=stance_colors,points="outliers",title="Comment Stance Classification Confidence")
tone_conf.update_yaxes(range=[0,1.02])
tone_conf.update_layout(height=430, legend_orientation="v")
st.plotly_chart(tone_conf,width="stretch")