Interactive Text Analysis Platform

A self-service NLP utility dashboard I built using Streamlit to ingest unstructured textual data and systematically extract linguistic intelligence. This tool handles everything from automated regex preprocessing and tokenization up to heavy deep learning inference layers using local Hugging Face pipelines.

🛠️ System Architecture & Workflow

Instead of a monolithic script, I broke the project down into three distinct, modular components to preserve clean code boundaries:

text_cleaner.py (The Preprocessing Layer): Uses Python's native re library alongside spacy (en_core_web_sm) to handle sentence tokenization, custom stop-word removal, and full lemmatization.
nlp_functions.py (The Inference Engine): Houses the pipeline setups for our 4 transformer models. I integrated memory caching here via @st.cache_resource to keep page re-renders fast.
app.py (The Interactive Layout): Renders our Streamlit dashboard workspace, controls data uploading logic, parses CSV data frames, and handles our charting layouts.

🖥️ Feature Walkthrough & Interface Previews

1. Data Ingestion & Preprocessing

The platform accepts direct copy-paste strings or bulk .csv uploads. Once loaded, the raw text passes directly through text_cleaner.py to strip out noise before compiling statistics.

2. Frequency Visualizations & N-Gram Extraction

To move past simple word clouds, I added a localized N-gram analytics layer. The app parses the clean text array into sequential tokens to map out frequent Unigrams, Bigrams, and Trigrams.

# A quick look at how the N-grams are compiled behind the scenes:
def extract_ngrams(text_tokens, n=2):
    return list(zip(*[text_tokens[i:] for i in range(n)]))

3. Deep Learning Insights (Hugging Face Pipelines)

This is where the heavy text classification takes place. The dashboard pipes your clean data into four specialized, pre-trained Hugging Face models simultaneously:

Sentiment Analysis (cardiffnlp/twitter-roberta-base-sentiment): Tracks positive/negative shifts.
Emotion Tracking (j-hartmann/emotion-english-distilroberta-base): Monitors fine-grained emotional responses.
Zero-Shot Tone Analysis (facebook/bart-large-mnli): Detects formal vs. casual structures.
Abstractive Summarization (google-t5/t5-small): Synthesizes long blocks into crisp summaries.

3. Deep Learning Insights (Hugging Face Pipelines)

This is where the heavy text classification takes place. The dashboard pipes your clean data into four specialized, pre-trained Hugging Face models simultaneously.

4. Custom Bulk Data Filtration

When a user uploads a CSV file, the app activates structural layout filters, letting you isolate specific rows or columns of text before pushing the clean batch into the analytics pipelines.

⚡ Setup & Local Execution

To replicate this environment locally, follow these configuration steps:

1. Standard Installation

git clone https://github.com
cd interactive-text-analysis
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Language Model Provisioning

Since spaCy requires an external pipeline download, run this step to pull down the English dictionary model:

python -m spacy download en_core_web_sm

3. Launch App

streamlit run app.py

The app will compile and automatically launch in your default web browser at http://localhost:8501.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
assets		assets
README.md		README.md
app.py		app.py
nlp_functions.py		nlp_functions.py
text_cleaner.py		text_cleaner.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Interactive Text Analysis Platform

🛠️ System Architecture & Workflow

🖥️ Feature Walkthrough & Interface Previews

1. Data Ingestion & Preprocessing

2. Frequency Visualizations & N-Gram Extraction

3. Deep Learning Insights (Hugging Face Pipelines)

3. Deep Learning Insights (Hugging Face Pipelines)

4. Custom Bulk Data Filtration

⚡ Setup & Local Execution

1. Standard Installation

2. Language Model Provisioning

3. Launch App

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Interactive Text Analysis Platform

🛠️ System Architecture & Workflow

🖥️ Feature Walkthrough & Interface Previews

1. Data Ingestion & Preprocessing

2. Frequency Visualizations & N-Gram Extraction

3. Deep Learning Insights (Hugging Face Pipelines)

3. Deep Learning Insights (Hugging Face Pipelines)

4. Custom Bulk Data Filtration

⚡ Setup & Local Execution

1. Standard Installation

2. Language Model Provisioning

3. Launch App

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages