Semantic Matching Engine

Full-Stack AI Engineer — Hybrid Search, Lead Intelligence, CRM Integration
Solo developer (Full-stack: backend, frontend, data pipeline)
From June 2025 to November 2025 (6 months)
Designing the intelligence layer that connects high-intent leads with optimal franchise opportunities — replacing gut-feeling brokerage with data-driven recommendations.
FastAPI, Next.js, Supabase (PostgreSQL + pgvector), Google Gemini, OpenAI Embeddings, Docker
OVERVIEW
Franchise brokerage runs on gut feeling and spreadsheets. A lead says "I want something in food, around Charlotte, under $200k" and a broker manually scrolls through hundreds of listings. I built the full intelligence layer, from automated lead profiling to hybrid semantic matching, that turns scattered CRM data into ranked franchise recommendations.
THE PROBLEM
Lead data in the franchise industry is messy. Candidate preferences arrive as scattered broker notes: "Interested in food service, has $150k liquid, wants the Charlotte area." Franchise territories are defined in natural language: "Taylor's SC 29687 and up to 30 miles." Manual matching is slow, subjective, and misses good fits because no human can hold 800 franchise profiles in their head simultaneously.
DATA & PIPELINE
Building the data foundation required taming unstructured information at every level:
- Web Scraping: Built an authenticated scraping system to collect ~800 franchise listings — financial data, territory definitions, candidate requirements — with pagination handling and hybrid parsing (rule-based + LLM).
- Structured Extraction: Raw HTML franchise pages are transformed into clean JSON via Google Gemini, with retry logic, schema validation, and per-token cost tracking.
- Territory Normalization: Natural language territory descriptions are parsed by an LLM acting as a specialized named-entity recognizer (NER), then automatically geocoded for spatial queries.
- CRM Sync: Bidirectional integration with GoHighLevel maps lead statuses to sales pipeline stages in real time.
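The structured extraction step above (LLM output, schema validation, retries) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `call_llm` stands in for whatever function wraps the Gemini request, and the field names in `REQUIRED_FIELDS` are hypothetical.

```python
import json
import time
from typing import Callable

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "name": str,
    "min_liquid_capital": (int, float),
    "territory_text": str,
}


def validate_listing(data: object) -> dict:
    """Raise ValueError unless data is a dict with every required field, correctly typed."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {field}")
    return data


def extract_listing(html: str, call_llm: Callable[[str], str],
                    max_attempts: int = 3, base_delay: float = 1.0) -> dict:
    """Ask the model for JSON, retrying with exponential backoff on invalid output."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            raw = call_llm(html)  # assumed to return the model's text response
            return validate_listing(json.loads(raw))
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"extraction failed after {max_attempts} attempts") from last_error
```

Passing the model call in as a plain callable keeps the retry and validation logic testable without network access. A real pipeline would likely also retry on transport errors and log token usage per call.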
METHODOLOGY
The matching engine combines semantic similarity with hard business constraints:
- Semantic Search: Franchise descriptions are vectorized using OpenAI Embeddings and stored in Supabase pgvector. A lead's profile is embedded and compared via cosine similarity — capturing nuanced intent beyond keyword matching.
- Constraint Filtering: Semantic results are intersected with strict SQL filters for "hard" requirements: liquid capital, total investment range, and territory availability at state/county/city level.
- Explainable Recommendations: Each match includes an AI-generated "Why it's a fit" narrative comparing four pillars: Money, Motives, Interest, and Territory alignment.
| Franchise | Similarity | Budget Match | Territory | Fit Summary |
|---|---|---|---|---|
| CleanEats Co. | 0.89 | $120k / $150k max | Charlotte, NC | Strong food-service alignment, territory open |
| FreshBowl | 0.84 | $95k / $150k max | Charlotte metro | Health-conscious brand matches stated motives |
| QuickBite | 0.78 | $140k / $150k max | SC border (30mi) | Fast-casual format, budget near ceiling |
Each recommendation combines semantic similarity, hard financial constraints, and territory availability into an explainable result.
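To make the hybrid idea concrete, here is a small in-memory sketch of the same logic: hard filters on liquid capital and territory radius, then ranking by cosine similarity. It is an illustration under stated assumptions, not the production path, which runs against pgvector and SQL filters in Supabase. The `Franchise` fields, the `match` function, and the radius-based territory check are all hypothetical simplifications.

```python
import math
from dataclasses import dataclass


@dataclass
class Franchise:
    name: str
    embedding: list[float]      # stand-in for a stored description embedding
    min_liquid_capital: float   # hard financial requirement
    lat: float                  # geocoded territory center
    lon: float
    radius_miles: float         # territory radius around the center


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    earth_radius_miles = 3958.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_miles * math.asin(min(1.0, math.sqrt(h)))


def match(lead_embedding, lead_capital, lead_lat, lead_lon, franchises, top_k=3):
    """Apply hard constraints, then rank the survivors by semantic similarity."""
    eligible = [
        f for f in franchises
        if f.min_liquid_capital <= lead_capital
        and haversine_miles(lead_lat, lead_lon, f.lat, f.lon) <= f.radius_miles
    ]
    scored = [(cosine_similarity(lead_embedding, f.embedding), f) for f in eligible]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(f.name, round(score, 2)) for score, f in scored[:top_k]]
```

Filtering before ranking gives the same result as intersecting a ranked list with the filters, and it guarantees that a high similarity score can never surface a franchise the lead cannot afford or whose territory does not cover them.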
RESULTS & IMPACT
Automated lead profiling and hybrid search replaced hours of manual spreadsheet work per lead, producing ranked, explainable recommendations that brokers can act on immediately.
The platform serves as an orchestration layer that transforms raw CRM data into actionable sales intelligence, giving brokers a consistent, data-driven process across 800+ listings.
REFERENCES
- OpenAI. Text Embeddings — text-embedding-3-small. platform.openai.com/docs/guides/embeddings
- Supabase. pgvector — Vector Similarity Search. supabase.com/docs/guides/ai
- GoHighLevel. API Documentation — Contacts & Opportunities. highlevel.stoplight.io
TECH STACK
AI & Backend: Python 3.12, FastAPI, Google Gemini, OpenAI Embeddings.
Frontend: Next.js, TypeScript.
Data Layer: Supabase (PostgreSQL + pgvector), BeautifulSoup, Pandas.
Infrastructure: Docker, GoHighLevel API, GitLab.
This is an archived project. Please reach out if you have any questions.