Florent Lin AI Engineer

Semantic Matching Engine

Global Franchises — 2025

My Role

Full-Stack AI Engineer — Hybrid Search, Lead Intelligence, CRM Integration

Team

Solo developer (Full-stack: backend, frontend, data pipeline)

Timeline

From June 2025 to November 2025 (6 months)

Context

Designing the intelligence layer that connects high-intent leads with optimal franchise opportunities — replacing gut-feeling brokerage with data-driven recommendations.

Technologies

FastAPI, Next.js, Supabase (PostgreSQL + pgvector), Google Gemini, OpenAI Embeddings, Docker

Overview

Franchise brokerage runs on gut feeling and spreadsheets. A lead says "I want something in food, around Charlotte, under $200k" and a broker manually scrolls through hundreds of listings. I built the full intelligence layer that turns scattered CRM data into data-driven franchise recommendations, from automated lead profiling to hybrid semantic matching.

THE PROBLEM

Lead data in the franchise industry is messy. Candidate preferences arrive as scattered broker notes: "Interested in food service, has $150k liquid, wants the Charlotte area." Franchise territories are defined in natural language: "Taylor's SC 29687 and up to 30 miles." Manual matching is slow, subjective, and misses good fits because no human can hold 800 franchise profiles in their head simultaneously.

DATA & PIPELINE

Building the data foundation required taming unstructured information at every level:

  • Web Scraping: Built an authenticated scraping system to collect ~800 franchise listings — financial data, territory definitions, candidate requirements — with pagination handling and hybrid parsing (rule-based + LLM).
  • Structured Extraction: Raw HTML franchise pages are transformed into clean JSON via Google Gemini, with retry logic, schema validation, and per-token cost tracking.
  • Territory Normalization: Natural-language territory descriptions are parsed by an LLM acting as a specialized named-entity recognizer (NER), then automatically geocoded for spatial queries.
  • CRM Sync: Bidirectional integration with GoHighLevel maps lead statuses to sales pipeline stages in real time.
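The validate-and-retry loop behind the structured extraction and territory normalization can be sketched roughly as follows. This is a minimal illustration, not the production code: `call_llm` is a hypothetical stand-in for the Gemini client (no real SDK call is shown), and the `Territory` fields (`state`, `zip_code`, `city`, `radius_miles`) are an assumed schema for the normalized output.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Territory:
    """Hypothetical normalized territory record; field names are illustrative."""
    state: str
    zip_code: Optional[str]
    city: Optional[str]
    radius_miles: float


def parse_territory(raw: str) -> Territory:
    """Validate the model's JSON reply against the schema; raise ValueError if it does not conform."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    state = data.get("state")
    if not (isinstance(state, str) and len(state) == 2 and state.isalpha()):
        raise ValueError(f"invalid state code: {state!r}")
    zip_code = data.get("zip_code")
    if zip_code is not None and not (isinstance(zip_code, str) and len(zip_code) == 5 and zip_code.isdigit()):
        raise ValueError(f"invalid ZIP code: {zip_code!r}")
    city = data.get("city")
    if city is not None and not isinstance(city, str):
        raise ValueError(f"invalid city: {city!r}")
    radius = data.get("radius_miles", 0)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(radius, bool) or not isinstance(radius, (int, float)) or radius < 0:
        raise ValueError(f"invalid radius: {radius!r}")
    return Territory(state.upper(), zip_code, city, float(radius))


def extract_territory(description: str, call_llm: Callable[[str], str], max_attempts: int = 3) -> Territory:
    """Ask the model for JSON, re-prompting with the validation error until a reply passes the schema."""
    prompt = (
        "Return only JSON with keys state, zip_code, city, radius_miles "
        f"for this franchise territory: {description}"
    )
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        reply = call_llm(prompt)
        try:
            return parse_territory(reply)
        except ValueError as err:
            last_error = err
            # Feed the concrete validation error back so the next attempt can correct it
            prompt += f"\nYour previous reply was invalid ({err}). Return only valid JSON."
    raise RuntimeError(f"territory extraction failed after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    # Canned replies stand in for the model: the first is malformed, the second valid.
    replies = iter([
        "Sure! Here is the territory...",
        '{"state": "SC", "zip_code": "29687", "city": "Taylors", "radius_miles": 30}',
    ])
    print(extract_territory("Taylor's SC 29687 and up to 30 miles", lambda _prompt: next(replies)))
    # Territory(state='SC', zip_code='29687', city='Taylors', radius_miles=30.0)
```

Keeping the model behind a plain callable makes the validation logic testable with canned replies; per-token cost tracking and geocoding would sit around this loop and are omitted here.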

METHODOLOGY

The matching engine combines the best of semantic understanding and hard business constraints:

  • Semantic Search: Franchise descriptions are vectorized using OpenAI Embeddings and stored in Supabase pgvector. A lead's profile is embedded and compared via cosine similarity — capturing nuanced intent beyond keyword matching.
  • Constraint Filtering: Semantic results are intersected with strict SQL filters for "hard" requirements: liquid capital, total investment range, and territory availability at state/county/city level.
  • Explainable Recommendations: Each match includes an AI-generated "Why it's a fit" narrative comparing four pillars: Money, Motives, Interest, and Territory alignment.
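The hybrid ranking can be sketched in memory as follows. This is an illustrative sketch, not the deployed query: in the real system the similarity runs inside Postgres via pgvector (whose `<=>` operator returns cosine distance, i.e. 1 minus cosine similarity) and the hard filters are SQL `WHERE` clauses. The `Lead` and `Franchise` fields, the rule that a franchise's minimum investment must fit within the lead's budget, and the state-level territory check are simplifying assumptions.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Lead:
    embedding: Sequence[float]
    liquid_capital: float
    max_investment: float
    state: str


@dataclass(frozen=True)
class Franchise:
    name: str
    embedding: Sequence[float]
    min_liquid_capital: float
    min_investment: float
    open_states: FrozenSet[str]  # states where territory is still available (assumed granularity)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def passes_hard_constraints(lead: Lead, franchise: Franchise) -> bool:
    """Hard requirements are pass/fail: no similarity score can compensate for failing one."""
    return (
        lead.liquid_capital >= franchise.min_liquid_capital
        and franchise.min_investment <= lead.max_investment
        and lead.state in franchise.open_states
    )


def hybrid_rank(lead: Lead, franchises: Sequence[Franchise], top_k: int = 3,
                min_similarity: float = 0.0) -> List[Tuple[str, float]]:
    # Filter before truncating to top_k so ineligible franchises cannot occupy top-k slots.
    candidates = [f for f in franchises if passes_hard_constraints(lead, f)]
    scored = [(f.name, cosine_similarity(lead.embedding, f.embedding)) for f in candidates]
    scored = [pair for pair in scored if pair[1] >= min_similarity]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The design choice worth noting is the order of operations: if the top-k semantic hits were taken first and filtered afterward, a highly similar but unaffordable franchise could push an eligible one out of the shortlist.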

Sample Matching Output: Lead #247

Franchise     | Similarity | Budget Match      | Territory         | Fit Summary
--------------|------------|-------------------|-------------------|----------------------------------------------
CleanEats Co. | 0.89       | $120k / $150k max | Charlotte, NC     | Strong food-service alignment, territory open
FreshBowl     | 0.84       | $95k / $150k max  | Charlotte metro   | Health-conscious brand matches stated motives
QuickBite     | 0.78       | $140k / $150k max | SC border (30 mi) | Fast-casual format, budget near ceiling

Each recommendation combines semantic similarity, hard financial constraints, and territory availability into an explainable result.
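Once a territory is geocoded, the budget and territory parts of such a result reduce to simple checks. A minimal sketch, assuming "up to N miles" territories are stored as a geocoded center plus a radius and measured with the haversine great-circle distance (a swapped-in choice; the project's actual spatial query is not described). `MatchEvidence` is a hypothetical container for the facts a "Why it's a fit" narrative would be grounded on; state/county/city territories would use containment checks instead.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against floating-point drift pushing a just above 1.0
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


@dataclass(frozen=True)
class MatchEvidence:
    """Hypothetical per-match facts handed to the narrative step."""
    similarity: float
    investment: float
    budget_max: float
    distance_miles: float
    radius_miles: float

    @property
    def within_budget(self) -> bool:
        return self.investment <= self.budget_max

    @property
    def within_territory(self) -> bool:
        return self.distance_miles <= self.radius_miles

    @property
    def eligible(self) -> bool:
        return self.within_budget and self.within_territory

    def summary(self) -> str:
        """One-line evidence string in the style of the sample matching table."""
        return (
            f"similarity {self.similarity:.2f}; "
            f"${self.investment / 1000:.0f}k / ${self.budget_max / 1000:.0f}k max; "
            f"{self.distance_miles:.0f} mi of {self.radius_miles:.0f} mi radius"
        )
```

Passing structured evidence like this to the LLM, rather than raw listings, keeps the generated narrative tied to numbers that were actually checked.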

RESULTS & IMPACT

From Manual Vetting to Intelligent Matching

Automated lead profiling and hybrid search replaced hours of manual spreadsheet work per lead, producing ranked, explainable recommendations that brokers can act on immediately.

The platform acts as an orchestration layer between the CRM, the scraped franchise catalog, and the LLM services, turning raw CRM data into actionable sales intelligence and enabling a more data-driven approach to franchise brokerage across 800+ listings.

REFERENCES

  • OpenAI. Text Embeddings — text-embedding-3-small. platform.openai.com/docs/guides/embeddings
  • Supabase. pgvector — Vector Similarity Search. supabase.com/docs/guides/ai
  • GoHighLevel. API Documentation — Contacts & Opportunities. highlevel.stoplight.io

TECH STACK

AI & Backend: Python 3.12, FastAPI, Google Gemini, OpenAI Embeddings.
Frontend: Next.js, TypeScript.
Data Layer: Supabase (PostgreSQL + pgvector), BeautifulSoup, Pandas.
Infrastructure: Docker, GoHighLevel API, GitLab.

This is an archived project. Please reach out if you have any questions.