meitzcjakubzuqy/linkedin-business-niches
LinkedIn niche business discovery
LinkedIn Business Niches Scraper
LinkedIn Business Niches Scraper helps you discover businesses in specific cities and industries by combining targeted Google queries with LinkedIn result parsing. It turns messy search results into clean, structured business lead data you can filter, enrich, and use for outreach or market research. Built for teams that need fast niche discovery without manual searching.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
Introduction
This project finds and structures business leads by location + category using Google-to-LinkedIn discovery flows.
It removes the time wasted on manually searching, opening results, and copying details into spreadsheets.
It’s designed for growth teams, agencies, recruiters, and analysts who need reliable niche lead discovery at scale.
City + Industry Business Discovery
- Generates search queries that combine city + niche keywords to surface relevant business profiles.
- Visits and evaluates results with browser automation to handle dynamic pages and redirects.
- Extracts consistent business identifiers (name, profile URL, website, category hints) for clean exports.
- Supports pagination and throttling controls to keep runs stable and repeatable.
- Produces structured datasets ready for enrichment, outreach, or analytics workflows.
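The query step above can be sketched as follows. This is a minimal illustration, not the project's actual `queryBuilder.js`: the function names `buildQuery` and `buildSearchUrl` are hypothetical, the query shape is copied from the Example Output record, and the 10-results-per-page offset via Google's `start` parameter is an assumption.

```javascript
// Hypothetical helpers for illustration; not taken from the project source.

// Builds "<category> in <city> site:linkedin.com/company",
// the query shape shown in the Example Output record.
function buildQuery(city, category) {
  return `${category.trim()} in ${city.trim()} site:linkedin.com/company`;
}

// Builds a Google search URL for a query and a zero-based results page.
// Assumption: 10 results per page, addressed through the `start` parameter.
function buildSearchUrl(query, page = 0) {
  const params = new URLSearchParams({ q: query });
  if (page > 0) params.set("start", String(page * 10));
  return `https://www.google.com/search?${params.toString()}`;
}

const query = buildQuery("Austin", "marketing agency");
console.log(query);
console.log(buildSearchUrl(query));
console.log(buildSearchUrl(query, 2));
```

`URLSearchParams` handles the encoding, so the first URL matches the `sourceUrl` format in the Example Output.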
Features
| Feature | Description |
|---|---|
| City + niche query builder | Automatically composes Google queries for specific industries/categories within a target city. |
| Headless browser crawling | Uses browser automation to navigate JavaScript-heavy pages and capture resolved URLs. |
| Result deduplication | Prevents repeated businesses across pages and keywords with normalized URL + name matching. |
| Configurable proxy support | Supports rotating proxies and session settings to reduce blocks and improve consistency. |
| Pagination controls | Limits pages per query and total results to keep runs within predictable bounds. |
| Structured exports | Outputs consistent JSON objects suitable for CSV/Sheets/CRM imports. |
| Error handling + retries | Retries transient failures and tracks per-request outcomes for reliability. |
| Performance tuning knobs | Concurrency, delays, and per-domain rate limits for balanced throughput. |
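As a rough sketch of the "Error handling + retries" row, the wrapper below retries a failing task with exponential backoff and returns an outcome object that can be logged per request. The names `withRetries` and `backoffDelay`, the default of three attempts, and the 500 ms base delay are all assumptions made for illustration.

```javascript
// Hypothetical helper names; defaults are illustrative, not the project's settings.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retrying after failed attempt number `attempt` (1-based):
// base, 2x base, 4x base, ...
function backoffDelay(attempt, baseDelayMs = 500) {
  return baseDelayMs * 2 ** (attempt - 1);
}

// Runs `task` up to `maxAttempts` times and reports the outcome instead of
// throwing, so the caller can record a per-request result.
async function withRetries(task, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      const value = await task(attempt);
      return { ok: true, attempts: attempt, value };
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) await sleep(backoffDelay(attempt, baseDelayMs));
    }
  }
  return { ok: false, attempts: maxAttempts, error: lastError };
}
```

A real run would likely retry only errors known to be transient (timeouts, rate-limit responses); this sketch retries every error for brevity.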
What Data This Scraper Extracts
| Field Name | Field Description |
|---|---|
| query | The generated search query (city + niche) used for discovery. |
| city | Target city used in the query. |
| category | Target industry/category keyword used in the query. |
| businessName | Business or organization name detected from result/title. |
| linkedinUrl | Resolved LinkedIn business/profile URL. |
| website | Business website if found on the profile or in result snippets. |
| industry | Industry label when available (from page text or metadata hints). |
| locationText | Location string found on the page (city/region when available). |
| followerCount | Follower count when available on the profile. |
| employeeCountRange | Employee size range when available. |
| description | Short description/about snippet if available. |
| sourceUrl | The page URL where the record was discovered (search result or profile). |
| resultRank | Rank/position in the search results page. |
| collectedAt | ISO timestamp when the record was collected. |
Example Output
[
  {
    "query": "marketing agency in Austin site:linkedin.com/company",
    "city": "Austin",
    "category": "marketing agency",
    "businessName": "BrightPath Marketing",
    "linkedinUrl": "https://www.linkedin.com/company/brightpath-marketing/",
    "website": "https://brightpathmarketing.com",
    "industry": "Marketing Services",
    "locationText": "Austin, Texas, United States",
    "followerCount": 8421,
    "employeeCountRange": "11-50",
    "description": "Performance-focused growth partner for B2B and local brands.",
    "sourceUrl": "https://www.google.com/search?q=marketing+agency+in+Austin+site%3Alinkedin.com%2Fcompany",
    "resultRank": 3,
    "collectedAt": "2025-12-12T12:10:31.492Z"
  }
]
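Because the output is a flat array of objects, converting it for CSV or Sheets imports is straightforward. The sketch below is one possible approach, not part of the project: `toCsv` is a hypothetical name, the column order is taken from the first record, and `null` fields become empty cells.

```javascript
// Hypothetical CSV export; column order comes from the first record's keys.
function toCsv(records) {
  if (records.length === 0) return "";
  const columns = Object.keys(records[0]);
  const escapeCell = (value) => {
    if (value === null || value === undefined) return "";
    const text = String(value);
    // Quote cells containing commas, quotes, or line breaks,
    // and double any embedded quotes (RFC 4180 style).
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const rows = records.map((record) =>
    columns.map((column) => escapeCell(record[column])).join(",")
  );
  return [columns.join(","), ...rows].join("\n");
}

const csv = toCsv([
  {
    businessName: "BrightPath Marketing",
    locationText: "Austin, Texas, United States",
    followerCount: 8421,
    website: null,
  },
]);
console.log(csv);
```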
Directory Structure Tree
LinkedIn Business Niches/
├── src/
│   ├── main.js
│   ├── config/
│   │   ├── defaults.js
│   │   └── schema.json
│   ├── crawlers/
│   │   ├── googleCrawler.js
│   │   └── linkedinCrawler.js
│   ├── extractors/
│   │   ├── parseGoogleResults.js
│   │   ├── parseLinkedinProfile.js
│   │   └── normalize.js
│   ├── services/
│   │   ├── queryBuilder.js
│   │   ├── dedupeStore.js
│   │   ├── rateLimiter.js
│   │   └── logger.js
│   ├── storage/
│   │   ├── datasetWriter.js
│   │   └── stateStore.js
│   └── utils/
│       ├── validators.js
│       ├── urls.js
│       └── time.js
├── input/
│   ├── INPUT.schema.json
│   └── INPUT.example.json
├── data/
│   ├── sample.output.json
│   └── sample.keywords.txt
├── tests/
│   ├── normalize.test.js
│   ├── queryBuilder.test.js
│   └── parsers.test.js
├── .gitignore
├── package.json
├── package-lock.json
└── README.md
Use Cases
- Growth agencies use it to build niche lead lists by city, so they can launch faster outbound campaigns with cleaner targeting.
- B2B sales teams use it to discover local businesses in specific categories, so they can fill pipelines without manual prospecting.
- Market researchers use it to map competitors in a region, so they can analyze category density and positioning.
- Recruiters use it to find companies in certain niches, so they can identify targets for candidate outreach.
- Local SEO consultants use it to collect business profiles at scale, so they can prioritize outreach and partnership opportunities.
FAQs
How do I control which cities and niches are searched?
Provide a list of cities and category keywords in the input. The tool generates combinations (or uses your explicit pairs) to create consistent query coverage. For tighter targeting, limit the number of categories per city and cap pages per query.
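A minimal sketch of how such input could be expanded into city and category targets. The input field names (`cities`, `categories`, `pairs`, `maxCategoriesPerCity`) are assumptions for illustration; check `input/INPUT.schema.json` for the real ones.

```javascript
// Hypothetical input shape; the field names are not taken from the project schema.
function expandTargets({
  cities = [],
  categories = [],
  pairs = [],
  maxCategoriesPerCity = Infinity,
} = {}) {
  // Explicit pairs take precedence over the generated combinations.
  if (pairs.length > 0) {
    return pairs.map(({ city, category }) => ({ city, category }));
  }
  // Otherwise build every city x category combination, capped per city.
  const limited = categories.slice(0, maxCategoriesPerCity);
  return cities.flatMap((city) => limited.map((category) => ({ city, category })));
}

const targets = expandTargets({
  cities: ["Austin", "Denver"],
  categories: ["marketing agency", "law firm", "dentist"],
  maxCategoriesPerCity: 2,
});
console.log(targets.length);
```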
What if LinkedIn pages don’t show all fields (followers, employee range, website)?
Not every profile exposes the same fields. The scraper extracts what’s available and leaves missing fields as null. For enrichment, you can later merge outputs with your CRM or a separate website/contacts enrichment workflow.
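One way to keep records uniform is to fill every expected field and set the missing ones to `null`. The field list below comes from the extraction table; the helper name `fillMissing` and the choice to treat empty strings as missing are assumptions.

```javascript
// Field names from the "What Data This Scraper Extracts" table.
const FIELDS = [
  "query", "city", "category", "businessName", "linkedinUrl", "website",
  "industry", "locationText", "followerCount", "employeeCountRange",
  "description", "sourceUrl", "resultRank", "collectedAt",
];

// Hypothetical helper: returns a record with every field present,
// using null for anything the page did not expose.
function fillMissing(partial) {
  const record = {};
  for (const field of FIELDS) {
    const value = partial[field];
    record[field] = value === undefined || value === "" ? null : value;
  }
  return record;
}

const record = fillMissing({ businessName: "BrightPath Marketing", website: "" });
console.log(record.website, record.followerCount);
```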
How do I reduce duplicates across keywords and pages?
Deduplication uses normalized LinkedIn URLs and cleaned business names. If you’re using very similar keywords, keep a shared dedupe store enabled so that a business found under multiple queries still appears only once.
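The idea can be sketched like this. It is an illustrative stand-in for `dedupeStore.js`, not its actual contents: the normalization rules are assumptions, and this version keys on the LinkedIn URL and falls back to the cleaned name only when no URL is present.

```javascript
// Hypothetical normalizers; the project's real rules may differ.
function normalizeLinkedinUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Drop "www.", the query string, the fragment, and trailing slashes.
  const host = url.hostname.replace(/^www\./, "");
  const path = url.pathname.toLowerCase().replace(/\/+$/, "");
  return host + path;
}

function normalizeName(name) {
  // Lowercase, then collapse punctuation and whitespace into single spaces.
  return name.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, " ").trim();
}

class DedupeStore {
  constructor() {
    this.seen = new Set();
  }

  // Returns true the first time a business is seen, false for repeats.
  add(record) {
    const key = record.linkedinUrl
      ? `url:${normalizeLinkedinUrl(record.linkedinUrl)}`
      : `name:${normalizeName(record.businessName || "")}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

const store = new DedupeStore();
console.log(store.add({ linkedinUrl: "https://www.linkedin.com/company/brightpath-marketing/" }));
console.log(store.add({ linkedinUrl: "https://linkedin.com/company/BrightPath-Marketing?trk=x" }));
```

Sharing one `DedupeStore` instance across all queries in a run is what keeps a business from reappearing under a second keyword. Matching on name alone can merge distinct businesses that share a name, which is why the URL is preferred here.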
How do I prevent blocks and improve reliability?
Use rotating proxies, set realistic delays, and keep concurrency moderate. If you see partial runs, reduce parallelism and increase wait times between pages to stabilize navigation and extraction.
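To make the concurrency and delay advice concrete, here is a small sketch of a throttled runner: a fixed number of lanes pull items from a shared queue, and each lane pauses between items. `runThrottled` and its option names are hypothetical, and the defaults are placeholders rather than recommended values.

```javascript
// Hypothetical throttled runner; option names and defaults are illustrative.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runThrottled(items, worker, { concurrency = 2, delayMs = 1000 } = {}) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane takes the next unclaimed item, processes it, then waits.
  async function lane() {
    while (next < items.length) {
      const index = next;
      next += 1;
      results[index] = await worker(items[index], index);
      if (next < items.length) await sleep(delayMs);
    }
  }

  // At most `concurrency` workers are active at any time.
  const laneCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Lowering `concurrency` and raising `delayMs` corresponds to the "reduce parallelism and increase wait times" adjustment described above.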
Performance Benchmarks and Results
Primary Metric: ~45–90 discovered business records per minute on typical city+niche queries with moderate concurrency and stable proxy routing.
Reliability Metric: 93–97% successful page processing rate when using proxy rotation and conservative throttling; the rate drops significantly under aggressive concurrency without proxies.
Efficiency Metric: ~250–450 MB peak memory usage during multi-page runs (browser sessions + routing), with throughput scaling linearly up to a practical concurrency ceiling.
Quality Metric: 85–95% of records include a valid LinkedIn URL and business name; 40–70% include secondary enrichment fields (website/industry/location) depending on profile visibility.
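To check field coverage on your own runs, something like the following could be applied to the output dataset. `fieldCoverage` is a hypothetical helper, and it only tests whether a field is filled, not whether a URL is actually valid.

```javascript
// Hypothetical helper: percentage of records with a non-empty value per field.
function fieldCoverage(records, fields) {
  const coverage = {};
  for (const field of fields) {
    const filled = records.filter((record) => {
      const value = record[field];
      return value !== null && value !== undefined && value !== "";
    }).length;
    coverage[field] = records.length === 0
      ? 0
      : Math.round((filled / records.length) * 100);
  }
  return coverage;
}

const sample = [
  { linkedinUrl: "https://www.linkedin.com/company/a/", website: null },
  { linkedinUrl: "https://www.linkedin.com/company/b/", website: "https://b.example" },
  { linkedinUrl: "", website: null },
  { linkedinUrl: "https://www.linkedin.com/company/c/" },
];
console.log(fieldCoverage(sample, ["linkedinUrl", "website"]));
```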
