Executive Job Scraping & CEO Enrichment Pipeline

Table of Content

Technical Architecture & Solution Document

1. Executive Summary

This Project is an automated intelligence pipeline that scrapes executive level job postings from multiple job boards, enriches each record with CEO and leadership data sourced from official company websites, and publishes a clean, deduplicated dataset to Google Sheets for recruiting and market monitoring purposes.

The system serves recruiting firms, executive search researchers, and business-intelligence teams who need a reliable, continuously updated watchlist of senior vacancies without manually browsing dozens of job platforms. By combining Apify actors, Make (Integromat) automation scenarios, and optional LLM based extraction, the pipeline delivers structured recruiting intelligence with minimal human intervention.

An AI powered Copilot Agent is layered on top of the enriched dataset, allowing end users to query job listings, filter by sector or seniority, and get instant answers about active executive openings all through a conversational interface.

Key Outcome	Description
Automated Executive Monitoring	Scrapes senior job postings 24/7 across multiple boards with no manual effort
CEO & Leadership Enrichment	Automatically resolves CEO names from official company pages per listing
Deduplicated Clean Output	Filtering in MAKE + dedup logic ensures Google Sheet stays clean and actionable
Conversational Copilot Agent	End users query job data in natural language via an AI chat interface
Scalable & Repeatable	Apify + Make architecture supports adding new boards or enrichment sources easily

2. Problem Statement

2.1 Background

Senior executive job postings are scattered across LinkedIn, Indeed, specialist board sites, and company career pages. Recruiting researchers and intelligence teams currently resort to daily manual browsing to track new C-suite and VP-level vacancies a time-intensive process prone to missed postings and inconsistent data quality.

Prior approaches involved ad-hoc spreadsheet updates maintained by junior analysts, with no systematic enrichment of company leadership data. This created gaps in intelligence and limited the team’s ability to act quickly on new opportunities.

2.2 Core Problem

There is no automated, repeatable system that (a) aggregates executive job postings across multiple platforms, (b) enriches each posting with verified CEO/leadership information, and (c) delivers a clean, queryable dataset without manual data-entry overhead.

2.3 Affected Stakeholders

Stakeholder	Role	Impact	Key Concerns
Recruiting Researchers	Primary pipeline consumers	High	Data freshness, deduplication accuracy
Executive Search Clients	End beneficiaries of intelligence	High	Coverage breadth, CEO name accuracy
Operations / Automation Team	Pipeline maintenance	High	Reliability, error handling, cost
Leadership / Management	Strategic oversight	Medium	ROI, platform cost, compliance

2.4 Pain Points

Manual browsing across 10+ job boards consumes 3–5 hours per researcher per day

No consistent enrichment of CEO / leadership data looked up ad hoc per listing

Duplicate listings across platforms inflate apparent vacancy counts

No queryable interface researchers must scroll spreadsheets to find relevant postings

No alerting when new senior roles matching criteria are posted

3. Solution Overview

3.1 Proposed Solution

The solution is a multi-stage automated pipeline built on Apify (web scraping actors), Make (orchestration & scenario automation), and Google Sheets (output layer), with an optional LLM extraction step for CEO name resolution and a conversational Copilot Agent for end-user querying.

The pipeline ingests job postings by board type, applies filtering to isolate executive roles, deduplicates across sources, enriches each record with CEO/leadership information scraped from official company websites, and publishes a structured sheet. On top of this, an AI Copilot Agent built that allows end users to query the enriched dataset conversationally.

3.2 Goals & Success Metrics

Goal	Success Metric	How Measured	Timeline
Board Coverage	5+ job boards scraped per cycle	Apify run logs	Month 1
CEO Enrichment Rate	>85% of listings enriched with CEO name	Sheet completion %	Month 2
Deduplication Accuracy	<2% duplicate records in output sheet	Manual spot-check + script	Ongoing
Pipeline Reliability	99% successful Make scenario runs	Make execution history	Month 2
Copilot Query Accuracy	Relevant results in >90% of queries	User feedback / QA review	Month 3
Time Savings	Reduce manual research from 4 hrs to <30 min/day	Researcher time log	Month 2

4. Screenshots & Workflow

The section below traces the end-to-end pipeline flow from job-board sourcing through to the end-user Copilot Agent interaction. Each step corresponds to a Make scenario stage, Apify actor execution, or user-facing UI touchpoint.

4.1 Workflow Overview

Step	Stage / Action	Description	Tool / Location
1	Source Bucketing	Classify target job boards by type (general, exec-specialist, company career pages)	Make: Board Config Module
2	Apify Actor Execution	Run scraping actors per board; collect raw job listing JSON	Apify Platform
3	Make Scenario Orchestration	Receive Apify webhook, parse payload, route to enrichment steps	Make Scenario
4	Filtering	Apply title/seniority filters to isolate C-suite, VP, Director-level roles	Make :Filter Module
5	Deduplication Logic	Hash-based dedup across board sources to remove cross-platform duplicates	Make : Dedup Module
6	Google Sheet Output	Write enriched, deduplicated records to master sheet with timestamps	Google Sheets
7	Copilot Agent Query	End user queries enriched job data via conversational AI interface	Claude Copilot Agent

4.2 Screen-by-Screen Walkthrough

Step 1–2 Source Bucketing & Apify Actor Execution

The pipeline begins in Make with a board configuration module that defines which job boards to scrape per run cycle. Each board type maps to a dedicated Apify actor. Actors are triggered via Make HTTP modules or Apify’s scheduler, and raw JSON payloads are returned via webhook to the Make scenario.

Screenshot: Apify Actor Dashboard

Figure : Apify actor run list showing board-specific scrapers and execution status

Step 3–5 : Make Scenario: Filter, Dedup & Route

The core Make scenario receives the Apify webhook, parses the job listing array, and routes each item through a filter module. Matched executive listings pass to the deduplication module, which checks a running hash store. Only net-new listings proceed to enrichment.

Screenshot: Make Scenario Canvas

Figure : Make scenario showing filter, dedup, and routing modules

visual representation of Make scenario showing filter, dedup, and routing modules

Step 6 : Google Sheet Output

Enriched records are appended to the master Google Sheet via Make’s Google Sheets module. Each row includes: job title, company, board source, posting date, CEO name, enrichment confidence, and a dedup hash.

Screenshot: Google Sheet Output

Step 7 : Copilot Agent End User Query Interface

The AI Copilot Agent serves as the primary interface through which end users interact with the enriched job dataset. A centralized Google Sheet containing cleaned and enriched job records is maintained in a shared location and is automatically updated on an hourly basis using Apify and Make.

Built on Claude, the agent uses this continuously refreshed sheet (or a downstream database view) as its knowledge source, enabling real-time natural-language querying. Users can ask questions such as:

• “Show me all CFO openings posted in the last 7 days in the US”
• “Which companies have both a CEO vacancy and a COO vacancy?”
• “What is the current CEO of Acme Corp according to the enrichment data?”
• “List all executive roles at Series B fintech companies this month”

The Copilot Agent directly reads from the latest version of the shared dataset and responds in real time with structured job summaries, filtered results, or precise record lookups. It is designed to be the primary day-to-day interface for recruiting researchers, effectively eliminating the need to manually browse spreadsheets.

Screenshot: Copilot Agent Chat Interface

visual representation of Copilot Agent Chat Interface

Figure : AI Copilot Agent answering a researcher’s natural-language query about executive vacancies

5. High-Level Architecture

The architecture is organized into five layers: Data Acquisition, Orchestration, Enrichment, Output, and User Interaction. The system is stateless at the scraping layer, with state maintained in Google Sheets and the deduplication hash store.

5.1 Architectural Layers

Layer	Technology	Responsibility
Data Acquisition	Apify Actors	Scrape job boards and company leadership pages on schedule
Orchestration	Make (Integromat)	Webhook receipt, module routing, scenario execution, error handling
Filtering & Dedup	Make + Regex Logic	Title-based executive filtering; hash-based cross-source deduplication
Enrichment	Apify + Claude API (optional)	CEO/leadership name extraction from company websites; LLM fallback
Output / Storage	Google Sheets	Master enriched dataset; QA dashboard view; downstream exports
User Interaction	Claude Copilot Agent	Natural-language querying of enriched job data by end users

5.2 Data Flow

#	From	→	To	Description
1	Make Scheduler / Webhook	→	Apify Actor	Trigger scrape run per board; pass board config parameters
2	Apify Actor	→	Make Webhook Module	Return raw job listing JSON array via HTTP callback
3	Make Filter Module	→	Dedup Store	Check listing hash; discard duplicates; pass net-new listings
4	Make Enrichment Module	→	Apify CEO Scraper	Request CEO lookup for company domain extracted from listing
5	Apify CEO Scraper	→	Claude API (optional)	Pass raw HTML of leadership page for LLM name extraction fallback
6	Make Output Module	→	Google Sheets	Append enriched row: title, company, board, date, CEO, dedup hash
7	Google Sheets	→	Copilot Agent Context	Enriched dataset loaded as agent knowledge source
8	End User	→	Copilot Agent	Natural-language query; agent returns filtered job summaries

Authentication & Authorization

Make credential store used for all API keys keys are not exposed in scenario JSON exports

Google Sheets OAuth2 service account scoped to specific spreadsheet IDs only

Anthropic API key rotated on a regular basis; usage monitored via Anthropic console

Data Protection

Job listing data is public information no PII beyond publicly posted contact details

CEO names extracted from public company websites only no private data sources

Google Sheet access controlled by sharing settings restricted to named team members

6. Deployment & Infrastructure

6.1 Environments

Environment	Platform	Purpose	Access
Development	Local Make + Apify Dev	Scenario building, actor testing, prompt iteration	Developer only
Staging	Make (test scenarios) + Apify staging	End-to-end pipeline QA before production promotion	Team + reviewers
Production	Make (live) + Apify (live) + Google Sheets	Live scraping, enrichment, and Copilot Agent serving	Ops + end users

6.2 Pipeline Schedule & CI/CD

Apify actors scheduled via cron in Apify console default: every 6 hours per board

Make scenarios triggered by Apify webhook on actor completion near real-time processing

Scenario version history maintained in Make rollback to prior version on failure

Google Sheet protected range prevents accidental manual edits to pipeline-managed columns

Copilot Agent prompt versioned in a dedicated config sheet tab updated without code changes

Monthly review of dedup hash store to purge stale entries and avoid false-positive matches

7. Copilot Agent Detailed Specification

The AI Copilot Agent is a Claude-powered conversational interface built on top of the enriched job dataset. It is the primary end-user touchpoint and is designed to replace manual spreadsheet browsing entirely for day-to-day research tasks.

7.1 Agent Purpose & Scope

Capability	Description
Job Search & Filtering	Query listings by title, seniority level, company, sector, location, or posting date
CEO Lookup	Retrieve the enriched CEO name for any company in the dataset
Market Snapshot	Summarise active executive vacancies by sector or region on demand
Trend Queries	Identify companies with multiple senior vacancies or recurring hiring patterns
Record Lookup	Fetch a specific listing’s full details including board source and enrichment metadata
Out of Scope	The agent does not browse live job boards; it only queries the enriched dataset

7.2 Technical Implementation

Model: claude-sonnet-4-20250514 via Anthropic /v1/messages endpoint

Context: Enriched Google Sheet exported as CSV or JSON injected into system prompt context window

System Prompt: Defines agent persona (‘Executive Jobs Research Copilot’), data schema, and response format guidelines

Max Tokens: 1,024 per response (sufficient for summarised job lists and record details)

Temperature: 0.2 (low temperature for factual, consistent data retrieval responses)

Multi-turn: Conversation history maintained in session state for follow-up query refinement

Fallback: If the query cannot be answered from dataset context, agent responds with ‘No matching records found’ rather than hallucinating

7.3 Example Queries & Expected Responses

User Query	Expected Agent Behaviour
Show me all CEO vacancies posted this week	Returns filtered list: company, title, board source, posting date
Who is the CEO of [Company X]?	Returns enriched CEO name and source URL from dataset
Which sectors have the most executive openings?	Returns ranked sector summary with vacancy counts
Are there any CFO roles in fintech?	Filters by title + sector tag; returns matching listings
Give me a market snapshot for this month	Summarises total listings, top sectors, top boards, enrichment rate
What boards are most active for VP-level roles?	Aggregates by source board, filtered to VP seniority tier

8. Open Issues & Risks

Risk	Severity	Likelihood	Mitigation
Job board HTML structure changes break Apify actors	High	Medium	Monitor actor error rates; maintain actor update runbook; use Apify’s change detection
CEO lookup page structure varies widely by company	High	High	Layered approach: structured scrape → regex → LLM fallback; flag low-confidence enrichments
Apify usage costs exceed budget at high scrape frequency	Medium	Medium	Tune scrape schedule; use actor caching; cap concurrent runs
Make scenario hitting operation limits on large payloads	Medium	Low	Batch processing in scenarios; pagination on Sheets writes
Copilot Agent context window exceeded for large datasets	Medium	Medium	Summarise/compress dataset before injection; use vector retrieval for scale
Dedup logic produces false positives (same role, different board)	Low	Medium	Review hash key composition; add human-review queue for edge cases
Google Sheets API quota exceeded during high-volume runs	Low	Low	Implement exponential backoff in Make; batch Sheets appends

9. Appendix

A. Glossary

Term	Definition
Apify	Cloud-based web scraping and automation platform; actors are individual scraper scripts
Make (Integromat)	No-code/low-code workflow automation platform used for scenario orchestration
Actor (Apify)	A containerised scraping script deployed and run on the Apify platform
CEO Enrichment	Process of resolving the name of a company’s chief executive from public web sources
Deduplication	Removing duplicate job listings that appear across multiple boards for the same role
LLM Fallback	Using a large language model (Claude) to extract data when structured scraping fails
Copilot Agent	An AI-powered conversational interface allowing natural-language queries of the job dataset
QA Dashboard	A Google Sheets view highlighting records with incomplete or low-confidence enrichment
System Prompt	Instructions provided to the LLM defining its persona, data context, and response rules
p99 Latency	99th percentile response time — the time experienced by the slowest 1% of requests

B. MVP Build Steps (from Project Brief)

Step 1: Source bucketing by board type : define board list and category tags in Make config

Step 2: Apify actor setup : configure and test one actor per target board

Step 3: Make scenario build : webhook intake, routing, error handling skeleton

Step 4: Dedup logic : hash store setup; duplicate detection and discard module

Step 5: CEO lookup from official company pages : Apify CEO scraper + LLM fallback

Step 6: Google Sheet output : schema definition, append module, QA view formatting

Step 7: Copilot Agent : system prompt, context injection, conversation loop, UI deployment

FAQ’s

What does the Executive Job Scraping pipeline do?

It automatically collects executive-level job postings, enriches them with CEO data, and stores clean results in Google Sheets.

How is CEO information added to each job listing?

The system extracts CEO and leadership details from official company websites using automated scraping and AI-based enrichment.

Who can benefit from this solution?

Recruiting firms, executive search teams, and business intelligence teams can use it to track senior job openings faster and more accurately

is a software solution company that was established in 2016. Our quality services begin with experience and end with dedication. Our directors have more than 15 years of IT experience to handle various projects successfully. Our dedicated teams are available to help our clients streamline their business processes, enhance their customer support, automate their day-to-day tasks, and provide software solutions tailored to their specific needs. We are experts in Dynamics 365 and Power Platform services, whether you need Dynamics 365 implementation, customization, integration, data migration, training, or ongoing support.

Contact UsContact Us

Digital Transformation with CRM and Automation ROI

How Digital Transformation with CRM and Automation Drives Real ROI

Losing Leads? Here’s How CRM Automation Fixes the Problem

Executive Job Scraping & CEO Enrichment Pipeline

Table of Content

1. Executive Summary