Methodology
How Job Radar collects, enriches, scores, and displays vacancy data. Full transparency on every step.
Data Sources
| Source | Type | Coverage | Frequency |
|---|---|---|---|
| 79 Telegram channels | Public + Private | RU analytics, BI, data, product | 2×/day |
| hh.ru API | Job board | 30+ search queries (BI, Analytics, Head of) | 1×/day |
| eFinancialCareers | Web scraping | Finance + analytics | 1×/day |
| LinkedIn (EU) | Guest API | Head of BI/Analytics, Director Data — Europe | 1×/day |
| EuroTechJobs | Web scraping | EU tech jobs (BI, Analytics, Data) | 1×/day |
| Relocate.me | Web scraping | Relocation jobs (BI, Data) | 1×/day |
Telegram channels include job boards (@analysts_hunter, @foranalysts, @evacuatejobs), recruiter channels, and BI/analytics community groups. Private channels are read through the Telegram API using the Telethon library.
Enrichment Pipeline
Each vacancy passes through a 14-step enrichment pipeline. Every step is idempotent — it only processes records that need updating.
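The idempotency property can be illustrated with a minimal sketch. The `Vacancy` class, the `run_step` helper, and the `detect_level` rule are hypothetical names chosen for the example, not Job Radar's actual code; the point is only that a step selects records that still need work, so a second run changes nothing.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Vacancy:
    text: str
    level: Optional[str] = None  # None means "not enriched yet"


def run_step(vacancies: list[Vacancy],
             needs_update: Callable[[Vacancy], bool],
             apply: Callable[[Vacancy], None]) -> int:
    """Apply one enrichment step only to records that still need it."""
    processed = 0
    for vacancy in vacancies:
        if needs_update(vacancy):
            apply(vacancy)
            processed += 1
    return processed


def detect_level(vacancy: Vacancy) -> None:
    # Hypothetical rule, for illustration only.
    vacancy.level = "head" if "head of" in vacancy.text.lower() else "unknown"


batch = [Vacancy("Head of BI, remote"), Vacancy("Data Analyst", level="middle")]
first = run_step(batch, lambda v: v.level is None, detect_level)
second = run_step(batch, lambda v: v.level is None, detect_level)
print(first, second)  # prints: 1 0
```

The first run fills in the one missing level; the second run finds nothing to do and leaves already enriched records untouched.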
Salary Types
Every vacancy has a salary classification based on how the data was obtained:
| Type | Sources | What it means |
|---|---|---|
| Stated | REGEX, LLM, TG, WEB, FIX | Salary explicitly mentioned in the vacancy text. Extracted by regex patterns (60+ rules) or LLM analysis. Most reliable. |
| Estimated | LITE, RESEARCH, EST | No salary in the text. Estimated as the median salary of similar vacancies (same level, dds_industry, and region). |
| No salary | — | Neither stated nor estimable (insufficient reference data). |
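A rough sketch of how the classification and the cohort estimate could fit together. The source tags come from the table above; the function names, the `salary_eur` and `region` keys, and the `min_cohort` cutoff are assumptions made for the example, since the page does not say how much reference data counts as sufficient.

```python
from statistics import median
from typing import Optional

# Source tags per salary type, as listed in the table above.
STATED_SOURCES = {"REGEX", "LLM", "TG", "WEB", "FIX"}
ESTIMATED_SOURCES = {"LITE", "RESEARCH", "EST"}


def salary_type(source: Optional[str]) -> str:
    """Map a salary source tag to its salary type."""
    if source in STATED_SOURCES:
        return "Stated"
    if source in ESTIMATED_SOURCES:
        return "Estimated"
    return "No salary"


def estimate_salary(target: dict, reference: list[dict],
                    min_cohort: int = 3) -> Optional[float]:
    """Median salary of reference vacancies matching level, industry, region."""
    keys = ("level", "dds_industry", "region")
    cohort = [r["salary_eur"] for r in reference
              if all(r[k] == target[k] for k in keys)]
    if len(cohort) < min_cohort:  # assumed cutoff for "insufficient data"
        return None
    return median(cohort)


reference = [
    {"level": "senior", "dds_industry": "fintech", "region": "EU", "salary_eur": 5000},
    {"level": "senior", "dds_industry": "fintech", "region": "EU", "salary_eur": 6000},
    {"level": "senior", "dds_industry": "fintech", "region": "EU", "salary_eur": 7000},
    {"level": "junior", "dds_industry": "fintech", "region": "EU", "salary_eur": 2000},
]
senior = {"level": "senior", "dds_industry": "fintech", "region": "EU"}
junior = {"level": "junior", "dds_industry": "fintech", "region": "EU"}

print(salary_type("REGEX"), estimate_salary(senior, reference))  # prints: Stated 6000
print(estimate_salary(junior, reference))  # prints: None
```

A median is less sensitive than a mean to a single outlier salary in a small cohort, which matters when only a handful of vacancies match.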
Salary Conversion
All salary filters use the EUR equivalent for uniform comparison:
| Currency | Conversion | Note |
|---|---|---|
| RUB ₽ | ÷ 95 | RUB → EUR, approximate Central Bank rate |
| USD $ | × 0.92 | USD → EUR |
| GBP £ | × 1.15 | GBP → EUR |
| KZT ₸ | × 0.002 | Tenge → EUR |
Annual salaries (USD, EUR, or GBP amounts above a threshold) are automatically converted to monthly figures.
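The conversion rules can be sketched as a single function. The rates are the ones in the table; `ANNUAL_THRESHOLD` is a placeholder value, because the page does not state the actual threshold, and the function name is hypothetical.

```python
# Conversion factors to EUR, taken from the table above.
TO_EUR = {
    "RUB": 1 / 95,
    "USD": 0.92,
    "GBP": 1.15,
    "KZT": 0.002,
    "EUR": 1.0,
}

# Assumption: the real threshold is not documented; 20_000 is a placeholder.
ANNUAL_THRESHOLD = 20_000
ANNUAL_CURRENCIES = {"USD", "EUR", "GBP"}


def to_monthly_eur(amount: float, currency: str) -> float:
    """Convert a stated salary to a monthly EUR equivalent."""
    if currency not in TO_EUR:
        raise ValueError(f"unsupported currency: {currency}")
    # Large USD/EUR/GBP amounts are treated as annual and divided by 12.
    if currency in ANNUAL_CURRENCIES and amount > ANNUAL_THRESHOLD:
        amount = amount / 12
    return round(amount * TO_EUR[currency], 2)


print(to_monthly_eur(190_000, "RUB"))  # prints: 2000.0
print(to_monthly_eur(120_000, "USD"))  # prints: 9200.0
```

Fixed rates keep filters stable from day to day, at the cost of drifting away from the market rate over time.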
Scoring Model
Each vacancy is scored 0–100 based on 6 weighted components. The score maps to a rating:
| Rating | Score | Meaning |
|---|---|---|
| [+] | ≥ 70 | Strong match — review recommended |
| [~] | 50–69 | Partial match — worth a look |
| [−] | < 50 | Low relevance |
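The rating thresholds above translate directly into code. This sketch covers only the mapping from score to rating; the function name is hypothetical and the 6 weighted components are not modeled here.

```python
def rating(score: float) -> str:
    """Map a 0-100 score to the rating symbol from the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 70:
        return "[+]"  # strong match
    if score >= 50:
        return "[~]"  # partial match
    return "[−]"      # low relevance


print(rating(82), rating(50), rating(49.9))  # prints: [+] [~] [−]
```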
Role Families
Vacancies are classified into 15 role families:
| Family | Examples |
|---|---|
| Analytics | Business Analyst, BI Analyst, Data Analyst, Financial Analyst |
| Data Engineering | Data Engineer, ETL Developer, DWH Architect |
| DS/ML | Data Scientist, ML Engineer, NLP Researcher |
| Engineering | Backend Developer, DevOps, Platform Engineer, SRE |
| Product | Product Manager, Product Owner, Product Analyst |
| QA | QA Engineer, SDET, Manual QA, AQA |
| Management | Project Manager, Team Lead, CTO, Head of |
| Marketing | Performance Marketing, Growth, UA Manager |
| Sales | Account Executive, BDM, Sales Director |
| Finance | FP&A, Controller, Risk Analyst |
| HR | Recruiter, HRBP, Talent Acquisition |
| Design | UX/UI Designer, Product Designer |
| Procurement | Procurement Manager, Supply Chain |
| Multiple | Cross-functional roles |
| Other | Roles outside standard categories |
Data Quality
Field Coverage (current)
| Field | Coverage | Method |
|---|---|---|
| Level | 100% | Regex + LLM |
| Salary | 100% | Regex → LLM → Estimation |
| Company | 100% | 5-phase extraction pipeline |
| Work Format | 97% | Regex + LLM classification |
| Location | 72% | 3-phase: metadata → regex → LLM |
| Language | 58% | Regex pattern matching |
| Skills | 56% | hh.ru metadata + LLM |
| Domain | 41% | Normalization map (448 raw values → 30 domains) |
Lower coverage for location, language, skills, and domain (dds_industry) is expected: many Telegram posts are too short to extract these fields reliably.
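Coverage figures like these can be computed as the share of records with a non-empty value per field. The sketch below assumes vacancies are plain dictionaries and that an empty string, empty list, or `None` counts as missing; the function name and the sample records are illustrative.

```python
def field_coverage(records: list[dict], fields: list[str]) -> dict[str, int]:
    """Percentage of records with a non-empty value for each field."""
    total = len(records)
    if total == 0:
        return {field: 0 for field in fields}
    return {
        field: round(100 * sum(1 for r in records if r.get(field)) / total)
        for field in fields
    }


records = [
    {"level": "senior", "location": "Berlin", "skills": ["SQL"]},
    {"level": "middle", "location": "Remote", "skills": []},
    {"level": "head", "location": "Almaty"},
    {"level": "junior", "location": None, "skills": ["Tableau"]},
]
coverage = field_coverage(records, ["level", "location", "skills"])
print(coverage)  # prints: {'level': 100, 'location': 75, 'skills': 50}
```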
Garbage Filtering
Non-vacancy items (chat messages, discussions, ads) are automatically flagged and excluded. Currently ~5% of raw posts are classified as garbage.
Deduplication
Reposts are detected via content similarity and marked with the is_repost flag. Source URL deduplication prevents the same vacancy from appearing twice.
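A minimal sketch of the two checks, under stated assumptions: the page does not name the similarity measure, so the example swaps in `difflib.SequenceMatcher` from the Python standard library with an arbitrary 0.9 threshold. The `source_url` and `text` keys and the function names are hypothetical; only `is_repost` comes from the description above.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # assumed value, not documented


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def mark_duplicates(posts: list[dict]) -> list[dict]:
    """Drop repeated source URLs, then flag near-identical texts as reposts."""
    kept: list[dict] = []
    seen_urls: set[str] = set()
    for post in posts:
        url = post.get("source_url")
        if url and url in seen_urls:
            continue  # same source URL: the vacancy is already present
        if url:
            seen_urls.add(url)
        body = normalize(post["text"])
        post["is_repost"] = any(
            SequenceMatcher(None, body, normalize(other["text"])).ratio()
            >= SIMILARITY_THRESHOLD
            for other in kept
        )
        kept.append(post)
    return kept


posts = [
    {"source_url": "https://t.me/a/1", "text": "Head of BI, remote, Tableau and SQL required"},
    {"source_url": "https://t.me/a/1", "text": "Head of BI, remote, Tableau and SQL required"},
    {"source_url": "https://t.me/b/7", "text": "Head of BI,  remote, Tableau and SQL required!"},
    {"source_url": "https://t.me/c/3", "text": "Junior QA engineer, office in Almaty"},
]
result = mark_duplicates(posts)
print([p["is_repost"] for p in result])  # prints: [False, True, False]
```

Pairwise comparison is quadratic in the number of posts, so a production pipeline would likely need hashing or blocking to narrow the candidates first.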
Schedule
| Time | Pipeline | Steps |
|---|---|---|
| 07:00 | Job boards (hh.ru) | Collect from hh.ru API |
| 09:07 | Morning full pipeline | All 14 steps: collect → enrich → score → QA |
| 21:07 | Evening pipeline | All 14 steps: collect → enrich → score → QA |