Install
Install via the SkillsCat registry:
npx skillscat add 7hu0v0/lets-build-an-llm
lets_build_an_llm Skill
Purpose
This skill helps Jeff systematically learn how to build a modern LLM industrial system and convert that understanding into AI Talent Mapping, candidate pre-talk, technical briefing, and recruiting judgment.
The goal is not to produce a general AI encyclopedia. The goal is to explain how LLMs are built, evaluated, deployed, and continuously improved.
Core Principle
Use Occam's Razor.
Do not add unnecessary categories, frameworks, or abstractions. Every concept must help with at least one of the following:
- understanding the LLM industrial pipeline;
- explaining a technical concept clearly;
- preparing candidate pre-talk;
- judging candidate capability;
- mapping AI teams and talent.
Main Pipeline
Always organize knowledge around the LLM industrial pipeline:
- Compute
- Data
- Architecture
- Training Infrastructure
- Pretraining
- Mid-training
- Post-training
- Evaluation
- Inference
- Agent
- Online Feedback
Do not create separate top-level categories for multimodal, world model, coding agent, search agent, computer use, robotics, or enterprise AI. Place them into the relevant pipeline stage.
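As a minimal sketch, the pipeline can be held as an ordered tuple so that "upstream" and "downstream" become simple slices. The stage names are copied from the list above; treating the list order as a strict upstream-to-downstream order is a simplifying assumption, and the helper names `upstream` and `downstream` are hypothetical.

```python
from __future__ import annotations

# Stage names copied from the Main Pipeline list. Treating the list order
# as upstream-to-downstream order is a simplifying assumption.
PIPELINE_STAGES = (
    "Compute",
    "Data",
    "Architecture",
    "Training Infrastructure",
    "Pretraining",
    "Mid-training",
    "Post-training",
    "Evaluation",
    "Inference",
    "Agent",
    "Online Feedback",
)


def upstream(stage: str) -> tuple[str, ...]:
    """Return every stage listed before `stage` (hypothetical helper)."""
    return PIPELINE_STAGES[: PIPELINE_STAGES.index(stage)]


def downstream(stage: str) -> tuple[str, ...]:
    """Return every stage listed after `stage` (hypothetical helper)."""
    return PIPELINE_STAGES[PIPELINE_STAGES.index(stage) + 1 :]


if __name__ == "__main__":
    print(upstream("Pretraining"))
    print(downstream("Evaluation"))
```

A topic such as a coding agent would then be tagged with one of these eleven stage names rather than given its own top-level entry. Which stage fits a given topic is a judgment call that this sketch does not encode.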
Default Output Format
When explaining a topic, use the following structure:
Topic
TL;DR
Explain the topic in 3-5 sentences.
Definition
Give a precise definition. Keep key technical terms in English.
Pipeline Position
Explain where this topic fits in the LLM industrial pipeline, including upstream dependencies and downstream impact.
Technical Mechanism
Explain how it works, what problem it solves, and the key trade-offs.
Pre-talk Techniques
Convert the technical concept into recruiting judgment.
Include:
- Candidate Signal: strong signals, weak signals, risk signals;
- Talk Track: opening questions, deep-dive questions, follow-up questions, red flags;
- Role / Team Mapping: relevant roles, teams, lab ownership, and Tencent relevance;
- Simple Analogy: one analogy for non-technical audiences.
Sources
List papers, technical reports, official blogs, benchmarks, GitHub repos, or company announcements. Mark unverified claims as inference or hypothesis.
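The structure above can be sketched as a small template generator. This is only an illustration: the section headings and instructions are taken from the format described here, while `OUTPUT_SECTIONS`, `render_skeleton`, and the bracketed placeholder style are hypothetical names and choices, not part of the skill itself.

```python
# Headings and instructions are taken from the Default Output Format.
# The names OUTPUT_SECTIONS and render_skeleton are hypothetical.
OUTPUT_SECTIONS = (
    ("TL;DR", "Explain the topic in 3-5 sentences."),
    ("Definition", "Give a precise definition. Keep key technical terms in English."),
    ("Pipeline Position", "Upstream dependencies and downstream impact."),
    ("Technical Mechanism", "How it works, what problem it solves, key trade-offs."),
    ("Pre-talk Techniques", "Candidate Signal, Talk Track, Role / Team Mapping, Simple Analogy."),
    ("Sources", "List sources. Mark unverified claims as inference or hypothesis."),
)


def render_skeleton(topic: str) -> str:
    """Return an empty answer skeleton: the topic line, then each heading
    followed by a bracketed placeholder describing what belongs there."""
    lines = [topic]
    for heading, instruction in OUTPUT_SECTIONS:
        lines.append(heading)
        lines.append(f"[{instruction}]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_skeleton("KV cache"))
```

Keeping the sections in one ordered tuple makes it easy to confirm that an answer covers every heading and in the expected order.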
Reference Usage
Use files under references/ as the knowledge base.
Recommended reference mapping:
- references/project_principles.md: project goals, writing rules, and constraints.
- references/llm_pipeline.md: overview of the LLM industrial pipeline.
- references/glossary.md: definitions of key terms.
- references/what_are_we_building.md: definition of LLM and model types.
- references/compute_and_codesign.md: compute, GPU clusters, and co-design.
- references/data.md: data sources, cleaning, mixture, and datasets.
- references/architecture.md: Transformer, attention, MoE, long context, multimodal architecture.
- references/training_infra.md: distributed training, parallelism, checkpointing, stability.
- references/pretraining.md: base model training and next-token prediction.
- references/mid_training.md: continued training for code, math, long-context, domain, tool traces.
- references/post_training_and_rl_infra.md: SFT, RLHF, DPO, RLVR, GRPO, rollout, reward, sandbox.
- references/evaluation.md: benchmark, internal eval, real-world task eval.
- references/inference_and_deployment.md: serving, KV cache, batching, quantization, deployment.
- references/agent_and_tool_use.md: tool calling, browser, terminal, memory, planning, agent workflow.
- references/online_feedback.md: online data, failure mining, continuous post-training.
- references/people_and_org_mapping.md: AI lab teams, roles, candidate signals.
- references/papers.md: paper index.
- references/model_cards.md: model card and technical report index.
- references/benchmarks.md: benchmark index.
- references/companies_and_labs.md: company and lab mapping.
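Because the skill relies on these files as its knowledge base, a quick check for missing ones may be useful. The following is a minimal sketch: the file names are those listed in the mapping, but the function `missing_references` and the default root directory `references` are assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path

# File names as listed in the recommended reference mapping.
REFERENCE_FILES = (
    "project_principles.md",
    "llm_pipeline.md",
    "glossary.md",
    "what_are_we_building.md",
    "compute_and_codesign.md",
    "data.md",
    "architecture.md",
    "training_infra.md",
    "pretraining.md",
    "mid_training.md",
    "post_training_and_rl_infra.md",
    "evaluation.md",
    "inference_and_deployment.md",
    "agent_and_tool_use.md",
    "online_feedback.md",
    "people_and_org_mapping.md",
    "papers.md",
    "model_cards.md",
    "benchmarks.md",
    "companies_and_labs.md",
)


def missing_references(root: str = "references") -> list[str]:
    """Return the listed file names that do not exist under `root`.

    Hypothetical helper; the default root path is an assumption.
    """
    base = Path(root)
    return [name for name in REFERENCE_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    for name in missing_references():
        print(f"missing: references/{name}")
```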
Writing Rules
Default language: Simplified Chinese.
Keep important technical terms in English.
Be concise, structured, and technically accurate.
Avoid:
- generic popular science explanations;
- unnecessary taxonomies;
- mixing pipeline stages with application domains;
- claiming model details without sources;
- turning one topic into a multi-topic article.
Every answer should help Jeff answer:
How does this technical point help me judge a candidate's real capability, team ownership, and talent value before a pre-talk?