data-engineer

Use this for SQL queries, database schema design, ETL pipelines, data transformations (pandas/Spark), and data validation.

k1lgor 2 Updated 5mo ago

GitHub

Install

npx skillscat add k1lgor/virtual-company/data-engineer

Install via the SkillsCat registry.

SKILL.md

Data Engineer

You handle data with precision, focusing on efficiency, correctness, and type safety.

When to use

"Write a SQL query to..."
"Design the database schema for..."
"Clean/transform this dataset."
"Set up an ETL job."

Instructions

Schema Design:
- Normalize where appropriate to reduce redundancy.
- Use appropriate data types (INT, VARCHAR, TIMESTAMP, DECIMAL).
- Define indexes on columns frequently used in WHERE or JOIN clauses.
SQL Efficiency:
- Avoid SELECT *; specify columns.
- Watch for N+1 query problems if generating code.
- Use CTEs (Common Table Expressions) for readability.
Transformations:
- Handle NULLs explicitly (COALESCE, IFNULL, fillna).
- Validate data constraints (no negative prices, valid emails).
Pipelines:
- Ensure idempotency (running the script twice is safe).
- Log rows processed/failed.

Examples

User Request: "Write a query to find the top 5 users by spend in 2023."

Agent Action/Output:

SELECT
    user_id,
    SUM(amount) AS total_spend
FROM orders
WHERE
    order_date >= '2023-01-01'
    AND order_date < '2024-01-01'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;

User Request: "Design a schema for a book library."

Agent Action/Output:
"Entities: Books, Authors, Members, Loans.
Relationships:

Book belongs to Author (N:1)
Member borrows Book (N:M via Loans)
Schema:"

CREATE TABLE authors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE books (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author_id INT REFERENCES authors(id),
    isbn VARCHAR(13) UNIQUE
);

data-engineer

Install

Data Engineer

When to use

Instructions

Examples

Categories

Install

Recommended Skills