costa-marcello

xlsx

"Creates, edits, and analyses spreadsheets with formulas, formatting, and data visualisation. Use when working with .xlsx, .xlsm, .csv, or .tsv files, creating spreadsheets from scratch, reading or analysing tabular data, modifying files while preserving formulas, building financial models, or recalculating formula values."

costa-marcello 2 1 Updated 3mo ago

Resources

2
GitHub

Install

npx skillscat add costa-marcello/skillkit/xlsx

Install via the SkillsCat registry.

SKILL.md

Output Rules

Zero Formula Errors

Deliver every Excel file with zero formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?).

Preserve Existing Templates

When updating a file with established patterns, match its format, style, and conventions exactly. Existing template conventions override these guidelines.

Financial Models

For financial model colour coding, number formatting, assumptions placement, and hardcode documentation standards, read references/financial-model-standards.md.


Workflow

Tool Selection

Task Library Reason
Data analysis, bulk operations, simple export pandas Fast column operations, statistics
Formulas, formatting, Excel-specific features openpyxl Preserves formulas and styles

Default to openpyxl. Use pandas when the task is purely data analysis with no formula or formatting needs.

Step-by-Step Process

Workflow Progress:
- [ ] Step 1: Choose library (pandas or openpyxl)
- [ ] Step 2: Create or load the workbook
- [ ] Step 3: Add data, formulas, and formatting
- [ ] Step 4: Save the file
- [ ] Step 5: Recalculate formulas (run scripts/recalc.py)
- [ ] Step 6: Check recalc output for errors -- fix and re-run until clean
- [ ] Step 7: Verify final output with the user

Formulas Over Hardcodes

Use Excel formulas for all calculations. Do not compute values in Python and write the result.

**Bad -- hardcoding a computed value:** ```python total = df['Sales'].sum() sheet['B10'] = total # writes 5000, not a formula ```

Good -- letting Excel compute:

sheet['B10'] = '=SUM(B2:B9)'
sheet['C5'] = '=(C4-C2)/C2'
sheet['D20'] = '=AVERAGE(D2:D19)'

Reading Data with pandas

import pandas as pd

df = pd.read_excel('file.xlsx')                          # first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # all sheets as dict
df = pd.read_excel('file.xlsx', dtype={'id': str}, usecols=['A', 'C'], parse_dates=['date'])

Creating a New Workbook

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
ws = wb.active
ws['A1'] = 'Header'
ws['A1'].font = Font(bold=True, color='FF0000')
ws['A1'].fill = PatternFill('solid', start_color='FFFF00')
ws['A1'].alignment = Alignment(horizontal='center')
ws.column_dimensions['A'].width = 20
ws['B2'] = '=SUM(A1:A10)'
wb.save('output.xlsx')

Editing an Existing Workbook

from openpyxl import load_workbook

wb = load_workbook('existing.xlsx')
ws = wb.active  # or wb['SheetName']
ws['A1'] = 'Updated'
ws.insert_rows(2)
wb.save('modified.xlsx')

Key openpyxl Details

  • Cell indices are 1-based (row=1, column=1 = A1).
  • load_workbook('file.xlsx', data_only=True) reads calculated values. Do not save after loading with data_only=True -- formulas will be permanently lost.
  • For large files use read_only=True (reading) or write_only=True (writing).

Recalculating Formulas

openpyxl writes formulas as strings without computed values. Run the recalc script after every save that contains formulas:

python scripts/recalc.py output.xlsx 30

The script uses LibreOffice (assumed installed) to recalculate, then scans all cells for errors. It returns JSON:

{
  "status": "success",
  "total_errors": 0,
  "total_formulas": 42
}

If status is errors_found, check error_summary for types and locations:

Error Cause Fix
#REF! Invalid cell reference Verify referenced cells exist
#DIV/0! Division by zero Add =IF(B2=0, 0, A2/B2) guard
#VALUE! Wrong data type in formula Check input types
#NAME? Unrecognised formula name Check spelling
#N/A Value not found in lookup Verify lookup values exist

Fix the errors, save, and run scripts/recalc.py again. Repeat until status is success.

For the full verification checklist, read references/formula-verification.md.


Example 1: Sales Report from CSV

User asks: "Create a formatted sales report from this CSV."

import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, numbers

df = pd.read_csv('sales.csv')
wb = Workbook()
ws = wb.active
ws.title = 'Sales Report'

# Headers
headers = list(df.columns)
for col, header in enumerate(headers, 1):
    cell = ws.cell(row=1, column=col, value=header)
    cell.font = Font(bold=True, color='FFFFFF')
    cell.fill = PatternFill('solid', start_color='4472C4')
    cell.alignment = Alignment(horizontal='center')

# Data rows
for r_idx, row in enumerate(df.itertuples(index=False), 2):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)

# Summary formulas
last_row = len(df) + 1
ws.cell(row=last_row + 1, column=1, value='Total')
ws.cell(row=last_row + 1, column=2).value = f'=SUM(B2:B{last_row})'

wb.save('sales_report.xlsx')

Then run: python scripts/recalc.py sales_report.xlsx

Example 2: Editing Existing File, Preserving Formulas

User asks: "Add a new column to my budget spreadsheet."

from openpyxl import load_workbook

wb = load_workbook('budget.xlsx')
ws = wb.active
new_col = ws.max_column + 1
ws.cell(row=1, column=new_col, value='Variance')

for row in range(2, ws.max_row + 1):
    budget_cell = ws.cell(row=row, column=2).coordinate
    actual_cell = ws.cell(row=row, column=3).coordinate
    ws.cell(row=row, column=new_col).value = f'={actual_cell}-{budget_cell}'

wb.save('budget.xlsx')

Then run: python scripts/recalc.py budget.xlsx

Example 3: Data Analysis Only (No Formulas)

User asks: "Which products had the highest sales last quarter?"

import pandas as pd

df = pd.read_excel('inventory.xlsx')
q4 = df[df['Quarter'] == 'Q4']
top = q4.nlargest(10, 'Sales')[['Product', 'Sales', 'Region']]
print(top.to_string(index=False))

No recalc needed -- pure analysis with no file output.

Example 4: Multi-Sheet Financial Model

User asks: "Build a 3-statement model with assumptions tab."

Read references/financial-model-standards.md for colour coding and number formats, then:

from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
assumptions = wb.active
assumptions.title = 'Assumptions'
assumptions['A1'] = 'Revenue Growth'
assumptions['B1'] = 0.05
assumptions['B1'].font = Font(color='0000FF')  # blue = hardcoded input

income = wb.create_sheet('Income Statement')
income['A1'] = 'Revenue'
income['B1'] = 1000000
income['B1'].font = Font(color='0000FF')
income['C1'] = "=B1*(1+Assumptions!B1)"
income['C1'].font = Font(color='000000')  # black = formula

wb.save('model.xlsx')

Then run: python scripts/recalc.py model.xlsx

Example 5: Bulk CSV-to-Excel Conversion

User asks: "Convert all CSVs in this folder to one Excel file with tabs."

import pandas as pd
from pathlib import Path

writer = pd.ExcelWriter('combined.xlsx', engine='openpyxl')
for csv_file in sorted(Path('.').glob('*.csv')):
    df = pd.read_csv(csv_file)
    df.to_excel(writer, sheet_name=csv_file.stem[:31], index=False)
writer.close()

Code Style

Write minimal Python. No unnecessary comments, verbose variable names, or redundant print statements. For Excel files, add cell comments for complex formulas and data sources.

References

File Purpose
references/formula-verification.md Verification checklist and error debugging
references/financial-model-standards.md Colour coding, number formats, assumptions, hardcode docs
scripts/recalc.py LibreOffice formula recalculation with error scanning