Python vs. PySpark Notebooks in Microsoft Fabric
A simple rule of thumb: for small to medium datasets (under ~100M rows), plain Python is typically faster and consumes fewer capacity units (CUs). PySpark is the better choice once you're dealing with large datasets — think 100M+ rows or 10 GB+.
One thing worth keeping in mind: processing costs don't just apply to notebooks. Data Pipelines also consume CUs, and they can rack up costs quickly if you're not careful.
In a recent project, I needed to reduce a dataset to 310 million rows. Given the scale, I ran the entire pipeline in PySpark — and it was the right call.
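If you want that rule of thumb as something explicit in a notebook, a tiny helper like this works — the thresholds mirror the numbers above and are my own heuristic, not official Fabric guidance:

```python
# Rough heuristic for picking a notebook engine in Fabric.
# Thresholds match the rule of thumb above; tune them for your workloads.

def choose_engine(row_count: int, size_gb: float) -> str:
    """Return 'python' for small/medium data, 'pyspark' at scale."""
    if row_count >= 100_000_000 or size_gb >= 10:
        return "pyspark"
    return "python"

print(choose_engine(310_000_000, 25.0))  # the 310M-row project above
print(choose_engine(5_000_000, 0.5))     # small dataset: plain Python wins
```

Trivial on purpose — the point is making the engine decision a deliberate, visible step instead of a default.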

Love It: Power BI in the Browser with Microsoft Fabric
I grew up on desktop tools — Power BI Desktop, Tableau, RStudio. But with Microsoft Fabric, I do everything in the browser, and honestly? I love it.
Notebooks, semantic models, Power BI reports — all in one place. Load the data, transform it, set up a pipeline, move it through Bronze → Silver → Gold, build the semantic model, and publish the report. No context switching. No desktop installs.
Sure, the browser isn't 100% there yet — advanced modelling and Row-Level Security still need the desktop. But we're at 80%, and that 80% covers most of the work.
One environment. One flow. That's the win.

REST APIs to Learn Advanced Fabric Data Pipeline Concepts
Microsoft Fabric is one of the most exciting platforms right now for consuming REST APIs, transforming data, and delivering it all the way to Power BI.
But here's the thing — data ingestion in Fabric's Data Pipeline is not simple. And the best way to get good at hard things? Practice with real data.
That's where tradingeconomics.com comes in. They offer developers a free API key (~500 calls/month) — just enough to get hands-on with the platform without spending a cent.
Once you're pulling live data, you can start tackling the advanced stuff: metadata-driven orchestration, dynamic parameters, For Each loops, and If Condition activities — the building blocks of scalable ETL/ELT pipelines in Fabric.
Free data. Real concepts. Let's build.
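As a starting point, here's a minimal sketch of composing a Trading Economics request in Python — the endpoint path and the `c` key parameter reflect their public docs as I understand them (and `guest:guest` is their demo key), so verify against the current documentation before relying on it:

```python
# Minimal sketch of building a Trading Economics API request URL.
# Endpoint path and the `c` client-key parameter are based on their public
# docs; double-check before production use.
from urllib.parse import urlencode

BASE_URL = "https://api.tradingeconomics.com"

def build_request_url(endpoint: str, api_key: str, **params) -> str:
    """Compose a Trading Economics REST URL with the client key appended."""
    query = urlencode({**params, "c": api_key})
    return f"{BASE_URL}/{endpoint.strip('/')}?{query}"

# In a Fabric notebook you could then fetch it, e.g.:
#   import requests
#   data = requests.get(build_request_url("markets/commodities", "guest:guest")).json()
print(build_request_url("markets/commodities", "guest:guest"))
```

Once this works in a notebook, the same URL pattern becomes the base URL and relative path of a Web/REST connection in a Data Pipeline Copy activity.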

Data Pipeline "ForEach" = Python's for loop
Microsoft Fabric's ForEach activity works just like a Python for loop — iterate over a collection, run some logic for each item, repeat.
Processing files? Feed it ["file1.csv", "file2.csv"], wrap a Copy activity inside, and use @item() to reference each file dynamically. Same mental model as looping through a list in Python.
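In Python terms, the mental model looks like this — `copy_activity` here is just a stand-in for the pipeline's Copy activity, not a real Fabric API:

```python
# The ForEach items collection, as you'd pass it to the pipeline.
files = ["file1.csv", "file2.csv"]

def copy_activity(file_name: str) -> str:
    # Stand-in for Fabric's Copy activity; in the pipeline,
    # @item() resolves to what `file_name` holds here.
    return f"copied {file_name}"

# ForEach: iterate the collection, run the inner activity once per item.
results = [copy_activity(item) for item in files]
print(results)
```

Same mental model, different syntax: the pipeline's `@item()` expression plays the role of the loop variable.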

Combining Medallion Architecture with CI/CD in Microsoft Fabric
Everyone agrees Medallion architecture delivers competitive advantages — faster insights, better data trust, greater scalability. But knowing why it works is only half the battle. The part that's rarely talked about is how to actually implement it.
My take: pair it with CI/CD. Map the Bronze, Silver, and Gold layers across your development, test, and production pipeline, and suddenly Medallion stops being a theoretical framework and starts being something you can actually ship.
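One way to make that mapping concrete is a small deployment map your CI/CD scripts can read — the workspace names below are placeholders, and the structure is just one possible convention, not a Fabric requirement:

```python
# Hypothetical mapping of Medallion layers across CI/CD stages.
# Workspace names are placeholders; adapt to your tenant and naming standard.
DEPLOYMENT_MAP = {
    "dev":  {"bronze": "ws-dev-bronze",  "silver": "ws-dev-silver",  "gold": "ws-dev-gold"},
    "test": {"bronze": "ws-test-bronze", "silver": "ws-test-silver", "gold": "ws-test-gold"},
    "prod": {"bronze": "ws-prod-bronze", "silver": "ws-prod-silver", "gold": "ws-prod-gold"},
}

def workspace_for(stage: str, layer: str) -> str:
    """Resolve the target workspace for a given CI/CD stage and Medallion layer."""
    return DEPLOYMENT_MAP[stage][layer]

print(workspace_for("prod", "gold"))
```

A map like this is what turns "Medallion + CI/CD" from a diagram into something a deployment pipeline can actually execute against.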

Reading PDFs: Azure AI Document Intelligence vs. LLMs
Traditional PDF extraction tools may feel outdated compared to today's powerful LLMs — but the distinction matters. Traditional OCR-based tools like Azure AI Document Intelligence extract rather than infer, which is their key advantage: they return field-level confidence scores that make outputs fully auditable.
LLMs are harder to audit. More critically, they carry hallucination risk — under uncertainty, an LLM may confabulate a plausible-looking number (fabricating to fill the gap) rather than return a low confidence score or flag the ambiguity. In a high-stakes context like financial reporting, that silent failure is arguably worse than a flagged extraction failure.
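The audit advantage can be sketched in a few lines: because each field comes back with a confidence score, low-confidence values can be routed to human review instead of silently accepted. The dictionary below only mimics the shape of field results — the real Document Intelligence SDK returns richer objects:

```python
# Simplified sketch of auditing OCR output by field-level confidence.
# `extracted` mimics the shape of field results; the real SDK objects differ.
CONFIDENCE_THRESHOLD = 0.90

extracted = {
    "InvoiceTotal": {"value": "1,240.00", "confidence": 0.98},
    "DueDate":      {"value": "2024-13-01", "confidence": 0.41},  # suspicious -> flag it
}

def triage(fields: dict, threshold: float = CONFIDENCE_THRESHOLD):
    """Split fields into auto-accepted values and ones needing human review."""
    accepted = {k: v["value"] for k, v in fields.items() if v["confidence"] >= threshold}
    needs_review = [k for k, v in fields.items() if v["confidence"] < threshold]
    return accepted, needs_review

accepted, needs_review = triage(extracted)
print(accepted)      # high-confidence fields pass through
print(needs_review)  # low-confidence fields go to a human, never silently guessed
```

That `needs_review` list is exactly what an LLM pipeline struggles to give you: a principled, per-field signal of "I'm not sure."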

With Direct Lake, why should we still refresh our semantic model?
Direct Lake eliminates traditional scheduled refreshes by reading directly from Delta tables in your Lakehouse/Warehouse — no pipeline-triggered refresh needed.
However, there's a catch — "framing":
Direct Lake snapshots the Delta table's state at a point in time rather than streaming live data. For production scenarios where data freshness matters, many teams still trigger a reframe via pipeline after the Lakehouse load finishes.
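The reframe step itself is small. Inside a Fabric notebook, the semantic-link library offers a one-liner for it (dataset and workspace names below are placeholders); outside a notebook, the Power BI REST refresh endpoint does the same job:

```python
# Sketch of triggering a reframe after a Lakehouse load completes.
# Inside a Fabric notebook, semantic-link can do it directly, e.g.:
#   import sempy.fabric as fabric
#   fabric.refresh_dataset("SalesModel", workspace="Analytics")  # placeholder names
# From a pipeline or external script, POST to the Power BI REST endpoint instead.

def refresh_endpoint(workspace_id: str, dataset_id: str) -> str:
    """Refresh endpoint for a semantic model (POST with a bearer token)."""
    return (
        "https://api.powerbi.com/v1.0/myorg/"
        f"groups/{workspace_id}/datasets/{dataset_id}/refreshes"
    )

print(refresh_endpoint("ws-guid", "ds-guid"))
```

Wire that call in as the last activity after the Lakehouse load, and "framing" stops being a freshness surprise.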

