Franco Arda, Ph.D.
Data Engineer | Microsoft Fabric Certified (DP-700)
Experience: Daimler-Benz · Siemens · DHL · Deutsche Bahn · Swisscom · Infineon Technologies · BMW · VW
Languages: Swiss German · German · English
Education: Ph.D. in Data Science · MBA
Tools:
Microsoft Fabric: Python · PySpark · SQL · T-SQL · KQL · Data Pipelines · Dataflow Gen2 · Medallion Architecture · Lakehouse · Warehouse · Real-Time Intelligence · Semantic Models · Power BI · DAX · Maps · Eventstreams · CI/CD · Data Governance
AI in Fabric: Data Science · Machine Learning · Prompt Engineering · AI Agents (e.g., Operations Agents) · Fabric Copilot · Real-Time AI Analytics (e.g., anomaly detection)
Portfolio:
Architecture overview
The architecture uses four operational phases to move from raw sensor data to automated action:
-
IoT sensors in refrigerated and frozen units continuously monitor temperature across all store locations.
-
Azure Event Hubs collects environmental data to predict how external conditions affect temperature readings.
-
Eventstreams ingests and processes incoming temperature data in real time.
-
Eventhouse stores and processes the data. Anomaly alerts flag units with frequent temperature spikes for proactive maintenance.
-
Natural Language Copilot lets store associates and analysts query temperature trends conversationally.
-
Real-Time Dashboard gives managers and regional supervisors a live view of refrigeration performance, trends, and food safety compliance.
-
Activator triggers real-time alerts when temperatures breach thresholds, prompting immediate inspections and protecting food safety.
Operational phases
Ingest and process
IoT sensors monitor temperature across refrigerated display cases, walk-in freezers, dairy and produce areas, medication storage, and cold prep zones.
Azure Event Hubs captures environmental inputs — external weather, humidity, store traffic, door-open frequency, and HVAC performance — to predict their impact on refrigeration.
Example: A 150-location grocery chain processes temperature data from thousands of sensors while correlating weather, traffic, and HVAC data to optimize refrigeration performance chain-wide.
Analyze and transform
Eventstreams handles real-time processing: temperature validation, zone aggregation, environmental correlation, automated routing, and cross-location comparison.
Eventhouse stores the data and powers anomaly detection — flagging units with temperature spikes for maintenance before issues escalate.
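The spike flagging described above could be prototyped along the following lines. This is a minimal sketch and not Eventhouse's built-in anomaly detection: it compares each reading with a rolling mean and standard deviation. The function name `find_spikes`, the window of 5 readings, the threshold of 3 standard deviations, and the sample readings are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def find_spikes(readings, window=5, z_threshold=3.0):
    """Return indexes of readings that deviate strongly from the recent past.

    readings: temperatures in degrees Celsius, oldest first.
    window and z_threshold are illustrative assumptions, not Fabric defaults.
    """
    recent = deque(maxlen=window)
    spikes = []
    for i, value in enumerate(readings):
        # Only score a reading once a full window of history exists.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Guard against a flat baseline, where sigma is zero.
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                spikes.append(i)
        recent.append(value)
    return spikes


# Hypothetical freezer readings with one obvious spike at index 6.
freezer = [-18.1, -18.0, -17.9, -18.2, -18.0, -18.1, -9.5, -18.0]
spike_indexes = find_spikes(freezer)
```

A unit whose readings produce spikes repeatedly over several days would then be a candidate for proactive maintenance.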
Query
Natural Language Copilot lets store associates run conversational queries on temperature trends and historical data — no technical skills required.
Visualize and activate
Real-Time Dashboard gives managers and supervisors a multi-location view of refrigeration health, trends, and compliance status.
Activator sends real-time alerts for threshold breaches — triggering inspections, maintenance notifications, and food safety responses automatically.
Technical benefits
-
Real-time monitoring — Continuous visibility across all refrigeration units and locations
-
Predictive maintenance — Early detection reduces downtime and repair costs
-
Food safety compliance — Automated alerts and reporting keep operations within regulatory requirements
-
Energy optimization — Intelligent monitoring reduces unnecessary energy consumption
-
Customer experience — Reliable refrigeration means better product quality and fewer stockouts



Switzerland's photovoltaic buildout is accelerating rapidly — with over 7 GW of installed capacity today and a federal target of 34 TWh of solar generation by 2035, grid operators, industrial asset owners, and energy utilities face a growing challenge: keeping large and often hard-to-reach PV installations performing at their peak.
Traditional inspection methods — rope teams, manual thermal checks, or periodic aerial surveys — are costly, infrequent, and slow to surface faults. In alpine environments where snow loads, soiling, and physical access compound the problem, a single undetected defect can quietly erode yield for months.
This solution brings together drone-based thermal and visual imaging, automated defect detection powered by computer vision, and real-time asset intelligence — all built on a unified Microsoft Fabric data platform. Drone-mounted sensors are capable of identifying four critical fault categories at scale:
-
Localized cell failure — individual cells with thermal anomalies (hotspots) that reduce string output and risk cascading damage
-
String-level failure — entire strings dropping off-line due to inverter, bypass diode, or wiring faults
-
Dirt and dust accumulation — soiling patterns detected via visual imaging and correlated with yield loss data
-
Physical panel degradation — micro-cracks, delamination, and glass damage identified before they cause irreversible yield loss
Imagery flows directly into Fabric (in real time, in batch, or both), where AI models classify panel-level faults. Results surface in Real-Time Dashboards or Power BI dashboards tailored for asset managers and O&M teams. Maintenance tickets are generated automatically, closing the loop from detection to resolution.
The business case is concrete:
-
Inspection costs reduced by up to 60–70% compared to rope access or manual thermal surveys
-
Faults detected 4–8 weeks earlier, recovering an estimated 1–3% of annual energy yield
-
Inspection cycles shortened from yearly to quarterly or on-demand
Dataset
Mockup dataset simulating a drone inspection of a solar farm. GPS coordinates clustered around St. Gallen (47.43°N, 9.31°E). 3 drones (drone_01–03) covering panel zones A–D, spaced at realistic 60–90 second intervals.
Anomaly rate intentionally set at ~55% (11/20 rows).
Anomaly Types & Severity
-
hot_spot — Localized cell failure · severity 6.8–8.9 · temp delta +17–26°C
-
bypass_diode — String-level failure · most critical · severity 9.1–9.7 · temp delta +28–31°C
-
soiling — Dirt/dust accumulation · lowest priority · severity 2.4–3.1 · temp delta +5–6°C
-
delamination — Physical panel degradation · severity 5.3 · temp delta +12°C
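One way these severity scores could drive the maintenance queue is sketched below. The severities sit inside the ranges listed above, but the panel IDs, the function name `priority_for`, and the priority cut-offs (9.0, 6.0, 4.0) are assumptions made for illustration and are not part of the mockup dataset.

```python
def priority_for(severity):
    """Map a severity score (0-10) to a ticket priority.

    The cut-offs are illustrative assumptions, not part of the dataset.
    """
    if severity >= 9.0:
        return "critical"
    if severity >= 6.0:
        return "high"
    if severity >= 4.0:
        return "medium"
    return "low"


# Hypothetical inspection rows; severities fall inside the ranges listed above.
inspections = [
    {"panel": "A14", "anomaly": "hot_spot", "severity": 7.5},
    {"panel": "B03", "anomaly": "soiling", "severity": 2.8},
    {"panel": "C21", "anomaly": "bypass_diode", "severity": 9.4},
    {"panel": "D07", "anomaly": "delamination", "severity": 5.3},
]

# Most severe first, so the queue starts with string-level failures.
tickets = [
    {**row, "priority": priority_for(row["severity"])}
    for row in sorted(inspections, key=lambda r: r["severity"], reverse=True)
]
```

With these cut-offs, bypass diode faults land at the top of the queue and soiling at the bottom, which matches the ordering described in the list.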


Since Microsoft Fabric is an end-to-end analytics platform, we never have to leave Fabric to write code. For example, if we want to enrich our solar panel inspection data, we can do so using a Notebook (a Data Scientist's favorite) with PySpark — or plain Python for smaller datasets (under 100 million rows).

Picture a heart attack. A dirty solar cell or panel is like a clogged artery — and if enough arteries stop pumping, the whole system seizes up and fails. One common way this plays out comes down to frame design and how panels are installed. Most solar panel frames trap a small amount of water along the bottom edge.
When that water carries dirt or debris, it leaves a soiling deposit behind as it evaporates, and the affected cells then overheat. The panel shown below is likely suffering from exactly this: dust that has built up along its lower edge is creating partial shading.

With Microsoft Fabric, we can create smart alerts for any observed variable — such as dirt/dust accumulation or localized cell failure — enabling technical teams to receive automated notifications via email or Microsoft Teams. For example: "Solar panel A14: anomaly detected — localized cell failure (confidence: 94%)." This eliminates the need for constant dashboard monitoring, unless real-time observation is specifically required.

Microsoft Fabric captures drone flight paths in real-time and stores them for post-mission analytics. For the pilot, live geo-tracking improves situational awareness across large or complex installations. For drone operators, recorded flight routes provide verifiable coverage evidence — supporting ESG audits and ensuring no panel zone is missed between inspection cycles.

This reference architecture shows how to build a comprehensive e-mobility charging network solution using Microsoft Fabric Real-Time Intelligence — processing live data from thousands of charging stations to enable smarter operations, predictive maintenance, and revenue optimization.
The platform handles real-time usage data, station state, and energy cost rates at scale. Thousands of stations stream continuously; energy pricing flows in via MQTT; station metadata syncs daily. Together they form a unified intelligence layer for managing large charging networks with confidence.
Architecture overview
The architecture is built around four operational phases: Ingest and process → Analyze, transform, and enrich → Train → Visualize and activate.
-
Thousands of charging stations stream real-time usage and state data.
-
Energy cost rates stream in via MQTT-Eventstream integration.
-
Station metadata and asset information is collected and refreshed daily.
-
Charging events are enriched on the fly with asset data, producing fully curated, consumption-ready datasets.
-
Usage data is aggregated and correlated with energy rates for a unified view of cost and performance.
-
ML models are built, trained, and scored in real time to predict usage and station availability.
-
A Real-Time Dashboard provides high-granularity visibility across the entire network — drillable down to individual sockets.
-
Power BI delivers rich business intelligence reports querying live data directly.
-
Automated alerts notify field technicians the moment a station malfunctions or behaves anomalously.
Operational phases
Ingest and process
Real-time charging station data flows into Eventstreams for ingestion and enrichment. In parallel, energy cost rates arrive through MQTT integration. Three data streams feed the platform continuously:
-
Live station telemetry — usage patterns, operational state, availability, and performance metrics.
-
Energy cost rates — real-time pricing for cost optimization and billing.
-
Station metadata — specifications, locations, maintenance history, and hardware configurations, updated daily.
To put the scale in perspective: a major operator with 15,000 stations processes over 500,000 events per day — session starts and stops, power readings, connector status, payment transactions, and diagnostics. Eventstreams handles this velocity while applying real-time enrichment with station specs, network topology, and maintenance schedules.
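As a rough sanity check on those figures, the averages can be worked out directly. The snippet uses only the two numbers quoted above and treats "over 500,000" as a lower bound; real traffic is bursty, so peak rates will be well above the average.

```python
STATIONS = 15_000
EVENTS_PER_DAY = 500_000  # lower bound quoted above ("over 500,000")
SECONDS_PER_DAY = 24 * 60 * 60

events_per_station_per_day = EVENTS_PER_DAY / STATIONS
average_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

print(f"{events_per_station_per_day:.1f} events per station per day")  # 33.3
print(f"{average_events_per_second:.1f} events per second on average")  # 5.8
```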
Analyze, transform, and enrich
Eventhouse continuously enriches live telemetry with asset data from OneLake, combining real-time signals with historical context to produce analysis-ready datasets. This enables:
-
Station specs and capabilities — connector types, power ratings, and operational context.
-
Location and network data — geographic and topology details for precise situational awareness.
-
Historical performance patterns — past usage and operational trends for identifying recurring issues.
-
Maintenance records — service history and schedules to anticipate and prevent failures.
-
User behavior analytics — charging habits and preferences to improve experience and utilization.
Aggregated usage data correlates with live energy rates to power:
-
Real-time cost calculations — immediate billing and pricing adjustments based on live consumption.
-
Usage pattern analysis — peak demand identification and demand forecasting.
-
Network load balancing — dynamic distribution to prevent congestion and maximize efficiency.
-
Performance monitoring — real-time KPI tracking to detect anomalies and maintain service levels.
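The real-time cost calculation in the list above amounts to pricing each metered event with the energy rate that was valid at that moment. Here is a minimal sketch of that lookup in plain Python, assuming rates arrive as a time-sorted feed. The helper `rate_at`, the station IDs, and every number are hypothetical; in Fabric the same join would more likely be expressed in KQL or in an Eventstream transformation.

```python
from bisect import bisect_right


def rate_at(rate_times, rate_values, timestamp):
    """Return the most recent rate published at or before `timestamp`."""
    pos = bisect_right(rate_times, timestamp)
    if pos == 0:
        raise ValueError("no energy rate available before this timestamp")
    return rate_values[pos - 1]


# Hypothetical rate feed: minutes since midnight and price per kWh, sorted by time.
rate_times = [0, 360, 1020]
rate_values = [0.18, 0.25, 0.32]

# Hypothetical metering events: (station_id, minutes since midnight, kWh delivered).
events = [
    ("station_001", 300, 10.0),
    ("station_001", 400, 20.0),
    ("station_002", 1100, 5.0),
]

# Accumulate the cost per station using the rate valid at each event time.
cost_per_station = {}
for station_id, minute, kwh in events:
    cost = kwh * rate_at(rate_times, rate_values, minute)
    cost_per_station[station_id] = cost_per_station.get(station_id, 0.0) + cost
```

Here `station_001` is charged at 0.18 for the early session and 0.25 for the later one, so a price change mid-day is reflected without reprocessing earlier events.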
Train
Microsoft Fabric's Data Science capabilities let you build, train, and score ML models on both historical and live data. Key models include:
-
Demand forecasting — predict charging demand by location, time of day, and season. Anticipate weekend surges near shopping centers or evening peaks in residential areas.
-
Availability forecasting — identify where to add capacity and which areas are underserved, using usage trends, geographic data, and network performance.
-
Predictive maintenance — flag equipment failures before they happen by analyzing maintenance records, live performance metrics, and environmental factors. Less downtime, longer equipment life.
-
Energy cost optimization — forecast energy costs and usage trends to implement dynamic pricing, balance load, and
Technical benefits
E-mobility network intelligence
-
Sub-second monitoring across thousands of stations
-
ML-driven forecasts for demand, availability, and maintenance
-
Unified platform integrating telemetry, energy rates, and asset data
-
Full drilldown from network overview to individual socket
Automated operations
-
Real-time alerts to field technicians for faults and anomalies
-
Trigger-based workflows for maintenance, capacity, and service events
-
Predictive models for proactive station management
-
Dynamic pricing, capacity, and schedule optimization
Analytics and business intelligence
-
Live cost calculations correlating usage with energy rates
-
High-granularity BI with direct query on real-time data
-
Natural language queries via KQL Copilot
-
Cross-system correlation linking live events with history and asset data
Efficiency and revenue
-
Predictive maintenance reduces downtime and costs
-
Demand forecasting maximizes utilization and revenue
-
Real-time availability monitoring improves customer experience
-
Energy cost analytics optimize consumption and margins
Powerful Alerts without Alert Fatigue
The true power of this service lies in its ability to handle stateful transitions, enabling you to act on significant changes without alert fatigue. Rather than firing on every fluctuation, it tracks how conditions evolve over time — and only reacts when something meaningful has actually changed.
Consider an EV charging network: each charging station is modeled as an object with the station's ID as the key, and Activator accumulates the state of each station — current number of available chargers, recent charging activity, and more — independently. A rule like "Alert if a station has no available chargers for 30 minutes" is then evaluated per station, using that station's own event history and state.





This per-object design is what makes the approach powerful. Each charging station or individual charging port (identified by a unique ID) becomes an active object with its own stream of availability and status readings, allowing the system to detect patterns and anomalies at the individual level rather than across a noisy aggregate.
In effect, the system gives every object its own memory. It continuously monitors conditions — charger availability, fault states, inactivity, trends — and acts on them in isolation for each instance. This is what allows Activator to reliably trigger the right action (an alert, a notification, or a workflow) for the right entity at the right time, and not a moment sooner.
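To make the per-object idea concrete, here is a minimal sketch of the rule "alert if a station has no available chargers for 30 minutes". It is an illustration of the concept, not how Activator is implemented: the class `NoChargerRule`, its fields, and the sample readings are all assumptions. Each station keeps its own state, and an alert fires once per outage rather than on every reading.

```python
class NoChargerRule:
    """Fire once per outage when a station has had zero chargers for `hold_minutes`."""

    def __init__(self, hold_minutes=30):
        self.hold_minutes = hold_minutes
        self.empty_since = {}  # station_id -> minute the outage started
        self.alerted = set()   # stations already alerted for the current outage

    def on_event(self, station_id, minute, available):
        """Process one reading; return an alert string or None."""
        if available > 0:
            # Condition cleared: reset this station's state only.
            self.empty_since.pop(station_id, None)
            self.alerted.discard(station_id)
            return None
        start = self.empty_since.setdefault(station_id, minute)
        if minute - start >= self.hold_minutes and station_id not in self.alerted:
            self.alerted.add(station_id)
            return f"{station_id}: no available chargers for {minute - start} minutes"
        return None


rule = NoChargerRule()
# Hypothetical readings: (station_id, minute, available chargers).
readings = [
    ("S1", 0, 0), ("S2", 0, 2), ("S1", 15, 0), ("S2", 15, 0),
    ("S1", 30, 0), ("S1", 45, 0), ("S2", 40, 1), ("S1", 50, 3),
]
alerts = [a for a in (rule.on_event(*r) for r in readings) if a]
```

Station S1 triggers exactly one alert at minute 30 and stays silent at minute 45, while S2 recovers before its 30 minutes elapse and never alerts. Note that this sketch only evaluates the rule when a reading arrives; a production engine would also need a timer to catch stations that stop reporting.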

Working with data streams presents fundamentally different challenges than batch processing. Streams are continuous, with no predetermined start or end point. While events typically arrive in chronological order, network latency, system failures, and distributed sources can cause out-of-order delivery, late-arriving data, and duplicates.
For this example, I'm using a straightforward streaming intelligence approach. A more advanced alternative — the Lambda architecture — combines batch and stream processing for large-scale, low-latency workloads, but it comes with significant setup and maintenance overhead. For most use cases, the simpler approach is sufficient.
It supports all the key operations you'd need:
-
aggregation
-
filtering
-
grouping
-
joins
-
field management
-
unions
-
splits
-
row expansion
Fabric's Eventstream also covers all five stream analytics windowing functions:
-
tumbling
-
hopping
-
sliding
-
session
-
snapshot
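The difference between the first two window types is easiest to see by asking which windows a single event falls into. The sketch below is an assumption-laden illustration, not Eventstream code: the function names are hypothetical, timestamps are plain seconds, and windows are treated as half-open intervals [start, end), which may differ from the boundary convention a given engine uses.

```python
def tumbling_window(ts, size):
    """Return the single (start, end) window that contains timestamp `ts`."""
    start = (ts // size) * size
    return (start, start + size)


def hopping_windows(ts, size, hop):
    """Return every (start, end) window of length `size`, starting every `hop`, that contains `ts`."""
    windows = []
    # Latest window start at or before ts, aligned to the hop.
    start = (ts // hop) * hop
    # Walk backwards while the window still covers ts (start + size > ts).
    while start > ts - size:
        if start >= 0:
            windows.append((start, start + size))
        start -= hop
    return sorted(windows)


one_window = tumbling_window(125, size=60)
overlapping = hopping_windows(125, size=60, hop=30)
```

An event at second 125 belongs to exactly one 60-second tumbling window, (120, 180), but to two hopping windows, (90, 150) and (120, 180), because hopping windows overlap whenever the hop is shorter than the window.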
Processing the IoT Data
For this example, I'm using Fabric's built-in bicycle demo data.

Step 1 — Save the raw data
The raw data is saved to a dedicated table in a KQL database. This serves two purposes: it acts as a backup if you need to reprocess the data, and it enables ML workloads — since Eventhouses support Notebooks (e.g., PySpark for datasets exceeding 100M rows), you can run machine learning algorithms directly on the raw stream.

Step 2 — Transform the streaming data
The transformation aggregates available bike counts by pickup point and time window. The key components are: the sum of available bikes, a group-by on BikepointID, and a tumbling (fixed) window for the time dimension. The result is saved as a new table in the same Eventhouse — keeping it separate from the raw data.
The output shows available bikes per station in real time. Since continuous manual monitoring isn't practical, the next step is setting up automated alerts.
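The Step 2 transformation can be mirrored in a few lines of plain Python to show what the aggregation computes. This is a sketch under assumptions: `BikepointID` comes from the description above, while the field names `No_Bikes` and `ts`, the 60-second window, and the sample events are assumptions for illustration. In Fabric, the same logic is configured in the Eventstream editor rather than written by hand.

```python
from collections import defaultdict


def aggregate_bikes(events, window_seconds=60):
    """Sum available bikes per (BikepointID, tumbling window start)."""
    totals = defaultdict(int)
    for event in events:
        # Fixed, non-overlapping windows: each event maps to exactly one.
        window_start = (event["ts"] // window_seconds) * window_seconds
        totals[(event["BikepointID"], window_start)] += event["No_Bikes"]
    return dict(totals)


# Hypothetical events; `ts` is seconds since the start of the stream.
events = [
    {"BikepointID": "BikePoints_1", "ts": 5, "No_Bikes": 4},
    {"BikepointID": "BikePoints_1", "ts": 50, "No_Bikes": 6},
    {"BikepointID": "BikePoints_2", "ts": 30, "No_Bikes": 10},
    {"BikepointID": "BikePoints_1", "ts": 70, "No_Bikes": 3},
]
per_window = aggregate_bikes(events)
```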


Anomaly Detection in Real-Time Intelligence
Note: This feature is still in preview as of May 2026.
Microsoft's anomaly detection is generally quite capable — Power BI uses a similar engine to identify trends, seasonality, and noise. In most cases, an out-of-the-box solution like this is preferable to building a custom one, particularly for real-time data where a continuously running algorithm can become costly. That said, the capacity unit (CU) costs for Fabric's anomaly detection aren't fully clear to me yet.

Real-Time Alerts with Fabric Activator
Rather than monitoring data manually, Fabric Activator lets you define alerts triggered by specific conditions at the pickup point level. Configuration involves four elements: the alert trigger, the condition, the action (email or Teams message), and the alert message text.

Operations agents (currently in preview as of May 2026) in Fabric Real-Time Intelligence help organizations turn real-time data into immediate, actionable decisions. Rather than relying on manual monitoring and intervention, agents continuously track key metrics, surface insights, and recommend targeted actions — enabling teams to respond faster and optimize operations at scale.
Each operations agent is a dedicated Fabric item, scoped to a specific business process. By configuring agents with clear goals, instructions, and data sources, you can deploy multiple agents as virtual experts across your organization. This modular approach ensures that every critical process is monitored and continuously improved, with recommended actions always aligned to your strategic objectives.
The following levels illustrate how agents can grow in sophistication — from simple threshold alerts to complex, cross-domain reasoning.
Level 1 — Single source, single trigger, predefined rule
"Warehouse stock for Product X in Stuttgart has dropped below 500 units. Recommended action: reorder from primary supplier."
The operations agent watches one metric, knows one threshold, and suggests one action. No reasoning, no context, no relationships. It is a sophisticated alert system, but one that acts: with Power Automate connected, the reorder is placed automatically the moment the operator clicks Yes. No system login, no manual entry.
Level 2 — Multiple internal sources, pattern recognition, contextual recommendation
"Warehouse stock in Stuttgart is declining 40% faster than the seasonal norm. Current supplier lead time has increased from 12 to 19 days over the past month. At the current consumption rate, stockout risk is high in 18 days. Recommended action: accelerate the next scheduled order and notify logistics."
The agent is now connecting internal dots — inventory trends, supplier performance history, consumption patterns — and reasoning across them. Still entirely inside your own data, no external signals yet. And when the operator clicks Yes, Power Automate orchestrates the response: an approval request goes to the procurement manager via Teams, the purchase order is raised in SAP on approval, the logistics team is notified by email, and the lead time risk is flagged in the supplier dashboard. All without leaving the chat.
Level 3 — Internal data + external signals + geopolitical risk mapping + predictive action
"A typhoon warning has been issued for southern Taiwan. Two of your active suppliers are registered in that region and together cover 60% of your resin intake. You have 11 days of buffer. Your next-best alternative has a 3-week lead time. Recommended action: trigger a contingency order today."
The agent now crosses the boundary of your own data — correlating supplier locations against live external event feeds, calculating exposure as a share of total intake, and stress-testing your buffer against realistic lead times. Internal and external signals converge into a single risk picture, with a recommended action delivered before the disruption reaches you.
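The arithmetic behind the Level 2 and Level 3 recommendations is a comparison between days of cover and replenishment lead time. The sketch below shows only that comparison and says nothing about how an operations agent reasons internally. The function names are hypothetical, and the stock and consumption values are invented so that they reproduce the 18-day and 11-day figures quoted in the messages above.

```python
def days_until_stockout(stock_units, daily_consumption):
    """Days of cover at the current consumption rate."""
    if daily_consumption <= 0:
        raise ValueError("daily_consumption must be positive")
    return stock_units / daily_consumption


def recommend(stock_units, daily_consumption, lead_time_days):
    """Recommend an action when replenishment cannot arrive before stockout."""
    cover = days_until_stockout(stock_units, daily_consumption)
    if cover < lead_time_days:
        return f"order now: {cover:.0f} days of cover, {lead_time_days} days lead time"
    return f"no action: {cover:.0f} days of cover, {lead_time_days} days lead time"


# Level 2 figures from the text: 18 days to stockout, lead time now 19 days.
# The stock and consumption values are hypothetical, picked to give 18 days.
level_2 = recommend(stock_units=900, daily_consumption=50, lead_time_days=19)

# Level 3 figures from the text: 11 days of buffer, alternative lead time 3 weeks.
# 1100 units at 100 per day is a hypothetical way to express an 11-day buffer.
level_3 = recommend(stock_units=1100, daily_consumption=100, lead_time_days=21)
```

In both cases the cover is shorter than the lead time, which is why the sample messages recommend acting immediately rather than waiting for the next scheduled order.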
Example:
First, we need to create an Eventstream (data streaming) and an Eventhouse (storing the streamed data) with some simple filters:

Next, we define the job for the Operations Agent and the actions it should take:

Below is a sample message from the Operations Agent (generally in Teams):

In the message, (1) is the action recommended by the Operations Agent and (2) is the approval prompt: once a human approves, the action can be triggered in Power Automate.
We will build the Medallion Architecture: a layered data structure designed to progressively refine raw data into reliable, analysis-ready information. It consists of three distinct layers, each serving a specific purpose in the data lifecycle.
Bronze Layer: Raw Data
This is the entry point for all incoming data. Sources such as logs, files, and event streams land here in their original, unmodified form — including incomplete records or inconsistencies. Nothing is altered or discarded at this stage. Preserving the raw data in its original format ensures a reliable audit trail and gives you the ability to reprocess or revalidate data at any point in the future.
Silver Layer: Cleaned and Structured Data
In this layer, raw data is validated, cleaned, and restructured. Duplicates are removed, errors are corrected, and data types and formats are standardised to make the data suitable for analysis. For example, customer records might be normalised to ensure that names, addresses, and identifiers are consistently represented across all records. The Silver layer is where raw noise becomes trustworthy, structured information.
Gold Layer: Refined Data for Reporting and Analytics
The Gold layer contains data that is fully refined and ready for business use. It is optimised for fast queries and typically holds pre-aggregated datasets and business-specific enrichments. For instance, it might contain summarised sales figures broken down by region and product category, ready to feed dashboards, reports, and machine learning models. This is the layer that end users and analytical tools interact with most directly.

The Medallion Architecture brings structure and discipline to the challenge of managing large volumes of data. By processing data in clearly defined stages, it ensures that everyone working with the same dataset — whether building dashboards, training models, or making business decisions — is working from the same high-quality, well-understood information.
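The three layers can be illustrated with a toy example in plain Python. This is a sketch of the idea only: in Fabric each layer would be a set of Delta tables written from a Notebook, typically with PySpark. The record layout, the functions `to_silver` and `to_gold`, and all values are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records exactly as they arrived, including a duplicate and a bad row.
bronze = [
    {"order_id": "1", "region": " north ", "category": "Dairy", "amount": "10.50"},
    {"order_id": "1", "region": " north ", "category": "Dairy", "amount": "10.50"},
    {"order_id": "2", "region": "South", "category": "dairy", "amount": "4.00"},
    {"order_id": "3", "region": "North", "category": "Produce", "amount": "n/a"},
]


def to_silver(rows):
    """Deduplicate on order_id, standardise text fields, and cast amount to float."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # invalid rows stay in Bronze for later reprocessing
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().title(),
            "category": row["category"].strip().title(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Pre-aggregate sales by region and category for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["category"])] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze is never modified, so the rejected row with the unparseable amount remains available if the cleaning rules change later.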
How It Works in Microsoft Fabric
In Fabric, each layer corresponds to a dedicated Lakehouse in its own Workspace. Data moves forward through the layers via Notebooks orchestrated by Pipelines. The typical pattern is a Notebook triggered by a Pipeline, with the source Lakehouse attached as the input and the destination Lakehouse written to explicitly via its abfss:// path. This approach produces physical Delta tables at each layer that are fully independent of one another.
The layers also map naturally to deployment stages:
-
Development — Bronze Lakehouse only
-
Test — Bronze and Silver Lakehouses
-
Production — Bronze, Silver, and Gold Lakehouses
Why Not Use Shortcuts?
Shortcuts might seem like a convenient way to make Bronze data available in Silver without copying it, but they undermine the architecture's core guarantee. A Shortcut is a live pointer, not a copy — if the Bronze data changes or is deleted, Silver reflects that change immediately. For the Medallion Architecture to work correctly, each layer must hold its own independent, stable copy of the data. Shortcuts break that contract and should be avoided for inter-layer data movement.
Moving from "classic" Power BI to Microsoft Fabric introduces a set of challenges that go beyond learning new tools. The underlying architecture shifts, the governance model changes, and many assumptions baked into existing PBIX files no longer hold. Here's a breakdown of the key challenges, specifically around semantic models.
1. New mode choices and when to use them
Fabric introduces several new connectivity options: Direct Lake, Lakehouse-backed models, Warehouse-backed models, and semantic link. Teams already familiar with Import and DirectQuery now need to evaluate when each mode is appropriate, and existing models designed around Import or DirectQuery may not map cleanly to Direct Lake best practices, which depend on well-structured Delta tables, proper partitioning, and V-Order optimization.
2. From PBIX-centric to data-estate-centric
Traditional Power BI workflows kept the model and report together in a single PBIX file. Fabric pushes in a different direction: data lives centrally in OneLake (via a Lakehouse or Warehouse), transformations move upstream into Dataflows Gen2, Data Pipelines, or Notebooks, and the semantic model becomes a thin layer on top.
In practice, this means untangling complex Power Query (M) logic that was embedded in existing models and re-implementing it further upstream. "Golden datasets" need to be redesigned as Lakehouse or Warehouse tables paired with a dedicated semantic model. There is no shortcut for this — it requires deliberate re-architecture, not migration.
3. Feature gaps between Desktop and the web editor
Fabric's web-based semantic model editor has matured considerably, but it is not yet fully at parity with Power BI Desktop. Some advanced modeling features, complex M authoring, and niche capabilities still require working in Desktop. Operations that felt straightforward in a PBIX can feel fragmented across tools — M versus PySpark versus pipelines — which can slow teams down or force awkward mixed workflows during the transition.
4. Governance and sharing
Fabric encourages a model where a single, centrally governed semantic model is shared across reports using Build permissions, often with data and reports living in separate workspaces. For organizations with many report-specific datasets, this is a structural shift.
Row-level security, certifications, and workspace permissions all need to be re-established under the new capacity structure. It also means being deliberate about avoiding chained semantic models — where one model sits on top of another — a pattern that crept into some Power BI deployments and becomes harder to manage in Fabric.
5. Capacity and performance
Fabric runs on Capacity Units (CUs) shared across all workloads: Lakehouse, Warehouse, semantic models, Notebooks, and more. This is a meaningful change from how Power BI Premium managed capacity.
The failure modes are also new. A poorly written Notebook or a heavy lake query can consume CUs and starve a semantic model mid-report. Keeping Direct Lake models performant requires tuning at the lake layer (Delta table layouts, partitioning strategies, and V-Order optimization) rather than simply optimizing DAX or data model structure.
6. Refresh and data synchronization
Direct Lake removes the need for scheduled refresh — the model reads directly from OneLake and data is live. But migration complicates the picture. Teams need to decide, model by model, whether to keep Import (with incremental refresh), move to Direct Lake, or adopt a Composite approach.
Mixed scenarios — where some sources are in Fabric and others are external — may still require classic refresh patterns and on-premises gateways. Debugging also changes: instead of reading dataset refresh logs, teams need to learn how to diagnose performance at the lake and query level.
7. Skill gaps
Fabric is a broader platform than Power BI, and it asks more of the people using it. BI teams fluent in Desktop, M, and SQL now need working knowledge of Spark, Notebooks, and Lakehouse concepts to get the most out of semantic models in Fabric. At the same time, data engineers and scientists can now produce tables that feed directly into semantic models, which means tighter coordination between teams — and more opportunities for things to go wrong if that coordination is absent.
8. Migration logistics
Existing semantic models cannot simply be lifted and shifted into Fabric's recommended patterns. There is no automated path from a complex PBIX with heavy Power Query logic to a Lakehouse-first architecture with a lean semantic model on top. After re-platforming, DAX measures, RLS rules, and report behavior all need to be validated carefully, which typically means running old and new environments in parallel for a period.
A practical example: rethinking bidirectional relationships
One concrete modeling decision that comes up frequently in migrations is the handling of bidirectional relationships.
In Power BI, bidirectional cross-filtering between a fact table and a dimension is sometimes used to solve specific calculation problems. A common case is a simple star schema where FactSales has a two-way relationship with DimProduct. The motivation is usually something like a share-of-category measure: the denominator needs to ignore the product filter while the numerator respects it. Without bidirectional filtering, this can be awkward to express in DAX alone — though CROSSFILTER() offers an alternative.

The problem with bidirectional relationships is that they create ambiguous filter paths. In a complex model, the query engine may not resolve filter direction predictably, leading to unexpected results and harder-to-debug DAX. For that reason, Fabric and Power BI guidance generally recommends limiting bidirectional relationships and using CROSSFILTER() in DAX where needed instead.
But there is a cleaner structural solution: promote the Category attribute out of DimProduct into its own DimProductCategory table, and add CategoryKey as a foreign key directly on FactSales. Every relationship in the model is now single-direction.

The benefits are tangible. Filter paths are unambiguous. Share-of-category measures become straightforward because the category filter hits the fact table directly. Query performance improves because the engine no longer needs to resolve cross-filter direction at runtime.
The trade-off is a slightly more normalized fact table — one extra foreign key per row — and a requirement that CategoryKey is populated correctly during ingestion, ideally by looking it up from DimProduct in the pipeline so there is never a mismatch between the two.
This pattern has a long history in dimensional modeling: when an attribute on a dimension is primarily doing the work of filtering facts, it belongs closer to the fact table. Fabric's architecture, with transformation logic living upstream in pipelines and Notebooks, makes it easier than ever to implement this kind of design correctly from the start.
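The ingestion-time lookup and the resulting share-of-category calculation can be sketched in plain Python. The table and column names follow the example above (`FactSales`, `DimProduct`, `CategoryKey`); the `ProductKey` column, the sample rows, and the function `share_of_category` are assumptions for illustration. In a real model the measure would be written in DAX and the lookup would run in a pipeline or Notebook.

```python
# Hypothetical DimProduct rows: ProductKey -> attributes.
dim_product = {
    1: {"name": "Milk", "CategoryKey": 10},
    2: {"name": "Cheese", "CategoryKey": 10},
    3: {"name": "Apples", "CategoryKey": 20},
}

# Hypothetical incoming sales rows before enrichment.
raw_sales = [
    {"ProductKey": 1, "amount": 60.0},
    {"ProductKey": 2, "amount": 40.0},
    {"ProductKey": 3, "amount": 50.0},
]

# Ingestion step: copy CategoryKey from DimProduct onto each fact row,
# so the two tables cannot disagree about a product's category.
fact_sales = [
    {**row, "CategoryKey": dim_product[row["ProductKey"]]["CategoryKey"]}
    for row in raw_sales
]


def share_of_category(fact_rows, product_key):
    """Sales of one product divided by sales of its whole category."""
    category_key = next(
        r["CategoryKey"] for r in fact_rows if r["ProductKey"] == product_key
    )
    product_total = sum(r["amount"] for r in fact_rows if r["ProductKey"] == product_key)
    category_total = sum(r["amount"] for r in fact_rows if r["CategoryKey"] == category_key)
    return product_total / category_total


milk_share = share_of_category(fact_sales, product_key=1)
```

Because the category key sits on the fact rows, the denominator is a plain filter on the fact table and no filter has to travel back through the product dimension.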
After 10+ years in data analytics, I've never seen a more exciting platform for doing data science in production — whether you're a small company or a large enterprise. The reason is simple: everything lives in one place. No switching between tools. From data ingestion to data science, pipelines, semantic models, and ultimately Power BI, it's all connected.
Data Science in Microsoft Fabric
For data enrichment and business insights, Microsoft Fabric offers a complete data science experience — enabling end-to-end workflows built directly on governed enterprise data in OneLake. This means you can access curated datasets, shared data, and model predictions without ever moving data between systems.
Data engineers, data scientists and business analysts work on the same platform. Sharing and collaboration become seamless across roles: analysts can share Power BI reports and datasets with data science teams, and hand-offs during problem formulation are far smoother. Cross-tenant data sharing in OneLake even enables multi-organization collaboration, giving data science teams governed access to datasets from external partners or subsidiaries.
Sample dataset (mockup Walmart sales): What will my store sell next week?

Data Discovery and Preprocessing
The Lakehouse resource is the primary way to interact with data in OneLake. It attaches directly to a notebook, making it easy to read data into a pandas DataFrame for exploration without any extra setup.
OneLake shortcuts extend this further by providing no-copy access to data stored in external systems or other Fabric workspaces and tenants. You can attach a shortcut to a Lakehouse and read that data in notebooks — no duplication, no ETL required.
For data ingestion and orchestration, Fabric's natively integrated data pipelines make it straightforward to build workflows that access and transform data into formats ready for machine learning.
