With the recent General Availability of Microsoft Fabric, we hear assertions that it is (or will become) a preferred option over leading Data & AI Platforms such as Databricks. Some are unsure whether it is better to adopt one platform exclusively or to use both in some sort of hybrid approach. Both provide a wide array of data management features, including Data Integration, Engineering, Warehousing, Analytics, and AI/ML, and there is considerable overlap.
Databricks, the established incumbent and pioneer of the unified Lakehouse, has been a trailblazer in combining data lake and warehouse features. Microsoft Fabric, a rebranding and attempted integration of Synapse and other existing Azure services, is Microsoft’s new Lakehouse platform offering. It is important to look beyond marketing hype and assess whether Fabric is truly ready for production Enterprise workloads.
In this blog, we offer our insights from working with clients using Databricks and Microsoft technologies. We assess these platforms on their current features, architecture, pricing, maturity, and compliance to help businesses make informed decisions.
Choosing the Right Foundation
Compute and Cost
Fabric abstracts away underlying compute dimensions through consolidated “Capacity Unit” SKUs that bundle all its supported processing requirements. In other words, there is a single “dial” to turn regardless of whether the workload involves Apache Spark, Synapse Real-Time Analytics, SQL, BI, Synapse Data Warehousing, or Synapse Data Science. Outside of this, there is limited ability to configure specific compute options.
In contrast, Databricks compute is a function of the specific workload (e.g. job compute, Databricks SQL, Model Serving, etc.) allowing for more granular control over decisions that affect cost and performance. Users can tailor instance types, node sizes, auto-scaling policies, and other compute parameters to match workload demands.
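As a rough illustration of that granularity, the sketch below assembles a cluster definition as a plain Python dictionary. The field names (spark_version, node_type_id, autoscale, autotermination_minutes) follow the Databricks Clusters API as we understand it. The helper build_cluster_spec and every value passed to it are hypothetical and chosen only for illustration; nothing here calls Databricks.

```python
def build_cluster_spec(name, node_type_id, min_workers, max_workers,
                       spark_version, idle_minutes=30):
    """Assemble a cluster definition with explicit sizing choices.

    Hypothetical helper: it only builds a dictionary and calls no Databricks API.
    """
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Autoscaling bounds: the cluster grows and shrinks within this range.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after this many idle minutes to limit cost.
        "autotermination_minutes": idle_minutes,
    }


# Illustrative values only; valid runtime and node type names depend on the workspace.
nightly_etl = build_cluster_spec(
    name="nightly-etl",
    node_type_id="Standard_DS3_v2",
    min_workers=2,
    max_workers=8,
    spark_version="13.3.x-scala2.12",
)
```

Each of these fields is a separate decision that can be tuned per workload, which is the kind of control a single capacity dial does not offer.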
Fabric’s capacity model introduces Smoothing and Throttling, which further complicate the single dial and can adversely impact workload performance. Unless you can predict variable demand perfectly, you are likely either to pay extra for unused capacity or to miss SLAs.
Overall, this distinction means Fabric users sacrifice granular control in the name of simplicity (leading to higher cost of ownership and inconsistent performance), while Databricks provides more flexibility and more configurable control of the options that affect performance and cost. For customers who need or prefer the simplicity of abstraction, Databricks serverless provides this.
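The tradeoff between paying for unused capacity and missing SLAs can be made concrete with a toy calculation. The sketch below is not a model of Fabric billing: it deliberately ignores Fabric’s actual smoothing and throttling rules and simply compares a fixed capacity against a made-up hourly demand profile. The function capacity_tradeoff and the demand figures are assumptions for illustration.

```python
def capacity_tradeoff(hourly_demand, capacity):
    """Return (unused_units, unmet_units) for a fixed capacity.

    unused_units: capacity paid for but not consumed, summed over all hours.
    unmet_units: demand above capacity, summed over all hours.
    Toy model only; it does not implement smoothing or throttling.
    """
    unused = sum(max(capacity - demand, 0) for demand in hourly_demand)
    unmet = sum(max(demand - capacity, 0) for demand in hourly_demand)
    return unused, unmet


# Hypothetical demand in capacity units for four consecutive hours.
demand = [2, 4, 10, 3]

for capacity in (4, 5, 10):
    unused, unmet = capacity_tradeoff(demand, capacity)
    print(f"capacity={capacity}: unused={unused}, unmet={unmet}")
```

With this profile, sizing for the peak (10 units) leaves 21 unit-hours unused, while sizing at 4 units leaves 6 unit-hours of demand unmet. No single fixed setting avoids both costs when demand is spiky.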
The ultimate purpose of the lakehouse is to ingest and store analytics-ready data in open formats accessible to various engines. Fabric and Databricks are each backed by cloud object stores. Fabric builds OneLake on Azure Data Lake Storage (ADLS) Gen2 and stores data in the Delta Lake open format, which was built and open-sourced by Databricks. Databricks is multi-cloud and runs on all major cloud object stores, including Amazon S3, ADLS Gen2, and Google Cloud Storage (GCS). For optimal integration and interoperability, it is critical that Microsoft keep pace with Delta Lake rather than fall behind or maintain its own flavor.
One of the most glaring differences between these platforms is how they currently connect to external (or federated) data sources, including external databases, cloud storage accounts, and other components such as Power BI and Azure Data Explorer, without having to move the data into Lakehouse storage. While some might say these sources represent something of an anti-pattern for a unified lakehouse (contributing to data silos, for example), they are often a necessary fact of life.
Databricks Unity Catalog supports such sources via Foreign Catalogs, Volumes, and External Locations, all of which are fully governed and enforce object-level security. In Fabric, the same is accomplished via OneLake Shortcuts, which, as of this writing, have serious documented security risks: users can create backdoors that circumvent the controls that Fabric governance and compliance and Microsoft Purview are supposed to enforce.
User Interface & Experience
Databricks and Fabric each provide central management consoles offering a unified view and access through the concept of workspaces. These provide an integrated set of capabilities for data engineers, analysts, and data scientists that can break down user silos while still guaranteeing the necessary isolation and security.
Fabric distinguishes between native functionality (such as notebooks, Spark jobs, and pipelines) and product experiences (such as Power BI and Synapse). Product experiences sometimes feel like incomplete ports of existing Microsoft services into Fabric rather than complete integrations, and some have functionality gaps compared to their standalone versions. For example, some analytic flows in Power BI may not function the same as they do when invoked within Fabric’s interface. This highlights Fabric’s challenge of retrofitting pre-existing Azure services under the Fabric umbrella and raises doubts about Fabric users’ ability to build end-to-end products without similar challenges.
Databricks is a purpose-built environment that combines data engineering, data science, and analytics into a single experience. It is capable of integrating the same product experiences offered by Fabric via standalone Azure services (e.g. Azure Data Factory, Azure Machine Learning, Power BI, Azure Data Explorer, etc.), as well as offering hundreds of other partner integrations (including Fivetran, dbt, Tableau and many others) through Databricks Partner Connect.
Databricks is a first-party service on Azure and supports all three major cloud providers (Azure, AWS, and Google Cloud). This multi-cloud flexibility protects against Hyperscaler lock-in, permits infrastructure selection based on performance and costs, and allows for the distribution of workloads to different availability zones for disaster recovery. Microsoft Fabric is naturally restricted to Azure, which may lock organizations into a single ecosystem and severely limit their long-term options and flexibility.
Governance and Compliance
One of the most fundamental disparities between the two platforms seems to be how each treats governance and compliance. Unity Catalog is at the heart of Databricks and this core functionality is what enables everything within the platform, including tracing data lineage, enabling discovery, building a semantic layer, accomplishing data classification, documentation, security, and fine-grained access control. Databricks is extending its leadership position by delivering the world’s first intelligent lakehouse and is already infusing AI throughout the platform based on this strength, including AI assistants, optimizations, and natural language queries.
Microsoft Fabric requires integration with an external Microsoft service, Purview, and as of this writing, there are major issues with this integration, including limited administrative oversight, data collection issues, processing bottlenecks, data transfer delays, and other operational challenges. For Fabric Copilot to deliver the same high-quality insights and AI capabilities as the Databricks platform, this integration needs to be seamless, and it is not. This should lead users to question Fabric’s ability to produce quality semantic knowledge, and it represents a significant risk and a clear differentiator between the platforms.
Maturity of Microsoft Fabric
This section lists a few other anecdotal points offering insight into the current maturity of Microsoft Fabric.
Performance Tradeoffs and Queue Management
Microsoft Fabric cannot partition capacity into distinct compartments with user-defined resource constraints or workload priorities. As a result, incoming analytical workloads may encroach upon one another and impede existing production workloads. This interference can hamper business user productivity and contribute to missed SLAs.
Multiple Engine Disparity
Fabric treats lakehouses and warehouses as different entities, meaning you have to choose one for any particular workload and then live with the limitations. For example, if your data resides in a warehouse then you cannot write to it with Spark.
GenAI and ML Capabilities
Microsoft Fabric does not share the same GenAI and ML capabilities as Databricks and has to be rounded out and augmented by other Azure services such as Azure Machine Learning and Azure AI Services, resulting in additional silos, complexity, and integration points that make the development experience more difficult. Without these, Fabric lacks a natively integrated model registry, monitoring, model governance, real-time serving, and feature stores that enterprises require. These missing components for robust MLOps limit Fabric’s viability for scaling AI/ML and delivering models to production.
Fabric Lifecycle Management is in Preview. Fabric Deployment Pipelines have existing limitations and do not fully support all Fabric items, so it may not be possible to fully automate continuous integration and deployment. For example, Microsoft Fabric does not expose APIs for programmatic control over serverless resources, requiring manual administrative actions rather than automation or orchestration integration.
Microsoft Fabric currently lacks native capabilities for secure and scalable Data Sharing across organizational boundaries, similar to what Delta Sharing and Clean Rooms offer in Databricks.
The Microsoft Fabric data load wizard cannot reapply past configurations through its UI, forcing manual JSON pipeline edits later, and limiting usability versus scripting data loads directly.
Security and Secrets
Microsoft Fabric currently provides limited support for managed identities and integration with Azure Key Vault, causing enterprise concerns around default security and secrets management capabilities.
Our experience is that Microsoft Fabric is currently not ready for Production workloads. The platform struggles with integration issues and the need to stitch together additional Microsoft services to overcome feature limitations. These limitations represent a tangible risk to you in several ways, including security, cost, complexity, and quality. Ultimately this makes it more challenging for your team to deliver quality products.
Databricks is and will remain the leader of the Unified Lakehouse (and Data Intelligence Platform) for the foreseeable future. It continues to be a solid foundation for your Data & AI platform, especially when it comes to building enterprise-grade solutions requiring security, reliability, scalability, automation, and a strong developer and user experience. Microsoft Fabric and Databricks can complement one another in the right circumstances, but Databricks should remain the foundation.