Difference between Data Warehouse and Data Mart




Data Warehouse:

  • Holds multiple subject areas
  • Holds very detailed information
  • Works to integrate all data sources
  • Does not necessarily use a dimensional model but feeds dimensional models.

Data Mart

  • Often holds only one subject area- for example, Finance, or Sales
  • May hold more summarised data (although many hold full detail)
  • Concentrates on integrating information from a given subject area or set of source systems
  • Is built focused on a dimensional model using a star schema.

Sum of data marts is not equal to data warehouse…

How does Apache Cassandra stores table data internally

From http://planetcassandra.org/making-the-change-from-thrift-to-cql/


It is important to understand that the CQL data representation does not always match the underlying storage structure. This can be challenging for those accustomed to Thrift-based operations, as those were performed directly against the storage layer. But CQL introduces an abstraction on top of the storage rows, and only maps directly in the simplest of schemas.

SQL Server 2014: Tempdb hidden performance enhancement feature




SQL Server has had a concept of eager writes for many versions.  The idea is to prevent flooding the buffer pool with pages that are newly created from bulk activities, and need to be written to disk.  Eager writes help reduce the pressure on lazy writer and checkpoint as well as widening the I/O activity window, allowing for better performance and parallel usage of the hardware.

The change in SQL Server 2014 is to relax the need to flush these pages, as quickly, to the TEMPDB data files.  When doing a select into … #tmp … or create index WITH SORT IN TEMPDB the SQL Server now recognizes this may be a short lived operation.   The pages associated with such an operation may be created, loaded, queried and released in a very small window of time without going to disk.