I'm reading an article about Big Data, and the author briefly mentioned Data Warehouses before diving deeper into Big Data. However, I can't really see the difference between the two, since both seem to deal with large amounts of data.
Can someone help me understand the differences between a Data Warehouse and Big Data?
Big Data is not always about the volume of data. In fact, three key things define Big Data: Volume, Velocity, and Variety.
Big Data lets us not only work with huge amounts of data, but also work with data in near real time and store multiple types of data without having to define in advance what type of data to store, as a relational database system requires.
The Data Warehouse was introduced before Big Data, and it is also architected differently. In its early days, the Data Warehouse was introduced to solve the problem of having multiple data sources to deal with.
For example, company A sells its products in 3 territories. The company has 3 offices, one responsible for each territory, and each office has built its own data source to store sales and customer data (say each data source also has a different architecture). If the headquarters needs reports and analytics based on all three data sources, it would consider building a data warehouse to handle this kind of job.
If company A decides to build a data warehouse, the warehouse will store only the data that is needed, in a well-defined format, extracted from each of the different data sources, and then generate reports and analytics on an ad hoc basis. This is called the Extract -> Transform -> Load, or ETL, architecture.
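To make the ETL flow concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three office layouts, the field names, and the use of an in-memory SQLite database as a stand-in for the warehouse. The point is only that each source is reshaped into one agreed format before it is loaded.

```python
import sqlite3

# Hypothetical raw exports from the three offices, each with its own layout.
north_rows = [{"customer": "Ann", "amount_usd": 120.0}]
south_rows = [("Bob", "80.50")]  # (name, amount stored as text)
east_rows = [{"client_name": "Cy", "total_cents": 4550}]


def extract_transform():
    """Extract from each source and transform to (territory, customer, amount)."""
    for row in north_rows:
        yield ("north", row["customer"], float(row["amount_usd"]))
    for name, amount in south_rows:
        yield ("south", name, float(amount))
    for row in east_rows:
        yield ("east", row["client_name"], row["total_cents"] / 100)


def load(conn, rows):
    """Load already-cleaned rows into a table with a fixed schema."""
    conn.execute(
        "CREATE TABLE sales (territory TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


# SQLite in memory stands in for the warehouse (an assumption for this sketch).
warehouse = sqlite3.connect(":memory:")
load(warehouse, extract_transform())

# Reports run against the uniform, pre-transformed table.
report = warehouse.execute(
    "SELECT territory, SUM(amount) FROM sales "
    "GROUP BY territory ORDER BY territory"
).fetchall()
print(report)
```

Note that the schema of the `sales` table has to be decided before anything is loaded, which is the defining constraint of this approach.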
Big Data comes with a different architecture. With the three characteristics mentioned earlier, Big Data allows all 3 offices in company A to store all their data in a single database. If any party needs reports or analytics, they can simply ask the database to handle the job. This type of architecture is called Extract -> Load -> Transform, or ELT.
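A rough sketch of the ELT idea in Python follows. The record layouts, the `raw_events` table, and the helper names are all hypothetical, and an in-memory SQLite table holding JSON text is only a small stand-in for a real Big Data store. What it shows is the ordering: raw records are loaded untouched, and the transformation happens later, when someone asks for a report.

```python
import json
import sqlite3

# One shared store that accepts any record shape (stand-in for a data lake).
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE raw_events (source TEXT, payload TEXT)")

# Hypothetical raw records; each office keeps its own field names and units.
raw = [
    ("north", {"customer": "Ann", "amount_usd": 120.0}),
    ("south", {"name": "Bob", "amount": "80.50"}),
    ("east", {"client_name": "Cy", "total_cents": 4550}),
]

# Extract + Load: store the payloads as-is, with no upfront schema.
store.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(source, json.dumps(payload)) for source, payload in raw],
)


def amount_of(source, payload):
    """Transform step, applied at read time: normalise the sale amount."""
    if source == "north":
        return float(payload["amount_usd"])
    if source == "south":
        return float(payload["amount"])
    if source == "east":
        return payload["total_cents"] / 100
    raise ValueError(f"unknown source: {source}")


def sales_by_territory(conn):
    """Build the report by transforming raw payloads on demand."""
    totals = {}
    rows = conn.execute("SELECT source, payload FROM raw_events")
    for source, text in rows:
        amount = amount_of(source, json.loads(text))
        totals[source] = totals.get(source, 0.0) + amount
    return totals


totals = sales_by_territory(store)
print(totals)
```

Because nothing is discarded at load time, a different report can later apply a different transformation to the same raw records.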