Data Warehouse as a Service
Data warehouses have traditionally meant expensive dedicated hardware and hefty software licensing fees. You pay upfront for both the hardware and the software, plus the cost of setting up and installing them. You also need DBA and networking teams in place to ensure smooth deployment and ongoing maintenance.
Small enterprises often cannot afford data warehouses, and so lose their competitive edge vis-à-vis larger organizations.
For larger organizations, the challenges are different. While enterprise data grows at an average of 50% year on year, data warehousing capacity is not growing at the same pace. As a result, a lot of data is left out of the data warehousing and business intelligence process.
Enter Amazon Redshift.
Amazon Redshift is the cloud data warehousing service from Amazon Web Services. It is a fully managed, petabyte-scale data warehouse.
Amazon Redshift turns data warehousing economics upside down. The best thing about it is that you can provision a warehouse within minutes, doing away with the routine heavy lifting of setting up hardware and installing software before you can start using it.
With Redshift, there is no upfront investment in hardware or software. It is a pay-as-you-go service, priced so that analyzing all your data is affordable. It is fast, and it is cheaper than most options available in the market today.
Key features of Amazon Redshift
Redshift reduces I/O operations
Redshift provides columnar data storage. With columnar storage, all values for a particular column are stored contiguously on disk in sequential blocks.
Compared to traditional row-based storage, columnar storage reduces the number of I/O requests made to the disk, because a query reads only the columns it actually references rather than entire rows. It also reduces the amount of data loaded from disk, which improves processing speed because more memory is left available for query execution.
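To make the difference concrete, here is a toy sketch in Python. It is not how Redshift lays out data internally; the table, the column names and the "values read" counter are all made up for illustration. It simply counts how many values each layout has to touch in order to sum a single column.

```python
# Toy comparison of row-based and columnar layouts.
# Illustrative only: the data and the read counter are assumptions,
# not Redshift's actual storage format.

rows = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 42.0},
]


def to_columnar(rows):
    """Store all values of each column contiguously, one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_row_store(rows, column):
    """A row store fetches every field of every row to read one column."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk
        total += row[column]
    return total, values_read


def sum_column_store(columns, column):
    """A column store reads only the requested column."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columnar(rows)
row_total, row_reads = sum_row_store(rows, "amount")
col_total, col_reads = sum_column_store(columns, "amount")
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts return the same total, but the row layout touches nine values while the columnar layout touches three. On a wide table with hundreds of columns, that gap is where the I/O savings come from.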
Because similar data is stored sequentially, Redshift can compress it efficiently. Compression further reduces the amount of I/O required for queries.
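Why does storing similar values next to each other help compression? A minimal sketch using run-length encoding shows the idea. Run-length encoding is chosen here only because it is easy to follow; the column contents are hypothetical, and this is not a description of which encoding Redshift would pick for your data.

```python
# Run-length encoding of a single column, as a simple illustration of
# why contiguous, similar values compress well. The sample column is
# made up for this example.

def rle_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original column."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


region_column = ["EU"] * 4 + ["US"] * 3 + ["APAC"] * 2
encoded = rle_encode(region_column)
print(encoded)
print(len(region_column), "values stored as", len(encoded), "runs")
```

Nine values collapse into three runs, and decoding restores the column exactly. In a row-based layout the same region values would be interleaved with other fields, so runs like these would not form.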
Redshift is implemented using a massively parallel processing architecture
Amazon Redshift has a massively parallel processing (MPP) architecture. MPP enables Redshift to distribute and parallelize queries across multiple nodes. Apart from queries, the MPP architecture also enables parallel operations for data loads, backups and restores.
The Redshift architecture is inherently parallel, so end users do not need to do any additional tuning or take on extra overhead to distribute the load.
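The general MPP pattern can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the node count, the modulo-based distribution and the use of threads to stand in for nodes are all illustrative choices, not Redshift internals. Rows are spread across nodes, each node computes a partial result on its own slice, and the partial results are then combined.

```python
# Simplified model of distribute-then-aggregate parallelism.
# NUM_NODES, the modulo distribution, and threads standing in for
# nodes are assumptions made for this sketch.
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size


def distribute(rows, num_nodes):
    """Assign each (key, amount) row to a node by key modulo node count."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[key % num_nodes].append((key, amount))
    return nodes


def partial_sum(node_rows):
    """Work done independently on one node's slice of the data."""
    return sum(amount for _, amount in node_rows)


def parallel_total(rows, num_nodes=NUM_NODES):
    """Run partial_sum on every slice concurrently, then combine."""
    nodes = distribute(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(partial_sum, nodes))
    return sum(partials)  # final combine step


rows = [(order_id, order_id * 10) for order_id in range(1, 101)]
total = parallel_total(rows)
print(total)
```

The caller only asks for a total; the splitting and recombining happen behind the function call, which mirrors the point that the parallelism is not something the end user has to orchestrate.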
Redshift has security built in
As with all other AWS services, Amazon provides various security features for Redshift. Access control can be maintained at the account level using IAM roles. For database-level access control, you can define Redshift database groups and users and restrict access to specific databases and tables.
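As a rough illustration of group-based database access, the Python helper below builds the SQL statements for a read-only group. The group, user and table names are hypothetical, the users are assumed to exist already, and the helper only generates text; you would still run the statements yourself through a SQL client with sufficient privileges.

```python
# Builds SQL for a read-only database group. All names passed in are
# hypothetical examples; the function only returns strings and does
# not connect to any database.
import re

_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def _check(name):
    """Reject anything that is not a plain lowercase identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def read_only_group_sql(group, users, tables):
    """Return statements that create a group, add users, and grant SELECT."""
    group = _check(group)
    statements = [f"CREATE GROUP {group};"]
    for user in users:
        statements.append(f"ALTER GROUP {group} ADD USER {_check(user)};")
    for table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {_check(table)} TO GROUP {group};"
        )
    return statements


statements = read_only_group_sql("analysts", ["alice", "bob"], ["orders"])
for statement in statements:
    print(statement)
```

Granting to a group rather than to individual users keeps the permissions in one place: adding a new analyst later is a single ALTER GROUP statement, with no per-table grants to repeat.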