Related posts for ‘#Blog’

How to avoid becoming the next Ticketmaster or Santander: Public vs private cloud ETL tools in data breaches

A hack against customers of cloud storage firm Snowflake looks like it may turn into one of the largest data breaches in history. Discover how it happened in the context of private vs public cloud ETL tools, and learn how to choose the best solutions for your company.

What happened to Ticketmaster and Santander?

Hacker group ShinyHunters claims to be selling 560 million records from Ticketmaster and 30 million from Santander.

Snowflake, a public cloud services company that lets companies store massive datasets on its servers, revealed that hackers had been attempting to access its customers’ accounts using login details stolen from one of its employees.

Snowflake initially said that a “limited number” of customer accounts had been accessed. Since then, however, cybercriminals have publicly claimed to be selling stolen data relating to Snowflake accounts, and outlets including TechCrunch and Wired have reported that hundreds of Snowflake customer passwords have been found online.

There’s also a snowball effect at play here: in recent days, a BreachForums account with the handle Sp1d3r has posted about two more companies whose data it claims is related to the Snowflake incident:

1) Financial services company LendingTree and its subsidiary QuoteWizard (an alleged 190 million customers’ details).

2) Automotive giant Advance Auto Parts (an alleged 380 million customers’ details).

Not a good look for Snowflake, and certainly not a good outcome for the victim customers.

Let’s get into how situations like this can be prevented by understanding the difference between public vendor vs private cloud ETL tools.

Public vendor vs private cloud ETL tools

Firstly, some quick context:

– Public clouds host data infrastructure with internet-facing (public) endpoints on top of a cloud provider such as AWS, Azure or Google Cloud, which the likes of Snowflake use to provide services to Ticketmaster and Santander. Essentially, cloud computing resources are fully managed by these third parties in multi-tenant environments.

– A private cloud (also known as an internal cloud or corporate cloud), on the other hand, is a cloud computing environment in which all hardware and software resources are dedicated exclusively to a single company and any authorised partners.

– ETL stands for Extract, Transform, and Load. ETL tools are a set of software tools that are used to extract data from one or more sources, transform it into a consistent and clean format, and load it into a target system or database.
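
To make the three ETL stages concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the sample CSV data, the payments table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real target system. It is not how any particular ETL tool works internally.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from files, APIs or databases.
RAW_CSV = """id,email,amount
1, Alice@Example.com ,10.50
2,bob@example.com,
3,CAROL@example.com,7
"""


def extract(raw_text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim whitespace, lowercase emails, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        email = row["email"].strip().lower()
        cleaned.append((int(row["id"]), email, float(amount)))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, email TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    connection.commit()


def run_pipeline(raw_text, connection):
    """Run extract, transform and load in order; return the number of rows loaded."""
    records = transform(extract(raw_text))
    load(records, connection)
    return len(records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    print(f"Loaded {run_pipeline(RAW_CSV, conn)} rows")
```

Keeping the three stages as separate functions mirrors the definition above. Each stage can be swapped out, for example a different source or destination, without touching the others.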

Public Vendor Cloud ETL Tools are offered by third-party vendors and hosted on a public cloud platform, just as Snowflake’s database infrastructure is. These tools let users manage their data flow through one interface that links to a variety of data sources and destinations. They offer a wide range of integrations; examples include Stitch Data, Fivetran, Rivery, Keboola and many others.

Private Cloud ETL Tools, on the other hand, are hosted on a company’s private cloud infrastructure and offer more control over data security and compliance, as the data resides within the company’s private network. Private clouds are often thought of as less flexible than public vendor cloud ETL tools, but as the next sections show, this is definitely not always the case.

Pros and cons of private and public cloud ETL Tools

Public Vendor Cloud ETL Tools: Pros

  • Ease of Use: They are typically easier to set up and designed with user-friendliness in mind.
  • Scalability: They can easily scale up or down based on the data volume and computational needs.
  • Cost-Effective: They operate on a pay-as-you-go model, which can be more cost-effective for organisations with fluctuating data processing needs.
  • Maintenance: The vendor takes care of all the maintenance, updates, and upgrades.

Public Vendor Cloud ETL Tools: Cons

  • Data Security: Since data is stored in a shared environment, there are increased data security concerns, depending on cloud partners and SLAs.
    • With more people having access to a shared environment, security can be more ‘flaky’, as there are more opportunities for hackers and bad actors.
  • Limited Customization: They may not offer the same level of customization or control compared to private cloud ETL tools.
  • Performance: They may not be ideal for real-time or on-demand data access as they are typically geared towards batch processing.

Private Cloud ETL Tools: Pros

  • Data Security: They offer more control over data security and compliance, as the data resides within the company’s private network.
  • Scalability: Private cloud ETL tools can also easily scale up or down to handle fluctuating data volumes and processing demands – it just requires a partner/IT team that knows what they’re doing.
  • Customization: They offer a higher level of customization and control over the ETL process.

Private Cloud ETL Tools: Cons

  • Cost: They can be more expensive to set up and maintain, as they may require dedicated infrastructure and resources.
  • Maintenance: They require ongoing maintenance and management, which can be resource-intensive in cases of sub-optimal planning.
  • Dependency: Performance, quality, flexibility and support SLAs are highly infrastructure- and partner-dependent.
    • All the advantages of public cloud infrastructure can be achieved in private cloud environments using reputable ETL tools, but not every company has the expertise to do so.
    • Pro Tip: If you’re considering private cloud infrastructure, insist on a partner with a track record of success and strong support SLAs.

When to consider private cloud ETL Tools

Central to this decision are, of course, company size, budget, compliance requirements and general ‘data risk’ (which is higher for larger companies in general, and especially for industries like healthcare). These are the key considerations for when private cloud ETL tools should be considered:

  • Data Security and Compliance: If your organisation has stringent data security requirements or needs to comply with specific regulations, private cloud ETL tools can provide more control over data security and compliance.
  • Customization: If your organisation requires a high level of customization in the ETL process, private cloud ETL tools can offer more flexibility.
  • Resource Availability: If your organisation has the necessary resources (like a dedicated IT team or reputable partner) to set up and maintain a private cloud ETL tool, it can be a viable option.
  • Data Volume: If the volume of data you’re processing is relatively small, or you don’t need real-time updates, private cloud ETL tools that improve your current ETL process might be a better option at a lower cost.
  • Destination of Your Data: If you have specific requirements about the intended destination of your data after processing, private cloud ETL tools often provide more control.

The future-proof conclusion on private cloud ETL tools

It’s true that most ETL tools harnessing private cloud infrastructure come with cost and maintenance burdens, so the approach and choice of partner should always be central to the decision. Most of the time, these burdens are a result of poor planning and sub-optimal architecture.

Because the team at Matatika understand the merits of both options, and are well versed in harnessing ETL tools in various cloud contexts and industries, we offer both private and public cloud solutions. A key difference is that we offer both with the same high levels of support, security, upgrades, performance, scalability, and operational stability.

In fact, Matatika provides the complete set of ETL tools to rapidly load data from 500+ sources into your data warehouse. From there, we specialise in not only ensuring data from multiple sources is accurate and consolidated, but also transforming the data in innovative ways using reputable ETL tools. From smart visualisation to leading BI insights delivered by advanced AI analysis, Matatika deliver solutions which are efficient, scalable, built-for-purpose and, most importantly, ready for the future of tech and AI.

If you’d like to get more detailed information, feel free to download the “Ultimate Guide to Getting More from Your Data”, or if you’d like to discuss any data opportunities your company has right now and get some helpful consultation, use the button below.

Ultimate Guide to ETL Tools for Modern Business Intelligence

This blog aims to clarify what ETL has meant traditionally, and what modern ETL tools can do for operational efficiency and tech-ROI right now. We’ll also go over the key considerations to help you opt for a top set of tools, and importantly – the right partner...


Your guide to loading quality test data

Help! What do we mean by test data? Test data should be realistic in volume, shape, and freshness. Often, when developing a new model or dashboard, it is these unexpected elements of the data that make a project unstable or cause stakeholders to lose trust in your analytics because something “doesn’t look right”. Keep their trust by developing with realistic, but safe to use and sanitised, test data.