CIOs: 5 Big Data Operational Changes To Make Now

Operational Changes for Big Data

Preparing Your Organization for GDPR Compliance

The threat of a $24 million fine is enough to make any organization sit up and pay attention to the changes it must make to comply with new European Union data protection law. But in preparing for the General Data Protection Regulation (GDPR), are U.S. companies focusing too much on the “data” in their big data clusters? David Dingwall, of Fox Technologies, believes so. He says getting these clusters through GDPR compliance depends on some fundamental technical setups. Getting the “plumbing” wrong can undermine all that expensive compliance process review work and cause your organization to fail audit reviews.

The beauty of building extra-large Linux clusters is that it’s easy. Hadoop, OpenStack, hypervisor and HPC installers let you build on commodity hardware and handle node failure reasonably simply. However, a GDPR fine of up to €20 million (US$24 million), or 4% of global annual turnover if that is higher, does focus attention on how auditors will review your organization’s storage and processing of people-related data.

Most of the GDPR review articles you may have read in the last 12 months stress that privacy and encryption of people data are hugely important, and multiple layers of encryption for data at rest and in transit through your infrastructure are appropriate. However, with new big data infrastructures, a crucial area of audit concern is being clear about how the software manipulates, aggregates, anonymizes or de-anonymizes (soon to be illegal in the U.K.) people data.
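To make the anonymization point concrete, here is a minimal Python sketch of keyed pseudonymization, one common technique, shown as an illustration rather than as the method any particular platform uses. The key, the sample records and the function name `pseudonymize` are hypothetical. The sketch shows why auditors care about who can reach the key. Anyone holding it can recompute tokens for known identifiers and so re-identify people, which is also why GDPR still treats pseudonymized data as personal data.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; in practice it would live in a
# key-management system, and who can read it is itself an audit question.
PSEUDONYM_KEY = b"example-key-do-not-use"


def pseudonymize(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a personal identifier with a keyed HMAC-SHA256 token.

    The same identifier and key always produce the same token, so records
    can still be joined across data sets without exposing the raw value.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Hypothetical input records containing a direct identifier (email address).
records = [
    {"email": "Alice@example.com", "purchases": 3},
    {"email": "bob@example.com", "purchases": 1},
]

# Swap the identifier for its token before the data enters the cluster.
pseudonymized = [
    {"subject": pseudonymize(r["email"]), "purchases": r["purchases"]}
    for r in records
]
print(pseudonymized)
```

Because tokens are deterministic, the same person links across data sets. That linkability is useful for analytics but is exactly the re-identification risk an auditor will ask you to control.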

There are some key lessons from the financial services marketplace, which has been using Linux-based HPC and blade clusters for data modelling and forecasting for the last 15 years, especially in the operational planning and setup that make ongoing audit cycles easier to complete.

Big Data Cluster Fundamentals: The Large Sausage Machine Without Real People

It is tempting to build a new data-processing cluster on a standalone network to restrict data movement, with supplemental admin access through a second interface on the corporate LAN. Once a data work package is loaded, however, Hadoop and HPC clusters, much like Oracle databases in the past, tend to execute all of its data-transformation tasks under a single account (e.g., “hadoop”), not the submitting user’s ID.

Audit needs to prove not just how personal data is stored but also how it is manipulated. That includes understanding who on your staff can create, change or log in to these application-specific accounts or, worse, the operating system root account.
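As a starting point for answering “who can get in,” here is a minimal Python sketch that reads /etc/passwd- and /etc/group-format text and reports interactive accounts, UID 0 accounts and members of groups you consider sensitive. The sample data, the account names and the choice of “wheel” and “hadoop” as sensitive groups are hypothetical. Whether “wheel” grants sudo depends on your sudoers configuration. A real audit would also cover sudoers rules, directory services such as LDAP/AD, SSH authorized keys and every node in the cluster.

```python
from typing import Dict, List, Set

# Hypothetical sample data in /etc/passwd and /etc/group format.
PASSWD = """root:x:0:0:root:/root:/bin/bash
hadoop:x:1001:1001:Hadoop service:/home/hadoop:/bin/bash
alice:x:1002:1002:Alice Admin:/home/alice:/bin/bash
bob:x:1004:1004:Bob Operator:/home/bob:/bin/bash
svc-backup:x:1003:1003::/var/backup:/usr/sbin/nologin
"""
GROUP = """wheel:x:10:alice
hadoop:x:1001:alice,bob
"""

# Shells that do not permit an interactive login.
NON_LOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


def parse_passwd(text: str) -> List[List[str]]:
    """Split passwd-format text into its seven colon-separated fields per entry."""
    return [line.split(":") for line in text.splitlines()
            if line and not line.startswith("#")]


def group_members(text: str) -> Dict[str, Set[str]]:
    """Map each group name to its supplementary members (the fourth field)."""
    members: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, _password, _gid, users = line.split(":")
        members[name] = {user for user in users.split(",") if user}
    return members


def audit_report(passwd_text: str, group_text: str,
                 sensitive_groups: List[str]) -> dict:
    """Summarize who can log in, who has UID 0 and who is in sensitive groups."""
    entries = parse_passwd(passwd_text)
    members = group_members(group_text)
    return {
        "interactive_accounts": [e[0] for e in entries if e[6] not in NON_LOGIN_SHELLS],
        "uid0_accounts": [e[0] for e in entries if e[2] == "0"],
        "sensitive_group_members": {
            g: sorted(members.get(g, set())) for g in sensitive_groups
        },
    }


report = audit_report(PASSWD, GROUP, ["wheel", "hadoop"])
print(report)
```

In this sample, the shared “hadoop” account has a login shell, which is the kind of finding an auditor will want explained or removed.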

Here are the five big data operational changes to make.

  1. Too Many Setup Options, Not Enough Certified (People) Installers
  2. Ensure Your Administrators are Real People
  3. Gain Visibility into Your Organization’s SIEM and Track Correlated Events
  4. Give Auditors the Right Tools to Do their Jobs – Your Admin Staff are too Busy Running the Business!
  5. Certify Your Organization, Not Just the Big Data Cluster
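Items 2 and 3 come down to tying activity in shared accounts back to a named person. Here is a minimal Python sketch of that correlation, assuming classic syslog-style sudo entries. The exact log format varies with sudo version and logging configuration, so the regex, the sample lines and the function name `who_became` are illustrative assumptions. In production this correlation would normally run in your SIEM rather than in a standalone script.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

# Assumed layout of a syslog sudo entry, e.g.
# "... sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; USER=hadoop ; COMMAND=/bin/bash"
SUDO_RE = re.compile(
    r"sudo:\s+(?P<invoker>\S+) : .*?USER=(?P<target>\S+) ; COMMAND=(?P<command>.+)$"
)


def who_became(lines: Iterable[str],
               shared_accounts: Set[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each shared account to the (real user, command) pairs run as it via sudo."""
    found: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        match = SUDO_RE.search(line)
        if match and match.group("target") in shared_accounts:
            found[match.group("target")].append(
                (match.group("invoker"), match.group("command"))
            )
    return dict(found)


# Hypothetical log excerpt: two sudo entries and one direct SSH login.
SAMPLE_LOG = [
    "Mar  1 10:02:11 node01 sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; USER=hadoop ; COMMAND=/bin/bash",
    "Mar  1 10:05:42 node01 sudo:      bob : TTY=pts/1 ; PWD=/home/bob ; USER=root ; COMMAND=/usr/bin/systemctl restart sshd",
    "Mar  1 10:07:03 node01 sshd[4121]: Accepted publickey for hadoop from 10.0.0.5 port 52144 ssh2",
]

print(who_become := who_became(SAMPLE_LOG, {"hadoop", "root"}))
```

The third line is a direct login to the shared “hadoop” account. It has no named invoker to correlate, so activity in that session cannot be attributed to a real person from this log alone.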

To read the full article with expanded explanations of each item above, visit
