Big Data: Securing Access and Taking Responsibility

Big Data has been a critical trend in IT over the past decade, and as more data is constantly created, the question for businesses is where it is stored, who has access to it and how they access it. Dan Raywood looks into these issues.

Over the past decade, the concept of Big Data has become more prevalent as more data is created and technologies are required to analyse and process it.

To determine the challenges, it is first important to understand what Big Data is. According to a 2001 white paper by Doug Laney, Big Data is defined by three characteristics: Volume, in that it consists of enormous quantities of data; Velocity, as it is created in real time; and Variety, as it comes in structured, semi-structured and unstructured types.

A blog by SAS attributed the increase in volume to organisations collecting data from various sources, including transactions, smart (IoT) devices, industrial equipment, videos, images, audio, social media and more. The amount of data created by these sources has also increased the velocity, as “data streams into businesses at an unprecedented speed and must be handled in a timely manner.”

SAS also pointed to two more elements: Variability, as data flows are unpredictable, and businesses need to manage daily, seasonal and event-triggered peak data loads; and Veracity, referring to data quality. “Because data comes from so many different sources, it’s difficult to link, match, cleanse and transform data across systems.”

Big Data therefore consists of a very large number of data instances, and pooling them has created ‘data lakes’ that are proving hard to manage.

Earlier this year, I attended a conference in London where one of the speakers, Tim Ayling, VP EMEA data security specialists at Imperva, talked about the challenges of Big Data. He claimed we “don’t spend enough time looking at data because of sheer complexity” and asked the audience if they could identify where their data is, who has access to it, and how they access it.

In a conversation with ISMS after the event, we asked him why he thinks managing and controlling access is a big challenge for businesses. He claims that the amount of data is among the reasons for the problem. There is “a dawning for people that data is now everywhere, and there’s so much of it and some of it we know about, some of it is shadow data which could be anywhere and it’s just been created without any process behind it.” This has caused the large variety of data businesses now have to deal with.

He claims that before Big Data became the norm, data was often in Oracle or SQL databases, but now it is structured and unstructured and in databases that companies don’t even know about. 

On the access side, the challenge is that multiple stakeholders have access, and that is where control is lost. Ayling says humans want to make things as easy as possible for themselves, and, despite corporate policy, employees will still use consumer tools to transfer files for the sake of efficiency.

Rowenna Fielding, director of Miss IG Geek Ltd, says that access to ‘Big Data’ is no different from access controls for any other information asset – it should be on a need-to-know and least-privilege basis.

“The challenge with the fashion for ‘data lakes’ is that purposes are often speculative, i.e. ‘run it all through an algorithm and see if anything useful turns up’, which makes it difficult to verify and limit access by purpose,” she says.

“Ideally, a ‘Big Data’ organisation should have a team of ethics-aware and law-savvy data analysts who can work with stakeholders to turn enquiries into data queries (and be able to advise on limitations or biases within the set) rather than granting direct access, but the proliferation of cloud platforms are designed to make data mining easy and accessible, [and this] deter organisations from imposing – or even considering – putting restrictive controls in place.”

An option for managed access could be a role-based policy. Spectral says this can help control access over the many layers of Big Data pipelines. “The principle of least privileges is a good reference point for access control by limiting access to only the tools and data that are strictly necessary to perform a user’s tasks.”
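To make the idea concrete, here is a minimal sketch of a role-based, least-privilege check in Python. It is an illustration only, not how Spectral or any particular platform implements it: the role names, dataset names and the `is_allowed` helper are all hypothetical. The point it shows is deny-by-default, where a user can only do what one of their roles explicitly grants.

```python
from dataclasses import dataclass

# Hypothetical roles mapped to the (dataset, action) pairs they grant.
# Anything not listed here is denied by default.
ROLE_PERMISSIONS = {
    "analyst": {("sales_events", "read")},
    "data_engineer": {
        ("sales_events", "read"),
        ("sales_events", "write"),
        ("raw_logs", "read"),
    },
}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


def is_allowed(user: User, dataset: str, action: str) -> bool:
    """Allow only if one of the user's roles explicitly grants the pair."""
    return any(
        (dataset, action) in ROLE_PERMISSIONS.get(role, set())
        for role in user.roles
    )


alice = User("alice", frozenset({"analyst"}))
print(is_allowed(alice, "sales_events", "read"))   # True
print(is_allowed(alice, "sales_events", "write"))  # False
print(is_allowed(alice, "raw_logs", "read"))       # False
```

Because permissions hang off roles rather than individuals, tightening or widening access means editing one mapping rather than chasing down every user.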

Are least privilege, need-to-know access, or even Zero Trust, achievable in practice? Ayling says the obvious answer is ‘yes’ because it means everyone has the correct privileges, “but dreaming about it and doing it are very different things.”

The average user is granted access to files and applications, but how often do they call the helpdesk or an administrator to say they no longer need that access? Ayling says: “We’ve all been at places where you need a level of access for something for one particular task, and the easiest thing is to give you global access, and you’re in, and you literally got everything because that’s the easy thing to do. That’s not unusual at all.”
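Since users rarely hand access back voluntarily, one common mitigation is a periodic review that flags grants nobody has used for a while. The sketch below is a hypothetical example in Python: the grant records, the `stale_grants` helper and the 90-day threshold are assumptions for illustration, and a real review would draw on whatever access logs an organisation actually keeps.

```python
from datetime import datetime, timedelta


def stale_grants(grants, now, max_idle=timedelta(days=90)):
    """Return grants that were never used or not used within max_idle."""
    return [
        g for g in grants
        if g["last_used"] is None or now - g["last_used"] > max_idle
    ]


# Hypothetical access grants with the last time each was exercised.
now = datetime(2024, 6, 1)
grants = [
    {"user": "alice", "resource": "sales_events", "last_used": datetime(2024, 5, 20)},
    {"user": "bob", "resource": "raw_logs", "last_used": datetime(2023, 11, 1)},
    {"user": "carol", "resource": "hr_records", "last_used": None},
]

for g in stale_grants(grants, now):
    print(g["user"], g["resource"])
# bob raw_logs
# carol hr_records
```

The flagged grants are candidates for review rather than automatic revocation, since some access is legitimately used only occasionally.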

On controlling access to Big Data, Fielding says it is “a mistake to be too prescriptive about control methods and mechanisms because there’s no one-size-fits-all answer.”

She says the same principles of confidentiality, integrity and availability apply just as much to large-scale, informally-structured data as they do to spreadsheets or emails or laptops; however, the process of figuring out who needs what, when and where gets more complicated.

Access management is more than just knowing who has access and ensuring that attackers do not get into your network. It is about limiting access to data and ensuring that employees can only view information relevant to their work. This is a critical element of ISO 27001 Annex A.9, and as Big Data becomes more embedded in how the world works, limiting access to it should be part of your risk strategy.

Big Data is one of the defining IT trends of this millennium, and we in cybersecurity have focused on analysing it, storing it and accessing it. That final point is the most critical for data and information security: access controls must be put on Big Data at the earliest opportunity, because its volume, velocity and variety are constantly increasing, and you don’t want to be too late to take action.
