A Bird’s Eye View of Data Governance

Samyukta Hariharan
3 min read · Feb 15, 2024


Data governance is in global focus today. Its scope is vast, and the ground to cover keeps expanding. In this article, I would like to describe what data governance looks like in a broad sense and offer some actionable techniques for working with safe and useful data.

Strengthening our roots with Data Quality

It is common knowledge today that data is the most important factor in a machine learning pipeline, and that we cannot improve our ML systems without quality data. It is therefore important to understand what quality data comprises.

Broadly, the data quality framework can be categorized into the following subdomains:

  1. Granularity

Database level → Table Level → Record Level → Metadata Level

One may ask, why is metadata level quality important?

It can be some of the most important data you need to preserve, because metadata plays multiple roles:

> It provides the context for the relationships between datasets, including but not limited to the source, identifiers, authors and so on. This is important for lineage: when a data quality issue arises at the source, metadata lets us trace it back.

> Meta-learning is a key field in the Artificial Intelligence domain that uses metadata about machine learning algorithms to optimize learning with less data. Imagine how much research could be compromised by faulty metadata!
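To make the lineage role concrete, here is a minimal sketch of tracing a dataset back to its source through metadata. The catalog, the dataset names and the `derived_from` field are all hypothetical and chosen only for illustration; real metadata catalogs store much more.

```python
# Hypothetical metadata catalog: each dataset records where it came from.
CATALOG = {
    "raw_orders": {"source": "orders_api", "derived_from": None},
    "clean_orders": {"source": "cleaning_job", "derived_from": "raw_orders"},
    "monthly_report": {"source": "reporting_job", "derived_from": "clean_orders"},
}


def trace_lineage(dataset, catalog):
    """Walk the parent links from a dataset back to its original source."""
    path = []
    current = dataset
    while current is not None:
        path.append(current)
        current = catalog[current]["derived_from"]
    return path


lineage = trace_lineage("monthly_report", CATALOG)
print(" <- ".join(lineage))
```

If a quality issue shows up in `monthly_report`, the path tells us which upstream datasets to inspect. If the `derived_from` metadata were missing or wrong, this walk would stop early or point to the wrong place.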

  2. Quality Metrics

Consistency, Completeness, Accuracy, Validity → Integrity.

Wait, data needs integrity? I thought that was a human characteristic!

Just as people need integrity to carry out tasks in the right manner throughout their lives, data integrity ensures that all quality metrics of the data are maintained over its complete life cycle.
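As a rough illustration, two of these metrics can be measured on a handful of records. This is only a sketch under assumptions: the records, the email pattern and the age range are hypothetical, and real validity rules depend on your own data.

```python
import re

# Hypothetical records; None marks a missing value.
RECORDS = [
    {"id": 1, "email": "asha@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": 29},
    {"id": 3, "email": None, "age": 41},
    {"id": 4, "email": "lee@example.com", "age": -5},
]

# Deliberately simple pattern, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records, field):
    """Fraction of records where the field is present (not None)."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def validity(records, field, is_valid):
    """Fraction of non-missing values that pass the given rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


email_completeness = completeness(RECORDS, "email")
email_validity = validity(RECORDS, "email", lambda v: bool(EMAIL_PATTERN.match(v)))
age_validity = validity(RECORDS, "age", lambda v: 0 <= v <= 120)

print(email_completeness, email_validity, age_validity)
```

Tracking numbers like these over time, rather than checking them once, is closer to what integrity asks for: the metrics should hold across the whole life cycle of the data.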

Looking into the broader aspects with Data Security

Data security plays a crucial role in protecting data both during transmission and in storage.

When data passes over the network, for example between us and a website or app we are using, it is important that it is properly encrypted, so that confidential data does not leak and stays hidden from attackers.

When data is stored within databases, confidentiality is again key. Personal information, passwords, and so on need to be encrypted and safeguarded against breaches. Network breaches not only compromise personal data, but can also lead to availability issues, since an attacker may cut off the end users’ access to the data network.
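For passwords specifically, a common practice is to store a salted one-way hash rather than reversible encryption, so that even a leaked database does not reveal the original passwords. The sketch below uses PBKDF2 from Python's standard library; the iteration count and the function names are my own assumptions, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Only the salt and digest are stored, never the password itself.
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Other personal information that must be read back later (addresses, phone numbers) needs reversible encryption with properly managed keys, which is outside what this small sketch covers.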

Now at the actual bird’s eye view with Data Privacy

The phrase Data Privacy is fairly self-explanatory: keeping private data safe within an organization. But how does this differ from merely encrypting data within databases?

There are two aspects to data privacy.

The first is implemented by organizations as part of their compliance and regulatory requirements.

When customers share their private details with a website or organization, they do so trusting the “Privacy Policy” that the organization publishes. This Privacy Policy covers the way private data is going to be stored and used by the organization.

The second is implemented by countries to safeguard the data of their citizens.

Some examples of regulations with international implications are as follows:

  1. European Union’s GDPR (General Data Protection Regulation)
  2. California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA)
  3. Australia’s Privacy Act 1988
  4. India’s Digital Personal Data Protection Act (DPDP)

These regulations provide guidelines for how organizations should handle the data of their citizens. This makes it very important for organizations to follow the rules that apply to each of their users, based on the country that user comes from. All these regulations share common themes (with exceptions as well):

  1. Consent: Users should be allowed to give and withdraw consent for their data to be shared, or even used to provide personalized services.
  2. Data minimization: Personal data should be used only for the purpose it was collected for, and only the minimum data needed for that purpose should be collected. This way any breach has less impact on the end users.
  3. Rights: Users should have the right to modify the data they submit, as well as to delete it if they wish.
  4. Global consideration of laws: Even if an organization is based outside the country, as long as a citizen uses its product, the organization has to follow these rules.
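The first three themes can be sketched in a few lines of code. This is a simplified illustration, not how any regulation requires things to be built: the `UserRecord` class, the purposes and the field mapping are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    user_id: str
    data: dict
    consents: set = field(default_factory=set)  # purposes the user agreed to

    def grant(self, purpose):
        self.consents.add(purpose)

    def withdraw(self, purpose):
        self.consents.discard(purpose)


# Hypothetical mapping from each purpose to the only fields it needs.
FIELDS_BY_PURPOSE = {
    "shipping": {"name", "address"},
    "personalization": {"purchase_history"},
}


def data_for_purpose(record, purpose):
    """Return only the fields a purpose needs, and only with consent."""
    if purpose not in record.consents:
        raise PermissionError(f"no consent for {purpose!r}")
    allowed = FIELDS_BY_PURPOSE[purpose]
    return {k: v for k, v in record.data.items() if k in allowed}


def erase(store, user_id):
    """Honor a deletion request by removing the user's record."""
    store.pop(user_id, None)


user = UserRecord(
    "u1",
    {"name": "Asha", "address": "12 Main St", "purchase_history": ["book"]},
)
store = {user.user_id: user}

user.grant("shipping")
shipping_view = data_for_purpose(user, "shipping")
print(shipping_view)
```

Here consent is checked per purpose, each purpose sees only the fields it needs, and `erase` stands in for the right to deletion. The fourth theme, which laws apply to which user, is a legal question that code like this cannot settle on its own.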

There are of course a lot of intricacies to these rules that are beyond the scope of this article.

Thanks for reading!


Bibliography:

https://blog.netwrix.com/2023/09/18/international-data-privacy-laws/#:~:text=In%20total%2C%2071%25%20of%20countries,ensure%20compliance%20with%20these%20laws.

https://www.wilmerhale.com/en/insights/blogs/wilmerhale-privacy-and-cybersecurity-law/20230818-india-passes-long-awaited-privacy-law

Written by Samyukta Hariharan

Research Engineer in AI/Data. Learning and writing about all things Data.
