I recently wrote a post about how the semantic layer can really help with information security. When I posted it on LinkedIn, I tagged a few data security experts who I knew would be interested. Bart Vandekerckhove was one of them, and he replied agreeing about the potential of semantic layers for infosec and mentioning a term I hadn’t heard before: ABAC (Attribute-based Access Control).
I ask Bart some questions below on ABAC, RBAC and what he thinks the role of semantic layers will be in infosec, and in relation to these frameworks.
Bart, can you tell us a bit about yourself? Background, what you do now, skillset etc?
Before co-founding Raito, I built out Collibra’s privacy offering as Sr. Product Manager Privacy. Before that, I was a Sr. Financial Risk Consultant at Deloitte. I’ve always been fascinated by ways you can have strong risk management while fostering innovation. It’s a very tight rope to walk, but I’m convinced it’s essential for an organisation’s success.
What problems have you been seeing in infosec that led you to research ABAC?
We focus on access management for BI and AI, where you’re working with enormous volumes of data and lots of change. Particularly for larger organisations, we noticed that today’s IAM technology does not perform well in dynamic data stacks, resulting in very slow and expensive access management workflows. The symptoms are well known: data consumers have to wait a long time for access to data, and data engineers have to spend an inordinate amount of time on access management. A more modern approach to data access management, one built on collaboration and automation and tightly integrated with DevOps, is more scalable. ABAC is part of that.
Can you compare it to RBAC? Why is it better?
Each has its merits. ABAC is essentially a set of policies that dynamically grant or deny access, or mask data, based on data and user attributes. You define them centrally and they’re automatically enforced across your data. RBAC is in a way also a codification of policies, but it is more flexible than ABAC. You codify roles that correspond to how the business uses data, whether in daily processes or dedicated projects, and then grant those roles permissions on data. However, contrary to ABAC, you can easily add users and groups to those roles or remove them. Under ABAC, a user’s permissions only change when their attributes (e.g. their department, seniority or region) change. This flexibility makes RBAC the more popular model, but it exposes the organisation to role explosion, which you don’t have under ABAC. The most flexible way of managing access is Access Control Lists (ACLs), where individual users are given access to individual objects. That flexibility makes ACLs ideal for very dynamic organisations, but they become unwieldy very quickly and can eventually pose significant security risks.
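To make the contrast concrete, here is a minimal Python sketch of the two checks. Everything in it (the `User` and `Dataset` classes, the attribute names, the `sales_analyst` role and the policy itself) is a hypothetical illustration, not Raito's implementation or any product's API.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    department: str
    region: str


@dataclass
class Dataset:
    name: str
    domain: str
    region: str
    contains_pii: bool = False


# RBAC: a role holds fixed permissions, and membership is edited by hand.
role_permissions = {"sales_analyst": {"orders_eu"}}
role_members = {"sales_analyst": {"alice"}}


def rbac_allows(user: User, dataset: Dataset) -> bool:
    """Allow if the user belongs to any role that was granted the dataset."""
    return any(
        user.name in members and dataset.name in role_permissions.get(role, set())
        for role, members in role_members.items()
    )


def abac_allows(user: User, dataset: Dataset) -> bool:
    """One central (hypothetical) policy evaluated on user and data attributes."""
    return (
        user.department == dataset.domain
        and user.region == dataset.region
        and not dataset.contains_pii
    )


alice = User("alice", department="sales", region="EU")
bob = User("bob", department="sales", region="EU")
orders_eu = Dataset("orders_eu", domain="sales", region="EU")

# Under RBAC, bob has no access until someone adds him to the role.
print(rbac_allows(alice, orders_eu), rbac_allows(bob, orders_eu))  # True False
role_members["sales_analyst"].add("bob")
print(rbac_allows(bob, orders_eu))  # True

# Under ABAC, bob's access follows from his attributes alone.
print(abac_allows(bob, orders_eu))  # True
bob.region = "US"
print(abac_allows(bob, orders_eu))  # False
```

The RBAC path changes only when someone edits role membership, while the ABAC path changes only when an attribute changes, which is the trade-off between flexibility and central control described above.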
Does the semantic layer with RBAC on entities implement ABAC by default?
Partly. By granting the roles permissions on the semantic layer, you’re making the scope of the role dynamic. The advantage is that by dynamically granting access based on the semantic meaning of the underlying data, your data warehouse is secure by design and your permission model adapts to change because it is not tightly coupled to the physical data. In this way, RBAC on the semantic layer gets close to the strengths of ABAC. However, ABAC is not limited to using the attributes of the data (i.e. semantic meaning). It also uses the user attributes, and can even include device attributes and the time of day. Nevertheless, RBAC on the semantic layer already makes for a very scalable model. I would even think that this will eventually become the preferred way of managing access to analytical data.
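A small Python sketch of what a dynamic role scope could look like, assuming a hypothetical catalog that maps physical tables to semantic entities. The table names, entity names and roles are invented for illustration and do not come from any specific semantic layer.

```python
# Hypothetical mapping from physical tables to the semantic entity they hold.
semantic_catalog = {
    "warehouse.crm.customers_v1": "customer",
    "warehouse.sales.orders_2023": "order",
}

# Roles are granted on semantic entities, not on physical tables.
role_grants = {
    "account_manager": {"customer", "order"},
    "support": {"customer"},
}


def tables_for_role(role: str) -> set[str]:
    """Resolve a role's physical scope from its semantic grants at query time."""
    entities = role_grants.get(role, set())
    return {table for table, entity in semantic_catalog.items() if entity in entities}


before = tables_for_role("support")

# A new physical table is tagged as "customer"; no grant has to be edited.
semantic_catalog["warehouse.crm.customers_v2"] = "customer"
after = tables_for_role("support")

print(sorted(before))
print(sorted(after))
```

The role definition never changes here; only the catalog does. Note that the sketch looks at data attributes only, so it does not cover the user, device or time-of-day attributes that full ABAC can use.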
How should a semantic layer implement ABAC in your opinion? Also is there any method you think could be even better?
I think the semantic layer will implement ABAC and RBAC almost identically, because it lets you dynamically define the permissions using the semantic meaning of data. Where ABAC and RBAC differ is in the way those permissions are assigned to users and groups. In the case of ABAC this is done dynamically based on the users’ attributes, whereas for RBAC this is fixed. The jury is still out, but I think RBAC on the semantic model will become the de facto standard. Why? It’s a matter of ownership and control. Data teams have strong control over the data attributes, which are managed in the semantic model, but less so over the user attributes, which are managed in HR systems and other business applications. This lack of control leads to poor metadata quality, which makes ABAC very unreliable.
Ah, I see now - so a customer service role could have customer and PII-required attributes, and therefore automatically be given access to the customer entity from the semantic layer. Likewise, an account manager role could have customer and revenue attributes, and be automatically given access to the customer entity without PII access and also the value field from the orders entity for the customers they are allowed to see.
So the difference would be that the attributes of the roles are mapped to and define automatic access to objects in the semantic layer, rather than the roles themselves - another potentially useful abstraction.
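The customer service and account manager example can be sketched in Python as follows. The attribute names and the requirements attached to each semantic-layer object are assumptions made for illustration, and the row-level restriction ("the customers they are allowed to see") is left out.

```python
# Hypothetical semantic-layer objects and the attributes each one requires.
object_requirements = {
    "customer": {"customer"},
    "customer.pii": {"customer", "pii"},
    "orders.value": {"customer", "revenue"},
}

# Attributes are attached to roles; access is derived from them, not granted directly.
role_attributes = {
    "customer_service": {"customer", "pii"},
    "account_manager": {"customer", "revenue"},
}


def accessible_objects(role: str) -> list[str]:
    """Return the objects whose required attributes the role fully holds."""
    attrs = role_attributes.get(role, set())
    return sorted(
        obj for obj, required in object_requirements.items() if required <= attrs
    )


print(accessible_objects("customer_service"))  # ['customer', 'customer.pii']
print(accessible_objects("account_manager"))   # ['customer', 'orders.value']
```

Adding a new object to the semantic layer only requires stating which attributes it needs; every role that already holds them picks it up without a new grant.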
I can see how this could make things more flexible, but you would end up relying on someone to create and maintain roles and their attributes. It sounds like a collaboration between HR and Infosec, where HR defines a role and what it does, and Infosec then knows which attributes to assign to the role. I don’t know how often HR and Infosec actually work together this closely and this well in organisations…
I do agree that you get most of the way there by just mapping the roles to the objects in the semantic layer - RBAC on semantic layer.
What model does Raito use?
Raito’s users can use RBAC, ABAC and hybrid crossover access controls. This means that data product owners can start with traditional RBAC, where permissions and users/groups are fixed, and evolve those into RBAC on the semantic model, where permissions are dynamically set. Optionally, they can also dynamically assign those permissions to users based on their attributes. This way, Raito supports data teams at every maturity level.
Does this mean Raito has its own semantic model which defines what the data means? Does Raito also support third party semantic layers like Cube?
No, we ingest the metadata from other solutions. No integration with Cube yet, but technically possible.
Thanks so much for collaborating on this post, Bart!
Thanks for the great conversation, David!