We’re excited to announce the overall availability of Row Filters and Column Masks in Unity Catalog on AWS, Azure and GCP! Managing fine-grained entry controls on rows and columns in tables is vital to make sure knowledge safety and meet compliance. With Unity Catalog, you should use customary SQL features to outline row filters and column masks, permitting fine-grained entry controls on rows and columns. Row Filters allow you to management which subsets of your tables’ rows are seen to hierarchies of teams and customers inside your group. Column Masks allow you to redact your desk values based mostly on the identical dimensions.
“Distributing knowledge governance via Databricks Unity Catalog reworked Akamai’s method to managing and governing knowledge. With Unity Catalog, we are actually managing and governing over six petabytes of knowledge with fine-grained entry controls on rows and columns.”
— Gilad Asulin, Large Information Crew Chief, Akamai
This weblog discusses how one can allow fine-grained entry controls utilizing Row Filters and Column Masks.
What are Coarse-Grained Entity-Degree Permissions?
Earlier than this announcement, Unity Catalog already supported entity-level permissions. For instance, you should use GRANT
and REVOKE
SQL instructions over securable objects resembling tables and features to regulate which customers and teams are allowed to examine, question, or modify them:
USE CATALOG major;
CREATE SCHEMA accounts;
CREATE TABLE accounts.purchase_history(
 amount_cents BIGINT,
 area STRING,
 payment_type STRING,
 purchase_date DATE DEFAULT CURRENT_DATE())
USING DELTA;
We will grant learn entry to the accounts_team
:
GRANT SELECT ON TABLE accounts.purchase_history TO accounts_team;
Now, the accounts_team
has entry to question (however not modify) the purchase_history
desk.
Prior Approaches for Sharing Subsets of Information with Completely different Teams
However what if we’ve got separate accounts groups for various areas? To date, we may create a each day job to repeat subsets of knowledge into totally different tables and set their permissions accordingly:
-- Create a desk for knowledge from the EMEA area and grant
-- learn entry to the corresponding accounts group.
CREATE TABLE accounts.purchase_history_emea(
 amount_cents INT,
 payment_type STRING,
 purchase_date DATE DEFAULT CURRENT_DATE())
USING DELTA;
GRANT SELECT ON TABLE accounts.purchase_history_emea TO accounts_team_emea;
-- Run this each day to replace the customized desk.
-- Use the day gone by to verify all the information is obtainable earlier than
-- copying it.
INSERT INTO accounts.purchase_history_emea
SELECT * EXCEPT (area) FROM accounts.purchase_history
WHERE area = 'EMEA' AND purchase_date = DATE_SUB(CURRENT_DATE(), 1);
Whereas this method successfully addresses question wants, it comes with drawbacks. By duplicating knowledge, we enhance storage and compute utilization. Additionally, the duplicated knowledge lags behind the unique, introducing staleness. Furthermore, this answer caters solely to queries resulting from restricted consumer permissions, limiting write entry to the first desk.
One other technique makes use of dynamic views. Till this level, you possibly can outline a view particularly meant for consumption by particular consumer(s) or group(s):
CREATE VIEW accounts.purchase_history_emea
AS SELECT amount_cents, payment_type, purchase_date
FROM accounts.purchase_history
WHERE area = 'EMEA';
GRANT SELECT ON VIEW accounts.purchase_history_emea
TO accounts_team_emea;
Now we have solved the information copying downside, however customers nonetheless have to recollect to question the accounts.purchase_history_emea
desk if they’re within the EMEA area or the accounts.purchase_history_apac
desk if they’re within the APAC area, and so forth.
Dynamic views from an administrator’s perspective additionally create complexity for a number of causes:
- Should create and preserve quite a few views for every area
- Shared SQL logic is cumbersome to reuse throughout totally different regional groups
- Causes muddle within the Catalog Explorer
- Restricted to queries
- Can not insert or replace knowledge inside views
Introducing Row Filters
With row filters, you possibly can apply predicates to a desk, making certain that solely rows assembly particular standards are returned in subsequent queries.
Every row filter is applied as a SQL user-defined perform (UDF). To start, write a SQL UDF with a boolean end result whose parameter sort(s) are the identical because the column(s) of your goal desk that you simply need to management entry by.
For consistency, let’s proceed utilizing the area
column of the earlier accounts.purchase_history
desk for this function.
CREATE FUNCTION accounts.purchase_history_row_filter(area STRING)
RETURN CASE
 WHEN IS_ACCOUNT_GROUP_MEMBER('emea') THEN area = 'EMEA'
 WHEN IS_ACCOUNT_GROUP_MEMBER('admin') THEN TRUE
 ELSE FALSE
END;
We will check this logic by performing a number of queries over the goal desk and making use of the perform immediately. For somebody within the accounts_team_emea
group, such a question may appear like this:
SELECT amount_cents,
 area,
 accounts.purchase_history_row_filter(area) AS filtered
FROM accounts.purchase_history;
+--------------+--------+----------+
| amount_cents | area | filtered |
+--------------+--------+----------+
| 42 Â Â Â Â Â | EMEA Â | TRUE Â Â |
| 1042 Â Â Â Â | EMEA Â | TRUE Â Â |
| 2042 Â Â Â Â | APAC Â | FALSE Â Â |
+--------------+--------+----------+
Or for somebody within the admin
group who’s setting the entry management logic within the first place, we discover that each one rows from the desk are returned:
SELECT amount_cents, area, purchase_history_row_filter(area) AS filtered
FROM accounts.purchase_history;
+--------------+--------+----------+
| amount_cents | area | filtered |
+--------------+--------+----------+
| 42 Â Â Â Â Â | EMEA Â | TRUE Â Â |
| 1042 Â Â Â Â | EMEA Â | TRUE Â Â |
| 2042 Â Â Â Â | APAC Â | TRUE Â Â |
+--------------+--------+----------+
Now we’re prepared to use this logic to our goal desk as a coverage perform, and grant learn entry to the accounts_team_emea
group:
ALTER TABLE accounts.purchase_history
SET ROW FILTER accounts.purchase_history_row_filter ON (area);
GRANT SELECT ON TABLE accounts.purchase_history TO accounts_team_emea;
Or, we are able to assign this coverage on to the desk at creation time to verify there isn’t a interval the place the desk exists, however the coverage doesn’t but apply:
CREATE TABLE accounts.purchase_history_emea(
 amount_cents INT,
 payment_type STRING,
 purchase_date DATE DEFAULT CURRENT_DATE())
USING DELTA
WITH ROW FILTER purchase_history_row_filter ON (area);
GRANT SELECT ON TABLE accounts.purchase_history TO accounts_team_emea;
After that, querying from the desk ought to return the subsets of rows similar to the outcomes of our testing above. For instance, the accounts_team_emea
members will obtain the next end result:
SELECT amount_cents, area FROM accounts.purchase_history;
+--------------+--------+
| amount_cents | area |
+--------------+--------+
| 42 Â Â Â Â Â | EMEA Â |
| 1042 Â Â Â Â | EMEA Â |
+--------------+--------+
Now, we are able to share the identical accounts.purchase_history
desk with totally different teams with out copying the information or including many new names into our namespace.
You possibly can view this data on the Catalog Explorer. Trying on the purchase_history
desk, we see {that a} row filter applies:
Clicking on the row filter, we are able to see the coverage perform identify:
Following the “view” button reveals the perform contents:
Introducing Column Masks
We have now demonstrated create and apply fine-grained entry controls to tables utilizing row filters, selectively filtering out rows that the invoking consumer doesn’t have entry to learn at question time. However what if we need to management entry to columns as a substitute, eliding some column values and leaving others intact inside every row?
Right here we announce column masks!
Every column masks can also be applied as a SQL user-defined perform (UDF). Nevertheless, not like row filter features returning boolean outcomes, every column masks coverage perform accepts one argument and returns the identical sort as this enter argument.
Let’s go forward and masks out the acquisition quantity column of the accounts.purchase_history
desk when the worth is a couple of thousand:
CREATE FUNCTION accounts.purchase_history_mask(amount_cents INT)
RETURN CASE
 WHEN IS_ACCOUNT_GROUP_MEMBER('admin') THEN amount_cents
 WHEN amount_cents < 1000 THEN amount_cents
 ELSE NULL
END;
Now, solely directors have permission to have a look at the acquisition quantities of $10 or larger.
Let’s go forward and check the coverage perform. Non-admin customers see this:
SELECT amount_cents,
 accounts.purchase_history_mask(amount_cents) AS masked,
 area
FROM accounts.purchase_history;
+--------------+--------+----------+
| amount_cents | masked | area  |
+--------------+--------+----------+
| 42 Â Â Â Â Â | 42 Â Â | EMEA Â Â |
| 1042 Â Â Â Â | NULL Â | EMEA Â Â |
| 2042 Â Â Â Â | NULL Â | APAC Â Â |
+--------------+--------+----------+
However directors have entry to view all the information:
SELECT amount_cents,
 accounts.purchase_history_mask(amount_cents) AS masked,
 area
FROM accounts.purchase_history;
+--------------+--------+----------+
| amount_cents | masked | area  |
+--------------+--------+----------+
| 42 Â Â Â Â Â | 42 Â Â | EMEA Â Â |
| 1042 Â Â Â Â | 1042 Â | EMEA Â Â |
| 2042 Â Â Â Â | 2042 Â | APAC Â Â |
+--------------+--------+----------+
Seems nice! Let’s apply the masks to our desk:
ALTER TABLE accounts.purchase_history
ALTER COLUMN amount_cents
SET MASK accounts.purchase_history_mask;
After that, querying from the desk ought to redact particular column values similar to the outcomes of our testing above. For instance, non-administrators will obtain the next end result:
SELECT amount_cents, area FROM accounts.purchase_history;
+--------------+--------+
| amount_cents | area |
+--------------+--------+
| 42 Â Â Â Â Â | EMEA Â |
| NULL Â Â Â Â | EMEA Â |
| NULL Â Â Â Â | APAC Â |
+--------------+--------+
It really works accurately.
We will additionally examine the values of different columns to make our masking choice. For instance, we are able to modify the perform to have a look at the area column as a substitute of the acquisition quantity:
ALTER TABLE accounts.purchase_history ALTER COLUMN amount_cents DROP MASK;
CREATE FUNCTION accounts.purchase_history_region_mask(
 amount_cents INT,
 area STRING)
RETURN CASE
 WHEN IS_ACCOUNT_GROUP_MEMBER('admin') THEN amount_cents
 WHEN area = 'APAC' THEN amount_cents
 ELSE NULL
END;
Now we are able to apply the masks with the USING COLUMNS
clause to specify the extra column identify(s) to move into the coverage perform:
ALTER TABLE accounts.purchase_history
ALTER COLUMN amount_cents
SET MASK accounts.purchase_history_mask
USING COLUMNS (area);
Thereafter, querying from the desk ought to redact sure column values in another way for non-administrators:
SELECT amount_cents, area FROM accounts.purchase_history;
+--------------+--------+
| amount_cents | area |
+--------------+--------+
| NULL Â Â Â Â | EMEA Â |
| NULL Â Â Â Â | EMEA Â |
| 2042 Â Â Â Â | APAC Â |
+--------------+--------+
We will take a look at the masks by trying on the desk column within the Catalog Explorer:
Like earlier than, following the “view” button reveals the perform contents:
Storing Entry Management Lists in Mapping Tables
Row filter and column masks coverage features nearly all the time must discuss with the present consumer and evaluate it in opposition to an inventory of allowed customers or verify its group memberships in opposition to an specific listing of allowed teams. Itemizing these consumer and group allowlists within the coverage features themselves works nicely for lists of affordable sizes. For bigger lists or instances the place we would like further assurance that the identities of the customers or teams themselves are hidden from view for customers, we are able to make the most of mapping tables as a substitute.
These mapping tables act like customized gatekeepers, deciding which knowledge rows customers or teams can entry in your authentic desk. The fantastic thing about mapping tables lies of their seamless integration with reality tables, making your knowledge safety technique more practical.
This method is a game-changer for numerous customized necessities:
- Tailor-made Consumer Entry: You possibly can impose restrictions based mostly on particular person consumer profiles whereas accommodating particular guidelines for consumer teams. This ensures that every consumer sees solely what they need to.
- Dealing with Complicated Hierarchies: Whether or not it is intricate organizational buildings or numerous units of guidelines, mapping tables can navigate the complexities, making certain that knowledge entry adheres to your distinctive hierarchy.
- Seamless Exterior Mannequin Replication: Replicating complicated safety fashions from exterior supply methods turns into a breeze. Mapping tables provide help to mirror these intricate setups with out breaking a sweat.
For instance:
CREATE TABLE accounts.purchase_history_groups
AS VALUES ('emea'), ('apac') t(group);
CREATE OR REPLACE FUNCTION accounts.purchase_history_row_filter(area STRING)
RETURN EXISTS(SELECT 1 FROM accounts.purchase_history_groups phg
WHERE IS_ACCOUNT_GROUP_MEMBER(phg.group));
Now, we are able to lengthen the accounts.purchase_history_groups
desk to giant numbers of teams with out making the coverage perform itself complicated, and likewise limit entry to the rows of that desk to solely the administrator that created the accounts.purchase_history_row_filter
SQL UDF.
Utilizing Row and Column Degree Safety with Lakehouse Federation
With Lakehouse Federation, Unity Catalog solves vital knowledge administration challenges to simplify how organizations deal with disparate knowledge methods. This gives the flexibility to create a unified view of your whole knowledge property, structured and unstructured, enabling safe entry and exploration for all customers no matter knowledge supply. It permits environment friendly querying and knowledge mixture via a single engine, accelerating numerous knowledge evaluation and AI functions with out requiring knowledge ingestion. Moreover, it gives a constant permission mannequin for knowledge safety, making use of entry guidelines and making certain compliance throughout totally different platforms.
The fine-grained entry controls introduced right here work seamlessly with Lakehouse Federation tables to assist sharing entry to federated tables inside your organizations with customized row and column degree entry insurance policies for various teams, with none want to repeat knowledge or create many duplicate or comparable desk/view names in your catalogs.
For instance, you possibly can create a federated connection to an present MySQL database. Then, browse the Catalog Explorer to examine the overseas catalog:
Contained in the catalog, we discover a mysql_demo_nyc_pizza_rating
desk:
Let’s apply our row filter to that desk:
ALTER TABLE mysql_catalog.qf_mysql_demo_database.mysql_demo_nyc_pizza_rating
SET ROW FILTER major.accounts.purchase_history_row_filter ON (identify);
Trying on the desk overview afterwards, it displays the change:
Clicking on the row filter reveals the identify of the perform, similar to earlier than:
Now, queries over this federated MySQL desk will return totally different subsets of rows relying on every invoking consumer’s identification and group memberships. We have efficiently built-in fine-grained entry management with Lakehouse Federation, leading to simplified usability and unified governance for Delta Lake and MySQL tables in the identical group.
Getting began with Row and Column Degree Safety
With Row Filters and Column Masks, you now acquire the ability to streamline your knowledge administration, making extreme ETL pipelines and knowledge copies a factor of the previous. That is your gateway to a brand new world of unified knowledge safety, the place you possibly can confidently share knowledge with a number of customers and teams, all whereas sustaining management and making certain that delicate data stays protected.
To get began with Row Filters and Column Masks, take a look at our documentation on AWS and Azure and GCP.
Our staff will focus on this launch and different superior entry controls in Unity Catalog in our Information + AI Summit 2024 session, “Attribute-Based mostly Entry Controls in Unity Catalog—Constructing a Scalable Entry Administration Framework.” We hope to see you the week of June tenth. Register for Information + AI Summit at the moment!