Sharing access to synchronized Shared Metadata Model objects in Azure Synapse Analytics
Shared Metadata Model / Shared Metastore recap
As described in the previous article What is the Shared Metadata Model in Azure Synapse Analytics, and why should I use it?, the shared metadata model synchronizes Spark tables with SQL Serverless such that you can query your Spark table data using normal T-SQL through the SQL Serverless endpoint.
Remember, the underlying data that forms these Spark tables isn't stored in the SQL Serverless DB - rather the underlying data is stored in your ADLS account. So, just like when querying data using OPENROWSET
, you must make sure that the caller has the appropriate permissions to read the corresponding files in the lake. This can be done using RBAC (+ABAC) roles at the storage account/filesystem level, or by using ACLs at the directory/file level.
In this article we'll take a look at various scenarios you might encounter when configuring access control on the synchronized objects in the shared metadata model.
How to assign access to the Shared Metadata Model objects
Up until recently, there were two primary ways of assigning access to the synchronized objects:
- Assign the
sysadmin
server role to the targetLOGIN
. This can be directly assigned, or can be inherited by the Synapse Administrator or Synapse SQL Administrator RBAC roles. - Grant a
LOGIN
theCONNECT ANY DATABASE
andSELECT ALL USER SECURABLES
permissions.
(as described in the official docs How to set up access control on synchronized objects in serverless SQL pool).
On their own, these were both quite high-privileged permissions sets, as they applied to all replicated Spark databases. So this wasn't ideal if you wanted to reduce the scope of permissions. (Although in conjunction with storage-level permissions, it was possible to further lock-down what end-users could see, even if they were assigned to these roles.)
But recently there's been an update to Synapse to allow the creation of customized objects in the Spark replicated database. This update provides us with another option:
- Create a
USER
on specific replicated database, and assigndb_datareader
(or some other role that SQL Serverless supports - see Database-level roles). This must be an AAD security principal (User/Group/App), as currently this is the only authentication mechanism supported by the synchronized tables.
This is much better - you can now assign USERs access to individual databases and add them to a role (most likely db_datareader
). Unfortunately GRANT
statements aren't supported, so you can't assign more granular access to objects within the database. Maybe that will come in the future - who knows.
Here's the code snippet (creating a contained AAD user):
USE [MyReplicatedSparkDB];
CREATE USER [foo.bar@contoso.com] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [foo.bar@contoso.com];
So long as the underlying storage access has been assigned for the backing data of the synchronized tables (using the Storage Blob Data * RBAC permissions, or using ACLs), there shouldn't be anything more to do.
Common errors
Login failed for user '<token-identified principal>'.
This is common if you're logging in via SSMS or Azure Data Studio, or some other SQL IDE. This error is nothing to do with the Shared Metadata Model - rather the AAD user trying to access doesn't have access to the Serverless pool at all. If you're using Synapse RBAC, then the user needs to be granted either the Synapse Administrator or Synapse SQL Administrator roles. Otherwise they need to be given explicit SQL permissions on the server and/or DB.
The SELECT permission was denied on the object...
USER can log-in to the database but has no permissions to access any of the objects (or at least no "SELECT" permissions). They'll need to be assigned to a role like db_datareader
to be granted access to the objects. Or, if this is a non-replicated database, then more granular GRANT
s can be assigned.
You do not have permission to use the bulk load statement.
In order to use the OPENROWSET
command (directly or indirectly (e.g. via a VIEW
)), the USER
needs to be granted the ADMINISTER DATABASE BULK OPERATIONS
permission.
File 'https://...' cannot be opened because it does not exist or it is used by another process.
We've finally reached an error caused by storage level permissions! This implies that the AAD user (or the identity under which the call is being made) doesn't have the appropriate storage level permissions. If it's an AAD identity (which it will be for queries against synchronized objects in the replicated DB), these permissions can be assigned either through the Storage Blob Data * RBAC permissions, or by ACLs against the filesystem/directories/files.
External table '...' is not accessible because content of directory cannot be listed.
Another storage level permissions issue: this is when a user has access to a database and its objects (synchronized external table in this instance), but doesn't have appropriate permissions to the underlying files in the lake. Again, an RBAC role or ACLs are required.