# Authentication in gRPC Frontend

## Intro

There are three different entities that will need to communicate with the gRPC Server:

- EOS / dCache / disk system frontend in general
- Automated scripts such as Rundeck
- Operators

## Solution: Kerberos Authentication

This applies to Operators who will run `cta-admin` commands. (Could it also apply to automated scripts, by using a service account? `tapeops`?)

We already have an implementation of Kerberos Authentication provided in the PoC by dCache/Jacek. It expects to find the KRB5 keytab/token in the configuration file `Keytab` entry.

### TODO: ask @jleduc to help outline Kerberos restrictions

#### Issues with using Kerberos

- It is too permissive. A service account would then be able to do everything a human operator can do and we don’t want this.
- It is better not to share the same password across all instances.
- Kerberos is only an option inside CERN. What about more universal options?

## Solution: Implement SSS ourselves

### Questions

#### If we already have an encrypted channel (provided by TLS), do we need to encrypt messages with the shared secret?

Bottom line is, if the client holds the shared secret, then this suffices to authenticate them. Since the channel is encrypted, we could simply send the shared secret string as part of the message?

## Solution: Token Authentication

This solution should be applicable in all cases.

**Open question**: Who would we use as a token provider? How to generate tokens?

#### Could we use CERN SSO as token provider?

**Update 26.02.2025**: checked with CERN SSO team, was told they can only provide tokens for Web Applications. So this is **not applicable to us**.

#### Could we use Keycloak as a token provider?

(CERN-SSO uses Keycloak too, here is why: https://auth.docs.cern.ch/documents/why-keycloak/)

The following sections are written with an OpenID Connect JWT token in mind.

### Signing the token

The token is given to the client already signed by the issuing authority.
The client simply passes the token in the header when making a call.

### Validating the token

All the client does is pass the signed token. With a symmetric algorithm (such as HS256), the server then recomputes the signature from the token contents and a secret only the server knows, and makes sure it matches the signature on the passed token. With an asymmetric algorithm (such as RS256), the server instead verifies the signature using the public key of the issuing authority. Information on how to compute the signature is passed in the token header, which contains the signing algorithm used.

### Client-side gRPC implementation

### Server-side gRPC implementation

### Questions

#### If CTA Frontend is not the issuing authority of the tokens, how can it perform token validation?

For Keycloak specifically, this is a relevant part of the documentation: https://www.keycloak.org/securing-apps/oidc-layers#_validating_access_tokens

We can cache the public key of the issuing authority, and use that to validate the passed token.

#### What happens when the token expires? How to refresh it?

## Solution: Mutual TLS (mTLS)

Frequently used in communication between microservices (two services that authenticate each other and then communicate).

Requires a certificate provider, a Certification Authority (CA), and both a client certificate and a server certificate.

This solution applies to the physics workflow events, since communication will be between a single client (EOS or dCache) and a single server (CTA Frontend).

At server configuration/setup time, we need to pass to the server the CA certificate that will authenticate the client, and to the client we need to pass the CA certificate that will authenticate the server.
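One practical detail for the setup described above: the gRPC C++ credential options take the PEM text itself rather than a path to the file, so certificates and keys have to be read into strings before they are handed to gRPC. A minimal sketch of such a loader, with a basic sanity check, could look like the following. The helper names `ContainsPemBlock` and `LoadPemFile` are hypothetical and are not part of gRPC or CTA.

```c++
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Returns true if `text` contains a "-----BEGIN <label>-----" marker followed
// by the matching "-----END <label>-----" marker.
bool ContainsPemBlock(const std::string& text, const std::string& label) {
  assert(!label.empty());
  const std::string begin = "-----BEGIN " + label + "-----";
  const std::string end = "-----END " + label + "-----";
  const std::size_t begin_pos = text.find(begin);
  if (begin_pos == std::string::npos) return false;
  return text.find(end, begin_pos + begin.size()) != std::string::npos;
}

// Reads the whole file at `path` and returns its contents.
// Throws std::runtime_error if the file cannot be opened or does not
// contain a PEM block with the given label.
std::string LoadPemFile(const std::string& path, const std::string& label) {
  std::ifstream file(path, std::ios::binary);
  if (!file) throw std::runtime_error("cannot open " + path);
  std::ostringstream contents;
  contents << file.rdbuf();
  const std::string text = contents.str();
  if (!ContainsPemBlock(text, label)) {
    throw std::runtime_error(path + " does not contain a " + label + " PEM block");
  }
  return text;
}
```

For example, `LoadPemFile("ca.crt", "CERTIFICATE")` would return the text to assign to `pem_root_certs`. Key files may use labels such as `PRIVATE KEY` or `RSA PRIVATE KEY`, depending on how they were generated, so the label is left as a parameter.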
### Implementation

Implementation seems quite simple on our side, because gRPC handles the intricacies internally. For us it would mostly be a matter of setting up the client and server somewhat like the fragments below. Note that the credential options take the PEM contents, not file paths, so the files are read into strings first.

#### Server-side implementation

```c++
#include <fstream>
#include <iterator>
#include <string>
#include <grpcpp/grpcpp.h>

// Read a whole file; gRPC takes PEM contents, not file paths.
std::string ReadFile(const std::string& path) {
  std::ifstream file(path);
  return {std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>()};
}

std::string server_cert = ReadFile("server.crt");
std::string server_key = ReadFile("server.key");
std::string ca_cert = ReadFile("ca.crt"); // CA used to validate the client
// PemKeyCertPair is {private_key, cert_chain}
grpc::SslServerCredentialsOptions::PemKeyCertPair key_cert_pair = {server_key, server_cert};
grpc::SslServerCredentialsOptions ssl_opts;
ssl_opts.pem_root_certs = ca_cert; // CA certificate used to verify clients
ssl_opts.pem_key_cert_pairs.push_back(key_cert_pair);
ssl_opts.client_certificate_request = GRPC_SSL_REQUEST_AND_REQUIRE_CLIENT_CERTIFICATE_AND_VERIFY; // Enforce mTLS
auto server_credentials = grpc::SslServerCredentials(ssl_opts);
grpc::ServerBuilder builder;
builder.AddListeningPort("localhost:10955", server_credentials);
```

#### Client-side implementation

```c++
// ReadFile is the helper defined in the server-side fragment above.
std::string client_cert = ReadFile("client.crt");
std::string client_key = ReadFile("client.key");
std::string ca_cert = ReadFile("ca.crt"); // CA used to verify the server
grpc::SslCredentialsOptions ssl_opts;
ssl_opts.pem_root_certs = ca_cert;      // CA certificate used to verify the server
ssl_opts.pem_cert_chain = client_cert;  // Client certificate
ssl_opts.pem_private_key = client_key;  // Client private key
auto channel_credentials = grpc::SslCredentials(ssl_opts);
auto channel = grpc::CreateChannel("cta-frontend:10955", channel_credentials);
```

### Questions

#### What happens in case the certificate is revoked?

To check the validity of a request, we need to check the CRL (certificate revocation list). It seems that the gRPC framework can handle this when using SSL (internally relying on OpenSSL to do it) with the function `X509_verify_cert` - details here: https://linux.die.net/man/1/verify.
Apparently `verify_cert` can check for CRL. I think we just need to set the appropriate ssl_options of the `SslServerCredentials` or `TlsServerCredentials` when creating the client-server communication channels. I am 99% sure this applies to both `TlsServerCredentials` and `SslServerCredentials` (this one for sure).

Looking at the source code, the function that checks is the CustomVerificationFunction <- tsi_create_ssl_client_handshaker_factory_with_options <-

#### How do we refresh the certificate in case of revocation/expiry? Do we need to restart the server?

Apparently not, the `TlsServerCredentials` API supports this. Unfortunately this API is still experimental - I checked in the source code: https://stackoverflow.com/questions/77448633/is-there-a-way-to-refresh-update-the-server-certificate-pair-in-c-grpc-server

The stable `SslServerCredentials` API does not support it.

### Summing up on mTLS

* `TlsServerCredentials` API useful for authentication of Physics WorkFlow Events.
* Simple to implement on our side
* **BUT**, experimental API --> we discussed and agreed this is not a problem, as it is supported (non experimental) by several other languages, so the C++ one is not going away.

## Information on dCache setup

Discussed with Jacek and Tigran.

- dCache uses two different gRPC servers. The one that serves the workflow events does not support any authentication method. The one serving the admin commands supports Kerberos authentication.
- They have considered mTLS and Token Authentication using OpenID tokens. But these are not implemented atm.

## Summary - to discuss today on dev meeting:

| Solution | WFE | cta-admin by operators | cta-admin by scripts | Pros | Cons |
| ---------- | --- | ---------------------- | -------------------- | --------------- | ---------------- |
| mTLS | yes | no | no | Simple to set up | Experimental API |
| KRB5 token | no | yes | yes? | ? | ? |
| Token Auth | yes | yes | yes? | widely applicable, supports scopes | Need a token provider |
| Own SSS implementation | yes | yes | yes? | would work like XRootD | could be complicated? |

* dCache setup

Discussed today (27/2/2025):

* mTLS: in production we in fact have ~20 clients, not a single one. So we need to check whether that setup would work for mTLS.

  **Answer**: Yes, mutual TLS would work even in the case of a single server and multiple clients. If they all have certificates validated by a single CA (which would be the case for us I suppose), nothing needs to change on the implementation of the server. In the case that we want to have a different CA to validate each client, the way to do this is to combine the CA certs into one file (relevant issue: https://github.com/grpc/grpc/issues/17743). The following example is adapted from a ChatGPT suggestion; the certificate contents (not the file names) are concatenated:

  ```c++
  #include <fstream>
  #include <iterator>
  #include <string>
  #include <grpcpp/grpcpp.h>

  // Read a whole file; gRPC takes PEM contents, not file paths.
  std::string ReadFile(const std::string& path) {
    std::ifstream file(path);
    return {std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>()};
  }

  // Contents of multiple CA certificates (e.g., ca1.crt, ca2.crt, etc.)
  const std::string ca_cert1 = ReadFile("ca1.crt"); // First CA certificate
  const std::string ca_cert2 = ReadFile("ca2.crt"); // Second CA certificate
  // Concatenate the PEM blocks into a single bundle
  const std::string combined_ca_certs = ca_cert1 + "\n" + ca_cert2;
  const std::string server_key = ReadFile("server.key");
  const std::string server_cert = ReadFile("server.crt");
  grpc::SslServerCredentialsOptions::PemKeyCertPair key_cert_pair = {server_key, server_cert};
  grpc::SslServerCredentialsOptions ssl_opts;
  ssl_opts.pem_key_cert_pairs.push_back(key_cert_pair);
  ssl_opts.pem_root_certs = combined_ca_certs; // Multiple CAs to trust
  ssl_opts.client_certificate_request = GRPC_SSL_REQUEST_AND_REQUIRE_CLIENT_CERTIFICATE_AND_VERIFY; // Enforce mTLS
  // Create Server Credentials
  std::shared_ptr<grpc::ServerCredentials> creds = grpc::SslServerCredentials(ssl_opts);
  ```

* token Auth: Talk to FTS about generating tokens

  **Answer**: FTS does not generate tokens. They receive a JWT token from their clients, and it is the client's responsibility to get it from some authority. FTS then validates the token offline, using the keys published by the respective authority.

* SSS: the token itself could contain the id of the disk instance.
* Chat with Steve (FTS), Mihai, EOS team on token generation, add more details about token-based authentication
* better not to rely on CERN-SSO for getting the tokens

EOS implementations: look into common/OAuth.cc for the JWT token validation.

CTA can be the token generator, just using the appropriate fields to fill in the JWT.

## Conversation with Mihai about FTS Tokens (Friday March 14)

FTS does not in general have systems communicating with them, as we do, but instead users who want to transfer files, although they might also receive requests from Rucio. They only have to deal with access tokens. FTS supports ATLAS, CMS, LHCb and other community tokens.

When they receive a request, it comes with a token. The token is a JWT that names the authority that issued it (the `iss` claim, which is part of the payload), and the token is then validated offline against that authority.

- Online validation: does FTS do that? No, they do not. They do offline validation.
- Once a day or once per month, the token provider publishes some public keys. You download and cache them, and the signature is verified using the public key.
- An OIDC library is used (`oic` in Python); it is the one that knows how to connect to the token provider and download and cache the keys.
- The clients are: Rucio, or Konstantina herself as an ATLAS user.
- Mihai advises to use an API token.
- We’d need to use SSO or IAM.
- Grafana can give us the token.

## Conversation with Niels and Joao, March 17

General interface: have a list of trusted endpoints on CTA-frontend. The token will contain the endpoint against which to validate its signature.

### EOS

Generate the token on EOS. We will need to implement this logic of course, but it should not be too complicated. Then on the CTA side the code just expects a token and an endpoint to validate it. This endpoint could be the path to the public key of EOS?

### Admin-commands non-interactive

Get a token from an external authority. Pass the endpoint this authority exposes for validation in the token.
Also add it to the list of trusted endpoints.

### Admin-commands interactive

Keep using Kerberos.

## Final proposal - March 20

Main idea: Use (self-hosted) Keycloak to generate JWT tokens. Tokens will be used by EOS disk instances (physics workflow events) and by cta-admin commands run non-interactively.

### Setting up Keycloak

#### Authentication flow

First create a user

##### Token validation

* Online validation: Keycloak provides the introspection endpoint. We will not be doing online validation because it requires a call to this endpoint every time.
* Offline validation: This is what we will be doing. For offline validation, Keycloak gives us its public key, which we can cache. This public key will be stored in some CTA configuration.

### Non-interactive admin commands

Token authentication with JWT token. Issuing authority for the token will be Keycloak. We can set up our own Keycloak server for CTA.

### Workflow Events

Two solutions here: mTLS and token authentication.

- For mTLS everything is detailed above.
- For token authentication:
  - Issuer could also be EOS. We will need to add the logic on the EOS side for generating and signing the JWT.
  - Validation of the token will be done with

### Interactive admin commands

Use Kerberos following the PoC provided by Jacek.
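For the offline token validation described in the final proposal, the frontend first has to take the JWT apart: a compact JWT is three base64url-encoded segments separated by dots (header, payload, signature). The sketch below performs only this split and decoding step, so that the `alg` field of the header and the issuer in the payload can be inspected. It does not verify the signature, which would require a crypto library and the cached public key. `JwtParts`, `Base64UrlDecode` and `SplitJwt` are hypothetical names chosen for illustration, not existing CTA or gRPC code.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// The three segments of a compact JWT. Header and payload hold the decoded
// JSON text; the signature is kept in its base64url form.
struct JwtParts {
  std::string header_json;
  std::string payload_json;
  std::string signature_b64url;
};

// Decodes base64url text (RFC 4648 section 5; trailing padding is optional).
// Returns false if a character outside the base64url alphabet is found.
bool Base64UrlDecode(const std::string& in, std::string* out) {
  assert(out != nullptr);
  out->clear();
  std::uint32_t buffer = 0;  // accumulates 6-bit groups
  int bits = 0;              // number of not yet emitted bits in buffer
  for (char c : in) {
    int value = 0;
    if (c >= 'A' && c <= 'Z') value = c - 'A';
    else if (c >= 'a' && c <= 'z') value = c - 'a' + 26;
    else if (c >= '0' && c <= '9') value = c - '0' + 52;
    else if (c == '-') value = 62;
    else if (c == '_') value = 63;
    else if (c == '=') break;  // optional trailing padding
    else return false;
    buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out->push_back(static_cast<char>((buffer >> bits) & 0xFF));
    }
  }
  return true;
}

// Splits "header.payload.signature" and decodes the first two segments.
// Returns false if the token does not have exactly three segments or if
// the header or payload is not valid base64url.
bool SplitJwt(const std::string& token, JwtParts* parts) {
  assert(parts != nullptr);
  const std::size_t first = token.find('.');
  if (first == std::string::npos) return false;
  const std::size_t second = token.find('.', first + 1);
  if (second == std::string::npos) return false;
  if (token.find('.', second + 1) != std::string::npos) return false;
  parts->signature_b64url = token.substr(second + 1);
  return Base64UrlDecode(token.substr(0, first), &parts->header_json) &&
         Base64UrlDecode(token.substr(first + 1, second - first - 1),
                         &parts->payload_json);
}
```

The decoded header and payload are JSON strings. A real implementation would parse them with a JSON library, check the expiry (`exp`) and the issuer against the list of trusted authorities, and only then verify the signature, which is computed over the first two segments exactly as they appear in the token.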