
CDS client doesn't work on kubernetes #93

Open
onsvejda opened this issue Feb 8, 2021 · 12 comments

Comments


onsvejda commented Feb 8, 2021

We tried two different approaches:

  1. A "static" CdsServiceClient with per-request cloning (Test1Controller in the attached sample), i.e.
    static CdsServiceClient client = ...
    using(var clone = client.Clone()) { ... cds business ... }
  2. Allocating a client on the fly (Test2Controller in the attached sample), i.e.
    using(var client = new CdsServiceClient(...)) { ... cds business ... }

Both approaches choked up the cluster pretty quickly: under load (hitting the cluster with 100 threads per second), it became unresponsive after a few seconds.

The only way we were able to overcome the issue, to a degree, was to create a pool of .Clone() clients and not dispose them at all (see the sketch below). This has its drawbacks: it creates a management burden (handling cases such as an expired API token, a connection in a faulted state, etc.), and eventually you have to dispose the clients anyway because of those cases.

There is a somewhat similar issue described in dotnet/wcf#3344.
However, in our case it doesn't look like thread congestion (the measured numbers were not that high; it looks more like a leak of some sort). It does not reproduce on Windows.

The issue also doesn't reproduce when calling the OData endpoint directly via a plain HTTP client.
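
For illustration, here is a minimal sketch of the pool-of-clones workaround mentioned above. It is not the code from the attached sample: the pool type, its members, and the warmClones parameter are made up for this sketch; it assumes the Microsoft.PowerPlatform.Cds.Client package and only the CdsServiceClient members used above (Clone(), Dispose()) plus the IsReady flag for a health check.

/* CdsClientPool.cs - illustrative sketch only, not the attached sample */
using System;
using System.Collections.Concurrent;
using Microsoft.PowerPlatform.Cds.Client;

public sealed class CdsClientPool : IDisposable
{
    private readonly CdsServiceClient _root;
    private readonly ConcurrentBag<CdsServiceClient> _pool = new ConcurrentBag<CdsServiceClient>();

    public CdsClientPool(string connectionString, int warmClones = 10)
    {
        // One authenticated "root" client; all pooled clients are clones of it.
        _root = new CdsServiceClient(connectionString);
        for (var i = 0; i < warmClones; i++)
            _pool.Add(_root.Clone());
    }

    // Rent a clone; callers hand it back via Return() instead of disposing it per request.
    public CdsServiceClient Rent()
        => _pool.TryTake(out var client) ? client : _root.Clone();

    public void Return(CdsServiceClient client)
    {
        // This is the management burden mentioned above: faulted or expired
        // clients have to be detected and thrown away here.
        if (client.IsReady)
            _pool.Add(client);
        else
            client.Dispose();
    }

    public void Dispose()
    {
        while (_pool.TryTake(out var client))
            client.Dispose();
        _root.Dispose();
    }
}

Each request then rents a client, runs its CDS calls, and returns the client to the pool instead of disposing it.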

Sample with full repro here:
src.zip

@MattB-msft
Member

Thanks,
We will take a look at this.
We are currently in the process of adding async support to this library, which may help in this situation.

@BetimBeja

Doing some performance tests with the latest released package, v0.4.4, on an Azure Function v3, I get the following inconsistent timings for a simple WhoAmIRequest:
[screenshots of the measured response times omitted]

The implemented code is the following:

/* ServiceClientSingleton.cs */
using Microsoft.PowerPlatform.Dataverse.Client;

namespace AlbanianXrm.Functions
{
    public class ServiceClientSingleton
    {
        public ServiceClientSingleton(string connectionString)
        {
            ServiceClient = new ServiceClient(connectionString);
        }
        
        public ServiceClient ServiceClient { get; private set; }
    }
}
/* Startup.cs */
using AlbanianXrm.Functions;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;
using System;

[assembly: FunctionsStartup(typeof(Startup))]
namespace AlbanianXrm.Functions
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            builder.Services.AddSingleton((s) =>
            {
                return new ServiceClientSingleton(Environment.GetEnvironmentVariable("ConnectionString"));
            });

            builder.Services.AddScoped(sp =>
            {
                return sp.GetService<ServiceClientSingleton>().ServiceClient.Clone();
            });
        }
    }
}
/* WhoAmI.cs */
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Microsoft.PowerPlatform.Dataverse.Client;
using Microsoft.Crm.Sdk.Messages;
using System;
using System.Diagnostics;

namespace AlbanianXrm.Functions
{
    public class WhoAmI
    {
        private readonly ServiceClient _ServiceClient;

        public WhoAmI(ServiceClient serviceClient)
        {
            _ServiceClient = serviceClient;
        }

        [FunctionName("WhoAmI")]
        public async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            log.LogInformation("Starting function {0} ticks", stopwatch.ElapsedTicks);
            try
            {
                var responseMessage = (await _ServiceClient.ExecuteAsync(new WhoAmIRequest())) as WhoAmIResponse;
                log.LogInformation("Response from Dataverse in {0} ticks", stopwatch.ElapsedTicks);
                return new OkObjectResult("Your application user id in Microsoft Dataverse is: " + responseMessage.UserId);
            }
            catch(Exception ex)
            {
                log.LogError(ex.Message);
                return new BadRequestObjectResult(ex);
            }        
        }
    }
}

I did a stress test using Apache JMeter with the following results:
[screenshot of the JMeter results omitted]

@MattB-msft
Member

@BetimBeja all the Execute commands run through our long-standing API interface, which the CdsServiceClient uses. Can you tell me the instance/org ID that you're connecting to, so we can look at the other end of this and see where these requests went? And/or can you provide the verbose logs for these requests? You can get them out of the in-memory logger in the client and write them to a file that you can retrieve after the run, if needed.

Thanks.

@BetimBeja

Environment ID: bb8ace81-e434-4238-b254-5bda52e9c5b6
I will try to set up log collection this weekend and post the logs here. It is a trial environment used for learning purposes 😄

@MattB-msft
Member

Sorry about the delay getting back to you here; there was a bit of a long discussion around this.
We now understand the issue that is causing the inconsistent performance in the API, and the team is looking at how to address it in the longer term. It is not a short-term fix.

Tagging @JimDaly here.

@BetimBeja

Thank you @MattB-msft. I am sorry I wasn't able to provide any more logs; I have been busy lately and have had very little time for this.

@MattB-msft
Member

@BetimBeja as a heads up: we discovered and fixed a number of issues in the ServiceClient that could have been impacting this, and we are actively working through a rather big issue with the way MSAL deals with cache locking.

Many of the updates we have recently shipped may substantially improve the perf on Kubernetes.
If you get the chance, can you retest your scenario?

@BetimBeja

@MattB-msft I will try to replicate the test this weekend! I will email you the details of the test!

@MattB-msft
Member

@BetimBeja were you successful?

@BetimBeja

@MattB-msft I sent the email with the subject "Stress-Test Azure Function v3 DV ServiceClient" to [email protected], as listed in your GitHub profile. I will try to test again since a lot of time has passed.

@MattB-msft
Member

Ah, sorry about that. Will check to see if I still have it.

Thanks,
MattB

@BetimBeja

Just sent an update 😄
