Getting tons of script run failures

I’ll have more info (but not a fix) to contribute to this as soon as my Monday slows down.

Ok, here’s my update on the situation, and I’m glad (?) somebody else is getting this so we can get more eyes on it. I’m getting about a 25-50% failure rate, with the exact same error logged that jordanritz posted. It only happens on three servers in my account, but it could be any script that they run.

Troubleshooting done so far with no impact on the issue:

  • Disabled AV
  • Removed all AV (don’t forget Windows Defender!)
  • Verified latest updates on .NET
  • Installed all MS updates
  • Removed Syncro and deleted ProgramData\Syncro, reinstalled
  • Modified one of the scripts to write to a txt file as the first line so I could see if it’s executing at all. The txt file was not written, telling me that the script is not executing even the first line of code.
  • Installed PowerShell 7 (no idea how to enforce its usage over the older pre-installed version)
  • Commented out the Syncro include at the beginning of the scripts
  • Enabled extra PowerShell logging. Nothing useful is logged during a failure, only that the PS console is started and then no more log entries are made for that session.
  • Used Nirsoft Full Event Log Viewer to verify that there are no events in ANY (and I mean any, the utility just dumps all events from all providers onto one screen) of the dozens of event logs that correspond with the timestamp of the script failures.
  • Syncro support has yet to suggest anything other than it’s Windows’ fault. It may be, I can’t say for sure at this point.
  • I copy/pasted one of the scripts into a PS1 file on the desktop and ran a batch to fire it over and over again 200 times in a row. Of these it only failed to execute once. Not sure what to make of this test; I get about 25-50% failure rates from Syncro-called scripts, so inconclusive IMO.
  • Google
  • I checked the two .NET config files mentioned by Jimmie; the values were not there.
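
For anyone who wants to repeat the "fire it 200 times" test from the list above and get a failure count out of it, here is a rough sketch in Python rather than a batch file. The script path is hypothetical, and treating a non-zero exit code, a timeout, or a failure to launch as a "failure" is my assumption; it is not how Syncro itself decides that a script run failed.

```python
import subprocess


def count_failures(command, runs=200, timeout=60):
    """Run `command` `runs` times and return how many runs failed.

    A run counts as a failure if the process exits non-zero, times out,
    or cannot be started at all (assumed definition of "failure").
    """
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(command, capture_output=True, timeout=timeout)
            if result.returncode != 0:
                failures += 1
        except (subprocess.TimeoutExpired, OSError):
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical path: point this at the PS1 file copied out of Syncro.
    cmd = ["powershell.exe", "-NoProfile", "-ExecutionPolicy", "Bypass",
           "-File", r"C:\Temp\test-script.ps1"]
    runs = 200
    failed = count_failures(cmd, runs=runs)
    print(f"{failed} of {runs} runs failed ({failed / runs:.1%})")
```

This launches the script from a normal user session, so like the batch test it may not reproduce whatever is different about scripts launched by the Syncro agent.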

jordanritz, I had one machine that failed to run all scripts, although not the same way as this issue. I only stumbled onto it because Syncro never reported any problem; some of them even said Success! Anyway, I mention it because on that machine it was fixed by removing Syncro, deleting the ProgramData\Syncro folder and then reinstalling it. It even came back as the same asset, which was handy.

One interesting factoid: these all started for me on the same day, 12/13/2021. No new apps or Windows Updates on that day for me.

I’m pretty well out of ideas. Servers are all Dells, Win 2019, common installed apps include OpenManage, Huntress, BitDefender, Syncrify backup client.

12/13 was around the time of CVE-2021-44228. We do web hosting and have been fighting Apache stalls and crashes ever since they released the patched version of httpd. Probably not related unless you did some hardening for that CVE, but I haven’t seen anything that would prevent PS scripts from running.

From what I read PS 7 is made to live alongside 5.1 and you call it with pwsh.exe. I think Syncro will be using 5.1 no matter what.
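
If you want to confirm that both versions really are installed side by side, a quick check is to see which executables are on the PATH. This is only a sketch: `find_powershell_hosts` is a name I made up, and it says nothing about which one Syncro actually invokes.

```python
import shutil


def find_powershell_hosts():
    """Return the resolved path (or None) for each PowerShell executable on PATH."""
    # powershell.exe is Windows PowerShell 5.1; pwsh.exe is PowerShell 7.
    return {name: shutil.which(name) for name in ("powershell.exe", "pwsh.exe")}


if __name__ == "__main__":
    for name, path in find_powershell_hosts().items():
        print(f"{name}: {path or 'not found on PATH'}")
```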

So I get that the two registry keys are for setting TLS to default to strong instead of defaulting to weak, or however it was by default. What are the two security policies you talk about looking for in machine.config?

Google can probably explain it better than I can :). I just found it when searching for the error listed. Outside of the bug years ago, this error seems rare, so there is not much info out there. The TLS option forces it to use TLS 1.2. These 2 policies don’t exist by default, so if they are not there, then no worries, but if they are, then someone set them and they may be causing a problem.

Hey Jordan, any action on your side? I have a support ticket open, which means they know about it and will never respond again.

Looks like it’s a ThreatLocker-related issue. The error only occurred on devices where I was trialing ThreatLocker. ThreatLocker is actively working on it. It took them a couple of days to figure out how to reproduce the error on their side; now they are just trying to figure out why it’s happening and get a fix rolled out.

Mark are you using TL?

Yes, I am. Does it sound like support is going to roll the fix out to everyone, or just your account?
Mine is in Learning mode on these nodes, seems like it shouldn’t interfere… Gonna uninstall and test.

Just remembered, I have some nodes with TL in Secured mode that are not having this error. I suspect your TL is just misconfigured, but we’ll see how mine behaves after I disable it.

So in our testing it ONLY affects Server 2019, and it does not seem to matter whether the device is secured or just in learning mode. The repeatable failure we are working off of for testing is opening CMD and typing in PowerShell, then exit, over and over. Failure rate is somewhere around 3 out of 10 on the main system I’ve been testing on. The interactive failure rate seems much more consistent than the Syncro script run failure rate. I have some servers failing Syncro scripts almost 100% of the time and others very low, like 20-30%. TL did find that limited RAM seemed to make the failure more common, and I am using Bitdefender GravityZone as well on these systems. At this time I do not know if TL has determined whether or not Bitdefender is required on the system to replicate this failure case.

At this time, I have not gotten the impression that they believe the issue is widespread. However, I am expecting they will release a new agent version that resolves the issue, so for anyone else having this issue the solution will be upgrading to the latest agent version.

So far I have done a number of testing sessions with them, where they have given me special builds of the TL service with additional logging so they can try to figure out what is actually causing PS to error.

Well Jordan I’ll buy you a beer if this works out. Thanks for the updates.

And I have BD too, but removed it as a test a while ago and it didn’t make any difference on mine.

FYI I just got a pre-release for the Agent and can confirm it fixes the script run errors on my two test machines. I’ll try to post back once this goes live but my expectation is that it will come out as a 6.7 beta or something like that.

Welp, I can confirm that disabling TL does indeed stop these errors for me, even though TL was “only” running in Learning mode. Drinks are on me if you’re ever in Tucson Jordan.

For the record, I chatted with TL support today and they say that v6.8 (beta coming later today) will include this fix. I only intend to report back here if I have trouble with it. Take care everybody!

I wanted to wait until I tested it to post back but I can confirm 6.8 beta resolves all issues I knew of.

If you update the version on specific endpoints, the version will be locked until you set them back to inherit, but if you move the version ahead on a computer group, it will stay on the new version until the upgrade policy dictates moving to the next version.