Synchronizing PowerShell scripts – control the flow

Intro

In the last article I described how to ensure that only a single instance of your script runs at a time. Today it’s time for the second topic in Synchronizing PowerShell scripts: controlling the flow. I’ll describe more advanced usage of Mutex and show a more powerful synchronization technique, the Semaphore.

Concurrency issue

Multithreading is really powerful. It’s actually a must in many high-performance scenarios. But it can cause a headache when it comes to concurrency. One simple way to avoid such problems is to protect shared resources with a synchronization object.

Let’s create a simple script with a concurrency issue. It just tries to obtain exclusive write access to a file, write a few letters and close the file. If multiple threads try to take the write lock on the file simultaneously, one of them fails.

The script is simplified on purpose. If you measure the execution with and without multithreading it will turn out that additional threads extend the overall execution time. In real life it would be more complex – like getting processes from many remote computers – then multithreading would decrease the execution time. Anyway, for the purpose of this article, let’s keep it as simple as possible to understand the concept.

0..5 | Foreach-Object -ThrottleLimit 2 -Parallel  {
    $thread = $([System.Threading.Thread]::CurrentThread.ManagedThreadId)
    (Get-Date).ToString("HH:mm:ss:ffff") + " - $thread writing..."
    "ABCD" | Out-File c:\temp\test.txt
    (Get-Date).ToString("HH:mm:ss:ffff") + " - $thread writing completed"
}

It will not fail on every run because concurrency issues are intermittent by nature. You can try to increase the number of iterations or the size of the written text (the bigger it is, the longer it takes to save, so the write lock is held longer).

On the second run I got what I expected: thread 74 raised a file access error. By the way, the number is not the real OS thread ID. I use the Managed Thread ID, which is a .NET ‘virtual’ thread ID. It’s much easier to get and it’s enough for this example.
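Because the failure is intermittent, it can help to count failed writes instead of waiting for an error to show up on screen. The following is only a sketch under assumptions: it needs PowerShell 7 or later (for -Parallel), and the file path, iteration count and text size are hypothetical values picked for illustration.

```powershell
# Hypothetical target file in the temp folder (not the path used in the article)
$path = Join-Path ([System.IO.Path]::GetTempPath()) "concurrency-demo.txt"

$results = 0..50 | ForEach-Object -ThrottleLimit 10 -Parallel {
    try {
        # A larger text keeps the file open longer, so collisions are more likely
        ("ABCD" * 1000) | Out-File -FilePath $using:path -ErrorAction Stop
        "ok"
    }
    catch {
        # Another thread was holding the file at this moment
        "failed"
    }
}

# Summarize how many writes succeeded and how many collided
$results | Group-Object | Select-Object Name, Count
```

The numbers differ from run to run, and a run with no failures at all is possible.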

Synchronizing threads

One possible way to fix the issue is a Mutex. The solution is simple: the thread that wants to take the write lock on the file needs to own the Mutex, otherwise it waits for it. This time you synchronize threads within the same process, which means you can skip giving the Mutex a name.

Of course, you can use the named Mutex described in the previous article to do the same between processes (two or more scripts).
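As a rough illustration of the cross-process variant, the sketch below protects the same kind of write with a named Mutex. The mutex name and the file path are hypothetical, chosen only for this example. Every script that should take part in the synchronization would have to use the same name.

```powershell
# Hypothetical names, used only in this sketch
$mutexName = "MyFileWriteMutex"
$path = Join-Path ([System.IO.Path]::GetTempPath()) "named-mutex-demo.txt"

# $false: the creating thread does not own the Mutex initially.
# If a Mutex with this name already exists, the existing one is opened.
$mutex = [System.Threading.Mutex]::new($false, $mutexName)
$acquired = $false
try {
    # Wait up to 15 seconds for ownership; $false means the wait timed out
    $acquired = $mutex.WaitOne(15000)
    if ($acquired) {
        "ABCD" | Out-File -FilePath $path
    }
}
finally {
    # Release only if this thread really owns the Mutex
    if ($acquired) { $mutex.ReleaseMutex() }
    $mutex.Dispose()
}
```

Checking the result of WaitOne matters here: calling ReleaseMutex without owning the Mutex throws an exception.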

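$mutex = [System.Threading.Mutex]::New($false)

0..2000 | Foreach-Object -ThrottleLimit 10 -Parallel  {
    $m = $using:mutex
    # Wait up to 15 seconds for ownership; skip the write if the wait times out
    if ($m.WaitOne(15000)) {
        try { "ABCD" | Out-File c:\temp\test.txt }
        finally { $m.ReleaseMutex() }   # always release, even if the write fails
    }
}

$mutex.Dispose()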

The new Mutex is created with one parameter: $false. It means that the thread creating it won’t own it, so it’s released from the beginning. Otherwise, the first new thread would never start its operations because the Mutex would stay owned. I also increased the iterations and throttle limit significantly to make the file access error almost unavoidable without the Mutex.

You may also wonder why I used ($using:mutex) inside ForEach-Object. This is the standard way to access a variable from the main thread inside ForEach-Object -Parallel.
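A small sketch of what the $using: scope modifier changes, assuming PowerShell 7 or later. The variable name $greeting is hypothetical and strict mode is assumed to be off.

```powershell
$greeting = "hello"

# Without $using: the parallel script block runs in its own runspace,
# where $greeting is not defined, so it expands to an empty string
$without = 1..2 | ForEach-Object -Parallel { "value: '$greeting'" }

# With $using: the value from the calling scope is passed into the runspace
$with = 1..2 | ForEach-Object -Parallel {
    $g = $using:greeting
    "value: '$g'"
}

$without   # value: '' (twice)
$with      # value: 'hello' (twice)
```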

That’s one possibility for controlling the flow. Read on for the second one.

A wise cmdlet

Let’s imagine another scenario. You are the author of a PowerShell module with a resource-demanding function. The resource can be an input/output mechanism, memory etc. From the performance perspective you might know that it shouldn’t be called from external sources (scripts, apps, interactively) more than X times simultaneously. Otherwise, the added cost of operating above the resource limit (for example using the page file or switching the I/O context too often) is so high that it makes the function unresponsive. Is there a way to limit the function not only to a single instance but, for example, to 3 instances? Yes, there is! It’s called a Semaphore.

Semaphore?

In simple words, a Semaphore is another synchronization mechanism built into Windows. To be more precise, a semaphore is a kernel object, just like a Mutex. However, it’s a little more complex and lets you limit the number of threads (or processes) that can use a resource or a “pool of resources” at the same time. See Microsoft’s documentation for details.

For our purpose it’s enough to know that a Semaphore is like a Mutex. It’s created and released in an almost identical way. The difference is in the counter. It belongs to the Semaphore object and lets in a specified number of threads, for example 3 at a time.

Let’s create an issue…

Let’s create an example function that is intended to be called externally. It’s very simple: it just loads a big text file into memory using a non-ideal method, Get-Content. Non-ideal is on purpose. This cmdlet wasn’t really optimized for big text files. In this example I created a 9 MB file. It’s big enough to see the difference and small enough to do it quickly. Here is the function (I put it in a ResourcesEater.ps1 file):

function Eat-SomeResurces{
    $p = Get-Content C:\temp\testResources.txt
    $p = $null
    [System.GC]::Collect()
}

Assigning null to the variable and then launching Garbage Collection on demand is done to make the effect more visible and easier to catch. It’s optional – you can skip it.

When you run this function, especially when you do it in parallel, memory utilization and disk I/O are stressed. Depending on the hardware, you might need to increase the file size. On my PC it’s just enough. Remember that Get-Content is really inefficient: it consumes much more memory than you would expect from the file size.
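If you need a test file of a particular size, a sketch like the one below can generate it. The output path, the line content and the target size are assumptions for illustration. Adjust $targetBytes if 9 MB is not enough on your hardware.

```powershell
# Hypothetical output path and size; change them to match your test setup
$path = Join-Path ([System.IO.Path]::GetTempPath()) "testResources.txt"
$targetBytes = 9MB

# Plain ASCII line, so one character is one byte in the default UTF-8 encoding
$line = "The quick brown fox jumps over the lazy dog. " * 2
$bytesPerLine = $line.Length + [Environment]::NewLine.Length

# $false: overwrite the file instead of appending to it
$writer = [System.IO.StreamWriter]::new($path, $false)
try {
    $written = 0
    while ($written -lt $targetBytes) {
        $writer.WriteLine($line)
        $written += $bytesPerLine
    }
}
finally {
    # Dispose flushes the buffer and closes the file
    $writer.Dispose()
}

# Show the resulting file size in bytes
(Get-Item $path).Length
```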

I measured the execution of this function launched 11 times (the range 0..10 produces 11 items) with up to 10 calls running in parallel. It simulates calling the function from many different processes: other scripts, applications, interactive sessions etc. The entire operation took around 4 minutes:

Measure-Command {
    0..10 | ForEach-Object -ThrottleLimit 10 -Parallel {
        . .\ResourcesEater.ps1
        Eat-SomeResurces
    }
}

… and solve it

Now it’s the right time for the Semaphore. I enhanced the function with it, leaving the core operation exactly the same. I also added some output so you can observe what’s going on.

A Semaphore object is used much like a Mutex. What differs is how it is created. I used:

$semaphore = [System.Threading.Semaphore]::New(3, 3, "MySemaphore")

The first parameter (the initial count) is 3, which means 3 threads can be let in right away (the resource is not used yet). It’s a little counter-intuitive because the counter counts down: once a thread is let in, it is decreased by 1. When it reaches 0, other threads are put in the queue until it is at least 1 again.

The second parameter (the maximum count) is 3 again, which means that at most 3 threads can be let in simultaneously.

The last parameter is a name, so the Semaphore is accessible from all threads, and therefore from all processes, in the system.
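The counter behaviour can be observed in isolation with a short sketch. It uses an unnamed Semaphore, so it stays local to the current process. The variable names are hypothetical, and WaitOne(0) is used only to probe the counter without blocking.

```powershell
# Unnamed semaphore: initial count 3, maximum count 3
$semaphore = [System.Threading.Semaphore]::new(3, 3)

# WaitOne(0) does not block: it returns $true if a slot was free, $false otherwise
$attempts = foreach ($i in 1..4) { $semaphore.WaitOne(0) }
$attempts                                # True, True, True, False

# Release() frees one slot and returns the counter value from before the release
$previousCount = $semaphore.Release()    # 0
$retry = $semaphore.WaitOne(0)           # True, one slot was free again

# Give back the three slots still held, then clean up
$semaphore.Release(3) | Out-Null
$semaphore.Dispose()
```

In the real function a caller waits with a timeout instead of probing, so the fourth caller is queued rather than rejected.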

Here is the complete function:

function Eat-SomeResurcesWithSemaphore{
    $thread = [System.Threading.Thread]::CurrentThread.ManagedThreadId
    $semaphore = $null

    # Reuse the named semaphore if it already exists, otherwise create it
    if (!([System.Threading.Semaphore]::TryOpenExisting("MySemaphore",[ref]$semaphore)))
    {
        Write-Host (Get-Date).ToString("HH:mm:ss:ffff") " - Thread $thread creating semaphore..."
        $semaphore = [System.Threading.Semaphore]::New(3, 3, "MySemaphore")
    }

    Write-Host (Get-Date).ToString("HH:mm:ss:ffff") " - Thread $thread Checking semaphore..."
    # Wait up to 5 minutes for a free slot; give up if none becomes available
    if (-not $semaphore.WaitOne(300000))
    {
        Write-Host (Get-Date).ToString("HH:mm:ss:ffff") " - Thread $thread timed out waiting"
        return
    }

    try
    {
        Write-Host (Get-Date).ToString("HH:mm:ss:ffff") " - Thread $thread Processing..."
        $p = Get-Content C:\temp\testResources.txt
        $p = $null
        [System.GC]::Collect()

        Write-Host (Get-Date).ToString("HH:mm:ss:ffff") " - Thread $thread Processing completed"
    }
    finally
    {
        # Free the slot even if processing fails; discard the returned previous count
        $semaphore.Release() | Out-Null
    }
}

It’s time to measure the execution with the same number of parallel launches:

Measure-Command {
    0..10 | ForEach-Object -ThrottleLimit 10 -Parallel {
        . .\ResourcesEater.ps1
        Eat-SomeResurcesWithSemaphore
    }
}

And voilà! This time it takes only 1 minute and almost 18 seconds. It’s a significant difference, considering that you process only a 9 MB file.

In the output you can clearly see the flow. First, all threads are started but only 3 are allowed to process the file. As expected, the other threads wait in the queue and are let in one by one, each time one of the running threads completes its processing.

The end

That’s it. I hope this knowledge will help you control the flow of your functions, synchronizing not only your own script but also external apps and scripts that use your functions. The next article in this series will be about synchronization between scripts based on communication, that is, an exchange of data between threads.

Wiktor Mrówczyński
