
Compare-Object

Scripts often need to compare two sets of data, and PowerShell’s Compare-Object cmdlet exists for that purpose. Its output shows which values exist in which set of data, marked by a “SideIndicator” property that can be confusing to interpret.

In this article, I will go over a function I wrote last year to compare and manipulate data while passing through all remaining properties. To illustrate, let us create two arrays, each containing a list of people’s names and ages.

Creation of two arrays:

$Data1 = @(
[PSCustomObject][Ordered]@{Name="Michael Yuen";First="Michael";Last="Yuen";Age="30"}
[PSCustomObject]@{Name="John Doe";First="John";Last="Doe";Age="41"}
[PSCustomObject]@{Name="Jim Cork";First="Jim";Last="Cork";Age="16"}
[PSCustomObject]@{Name="Lisa Simmons";First="Lisa";Last="Simmons";Age="8"}
[PSCustomObject]@{Name="Matt Johnson";First="Matt";Last="Johnson";Age="67"}
[PSCustomObject]@{Name="Anne Curry";First="Anne";Last="Curry";Age="45"}
)
$Data2 = @(
[PSCustomObject][Ordered]@{Name="Harry Brown";First="Harry";Last="Brown";Age="21"}
[PSCustomObject]@{Name="Anne Curry";First="Anne";Last="Curry";Age="45"}
[PSCustomObject]@{Name="Jim Powell";First="Jim";Last="Powell";Age="18"}
[PSCustomObject]@{Name="Jim Cork";First="Jim";Last="Cork";Age="16"}
[PSCustomObject]@{Name="Wen Ping";First="Wen";Last="Ping";Age="19"}
[PSCustomObject]@{Name="Jane Doe";First="Jane";Last="Doe";Age="101"}
)

The arrays contain:

$Data1:

Name          First   Last    Age
----          -----   ----    ---
Michael Yuen  Michael Yuen    30
John Doe      John    Doe     41
Jim Cork      Jim     Cork    16
Lisa Simmons  Lisa    Simmons 8
Matt Johnson  Matt    Johnson 67
Anne Curry    Anne    Curry   45

$Data2:

Name        First Last   Age
----        ----- ----   ---
Harry Brown Harry Brown  21
Anne Curry  Anne  Curry  45
Jim Powell  Jim   Powell 18
Jim Cork    Jim   Cork   16
Wen Ping    Wen   Ping   19
Jane Doe    Jane  Doe    101

SideIndicator

Note that only “Anne Curry” and “Jim Cork” exist in both arrays, as also shown by the below command. The data is compared by the “Name” property:

Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual

SideIndicator values mean:

  • == Exists in both Reference and Difference
  • => Exists in Difference only
  • <= Exists in Reference only

What do “Reference” and “Difference” refer to? Looking at the command used, you will see that “ReferenceObject” is $Data1, and “DifferenceObject” is $Data2. In other words:

  • “Harry Brown” can only be found in $Data2 (Difference) (SideIndicator of “=>”)
  • “John Doe” can only be found in $Data1 (Reference) (SideIndicator of “<=”)
  • “Anne Curry” in both $Data1 and $Data2 (Reference and Difference) (SideIndicator of “==”)
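
To check this reading of the three indicators, the results can be tallied per SideIndicator value. The following is a minimal sketch that uses trimmed-down copies of the two arrays (Name property only); the variable names $Ref, $Diff, and $Counts are my own, chosen for illustration.

```powershell
# Trimmed-down copies of $Data1 and $Data2 (Name property only)
$Ref = 'Michael Yuen','John Doe','Jim Cork','Lisa Simmons','Matt Johnson','Anne Curry' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }
$Diff = 'Harry Brown','Anne Curry','Jim Powell','Jim Cork','Wen Ping','Jane Doe' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }

# Tally how many names fall under each SideIndicator value
$Counts = @{}
Compare-Object -ReferenceObject $Ref -DifferenceObject $Diff -Property Name -IncludeEqual |
    ForEach-Object { $Counts[$_.SideIndicator] = 1 + $Counts[$_.SideIndicator] }

$Counts
```

With the sample names, this should report 2 for “==” (Anne Curry and Jim Cork) and 4 each for “<=” and “=>”.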

-PassThru

Have you noticed that only “Name” and “SideIndicator” are output? What about each person’s First and Last Names and Age? Is there a way to include that data? Yes, by adding the -PassThru switch:

Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -PassThru
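
To see what -PassThru changes, it can help to list the property names on the output objects with and without the switch. This is a small sketch with two-row stand-ins for the arrays above; $Ref, $Diff, $WithoutPassThru, and $WithPassThru are illustrative names, not part of the original script.

```powershell
# Two-row stand-ins for $Data1 and $Data2
$Ref = @(
    [PSCustomObject]@{ Name = 'Anne Curry'; Age = 45 }
    [PSCustomObject]@{ Name = 'John Doe';   Age = 41 }
)
$Diff = @(
    [PSCustomObject]@{ Name = 'Anne Curry'; Age = 45 }
    [PSCustomObject]@{ Name = 'Jane Doe';   Age = 101 }
)

$WithoutPassThru = Compare-Object -ReferenceObject $Ref -DifferenceObject $Diff -Property Name -IncludeEqual
$WithPassThru    = Compare-Object -ReferenceObject $Ref -DifferenceObject $Diff -Property Name -IncludeEqual -PassThru

# Without -PassThru: only the compared property and SideIndicator, no Age
$WithoutPassThru[0].PSObject.Properties.Name

# With -PassThru: the original object's properties plus SideIndicator
$WithPassThru[0].PSObject.Properties.Name
```

Without the switch, Compare-Object builds new objects that hold only the compared property and SideIndicator. With it, the input objects themselves come back with SideIndicator attached as an extra property.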

Manipulating Data

Let’s take this a step further by marking anyone aged 18 or older as an Adult. Age is stored as text in the arrays, so it is cast to [int] to get a numeric comparison rather than a text comparison:

Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -PassThru |
Select-Object Name, Age, @{n="Adult";e={If ([int]$_.Age -ge 18) { Write-Output "Adult"}}}
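
Because the Age values in the sample arrays are quoted strings, it is worth seeing how PowerShell compares a string against a number. When the left operand is a string, the right operand is converted to a string and the two are compared as text. The sketch below uses illustrative variable names of my own to contrast that with a numeric comparison.

```powershell
# Left operand is a string, so 18 is converted to "18" and compared as text
$TextCheck8   = "8"   -ge 18    # True:  "8" sorts after "18"
$TextCheck101 = "101" -ge 18    # False: "101" sorts before "18"

# Casting the left operand to [int] makes the comparison numeric
$NumberCheck8   = [int]"8"   -ge 18    # False
$NumberCheck101 = [int]"101" -ge 18    # True

$TextCheck8, $TextCheck101, $NumberCheck8, $NumberCheck101
```

Without the cast, 8-year-old Lisa Simmons would be marked as an Adult and 101-year-old Jane Doe would not. Storing Age as a number in the arrays would avoid the issue as well.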

What if you wanted to pass through the remaining Properties, such as “First” and “Last”? Add the “*” wildcard:

Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -PassThru |
Select-Object *, @{n="Adult";e={If ([int]$_.Age -ge 18) { Write-Output "Adult"}}}

Output Only Matching Data

Below, we will only grab those names that exist in both $Data1 and $Data2 by piping the output to “Where-Object”:

Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -PassThru |
Select-Object *, @{n="Adult";e={If ([int]$_.Age -ge 18) { Write-Output "Adult"}}} |
Where-Object {$_.SideIndicator -eq "=="}
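
As an alternative to filtering with Where-Object, Compare-Object has an -ExcludeDifferent switch that, combined with -IncludeEqual, returns only the matching items. The following is a minimal sketch using three-row stand-ins for the arrays above; $Ref, $Diff, and $InBoth are illustrative names.

```powershell
# Three-row stand-ins for $Data1 and $Data2
$Ref = @(
    [PSCustomObject]@{ Name = 'Anne Curry'; Age = 45 }
    [PSCustomObject]@{ Name = 'Jim Cork';   Age = 16 }
    [PSCustomObject]@{ Name = 'John Doe';   Age = 41 }
)
$Diff = @(
    [PSCustomObject]@{ Name = 'Jim Cork';    Age = 16 }
    [PSCustomObject]@{ Name = 'Anne Curry';  Age = 45 }
    [PSCustomObject]@{ Name = 'Harry Brown'; Age = 21 }
)

# -IncludeEqual with -ExcludeDifferent keeps only the items found in both sets
$InBoth = Compare-Object -ReferenceObject $Ref -DifferenceObject $Diff -Property Name `
    -IncludeEqual -ExcludeDifferent -PassThru

$InBoth | Select-Object Name, Age, SideIndicator
```

This drops the non-matching items inside Compare-Object itself, so nothing further down the pipeline has to process them.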

Summary

We have covered how to:

  • Compare two data sets for differences
  • Include all source properties in the output
  • Add a custom “Adult” property by evaluating each person’s age
  • Only grab the values that exist in both data sets by selecting any whose SideIndicator property is “==”

PowerShell: Compare-mObject

Finally, I turned everything above into a function, Compare-mObject:

function Compare-mObject {
    <#
    .SYNOPSIS
        Compare differences between two sets of data (Reference vs Difference) based on a specified Property
    .DESCRIPTION
        - Indicators within the "Exists" property
            - Data that ONLY exists in Reference: "<="
            - Data that ONLY exists in Difference: "=>"
            - Data that exists in BOTH Reference and Difference: "=="
        - Some data may contain non-ASCII characters, such as umlauts (https://stackoverflow.com/questions/48947151/import-csv-export-csv-with-german-umlauts-ä-ö-ü)
          Use "-Encoding UTF8" with Import-CSV and Export-CSV to handle UTF-8 (non-ASCII) characters
    .NOTES
        Script Created: 8/14/2019 Michael Yuen (www.yuenx.com)
        Change History
        - 8/20/2020 Michael Yuen: Turned original function into a standalone cmdlet
    .EXAMPLE
        Compare-mObject -Reference $REFERENCE_DATA -Difference $DIFFERENCE_DATA -Property "samAccountName"
        Compare object differences between Reference and Difference using the "samAccountName" property as reference
    .EXAMPLE
        $Ref = Get-ADGroupMember "Group1"; $Diff = Get-ADGroupMember "Group2"; Compare-mObject -Reference $Ref -Difference $Diff -Property "samAccountName"
        Compare AD group membership differences between "Group1" and "Group2" using the "samAccountName" property as reference
    #>
    [CmdletBinding()]
    param (
        # REFERENCE data
        [Parameter(Position=0,Mandatory=$true,ValueFromPipeline=$false)]$Reference,
        # DIFFERENCE data
        [Parameter(Position=1,Mandatory=$true,ValueFromPipeline=$false)]$Difference,
        # What Property to compare on (e.g. distinguishedName)
        [Parameter(Position=2,Mandatory=$true,ValueFromPipeline=$false)]$Property
    )
    <# Note: Under SideIndicator column:
         == item exists in both ReferenceObject and DifferenceObject
         <= item exists only in ReferenceObject
         => item exists only in DifferenceObject
       Compare-Object:
         use -IncludeEqual parameter to include any values that exist in both files
         use -ExcludeDifferent parameter to exclude any values that don't exist in both files
         -SyncWindow specifies how many adjacent elements are inspected when looking for a match
           (a window of N looks +/- N elements around the current position).
           Current Microsoft documentation lists the default as [Int32]::MaxValue (the entire collection);
           the value is set explicitly below.
    #>
    $Result = Compare-Object -ReferenceObject $Reference -DifferenceObject $Difference -SyncWindow 5000 -IncludeEqual `
        -Property $Property -PassThru | Sort-Object SideIndicator
    # Modify SideIndicator to be readable and include all supplied properties
    $ResultExpanded = $Result | Select-Object @{n="Exists";e={
            if     ($_.SideIndicator -eq "<=") { "<= In REFERENCE only" }
            elseif ($_.SideIndicator -eq "=>") { "=> In DIFFERENCE only" }
            elseif ($_.SideIndicator -eq "==") { "== In BOTH (Reference & Difference)" }
            else   { "N/A" }
        }}, * -ExcludeProperty PropertyNames,AddedProperties,RemovedProperties,ModifiedProperties,PropertyCount
    return $ResultExpanded
}
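
The function’s help text mentions using -Encoding UTF8 with Import-CSV and Export-CSV when the data contains characters such as umlauts. Here is a hedged sketch of that round trip feeding Compare-Object directly. The file names, the sample names, and the variables ($RefPath, $DiffPath, $RefCsv, $DiffCsv, $CsvResult) are made up for illustration.

```powershell
# Hypothetical temp file paths, used only for this illustration
$RefPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'compare-ref-sample.csv'
$DiffPath = Join-Path ([System.IO.Path]::GetTempPath()) 'compare-diff-sample.csv'

# Made-up sample rows containing umlauts, written out as UTF-8
@([PSCustomObject]@{ Name = 'Jürgen Müller' }, [PSCustomObject]@{ Name = 'Anne Curry' }) |
    Export-Csv -Path $RefPath -NoTypeInformation -Encoding UTF8
@([PSCustomObject]@{ Name = 'Jürgen Müller' }, [PSCustomObject]@{ Name = 'Jörg Köhler' }) |
    Export-Csv -Path $DiffPath -NoTypeInformation -Encoding UTF8

# Read both files back with the same encoding before comparing
$RefCsv  = Import-Csv -Path $RefPath  -Encoding UTF8
$DiffCsv = Import-Csv -Path $DiffPath -Encoding UTF8

$CsvResult = Compare-Object -ReferenceObject $RefCsv -DifferenceObject $DiffCsv -Property Name -IncludeEqual
$CsvResult

# Clean up the temporary files
Remove-Item -Path $RefPath, $DiffPath
```

Using the same encoding on export and import keeps the names identical on both sides, so the row shared by both files is reported as “==” rather than as two different values.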

When used in conjunction with other functions and cmdlets, this makes for powerful tools that log changes, act only on specific sets of data, speed up processing, and reduce complexity. How do YOU use the Compare-Object cmdlet to solve problems you have encountered?

Credits:
– Featured Image by Sharon McCutcheon via Unsplash