Compare-Object
When scripting, you often need to compare two sets of data, and PowerShell’s Compare-Object cmdlet is built for exactly that purpose. Its output shows which values exist in which data set, marked by a “SideIndicator” property that some people find confusing to interpret.
In this article, I will go over a function I wrote last year to compare and manipulate data while passing through all remaining properties. To illustrate, let us create two arrays, each containing a list of people’s names and ages.
Create the two arrays:
$Data1 = @(
    [PSCustomObject]@{Name="Michael Yuen";First="Michael";Last="Yuen";Age="30"}
    [PSCustomObject]@{Name="John Doe";First="John";Last="Doe";Age="41"}
    [PSCustomObject]@{Name="Jim Cork";First="Jim";Last="Cork";Age="16"}
    [PSCustomObject]@{Name="Lisa Simmons";First="Lisa";Last="Simmons";Age="8"}
    [PSCustomObject]@{Name="Matt Johnson";First="Matt";Last="Johnson";Age="67"}
    [PSCustomObject]@{Name="Anne Curry";First="Anne";Last="Curry";Age="45"}
)
$Data2 = @(
    [PSCustomObject]@{Name="Harry Brown";First="Harry";Last="Brown";Age="21"}
    [PSCustomObject]@{Name="Anne Curry";First="Anne";Last="Curry";Age="45"}
    [PSCustomObject]@{Name="Jim Powell";First="Jim";Last="Powell";Age="18"}
    [PSCustomObject]@{Name="Jim Cork";First="Jim";Last="Cork";Age="16"}
    [PSCustomObject]@{Name="Wen Ping";First="Wen";Last="Ping";Age="19"}
    [PSCustomObject]@{Name="Jane Doe";First="Jane";Last="Doe";Age="101"}
)
The arrays contain:

$Data1:
Name         First   Last    Age
----         -----   ----    ---
Michael Yuen Michael Yuen    30
John Doe     John    Doe     41
Jim Cork     Jim     Cork    16
Lisa Simmons Lisa    Simmons 8
Matt Johnson Matt    Johnson 67
Anne Curry   Anne    Curry   45

$Data2:
Name        First Last   Age
----        ----- ----   ---
Harry Brown Harry Brown  21
Anne Curry  Anne  Curry  45
Jim Powell  Jim   Powell 18
Jim Cork    Jim   Cork   16
Wen Ping    Wen   Ping   19
Jane Doe    Jane  Doe    101
SideIndicator
Note that only “Anne Curry” and “Jim Cork” exist in both arrays, as the following command confirms. It compares the data by the “Name” property:
Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual
SideIndicator values mean:
- == Exists in both Reference and Difference
- => Exists in Difference only
- <= Exists in Reference only
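Because the arrow symbols are easy to misread, it can help to translate them into words right away. The sketch below assumes a hypothetical $Labels lookup table (the label text is illustrative, not something Compare-Object provides) and uses two small string arrays instead of the article’s data:

```powershell
# Hypothetical lookup table: the label text is illustrative only
$Labels = @{
    '<=' = 'Reference only'
    '=>' = 'Difference only'
    '==' = 'Both'
}

$Ref  = 'a', 'b', 'c'
$Diff = 'b', 'c', 'd'

# Without -Property, Compare-Object compares the strings themselves and
# reports each one in an InputObject property alongside SideIndicator
$Labeled = Compare-Object -ReferenceObject $Ref -DifferenceObject $Diff -IncludeEqual |
    Select-Object InputObject, @{ n = 'Exists'; e = { $Labels[$_.SideIndicator] } }

$Labeled | Sort-Object InputObject
```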
What do “Reference” and “Difference” refer to? Looking at the command used, you will see that “ReferenceObject” is $Data1, and “DifferenceObject” is $Data2. In other words:
- “Harry Brown” can only be found in $Data2 (Difference), so its SideIndicator is “=>”
- “John Doe” can only be found in $Data1 (Reference), so its SideIndicator is “<=”
- “Anne Curry” can be found in both $Data1 and $Data2 (Reference and Difference), so its SideIndicator is “==”
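To check these cases in code, you can filter the comparison result on SideIndicator. This sketch rebuilds only the Name property of $Data1 and $Data2 to stay short, and it uses the simplified Where-Object and ForEach-Object syntax, which requires PowerShell 3.0 or later:

```powershell
# Name-only stand-ins for $Data1 and $Data2
$Data1 = 'Michael Yuen', 'John Doe', 'Jim Cork', 'Lisa Simmons', 'Matt Johnson', 'Anne Curry' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }
$Data2 = 'Harry Brown', 'Anne Curry', 'Jim Powell', 'Jim Cork', 'Wen Ping', 'Jane Doe' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }

$Comparison = Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual

# Split the result by SideIndicator and keep just the names
$OnlyInReference  = $Comparison | Where-Object SideIndicator -eq '<=' | ForEach-Object Name
$OnlyInDifference = $Comparison | Where-Object SideIndicator -eq '=>' | ForEach-Object Name
$InBoth           = $Comparison | Where-Object SideIndicator -eq '==' | ForEach-Object Name
```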
-PassThru
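Have you noticed that only “Name” and “SideIndicator” are output? What about each person’s first name, last name, and age? Is there a way to include that data? Yes: add the -PassThru switch, which outputs the original input objects, with all of their properties, and adds a SideIndicator property to each one: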
Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -PassThru
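A small sketch with one person on each side makes the difference visible: without -PassThru, the output objects carry only the compared property and SideIndicator; with it, First, Last, and Age survive as well:

```powershell
$Ref  = @([PSCustomObject]@{ Name = 'Anne Curry';  First = 'Anne';  Last = 'Curry'; Age = '45' })
$Diff = @([PSCustomObject]@{ Name = 'Harry Brown'; First = 'Harry'; Last = 'Brown'; Age = '21' })

$Plain = Compare-Object -ReferenceObject $Ref -DifferenceObject $Diff -Property Name
$Full  = Compare-Object -ReferenceObject $Ref -DifferenceObject $Diff -Property Name -PassThru

# Without -PassThru: a new object holding only Name and SideIndicator
($Plain | Where-Object Name -eq 'Anne Curry').PSObject.Properties.Name

# With -PassThru: the original object (Name, First, Last, Age) plus SideIndicator
($Full | Where-Object Name -eq 'Anne Curry').PSObject.Properties.Name
```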
Manipulating Data
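Let’s take this a step further by marking anyone 18 or older as an Adult. Because the Age values were entered as strings, cast them to [int] before comparing; otherwise PowerShell compares them as text, and "8" -ge 18 evaluates to $true:
Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -PassThru |
    Select-Object Name, Age, @{n="Adult";e={If ([int]$_.Age -ge 18) { "Adult" }}}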
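Why the ages need care: in $Data1 and $Data2 they are strings, just as they would be after Import-Csv. When the left operand of -ge is a string, PowerShell converts the right operand to a string and compares text. A quick check with three of the sample ages:

```powershell
$Ages = '8', '16', '101'

# String on the left: 18 is converted to "18" and compared as text
$AsText = $Ages | ForEach-Object { $_ -ge 18 }

# Cast to [int] first for a numeric comparison
$AsNumber = $Ages | ForEach-Object { [int]$_ -ge 18 }

"Text:    $($AsText -join ', ')"
"Numeric: $($AsNumber -join ', ')"
```

As text, "8" counts as 18 or older while "101" does not, which is the opposite of what the Adult column needs.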
What if you want to pass through the remaining properties, such as “First” and “Last”? Add the “*” wildcard:
Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -PassThru |
    Select-Object *, @{n="Adult";e={If ([int]$_.Age -ge 18) { "Adult" }}}
Output Only Matching Data
Below, we will only grab those names that exist in both $Data1 and $Data2 by piping the output to “Where-Object”:
Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -PassThru |
    Select-Object *, @{n="Adult";e={If ([int]$_.Age -ge 18) { "Adult" }}} |
    Where-Object {$_.SideIndicator -eq "=="}
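Compare-Object can also do this filtering itself: combining -IncludeEqual with -ExcludeDifferent returns only the objects found in both sets, so the Where-Object step is no longer needed. A sketch using Name-only stand-ins for the two arrays:

```powershell
# Name-only stand-ins for $Data1 and $Data2
$Data1 = 'Michael Yuen', 'John Doe', 'Jim Cork', 'Lisa Simmons', 'Matt Johnson', 'Anne Curry' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }
$Data2 = 'Harry Brown', 'Anne Curry', 'Jim Powell', 'Jim Cork', 'Wen Ping', 'Jane Doe' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }

# -IncludeEqual keeps the "==" rows; -ExcludeDifferent drops the "<=" and "=>" rows
$InBoth = Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual -ExcludeDifferent -PassThru

$InBoth | Select-Object Name, SideIndicator
```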
Summary
We have covered how to:
- Compare two data sets for differences
- Include all source properties in the output
- Add a custom “Adult” property by evaluating each person’s age
- Only grab the values that exist in both data sets by selecting any whose SideIndicator property is “==”
PowerShell: Compare-mObject
Here is where I turned everything into a function. It adds a readable “Exists” property based on SideIndicator and passes through all other properties:
function Compare-mObject {
    <#
    .SYNOPSIS
    Compare differences between two sets of data (Reference vs Difference) based on a specified Property

    .DESCRIPTION
    - Indicators within the "Exists" property:
      - Data that ONLY exists in Reference: "<="
      - Data that ONLY exists in Difference: "=>"
      - Data that exists in BOTH Reference and Difference: "=="
    - Some data may contain non-ASCII characters, such as umlauts
      (https://stackoverflow.com/questions/48947151/import-csv-export-csv-with-german-umlauts-ä-ö-ü)
      Use "-Encoding UTF8" with Import-CSV and Export-CSV to handle UTF-8 (non-ASCII) characters

    .NOTES
    Script Created: 8/14/2019 Michael Yuen (www.yuenx.com)
    Change History
    - 8/20/2020 Michael Yuen: Turned original function into a standalone cmdlet

    .EXAMPLE
    Compare-mObject -Reference $REFERENCE_DATA -Difference $DIFFERENCE_DATA -Property "samAccountName"
    Compare object differences between Reference and Difference using the "samAccountName" property as reference

    .EXAMPLE
    $Ref = Get-ADGroupMember "Group1"; $Diff = Get-ADGroupMember "Group2"; Compare-mObject -Reference $Ref -Difference $Diff -Property "samAccountName"
    Compare AD group membership differences between "Group1" and "Group2" using the "samAccountName" property as reference
    #>
    [CmdletBinding()]
    param (
        # REFERENCE data
        [Parameter(Position=0, Mandatory=$true)]$Reference,
        # DIFFERENCE data
        [Parameter(Position=1, Mandatory=$true)]$Difference,
        # Property to compare on (e.g. distinguishedName)
        [Parameter(Position=2, Mandatory=$true)]$Property
    )

    <#
    Note: Under the SideIndicator column:
      ==  item exists in both ReferenceObject and DifferenceObject
      <=  item exists only in ReferenceObject
      =>  item exists only in DifferenceObject
    Compare-Object:
      -IncludeEqual includes values that exist in both sets
      -ExcludeDifferent excludes values that don't exist in both sets
      -SyncWindow sets how many adjacent elements are searched for a match.
       Windows PowerShell 2.0 defaulted to 5; later versions default to
       [Int32]::MaxValue (the whole collection), so it is left at the default here.
    #>
    $Result = Compare-Object -ReferenceObject $Reference -DifferenceObject $Difference -IncludeEqual `
        -Property $Property -PassThru | Sort-Object SideIndicator

    # Add a readable "Exists" column based on SideIndicator and include all supplied properties.
    # The excluded names are extra properties carried by Active Directory module objects.
    $ResultExpanded = $Result | Select-Object @{n="Exists";e={
            if     ($_.SideIndicator -eq "<=") { "<= In REFERENCE only" }
            elseif ($_.SideIndicator -eq "=>") { "=> In DIFFERENCE only" }
            elseif ($_.SideIndicator -eq "==") { "== In BOTH (Reference & Difference)" }
            else   { "N/A" }
        }}, * -ExcludeProperty PropertyNames,AddedProperties,RemovedProperties,ModifiedProperties,PropertyCount

    return $ResultExpanded
}
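When the goal is to log changes, a count per category is often the first thing worth recording. This sketch groups a comparison result by its SideIndicator values with Group-Object, using Name-only stand-ins for $Data1 and $Data2; grouping the function’s output on its Exists property would work the same way:

```powershell
# Name-only stand-ins for $Data1 and $Data2
$Data1 = 'Michael Yuen', 'John Doe', 'Jim Cork', 'Lisa Simmons', 'Matt Johnson', 'Anne Curry' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }
$Data2 = 'Harry Brown', 'Anne Curry', 'Jim Powell', 'Jim Cork', 'Wen Ping', 'Jane Doe' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }

$Comparison = Compare-Object -ReferenceObject $Data1 -DifferenceObject $Data2 -Property Name -IncludeEqual

# -NoElement keeps only each group's Name (the SideIndicator value) and Count
$Summary = $Comparison | Group-Object SideIndicator -NoElement | Sort-Object Name

$Summary | ForEach-Object { '{0} {1}' -f $_.Name, $_.Count }
```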
Combined with other functions and cmdlets, Compare-Object lets you build powerful tools that log changes, act only on specific sets of data, speed up processing, and reduce complexity. How do YOU use the Compare-Object cmdlet to solve problems you have encountered?
Credits:
– Featured Image by Sharon McCutcheon via Unsplash