split – What constitutes a line for Select-String method in Powershell?

  • Select-String operates on each (stringified on demand[1]) input object.

  • A multi-line string such as "abc`r`ndef" is a single input object.

    • By contrast, "abc", "def" is a string array with two elements, passed as two input objects.
  • To ensure that the lines of a multi-line string are passed individually, split the string into an array of lines using PowerShell's -split operator: "abc`r`ndef" -split "`r?`n"

    • (The ? makes the `r optional so as to also correctly deal with `n-only (LF-only, Unix-style) line endings.)

In short:

"abc`r`ndef" -split "`r?`n" | Select-String -Pattern "abc"

The equivalent, using a PowerShell string literal with regular-expression (regex) escape sequences (the RHS of -split is a regex):

"abc`r`ndef" -split '\r?\n' | Select-String -Pattern "abc"
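
As a minimal sketch of the difference (the variable names $single and $perLine are made up for this illustration), compare what Select-String returns for the unsplit string and for the split one:

```powershell
# One multi-line string: a single input object, so at most one match.
$single = "abc`r`ndef" | Select-String -Pattern 'abc'

# The same text split into lines: two input objects, tested one by one.
$perLine = "abc`r`ndef" -split '\r?\n' | Select-String -Pattern 'abc'

# .Line holds the entire matching input object.
$single.Line -eq "abc`r`ndef"   # True: the match is the whole two-line string
$perLine.Line -eq 'abc'         # True: the match is just the first line
```

In both cases there is exactly one match; what differs is how much text that match contains.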

It is somewhat unfortunate that the Select-String documentation talks about operating on lines of text, given that the real units of operation are input objects – which may themselves comprise multiple lines, as we've seen.
Presumably, this comes from the typical use case of providing input objects via the Get-Content cmdlet, which outputs a text file's lines one by one.

Note that Select-String doesn't return the matching strings directly, but wraps them in [Microsoft.PowerShell.Commands.MatchInfo] objects containing helpful metadata about the match.
Even there the line metaphor is present, however, as it is the .Line property that contains the matching string.
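
A short sketch of inspecting such a result (the variable name $m is illustrative only):

```powershell
# Two input objects; only the second one matches.
$m = 'abc', 'def' | Select-String -Pattern 'def'

$m.GetType().FullName   # Microsoft.PowerShell.Commands.MatchInfo
$m.Line                 # def  (the matching input string)
$m.LineNumber           # 2    (for pipeline input: the position of the input object)
$m.Matches[0].Value     # def  (the part of the string the regex matched)
```

Note that, for pipeline input, .LineNumber counts input objects rather than lines within a string, which fits the point made above.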


[1] Optional reading: How Select-String stringifies input objects

If an input object isn't a string already, it is converted to one, though possibly not in the way you might expect:

Loosely speaking, the .ToString() method is called on each non-string input object[2], which for non-strings is not the same as the representation you get with PowerShell's default output formatting (the latter is what you see when you print an object to the console or use Out-File, for instance); by contrast, it is the same representation you get with string interpolation in a double-quoted string (when you embed a variable reference or command in "...", e.g., "$HOME" or "$(Get-Date)").

Often, .ToString() just yields the name of the object's type, without containing any instance-specific information; e.g., $PSVersionTable stringifies to System.Management.Automation.PSVersionHashTable.

# Matches NOTHING, because Select-String sees only
# "System.Management.Automation.PSVersionHashTable" as its input,
# and that type name doesn't contain the entry name "PSEdition".
$PSVersionTable | Select-String PSEdition

In case you do want to search the default output format line by line, use the following idiom:

... | Out-String -Stream | Select-String ...

However, note that for non-string input it is more robust and preferable for subsequent processing to filter the input by querying properties with a Where-Object condition.
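
To contrast the two approaches, here is a minimal sketch; the sample objects and the names $items, $textHits and $objectHits are hypothetical, made up for this illustration:

```powershell
# Hypothetical sample objects, used only for illustration.
$items = [pscustomobject]@{ Name = 'abc'; Size = 1 },
         [pscustomobject]@{ Name = 'def'; Size = 2 }

# Text-based: search the lines of the default display output. The results
# are MatchInfo objects wrapping display lines, not the original objects.
$textHits = $items | Out-String -Stream | Select-String -Pattern 'abc'

# Property-based: filter on the Name property. The results are the original
# objects, with all their properties intact.
$objectHits = $items | Where-Object Name -match 'abc'
$objectHits.Size   # 1
```

The text-based result can only be processed further as text, whereas the property-based result still has .Name and .Size to work with.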

That said, there is a strong case to be made for Select-String needing to implicitly apply Out-String -Stream stringification, as discussed in this GitHub feature request.


[2] More accurately, .psobject.ToString() is called, either as-is, or – if the object's ToString method supports an IFormatProvider-typed argument – as .psobject.ToString([cultureinfo]::InvariantCulture) so as to obtain a culture-invariant representation – see this answer for more information.

"abc`r`ndef"

is a single string which, if you echo it (Write-Output) to the console, results in:

PS C:\Users\gpunktschmitz> echo "abc`r`ndef"
abc
def

Select-String outputs every input string that contains "abc". As "abc" is part of this string, the whole string is selected.

"abc", "def"

is a list of two strings. Here, Select-String tests "abc" first and then "def" against the pattern "abc". As only the first one matches, only it is selected.

Use the following to split the string into a list and select only the elements containing "abc":

"abc`r`ndef".Split("`r`n") | Select-String -Pattern "abc"

Mr. Guenther Schmitz has already explained the correct usage of Select-String; I just want to add a few points in support of his answer.

  1. I did some reverse engineering on the Select-String cmdlet. It lives in Microsoft.PowerShell.Utility.dll. Some relevant code snippets follow; note that this is decompiled code shown for reference, not the actual source code.

    string text = inputObject.BaseObject as string;
    ...
    matchInfo = (inputObject.BaseObject as MatchInfo);
    object operand = ((object)matchInfo) ?? ((object)inputObject);
    flag2 = doMatch(operand, out matchInfo2, out text);
    

    We can see that it simply treats the inputObject as one whole string; it doesn't do any splitting.

  2. I couldn't find the actual source code of this cmdlet on GitHub; probably this utility module is not open source yet. But I did find the unit tests for Select-String.

    $testinputone = "hello","Hello","goodbye"
    $testinputtwo = "hello","Hello"
    

    The test inputs used in the unit tests are actually lists of strings. This suggests that your use case wasn't considered, and that the cmdlet was quite possibly just designed to accept a collection of strings as input.

  3. However, if we look at Microsoft's official documentation for Select-String, we see that it talks about lines a lot, even though the cmdlet can't recognize a line inside a string. My personal guess is that the concept of a line is only meaningful when the cmdlet accepts a file as input: in that case the file is like a list of strings, with each item in the list representing a single line.

I hope this makes things clearer.
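
To illustrate the guess in point 3, here is a small sketch using a temporary file (the variable names are made up for this example; New-TemporaryFile assumes PowerShell 5.0 or later):

```powershell
# Create a temporary two-line file.
$tmp = (New-TemporaryFile).FullName
Set-Content -LiteralPath $tmp -Value 'abc', 'def'

# File input: Select-String reads the file line by line itself.
$fromFile = Select-String -LiteralPath $tmp -Pattern 'def'
$fromFile.LineNumber   # 2

# Get-Content emits one string per line, so each line is its own input object.
$fromPipe = Get-Content -LiteralPath $tmp | Select-String -Pattern 'def'
$fromPipe.Line         # def

# Get-Content -Raw emits the whole file as ONE string, so the match is the whole file.
$fromRaw = Get-Content -LiteralPath $tmp -Raw | Select-String -Pattern 'def'
$fromRaw.Line -match 'abc'   # True: .Line also contains the first line

Remove-Item -LiteralPath $tmp
```

Only in the first two cases does a "line" in the Select-String sense coincide with a line of the file.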
