
I Can Analyze Code, And So Can You

[12 December 2014 – Update 1] Upon setting up a new VM I realized that I missed a prerequisite when writing this post. The Visual Studio 2015 Preview SDK is also required. This extension includes the VSIX project subtype required by the Diagnostic and Code Fix template used for the example. I’ve included a note about installing the SDK in the prerequisites section below.

[12 December 2014 – Update 2] The GitHub project has moved under the .NET Analyzers organization. This organization is collecting diagnostics, code fixes, and refactorings to showcase the capabilities of this technology. The project link has been updated accordingly.

Over the past several years, Microsoft has been hard at work on the .NET Compiler Platform (formerly Roslyn). Shipping with Visual Studio 2015, the .NET Compiler Platform includes a complete rewrite of the C# and Visual Basic compilers intended to bring features that developers have come to expect from modern compilers. In addition to the shiny new compilers, the .NET Compiler Platform introduces a rich set of APIs that we can harness to build custom code diagnostic analyzers and code fix providers, thus allowing us to detect and correct issues in our code.

When trying to decide on a useful demonstration for these features I thought of one of my coding pet peeves: using an if..else statement to conditionally set a variable or return a value. I prefer treating these scenarios as an expression via the conditional (ternary) operator and I often find myself refactoring these patterns in legacy code. It certainly would be nice to automate that process. It turns out that this is exactly the type of task at which diagnostic analyzers and code fix providers excel and we’ll walk through the process of creating such components in this post.

Prerequisites

Before we can start, you’ll need to make sure you have a few things installed:

  • Visual Studio 2015
  • Visual Studio 2015 Preview SDK
  • .NET Compiler Platform SDK Templates
  • Syntax Visualizer extension

Getting Started

Once the above items are installed, fire up Visual Studio 2015 and create a new project. The .NET Compiler Platform SDK Templates installer adds several extensibility templates for both C# and Visual Basic. For the purposes of this exercise, the template we’re interested in is the C# Diagnostic and Code Fix (NuGet + VSIX). This template includes everything we need to not only create the diagnostic analyzer and code fix provider, but also to test and deploy the assemblies. I named the solution “UseConditionalOperatorAnalyzer” but you may select the name of your choice.

The newly created solution should include the three projects listed below. If you’d prefer to download the full source, you can find it on Github.

  • A Portable Class Library (PCL) that contains the code analyzer and code fix provider
  • A unit test project
  • A VSIX project that defines the Visual Studio extension

Once you’ve created the project (or cloned it from Github), turn your attention to the PCL project.

I won’t be addressing the unit test project in this article. The examples within the code are rather straightforward and should give you a good starting point without requiring any guidance. I do recommend playing with the test project a bit since it will greatly speed up detecting and resolving any problems you may encounter as you build your own analyzers and fix providers.

Writing the Diagnostic Analyzer

By default, the PCL project contains a file named DiagnosticAnalyzer.cs. This file includes a simple analyzer that detects type names containing lowercase letters. This example analyzer isn’t particularly useful but the code file itself provides a good starting point. Since our analyzer will contain only one rule, we’ll simply replace some of the implementations with our own code. Before we start hacking at the code, you may consider looking it over to get a feel for its structure.

The first change we’ll make is updating the constants defined at the top of the class to better reflect what our analyzer does. The analyzer we’re building will detect two if..else statement variations.

public const string DiagnosticId = "UseConditionalOperator";
internal const string Title = "Replace with conditional operator";
internal const string MessageFormat = "If statement can be replaced with a conditional operator.";
internal const string Category = "Syntax";

The next line defines the rule that will be presented to the user when the analyzer detects a match. The most interesting thing in this definition is the DiagnosticSeverity setting, which identifies how the detected code will be highlighted in the IDE and how the rule will be presented in the error list. By default this is set to Warning, which means that the code will be highlighted with green squiggles and listed as a warning. If you wish, you may change this value to DiagnosticSeverity.Info to simply list an informational message and show the lightbulb icon when the cursor is over the corresponding code.

Immediately following the rule definition is the SupportedDiagnostics property. This property returns an ImmutableArray<DiagnosticDescriptor> identifying the rules exposed by this analyzer. We can leave this definition alone.
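
For reference, the pieces discussed so far might fit together roughly as follows. This is a sketch written against the released Microsoft.CodeAnalysis packages, so the attribute and property syntax may differ slightly from what the Preview template generates. The class name is an assumption, and Initialize is left empty here because it is covered next.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

// Class name is assumed; the template derives it from the project name.
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class UseConditionalOperatorAnalyzer : DiagnosticAnalyzer
{
    public const string DiagnosticId = "UseConditionalOperator";
    internal const string Title = "Replace with conditional operator";
    internal const string MessageFormat = "If statement can be replaced with a conditional operator.";
    internal const string Category = "Syntax";

    // Swap Warning for Info here to get a lightbulb hint instead of a squiggle.
    internal static readonly DiagnosticDescriptor Rule =
        new DiagnosticDescriptor(
            DiagnosticId, Title, MessageFormat, Category,
            DiagnosticSeverity.Warning, isEnabledByDefault: true);

    // Every rule the analyzer can report must be listed here.
    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
    {
        get { return ImmutableArray.Create(Rule); }
    }

    public override void Initialize(AnalysisContext context)
    {
        // Registration of the analysis callback goes here.
    }
}
```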

Next is the Initialize method which is the analyzer’s entry point. Here we register a delegate that will handle analyzing if statements. The template assumes that we want to analyze symbols that represent named types but we’re looking to perform syntax analysis so this needs to be updated as follows:

public override void Initialize(AnalysisContext context)
{
    context.RegisterSyntaxNodeAction(AnalyzeNode, SyntaxKind.IfStatement);
}

Here we’ve replaced the call to context.RegisterSymbolAction with a call to context.RegisterSyntaxNodeAction. Note that the SyntaxKind we’ve supplied to RegisterSyntaxNodeAction is IfStatement. This means that our delegate (AnalyzeNode) will be invoked only when the compiler encounters an if statement. We can now safely delete the AnalyzeSymbol method since we’ll be defining the AnalyzeNode method shortly.

With most of the plumbing complete we’re ready to move on to more interesting things, namely, writing the analysis code. Since we’re going to detect two variations of if statements (simple assignments and returns) we’ll begin by stubbing out methods for each variation as well as the AnalyzeNode method.

private static bool CanSimplifyAssignment(IfStatementSyntax ifStatement)
{
    throw new NotImplementedException();
}

private static bool CanSimplifyReturn(IfStatementSyntax ifStatement)
{
    throw new NotImplementedException();
}

private static void AnalyzeNode(SyntaxNodeAnalysisContext context)
{
}

We’ll also define an ImmutableArray<Analyzer> that we’ll use in the AnalyzeNode method to streamline the code a bit. I like to keep my definitions together so I put this with the constants we discussed above.

private static readonly ImmutableArray<Analyzer> Tests =
    ImmutableArray.Create<Analyzer>(CanSimplifyAssignment, CanSimplifyReturn);
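
The Analyzer type used in that declaration isn’t shown in this post. One plausible definition, and this is an assumption rather than the original code, is a delegate that takes the if statement and returns a bool. The standalone sketch below uses hypothetical placeholder checks (BodyIsExpression and BodyIsReturn) in place of the two real methods so that it compiles on its own, assuming the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class AnalyzerDelegateSketch
{
    // Assumed shape of the Analyzer type: a predicate over an if statement.
    public delegate bool Analyzer(IfStatementSyntax ifStatement);

    // Hypothetical stand-in for CanSimplifyAssignment.
    private static bool BodyIsExpression(IfStatementSyntax ifStatement)
    {
        return ifStatement.Statement.Kind() == SyntaxKind.ExpressionStatement;
    }

    // Hypothetical stand-in for CanSimplifyReturn.
    private static bool BodyIsReturn(IfStatementSyntax ifStatement)
    {
        return ifStatement.Statement.Kind() == SyntaxKind.ReturnStatement;
    }

    // Method groups convert to the delegate type, so the checks can be listed by name.
    private static readonly ImmutableArray<Analyzer> Tests =
        ImmutableArray.Create<Analyzer>(BodyIsExpression, BodyIsReturn);

    // True as soon as any one of the registered checks accepts the node.
    public static bool MatchesAny(IfStatementSyntax ifStatement)
    {
        return Tests.Any(test => test(ifStatement));
    }
}
```

Keeping the checks in an array means a new variation only needs a new method and one more entry in the list.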

Now we’re ready to define the analyzers and AnalyzeNode methods, but where do we start? For that, we can turn to the Syntax Visualizer.

Syntax Visualization

The Syntax Visualizer is a Visual Studio extension that displays the details of the current document’s syntax tree. It’s a great place to uncover the structure you’re trying to identify in your analyzer. (If you’ve installed the extension and the window is not visible, you can find it under View/Other Windows).

The visualizer’s top panel displays each syntax node in a hierarchical format while the bottom panel shows the selected node’s properties including its type and kind, both of which will be useful when implementing the analyzer methods. While the top panel is useful for understanding the overall structure of the document, the node properties will generally be more helpful for analysis.

One of the if statement variations we’d like to identify in this analyzer is one that conditionally returns a value. Consider the following method which exemplifies what we’re looking for within the code:

private string GetMessage()
{
    var now = DateTime.Now;
    if (now.DayOfWeek == DayOfWeek.Saturday || now.DayOfWeek == DayOfWeek.Sunday)
        return "Weekend";
    else
        return "Weekday";
}

[Image: If Statement Visualization]

By far, the easiest way to use the visualizer is to simply click on a syntactic element. For example, to see the details of the if statement in the preceding snippet, simply click somewhere within the if statement. The visualizer should highlight the corresponding node and display the structure as pictured. (I’ve expanded/collapsed portions to give a better idea of the syntactic structure).

From this node alone we can discern that the if statement’s body consists of a return statement and an else clause that contains another return statement. If we inspect the properties of the if statement, we learn that the first return statement is exposed via the Statement property and the else clause is exposed via the Else property. This process is generally sufficient for discovering the structure of a syntactic element.
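
The same structure can also be inspected in code, which is handy for double-checking what the visualizer shows. The following is a minimal sketch, assuming the Microsoft.CodeAnalysis.CSharp package is referenced; IfStatementInspector, Sample, and Describe are hypothetical names used only for illustration.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class IfStatementInspector
{
    // The GetMessage method from above, wrapped in a class so it parses as a complete file.
    public const string Sample = @"
class Demo
{
    private string GetMessage()
    {
        var now = System.DateTime.Now;
        if (now.DayOfWeek == System.DayOfWeek.Saturday || now.DayOfWeek == System.DayOfWeek.Sunday)
            return ""Weekend"";
        else
            return ""Weekday"";
    }
}";

    // Returns the kinds of the if body and the else body, separated by a slash.
    public static string Describe(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var ifStatement = root.DescendantNodes().OfType<IfStatementSyntax>().First();

        // Statement holds the body of the if; Else is null when there is no else clause.
        var bodyKind = ifStatement.Statement.Kind();
        var elseKind = ifStatement.Else == null
            ? SyntaxKind.None
            : ifStatement.Else.Statement.Kind();

        return bodyKind + "/" + elseKind;
    }
}
```

Calling IfStatementInspector.Describe(IfStatementInspector.Sample) reports a ReturnStatement on both sides, matching what the visualizer displays.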

Now that we know how to use the visualizer, we can move on to implementing the analyzer methods.

Implementing the Analyzer Methods

To ensure that an analyzer doesn’t cause performance problems, you should try to rule out nodes quickly and bind to syntax only as needed. You can see this principle in action throughout the following samples as we return as soon as we detect that a node doesn’t meet our selection criteria.

The first method we’ll implement is AnalyzeNode. The implementation is quite simple but we need some information from the visualizer to get started.

The AnalyzeNode method accepts an instance of SyntaxNodeAnalysisContext which identifies the node that triggered analysis via the Node property. The Node property’s type, SyntaxNode, is a base class thus it does not expose the properties we discovered in the previous section (Statement and Else); we need to cast it to something more useful. To determine the node’s actual type, we can refer back to the node in the visualizer, inspect the node’s properties, and we’ll see that it’s an instance of IfStatementSyntax. Let’s put this information to use.

var ifStatement = (IfStatementSyntax)context.Node;

if (ifStatement.Else == null || !Tests.Any(p => p(ifStatement))) return;

var diag = Diagnostic.Create(Rule, ifStatement.GetLocation());
context.ReportDiagnostic(diag);

Because our analyzer is handling only if statements, we can safely cast without worrying about an invalid conversion. After casting the node we verify that the if statement has an else clause and that the if statement satisfies at least one of our analyzers (via the Any method). If so, we create and report a new Diagnostic instance identifying the rule the code has matched and its location within the source file.

Our two analysis methods (CanSimplifyAssignment and CanSimplifyReturn) currently throw an exception so they aren’t of much use yet. The latter case (returns) is simpler so we’ll implement that first using more information gleaned from the syntax visualizer.

if ((SyntaxKind)ifStatement.Statement.RawKind != SyntaxKind.ReturnStatement) return false;
if ((SyntaxKind)ifStatement?.Else?.Statement.RawKind != SyntaxKind.ReturnStatement) return false;

return true;

The implementation is again pretty straightforward; we simply ensure that both parts of the if..else statement are returns by checking the RawKind properties. Since we don’t care about what’s returned at this point it’s faster to check the RawKind property than mess with converting the instances from StatementSyntax to ReturnStatementSyntax. Also note how we’re using C# 6’s new ?. operator to short-circuit possible null references when inspecting the else branch.
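
For readers new to the ?. operator, here is a small standalone sketch of how it behaves, using hypothetical Branch and Leaf classes in place of the syntax types. One detail worth knowing: a ?. chain that ends in an int produces an int?, and casting a null int? to an enum throws an InvalidOperationException instead of yielding a default value. As far as I can tell, that means a cast like the one above is only safe because AnalyzeNode has already ruled out a missing else clause.

```csharp
using System;

public static class NullConditionalDemo
{
    // Hypothetical stand-ins for the syntax node types.
    public sealed class Leaf
    {
        public int RawKind { get { return 6; } }
    }

    public sealed class Branch
    {
        public Leaf Statement { get; set; }
    }

    // Any null link in the chain makes the whole expression null.
    public static int? KindOrNull(Branch branch)
    {
        return branch?.Statement?.RawKind;
    }

    // The cast applies to the whole chain, so a null result makes it throw.
    public static bool TryGetKind(Branch branch, out DayOfWeek kind)
    {
        try
        {
            kind = (DayOfWeek)branch?.Statement?.RawKind;
            return true;
        }
        catch (InvalidOperationException)
        {
            kind = default(DayOfWeek);
            return false;
        }
    }
}
```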

The CanSimplifyAssignment method is a bit more involved because we need to drill down into the nested statements to ensure that both the if and else clauses are trying to assign to the same identifier.

var truePartStmt = ifStatement.Statement as ExpressionStatementSyntax;
if (truePartStmt == null) return false;

var truePartAssignment = (truePartStmt.Expression as AssignmentExpressionSyntax)?.Left;
if (truePartAssignment == null) return false;

var falsePartStmt = ifStatement?.Else?.Statement as ExpressionStatementSyntax;
if (falsePartStmt == null) return false;

var falsePartAssignment = (falsePartStmt.Expression as AssignmentExpressionSyntax)?.Left;
if (falsePartAssignment == null) return false;

if (!truePartAssignment.IsEquivalentTo(falsePartAssignment)) return false;

return true;

Here you can see how we walk the tree via the various node properties (again using the ?. operator to short-circuit possible null references), ultimately verifying that the same identifier is used in both parts of the if..else statement.

Observing the Analysis Result

Running the code at this point will start a new instance of Visual Studio, install the analyzer and code fix provider as an extension (via the VSIX project), and attach the debugger. Opening (or creating) a project with code that matches one of our handled cases should result in something like the following (provided that you left the DiagnosticSeverity at Warning).

[Image: Highlighted Code Diagnostic]

Of course, we’re only halfway done. We’ve built the analyzer but haven’t yet built the code fix provider; all we have is the sample provider which changes the case of a type name. Since that most definitely isn’t what we want we’ll build the real provider in the next section.

Writing The Code Fix Provider

To begin building the code fix provider we’ll follow a similar process to what we did with the diagnostic analyzer. We’ll start with the template-generated code file and update or replace things as necessary. As with the analyzer, you may consider looking over this code to get a feel for its structure before we start hacking at it.

The code fix provider’s first two methods, GetFixableDiagnostics and GetFixAllProvider, remain unchanged. These methods identify the diagnostics that the provider handles and specify the class that will handle applying the fix to all instances of the rule, respectively.

The first method we need to change is ComputeFixesAsync which is responsible for locating the affected node and registering the delegate that handles building the new code. Update the method body as follows then delete the MakeUppercaseAsync method since it’s no longer needed.

var root = await context.Document.GetSyntaxRootAsync(context.CancellationToken).ConfigureAwait(false);

var diagnostic = context.Diagnostics.First();
var diagnosticSpan = diagnostic.Location.SourceSpan;
var declaration = root.FindToken(diagnosticSpan.Start).Parent.AncestorsAndSelf().OfType<IfStatementSyntax>().First();

context.RegisterFix(
    CodeAction.Create("Replace with conditional operator", c => MakeConditionalAsync(context.Document, declaration, c)),
    diagnostic);

Here we’ve indicated that we care only about IfStatementSyntax instances and that applying the fix for the UseConditionalOperator rule should invoke the soon-to-be-defined MakeConditionalAsync method.

Creating code fixes is much more involved than code analysis. With code fixes we’re generally transforming one node into another so we need to know a bit more about the original structure. Furthermore, we want to generate readable code so we need to pay special attention to whitespace and other formatting (called trivia in the compiler services API). It’s important to recognize that the syntax trees we’re working with are immutable constructs so every time we make a change, we get a new tree back. When a fix requires us to apply multiple changes, we must enable tracking on the tree so we can locate the affected nodes in the resulting tree.

Let’s stub out the methods that create the two code fixes and define the MakeConditionalAsync method. Fortunately, the analyzer already took care of filtering out nodes that don’t meet our criteria so although we have to parse nodes, we don’t need to verify they meet the fix criteria.

private async static Task<SyntaxNode> ApplyAssignmentCodeFix(Document sourceDocument, IfStatementSyntax ifStatement)
{
    throw new NotImplementedException();
}

private async static Task<SyntaxNode> ApplyReturnCodeFix(Document sourceDocument, IfStatementSyntax ifStatement)
{
    throw new NotImplementedException();
}

private async Task<Document> MakeConditionalAsync(Document document, IfStatementSyntax ifStatement, CancellationToken cancellationToken)
{
    SyntaxNode newRoot = null;

    switch ((SyntaxKind)ifStatement.Statement.RawKind)
    {
        case SyntaxKind.ExpressionStatement:
            newRoot = await ApplyAssignmentCodeFix(document, ifStatement);
            break;

        case SyntaxKind.ReturnStatement:
            newRoot = await ApplyReturnCodeFix(document, ifStatement);
            break;
    }

    return newRoot == null ? document : document.WithSyntaxRoot(newRoot);
}

In the MakeConditionalAsync method we simply apply the same technique as before to determine the kind of the if statement’s Statement and invoke the corresponding method. Both ApplyAssignmentCodeFix and ApplyReturnCodeFix return a SyntaxNode that will serve as the new document root. In the event that the statement is neither, we simply return the original document without applying any transformations.

Before going any further, let’s define some helper methods that will simplify some things for us a bit later on.

private static TNode ApplyFormatting<TNode>(TNode node)
    where TNode : SyntaxNode
{
    return node.WithAdditionalAnnotations(Formatter.Annotation);
}

private static TNode CreateNodeWithSourceFormatting<TNode>(SyntaxNode sourceNode, Func<TNode> factory)
    where TNode : SyntaxNode
{
    return
        factory()
            .WithLeadingTrivia(sourceNode.GetLeadingTrivia())
            .WithTrailingTrivia(sourceNode.GetTrailingTrivia());
}

The first of these methods, ApplyFormatting, applies formatting per the IDE settings. The other method, CreateNodeWithSourceFormatting, uses a factory delegate to create a new node then copies the leading and trailing trivia from another node before returning the new node.
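
To make the effect of copying trivia concrete, the sketch below swaps one statement for another while preserving the original indentation and trailing comment. It is a standalone illustration that assumes the Microsoft.CodeAnalysis.CSharp package is referenced; TriviaSketch, Source, and ReplaceReturn are hypothetical names.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TriviaSketch
{
    public const string Source =
        "class C\n{\n    int M()\n    {\n        return 1; // done\n    }\n}\n";

    // Replaces the first return statement with the statement in replacementText,
    // carrying the original leading and trailing trivia over to the new node.
    public static string ReplaceReturn(string source, string replacementText)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var original = root.DescendantNodes().OfType<ReturnStatementSyntax>().First();

        // Leading trivia is the indentation; trailing trivia is the comment and line break.
        var replacement =
            SyntaxFactory.ParseStatement(replacementText)
                .WithLeadingTrivia(original.GetLeadingTrivia())
                .WithTrailingTrivia(original.GetTrailingTrivia());

        return root.ReplaceNode(original, replacement).ToFullString();
    }
}
```

Without the two With...Trivia calls the new statement would be spliced in with no indentation and the comment and line break would be lost.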

Fixing Conditional Returns

As for the code fix generation implementations, the return code fix is much simpler than the fix for conditional assignments so we’ll start with that one again.

var truePartStmt = ifStatement.Statement as ReturnStatementSyntax;
var falsePartStmt = ifStatement.Else.Statement as ReturnStatementSyntax;

var conditionalExpr =
    SyntaxFactory.ConditionalExpression(
        ApplyFormatting(SyntaxFactory.ParenthesizedExpression(ifStatement.Condition)),
        ApplyFormatting(truePartStmt.Expression),
        ApplyFormatting(falsePartStmt.Expression));

var returnStmt =
    CreateNodeWithSourceFormatting(
        ifStatement,
        () => SyntaxFactory.ReturnStatement(conditionalExpr));

var docRoot = await sourceDocument.GetSyntaxRootAsync();
return docRoot.ReplaceNode(ifStatement as CSharpSyntaxNode, returnStmt);

In the above snippet, we retrieve the return statements from both the if and else clauses before building up a new conditional expression via the SyntaxFactory, passing in the if statement’s condition in parenthesized form along with both return statement expressions. We then create a new return statement with the conditional expression we just created.

We proceed with replacing the original if statement with the newly created return statement by getting the source document’s root and invoking ReplaceNode. The reason we cast ifStatement to CSharpSyntaxNode is that ReplaceNode is a generic method which requires that the replacement node be the same type as the source node. Casting the source node to CSharpSyntaxNode works around the type change.
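
To see what the factory calls produce in isolation, the sketch below builds the same kind of return statement from parsed expressions. It is a standalone illustration rather than the code fix itself: ReturnFixSketch and BuildReturn are hypothetical names, the Microsoft.CodeAnalysis.CSharp package is assumed to be referenced, and NormalizeWhitespace stands in for the Formatter annotation so that no workspace is needed.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ReturnFixSketch
{
    // Builds "return (condition) ? whenTrue : whenFalse;" from three expression strings.
    public static ReturnStatementSyntax BuildReturn(string condition, string whenTrue, string whenFalse)
    {
        var conditionalExpr =
            SyntaxFactory.ConditionalExpression(
                SyntaxFactory.ParenthesizedExpression(SyntaxFactory.ParseExpression(condition)),
                SyntaxFactory.ParseExpression(whenTrue),
                SyntaxFactory.ParseExpression(whenFalse));

        // Tokens created by the factory carry no whitespace, so normalize before printing.
        return SyntaxFactory.ReturnStatement(conditionalExpr).NormalizeWhitespace();
    }
}
```

Calling ToFullString() on the result of BuildReturn("a || b", "\"Weekend\"", "\"Weekday\"") gives the text of the new statement, which is a quick way to check a transformation before wiring it into a provider.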

Fixing Conditional Assignments

When I stated earlier that fixing conditional returns was easier than fixing conditional assignments I was fibbing a bit. If we were to replace only the if statement the code would be only slightly more complicated as depicted here:

var truePartStmt = ifStatement.Statement as ExpressionStatementSyntax;
var truePartExpr = truePartStmt.Expression as AssignmentExpressionSyntax;
var falsePartStmt = ifStatement.Else.Statement as ExpressionStatementSyntax;
var falsePartExpr = falsePartStmt.Expression as AssignmentExpressionSyntax;

var conditionalExpr =
    SyntaxFactory
        .ConditionalExpression(
            ApplyFormatting(SyntaxFactory.ParenthesizedExpression(ifStatement.Condition)),
            ApplyFormatting(truePartExpr.Right),
            ApplyFormatting(falsePartExpr.Right));

var assignmentExpr =
    SyntaxFactory
        .AssignmentExpression(SyntaxKind.SimpleAssignmentExpression, truePartExpr.Left, conditionalExpr);

var assignmentStmt =
    CreateNodeWithSourceFormatting(
        ifStatement,
        () => SyntaxFactory.ExpressionStatement(assignmentExpr));

var docRoot = await sourceDocument.GetSyntaxRootAsync();
return docRoot.ReplaceNode(ifStatement as CSharpSyntaxNode, assignmentStmt);

As you can see, we need to drill into the assignment expressions and pass the right side to the ConditionalExpression along with the parenthesized condition from the original if statement. We then wrap the conditional expression in an assignment expression and then an assignment statement before replacing the if statement from the source document. The distinction between expressions and statements is important at this level because we’re constructing new syntax for a statement-based language. Rather than allowing us to insert expressions directly, we need to wrap them inside a statement that allows expressions, hence the name ExpressionStatement.

Combining Declaration and Assignment

What’s much more interesting than simply replacing the if statement is when we take into account what we’re assigning to and whether we can combine the assignment with the declaration. For instance, we can’t combine the assignment with the declaration when assigning to a property or field but we might be able to when assigning to a local variable. The reason I say we might be able to combine in the case of a local variable is that we can only really do it when the variable isn’t being referenced anywhere prior to or within the if statement.

Creating a new variable declaration via the SyntaxFactory requires a rather cumbersome method/argument chain so for readability I broke it out into another method as defined here:

private static LocalDeclarationStatementSyntax CreateVarDeclaration(string symbolName, ExpressionSyntax expression)
{
    return
        SyntaxFactory.LocalDeclarationStatement(
            SyntaxFactory.VariableDeclaration(
                SyntaxFactory.IdentifierName("var").WithTrailingTrivia(SyntaxFactory.Space),
                SyntaxFactory.SingletonSeparatedList(
                    SyntaxFactory
                        .VariableDeclarator(symbolName)
                        .WithTrailingTrivia(SyntaxFactory.Space)
                        .WithInitializer(SyntaxFactory.EqualsValueClause(expression)))));
}

In order to determine if we can combine the declaration and assignment we need to jump into some symbol analysis via the source document’s semantic model.

// Moved docRoot definition from after assignmentStmt definition
var docRoot = await sourceDocument.GetSyntaxRootAsync();

var semanticModel = await sourceDocument.GetSemanticModelAsync();
var targetSymbol = semanticModel.GetSymbolInfo(truePartExpr.Left).Symbol;

if (targetSymbol.Kind == SymbolKind.Local)
{
    var declarationSyntax = targetSymbol.DeclaringSyntaxReferences[0].GetSyntax().Parent.Parent;
    var dataFlowAnalysis = semanticModel.AnalyzeDataFlow(declarationSyntax, ifStatement);
    var variableIsRead = dataFlowAnalysis.ReadInside.Any(v => v.Name.Equals(targetSymbol.Name));

    if (!variableIsRead)
    {
        var newRoot = docRoot.TrackNodes(declarationSyntax, ifStatement);

        var newDeclarationSyntax = newRoot.GetCurrentNode(declarationSyntax);
        newRoot = newRoot.RemoveNode(newDeclarationSyntax, SyntaxRemoveOptions.KeepNoTrivia);

        var declarationStmt =
            CreateNodeWithSourceFormatting(
                ifStatement,
                () => CreateVarDeclaration(targetSymbol.Name, conditionalExpr));

        var newIfStatement = newRoot.GetCurrentNode(ifStatement);
        return newRoot.ReplaceNode(newIfStatement as CSharpSyntaxNode, declarationStmt);
    }
}

// var assignmentStmt = ...

We begin by getting the semantic model from the source document. From the semantic model we obtain a reference to the identifier’s symbol which we then inspect to determine if we’re setting a local variable. If we’re not dealing with a local variable we just move on since we can’t condense the declaration and the assignment. If the symbol does represent a local variable we continue inspecting the local variable to determine if it’s being read.

Reference checking is done through the semantic model’s AnalyzeDataFlow method. This method returns a DataFlowAnalysis object that contains immutable arrays describing things like variable declarations, reads, writes, and so on. To narrow the scope of our reference search we define a search range by providing the original declaration as the first argument and the if statement as the second argument. We can then search the ReadInside array for the variable name. If it’s not read, we can proceed with combining the two statements. (This may not be the best way to approach this problem. If there’s a better way, I’d love to hear it!)
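
Data flow analysis can be tried outside of a code fix as well. The sketch below compiles a small snippet in memory and lists the variables read between a declaration and an if statement. It assumes the Microsoft.CodeAnalysis.CSharp package is referenced; DataFlowSketch, Source, and ReadBetweenDeclarationAndIf are hypothetical names, and the snippet being analyzed is made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class DataFlowSketch
{
    public const string Source = @"
class C
{
    string M(int now)
    {
        var message = """";
        var limit = 5;
        if (now > limit)
            message = ""Weekend"";
        else
            message = ""Weekday"";
        return message;
    }
}";

    // Names of the variables read from the first local declaration through the first if statement.
    public static IList<string> ReadBetweenDeclarationAndIf(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        var compilation = CSharpCompilation.Create(
            "DataFlowSketch",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        var model = compilation.GetSemanticModel(tree);

        var root = tree.GetRoot();
        var declaration = root.DescendantNodes().OfType<LocalDeclarationStatementSyntax>().First();
        var ifStatement = root.DescendantNodes().OfType<IfStatementSyntax>().First();

        // Both statements must sit in the same block to form a valid region.
        var flow = model.AnalyzeDataFlow(declaration, ifStatement);
        return flow.ReadInside.Select(symbol => symbol.Name).ToList();
    }
}
```

For this snippet the result includes now and limit, which the condition reads, but not message, which is only written inside the region.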

In order to combine the two statements we must make multiple changes to the source document but remember that the tree is immutable and every change we make results in a new tree. A complication of this is that once we make a change, all the nodes are different so we can no longer reference them as we did before. This is where node tracking comes into play.

To enable node tracking we invoke the aptly named TrackNodes method. This creates a new tree where the specified nodes are indexed internally such that they can be accessed through the GetCurrentNode method by passing in the node from the source document. Here we enable tracking, get the declaration node from the new tree and remove it, create a new declaration statement, locate the if statement, and replace it with the new declaration which we then return.
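
Node tracking is easier to follow in a reduced example. The sketch below applies the same remove-then-replace sequence to a parsed snippet without any document or semantic model. It assumes the Microsoft.CodeAnalysis.CSharp package is referenced; TrackNodesSketch, Source, and Combine are hypothetical names, and the replacement statement is hard-coded purely for illustration.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TrackNodesSketch
{
    public const string Source =
        "class C\n{\n    string M(bool weekend)\n    {\n        var message = \"\";\n" +
        "        if (weekend)\n            message = \"Weekend\";\n        else\n" +
        "            message = \"Weekday\";\n        return message;\n    }\n}\n";

    public static string Combine(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var declaration = root.DescendantNodes().OfType<LocalDeclarationStatementSyntax>().First();
        var ifStatement = root.DescendantNodes().OfType<IfStatementSyntax>().First();

        // Each edit yields a new tree, so mark the nodes we need to find again later.
        var newRoot = root.TrackNodes(declaration, ifStatement);

        // First edit: drop the original declaration.
        newRoot = newRoot.RemoveNode(
            newRoot.GetCurrentNode(declaration), SyntaxRemoveOptions.KeepNoTrivia);

        // Second edit: ifStatement belongs to the old tree, so look up its current counterpart.
        var currentIf = newRoot.GetCurrentNode(ifStatement);
        var replacement =
            SyntaxFactory.ParseStatement("var message = weekend ? \"Weekend\" : \"Weekday\";")
                .WithLeadingTrivia(currentIf.GetLeadingTrivia())
                .WithTrailingTrivia(currentIf.GetTrailingTrivia());

        return newRoot.ReplaceNode(currentIf, replacement).ToFullString();
    }
}
```

The original root is untouched throughout, which is why the nodes taken from it can still be used as lookup keys after the first edit.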

Applying the Code Fix

With the code fix provider now fully defined we can run the solution against some code that satisfies the analyzer. For instance, if we run the analyzer against the following method definition:

public string SetMessageVariable()
{
    var message = "";
    var now = DateTime.Now;
    if (now.DayOfWeek == DayOfWeek.Saturday || now.DayOfWeek == DayOfWeek.Sunday)
        message = "Weekend!";
    else
        message = "Weekday";
    return message;
}

…the analyzer should detect the match and suggest our code fix as shown:

[Image: Apply Code Fix]

Applying the fix to the method results in the method being rewritten as:

public string SetMessageVariable()
{
    var now = DateTime.Now;
    var message = (now.DayOfWeek == DayOfWeek.Saturday || now.DayOfWeek == DayOfWeek.Sunday) ? "Weekend!" : "Weekday";
    return message;
}

Wrapping Up

The .NET Compiler Services are a great addition to the .NET ecosystem. It has been my goal to demonstrate how, through aspects like the code analysis API and the syntax visualizer, we can create Visual Studio extensions with relative ease. The extensions we create can then guide us toward writing better code, uncovering patterns within our code, or even enforcing internal coding standards.


Tagged: .NET, .NET Compiler Services, Roslyn, Software Development, Visual Studio 2015
