Wednesday, September 26, 2018

Dreamforce 2018 Presentation - Understand your Org shape via visualization of Metadata Component Dependencies

At Dreamforce this year I gave a theater presentation on using the Metadata Component Dependency Pilot API along with Gephi to explore ways of untangling the "Happy Soup" that makes up a Salesforce org.

In this post I'll cover the contents of that talk and expand on some areas that I'd like to cover in more depth.

Session recording

Why Visualize an Org's Metadata?

All four sets are identical when examined using simple
summary statistics, but vary considerably when graphed - Source

If you've ever seen Anscombe's quartet you'll know there is a compelling reason to visualize information in addition to straight numerical analysis. Visualizations can reveal patterns and features in the data that otherwise might not be readily apparent.

Additionally, when we start drilling into all the metadata in an org and the relationships within that metadata we can very quickly be overwhelmed by the sheer volume of it. There can be thousands of metadata items and many more relationships between them.

The Soup and the Polar Bear

The concept of the metadata in your production org existing as a "Happy Soup" was introduced at TrailheadDX 2017 by Wade Wegner. It describes the scenario where all the conceptual apps that make up your production org intermingle without any strongly defined boundaries between them. Essentially, the org becomes one big melting pot of metadata. This has been the status quo for some time now (if you ignore managed packages and the relatively new addition of Packaging 2.0).

Soup chef at work on your production org - a Marvelous Metadata Mixing Pot

Now, take one of the production orgs that I work in day-to-day as an example. It was forged 11 years ago when the seasonal release logo was a polar bear and the API version was v8.0. Ever since then it's been accumulating metadata changes as multiple people have come and gone. Each person leaving a distinct mark on the metadata that makes up that org.

Back to our soup metaphor, that org is a decade old soup. Different soup chefs have been mixing ingredients in via a pinch of declarative UI changes, a dash of change sets, a smidgen of metadata API deploys, and maybe the odd unmanaged package thrown in for good measure. Over the years an org can seem less like a carefully conceived soup recipe and more like a marvelous concoction with everything thrown in and reacting together in weird and wonderful ways.

Which kind of leaves you wondering: if you were presented with just the resulting happy soup, how would you identify the ingredients that went into it and how they need to interact so you could replicate it again? How can we unmake the happy soup?

The goal of our metadata visualizations will be to use the mind's ability to quickly perceive and derive meaning from details such as size, color, position, and proximity to find insights that might not be apparent in tabular data. That might sound complicated, but the objective is to make otherwise complicated relationships readily apparent.

“By visualizing information, we turn it into a landscape that you can explore with your eyes. A sort of information map. And when you’re lost in information, an information map is kind of useful.” - David McCandless

So, we know we want to make visualizations of the metadata and the dependencies within, but where do we start gathering that data?

The Metadata Component Dependency API

Luckily for us the new Metadata Dependencies pilot introduced in Summer '18 at TrailheadDX allows for the use of Tooling API SOQL queries to quickly find dependencies between metadata components. You can sign up for the pilot via your AE as per the release notes: Untangle Your Dependencies with MetadataComponentDependency Queries (Pilot)

The benefit of this pilot is that it is much faster and easier than trying to assemble the same information yourself. I've tried extracting similar details via the ApexClassMember.SymbolTable. While possible, it is a path that is fraught with multiple API calls per ApexClass. And at the end of all that you will still only have a fraction of the possible relationships covered.

Once you are in the pilot it is pretty simple to use. A single SOQL query can bring back the majority of relationships for an org. For example, using the Tooling API via the sfdx cli:

sfdx force:data:soql:query --usetoolingapi --query "SELECT MetadataComponentId, MetadataComponentName, MetadataComponentType, RefMetadataComponentId, RefMetadataComponentName, RefMetadataComponentType FROM MetadataComponentDependency Where RefMetadataComponentType = 'CustomField'" -u devorg

Supported Dependencies

As this is a pilot feature, the set of dependency types it can reveal isn't complete yet. Below is a small subset of the dependency types in question (as at the time of posting):

Type           Used In
CustomObject   Apex Class
CustomObject   Process Builder
CustomObject   Lookup Relationship Field
CustomObject   Reports
CustomField    Layout
CustomField    Reports
Flow           Process Builder
Flow           Flow

You can see the complete list of supported metadata relationships at https://github.com/afawcett/dependencies-sample

Augmenting the Graph

Using the results from the Metadata Component Dependency query as a starting point we can then augment the resulting graph with additional metrics. Here are some of the additional details that can be merged in as meta-metadata.

  • CustomField.Metadata - can be used to show lookup relationships
  • ApexClass.ApiVersion - useful for finding Apex classes with mismatched API versions
  • The code coverage results for each Apex class
  • The number of lines of code in each Apex Class
  • Where available, the created date for the Metadata
  • The number of records in a CustomObject
  • The number of Triggers per sObject
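Merging these extra metrics into the graph is straightforward once you have them keyed by component Id. A minimal Python sketch (the function name and the data shapes are my own assumptions for illustration, not part of any API):

```python
# Sketch: attach augmented metrics (meta-metadata) to existing graph nodes.
# Nodes and metrics are both keyed by the 15/18 character component Id.

def augment_nodes(nodes, metrics):
    """Merge extra attributes into matching nodes.

    nodes   -- dict of component Id -> node attribute dict
    metrics -- dict of component Id -> dict of extra attributes,
               e.g. {"ApiVersion": 43.0, "PercentCovered": 81}
    Metrics for Ids not present in the graph are ignored.
    """
    for component_id, extra in metrics.items():
        if component_id in nodes:
            nodes[component_id].update(extra)
    return nodes
```

Each merged attribute then becomes a column in Gephi's Data Table, ready for ranking and partitioning.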

Tooling to create the Dependency Graph

The first part of the visualization process will be to query the MetadataComponentDependency records and convert the results into a graph. I didn't go into this step in any detail in my talk as it isn't very interesting to watch. At its simplest, you can use one of the following options.
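The conversion itself can be sketched in a few lines of Python. This is a deliberately minimal, hypothetical sketch of turning the query rows into GEXF: a real export (like the ones the tools below produce) would also declare node attributes for the augmented metrics.

```python
import xml.etree.ElementTree as ET

def rows_to_gexf(rows):
    """Convert MetadataComponentDependency query rows into a minimal
    GEXF document that Gephi can open. Each row is expected to carry
    the six fields from the SOQL query shown earlier."""
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", defaultedgetype="directed")
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")
    seen = set()
    for i, row in enumerate(rows):
        # Both ends of the dependency become nodes (deduplicated by Id).
        for id_key, name_key in (("MetadataComponentId", "MetadataComponentName"),
                                 ("RefMetadataComponentId", "RefMetadataComponentName")):
            if row[id_key] not in seen:
                seen.add(row[id_key])
                ET.SubElement(nodes, "node", id=row[id_key], label=row[name_key])
        # The dependency itself becomes a directed edge.
        ET.SubElement(edges, "edge", id=str(i),
                      source=row["MetadataComponentId"],
                      target=row["RefMetadataComponentId"])
    return ET.tostring(gexf, encoding="unicode")
```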

CLI Generation of the Gephi graph

NOTE: At the time of publishing the Github repo wasn't fully up to date with the latest public build. I'll aim to have this done shortly after Dreamforce.

Run the following command:

sfdx fitdx:dependencies:report -u DeveloperOrg > DeveloperOrg.gexf

And that's pretty much it. The command from the FitCLI plugin will run the query, build the graph, and export it ready for opening in Gephi. You only need to point it at the correct Salesforce Org and redirect the output into a file.

If you want to add the augmented graph metrics you can supply the -a option. Note that this will take significantly longer to complete as it involves a number of additional tooling API queries. It's also beneficial to run all the Apex Tests before exporting the graph so it has complete coverage details.

GUI option for creating the Gephi file

As an alternative to using the command line you can also use the FuseIT SFDC Explorer from v3.10 onwards. This now includes a Metadata Dependencies tab to generate the Gephi file. It's pretty much exactly the same as the command line version and will prompt for where you want to save the resulting file.

Ideally with either the CLI or the GUI options the target org will have access to MetadataComponentDependency records. If that isn't the case you can still use the augmented option to get some graph details. It won't be as fast or as complete, but you will get a good deal of the relationships based on other data exposed via the Tooling API.

Examples

Now we can get into the interesting part of loading the graphs into Gephi to see what we can discover. You can open the .gexf files directly in Gephi and then start manipulating them.

This video quickly goes through the Field Usage and Packages examples that are expanded on below.

Field Usage

Let's say we want to find what will be impacted if changes are made to the Property__c.Address__c field in the Dreamhouse app. The steps would be as follows:

  1. Partition the Node color by Type. This will make it more apparent what type of nodes you are looking at.
  2. Size the nodes based on the Betweenness Centrality. Which is a fancy way of saying "make the nodes that appear on the shortest paths between other nodes bigger". This is one possible indication of how important a node is to all the other nodes in the graph. Before we can do this we need to run the Network Diameter statistics.
  3. Apply a layout to arrange the nodes. I've used the "Force Atlas" layout with the Repulsion strength set to 1500 and Adjust by Sizes checked.
  4. Show the Node Labels and scale to fit
  5. Switch to the Data Table window and filter to the Address field.
  6. Zoom out a bit, and you will be able to see the Metadata that references the Property__c.Address__c field.
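The filtering in steps 5 and 6 can also be reproduced directly against the raw query results. A hypothetical Python sketch, assuming rows shaped like the SOQL output shown earlier:

```python
def components_referencing(rows, ref_name):
    """Return (type, name) pairs for every component that depends on
    the named component -- the code equivalent of filtering Gephi's
    Data Table to the Address field."""
    return sorted({(row["MetadataComponentType"], row["MetadataComponentName"])
                   for row in rows
                   if row["RefMetadataComponentName"] == ref_name})
```

For example, `components_referencing(rows, "Address__c")` would list every layout, class, or report that touches the field.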

Potential Packages

  1. Run the "Modularity" Statistics. This algorithm will look for metadata communities based on the dependencies between them. You can adjust the Resolution up or down if you want more or fewer communities.
  2. Change the appearance of the Nodes to be partitioned by the resulting "Modularity Class". With the correct resolution and layout it will give you a reasonable guide to where potential packages are located. You will likely need to make some finer adjustments along the boundaries.
  3. One way to export the nodes for the package is to select them and then copy to a new workspace. The new data table can then be exported as a csv.
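To make the community idea concrete: Gephi's Modularity statistic does proper community detection, but even a much cruder stand-in, undirected connected components, shows how dependencies split an org's metadata into candidate packages. A stdlib-only Python sketch (not what Gephi actually runs):

```python
def connected_groups(edges):
    """Group components that share any dependency path into undirected
    connected components, using a small union-find structure.
    A crude stand-in for modularity-based community detection."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())
```

Anything in one group depends, directly or transitively, only on other members of that group, which is exactly the property you want from a package boundary.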

API Versions

  1. Filter to just those bits of metadata that have an API version defined. One way to do this is to drag a "Non-null (ApiVersion)" filter down onto the Queries.
  2. Under Appearance > Nodes, set the node size to rank by Degree. The degree is a measure of how many edges go in or out of the node. So more connected nodes will be larger.
  3. Set the Nodes color to be ranked by API version. Choose a color palette that emphasizes lower API versions as being problematic.
  4. Set the Node Label to "API Version" and size as required.
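Degree, as used in step 2, is simple to compute yourself from the edge list. A small Python sketch:

```python
from collections import Counter

def node_degrees(edges):
    """Count edges in or out of each node -- the 'Degree' measure
    Gephi ranks node sizes by in step 2."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree
```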

Code Coverage

  1. Size the nodes by "LengthWithoutComments" to give an indication of how much code is in each Apex class.
  2. Color the nodes by ranking of the "PercentCovered". Use the slider to roughly mark the 75% coverage threshold.
  3. It can be useful to filter to just the test classes and assign them a different node color. They won't have any coverage themselves.
  4. Set the node label to the "PercentCovered"
  5. A circular layout sorted by the coverage percent can be useful to separate out the problem nodes. Move the test classes into a separate ring so you can see the relationships between the tests and the classes they cover.

Classes with excessive lines of code

As an experiment I drastically increased the amount of node scaling that was possible. This can really emphasize the differences in the size of Apex classes.