Generating Reports from Static Code Analysis Output (SARIF)

Results produced by static code analysis tools (such as GitHub CodeQL) are generally stored in SARIF (Static Analysis Results Interchange Format). SARIF files represent static code analysis results as JSON. The popularity of JSON amongst developers has resulted in a range of publicly available SARIF-parsing utilities, capable of transforming code analysis results into other popular report formats, such as HTML or Word.

In this post, we’ll walk through the process of downloading the SARIF files from a historical CodeQL scan performed on a sample GitHub repository. We’ll then use microsoft/sarif-tools to transform them into a more presentable, HTML-based report.

Prerequisites

The sample commands presented throughout the article assume the utilities they use are already installed: the GitHub CLI (gh), jq, and microsoft/sarif-tools (the sarif command).

Sample GitHub Repository

Repository URL: https://github.com/my-git-user/test.git
Branch: main

GitHub Token

A personal access token was generated to authenticate the GitHub REST API calls.

The following fictitious value stands in for the real token in the sample commands used throughout this article:

github_pat_tttttttttttttttttttttttxxxxxxxx

Retrieve Commit Details for CodeQL Analysis

As part of the final CodeQL analysis scan phase, the results (SARIF file(s)) are uploaded to GitHub using the codeql github upload-results command:

codeql github upload-results --sarif=<file> ....[--commit=<commit>]

The --commit value ties the uploaded analysis results (SARIF file) to a specific commit. If the option is omitted, the CLI attempts to detect the commit from the checkout.

For our sample repository, we will aim to obtain SARIFs related to a specific commit.

To identify the commit we’re interested in, navigate to the following in the GitHub repository:

  • Security > Code scanning
  • Click on “Tools”
  • Under Setup types, click on “API upload”
[Screenshot: Security CodeQL commits]
  • Note down the full value of the commit SHA: 68d2a854xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Notice that this particular analysis involved two invocations of CodeQL, one for Java and one for Python
  • The screenshot below is a sample of the analysis results, as viewed from GitHub
[Screenshot: CodeQL analysis results]

Retrieve Code Scanning Analysis ID(s)

For Specific Commit

We start by identifying the code scanning analysis ID(s) by calling the REST API list-code-scanning-analyses-for-a-repository endpoint, and filtering by the commit SHA 68d2a854xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.

  • The commands below set up the call to the endpoint, along with a filter for the specific commit (the token is exported so that the gh process can read it):
$ export GITHUB_TOKEN=github_pat_tttttttttttttttttttttttxxxxxxxx
$ REPOS=(test)
$ OWNER=my-git-user

$ for ((i=0; i<${#REPOS[@]}; i++));
do
  gh api /repos/$OWNER/"${REPOS[i]}"/code-scanning/analyses --paginate | jq -r \
    '["ID","URL","CATEGORY","BRANCH","COMMIT_SHA"],
      (sort_by(.category) | .[] |select(.commit_sha=="68d2a854xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx") |
      [.id,.url,.category,.ref,.commit_sha]) | @tsv' | column -ts $'\t'
done
  • Sample output
ID         URL                                                                             CATEGORY  BRANCH           COMMIT_SHA
234567890  https://api.github.com/repos/my-git-user/test/code-scanning/analyses/234567890  python    refs/heads/main  68d2a854xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
123456789  https://api.github.com/repos/my-git-user/test/code-scanning/analyses/123456789  java      refs/heads/main  68d2a854xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
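The filter above hard-codes the commit SHA inside the jq program. As a minimal sketch, the same selection can be wrapped in a small function that takes the SHA as an argument via jq's --arg option. The function name analyses_for_commit is hypothetical and not part of gh or jq; it only assumes the analyses JSON array arrives on standard input.

```shell
# Hypothetical helper (not part of gh or jq): read the JSON array returned by
# the code-scanning analyses endpoint on stdin and print "id<TAB>category"
# for every analysis tied to the commit SHA given as the first argument.
analyses_for_commit() {
  jq -r --arg sha "$1" \
    '.[] | select(.commit_sha == $sha) | [.id, .category] | @tsv'
}

# Usage (requires an authenticated gh session):
#   gh api "/repos/$OWNER/${REPOS[0]}/code-scanning/analyses" --paginate |
#     analyses_for_commit "68d2a854xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

Passing the SHA with --arg avoids editing the jq program each time a different commit is of interest.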

Each URL listed can be used as input to the REST API code scanning analysis GET request for downloading the respective SARIF file:

  • When using curl, the fully qualified URL should be used:
https://api.github.com/repos/my-git-user/test/code-scanning/analyses/234567890
https://api.github.com/repos/my-git-user/test/code-scanning/analyses/123456789
  • When working with the GitHub CLI command gh api, the “https://api.github.com” prefix can be dropped:
/repos/my-git-user/test/code-scanning/analyses/234567890
/repos/my-git-user/test/code-scanning/analyses/123456789
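As a small illustration of the second form, shell parameter expansion can strip the prefix from a URL taken from the listing output. The helper name api_path is an assumption made for this sketch.

```shell
# Hypothetical helper: convert a fully qualified analysis URL into the
# path form accepted by `gh api`.
api_path() {
  # ${1#prefix} removes the prefix from the start of the value, if present.
  printf '%s\n' "${1#https://api.github.com}"
}

api_path "https://api.github.com/repos/my-git-user/test/code-scanning/analyses/234567890"
# prints: /repos/my-git-user/test/code-scanning/analyses/234567890
```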

Downloading SARIFs

Call the REST API endpoint, using the URLs from the previous section.

Python Analysis

  • Make the call to the GitHub API to retrieve the SARIF associated with the Python analysis, saving the output to the file sarif_python.json:
$ gh api \
  -H "Accept: application/sarif+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/my-git-user/test/code-scanning/analyses/234567890 | jq . > sarif_python.json

Note the change in the Accept header, which asks the endpoint for the SARIF content of the analysis rather than the default JSON summary:

-H "Accept: application/sarif+json"

Java Analysis

  • Repeat the procedure for the Java analysis, saving the output to the file sarif_java.json:
$ gh api \
  -H "Accept: application/sarif+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/my-git-user/test/code-scanning/analyses/123456789 | jq . > sarif_java.json

So, we now have the SARIFs stored as local files, i.e.:

sarif_python.json
sarif_java.json
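The two downloads differ only in the analysis ID and the output file name, so they can be folded into one function. The following is a sketch rather than a definitive implementation: the names sarif_file and download_sarif, and the OWNER and REPO variables, are assumptions introduced here, while the gh api call itself is the one shown above.

```shell
OWNER=my-git-user
REPO=test

# Hypothetical helper: local file name for a given CodeQL category.
sarif_file() {
  printf 'sarif_%s.json\n' "$1"
}

# Hypothetical helper: download one analysis as SARIF and pretty-print it.
# Arguments: <category> <analysis id>
download_sarif() {
  local category="$1"
  local id="$2"
  gh api \
    -H "Accept: application/sarif+json" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "/repos/$OWNER/$REPO/code-scanning/analyses/$id" | jq . > "$(sarif_file "$category")"
}

# Usage (requires an authenticated gh session):
#   download_sarif python 234567890
#   download_sarif java   123456789
```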

At this point, we can use these files as inputs to microsoft/sarif-tools in order to generate our HTML-based output.

HTML Report from SARIF Files

Python (sarif_python.json)

  • Generate an HTML file, sarif_python.html, from the sarif_python.json file by running:
$ sarif html --output ./sarif_python.html --no-autotrim sarif_python.json
  • The following shows the HTML output, sarif_python.html, as viewed through a standard web browser
[Screenshot: HTML CodeQL analysis report for Python]
  • What if we wanted to include the URL used for the API request when downloading the SARIF?
https://api.github.com/repos/my-git-user/test/code-scanning/analyses/234567890
  • We can achieve this by using sed (GNU sed is assumed) to append a line at the correct location within the raw sarif_python.html file, i.e., just after “Sarif Summary: CodeQL”:
$ sed -i "/<h3>Sarif Summary: <b>CodeQL<\/b><\/h3>/a\
<h4>Git URL: <b>https://api.github.com/repos/my-git-user/test/code-scanning/analyses/234567890<\/b><\/h4>\
" ./sarif_python.html
[Screenshot: enhanced HTML CodeQL analysis report for Python]

Java (sarif_java.json)

Repeat the steps in the previous section to generate the HTML report for the Java analysis, using sarif_java.json as input and the corresponding URL:

https://api.github.com/repos/my-git-user/test/code-scanning/analyses/123456789
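Spelled out, the Java steps might look like the sketch below, which wraps the two commands from the Python section in a function. The name build_report and the OWNER and REPO variables are assumptions for illustration. GNU sed is assumed, as is a report that contains the same “Sarif Summary: CodeQL” heading as the Python one.

```shell
OWNER=my-git-user
REPO=test

# Hypothetical helper: generate the HTML report for one analysis and add
# the API URL below the summary heading.
# Arguments: <category> <analysis id>
# Expects sarif_<category>.json in the current directory.
build_report() {
  local category="$1"
  local id="$2"
  local url="https://api.github.com/repos/$OWNER/$REPO/code-scanning/analyses/$id"
  local html="./sarif_${category}.html"

  # Same invocation as in the Python section, with the file names swapped.
  sarif html --output "$html" --no-autotrim "sarif_${category}.json"

  # Append the URL heading after the summary heading. Using | as the address
  # delimiter avoids escaping the slashes in the HTML tags.
  sed -i "\|<h3>Sarif Summary: <b>CodeQL</b></h3>|a <h4>Git URL: <b>${url}</b></h4>" "$html"
}

# Usage:
#   build_report java 123456789
```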

Report Automation

Wouldn’t it be nice if we could automate the whole process, and add support for publishing the reports via GitHub Pages?