Use Google Sheets Apps Script to track Open Source GitHub and Docker statistics

If you run and maintain an Open Source project you’ll typically want to keep track of things like downloads, stars, and commits over time to help you gauge engagement and the overall health of your project. Here’s a quick hack to keep track of some of this data in a Google Sheet, which simplifies collecting the data and helps you better understand what it means for your project.

GitHub

Let’s start with GitHub repository statistics. To collect a lot of data from GitHub, you’ll need to generate a token. If you’re only going to run a few requests you can skip the token, but using one is recommended to avoid throttling.
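To see how close you are to the limit, you can query the GitHub rate-limit endpoint. Here’s a minimal sketch that reuses the TOKEN variable from the script below (authenticated requests get a much higher hourly quota than anonymous ones):

function checkRateLimit() {
  var options = {
    "headers": { "Authorization": "Token " + TOKEN },
    "muteHttpExceptions": true
  };
  var response = UrlFetchApp.fetch("https://api.github.com/rate_limit", options);
  var limits = JSON.parse(response.getContentText());
  // Logs limit, remaining and reset time for the core API
  Logger.log(limits.resources.core);
}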

Let’s take our repositories for immudb and CAS. We’d like to collect a monthly entry for the two repositories that looks like this:

To start, let’s create a Google Sheet with the structure shown in the screenshot above. We’ll fill the document line by line using Google Apps Script, a JavaScript platform that lets you integrate with and automate tasks across Google products.

Once we have our document, we can click on the “Apps Script” entry in the “Extensions” menu:
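If you’d rather create the header row from code than type it in, a small helper like this would do it. The column titles here are just placeholders that match the values we collect below, one block of five columns per repository:

function writeHeaderRow() {
  var sheet = SpreadsheetApp.getActiveSheet();
  // Adjust the titles to your own repositories
  sheet.appendRow(['Date',
    'immudb stars', 'immudb forks', 'immudb watchers', 'immudb contributors', 'immudb downloads',
    'cas stars', 'cas forks', 'cas watchers', 'cas contributors', 'cas downloads']);
}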

The following script pulls the data from GitHub and populates our sheet. Copy and paste the code into the Apps Script editor and give the project a name. You only need to change the token to your own and fill in your repository names.

var TOKEN = 'your-private-token';

function recordGithubstats() {
  var repos = ['org/repo', 'org/repo'];

  var headers = {
    "Authorization": "Token " + TOKEN,
    "Accept": "application/vnd.github.v3+json"
  };

  var options = {
    "headers": headers,
    "method": "GET",
    "muteHttpExceptions": true
  };

  // Each run appends one row, starting with a timestamp
  var row = [new Date()];

  repos.forEach(function (repo) {
    // Fetch basic repository data, the contributor list, and the releases (with assets)
    var repoResponse = UrlFetchApp.fetch("https://api.github.com/repos/" + repo, options);
    var contrResponse = UrlFetchApp.fetch("https://api.github.com/repos/" + repo + "/contributors?page=1&per_page=1000", options);
    var releasesResponse = UrlFetchApp.fetch("https://api.github.com/repos/" + repo + "/releases", options);

    var repoStats = JSON.parse(repoResponse.getContentText());
    var contrStats = JSON.parse(contrResponse.getContentText());
    var releases = JSON.parse(releasesResponse.getContentText());

    // Sum the download counter of every asset of every release
    var downloadStats = 0;
    for (var i = 0; i < releases.length; i++) {
      var assets = releases[i].assets;
      for (var x = 0; x < assets.length; x++) {
        downloadStats += assets[x].download_count;
      }
    }

    row.push(repoStats['stargazers_count']);
    row.push(repoStats['forks_count']);
    row.push(repoStats['subscribers_count']);
    row.push(contrStats.length);
    row.push(downloadStats);
  });

  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.appendRow(row);
}

You can now run it to check that the cells in the sheet are filled with the correct numbers. (By the way, Google Apps Script has quite decent debugging capabilities for checking the API response content.)
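For instance, before wiring an endpoint into the sheet you can inspect what it actually returns with a one-off test function (a hypothetical helper, not part of the script above) and read the output in the execution log:

function debugReleases() {
  var options = {
    "headers": { "Authorization": "Token " + TOKEN },
    "muteHttpExceptions": true
  };
  var response = UrlFetchApp.fetch("https://api.github.com/repos/org/repo/releases", options);
  Logger.log(response.getResponseCode());              // HTTP status, e.g. 200 or 403
  Logger.log(response.getContentText().slice(0, 500)); // first part of the JSON payload
}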

That’s it, you can now run it manually or via the built-in scheduler. Our results looked like this:

Docker

We can do the same for Docker Hub and collect image download statistics. Let’s add titles to some new columns in the sheet, open the Apps Script editor, and paste the following code (just don’t forget to change the image names).

function recordDockerImagePullCount() {
  var images = ['org/container', 'org/container'];
  var row = [new Date()];

  images.forEach(function (image) {
    var pull_count = get_image_pull_count(image);
    row.push(pull_count);
  });

  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.appendRow(row);
}

// Fetch the total pull count of an image from the Docker Hub API
function get_image_pull_count(image) {
  var response = UrlFetchApp.fetch("https://hub.docker.com/v2/repositories/" + image);
  var imageStats = JSON.parse(response.getContentText());
  return imageStats['pull_count'];
}
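The same Docker Hub response carries other counters too; for example, a star_count field should be available in the same payload, so a small variant could record both values per image (a sketch, assuming that field name):

function get_image_stats(image) {
  var response = UrlFetchApp.fetch("https://hub.docker.com/v2/repositories/" + image);
  var imageStats = JSON.parse(response.getContentText());
  // Return both counters so the caller can push two cells per image
  return [imageStats['pull_count'], imageStats['star_count']];
}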

And here we see our results:

The best thing about using Google Sheets with Apps Script is the built-in scheduler. That way you can run the script daily, weekly, or monthly to track all the required statistics.

To do so, just click on the timer icon:

and configure the scheduler.
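If you prefer to set this up in code instead of through the UI, ScriptApp can create the same kind of time-based trigger. For example, to record the GitHub stats on the first day of every month:

function createMonthlyTrigger() {
  ScriptApp.newTrigger('recordGithubstats')
    .timeBased()
    .onMonthDay(1)
    .atHour(6) // runs between 6am and 7am script time
    .create();
}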

Of course, you can use this kind of script with anything that provides an online API. It’s a simple yet powerful and convenient solution for regular statistics tracking and reporting. Hopefully, having more insight into your community will help you engage with it better and continue to grow your contributor base and project.
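As a sketch of how little changes between services, here is the generic pattern; the URL and field name are placeholders for whatever JSON API you want to track:

function recordApiStat() {
  // Any endpoint that returns JSON works the same way
  var response = UrlFetchApp.fetch("https://api.example.com/stats");
  var stats = JSON.parse(response.getContentText());
  SpreadsheetApp.getActiveSheet().appendRow([new Date(), stats['some_counter']]);
}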


