Skip to main content

Command Palette

Search for a command to run...

How to efficiently handle irregular JSON data:

Published
7 min read
How to efficiently handle irregular JSON data:

Using Javascript logical operators && and ||

JSONJSON

This post is about how I had been processing JSON data from Web API calls, why the approach I chose was “poor taste” code, and what I did to fix it.

Prerequisites

  • You are familiar with basic Javascript (i.e. you know loops, if-thens etc.).

    For example, you are good to go if you have done (or can do) the Basic JavaScript challenges in Quincy Larson’s Free Code Camp (link)

  • You have cursory knowledge of web APIs. (i.e. you know what they are used for). You know a bit about relational database and SQL.

    This does not mean you have actually used them in your code (although that would be nice).

  • You are NOT an experienced programmer. You will be bored :-)

Preamble (a.k.a. why this post?)

The inspiration for this post comes from Brian Barto’s thought provoking and popular (2.1 k recommends and counting) article on “good taste” coding. Brian used an example given by Linus Torvalds (you know that guy who single handedly wrote the kernel of one of the world’s most robust, efficient and elegant operating system and gave it away for free) in an interview to drive home some software development best practices. While I encourage everyone to read the article, my post here is about what I ended up doing after reading his article.

I am a self-taught newbie hobbyist programmer. And my partner (in both code and in life) who is trained in “legacy” CS, says I am bit of a “code brat.” I code first, think (of algorithm and stuff) later. I code for my pleasure (i.e. not for money, for now). Once I get an app working and tested, I deploy it and never look back (well I do listen to user complains and fix them, but that is about it). Brian’s article nudged me to take a critical look at the code I have written over the last two years (as I learned Google Apps Script and built Google Doc/sheet Add-ons and web apps using Google App script).

It is humbling to find examples of “poor taste” programming in one’s own code. But it is fun to fix them and talking about them is almost therapeutic. In this post I will describe one such fix (the fancy word for it is refactoring).

On processing the (data) web using APIs and JSON

Web APIs give you access to databases with rich information that your program can use. Here’s an introduction with detailed example using the New York Times Web API. Most modern Web APIs return data in JSON format. JSON is a subset of Javascript and parsing a string to a JSON (i.e. a Javascript) object is trivial.

var data = JSON.parse(string_result_of_api_call)

Boom! The Javascript object datanow contains all the information that was sent back in response to the API call.

Things get a bit messy from here.

The “mess” explained

The use of Web API to get data from a web service is conceptually very similar to making a query into a database. For example, a SQL query select * from cars where price &lt;10K(get me all the cars priced <10K), will return a table with rows that match the query criteria.

In JSON the results would look a bit different.

[{make:'Toyota', model:'Camry', year:1999, price:7K, doors:'2DR'},{make:'Nissan', model:'Sentra', year:1988, price:3K, doors:'2DR'},{make:'Jeep', model:'Cherokee', year:1988, price:7K, doors:'4DR'}{make:'Jeep', model:'Wrangler', year:1984, price:9K, doors:'2DR'}]

  1. The result is a valid Javascript object (an array in this example)

  2. Each row of the result is represented as an element of an array and the fields name are given for every value.

Although the look is different, the underlying structure is still not that different.

Now let’s bring the Web’s messiness. What if some of the fields are absent from some of the rows. This usually does not happen in professionally built and curated in-house databases but irregularities like this is the norm when it comes to the Web and results from the Web API calls are no different.

If the results were returned in an SQL style rows they would look like the following (where we are missing information about the Nissan’s year and its drive train type the model for one of the two Jeep).

Processing this data is fairly straight forward. We loop through the data, for each row, we check if a value is null (or blank) and do what we need to do with the data that is available. Our code does not break, just because there is a null or an empty value.

In JSON the data would be something like

[{make:'Toyota', model:'Camry', year:1999, price:7K, doors:'2DR'}, {make:'Nissan', model:'Sentra', price:3K},{make:'Jeep', year:1988, price:7K, doors:'4DR'}, {make:'Jeep', model:'Wrangler', year:1984, price:9K, doors:'2DR'}]

For Nissan, as expected there is no key for year and doors. For one of the Jeeps the model is missing.

My n00b inclination was that the processing of this data is still quite simple. I will need to add an additional check to see if a field is present before accessing its value. I will access the key and see if accessing the key returns undefined. If it does not, then I capture the value. In our Nissan example, trying to access the key year will return undefined. So my pattern was

if (key != undefined) {do_some_thing(object[key]}

It worked well until I encountered complex (i.e. real life) JSONs.

JSON is a nested tree like structure, rather than a table structure. While the keys of a JSON object has to be a string or a number, the values can be objects. The values of the keys of these child objects can themselves again be objects. Many real world JSONs can be 8 or 9 level deep. Using my pattern this would mean that I would have a 8 or 9 level nested if-thens.

Try debugging that.

“Poor taste” a detailed example

We continue with the car domain. Let’s say we are tasked to print the values of certain fields from the inventory. We will display the value of the fields if they are available. If the fields are not there or if the value of a field is not present we will not display anything (i.e. write a blank).

//Note that this is an INEFFICIENT way to solve the problem
function noob_way() {

  // get test data from a test data utility
  var cars=get_data_set("basic");  


  for (var i in cars){
    if(cars[i]["model"]!==undefined) { 
      Logger.log(cars[i]["model"]); // Logger.log() is Google App Script's console.log  
    } 

    if(cars[i]["trim"]!==undefined) {
      Logger.log(cars[i]["trim"]); // Logger.log() is Google App Script's console.log  
    }

    if(cars[i]["year"]!==undefined) {
      Logger.log(cars[i]["year"]); // Logger.log() is Google App Script's console.log  
    }

    if(cars[i]["wheel"]!==undefined) {
      Logger.log(cars[i]["wheel"]);
      if(cars[i]["wheel"]["wheel_size"]!==undefined){
        Logger.log(cars[i]["wheel"]["wheel_size"]); // Logger.log() is Google App Script's console.log  
      }
      if(cars[i]["wheel"]["wheel_metal"]!==undefined){
        Logger.log(cars[i]["wheel"]["wheel_metal"]); // Logger.log() is Google App Script's console.log  
      }
    }
  }
}

Checking for model, trim and year is easy because the keys are at the first level. But wheel_size and wheel_metal are nested fields. We need to first check if the parent field (in this case wheel) exists before we access them. You can see how the complexity increases in line 21-29. Again imagine processing fields that are 8 or 9 level deep.

Before I describe my refactoring let’s look at an interesting feature that is available in many programming languages.

Short-circuiting — booleans

As logical expressions are evaluated left to right, they are tested for possible “short-circuit” opportunities using the following rules:

1. false && (*anything)* is short-circuit evaluated to false

2. true || (*anything)* is short-circuit evaluated to true

Here’s a useful partial truth table

A partial [truth table](https://cdn.hashnode.com/res/hashnode/image/upload/v1630771977049/4uiFW7vvT.html)A partial truth table

Taking advantage of rule # 1 if you write cars["wheel"] && cars["wheel"]["wheel_size"] AND if cars["wheel"] does not exist (i.e. it is false in Javascript) the rest never gets evaluated.

Basically the short-circuiting can convert a nested if else error checking for undefined fields to a simple single line of code. The depth of a field will not increase the complexity it will just add more clauses with &&s.

Finally, here’s the cleaned up code

//Smarter way to access the data
function noob_smart_way(){

  var cars=get_data_set("basic");  
  var format_string = "%s %s %s %s %s";

  for (var i in cars){
    Logger.log(format_string, cars[i]["model"] || "",cars[i]["trim"] || "",cars[i]["year"] ||  "",
                cars[i]["wheel"] && cars[i]["wheel"]["wheel_size"]    ||  "",
               cars[i]["wheel"] && cars[i]["wheel"]["wheel_metal"] || "") 
    // Logger.log() is Google App Script's console.log  
  }   
}

TL;DR: Use short circuit of && to check for keys in a deep JSON instead of a nested if-else. This will significantly reduce your code complexity. Thread together all the keys from the top level until you reach the key whose value you want with bunch &&s.

k_0:{k_1: {k_2:{k3: … kn}} should become k_0 && k_0[k_1] && k_0[k_1][k_2]&& … &&k_0[k_1][k_2]….[kn]

In the last two years I have spent 2300 hours in coding and have written 23,342 lines of code. In that code I have made 349 real world JSON calls followed by nested if-then-else. Say it took me on the average 1 hour to code/debug each of them. That is still 350 hours or 15% of my coding time.

This pattern literally takes a minute or less to write and it is mostly bug proof. That is 350 minutes or about 6 hours.Go figure!

More from this blog

Code Blogs

10 posts