Azure Databricks CLI "Error: JSONDecodeError: Expecting property name enclosed in double quotes:..."
Have you been trying to create a Databricks cluster using the CLI? Have you been getting infuriated by something seemingly so trivial? Well, join the club. Although, get ready to depart the club because I may have the solution you need.
When creating a cluster using the CLI command databricks clusters create, you're required to pass in either a JSON string or a path to a JSON file. I recently opted for the first option. [Note: I'm using PowerShell to talk to the Databricks CLI.]
How it transpired
Here's what (I feel) should have worked.
But running that, we receive the output:
Error: JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 5 (char 7)
Huh? You asked for a JSON string, I gave you a JSON string. Why are you complaining? Maybe the Databricks CLI wants me to wrap more quotes around my --json argument. Not sure why, but let's try that.
databricks clusters create --json "$configToJson"
Nope. Same error. More quotes?
databricks clusters create --json "'$configToJson'"
Of course, there's no chance that's going to work and it gives us the error:
Error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Hmm. What could be happening?
A little bit of Googling later, I found someone who had the same problem as me (they were using the Windows Command Prompt). They alluded to the need to "escape" the double quotes within the JSON string with a backslash. Sounds odd (a backslash is the escape character in neither PowerShell nor the Windows Command Prompt), but let's give it a whirl:
And hey presto, it works:
{ "cluster_id": "0704-090525-blocs355" }
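For anyone who wants to see exactly what the escaped argument looks like, here is a minimal sketch in Python (the language the CLI itself is written in). The escape_double_quotes helper and the cluster settings are made up for illustration; they are not taken from the Databricks CLI or from my actual config.

```python
import json


def escape_double_quotes(json_text: str) -> str:
    """Prefix every double quote in the text with a backslash."""
    return json_text.replace('"', '\\"')


# Hypothetical cluster settings, for illustration only.
config = {"cluster_name": "demo-cluster", "num_workers": 1}

config_json = json.dumps(config)
escaped = escape_double_quotes(config_json)

print(config_json)  # {"cluster_name": "demo-cluster", "num_workers": 1}
print(escaped)      # {\"cluster_name\": \"demo-cluster\", \"num_workers\": 1}
```

The second printed line has the shape of the string that ended up working as the --json argument: the same JSON, with a backslash in front of each double quote.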
But... why?
At this point, it would be very easy to shrug this off now that it's working and not bother to understand why. I mentioned that a backslash is the escape character in neither PowerShell nor the Windows Command Prompt, but there are numerous common languages in which it is. After locating the Databricks CLI GitHub repo, I saw that it was written in Python, a language which uses the backslash as an escape character.
Under the covers, the Databricks CLI is using the json.loads() function to parse our --json argument, and the error we're getting is a JSONDecodeError coming from Python's json module. The loads() function takes a string in the form '{ "name":"John", "age":30, "city":"New York"}' and converts it to a Python dictionary. This should work with the argument we used in the first attempt earlier, but it doesn't. My hypothesis (which I'm unsure how to prove) is that the JSON string argument is implicitly being wrapped in double quotes in the process of being passed to the Python function, so the function receives this:
which isn't a valid representation of a string in Python (because the second double quote closes the first one), and so cannot be parsed into a dictionary. Add in the interior double-quote escaping and all's good: we now have a string that Python can understand.
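One way to poke at the hypothesis is to feed json.loads() a few candidate strings directly and compare the error messages. The sketch below is only that: the three inputs are guesses at what might reach the parser (intact JSON, JSON whose double quotes were lost somewhere between the shell and Python, and JSON wrapped in literal single quotes), not values captured from the Databricks CLI. The describe helper, the config, and the Windows-style line endings with four-space indentation are all assumptions made for illustration.

```python
import json


def describe(text: str) -> str:
    """Return the parsed value, or the decoder's error message."""
    try:
        return f"ok: {json.loads(text)!r}"
    except json.JSONDecodeError as err:
        return f"JSONDecodeError: {err}"


# Hypothetical config, formatted with CRLF line endings and 4-space indent.
intact = '{\r\n    "num_workers": 1\r\n}'

# Assumption 1: every double quote was dropped before Python saw the string.
quotes_stripped = intact.replace('"', '')

# Assumption 2: literal single quotes were wrapped around the JSON.
single_quoted = "'" + intact + "'"

results = {
    "intact": describe(intact),
    "quotes stripped": describe(quotes_stripped),
    "single quoted": describe(single_quoted),
}

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

With these inputs, the quote-less string fails with "Expecting property name enclosed in double quotes: line 2 column 5 (char 7)" and the single-quoted one with "Expecting value: line 1 column 1 (char 0)", the same two messages seen earlier. That is suggestive rather than conclusive, but it hints that the double quotes may be getting lost before Python ever sees the argument, which would also explain why escaping them helps.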
A sensible alternative
I mentioned earlier that you can also pass in a path to a JSON file using the --json-file parameter. This works exactly how you'd expect (no funny business with escaping quotes since you're passing it as a .json file as opposed to a JSON string). In our case, however, having to define the JSON object elsewhere in our code-base would have been sub-optimal, but YMMV.
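If the only objection to the file route is keeping a separate .json file in the repo, one option is to generate the file at run time. Below is a rough sketch in Python, assuming a throwaway temporary directory is acceptable. The write_config helper, the file name, and the cluster settings are hypothetical, and the command is only assembled here, not executed.

```python
import json
import tempfile
from pathlib import Path


def write_config(config: dict, directory: str) -> Path:
    """Write the config as JSON into the directory and return the file path."""
    path = Path(directory) / "cluster.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Hypothetical cluster settings, for illustration only.
config = {"cluster_name": "demo-cluster", "num_workers": 1}

with tempfile.TemporaryDirectory() as tmp:
    config_path = write_config(config, tmp)

    # Argument list for the CLI call; pass it to subprocess.run to execute it.
    command = ["databricks", "clusters", "create", "--json-file", str(config_path)]

    # Sanity check: the file on disk parses back to the original config.
    round_tripped = json.loads(config_path.read_text(encoding="utf-8"))

print(command[:4])
print(round_tripped == config)
```

Because the JSON travels via a file rather than the command line, no quote escaping is involved at any point.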
I hope this blog has helped some of you!