Advanced JSON manipulation with Python

Python provides a really simple API for JSON manipulation. The API offers two main functions: converting a Python structure to a JSON string, and converting a JSON string back to a Python structure.

Here is some code that does a sample Python to JSON conversion and back:

import json

# let's start with a Python structure and convert it to a JSON string
python_object = {'some_text': 'my text',
                 'some_number': 12345,
                 'null_value': None,
                 'some_list': [1, 2, 3]}
# here we convert the Python object to a JSON string
json_string = json.dumps(python_object)
# json_string = '{"some_list": [1, 2, 3], "some_text": "my text",
#                 "some_number": 12345, "null_value": null}'
# the API converts a Python dictionary to a JSON object and vice versa
# at this point we have a JSON string; now let's convert it back to a Python structure
new_python_object = json.loads(json_string)
# new_python_object = {u'some_list': [1, 2, 3], u'some_text': u'my text',
#                      u'some_number': 12345, u'null_value': None}
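
The round trip is not always lossless. As a small sketch (written for Python 3, where the u prefixes shown above no longer appear; the sample dictionary is made up for illustration), here are two cases where loads does not give back exactly what went into dumps: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

# Hypothetical sample data: a tuple value and an integer key.
original = {'point': (1, 2), 10: 'ten'}
restored = json.loads(json.dumps(original))

# The tuple became a list and the integer key became a string.
print(restored)
```

If you need the original types back, you have to restore them yourself, for example with the hooks described in the next section.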

Converting JSON strings to Python objects

We use the json.loads function to convert JSON strings to Python objects. The signature of json.loads (as of Python 2.7, which the examples here use) is:

json.loads(s, encoding=None, cls=None, 
        object_hook=None, parse_float=None, 
        parse_int=None, parse_constant=None, 
        object_pairs_hook=None, **kw)

Let's experiment with it:

In [33]: json_string='{"name":"Product-1","quantity":1,"price":12.50}'

In [34]: json.loads(json_string)
Out[34]: {u'name': u'Product-1', u'price': 12.5, u'quantity': 1}

This is a standard operation: we are converting a JSON object that has a name, a quantity and a price. Notice that the default behavior parses the price as a float. You probably want it as a Decimal object instead, which is really easy to fix:

In [35]: from decimal import Decimal

In [36]: json.loads(json_string, parse_float=Decimal)
Out[36]: {u'name': u'Product-1', u'price': Decimal('12.50'), u'quantity': 1}

This time we call the loads function with the parse_float argument, which uses the given callable to parse float strings, so we get our price as a Decimal value. You can also change how integers are parsed with the parse_int argument.
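
As a small sketch of the two arguments together (written for Python 3, so there are no u prefixes), the following parses both integers and floats as Decimal. Each callable receives the number exactly as it was written in the JSON text.

```python
import json
from decimal import Decimal

json_string = '{"name":"Product-1","quantity":1,"price":12.50}'

# parse_int and parse_float each receive the number's original text
# (here '1' and '12.50') and return the object that should replace it.
product = json.loads(json_string, parse_int=Decimal, parse_float=Decimal)

print(product['quantity'], product['price'])
```

Because Decimal is built from the original string, the trailing zero of 12.50 is preserved instead of being lost in a float conversion.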

You can change how ints and floats are parsed, but you can also change how JSON objects are converted. By default JSON objects are converted into Python dicts. Let's convert them into named tuples instead.

In [69]: from collections import namedtuple
In [70]: Product = namedtuple('Product', ['name', 'quantity', 'price'])
In [71]: json.loads('{"name":"Product-1","quantity":1,"price":12.50}',object_hook=lambda x: Product(**x),parse_float=Decimal)
Out[71]: Product(name=u'Product-1', quantity=1, price=Decimal('12.50'))

The lambda expression we used as the object hook can be replaced with the following function, which does the same thing:

def object_hook_handler(parsed_dict):
    """ The parsed_dict argument receives the
        JSON object as a dictionary.
        For the example above it would be
        the following dictionary:
        {u'name': u'Product-1', u'price': Decimal('12.50'), u'quantity': 1}
    """
    return Product(name=parsed_dict['name'],
                   quantity=parsed_dict['quantity'],
                   price=parsed_dict['price'])
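
One thing to keep in mind is that object_hook is called for every JSON object in the input, nested ones first. Here is a minimal sketch (Python 3; the order layout and the product_hook name are made up for illustration) that converts only the dictionaries that look like a product and leaves everything else alone:

```python
import json
from collections import namedtuple
from decimal import Decimal

Product = namedtuple('Product', ['name', 'quantity', 'price'])

def product_hook(parsed_dict):
    # Called once per JSON object, innermost objects first.
    if set(parsed_dict) == set(Product._fields):
        return Product(**parsed_dict)
    return parsed_dict  # not a product, keep the plain dict

# Hypothetical input: a product nested inside an order object.
order_json = '{"id": 7, "product": {"name": "Product-1", "quantity": 1, "price": 12.50}}'
order = json.loads(order_json, object_hook=product_hook, parse_float=Decimal)
print(order)
```

Without the key check, the hook would also try to build a Product from the outer order object and fail with a TypeError.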

The object_hook parameter receives each JSON object as an already parsed dictionary, which means the order of the keys is already lost by the time you get them. If order is important for your use case, you can use the object_pairs_hook argument instead, which receives a list of (key, value) tuples. For example, you could get an OrderedDict instead of a regular dict.

In [83]: from collections import OrderedDict

In [84]: json.loads('{"name":"Product-1","quantity":1,"price":12.50}',object_pairs_hook=OrderedDict,parse_float=Decimal)
Out[84]: OrderedDict([(u'name', u'Product-1'), (u'quantity', 1), (u'price', Decimal('12.50'))])

If we wrote an object_pairs_hook_handler function, it would get the following list as its argument:

[(u'name', u'Product-1'), (u'quantity', 1), (u'price', Decimal('12.50'))]
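
Since the hook sees the raw list of pairs, it can also notice things a dictionary would hide. As a sketch (Python 3; reject_duplicate_keys is a hypothetical helper, not part of the json module), this handler refuses JSON objects that repeat a key, which the default decoder would silently accept by keeping the last value:

```python
import json

def reject_duplicate_keys(pairs):
    # pairs is a list of (key, value) tuples in document order.
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError('duplicate key: %r' % (key,))
        result[key] = value
    return result

print(json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicate_keys))

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicate_keys)
except ValueError as error:
    print(error)
```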

If you need further customization, you can implement a custom decoder specific to your JSON format. It is really easy. A JSON decoder is just a json.JSONDecoder subclass that implements a decode method, which gets a JSON string and returns the Python object that you need. You pass the class to json.loads through the cls argument. Here is a JSON decoder template as a starting point:

class TemplateJSONDecoder(json.JSONDecoder):
    def decode(self,json_string):
        """
        json_string is basically the string that you pass to json.loads
        """
        default_obj = super(TemplateJSONDecoder,self).decode(json_string)

        # manipulate your object any way you want
        # ....

        return default_obj

A JSON decoder is really useful for code encapsulation. You can write a decoder for every JSON format you have that removes unnecessary data, converts Decimal and datetime values, and returns different objects according to their location in the tree. For example, you could pass Product and User objects in the same JSON string and convert them into model objects in the parser.
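
The Product and User idea could be sketched as follows. This is only one possible approach, written for Python 3, and it assumes a made-up convention where every JSON object carries a "type" key that names its model. The MODELS table, the ModelJSONDecoder class and the to_model method are hypothetical names for this example. Instead of overriding decode, the decoder installs its own object_hook in the constructor.

```python
import json
from collections import namedtuple
from decimal import Decimal

Product = namedtuple('Product', ['name', 'price'])
User = namedtuple('User', ['name', 'last_name'])

# Assumed convention: the "type" key of each object names its model.
MODELS = {'product': Product, 'user': User}

class ModelJSONDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        kwargs.setdefault('parse_float', Decimal)
        kwargs['object_hook'] = self.to_model
        super().__init__(**kwargs)

    def to_model(self, parsed_dict):
        # Objects without a known "type" stay plain dictionaries.
        model = MODELS.get(parsed_dict.get('type'))
        if model is None:
            return parsed_dict
        fields = {k: v for k, v in parsed_dict.items() if k != 'type'}
        return model(**fields)

data = json.loads(
    '{"buyer": {"type": "user", "name": "Huseyin", "last_name": "Yilmaz"},'
    ' "item": {"type": "product", "name": "Product-1", "price": 12.50}}',
    cls=ModelJSONDecoder)
print(data['buyer'], data['item'])
```

The calling code only has to pass cls=ModelJSONDecoder; all knowledge about the format stays inside the decoder class.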

Converting Python objects to JSON strings

Use the json.dumps function to convert Python objects to JSON strings. By default, this function can only handle dicts, lists, tuples, strings (str and unicode), int and float numbers, booleans and None. If you try to convert any other type, like datetime or Decimal, you will get a TypeError. Here is the signature of json.dumps:

json.dumps(obj, skipkeys=False, ensure_ascii=True, 
           check_circular=True, allow_nan=True, cls=None, 
           indent=None, separators=None, encoding='utf-8', 
           default=None, **kw)

If you want a pretty-printed version of the string, set the indent argument to the number of spaces to indent by (2 in this example).

In [129]: json.dumps({'name':'Huseyin','last_name':'Yilmaz'},indent=2)
Out[129]: '{\n  "last_name": "Yilmaz", \n  "name": "Huseyin"\n}'
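
As a short sketch of the related formatting options (Python 3), sort_keys makes the key order predictable, and separators controls the punctuation between items, which is handy for the opposite goal of producing the most compact string:

```python
import json

person = {'name': 'Huseyin', 'last_name': 'Yilmaz'}

# Human-readable: one key per line, keys in alphabetical order.
pretty = json.dumps(person, indent=2, sort_keys=True)

# Compact: no spaces after the commas and colons.
compact = json.dumps(person, sort_keys=True, separators=(',', ':'))

print(pretty)
print(compact)
```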

As I explained, if you give unexpected types to json.dumps, it raises a TypeError. But you can pass a default(obj) function, which is used to convert unexpected types to a serializable form. For example, let's convert all unexpected types to str.

In [132]: json.dumps({'name':'Huseyin','now':datetime.datetime.now()},default=str)
Out[132]: '{"now": "2012-04-10 15:40:58.120466", "name": "Huseyin"}'
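
Passing str as the default is convenient, but it silently turns every unknown object into text. A more selective sketch (Python 3; json_default is a hypothetical name) converts only the types it knows about and raises TypeError for the rest, the same way json.dumps does on its own:

```python
import json
from datetime import datetime
from decimal import Decimal

def json_default(obj):
    # Called only for objects json.dumps cannot serialize by itself.
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)  # keeps the exact digits instead of going through float
    raise TypeError('%r is not JSON serializable' % (obj,))

payload = {'when': datetime(2012, 4, 10, 15, 40, 58), 'amount': Decimal('120.50')}
encoded = json.dumps(payload, default=json_default, sort_keys=True)
print(encoded)
```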

Lastly, you can write a custom encoder for your objects. Here are two sample encoder implementations:

from collections import namedtuple
from datetime import datetime
from decimal import Decimal
import json

User = namedtuple('User',['name','last_name'])


obj = {'user':User('Huseyin','Yilmaz'),'amount':Decimal('120.50'),'date':datetime.now()}


class MyEncoder1(json.JSONEncoder):
    def default(self, obj):
        """
        The default method is called when the encoder meets an object type
        it cannot serialize. In our example obj will be Decimal('120.50')
        and the datetime value. This encoder converts every Decimal to
        float and every datetime to str.
        """
        if isinstance(obj, datetime):
            obj = str(obj)
        elif isinstance(obj, Decimal):
            obj = float(obj)
        else:
            obj = super(MyEncoder1, self).default(obj)
        print(obj)  # debug output: shows each converted value
        return obj


print(json.dumps(obj, cls=MyEncoder1))
# {"date": "2012-04-10 16:34:01.985232", "amount": 120.5, "user": ["Huseyin", "Yilmaz"]}
# MyEncoder1 converts the datetime and Decimal objects correctly, but it converts the User
# named tuple to a JSON list. We might want to convert it to a JSON object instead.

class MyEncoder2(json.JSONEncoder):
    def encode(self, obj):
        """
        The encode method gets the original object and returns the
        resulting string. obj will be the object that is passed to
        json.dumps.
        """
        # work on a shallow copy so the caller's dictionary is not modified
        obj = dict(obj)
        obj['amount'] = float(obj['amount'])
        obj['date'] = str(obj['date'])
        obj['user'] = obj['user']._asdict()

        return super(MyEncoder2, self).encode(obj)

print(json.dumps(obj, cls=MyEncoder2))
# {"date": "2012-04-10 16:48:06.847596", "amount": 120.5, "user": {"name": "Huseyin", "last_name": "Yilmaz"}}
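
MyEncoder2 only works for the exact dictionary layout used above. As a more general sketch (Python 3; to_jsonable and GeneralEncoder are names made up for this example), named tuples can be turned into dicts recursively before encoding, while default takes care of Decimal and datetime. Note that this overrides encode only, so it covers json.dumps but not json.dump, which goes through iterencode.

```python
import json
from collections import namedtuple
from datetime import datetime
from decimal import Decimal

User = namedtuple('User', ['name', 'last_name'])

def to_jsonable(value):
    # Recursively replace named tuples with dicts; leave other values alone.
    if isinstance(value, tuple) and hasattr(value, '_asdict'):
        return {k: to_jsonable(v) for k, v in value._asdict().items()}
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    return value

class GeneralEncoder(json.JSONEncoder):
    def default(self, o):
        # Reached only for types the base encoder cannot handle.
        if isinstance(o, datetime):
            return o.isoformat()
        if isinstance(o, Decimal):
            return str(o)
        return super().default(o)

    def encode(self, o):
        return super().encode(to_jsonable(o))

obj = {'user': User('Huseyin', 'Yilmaz'),
       'amount': Decimal('120.50'),
       'date': datetime(2012, 4, 10, 16, 48, 6)}
encoded = json.dumps(obj, cls=GeneralEncoder)
print(encoded)
```

The conversion builds new dicts and lists, so the object passed to json.dumps is left untouched.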
