|
|
Label: ♦english
♦miscellaneous
fu
created at Sunday, 2010-04-25, 21:43:22
4 Replies, 2111 Hits
Dao supports function overloading, which imposes a significant overhead to function call, because when a function is called, the parameters have to be checked against the parameter types of a list of overloaded functions. Such checking is time consuming. One possible way to solve this is to encode function prototypes into string, and encode the calling parameter types into string as well, and then use such strings for function lookup. But it seems to me than, to do this efficiently and safely, it requires a more strict typing system. Or maybe some languages were doing in the same way as I am going to describe. In the method I am going to describe, it is still necessary to encode the calling parameter types into a signature string, but not the function prototypes. The key idea here is caching -- caching the type checking results for the future function calls. From time to time, I had thought about caching as the right way to go, but it was not obvious how. Now it seems it can actually be done fairly simply. The idea is related to (possibly inspired by) that idea behind the planned feature for binding interface as prototype I posted yesterday. The idea is based on two round hashing (one round should also work, but less efficient):
The first has could be global to a process, and the second could be local to each set of overloaded functions. When a function is called, encode the calling parameter types on the fly to a signature string, then use this signature to lookup if there is cached type checking results. If there is, use the cached results to select the right function to call. Otherwise, perform detailed type checking between the calling types and the overloaded functions, and cache the results in the hash tables.
When I was thinking about using hash to reduce function call overhead, somehow it came into my mind that I can implement a hash type on the top of the existing rb-tree implementation of map. Certainly it will not be the best way to implement hash table, but it will take a minimum amount of effort by using the existing code base. The idea is also very simple: hash the keys to rb-tree nodes, and store some (or all) of the nodes in an extra array. For key lookup, compute the hash value of the query key and find the corresponding tree node in the array, if the node has key equals to the query key, return the value hold by the node (O(1)); otherwise, search the branches starting from that tree node (O(1)~O(log(n))). For this, the key comparison between keys for the rb-tree must first compare their hashed values and then their values. For key insertion, insert it to the rb-tree as usual (sorted first by key hash values then key values). Then check if there is a node in the extra node array has the same key has value, if there is, check which one is higher (closer to the root) in the tree, and set that one in the node array. In the best case it will take O(1) time, when the new node and the existing node in the array are directly linked in the tree. The worst case happens, when the array need to be increased, which will take O(n). Similarly, with extra steps for updating the node array, key deletion can be done in O(1) to O(n) time. Time complexity:
Once this is implemented, an extra syntax construct can be added to Dao, like:
h = { key1 : value1, key2 : value2, ... }
Which will also create an associative array, where the keys are unordered. Whereas, associative array created by,
h = { key1 => value1, key2 => value2, ... }
will have ordered keys. (extra methods will be added to map type to convert from one to the other)There has been requests to use : instead of => in constructing associative array, mainly because that's what Python dictionary type uses. The above solution will not only solve this issue, it will actually bring that data type closer to the Python dictionary:-) Comments
fu commented at Tuesday, 2010-04-27, 07:50:43
And it's updated to svn (r45). The underlying hash function is MurmurHash2 , which seems to work pretty well. Now integer, float, double, complex, long, string, and tuple etc. types can be used as hash key.
The syntax for enumerating hash by,
h = { key1 : value1, key2 : value2, ... }
is also implemented. But this syntax overrides the syntax for constructing list by specifying value range, e.g. {init : step : size} , so I need to find another proper syntax for that.
Pompei2 commented at Thursday, 2010-04-29, 14:07:33
Good idea to cache the results.
Have you already thought about how long to cache results? Will they stay cached until the end of the program or will there be an upper limit for the number of cached calls etc? You didn't write anything about this, that's why I ask.
fu commented at Friday, 2010-04-30, 05:04:16
Such cache will be created per vm process, this will avoid the need to protect by lock. It will stay for the whole life time of a vm process. I think there will be no need to set a limit for the cache, it will just take a little amount of memory.
fu commented at Friday, 2010-04-30, 05:50:02
Now the colliding key-value nodes with the same hash key are organized into separate rb-trees, where the roots of these trees are stored in the hash table. This implementation is much more efficient than the original implementation for insertion and deletion (the lookup speed is already fast).
|
fu: ... I forgot to say something about the plan for the whole new year in my previous reply. Well, besides w ... (Jan.19,01:40) fu: ... Happy new dragon year (which will start from this sunday)! Actually, it was a busy month (I wish th ... (Jan.18,22:46) ybabel: What's the plan for the new year ? Hello 'vry budy :- ) happy new year (when is the new year for you Fu ?) I saw you come back and comm ... (Jan.18,18:59) |