codecogs equations

torsdag 28 februari 2019

Zen of Python: Sparse is better than dense

In the sampling mini project, and also in the 2D interval selection problem, I had to implement a few operations on intervals one dimension and higher. Let's call intervals in an arbitrary number of dimensions boxes.
  • contains: given a box and a point, checks if the point is contained in the box. 
  • intersection: given two boxes, returns the overlap between them (if there is any). 
  • profile: given a box and a point specified in a subset S of the box's dimensions D, return the box's expansion in D\S (the remaining dimensions), if the point is contained in the box's projection onto S. We can see this as intersect between the box and the point expanded infinitely in D\S. 
  • volume: given a box, return the product of its size in all its dimensions.
  • corners: given a box, return all its corners as tuples. 
A simple google search did not suffice to find an impressive python package that supports these relatively abstract operations. I decided to clean up my code to make it a bit more general. It struck me that a good data structure for boxes might be a dict. The values are 2-tuples (or empty tuples for empty intervals) and the keys, representing the dimensions, can be any hashable. Even if I until now have worked with things that are naturally contained in euclidean space, I can imagine getting use for a sparse representation of intervals whenever dealing with a product space of ordered sets. Some bonus features:
  • Handle empty intervals
  • Projection onto arbitrary dimension is trivial. Just a key access, like so: box[dim].
Actually, this concept can be even further generalized. The only operations above that rely on the dimension being ordered are volume and corners. The others we can get as long as we have defined intersection on each of the dimensions in use. The general concept is a product space, stored only as a collection of expansions in each dimension. 

Inga kommentarer:

Skicka en kommentar