Computing and presenting emergent crowd simulations in real time is computationally intensive. Most of this cost comes from the traversal algorithm needed to test every element against every other in a proximity query. By using data structures such as grids and exploiting the parallel nature of graphics hardware, relevant previous work has reduced this complexity considerably, making interactive frame rates achievable. However, existing proposals tend to be heavily bound by the maximum density of such grids, which is usually high, leading to arguably inefficient algorithms. In this article we propose the use of a fine-grained grid and accompanying data manipulation that lead to scalable algorithmic complexity. We also implement a representative flocking boids case study, on which we ran benchmarks with more than one million simulated and rendered boids at nearly 30 fps. We remark that related previous work achieved no more than 15,000 boids at interactive frame rates.
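
To make the cost argument concrete, the following is a minimal CPU sketch of a grid-based proximity query compared against the all-pairs traversal. It illustrates the generic cell-bucket technique only, not the fine-grained grid proposed in this article; the function names, the 2D setting, and the `cell_size` and `radius` parameters are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
import math


def build_grid(points, cell_size):
    """Bucket point indices by the integer grid cell that contains them."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid[cell].append(i)
    return grid


def neighbors_brute(points, i, radius):
    """All-pairs reference: test point i against every other point."""
    return [j for j in range(len(points))
            if j != i and math.dist(points[i], points[j]) <= radius]


def neighbors_grid(points, grid, cell_size, i, radius):
    """Same query, but scanning only the cells that can contain a neighbor."""
    x, y = points[i]
    cx, cy = math.floor(x / cell_size), math.floor(y / cell_size)
    reach = math.ceil(radius / cell_size)  # how many cells the radius spans
    found = []
    for dx, dy in product(range(-reach, reach + 1), repeat=2):
        # .get avoids creating empty buckets in the defaultdict
        for j in grid.get((cx + dx, cy + dy), ()):
            if j != i and math.dist(points[i], points[j]) <= radius:
                found.append(j)
    return sorted(found)
```

In this sketch the work per query is proportional to the number of points stored in the scanned cells, so it depends on how densely the cells are populated rather than on the total population.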