What Data Structure is Best to Balance Ordering and Lookup Speed

16 hours ago 3
ARTICLE AD BOX

I was asked to write a simple store manager that keeps track of current sale volume of different items and is able to print out the mostSold number of items - essentially implementing these two methods:

public void addSale(String itemName, int quantity) { // Either add a new entry or increment current quantity. } public void print(int mostSold) { // Print the mostSold sold items and their quantities. }

Ignoring synchronisation, I came up with the following naive implementation, mapping name to total quantity, and ordering the entries at the time of calling print.

private final Map<String, Integer> m_sales = new HashMap<>(); public void addSale(final String itemName, final int quantity) { final int currentQuantity = m_sales.getOrDefault(itemName, 0); m_sales.put(itemName, currentQuantity + quantity); } public void print(final int mostSold) { m_sales.entrySet().stream() .sorted(Comparator.comparingInt(Map.Entry::getValue)) .limit(mostSold) .forEach(entry -> { IO.println(entry.getKey() + ": " + entry.getValue()); }); }

The time complexity of addSale is O(1) and print is O(n log(n)) in average and worst time.

As a next step I was told the two methods would be called equally as frequent and, hence, whether I could improve on print's time.

My immediate thought was to see if I could prevent having to re-sort the map entries again and again and do some housekeeping at the time of insertion to maintain some sort of order within the data structure itself. Java has a TreeMap, but this can only be ordered by the key. I therefore decided to introduce a custom object and use a TreeSet to maintain the order based on the quantity.

private final Set<Item> m_sales = new TreeSet<>(Comparator.comparingInt(item -> item.m_quantity)); public void addSale(final String itemName, final int quantity) { Item item = null; for (final Item currentItem : m_sales) { if (currentItem.m_name.equals(itemName)) { item = currentItem; break; } } if (item == null) { item = new Item(itemName); } item.m_quantity += quantity; m_sales.add(item); } public void print(final int mostSold) { m_sales.stream() .limit(mostSold) .forEach(item -> { IO.println(item.m_name + ": " + item.m_quantity); }); } private static class Item { private final String m_name; private int m_quantity = 0; private Item(final String name) { m_name = name; } @Override public boolean equals(Object o) { if (o == null || getClass() != o.getClass()){ return false; } Item item = (Item) o; return Objects.equals(m_name, item.m_name); } @Override public int hashCode() { return Objects.hashCode(m_name); } }

print is now O(n) but addSale has gone from O(1) to O(n) since we no longer can rely on Map's constant time operations and in the worst case need to iterate through the full set.

I therefore combined the two to use the map as a sort of lookup index.

private final Set<Item> m_sales = new TreeSet<>(Comparator.comparingInt(item -> item.m_quantity)); private final Map<String, Item> m_index = new HashMap<>(); public void addSale(final String itemName, final int quantity) { Item item = m_index.get(itemName); if (item == null) { item = new Item(itemName, quantity); m_index.put(itemName, item); } else { item.m_quantity += quantity; } m_sales.add(item); } public void print(final int mostSold) { m_sales.stream() .limit(mostSold) .forEach(item -> { IO.println(item.m_name + ": " + item.m_quantity); }); } private static class Item { private final String m_name; private int m_quantity; private Item(final String name, final int quantity) { m_name = name; m_quantity = quantity; } @Override public boolean equals(Object o) { if (o == null || getClass() != o.getClass()){ return false; } Item item = (Item) o; return Objects.equals(m_name, item.m_name); } @Override public int hashCode() { return Objects.hashCode(m_name); } }

Since we have an additional map, we don't need to iterate through the TreeSet to see if we have an existing Item to update rather than enter a new. My thinking is this makes addSale O(log(n)) rather than O(n). However, this does open yourself up to inconsistencies if the management between the map and set isn't carefully managed. I also haven't considered how synchronisation fits in here and I appreciate this could make this implementation less than ideal.

Is there a better way of doing this? Is there a data structure that could better combine the need for ordering and lookup that I've missed? I feel like I'm maybe overcomplicating this. Thanks!

Read Entire Article