Use hash table for single map complex type subscript (facebookincubator#8798)

Summary:
Use hash table for single map complex type subscript

Deltoid NGA is limited by constantly sorting a map of 12K keys where
each key is an array. The trivial fix is to use a hash table instead of
sorting. Sorting 12K keys takes ~12K * 14 comparisons (log2(12K) is
about 14), followed by 1K * 14 comparisons for the binary searches.
With a hash table, the build needs only a few comparisons for
collisions and a probe typically needs one.
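
As a rough illustration of the comparison counts above, here is a standalone sketch that counts key comparisons for both strategies. It does not use Velox: `ArrayKey`, `Counter`, `ArrayKeyHasher`, `CountingEqual`, `lookupBySorting` and `lookupByHashing` are hypothetical names, and `std::unordered_set` stands in for the hash table purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an array-typed map key.
using ArrayKey = std::vector<int32_t>;

// Tallies every key-to-key comparison.
struct Counter {
  size_t compares = 0;
};

// Strategy 1: sort a copy of the keys, then binary-search each probe.
// Returns the number of probes found.
size_t lookupBySorting(
    std::vector<ArrayKey> keys,
    const std::vector<ArrayKey>& probes,
    Counter& counter) {
  auto less = [&counter](const ArrayKey& a, const ArrayKey& b) {
    ++counter.compares;
    return a < b;
  };
  std::sort(keys.begin(), keys.end(), less);
  size_t hits = 0;
  for (const auto& probe : probes) {
    auto it = std::lower_bound(keys.begin(), keys.end(), probe, less);
    if (it != keys.end()) {
      ++counter.compares; // Final equality check after the search.
      if (*it == probe) {
        ++hits;
      }
    }
  }
  return hits;
}

// Combines the element hashes; the mixing constant is an arbitrary choice.
struct ArrayKeyHasher {
  size_t operator()(const ArrayKey& key) const {
    size_t h = key.size();
    for (auto v : key) {
      h ^= std::hash<int32_t>{}(v) + 0x9e3779b9 + (h << 6) + (h >> 2);
    }
    return h;
  }
};

// Equality that records each call in the shared counter.
struct CountingEqual {
  Counter* counter;
  bool operator()(const ArrayKey& a, const ArrayKey& b) const {
    ++counter->compares;
    return a == b;
  }
};

// Strategy 2: build a hash set once, then probe it.
// Returns the number of probes found.
size_t lookupByHashing(
    const std::vector<ArrayKey>& keys,
    const std::vector<ArrayKey>& probes,
    Counter& counter) {
  std::unordered_set<ArrayKey, ArrayKeyHasher, CountingEqual> set(
      keys.size(), ArrayKeyHasher{}, CountingEqual{&counter});
  for (const auto& key : keys) {
    set.insert(key);
  }
  size_t hits = 0;
  for (const auto& probe : probes) {
    if (set.find(probe) != set.end()) {
      ++hits;
    }
  }
  return hits;
}
```

Running both functions on the same 12K keys and 1K probes and comparing the two `Counter` values shows the gap the summary describes; the exact numbers depend on the standard library and on the data.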

To do: Make the function stateful and avoid remaking the hash table.

Pull Request resolved: facebookincubator#8798

Reviewed By: kevinwilfong

Differential Revision: D53903153

Pulled By: oerling

fbshipit-source-id: 2e9c6b8341f01dbc82a570d334d10728fb3dbb36
Orri Erling authored and facebook-github-bot committed Feb 27, 2024
1 parent 7e0a5a2 commit 430282a
Showing 1 changed file with 25 additions and 23 deletions.
48 changes: 25 additions & 23 deletions velox/functions/lib/SubscriptUtil.cpp
@@ -172,12 +172,27 @@ struct MapKey {
   const vector_size_t baseIndex;
   const vector_size_t index;

+  size_t hash() const {
+    return baseVector->hashValueAt(baseIndex);
+  }
+
+  bool operator==(const MapKey& other) const {
+    return baseVector->equalValueAt(
+        other.baseVector, baseIndex, other.baseIndex);
+  }
+
   bool operator<(const MapKey& other) const {
     return baseVector->compare(other.baseVector, baseIndex, other.baseIndex) <
         0;
   }
 };

+struct MapKeyHasher {
+  size_t operator()(const MapKey& key) const {
+    return key.hash();
+  }
+};
+
 VectorPtr applyMapComplexType(
     const SelectivityVector& rows,
     const VectorPtr& mapArg,
@@ -219,37 +234,24 @@ VectorPtr applyMapComplexType(
   // Fast path for the case of a single map. It may be constant or dictionary
   // encoded. Sort map keys, then use binary search.
   if (baseMap->size() == 1) {
-    auto sortedKeyIndices = baseMap->sortedKeyIndices(0);
-
-    std::vector<MapKey> sortedKeys;
-    sortedKeys.reserve(sortedKeyIndices.size());
-    for (const auto& index : sortedKeyIndices) {
-      sortedKeys.emplace_back(
-          MapKey{mapKeysBase, mapKeysIndices[index], index});
+    folly::F14FastSet<MapKey, MapKeyHasher> set;
+    auto numKeys = rawSizes[0];
+    set.reserve(numKeys * 1.3);
+    for (auto i = 0; i < numKeys; ++i) {
+      set.insert(MapKey{mapKeysBase, mapKeysIndices[i], i});
     }

     rows.applyToSelected([&](vector_size_t row) {
       VELOX_CHECK_EQ(0, mapIndices[row]);

-      bool found = false;
       auto searchIndex = searchIndices[row];

-      auto it = std::lower_bound(
-          sortedKeys.begin(),
-          sortedKeys.end(),
-          MapKey{searchBase, searchIndex, row});
-
-      if (it != sortedKeys.end()) {
-        if (mapKeysBase->equalValueAt(searchBase, it->baseIndex, searchIndex)) {
-          rawIndices[row] = it->index;
-          found = true;
-        }
-      }
-
-      if (!found) {
+      auto it = set.find(MapKey{searchBase, searchIndex, row});
+      if (it != set.end()) {
+        rawIndices[row] = it->index;
+      } else {
         nullsBuilder.setNull(row);
       }
     });

   } else {
     // Search the key in each row.
     rows.applyToSelected([&](vector_size_t row) {
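
The pattern in this change (a key that points into a column by position, with `hash()` and `operator==` delegating to the referenced value, plus a small hasher functor) can be sketched outside Velox as follows. This is only an illustration under stated assumptions: `ArrayColumn`, `Key`, `KeyHasher`, `kNotFound` and `findKeys` are hypothetical names, `std::unordered_set` is swapped in for `folly::F14FastSet`, and a `-1` sentinel replaces the nulls builder.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a vector of array values.
using ArrayColumn = std::vector<std::vector<int32_t>>;

// Sentinel used here in place of setting a null.
constexpr int64_t kNotFound = -1;

// Mirrors the shape of MapKey: which column, which position in it,
// and the index to report when the key matches.
struct Key {
  const ArrayColumn* column;
  size_t baseIndex;
  int64_t index;

  // Hashes the referenced value, not the position.
  size_t hash() const {
    const auto& value = (*column)[baseIndex];
    size_t h = value.size();
    for (auto v : value) {
      h ^= std::hash<int32_t>{}(v) + 0x9e3779b9 + (h << 6) + (h >> 2);
    }
    return h;
  }

  // Compares referenced values, so keys from different columns can match.
  bool operator==(const Key& other) const {
    return (*column)[baseIndex] == (*other.column)[other.baseIndex];
  }
};

// Adapter so the set can call Key::hash().
struct KeyHasher {
  size_t operator()(const Key& key) const {
    return key.hash();
  }
};

// For each search key, returns its position in mapKeys or kNotFound.
std::vector<int64_t> findKeys(
    const ArrayColumn& mapKeys,
    const ArrayColumn& searchKeys) {
  std::unordered_set<Key, KeyHasher> set;
  set.reserve(mapKeys.size());
  for (size_t i = 0; i < mapKeys.size(); ++i) {
    set.insert(Key{&mapKeys, i, static_cast<int64_t>(i)});
  }

  std::vector<int64_t> result;
  result.reserve(searchKeys.size());
  for (size_t row = 0; row < searchKeys.size(); ++row) {
    auto it = set.find(Key{&searchKeys, row, static_cast<int64_t>(row)});
    result.push_back(it != set.end() ? it->index : kNotFound);
  }
  return result;
}
```

The property the set relies on is that two keys that compare equal also hash equal, which holds here because both functions look only at the referenced array value.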
