Commit

Merge pull request #82 from KIwabuchi/feature/dnnd_new_api
(DNND) Enhanced String Examples
KIwabuchi authored Oct 10, 2024
2 parents f6ab89a + d557b6a commit d224584
Showing 17 changed files with 595 additions and 150 deletions.
57 changes: 3 additions & 54 deletions README.md
@@ -165,16 +165,14 @@ While the second example should be faster, the first is easier to use and more e

```shell
cd build

mpirun -n 2 ./examples/dnnd_example
mpirun -n 2 ./examples/dnnd_simple_example
```


## Running DNND PM (persistent memory) Examples
## Running DNND Advanced API Examples

### Build

The DNND PM examples require [Metall](https://github.com/LLNL/metall) and [Boost C++ Libraries](https://www.boost.org/) in addition to saltatlas's basic components.
The DNND advanced-API examples require [Metall](https://github.com/LLNL/metall) and [Boost C++ Libraries](https://www.boost.org/) in addition to saltatlas's basic components.
Add `-DSALTATLAS_USE_METALL=ON` when running CMake.
Those libraries are automatically downloaded and set up properly.

@@ -189,55 +187,6 @@ mkdir build && cd build
cmake ../ -DSALTATLAS_USE_METALL=ON -DSALTATLAS_USE_HDF5=OFF -DCMAKE_BUILD_TYPE=RELEASE
```

### Executables

There are three examples executables for k-NN index construction,
k-NN index optimization, and query, respectively.

Use `-h` option to show the help menus.

#### dnnd_pm_const_example

This program constructs a k-NN index.

```shell
mpirun -n [#of procs] ./examples/dnnd_pm_const_example (options) point_file_0 point_file_1...
```
#### dnnd_pm_optimize_example
This program optimizes an already constructed k-NN index.
```shell
mpirun -n [#of procs] ./examples/dnnd_pm_optimize_example (options)
```
#### dnnd_pm_query_example
This program performs queries against an already constructed index.
```shell
mpirun -n [#of procs] ./examples/dnnd_pm_query_example (options)
```
### Running Example
```shell
cd build

# Construct a k-NN index and store
mpirun -n 2 ./examples/dnnd_pm_const_example -z ./pindex -k 2 -f l2 -p wsv ../examples/datasets/point_5-4.dat

# Optimize the k-NN index created above
mpirun -n 2 ./examples/dnnd_pm_optimize_example -z ./pindex -u -m 1.5

# Open the k-NN index created above, query nearest neighbors, and show the accuracy.
mpirun -n 2 ./examples/dnnd_pm_query_example -z ./pindex \
-n 4 -q ../examples/datasets/query_5-4.txt -g ../examples/datasets/ground-truth_5-4.txt
```
# License
saltatlas is distributed under the MIT license.

3 changes: 2 additions & 1 deletion examples/CMakeLists.txt
@@ -40,7 +40,8 @@ endfunction()
add_saltatlas_example(dnnd_simple)
add_saltatlas_example(dnnd_simple_custom_distance)
add_saltatlas_example(dnnd_simple_custom_point)
add_saltatlas_example(dnnd_levenshtein)
add_saltatlas_example(dnnd_simple_levenshtein)
add_saltatlas_example(dnnd_simple_charhist)

add_saltatlas_dnnd_example_feature_type(dnnd_bench float)
add_saltatlas_dnnd_example_feature_type(dnnd_bench uint8_t)
5 changes: 4 additions & 1 deletion examples/datasets/CMakeLists.txt
@@ -1,4 +1,7 @@
configure_file(point_5-4.txt point_5-4.txt COPYONLY)
configure_file(query_5-4.txt query_5-4.txt COPYONLY)
configure_file(ground-truth_5-4.txt ground-truth_5-4.txt COPYONLY)
configure_file(all-distance-pairs_5-4.txt all-distance-pairs_5-4.txt COPYONLY)
configure_file(all-distance-pairs_5-4.txt all-distance-pairs_5-4.txt COPYONLY)
configure_file(point_string.txt point_string.txt COPYONLY)
configure_file(query_string.txt query_string.txt COPYONLY)
configure_file(ground-truth_string.txt ground-truth_string.txt COPYONLY)
28 changes: 22 additions & 6 deletions examples/datasets/README.md
@@ -5,20 +5,36 @@ each containing **4** points.
The dataset consists of the following files:

- [point_5-4.txt](./point_5-4.txt):
- This file contains 20 feature vectors (5 dimensions).
- The file is in the white-space separated value (WSV) format.
- The distance metric is L2 (Euclidean) distance.
- Contains 20 feature vectors (5 dimensions).
- White-space separated value (WSV) format.
- Distance metric is L2 (Euclidean) distance.

- [query_5-4.txt](./query_5-4.txt):
- This file consists of 5 search queries.
- Consists of 5 search queries.
- Each query point is close to one of the clusters in the point_5-4.txt file.
- There is a one-to-one correspondence between the query points and the clusters.

- [ground-truth_5-4.txt](./ground-truth_5-4.txt):
- This file contains ground truth data of the 5 queries.
- Contains ground truth data of the 5 queries.
- The first half of the file lists the ground truth nearest neighbor IDs.
- The second half of the file lists the ground truth distances.
- For example, the first line is for the ground truth nearest neighbor IDs of the first query point. The sixth line contains the ground truth distances of the first query point.

- [all-distance-pairs_5-4.txt](./all-distance-pairs_5-4.txt):
- This file contains all possible distance pairs between the input points in the dataset.
- This file contains all possible distance pairs between the input points in the dataset.
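
The ground-truth layout described above (first half of the lines holds neighbor IDs, second half holds the matching distances) can be read with a short parser. The following is a minimal sketch, not code from saltatlas: `GroundTruth`, `parse_row`, and `parse_ground_truth` are hypothetical names, and it assumes IDs fit in `std::size_t`, distances are parsed as `double`, and values are separated by whitespace.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container (not a saltatlas type): one row per query.
struct GroundTruth {
  std::vector<std::vector<std::size_t>> ids;    // nearest neighbor IDs
  std::vector<std::vector<double>> distances;   // distances to those neighbors
};

// Splits one whitespace-separated line into values of type T.
template <typename T>
std::vector<T> parse_row(const std::string &line) {
  std::istringstream iss(line);
  std::vector<T> row;
  for (T value; iss >> value;) row.push_back(value);
  return row;
}

// Reads a ground-truth stream: line i holds the IDs of query i, and
// line (n + i) holds the distances of query i, where n is the query count.
GroundTruth parse_ground_truth(std::istream &in) {
  std::vector<std::string> lines;
  for (std::string line; std::getline(in, line);) {
    // Skip blank lines so a trailing newline does not break the halving.
    if (line.find_first_not_of(" \t\r") != std::string::npos) {
      lines.push_back(line);
    }
  }
  if (lines.size() % 2 != 0) {
    throw std::runtime_error("expected an even number of non-empty lines");
  }

  const std::size_t num_queries = lines.size() / 2;
  GroundTruth gt;
  for (std::size_t i = 0; i < num_queries; ++i) {
    gt.ids.push_back(parse_row<std::size_t>(lines[i]));
    gt.distances.push_back(parse_row<double>(lines[num_queries + i]));
  }
  return gt;
}
```

With a 10-line file for 5 queries, `gt.ids[0]` would come from the first line and `gt.distances[0]` from the sixth, matching the description above.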

## String Dataset

There is also a string dataset.

- [point_string.txt](./point_string.txt):
- Contains 9 strings with different lengths.
- Distance function is the Levenshtein distance.

- [query_string.txt](./query_string.txt):
- Contains 5 queries.

- [ground-truth_string.txt](./ground-truth_string.txt):
- Contains the ground truth data of the 5 queries.
- The same format as the ground-truth_5-4.txt file.
- For each query, all data point IDs and distances to them from the query point are listed, sorted by the distance.
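
For readers unfamiliar with the Levenshtein (edit) distance used for this dataset, here is a minimal sketch of the classic dynamic-programming formulation. It is illustrative only and is not taken from saltatlas; the function name `levenshtein` is an assumption, and the library's own implementation may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Illustrative Levenshtein distance (not the saltatlas implementation).
// Counts the minimum number of single-character insertions, deletions,
// and substitutions needed to turn `a` into `b`.
std::size_t levenshtein(std::string_view a, std::string_view b) {
  // Only two rows of the DP table are kept at a time.
  std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);

  // Distance from the empty prefix of `a` to each prefix of `b`.
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

  for (std::size_t i = 1; i <= a.size(); ++i) {
    curr[0] = i;  // deleting the first i characters of `a`
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      curr[j] = std::min({prev[j] + 1,          // deletion
                          curr[j - 1] + 1,      // insertion
                          prev[j - 1] + cost}); // substitution or match
    }
    std::swap(prev, curr);
  }
  return prev[b.size()];
}
```

For example, `levenshtein("start", "stars")` is 1, since a single substitution of the last character suffices.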
10 changes: 10 additions & 0 deletions examples/datasets/ground-truth_string.txt
@@ -0,0 +1,10 @@
4 5 1 2 3 6 7 8 0
0 1 2 4 8 3 5 6 7
6 7 8 1 0 2 4 5 3
3 5 4 2 0 1 8 6 7
3 5 2 4 8 0 1 6 7
2 2 3 3 3 3 3 3 4
1 1 1 3 3 4 4 4 4
2 2 3 4 5 5 5 5 6
1 1 2 3 4 4 4 5 5
2 2 3 3 3 4 4 4 5
2 changes: 1 addition & 1 deletion examples/datasets/point_string.txt
@@ -6,4 +6,4 @@ call
sell
start
stars
tart
falg
2 changes: 1 addition & 1 deletion examples/datasets/query_string.txt
@@ -2,4 +2,4 @@ sail
lag
starry
bell
belt
belt
7 changes: 3 additions & 4 deletions examples/dnnd_advanced.cpp
@@ -3,11 +3,10 @@
//
// SPDX-License-Identifier: MIT

/// \brief A simple example of using DNND's simple with a custom distance
/// function. It is recommended to see the examples/dnnd_simple_example.cpp
/// beforehand. Usage:
/// \brief A simple example of using DNND's advanced API
/// Usage:
/// cd build
/// mpirun -n 2 ./example/dnnd_simple_custom_distance_example
/// mpirun -n 2 ./example/dnnd_advanced

#include <iostream>
#include <vector>
23 changes: 13 additions & 10 deletions examples/dnnd_bench.cpp
@@ -135,8 +135,8 @@ int main(int argc, char **argv) {
}

if (!opt.query_result_file_path.empty()) {
comm.cout0() << "\nDumping query results to " << opt.query_result_file_path
<< std::endl;
comm.cout0() << "\nDumping query results to "
<< opt.query_result_file_path << std::endl;
saltatlas::utility::gather_and_dump_neighbors(
query_results, opt.query_result_file_path, comm);
}
@@ -253,19 +253,22 @@ void usage(std::string_view exe_name, cout_type &cout) {
<< std::endl;
cout << "Options:" << std::endl;
cout << " -k <int> kNNG k parameter (required)" << std::endl;
cout << " -f <string> Distance name (required)" << std::endl;
cout << " -p <string> Point file format (required)" << std::endl;
cout << " -r <float> NN-Descent r parameter (default: 0.8)" << std::endl;
cout << " -d <float> NN-Descent delta parameter (default: 0.001)"
cout << " -f <string> Distance name (required). l1, l2, sql2, cosine, "
"altcosine, jaccard, altjaccard, and levenshtein are supported."
<< std::endl;
cout << " -p <string> Point file format (required). wsv, wsv-id, csv, "
"csv-id, str, and str-id are supported"
<< std::endl;
cout << " -r <float> NN-Descent r parameter (default: 0.8)" << std::endl;
cout << " -d <float> NN-Descent delta parameter (default: 0.001)"
<< std::endl;
cout << " -u Make index undirected (default: false)" << std::endl;
cout << " -m <float> High degree pruning parameter, must be >= 0 "
"(default: 0.0, no "
"prunning)"
cout << " -m <float> High degree pruning parameter, must be >= 0 "
"(default: 0.0, no prunning)"
<< std::endl;
cout << " -q <string> Query file path" << std::endl;
cout << " -n <int> #of nearest neighbors to search" << std::endl;
cout << " -e <float> Query epsilon parameter (default: 0.1)" << std::endl;
cout << " -e <float> Query epsilon parameter (default: 0.1)" << std::endl;
cout << " -g <string> Ground truth file path" << std::endl;
cout << " -o <string> Query result file path" << std::endl;
cout << " -b <int> Batch size (default: 1^25)" << std::endl;
61 changes: 0 additions & 61 deletions examples/dnnd_levenshtein.cpp

This file was deleted.
