Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster BatList.cartesian_product #997

Open
fccm opened this issue Dec 20, 2020 · 5 comments
Open

Faster BatList.cartesian_product #997

fccm opened this issue Dec 20, 2020 · 5 comments

Comments

@fccm
Copy link
Member

fccm commented Dec 20, 2020

It seems that we could make BatList.cartesian_product a little bit faster by using 2 fold_left instead of 2 List.map so that we avoid the cost of calling List.concat at the end:

let cartesian_product l1 l2 =
  List.concat (List.map (fun a -> List.map (fun b -> (a, b)) l2) l1)

let cartesian_product_fold l1 l2 =
  let l1 = List.rev l1 in
  let l2 = List.rev l2 in
  List.fold_left
    (fun acc a ->
       List.fold_left
         (fun acc b -> (a, b)::acc)
         acc l2)
    [] l1

let cartesian_product_fold_rev l1 l2 =
  List.fold_left
    (fun acc a ->
       List.fold_left
         (fun acc b -> (a, b)::acc)
         acc l2)
    [] l1

let () =
  let l1 = List.init 2_000 (fun i -> i) in
  let l2 = List.init 4_000 (fun i -> i) in
  let p =
    match Sys.argv.(1) with
    | "concat" -> cartesian_product l1 l2
    | "fold" -> cartesian_product_fold l1 l2
    | "fold_rev" -> cartesian_product_fold_rev l1 l2
    | _ -> raise Exit
  in
  Printf.printf "len: %d\n%!" (List.length p);
;;

The only thing is that we need to reverse the 2 input lists if we want to preserve the same output order than the current implementation. The version that doesn't reverse the input lists is just a little bit faster, which could be interesting for people that don't mind about the order of the output.

I only "benched" with time:

$ ocamlopt cp.ml -o cp.opt

$ time ./cp.opt concat
len: 8000000

real    0m2,745s
user    0m2,494s
sys     0m0,185s

$ time ./cp.opt fold
len: 8000000

real    0m2,459s
user    0m2,324s
sys     0m0,134s

$ time ./cp.opt fold_rev
len: 8000000

real    0m2,419s
user    0m2,238s
sys     0m0,122s
@UnixJunkie
Copy link
Member

Maybe send a PR, I'd like to see the two versions side by side.

@fccm
Copy link
Member Author

fccm commented Dec 25, 2020

OK, I'll do a PR soon.

I was also thinking providing cartesian_product and cartesian_product_rev.

I would also be happy with iterators:

  • cartesian_product_iter
  • and cartesian_product_fold
    (like in containers)

Do you agree?

fccm added a commit to fccm/batteries-included that referenced this issue Dec 29, 2020
this implementation avoids the call to List.concat

as explained in issue ocaml-batteries-team#997
@UnixJunkie
Copy link
Member

Can't you use Seq over a cartesian product to get an iterator?

@fccm
Copy link
Member Author

fccm commented Dec 30, 2020

This is to avoid constructing a list if we only need to iter or fold over it.

If you want to use Seq, you can also construct the list with BatList.cartesian_product and go through it with List.iter and List.fold_left. But what if the list is really big?

You can see in the test above that with 2 lists of lengths 2_000 and 4_000, the result is a list of length 8_000_000.

@UnixJunkie
Copy link
Member

please, send a PR for your improved cartesian_product.
I don't like so much discussions in a bug tracker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants