SemiBin2

Summary: Use the SemiBin2 command. As of version 2.2, only SemiBin2 is installed.

History

Starting with version 1.5 (officially SemiBin2 beta, released March 2023), installing the SemiBin package installed two scripts: SemiBin and SemiBin2. They had the same functionality, but slightly different interfaces. As of version 2.0 (released October 2023), the older SemiBin command was not recommended (except for backwards compatibility) and newer projects should use SemiBin2.

In version 2.1 (released March 2024), we deprecated the SemiBin command and introduced a more explicit SemiBin1 command for backwards compatibility.

In version 2.2 (released March 2025), only SemiBin2 is installed. The SemiBin and SemiBin1 commands are no longer available.

Upgrading to SemiBin2

  1. If you are using the easy_* workflows, then they will probably continue to work exactly the same (except that you will get better results faster).
  2. Outputs are now always in a directory called output_bins (unless you explicitly ask for the pre-reclustered bins to be written out with the --write-pre-reclustering-bins option).
  3. By default, bins are written to files named SemiBin_{label}.fa.gz (compressed with gzip, as the name indicates; you can change the compression with the --compression flag, including setting --compression=none if you prefer no compression).

Points 2 and 3 may require some minor modifications to wrapper scripts.
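
As a rough illustration of the kind of change a wrapper script might need, the Python sketch below collects bins from the output_bins directory and opens them whether or not they are gzip-compressed. The helper names list_bins and open_bin are hypothetical (they are not part of SemiBin), and the sketch assumes that uncompressed bins are named SemiBin_{label}.fa; other compression formats are not handled.

```python
import gzip
from pathlib import Path


def list_bins(output_dir):
    """Return bin files found under <output_dir>/output_bins, sorted by name.

    Hypothetical helper: assumes bins are named SemiBin_{label}.fa.gz (the
    default) or SemiBin_{label}.fa (assumed name when compression is none).
    """
    bins_dir = Path(output_dir) / "output_bins"
    return sorted(
        p for p in bins_dir.glob("SemiBin_*.fa*")
        if p.name.endswith((".fa", ".fa.gz"))
    )


def open_bin(path):
    """Open a bin FASTA file in text mode, decompressing gzip if needed."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")
```

A wrapper could then iterate with `for path in list_bins("my_output"):` and read each file through `open_bin(path)`, where my_output stands for whatever output directory was passed to SemiBin2.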

Longer list of differences between SemiBin2 and SemiBin1

The biggest difference is that the default training mode is now self-supervised.

  • Output bins are now in a directory called output_bins (in SemiBin1, it actually depended on which parameters were used).
  • Output filenames are now anvi'o compatible (effectively, the default value of --tag-output is SemiBin), see discussion at #123.
  • --compression defaults to gz (instead of none)
  • ORF finder defaults to the fast-naive internal ORF finder
  • --write-pre-reclustering-bins is False by default
  • To train in semi-supervised mode, you must use the train_semi subcommand. There is no train subcommand; you must be specific and use either train_semi or train_self.

A few arguments that were deprecated before are removed:

  • --recluster: it did nothing already as reclustering is default
  • --mode: use --train-from-many instead
  • --training-type: use --semi-supervised to use semi-supervised learning instead
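
When migrating older wrapper scripts, it may help to scan the old command line for the removed arguments listed above. The following Python sketch does this; the function name find_removed_arguments and the REMOVED_ARGUMENTS table are hypothetical helpers written for illustration, with the advice strings taken from the list above.

```python
# Removed arguments and the replacement suggested in the list above.
REMOVED_ARGUMENTS = {
    "--recluster": "drop it; reclustering is already the default",
    "--mode": "use --train-from-many instead",
    "--training-type": "use --semi-supervised for semi-supervised learning",
}


def find_removed_arguments(argv):
    """Return (flag, advice) pairs for removed arguments found in argv.

    Handles both "--flag value" and "--flag=value" spellings by comparing
    only the part of each token before the first "=".
    """
    hits = []
    for token in argv:
        flag = token.split("=", 1)[0]
        if flag in REMOVED_ARGUMENTS:
            hits.append((flag, REMOVED_ARGUMENTS[flag]))
    return hits
```

For example, passing the tokens of an old invocation (such as `sys.argv` from a wrapper, or a command string split with `shlex.split`) returns one entry per removed argument, which the wrapper can print before calling SemiBin2.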