.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/nani101.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_nani101.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_nani101.py:


Learn NANI in 60 seconds!
===============================================

How to use NANI in 60 seconds? Say no more! 

The main idea is to use NANI to optimize the initial centroids so that *k*-means is
fully deterministic, converges faster, and finds better solutions. Here is a simple
example to get started.

The working directory of this script is ``$PATH/MDANCE/examples``.

.. GENERATED FROM PYTHON SOURCE LINES 15-16

Let's start by importing the necessary libraries.

.. GENERATED FROM PYTHON SOURCE LINES 16-23

.. code-block:: Python


    from matplotlib import pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    from mdance.cluster.nani import KmeansNANI


.. GENERATED FROM PYTHON SOURCE LINES 24-27

Data
  - Load the data from a file; it must be an array of shape (n_samples, n_features).
  - In this example, we will generate synthetic data using ``make_blobs``, as in Fig. 2 of the `NANI paper <https://pubs.acs.org/doi/10.1021/acs.jctc.4c00308>`_.

.. GENERATED FROM PYTHON SOURCE LINES 27-31

.. code-block:: Python


    n_clusters = 7
    data, true_labels = make_blobs(n_samples=1000, centers=n_clusters, n_features=2, random_state=0)
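
If your data lives in a file rather than coming from ``make_blobs``, loading it could look like the minimal sketch below. The file name ``my_data.csv`` and the comma-separated layout are assumptions made for illustration only; the sketch writes a tiny stand-in file first so that it runs on its own, then checks that the loaded array has shape (n_samples, n_features).

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and location, used only for this illustration.
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "my_data.csv")

# Write a small stand-in data set: 5 samples, 2 features.
example = np.arange(10, dtype=float).reshape(5, 2)
np.savetxt(file_path, example, delimiter=",")

# Load it back; ndmin=2 keeps the result two-dimensional
# even when the file has a single column.
data = np.loadtxt(file_path, delimiter=",", ndmin=2)

n_samples, n_features = data.shape
print(n_samples, n_features)
```

For binary ``.npy`` files, ``np.load`` plays the same role as ``np.loadtxt`` here.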


.. GENERATED FROM PYTHON SOURCE LINES 32-34

First, let's check out how state-of-the-art *k*-means, which uses *k*-means++
initialization, performs on the data.

.. GENERATED FROM PYTHON SOURCE LINES 34-39

.. code-block:: Python


    og_kmeans = KMeans(n_clusters=n_clusters, init='k-means++', n_init=1, random_state=42)
    og_kmeans.fit(data)
    og_kmeans_labels = og_kmeans.labels_
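
Because *k*-means++ picks its starting centroids at random, a single initialization (``n_init=1``) can end up in different solutions depending on ``random_state``. The sketch below is one rough way to see this on the same synthetic data: it fits the model with a few seeds and prints the final inertia of each run. The seed range is arbitrary, and how much the values differ depends on the data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in the example above.
data, _ = make_blobs(n_samples=1000, centers=7, n_features=2, random_state=0)

# Fit k-means++ once per seed and record the final inertia
# (sum of squared distances to the nearest centroid).
inertias = {}
for seed in range(5):
    model = KMeans(n_clusters=7, init="k-means++", n_init=1, random_state=seed)
    model.fit(data)
    inertias[seed] = model.inertia_

for seed, inertia in inertias.items():
    print(f"random_state={seed}: inertia={inertia:.1f}")
```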


.. GENERATED FROM PYTHON SOURCE LINES 40-41

Visualize the clustering results next to the true labels.

.. GENERATED FROM PYTHON SOURCE LINES 41-49

.. code-block:: Python


    fig1, ax1 = plt.subplots(1, 2, figsize=(12, 8), sharex=True, sharey=True)
    ax1[0].scatter(data[:, 0], data[:, 1], c=og_kmeans_labels, cmap='tab10', s=20)
    ax1[0].set_title('k-means++ Labels', fontsize=16, fontweight='bold')
    ax1[1].scatter(data[:, 0], data[:, 1], c=true_labels, cmap='tab10', s=20)
    ax1[1].set_title('True Labels', fontsize=16, fontweight='bold')
    plt.show()


.. image-sg:: /examples/images/sphx_glr_nani101_001.png
   :alt: k-means++ Labels, True Labels
   :srcset: /examples/images/sphx_glr_nani101_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 50-58

NANI
-------------

As you can see, *k*-means++ initialization did not get it right.
Let's use NANI to optimize the initial centroids.

  - Create an instance of ``KmeansNANI``.
  - ``data``: data to cluster.
  - ``n_clusters``: number of clusters.

.. GENERATED FROM PYTHON SOURCE LINES 58-64

.. code-block:: Python


    mod = KmeansNANI(data=data, n_clusters=n_clusters, metric='MSD', N_atoms=1, 
                     init_type='strat_all', percentage=10)
    initiators = mod.initiate_kmeans()
    initiators = initiators[:n_clusters]


.. GENERATED FROM PYTHON SOURCE LINES 65-71

*k*-means with NANI
  - Create an instance of KMeans.
  - ``n_clusters``: number of clusters.
  - ``init``: initial centroids.
  - ``n_init``: NANI only needs one initialization!
  - ``random_state``: We don't need this because NANI is 100% deterministic!

.. GENERATED FROM PYTHON SOURCE LINES 71-75

.. code-block:: Python


    kmeans = KMeans(n_clusters=n_clusters, init=initiators, n_init=1, random_state=None)
    kmeans.fit(data)
    kmeans_labels = kmeans.labels_
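
To see why ``random_state`` stops mattering once ``init`` is an explicit array, here is a small sketch that does not depend on MDANCE. The helper ``farthest_point_seeds`` is a hypothetical, deterministic stand-in written for this illustration; it is **not** the NANI algorithm. Two independent fits started from the same seed array should converge to the same centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def farthest_point_seeds(points, k):
    """Deterministic stand-in seeding for illustration (NOT the NANI algorithm)."""
    # Start from the point closest to the overall mean.
    center = points.mean(axis=0)
    first = int(np.argmin(np.linalg.norm(points - center, axis=1)))
    chosen = [first]
    # Distance from every point to its nearest chosen seed so far.
    min_dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(k - 1):
        # Pick the point farthest from all seeds chosen so far.
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return points[chosen]


data, _ = make_blobs(n_samples=1000, centers=7, n_features=2, random_state=0)
seeds = farthest_point_seeds(data, 7)

# Two separate fits from the same explicit starting centroids.
run_a = KMeans(n_clusters=7, init=seeds, n_init=1, random_state=None).fit(data)
run_b = KMeans(n_clusters=7, init=seeds, n_init=1, random_state=None).fit(data)

same = np.allclose(run_a.cluster_centers_, run_b.cluster_centers_)
print(same)
```

The same reasoning applies to the ``initiators`` array above: once the starting centroids are fixed, there is no random initialization step left.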


.. GENERATED FROM PYTHON SOURCE LINES 76-78

Plot
  - Visualize the clustering results next to the true labels.

.. GENERATED FROM PYTHON SOURCE LINES 78-86

.. code-block:: Python


    fig, ax2 = plt.subplots(1, 2, figsize=(12, 8), sharex=True, sharey=True)
    ax2[0].scatter(data[:, 0], data[:, 1], c=kmeans_labels, cmap='tab10', s=20)
    ax2[0].set_title('NANI Labels', fontsize=16, fontweight='bold')
    ax2[1].scatter(data[:, 0], data[:, 1], c=true_labels, cmap='tab10', s=20)
    ax2[1].set_title('True Labels', fontsize=16, fontweight='bold')
    plt.show()


.. image-sg:: /examples/images/sphx_glr_nani101_002.png
   :alt: NANI Labels, True Labels
   :srcset: /examples/images/sphx_glr_nani101_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 87-93

As you can see, NANI clustered the data perfectly!

That's it! You have successfully used NANI to optimize initial centroids for *k*-means clustering.
  - ``kmeans_labels``: cluster labels assigned by *k*-means using NANI.
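
Cluster label numbers are arbitrary, so the colors in the two panels do not have to match even when the partitions agree. If you want a number instead of a visual check, one option (not part of the original example) is scikit-learn's ``adjusted_rand_score``, which ignores how the labels are numbered. The toy labels below are made up for illustration.

```python
from sklearn.metrics import adjusted_rand_score

# Toy labels, made up for this illustration.
true_labels = [0, 0, 1, 1, 2, 2]
same_partition = [2, 2, 0, 0, 1, 1]       # same grouping, different label numbers
different_partition = [0, 1, 0, 1, 0, 1]  # splits every true cluster

perfect = adjusted_rand_score(true_labels, same_partition)
imperfect = adjusted_rand_score(true_labels, different_partition)

print(perfect)              # 1.0 for identical partitions
print(imperfect < perfect)  # True
```

In the example above, the analogous call would be ``adjusted_rand_score(true_labels, kmeans_labels)``.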

For more advanced usage, please look at the `NANI Tutorial <../tutorials/nani.html>`_.
Why? Because NANI can also predict the number of clusters, work with Molecular Dynamics data, and more!


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 9.479 seconds)


.. _sphx_glr_download_examples_nani101.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: nani101.ipynb <nani101.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: nani101.py <nani101.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_