{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4ff891f2-5d09-460a-a581-4af09c7fa9e2",
   "metadata": {},
   "source": [
    "# **Spatially Resolved Transcriptomics Datasets Measured Utilizing Different Platforms.**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e55a956b-60f6-49c0-9e61-f1d6dda06e91",
   "metadata": {},
   "source": [
    "In this tutorial, we demonstrate how to apply `BatchEval` to assess integrated data that is measured bt different platforms. As an example, we used three mouse olfactory bulb datas. One tissue section was profiled by `10x Genomics Visium`, while the others were measured by `Stereo-seq`.           \n",
    "\n",
    "github: [https://github.com/STOmics/BatchEval.git](https://github.com/STOmics/BatchEval.git)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "590bde5b-e15c-4ae7-a3f1-488c222a86ea",
   "metadata": {},
   "source": [
    "### Import packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0b152f00-841d-4c2d-b50a-492b9a59cadd",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "01c7f71f-d3f2-4bab-835f-d7f31722eaa9",
   "metadata": {},
   "outputs": [],
   "source": [
    "sys.path.append(os.getcwd())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c5d4f742-f694-4e68-a30d-6acff10a8642",
   "metadata": {},
   "outputs": [],
   "source": [
    "import scanpy as sc\n",
    "from anndata import AnnData\n",
    "from warnings import filterwarnings\n",
    "from BatchEval import batch_eval\n",
    "filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "960c6bf3-7b0f-4a84-b306-23e61824b4cb",
   "metadata": {},
   "source": [
    "### Load datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f3bd6576-23f5-4f12-a435-67fb479db83c",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = [sc.read_h5ad(os.path.join(\"./demo_data\", t)) for t in sorted(os.listdir(\"./demo_data\")) if t.endswith(\"h5ad\")]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbaf9b6e-2600-4dda-9cd8-f8a7bdb936e4",
   "metadata": {},
   "source": [
    "### Optional: External data integration methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "521b2b4d-d088-44c6-91ef-93c7b0865088",
   "metadata": {},
   "outputs": [],
   "source": [
    "import scanorama"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3df80350-b6fe-42f4-b17e-b941a53727f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "merge_data = AnnData.concatenate(*data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f0cbb698-b140-4772-be40-a11f8e881eca",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found 22891 genes among all datasets\n",
      "[[0.         0.50985222 0.        ]\n",
      " [0.         0.         0.02111486]\n",
      " [0.         0.         0.        ]]\n",
      "Processing datasets (0, 1)\n"
     ]
    }
   ],
   "source": [
    "scanorama_correct = scanorama.correct_scanpy(\n",
    "    [merge_data[merge_data.obs[\"batch\"] == c] for c in merge_data.obs[\"batch\"].cat.categories], \n",
    "    return_dimred=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ea8e34f5-3317-455e-9abe-5b9413847424",
   "metadata": {},
   "outputs": [],
   "source": [
    "scanorama_correct = AnnData.concatenate(*scanorama_correct)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbc36a01-ec74-4c8e-8745-0690ea5c7b72",
   "metadata": {},
   "source": [
    "### Running `BatchEval` Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16d846b5-1371-4251-9df7-ec5dd76976d4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2023-09-18 16:00:18 BatchEval Pipeline Starting\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2023-09-18 16:00:32.389107: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "2023-09-18 16:00:37.907511: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2023-09-18 16:05:57 The 'Raw Report.html' has been saved to ./output/mouse_ob\n"
     ]
    }
   ],
   "source": [
    "batch_eval(*data,\n",
    "           norm_log=True,\n",
    "           is_scale=False,\n",
    "           n_pcs=50,\n",
    "           n_neighbors=15,\n",
    "           batch_key=\"batch\",\n",
    "           position_key=\"X_umap\",\n",
    "           condition=None,\n",
    "           count_key=\"total_counts\",\n",
    "           celltype_key=\"celltype\",\n",
    "           external_list=[{\n",
    "               \"correct_data\": scanorama_correct,\n",
    "               \"re_pca\": False,\n",
    "               \"re_neigh\": True,\n",
    "               \"use_rep\": \"X_scanorama\",\n",
    "               \"re_umap\": True,\n",
    "               \"batch_key\": \"batch\"},],\n",
    "           report_path=\"./output/mouse_ob\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d21817d8-58ce-402d-9019-a1a69c9ac4c3",
   "metadata": {},
   "source": [
    "### Ouput          \n",
    "A series of HTML pages were output for the data integration evaluations, which we are able to find in the`report_path`.    \n",
    "                            \n",
    "- `BatchEval Report.html`: the summaries report of `BatchEval Pipeline`.           \n",
    "- `Raw Report.html`: the raw data integration report.\n",
    "- `Harmony Report.html`: the data integration report of using `BatchEval Pipeline` built-in `Harmony` mehtod, which is single cell base methods.\n",
    "- `BBKNN Report.html`: the data integration report of using `BatchEval Pipeline` built-in `BBKNN` mehtod, which is single cell base methods.                 \n",
    "- `spatiAlign Report.html`: the data integration report of using `BatchEval Pipeline` built-in `spatiAlign` mehtod, which is spatially resolved transcriptomics method.\n",
    "- `External Report.html`: if the `externnal_list` is not `None`, and will generate external report."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed7161bc-5b47-4ec4-aa62-00fc09246d12",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5a0511f-2919-4d71-8621-3f96b0c6296d",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}