Initial commit — clean history (removed large test files, browser profiles, Medidata/Clario downloads)
@@ -0,0 +1,117 @@
# PanoramaContacts - CLAUDE.md

## Directory purpose

Imports site contacts from PANORAMA (CTMS) exports into MySQL and displays them in a Streamlit web report.
Only records for **Czechia** are kept. Protocols currently covered:

| Protocol ID | TA |
|---|---|
| `77242113UCO3001` | Immunology |
| `42847922MDD3003` | Neuroscience |

---

## Files

| File | Purpose |
|---|---|
| `import_CZ_contacts.py` | Import xlsx → MySQL |
| `webreport.py` | Streamlit web report |
| `run_webreport.py` | PyCharm launcher (`streamlit run webreport.py`) |
| `sql/create_CTMS_contacts.sql` | DDL for the `CTMS_contacts` table |
| `SourceData/*.xlsx` | PANORAMA Dashboard exports (source data) |
| `filter_state.json` | Automatically saved filter state (generated by the app) |

---

## MySQL

- **Host:** 192.168.1.76:3306 · **DB:** `studie` · **Table:** `CTMS_contacts`
- **Sheet in the xlsx:** `Site Contacts`, header on row 6 (0-based index 5)

### Key table columns

| Column | Type | Note |
|---|---|---|
| `file_date` | DATE | From `dcterms:created` in the xlsx's `docProps/core.xml` |
| `imported_at` | DATETIME | Automatic import timestamp |
| `protocol_id` | VARCHAR(20) | Study identifier |
| `site_id` | VARCHAR(15) | Site (e.g. `DD5-CZ10006`) |
| `contact_role` | VARCHAR(50) | Contact role (PI, Study Coordinator, …) |
| `contact_start_date` | DATE | Start of the contact's validity |
| `contact_end_date` | DATE | End of validity; NULL = still active |
| `email` | VARCHAR(100) | Primary e-mail |
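
The `file_date` column is read from the workbook's document properties, not from the file system timestamp. The sketch below shows one way that lookup could work with the standard library only. The names `created_date_from_xlsx` and `build_demo_archive` are illustrative assumptions, and the demo archive contains just the `docProps/core.xml` part rather than a real export.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from datetime import date, datetime, timezone

DCTERMS_CREATED = "{http://purl.org/dc/terms/}created"


def created_date_from_xlsx(source) -> date:
    """Return the UTC date stored in dcterms:created of docProps/core.xml."""
    with zipfile.ZipFile(source) as archive:
        with archive.open("docProps/core.xml") as handle:
            root = ET.parse(handle).getroot()
    element = root.find(DCTERMS_CREATED)
    if element is None or not element.text:
        raise ValueError("dcterms:created is missing from docProps/core.xml")
    # Older Python versions reject a trailing "Z" in fromisoformat().
    created = datetime.fromisoformat(element.text.replace("Z", "+00:00"))
    return created.astimezone(timezone.utc).date()


def build_demo_archive(created_iso: str) -> io.BytesIO:
    """Build a tiny zip that mimics only the core.xml part of an xlsx file."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties '
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dcterms="http://purl.org/dc/terms/" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
        f'<dcterms:created xsi:type="dcterms:W3CDTF">{created_iso}</dcterms:created>'
        '</cp:coreProperties>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


demo_date = created_date_from_xlsx(build_demo_archive("2026-05-07T08:15:00Z"))
print(demo_date)
```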

---

## import_CZ_contacts.py

- Processes every `*.xlsx` in `SourceData/`
- Skips files whose `file_date` ≠ today's date (UTC)
- Overwrite: DELETE + INSERT keyed by `(file_date, protocol_id, country_name)`
- `clean_value()` converts NaN / NaT / Timestamp into types the MySQL driver accepts
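
The overwrite step in the list above can be sketched as a delete followed by inserts inside one transaction. To keep the example runnable without a server, it swaps in the standard-library `sqlite3` module for MySQL. The `contacts` table, its reduced column set and the `overwrite_snapshot` helper are assumptions made for illustration. Note that `sqlite3` uses `?` placeholders where `mysql-connector-python` uses `%s`.

```python
import sqlite3
from datetime import date


def overwrite_snapshot(conn, file_date, protocol_id, country, rows):
    """Replace all rows of one (file_date, protocol_id, country_name) snapshot."""
    key = (file_date.isoformat(), protocol_id, country)
    with conn:  # commits on success, rolls back if an exception is raised
        deleted = conn.execute(
            "DELETE FROM contacts "
            "WHERE file_date = ? AND protocol_id = ? AND country_name = ?",
            key,
        ).rowcount
        conn.executemany(
            "INSERT INTO contacts "
            "(file_date, protocol_id, country_name, site_id, email) "
            "VALUES (?, ?, ?, ?, ?)",
            [key + (site_id, email) for site_id, email in rows],
        )
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    "file_date TEXT, protocol_id TEXT, country_name TEXT, site_id TEXT, email TEXT)"
)
snapshot = (date(2026, 5, 7), "42847922MDD3003", "Czechia")

# First import of the day: nothing to delete, two rows inserted.
first_run = overwrite_snapshot(
    conn, *snapshot,
    [("S10-CZ10004", "a@example.com"), ("S10-CZ10008", "b@example.com")],
)
# Re-import of the same snapshot: the two old rows are replaced by one.
second_run = overwrite_snapshot(conn, *snapshot, [("S10-CZ10004", "a@example.com")])
remaining = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
print(first_run, second_run, remaining)
```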

---

## webreport.py - Streamlit app

### Filters (sidebar)

| Filter | Widget | Options logic |
|---|---|---|
| **Střediska** (sites) | radio | Aktivní / Neaktivní / Všechna (active / inactive / all) |
| **Protokol** (protocol) | selectbox | From the whole DB |
| **Role** | multiselect | Filtered by protocol + active/inactive |
| **Site** | multiselect | Filtered by protocol + active/inactive |
| **Hledání** (search) | text_input | Full-text search across all columns of a row |

### Střediska filter logic

| Value | Site condition | End Date condition |
|---|---|---|
| **Aktivní** | `site_id` in `ACTIVE_SITES` | `contact_end_date IS NULL` |
| **Neaktivní** | `site_id` NOT in `ACTIVE_SITES` | no restriction |
| **Všechna** | no restriction | no restriction |
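
The rule table above can be expressed as a small predicate. This is a minimal sketch on plain dictionaries rather than the DataFrame used by the app. `matches_status` is a hypothetical helper, the `ACTIVE_SITES` mapping is shortened, and site `DD5-CZ10002` is an invented inactive site. As in `webreport.py`, the active set is the union of the sites of all protocols.

```python
from datetime import date

# Shortened copy of the mapping, for illustration only.
ACTIVE_SITES = {
    "77242113UCO3001": {"DD5-CZ10001", "DD5-CZ10003"},
    "42847922MDD3003": {"S10-CZ10004", "S10-CZ10008"},
}
ALL_ACTIVE = set().union(*ACTIVE_SITES.values())


def matches_status(row: dict, status: str) -> bool:
    """Apply the Střediska rule to one contact row (hypothetical helper)."""
    if status == "Aktivní":
        return row["site_id"] in ALL_ACTIVE and row["contact_end_date"] is None
    if status == "Neaktivní":
        return row["site_id"] not in ALL_ACTIVE
    return True  # "Všechna": no restriction


rows = [
    {"site_id": "DD5-CZ10001", "contact_end_date": None},
    {"site_id": "DD5-CZ10001", "contact_end_date": date(2025, 1, 31)},
    {"site_id": "DD5-CZ10002", "contact_end_date": None},
]
counts = {
    status: sum(matches_status(row, status) for row in rows)
    for status in ("Aktivní", "Neaktivní", "Všechna")
}
print(counts)
```

One consequence of these rules: a contact with an end date at an active site (the second row) matches neither Aktivní nor Neaktivní and is visible only under Všechna.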

### Active sites (ACTIVE_SITES)

```python
ACTIVE_SITES = {
    "77242113UCO3001": {
        "DD5-CZ10001", "DD5-CZ10003", "DD5-CZ10006", "DD5-CZ10009",
        "DD5-CZ10010", "DD5-CZ10012", "DD5-CZ10013", "DD5-CZ10015",
        "DD5-CZ10016", "DD5-CZ10020", "DD5-CZ10021", "DD5-CZ10022",
    },
    "42847922MDD3003": {
        "S10-CZ10004", "S10-CZ10008", "S10-CZ10011", "S10-CZ10012",
    },
}
```

### Filter persistence

- The state is written to `filter_state.json` on every filter change (`on_change=save_filter_state`)
- It is loaded once per session, guarded by the `filters_initialized` flag in `st.session_state`
- On load, the values are validated against the current options (protection against stale data)
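
The load-and-validate step described above can be sketched without Streamlit. In this sketch `load_state` and `validate_state` are hypothetical helpers, and the stale protocol ID and role name are invented test values. The idea is only that anything no longer present in the current options falls back to a default or is dropped.

```python
import json
import tempfile
from pathlib import Path

STATUS_OPTIONS = ["Aktivní", "Neaktivní", "Všechna"]


def load_state(path: Path) -> dict:
    """Read saved filters; an unreadable or malformed file yields {}."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def validate_state(saved: dict, protocols: list, roles: list, sites: list) -> dict:
    """Keep only saved values that still exist in the current options."""
    status = saved.get("sel_status")
    proto = saved.get("sel_proto")
    return {
        "sel_status": status if status in STATUS_OPTIONS else "Aktivní",
        "sel_proto": proto if proto in protocols else "Všechny",
        "sel_role": [r for r in saved.get("sel_role", []) if r in roles],
        "sel_site": [s for s in saved.get("sel_site", []) if s in sites],
    }


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "filter_state.json"
    stale = {
        "sel_status": "Aktivní",
        "sel_proto": "00000000OLD0001",  # invented ID, not among the options
        "sel_role": ["Principal Investigator", "Retired Role"],
        "sel_site": [],
    }
    state_file.write_text(json.dumps(stale, ensure_ascii=False), encoding="utf-8")
    restored = validate_state(
        load_state(state_file),
        protocols=["Všechny", "77242113UCO3001"],
        roles=["Principal Investigator", "Sub-Investigator"],
        sites=["DD5-CZ10001"],
    )
    missing = load_state(Path(tmp) / "does_not_exist.json")

print(restored)
```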

### Clipboard button

- Uses the `pyperclip` library, which copies straight to the Windows clipboard on the server side
- Format: `First Last <email@domain.cz>; …`
- Acts on the currently displayed (filtered) records
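
The recipient format above is a plain string join. A minimal sketch follows, assuming contacts arrive as `(first_name, last_name, email)` tuples. `format_recipients` is a hypothetical helper and the names and addresses are invented sample data. Unlike the app, it also trims whitespace around the address.

```python
def format_recipients(contacts):
    """Join (first_name, last_name, email) tuples into one recipient string."""
    entries = [
        f"{first} {last} <{email.strip()}>"
        for first, last, email in contacts
        if email and email.strip()  # skip missing and blank addresses
    ]
    return "; ".join(entries)


contacts = [
    ("Jana", "Nováková", "jana.novakova@example.cz"),
    ("Petr", "Svoboda", None),   # no e-mail: skipped
    ("Eva", "Dvořák", "   "),    # blank e-mail: skipped
    ("Tomáš", "Černý", "tomas.cerny@example.cz"),
]
recipients = format_recipients(contacts)
print(recipients)
```

The resulting string can be handed to `pyperclip.copy()` and pasted into a mail client's To field.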

### Cache

- `@st.cache_data(ttl=300)`: data is kept for 5 minutes
- The 🔄 Obnovit data (refresh data) button calls `st.cache_data.clear()` + `st.rerun()`

---

## Dependencies (venv)

```
mysql-connector-python
pandas
openpyxl
streamlit
pyperclip
```
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,9 @@
{
  "sel_status": "Aktivní",
  "sel_proto": "77242113UCO3001",
  "sel_role": [
    "Principal Investigator",
    "Sub-Investigator"
  ],
  "sel_site": []
}
@@ -0,0 +1,192 @@
"""
import_CZ_contacts.py
Imports Czechia site contacts from PANORAMA Dashboard xlsx exports into the MySQL table CTMS_contacts.
- Processes all *.xlsx files in SOURCE_DIR
- Keeps only rows where Country Name == 'Czechia'
- Takes file_date from the xlsx document properties (dcterms:created)
- Always overwrites each file (delete + insert by file_date + protocol_id + country)
"""

import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd
import mysql.connector

# ── Configuration ──────────────────────────────────────────────────────────────
SOURCE_DIR = Path(r"U:\PythonProject\Janssen\CTMS\PanoramaContacts\SourceData")

DB_CONFIG = {
    "host": "192.168.1.76",
    "port": 3306,
    "user": "root",
    "password": "Vlado9674+",
    "database": "studie",
    "charset": "utf8mb4",
}

TABLE = "CTMS_contacts"
COUNTRY = "Czechia"
SHEET = "Site Contacts"
HEADER_ROW = 5  # 0-based → row 6 in Excel

# Maps PANORAMA export column headers to CTMS_contacts column names.
COL_MAP = {
    "Sector": "sector",
    "TA": "ta",
    "Protocol ID": "protocol_id",
    "GTL-GTM/CTM": "gtl_ctm",
    "Country Name": "country_name",
    "LTM Name": "ltm_name",
    "Site ID": "site_id",
    "SM Name": "sm_name",
    "PI Full Name": "pi_full_name",
    "Institution Name": "institution_name",
    "Contact Identifier": "contact_identifier",
    "Title": "contact_title",
    "Last Name": "last_name",
    "First Name": "first_name",
    "Contact Role": "contact_role",
    "Contact Type": "contact_type",
    "Pr St Cont Primary Indicator": "primary_indicator",
    "SUA Reporting Indicator": "sua_reporting_indicator",
    "Financial Disclosure Indicator": "financial_disclosure_indicator",
    "Contact Phone Number": "phone",
    "Alternative Phone Number": "phone_alt",
    "Mobile Phone Number": "phone_mobile",
    "Contact Fax Number": "fax",
    "Contact Email Address": "email",
    "SUA Reporting Email Address": "email_sua",
    "Contact Start Date": "contact_start_date",
    "Contact End Date": "contact_end_date",
    "Degree/qualification": "degree_qualification",
    "Job Title": "job_title",
    "Contact Address Line 1": "address_line1",
    "Contact Address Line 2": "address_line2",
    "Contact Address Line 3": "address_line3",
    "Contact City": "city",
    "Contact Addr State/Province": "state_province",
    "Contact Zip/Postal Code": "zip_postal_code",
}


# ── Helper functions ───────────────────────────────────────────────────────────
def get_file_created_date(xlsx_path: Path):
    """Return the date from dcterms:created in docProps/core.xml."""
    with zipfile.ZipFile(xlsx_path) as z:
        with z.open("docProps/core.xml") as f:
            root = ET.parse(f).getroot()
    el = root.find("{http://purl.org/dc/terms/}created")
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    dt = datetime.fromisoformat(el.text.replace("Z", "+00:00"))
    return dt.astimezone(timezone.utc).date()


def clean_value(val):
    """Convert NaN / NaT / Timestamp into types the MySQL driver accepts."""
    import math
    if val is None:
        return None
    if isinstance(val, float):
        return None if math.isnan(val) else val
    if isinstance(val, pd.Timestamp):
        return None if pd.isnull(val) else val.date()
    try:
        if pd.isna(val):
            return None
    except Exception:
        pass
    return val


def get_protocol_id(xlsx_path: Path) -> str:
    """Read protocol_id from the first data row (fast, without loading the whole file)."""
    df = pd.read_excel(xlsx_path, sheet_name=SHEET, header=HEADER_ROW,
                       usecols=["Protocol ID"], nrows=1)
    return str(df["Protocol ID"].iloc[0])


def import_file(xlsx_path: Path, cursor, conn):
    """Process one xlsx file; always overwrites (delete + insert)."""
    file_date = get_file_created_date(xlsx_path)
    protocol_id = get_protocol_id(xlsx_path)
    print(f" file_date : {file_date}")
    print(f" protocol_id : {protocol_id}")

    df = pd.read_excel(xlsx_path, sheet_name=SHEET, header=HEADER_ROW)
    df_cz = df[df["Country Name"] == COUNTRY].copy()
    print(f" radku CZ : {len(df_cz)}")

    if df_cz.empty:
        print(" -> zadne CZ radky, preskoceno")
        return 0

    df_cz = df_cz.rename(columns=COL_MAP)
    db_cols = list(COL_MAP.values())

    # Delete the existing records for this file (overwrite)
    cursor.execute(
        f"DELETE FROM {TABLE} "
        f"WHERE file_date = %s AND protocol_id = %s AND country_name = %s",
        (file_date, protocol_id, COUNTRY)
    )
    deleted = cursor.rowcount
    if deleted:
        print(f" prepis : smazano {deleted} starych radku")

    # One placeholder per mapped column, plus one for file_date
    placeholders = ", ".join(["%s"] * (len(db_cols) + 1))
    insert_cols = "file_date, " + ", ".join(db_cols)
    sql_insert = f"INSERT INTO {TABLE} ({insert_cols}) VALUES ({placeholders})"

    for _, row in df_cz.iterrows():
        values = [file_date] + [clean_value(row.get(col)) for col in db_cols]
        cursor.execute(sql_insert, values)

    conn.commit()
    return len(df_cz)


# ── Main logic ─────────────────────────────────────────────────────────────────
def main():
    files = sorted(SOURCE_DIR.glob("*.xlsx"))
    if not files:
        print(f"Zadne xlsx soubory v {SOURCE_DIR}")
        return

    today = datetime.now(timezone.utc).date()
    print(f"Nalezeno souboru: {len(files)} | dnesni datum: {today}")
    print("Pripojuji se k MySQL...")
    conn = mysql.connector.connect(**DB_CONFIG)
    cursor = conn.cursor()

    summary = []
    for xlsx in files:
        print(f"\n[{xlsx.name}]")
        try:
            # Only exports created today (UTC) are imported
            file_date = get_file_created_date(xlsx)
            if file_date != today:
                print(f" file_date : {file_date} -> PRESKOCENO (neni dnesni datum)")
                summary.append((xlsx.name, f"preskoceno (file_date={file_date})"))
                continue
            n = import_file(xlsx, cursor, conn)
            if n is None:
                summary.append((xlsx.name, "preskoceno"))
            else:
                summary.append((xlsx.name, f"importovano {n} radku"))
        except Exception as e:
            conn.rollback()
            summary.append((xlsx.name, f"CHYBA: {e}"))
            print(f" CHYBA: {e}")

    cursor.close()
    conn.close()

    print("\n" + "=" * 60)
    print("SOUHRN:")
    for name, status in summary:
        print(f" {name:<45} {status}")
    print("=" * 60)


if __name__ == "__main__":
    main()
@@ -0,0 +1,6 @@
import subprocess
import sys
from pathlib import Path

app = Path(__file__).parent / "webreport.py"
subprocess.run([sys.executable, "-m", "streamlit", "run", str(app)])
@@ -0,0 +1,83 @@
-- ============================================================
-- Database    : studie
-- Table       : CTMS_contacts
-- Description : Site contacts from the PANORAMA (CTMS) system
--               study 42847922MDD3003 (Neuroscience)
-- Created     : 2026-05-07
-- ============================================================

USE studie;

CREATE TABLE IF NOT EXISTS CTMS_contacts (
    -- ── Internal keys ──────────────────────────────────────────
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    file_date DATE NOT NULL COMMENT 'Creation date of the source file (PANORAMA export)',
    imported_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Date and time the record was imported into the DB',

    -- ── Study / organization ───────────────────────────────────
    sector VARCHAR(20) COMMENT 'Pharma / ...',
    ta VARCHAR(30) COMMENT 'Therapeutic Area',
    protocol_id VARCHAR(20) COMMENT 'Protocol ID (e.g. 42847922MDD3003)',
    gtl_ctm VARCHAR(50) COMMENT 'GTL-GTM/CTM name',

    -- ── Location (site) ────────────────────────────────────────
    country_name VARCHAR(60) COMMENT 'Country name',
    ltm_name VARCHAR(50) COMMENT 'LTM Name',
    site_id VARCHAR(15) COMMENT 'Site identifier (e.g. S10-CZ10008)',
    sm_name VARCHAR(60) COMMENT 'Site Manager Name',
    pi_full_name VARCHAR(80) COMMENT 'Principal Investigator, full name',
    institution_name VARCHAR(100) COMMENT 'Institution / clinic name',

    -- ── Contact person ─────────────────────────────────────────
    contact_identifier INT UNSIGNED COMMENT 'PANORAMA internal contact ID',
    contact_title VARCHAR(25) COMMENT 'Title (Mr, Ms, Dr, ...)',
    last_name VARCHAR(50) COMMENT 'Last name',
    first_name VARCHAR(40) COMMENT 'First name',
    contact_role VARCHAR(50) COMMENT 'Contact role (Study Coordinator, PI, ...)',
    contact_type VARCHAR(30) COMMENT 'Contact type (Study-Site Staff, ...)',

    -- ── Indicators ─────────────────────────────────────────────
    primary_indicator ENUM('Yes','No') COMMENT 'Pr St Cont Primary Indicator',
    sua_reporting_indicator ENUM('Yes','No') COMMENT 'SUA Reporting Indicator',
    financial_disclosure_indicator ENUM('Yes','No') COMMENT 'Financial Disclosure Indicator',

    -- ── Contact details ────────────────────────────────────────
    phone VARCHAR(40) COMMENT 'Primary phone number',
    phone_alt VARCHAR(40) COMMENT 'Alternative phone number',
    phone_mobile VARCHAR(40) COMMENT 'Mobile number',
    fax VARCHAR(40) COMMENT 'Fax number',
    email VARCHAR(100) COMMENT 'Primary e-mail address',
    email_sua VARCHAR(100) COMMENT 'SUA Reporting e-mail',

    -- ── Dates ──────────────────────────────────────────────────
    contact_start_date DATE COMMENT 'Start date of the contact validity',
    contact_end_date DATE COMMENT 'End date of the contact validity',

    -- ── Qualification ──────────────────────────────────────────
    degree_qualification VARCHAR(30) COMMENT 'Degree / qualification',
    job_title VARCHAR(40) COMMENT 'Job title',

    -- ── Address ────────────────────────────────────────────────
    address_line1 VARCHAR(100) COMMENT 'Address line 1',
    address_line2 VARCHAR(60) COMMENT 'Address line 2',
    address_line3 VARCHAR(100) COMMENT 'Address line 3',
    city VARCHAR(50) COMMENT 'City',
    state_province VARCHAR(40) COMMENT 'State / province',
    zip_postal_code VARCHAR(20) COMMENT 'Postal code',

    -- ── Keys ───────────────────────────────────────────────────
    PRIMARY KEY (id),

    -- Fast lookup on the most frequently queried fields
    INDEX idx_file_date (file_date),
    INDEX idx_country (country_name),
    INDEX idx_site_id (site_id),
    INDEX idx_protocol (protocol_id),
    INDEX idx_contact_role (contact_role),
    INDEX idx_email (email),
    INDEX idx_contact_identifier (contact_identifier)

) ENGINE=InnoDB
  DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_unicode_ci
  COMMENT='CTMS contacts - Site Contacts, study 42847922MDD3003';
@@ -0,0 +1,223 @@
"""
create_report.py
Streamlit report of contacts from the MySQL table CTMS_contacts.
Run with: streamlit run create_report.py
"""

import json
from pathlib import Path

import mysql.connector
import pandas as pd
import pyperclip
import streamlit as st

# ── Configuration ──────────────────────────────────────────────────────────────
DB_CONFIG = {
    "host": "192.168.1.76",
    "port": 3306,
    "user": "root",
    "password": "Vlado9674+",
    "database": "studie",
    "charset": "utf8mb4",
}

TABLE = "CTMS_contacts"
STATE_FILE = Path(__file__).parent / "filter_state.json"

ACTIVE_SITES = {
    "77242113UCO3001": {
        "DD5-CZ10001", "DD5-CZ10003", "DD5-CZ10006", "DD5-CZ10009",
        "DD5-CZ10010", "DD5-CZ10012", "DD5-CZ10013", "DD5-CZ10015",
        "DD5-CZ10016", "DD5-CZ10020", "DD5-CZ10021", "DD5-CZ10022",
    },
    "42847922MDD3003": {
        "S10-CZ10004", "S10-CZ10008", "S10-CZ10011", "S10-CZ10012",
    },
}

# DB column name → column label shown in the report
DISPLAY_COLS = {
    "site_id": "Site ID",
    "institution_name": "Institution",
    "pi_full_name": "PI",
    "contact_title": "Title",
    "last_name": "Last Name",
    "first_name": "First Name",
    "contact_role": "Role",
    "primary_indicator": "Primary",
    "phone": "Phone",
    "phone_mobile": "Mobile",
    "email": "Email",
    "contact_start_date": "Start Date",
    "contact_end_date": "End Date",
}

STATUS_OPTIONS = ["Aktivní", "Neaktivní", "Všechna"]
DEFAULT_STATUS = "Aktivní"


# ── Filter persistence ─────────────────────────────────────────────────────────
def load_filter_state() -> dict:
    """Return the saved filter state, or {} if the file is missing or unreadable."""
    if STATE_FILE.exists():
        try:
            return json.loads(STATE_FILE.read_text(encoding="utf-8"))
        except Exception:
            pass
    return {}


def save_filter_state():
    """Write the current sidebar selections to STATE_FILE (widget on_change callback)."""
    state = {
        "sel_status": st.session_state.get("sel_status", DEFAULT_STATUS),
        "sel_proto": st.session_state.get("sel_proto", "Všechny"),
        "sel_role": st.session_state.get("sel_role", []),
        "sel_site": st.session_state.get("sel_site", []),
    }
    STATE_FILE.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")


# ── Data ───────────────────────────────────────────────────────────────────────
@st.cache_data(ttl=300)
def load_data() -> pd.DataFrame:
    cols = ", ".join(DISPLAY_COLS.keys())
    sql = (
        f"SELECT protocol_id, file_date, {cols} "
        f"FROM {TABLE} "
        f"ORDER BY protocol_id, site_id, contact_role, last_name, first_name"
    )
    conn = mysql.connector.connect(**DB_CONFIG)
    cursor = conn.cursor(dictionary=True)
    cursor.execute(sql)
    rows = cursor.fetchall()
    cursor.close()
    conn.close()
    return pd.DataFrame(rows)


# ── Application ────────────────────────────────────────────────────────────────
st.set_page_config(page_title="CTMS Contacts", page_icon="🏥", layout="wide")
st.title("🏥 CTMS Contacts — Czechia")

try:
    df = load_data()
except Exception as e:
    st.error(f"Chyba připojení k MySQL: {e}")
    st.stop()

protocols = ["Všechny"] + sorted(df["protocol_id"].unique().tolist())

# Load the saved state once per session
if "filters_initialized" not in st.session_state:
    saved = load_filter_state()
    st.session_state["sel_status"] = saved.get("sel_status", DEFAULT_STATUS) if saved.get("sel_status") in STATUS_OPTIONS else DEFAULT_STATUS
    st.session_state["sel_proto"] = saved.get("sel_proto", "Všechny") if saved.get("sel_proto") in protocols else "Všechny"
    st.session_state["sel_role"] = saved.get("sel_role", [])
    st.session_state["sel_site"] = saved.get("sel_site", [])
    st.session_state["filters_initialized"] = True

# Roles and sites for the selected protocol + active/inactive
all_active = set().union(*ACTIVE_SITES.values())
df_opts = df.copy()
if st.session_state["sel_proto"] != "Všechny":
    df_opts = df_opts[df_opts["protocol_id"] == st.session_state["sel_proto"]]
if st.session_state["sel_status"] == "Aktivní":
    df_opts = df_opts[df_opts["site_id"].isin(all_active) & df_opts["contact_end_date"].isna()]
elif st.session_state["sel_status"] == "Neaktivní":
    df_opts = df_opts[~df_opts["site_id"].isin(all_active)]
roles = sorted(df_opts["contact_role"].dropna().unique().tolist())
sites = sorted(df_opts["site_id"].dropna().unique().tolist())

# Drop selections that became invalid after a protocol change
st.session_state["sel_role"] = [r for r in st.session_state["sel_role"] if r in roles]
st.session_state["sel_site"] = [s for s in st.session_state["sel_site"] if s in sites]

# ── Sidebar filters ────────────────────────────────────────────────────────────
with st.sidebar:
    st.header("Filtry")

    st.radio(
        "Střediska", STATUS_OPTIONS, horizontal=True,
        key="sel_status", on_change=save_filter_state,
    )
    st.selectbox(
        "Protokol", protocols,
        key="sel_proto", on_change=save_filter_state,
    )
    st.multiselect(
        "Role", roles,
        key="sel_role", on_change=save_filter_state,
    )
    st.multiselect(
        "Site", sites,
        key="sel_site", on_change=save_filter_state,
    )

    search = st.text_input("Hledat (jméno, email…)")

    st.divider()
    if st.button("🔄 Obnovit data"):
        st.cache_data.clear()
        st.rerun()

    st.caption(f"Naposledy načteno: {pd.Timestamp.now().strftime('%H:%M:%S')}")

# ── Filtering ──────────────────────────────────────────────────────────────────
filtered = df.copy()

if st.session_state["sel_proto"] != "Všechny":
    filtered = filtered[filtered["protocol_id"] == st.session_state["sel_proto"]]

if st.session_state["sel_status"] == "Aktivní":
    filtered = filtered[filtered["site_id"].isin(all_active) & filtered["contact_end_date"].isna()]
elif st.session_state["sel_status"] == "Neaktivní":
    filtered = filtered[~filtered["site_id"].isin(all_active)]

if st.session_state["sel_role"]:
    filtered = filtered[filtered["contact_role"].isin(st.session_state["sel_role"])]
if st.session_state["sel_site"]:
    filtered = filtered[filtered["site_id"].isin(st.session_state["sel_site"])]
if search:
    # Case-insensitive match of the search text against every column of the row
    mask = filtered.apply(
        lambda row: row.astype(str).str.contains(search, case=False, na=False).any(),
        axis=1,
    )
    filtered = filtered[mask]

# ── Metrics ────────────────────────────────────────────────────────────────────
col1, col2, col3, col4 = st.columns(4)
col1.metric("Kontaktů celkem", len(filtered))
col2.metric("Protokolů", filtered["protocol_id"].nunique())
col3.metric("Středisek", filtered["site_id"].nunique())
col4.metric("Rolí", filtered["contact_role"].nunique())

st.divider()

# ── Table ──────────────────────────────────────────────────────────────────────
display = filtered[["protocol_id", "file_date"] + list(DISPLAY_COLS.keys())].copy()
display = display.rename(columns={"protocol_id": "Protocol", "file_date": "File Date", **DISPLAY_COLS})

st.dataframe(
    display,
    width="stretch",
    hide_index=True,
    column_config={
        "Email": st.column_config.LinkColumn("Email", display_text=".*"),
        "Start Date": st.column_config.DateColumn("Start Date", format="DD-MMM-YYYY"),
        "End Date": st.column_config.DateColumn("End Date", format="DD-MMM-YYYY"),
    },
)

st.caption(f"Zobrazeno {len(filtered)} z {len(df)} záznamů")

st.divider()
# Build the "First Last <email>" list from the currently filtered rows
email_rows = filtered[["first_name", "last_name", "email"]].dropna(subset=["email"])
email_rows = email_rows[email_rows["email"].str.strip() != ""]
entries = [
    f"{row.first_name} {row.last_name} <{row.email}>"
    for row in email_rows.itertuples()
]
email_str = "; ".join(entries)

if st.button(f"📋 Kopírovat emaily do clipboardu ({len(entries)} adres)"):
    if entries:
        pyperclip.copy(email_str)
        st.success(f"✅ Zkopírováno {len(entries)} adres — vlož přímo do pole Komu.")
@@ -0,0 +1,47 @@
import pandas as pd

CSV_FILE = "filename.csv"

df = pd.read_csv(CSV_FILE, sep=";", encoding="utf-8-sig")

# Parse dates
date_cols = ["Original Due Date", "Due Date", "Window Start Date", "Cutoff Date", "Completed Date"]
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Country from site number
df["Country"] = df["Study Site Number"].str.extract(r"DD5-([A-Z]+)\d+")

print("=" * 60)
print("CTMS VISITS EXPORT — přehled dat")
print("=" * 60)
print(f"\nCelkem řádků : {len(df):,}")
print(f"Celkem sloupců: {len(df.columns)}")
print(f"\nSloupce:\n " + "\n ".join(df.columns.tolist()))

print(f"\nSites celkem : {df['Study Site Number'].nunique()}")
print(f"Zemí celkem : {df['Country'].nunique()}")
print(f"Země : {', '.join(sorted(df['Country'].dropna().unique()))}")

print("\nStatus:")
for k, v in df["Status"].value_counts().items():
    print(f" {k:<20} {v:>6,}")

print("\nCategory:")
for k, v in df["Category"].value_counts().items():
    print(f" {k:<25} {v:>6,}")

print("\nSub Category:")
for k, v in df["Sub Category"].value_counts().items():
    print(f" {k:<20} {v:>6,}")

print(f"\nReference kódy: {sorted(df['Reference'].dropna().unique().tolist())}")

print("\nRozsah dat:")
for col in ["Due Date", "Completed Date"]:
    vals = df[col].dropna()
    if len(vals):
        print(f" {col:<20} {vals.min().date()} — {vals.max().date()}")

print("\nNáhled (5 řádků):")
print(df.head(5).to_string())
@@ -0,0 +1,401 @@
import pandas as pd
import openpyxl
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side, numbers
from openpyxl.utils import get_column_letter
from datetime import date
import os

CSV_FILE = "filename.csv"
SVR_FILE = "Site Visit Report (2).xlsx"
OUTPUT_DIR = os.path.join("..", "..", "CTMS", "output")
os.makedirs(OUTPUT_DIR, exist_ok=True)
today_str = date.today().strftime("%Y-%m-%d")
OUTPUT_FILE = os.path.join(OUTPUT_DIR, f"{today_str} UCO3001 CZ CTMS Visits.xlsx")

# --- Load & filter ---
df = pd.read_csv(CSV_FILE, sep=";", encoding="utf-8-sig")
df["Country"] = df["Study Site Number"].str.extract(r"DD5-([A-Z]+)\d+")
cz = df[df["Country"] == "CZ"].copy()

date_cols = ["Original Due Date", "Due Date", "Window Start Date", "Cutoff Date", "Completed Date"]
for col in date_cols:
    cz[col] = pd.to_datetime(cz[col], errors="coerce")

SITES = [
    "DD5-CZ10001", "DD5-CZ10003", "DD5-CZ10006", "DD5-CZ10009",
    "DD5-CZ10010", "DD5-CZ10012", "DD5-CZ10013", "DD5-CZ10015",
    "DD5-CZ10016", "DD5-CZ10020", "DD5-CZ10021", "DD5-CZ10022",
]
cz = cz[cz["Study Site Number"].isin(SITES) & cz["Status"].isin(["Completed", "Scheduled", "Planned"])].copy()

cz["CRA"] = cz["Assigned To Last Name"].fillna("")

# --- Merge Site Visit Report (2) ---
import re as _re

def _svid_to_ref(svid):
    """Map a Site Visit ID to the short reference code (SQV, SIV, COV, IMVn)."""
    svid = str(svid).replace("MCTMS|", "")
    if svid == "Qualification Visit":
        return "SQV"
    if svid == "Site Initiation":
        return "SIV"
    if svid == "Closure Visit":
        return "COV"
    m = _re.match(r"Monitoring Visit (\d+)", svid)
    return f"IMV{m.group(1)}" if m else svid

svr = pd.read_excel(SVR_FILE, header=5)
svr = svr[svr["Site ID"].isin(SITES)].copy()
svr["Reference"] = svr["Site Visit ID"].apply(_svid_to_ref)
svr = svr[["Site ID", "Reference", "Site Visit Type", "Submitter Name", "Approver Name"]].rename(columns={"Site ID": "Study Site Number"})

cz = cz.merge(svr, on=["Study Site Number", "Reference"], how="left")

# --- Styles ---
FONT_NAME = "Arial"
COL_HEADER = "1F5C99"  # dark blue
COL_COMPL = "E2EFDA"  # light green
COL_SCHED = "FFF2CC"  # light yellow
COL_PLAN = "FCE4D6"  # light orange
COL_NA = "F2F2F2"  # grey
WHITE = "FFFFFF"
DARK_TEXT = "000000"

STATUS_COLORS = {
    "Completed": COL_COMPL,
    "Scheduled": COL_SCHED,
    "Planned": COL_PLAN,
    "Not applicable": COL_NA,
}

thin = Side(style="thin", color="BFBFBF")
med = Side(style="medium", color="808080")

def border(left=thin, right=thin, top=thin, bottom=thin):
    return Border(left=left, right=right, top=top, bottom=bottom)

def header_cell(ws, row, col, value, width=None):
    c = ws.cell(row=row, column=col, value=value)
    c.font = Font(name=FONT_NAME, bold=True, color=WHITE, size=10)
    c.fill = PatternFill("solid", fgColor=COL_HEADER)
    c.alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)
    c.border = Border(left=Side(style="medium", color=WHITE),
                      right=Side(style="medium", color=WHITE),
                      top=thin, bottom=thin)
    # The original test "width and col <= ws.max_column or width" reduces to "width".
    if width:
        ws.column_dimensions[get_column_letter(col)].width = width
    return c

def data_cell(ws, row, col, value, fill_color=WHITE, align="left", bold=False, num_fmt=None, date_val=False):
    c = ws.cell(row=row, column=col, value=value)
    c.font = Font(name=FONT_NAME, size=9, bold=bold, color=DARK_TEXT)
    if fill_color != WHITE:
        c.fill = PatternFill("solid", fgColor=fill_color)
    c.alignment = Alignment(horizontal=align, vertical="center")
    c.border = border()
    if num_fmt:
        c.number_format = num_fmt
    elif date_val and isinstance(value, (pd.Timestamp, type(None))):
        c.number_format = "DD-MMM-YYYY"
    return c

# =========================================================
# SHEET 1: Per-site overview
# =========================================================
wb = openpyxl.Workbook()
ws1 = wb.active
ws1.title = "Přehled CZ"
ws1.freeze_panes = "A3"

# Title
ws1.merge_cells("A1:M1")
title = ws1["A1"]
title.value = f"UCO3001 — CZ CTMS Visits Overview | {today_str}"
title.font = Font(name=FONT_NAME, bold=True, size=12, color=WHITE)
title.fill = PatternFill("solid", fgColor="2E4057")
title.alignment = Alignment(horizontal="center", vertical="center")
ws1.row_dimensions[1].height = 22

# Headers: (label, column width)
headers = [
    ("Site", 14), ("Investigátor", 22),
    ("SQV", 11), ("SIV", 11),
    ("IMV\nCompleted", 11), ("IMV\nScheduled", 11), ("IMV\nPlanned", 11),
    ("COV", 11),
    ("Poslední vizita\nDatum", 14), ("Poslední vizita\nTyp", 16),
    ("Příští vizita\nDatum", 14), ("Příští vizita\nTyp", 16),
    ("Celkem\nvizit", 10),
]
for ci, (h, w) in enumerate(headers, 1):
    header_cell(ws1, 2, ci, h, width=w)
ws1.row_dimensions[2].height = 30

|
||||
# Data per site
def visit_status(sub):
    """Return (status, fill colour) of the first visit in `sub`, or a dash if there is none."""
    if sub.empty:
        return ("—", COL_NA)
    st = sub.iloc[0]["Status"]
    return (st, STATUS_COLORS.get(st, WHITE))


sites = sorted(cz["Study Site Number"].unique())
for ri, site in enumerate(sites, 3):
    s = cz[cz["Study Site Number"] == site]
    inv_row = s.iloc[0]
    inv = f"{inv_row['INV_FIRST_NAME']} {inv_row['INV_LAST_NAME']}"
    # First non-empty CRA value for the site, or "" if there is none
    cra_values = s["CRA"].replace("", pd.NA).dropna()
    cra = cra_values.iloc[0] if not cra_values.empty else ""

    sqv = s[s["Reference"] == "SQV"]
    siv = s[s["Reference"] == "SIV"]
    cov = s[s["Reference"] == "COV"]
    imv = s[s["Category"] == "Monitoring Visit"]

    sqv_st, sqv_c = visit_status(sqv)
    siv_st, siv_c = visit_status(siv)
    cov_st, cov_c = visit_status(cov)

    imv_comp = int((imv["Status"] == "Completed").sum())
    imv_sch = int((imv["Status"] == "Scheduled").sum())
    imv_plan = int((imv["Status"] == "Planned").sum())

    # Last completed
    comp = s[s["Status"] == "Completed"].dropna(subset=["Completed Date"])
    last_comp = comp.sort_values("Completed Date").iloc[-1] if not comp.empty else None
    last_date = last_comp["Completed Date"] if last_comp is not None else None
    last_type = last_comp["Reference"] if last_comp is not None else "—"

    # Next upcoming: only visits with a Due Date after the last Completed one
    upcoming = s[s["Status"].isin(["Scheduled", "Planned"])].dropna(subset=["Due Date"])
    if last_date is not None:
        upcoming = upcoming[upcoming["Due Date"] > last_date]
    next_vis = upcoming.sort_values("Due Date").iloc[0] if not upcoming.empty else None
    next_date = next_vis["Due Date"] if next_vis is not None else None
    next_type = next_vis["Reference"] if next_vis is not None else "—"

    total = len(s)
    bg = WHITE if ri % 2 == 0 else "F7F9FC"

    # (value, alignment, bold, number format, fill override)
    row_data = [
        (site,      "left",   True,  None,        None),
        (inv,       "left",   False, None,        None),
        (sqv_st,    "center", False, None,        sqv_c),
        (siv_st,    "center", False, None,        siv_c),
        (imv_comp,  "center", False, "#,##0",     None),
        (imv_sch,   "center", False, "#,##0",     None),
        (imv_plan,  "center", False, "#,##0",     None),
        (cov_st,    "center", False, None,        cov_c),
        (last_date, "center", False, "DD-MMM-YY", None),
        (last_type, "center", False, None,        None),
        (next_date, "center", False, "DD-MMM-YY", None),
        (next_type, "center", False, None,        None),
        (total,     "center", True,  "#,##0",     None),
    ]
    for ci, (val, align, bold, fmt, fill) in enumerate(row_data, 1):
        fc = fill if fill else bg
        c = data_cell(ws1, ri, ci, val, fill_color=fc, align=align, bold=bold)
        if fmt:
            c.number_format = fmt
    ws1.row_dimensions[ri].height = 16

# Autofilter
ws1.auto_filter.ref = f"A2:{get_column_letter(len(headers))}2"

# =========================================================
# SHEET 2: Detail of all CZ visits
# =========================================================
ws2 = wb.create_sheet("Detail CZ")
ws2.freeze_panes = "A3"

# The title spans the 13 columns of det_headers (A..M); the original merged A1:N1.
ws2.merge_cells("A1:M1")
t2 = ws2["A1"]
t2.value = f"UCO3001 — CZ CTMS Visits — Detail | {today_str}"
t2.font = Font(name=FONT_NAME, bold=True, size=12, color=WHITE)
t2.fill = PatternFill("solid", fgColor="2E4057")
t2.alignment = Alignment(horizontal="center", vertical="center")
ws2.row_dimensions[1].height = 22

det_headers = [
    ("Site", 14), ("Investigátor", 22), ("CRA (Submitter)", 24),
    ("Ref", 9), ("Název vizity", 24), ("Category", 20), ("Sub Category", 16),
    ("Status", 14),
    ("Due Date", 13), ("Window Start", 13), ("Cutoff Date", 13), ("Completed Date", 13),
    ("Typ vizity", 12),
]
for ci, (h, w) in enumerate(det_headers, 1):
    header_cell(ws2, 2, ci, h, width=w)
ws2.row_dimensions[2].height = 26

# Sort: site → SQV → SIV → IMV1 → IMV2 … → COV
import re

ref_order = {"SQV": 0, "SIV": 1, "COV": 9999}


def ref_sort_key(ref):
    """Map a visit reference to its position in the visit sequence."""
    if ref in ref_order:
        return ref_order[ref]
    # IMV1 -> 2, IMV2 -> 3, ...; anything unrecognised sorts just before COV
    m = re.match(r"IMV(\d+)$", str(ref))
    return int(m.group(1)) + 1 if m else 5000


cz["_ref_ord"] = cz["Reference"].apply(ref_sort_key)
detail = cz.sort_values(["Study Site Number", "_ref_ord"]).reset_index(drop=True)

for ri, row in detail.iterrows():
    r = ri + 3
    st = row["Status"]
    bg = STATUS_COLORS.get(st, WHITE)

    inv = f"{row['INV_FIRST_NAME']} {row['INV_LAST_NAME']}"
    submitter = row["Submitter Name"] if pd.notna(row.get("Submitter Name")) else ""
    visit_type = row["Site Visit Type"] if pd.notna(row.get("Site Visit Type")) else ""
    vals = [
        (row["Study Site Number"], "left",   True),
        (inv,                      "left",   False),
        (submitter,                "left",   False),
        (row["Reference"],         "center", True),
        (row["Visit Name"],        "left",   False),
        (row["Category"],          "left",   False),
        (row["Sub Category"],      "left",   False),
        (st,                       "center", False),
        (row["Due Date"],          "center", False),
        (row["Window Start Date"], "center", False),
        (row["Cutoff Date"],       "center", False),
        (row["Completed Date"],    "center", False),
        (visit_type,               "center", False),
    ]
    for ci, (val, align, bold) in enumerate(vals, 1):
        c = data_cell(ws2, r, ci, val, fill_color=bg, align=align, bold=bold)
        if isinstance(val, pd.Timestamp) and not pd.isna(val):
            c.value = val.to_pydatetime()
            c.number_format = "DD-MMM-YY"
    ws2.row_dimensions[r].height = 14

ws2.auto_filter.ref = f"A2:{get_column_letter(len(det_headers))}2"

# =========================================================
# SHEET 3: Upcoming / Scheduled+Planned
# =========================================================
ws3 = wb.create_sheet("Nadcházející vizity")
ws3.freeze_panes = "A3"

ws3.merge_cells("A1:J1")
t3 = ws3["A1"]
t3.value = f"UCO3001 — CZ — Nadcházející vizity (Scheduled + Planned) | {today_str}"
t3.font = Font(name=FONT_NAME, bold=True, size=12, color=WHITE)
t3.fill = PatternFill("solid", fgColor="2E4057")
t3.alignment = Alignment(horizontal="center", vertical="center")
ws3.row_dimensions[1].height = 22

upc_headers = [
    ("Due Date", 13), ("Site", 14), ("Investigátor", 22), ("CRA", 14),
    ("Ref", 9), ("Název vizity", 24), ("Category", 20),
    ("Status", 12), ("Window Start", 13), ("Cutoff Date", 13),
]
for ci, (h, w) in enumerate(upc_headers, 1):
    header_cell(ws3, 2, ci, h, width=w)
ws3.row_dimensions[2].height = 26

upcoming = (
    cz[cz["Status"].isin(["Scheduled", "Planned"])]
    .sort_values(["Due Date", "Study Site Number"])
    .reset_index(drop=True)
)

for ri, row in upcoming.iterrows():
    r = ri + 3
    bg = STATUS_COLORS.get(row["Status"], WHITE)
    inv = f"{row['INV_FIRST_NAME']} {row['INV_LAST_NAME']}"
    vals = [
        (row["Due Date"],          "center", True),
        (row["Study Site Number"], "left",   False),
        (inv,                      "left",   False),
        (row["CRA"],               "center", False),
        (row["Reference"],         "center", True),
        (row["Visit Name"],        "left",   False),
        (row["Category"],          "left",   False),
        (row["Status"],            "center", False),
        (row["Window Start Date"], "center", False),
        (row["Cutoff Date"],       "center", False),
    ]
    for ci, (val, align, bold) in enumerate(vals, 1):
        c = data_cell(ws3, r, ci, val, fill_color=bg, align=align, bold=bold)
        if isinstance(val, pd.Timestamp) and not pd.isna(val):
            c.value = val.to_pydatetime()
            c.number_format = "DD-MMM-YY"
    ws3.row_dimensions[r].height = 14

ws3.auto_filter.ref = f"A2:{get_column_letter(len(upc_headers))}2"

# =========================================================
# SHEET 4: Problems (data inconsistencies)
# =========================================================
ws4 = wb.create_sheet("Problémy")
ws4.freeze_panes = "A3"

# Reload the original data without the status filter to detect problems
df_raw = pd.read_csv(CSV_FILE, sep=";", encoding="utf-8-sig")
df_raw["Country"] = df_raw["Study Site Number"].str.extract(r"DD5-([A-Z]+)\d+", expand=False)
cz_raw = df_raw[df_raw["Study Site Number"].isin(SITES)].copy()
for col in date_cols:
    cz_raw[col] = pd.to_datetime(cz_raw[col], errors="coerce")
cz_raw["CRA"] = cz_raw["Assigned To Last Name"].fillna("")
cz_raw = cz_raw.merge(svr, on=["Study Site Number", "Reference"], how="left")
cz_raw["Submitter Name"] = cz_raw["Submitter Name"].fillna("")

problems = []

# Rule 1: Completed Date is filled in but Status != Completed
mask1 = cz_raw["Completed Date"].notna() & (cz_raw["Status"] != "Completed")
for _, row in cz_raw[mask1].iterrows():
    problems.append((row, "Completed Date je vyplněno, ale Status není Completed"))

# Sort by site and reference, reusing ref_sort_key defined for the detail sheet
# (the original duplicated the same logic in a local _ref_key helper)
problems.sort(key=lambda x: (x[0]["Study Site Number"], ref_sort_key(x[0]["Reference"])))

COL_PROBLEM = "FFC7CE"  # light red

# The title spans the 11 columns of prob_headers (A..K); the original merged A1:M1.
ws4.merge_cells("A1:K1")
t4 = ws4["A1"]
t4.value = f"UCO3001 — CZ — Datové problémy k opravě v OneCTMS | {today_str}"
t4.font = Font(name=FONT_NAME, bold=True, size=12, color=WHITE)
t4.fill = PatternFill("solid", fgColor="C00000")
t4.alignment = Alignment(horizontal="center", vertical="center")
ws4.row_dimensions[1].height = 22

prob_headers = [
    ("Site", 14), ("Investigátor", 22), ("CRA (Submitter)", 24),
    ("Ref", 9), ("Název vizity", 24), ("Category", 18),
    ("Status", 14),
    ("Due Date", 13), ("Completed Date", 13),
    ("", 2),  # narrow spacer column
    ("Důvod — co je potřeba opravit v OneCTMS", 50),
]
for ci, (h, w) in enumerate(prob_headers, 1):
    header_cell(ws4, 2, ci, h, width=w)
ws4.row_dimensions[2].height = 26

for ri, (row, reason) in enumerate(problems, 3):
    inv = f"{row['INV_FIRST_NAME']} {row['INV_LAST_NAME']}"
    vals = [
        (row["Study Site Number"], "left",   True,  None),
        (inv,                      "left",   False, None),
        (row["Submitter Name"],    "left",   False, None),
        (row["Reference"],         "center", True,  None),
        (row["Visit Name"],        "left",   False, None),
        (row["Category"],          "left",   False, None),
        (row["Status"],            "center", False, None),
        (row["Due Date"],          "center", False, "DD-MMM-YY"),
        (row["Completed Date"],    "center", False, "DD-MMM-YY"),
        ("",                       "center", False, None),
        (reason,                   "left",   True,  None),
    ]
    for ci, (val, align, bold, fmt) in enumerate(vals, 1):
        c = data_cell(ws4, ri, ci, val, fill_color=COL_PROBLEM, align=align, bold=bold)
        if fmt and isinstance(val, pd.Timestamp) and not pd.isna(val):
            c.value = val.to_pydatetime()
            c.number_format = fmt
    ws4.row_dimensions[ri].height = 16

ws4.auto_filter.ref = f"A2:{get_column_letter(len(prob_headers))}2"

wb.save(OUTPUT_FILE)
print(f"Report uložen: {OUTPUT_FILE}")
print(f"  Sheet 'Přehled CZ'         : {len(sites)} sites")
print(f"  Sheet 'Detail CZ'          : {len(detail)} řádků")
print(f"  Sheet 'Nadcházející vizity': {len(upcoming)} vizit")
print(f"  Sheet 'Problémy'           : {len(problems)} záznamů")
@@ -0,0 +1,44 @@
# OneCTMS — Visit Schedule Notes

## Source

LTM Local Trial Manager OneCTMS Manual, ver. 11.0, 15-Dec-2024 (pages 11–28)

## Visit statuses

| Status | Description |
|---|---|
| **Planned** | The visit exists in the schedule, but the SM has not yet entered a Visit Start Date. The dropdown offers this status, but pages 11–28 of the manual do not explain its use in detail. |
| **Scheduled** | The SM has entered a Visit Start Date → the date is propagated automatically to ATLAS as "Next Scheduled Visit Date". |
| **Completed** | The SM has marked the visit as completed. |
| **Not applicable** | Unused placeholder: an empty slot from the DSM template (50 MV by default). It carries no information, so we filter it out. |

Status transitions according to the manual (p. 24):
```
Planned → Scheduled → Completed
```
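
The transition order above could be checked in code along the following lines. This is only a minimal sketch: `ALLOWED_NEXT` and `can_transition` are hypothetical names introduced here, and whether OneCTMS itself rejects other transitions is an assumption the manual excerpt does not confirm.

```python
# Hypothetical transition table derived from the documented order
# Planned → Scheduled → Completed. Not taken from OneCTMS itself.
ALLOWED_NEXT = {
    "Planned": {"Scheduled"},
    "Scheduled": {"Completed"},
    "Completed": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if `new` directly follows `current` in the documented order."""
    return new in ALLOWED_NEXT.get(current, set())


print(can_transition("Planned", "Scheduled"))
print(can_transition("Planned", "Completed"))
```

A check like this would flag a visit that jumps straight from Planned to Completed; statuses outside the table (for example `Not applicable`) are simply treated as having no allowed successor.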

## DSM specifics (study UCO3001)

- The study uses **Dynamic Site Monitoring (DSM)**: a template of SIV + SCV + 50 MV at 8-week intervals.
- **Due Date is not used for ordering in DSM**: visits are ordered by their numeric sequence (IMV1, IMV2, ...).
- Correct visit order: **SQV → SIV → IMV1 → IMV2 → … → COV**
- `Not applicable` visits are unused template slots → exclude them from reports and counts.
- `Planned` visits are real future visits without a confirmed date → keep them.
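
The ordering and filtering rules above can be sketched in plain Python as follows. The sort key mirrors the one in the report script; the sample `visits` list is made up for illustration and is not real study data.

```python
import re

# Fixed positions for the non-numbered visits; IMVn sorts between SIV and COV.
REF_ORDER = {"SQV": 0, "SIV": 1, "COV": 9999}


def ref_sort_key(ref):
    """Map a visit reference to its position in SQV → SIV → IMV1 … → COV."""
    if ref in REF_ORDER:
        return REF_ORDER[ref]
    m = re.match(r"IMV(\d+)$", str(ref))
    # IMV1 -> 2, IMV2 -> 3, ...; unrecognised references sort just before COV
    return int(m.group(1)) + 1 if m else 5000


# Hypothetical sample rows (illustration only).
visits = [
    {"Reference": "COV", "Status": "Planned"},
    {"Reference": "IMV2", "Status": "Scheduled"},
    {"Reference": "IMV10", "Status": "Not applicable"},
    {"Reference": "SIV", "Status": "Completed"},
    {"Reference": "IMV1", "Status": "Completed"},
    {"Reference": "SQV", "Status": "Completed"},
]

# Drop unused template slots, then order by sequence rather than by Due Date.
kept = sorted(
    (v for v in visits if v["Status"] != "Not applicable"),
    key=lambda v: ref_sort_key(v["Reference"]),
)
ordered_refs = [v["Reference"] for v in kept]
print(ordered_refs)
```

Sorting on the extracted number rather than on the string avoids the lexical trap where `IMV10` would come before `IMV2`.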

## Source files

| File | System | Origin |
|---|---|---|
| `filename.csv` | **OneCTMS** | Visits module → EMEA export (semicolon-delimited CSV) |
| `Site Visit Report (2).xlsx` | **VIPER** | SVR Metrics report |

`filename.csv` contains the visit schedule (both planned and completed visits), but the Assigned To field is filled in inconsistently, so it cannot be used as a reliable source for the CRA.

`Site Visit Report (2).xlsx` contains only visits with an approved report (SVR Status = Reviewed and Approved), but it has the key field **Submitter Name**, i.e. who actually conducted the visit. The two files are joined on Site ID + Reference (SQV/SIV/IMV1...).
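
The join on Site ID + Reference can be illustrated with a small standard-library sketch. The report script itself does this with a pandas left merge; the version below only shows the idea. The column names follow the report script, while the sample rows and the submitter name are invented for illustration.

```python
import csv
import io

# Hypothetical rows standing in for the semicolon-delimited OneCTMS export.
visits_csv = io.StringIO(
    "Study Site Number;Reference;Status\n"
    "DD5-CZ10006;SIV;Completed\n"
    "DD5-CZ10006;IMV1;Scheduled\n"
)

# Hypothetical rows standing in for the VIPER report (approved reports only).
svr_rows = [
    {"Study Site Number": "DD5-CZ10006", "Reference": "SIV", "Submitter Name": "Jane Doe"},
]

# Index the VIPER rows by the composite key (site, reference).
submitter_by_key = {
    (r["Study Site Number"], r["Reference"]): r["Submitter Name"] for r in svr_rows
}

# Left join: every scheduled visit is kept; the submitter is "" when no approved report exists.
merged = []
for visit in csv.DictReader(visits_csv, delimiter=";"):
    key = (visit["Study Site Number"], visit["Reference"])
    visit["Submitter Name"] = submitter_by_key.get(key, "")
    merged.append(visit)

for row in merged:
    print(row["Reference"], repr(row["Submitter Name"]))
```

Because only approved reports appear in the VIPER file, a left join keeps Scheduled and Planned visits in the result with an empty submitter instead of dropping them.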

## Report script

`20_report_CZ.py` generates an Excel report for the 12 CZ sites (Buzalka/Cetkovská portfolio):
- Sheet **Přehled CZ**: per-site summary
- Sheet **Detail CZ**: all visits, ordered SQV→SIV→IMV1…→COV
- Sheet **Nadcházející vizity**: Scheduled + Planned visits, sorted by Due Date