Initial commit — clean history (removed large test files, browser profiles, Medidata/Clario downloads)

2026-06-01 15:36:31 +02:00
commit bb604e593e
1304 changed files with 116480 additions and 0 deletions
@@ -0,0 +1,11 @@
__pycache__/
__pycache__/
*.pyc
.idea/
.claude/
EmailsImport/SouboryRůznéVelikosti/
Clario/Downloads/
Medidata/downloads/
*/browser_profile*/
auth.json
session.json
@@ -0,0 +1,14 @@
{
"mcpServers": {
"janssen-mongo": {
"command": "python",
"args": ["U:\\PythonProject\\Janssen\\mcp_mongo.py"],
"cwd": "U:\\PythonProject\\Janssen"
},
"jnjemails": {
"command": "python",
"args": ["U:\\PythonProject\\Janssen\\EmailsImport\\mcp_jnjemails.py"],
"cwd": "U:\\PythonProject\\Janssen\\EmailsImport"
}
}
}
@@ -0,0 +1,117 @@
# PanoramaContacts — CLAUDE.md
## Purpose of this directory
Imports site contacts from PANORAMA (CTMS) exports into MySQL and displays them in a Streamlit web report.
Only records for **Czechia** are kept. Protocols currently covered:
| Protocol ID | TA |
|---|---|
| `77242113UCO3001` | Immunology |
| `42847922MDD3003` | Neuroscience |
---
## Files
| File | Purpose |
|---|---|
| `import_CZ_contacts.py` | Import xlsx → MySQL |
| `webreport.py` | Streamlit web report |
| `run_webreport.py` | PyCharm launcher (`streamlit run webreport.py`) |
| `sql/create_CTMS_contacts.sql` | DDL for the `CTMS_contacts` table |
| `SourceData/*.xlsx` | PANORAMA Dashboard exports (source data) |
| `filter_state.json` | Automatically saved filter state (generated by the app) |
---
## MySQL
- **Host:** 192.168.1.76:3306 · **DB:** `studie` · **Table:** `CTMS_contacts`
- **Sheet in the xlsx:** `Site Contacts`, header on row 6 (0-based index 5)
### Key table columns
| Column | Type | Note |
|---|---|---|
| `file_date` | DATE | From `dcterms:created` in the xlsx file's `docProps/core.xml` |
| `imported_at` | DATETIME | Automatic import timestamp |
| `protocol_id` | VARCHAR(20) | Study identifier |
| `site_id` | VARCHAR(15) | Site (e.g. `DD5-CZ10006`) |
| `contact_role` | VARCHAR(50) | Contact role (PI, Study Coordinator, …) |
| `contact_start_date` | DATE | Start of the contact's validity |
| `contact_end_date` | DATE | End of validity; NULL = still active |
| `email` | VARCHAR(100) | Primary e-mail |
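
The `file_date` column is taken from the workbook's own metadata rather than from the file system timestamp. A minimal sketch of that lookup, using only the standard library. The names `created_date_from_xlsx` and `make_fake_xlsx` are illustrative assumptions, not functions from this repository, and the fake workbook contains nothing but the metadata part:

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from datetime import date, datetime, timezone

# Namespaced tag of the creation timestamp inside docProps/core.xml.
DCTERMS_CREATED = "{http://purl.org/dc/terms/}created"


def created_date_from_xlsx(source) -> date:
    """Return the UTC calendar date stored in dcterms:created."""
    with zipfile.ZipFile(source) as z:
        with z.open("docProps/core.xml") as f:
            root = ET.parse(f).getroot()
    el = root.find(DCTERMS_CREATED)
    if el is None or not el.text:
        raise ValueError("dcterms:created is missing from docProps/core.xml")
    # Accept a trailing "Z" as well as an explicit UTC offset.
    dt = datetime.fromisoformat(el.text.replace("Z", "+00:00"))
    return dt.astimezone(timezone.utc).date()


def make_fake_xlsx(created: str) -> io.BytesIO:
    """Build an in-memory zip holding only core.xml (hypothetical test helper)."""
    core = (
        '<cp:coreProperties '
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dcterms="http://purl.org/dc/terms/" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
        f'<dcterms:created xsi:type="dcterms:W3CDTF">{created}</dcterms:created>'
        '</cp:coreProperties>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("docProps/core.xml", core)
    buf.seek(0)
    return buf
```

Because the timestamp is converted to UTC before the date is taken, an export created late in the evening in a negative UTC offset lands on the following UTC day. That matters for the "today only" rule in the import script.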
---
## import_CZ_contacts.py
- Processes every `*.xlsx` file in `SourceData/`
- Skips files whose `file_date` is not today's date (UTC)
- Overwrite: DELETE + INSERT keyed on `(file_date, protocol_id, country_name)`
- `clean_value()` converts NaN / NaT / Timestamp to types the MySQL driver accepts
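
The DELETE + INSERT rule makes a re-import of the same export idempotent: running it twice leaves exactly one copy of the snapshot. A minimal sketch of the pattern, using the standard-library `sqlite3` module as a stand-in for MySQL. The table `contacts`, its four columns, and the helpers `overwrite_snapshot` and `make_demo_db` are assumptions made for the illustration:

```python
import sqlite3


def overwrite_snapshot(conn, file_date, protocol_id, country, emails):
    """Replace all rows for one (file_date, protocol_id, country) key."""
    key = (file_date, protocol_id, country)
    # "with conn" commits on success and rolls back if anything raises,
    # so a failed insert does not leave the old snapshot deleted.
    with conn:
        conn.execute(
            "DELETE FROM contacts "
            "WHERE file_date = ? AND protocol_id = ? AND country_name = ?",
            key,
        )
        conn.executemany(
            "INSERT INTO contacts (file_date, protocol_id, country_name, email) "
            "VALUES (?, ?, ?, ?)",
            [key + (email,) for email in emails],
        )
    return len(emails)


def make_demo_db():
    """Create an in-memory database with a simplified contacts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts "
        "(file_date TEXT, protocol_id TEXT, country_name TEXT, email TEXT)"
    )
    return conn
```

Snapshots with a different `file_date` are left untouched, so the table accumulates one snapshot per export date and protocol.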
---
## webreport.py — Streamlit app
### Filters (sidebar)
| Filter | Widget | Option logic |
|---|---|---|
| **Střediska** (sites) | radio | Aktivní / Neaktivní / Všechna (active / inactive / all) |
| **Protokol** (protocol) | selectbox | From the whole DB |
| **Role** | multiselect | Narrowed by protocol + active/inactive |
| **Site** | multiselect | Narrowed by protocol + active/inactive |
| **Hledání** (search) | text_input | Full-text search across all columns of a row |
### Logic of the Střediska filter
| Value | Site condition | End Date condition |
|---|---|---|
| **Aktivní** | `site_id` in `ACTIVE_SITES` | `contact_end_date IS NULL` |
| **Neaktivní** | `site_id` NOT in `ACTIVE_SITES` | no restriction |
| **Všechna** | no restriction | no restriction |
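
The table above can be sketched in plain Python. The app itself applies the same rules to a pandas DataFrame; the function `filter_by_status` and the demo data below are hypothetical and only illustrate the three branches:

```python
def filter_by_status(rows, status, active_sites):
    """Apply the Aktivní / Neaktivní / Všechna rule to a list of row dicts."""
    # Union of the active site IDs across all protocols.
    all_active = set().union(*active_sites.values())
    if status == "Aktivní":
        return [
            r for r in rows
            if r["site_id"] in all_active and r["contact_end_date"] is None
        ]
    if status == "Neaktivní":
        return [r for r in rows if r["site_id"] not in all_active]
    return list(rows)  # "Všechna": no restriction


# Made-up demo data, not real site IDs.
DEMO_ACTIVE = {"DEMO-PROTOCOL": {"X-CZ1", "X-CZ2"}}
DEMO_ROWS = [
    {"site_id": "X-CZ1", "contact_end_date": None},
    {"site_id": "X-CZ1", "contact_end_date": "2026-01-31"},
    {"site_id": "X-CZ9", "contact_end_date": None},
]
```

One consequence of these rules: a contact with an end date at an active site appears under neither Aktivní nor Neaktivní, only under Všechna.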
### Active sites (ACTIVE_SITES)
```python
ACTIVE_SITES = {
    "77242113UCO3001": {
        "DD5-CZ10001", "DD5-CZ10003", "DD5-CZ10006", "DD5-CZ10009",
        "DD5-CZ10010", "DD5-CZ10012", "DD5-CZ10013", "DD5-CZ10015",
        "DD5-CZ10016", "DD5-CZ10020", "DD5-CZ10021", "DD5-CZ10022",
    },
    "42847922MDD3003": {
        "S10-CZ10004", "S10-CZ10008", "S10-CZ10011", "S10-CZ10012",
    },
}
```
### Filter persistence
- State is saved to `filter_state.json` on every filter change (`on_change=save_filter_state`)
- It is loaded once per session, guarded by the `filters_initialized` flag in `st.session_state`
- On load, values are validated against the current options (protection against stale data)
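
The load-and-validate step can be sketched without Streamlit. The helper names `load_saved_state` and `validated_state` are assumptions for illustration. The keys (`sel_status`, `sel_proto`, `sel_role`, `sel_site`) and the fallback values follow `filter_state.json` and the app:

```python
import json
from pathlib import Path

STATUS_OPTIONS = ["Aktivní", "Neaktivní", "Všechna"]
DEFAULT_STATUS = "Aktivní"
ALL_PROTOCOLS = "Všechny"


def load_saved_state(path: Path) -> dict:
    """Read the JSON state file; a missing or corrupt file yields {}."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def validated_state(saved, protocols, roles, sites):
    """Keep only saved values that still exist among the current options."""
    status = saved.get("sel_status")
    proto = saved.get("sel_proto")
    return {
        "sel_status": status if status in STATUS_OPTIONS else DEFAULT_STATUS,
        "sel_proto": proto if proto in protocols else ALL_PROTOCOLS,
        "sel_role": [r for r in (saved.get("sel_role") or []) if r in roles],
        "sel_site": [s for s in (saved.get("sel_site") or []) if s in sites],
    }
```

Validating on load matters because the option lists come from the database: a protocol or role saved last week may no longer exist, and Streamlit widgets reject a stored value that is not among their options.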
### Clipboard button
- Uses the `pyperclip` library, which copies straight to the Windows clipboard on the server side, so it only helps when the browser runs on the same machine as the app
- Format: `First Last <email@domain.cz>; …`
- Acts on the currently displayed (filtered) records
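
A minimal sketch of how the recipient string is assembled, leaving out the clipboard call itself. `format_recipients` is a hypothetical helper. Unlike the app, it also strips whitespace around the address before formatting:

```python
def format_recipients(contacts):
    """Join (first_name, last_name, email) tuples into a 'To' field string."""
    entries = []
    for first, last, email in contacts:
        # Skip contacts without a usable address.
        if email is None or not email.strip():
            continue
        entries.append(f"{first} {last} <{email.strip()}>")
    return "; ".join(entries)
```

The semicolon separator matches what Outlook accepts in the recipient field, which is presumably why the app uses it.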
### Cache
- `@st.cache_data(ttl=300)`: data is kept for 5 minutes
- The 🔄 Obnovit data (refresh data) button calls `st.cache_data.clear()` + `st.rerun()`
---
## Dependencies (venv)
```
mysql-connector-python
pandas
openpyxl
streamlit
pyperclip
```
@@ -0,0 +1,9 @@
{
"sel_status": "Aktivní",
"sel_proto": "77242113UCO3001",
"sel_role": [
"Principal Investigator",
"Sub-Investigator"
],
"sel_site": []
}
@@ -0,0 +1,192 @@
"""
import_CZ_contacts.py
Imports Czechia site contacts from PANORAMA Dashboard xlsx exports into the MySQL table CTMS_contacts.
- Processes all *.xlsx files in SOURCE_DIR
- Skips files whose file_date is not today's date (UTC)
- Keeps only rows where Country Name == 'Czechia'
- Takes file_date from the xlsx document properties (dcterms:created)
- Every imported file always overwrites (delete + insert by file_date + protocol_id + country)
"""
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path
import pandas as pd
import mysql.connector
# ── Configuration ──────────────────────────────────────────────────────────────
SOURCE_DIR = Path(r"U:\PythonProject\Janssen\CTMS\PanoramaContacts\SourceData")
DB_CONFIG = {
"host": "192.168.1.76",
"port": 3306,
"user": "root",
"password": "Vlado9674+",
"database": "studie",
"charset": "utf8mb4",
}
TABLE = "CTMS_contacts"
COUNTRY = "Czechia"
SHEET = "Site Contacts"
HEADER_ROW = 5  # 0-based → row 6 in Excel
COL_MAP = {
"Sector": "sector",
"TA": "ta",
"Protocol ID": "protocol_id",
"GTL-GTM/CTM": "gtl_ctm",
"Country Name": "country_name",
"LTM Name": "ltm_name",
"Site ID": "site_id",
"SM Name": "sm_name",
"PI Full Name": "pi_full_name",
"Institution Name": "institution_name",
"Contact Identifier": "contact_identifier",
"Title": "contact_title",
"Last Name": "last_name",
"First Name": "first_name",
"Contact Role": "contact_role",
"Contact Type": "contact_type",
"Pr St Cont Primary Indicator": "primary_indicator",
"SUA Reporting Indicator": "sua_reporting_indicator",
"Financial Disclosure Indicator": "financial_disclosure_indicator",
"Contact Phone Number": "phone",
"Alternative Phone Number": "phone_alt",
"Mobile Phone Number": "phone_mobile",
"Contact Fax Number": "fax",
"Contact Email Address": "email",
"SUA Reporting Email Address": "email_sua",
"Contact Start Date": "contact_start_date",
"Contact End Date": "contact_end_date",
"Degree/qualification": "degree_qualification",
"Job Title": "job_title",
"Contact Address Line 1": "address_line1",
"Contact Address Line 2": "address_line2",
"Contact Address Line 3": "address_line3",
"Contact City": "city",
"Contact Addr State/Province": "state_province",
"Contact Zip/Postal Code": "zip_postal_code",
}
# ── Helper functions ───────────────────────────────────────────────────────────
def get_file_created_date(xlsx_path: Path):
    """Return the date from dcterms:created in docProps/core.xml."""
    with zipfile.ZipFile(xlsx_path) as z:
        with z.open("docProps/core.xml") as f:
            root = ET.parse(f).getroot()
    el = root.find("{http://purl.org/dc/terms/}created")
    if el is None or not el.text:
        raise ValueError(f"dcterms:created missing in {xlsx_path.name}")
    dt = datetime.fromisoformat(el.text.replace("Z", "+00:00"))
    return dt.astimezone(timezone.utc).date()
def clean_value(val):
    """Convert NaN / NaT / Timestamp to types the MySQL driver accepts."""
    import math
    if val is None:
        return None
    if isinstance(val, float):
        return None if math.isnan(val) else val
    if isinstance(val, pd.Timestamp):
        return None if pd.isnull(val) else val.date()
    try:
        # Catches remaining missing-value markers such as pd.NaT and pd.NA.
        if pd.isna(val):
            return None
    except Exception:
        pass
    return val
def get_protocol_id(xlsx_path: Path) -> str:
    """Read protocol_id from the first data row (fast, without loading the whole file)."""
    df = pd.read_excel(xlsx_path, sheet_name=SHEET, header=HEADER_ROW,
                       usecols=["Protocol ID"], nrows=1)
    return str(df["Protocol ID"].iloc[0])
def import_file(xlsx_path: Path, cursor, conn):
    """Process one xlsx file; always overwrites (delete + insert)."""
    file_date = get_file_created_date(xlsx_path)
    protocol_id = get_protocol_id(xlsx_path)
    print(f" file_date : {file_date}")
    print(f" protocol_id : {protocol_id}")
    df = pd.read_excel(xlsx_path, sheet_name=SHEET, header=HEADER_ROW)
    df_cz = df[df["Country Name"] == COUNTRY].copy()
    print(f" radku CZ : {len(df_cz)}")
    if df_cz.empty:
        print(" -> zadne CZ radky, preskoceno")
        return 0
    df_cz = df_cz.rename(columns=COL_MAP)
    db_cols = list(COL_MAP.values())
    # Delete existing records for this file (overwrite)
    cursor.execute(
        f"DELETE FROM {TABLE} "
        f"WHERE file_date = %s AND protocol_id = %s AND country_name = %s",
        (file_date, protocol_id, COUNTRY)
    )
    deleted = cursor.rowcount
    if deleted:
        print(f" prepis : smazano {deleted} starych radku")
    # One extra placeholder for file_date, which is not part of COL_MAP
    placeholders = ", ".join(["%s"] * (len(db_cols) + 1))
    insert_cols = "file_date, " + ", ".join(db_cols)
    sql_insert = f"INSERT INTO {TABLE} ({insert_cols}) VALUES ({placeholders})"
    for _, row in df_cz.iterrows():
        values = [file_date] + [clean_value(row.get(col)) for col in db_cols]
        cursor.execute(sql_insert, values)
    conn.commit()
    return len(df_cz)
# ── Main logic ─────────────────────────────────────────────────────────────────
def main():
    files = sorted(SOURCE_DIR.glob("*.xlsx"))
    if not files:
        print(f"Zadne xlsx soubory v {SOURCE_DIR}")
        return
    today = datetime.now(timezone.utc).date()
    print(f"Nalezeno souboru: {len(files)} | dnesni datum: {today}")
    print("Pripojuji se k MySQL...")
    conn = mysql.connector.connect(**DB_CONFIG)
    cursor = conn.cursor()
    summary = []
    for xlsx in files:
        print(f"\n[{xlsx.name}]")
        try:
            file_date = get_file_created_date(xlsx)
            if file_date != today:
                print(f" file_date : {file_date} -> PRESKOCENO (neni dnesni datum)")
                summary.append((xlsx.name, f"preskoceno (file_date={file_date})"))
                continue
            n = import_file(xlsx, cursor, conn)
            # import_file returns 0 (not None) when the file has no CZ rows
            if n == 0:
                summary.append((xlsx.name, "preskoceno"))
            else:
                summary.append((xlsx.name, f"importovano {n} radku"))
        except Exception as e:
            conn.rollback()
            summary.append((xlsx.name, f"CHYBA: {e}"))
            print(f" CHYBA: {e}")
    cursor.close()
    conn.close()
    print("\n" + "=" * 60)
    print("SOUHRN:")
    for name, status in summary:
        print(f" {name:<45} {status}")
    print("=" * 60)
if __name__ == "__main__":
    main()
@@ -0,0 +1,6 @@
import subprocess
import sys
from pathlib import Path
app = Path(__file__).parent / "webreport.py"
subprocess.run([sys.executable, "-m", "streamlit", "run", str(app)])
@@ -0,0 +1,83 @@
-- ============================================================
-- Database    : studie
-- Table       : CTMS_contacts
-- Description : Site contacts from the PANORAMA system (CTMS)
--               study 42847922MDD3003 (Neuroscience)
-- Created     : 2026-05-07
-- ============================================================
USE studie;
CREATE TABLE IF NOT EXISTS CTMS_contacts (
    -- ── Internal keys ──────────────────────────────────────────
    id                             INT UNSIGNED NOT NULL AUTO_INCREMENT,
    file_date                      DATE NOT NULL COMMENT 'Creation date of the source file (PANORAMA export)',
    imported_at                    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Date and time the record was imported into the DB',
    -- ── Study / organisation ───────────────────────────────────
    sector                         VARCHAR(20) COMMENT 'Pharma / ...',
    ta                             VARCHAR(30) COMMENT 'Therapeutic Area',
    protocol_id                    VARCHAR(20) COMMENT 'Protocol ID (e.g. 42847922MDD3003)',
    gtl_ctm                        VARCHAR(50) COMMENT 'GTL-GTM/CTM name',
    -- ── Location (site) ────────────────────────────────────────
    country_name                   VARCHAR(60) COMMENT 'Country name',
    ltm_name                       VARCHAR(50) COMMENT 'LTM Name',
    site_id                        VARCHAR(15) COMMENT 'Site identifier (e.g. S10-CZ10008)',
    sm_name                        VARCHAR(60) COMMENT 'Site Manager Name',
    pi_full_name                   VARCHAR(80) COMMENT 'Principal Investigator full name',
    institution_name               VARCHAR(100) COMMENT 'Institution / clinic name',
    -- ── Contact person ─────────────────────────────────────────
    contact_identifier             INT UNSIGNED COMMENT 'PANORAMA internal contact ID',
    contact_title                  VARCHAR(25) COMMENT 'Title (Mr, Ms, Dr, ...)',
    last_name                      VARCHAR(50) COMMENT 'Last name',
    first_name                     VARCHAR(40) COMMENT 'First name',
    contact_role                   VARCHAR(50) COMMENT 'Contact role (Study Coordinator, PI, ...)',
    contact_type                   VARCHAR(30) COMMENT 'Contact type (Study-Site Staff, ...)',
    -- ── Indicators ─────────────────────────────────────────────
    primary_indicator              ENUM('Yes','No') COMMENT 'Pr St Cont Primary Indicator',
    sua_reporting_indicator        ENUM('Yes','No') COMMENT 'SUA Reporting Indicator',
    financial_disclosure_indicator ENUM('Yes','No') COMMENT 'Financial Disclosure Indicator',
    -- ── Contact details ────────────────────────────────────────
    phone                          VARCHAR(40) COMMENT 'Main phone number',
    phone_alt                      VARCHAR(40) COMMENT 'Alternative phone number',
    phone_mobile                   VARCHAR(40) COMMENT 'Mobile number',
    fax                            VARCHAR(40) COMMENT 'Fax number',
    email                          VARCHAR(100) COMMENT 'Primary e-mail address',
    email_sua                      VARCHAR(100) COMMENT 'SUA Reporting e-mail',
    -- ── Dates ──────────────────────────────────────────────────
    contact_start_date             DATE COMMENT 'Start date of the contact validity',
    contact_end_date               DATE COMMENT 'End date of the contact validity',
    -- ── Qualification ──────────────────────────────────────────
    degree_qualification           VARCHAR(30) COMMENT 'Degree / qualification',
    job_title                      VARCHAR(40) COMMENT 'Job title',
    -- ── Address ────────────────────────────────────────────────
    address_line1                  VARCHAR(100) COMMENT 'Address line 1',
    address_line2                  VARCHAR(60) COMMENT 'Address line 2',
    address_line3                  VARCHAR(100) COMMENT 'Address line 3',
    city                           VARCHAR(50) COMMENT 'City',
    state_province                 VARCHAR(40) COMMENT 'State / province',
    zip_postal_code                VARCHAR(20) COMMENT 'Postal code',
    -- ── Keys ───────────────────────────────────────────────────
    PRIMARY KEY (id),
    -- Fast lookup on the most frequently queried fields
    INDEX idx_file_date (file_date),
    INDEX idx_country (country_name),
    INDEX idx_site_id (site_id),
    INDEX idx_protocol (protocol_id),
    INDEX idx_contact_role (contact_role),
    INDEX idx_email (email),
    INDEX idx_contact_identifier (contact_identifier)
) ENGINE=InnoDB
  DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_unicode_ci
  COMMENT='CTMS contacts Site Contacts, study 42847922MDD3003';
@@ -0,0 +1,223 @@
"""
webreport.py
Streamlit report of contacts from the MySQL table CTMS_contacts.
Run with: streamlit run webreport.py
"""
import json
from pathlib import Path
import mysql.connector
import pandas as pd
import pyperclip
import streamlit as st
# ── Configuration ──────────────────────────────────────────────────────────────
DB_CONFIG = {
"host": "192.168.1.76",
"port": 3306,
"user": "root",
"password": "Vlado9674+",
"database": "studie",
"charset": "utf8mb4",
}
TABLE = "CTMS_contacts"
STATE_FILE = Path(__file__).parent / "filter_state.json"
ACTIVE_SITES = {
"77242113UCO3001": {
"DD5-CZ10001", "DD5-CZ10003", "DD5-CZ10006", "DD5-CZ10009",
"DD5-CZ10010", "DD5-CZ10012", "DD5-CZ10013", "DD5-CZ10015",
"DD5-CZ10016", "DD5-CZ10020", "DD5-CZ10021", "DD5-CZ10022",
},
"42847922MDD3003": {
"S10-CZ10004", "S10-CZ10008", "S10-CZ10011", "S10-CZ10012",
},
}
DISPLAY_COLS = {
"site_id": "Site ID",
"institution_name": "Institution",
"pi_full_name": "PI",
"contact_title": "Title",
"last_name": "Last Name",
"first_name": "First Name",
"contact_role": "Role",
"primary_indicator": "Primary",
"phone": "Phone",
"phone_mobile": "Mobile",
"email": "Email",
"contact_start_date": "Start Date",
"contact_end_date": "End Date",
}
STATUS_OPTIONS = ["Aktivní", "Neaktivní", "Všechna"]
DEFAULT_STATUS = "Aktivní"
# ── Filter persistence ─────────────────────────────────────────────────────────
def load_filter_state() -> dict:
    if STATE_FILE.exists():
        try:
            return json.loads(STATE_FILE.read_text(encoding="utf-8"))
        except Exception:
            # A corrupt state file is ignored; defaults are used instead.
            pass
    return {}
def save_filter_state():
    state = {
        "sel_status": st.session_state.get("sel_status", DEFAULT_STATUS),
        "sel_proto": st.session_state.get("sel_proto", "Všechny"),
        "sel_role": st.session_state.get("sel_role", []),
        "sel_site": st.session_state.get("sel_site", []),
    }
    STATE_FILE.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
# ── Data ───────────────────────────────────────────────────────────────────────
@st.cache_data(ttl=300)
def load_data() -> pd.DataFrame:
    cols = ", ".join(DISPLAY_COLS.keys())
    sql = (
        f"SELECT protocol_id, file_date, {cols} "
        f"FROM {TABLE} "
        f"ORDER BY protocol_id, site_id, contact_role, last_name, first_name"
    )
    conn = mysql.connector.connect(**DB_CONFIG)
    cursor = conn.cursor(dictionary=True)
    cursor.execute(sql)
    rows = cursor.fetchall()
    cursor.close()
    conn.close()
    return pd.DataFrame(rows)
# ── Application ────────────────────────────────────────────────────────────────
st.set_page_config(page_title="CTMS Contacts", page_icon="🏥", layout="wide")
st.title("🏥 CTMS Contacts — Czechia")
try:
    df = load_data()
except Exception as e:
    st.error(f"Chyba připojení k MySQL: {e}")
    st.stop()
protocols = ["Všechny"] + sorted(df["protocol_id"].unique().tolist())
# Load the saved state once per session
if "filters_initialized" not in st.session_state:
    saved = load_filter_state()
    st.session_state["sel_status"] = saved.get("sel_status", DEFAULT_STATUS) if saved.get("sel_status") in STATUS_OPTIONS else DEFAULT_STATUS
    st.session_state["sel_proto"] = saved.get("sel_proto", "Všechny") if saved.get("sel_proto") in protocols else "Všechny"
    st.session_state["sel_role"] = saved.get("sel_role", [])
    st.session_state["sel_site"] = saved.get("sel_site", [])
    st.session_state["filters_initialized"] = True
# Roles and sites for the selected protocol + active/inactive
all_active = set().union(*ACTIVE_SITES.values())
df_opts = df.copy()
if st.session_state["sel_proto"] != "Všechny":
    df_opts = df_opts[df_opts["protocol_id"] == st.session_state["sel_proto"]]
if st.session_state["sel_status"] == "Aktivní":
    df_opts = df_opts[df_opts["site_id"].isin(all_active) & df_opts["contact_end_date"].isna()]
elif st.session_state["sel_status"] == "Neaktivní":
    df_opts = df_opts[~df_opts["site_id"].isin(all_active)]
roles = sorted(df_opts["contact_role"].dropna().unique().tolist())
sites = sorted(df_opts["site_id"].dropna().unique().tolist())
# Drop selections that are no longer valid after a protocol change
st.session_state["sel_role"] = [r for r in st.session_state["sel_role"] if r in roles]
st.session_state["sel_site"] = [s for s in st.session_state["sel_site"] if s in sites]
# ── Sidebar filters ────────────────────────────────────────────────────────────
with st.sidebar:
    st.header("Filtry")
    st.radio(
        "Střediska", STATUS_OPTIONS, horizontal=True,
        key="sel_status", on_change=save_filter_state,
    )
    st.selectbox(
        "Protokol", protocols,
        key="sel_proto", on_change=save_filter_state,
    )
    st.multiselect(
        "Role", roles,
        key="sel_role", on_change=save_filter_state,
    )
    st.multiselect(
        "Site", sites,
        key="sel_site", on_change=save_filter_state,
    )
    search = st.text_input("Hledat (jméno, email…)")
    st.divider()
    if st.button("🔄 Obnovit data"):
        st.cache_data.clear()
        st.rerun()
    st.caption(f"Naposledy načteno: {pd.Timestamp.now().strftime('%H:%M:%S')}")
# ── Filtering ──────────────────────────────────────────────────────────────────
filtered = df.copy()
if st.session_state["sel_proto"] != "Všechny":
    filtered = filtered[filtered["protocol_id"] == st.session_state["sel_proto"]]
if st.session_state["sel_status"] == "Aktivní":
    filtered = filtered[filtered["site_id"].isin(all_active) & filtered["contact_end_date"].isna()]
elif st.session_state["sel_status"] == "Neaktivní":
    filtered = filtered[~filtered["site_id"].isin(all_active)]
if st.session_state["sel_role"]:
    filtered = filtered[filtered["contact_role"].isin(st.session_state["sel_role"])]
if st.session_state["sel_site"]:
    filtered = filtered[filtered["site_id"].isin(st.session_state["sel_site"])]
if search:
    # Case-insensitive match against every column of the row
    mask = filtered.apply(
        lambda row: row.astype(str).str.contains(search, case=False, na=False).any(),
        axis=1,
    )
    filtered = filtered[mask]
# ── Metrics ────────────────────────────────────────────────────────────────────
col1, col2, col3, col4 = st.columns(4)
col1.metric("Kontaktů celkem", len(filtered))
col2.metric("Protokolů", filtered["protocol_id"].nunique())
col3.metric("Středisek", filtered["site_id"].nunique())
col4.metric("Rolí", filtered["contact_role"].nunique())
st.divider()
# ── Table ──────────────────────────────────────────────────────────────────────
display = filtered[["protocol_id", "file_date"] + list(DISPLAY_COLS.keys())].copy()
display = display.rename(columns={"protocol_id": "Protocol", "file_date": "File Date", **DISPLAY_COLS})
st.dataframe(
display,
width="stretch",
hide_index=True,
column_config={
"Email": st.column_config.LinkColumn("Email", display_text=".*"),
"Start Date": st.column_config.DateColumn("Start Date", format="DD-MMM-YYYY"),
"End Date": st.column_config.DateColumn("End Date", format="DD-MMM-YYYY"),
},
)
st.caption(f"Zobrazeno {len(filtered)} z {len(df)} záznamů")
st.divider()
email_rows = filtered[["first_name", "last_name", "email"]].dropna(subset=["email"])
email_rows = email_rows[email_rows["email"].str.strip() != ""]
entries = [
f"{row.first_name} {row.last_name} <{row.email}>"
for row in email_rows.itertuples()
]
email_str = "; ".join(entries)
if st.button(f"📋 Kopírovat emaily do clipboardu ({len(entries)} adres)"):
    if entries:
        pyperclip.copy(email_str)
        st.success(f"✅ Zkopírováno {len(entries)} adres — vlož přímo do pole Komu.")
@@ -0,0 +1,47 @@
import pandas as pd
CSV_FILE = "filename.csv"
df = pd.read_csv(CSV_FILE, sep=";", encoding="utf-8-sig")
# Parse dates
date_cols = ["Original Due Date", "Due Date", "Window Start Date", "Cutoff Date", "Completed Date"]
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors="coerce")
# Country from site number
df["Country"] = df["Study Site Number"].str.extract(r"DD5-([A-Z]+)\d+")
print("=" * 60)
print("CTMS VISITS EXPORT — přehled dat")
print("=" * 60)
print(f"\nCelkem řádků : {len(df):,}")
print(f"Celkem sloupců: {len(df.columns)}")
print(f"\nSloupce:\n " + "\n ".join(df.columns.tolist()))
print(f"\nSites celkem : {df['Study Site Number'].nunique()}")
print(f"Zemí celkem : {df['Country'].nunique()}")
print(f"Země : {', '.join(sorted(df['Country'].dropna().unique()))}")
print("\nStatus:")
for k, v in df["Status"].value_counts().items():
    print(f" {k:<20} {v:>6,}")
print("\nCategory:")
for k, v in df["Category"].value_counts().items():
    print(f" {k:<25} {v:>6,}")
print("\nSub Category:")
for k, v in df["Sub Category"].value_counts().items():
    print(f" {k:<20} {v:>6,}")
print(f"\nReference kódy: {sorted(df['Reference'].dropna().unique().tolist())}")
print("\nRozsah dat:")
for col in ["Due Date", "Completed Date"]:
    vals = df[col].dropna()
    if len(vals):
        print(f" {col:<20} {vals.min().date()} → {vals.max().date()}")
print("\nNáhled (5 řádků):")
print(df.head(5).to_string())
@@ -0,0 +1,401 @@
import pandas as pd
import openpyxl
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side, numbers
from openpyxl.utils import get_column_letter
from datetime import date
import os
CSV_FILE = "filename.csv"
SVR_FILE = "Site Visit Report (2).xlsx"
OUTPUT_DIR = os.path.join("..", "..", "CTMS", "output")
os.makedirs(OUTPUT_DIR, exist_ok=True)
today_str = date.today().strftime("%Y-%m-%d")
OUTPUT_FILE = os.path.join(OUTPUT_DIR, f"{today_str} UCO3001 CZ CTMS Visits.xlsx")
# --- Load & filter ---
df = pd.read_csv(CSV_FILE, sep=";", encoding="utf-8-sig")
df["Country"] = df["Study Site Number"].str.extract(r"DD5-([A-Z]+)\d+")
cz = df[df["Country"] == "CZ"].copy()
date_cols = ["Original Due Date", "Due Date", "Window Start Date", "Cutoff Date", "Completed Date"]
for col in date_cols:
    cz[col] = pd.to_datetime(cz[col], errors="coerce")
SITES = [
"DD5-CZ10001", "DD5-CZ10003", "DD5-CZ10006", "DD5-CZ10009",
"DD5-CZ10010", "DD5-CZ10012", "DD5-CZ10013", "DD5-CZ10015",
"DD5-CZ10016", "DD5-CZ10020", "DD5-CZ10021", "DD5-CZ10022",
]
cz = cz[cz["Study Site Number"].isin(SITES) & cz["Status"].isin(["Completed", "Scheduled", "Planned"])].copy()
cz["CRA"] = cz["Assigned To Last Name"].fillna("")
# --- Merge Site Visit Report (2) ---
import re as _re
def _svid_to_ref(svid):
    """Map a Site Visit ID from the report to the CTMS reference code."""
    svid = str(svid).replace("MCTMS|", "")
    if svid == "Qualification Visit":
        return "SQV"
    if svid == "Site Initiation":
        return "SIV"
    if svid == "Closure Visit":
        return "COV"
    m = _re.match(r"Monitoring Visit (\d+)", svid)
    return f"IMV{m.group(1)}" if m else svid
svr = pd.read_excel(SVR_FILE, header=5)
svr = svr[svr["Site ID"].isin(SITES)].copy()
svr["Reference"] = svr["Site Visit ID"].apply(_svid_to_ref)
svr = svr[["Site ID", "Reference", "Site Visit Type", "Submitter Name", "Approver Name"]].rename(columns={"Site ID": "Study Site Number"})
cz = cz.merge(svr, on=["Study Site Number", "Reference"], how="left")
# --- Styles ---
FONT_NAME = "Arial"
COL_HEADER = "1F5C99" # dark blue
COL_COMPL = "E2EFDA" # light green
COL_SCHED = "FFF2CC" # light yellow
COL_PLAN = "FCE4D6" # light orange
COL_NA = "F2F2F2" # grey
WHITE = "FFFFFF"
DARK_TEXT = "000000"
STATUS_COLORS = {
"Completed": COL_COMPL,
"Scheduled": COL_SCHED,
"Planned": COL_PLAN,
"Not applicable": COL_NA,
}
thin = Side(style="thin", color="BFBFBF")
med = Side(style="medium", color="808080")
def border(left=thin, right=thin, top=thin, bottom=thin):
    return Border(left=left, right=right, top=top, bottom=bottom)
def header_cell(ws, row, col, value, width=None):
    c = ws.cell(row=row, column=col, value=value)
    c.font = Font(name=FONT_NAME, bold=True, color=WHITE, size=10)
    c.fill = PatternFill("solid", fgColor=COL_HEADER)
    c.alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)
    c.border = Border(left=Side(style="medium", color=WHITE),
                      right=Side(style="medium", color=WHITE),
                      top=thin, bottom=thin)
    # The original test "width and col <= ws.max_column or width" reduces to "width"
    if width:
        ws.column_dimensions[get_column_letter(col)].width = width
    return c
def data_cell(ws, row, col, value, fill_color=WHITE, align="left", bold=False, num_fmt=None, date_val=False):
    c = ws.cell(row=row, column=col, value=value)
    c.font = Font(name=FONT_NAME, size=9, bold=bold, color=DARK_TEXT)
    if fill_color != WHITE:
        c.fill = PatternFill("solid", fgColor=fill_color)
    c.alignment = Alignment(horizontal=align, vertical="center")
    c.border = border()
    if num_fmt:
        c.number_format = num_fmt
    elif date_val and isinstance(value, (pd.Timestamp, type(None))):
        c.number_format = "DD-MMM-YYYY"
    return c
# =========================================================
# SHEET 1: Overview per site
# =========================================================
wb = openpyxl.Workbook()
ws1 = wb.active
ws1.title = "Přehled CZ"
ws1.freeze_panes = "A3"
# Title
ws1.merge_cells("A1:M1")
title = ws1["A1"]
title.value = f"UCO3001 — CZ CTMS Visits Overview | {today_str}"
title.font = Font(name=FONT_NAME, bold=True, size=12, color=WHITE)
title.fill = PatternFill("solid", fgColor="2E4057")
title.alignment = Alignment(horizontal="center", vertical="center")
ws1.row_dimensions[1].height = 22
# Headers
headers = [
("Site", 14), ("Investigátor", 22),
("SQV", 11), ("SIV", 11),
("IMV\nCompleted", 11), ("IMV\nScheduled", 11), ("IMV\nPlanned", 11),
("COV", 11),
("Poslední vizita\nDatum", 14), ("Poslední vizita\nTyp", 16),
("Příští vizita\nDatum", 14), ("Příští vizita\nTyp", 16),
("Celkem\nvizit", 10),
]
for ci, (h, w) in enumerate(headers, 1):
    header_cell(ws1, 2, ci, h, width=w)
ws1.row_dimensions[2].height = 30
# Data per site
sites = sorted(cz["Study Site Number"].unique())
for ri, site in enumerate(sites, 3):
    s = cz[cz["Study Site Number"] == site]
    inv_row = s.iloc[0]
    inv = f"{inv_row['INV_FIRST_NAME']} {inv_row['INV_LAST_NAME']}"
    cra = s["CRA"].replace("", pd.NA).dropna().iloc[0] if not s["CRA"].replace("", pd.NA).dropna().empty else ""
    sqv = s[s["Reference"] == "SQV"]
    siv = s[s["Reference"] == "SIV"]
    cov = s[s["Reference"] == "COV"]
    imv = s[s["Category"] == "Monitoring Visit"]
    def visit_status(sub):
        if sub.empty:
            return ("", COL_NA)
        st = sub.iloc[0]["Status"]
        return (st, STATUS_COLORS.get(st, WHITE))
    sqv_st, sqv_c = visit_status(sqv)
    siv_st, siv_c = visit_status(siv)
    cov_st, cov_c = visit_status(cov)
    imv_comp = int((imv["Status"] == "Completed").sum())
    imv_sch = int((imv["Status"] == "Scheduled").sum())
    imv_plan = int((imv["Status"] == "Planned").sum())
    # Last completed
    comp = s[s["Status"] == "Completed"].dropna(subset=["Completed Date"])
    last_comp = comp.sort_values("Completed Date").iloc[-1] if not comp.empty else None
    last_date = last_comp["Completed Date"] if last_comp is not None else None
    last_type = last_comp["Reference"] if last_comp is not None else ""
    # Next upcoming: only visits with a Due Date after the last Completed one
    upcoming = s[s["Status"].isin(["Scheduled", "Planned"])].dropna(subset=["Due Date"])
    if last_date is not None:
        upcoming = upcoming[upcoming["Due Date"] > last_date]
    next_vis = upcoming.sort_values("Due Date").iloc[0] if not upcoming.empty else None
    next_date = next_vis["Due Date"] if next_vis is not None else None
    next_type = next_vis["Reference"] if next_vis is not None else ""
    total = len(s)
    # Alternate the row background for readability
    bg = WHITE if ri % 2 == 0 else "F7F9FC"
    row_data = [
        (site, "left", True, None, None),
        (inv, "left", False, None, None),
        (sqv_st, "center", False, None, sqv_c),
        (siv_st, "center", False, None, siv_c),
        (imv_comp, "center", False, "#,##0", None),
        (imv_sch, "center", False, "#,##0", None),
        (imv_plan, "center", False, "#,##0", None),
        (cov_st, "center", False, None, cov_c),
        (last_date, "center", False, "DD-MMM-YY", None),
        (last_type, "center", False, None, None),
        (next_date, "center", False, "DD-MMM-YY", None),
        (next_type, "center", False, None, None),
        (total, "center", True, "#,##0", None),
    ]
    for ci, (val, align, bold, fmt, fill) in enumerate(row_data, 1):
        fc = fill if fill else bg
        c = data_cell(ws1, ri, ci, val, fill_color=fc, align=align, bold=bold)
        if fmt:
            c.number_format = fmt
    ws1.row_dimensions[ri].height = 16
# Autofilter
ws1.auto_filter.ref = f"A2:{get_column_letter(len(headers))}2"
# =========================================================
# SHEET 2: Detail of all CZ visits
# =========================================================
ws2 = wb.create_sheet("Detail CZ")
ws2.freeze_panes = "A3"
ws2.merge_cells("A1:M1")  # det_headers defines 13 columns (A..M)
t2 = ws2["A1"]
t2.value = f"UCO3001 — CZ CTMS Visits — Detail | {today_str}"
t2.font = Font(name=FONT_NAME, bold=True, size=12, color=WHITE)
t2.fill = PatternFill("solid", fgColor="2E4057")
t2.alignment = Alignment(horizontal="center", vertical="center")
ws2.row_dimensions[1].height = 22
det_headers = [
("Site", 14), ("Investigátor", 22), ("CRA (Submitter)", 24),
("Ref", 9), ("Název vizity", 24), ("Category", 20), ("Sub Category", 16),
("Status", 14),
("Due Date", 13), ("Window Start", 13), ("Cutoff Date", 13), ("Completed Date", 13),
("Typ vizity", 12),
]
for ci, (h, w) in enumerate(det_headers, 1):
    header_cell(ws2, 2, ci, h, width=w)
ws2.row_dimensions[2].height = 26
# Sort: site → SQV → SIV → IMV1 → IMV2 … → COV
ref_order = {"SQV": 0, "SIV": 1, "COV": 9999}
def ref_sort_key(ref):
    if ref in ref_order:
        return ref_order[ref]
    # IMVn sorts after SIV by its number; unknown references go before COV
    m = _re.match(r"IMV(\d+)$", str(ref))
    return int(m.group(1)) + 1 if m else 5000
cz["_ref_ord"] = cz["Reference"].apply(ref_sort_key)
detail = cz.sort_values(["Study Site Number", "_ref_ord"]).reset_index(drop=True)
for ri, row in detail.iterrows():
    r = ri + 3
    st = row["Status"]
    bg = STATUS_COLORS.get(st, WHITE)
    inv = f"{row['INV_FIRST_NAME']} {row['INV_LAST_NAME']}"
    submitter = row["Submitter Name"] if pd.notna(row.get("Submitter Name")) else ""
    visit_type = row["Site Visit Type"] if pd.notna(row.get("Site Visit Type")) else ""
    vals = [
        (row["Study Site Number"], "left", True),
        (inv, "left", False),
        (submitter, "left", False),
        (row["Reference"], "center", True),
        (row["Visit Name"], "left", False),
        (row["Category"], "left", False),
        (row["Sub Category"], "left", False),
        (st, "center", False),
        (row["Due Date"], "center", False),
        (row["Window Start Date"], "center", False),
        (row["Cutoff Date"], "center", False),
        (row["Completed Date"], "center", False),
        (visit_type, "center", False),
    ]
    for ci, (val, align, bold) in enumerate(vals, 1):
        c = data_cell(ws2, r, ci, val, fill_color=bg, align=align, bold=bold)
        if isinstance(val, pd.Timestamp) and not pd.isna(val):
            c.value = val.to_pydatetime()
            c.number_format = "DD-MMM-YY"
    ws2.row_dimensions[r].height = 14
ws2.auto_filter.ref = f"A2:{get_column_letter(len(det_headers))}2"
# =========================================================
# SHEET 3: Nadcházející vizity (upcoming: Scheduled + Planned)
# =========================================================
ws3 = wb.create_sheet("Nadcházející vizity")
ws3.freeze_panes = "A3"
ws3.merge_cells("A1:J1")
t3 = ws3["A1"]
t3.value = f"UCO3001 — CZ — Nadcházející vizity (Scheduled + Planned) | {today_str}"
t3.font = Font(name=FONT_NAME, bold=True, size=12, color=WHITE)
t3.fill = PatternFill("solid", fgColor="2E4057")
t3.alignment = Alignment(horizontal="center", vertical="center")
ws3.row_dimensions[1].height = 22
upc_headers = [
    ("Due Date", 13), ("Site", 14), ("Investigátor", 22), ("CRA", 14),
    ("Ref", 9), ("Název vizity", 24), ("Category", 20),
    ("Status", 12), ("Window Start", 13), ("Cutoff Date", 13),
]
for ci, (h, w) in enumerate(upc_headers, 1):
    header_cell(ws3, 2, ci, h, width=w)
ws3.row_dimensions[2].height = 26
upcoming = (
    cz[cz["Status"].isin(["Scheduled", "Planned"])]
    .sort_values(["Due Date", "Study Site Number"])
    .reset_index(drop=True)
)
for ri, row in upcoming.iterrows():
    r = ri + 3
    bg = STATUS_COLORS.get(row["Status"], WHITE)
    inv = f"{row['INV_FIRST_NAME']} {row['INV_LAST_NAME']}"
    vals = [
        (row["Due Date"], "center", True),
        (row["Study Site Number"], "left", False),
        (inv, "left", False),
        (row["CRA"], "center", False),
        (row["Reference"], "center", True),
        (row["Visit Name"], "left", False),
        (row["Category"], "left", False),
        (row["Status"], "center", False),
        (row["Window Start Date"], "center", False),
        (row["Cutoff Date"], "center", False),
    ]
    for ci, (val, align, bold) in enumerate(vals, 1):
        c = data_cell(ws3, r, ci, val, fill_color=bg, align=align, bold=bold)
        if isinstance(val, pd.Timestamp) and not pd.isna(val):
            c.value = val.to_pydatetime()
            c.number_format = "DD-MMM-YY"
    ws3.row_dimensions[r].height = 14
ws3.auto_filter.ref = f"A2:{get_column_letter(len(upc_headers))}2"
# =========================================================
# SHEET 4: Problémy (data inconsistencies)
# =========================================================
ws4 = wb.create_sheet("Problémy")
ws4.freeze_panes = "A3"
# Reload the raw data without the status filter to detect problems
df_raw = pd.read_csv(CSV_FILE, sep=";", encoding="utf-8-sig")
df_raw["Country"] = df_raw["Study Site Number"].str.extract(r"DD5-([A-Z]+)\d+")
cz_raw = df_raw[df_raw["Study Site Number"].isin(SITES)].copy()
for col in date_cols:
    cz_raw[col] = pd.to_datetime(cz_raw[col], errors="coerce")
cz_raw["CRA"] = cz_raw["Assigned To Last Name"].fillna("")
cz_raw = cz_raw.merge(svr, on=["Study Site Number", "Reference"], how="left")
cz_raw["Submitter Name"] = cz_raw["Submitter Name"].fillna("")
problems = []
# Rule 1: Completed Date is filled in but Status is not Completed
mask1 = cz_raw["Completed Date"].notna() & (cz_raw["Status"] != "Completed")
for _, row in cz_raw[mask1].iterrows():
    problems.append((row, "Completed Date je vyplněno, ale Status není Completed"))
# Sort by site and reference
import re as _re
def _ref_key(ref):
    if ref == "SQV":
        return 0
    if ref == "SIV":
        return 1
    if ref == "COV":
        return 9999
    m = _re.match(r"IMV(\d+)$", str(ref))
    return int(m.group(1)) + 1 if m else 5000
problems.sort(key=lambda x: (x[0]["Study Site Number"], _ref_key(x[0]["Reference"])))
COL_PROBLEM = "FFC7CE" # light red
ws4.merge_cells("A1:K1")  # this sheet has 11 columns (A..K)
t4 = ws4["A1"]
t4.value = f"UCO3001 — CZ — Datové problémy k opravě v OneCTMS | {today_str}"
t4.font = Font(name=FONT_NAME, bold=True, size=12, color=WHITE)
t4.fill = PatternFill("solid", fgColor="C00000")
t4.alignment = Alignment(horizontal="center", vertical="center")
ws4.row_dimensions[1].height = 22
prob_headers = [
    ("Site", 14), ("Investigátor", 22), ("CRA (Submitter)", 24),
    ("Ref", 9), ("Název vizity", 24), ("Category", 18),
    ("Status", 14),
    ("Due Date", 13), ("Completed Date", 13),
    ("", 2),
    ("Důvod — co je potřeba opravit v OneCTMS", 50),
]
for ci, (h, w) in enumerate(prob_headers, 1):
    header_cell(ws4, 2, ci, h, width=w)
ws4.row_dimensions[2].height = 26
for ri, (row, reason) in enumerate(problems, 3):
    inv = f"{row['INV_FIRST_NAME']} {row['INV_LAST_NAME']}"
    vals = [
        (row["Study Site Number"], "left", True, None),
        (inv, "left", False, None),
        (row["Submitter Name"], "left", False, None),
        (row["Reference"], "center", True, None),
        (row["Visit Name"], "left", False, None),
        (row["Category"], "left", False, None),
        (row["Status"], "center", False, None),
        (row["Due Date"], "center", False, "DD-MMM-YY"),
        (row["Completed Date"], "center", False, "DD-MMM-YY"),
        ("", "center", False, None),
        (reason, "left", True, None),
    ]
    for ci, (val, align, bold, fmt) in enumerate(vals, 1):
        c = data_cell(ws4, ri, ci, val, fill_color=COL_PROBLEM, align=align, bold=bold)
        if fmt and isinstance(val, pd.Timestamp) and not pd.isna(val):
            c.value = val.to_pydatetime()
            c.number_format = fmt
    ws4.row_dimensions[ri].height = 16
ws4.auto_filter.ref = f"A2:{get_column_letter(len(prob_headers))}2"
wb.save(OUTPUT_FILE)
print(f"Report uložen: {OUTPUT_FILE}")
print(f" Sheet 'Přehled CZ' : {len(sites)} sites")
print(f" Sheet 'Detail CZ' : {len(detail)} řádků")
print(f" Sheet 'Nadcházející vizity': {len(upcoming)} vizit")
print(f" Sheet 'Problémy' : {len(problems)} záznamů")
+44
@@ -0,0 +1,44 @@
# OneCTMS — Visit Schedule Notes
## Source
LTM Local Trial Manager OneCTMS Manual, ver. 11.0, 15-Dec-2024 (pages 11–28)
## Visit statuses
| Status | Description |
|---|---|
| **Planned** | The visit exists in the schedule, but the SM has not yet entered a Visit Start Date. The dropdown offers this status, but the manual (pp. 11–28) does not explain its use in more detail. |
| **Scheduled** | The SM has entered a Visit Start Date → the date is automatically propagated to ATLAS as "Next Scheduled Visit Date". |
| **Completed** | The SM has marked the visit as completed. |
| **Not applicable** | Unused placeholder: an empty slot from the DSM template (50 MV by default). It carries no information, so we filter it out. |
State transitions according to the manual (p. 24):
```
Planned → Scheduled → Completed
```
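
The transition chain above can be expressed as a small lookup table. This is only a sketch: the names `ALLOWED_NEXT` and `is_valid_transition` are hypothetical, and it assumes that no transitions other than the two shown are allowed and that `Not applicable` sits outside the flow, which the manual excerpt does not confirm.

```python
# Forward transitions read from the chain Planned → Scheduled → Completed.
# Assumption: nothing else is permitted; "Not applicable" is not part of the flow.
ALLOWED_NEXT = {
    "Planned": {"Scheduled"},
    "Scheduled": {"Completed"},
    "Completed": set(),
}


def is_valid_transition(old_status: str, new_status: str) -> bool:
    """Return True if new_status directly follows old_status in the chain."""
    return new_status in ALLOWED_NEXT.get(old_status, set())


print(is_valid_transition("Planned", "Scheduled"))
print(is_valid_transition("Completed", "Planned"))
```

A check like this could flag rows whose status history skips a step, provided the export actually carries a status history.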
## DSM specifics (study UCO3001)
- The study uses **Dynamic Site Monitoring (DSM)**: a template of SIV + SCV + 50 MV at 8-week intervals.
- **Due Date is not used for ordering in DSM**: visits are ordered by their numeric sequence (IMV1, IMV2, ...).
- Correct visit order: **SQV → SIV → IMV1 → IMV2 → … → COV**
- `Not applicable` visits are unused template slots → exclude them from reports and counts.
- `Planned` visits are real future visits without a confirmed date → keep them.
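
The ordering and filtering rules above might be sketched in plain Python as follows. The sample rows and the names `visit_sort_key`, `visits`, and `ordered` are hypothetical, and the numeric ranks (0, 1, 5000, 9999) are arbitrary choices that only need to preserve the stated order.

```python
import re


def visit_sort_key(reference: str) -> int:
    """Rank a visit reference in the order SQV, SIV, IMV1, IMV2, ..., COV."""
    if reference == "SQV":
        return 0
    if reference == "SIV":
        return 1
    if reference == "COV":
        return 9999
    m = re.match(r"IMV(\d+)$", reference)
    # IMVn ranks after SIV; unrecognized references rank just before COV.
    return int(m.group(1)) + 1 if m else 5000


# Hypothetical sample data: (reference, status) pairs for one site.
visits = [
    ("COV", "Planned"),
    ("IMV2", "Scheduled"),
    ("IMV10", "Not applicable"),
    ("SIV", "Completed"),
    ("IMV1", "Completed"),
    ("SQV", "Completed"),
]

# Drop unused template slots, then order by visit sequence rather than Due Date.
kept = [v for v in visits if v[1] != "Not applicable"]
ordered = sorted(kept, key=lambda v: visit_sort_key(v[0]))
print([ref for ref, _ in ordered])
```

Sorting by the numeric suffix rather than the raw string matters once a site passes IMV9, because a plain string sort would place IMV10 before IMV2.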
## Source files
| File | System | Origin |
|---|---|---|
| `filename.csv` | **OneCTMS** | Visits module → EMEA export (semicolon-delimited CSV) |
| `Site Visit Report (2).xlsx` | **VIPER** | SVR Metrics report |
`filename.csv` contains the visit schedule (both planned and completed visits), but the Assigned To field is filled in inconsistently, so it cannot be used as a reliable source for the CRA.
`Site Visit Report (2).xlsx` contains only visits with an approved report (SVR Status = Reviewed and Approved), but it has the key field **Submitter Name**, i.e. who actually performed the visit. The two files are joined on Site ID + Reference (SQV/SIV/IMV1...).
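
A minimal sketch of the join on Site ID + Reference, written as a left join in plain Python so that schedule rows without an approved report are kept. The rows, the submitter name, and the dictionary keys (`site`, `ref`, `submitter`) are made up for illustration and do not reflect the real export column names.

```python
# Hypothetical rows standing in for the OneCTMS schedule export.
schedule = [
    {"site": "DD5-CZ10006", "ref": "SIV", "status": "Completed"},
    {"site": "DD5-CZ10006", "ref": "IMV1", "status": "Scheduled"},
]

# Hypothetical rows standing in for the VIPER report (approved reports only).
svr = [
    {"site": "DD5-CZ10006", "ref": "SIV", "submitter": "Example CRA"},
]

# Index the report rows by the composite key (site, reference).
submitter_by_visit = {(r["site"], r["ref"]): r["submitter"] for r in svr}

# Left join: every schedule row survives; a missing submitter becomes "".
merged = [
    {**row, "submitter": submitter_by_visit.get((row["site"], row["ref"]), "")}
    for row in schedule
]

for row in merged:
    print(row["site"], row["ref"], row["status"], repr(row["submitter"]))
```

Keeping unmatched rows is what allows planned and scheduled visits, which have no approved report yet, to remain in the output with an empty submitter.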
## Report script
`20_report_CZ.py` generates an Excel report for 12 CZ sites (Buzalka/Cetkovská portfolio):
- Sheet **Přehled CZ**: per-site summary
- Sheet **Detail CZ**: all visits, sorted SQV→SIV→IMV1…→COV
- Sheet **Nadcházející vizity**: Scheduled + Planned, sorted by Due Date