z230

2026-06-19 14:28:20 +02:00
parent c9e592d58f
commit 1bc7950520
43 changed files with 802 additions and 21038 deletions
@@ -1,40 +0,0 @@
-# sipiq_import_v1.2: import of SIPIQ responses (folder workflow + provenance)
-
-**Version:** 1.2 · **Date:** 2026-06-17 · **Study:** 77242113UCO3002 (ICONIC / DAWN)
-
-## Changes
- **v1.2:** each response gets `source_exported_at` = **report date/time according to the filesystem**
-  (mtime of the CSV file). It is excluded from the content hash, so it causes no needless UPDATE; it is also
-  backfilled on the "unchanged" path. v1.1 moved to `Feasibility\TRASH`.
- **v1.1:** FOLDER workflow (`--folder`): picks up *.csv, runs a delta import, moves files to `Zpracováno`.
-
-## Collections
- `sipiq_questions`: questionnaire dictionary (reconstruction of the SIPIQ as in the PDF).
- `sipiq_responses`: 1 doc = 1 response (`_id`=ResponseId), flat `answers{}`,
-  soft link `investigator_oid`, `source_file` + `source_exported_at`, delta + `history[]`.
-
-Source = Qualtrics **CSV** (row 1 Qcode, row 2 question text, row 3 ImportId=QID). Export settings: labels,
-decimal point, "recode unanswered" turned off.
-
-## Delta (overwrites ONLY changed data)
-new → INSERT; unchanged (same `content_sha256`) → only `last_seen_at` + `source_file` + `source_exported_at`;
-changed → `$set` of the rebuilt fields (only the changed values actually differ) + `$push` of the changed keys to `history[]`.
-
-## Soft link to investigators (non-destructive)
-pi_email → email/email2 (lowercased), then recipient_email, fallback surname (without diacritics) + country, accepted only if unique.
-
-## Usage
-```
-.venv\Scripts\python.exe Feasibility\sipiq_import_v1.2.py --dry-run     # folder mode, default folder
-.venv\Scripts\python.exe Feasibility\sipiq_import_v1.2.py --apply
-.venv\Scripts\python.exe Feasibility\sipiq_import_v1.2.py --folder "<path>" --apply
-.venv\Scripts\python.exe Feasibility\sipiq_import_v1.2.py --csv "<path.csv>" --apply   # single file, NOT moved
-```
-Default folder `…\77242113UCO2001\ImportSIPIQcompled`; files are moved to `Zpracováno` only with `--apply` in folder mode.
-`--scope czsk` (default) / `all`. Without `--apply` the script always runs as a dry run.
-
-## Workflow
-The user drops complete SIPIQ reports (Qualtrics CSV, named
-`ICONIC+Phase+3b+UC+Study+(77242113UCO3002)_SipIQ_V1_13MAY2026_<date>_<time>.csv`) into
-`ImportSIPIQcompled\`. After `--apply` they are imported (delta) and moved to `Zpracováno\`.
-`source_exported_at` is taken from the file mtime (report date/time according to the filesystem).
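The delta rule above hinges on a content hash that ignores bookkeeping and provenance fields, which is why a fresh mtime alone never triggers an UPDATE. A minimal sketch of that classification, assuming the stored hashes have already been loaded into a plain dict (the deleted script reads them from `sipiq_responses`); `classify` and the demo `_id` values are hypothetical:

```python
import hashlib
import json

# Bookkeeping/provenance fields the deleted script leaves out of the hash,
# so a new mtime or a new soft-link result alone never counts as a content change.
EXCLUDED = {
    "content_sha256", "first_imported_at", "last_seen_at", "last_updated_at",
    "history", "investigator_oid", "investigator_match", "source_file",
    "source_exported_at",
}


def content_hash(doc: dict) -> str:
    """SHA-256 of the canonical JSON of all non-excluded fields."""
    payload = {k: v for k, v in doc.items() if k not in EXCLUDED}
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def classify(stored: dict, incoming: list) -> dict:
    """Split incoming docs into insert / update / unchanged.

    stored maps _id -> content_sha256 as currently saved in the collection.
    """
    result = {"insert": [], "update": [], "unchanged": []}
    for doc in incoming:
        rid = doc["_id"]
        if rid not in stored:
            result["insert"].append(rid)
        elif stored[rid] != content_hash(doc):
            result["update"].append(rid)
        else:
            result["unchanged"].append(rid)
    return result


# Hypothetical response: only the provenance timestamp differs between exports.
first = {"_id": "R_demo", "answers": {"Q25": "Yes"},
         "source_exported_at": "2026-06-17T10:00:00+02:00"}
stored = {"R_demo": content_hash(first)}
reexport = dict(first, source_exported_at="2026-06-18T09:00:00+02:00")
edited = dict(first, answers={"Q25": "No"})
print(classify(stored, [reexport]))  # R_demo lands in "unchanged"
print(classify(stored, [edited]))    # R_demo lands in "update"
```

Serializing with `sort_keys=True` keeps the hash stable regardless of the order in which the CSV columns were read.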
@@ -1,489 +0,0 @@
-#!/usr/bin/env python3
-# -*- coding: utf-8 -*-
-"""
-sipiq_import_v1.2.py
-====================
-Version: 1.2
-Date:    2026-06-17
-Author:  Claude Code (for MUDr. Vladimír Buzalka)
-
-Changes since v1.1
-------------------
- PROVENANCE: each response stores `source_exported_at` = report date/time according
-  to the FILESYSTEM (mtime of the CSV file). Excluded from the content hash, so it causes no
-  needless UPDATE; also backfilled on the "unchanged" path. The old v1.1 is kept in TRASH.
-
-Changes since v1.0
-------------------
- FOLDER WORKFLOW (v1.1): --folder mode picks up the *.csv files in a folder, imports them (delta)
-  and moves them into the `Zpracováno` subfolder. Default folder =
-  U:\\PythonProject\\Janssen\\Feasibility\\77242113UCO2001\\ImportSIPIQcompled.
-
-Description
------------
-Import of SIPIQ responses (Qualtrics CSV export, study 77242113UCO3002 / ICONIC DAWN)
-into the MongoDB database `feasibility`. Two collections:
-  * sipiq_questions  : questionnaire dictionary (1 doc = 1 logical question).
-  * sipiq_responses  : 1 doc = 1 response (_id = Qualtrics ResponseId), flat answers{},
-                       soft link investigator_oid, delta bookkeeping + history[].
-
-DELTA import (overwrites ONLY changed data): new->insert; unchanged->only last_seen_at
-(+ source_file, source_exported_at); changed->$set of the rebuilt fields + push of the changed keys to history[].
-
-Usage
------
-  python sipiq_import_v1.2.py --dry-run            # folder mode, default folder
-  python sipiq_import_v1.2.py --apply
-  python sipiq_import_v1.2.py --folder "<path>" --apply
-  python sipiq_import_v1.2.py --csv "<path.csv>" --apply   # single file (NOT moved)
-
-Dependencies: pymongo (.venv). Mongo 192.168.1.76:27017, no auth.
-"""
-import argparse
-import csv
-import glob
-import hashlib
-import json
-import os
-import re
-import shutil
-import sys
-import unicodedata
-from datetime import datetime, timezone
-
-try:
-    from pymongo import MongoClient
-except ImportError:
-    print("ERROR: pymongo is not installed in the current Python interpreter.", file=sys.stderr)
-    raise
-
-MONGO_URI = "mongodb://192.168.1.76:27017"
-DB_NAME = "feasibility"
-COL_Q = "sipiq_questions"
-COL_R = "sipiq_responses"
-DEFAULT_FOLDER = r"U:\PythonProject\Janssen\Feasibility\77242113UCO2001\ImportSIPIQcompled"
-PROCESSED_SUBDIR = "Zpracováno"
-
-META_COLS = {
-    "StartDate", "EndDate", "Status", "IPAddress", "Progress", "Duration (in seconds)",
-    "Finished", "RecordedDate", "ResponseId", "RecipientLastName", "RecipientFirstName",
-    "RecipientEmail", "ExternalReference", "LocationLatitude", "LocationLongitude",
-    "DistributionChannel", "UserLanguage",
-}
-
-PROMOTE = [
-    "site_name", "site_address", "site_city", "site_state", "site_postcode", "site_country",
-    "pi_first_name", "pi_last_name", "pi_phone", "pi_email",
-    "sdl_site_id", "fire_site_id", "fire_investigator_id", "mailinglist_id",
-    "survey_generated_by", "Date", "Time",
-]
-
-SECTION_BY_QNUM = {}
-def _sec(rng, name):
-    for n in rng:
-        SECTION_BY_QNUM[n] = name
-_sec([2], "J&J Internal Assessment")
-_sec([6, 7, 8, 9, 10, 11, 12, 13], "Contact Information")
-_sec(range(14, 22), "Confidentiality Statement")
-_sec([25, 26, 27], "Interest")
-_sec([29, 30, 31, 32, 33, 34], "Protocol Requirements")
-_sec([36, 37, 38], "Enrollment")
-_sec([40, 41, 42, 43], "Patient Demographics Overview")
-_sec([45, 46, 47, 48, 49], "Site Overview")
-_sec([51], "Operational Considerations")
-_sec([53, 54], "Comments")
-_sec([57, 58, 59, 60, 61], "Patient Population")
-_sec([63, 64, 65, 66, 67], "Site Experience and Staffing")
-_sec([69], "Equipment and Facility Requirements")
-_sec([71, 72, 73, 74, 75], "Institutional Review Board, Ethics Committee, and Contracts")
-
-STEM_OVERRIDE = {
-    "Q31": "At your site, at what line(s) of treatment do you most commonly prescribe "
-           "vedolizumab for patients with moderately to severely active ulcerative colitis?",
-    "Q63": "Do you or your site staff have experience in performing the following types of "
-           "study assessments/procedures?",
-    "Q64": "The following personnel are required to run the study. "
-           "Will your site have the following available?",
-    "Q69": "The following equipment and facilities are required to run the studies. "
-           "Are these available at your site?",
-}
-
-
-def now_iso():
-    return datetime.now(timezone.utc).astimezone().isoformat(timespec="seconds")
-
-
-def file_mtime_iso(path):
-    return datetime.fromtimestamp(os.path.getmtime(path)).astimezone().isoformat(timespec="seconds")
-
-
-def strip_accents(s):
-    if not s:
-        return ""
-    return "".join(c for c in unicodedata.normalize("NFKD", s) if not unicodedata.combining(c))
-
-
-def norm_name(s):
-    return re.sub(r"\s+", " ", strip_accents(s or "").lower()).strip()
-
-
-def sanitize_key(qcode):
-    return qcode.replace("#", "_").replace(".", "_")
-
-
-def qnum(qcode):
-    m = re.match(r"Q(\d+)", qcode)
-    return int(m.group(1)) if m else None
-
-
-def qbase(qcode):
-    m = re.match(r"(Q\d+)", qcode)
-    return m.group(1) if m else qcode
-
-
-def import_id(h3_cell):
-    try:
-        return json.loads(h3_cell).get("ImportId", "")
-    except Exception:
-        return h3_cell
-
-
-def split_text(text):
-    parts = [p.strip() for p in re.split(r"\s+-\s+", text)]
-    stem = parts[0]
-    if len(parts) == 1:
-        return stem, None
-    label_parts = [p for p in parts[1:] if p.lower() != "selected choice"]
-    label_parts = [p for p in label_parts if not re.fullmatch(r"Q\d+#\d+", p)]
-    return stem, (" - ".join(label_parts) if label_parts else None)
-
-
-def detect_type(qcode, observed):
-    has_hash = "#" in qcode
-    vals = [v for v in observed if v]
-    yesno = vals and all(v in ("Yes", "No") for v in vals)
-    numeric = vals and all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in vals)
-    if has_hash and yesno:
-        return "matrix_yesno"
-    if has_hash and numeric:
-        return "matrix_percent"
-    if has_hash:
-        return "matrix"
-    if numeric:
-        return "numeric"
-    if yesno:
-        return "yesno"
-    return "single_or_text"
-
-
-def load_csv(path):
-    with open(path, encoding="utf-8-sig", newline="") as fh:
-        rows = list(csv.reader(fh))
-    h1, h2, h3 = rows[0], rows[1], rows[2]
-    data = rows[3:]
-    cols = [{"i": i, "code": c, "text": t, "qid": import_id(j)}
-            for i, (c, t, j) in enumerate(zip(h1, h2, h3))]
-    return cols, data
-
-
-def col_getter(cols, data):
-    idx = {c["code"]: c["i"] for c in cols}
-    def get(row, code):
-        i = idx.get(code)
-        return (row[i].strip() if i is not None and i < len(row) else "")
-    return get, idx
-
-
-def is_question_col(code):
-    return bool(re.match(r"Q\d", code))
-
-
-def build_questions(cols, data):
-    qcols = [c for c in cols if is_question_col(c["code"])]
-    observed = {c["code"]: set() for c in qcols}
-    for row in data:
-        for c in qcols:
-            v = (row[c["i"]].strip() if c["i"] < len(row) else "")
-            if v:
-                observed[c["code"]].add(v)
-    groups, order_seen = {}, []
-    for c in qcols:
-        base = qbase(c["code"])
-        if base not in groups:
-            groups[base] = {"_id": base, "order": c["i"], "qnum": qnum(c["code"]),
-                            "section": SECTION_BY_QNUM.get(qnum(c["code"]), "Other"),
-                            "qids": [], "text": split_text(c["text"])[0],
-                            "items": [], "_obs": set(), "_types": []}
-            order_seen.append(base)
-        g = groups[base]
-        bq = re.match(r"(QID\d+)", c["qid"] or "")
-        if bq and bq.group(1) not in g["qids"]:
-            g["qids"].append(bq.group(1))
-        _, label = split_text(c["text"])
-        item = {"key": sanitize_key(c["code"]), "qcode": c["code"], "qid": c["qid"]}
-        if label:
-            item["label"] = label
-        g["items"].append(item)
-        g["_obs"] |= observed[c["code"]]
-        g["_types"].append(detect_type(c["code"], observed[c["code"]]))
-    out = []
-    for n, base in enumerate(order_seen):
-        g = groups[base]
-        obs = sorted(g.pop("_obs"))
-        types = g.pop("_types")
-        gtype = max(set(types), key=types.count) if types else "single_or_text"
-        g["type"] = gtype
-        if gtype in ("yesno", "matrix_yesno"):
-            g["options"] = ["Yes", "No"]
-        elif gtype == "single_or_text" and obs and len(obs) <= 12:
-            g["options"] = obs
-        else:
-            g["options"] = []
-        if base in STEM_OVERRIDE:
-            g["text"] = STEM_OVERRIDE[base]
-        g["order"] = n
-        if len(g["items"]) == 1 and "label" not in g["items"][0]:
-            g["items"] = []
-        out.append(g)
-    return out
-
-
-def build_response(cols, get, row, source_file):
-    rid = get(row, "ResponseId")
-    answers = {}
-    for c in cols:
-        if is_question_col(c["code"]):
-            v = (row[c["i"]].strip() if c["i"] < len(row) else "")
-            if v:
-                answers[sanitize_key(c["code"])] = v
-    meta = {
-        "start_date": get(row, "StartDate") or None,
-        "end_date": get(row, "EndDate") or None,
-        "recorded_date": get(row, "RecordedDate") or None,
-        "status": get(row, "Status") or None,
-        "progress": int(get(row, "Progress")) if get(row, "Progress").isdigit() else (get(row, "Progress") or None),
-        "finished": get(row, "Finished") in ("True", "1", "TRUE"),
-        "duration_sec": int(get(row, "Duration (in seconds)")) if get(row, "Duration (in seconds)").isdigit() else None,
-        "user_language": get(row, "UserLanguage") or None,
-        "distribution_channel": get(row, "DistributionChannel") or None,
-        "ip_address": get(row, "IPAddress") or None,
-        "location_lat": get(row, "LocationLatitude") or None,
-        "location_lng": get(row, "LocationLongitude") or None,
-        "survey_date": get(row, "Date") or None,
-        "survey_time": get(row, "Time") or None,
-    }
-    doc = {
-        "_id": rid, "study": "77242113UCO3002",
-        "site_country": get(row, "site_country") or None,
-        "site_name": get(row, "site_name") or None,
-        "site_city": get(row, "site_city") or None,
-        "site_state": get(row, "site_state") or None,
-        "site_postcode": get(row, "site_postcode") or None,
-        "site_address": get(row, "site_address") or None,
-        "pi_first_name": get(row, "pi_first_name") or None,
-        "pi_last_name": get(row, "pi_last_name") or None,
-        "pi_email": (get(row, "pi_email") or "").lower() or None,
-        "pi_phone": get(row, "pi_phone") or None,
-        "sdl_site_id": get(row, "sdl_site_id") or None,
-        "fire_site_id": get(row, "fire_site_id") or None,
-        "fire_investigator_id": get(row, "fire_investigator_id") or None,
-        "mailinglist_id": get(row, "mailinglist_id") or None,
-        "survey_generated_by": get(row, "survey_generated_by") or None,
-        "recipient_email": (get(row, "RecipientEmail") or "").lower() or None,
-        "recipient_last_name": get(row, "RecipientLastName") or None,
-        "recipient_first_name": get(row, "RecipientFirstName") or None,
-        "meta": meta,
-        "is_full_sipiq": any(k.startswith(("Q57", "Q58", "Q59", "Q63", "Q66", "Q71")) for k in answers),
-        "interested": answers.get("Q25"),
-        "answers": answers,
-        "investigator_oid": None, "investigator_match": None,
-        "source_file": source_file,
-    }
-    return doc
-
-
-def content_hash(doc):
-    payload = {k: doc[k] for k in doc if k not in
-               ("content_sha256", "first_imported_at", "last_seen_at", "last_updated_at",
-                "history", "investigator_oid", "investigator_match", "source_file",
-                "source_exported_at")}
-    return hashlib.sha256(json.dumps(payload, sort_keys=True, ensure_ascii=False,
-                                     default=str).encode("utf-8")).hexdigest()
-
-
-def load_investigators(db):
-    inv = list(db.investigators.find(
-        {"zeme": {"$in": ["Czech Republic", "Slovakia"]}},
-        {"prijmeni": 1, "jmeno": 1, "email": 1, "email2": 1, "zeme": 1, "KROK": 1}))
-    by_email, by_name = {}, {}
-    for d in inv:
-        for ef in ("email", "email2"):
-            e = (d.get(ef) or "").lower().strip()
-            if e:
-                by_email.setdefault(e, d)
-        nm = norm_name(d.get("prijmeni"))
-        if nm:
-            by_name.setdefault((nm, d.get("zeme")), []).append(d)
-    return inv, by_email, by_name
-
-
-def soft_link(doc, by_email, by_name):
-    e = (doc.get("pi_email") or "").lower().strip()
-    if e and e in by_email:
-        d = by_email[e]; return d["_id"], f"email:{e}", d
-    e2 = (doc.get("recipient_email") or "").lower().strip()
-    if e2 and e2 in by_email:
-        d = by_email[e2]; return d["_id"], f"recipient_email:{e2}", d
-    nm = norm_name(doc.get("pi_last_name"))
-    cand = by_name.get((nm, doc.get("site_country")), [])
-    if len(cand) == 1:
-        return cand[0]["_id"], f"prijmeni:{nm}", cand[0]
-    if len(cand) > 1:
-        return None, f"prijmeni_ambiguous:{nm}({len(cand)})", None
-    return None, "NENALEZENO", None
-
-
-def diff_docs(old, new):
-    changes = []
-    def walk(prefix, o, n):
-        for k in sorted(set((o or {}).keys()) | set((n or {}).keys())):
-            ov, nv = (o or {}).get(k), (n or {}).get(k)
-            if isinstance(ov, dict) or isinstance(nv, dict):
-                walk(f"{prefix}{k}.", ov or {}, nv or {})
-            elif ov != nv:
-                changes.append({"key": f"{prefix}{k}", "old": ov, "new": nv})
-    for field in ("answers", "meta"):
-        walk(f"{field}.", old.get(field, {}), new.get(field, {}))
-    for k in ("site_name", "pi_email", "pi_last_name", "interested", "is_full_sipiq"):
-        if old.get(k) != new.get(k):
-            changes.append({"key": k, "old": old.get(k), "new": new.get(k)})
-    return changes
-
-
-# ---------------------------------------------------------------------------
-def process_file(db, csv_path, scope, dry, by_email, by_name):
-    source_file = os.path.basename(csv_path)
-    exported_at = file_mtime_iso(csv_path)   # report date/time according to the filesystem (mtime)
-    cols, data = load_csv(csv_path)
-    get, _ = col_getter(cols, data)
-    if scope == "czsk":
-        data = [r for r in data if get(r, "site_country") in ("Czech Republic", "Slovakia")]
-    print(f"\n########## {source_file}  (scope={scope}, responses={len(data)}, exported={exported_at}) ##########")
-
-    cols_all, data_all = load_csv(csv_path)
-    questions = build_questions(cols_all, data_all)
-
-    docs, link_rows = [], []
-    for r in data:
-        doc = build_response(cols, get, r, source_file)
-        oid, how, matched = soft_link(doc, by_email, by_name)
-        doc["investigator_oid"] = oid
-        doc["investigator_match"] = how
-        doc["source_exported_at"] = exported_at
-        doc["content_sha256"] = content_hash(doc)
-        docs.append(doc)
-        link_rows.append((doc, how, matched))
-
-    existing = {d["_id"]: d for d in db[COL_R].find({}, {"content_sha256": 1})}
-    to_insert = [d for d in docs if d["_id"] not in existing]
-    to_update = [d for d in docs if d["_id"] in existing and existing[d["_id"]].get("content_sha256") != d["content_sha256"]]
-    unchanged = [d for d in docs if d["_id"] in existing and existing[d["_id"]].get("content_sha256") == d["content_sha256"]]
-
-    mk7 = mko = un = 0
-    for doc, how, m in link_rows:
-        krok = (m or {}).get("KROK", "")
-        if m and str(krok).startswith("7"): mk7 += 1
-        elif m: mko += 1
-        else: un += 1
-    print(f"  dictionary: {len(questions)} questions | soft-link: KROK7={mk7}, other={mko}, unmatched={un}")
-    print(f"  delta: INSERT={len(to_insert)}, UPDATE={len(to_update)}, unchanged={len(unchanged)}")
-    if un:
-        for doc, how, m in link_rows:
-            if not m:
-                print(f"    ✗ UNMATCHED: {doc.get('pi_last_name')} / {doc.get('pi_email')} ({how})")
-
-    if dry:
-        print("  [DRY-RUN] nothing written")
-        return {"insert": len(to_insert), "update": len(to_update), "unchanged": len(unchanged), "wrote": False}  # projected counts
-
-    for q in questions:
-        db[COL_Q].replace_one({"_id": q["_id"]}, q, upsert=True)
-    ts = now_iso()
-    ni = nu = ns = 0
-    for d in docs:
-        cur = db[COL_R].find_one({"_id": d["_id"]})
-        if cur is None:
-            d.update({"first_imported_at": ts, "last_seen_at": ts, "last_updated_at": ts, "history": []})
-            db[COL_R].insert_one(d); ni += 1
-        elif cur.get("content_sha256") != d["content_sha256"]:
-            changes = diff_docs(cur, d)
-            db[COL_R].update_one({"_id": d["_id"]}, {
-                "$set": {**{k: d[k] for k in d if k != "_id"}, "last_seen_at": ts, "last_updated_at": ts},
-                "$push": {"history": {"changed_at": ts, "source_file": source_file, "changes": changes}}})
-            nu += 1
-        else:
-            db[COL_R].update_one({"_id": d["_id"]}, {"$set": {
-                "last_seen_at": ts, "source_file": source_file, "source_exported_at": d["source_exported_at"]}})
-            ns += 1
-    print(f"  [APPLY] questions upsert={len(questions)} | responses insert={ni}, update={nu}, unchanged={ns}")
-    return {"insert": ni, "update": nu, "unchanged": ns, "wrote": True}
-
-
-def move_to_processed(csv_path, folder):
-    dest_dir = os.path.join(folder, PROCESSED_SUBDIR)
-    os.makedirs(dest_dir, exist_ok=True)
-    base = os.path.basename(csv_path)
-    dest = os.path.join(dest_dir, base)
-    if os.path.exists(dest):
-        stem, ext = os.path.splitext(base)
-        n = 1
-        while os.path.exists(os.path.join(dest_dir, f"{stem}_{n}{ext}")):
-            n += 1
-        dest = os.path.join(dest_dir, f"{stem}_{n}{ext}")
-    shutil.move(csv_path, dest)
-    print(f"  -> moved to {PROCESSED_SUBDIR}\\{os.path.basename(dest)}")
-
-
-def main():
-    ap = argparse.ArgumentParser()
-    ap.add_argument("--csv", help="single file (NOT moved)")
-    ap.add_argument("--folder", default=DEFAULT_FOLDER, help="folder with SIPIQ CSVs (moved to Zpracováno)")
-    ap.add_argument("--scope", choices=["czsk", "all"], default="czsk")
-    ap.add_argument("--apply", action="store_true", help="write to Mongo and move processed files")
-    ap.add_argument("--dry-run", action="store_true", help="the default anyway; only --apply writes")
-    args = ap.parse_args()
-    dry = not args.apply
-
-    if args.csv:
-        files, move_mode, folder = [args.csv], False, None
-    else:
-        folder = args.folder
-        files = sorted(glob.glob(os.path.join(folder, "*.csv")))
-        move_mode = True
-        print(f"Folder: {folder}\nCSV files found to process: {len(files)}")
-        if not files:
-            print("Nothing to process (no *.csv).")
-            return
-
-    client = MongoClient(MONGO_URI, serverSelectionTimeoutMS=8000)
-    db = client[DB_NAME]
-    client.admin.command("ping")
-    inv, by_email, by_name = load_investigators(db)
-    print(f"CZ+SK investigators in DB: {len(inv)}")
-
-    total = {"insert": 0, "update": 0, "unchanged": 0}
-    for f in files:
-        res = process_file(db, f, args.scope, dry, by_email, by_name)
-        for k in total:
-            total[k] += res[k]
-        if move_mode and res["wrote"]:
-            move_to_processed(f, folder)
-
-    print(f"\n=== TOTAL: insert={total['insert']}, update={total['update']}, unchanged={total['unchanged']} ===")
-    if dry:
-        print("[DRY-RUN] Nothing was written or moved. For a real run: --apply")
-    client.close()
-
-
-if __name__ == "__main__":
-    main()
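The soft-link cascade (PI e-mail, then recipient e-mail, then surname plus country) can be exercised in isolation. A minimal sketch that mirrors `load_investigators` and `soft_link` from the deleted script, assuming the investigator records are already in memory; the script itself queries `feasibility.investigators` for CZ and SK only and builds the indexes once rather than per call. The sample people and e-mail addresses are hypothetical:

```python
import re
import unicodedata


def norm_name(s):
    """Lowercase, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", s or "")
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", bare.lower()).strip()


def soft_link(resp, investigators):
    """Return (investigator _id or None, match description)."""
    by_email, by_name = {}, {}
    for inv in investigators:
        for field in ("email", "email2"):
            e = (inv.get(field) or "").lower().strip()
            if e:
                by_email.setdefault(e, inv)  # first record wins on duplicate e-mails
        nm = norm_name(inv.get("prijmeni"))
        if nm:
            by_name.setdefault((nm, inv.get("zeme")), []).append(inv)

    # 1) PI e-mail, 2) Qualtrics recipient e-mail
    for field, tag in (("pi_email", "email"), ("recipient_email", "recipient_email")):
        e = (resp.get(field) or "").lower().strip()
        if e and e in by_email:
            return by_email[e]["_id"], f"{tag}:{e}"

    # 3) surname + country, accepted only when exactly one candidate matches
    nm = norm_name(resp.get("pi_last_name"))
    cand = by_name.get((nm, resp.get("site_country")), [])
    if len(cand) == 1:
        return cand[0]["_id"], f"prijmeni:{nm}"
    if len(cand) > 1:
        return None, f"prijmeni_ambiguous:{nm}({len(cand)})"
    return None, "NENALEZENO"


# Hypothetical records shaped like feasibility.investigators.
people = [
    {"_id": 1, "prijmeni": "Zápotocká", "zeme": "Slovakia", "email": "JZ@example.sk"},
    {"_id": 2, "prijmeni": "Novák", "zeme": "Czech Republic"},
    {"_id": 3, "prijmeni": "Novak", "zeme": "Czech Republic"},
]
print(soft_link({"pi_email": "jz@example.sk"}, people))
print(soft_link({"pi_last_name": "Zapotocka", "site_country": "Slovakia"}, people))
print(soft_link({"pi_last_name": "Novák", "site_country": "Czech Republic"}, people))
```

An ambiguous surname is recorded in the match description but leaves the investigator `_id` empty, so nothing is linked by guesswork.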
@@ -144,7 +144,9 @@ def main():
                f'{body}</body></html>')
        msg.set_content(html, subtype="html", charset="utf-8", cte="base64")
        os.makedirs(OUT_DIR, exist_ok=True)
-        fn = f"pripominka_sipiq_{ascii_slug(d['prijmeni'])}_18JUN2026.eml"
+        _t = date.today()
+        _dd = _t.strftime("%d") + ["JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"][_t.month-1] + _t.strftime("%Y")
+        fn = f"pripominka_sipiq_{ascii_slug(d['prijmeni'])}_{_dd}.eml"
        with open(os.path.join(OUT_DIR, fn), "wb") as fh:
            fh.write(bytes(msg))
        done.append(fn)
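The hunk above replaces the hard-coded `18JUN2026` in the reminder file name with today's date. It builds the month abbreviation from an explicit list, presumably because `strftime("%b")` returns the locale's abbreviated month name and would not reliably yield `JUN` on a Czech locale. The same logic as a small helper; `ddmonyyyy` is a hypothetical name, and the hunk assumes `date` is already imported from `datetime` in that file, which the diff does not show:

```python
from datetime import date

# Fixed English month abbreviations, independent of the process locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def ddmonyyyy(d: date) -> str:
    """Format a date in the DDMONYYYY style used in the file names."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"


print(ddmonyyyy(date(2026, 6, 18)))  # 18JUN2026
print(ddmonyyyy(date.today()))
```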
@@ -1,38 +0,0 @@
-# store_cda_seaweed_v1.0.py
-
-**Version:** 1.0 · **Date:** 2026-06-17
-
-## Purpose
-Stores signed CDAs (PDF) from the assistants' (CTA) e-mails in Mongo
-`feasibility.investigators` under the `cda.*` field and advances the physician to
-`KROK "5 - CDA podepsano"`.
-
-Unlike `store_cda_batch` (which downloads the `.msg` via SFTP from Tower and extracts the attachment
-with `extract_msg`), this version downloads the PDF **directly from SeaweedFS** via the
-`seaweed_url` that the parser stores with each attachment in `emaily."vbuzalka@its.jnj.com"`
-(`attachments[].seaweed_url` + `sha256`). Simpler, no SFTP.
-
-## How it works
- `MAPPING` = explicit pairing `investigator _id → (seaweed_url, filename, sha256, size, source_msg_id)`.
- For each entry: downloads the PDF (urllib), verifies **SHA256 + size + PDF header**,
-  base64-encodes it and stores it under `cda`:
-  `data_base64, data_sha256, data_filename, data_mime, data_size, data_stored_at,
-  data_source_msg` + metadata `stav="podepsano", soubor, zdroj`.
- Sets `KROK = "5 - CDA podepsano"` and prepends a line to `STATUS`.
- `_id` is converted to `ObjectId` (plain pymongo does not convert a string to ObjectId by itself).
-
-## Usage
-```
-.venv\Scripts\python.exe Feasibility\store_cda_seaweed_v1.0.py            # dry run (verifies download + SHA, writes nothing)
-.venv\Scripts\python.exe Feasibility\store_cda_seaweed_v1.0.py --apply    # writes to Mongo
-```
-
-## Run on 17JUN2026 (--apply)
-Stored 5/5 (all SHA256 OK), KROK 4 → 5:
-Závada Filip, Bruncák Michal (FNsP B. Bystrica), Machytka Evžen (Asclepiades),
-Pumprla Jiří (PreventaMed), Zapotocká Júlia (PAV-MED).
-GASTROMART/Molnár skipped (already at KROK 6, CDA stored earlier).
-
-## Dependencies
-`pymongo` (including its `bson` package) + stdlib. SeaweedFS host `192.168.1.50:8888`.
-Mongo `192.168.1.76:27017`.
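The three checks listed under "How it works" (SHA-256, size, PDF header) can be kept in one pure function, which makes them testable without Mongo or SeaweedFS. A sketch; `verify_pdf` is a hypothetical name, and unlike the script it collects every failing check instead of printing OK/MISMATCH line by line:

```python
import hashlib


def verify_pdf(raw: bytes, expected_sha256: str, expected_size: int) -> list:
    """Return a list of problems; an empty list means the download is safe to store."""
    problems = []
    if len(raw) != expected_size:
        problems.append(f"size {len(raw)} != expected {expected_size}")
    got = hashlib.sha256(raw).hexdigest()
    if got != expected_sha256.lower():
        problems.append(f"sha256 mismatch: {got}")
    if not raw.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    return problems


sample = b"%PDF-1.7\n% hypothetical payload\n"
digest = hashlib.sha256(sample).hexdigest()
print(verify_pdf(sample, digest, len(sample)))      # []
print(verify_pdf(sample, digest, len(sample) + 1))  # one size problem
```

Checking the size first is cheap, but all three checks still run so that a truncated download reports both the size and the hash mismatch.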
@@ -1,126 +0,0 @@
-# -*- coding: utf-8 -*-
-# =============================================================================
-# Name:    store_cda_seaweed_v1.0.py
-# Version: 1.0
-# Date:    2026-06-17
-# Purpose: Stores signed CDAs (PDF) from the assistants' e-mails in Mongo
-#          feasibility.investigators (field cda.*) and advances the physician to
-#          KROK "5 - CDA podepsano". PDFs are downloaded directly from SeaweedFS
-#          (seaweed_url from attachments in emaily."vbuzalka@its.jnj.com"),
-#          and their SHA256 is verified against the metadata from Mongo.
-# Usage:   python store_cda_seaweed_v1.0.py           (dry run / preview)
-#          python store_cda_seaweed_v1.0.py --apply    (writes to Mongo)
-# Note:    MAPPING below = explicit pairing investigator -> CDA attachment.
-#          Only stdlib + pymongo. SeaweedFS host 192.168.1.50:8888.
-# =============================================================================
-
-import sys
-import base64
-import hashlib
-import urllib.request
-from datetime import datetime, timezone
-from pymongo import MongoClient
-from bson import ObjectId
-
-MONGO_URI = "mongodb://192.168.1.76:27017"
-DBN, COL = "feasibility", "investigators"
-
-# (investigator _id, seaweed_url, filename, sha256, size, source_msg_id, label)
-MAPPING = [
-    ("6a198b661218c31ab0f5ba57",
-     "http://192.168.1.50:8888/mail-attachments/1a/86/1a86e987b9d3da57c1d863b47734133f2e2d7eae3f5cfe91112c475eb86d86e9",
-     "CZ_CDA PI_MUDr. Filip Zavada_fully signed_16Jun2026.pdf",
-     "1a86e987b9d3da57c1d863b47734133f2e2d7eae3f5cfe91112c475eb86d86e9",
-     479026, "<CH2PR07MB7190A5538ACDC1D49F8B430780E52@CH2PR07MB7190.namprd07.prod.outlook.com>",
-     "Zavada Filip"),
-    ("6a19832b5fc2213518257957",
-     "http://192.168.1.50:8888/mail-attachments/64/b0/64b06d48bfe3c49095e326988f14c04fd5849728b227647f6653b2e3c3095538",
-     "SK_CDA PI_Bruncak_FNsP BBystrica_fully signed 16Jun2026.pdf",
-     "64b06d48bfe3c49095e326988f14c04fd5849728b227647f6653b2e3c3095538",
-     498069, "<SA1PR07MB952874B8654156369CDE44448CE52@SA1PR07MB9528.namprd07.prod.outlook.com>",
-     "Bruncak Michal"),
-    ("6a19832b5fc2213518257961",
-     "http://192.168.1.50:8888/mail-attachments/c2/72/c272ca62bd27ca10aed35cb54054d880f4f0e2f59940ed3b067b17d51a9ac041",
-     "CZ_CDA Institution_Asclepiades s.r.o._MUDr. Machytka_16Jun2026.pdf",
-     "c272ca62bd27ca10aed35cb54054d880f4f0e2f59940ed3b067b17d51a9ac041",
-     460977, "<PH0PR07MB97879A9C9BF9C00D38D4798A9FE52@PH0PR07MB9787.namprd07.prod.outlook.com>",
-     "Machytka Evzen (Asclepiades)"),
-    ("6a19832b5fc2213518257967",
-     "http://192.168.1.50:8888/mail-attachments/99/37/99372c399be3b001428ef4b36d43e250dedced5955de5d1f3a2d63a9f0c1728b",
-     "CZ_CDA institution_PreventaMed sro_fully signed_16Jun2026.pdf",
-     "99372c399be3b001428ef4b36d43e250dedced5955de5d1f3a2d63a9f0c1728b",
-     457745, "<CH2PR07MB719008DB0B3CAFD764AE2E8280E52@CH2PR07MB7190.namprd07.prod.outlook.com>",
-     "Pumprla Jiri (PreventaMed)"),
-    ("6a1c4275aa46d8b608065ce9",
-     "http://192.168.1.50:8888/mail-attachments/94/95/9495c742407873efd8dd9713e1dc962cb08e55e0d3690e4a79a90132ee358dee",
-     "SK_CDA Institution_PAV-MED s r.o_fully signed_15Jun2026.pdf",
-     "9495c742407873efd8dd9713e1dc962cb08e55e0d3690e4a79a90132ee358dee",
-     460246, "<CH2PR07MB719008DB0B3CAFD764AE2E8280E52@CH2PR07MB7190.namprd07.prod.outlook.com>",
-     "Zapotocka Julia (PAV-MED)"),
-]
-
-
-def fetch(url):
-    with urllib.request.urlopen(url, timeout=30) as r:
-        return r.read()
-
-
-def main():
-    apply = "--apply" in sys.argv
-    cli = MongoClient(MONGO_URI)
-    col = cli[DBN][COL]
-    now = datetime.now(timezone.utc).isoformat()
-
-    ok = 0
-    for _id, url, fname, sha, size, src, label in MAPPING:
-        oid = ObjectId(_id)
-        doc = col.find_one({"_id": oid}, {"STATUS": 1, "KROK": 1, "cda.stav": 1})
-        if not doc:
-            print(f"  !! {label}: investigator _id={_id} NOT FOUND"); continue
-        try:
-            raw = fetch(url)
-        except Exception as e:
-            print(f"  !! {label}: download failed: {e}"); continue
-        got = hashlib.sha256(raw).hexdigest()
-        sha_ok = (got == sha)
-        size_ok = (len(raw) == size)
-        head_ok = raw[:5] == b"%PDF-"
-        print(f"  [{label}]")
-        print(f"     file     : {fname}")
-        print(f"     download : {len(raw)} B (expected {size}) {'OK' if size_ok else 'MISMATCH'}")
-        print(f"     sha256   : {'OK' if sha_ok else 'MISMATCH! ' + got}")
-        print(f"     PDF hdr  : {'OK' if head_ok else 'NOT A PDF'}")
-        print(f"     KROK     : {doc.get('KROK')} -> 5 - CDA podepsano")
-        if not (sha_ok and size_ok and head_ok):
-            print("     >> SKIPPING (check failed)"); continue
-        if not apply:
-            ok += 1; continue
-
-        b64 = base64.b64encode(raw).decode("ascii")
-        old_status = doc.get("STATUS", "") or ""
-        new_line = (f"17JUN2026: podepsane CDA ULOZENO do Mongo (cda.data) — {fname} "
-                    f"(z e-mailu asistentky). KROK 5, pripraveno na SIPIQ.")
-        col.update_one({"_id": oid}, {"$set": {
-            "KROK": "5 - CDA podepsano",
-            "STATUS": new_line + "\n" + old_status,
-            "cda.stav": "podepsano",
-            "cda.soubor": fname,
-            "cda.zdroj": "e-mail asistentky (SeaweedFS)",
-            "cda.data_base64": b64,
-            "cda.data_sha256": sha,
-            "cda.data_filename": fname,
-            "cda.data_mime": "application/pdf",
-            "cda.data_size": len(raw),
-            "cda.data_stored_at": now,
-            "cda.data_source_msg": src,
-        }})
-        ok += 1
-        print("     >> STORED + KROK 5")
-
-    print(f"\n{'WRITTEN' if apply else 'DRY-RUN OK'}: {ok}/{len(MAPPING)}")
-    if not apply:
-        print(">>> To write, run with --apply")
-
-
-if __name__ == "__main__":
-    main()
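The write step uses `$set` with dotted `cda.*` paths, so MongoDB updates only those subfields of the embedded `cda` document and leaves any other `cda` subfields already on the record in place; a `$set` of the whole `cda` object would replace it. A sketch of the update document as a pure function, assuming the caller then passes it to `update_one({"_id": oid}, ...)`; `build_cda_update` is hypothetical, and unlike the script it avoids a trailing newline when `STATUS` was empty:

```python
import base64
from datetime import datetime, timezone


def build_cda_update(raw: bytes, fname: str, sha256: str, src_msg: str,
                     old_status: str, note: str) -> dict:
    """Build the $set document; dotted cda.* keys leave other cda subfields untouched."""
    status = note + ("\n" + old_status if old_status else "")  # newest line first
    return {"$set": {
        "KROK": "5 - CDA podepsano",
        "STATUS": status,
        "cda.stav": "podepsano",
        "cda.soubor": fname,
        "cda.zdroj": "e-mail asistentky (SeaweedFS)",
        "cda.data_base64": base64.b64encode(raw).decode("ascii"),
        "cda.data_sha256": sha256,
        "cda.data_filename": fname,
        "cda.data_mime": "application/pdf",
        "cda.data_size": len(raw),
        "cda.data_stored_at": datetime.now(timezone.utc).isoformat(),
        "cda.data_source_msg": src_msg,
    }}


upd = build_cda_update(b"%PDF-1.7\n", "demo.pdf", "0" * 64, "<demo@example>",
                       "", "hypothetical note")
print(upd["$set"]["STATUS"])  # hypothetical note
```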