z230

2026-06-17 15:05:10 +02:00
parent de959d849d
commit 4884117227
85 changed files with 34611 additions and 0 deletions
@@ -0,0 +1,70 @@
+# sipiq_import_v1.0: import of SIPIQ responses into MongoDB
+
+**Version:** 1.0 · **Date:** 2026-06-17 · **Study:** 77242113UCO3002 (ICONIC / DAWN)
+
+## Purpose
+Import SIPIQ responses (Qualtrics CSV export) into the MongoDB database `feasibility` so that it is possible to:
+1. **cross-analyze** "question × question" (a flat `answers{}` map keyed by Qcode),
+2. **reconstruct the complete SIPIQ** as it appears in the blank PDF, only filled in (a question
+   dictionary with sections, order, sub-part labels, type, and options).
+
+## Input
+Qualtrics **CSV** export (Download a data table → CSV, *Download all fields*, *Export labels*,
+decimal **point**, i.e. "Use commas for decimals" left UNchecked). The CSV has 3 header rows:
+- row 1 = Qcode (Q2, Q6_4, Q31#1_1 …)
+- row 2 = **question text** (legend)
+- row 3 = `{"ImportId":"QID…"}` = the QID code identical to the XML export (the XML↔CSV bridge)
+
+The XML export does NOT contain the question text (only QID tags), which is why we import from CSV.
+
+## Two collections in `feasibility`
+### `sipiq_questions`: questionnaire dictionary (1 doc = 1 logical question)
+`{_id=base Qcode (Q63), order, qnum, section, qids[QID…], text, type, items[{key,qcode,qid,label}], options[]}`
+- `type`: `single_or_text` | `yesno` | `numeric` | `matrix_yesno` | `matrix_percent` | `matrix`
+- `items[]` = sub-parts (matrix rows, percentage parts, contact fields) in order; `key` = sanitized Qcode (`#`/`.`→`_`)
+- `options[]` = derived from observed values (yes/no and single-choice)
+- Idempotent `replace_one(upsert)`. Status 17JUN2026: **56 questions** (27 multi-part).
+- **STEM_OVERRIDE**: for matrix questions (Q31/Q63/Q64/Q69) Qualtrics truncates the text in the CSV header with "…",
+  so the full wording is filled in from the blank SIPIQ PDF.
+
+### `sipiq_responses`: 1 doc = 1 response
+- `_id` = **Qualtrics ResponseId** (`R_…`, unique, stable)
+- site/PI identity promoted to the top level (`site_*`, `pi_*`, `sdl_site_id`, `fire_*`, `mailinglist_id`,
+  `recipient_*`) → queryable
+- `meta{}` = dates, status, progress, finished, duration, language, channel, IP, geo, survey date/time
+- `answers{}` = **flat map** Qcode→value (`answers.Q37_1`, `answers.Q63_1_1`), the core for cross-analysis
+- `is_full_sipiq`, `interested` (Q25) for convenience
+- **`investigator_oid`** = ObjectId reference to `feasibility.investigators` (+`investigator_match` = how it was matched)
+- delta bookkeeping: `content_sha256`, `source_file`, `first_imported_at`, `last_seen_at`,
+  `last_updated_at`, `history[]`
+
+## Delta import (overwrites ONLY changed responses)
+- new response → INSERT
+- exists, unchanged (same `content_sha256`) → updates only `last_seen_at` (and `source_file`)
+- exists, changed → `$set` of the rebuilt response fields + `$push` to `history[]` `{changed_at, source_file, changes:[{key,old,new}]}`, where `changes` lists only the keys that differ
+
+## Soft-link to investigators (non-destructive)
+1. `pi_email` == `email`/`email2` (lowercase), 2. `recipient_email`, 3. fallback: surname
+(without diacritics) + country. Reports the match + KROK. **investigators is NOT modified.**
+
+## Usage
+```
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.0.py --csv "<path.csv>" --dry-run
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.0.py --csv "<path.csv>" --apply
+```
+`--scope czsk` (default, CZ+SK only) | `--scope all` (all 276). Mongo 192.168.1.76:27017, no auth, pymongo.
+
+## Status 17JUN2026 (live run completed)
+- `sipiq_questions`: 56 · `sipiq_responses`: 15 (CZ 8 + SK 7)
+- **soft-link 15/15 via e-mail, all 15 = KROK 7** (validation: completed SIPIQs = our KROK-7 investigators)
+- `investigator_oid` stored as ObjectId → ready for `$lookup`
+
+## Queries (examples)
+```js
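+// cross-analysis: who expects recruitment problems (Q33 = "Yes"); Q37_1 (eligible) is only projected, so apply the >X threshold on the result (answers are stored as strings)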
+db.sipiq_responses.find({"answers.Q33":"Yes"}, {pi_last_name:1,"answers.Q37_1":1})
+// join with the investigator record
+db.sipiq_responses.aggregate([{$lookup:{from:"investigators",localField:"investigator_oid",
+  foreignField:"_id",as:"inv"}}])
+// SIPIQ reconstruction: sort sipiq_questions by order, then for each question/item take answers[key]
+```
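The reconstruction recipe in the last query comment (sort `sipiq_questions` by `order`, then look up `answers[key]` for each item) can be sketched in Python on plain dictionaries. This is a minimal illustration rather than part of the commit: the function name `reconstruct_sipiq` and the sample documents are hypothetical, and it assumes that a plain question (one with an empty `items[]`) stores its answer under the base Qcode itself.

```python
def reconstruct_sipiq(questions, response):
    """Return the filled-in questionnaire as a list of dicts in questionnaire order."""
    answers = response.get("answers", {})
    filled = []
    for q in sorted(questions, key=lambda q: q["order"]):
        items = q.get("items") or []
        if items:
            # multi-part question: one row per item, looked up by its sanitized key
            rows = [(it.get("label"), answers.get(it["key"])) for it in items]
        else:
            # plain question (assumption: the answer is keyed by the base Qcode)
            rows = [(None, answers.get(q["_id"]))]
        filled.append({"section": q["section"], "text": q["text"], "rows": rows})
    return filled


# Hypothetical sample data shaped like the two collections described above
questions = [
    {"_id": "Q63", "order": 1, "section": "Site Experience and Staffing",
     "text": "Experience with assessments?",
     "items": [{"key": "Q63_1_1", "label": "Endoscopy"},
               {"key": "Q63_1_2", "label": "Biopsy"}]},
    {"_id": "Q25", "order": 0, "section": "Interest",
     "text": "Are you interested?", "items": []},
]
response = {"_id": "R_example", "answers": {"Q25": "Yes", "Q63_1_1": "No"}}

for q in reconstruct_sipiq(questions, response):
    print(q["section"], "|", q["text"])
    for label, value in q["rows"]:
        print("   ", label or "(answer)", "=", value if value is not None else "(blank)")
```

Unanswered items come back as `None`, which keeps the blank rows of the PDF visible in the reconstruction instead of silently dropping them.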
@@ -0,0 +1,534 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+sipiq_import_v1.0.py
+====================
+Version: 1.0
+Date:    2026-06-17
+Author:  Claude Code (for MUDr. Vladimír Buzalka)
+
+Description
+-----------
+Import of SIPIQ responses (Qualtrics CSV export, study 77242113UCO3002 / ICONIC DAWN)
+into the MongoDB db `feasibility`. The goals are:
+  (a) to enable "question × question" cross-analysis (flat answers keyed by Qcode),
+  (b) to enable reconstruction of the COMPLETE SIPIQ as the investigator sees it in the PDF,
+      only with the answers filled in (question dictionary with section/order/labels).
+
+Two collections in db `feasibility`:
+  * sipiq_questions  – questionnaire dictionary (1 doc = 1 logical question; section, order,
+                       text, items[], type, options). Idempotent (upsert by _id).
+  * sipiq_responses  – 1 doc = 1 response (_id = Qualtrics ResponseId). Site/PI identity
+                       at the top level, flat answers{}, meta{}, soft-link investigator_oid,
+                       delta bookkeeping (content_sha256, history[], timestamps).
+
+DELTA import (overwrites ONLY changed responses):
+  - new response              -> insert
+  - exists, unchanged         -> updates only last_seen_at (+ source_file)
+  - exists, something changed -> $set of the rebuilt fields + push of the changed keys to history[] {key,old,new}
+
+Soft-link to feasibility.investigators:
+  - primarily pi_email == email / email2 (lowercase), then recipient_email
+  - fallback: surname (without diacritics, lowercased) + country (CZ/SK)
+  - non-destructive: does NOT modify the investigators collection, only stores investigator_oid in the response.
+
+Scope: default CZ + SK (--scope czsk). --scope all = all 276.
+
+Usage:
+  python sipiq_import_v1.0.py --csv "<path.csv>" --dry-run
+  python sipiq_import_v1.0.py --csv "<path.csv>" --apply
+
+Dependencies: pymongo (.venv). Mongo 192.168.1.76:27017, no auth.
+"""
+import argparse
+import csv
+import hashlib
+import json
+import re
+import sys
+import unicodedata
+from datetime import datetime, timezone
+
+try:
+    from pymongo import MongoClient
+except ImportError:
+    print("ERROR: pymongo is not installed in the current Python.", file=sys.stderr)
+    raise
+
+MONGO_URI = "mongodb://192.168.1.76:27017"
+DB_NAME = "feasibility"
+COL_Q = "sipiq_questions"
+COL_R = "sipiq_responses"
+
+# Qualtrics system meta fields (these do NOT go into answers)
+META_COLS = {
+    "StartDate", "EndDate", "Status", "IPAddress", "Progress", "Duration (in seconds)",
+    "Finished", "RecordedDate", "ResponseId", "RecipientLastName", "RecipientFirstName",
+    "RecipientEmail", "ExternalReference", "LocationLatitude", "LocationLongitude",
+    "DistributionChannel", "UserLanguage",
+}
+
+# Embedded SDL fields promoted to the top level of the document (queryable identity)
+PROMOTE = [
+    "site_name", "site_address", "site_city", "site_state", "site_postcode", "site_country",
+    "pi_first_name", "pi_last_name", "pi_phone", "pi_email",
+    "sdl_site_id", "fire_site_id", "fire_investigator_id", "mailinglist_id",
+    "survey_generated_by", "Date", "Time",
+]
+
+# Sections per the verified catalogue (mapping of base Q-number -> section in the PDF)
+SECTION_BY_QNUM = {}
+def _sec(rng, name):
+    for n in rng:
+        SECTION_BY_QNUM[n] = name
+_sec([2], "J&J Internal Assessment")
+_sec([6, 7, 8, 9, 10, 11, 12, 13], "Contact Information")
+_sec(range(14, 22), "Confidentiality Statement")
+_sec([25, 26, 27], "Interest")
+_sec([29, 30, 31, 32, 33, 34], "Protocol Requirements")
+_sec([36, 37, 38], "Enrollment")
+_sec([40, 41, 42, 43], "Patient Demographics Overview")
+_sec([45, 46, 47, 48, 49], "Site Overview")
+_sec([51], "Operational Considerations")
+_sec([53, 54], "Comments")
+_sec([57, 58, 59, 60, 61], "Patient Population")
+_sec([63, 64, 65, 66, 67], "Site Experience and Staffing")
+_sec([69], "Equipment and Facility Requirements")
+_sec([71, 72, 73, 74, 75], "Institutional Review Board, Ethics Committee, and Contracts")
+
+# Full wording of questions that Qualtrics truncates with "..." in the CSV header (matrix questions).
+# Source: blank SIPIQ PDF (ICONIC ... _SipIQ_V1_13MAY2026.pdf).
+STEM_OVERRIDE = {
+    "Q31": "At your site, at what line(s) of treatment do you most commonly prescribe "
+           "vedolizumab for patients with moderately to severely active ulcerative colitis?",
+    "Q63": "Do you or your site staff have experience in performing the following types of "
+           "study assessments/procedures?",
+    "Q64": "The following personnel are required to run the study. "
+           "Will your site have the following available?",
+    "Q69": "The following equipment and facilities are required to run the studies. "
+           "Are these available at your site?",
+}
+
+
+def now_iso():
+    return datetime.now(timezone.utc).astimezone().isoformat(timespec="seconds")
+
+
+def strip_accents(s):
+    if not s:
+        return ""
+    nfkd = unicodedata.normalize("NFKD", s)
+    return "".join(c for c in nfkd if not unicodedata.combining(c))
+
+
+def norm_name(s):
+    return re.sub(r"\s+", " ", strip_accents(s or "").lower()).strip()
+
+
+def sanitize_key(qcode):
+    """Qcode -> key into answers{} (MongoDB-safe): '#' and '.' -> '_'."""
+    return qcode.replace("#", "_").replace(".", "_")
+
+
+def qnum(qcode):
+    """Question number from the Qcode (Q63#1_2 -> 63, Q40_6_TEXT -> 40)."""
+    m = re.match(r"Q(\d+)", qcode)
+    return int(m.group(1)) if m else None
+
+
+def qbase(qcode):
+    """Logical base of the question (Q63#1_2 -> Q63, Q40_6 -> Q40, Q25 -> Q25)."""
+    m = re.match(r"(Q\d+)", qcode)
+    return m.group(1) if m else qcode
+
+
+def import_id(h3_cell):
+    try:
+        return json.loads(h3_cell).get("ImportId", "")
+    except Exception:
+        return h3_cell
+
+
+def split_text(text):
+    """Return (stem, item_label). Stem = question text, item_label = sub-part label."""
+    parts = [p.strip() for p in re.split(r"\s+-\s+", text)]
+    stem = parts[0]
+    if len(parts) == 1:
+        return stem, None
+    # everything after the stem = row/part label; clean up Qualtrics artifacts
+    label_parts = parts[1:]
+    # drop "Selected Choice" (artifact of single-choice questions with an Other option)
+    label_parts = [p for p in label_parts if p.lower() != "selected choice"]
+    # drop internal statement codes such as "Q63#1"
+    label_parts = [p for p in label_parts if not re.fullmatch(r"Q\d+#\d+", p)]
+    label = " - ".join(label_parts) if label_parts else None
+    return stem, label
+
+
+def detect_type(qcode, observed):
+    """Heuristic question type derived from the Qcode and the observed values."""
+    has_hash = "#" in qcode
+    vals = [v for v in observed if v]
+    yesno = vals and all(v in ("Yes", "No") for v in vals)
+    numeric = vals and all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in vals)
+    if has_hash and yesno:
+        return "matrix_yesno"
+    if has_hash and numeric:
+        return "matrix_percent"
+    if has_hash:
+        return "matrix"
+    if numeric:
+        return "numeric"
+    if yesno:
+        return "yesno"
+    return "single_or_text"
+
+
+# ---------------------------------------------------------------------------
+def load_csv(path):
+    with open(path, encoding="utf-8-sig", newline="") as fh:
+        rows = list(csv.reader(fh))
+    h1, h2, h3 = rows[0], rows[1], rows[2]
+    data = rows[3:]
+    cols = []
+    for i, (code, text, imp) in enumerate(zip(h1, h2, h3)):
+        cols.append({"i": i, "code": code, "text": text, "qid": import_id(imp)})
+    return cols, data
+
+
+def col_getter(cols, data):
+    idx = {c["code"]: c["i"] for c in cols}
+    def get(row, code):
+        i = idx.get(code)
+        return (row[i].strip() if i is not None and i < len(row) else "")
+    return get, idx
+
+
+def is_question_col(code):
+    return bool(re.match(r"Q\d", code))
+
+
+# ---------------------------------------------------------------------------
+def build_questions(cols, data):
+    """Question dictionary -> list of documents (1 doc = 1 logical question)."""
+    # observed values per Qcode (for type + options)
+    qcols = [c for c in cols if is_question_col(c["code"])]
+    observed = {c["code"]: set() for c in qcols}
+    for row in data:
+        for c in qcols:
+            v = (row[c["i"]].strip() if c["i"] < len(row) else "")
+            if v:
+                observed[c["code"]].add(v)
+
+    groups = {}  # base -> dict
+    order_seen = []
+    for c in qcols:
+        base = qbase(c["code"])
+        if base not in groups:
+            groups[base] = {
+                "_id": base,
+                "order": c["i"],
+                "qnum": qnum(c["code"]),
+                "section": SECTION_BY_QNUM.get(qnum(c["code"]), "Other"),
+                "qids": [],
+                "text": split_text(c["text"])[0],
+                "items": [],
+                "_obs": set(),
+                "_types": [],
+            }
+            order_seen.append(base)
+        g = groups[base]
+        base_qid = re.match(r"(QID\d+)", c["qid"] or "")
+        if base_qid and base_qid.group(1) not in g["qids"]:
+            g["qids"].append(base_qid.group(1))
+        stem, label = split_text(c["text"])
+        key = sanitize_key(c["code"])
+        item = {"key": key, "qcode": c["code"], "qid": c["qid"]}
+        if label:
+            item["label"] = label
+        g["items"].append(item)
+        g["_obs"] |= observed[c["code"]]
+        g["_types"].append(detect_type(c["code"], observed[c["code"]]))
+
+    out = []
+    for n, base in enumerate(order_seen):
+        g = groups[base]
+        obs = sorted(g.pop("_obs"))
+        types = g.pop("_types")
+        # group type: the most frequent item type (ties broken alphabetically so reruns are deterministic)
+        gtype = max(sorted(set(types)), key=types.count) if types else "single_or_text"
+        g["type"] = gtype
+        # options only for categorical questions (yesno/single)
+        if gtype in ("yesno", "matrix_yesno"):
+            g["options"] = ["Yes", "No"]
+        elif gtype == "single_or_text" and obs and len(obs) <= 12:
+            g["options"] = obs
+        else:
+            g["options"] = []
+        if base in STEM_OVERRIDE:
+            g["text"] = STEM_OVERRIDE[base]
+        g["order"] = n  # renumber 0..N-1 by order of first appearance in the CSV
+        # if there is only 1 item without a label, omit items (it is a plain question)
+        if len(g["items"]) == 1 and "label" not in g["items"][0]:
+            g["items"] = []
+        out.append(g)
+    return out
+
+
+# ---------------------------------------------------------------------------
+def build_response(cols, get, row, source_file):
+    rid = get(row, "ResponseId")
+    answers = {}
+    for c in cols:
+        if is_question_col(c["code"]):
+            v = (row[c["i"]].strip() if c["i"] < len(row) else "")
+            if v:
+                answers[sanitize_key(c["code"])] = v
+
+    def g(*names):
+        for nm in names:
+            v = get(row, nm)
+            if v:
+                return v
+        return None
+
+    meta = {
+        "start_date": get(row, "StartDate") or None,
+        "end_date": get(row, "EndDate") or None,
+        "recorded_date": get(row, "RecordedDate") or None,
+        "status": get(row, "Status") or None,
+        "progress": int(get(row, "Progress")) if get(row, "Progress").isdigit() else get(row, "Progress") or None,
+        "finished": get(row, "Finished") in ("True", "1", "TRUE"),
+        "duration_sec": int(get(row, "Duration (in seconds)")) if get(row, "Duration (in seconds)").isdigit() else None,
+        "user_language": get(row, "UserLanguage") or None,
+        "distribution_channel": get(row, "DistributionChannel") or None,
+        "ip_address": get(row, "IPAddress") or None,
+        "location_lat": get(row, "LocationLatitude") or None,
+        "location_lng": get(row, "LocationLongitude") or None,
+        "survey_date": get(row, "Date") or None,
+        "survey_time": get(row, "Time") or None,
+    }
+
+    doc = {
+        "_id": rid,
+        "study": "77242113UCO3002",
+        "site_country": get(row, "site_country") or None,
+        "site_name": get(row, "site_name") or None,
+        "site_city": get(row, "site_city") or None,
+        "site_state": get(row, "site_state") or None,
+        "site_postcode": get(row, "site_postcode") or None,
+        "site_address": get(row, "site_address") or None,
+        "pi_first_name": get(row, "pi_first_name") or None,
+        "pi_last_name": get(row, "pi_last_name") or None,
+        "pi_email": (get(row, "pi_email") or "").lower() or None,
+        "pi_phone": get(row, "pi_phone") or None,
+        "sdl_site_id": get(row, "sdl_site_id") or None,
+        "fire_site_id": get(row, "fire_site_id") or None,
+        "fire_investigator_id": get(row, "fire_investigator_id") or None,
+        "mailinglist_id": get(row, "mailinglist_id") or None,
+        "survey_generated_by": get(row, "survey_generated_by") or None,
+        "recipient_email": (get(row, "RecipientEmail") or "").lower() or None,
+        "recipient_last_name": get(row, "RecipientLastName") or None,
+        "recipient_first_name": get(row, "RecipientFirstName") or None,
+        "meta": meta,
+        "is_full_sipiq": any(k.startswith(("Q57", "Q58", "Q59", "Q63", "Q66", "Q71")) for k in answers),
+        "interested": answers.get("Q25"),
+        "answers": answers,
+        "investigator_oid": None,
+        "investigator_match": None,
+        "source_file": source_file,
+    }
+    return doc
+
+
+def content_hash(doc):
+    payload = {k: doc[k] for k in doc if k not in
+               ("content_sha256", "first_imported_at", "last_seen_at", "last_updated_at", "history",
+                "investigator_oid", "investigator_match", "source_file")}
+    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False, default=str)
+    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
+
+
+# ---------------------------------------------------------------------------
+def load_investigators(db):
+    inv = list(db.investigators.find(
+        {"zeme": {"$in": ["Czech Republic", "Slovakia"]}},
+        {"prijmeni": 1, "jmeno": 1, "email": 1, "email2": 1, "zeme": 1, "KROK": 1, "pracoviste": 1},
+    ))
+    by_email = {}
+    by_name = {}
+    for d in inv:
+        for ef in ("email", "email2"):
+            e = (d.get(ef) or "").lower().strip()
+            if e:
+                by_email.setdefault(e, d)
+        nm = norm_name(d.get("prijmeni"))
+        if nm:
+            by_name.setdefault((nm, d.get("zeme")), []).append(d)
+    return inv, by_email, by_name
+
+
+def soft_link(doc, by_email, by_name):
+    e = (doc.get("pi_email") or "").lower().strip()
+    if e and e in by_email:
+        d = by_email[e]
+        return d["_id"], f"email:{e}", d
+    e2 = (doc.get("recipient_email") or "").lower().strip()
+    if e2 and e2 in by_email:
+        d = by_email[e2]
+        return d["_id"], f"recipient_email:{e2}", d
+    nm = norm_name(doc.get("pi_last_name"))
+    cand = by_name.get((nm, doc.get("site_country")), [])
+    if len(cand) == 1:
+        return cand[0]["_id"], f"prijmeni:{nm}", cand[0]
+    if len(cand) > 1:
+        return None, f"prijmeni_ambiguous:{nm}({len(cand)})", None
+    return None, "NENALEZENO", None
+
+
+# ---------------------------------------------------------------------------
+def main():
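+    ap = argparse.ArgumentParser()
+    ap.add_argument("--csv", required=True)
+    ap.add_argument("--scope", choices=["czsk", "all"], default="czsk")
+    # --apply and --dry-run are mutually exclusive; without either flag the run is a dry-run
+    mode = ap.add_mutually_exclusive_group()
+    mode.add_argument("--apply", action="store_true", help="live write (otherwise dry-run)")
+    mode.add_argument("--dry-run", action="store_true", help="report only, write nothing (default)")
+    args = ap.parse_args()
+    dry = not args.apply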
+    source_file = args.csv.replace("\\", "/").split("/")[-1]
+
+    # read the CSV once; data_all keeps every row for the question dictionary
+    cols, data_all = load_csv(args.csv)
+    get, idx = col_getter(cols, data_all)
+
+    # scope filter
+    data = data_all
+    if args.scope == "czsk":
+        data = [r for r in data_all if get(r, "site_country") in ("Czech Republic", "Slovakia")]
+    print(f"Source: {source_file}  | scope={args.scope}  | responses to import: {len(data)}")
+
+    # --- question dictionary (built from the FULL CSV, not just the scope) ---
+    questions = build_questions(cols, data_all)
+    print(f"Question dictionary: {len(questions)} logical questions "
+          f"({sum(1 for q in questions if q['items'])} of them multi-part).")
+
+    # --- Mongo ---
+    client = MongoClient(MONGO_URI, serverSelectionTimeoutMS=8000)
+    db = client[DB_NAME]
+    client.admin.command("ping")
+    inv, by_email, by_name = load_investigators(db)
+    print(f"CZ+SK investigators in DB: {len(inv)}")
+
+    # --- response documents + soft-link ---
+    docs = []
+    link_rows = []
+    for r in data:
+        doc = build_response(cols, get, r, source_file)
+        oid, how, matched = soft_link(doc, by_email, by_name)
+        doc["investigator_oid"] = oid
+        doc["investigator_match"] = how
+        doc["content_sha256"] = content_hash(doc)
+        docs.append(doc)
+        link_rows.append((doc, how, matched))
+
+    # --- delta against the DB ---
+    existing = {d["_id"]: d for d in db[COL_R].find({}, {"content_sha256": 1})}
+    to_insert = [d for d in docs if d["_id"] not in existing]
+    to_update, unchanged = [], []
+    for d in docs:
+        if d["_id"] in existing:
+            if existing[d["_id"]].get("content_sha256") != d["content_sha256"]:
+                to_update.append(d)
+            else:
+                unchanged.append(d)
+
+    # ===================== REPORT =====================
+    print("\n=== SOFT-LINK to investigators ===")
+    matched_k7 = matched_other = unmatched = 0
+    for doc, how, m in link_rows:
+        krok = (m or {}).get("KROK", "")
+        tag = "✓" if m else "✗"
+        if m and str(krok).startswith("7"):
+            matched_k7 += 1
+        elif m:
+            matched_other += 1
+        else:
+            unmatched += 1
+        # site_country may be stored as None (e.g. with --scope all), so fall back before slicing
+        print(f"  {tag} {(doc.get('site_country') or '?')[:2]} {str(doc.get('pi_last_name'))[:18]:18} "
+              f"{str(doc.get('pi_email'))[:32]:32} -> {how[:40]:40} {('KROK '+str(krok)) if m else ''}")
+    print(f"  Summary: matched KROK7={matched_k7}, other KROK={matched_other}, unmatched={unmatched}")
+
+    print("\n=== DELTA ===")
+    print(f"  INSERT (new):      {len(to_insert)}")
+    print(f"  UPDATE (changed):  {len(to_update)}")
+    print(f"  unchanged:         {len(unchanged)}")
+
+    # sample of 1 document
+    if docs:
+        s = dict(docs[0])
+        s["answers"] = {k: s["answers"][k] for k in list(s["answers"])[:6]}
+        rest = len(docs[0]["answers"]) - 6
+        if rest > 0:  # avoid a negative count when there are fewer than 6 answers
+            s["answers"]["…"] = f"(+{rest} more)"
+        print("\n=== SAMPLE response document (truncated) ===")
+        print(json.dumps(s, ensure_ascii=False, indent=2, default=str)[:1800])
+
+    if dry:
+        print("\n[DRY-RUN] Nothing was written. For a live run add --apply")
+        client.close()
+        return
+
+    # ===================== WRITE =====================
+    # 1) question dictionary (idempotent upsert)
+    nq = 0
+    for q in questions:
+        db[COL_Q].replace_one({"_id": q["_id"]}, q, upsert=True)
+        nq += 1
+    print(f"\n[APPLY] sipiq_questions: upserted {nq}")
+
+    # 2) responses (delta)
+    ts = now_iso()
+    ni = nu = ns = 0
+    for d in docs:
+        cur = db[COL_R].find_one({"_id": d["_id"]})
+        if cur is None:
+            d["first_imported_at"] = ts
+            d["last_seen_at"] = ts
+            d["last_updated_at"] = ts
+            d["history"] = []
+            db[COL_R].insert_one(d)
+            ni += 1
+        elif cur.get("content_sha256") != d["content_sha256"]:
+            changes = diff_docs(cur, d)
+            db[COL_R].update_one({"_id": d["_id"]}, {
+                "$set": {**{k: d[k] for k in d if k not in ("_id",)},
+                         "last_seen_at": ts, "last_updated_at": ts},
+                "$push": {"history": {"changed_at": ts, "source_file": source_file, "changes": changes}},
+            })
+            nu += 1
+        else:
+            db[COL_R].update_one({"_id": d["_id"]},
+                                 {"$set": {"last_seen_at": ts, "source_file": source_file}})
+            ns += 1
+    print(f"[APPLY] sipiq_responses: insert={ni}, update={nu}, unchanged={ns}")
+    client.close()
+
+
+def diff_docs(old, new):
+    """Field-level diff for history (answers + meta + selected promoted fields only)."""
+    changes = []
+    def walk(prefix, o, n):
+        keys = set((o or {}).keys()) | set((n or {}).keys())
+        for k in sorted(keys):
+            ov, nv = (o or {}).get(k), (n or {}).get(k)
+            if isinstance(ov, dict) or isinstance(nv, dict):
+                walk(f"{prefix}{k}.", ov or {}, nv or {})
+            elif ov != nv:
+                changes.append({"key": f"{prefix}{k}", "old": ov, "new": nv})
+    for field in ("answers", "meta"):
+        walk(f"{field}.", old.get(field, {}), new.get(field, {}))
+    for k in ("site_name", "pi_email", "pi_last_name", "interested", "is_full_sipiq"):
+        if old.get(k) != new.get(k):
+            changes.append({"key": k, "old": old.get(k), "new": new.get(k)})
+    return changes
+
+
+if __name__ == "__main__":
+    main()
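The update branch in `main()` passes every rebuilt field to `$set`. If a narrower write is preferred, the `changes` list that `diff_docs()` already produces could drive the update instead. The sketch below is one possible approach, not the author's implementation: `changes_to_update` is a hypothetical helper, and it assumes that a vanished answer should be removed with `$unset` (since `build_response()` omits empty answers), while other fields that became empty are stored as `None`.

```python
def changes_to_update(changes):
    """Turn [{key, old, new}, ...] (the shape produced by diff_docs) into a MongoDB update document."""
    set_ops, unset_ops = {}, {}
    for ch in changes:
        key, new = ch["key"], ch["new"]
        if new is None and key.startswith("answers."):
            # the answer disappeared from the export: remove the key entirely
            unset_ops[key] = ""
        else:
            # dotted path, so only this one field is rewritten
            set_ops[key] = new
    update = {}
    if set_ops:
        update["$set"] = set_ops
    if unset_ops:
        update["$unset"] = unset_ops
    return update


# Hypothetical change list
example = [
    {"key": "answers.Q37_1", "old": "10", "new": "12"},
    {"key": "answers.Q33", "old": "Yes", "new": None},
]
print(changes_to_update(example))
```

Because `diff_docs()` tracks only `answers`, `meta`, and a handful of promoted fields, an update built this way would miss changes in other fields (for example `site_city`); the bookkeeping fields such as `content_sha256` and `last_updated_at` would also still have to be added to `$set` by the caller.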
@@ -0,0 +1,47 @@
+# sipiq_import_v1.1: import of SIPIQ responses into MongoDB (folder workflow)
+
+**Version:** 1.1 · **Date:** 2026-06-17 · **Study:** 77242113UCO3002 (ICONIC / DAWN)
+
+## Changes from v1.0
+- **FOLDER WORKFLOW** (`--folder`): picks up every `*.csv` in the folder, imports it (delta),
+  and after successful processing **moves the file to the `Zpracováno` subfolder**.
+  Default folder = `U:\PythonProject\Janssen\Feasibility\77242113UCO2001\ImportSIPIQcompled`.
+  Incoming/Processed pattern (as in IWRS / Panorama). The old v1.0 → `Feasibility\TRASH`.
+
+## Purpose and collections
+(same as v1.0) Import of the Qualtrics CSV export into db `feasibility`:
+- `sipiq_questions`: questionnaire dictionary (reconstruction of the SIPIQ as in the PDF).
+- `sipiq_responses`: 1 doc = 1 response (`_id`=ResponseId), flat `answers{}`,
+  soft-link `investigator_oid`, delta + `history[]`.
+
+Source = CSV (row 1 Qcode, row 2 question text, row 3 ImportId=QID). The XML does not contain the question text.
+
+## Delta import (overwrites ONLY changed data)
+new→INSERT; unchanged (same `content_sha256`)→only `last_seen_at`;
+changed→`$set` of only the changed fields + `$push` to `history[]`.
+
+## Soft-link to investigators (non-destructive)
+pi_email → email/email2 (lowercase), then recipient_email, fallback: surname (without diacritics) + country.
+
+## Usage
+```
+# folder mode (default folder): processes everything and moves it to Zpracováno
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.1.py --dry-run
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.1.py --apply
+# a different folder
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.1.py --folder "<path>" --apply
+# a single file (is NOT moved)
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.1.py --csv "<path.csv>" --apply
+```
+`--scope czsk` (default) / `all`. Default = dry-run, live run = `--apply`.
+The move to `Zpracováno` happens ONLY with `--apply` and ONLY in folder mode (not with `--csv`).
+Name collision in Zpracováno → suffix `_N`.
+
+## Workflow (agreed 17JUN2026)
+The user drops complete SIPIQ reports (Qualtrics CSV) into `ImportSIPIQcompled\`.
+After processing, the script moves the file to `ImportSIPIQcompled\Zpracováno\`. The delta logic ensures
+that a repeated or extended export only adds new/changed responses (the rest stays unchanged).
+
+## Status 17JUN2026
+Folder + Zpracováno are ready. The initial import (15 CZ+SK from the 06.06 export) was still done with v1.0:
+`sipiq_questions`: 56, `sipiq_responses`: 15, soft-link 15/15 via e-mail = KROK 7.
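The move into `Zpracováno` with an `_N` suffix on name collisions, as described in the v1.1 README, might look roughly like the sketch below. The v1.1 code that performs the move is not visible in this part of the diff, so the helper name `move_to_processed`, its parameters, and the choice to start numbering at `_1` are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def move_to_processed(csv_path, processed_name="Zpracováno"):
    """Move csv_path into a sibling subfolder and return the final destination path."""
    src = Path(csv_path)
    dest_dir = src.parent / processed_name
    dest_dir.mkdir(exist_ok=True)

    dest = dest_dir / src.name
    n = 1
    # on a name collision append _1, _2, ... before the extension (assumed numbering)
    while dest.exists():
        dest = dest_dir / f"{src.stem}_{n}{src.suffix}"
        n += 1

    shutil.move(str(src), str(dest))
    return dest
```

In a folder run this would be called once per CSV, only after that file's import succeeded and only with `--apply`, so that a dry-run or a failed import leaves the file in the incoming folder.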
@@ -0,0 +1,480 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+sipiq_import_v1.1.py
+====================
+Version: 1.1
+Date:    2026-06-17
+Author:  Claude Code (for MUDr. Vladimír Buzalka)
+
+Changes from v1.0
+-----------------
+- FOLDER WORKFLOW: the --folder mode picks up every *.csv in the folder, imports it (delta),
+  and after successful processing moves the file to the `Zpracováno` subfolder. Default folder =
+  U:\\PythonProject\\Janssen\\Feasibility\\77242113UCO2001\\ImportSIPIQcompled.
+  (Incoming/Processed pattern as in IWRS / Panorama.) The old v1.0 is kept in TRASH.
+
+Description
+-----------
+Import of SIPIQ responses (Qualtrics CSV export, study 77242113UCO3002 / ICONIC DAWN)
+into the MongoDB db `feasibility`. Two collections:
+  * sipiq_questions  – questionnaire dictionary (1 doc = 1 logical question).
+  * sipiq_responses  – 1 doc = 1 response (_id = Qualtrics ResponseId), flat answers{},
+                       soft-link investigator_oid, delta bookkeeping + history[].
+
+DELTA import (overwrites ONLY changed data): new->insert; unchanged->only last_seen_at;
+changed->$set of only the changed fields + push to history[].
+
+Usage
+-----
+  # folder mode (default folder): processes everything and moves it to Zpracováno
+  python sipiq_import_v1.1.py --dry-run
+  python sipiq_import_v1.1.py --apply
+  # a specific folder
+  python sipiq_import_v1.1.py --folder "<path>" --apply
+  # a single file (is NOT moved)
+  python sipiq_import_v1.1.py --csv "<path.csv>" --apply
+
+Dependencies: pymongo (.venv). Mongo 192.168.1.76:27017, no auth.
+"""
+import argparse
+import csv
+import glob
+import hashlib
+import json
+import os
+import re
+import shutil
+import sys
+import unicodedata
+from datetime import datetime, timezone
+
+try:
+    from pymongo import MongoClient
+except ImportError:
+    print("ERROR: pymongo is not installed in the current Python.", file=sys.stderr)
+    raise
+
+MONGO_URI = "mongodb://192.168.1.76:27017"
+DB_NAME = "feasibility"
+COL_Q = "sipiq_questions"
+COL_R = "sipiq_responses"
+DEFAULT_FOLDER = r"U:\PythonProject\Janssen\Feasibility\77242113UCO2001\ImportSIPIQcompled"
+PROCESSED_SUBDIR = "Zpracováno"
+
+META_COLS = {
+    "StartDate", "EndDate", "Status", "IPAddress", "Progress", "Duration (in seconds)",
+    "Finished", "RecordedDate", "ResponseId", "RecipientLastName", "RecipientFirstName",
+    "RecipientEmail", "ExternalReference", "LocationLatitude", "LocationLongitude",
+    "DistributionChannel", "UserLanguage",
+}
+
+PROMOTE = [
+    "site_name", "site_address", "site_city", "site_state", "site_postcode", "site_country",
+    "pi_first_name", "pi_last_name", "pi_phone", "pi_email",
+    "sdl_site_id", "fire_site_id", "fire_investigator_id", "mailinglist_id",
+    "survey_generated_by", "Date", "Time",
+]
+
+SECTION_BY_QNUM = {}
+def _sec(rng, name):
+    for n in rng:
+        SECTION_BY_QNUM[n] = name
+_sec([2], "J&J Internal Assessment")
+_sec([6, 7, 8, 9, 10, 11, 12, 13], "Contact Information")
+_sec(range(14, 22), "Confidentiality Statement")
+_sec([25, 26, 27], "Interest")
+_sec([29, 30, 31, 32, 33, 34], "Protocol Requirements")
+_sec([36, 37, 38], "Enrollment")
+_sec([40, 41, 42, 43], "Patient Demographics Overview")
+_sec([45, 46, 47, 48, 49], "Site Overview")
+_sec([51], "Operational Considerations")
+_sec([53, 54], "Comments")
+_sec([57, 58, 59, 60, 61], "Patient Population")
+_sec([63, 64, 65, 66, 67], "Site Experience and Staffing")
+_sec([69], "Equipment and Facility Requirements")
+_sec([71, 72, 73, 74, 75], "Institutional Review Board, Ethics Committee, and Contracts")
+
+STEM_OVERRIDE = {
+    "Q31": "At your site, at what line(s) of treatment do you most commonly prescribe "
+           "vedolizumab for patients with moderately to severely active ulcerative colitis?",
+    "Q63": "Do you or your site staff have experience in performing the following types of "
+           "study assessments/procedures?",
+    "Q64": "The following personnel are required to run the study. "
+           "Will your site have the following available?",
+    "Q69": "The following equipment and facilities are required to run the studies. "
+           "Are these available at your site?",
+}
+
+
+def now_iso():
+    return datetime.now(timezone.utc).astimezone().isoformat(timespec="seconds")
+
+
+def strip_accents(s):
+    if not s:
+        return ""
+    return "".join(c for c in unicodedata.normalize("NFKD", s) if not unicodedata.combining(c))
+
+
+def norm_name(s):
+    return re.sub(r"\s+", " ", strip_accents(s or "").lower()).strip()
+
+
+def sanitize_key(qcode):
+    return qcode.replace("#", "_").replace(".", "_")
+
+
+def qnum(qcode):
+    m = re.match(r"Q(\d+)", qcode)
+    return int(m.group(1)) if m else None
+
+
+def qbase(qcode):
+    m = re.match(r"(Q\d+)", qcode)
+    return m.group(1) if m else qcode
+
+
+def import_id(h3_cell):
+    try:
+        return json.loads(h3_cell).get("ImportId", "")
+    except Exception:
+        return h3_cell
+
+
+def split_text(text):
+    parts = [p.strip() for p in re.split(r"\s+-\s+", text)]
+    stem = parts[0]
+    if len(parts) == 1:
+        return stem, None
+    label_parts = [p for p in parts[1:] if p.lower() != "selected choice"]
+    label_parts = [p for p in label_parts if not re.fullmatch(r"Q\d+#\d+", p)]
+    return stem, (" - ".join(label_parts) if label_parts else None)
+
+
+def detect_type(qcode, observed):
+    has_hash = "#" in qcode
+    vals = [v for v in observed if v]
+    yesno = vals and all(v in ("Yes", "No") for v in vals)
+    numeric = vals and all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in vals)
+    if has_hash and yesno:
+        return "matrix_yesno"
+    if has_hash and numeric:
+        return "matrix_percent"
+    if has_hash:
+        return "matrix"
+    if numeric:
+        return "numeric"
+    if yesno:
+        return "yesno"
+    return "single_or_text"
+
+
+def load_csv(path):
+    with open(path, encoding="utf-8-sig", newline="") as fh:
+        rows = list(csv.reader(fh))
+    h1, h2, h3 = rows[0], rows[1], rows[2]
+    data = rows[3:]
+    cols = [{"i": i, "code": c, "text": t, "qid": import_id(j)}
+            for i, (c, t, j) in enumerate(zip(h1, h2, h3))]
+    return cols, data
+
+
+def col_getter(cols, data):
+    idx = {c["code"]: c["i"] for c in cols}
+    def get(row, code):
+        i = idx.get(code)
+        return (row[i].strip() if i is not None and i < len(row) else "")
+    return get, idx
+
+
+def is_question_col(code):
+    return bool(re.match(r"Q\d", code))
+
+
+def build_questions(cols, data):
+    qcols = [c for c in cols if is_question_col(c["code"])]
+    observed = {c["code"]: set() for c in qcols}
+    for row in data:
+        for c in qcols:
+            v = (row[c["i"]].strip() if c["i"] < len(row) else "")
+            if v:
+                observed[c["code"]].add(v)
+    groups, order_seen = {}, []
+    for c in qcols:
+        base = qbase(c["code"])
+        if base not in groups:
+            groups[base] = {"_id": base, "order": c["i"], "qnum": qnum(c["code"]),
+                            "section": SECTION_BY_QNUM.get(qnum(c["code"]), "Other"),
+                            "qids": [], "text": split_text(c["text"])[0],
+                            "items": [], "_obs": set(), "_types": []}
+            order_seen.append(base)
+        g = groups[base]
+        bq = re.match(r"(QID\d+)", c["qid"] or "")
+        if bq and bq.group(1) not in g["qids"]:
+            g["qids"].append(bq.group(1))
+        _, label = split_text(c["text"])
+        item = {"key": sanitize_key(c["code"]), "qcode": c["code"], "qid": c["qid"]}
+        if label:
+            item["label"] = label
+        g["items"].append(item)
+        g["_obs"] |= observed[c["code"]]
+        g["_types"].append(detect_type(c["code"], observed[c["code"]]))
+    out = []
+    for n, base in enumerate(order_seen):
+        g = groups[base]
+        obs = sorted(g.pop("_obs"))
+        types = g.pop("_types")
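+        # Majority vote; dict.fromkeys keeps first-seen order, so ties resolve deterministically.
+        gtype = max(dict.fromkeys(types), key=types.count) if types else "single_or_text"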
+        g["type"] = gtype
+        if gtype in ("yesno", "matrix_yesno"):
+            g["options"] = ["Yes", "No"]
+        elif gtype == "single_or_text" and obs and len(obs) <= 12:
+            g["options"] = obs
+        else:
+            g["options"] = []
+        if base in STEM_OVERRIDE:
+            g["text"] = STEM_OVERRIDE[base]
+        g["order"] = n
+        if len(g["items"]) == 1 and "label" not in g["items"][0]:
+            g["items"] = []
+        out.append(g)
+    return out
+
+
+def build_response(cols, get, row, source_file):
+    rid = get(row, "ResponseId")
+    answers = {}
+    for c in cols:
+        if is_question_col(c["code"]):
+            v = (row[c["i"]].strip() if c["i"] < len(row) else "")
+            if v:
+                answers[sanitize_key(c["code"])] = v
+    meta = {
+        "start_date": get(row, "StartDate") or None,
+        "end_date": get(row, "EndDate") or None,
+        "recorded_date": get(row, "RecordedDate") or None,
+        "status": get(row, "Status") or None,
+        "progress": int(get(row, "Progress")) if get(row, "Progress").isdigit() else (get(row, "Progress") or None),
+        "finished": get(row, "Finished") in ("True", "1", "TRUE"),
+        "duration_sec": int(get(row, "Duration (in seconds)")) if get(row, "Duration (in seconds)").isdigit() else None,
+        "user_language": get(row, "UserLanguage") or None,
+        "distribution_channel": get(row, "DistributionChannel") or None,
+        "ip_address": get(row, "IPAddress") or None,
+        "location_lat": get(row, "LocationLatitude") or None,
+        "location_lng": get(row, "LocationLongitude") or None,
+        "survey_date": get(row, "Date") or None,
+        "survey_time": get(row, "Time") or None,
+    }
+    doc = {
+        "_id": rid, "study": "77242113UCO3002",
+        "site_country": get(row, "site_country") or None,
+        "site_name": get(row, "site_name") or None,
+        "site_city": get(row, "site_city") or None,
+        "site_state": get(row, "site_state") or None,
+        "site_postcode": get(row, "site_postcode") or None,
+        "site_address": get(row, "site_address") or None,
+        "pi_first_name": get(row, "pi_first_name") or None,
+        "pi_last_name": get(row, "pi_last_name") or None,
+        "pi_email": (get(row, "pi_email") or "").lower() or None,
+        "pi_phone": get(row, "pi_phone") or None,
+        "sdl_site_id": get(row, "sdl_site_id") or None,
+        "fire_site_id": get(row, "fire_site_id") or None,
+        "fire_investigator_id": get(row, "fire_investigator_id") or None,
+        "mailinglist_id": get(row, "mailinglist_id") or None,
+        "survey_generated_by": get(row, "survey_generated_by") or None,
+        "recipient_email": (get(row, "RecipientEmail") or "").lower() or None,
+        "recipient_last_name": get(row, "RecipientLastName") or None,
+        "recipient_first_name": get(row, "RecipientFirstName") or None,
+        "meta": meta,
+        "is_full_sipiq": any(k.startswith(("Q57", "Q58", "Q59", "Q63", "Q66", "Q71")) for k in answers),
+        "interested": answers.get("Q25"),
+        "answers": answers,
+        "investigator_oid": None, "investigator_match": None,
+        "source_file": source_file,
+    }
+    return doc
+
+
+def content_hash(doc):
+    payload = {k: doc[k] for k in doc if k not in
+               ("content_sha256", "first_imported_at", "last_seen_at", "last_updated_at",
+                "history", "investigator_oid", "investigator_match", "source_file")}
+    return hashlib.sha256(json.dumps(payload, sort_keys=True, ensure_ascii=False,
+                                     default=str).encode("utf-8")).hexdigest()
+
+
+def load_investigators(db):
+    inv = list(db.investigators.find(
+        {"zeme": {"$in": ["Czech Republic", "Slovakia"]}},
+        {"prijmeni": 1, "jmeno": 1, "email": 1, "email2": 1, "zeme": 1, "KROK": 1}))
+    by_email, by_name = {}, {}
+    for d in inv:
+        for ef in ("email", "email2"):
+            e = (d.get(ef) or "").lower().strip()
+            if e:
+                by_email.setdefault(e, d)
+        nm = norm_name(d.get("prijmeni"))
+        if nm:
+            by_name.setdefault((nm, d.get("zeme")), []).append(d)
+    return inv, by_email, by_name
+
+
+def soft_link(doc, by_email, by_name):
+    e = (doc.get("pi_email") or "").lower().strip()
+    if e and e in by_email:
+        d = by_email[e]; return d["_id"], f"email:{e}", d
+    e2 = (doc.get("recipient_email") or "").lower().strip()
+    if e2 and e2 in by_email:
+        d = by_email[e2]; return d["_id"], f"recipient_email:{e2}", d
+    nm = norm_name(doc.get("pi_last_name"))
+    cand = by_name.get((nm, doc.get("site_country")), [])
+    if len(cand) == 1:
+        return cand[0]["_id"], f"prijmeni:{nm}", cand[0]
+    if len(cand) > 1:
+        return None, f"prijmeni_ambiguous:{nm}({len(cand)})", None
+    return None, "NENALEZENO", None
+
+
+def diff_docs(old, new):
+    changes = []
+    def walk(prefix, o, n):
+        for k in sorted(set((o or {}).keys()) | set((n or {}).keys())):
+            ov, nv = (o or {}).get(k), (n or {}).get(k)
+            if isinstance(ov, dict) or isinstance(nv, dict):
+                walk(f"{prefix}{k}.", ov or {}, nv or {})
+            elif ov != nv:
+                changes.append({"key": f"{prefix}{k}", "old": ov, "new": nv})
+    for field in ("answers", "meta"):
+        walk(f"{field}.", old.get(field, {}), new.get(field, {}))
+    for k in ("site_name", "pi_email", "pi_last_name", "interested", "is_full_sipiq"):
+        if old.get(k) != new.get(k):
+            changes.append({"key": k, "old": old.get(k), "new": new.get(k)})
+    return changes
+
+
+# ---------------------------------------------------------------------------
+def process_file(db, csv_path, scope, dry, by_email, by_name):
+    source_file = os.path.basename(csv_path)
+    cols, data_all = load_csv(csv_path)
+    get, _ = col_getter(cols, data_all)
+    # The scope filter applies to responses only; the question dictionary below
+    # is always built from the full, unfiltered CSV.
+    data = data_all
+    if scope == "czsk":
+        data = [r for r in data_all if get(r, "site_country") in ("Czech Republic", "Slovakia")]
+    print(f"\n########## {source_file}  (scope={scope}, responses={len(data)}) ##########")
+
+    questions = build_questions(cols, data_all)
+
+    docs, link_rows = [], []
+    for r in data:
+        doc = build_response(cols, get, r, source_file)
+        oid, how, matched = soft_link(doc, by_email, by_name)
+        doc["investigator_oid"] = oid
+        doc["investigator_match"] = how
+        doc["content_sha256"] = content_hash(doc)
+        docs.append(doc)
+        link_rows.append((doc, how, matched))
+
+    existing = {d["_id"]: d for d in db[COL_R].find({}, {"content_sha256": 1})}
+    to_insert = [d for d in docs if d["_id"] not in existing]
+    to_update = [d for d in docs if d["_id"] in existing and existing[d["_id"]].get("content_sha256") != d["content_sha256"]]
+    unchanged = [d for d in docs if d["_id"] in existing and existing[d["_id"]].get("content_sha256") == d["content_sha256"]]
+
+    mk7 = mko = un = 0
+    for doc, how, m in link_rows:
+        krok = (m or {}).get("KROK", "")
+        if m and str(krok).startswith("7"): mk7 += 1
+        elif m: mko += 1
+        else: un += 1
+    print(f"  dictionary: {len(questions)} questions | soft-link: KROK7={mk7}, other={mko}, unmatched={un}")
+    print(f"  delta: INSERT={len(to_insert)}, UPDATE={len(to_update)}, unchanged={len(unchanged)}")
+    if un:
+        for doc, how, m in link_rows:
+            if not m:
+                print(f"    ✗ UNMATCHED: {doc.get('pi_last_name')} / {doc.get('pi_email')} ({how})")
+
+    if dry:
+        print("  [DRY-RUN] nothing written")
+        return {"insert": 0, "update": 0, "unchanged": 0, "wrote": False}
+
+    for q in questions:
+        db[COL_Q].replace_one({"_id": q["_id"]}, q, upsert=True)
+    ts = now_iso()
+    ni = nu = ns = 0
+    for d in docs:
+        cur = db[COL_R].find_one({"_id": d["_id"]})
+        if cur is None:
+            d.update({"first_imported_at": ts, "last_seen_at": ts, "last_updated_at": ts, "history": []})
+            db[COL_R].insert_one(d); ni += 1
+        elif cur.get("content_sha256") != d["content_sha256"]:
+            changes = diff_docs(cur, d)
+            db[COL_R].update_one({"_id": d["_id"]}, {
+                "$set": {**{k: d[k] for k in d if k != "_id"}, "last_seen_at": ts, "last_updated_at": ts},
+                "$push": {"history": {"changed_at": ts, "source_file": source_file, "changes": changes}}})
+            nu += 1
+        else:
+            db[COL_R].update_one({"_id": d["_id"]}, {"$set": {"last_seen_at": ts, "source_file": source_file}})
+            ns += 1
+    print(f"  [APPLY] questions upsert={len(questions)} | responses insert={ni}, update={nu}, unchanged={ns}")
+    return {"insert": ni, "update": nu, "unchanged": ns, "wrote": True}
+
+
+def move_to_processed(csv_path, folder):
+    dest_dir = os.path.join(folder, PROCESSED_SUBDIR)
+    os.makedirs(dest_dir, exist_ok=True)
+    base = os.path.basename(csv_path)
+    dest = os.path.join(dest_dir, base)
+    if os.path.exists(dest):  # name collision -> append an _N suffix
+        stem, ext = os.path.splitext(base)
+        n = 1
+        while os.path.exists(os.path.join(dest_dir, f"{stem}_{n}{ext}")):
+            n += 1
+        dest = os.path.join(dest_dir, f"{stem}_{n}{ext}")
+    shutil.move(csv_path, dest)
+    print(f"  -> moved to {PROCESSED_SUBDIR}\\{os.path.basename(dest)}")
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--csv", help="single file (is NOT moved after import)")
+    ap.add_argument("--folder", default=DEFAULT_FOLDER, help="folder with SIPIQ CSVs (imported files are moved to Zpracováno)")
+    ap.add_argument("--scope", choices=["czsk", "all"], default="czsk")
+    ap.add_argument("--apply", action="store_true", help="write to MongoDB; without it the run is a dry run")
+    ap.add_argument("--dry-run", action="store_true", help="accepted for readability; a dry run is already the default")
+    args = ap.parse_args()
+    dry = not args.apply
+
+    if args.csv:
+        files, move_mode, folder = [args.csv], False, None
+    else:
+        folder = args.folder
+        files = sorted(glob.glob(os.path.join(folder, "*.csv")))
+        move_mode = True
+        print(f"Folder: {folder}\nCSV files found: {len(files)}")
+        if not files:
+            print("Nothing to process (no *.csv).")
+            return
+
+    client = MongoClient(MONGO_URI, serverSelectionTimeoutMS=8000)
+    db = client[DB_NAME]
+    client.admin.command("ping")
+    inv, by_email, by_name = load_investigators(db)
+    print(f"CZ+SK investigators in DB: {len(inv)}")
+
+    total = {"insert": 0, "update": 0, "unchanged": 0}
+    for f in files:
+        res = process_file(db, f, args.scope, dry, by_email, by_name)
+        for k in total:
+            total[k] += res[k]
+        if move_mode and res["wrote"]:
+            move_to_processed(f, folder)
+
+    print(f"\n=== TOTAL: insert={total['insert']}, update={total['update']}, unchanged={total['unchanged']} ===")
+    if dry:
+        print("[DRY-RUN] Nothing was written or moved. For a real run: --apply")
+    client.close()
+
+
+if __name__ == "__main__":
+    main()
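The import above stays idempotent because `content_hash` ignores bookkeeping fields, so re-importing the same export only refreshes `last_seen_at`. The following is a minimal standalone sketch of that change detection, assuming the same excluded-field list as in the script; the `classify` helper and the sample documents are hypothetical and exist only for illustration.

```python
import hashlib
import json

# Fields that differ between runs and must not influence the hash
# (the same list that content_hash excludes in the script).
VOLATILE = ("content_sha256", "first_imported_at", "last_seen_at", "last_updated_at",
            "history", "investigator_oid", "investigator_match", "source_file")


def content_hash(doc):
    """SHA-256 over the stable part of a response document."""
    payload = {k: v for k, v in doc.items() if k not in VOLATILE}
    raw = json.dumps(payload, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def classify(existing_hashes, doc):
    """Hypothetical helper: decide 'insert', 'update' or 'unchanged' for one document."""
    old = existing_hashes.get(doc["_id"])
    if old is None:
        return "insert"
    return "unchanged" if old == content_hash(doc) else "update"


# Made-up sample data: the same response seen in two different export files.
first = {"_id": "R_demo1", "answers": {"Q25": "Yes"}, "source_file": "export_a.csv"}
again = {"_id": "R_demo1", "answers": {"Q25": "Yes"}, "source_file": "export_b.csv"}
edited = {"_id": "R_demo1", "answers": {"Q25": "No"}, "source_file": "export_b.csv"}
stored = {first["_id"]: content_hash(first)}

print(classify({}, first), classify(stored, again), classify(stored, edited))
```

Sorting the keys before hashing matters here: without `sort_keys=True` two documents with the same content but a different key order would hash differently and trigger a spurious update.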
@@ -0,0 +1,32 @@
+# analyze_sent_suspects_v1.0.py
+
+**Version:** 1.0 · **Date:** 2026-06-16
+
+Local (Z230) analyzer for `.msg` files transferred from JNJ (the output of
+`jnj_scan_failed_sent`). Using **olefile**, it walks every `.msg` in a folder,
+extracts the key MAPI properties from each one, and classifies whether the
+e-mail was **not sent**. Output: a console summary plus a timestamped `.xlsx`.
+
+## Classification
+- **FAIL_BODY**: the body or report text contains "could not be sent", "SendAsDenied", …
+- **SENDAS_BUZ**: the send account, SentRepresenting, or Sender contains `buzalka.cz`
+- **NO_MSGID**: the Internet Message-ID (0x1035) is missing
+- `failed = ANO` ("yes") if FAIL_BODY or SENDAS_BUZ is set (almost certainly not sent); otherwise `?`.
+
+It also extracts the **physician recipient** (an external address, i.e. not `its.jnj.com`), the subject,
+the send account, and the Message-ID. The date is taken from the file name (`..._YYYY-MM-DD_...`).
+
+## Usage
+```
+python analyze_sent_suspects_v1.0.py [MSG_FOLDER]
+```
+- Without an argument it uses `INPUT_DIR` (default
+  `U:\Dropbox\!!!Days\Downloads Z230\sent_suspects`).
+- The `.xlsx` is saved to `U:\Dropbox\!!!Days\Downloads Z230\`.
+- Requires `olefile` and `openpyxl` (both are in the venv `U:\janssen\.venv`).
+
+## After the analysis (next step)
+The list of recipients with `failed=ANO` gives the physicians who **never received the initial offer**.
+A cross-reference against `feasibility.investigators` shows who needs the offer resent
+(and at which KROK), this time **with the correct From address `vbuzalka@its.jnj.com`**.
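The classification rules described in this README can be sketched as a pure function over already-extracted strings. This is only an illustration: `classify_message` is a hypothetical name, the marker list is shortened, and the real script reads these values from the `.msg` file through olefile.

```python
# Shortened marker list for the example; the script checks a few more phrases.
FAIL_SIGNS = ("could not be sent", "sendasdenied")


def classify_message(body, sender_fields, msgid):
    """Return (flags, failed) for one message.

    body          -- plain text of the body or delivery report
    sender_fields -- send account, SentRepresenting and Sender strings
    msgid         -- Internet Message-ID, empty if missing
    """
    flags = []
    if any(sign in body.lower() for sign in FAIL_SIGNS):
        flags.append("FAIL_BODY")
    if "buzalka.cz" in " ".join(sender_fields).lower():
        flags.append("SENDAS_BUZ")
    if not msgid:
        flags.append("NO_MSGID")
    # Only FAIL_BODY and SENDAS_BUZ count as a near-certain failure.
    failed = "ANO" if ("FAIL_BODY" in flags or "SENDAS_BUZ" in flags) else "?"
    return flags, failed


# Made-up inputs
print(classify_message("This message could not be sent.", ["user@its.jnj.com"], ""))
print(classify_message("Hello", ["user@its.jnj.com"], ""))
```

A missing Message-ID on its own leaves `failed` at `?`, which is why it is treated as a weak signal only.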
@@ -0,0 +1,196 @@
+# -*- coding: utf-8 -*-
+# =============================================================================
+# Name:    analyze_sent_suspects_v1.0.py
+# Version: 1.0
+# Date:    2026-06-16
+# Purpose: LOCAL (Z230) analyzer of .msg files transferred from JNJ (output of
+#          jnj_scan_failed_sent). Uses olefile to read the key MAPI properties
+#          of each .msg and classifies whether the e-mail was NOT SENT:
+#            FAIL_BODY  = body/report contains "could not be sent"/"SendAsDenied"
+#            SENDAS_BUZ = send account / sentrep / sender contains "buzalka.cz"
+#            NO_MSGID   = Internet Message-ID (0x1035) is missing
+#          Extracts the recipient (external = physician), subject, send account, Message-ID.
+#          Output: console summary + timestamped .xlsx.
+# Usage:   python analyze_sent_suspects_v1.0.py [MSG_FOLDER]
+#          (default INPUT_DIR below). Requires olefile + openpyxl.
+# =============================================================================
+
+import os
+import re
+import sys
+import glob
+import datetime
+import olefile
+import openpyxl
+
+INPUT_DIR = r"U:\Dropbox\!!!Days\Downloads Z230\sent_suspects"
+OUT_DIR = r"U:\Dropbox\!!!Days\Downloads Z230"
+
+FAIL_SIGNS = [
+    "could not be sent", "sendasdenied",
+    "permission to send the message on behalf",
+    "transportsend operation has failed", "mapiexceptionsendasdenied",
+]
+INTERNAL = ("its.jnj.com",)   # internal = not a physician (incl. cc Kocourkova/Bartosova)
+
+
+def rd(o, tag):
+    """Read the string stream __substg1.0_<tag> (tries 001F Unicode, then 001E ANSI)."""
+    for t in (tag, tag[:-1] + "F", tag[:-1] + "E"):
+        name = "__substg1.0_" + t
+        if o.exists(name):
+            b = o.openstream(name).read()
+            if t.endswith("001F"):
+                try:
+                    return b.decode("utf-16-le")
+                except Exception:
+                    pass
+            for enc in ("cp1250", "latin-1", "utf-8"):
+                try:
+                    return b.decode(enc)
+                except Exception:
+                    pass
+    return ""
+
+
+def read_body(o):
+    txt = rd(o, "1000001F")            # PR_BODY
+    if not txt:
+        txt = rd(o, "1001001F")        # ReportText
+    # PR_HTML (binary) as a fallback
+    if not txt and o.exists("__substg1.0_10130102"):
+        try:
+            txt = o.openstream("__substg1.0_10130102").read().decode("latin-1", "ignore")
+        except Exception:
+            pass
+    return txt or ""
+
+
+def recipients_smtp(o):
+    """Collect the SMTP address of every recipient from the __recip_version1.0_#xxxx storages."""
+    out = []
+    seen = set()
+    for entry in o.listdir():
+        # entry is a list of path segments; only the first segment (the recipient storage) matters
+        if entry and entry[0].startswith("__recip_version1.0_#") and len(entry) == 2:
+            top = entry[0]
+            if top in seen:
+                continue
+            seen.add(top)
+            smtp = ""
+            for tag in ("39FE001F", "39FE001E", "3003001F", "3003001E", "0C1F001F"):
+                nm = top + "/__substg1.0_" + tag
+                if o.exists(nm):
+                    b = o.openstream(nm).read()
+                    try:
+                        s = b.decode("utf-16-le") if tag.endswith("1F") else b.decode("cp1250")
+                    except Exception:
+                        s = b.decode("latin-1", "ignore")
+                    s = s.strip()
+                    if "@" in s:
+                        smtp = s
+                        break
+            if smtp:
+                out.append(smtp)
+    return out
+
+
+def analyze_file(path):
+    o = olefile.OleFileIO(path)
+    try:
+        subject = rd(o, "0037001F")
+        msgid = rd(o, "1035001F")
+        sendacct = rd(o, "0E28001F")
+        sentrep = rd(o, "0065001F")
+        sender = rd(o, "0C1F001F")
+        body = read_body(o)
+        recs = recipients_smtp(o)
+    finally:
+        o.close()
+
+    low = body.lower()
+    flags = []
+    if any(s in low for s in FAIL_SIGNS):
+        flags.append("FAIL_BODY")
+    joined = " ".join([sendacct, sentrep, sender]).lower()
+    if "buzalka.cz" in joined:
+        flags.append("SENDAS_BUZ")
+    if not msgid:
+        flags.append("NO_MSGID")
+
+    # physician recipient = external address (not its.jnj.com)
+    ext = [r for r in recs if not any(d in r.lower() for d in INTERNAL)]
+    recipient = ext[0] if ext else (recs[0] if recs else "")
+
+    # date from the file name (STRONG_YYYY-MM-DD_... / weak_YYYY-MM-DD_...)
+    m = re.search(r"(\d{4}-\d{2}-\d{2})", os.path.basename(path))
+    date = m.group(1) if m else ""
+
+    return {
+        "file": os.path.basename(path),
+        "date": date,
+        "recipient": recipient,
+        "subject": subject.strip(),
+        "msgid": msgid.strip(),
+        "send_account": sendacct.strip(),
+        "sentrep": sentrep.strip(),
+        "flags": "+".join(flags),
+        "failed": "ANO" if ("FAIL_BODY" in flags or "SENDAS_BUZ" in flags) else "?",
+    }
+
+
+def main():
+    indir = sys.argv[1] if len(sys.argv) > 1 else INPUT_DIR
+    files = sorted(glob.glob(os.path.join(indir, "*.msg")))
+    if not files:
+        print("No .msg files in:", indir)
+        return
+
+    rows = []
+    for f in files:
+        try:
+            rows.append(analyze_file(f))
+        except Exception as e:
+            rows.append({"file": os.path.basename(f), "date": "", "recipient": "",
+                         "subject": "<chyba cteni>", "msgid": "", "send_account": "",
+                         "sentrep": "", "flags": "ERR:" + str(e), "failed": "?"})
+
+    # sort: certain failures first, then by date
+    rows.sort(key=lambda r: (r["failed"] != "ANO", r["date"]))
+
+    n_fail = sum(1 for r in rows if r["failed"] == "ANO")
+    n_sendas = sum(1 for r in rows if "SENDAS_BUZ" in r["flags"])
+    n_failbody = sum(1 for r in rows if "FAIL_BODY" in r["flags"])
+    # NO_MSGID alone is only a weak signal, so count it separately from the certain failures
+    n_nomid_only = sum(1 for r in rows if r["failed"] != "ANO" and "NO_MSGID" in r["flags"])
+
+    print(f"Files: {len(rows)}")
+    print(f"  certainly failed (FAIL_BODY/SENDAS_BUZ): {n_fail}")
+    print(f"  of which SENDAS_BUZ (buzalka.cz): {n_sendas} | FAIL_BODY: {n_failbody}")
+    print(f"  NO_MSGID only (weak): {n_nomid_only}")
+    print("=" * 110)
+    print(f"{'date':10} {'recipient':32} {'fail':4} {'flags':22} subject")
+    print("-" * 110)
+    for r in rows:
+        print(f"{r['date']:10} {r['recipient'][:32]:32} {r['failed']:4} {r['flags']:22} {r['subject'][:40]}")
+
+    # xlsx
+    wb = openpyxl.Workbook()
+    ws = wb.active
+    ws.title = "suspects"
+    cols = ["file", "date", "recipient", "subject", "msgid", "send_account", "sentrep", "flags", "failed"]
+    from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE
+
+    def clean(v):
+        return ILLEGAL_CHARACTERS_RE.sub("", str(v)) if v is not None else ""
+
+    ws.append(cols)
+    for r in rows:
+        ws.append([clean(r[c]) for c in cols])
+    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+    out = os.path.join(OUT_DIR, f"sent_suspects_analyza_{stamp}.xlsx")
+    wb.save(out)
+    print("\nXLSX:", out)
+
+
+if __name__ == "__main__":
+    main()
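The `rd` helper above relies on the MAPI property-type suffix of the stream name: `001F` marks a UTF-16-LE string and `001E` an 8-bit string in the sender's code page. A minimal sketch of just the decoding step follows; it works on raw bytes, so it needs no `.msg` file. The function name `decode_mapi_string` is hypothetical, and cp1250 as the first 8-bit guess is taken from the script, not from the file format.

```python
def decode_mapi_string(raw, tag):
    """Decode the bytes of a __substg1.0_<tag> string stream.

    tag is the 8-hex-digit property tag, e.g. "0037001F" for a Unicode subject.
    """
    if tag.upper().endswith("001F"):
        # PT_UNICODE: always UTF-16 little endian
        return raw.decode("utf-16-le", errors="replace").rstrip("\x00")
    # PT_STRING8: the code page is not stored in the stream, so guess.
    for enc in ("cp1250", "latin-1"):
        try:
            return raw.decode(enc).rstrip("\x00")
        except UnicodeDecodeError:
            continue
    return ""


# Made-up sample bytes
print(decode_mapi_string("Dotaz".encode("utf-16-le"), "0037001F"))
print(decode_mapi_string("Příloha".encode("cp1250"), "0037001E"))
```

Since latin-1 maps every byte to a character, the loop always returns a string for 8-bit streams; the result may still be mis-decoded if the message used a different code page.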
@@ -0,0 +1,51 @@
+# doplnujici_dotazy_v1.0: tracking of follow-up queries to sites
+
+**Version:** 1.0 · **Date:** 2026-06-17 · **Study:** 77242113UCO3002 (ICONIC / DAWN)
+
+## Purpose
+When an answer is missing in the SIPIQ and the questionnaire can NO LONGER be reopened, we ask the site separately.
+The collection `feasibility.doplnujici_dotazy` records **which site and which question** a query
+belongs to and what state it is in. It is related to `sipiq_responses` / `sipiq_questions` (see sipiq_import).
+
+## Model (agreed 17JUN2026)
+- **1 document = one query EVENT** (it may carry several questions in `questions[]`).
+- When the site replies, the answer is **projected into `sipiq_responses.answers_supplement{}`**
+  (`{value, source:"doplneno", doplnujici_dotaz_id, answered_at, answer_source}`); the original
+  Qualtrics `answers` are **NOT CHANGED**. Analysis or reconstruction can then overlay `answers_supplement` on top of `answers`.
+
+## Document structure
+```jsonc
+{
+  "_id": ObjectId,
+  "response_id": "R_…",            // ref sipiq_responses._id
+  "investigator_oid": ObjectId,    // ref investigators
+  "pi_last_name","site_name","site_country","pi_email",   // denormalized identity
+  "status": "open",                // open → asked → answered → closed / no_response
+  "asked_at": null, "asked_via": null, "reason": "…", "note": null,
+  "questions": [
+    {"qcode":"Q72_1","question_base":"Q72","question_text":"…","section":"…",
+     "answer":null,"answered_at":null,"answer_source":null,"status":"open"}
+  ],
+  "created_at":"…","updated_at":"…","history":[]
+}
+```
+Indexes: `investigator_oid`, `response_id`, `status`, `questions.qcode`, `questions.status`.
+
+## Commands
+```
+.venv\Scripts\python.exe Feasibility\doplnujici_dotazy_v1.0.py ensure
+.venv\Scripts\python.exe Feasibility\doplnujici_dotazy_v1.0.py add --center <email|prijmeni|R_id> [--country CZ|SK] \
+        --qcodes Q72_1,Q73_1 [--reason "…"] [--asked-via "…"] [--status asked] [--note "…"] [--apply]
+.venv\Scripts\python.exe Feasibility\doplnujici_dotazy_v1.0.py answer --id <dotaz_id> --qcode Q72_1 \
+        --answer "8" [--source "email 18JUN2026"] [--apply]
+.venv\Scripts\python.exe Feasibility\doplnujici_dotazy_v1.0.py list [--center …] [--open]
+```
+- `add`/`answer` default to a **dry run**; pass `--apply` for a real run.
+- `add` looks up the site in `sipiq_responses` (R_id / pi_email / last name + country) and the question text and section
+  in `sipiq_questions` (the qcode may be a leaf, e.g. Q72_1 → text of base Q72 + the item label).
+- `answer` records the answer to a question, recomputes the event status (`answered` only once every question is answered),
+  and projects the answer into `sipiq_responses.answers_supplement`.
+
+## Status 17JUN2026
+Collection and indexes created (`ensure`), 0 documents so far. Dry-run `add` verified (Svoboda, Q72_1+Q73_1).
+Mongo 192.168.1.76:27017, no auth, pymongo.
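The model above keeps the Qualtrics `answers` untouched and stores follow-up answers in `answers_supplement`, so a reader has to overlay the two. A minimal sketch of that overlay and of the event-status rule, using plain dictionaries instead of MongoDB; `effective_answers` and `event_status` are hypothetical helper names and the sample document is made up.

```python
def effective_answers(response):
    """Overlay supplementary answers on the original Qualtrics answers."""
    merged = dict(response.get("answers") or {})
    for key, supp in (response.get("answers_supplement") or {}).items():
        merged[key] = supp["value"]
    return merged


def event_status(questions):
    """'answered' only when every question is answered, otherwise 'asked'."""
    return "answered" if all(q["status"] == "answered" for q in questions) else "asked"


# Made-up response: Q72_1 was missing in the SIPIQ and supplied later by e-mail.
response = {
    "_id": "R_demo1",
    "answers": {"Q25": "Yes"},
    "answers_supplement": {
        "Q72_1": {"value": "8", "source": "doplneno", "answer_source": "email 18JUN2026"},
    },
}
event_questions = [
    {"qcode": "Q72_1", "status": "answered"},
    {"qcode": "Q73_1", "status": "open"},
]

print(effective_answers(response))
print(event_status(event_questions))
```

Because the overlay copies `answers` first, the stored document is never modified, and an analysis can still tell original from supplemented values by checking which keys appear in `answers_supplement`.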
@@ -0,0 +1,254 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+doplnujici_dotazy_v1.0.py
+=========================
+Version: 1.0
+Date:    2026-06-17
+Author:  Claude Code (for MUDr. Vladimír Buzalka)
+
+Description
+-----------
+Manages the collection `feasibility.doplnujici_dotazy`, which tracks follow-up queries to sites
+when an answer is missing in the SIPIQ and the questionnaire can NO LONGER be reopened. It records
+which site (and which question) a query belongs to and what state it is in.
+
+Model (agreed 17JUN2026): **1 document = one query EVENT** (it may carry several questions in `questions[]`).
+When the site replies, the answer is also PROJECTED into `sipiq_responses.answers_supplement{}`
+(flagged source="doplneno"); the original Qualtrics `answers` are NOT CHANGED.
+
+Query life cycle: open → asked → answered → closed / no_response.
+
+Commands
+--------
+  ensure
+      Creates the collection and its indexes (idempotent).
+
+  add --center <email|prijmeni|R_id> [--country CZ|SK] --qcodes Q72_1,Q73_1
+      [--reason "…"] [--asked-via "…"] [--status asked] [--note "…"] [--apply]
+      Creates a new query event. The site and the questions are looked up in sipiq_responses
+      / sipiq_questions; the identity is denormalized. Dry run by default.
+
+  answer --id <dotaz_id> --qcode Q72_1 --answer "8" [--source "email 18JUN2026"] [--apply]
+      Records the answer to one question of the event, projects it into
+      sipiq_responses.answers_supplement, and recomputes the event status. Dry run by default.
+
+  list [--center <email|prijmeni>] [--open]
+      Lists queries (optionally only open ones / for a single site).
+
+Mongo 192.168.1.76:27017, no auth, pymongo.
+"""
+import argparse
+import re
+import sys
+from datetime import datetime, timezone
+
+from pymongo import MongoClient, ASCENDING
+from bson import ObjectId
+
+MONGO_URI = "mongodb://192.168.1.76:27017"
+DB = "feasibility"
+COL = "doplnujici_dotazy"
+COL_R = "sipiq_responses"
+COL_Q = "sipiq_questions"
+
+OPEN_STATES = ("open", "asked")
+
+
+def now_iso():
+    return datetime.now(timezone.utc).astimezone().isoformat(timespec="seconds")
+
+
+def qbase(qcode):
+    m = re.match(r"(Q\d+)", qcode)
+    return m.group(1) if m else qcode
+
+
+def db_conn():
+    c = MongoClient(MONGO_URI, serverSelectionTimeoutMS=8000)
+    c.admin.command("ping")
+    return c, c[DB]
+
+
+def ensure(db):
+    db[COL].create_index([("investigator_oid", ASCENDING)])
+    db[COL].create_index([("response_id", ASCENDING)])
+    db[COL].create_index([("status", ASCENDING)])
+    db[COL].create_index([("questions.qcode", ASCENDING)])
+    db[COL].create_index([("questions.status", ASCENDING)])
+    print(f"OK: collection '{COL}' and indexes are ready. Documents: {db[COL].count_documents({})}")
+
+
+def find_center(db, key, country=None):
+    """Find a sipiq_responses document by ResponseId / pi_email / last name."""
+    if key.startswith("R_"):
+        d = db[COL_R].find_one({"_id": key})
+        if d:
+            return d
+    d = db[COL_R].find_one({"pi_email": key.lower()})
+    if d:
+        return d
+    flt = {"pi_last_name": re.compile(f"^{re.escape(key)}$", re.I)}
+    if country:
+        flt["site_country"] = {"CZ": "Czech Republic", "SK": "Slovakia"}.get(country, country)
+    cands = list(db[COL_R].find(flt))
+    if len(cands) == 1:
+        return cands[0]
+    if len(cands) > 1:
+        raise SystemExit(f"ERROR: '{key}' is ambiguous ({len(cands)} sites). Narrow it down with an e-mail, --country, or an R_id.")
+    raise SystemExit(f"ERROR: site '{key}' not found in {COL_R}.")
+
+
+def question_meta(db, qcode):
+    """Question text + section from sipiq_questions (qcode may be a leaf, e.g. Q72_1)."""
+    base = qbase(qcode)
+    q = db[COL_Q].find_one({"_id": base})
+    if not q:
+        return {"question_base": base, "question_text": None, "section": None}
+    text = q.get("text")
+    label = None
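+    for it in q.get("items", []):
+        # Accept both the sanitized key (Q31_1_1) and the raw Qualtrics code (Q31#1_1).
+        if qcode in (it.get("key"), it.get("qcode")):
+            label = it.get("label")
+            break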
+    full = f"{text} — {label}" if label else text
+    return {"question_base": base, "question_text": full, "section": q.get("section")}
+
+
+def cmd_add(db, args, dry):
+    center = find_center(db, args.center, args.country)
+    qcodes = [q.strip() for q in args.qcodes.split(",") if q.strip()]
+    questions = []
+    for qc in qcodes:
+        meta = question_meta(db, qc)
+        questions.append({
+            "qcode": qc, "question_base": meta["question_base"],
+            "question_text": meta["question_text"], "section": meta["section"],
+            "answer": None, "answered_at": None, "answer_source": None, "status": "open",
+        })
+    ts = now_iso()
+    doc = {
+        "response_id": center["_id"],
+        "investigator_oid": center.get("investigator_oid"),
+        "pi_last_name": center.get("pi_last_name"),
+        "site_name": center.get("site_name"),
+        "site_country": center.get("site_country"),
+        "pi_email": center.get("pi_email"),
+        "status": args.status,
+        "asked_at": ts if args.status == "asked" else None,
+        "asked_via": args.asked_via,
+        "reason": args.reason or "neodpovězeno v SIPIQ; dotazník už uzavřen",
+        "note": args.note,
+        "questions": questions,
+        "created_at": ts, "updated_at": ts, "history": [],
+    }
+    print(f"Site: {doc['pi_last_name']} / {doc['site_name']} ({doc['site_country']})  resp={doc['response_id']}")
+    for q in questions:
+        print(f"  • {q['qcode']:10} [{q['section']}]  {q['question_text']}")
+    if dry:
+        print("[DRY-RUN] Not created. For a real run: --apply")
+        return
+    res = db[COL].insert_one(doc)
+    print(f"[APPLY] Query created, _id={res.inserted_id}")
+
+
+def cmd_answer(db, args, dry):
+    doc = db[COL].find_one({"_id": ObjectId(args.id)})
+    if not doc:
+        raise SystemExit(f"ERROR: query _id={args.id} not found.")
+    qs = doc["questions"]
+    target = next((q for q in qs if q["qcode"] == args.qcode), None)
+    if not target:
+        raise SystemExit(f"CHYBA: otázka {args.qcode} není v tomto dotazu (má: {[q['qcode'] for q in qs]}).")
+    ts = now_iso()
+    print(f"Centrum: {doc['pi_last_name']} / {doc['site_name']}  resp={doc['response_id']}")
+    print(f"  {args.qcode}: {target.get('answer')!r} -> {args.answer!r}  (zdroj: {args.source})")
+    print(f"  + promítnutí do {COL_R}.answers_supplement.{args.qcode}")
+    if dry:
+        print("[DRY-RUN] Nezapsáno. Ostrý: --apply")
+        return
+    # 1) update otázky v události
+    for q in qs:
+        if q["qcode"] == args.qcode:
+            q["answer"] = args.answer
+            q["answered_at"] = ts
+            q["answer_source"] = args.source
+            q["status"] = "answered"
+    all_answered = all(q["status"] == "answered" for q in qs)
+    new_status = "answered" if all_answered else "asked"
+    db[COL].update_one({"_id": doc["_id"]}, {
+        "$set": {"questions": qs, "status": new_status, "updated_at": ts},
+        "$push": {"history": {"changed_at": ts, "action": "answer",
+                              "qcode": args.qcode, "answer": args.answer, "source": args.source}},
+    })
+    # 2) mirror into sipiq_responses.answers_supplement (the original answers are NOT touched)
+    db[COL_R].update_one({"_id": doc["response_id"]}, {
+        "$set": {f"answers_supplement.{args.qcode}": {
+            "value": args.answer, "source": "doplneno",
+            "doplnujici_dotaz_id": doc["_id"], "answered_at": ts, "answer_source": args.source,
+        }}
+    })
+    print(f"[APPLY] Answer written; event status = {new_status}; mirrored into {COL_R}.")
+
+
+def cmd_list(db, args):
+    flt = {}
+    if args.open:
+        flt["status"] = {"$in": list(OPEN_STATES)}
+    if args.center:
+        key = args.center
+        if key.startswith("R_"):
+            flt["response_id"] = key
+        elif "@" in key:
+            flt["pi_email"] = key.lower()
+        else:
+            flt["pi_last_name"] = re.compile(f"^{re.escape(key)}$", re.I)
+    docs = list(db[COL].find(flt).sort("created_at", -1))
+    print(f"Queries: {len(docs)}")
+    for d in docs:
+        print(f"\n[{d['_id']}] {d['pi_last_name']} / {d['site_name']} ({d.get('site_country')}) — {d['status']}")
+        for q in d["questions"]:
+            a = q.get("answer")
+            print(f"    {q['qcode']:10} {q['status']:9} {('= '+str(a)) if a is not None else '(waiting)'}  | {q.get('question_text')}")
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    sub = ap.add_subparsers(dest="cmd", required=True)
+    sub.add_parser("ensure")
+    pa = sub.add_parser("add")
+    pa.add_argument("--center", required=True)
+    pa.add_argument("--country")
+    pa.add_argument("--qcodes", required=True)
+    pa.add_argument("--reason")
+    pa.add_argument("--asked-via", dest="asked_via")
+    pa.add_argument("--status", default="open", choices=["open", "asked"])
+    pa.add_argument("--note")
+    pa.add_argument("--apply", action="store_true")
+    pn = sub.add_parser("answer")
+    pn.add_argument("--id", required=True)
+    pn.add_argument("--qcode", required=True)
+    pn.add_argument("--answer", required=True)
+    pn.add_argument("--source")
+    pn.add_argument("--apply", action="store_true")
+    pl = sub.add_parser("list")
+    pl.add_argument("--center")
+    pl.add_argument("--open", action="store_true")
+    args = ap.parse_args()
+
+    client, db = db_conn()
+    try:
+        if args.cmd == "ensure":
+            ensure(db)
+        elif args.cmd == "add":
+            cmd_add(db, args, dry=not args.apply)
+        elif args.cmd == "answer":
+            cmd_answer(db, args, dry=not args.apply)
+        elif args.cmd == "list":
+            cmd_list(db, args)
+    finally:
+        client.close()
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,48 @@
+# jnj_dump_recipient_msgs_v1.0.py
+
+**Version:** 1.0 · **Date:** 2026-06-16
+
+JNJ-native (pywin32 / MAPI). Finds **all e-mails sent to a given recipient** (default
+Hušták) across the selected folders, **saves them as `.msg`**, and for each one **prints
+diagnostic MAPI properties read from the live item**. Purpose: verify whether these
+properties (GAL name, ReportText, send account, Message-ID, ...) survive in
+the saved `.msg` (compared at home with olefile).
+
+The script **sends nothing and deletes nothing**; it only reads and saves `.msg` copies.
+
+## Running (JNJ machine with Outlook)
+```
+pip install pywin32
+python jnj_dump_recipient_msgs_v1.0.py
+```
+
+## What it prints for each e-mail (from the LIVE item)
+- folder, role (To/Cc), `item.Sent`, `PR_MESSAGE_FLAGS` (0x0E07)
+- subject, sent time
+- **Msg-ID** `0x1035`
+- **SenderName** `0x0C1A` + addrtype `0x0C1E`
+- **SentRepresentingName** `0x0042` + addrtype `0x0064`
+- **PrimarySendAccount** `0x0E28` (reveals sending "as buzalka.cz")
+- **ReportText** `0x1001` (NDR "could not be sent..." = failure)
+
+...and then it saves the item as `.msg` into `OUTPUT_DIR`.
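
The property IDs listed above are passed to Outlook's `PropertyAccessor` as proptag URIs: the 16-bit property ID followed by a 16-bit type code. A minimal sketch of building those strings in plain Python (no Outlook needed); `proptag` is a hypothetical helper name, the script itself keeps the already-combined hex strings in constants:

```python
# Hypothetical helper: builds the URI form that PropertyAccessor.GetProperty expects.
PROPTAG_URI = "http://schemas.microsoft.com/mapi/proptag/0x{prop_id:04X}{prop_type:04X}"

PT_LONG = 0x0003      # 32-bit integer properties, e.g. PR_MESSAGE_FLAGS
PT_UNICODE = 0x001F   # Unicode string properties, e.g. the Internet Message-ID


def proptag(prop_id, prop_type=PT_UNICODE):
    """Combine a MAPI property ID and a type code into a proptag URI."""
    return PROPTAG_URI.format(prop_id=prop_id, prop_type=prop_type)


if __name__ == "__main__":
    print(proptag(0x1035))            # Internet Message-ID (string)
    print(proptag(0x0E07, PT_LONG))   # message flags (integer)
```

On a machine with Outlook, the resulting string would be handed to `item.PropertyAccessor.GetProperty(...)`.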
+
+## Configuration
+- `TARGET_EMAIL`: the recipient to look for (default `rastislav.hustak@fntt.sk`).
+- `SCAN_FOLDERS`: folder names (subfolders included); default Sent Items, Drafts,
+  Deleted Items, Archive, Inbox. `SCAN_ALL=True` = whole mailbox (slow).
+- `OUTPUT_DIR`: where the `.msg` files are saved (default `C:\Users\vbuzalka\hustak_dump`).
+- `SENDER_SMTP`: the account whose store is searched.
+
+## After the run
+1. Compare the printout (live properties): it shows which e-mail carries a GAL name /
+   ReportText / send account buzalka.cz.
+2. Transfer the `.msg` files from `OUTPUT_DIR` home (any way you like, e.g. via the
+   msgreceiver upload or by hand) and use olefile to check whether the saved `.msg`
+   holds the same properties as the live item.
+
+## Notes
+- The recipient is matched via `PR_SMTP_ADDRESS` (0x39FE), which works reliably even for
+  internal Exchange recipients.
+- `olMSG = 3` (SaveAs type). File name = index + folder + subject + tail of the
+  EntryID (for pairing).
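
The file-naming rule can be tried out without Outlook. The sketch below reuses the sanitising rule of the script's `safe()`; `build_msg_name` is a hypothetical wrapper (the script builds the name inline), and the EntryID in the example is made up:

```python
import re


def safe(text, limit=40):
    """Collapse every run of characters outside A-Z, a-z, 0-9, '.', '_', '-' into '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", text or "")
    return cleaned[:limit].strip("_")


def build_msg_name(index, folder_name, subject, entry_id):
    """Hypothetical helper: index + folder + subject + last 20 chars of the EntryID."""
    tail = (entry_id or "")[-20:]
    return f"{index:02d}_{safe(folder_name, 18)}_{safe(subject, 28)}_{tail}.msg"


if __name__ == "__main__":
    # The EntryID below is an invented value, used only to show the tail being kept.
    print(build_msg_name(3, "Sent Items", "RE: SIPIQ / site 12",
                         "00000000ABCDEF0123456789ABCDEF01"))
```

Keeping the EntryID tail in the name is what lets a saved file be paired with its line in the printout later.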
@@ -0,0 +1,188 @@
+# -*- coding: utf-8 -*-
+# =============================================================================
+# Name:    jnj_dump_recipient_msgs_v1.0.py
+# Version: 1.0
+# Date:    2026-06-16
+# Purpose: JNJ-native (MAPI / pywin32). Finds ALL e-mails sent to a given recipient
+#          (default Hustak) across the selected folders, SAVES them as .msg and
+#          PRINTS, for each one, diagnostic MAPI properties read from the LIVE
+#          item (Message-ID 0x1035, SenderName 0x0C1A, SentRepresentingName
+#          0x0042, addrtype 0x0C1E/0x0064, ReportText 0x1001, PrimarySendAccount
+#          0x0E28, MessageFlags 0x0E07, item.Sent). Goal: compare whether these
+#          properties survive in the saved .msg (olefile check at home).
+# Usage:   Run in the JNJ Python (Thonny), Outlook with the JNJ mailbox.
+#          pip install pywin32 ;  python jnj_dump_recipient_msgs_v1.0.py
+#          The script sends and deletes NOTHING; it only READS and saves .msg copies.
+# =============================================================================
+
+import os
+import re
+import sys
+import win32com.client  # pywin32
+
+# ----------------------------- CONFIGURATION ---------------------------------
+
+SENDER_SMTP = "vbuzalka@its.jnj.com"           # account whose store is searched
+TARGET_EMAIL = "rastislav.hustak@fntt.sk"      # recipient to look for (To, Cc or Bcc)
+
+# Folders to scan (matched by folder NAME anywhere in the tree; subfolders included).
+# Empty list + SCAN_ALL=True => walks the whole mailbox (slow!).
+SCAN_FOLDERS = ["Sent Items", "Drafts", "Deleted Items", "Archive", "Inbox"]
+SCAN_ALL = False
+
+# Where to save the .msg copies (on the JNJ machine). Created if it does not exist.
+OUTPUT_DIR = r"C:\Users\vbuzalka\hustak_dump"
+
+# -----------------------------------------------------------------------------
+
+OL_MSG = 3              # olMSG (SaveAs type)
+OL_FOLDER_SENT = 5
+PA = "http://schemas.microsoft.com/mapi/proptag/0x{:s}"
+
+# Diagnostic tags (type suffix: PT_UNICODE = 001F, PT_LONG = 0003)
+TAGS = [
+    ("Msg-ID",            "1035001F"),
+    ("SenderName",        "0C1A001F"),
+    ("SenderAddrType",    "0C1E001F"),
+    ("SentRepName",       "0042001F"),
+    ("SentRepAddrType",   "0064001F"),
+    ("ReportText",        "1001001F"),
+    ("PrimarySendAcct",   "0E28001F"),
+]
+TAG_MSGFLAGS = "0E070003"
+TAG_RCPT_ADDRTYPE = "3002001F"
+
+
+def smtp_of(recipient):
+    try:
+        return (recipient.PropertyAccessor.GetProperty(PA.format("39FE001E")) or "").lower()
+    except Exception:
+        try:
+            return (recipient.Address or "").lower()
+        except Exception:
+            return ""
+
+
+def get_prop(item, tag):
+    try:
+        v = item.PropertyAccessor.GetProperty(PA.format(tag))
+        return v
+    except Exception:
+        return None
+
+
+def get_store_root(ns):
+    try:
+        for acct in ns.Accounts:
+            if (acct.SmtpAddress or "").lower() == SENDER_SMTP.lower():
+                return acct.DeliveryStore.GetRootFolder()
+    except Exception:
+        pass
+    return ns.GetDefaultFolder(OL_FOLDER_SENT).Parent  # fallback: root of the default store
+
+
+def iter_target_folders(root):
+    """Yield the folders to scan (matched by name, plus all their subfolders)."""
+    def walk(folder, inscope):
+        scope = inscope or SCAN_ALL or (folder.Name in SCAN_FOLDERS)
+        if scope:
+            yield folder
+        try:
+            for sub in folder.Folders:
+                yield from walk(sub, scope)
+        except Exception:
+            pass
+    yield from walk(root, False)
+
+
+def safe(s, n=40):
+    s = re.sub(r"[^A-Za-z0-9._-]+", "_", (s or ""))
+    return s[:n].strip("_")
+
+
+def matches_target(item):
+    """Return 'To'/'Cc'/'Bcc' when TARGET_EMAIL is among the recipients, otherwise None."""
+    tgt = TARGET_EMAIL.lower()
+    try:
+        for r in item.Recipients:
+            if smtp_of(r) == tgt:
+                return {1: "To", 2: "Cc", 3: "Bcc"}.get(r.Type, "To")
+    except Exception:
+        pass
+    return None
+
+
+def main():
+    os.makedirs(OUTPUT_DIR, exist_ok=True)
+    outlook = win32com.client.Dispatch("Outlook.Application")
+    ns = outlook.GetNamespace("MAPI")
+    root = get_store_root(ns)
+
+    print(f"Searching for e-mails with recipient: {TARGET_EMAIL}")
+    print(f"Folders: {'ALL' if SCAN_ALL else ', '.join(SCAN_FOLDERS)}")
+    print(f".msg output: {OUTPUT_DIR}")
+    print("=" * 90)
+
+    idx = 0
+    for folder in iter_target_folders(root):
+        try:
+            items = folder.Items
+        except Exception:
+            continue
+        for it in list(items):
+            try:
+                if it.Class != 43:   # olMail
+                    continue
+            except Exception:
+                continue
+            role = matches_target(it)
+            if not role:
+                continue
+            idx += 1
+
+            # --- diagnostics from the LIVE item ---
+            try:
+                sent_flag = it.Sent
+            except Exception:
+                sent_flag = "?"
+            flags = get_prop(it, TAG_MSGFLAGS)
+            props = {label: get_prop(it, tag) for label, tag in TAGS}
+            try:
+                sent_on = it.SentOn
+            except Exception:
+                sent_on = None
+            try:
+                entry_tail = (it.EntryID or "")[-20:]
+            except Exception:
+                entry_tail = ""
+
+            print(f"\n[{idx}] folder='{folder.Name}'  role={role}  Sent={sent_flag}  flags={flags}")
+            print(f"    subject : {getattr(it,'Subject','')}")
+            print(f"    sent_on : {sent_on}")
+            print(f"    Msg-ID         : {props['Msg-ID']}")
+            print(f"    SenderName     : {props['SenderName']}   (addrtype {props['SenderAddrType']})")
+            print(f"    SentRepName    : {props['SentRepName']}   (addrtype {props['SentRepAddrType']})")
+            print(f"    PrimarySendAcct: {props['PrimarySendAcct']}")
+            rt = props["ReportText"]
+            print(f"    ReportText 0x1001: {'YES -> ' + repr(rt[:120]) if rt else '-'}")
+
+            # --- save as .msg ---
+            fn = f"{idx:02d}_{safe(folder.Name,18)}_{safe(getattr(it,'Subject',''),28)}_{entry_tail}.msg"
+            path = os.path.join(OUTPUT_DIR, fn)
+            try:
+                it.SaveAs(path, OL_MSG)
+                print(f"    saved: {fn}")
+            except Exception as e:
+                print(f"    !! SaveAs error: {e}")
+
+    print("\n" + "=" * 90)
+    print(f"Done. Found and saved: {idx} items into {OUTPUT_DIR}")
+    print("Bring the .msg files home and compare the properties with olefile (live vs saved).")
+
+
+if __name__ == "__main__":
+    try:
+        main()
+    except Exception as e:
+        print("ERROR:", e)
+        sys.exit(1)
@@ -0,0 +1,45 @@
+# jnj_scan_failed_sent_v1.0.py
+
+**Version:** 1.0 · **Date:** 2026-06-16
+
+JNJ-native (pywin32 / MAPI). Walks **Sent Items for the last N days** (default 60),
+finds **suspect = probably not sent** e-mails, saves them as `.msg`,
+and prints which indicators fired. **Sends nothing and deletes nothing.**
+
+## Indicators (read from the LIVE item)
+- **FAIL_BODY** (strong): body / ReportText contains "could not be sent",
+  "SendAsDenied", "permission to send the message on behalf",
+  "TransportSend operation has failed", "MapiExceptionSendAsDenied".
+- **SENDAS_BUZ** (strong): `PrimarySendAccount` (0x0E28) / SentRepresenting (0x0065)
+  / Sender (0x0C1F) contains `buzalka.cz`, i.e. the message went out under the wrong identity.
+- **NO_MSGID** (weak): the Internet Message-ID (0x1035) is missing; this can also be a
+  provisional copy that gets completed later.
+
+`STRONG_*` files = strong indicator (almost certainly not sent).
+`weak_*` files = NO_MSGID only.
+
+## Running (JNJ machine with Outlook)
+```
+pip install pywin32
+python jnj_scan_failed_sent_v1.0.py
+```
+
+## Configuration
+- `DAYS` = window (default 60).
+- `OUTPUT_DIR` = where the `.msg` files are saved (default `C:\Users\vbuzalka\sent_suspects`).
+- `INCLUDE_NO_MSGID` = also save items flagged only with NO_MSGID (default True; set False
+  when you want only the hard FAIL/SENDAS hits).
+- `SENDER_SMTP` = the account whose Sent Items folder is scanned.
+
+## Procedure
+1. Run it on the JNJ machine; the printout lists the suspects and the saved `.msg` files.
+2. Bring the `.msg` files from `OUTPUT_DIR` home; we go through them with olefile and confirm
+   which ones really were not sent (and who needs a resend with the correct From).
+
+## Notes
+- The 60-day window is for performance (sorted by SentOn descending, so older items are skipped early).
+- Detection works on the **live** item (fresh SaveAs), which is why the script is run
+  directly on the JNJ machine rather than on old batch copies.
+- Main cause of failures: From = `vladimir.buzalka@buzalka.cz` on the account
+  `vbuzalka@its.jnj.com` without SendAs rights, so Exchange rejects the message. See the memory note
+  project_jnj_unsent_detection.
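
The indicator logic described in this README can be illustrated on plain values, with no Outlook involved. This is a simplified sketch, not the script's code: `classify` and `file_prefix` are hypothetical names, only three of the failure phrases are listed, and the inputs stand in for values that the script reads from the live item.

```python
# Shortened list; the real script checks more phrases.
FAIL_SIGNS = (
    "could not be sent",
    "sendasdenied",
    "permission to send the message on behalf",
)


def classify(body, report_text, sender_fields, message_id):
    """Return the indicator flags for one message, given already-extracted values."""
    flags = []
    blob = f"{body or ''}\n{report_text or ''}".lower()
    if any(sign in blob for sign in FAIL_SIGNS):
        flags.append("FAIL_BODY")
    if any("buzalka.cz" in (value or "").lower() for value in sender_fields):
        flags.append("SENDAS_BUZ")
    if not message_id:
        flags.append("NO_MSGID")
    return flags


def file_prefix(flags):
    """STRONG when a strong indicator fired, weak when only NO_MSGID did."""
    return "STRONG" if {"FAIL_BODY", "SENDAS_BUZ"} & set(flags) else "weak"


if __name__ == "__main__":
    flags = classify("This message could not be sent.", None,
                     ["vbuzalka@its.jnj.com"], None)
    print(flags, file_prefix(flags))
```

Matching is done on lowercased text, so the phrase list has to be lowercase as well.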
@@ -0,0 +1,191 @@
+# -*- coding: utf-8 -*-
+# =============================================================================
+# Name:    jnj_scan_failed_sent_v1.0.py
+# Version: 1.0
+# Date:    2026-06-16
+# Purpose: JNJ-native (MAPI / pywin32). Walks the Sent Items folder for the
+#          last N days and finds SUSPECT e-mails = probably NOT SENT
+#          (e.g. SendAs denied). Each suspect is SAVED as .msg and the indicators
+#          that fired are printed. Sends and deletes NOTHING; it only READS and saves.
+# Suspicion indicators (read from the LIVE item):
+#   FAIL_BODY   = body/ReportText contains "could not be sent" / "SendAsDenied"
+#                 / "permission to send the message on behalf" / "TransportSend"
+#   SENDAS_BUZ  = PrimarySendAccount/SentRepresenting/Sender contains "buzalka.cz"
+#   NO_MSGID    = Internet Message-ID (0x1035) is missing -- weaker indicator
+# Usage:   JNJ Python (Thonny), Outlook with the JNJ mailbox.
+#          pip install pywin32 ;  python jnj_scan_failed_sent_v1.0.py
+# =============================================================================
+
+import os
+import re
+import sys
+import datetime
+import win32com.client  # pywin32
+
+# ----------------------------- CONFIGURATION ---------------------------------
+
+SENDER_SMTP = "vbuzalka@its.jnj.com"
+DAYS = 60                                   # window: last N days
+OUTPUT_DIR = r"C:\Users\vbuzalka\sent_suspects"
+
+# Also save items that have ONLY the weak NO_MSGID indicator (no FAIL/SENDAS)?
+# True = including provisional copies without a Message-ID (may produce more files).
+INCLUDE_NO_MSGID = True
+
+# -----------------------------------------------------------------------------
+
+OL_MSG = 3
+OL_FOLDER_SENT = 5
+PA = "http://schemas.microsoft.com/mapi/proptag/0x{:s}"
+
+P_MSGID      = "1035001F"
+P_SENDACCT   = "0E28001F"   # PrimarySendAccount
+P_SENTREP_EM = "0065001F"   # SentRepresentingEmailAddress
+P_SENDER_EM  = "0C1F001F"   # SenderEmailAddress
+P_REPORTTEXT = "1001001F"   # ReportText (when present)
+
+FAIL_SIGNS = [
+    "could not be sent",
+    "sendasdenied",
+    "permission to send the message on behalf",
+    "transportsend operation has failed",
+    "mapiexceptionsendasdenied",
+    "tuto zpravu nelze odeslat",          # Czech phrase, in case the NDR text is localised
+]
+
+
+def gp(item, tag):
+    try:
+        return item.PropertyAccessor.GetProperty(PA.format(tag))
+    except Exception:
+        return None
+
+
+def get_sent_folder(ns):
+    try:
+        for acct in ns.Accounts:
+            if (acct.SmtpAddress or "").lower() == SENDER_SMTP.lower():
+                return acct.DeliveryStore.GetDefaultFolder(OL_FOLDER_SENT)
+    except Exception:
+        pass
+    return ns.GetDefaultFolder(OL_FOLDER_SENT)
+
+
+def safe(s, n=34):
+    return re.sub(r"[^A-Za-z0-9._-]+", "_", (s or ""))[:n].strip("_")
+
+
+def analyze(item):
+    """Return (list of indicator flags, Message-ID or "") for the item."""
+    flags = []
+
+    # 1) FAIL_BODY: body + ReportText
+    blob = ""
+    try:
+        blob += (item.Body or "")
+    except Exception:
+        pass
+    rt = gp(item, P_REPORTTEXT)
+    if rt:
+        blob += "\n" + str(rt)
+    low = blob.lower()
+    if any(s in low for s in FAIL_SIGNS):
+        flags.append("FAIL_BODY")
+
+    # 2) SENDAS_BUZ: one of the sender-related properties contains buzalka.cz
+    for tag in (P_SENDACCT, P_SENTREP_EM, P_SENDER_EM):
+        v = gp(item, tag)
+        if v and "buzalka.cz" in str(v).lower():
+            flags.append("SENDAS_BUZ")
+            break
+
+    # 3) NO_MSGID
+    mid = gp(item, P_MSGID)
+    if not mid:
+        flags.append("NO_MSGID")
+
+    return flags, (mid or "")
+
+
+def main():
+    os.makedirs(OUTPUT_DIR, exist_ok=True)
+    cutoff = datetime.date.today() - datetime.timedelta(days=DAYS)
+
+    outlook = win32com.client.Dispatch("Outlook.Application")
+    ns = outlook.GetNamespace("MAPI")
+    sent = get_sent_folder(ns)
+    items = sent.Items
+    items.Sort("[SentOn]", True)  # newest first
+
+    print(f"Folder : {sent.FolderPath}")
+    print(f"Window : last {DAYS} days (since {cutoff.isoformat()})")
+    print(f"Output : {OUTPUT_DIR}")
+    print(f"NO_MSGID items saved: {INCLUDE_NO_MSGID}")
+    print("=" * 90)
+
+    scanned = saved = strong = 0
+    for it in list(items):
+        try:
+            if it.Class != 43:
+                continue
+        except Exception:
+            continue
+        # date + early stop
+        try:
+            s = it.SentOn
+            sdate = datetime.date(s.year, s.month, s.day)
+        except Exception:
+            sdate = None
+        if sdate is not None:
+            if sdate < cutoff:
+                break            # everything after this is older (sorted descending)
+        scanned += 1
+
+        flags, mid = analyze(it)
+        if not flags:
+            continue
+        is_strong = ("FAIL_BODY" in flags) or ("SENDAS_BUZ" in flags)
+        if not is_strong and not (INCLUDE_NO_MSGID and "NO_MSGID" in flags):
+            continue
+
+        saved += 1
+        if is_strong:
+            strong += 1
+
+        subj = ""
+        try:
+            subj = it.Subject or ""
+        except Exception:
+            pass
+        try:
+            tail = (it.EntryID or "")[-20:]
+        except Exception:
+            tail = ""
+
+        tagstr = "+".join(flags)
+        print(f"\n[{saved}] {sdate}  flags={tagstr}")
+        print(f"    subj : {subj}")
+        print(f"    msgid: {mid if mid else '<missing>'}")
+
+        fn = f"{('STRONG' if is_strong else 'weak')}_{sdate}_{safe(subj,30)}_{tail}.msg"
+        path = os.path.join(OUTPUT_DIR, fn)
+        try:
+            it.SaveAs(path, OL_MSG)
+            print(f"    saved: {fn}")
+        except Exception as e:
+            print(f"    !! SaveAs error: {e}")
+
+    print("\n" + "=" * 90)
+    print(f"Scanned (within window): {scanned}")
+    print(f"Suspects saved: {saved}  (of which strong FAIL/SENDAS: {strong})")
+    print(f"Files in: {OUTPUT_DIR}  -> bring them home for checking.")
+    print("Note: STRONG_* = NDR text in body or send account buzalka.cz (almost certainly not sent).")
+    print("      weak_*   = only the Message-ID is missing (may be a provisional copy that completes later).")
+
+
+if __name__ == "__main__":
+    try:
+        main()
+    except Exception as e:
+        print("ERROR:", e)
+        sys.exit(1)
@@ -0,0 +1,63 @@
+# -*- coding: utf-8 -*-
+# =============================================================================
+# Name:    promote_sipiq_submitted_v1.0.py
+# Version: 1.0
+# Date:    2026-06-17
+# Purpose: Moves the listed investigators (KROK 6 - SIPIQ sent) to
+#          KROK "7 - SIPIQ vyplneny" (SIPIQ completed) based on the Illuminator export
+#          (status "SIPIQ Submitted"). Illuminator is the authoritative source, because
+#          the physician need not announce SIPIQ completion by e-mail. Prepends a line to STATUS.
+# Usage:   python promote_sipiq_submitted_v1.0.py           (dry-run)
+#          python promote_sipiq_submitted_v1.0.py --apply
+# =============================================================================
+import sys
+from pymongo import MongoClient
+from bson import ObjectId
+
+MONGO_URI = "mongodb://192.168.1.76:27017"
+LINE = ("17JUN2026: SIPIQ VYPLNENY — dle Illuminator exportu (status „SIPIQ "
+        "Submitted“); lekar vyplneni neoznamil, Illuminator = ultimatni zdroj. KROK 7.")
+
+# 13 investigators with SIPIQ Submitted in Illuminator, still at KROK 6 in Mongo
+IDS = [
+    ("6a19832b5fc2213518257969", "Durina Juraj"),
+    ("6a19832b5fc221351825796e", "Falc Matej"),
+    ("6a19832b5fc2213518257954", "Fedurco Miroslav"),
+    ("6a19832b5fc221351825796c", "Gregar Jan"),
+    ("6a19832b5fc221351825794f", "Hlavaty Tibor"),
+    ("6a19832b5fc2213518257973", "Horvath Frantisek"),
+    ("6a19832b5fc221351825796f", "Konecny Michal"),
+    ("6a19832b5fc2213518257972", "Konecny Stefan"),
+    ("6a1c4275aa46d8b608065cec", "Lukac Ludovit"),
+    ("6a19832b5fc2213518257958", "Mihalkanin Lubomir"),
+    ("6a198b661218c31ab0f5ba41", "Pesta Martin"),
+    ("6a19832b5fc221351825795e", "Stepek David"),
+    ("6a198b661218c31ab0f5ba43", "Tichy Michal"),
+]
+
+
+def main():
+    apply = "--apply" in sys.argv
+    col = MongoClient(MONGO_URI)["feasibility"]["investigators"]
+    n = 0
+    for hid, label in IDS:
+        oid = ObjectId(hid)
+        d = col.find_one({"_id": oid}, {"STATUS": 1, "KROK": 1})
+        if not d:
+            print(f"  !! {label}: NOT FOUND"); continue
+        krok = str(d.get("KROK") or "")
+        if not krok.startswith("6"):
+            print(f"  ~~ {label}: KROK={krok} (not 6) -> skipping"); continue
+        print(f"  [{label}] KROK {krok} -> 7 - SIPIQ vyplneny")
+        n += 1  # count every eligible record, in dry-run too
+        if apply:
+            new_status = LINE + "\n" + (d.get("STATUS", "") or "")
+            col.update_one({"_id": oid}, {"$set": {
+                "KROK": "7 - SIPIQ vyplneny", "STATUS": new_status}})
+    print(f"\n{'WRITTEN' if apply else 'DRY-RUN'}: {n}/{len(IDS)}")
+    if not apply:
+        print(">>> Run with --apply to write")
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,40 @@
+# sipiq_import_v1.2: import of SIPIQ responses (folder workflow + provenance)
+
+**Version:** 1.2 · **Date:** 2026-06-17 · **Study:** 77242113UCO3002 (ICONIC / DAWN)
+
+## Changes
+- **v1.2:** every response gets `source_exported_at` = **report date/time according to the filesystem**
+  (mtime of the CSV file). It is kept outside the content hash, so it causes no needless UPDATE; it is
+  also backfilled on the "unchanged" path. v1.1 moved to `Feasibility\TRASH`.
+- **v1.1:** FOLDER workflow (`--folder`): collects *.csv, runs the delta import, moves files to `Zpracováno`.
+
+## Collections
+- `sipiq_questions`: questionnaire dictionary (reconstructs the SIPIQ as in the PDF).
+- `sipiq_responses`: 1 doc = 1 response (`_id`=ResponseId), flat `answers{}`,
+  soft-link `investigator_oid`, `source_file` + `source_exported_at`, delta + `history[]`.
+
+Source = Qualtrics **CSV** (row 1 Qcode, row 2 question text, row 3 ImportId=QID). Export labels,
+decimal point, recode unanswered turned off.
+
+## Delta (overwrites ONLY changed data)
+new → INSERT; unchanged (same `content_sha256`) → only `last_seen_at` + `source_file` + `source_exported_at`;
+changed → `$set` of the changed fields only + `$push` into `history[]`.
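
The three delta outcomes can be sketched as a pure function, with no database involved. How the importer actually computes `content_sha256` is not shown in this README, so hashing the canonical JSON of `answers` is an assumption here, and `plan_delta` is a hypothetical name. Keys that disappear from a newer export are not handled in this sketch.

```python
import hashlib
import json


def content_sha256(answers):
    """Assumed hashing scheme: SHA-256 over key-sorted, compact JSON of the answers."""
    payload = json.dumps(answers, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_delta(stored, incoming_answers):
    """Return (action, fields): 'insert', 'touch' (bookkeeping only) or 'update'."""
    if stored is None:
        return "insert", dict(incoming_answers)
    if stored.get("content_sha256") == content_sha256(incoming_answers):
        return "touch", {}
    old = stored.get("answers", {})
    changed = {k: v for k, v in incoming_answers.items() if old.get(k) != v}
    return "update", changed


if __name__ == "__main__":
    answers = {"Q2": "Yes", "Q6_4": "10"}
    stored = {"content_sha256": content_sha256(answers), "answers": answers}
    print(plan_delta(None, answers))
    print(plan_delta(stored, {"Q6_4": "10", "Q2": "Yes"}))
    print(plan_delta(stored, {"Q2": "No", "Q6_4": "10"}))
```

Because provenance fields such as `source_exported_at` stay out of the hashed payload, re-importing the same answers from a newer export file lands on the `touch` branch.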
+
+## Soft-link to investigators (non-destructive)
+pi_email → email/email2 (lowercased), then recipient_email, fallback: surname (diacritics stripped) + country.
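
The matching order can be sketched on in-memory records. This is only an illustration under assumptions: `find_investigator` is a hypothetical name, and the investigator-side fields `last_name` and `country` are invented for the example (only `email` / `email2` are named in this README).

```python
import unicodedata


def norm(text):
    """Lowercase, trim and strip diacritics (e.g. a caron or acute accent)."""
    decomposed = unicodedata.normalize("NFKD", text or "")
    return "".join(c for c in decomposed if not unicodedata.combining(c)).strip().lower()


def find_investigator(resp, investigators):
    """Try pi_email, then recipient_email, then surname + country; None when nothing matches."""
    for field in ("pi_email", "recipient_email"):
        email = (resp.get(field) or "").strip().lower()
        if not email:
            continue
        for inv in investigators:
            known = {(inv.get("email") or "").lower(), (inv.get("email2") or "").lower()}
            if email in known:
                return inv
    surname, country = norm(resp.get("pi_last_name")), norm(resp.get("site_country"))
    if not surname:
        return None
    for inv in investigators:
        # 'last_name' and 'country' are assumed field names, used only in this sketch.
        if (norm(inv.get("last_name")), norm(inv.get("country"))) == (surname, country):
            return inv
    return None
```

The link is "soft" in the sense that a miss simply yields no `investigator_oid`; nothing on the investigator side is modified.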
+
+## Usage
+```
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.2.py --dry-run     # folder mode, default folder
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.2.py --apply
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.2.py --folder "<cesta>" --apply
+.venv\Scripts\python.exe Feasibility\sipiq_import_v1.2.py --csv "<cesta.csv>" --apply   # single file, NOT moved
+```
+Default folder `…\77242113UCO2001\ImportSIPIQcompled`; files are moved to `Zpracováno` only with `--apply` in folder mode.
+`--scope czsk` (default) / `all`. Default = dry-run.
+
+## Workflow
+The user drops complete SIPIQ reports (Qualtrics CSV, named
+`ICONIC+Phase+3b+UC+Study+(77242113UCO3002)_SipIQ_V1_13MAY2026_<datum>_<čas>.csv`) into
+`ImportSIPIQcompled\`. After `--apply` they are imported (delta) and moved to `Zpracováno\`.
+`source_exported_at` is taken from the file's mtime (report date/time according to the filesystem).
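
The folder workflow can be sketched independently of MongoDB. `process_folder` and `handle_csv` are hypothetical names; in the real importer the per-file step is the delta import, here it is just a callback. Files are moved only when `apply` is true, matching the rule that a dry run leaves the drop folder untouched.

```python
import glob
import os
import shutil


def process_folder(folder, handle_csv, apply=False, processed_subdir="Zpracováno"):
    """Run handle_csv on every *.csv in folder; move each file away only when apply=True."""
    handled = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        handle_csv(path)  # in dry-run this would only report what would change
        if apply:
            target_dir = os.path.join(folder, processed_subdir)
            os.makedirs(target_dir, exist_ok=True)
            shutil.move(path, os.path.join(target_dir, os.path.basename(path)))
        handled.append(os.path.basename(path))
    return handled
```

The glob is not recursive, so files already sitting in the processed subfolder are not picked up again on the next run.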
@@ -0,0 +1,489 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+sipiq_import_v1.2.py
+====================
+Version: 1.2
+Date:    2026-06-17
+Author:  Claude Code (for MUDr. Vladimír Buzalka)
+
+Changes since v1.1
+------------------
+- PROVENANCE: every response stores `source_exported_at` = report date/time
+  according to the FILESYSTEM (mtime of the CSV file). Kept outside the content hash, so it
+  causes no needless UPDATE; also backfilled on the "unchanged" path. The old v1.1 is kept in TRASH.
+
+Changes since v1.0
+------------------
+- FOLDER WORKFLOW (v1.1): the --folder mode collects *.csv in a folder, imports them (delta)
+  and moves them to the `Zpracováno` subfolder. Default folder =
+  U:\\PythonProject\\Janssen\\Feasibility\\77242113UCO2001\\ImportSIPIQcompled.
+
+Description
+-----------
+Imports SIPIQ responses (Qualtrics CSV export, study 77242113UCO3002 / ICONIC DAWN)
+into the MongoDB database `feasibility`. Two collections:
+  * sipiq_questions  - questionnaire dictionary (1 doc = 1 logical question).
+  * sipiq_responses  - 1 doc = 1 response (_id = Qualtrics ResponseId), flat answers{},
+                       soft-link investigator_oid, delta bookkeeping + history[].
+
+DELTA import (overwrites ONLY changed data): new->insert; unchanged->only last_seen_at;
+changed->$set of the changed fields only + push into history[].
+
+Usage
+-----
+  python sipiq_import_v1.2.py --dry-run            # folder mode, default folder
+  python sipiq_import_v1.2.py --apply
+  python sipiq_import_v1.2.py --folder "<cesta>" --apply
+  python sipiq_import_v1.2.py --csv "<cesta.csv>" --apply   # single file (NOT moved)
+
+Dependencies: pymongo (.venv). Mongo 192.168.1.76:27017, no auth.
+"""
+import argparse
+import csv
+import glob
+import hashlib
+import json
+import os
+import re
+import shutil
+import sys
+import unicodedata
+from datetime import datetime, timezone
+
+try:
+    from pymongo import MongoClient
+except ImportError:
+    print("ERROR: pymongo is not installed in the current Python.", file=sys.stderr)
+    raise
+
+MONGO_URI = "mongodb://192.168.1.76:27017"
+DB_NAME = "feasibility"
+COL_Q = "sipiq_questions"
+COL_R = "sipiq_responses"
+DEFAULT_FOLDER = r"U:\PythonProject\Janssen\Feasibility\77242113UCO2001\ImportSIPIQcompled"
+PROCESSED_SUBDIR = "Zpracováno"
+
+META_COLS = {
+    "StartDate", "EndDate", "Status", "IPAddress", "Progress", "Duration (in seconds)",
+    "Finished", "RecordedDate", "ResponseId", "RecipientLastName", "RecipientFirstName",
+    "RecipientEmail", "ExternalReference", "LocationLatitude", "LocationLongitude",
+    "DistributionChannel", "UserLanguage",
+}
+
+PROMOTE = [
+    "site_name", "site_address", "site_city", "site_state", "site_postcode", "site_country",
+    "pi_first_name", "pi_last_name", "pi_phone", "pi_email",
+    "sdl_site_id", "fire_site_id", "fire_investigator_id", "mailinglist_id",
+    "survey_generated_by", "Date", "Time",
+]
+
+SECTION_BY_QNUM = {}
+def _sec(rng, name):
+    for n in rng:
+        SECTION_BY_QNUM[n] = name
+_sec([2], "J&J Internal Assessment")
+_sec([6, 7, 8, 9, 10, 11, 12, 13], "Contact Information")
+_sec(range(14, 22), "Confidentiality Statement")
+_sec([25, 26, 27], "Interest")
+_sec([29, 30, 31, 32, 33, 34], "Protocol Requirements")
+_sec([36, 37, 38], "Enrollment")
+_sec([40, 41, 42, 43], "Patient Demographics Overview")
+_sec([45, 46, 47, 48, 49], "Site Overview")
+_sec([51], "Operational Considerations")
+_sec([53, 54], "Comments")
+_sec([57, 58, 59, 60, 61], "Patient Population")
+_sec([63, 64, 65, 66, 67], "Site Experience and Staffing")
+_sec([69], "Equipment and Facility Requirements")
+_sec([71, 72, 73, 74, 75], "Institutional Review Board, Ethics Committee, and Contracts")
+
+STEM_OVERRIDE = {
+    "Q31": "At your site, at what line(s) of treatment do you most commonly prescribe "
+           "vedolizumab for patients with moderately to severely active ulcerative colitis?",
+    "Q63": "Do you or your site staff have experience in performing the following types of "
+           "study assessments/procedures?",
+    "Q64": "The following personnel are required to run the study. "
+           "Will your site have the following available?",
+    "Q69": "The following equipment and facilities are required to run the studies. "
+           "Are these available at your site?",
+}
+
+
+def now_iso():
+    return datetime.now(timezone.utc).astimezone().isoformat(timespec="seconds")
+
+
+def file_mtime_iso(path):
+    return datetime.fromtimestamp(os.path.getmtime(path)).astimezone().isoformat(timespec="seconds")
+
+
+def strip_accents(s):
+    if not s:
+        return ""
+    return "".join(c for c in unicodedata.normalize("NFKD", s) if not unicodedata.combining(c))
+
+
+def norm_name(s):
+    return re.sub(r"\s+", " ", strip_accents(s or "").lower()).strip()
+
+
+def sanitize_key(qcode):
+    return qcode.replace("#", "_").replace(".", "_")
+
+
+def qnum(qcode):
+    m = re.match(r"Q(\d+)", qcode)
+    return int(m.group(1)) if m else None
+
+
+def qbase(qcode):
+    m = re.match(r"(Q\d+)", qcode)
+    return m.group(1) if m else qcode
+
+
+def import_id(h3_cell):
+    try:
+        return json.loads(h3_cell).get("ImportId", "")
+    except Exception:
+        return h3_cell
+
+
+def split_text(text):
+    parts = [p.strip() for p in re.split(r"\s+-\s+", text)]
+    stem = parts[0]
+    if len(parts) == 1:
+        return stem, None
+    label_parts = [p for p in parts[1:] if p.lower() != "selected choice"]
+    label_parts = [p for p in label_parts if not re.fullmatch(r"Q\d+#\d+", p)]
+    return stem, (" - ".join(label_parts) if label_parts else None)
+
+
+def detect_type(qcode, observed):
+    has_hash = "#" in qcode
+    vals = [v for v in observed if v]
+    yesno = vals and all(v in ("Yes", "No") for v in vals)
+    numeric = vals and all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in vals)
+    if has_hash and yesno:
+        return "matrix_yesno"
+    if has_hash and numeric:
+        return "matrix_percent"
+    if has_hash:
+        return "matrix"
+    if numeric:
+        return "numeric"
+    if yesno:
+        return "yesno"
+    return "single_or_text"
+
+
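+def load_csv(path):
+    with open(path, encoding="utf-8-sig", newline="") as fh:
+        rows = list(csv.reader(fh))
+    # A Qualtrics export carries three header rows: Qcode, question text, ImportId JSON.
+    if len(rows) < 3:
+        raise ValueError(f"{path}: expected 3 header rows, found {len(rows)}")
+    h1, h2, h3 = rows[0], rows[1], rows[2]
+    data = rows[3:]
+    cols = [{"i": i, "code": c, "text": t, "qid": import_id(j)}
+            for i, (c, t, j) in enumerate(zip(h1, h2, h3))]
+    return cols, data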
+
+
+def col_getter(cols, data):
+    idx = {c["code"]: c["i"] for c in cols}
+    def get(row, code):
+        i = idx.get(code)
+        return (row[i].strip() if i is not None and i < len(row) else "")
+    return get, idx
+
+
+def is_question_col(code):
+    return bool(re.match(r"Q\d", code))
+
+
+def build_questions(cols, data):
+    qcols = [c for c in cols if is_question_col(c["code"])]
+    observed = {c["code"]: set() for c in qcols}
+    for row in data:
+        for c in qcols:
+            v = (row[c["i"]].strip() if c["i"] < len(row) else "")
+            if v:
+                observed[c["code"]].add(v)
+    groups, order_seen = {}, []
+    for c in qcols:
+        base = qbase(c["code"])
+        if base not in groups:
+            groups[base] = {"_id": base, "order": c["i"], "qnum": qnum(c["code"]),
+                            "section": SECTION_BY_QNUM.get(qnum(c["code"]), "Other"),
+                            "qids": [], "text": split_text(c["text"])[0],
+                            "items": [], "_obs": set(), "_types": []}
+            order_seen.append(base)
+        g = groups[base]
+        bq = re.match(r"(QID\d+)", c["qid"] or "")
+        if bq and bq.group(1) not in g["qids"]:
+            g["qids"].append(bq.group(1))
+        _, label = split_text(c["text"])
+        item = {"key": sanitize_key(c["code"]), "qcode": c["code"], "qid": c["qid"]}
+        if label:
+            item["label"] = label
+        g["items"].append(item)
+        g["_obs"] |= observed[c["code"]]
+        g["_types"].append(detect_type(c["code"], observed[c["code"]]))
+    out = []
+    for n, base in enumerate(order_seen):
+        g = groups[base]
+        obs = sorted(g.pop("_obs"))
+        types = g.pop("_types")
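+        # dict.fromkeys keeps first-seen order, so ties resolve the same way on every run (set order can vary).
+        gtype = max(dict.fromkeys(types), key=types.count) if types else "single_or_text"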
+        g["type"] = gtype
+        if gtype in ("yesno", "matrix_yesno"):
+            g["options"] = ["Yes", "No"]
+        elif gtype == "single_or_text" and obs and len(obs) <= 12:
+            g["options"] = obs
+        else:
+            g["options"] = []
+        if base in STEM_OVERRIDE:
+            g["text"] = STEM_OVERRIDE[base]
+        g["order"] = n
+        if len(g["items"]) == 1 and "label" not in g["items"][0]:
+            g["items"] = []
+        out.append(g)
+    return out
+
+
+def build_response(cols, get, row, source_file):
+    rid = get(row, "ResponseId")
+    answers = {}
+    for c in cols:
+        if is_question_col(c["code"]):
+            v = (row[c["i"]].strip() if c["i"] < len(row) else "")
+            if v:
+                answers[sanitize_key(c["code"])] = v
+    meta = {
+        "start_date": get(row, "StartDate") or None,
+        "end_date": get(row, "EndDate") or None,
+        "recorded_date": get(row, "RecordedDate") or None,
+        "status": get(row, "Status") or None,
+        "progress": int(get(row, "Progress")) if get(row, "Progress").isdigit() else (get(row, "Progress") or None),
+        "finished": get(row, "Finished") in ("True", "1", "TRUE"),
+        "duration_sec": int(get(row, "Duration (in seconds)")) if get(row, "Duration (in seconds)").isdigit() else None,
+        "user_language": get(row, "UserLanguage") or None,
+        "distribution_channel": get(row, "DistributionChannel") or None,
+        "ip_address": get(row, "IPAddress") or None,
+        "location_lat": get(row, "LocationLatitude") or None,
+        "location_lng": get(row, "LocationLongitude") or None,
+        "survey_date": get(row, "Date") or None,
+        "survey_time": get(row, "Time") or None,
+    }
+    doc = {
+        "_id": rid, "study": "77242113UCO3002",
+        "site_country": get(row, "site_country") or None,
+        "site_name": get(row, "site_name") or None,
+        "site_city": get(row, "site_city") or None,
+        "site_state": get(row, "site_state") or None,
+        "site_postcode": get(row, "site_postcode") or None,
+        "site_address": get(row, "site_address") or None,
+        "pi_first_name": get(row, "pi_first_name") or None,
+        "pi_last_name": get(row, "pi_last_name") or None,
+        "pi_email": (get(row, "pi_email") or "").lower() or None,
+        "pi_phone": get(row, "pi_phone") or None,
+        "sdl_site_id": get(row, "sdl_site_id") or None,
+        "fire_site_id": get(row, "fire_site_id") or None,
+        "fire_investigator_id": get(row, "fire_investigator_id") or None,
+        "mailinglist_id": get(row, "mailinglist_id") or None,
+        "survey_generated_by": get(row, "survey_generated_by") or None,
+        "recipient_email": (get(row, "RecipientEmail") or "").lower() or None,
+        "recipient_last_name": get(row, "RecipientLastName") or None,
+        "recipient_first_name": get(row, "RecipientFirstName") or None,
+        "meta": meta,
+        "is_full_sipiq": any(k.startswith(("Q57", "Q58", "Q59", "Q63", "Q66", "Q71")) for k in answers),
+        "interested": answers.get("Q25"),
+        "answers": answers,
+        "investigator_oid": None, "investigator_match": None,
+        "source_file": source_file,
+    }
+    return doc
+
+
+def content_hash(doc):
+    payload = {k: doc[k] for k in doc if k not in
+               ("content_sha256", "first_imported_at", "last_seen_at", "last_updated_at",
+                "history", "investigator_oid", "investigator_match", "source_file",
+                "source_exported_at")}
+    return hashlib.sha256(json.dumps(payload, sort_keys=True, ensure_ascii=False,
+                                     default=str).encode("utf-8")).hexdigest()
+
+
+def load_investigators(db):
+    inv = list(db.investigators.find(
+        {"zeme": {"$in": ["Czech Republic", "Slovakia"]}},
+        {"prijmeni": 1, "jmeno": 1, "email": 1, "email2": 1, "zeme": 1, "KROK": 1}))
+    by_email, by_name = {}, {}
+    for d in inv:
+        for ef in ("email", "email2"):
+            e = (d.get(ef) or "").lower().strip()
+            if e:
+                by_email.setdefault(e, d)
+        nm = norm_name(d.get("prijmeni"))
+        if nm:
+            by_name.setdefault((nm, d.get("zeme")), []).append(d)
+    return inv, by_email, by_name
+
+
+def soft_link(doc, by_email, by_name):
+    e = (doc.get("pi_email") or "").lower().strip()
+    if e and e in by_email:
+        d = by_email[e]
+        return d["_id"], f"email:{e}", d
+    e2 = (doc.get("recipient_email") or "").lower().strip()
+    if e2 and e2 in by_email:
+        d = by_email[e2]
+        return d["_id"], f"recipient_email:{e2}", d
+    nm = norm_name(doc.get("pi_last_name"))
+    cand = by_name.get((nm, doc.get("site_country")), [])
+    if len(cand) == 1:
+        return cand[0]["_id"], f"prijmeni:{nm}", cand[0]
+    if len(cand) > 1:
+        return None, f"prijmeni_ambiguous:{nm}({len(cand)})", None
+    return None, "NENALEZENO", None
+
+
+def diff_docs(old, new):
+    changes = []
+    def walk(prefix, o, n):
+        for k in sorted(set((o or {}).keys()) | set((n or {}).keys())):
+            ov, nv = (o or {}).get(k), (n or {}).get(k)
+            if isinstance(ov, dict) or isinstance(nv, dict):
+                walk(f"{prefix}{k}.", ov or {}, nv or {})
+            elif ov != nv:
+                changes.append({"key": f"{prefix}{k}", "old": ov, "new": nv})
+    for field in ("answers", "meta"):
+        walk(f"{field}.", old.get(field, {}), new.get(field, {}))
+    for k in ("site_name", "pi_email", "pi_last_name", "interested", "is_full_sipiq"):
+        if old.get(k) != new.get(k):
+            changes.append({"key": k, "old": old.get(k), "new": new.get(k)})
+    return changes
+
+
+# ---------------------------------------------------------------------------
+def process_file(db, csv_path, scope, dry, by_email, by_name):
+    source_file = os.path.basename(csv_path)
+    exported_at = file_mtime_iso(csv_path)   # export date/time taken from the filesystem (mtime)
+    cols, data_all = load_csv(csv_path)
+    get, _ = col_getter(cols, data_all)
+    data = data_all
+    if scope == "czsk":
+        data = [r for r in data_all if get(r, "site_country") in ("Czech Republic", "Slovakia")]
+    print(f"\n########## {source_file}  (scope={scope}, responses={len(data)}, export={exported_at}) ##########")
+
+    # The question dictionary is built from all rows regardless of scope; reuse the parsed CSV instead of reading it twice.
+    questions = build_questions(cols, data_all)
+
+    docs, link_rows = [], []
+    for r in data:
+        doc = build_response(cols, get, r, source_file)
+        oid, how, matched = soft_link(doc, by_email, by_name)
+        doc["investigator_oid"] = oid
+        doc["investigator_match"] = how
+        doc["source_exported_at"] = exported_at
+        doc["content_sha256"] = content_hash(doc)
+        docs.append(doc)
+        link_rows.append((doc, how, matched))
+
+    existing = {d["_id"]: d for d in db[COL_R].find({}, {"content_sha256": 1})}
+    to_insert = [d for d in docs if d["_id"] not in existing]
+    to_update = [d for d in docs if d["_id"] in existing and existing[d["_id"]].get("content_sha256") != d["content_sha256"]]
+    unchanged = [d for d in docs if d["_id"] in existing and existing[d["_id"]].get("content_sha256") == d["content_sha256"]]
+
+    mk7 = mko = un = 0
+    for doc, how, m in link_rows:
+        krok = (m or {}).get("KROK", "")
+        if m and str(krok).startswith("7"): mk7 += 1
+        elif m: mko += 1
+        else: un += 1
+    print(f"  dictionary: {len(questions)} questions | soft-link: KROK7={mk7}, other={mko}, unmatched={un}")
+    print(f"  delta: INSERT={len(to_insert)}, UPDATE={len(to_update)}, unchanged={len(unchanged)}")
+    if un:
+        for doc, how, m in link_rows:
+            if not m:
+                print(f"    ✗ UNMATCHED: {doc.get('pi_last_name')} / {doc.get('pi_email')} ({how})")
+
+    if dry:
+        print("  [DRY-RUN] nothing written")
+        return {"insert": 0, "update": 0, "unchanged": 0, "wrote": False}
+
+    for q in questions:
+        db[COL_Q].replace_one({"_id": q["_id"]}, q, upsert=True)
+    ts = now_iso()
+    ni = nu = ns = 0
+    for d in docs:
+        cur = db[COL_R].find_one({"_id": d["_id"]})
+        if cur is None:
+            d.update({"first_imported_at": ts, "last_seen_at": ts, "last_updated_at": ts, "history": []})
+            db[COL_R].insert_one(d); ni += 1
+        elif cur.get("content_sha256") != d["content_sha256"]:
+            changes = diff_docs(cur, d)
+            db[COL_R].update_one({"_id": d["_id"]}, {
+                "$set": {**{k: d[k] for k in d if k != "_id"}, "last_seen_at": ts, "last_updated_at": ts},
+                "$push": {"history": {"changed_at": ts, "source_file": source_file, "changes": changes}}})
+            nu += 1
+        else:
+            db[COL_R].update_one({"_id": d["_id"]}, {"$set": {
+                "last_seen_at": ts, "source_file": source_file, "source_exported_at": d["source_exported_at"]}})
+            ns += 1
+    print(f"  [APPLY] questions upsert={len(questions)} | responses insert={ni}, update={nu}, unchanged={ns}")
+    return {"insert": ni, "update": nu, "unchanged": ns, "wrote": True}
+
+
+def move_to_processed(csv_path, folder):
+    dest_dir = os.path.join(folder, PROCESSED_SUBDIR)
+    os.makedirs(dest_dir, exist_ok=True)
+    base = os.path.basename(csv_path)
+    dest = os.path.join(dest_dir, base)
+    if os.path.exists(dest):
+        stem, ext = os.path.splitext(base)
+        n = 1
+        while os.path.exists(os.path.join(dest_dir, f"{stem}_{n}{ext}")):
+            n += 1
+        dest = os.path.join(dest_dir, f"{stem}_{n}{ext}")
+    shutil.move(csv_path, dest)
+    print(f"  -> moved to {PROCESSED_SUBDIR}\\{os.path.basename(dest)}")
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--csv", help="a single file (NOT moved afterwards)")
+    ap.add_argument("--folder", default=DEFAULT_FOLDER, help="folder with SIPIQ CSVs (processed files are moved to Zpracováno)")
+    ap.add_argument("--scope", choices=["czsk", "all"], default="czsk")
+    ap.add_argument("--apply", action="store_true")
+    ap.add_argument("--dry-run", action="store_true")
+    args = ap.parse_args()
+    # Dry-run is the default; an explicit --dry-run also takes precedence over --apply.
+    dry = args.dry_run or not args.apply
+
+    if args.csv:
+        files, move_mode, folder = [args.csv], False, None
+    else:
+        folder = args.folder
+        files = sorted(glob.glob(os.path.join(folder, "*.csv")))
+        move_mode = True
+        print(f"Folder: {folder}\nCSV files found to process: {len(files)}")
+        if not files:
+            print("Nothing to process (no *.csv).")
+            return
+
+    client = MongoClient(MONGO_URI, serverSelectionTimeoutMS=8000)
+    db = client[DB_NAME]
+    client.admin.command("ping")
+    inv, by_email, by_name = load_investigators(db)
+    print(f"CZ+SK investigators in DB: {len(inv)}")
+
+    total = {"insert": 0, "update": 0, "unchanged": 0}
+    for f in files:
+        res = process_file(db, f, args.scope, dry, by_email, by_name)
+        for k in total:
+            total[k] += res[k]
+        if move_mode and res["wrote"]:
+            move_to_processed(f, folder)
+
+    print(f"\n=== TOTAL: insert={total['insert']}, update={total['update']}, unchanged={total['unchanged']} ===")
+    if dry:
+        print("[DRY-RUN] Nothing was written or moved. For a live run use --apply")
+    client.close()
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,38 @@
+# store_cda_seaweed_v1.0.py
+
+**Version:** 1.0 · **Date:** 2026-06-17
+
+## Purpose
+Stores signed CDAs (PDF) from e-mails sent by the assistants (CTAs) in the Mongo collection
+`feasibility.investigators`, under the `cda.*` fields, and advances each investigator to
+`KROK "5 - CDA podepsano"` (step 5, CDA signed).
+
+Unlike `store_cda_batch` (which downloads the `.msg` over SFTP from Tower and extracts the
+attachment with `extract_msg`), this version downloads the PDF **directly from SeaweedFS** via
+the `seaweed_url` that the parser stores with each attachment in `emaily."vbuzalka@its.jnj.com"`
+(`attachments[].seaweed_url` + `sha256`). Simpler, and no SFTP is needed.
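
In the five URLs listed in the script's `MAPPING`, the path looks content-addressed: the first two hex byte pairs of the attachment's SHA256 form two directory levels, followed by the full digest. The sketch below rebuilds a URL from a digest under that assumption. `seaweed_url_for` and `BASE` are hypothetical names introduced for illustration, and the layout is inferred from those entries rather than confirmed by the parser.

```python
# Host and bucket exactly as they appear in the MAPPING entries.
BASE = "http://192.168.1.50:8888/mail-attachments"


def seaweed_url_for(sha256_hex: str, base: str = BASE) -> str:
    """Rebuild an attachment URL from its SHA256 (layout inferred, not confirmed)."""
    digest = sha256_hex.lower()
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("expected a 64-character hex SHA256 digest")
    # <base>/<first byte>/<second byte>/<full digest>
    return f"{base}/{digest[:2]}/{digest[2:4]}/{digest}"


if __name__ == "__main__":
    sha = "1a86e987b9d3da57c1d863b47734133f2e2d7eae3f5cfe91112c475eb86d86e9"
    print(seaweed_url_for(sha))
```

If the inferred layout holds, comparing the rebuilt URL with the stored `seaweed_url` gives a cheap consistency check on a `MAPPING` entry before anything is downloaded.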
+
+## How it works
+- `MAPPING` = explicit pairing `investigator _id → (seaweed_url, filename, sha256, size, source_msg_id, label)`.
+- For each entry: downloads the PDF (urllib), verifies **SHA256 + size + PDF header**,
+  base64-encodes it and stores it under `cda`:
+  `data_base64, data_sha256, data_filename, data_mime, data_size, data_stored_at,
+  data_source_msg` + metadata `stav="podepsano", soubor, zdroj`.
+- Sets `KROK = "5 - CDA podepsano"` and prepends a line to `STATUS`.
+- `_id` is converted to `ObjectId` (plain pymongo does not convert string→ObjectId by itself).
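
The three integrity checks from the list above can be sketched as one standalone function. This is a minimal illustration, not the script's own code: the name `verify_pdf` and its return shape are assumptions made for the example, and only `hashlib` from the standard library is used.

```python
import hashlib


def verify_pdf(raw: bytes, expected_sha256: str, expected_size: int) -> dict:
    """Run the SHA256, size and PDF-header checks on downloaded bytes.

    verify_pdf is a hypothetical helper for illustration only.
    """
    got = hashlib.sha256(raw).hexdigest()
    checks = {
        "sha256": got == expected_sha256.lower(),   # digest matches the stored metadata
        "size": len(raw) == expected_size,          # byte count matches the stored size
        "pdf_header": raw[:5] == b"%PDF-",          # file starts with the PDF magic bytes
    }
    # "ok" is computed from the three checks above, before it is added to the dict.
    checks["ok"] = all(checks.values())
    return checks


if __name__ == "__main__":
    sample = b"%PDF-1.7\n% sample bytes, not a real CDA\n"
    result = verify_pdf(sample, hashlib.sha256(sample).hexdigest(), len(sample))
    print(result)
```

Returning the individual results rather than a single boolean keeps a dry-run report able to say which check failed, in the same way the script prints one line per check.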
+
+## Usage
+```
+.venv\Scripts\python.exe Feasibility\store_cda_seaweed_v1.0.py            # dry-run (verifies download + SHA, writes nothing)
+.venv\Scripts\python.exe Feasibility\store_cda_seaweed_v1.0.py --apply    # writes to Mongo
+```
+
+## Run of 17JUN2026 (--apply)
+Stored 5/5 (all SHA256 checks OK), KROK 4 → 5:
+Závada Filip, Bruncák Michal (FNsP B. Bystrica), Machytka Evžen (Asclepiades),
+Pumprla Jiří (PreventaMed), Zapotocká Júlia (PAV-MED).
+GASTROMART/Molnár was skipped (already at KROK 6, CDA stored earlier).
+
+## Dependencies
+`pymongo` (which ships the `bson` package) plus the standard library. SeaweedFS volume server `192.168.1.50:8888`.
+Mongo `192.168.1.76:27017`.
@@ -0,0 +1,126 @@
+# -*- coding: utf-8 -*-
+# =============================================================================
+# Name:    store_cda_seaweed_v1.0.py
+# Version: 1.0
+# Date:    2026-06-17
+# Purpose: Stores signed CDAs (PDF) from the assistants' e-mails in Mongo
+#          feasibility.investigators under cda.* and advances the investigator to
+#          KROK "5 - CDA podepsano". PDFs are downloaded directly from SeaweedFS
+#          (seaweed_url from attachments in emaily."vbuzalka@its.jnj.com");
+#          the SHA256 is verified against the metadata from Mongo.
+# Usage:   python store_cda_seaweed_v1.0.py           (dry-run / preview)
+#          python store_cda_seaweed_v1.0.py --apply    (writes to Mongo)
+# Note:    MAPPING below = explicit pairing investigator -> CDA attachment.
+#          Standard library + pymongo only. SeaweedFS host 192.168.1.50:8888.
+# =============================================================================
+
+import sys
+import base64
+import hashlib
+import urllib.request
+from datetime import datetime, timezone
+from pymongo import MongoClient
+from bson import ObjectId
+
+MONGO_URI = "mongodb://192.168.1.76:27017"
+DBN, COL = "feasibility", "investigators"
+
+# (investigator _id, seaweed_url, filename, sha256, size, source_msg_id, label)
+MAPPING = [
+    ("6a198b661218c31ab0f5ba57",
+     "http://192.168.1.50:8888/mail-attachments/1a/86/1a86e987b9d3da57c1d863b47734133f2e2d7eae3f5cfe91112c475eb86d86e9",
+     "CZ_CDA PI_MUDr. Filip Zavada_fully signed_16Jun2026.pdf",
+     "1a86e987b9d3da57c1d863b47734133f2e2d7eae3f5cfe91112c475eb86d86e9",
+     479026, "<CH2PR07MB7190A5538ACDC1D49F8B430780E52@CH2PR07MB7190.namprd07.prod.outlook.com>",
+     "Zavada Filip"),
+    ("6a19832b5fc2213518257957",
+     "http://192.168.1.50:8888/mail-attachments/64/b0/64b06d48bfe3c49095e326988f14c04fd5849728b227647f6653b2e3c3095538",
+     "SK_CDA PI_Bruncak_FNsP BBystrica_fully signed 16Jun2026.pdf",
+     "64b06d48bfe3c49095e326988f14c04fd5849728b227647f6653b2e3c3095538",
+     498069, "<SA1PR07MB952874B8654156369CDE44448CE52@SA1PR07MB9528.namprd07.prod.outlook.com>",
+     "Bruncak Michal"),
+    ("6a19832b5fc2213518257961",
+     "http://192.168.1.50:8888/mail-attachments/c2/72/c272ca62bd27ca10aed35cb54054d880f4f0e2f59940ed3b067b17d51a9ac041",
+     "CZ_CDA Institution_Asclepiades s.r.o._MUDr. Machytka_16Jun2026.pdf",
+     "c272ca62bd27ca10aed35cb54054d880f4f0e2f59940ed3b067b17d51a9ac041",
+     460977, "<PH0PR07MB97879A9C9BF9C00D38D4798A9FE52@PH0PR07MB9787.namprd07.prod.outlook.com>",
+     "Machytka Evzen (Asclepiades)"),
+    ("6a19832b5fc2213518257967",
+     "http://192.168.1.50:8888/mail-attachments/99/37/99372c399be3b001428ef4b36d43e250dedced5955de5d1f3a2d63a9f0c1728b",
+     "CZ_CDA institution_PreventaMed sro_fully signed_16Jun2026.pdf",
+     "99372c399be3b001428ef4b36d43e250dedced5955de5d1f3a2d63a9f0c1728b",
+     457745, "<CH2PR07MB719008DB0B3CAFD764AE2E8280E52@CH2PR07MB7190.namprd07.prod.outlook.com>",
+     "Pumprla Jiri (PreventaMed)"),
+    ("6a1c4275aa46d8b608065ce9",
+     "http://192.168.1.50:8888/mail-attachments/94/95/9495c742407873efd8dd9713e1dc962cb08e55e0d3690e4a79a90132ee358dee",
+     "SK_CDA Institution_PAV-MED s r.o_fully signed_15Jun2026.pdf",
+     "9495c742407873efd8dd9713e1dc962cb08e55e0d3690e4a79a90132ee358dee",
+     460246, "<CH2PR07MB719008DB0B3CAFD764AE2E8280E52@CH2PR07MB7190.namprd07.prod.outlook.com>",
+     "Zapotocka Julia (PAV-MED)"),
+]
+
+
+def fetch(url):
+    with urllib.request.urlopen(url, timeout=30) as r:
+        return r.read()
+
+
+def main():
+    apply = "--apply" in sys.argv
+    cli = MongoClient(MONGO_URI)
+    col = cli[DBN][COL]
+    now = datetime.now(timezone.utc).isoformat()
+
+    ok = 0
+    for _id, url, fname, sha, size, src, label in MAPPING:
+        oid = ObjectId(_id)
+        doc = col.find_one({"_id": oid}, {"STATUS": 1, "KROK": 1, "cda.stav": 1})
+        if not doc:
+            print(f"  !! {label}: investigator _id={_id} NOT FOUND")
+            continue
+        try:
+            raw = fetch(url)
+        except Exception as e:
+            print(f"  !! {label}: download failed: {e}")
+            continue
+        got = hashlib.sha256(raw).hexdigest()
+        sha_ok = (got == sha)
+        size_ok = (len(raw) == size)
+        head_ok = raw[:5] == b"%PDF-"
+        print(f"  [{label}]")
+        print(f"     file     : {fname}")
+        print(f"     fetched  : {len(raw)} B (expected {size}) {'OK' if size_ok else 'MISMATCH'}")
+        print(f"     sha256   : {'OK' if sha_ok else 'MISMATCH! ' + got}")
+        print(f"     PDF hdr  : {'OK' if head_ok else 'NOT A PDF'}")
+        print(f"     KROK     : {doc.get('KROK')} -> 5 - CDA podepsano")
+        if not (sha_ok and size_ok and head_ok):
+            print("     >> SKIPPING (check failed)")
+            continue
+        if not apply:
+            ok += 1
+            continue
+
+        b64 = base64.b64encode(raw).decode("ascii")
+        old_status = doc.get("STATUS", "") or ""
+        new_line = (f"17JUN2026: podepsane CDA ULOZENO do Mongo (cda.data) — {fname} "
+                    f"(z e-mailu asistentky). KROK 5, pripraveno na SIPIQ.")
+        col.update_one({"_id": oid}, {"$set": {
+            "KROK": "5 - CDA podepsano",
+            "STATUS": new_line + "\n" + old_status,
+            "cda.stav": "podepsano",
+            "cda.soubor": fname,
+            "cda.zdroj": "e-mail asistentky (SeaweedFS)",
+            "cda.data_base64": b64,
+            "cda.data_sha256": sha,
+            "cda.data_filename": fname,
+            "cda.data_mime": "application/pdf",
+            "cda.data_size": len(raw),
+            "cda.data_stored_at": now,
+            "cda.data_source_msg": src,
+        }})
+        ok += 1
+        print("     >> STORED + KROK 5")
+
+    print(f"\n{'WRITTEN' if apply else 'DRY-RUN OK'}: {ok}/{len(MAPPING)}")
+    if not apply:
+        print(">>> Run with --apply to write")
+
+
+if __name__ == "__main__":
+    main()