This commit is contained in:
2026-06-16 14:32:00 +02:00
parent 4a665b90fa
commit 91d7f03f2b
39 changed files with 31 additions and 18762 deletions
+11 -1
View File
@@ -58,7 +58,7 @@ Bearer token: `13e1bb01-9fd5-44a8-8ce9-4ee27133d340`
| Endpoint | Přijímá | Chování |
|---|---|---|
| `POST /upload` | `.msg` / `.emsg` | `.emsg` Fernet dešifruje → uloží `.msg` do `/msgs`, přeskočí pokud existuje; volitelně import do Graphu |
| `POST /upload` | `.msg` / `.emsg` | `.emsg` Fernet dešifruje → uloží `.msg` do `/msgs`, přeskočí pokud existuje; volitelně import do Graphu. **v2.4:** form pole `overwrite=1` → existující `.msg` **přepíše** (re-upload změněného e-mailu z `jnj_mailbox_sync >= v1.3`); při overwrite se Graph re-import nedělá |
| `POST /upload-db` | `.db` / `.db.xz.enc` | **v2.1:** `.db.xz.enc` Fernet dešifruje + lzma rozbalí → plain `.db`; pak smaže staré `.db` v `/msgs/db` a uloží. Plain `.db` bere i nadále (zpětná kompatibilita) |
| `POST /upload-dropbox` | cokoliv | Nahraje do Dropboxu (overwrite) |
@@ -68,6 +68,16 @@ Bearer token: `13e1bb01-9fd5-44a8-8ce9-4ee27133d340`
> (stdlib) — ověřeno v kontejneru. Nasazení = jen restart (app.py je bind-mount),
> bez rebuildu.
> **v2.4 (2026-06-16):** `/upload` — nové form pole `overwrite=1`. Když `.msg`
> už v `/msgs` existuje, místo `{"status":"exists"}` ho **přepíše** a vrátí
> `{"status":"overwritten"}`. Bez pole zůstává původní idempotentní skip (žádná
> regrese). Slouží pro re-upload **změněného** e-mailu z `jnj_mailbox_sync >= v1.3`
> (detekce změny obsahu — např. dopsaná chyba `SendAsDenied` do neodeslané Sent
> položky). Při overwrite se **Graph re-import nedělá** (klient posílá `folder=""`,
> takže nevznikne duplikát v Graph zrcadle); přepsaný soubor má novější mtime →
> Tower (`jnj_tower_ingest`) ho přeparsuje a upsertne dokument v Mongu dle `_id`.
> Nasazení = jen `docker restart` (bind-mount).
> **v2.3 (2026-06-10):** `/item/{token}` — při `Accept: application/json`
> (klient `janssenpc_file_receive >= v1.2`) vrací `{"data": "<fernet_b64>"}`
> místo binární přílohy. Důvod: JNJ filtr (Zscaler/SiteMinder) blokoval binární
+20 -6
View File
@@ -1,6 +1,12 @@
# app.py | v2.3 | 2026-06-10
# app.py | v2.4 | 2026-06-16
# FastAPI server pro příjem .msg a .db souborů, upload do Dropboxu a import do Graph API.
# Endpointy: /upload (.msg/.emsg → /msgs + Graph import),
# v2.4: /upload + form pole overwrite=1 — když .msg už existuje, PŘEPÍŠE ho (jinak
# jako dřív vrátí "exists"). Slouží pro re-upload změněného e-mailu z
# jnj_mailbox_sync >= v1.3 (detekce změny obsahu, např. dopsaná chyba
# SendAsDenied). Při overwrite se NEdělá Graph re-import (klient posílá
# folder="" → žádný duplikát v Graph zrcadle; jen se obnoví soubor v /msgs,
# Tower si ho přeparsuje a aktualizuje dokument v Mongu).
# Endpointy: /upload (.msg/.emsg → /msgs + Graph import; overwrite=1 přepíše),
# /upload-db (.db NEBO .db.xz.enc → Fernet desifruj + lzma rozbal → /msgs/db),
# /upload-dropbox (→ Dropbox /!!!Days/Downloads Z230),
# /message-delete, /message-update (sync: smazání, přečtení, přesun složky),
@@ -336,6 +342,7 @@ async def upload_msg(
file: UploadFile = File(...),
authorization: str = Header(None),
folder: str = Form(""),
overwrite: str = Form(""),
):
if authorization != f"Bearer {TOKEN}":
raise HTTPException(status_code=401, detail="Unauthorized")
@@ -347,7 +354,12 @@ async def upload_msg(
# Ukládáme vždy jako .msg
msg_filename = file.filename[:-5] + ".msg" if is_encrypted else file.filename
dest = SAVE_DIR / msg_filename
if dest.exists():
existed = dest.exists()
do_overwrite = overwrite in ("1", "true", "True", "yes")
# v2.4: bez overwrite zustava puvodni idempotentni skip; s overwrite=1
# prepiseme (re-upload zmeneneho e-mailu z jnj_mailbox_sync >= v1.3).
if existed and not do_overwrite:
return {"status": "exists", "file": msg_filename}
content = await file.read()
@@ -357,13 +369,15 @@ async def upload_msg(
with dest.open("wb") as f:
f.write(content)
# Import to Graph API if folder was provided by client
# Graph import jen pri PRVNIM ulozeni a kdyz klient poslal folder.
# Pri overwrite (re-upload) se Graph re-import NEdela — predesle by vznikl
# duplikat v Graph zrcadle; Tower si soubor preparsuje sam (upsert dle _id).
graph_id = None
if folder:
if folder and not existed:
graph_id = _import_msg_to_graph(dest, folder)
return {
"status": "saved",
"status": "overwritten" if (existed and do_overwrite) else "saved",
"file": msg_filename,
"graph_id": graph_id,
}
-57
View File
@@ -1,57 +0,0 @@
# jnj_mailbox_sync v1.2.0
**Soubor:** `jnj_mailbox_sync_v1.2.py`
**Datum:** 2026-06-10
**Autor:** vladimir.buzalka
**Běží:** JNJ stroj (Outlook MAPI), Python z Thonny.
## Co to je
Synchronizace JNJ Outlooku (MAPI) → osobní schránka (přes msgreceiver) + bookkeeping
v SQLite (`C:\Users\vbuzalka\SQLITE\jnjemails.db`). Sleduje přesuny e-mailů mezi
složkami a příznak „už není ve schránce" — bez opětovného přenosu těla.
Skenované složky: **Inbox + Sent Items + Deleted Items** (vč. podsložek).
## Novinka v1.2 — komprimovaný + šifrovaný upload SQLite
Dřív se ~37 MB SQLite posílalo na `/upload-db` **plain** (jen HTTPS+token).
Teď `upload_db()`:
1. **Komprese na max**`lzma` (xz), `preset 9 | PRESET_EXTREME` (stdlib).
2. **Šifrování** — stávající Fernet (klíč odvozený z TOKENu, `sha256 → urlsafe_b64`).
3. Upload jako `jnjemails_<ts>.db.xz.enc`.
Přijímací **msgreceiver `/upload-db` (app.py ≥ v2.1)** soubor Fernetem dešifruje,
lzma rozbalí a uloží plain `.db` do `/msgs/db`. Domácí `jnj_tower_ingest` tím pádem
**zůstává beze změny** (čte nejnovější plain `.db` read-only).
Důvod šifrování: bezpečný průchod přes JNJ proxy (Zscaler/DLP) — stejný vzor jako
`.emsg` u jednotlivých `.msg`. Round-trip ověřen (bajt na bajt).
## Závislost na serveru
⚠️ Vyžaduje **msgreceiver app.py ≥ v2.1**. Server bere `.db.xz.enc` i starý plain `.db`,
takže nasazovací pořadí je **server → JNJ** bez výpadku.
## Argumenty
`--mode {capture,update-paths,full-update}` (default capture), `--days N`
(0 = celé), `--dry-run`, `--limit N`, `--no-db-upload`.
## Spouštění (JNJ stroj, plné cesty)
```
"C:\Users\vbuzalka\AppData\Local\Programs\Thonny\python.exe" "c:\Users\vbuzalka\OneDrive - JNJ\##JNJPrenos\Python\jnj_mailbox_sync_v1.2.py" --mode full-update --days 30
```
## Revert
Stará verze: `Trash/jnj_mailbox_sync_v1.1.py` (plain DB upload). Server zůstává
zpětně kompatibilní, takže revert na JNJ straně nevyžaduje zásah na serveru.
## Historie
- **1.0.0** — režimy capture/update-paths/full-update, sledování přesunů, updated_at.
- **1.1.0** — + Deleted Items do skenovaných složek.
- **1.2.0** — upload SQLite komprimován (lzma/xz max) + šifrován (Fernet) → `.db.xz.enc`;
vyžaduje msgreceiver app.py ≥ v2.1.
-604
View File
@@ -1,604 +0,0 @@
"""
jnj_mailbox_sync v1.2
Nazev: jnj_mailbox_sync_v1.2.py
Verze: 1.2.0
Datum: 2026-06-10
Autor: vladimir.buzalka
Popis:
Synchronizace JNJ Outlooku (MAPI) -> osobni schranka + bookkeeping v SQLite.
Nasledník inbox_full_sync_v1.1. Nove navic sleduje PRESUN emailu mezi
slozkami a priznak "uz neni ve schrance" — BEZ opetovneho prenosu tela.
Scope: primarni schranka, Inbox + Sent Items + Deleted Items vcetne vsech
podsložek. (v1.1: pridano Deleted Items — uzivatel po precteni maily MAZE,
takze precteny-smazany mail se ted sleduje jako /Deleted Items misto aby
skoncil jako "ghost" s posledni cestou /Inbox.)
Online Archive se NEskenuje — firemni pravidla tam presouvaji nejstarsi
emaily, ktere uz mame davno stazene. Kdyz email ze skenovane schranky
zmizi (presun do nesken. slozky / vyprazdneni Deleted), ponecha se POSLEDNI
ZNAMA cesta a nastavi se priznak not_in_mailbox_anymore=1.
Identita emailu = Internet Message-ID (stabilni pres presuny). EntryID se
pri presunu meni — drzime ho jen jako pomocny.
Sloupce cest v SQLite:
folder = cesta pri PRVNIM zachyceni (historie, neprepisuje se)
jnj_folder = AKTUALNI ziva cesta (prepisuje se pri presunu)
Sloupec updated_at se bumpne pri insertu i kazde zmene — slouzi pro
inkrementalni sync na domaci strane (watermark).
Upload SQLite (v1.2): DB se pred odeslanim KOMPRIMUJE (lzma/xz, max) a
SIFRUJE (Fernet, klic z TOKENu) a nahrava jako .db.xz.enc. Server
(msgreceiver /upload-db) ji desifruje + rozbali zpet na plain .db do
/msgs/db. Sifruje se kvuli prenosu pres JNJ proxy (Zscaler) — stejny
vzor jako .emsg u .msg. ~37 MB DB se scvrkne na jednotky MB.
Rezimy (--mode):
capture (default) Projde cely Inbox+Sent, nove emaily ulozi a nahraje
(jako inbox_full_sync). Okno --days se IGNORUJE (bere VSE).
Detekce "opustilo schranku" se v tomto rezimu NEdela (neskenuje
se archiv, takze by to delalo falesne poplachy).
update-paths Jen METADATA. Projde okno poslednich --days dni, aktualizuje
cesty/precteno znamych emailu a oznaci ty, co ze schranky
zmizely. NIC nenahrava (zadny .msg upload).
full-update update-paths + navic dorovna chybejici emaily (SaveAs+upload).
Argumenty:
--mode {capture,update-paths,full-update} default capture
--days N velikost okna ve dnech (default 30). 0 = cely Inbox+Sent.
--dry-run NIC nezapise/nenahraje, jen vypise co by udelal (+ souhrn).
--limit N zpracovat max N polozek (rychly test).
--no-db-upload na konci nenahravat SQLite na server.
Spousteni:
# 1) Nejdriv si PRECIST, co by full-update prinesl (NIC nezmeni):
python jnj_mailbox_sync_v1.2.py --mode full-update --days 30 --dry-run
# 2) Pak naostro:
python jnj_mailbox_sync_v1.2.py --mode full-update --days 30
Zavislosti:
pywin32, requests, cryptography, sqlite3 + lzma (stdlib).
Python 3.10+, Windows, Outlook musi byt spusteny a prihlaseny.
Historie verzi:
1.0.0 2026-06-09 Nova generace: rezimy capture/update-paths/full-update,
sledovani presunu (jnj_folder), priznak
not_in_mailbox_anymore, sloupec updated_at pro
inkrementalni sync domu. Nasledník inbox_full_sync_v1.1.
1.1.0 2026-06-10 + Deleted Items do SYNC_FOLDERS (olFolderDeletedItems=3).
Precteny-smazany mail se ted sleduje jako /Deleted Items;
drive ghost s posledni cestou /Inbox. Pri 1. behu se
drive zghostovane maily najdou v Deleted -> jnj_folder
opraven na /Deleted Items + not_in_mailbox_anymore=0.
1.2.0 2026-06-10 Upload SQLite KOMPRIMOVAN (lzma/xz max) + SIFROVAN
(Fernet) -> .db.xz.enc. Server desifruje+rozbali zpet
na .db. Drive se ~37 MB DB posilalo plain; ted jednotky
MB sifrovane (bypass JNJ proxy). Vyzaduje msgreceiver
app.py >= v2.1 (umi .db.xz.enc; zpetne bere i plain .db).
"""
import argparse
import base64
import hashlib
import logging
import lzma
import sqlite3
import sys
import tempfile
from datetime import datetime, timedelta
from pathlib import Path
import win32com.client
import requests
import urllib3
from cryptography.fernet import Fernet
if hasattr(sys.stdout, "reconfigure"):
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# ─── KONFIGURACE ──────────────────────────────────────────────────────────────
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_UPLOAD_URL = "https://msgs.buzalka.cz/upload-db"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnj_mailbox_sync_errors.log"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
SCRIPT_NAME = "jnj_mailbox_sync"
SCRIPT_VERSION = "1.2.0"
# olFolderInbox=6, olFolderSentMail=5, olFolderDeletedItems=3
SYNC_FOLDERS = [(6, "Inbox"), (5, "Sent Items"), (3, "Deleted Items")]
OLSAVE_MSG = 3 # OlSaveAsType.olMSG
# Sifrovaci klic odvozeny z TOKENu (stejny algoritmus jako server)
_FERNET = Fernet(base64.urlsafe_b64encode(hashlib.sha256(TOKEN.encode()).digest()))
logging.basicConfig(
filename=LOG_PATH,
level=logging.ERROR,
format="%(asctime)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
encoding="utf-8",
)
# ──────────────────────────────────────────────────────────────────────────────
# ─── SQLite ───────────────────────────────────────────────────────────────────
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now')),
entry_id TEXT,
graph_id TEXT,
is_read INTEGER DEFAULT 0,
jnj_folder TEXT,
not_in_mailbox_anymore INTEGER DEFAULT 0,
left_mailbox_at TEXT,
updated_at TEXT
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.execute("""
CREATE TABLE IF NOT EXISTS runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
script TEXT NOT NULL,
version TEXT,
started_at TEXT NOT NULL,
finished_at TEXT,
mode TEXT,
window_days INTEGER,
dry_run INTEGER DEFAULT 0,
found INTEGER DEFAULT 0,
new_captured INTEGER DEFAULT 0,
path_updated INTEGER DEFAULT 0,
read_updated INTEGER DEFAULT 0,
returned INTEGER DEFAULT 0,
left_mailbox INTEGER DEFAULT 0,
skipped INTEGER DEFAULT 0,
errors INTEGER DEFAULT 0
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id INTEGER REFERENCES runs(id),
level TEXT NOT NULL,
event TEXT NOT NULL,
subject TEXT,
folder TEXT,
graph_id TEXT,
detail TEXT,
created_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_log_run_id ON log(run_id)")
# Migrace existujici jnjemails.db (z inbox_full_sync) — pridej chybejici sloupce
for col, ddl in [
("entry_id", "TEXT"), ("graph_id", "TEXT"), ("is_read", "INTEGER DEFAULT 0"),
("jnj_folder", "TEXT"), ("not_in_mailbox_anymore", "INTEGER DEFAULT 0"),
("left_mailbox_at", "TEXT"), ("updated_at", "TEXT"),
]:
try:
conn.execute(f"ALTER TABLE messages ADD COLUMN {col} {ddl}")
except Exception:
pass
for col, ddl in [
("mode", "TEXT"), ("window_days", "INTEGER"), ("dry_run", "INTEGER DEFAULT 0"),
("found", "INTEGER DEFAULT 0"), ("new_captured", "INTEGER DEFAULT 0"),
("path_updated", "INTEGER DEFAULT 0"), ("read_updated", "INTEGER DEFAULT 0"),
("returned", "INTEGER DEFAULT 0"), ("left_mailbox", "INTEGER DEFAULT 0"),
]:
try:
conn.execute(f"ALTER TABLE runs ADD COLUMN {col} {ddl}")
except Exception:
pass
# Indexy na sloupce, ktere mohly vzniknout az migraci vyse
conn.execute("CREATE INDEX IF NOT EXISTS idx_updated_at ON messages(updated_at)")
conn.commit()
def start_run(conn, mode, days, dry):
cur = conn.execute(
"""INSERT INTO runs (script, version, started_at, mode, window_days, dry_run)
VALUES (?, ?, datetime('now'), ?, ?, ?)""",
(SCRIPT_NAME, SCRIPT_VERSION, mode, days, 1 if dry else 0),
)
conn.commit()
return cur.lastrowid
def finish_run(conn, run_id, stats):
conn.execute(
"""UPDATE runs SET finished_at=datetime('now'),
found=?, new_captured=?, path_updated=?, read_updated=?,
returned=?, left_mailbox=?, skipped=?, errors=?
WHERE id=?""",
(stats["found"], stats["new_captured"], stats["path_updated"],
stats["read_updated"], stats["returned"], stats["left_mailbox"],
stats["skipped"], stats["errors"], run_id),
)
conn.commit()
def db_log(conn, run_id, level, event, subject=None, folder=None, graph_id=None, detail=None):
conn.execute(
"""INSERT INTO log (run_id, level, event, subject, folder, graph_id, detail)
VALUES (?, ?, ?, ?, ?, ?, ?)""",
(run_id, level, event, subject, folder, graph_id, detail),
)
conn.commit()
def info(conn, run_id, event, **kw):
db_log(conn, run_id, "INFO", event, **kw)
def error(conn, run_id, event, **kw):
db_log(conn, run_id, "ERROR", event, **kw)
def db_get(conn, mid):
cur = conn.execute(
"""SELECT message_id, folder, jnj_folder, is_read, not_in_mailbox_anymore
FROM messages WHERE message_id=?""", (mid,))
r = cur.fetchone()
if not r:
return None
return {"message_id": r[0], "folder": r[1], "jnj_folder": r[2],
"is_read": r[3], "not_in_mailbox_anymore": r[4]}
def apply_update(conn, mid, changes):
sets, vals = [], []
for k, v in changes.items():
sets.append(f"{k}=?")
vals.append(v)
sets.append("updated_at=datetime('now')")
vals.append(mid)
conn.execute(f"UPDATE messages SET {', '.join(sets)} WHERE message_id=?", vals)
conn.commit()
# ─── Outlook / prenos ────────────────────────────────────────────────────────
def get_mid(item) -> str:
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except Exception:
mid = None
return mid or f"entryid:{item.EntryID}"
def upload_msg(msg_path, filename, folder=""):
with open(msg_path, "rb") as f:
encrypted = _FERNET.encrypt(f.read())
enc_filename = Path(filename).stem + ".emsg"
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (enc_filename, encrypted, "application/octet-stream")},
data={"folder": folder},
timeout=60,
)
if not resp.ok:
raise requests.HTTPError(f"{resp.status_code} {resp.reason} | {resp.text[:200]}")
return resp.json()
def upload_db(db_path):
"""Komprese (lzma/xz, max) -> Fernet sifra -> upload jako .db.xz.enc.
Server (msgreceiver /upload-db, app.py >= v2.1) data desifruje + rozbali
zpet na plain .db do /msgs/db. Sifruje se kvuli prenosu pres JNJ proxy
(Zscaler) — stejny vzor jako .emsg u .msg."""
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{ts}.db"
try:
with open(db_path, "rb") as f:
raw = f.read()
compressed = lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)
encrypted = _FERNET.encrypt(compressed)
enc_filename = filename + ".xz.enc"
resp = requests.post(
DB_UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (enc_filename, encrypted, "application/octet-stream")},
timeout=300,
)
mb_raw, mb_xz, mb_enc = (len(raw) / 1048576,
len(compressed) / 1048576,
len(encrypted) / 1048576)
print(f" DB upload: {resp.json()} "
f"({mb_raw:.1f} MB -> xz {mb_xz:.1f} MB -> enc {mb_enc:.1f} MB)")
except Exception as e:
print(f" DB upload CHYBA: {e}")
def capture_new(conn, run_id, item, mid, current, is_read, subject, stats):
"""Novy email: SaveAs -> upload -> insert. Vraci True pri uspechu."""
with tempfile.TemporaryDirectory() as tmp:
safe = f"{item.EntryID[-20:]}.msg"
p = Path(tmp) / safe
item.SaveAs(str(p), OLSAVE_MSG)
result = upload_msg(p, safe, current)
graph_id = result.get("graph_id")
try:
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
except Exception:
received = None
try:
sender = item.SenderEmailAddress or ""
except Exception:
sender = ""
conn.execute(
"""INSERT OR IGNORE INTO messages
(message_id, subject, sender, received_at, folder, source,
entry_id, graph_id, is_read, jnj_folder,
not_in_mailbox_anymore, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 0, datetime('now'))""",
(mid, subject, sender, received, current, SCRIPT_NAME,
item.EntryID, graph_id, is_read, current),
)
conn.commit()
info(conn, run_id, "captured", subject=subject, folder=current, graph_id=graph_id)
print(f" NEW | {subject[:70]}")
return True
def process_item(conn, run_id, item, current, stats, seen, mode, dry):
try:
mid = get_mid(item)
except Exception:
return
seen.add(mid)
stats["found"] += 1
try:
is_read = 0 if item.UnRead else 1
except Exception:
is_read = 0
subject = str(getattr(item, "Subject", "") or "")
row = db_get(conn, mid)
# ── Novy email (neni v DB) ────────────────────────────────────────────
if row is None:
if mode in ("capture", "full-update"):
if dry:
stats["new_captured"] += 1
print(f" NEW* | {subject[:70]}")
else:
try:
if capture_new(conn, run_id, item, mid, current, is_read, subject, stats):
stats["new_captured"] += 1
except Exception as e:
stats["errors"] += 1
error(conn, run_id, "capture_error", subject=subject, folder=current, detail=str(e))
print(f" CHYBA NEW | {subject[:50]} | {e}")
else: # update-paths — telo nemame, nelze dorovnat
stats["new_uncaptured"] += 1
return
# ── Znamy email — porovnej zmeny ──────────────────────────────────────
changes = {}
current_known = row.get("jnj_folder") or row.get("folder")
if current_known != current:
changes["jnj_folder"] = current
stats["path_updated"] += 1
if row.get("is_read") != is_read:
changes["is_read"] = is_read
stats["read_updated"] += 1
if row.get("not_in_mailbox_anymore"):
changes["not_in_mailbox_anymore"] = 0
changes["left_mailbox_at"] = None
stats["returned"] += 1
if changes:
if not dry:
apply_update(conn, mid, changes)
what = []
if "jnj_folder" in changes:
what.append(f"-> {current}")
if "is_read" in changes:
what.append("precteno" if is_read else "neprecteno")
if "not_in_mailbox_anymore" in changes:
what.append("vraceno do schranky")
marker = "*" if dry else " "
print(f" UPD{marker} | {subject[:55]} | {', '.join(what)}")
info(conn, run_id, "path_update", subject=subject, folder=current, detail="; ".join(what))
else:
stats["skipped"] += 1
def walk(conn, run_id, folder, folder_path, cutoff_local, stats, seen, mode, dry, limit):
current = f"{folder_path}/{folder.Name}"
try:
items = folder.Items
if cutoff_local is not None:
restrict = ("@SQL=\"urn:schemas:httpmail:datereceived\" >= '%s'"
% cutoff_local.strftime("%Y/%m/%d %H:%M:%S"))
items = items.Restrict(restrict)
items.Sort("[ReceivedTime]", True) # newest first
except Exception as e:
print(f" CHYBA slozka {current}: {e}")
error(conn, run_id, "folder_error", folder=current, detail=str(e))
return
n = 0
for item in items:
if limit and stats["found"] >= limit:
break
try:
if not str(getattr(item, "MessageClass", "")).upper().startswith("IPM.NOTE"):
continue
except Exception:
continue
process_item(conn, run_id, item, current, stats, seen, mode, dry)
n += 1
print(f" {current}: {n} polozek")
info(conn, run_id, "folder_done", folder=current, detail=str(n))
try:
subs = list(folder.Folders)
except Exception:
subs = []
for sub in subs:
if limit and stats["found"] >= limit:
break
walk(conn, run_id, sub, current, cutoff_local, stats, seen, mode, dry, limit)
def _parse_dt(s):
if not s:
return None
try:
dt = datetime.fromisoformat(s)
if dt.tzinfo:
dt = dt.astimezone().replace(tzinfo=None)
return dt
except Exception:
return None
def flag_left_mailbox(conn, run_id, cutoff_local, seen, scanned_roots, stats, dry):
"""Emaily v DB v okne, ktere jsme ve SKENOVANE casti schranky (Inbox/Sent)
NEvideli -> opustily pracovni schranku. Ponecha posledni znamou cestu,
nastavi priznak.
DULEZITE: hodnotime JEN emaily, jejichz POSLEDNI ZNAMA cesta je pod nekterym
skenovanym korenem (scanned_roots = Inbox/Sent/Deleted Items primarni
schranky). Emaily naposledy videne MIMO skenovany rozsah (Archive, Online
Archive, Junk, Drafts, Sync Issues, vlastni top-level slozky, ...) se
NEhodnoti — tam jsme je necekali, takze jejich absence nic neznamena (jinak
falesne GONE). Pozn.: po vyprazdneni Deleted Items se tamni maily korektne
oznaci GONE (posledni cesta /Deleted Items zustane)."""
cur = conn.execute(
"""SELECT message_id, received_at, jnj_folder, folder, not_in_mailbox_anymore
FROM messages""")
to_flag = []
for mid, received_at, jnjf, fld, flag in cur.fetchall():
if mid in seen or flag:
continue
path = jnjf or fld or ""
if not any(path.startswith(root) for root in scanned_roots):
continue # posledni znama cesta mimo skenovany rozsah -> nehodnotime
rec = _parse_dt(received_at)
if rec is None or rec < cutoff_local:
continue # mimo okno / neparsovatelne -> nehodnotime
to_flag.append((mid, path))
for mid, path in to_flag:
if not dry:
conn.execute(
"""UPDATE messages SET not_in_mailbox_anymore=1,
left_mailbox_at=datetime('now'), updated_at=datetime('now')
WHERE message_id=?""", (mid,))
stats["left_mailbox"] += 1
print(f" GONE{'*' if dry else ' '} | {path}")
if not dry and to_flag:
conn.commit()
info(conn, run_id, "left_mailbox", detail=str(len(to_flag)))
# ─── MAIN ─────────────────────────────────────────────────────────────────────
def main():
ap = argparse.ArgumentParser(description=f"jnj_mailbox_sync v{SCRIPT_VERSION}")
ap.add_argument("--mode", choices=["capture", "update-paths", "full-update"],
default="capture")
ap.add_argument("--days", type=int, default=30,
help="Okno ve dnech pro update-paths/full-update (0 = vse)")
ap.add_argument("--dry-run", action="store_true",
help="Nic nezapise/nenahraje, jen vypise co by udelal")
ap.add_argument("--limit", type=int, default=0, help="Max N polozek (test)")
ap.add_argument("--no-db-upload", action="store_true")
args = ap.parse_args()
mode, dry = args.mode, args.dry_run
# capture ignoruje okno (bere vse); ostatni rezimy okno pouzivaji (0 = vse)
if mode == "capture":
cutoff_local = None
else:
cutoff_local = None if args.days == 0 else (datetime.now() - timedelta(days=args.days))
win = "vse" if cutoff_local is None else f"{args.days} dni (od {cutoff_local:%Y-%m-%d %H:%M})"
print(f"=== jnj_mailbox_sync v{SCRIPT_VERSION} ===")
print(f"Start: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Rezim: {mode} Okno: {win} {'[DRY-RUN — nic se nemeni]' if dry else ''}")
print(f"DB: {DB_PATH}")
conn = sqlite3.connect(DB_PATH)
init_db(conn)
run_id = start_run(conn, mode, args.days, dry)
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
stats = {"found": 0, "new_captured": 0, "new_uncaptured": 0, "path_updated": 0,
"read_updated": 0, "returned": 0, "left_mailbox": 0, "skipped": 0, "errors": 0}
seen = set()
scanned_roots = set()
for fid, label in SYNC_FOLDERS:
root = ns.GetDefaultFolder(fid)
mailbox = root.Parent.Name
scanned_roots.add(f"/{mailbox}/{root.Name}")
print(f"\n=== {label} ({mailbox}) ===")
walk(conn, run_id, root, f"/{mailbox}", cutoff_local, stats, seen, mode, dry, args.limit)
# Detekce "opustilo schranku" — jen oknove rezimy s platnym cutoff.
# Hodnoti jen emaily naposledy videne pod scanned_roots (Inbox/Sent/Deleted).
if mode in ("update-paths", "full-update") and cutoff_local is not None and not (args.limit):
print("\n--- Kontrola 'opustilo schranku' (v okne, Inbox/Sent/Deleted) ---")
flag_left_mailbox(conn, run_id, cutoff_local, seen, scanned_roots, stats, dry)
elif args.limit:
print("\n(--limit aktivni -> detekce 'opustilo schranku' preskocena)")
finish_run(conn, run_id, stats)
# ── Souhrn ─────────────────────────────────────────────────────────────
print(f"\n{'='*60}")
print(f"SOUHRN [{mode}{' / DRY-RUN' if dry else ''}]")
print(f" Nalezeno ve schrance: {stats['found']}")
if mode in ("capture", "full-update"):
lbl = "by se nahralo" if dry else "nahrano"
print(f" Nove zachyceno ({lbl}): {stats['new_captured']}")
else:
print(f" Nove (bez tela, nedorovnano):{stats['new_uncaptured']}")
print(f" Aktualizovana cesta: {stats['path_updated']}")
print(f" Zmena precteno/neprecteno: {stats['read_updated']}")
print(f" Vraceno do schranky: {stats['returned']}")
print(f" Opustilo schranku (GONE): {stats['left_mailbox']}")
print(f" Beze zmeny (skip): {stats['skipped']}")
print(f" Chyby: {stats['errors']}")
print(f"{'='*60}")
if dry:
print("DRY-RUN: SQLite ani server se NEMENILY.")
elif not args.no_db_upload:
print("\nUpload SQLite na server...")
upload_db(DB_PATH)
print(f"\nKonec: {datetime.now():%Y-%m-%d %H:%M:%S}")
if stats["errors"]:
print(f"Chyby logovany do: {LOG_PATH}")
conn.close()
if __name__ == "__main__":
main()
-63
View File
@@ -1,63 +0,0 @@
# jnj_tower_ingest v1.2.0
**Soubor:** `jnj_tower_ingest_v1.2.py`
**Datum:** 2026-06-10
**Autor:** vladimir.buzalka
**Běží:** Docker kontejner `python-runner` na Unraid Tower (192.168.1.76), u MongoDB.
## Co to je
Sjednocený **Tower-side ingest** JNJ e-mailů — tři fáze v jednom běhu (cron `*/5`):
| Fáze | Co dělá |
|---|---|
| **1. PARSE** | `.msg` z `/mnt/JNJEMAILS` → tělo do Mongo `emaily."vbuzalka@its.jnj.com"`. Inkrementálně přes mtime watermark (`parse_state`). |
| **2. SYNC** | nejnovější SQLite (read-only) → zrcadlo `jnj_messages` + `jnj_folder`/stav do `emaily`. Watermark `updated_at` + `last_db` + **NULL-safe** (viz níže). |
| **3. ENRICH** | sdílený `5_enrich_fulltext_emails --mailbox vbuzalka@its.jnj.com` → PG fulltext. Jen když parse přidal nové dokumenty. |
Pořadí **parse → sync → enrich**. Klíč = Internet Message-ID = Mongo `_id`.
## NULL-safe sync (v1.2 — oprava nesouladu Sent)
**Problém:** na JNJ stroji běží vedle `jnj_mailbox_sync` i starý **`inbox_full_sync`**, který
zapisuje řádky do SQLite s **`updated_at = NULL`** (stará schémata to pole neměla). Domácí
sync přitom filtroval `WHERE updated_at > watermark`, a v SQL je `NULL > x = false`
**všechny NULL řádky tiše vypadly** (měly tělo v Mongu, ale nikdy nedostaly `jnj_folder`).
Týkalo se 69 400 ze 70 060 řádků.
**Oprava:** sync teď bere i řádky s `updated_at IS NULL`, které ještě **nejsou** v
`jnj_messages` (zpracují se právě jednou; už zrcadlené NULL řádky se levně přeskočí).
Nic se už tiše nezahodí. `last_db` short-circuit zůstává (nezměněná SQLite = okamžitý no-op).
**Kořen na JNJ straně (mimo tento skript):** ideálně vyřadit/nahradit naplánovaný
`inbox_full_sync` za `jnj_mailbox_sync --mode capture` (nastavuje `updated_at`).
## Argumenty
`--dry-run`, `--full`, `--limit N`, `--reindex`, `--force` (sync: ignoruj last_db),
`--parse-only` / `--sync-only` / `--enrich-only`, `--no-enrich`, `--enrich-always`.
## Spouštění
```bash
docker exec python-runner python3 /scripts/jnj_tower_ingest_v1.2.py # cron
docker exec -it python-runner python3 /scripts/jnj_tower_ingest_v1.2.py --dry-run
docker exec python-runner python3 /scripts/jnj_tower_ingest_v1.2.py --sync-only --full # backfill
```
## Plánování
Unraid User Scripts `jnj_state_sync` (cron `*/5`) → wrapper s `flock` volá v1.2.
Log jen reálná práce → `/mnt/user/Scripts/logs/jnj_tower_ingest.log`.
## Revert
`jnj_tower_ingest_v1.1.py` (bez NULL-safe), `_v1.0.py` (bez enrich),
`parse_emails_tower_v1.3.py`, `sync_jnj_state_v1.0.py` zůstávají v `/scripts/`.
## Historie verzí
- **1.0.0** — sjednocení parse + sync (mtime watermark).
- **1.1.0** — + fáze ENRICH (sdílený `5_enrich --mailbox`).
- **1.2.0** — SYNC NULL-safe: bere i `updated_at IS NULL` řádky (jinak je watermark filtr
tiše zahazoval → maily měly tělo, ale ne `jnj_folder`). + jednorázový `--full` backfill.
File diff suppressed because it is too large Load Diff