Initial commit — clean history (removed large test files, browser profiles, Medidata/Clario downloads)
This commit is contained in:
@@ -0,0 +1,3 @@
|
||||
DROPBOX_APP_KEY=4scysbfek6ddwwm
|
||||
DROPBOX_APP_SECRET=gn9ph1q3oro2nq0
|
||||
DROPBOX_APP_REFRESH_TOKEN=VShbST3VjUgAAAAAAAAAAXeZZzFLns6eE80-VJKIc5oq61PyXW6sCx9Dw5kM1w8c
|
||||
@@ -0,0 +1,45 @@
|
||||
# msgreceiver — build & deploy na Unraid
|
||||
|
||||
## Umístění na Unraidu
|
||||
- Appdata: `/mnt/user/appdata/msgreceiver/` (síťově `\\tower\appdata\msgreceiver\`)
|
||||
- Emaily: `/mnt/user/JNJEMAILS` (mount jako `/msgs` v kontejneru)
|
||||
|
||||
## Kopírování souborů z Windows
|
||||
Všechny soubory z `U:\janssen\EmailsImport\DockerCustomApp\` nakopírovat do `\\tower\appdata\msgreceiver\`.
|
||||
|
||||
**DŮLEŽITÉ:** Po každé změně `app.py` je nutný rebuild a restart kontejneru (viz níže). Bez toho běží stará verze.
|
||||
|
||||
## Build & restart (SSH)
|
||||
```bash
|
||||
# Připojení: ssh root@192.168.1.76, heslo: 7309208104
|
||||
# Nebo přes paramiko v Pythonu (viz EmailsImport skripty)
|
||||
|
||||
cd /mnt/user/appdata/msgreceiver
|
||||
docker build -t msgreceiver .
|
||||
docker stop msgreceiver
|
||||
docker rm msgreceiver
|
||||
docker run -d --name msgreceiver \
|
||||
-p 8765:8765 \
|
||||
-v /mnt/user/JNJEMAILS:/msgs \
|
||||
--restart unless-stopped \
|
||||
msgreceiver
|
||||
```
|
||||
|
||||
## Kontejner
|
||||
- Port: 8765
|
||||
- Restart policy: unless-stopped
|
||||
- Endpointy:
|
||||
- `/upload` (msg + volitelný `folder` → uloží na disk + import do Graph API)
|
||||
- `/upload-db` (db → /msgs/db, maže staré)
|
||||
- `/upload-dropbox` (soubory do Dropboxu)
|
||||
- Auth: Bearer token v app.py
|
||||
- Dropbox credentials: v `.env` uvnitř image
|
||||
- Graph API credentials: přímo v app.py (Mail.ReadWrite + Mail.Send, tenant TrialHelp s.r.o.)
|
||||
|
||||
## Graph import
|
||||
Při uploadu .msg s parametrem `folder` (plná cesta z JNJ Outlooku) server:
|
||||
1. Uloží .msg na disk
|
||||
2. Parsuje .msg a importuje do schránky `vladimir.buzalka@buzalka.cz` do `Inbox/JNJ/...`
|
||||
3. Složky se vytvářejí automaticky, mapování: `/vbuzalka@its.jnj.com/X` → `JNJ/X`, `/Online Archive.../X` → `JNJ/Online Archive/X`
|
||||
|
||||
Klient v1.4 (`janssenpc_email_send_new_v1.4.py`) posílá `folder` automaticky.
|
||||
@@ -0,0 +1,7 @@
|
||||
FROM python:3.12-slim
|
||||
WORKDIR /app
|
||||
COPY requirements.txt .
|
||||
RUN pip install -r requirements.txt
|
||||
COPY app.py .
|
||||
COPY .env .
|
||||
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8765"]
|
||||
Binary file not shown.
@@ -0,0 +1,63 @@
|
||||
# msgreceiver — deployment instrukce
|
||||
|
||||
## Soubory
|
||||
- Zdrojový skript: `U:\PythonProject\Janssen\EmailsImport\DockerCustomApp\app.py`
|
||||
- Network share: `\\tower\appdata\msgreceiver\app.py`
|
||||
- Unraid cesta: `/mnt/user/appdata/msgreceiver/`
|
||||
|
||||
## Přihlašovací údaje
|
||||
- **Unraid SSH:** `root@192.168.1.76`, heslo: `7309208104`
|
||||
- **Docker kontejner:** `msgreceiver`
|
||||
|
||||
## Postup při nové verzi app.py
|
||||
|
||||
### 1. Zkopírovat app.py na server
|
||||
```powershell
|
||||
Copy-Item "U:\PythonProject\Janssen\EmailsImport\DockerCustomApp\app.py" "\\tower\appdata\msgreceiver\app.py" -Force
|
||||
```
|
||||
|
||||
### 2. Připojit se přes SSH a přebuildovat Docker (přes Python paramiko)
|
||||
```python
|
||||
import paramiko
|
||||
c = paramiko.SSHClient()
|
||||
c.set_missing_host_key_policy(paramiko.AutoAddPolicy())
|
||||
c.connect('192.168.1.76', username='root', password='7309208104')
|
||||
|
||||
# Build
|
||||
_, stdout, stderr = c.exec_command('docker build -t msgreceiver /mnt/user/appdata/msgreceiver/ 2>&1')
|
||||
print(stdout.read().decode())
|
||||
|
||||
# Restart
|
||||
_, stdout, stderr = c.exec_command('docker restart msgreceiver')
|
||||
print(stdout.read().decode())
|
||||
|
||||
c.close()
|
||||
```
|
||||
|
||||
> Poznámka: `sshpass` není na tomto Windows stroji k dispozici, Windows OpenSSH neumí neinteraktivní heslo — proto vždy použij **paramiko**.
|
||||
|
||||
## Struktura adresáře na serveru
|
||||
```
|
||||
/mnt/user/appdata/msgreceiver/
|
||||
├── Dockerfile
|
||||
├── app.py
|
||||
├── requirements.txt
|
||||
└── .env ← Dropbox credentials
|
||||
```
|
||||
|
||||
## Dropbox konfigurace (.env)
|
||||
Proměnné načítané z `.env`:
|
||||
- `DROPBOX_APP_KEY`
|
||||
- `DROPBOX_APP_SECRET`
|
||||
- `DROPBOX_APP_REFRESH_TOKEN`
|
||||
|
||||
Upload cesta v Dropboxu: `/!!!Days/Downloads Z230/{filename}`
|
||||
|
||||
## API endpointy
|
||||
Bearer token: `13e1bb01-9fd5-44a8-8ce9-4ee27133d340`
|
||||
|
||||
| Endpoint | Přijímá | Chování |
|
||||
|---|---|---|
|
||||
| `POST /upload` | `.msg` | Uloží do `/msgs`, přeskočí pokud existuje |
|
||||
| `POST /upload-db` | `.db` | Smaže všechny staré `.db` v `/msgs/db`, uloží novou |
|
||||
| `POST /upload-dropbox` | cokoliv | Nahraje do Dropboxu (overwrite) |
|
||||
@@ -0,0 +1,412 @@
|
||||
# app.py | v1.6 | 2026-06-01
|
||||
# FastAPI server pro příjem .msg a .db souborů, upload do Dropboxu a import do Graph API.
|
||||
# Endpointy: /upload (.msg → /msgs + Graph import), /upload-db (.db → /msgs/db),
|
||||
# /upload-dropbox (→ Dropbox /!!!Days/Downloads Z230),
|
||||
# /message-delete, /message-update (sync: smazání, přečtení, přesun složky).
|
||||
|
||||
from fastapi import FastAPI, UploadFile, File, Form, Header, HTTPException
|
||||
from pydantic import BaseModel
|
||||
import shutil
|
||||
import base64
|
||||
import hashlib
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
import os
|
||||
import dropbox
|
||||
import msal
|
||||
import requests as http_requests
|
||||
import extract_msg
|
||||
from dateutil import parser as dtparser
|
||||
from datetime import timezone
|
||||
from dotenv import load_dotenv
|
||||
from cryptography.fernet import Fernet
|
||||
|
||||
load_dotenv(Path(__file__).parent / ".env")
|
||||
|
||||
app = FastAPI()
|
||||
log = logging.getLogger("msgreceiver")
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
# Šifrovací klíč odvozený z TOKENu (Fernet = AES-128 CBC + HMAC)
|
||||
_FERNET = Fernet(base64.urlsafe_b64encode(hashlib.sha256(TOKEN.encode()).digest()))
|
||||
|
||||
SAVE_DIR = Path("/msgs")
|
||||
DB_DIR = Path("/msgs/db")
|
||||
|
||||
SAVE_DIR.mkdir(parents=True, exist_ok=True)
|
||||
DB_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
DROPBOX_APP_KEY = os.getenv("DROPBOX_APP_KEY", "")
|
||||
DROPBOX_APP_SECRET = os.getenv("DROPBOX_APP_SECRET", "")
|
||||
DROPBOX_REFRESH_TOKEN = os.getenv("DROPBOX_APP_REFRESH_TOKEN", "")
|
||||
|
||||
# --- Graph API config ---
|
||||
GRAPH_TENANT_ID = "7d269944-37a4-43a1-8140-c7517dc426e9"
|
||||
GRAPH_CLIENT_ID = "4b222bfd-78c9-4239-a53f-43006b3ed07f"
|
||||
GRAPH_CLIENT_SECRET = "Txg8Q~MjhocuopxsJyJBhPmDfMxZ2r5WpTFj1dfk"
|
||||
GRAPH_MAILBOX = "vladimir.buzalka@buzalka.cz"
|
||||
GRAPH_ROOT_FOLDER = "JNJ" # subfolder under Inbox — root for imported emails
|
||||
GRAPH_URL = "https://graph.microsoft.com/v1.0"
|
||||
|
||||
# Cache: folder path → Graph folder ID
|
||||
_folder_id_cache: dict[str, str] = {}
|
||||
_graph_token: Optional[str] = None
|
||||
|
||||
|
||||
def _get_graph_token() -> str:
|
||||
global _graph_token
|
||||
msalapp = msal.ConfidentialClientApplication(
|
||||
GRAPH_CLIENT_ID,
|
||||
authority=f"https://login.microsoftonline.com/{GRAPH_TENANT_ID}",
|
||||
client_credential=GRAPH_CLIENT_SECRET,
|
||||
)
|
||||
result = msalapp.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
|
||||
if "access_token" not in result:
|
||||
raise RuntimeError(f"Graph auth failed: {result}")
|
||||
_graph_token = result["access_token"]
|
||||
return _graph_token
|
||||
|
||||
|
||||
def _graph_headers() -> dict:
|
||||
token = _graph_token or _get_graph_token()
|
||||
return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
|
||||
|
||||
|
||||
def _ensure_folder(path_parts: list[str]) -> str:
|
||||
"""Ensure folder hierarchy exists under Inbox, return leaf folder ID."""
|
||||
cache_key = "/".join(path_parts)
|
||||
if cache_key in _folder_id_cache:
|
||||
return _folder_id_cache[cache_key]
|
||||
|
||||
headers = _graph_headers()
|
||||
parent_id = "Inbox"
|
||||
|
||||
for i, part in enumerate(path_parts):
|
||||
partial_key = "/".join(path_parts[: i + 1])
|
||||
if partial_key in _folder_id_cache:
|
||||
parent_id = _folder_id_cache[partial_key]
|
||||
continue
|
||||
|
||||
# List children of parent
|
||||
if parent_id == "Inbox":
|
||||
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/mailFolders/Inbox/childFolders"
|
||||
else:
|
||||
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/mailFolders/{parent_id}/childFolders"
|
||||
|
||||
r = http_requests.get(url, headers=headers, timeout=15)
|
||||
if r.status_code == 401:
|
||||
_get_graph_token()
|
||||
headers = _graph_headers()
|
||||
r = http_requests.get(url, headers=headers, timeout=15)
|
||||
|
||||
found = None
|
||||
for f in r.json().get("value", []):
|
||||
if f["displayName"].lower() == part.lower():
|
||||
found = f["id"]
|
||||
break
|
||||
|
||||
if not found:
|
||||
# Create folder
|
||||
cr = http_requests.post(url, headers=headers, json={"displayName": part}, timeout=15)
|
||||
if cr.status_code in (200, 201):
|
||||
found = cr.json()["id"]
|
||||
elif cr.status_code == 409:
|
||||
# Already exists (race condition) — re-fetch
|
||||
r2 = http_requests.get(url, headers=headers, timeout=15)
|
||||
for f in r2.json().get("value", []):
|
||||
if f["displayName"].lower() == part.lower():
|
||||
found = f["id"]
|
||||
break
|
||||
if not found:
|
||||
raise RuntimeError(f"Cannot create folder '{part}': {cr.text}")
|
||||
|
||||
_folder_id_cache[partial_key] = found
|
||||
parent_id = found
|
||||
|
||||
return parent_id
|
||||
|
||||
|
||||
def _map_jnj_folder(folder: str) -> list[str]:
|
||||
"""Map JNJ folder path to Graph folder parts under JNJ root.
|
||||
|
||||
'/vbuzalka@its.jnj.com/Inbox/TMP' → ['JNJ', 'Inbox', 'TMP']
|
||||
'/Online Archive - vbuzalka@its.jnj.com/Inbox' → ['JNJ', 'Online Archive', 'Inbox']
|
||||
"""
|
||||
parts = [p for p in folder.split("/") if p]
|
||||
if not parts:
|
||||
return [GRAPH_ROOT_FOLDER]
|
||||
|
||||
# First part is mailbox name — strip it but detect Online Archive
|
||||
mailbox = parts[0]
|
||||
rest = parts[1:]
|
||||
|
||||
prefix = [GRAPH_ROOT_FOLDER]
|
||||
if "online archive" in mailbox.lower():
|
||||
prefix.append("Online Archive")
|
||||
|
||||
return prefix + rest if rest else prefix
|
||||
|
||||
|
||||
def _make_recipient(addr: str) -> dict:
|
||||
if "<" in addr and ">" in addr:
|
||||
name = addr[: addr.index("<")].strip().strip('"')
|
||||
email = addr[addr.index("<") + 1 : addr.index(">")].strip()
|
||||
else:
|
||||
name = addr
|
||||
email = addr
|
||||
return {"emailAddress": {"name": name, "address": email}}
|
||||
|
||||
|
||||
def _import_msg_to_graph(msg_path: Path, folder: str) -> Optional[str]:
|
||||
"""Parse .msg and import into Graph API mailbox. Returns message ID or None."""
|
||||
try:
|
||||
msg = extract_msg.Message(str(msg_path))
|
||||
|
||||
subject = msg.subject or "(no subject)"
|
||||
|
||||
# Čtení těla — extract_msg může selhat na nestandartním kódování (cp1252 apod.)
|
||||
try:
|
||||
body_html = msg.htmlBody
|
||||
if isinstance(body_html, bytes):
|
||||
body_html = body_html.decode("utf-8", errors="replace")
|
||||
except Exception:
|
||||
body_html = None
|
||||
|
||||
try:
|
||||
body_text = msg.body or ""
|
||||
except Exception:
|
||||
body_text = ""
|
||||
|
||||
try:
|
||||
sender_email = msg.sender or ""
|
||||
except Exception:
|
||||
sender_email = ""
|
||||
try:
|
||||
sender_name = getattr(msg, "senderName", None) or sender_email
|
||||
except Exception:
|
||||
sender_name = sender_email
|
||||
try:
|
||||
to_raw = msg.to or ""
|
||||
except Exception:
|
||||
to_raw = ""
|
||||
try:
|
||||
cc_raw = msg.cc or ""
|
||||
except Exception:
|
||||
cc_raw = ""
|
||||
try:
|
||||
date_raw = msg.date
|
||||
except Exception:
|
||||
date_raw = None
|
||||
|
||||
att_list = []
|
||||
for att in msg.attachments:
|
||||
if att.data and att.longFilename:
|
||||
att_list.append({
|
||||
"@odata.type": "#microsoft.graph.fileAttachment",
|
||||
"name": att.longFilename,
|
||||
"contentType": getattr(att, "mimetype", None) or "application/octet-stream",
|
||||
"contentBytes": base64.b64encode(att.data).decode(),
|
||||
})
|
||||
|
||||
msg.close()
|
||||
|
||||
to_list = [a.strip() for a in to_raw.split(";") if a.strip()]
|
||||
cc_list = [a.strip() for a in cc_raw.split(";") if a.strip()]
|
||||
|
||||
# Map folder and ensure it exists
|
||||
folder_parts = _map_jnj_folder(folder)
|
||||
folder_id = _ensure_folder(folder_parts)
|
||||
|
||||
payload = {
|
||||
"subject": subject,
|
||||
"body": {
|
||||
"contentType": "HTML" if body_html else "Text",
|
||||
"content": body_html or body_text,
|
||||
},
|
||||
"from": _make_recipient(f"{sender_name} <{sender_email}>"),
|
||||
"toRecipients": [_make_recipient(a) for a in to_list],
|
||||
"ccRecipients": [_make_recipient(a) for a in cc_list],
|
||||
"isRead": True,
|
||||
"singleValueExtendedProperties": [
|
||||
{"id": "Integer 0x0E07", "value": "1"}
|
||||
],
|
||||
}
|
||||
|
||||
if date_raw:
|
||||
try:
|
||||
dt = dtparser.parse(str(date_raw))
|
||||
payload["receivedDateTime"] = dt.astimezone(timezone.utc).strftime(
|
||||
"%Y-%m-%dT%H:%M:%SZ"
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
if att_list:
|
||||
payload["attachments"] = att_list
|
||||
|
||||
headers = _graph_headers()
|
||||
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/mailFolders/{folder_id}/messages"
|
||||
r = http_requests.post(url, headers=headers, json=payload, timeout=30)
|
||||
|
||||
if r.status_code == 401:
|
||||
_get_graph_token()
|
||||
headers = _graph_headers()
|
||||
r = http_requests.post(url, headers=headers, json=payload, timeout=30)
|
||||
|
||||
if r.status_code in (200, 201):
|
||||
msg_id = r.json().get("id", "")
|
||||
log.info("Graph OK: %s → %s", subject[:60], "/".join(folder_parts))
|
||||
return msg_id
|
||||
else:
|
||||
log.error("Graph FAIL [%d]: %s | %s", r.status_code, subject[:60], r.text[:200])
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
log.error("Graph import error for %s: %s", msg_path.name, e)
|
||||
return None
|
||||
|
||||
|
||||
@app.post("/upload")
|
||||
async def upload_msg(
|
||||
file: UploadFile = File(...),
|
||||
authorization: str = Header(None),
|
||||
folder: str = Form(""),
|
||||
):
|
||||
if authorization != f"Bearer {TOKEN}":
|
||||
raise HTTPException(status_code=401, detail="Unauthorized")
|
||||
|
||||
is_encrypted = file.filename.endswith(".emsg")
|
||||
if not file.filename.endswith(".msg") and not is_encrypted:
|
||||
raise HTTPException(status_code=400, detail="Only .msg or .emsg files accepted")
|
||||
|
||||
# Ukládáme vždy jako .msg
|
||||
msg_filename = file.filename[:-5] + ".msg" if is_encrypted else file.filename
|
||||
dest = SAVE_DIR / msg_filename
|
||||
if dest.exists():
|
||||
return {"status": "exists", "file": msg_filename}
|
||||
|
||||
content = await file.read()
|
||||
if is_encrypted:
|
||||
content = _FERNET.decrypt(content)
|
||||
|
||||
with dest.open("wb") as f:
|
||||
f.write(content)
|
||||
|
||||
# Import to Graph API if folder was provided by client
|
||||
graph_id = None
|
||||
if folder:
|
||||
graph_id = _import_msg_to_graph(dest, folder)
|
||||
|
||||
return {
|
||||
"status": "saved",
|
||||
"file": msg_filename,
|
||||
"graph_id": graph_id,
|
||||
}
|
||||
|
||||
|
||||
@app.post("/upload-db")
|
||||
async def upload_db(
|
||||
file: UploadFile = File(...),
|
||||
authorization: str = Header(None)
|
||||
):
|
||||
if authorization != f"Bearer {TOKEN}":
|
||||
raise HTTPException(status_code=401, detail="Unauthorized")
|
||||
if not file.filename.endswith(".db"):
|
||||
raise HTTPException(status_code=400, detail="Only .db files accepted")
|
||||
for old in DB_DIR.glob("*.db"):
|
||||
old.unlink()
|
||||
dest = DB_DIR / file.filename
|
||||
with dest.open("wb") as f:
|
||||
shutil.copyfileobj(file.file, f)
|
||||
return {"status": "saved", "file": file.filename}
|
||||
|
||||
|
||||
class MessageDeleteRequest(BaseModel):
|
||||
graph_id: str
|
||||
|
||||
|
||||
class MessageUpdateRequest(BaseModel):
|
||||
graph_id: str
|
||||
is_read: Optional[bool] = None
|
||||
folder: Optional[str] = None
|
||||
|
||||
|
||||
def _retry_graph(method, url, headers_fn, **kwargs):
|
||||
"""Call Graph API, refresh token once on 401."""
|
||||
headers = headers_fn()
|
||||
r = method(url, headers=headers, **kwargs)
|
||||
if r.status_code == 401:
|
||||
_get_graph_token()
|
||||
headers = headers_fn()
|
||||
r = method(url, headers=headers, **kwargs)
|
||||
return r
|
||||
|
||||
|
||||
@app.post("/message-delete")
|
||||
async def message_delete(req: MessageDeleteRequest, authorization: str = Header(None)):
|
||||
if authorization != f"Bearer {TOKEN}":
|
||||
raise HTTPException(status_code=401, detail="Unauthorized")
|
||||
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/messages/{req.graph_id}"
|
||||
r = _retry_graph(http_requests.delete, url, _graph_headers, timeout=15)
|
||||
if r.status_code in (200, 204):
|
||||
log.info("Graph DELETE OK: %s", req.graph_id)
|
||||
return {"status": "deleted"}
|
||||
raise HTTPException(status_code=500, detail=f"Graph DELETE failed: {r.status_code} {r.text[:200]}")
|
||||
|
||||
|
||||
@app.post("/message-update")
|
||||
async def message_update(req: MessageUpdateRequest, authorization: str = Header(None)):
|
||||
if authorization != f"Bearer {TOKEN}":
|
||||
raise HTTPException(status_code=401, detail="Unauthorized")
|
||||
|
||||
current_graph_id = req.graph_id
|
||||
result: dict = {"status": "ok"}
|
||||
|
||||
# Move first — returns new graph_id which we use for subsequent read-status update
|
||||
if req.folder:
|
||||
folder_parts = _map_jnj_folder(req.folder)
|
||||
folder_id = _ensure_folder(folder_parts)
|
||||
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/messages/{current_graph_id}/move"
|
||||
r = _retry_graph(http_requests.post, url, _graph_headers,
|
||||
json={"destinationId": folder_id}, timeout=15)
|
||||
if r.status_code in (200, 201):
|
||||
current_graph_id = r.json().get("id", current_graph_id)
|
||||
result["moved"] = True
|
||||
log.info("Graph MOVE OK: %s → %s", req.graph_id, "/".join(folder_parts))
|
||||
else:
|
||||
log.error("Graph MOVE FAIL [%d]: %s", r.status_code, r.text[:200])
|
||||
result["moved"] = False
|
||||
|
||||
if req.is_read is not None:
|
||||
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/messages/{current_graph_id}"
|
||||
r = _retry_graph(http_requests.patch, url, _graph_headers,
|
||||
json={"isRead": req.is_read}, timeout=15)
|
||||
result["read_updated"] = r.status_code in (200, 201)
|
||||
if not result["read_updated"]:
|
||||
log.error("Graph PATCH isRead FAIL [%d]: %s", r.status_code, r.text[:200])
|
||||
|
||||
result["graph_id"] = current_graph_id
|
||||
return result
|
||||
|
||||
|
||||
@app.post("/upload-dropbox")
|
||||
async def upload_dropbox(
|
||||
file: UploadFile = File(...),
|
||||
authorization: str = Header(None),
|
||||
):
|
||||
if authorization != f"Bearer {TOKEN}":
|
||||
raise HTTPException(status_code=401, detail="Unauthorized")
|
||||
if not DROPBOX_REFRESH_TOKEN:
|
||||
raise HTTPException(status_code=500, detail="Dropbox not configured")
|
||||
|
||||
content = await file.read()
|
||||
dbx = dropbox.Dropbox(
|
||||
app_key=DROPBOX_APP_KEY,
|
||||
app_secret=DROPBOX_APP_SECRET,
|
||||
oauth2_refresh_token=DROPBOX_REFRESH_TOKEN,
|
||||
)
|
||||
dropbox_path = f"/!!!Days/Downloads Z230/{file.filename}"
|
||||
dbx.files_upload(content, dropbox_path, mode=dropbox.files.WriteMode.overwrite)
|
||||
return {"status": "uploaded", "file": file.filename, "dropbox_path": dropbox_path}
|
||||
@@ -0,0 +1,10 @@
|
||||
fastapi
|
||||
uvicorn
|
||||
python-multipart
|
||||
dropbox
|
||||
python-dotenv
|
||||
msal
|
||||
requests
|
||||
extract-msg
|
||||
python-dateutil
|
||||
cryptography
|
||||
@@ -0,0 +1,180 @@
|
||||
# test_import_msg.py — pokusný import .msg do schránky přes Graph API
|
||||
# Parsuje .msg soubor a vytvoří zprávu v Inbox cílové schránky.
|
||||
|
||||
import base64
|
||||
import msal
|
||||
import requests
|
||||
import extract_msg
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# === CONFIG ===
|
||||
TENANT_ID = "7d269944-37a4-43a1-8140-c7517dc426e9"
|
||||
CLIENT_ID = "4b222bfd-78c9-4239-a53f-43006b3ed07f"
|
||||
CLIENT_SECRET = "Txg8Q~MjhocuopxsJyJBhPmDfMxZ2r5WpTFj1dfk"
|
||||
MAILBOX = "vladimir.buzalka@buzalka.cz"
|
||||
|
||||
AUTHORITY = f"https://login.microsoftonline.com/{TENANT_ID}"
|
||||
SCOPE = ["https://graph.microsoft.com/.default"]
|
||||
GRAPH_URL = "https://graph.microsoft.com/v1.0"
|
||||
TARGET_FOLDER = "JNJ" # subfolder under Inbox
|
||||
|
||||
# === MSG FILE ===
|
||||
MSG_PATH = Path(__file__).parent / "FC130007ACFE5DCB0000.msg"
|
||||
|
||||
|
||||
def get_token():
|
||||
app = msal.ConfidentialClientApplication(
|
||||
CLIENT_ID, authority=AUTHORITY, client_credential=CLIENT_SECRET
|
||||
)
|
||||
token = app.acquire_token_for_client(scopes=SCOPE)
|
||||
if "access_token" not in token:
|
||||
raise RuntimeError(f"Auth failed: {token}")
|
||||
return token["access_token"]
|
||||
|
||||
|
||||
def parse_msg(path):
|
||||
"""Parse .msg file and return dict with message properties."""
|
||||
msg = extract_msg.Message(str(path))
|
||||
|
||||
# Read all properties before closing
|
||||
subject = msg.subject or "(no subject)"
|
||||
body_html = msg.htmlBody
|
||||
if isinstance(body_html, bytes):
|
||||
body_html = body_html.decode("utf-8", errors="replace")
|
||||
body_text = msg.body or ""
|
||||
|
||||
sender_email = msg.sender or ""
|
||||
sender_name = getattr(msg, "senderName", None) or sender_email
|
||||
|
||||
to_raw = msg.to or ""
|
||||
cc_raw = msg.cc or ""
|
||||
date_raw = msg.date
|
||||
|
||||
att_list = []
|
||||
for att in msg.attachments:
|
||||
if att.data and att.longFilename:
|
||||
att_list.append({
|
||||
"@odata.type": "#microsoft.graph.fileAttachment",
|
||||
"name": att.longFilename,
|
||||
"contentType": getattr(att, "mimetype", None) or "application/octet-stream",
|
||||
"contentBytes": base64.b64encode(att.data).decode(),
|
||||
})
|
||||
|
||||
msg.close()
|
||||
|
||||
# Process after close
|
||||
to_list = [a.strip() for a in to_raw.split(";") if a.strip()]
|
||||
cc_list = [a.strip() for a in cc_raw.split(";") if a.strip()]
|
||||
received = str(date_raw) if date_raw else None
|
||||
|
||||
return {
|
||||
"subject": subject,
|
||||
"body_html": body_html,
|
||||
"body_text": body_text,
|
||||
"sender_email": sender_email,
|
||||
"sender_name": sender_name,
|
||||
"to": to_list,
|
||||
"cc": cc_list,
|
||||
"received": received,
|
||||
"attachments": att_list,
|
||||
}
|
||||
|
||||
|
||||
def make_recipient(addr):
|
||||
"""Create Graph API recipient object from email address."""
|
||||
# Handle 'Name <email>' format
|
||||
if "<" in addr and ">" in addr:
|
||||
name = addr[:addr.index("<")].strip().strip('"')
|
||||
email = addr[addr.index("<") + 1 : addr.index(">")].strip()
|
||||
else:
|
||||
name = addr
|
||||
email = addr
|
||||
return {"emailAddress": {"name": name, "address": email}}
|
||||
|
||||
|
||||
def import_msg(msg_path):
|
||||
token = get_token()
|
||||
headers = {
|
||||
"Authorization": f"Bearer {token}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
|
||||
print(f"Parsing: {msg_path}")
|
||||
data = parse_msg(msg_path)
|
||||
print(f" Subject: {data['subject']}")
|
||||
print(f" From: {data['sender_name']} <{data['sender_email']}>")
|
||||
print(f" To: {data['to']}")
|
||||
print(f" Date: {data['received']}")
|
||||
print(f" Attachments: {len(data['attachments'])}")
|
||||
|
||||
# 1. Create message in mailFolder (Inbox)
|
||||
payload = {
|
||||
"subject": data["subject"],
|
||||
"body": {
|
||||
"contentType": "HTML" if data["body_html"] else "Text",
|
||||
"content": data["body_html"] or data["body_text"],
|
||||
},
|
||||
"from": make_recipient(
|
||||
f"{data['sender_name']} <{data['sender_email']}>"
|
||||
),
|
||||
"toRecipients": [make_recipient(a) for a in data["to"]],
|
||||
"ccRecipients": [make_recipient(a) for a in data["cc"]],
|
||||
"isRead": True,
|
||||
# PR_MESSAGE_FLAGS (0x0E07) = 1 → read, NOT draft (without MSGFLAG_UNSENT=0x08)
|
||||
"singleValueExtendedProperties": [
|
||||
{
|
||||
"id": "Integer 0x0E07",
|
||||
"value": "1",
|
||||
}
|
||||
],
|
||||
}
|
||||
|
||||
if data["received"]:
|
||||
# Graph API expects ISO 8601 UTC format
|
||||
from datetime import datetime, timezone
|
||||
try:
|
||||
from dateutil import parser as dtparser
|
||||
dt = dtparser.parse(data["received"])
|
||||
payload["receivedDateTime"] = dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||
except Exception as e:
|
||||
print(f" Warning: cannot parse date '{data['received']}': {e}")
|
||||
|
||||
if data["attachments"]:
|
||||
payload["attachments"] = data["attachments"]
|
||||
|
||||
# Find target folder (Inbox/JNJ)
|
||||
folder_url = f"{GRAPH_URL}/users/{MAILBOX}/mailFolders/Inbox/childFolders"
|
||||
r_folders = requests.get(folder_url, headers=headers, timeout=15)
|
||||
folder_id = None
|
||||
for f in r_folders.json().get("value", []):
|
||||
if f["displayName"].lower() == TARGET_FOLDER.lower():
|
||||
folder_id = f["id"]
|
||||
break
|
||||
|
||||
if not folder_id:
|
||||
# Create the folder if it doesn't exist
|
||||
r_create = requests.post(
|
||||
folder_url, headers=headers,
|
||||
json={"displayName": TARGET_FOLDER}, timeout=15
|
||||
)
|
||||
folder_id = r_create.json()["id"]
|
||||
print(f" Created folder '{TARGET_FOLDER}'")
|
||||
|
||||
url = f"{GRAPH_URL}/users/{MAILBOX}/mailFolders/{folder_id}/messages"
|
||||
print(f"\nPOST -> Inbox/{TARGET_FOLDER}")
|
||||
|
||||
r = requests.post(url, headers=headers, json=payload, timeout=30)
|
||||
|
||||
if r.status_code in (200, 201):
|
||||
msg_id = r.json().get("id", "?")
|
||||
print(f" OK! Message created, id={msg_id[:40]}...")
|
||||
return r.json()
|
||||
else:
|
||||
print(f" FAILED [{r.status_code}]: {r.text[:500]}")
|
||||
return None
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
path = sys.argv[1] if len(sys.argv) > 1 else MSG_PATH
|
||||
import_msg(Path(path))
|
||||
@@ -0,0 +1,25 @@
|
||||
import dropbox
|
||||
from dotenv import load_dotenv
|
||||
from pathlib import Path
|
||||
import os
|
||||
from dropbox import DropboxOAuth2FlowNoRedirect
|
||||
|
||||
load_dotenv(Path(__file__).parent / ".env")
|
||||
|
||||
APP_KEY = os.getenv("DROPBOX_APP_KEY", "")
|
||||
APP_SECRET = os.getenv("DROPBOX_APP_SECRET", "")
|
||||
|
||||
auth_flow = DropboxOAuth2FlowNoRedirect(
|
||||
APP_KEY,
|
||||
APP_SECRET,
|
||||
token_access_type='offline' # důležité — dá refresh token
|
||||
)
|
||||
|
||||
authorize_url = auth_flow.start()
|
||||
print(f"Otevři v prohlížeči:\n{authorize_url}")
|
||||
|
||||
auth_code = input("Vlož autorizační kód: ").strip()
|
||||
oauth_result = auth_flow.finish(auth_code)
|
||||
|
||||
print(f"Refresh token: {oauth_result.refresh_token}")
|
||||
# Tento token ulož — platí "navždy" (dokud app neodvoláš)
|
||||
@@ -0,0 +1,22 @@
|
||||
import dropbox
|
||||
from dotenv import load_dotenv
|
||||
from pathlib import Path
|
||||
import os
|
||||
|
||||
load_dotenv(Path(__file__).parent / ".env")
|
||||
|
||||
APP_KEY = os.getenv("DROPBOX_APP_KEY", "")
|
||||
APP_SECRET = os.getenv("DROPBOX_APP_SECRET", "")
|
||||
REFRESH_TOKEN = os.getenv("DROPBOX_APP_REFRESH_TOKEN", "")
|
||||
|
||||
dbx = dropbox.Dropbox(
|
||||
app_key=APP_KEY,
|
||||
app_secret=APP_SECRET,
|
||||
oauth2_refresh_token=REFRESH_TOKEN,
|
||||
)
|
||||
|
||||
dropbox_path = "/!!!Days/Downloads Z230/AHOJVLADO.TXT"
|
||||
content = b"AHOJ VLADO"
|
||||
|
||||
dbx.files_upload(content, dropbox_path, mode=dropbox.files.WriteMode.overwrite)
|
||||
print(f"Nahráno: {dropbox_path}")
|
||||
@@ -0,0 +1,18 @@
|
||||
"""
|
||||
db_cleanup_inbox v1.0
|
||||
Verze: 1.0
|
||||
Datum: 2026-05-28
|
||||
Popis: Jednorázový cleanup - smaže záznamy v SQLite DB kde folder = '/Inbox'
|
||||
(záznamy bez emailové adresy v cestě, vytvořené chybnou verzí skriptu).
|
||||
"""
|
||||
import sqlite3
|
||||
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
deleted = conn.execute("DELETE FROM messages WHERE folder = '/Inbox'").rowcount
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
print(f"Smazáno záznamů: {deleted}")
|
||||
print("Hotovo.")
|
||||
@@ -0,0 +1,177 @@
|
||||
"""
|
||||
janssenpc_email_send_new v1.0
|
||||
Verze: 1.0
|
||||
Datum: 2026-05-28
|
||||
Popis: Prochází pouze složku Inbox v Outlooku (MAPI), ukládá emailové zprávy jako .msg
|
||||
soubory a uploaduje je na https://msgs.buzalka.cz. Zaznamenává zpracované
|
||||
zprávy do SQLite DB (jnjemails.db) a DB periodicky uploaduje na server.
|
||||
Podporuje pokračování od posledního zpracovaného emailu (resume).
|
||||
"""
|
||||
import win32com.client
|
||||
import requests
|
||||
import sqlite3
|
||||
import urllib3
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timedelta
|
||||
import tempfile
|
||||
import io
|
||||
|
||||
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
|
||||
|
||||
def init_db(conn):
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS messages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
message_id TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
received_at TEXT,
|
||||
folder TEXT,
|
||||
source TEXT,
|
||||
uploaded_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
|
||||
conn.commit()
|
||||
|
||||
def is_uploaded(conn, message_id):
|
||||
row = conn.execute(
|
||||
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
|
||||
).fetchone()
|
||||
return row is not None
|
||||
|
||||
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
""", (message_id, subject, sender, received_at, folder, source))
|
||||
conn.commit()
|
||||
|
||||
def upload_db(db_path):
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"jnjemails_{timestamp}.db"
|
||||
with open(db_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/upload-db",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=60
|
||||
)
|
||||
print(f" DB upload: {resp.json()}")
|
||||
|
||||
def upload_msg(msg_path, filename):
|
||||
with open(msg_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=30
|
||||
)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["status"]
|
||||
|
||||
def get_folder_resume_date(conn, folder_path):
|
||||
row = conn.execute(
|
||||
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
|
||||
(folder_path,)
|
||||
).fetchone()
|
||||
if not row or not row[0]:
|
||||
return None
|
||||
last_dt = datetime.fromisoformat(row[0])
|
||||
return last_dt - timedelta(hours=1)
|
||||
|
||||
def process_folder(conn, folder, source, folder_path="", counter=None):
|
||||
if counter is None:
|
||||
counter = [0]
|
||||
|
||||
current_path = f"{folder_path}/{folder.Name}"
|
||||
|
||||
try:
|
||||
resume_dt = get_folder_resume_date(conn, current_path)
|
||||
|
||||
items = folder.Items
|
||||
|
||||
if resume_dt:
|
||||
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
|
||||
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
|
||||
items = folder.Items.Restrict(filter_str)
|
||||
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
|
||||
else:
|
||||
print(f"\n Složka: {current_path} | od začátku")
|
||||
|
||||
items.Sort("[ReceivedTime]", False)
|
||||
|
||||
count = 0
|
||||
skipped = 0
|
||||
|
||||
for item in items:
|
||||
try:
|
||||
if not item.MessageClass.upper().startswith("IPM.NOTE"):
|
||||
continue
|
||||
|
||||
try:
|
||||
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
|
||||
except:
|
||||
mid = None
|
||||
|
||||
if not mid:
|
||||
mid = f"entryid:{item.EntryID}"
|
||||
|
||||
if is_uploaded(conn, mid):
|
||||
skipped += 1
|
||||
continue
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
safe_name = f"{item.EntryID[-20:]}.msg"
|
||||
tmp_path = Path(tmp) / safe_name
|
||||
item.SaveAs(str(tmp_path), 3)
|
||||
status = upload_msg(tmp_path, safe_name)
|
||||
|
||||
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
|
||||
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
|
||||
received, current_path, source)
|
||||
|
||||
counter[0] += 1
|
||||
count += 1
|
||||
|
||||
if counter[0] % 1000 == 0:
|
||||
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
print(f" {status.upper():6} | {item.Subject[:60]}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA | {getattr(item, 'Subject', '?')[:40]} | {e}")
|
||||
|
||||
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA složka {current_path}: {e}")
|
||||
|
||||
for subfolder in folder.Folders:
|
||||
process_folder(conn, subfolder, source, current_path, counter)
|
||||
|
||||
# --- MAIN ---
|
||||
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
init_db(conn)
|
||||
|
||||
outlook = win32com.client.Dispatch("Outlook.Application")
|
||||
ns = outlook.GetNamespace("MAPI")
|
||||
|
||||
inbox = ns.GetDefaultFolder(6) # 6 = olFolderInbox
|
||||
source = "mailbox"
|
||||
print(f"\n=== Inbox ===")
|
||||
process_folder(conn, inbox, source)
|
||||
|
||||
# Finální DB upload po dokončení
|
||||
print("\nFinální upload DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
conn.close()
|
||||
print("\nHotovo.")
|
||||
@@ -0,0 +1,195 @@
|
||||
"""
|
||||
janssenpc_email_send_new v1.1
|
||||
Verze: 1.1
|
||||
Datum: 2026-05-28
|
||||
Popis: Prochází pouze složku Inbox v Outlooku (MAPI), ukládá emailové zprávy jako .msg
|
||||
soubory a uploaduje je na https://msgs.buzalka.cz. Zaznamenává zpracované
|
||||
zprávy do SQLite DB (jnjemails.db) a DB periodicky uploaduje na server.
|
||||
Podporuje pokračování od posledního zpracovaného emailu (resume).
|
||||
Nově: chyby při uploadu se logují do souboru jnjemails_errors.log
|
||||
(timestamp, složka, odesílatel, předmět, chyba).
|
||||
"""
|
||||
import win32com.client
|
||||
import requests
|
||||
import sqlite3
|
||||
import urllib3
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timedelta
|
||||
import tempfile
|
||||
import io
|
||||
|
||||
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
|
||||
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
|
||||
|
||||
logging.basicConfig(
|
||||
filename=LOG_PATH,
|
||||
level=logging.ERROR,
|
||||
format="%(asctime)s | %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
def init_db(conn):
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS messages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
message_id TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
received_at TEXT,
|
||||
folder TEXT,
|
||||
source TEXT,
|
||||
uploaded_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
|
||||
conn.commit()
|
||||
|
||||
def is_uploaded(conn, message_id):
|
||||
row = conn.execute(
|
||||
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
|
||||
).fetchone()
|
||||
return row is not None
|
||||
|
||||
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
""", (message_id, subject, sender, received_at, folder, source))
|
||||
conn.commit()
|
||||
|
||||
def upload_db(db_path):
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"jnjemails_{timestamp}.db"
|
||||
with open(db_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/upload-db",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=60
|
||||
)
|
||||
print(f" DB upload: {resp.json()}")
|
||||
|
||||
def upload_msg(msg_path, filename):
|
||||
with open(msg_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=30
|
||||
)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["status"]
|
||||
|
||||
def get_folder_resume_date(conn, folder_path):
|
||||
row = conn.execute(
|
||||
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
|
||||
(folder_path,)
|
||||
).fetchone()
|
||||
if not row or not row[0]:
|
||||
return None
|
||||
last_dt = datetime.fromisoformat(row[0])
|
||||
return last_dt - timedelta(hours=1)
|
||||
|
||||
def process_folder(conn, folder, source, folder_path="", counter=None):
|
||||
if counter is None:
|
||||
counter = [0]
|
||||
|
||||
current_path = f"{folder_path}/{folder.Name}"
|
||||
|
||||
try:
|
||||
resume_dt = get_folder_resume_date(conn, current_path)
|
||||
|
||||
items = folder.Items
|
||||
|
||||
if resume_dt:
|
||||
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
|
||||
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
|
||||
items = folder.Items.Restrict(filter_str)
|
||||
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
|
||||
else:
|
||||
print(f"\n Složka: {current_path} | od začátku")
|
||||
|
||||
items.Sort("[ReceivedTime]", False)
|
||||
|
||||
count = 0
|
||||
skipped = 0
|
||||
|
||||
for item in items:
|
||||
try:
|
||||
if not item.MessageClass.upper().startswith("IPM.NOTE"):
|
||||
continue
|
||||
|
||||
try:
|
||||
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
|
||||
except:
|
||||
mid = None
|
||||
|
||||
if not mid:
|
||||
mid = f"entryid:{item.EntryID}"
|
||||
|
||||
if is_uploaded(conn, mid):
|
||||
skipped += 1
|
||||
continue
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
safe_name = f"{item.EntryID[-20:]}.msg"
|
||||
tmp_path = Path(tmp) / safe_name
|
||||
item.SaveAs(str(tmp_path), 3)
|
||||
status = upload_msg(tmp_path, safe_name)
|
||||
|
||||
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
|
||||
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
|
||||
received, current_path, source)
|
||||
|
||||
counter[0] += 1
|
||||
count += 1
|
||||
|
||||
if counter[0] % 1000 == 0:
|
||||
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
print(f" {status.upper():6} | {item.Subject[:60]}")
|
||||
|
||||
except Exception as e:
|
||||
subject = getattr(item, 'Subject', '?')
|
||||
sender = getattr(item, 'SenderEmailAddress', '?')
|
||||
received = getattr(item, 'ReceivedTime', '?')
|
||||
print(f" CHYBA | {subject[:40]} | {e}")
|
||||
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
|
||||
current_path, sender, received, subject, e)
|
||||
|
||||
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA složka {current_path}: {e}")
|
||||
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
|
||||
|
||||
for subfolder in folder.Folders:
|
||||
process_folder(conn, subfolder, source, current_path, counter)
|
||||
|
||||
# --- MAIN ---
|
||||
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
init_db(conn)
|
||||
|
||||
outlook = win32com.client.Dispatch("Outlook.Application")
|
||||
ns = outlook.GetNamespace("MAPI")
|
||||
|
||||
inbox = ns.GetDefaultFolder(6) # 6 = olFolderInbox
|
||||
source = "mailbox"
|
||||
print(f"\n=== Inbox ===")
|
||||
process_folder(conn, inbox, source)
|
||||
|
||||
# Finální DB upload po dokončení
|
||||
print("\nFinální upload DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
conn.close()
|
||||
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
|
||||
@@ -0,0 +1,199 @@
|
||||
"""
|
||||
janssenpc_email_send_new v1.2
|
||||
Verze: 1.2
|
||||
Datum: 2026-05-28
|
||||
Popis: Prochází pouze složku Inbox v Outlooku (MAPI), ukládá emailové zprávy jako .msg
|
||||
soubory a uploaduje je na https://msgs.buzalka.cz. Zaznamenává zpracované
|
||||
zprávy do SQLite DB (jnjemails.db) a DB periodicky uploaduje na server.
|
||||
Podporuje pokračování od posledního zpracovaného emailu (resume).
|
||||
Chyby při uploadu se logují do souboru jnjemails_errors.log.
|
||||
Oprava v1.2: folder cesta obsahuje celé jméno schránky (např. /vbuzalka@its.jnj.com/Inbox)
|
||||
aby resume logika správně navazovala na záznamy z původního skriptu.
|
||||
"""
|
||||
import win32com.client
|
||||
import requests
|
||||
import sqlite3
|
||||
import urllib3
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timedelta
|
||||
import tempfile
|
||||
import io
|
||||
|
||||
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
|
||||
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
|
||||
|
||||
logging.basicConfig(
|
||||
filename=LOG_PATH,
|
||||
level=logging.ERROR,
|
||||
format="%(asctime)s | %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
def init_db(conn):
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS messages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
message_id TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
received_at TEXT,
|
||||
folder TEXT,
|
||||
source TEXT,
|
||||
uploaded_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
|
||||
conn.commit()
|
||||
|
||||
def is_uploaded(conn, message_id):
|
||||
row = conn.execute(
|
||||
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
|
||||
).fetchone()
|
||||
return row is not None
|
||||
|
||||
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
""", (message_id, subject, sender, received_at, folder, source))
|
||||
conn.commit()
|
||||
|
||||
def upload_db(db_path):
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"jnjemails_{timestamp}.db"
|
||||
with open(db_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/upload-db",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=60
|
||||
)
|
||||
print(f" DB upload: {resp.json()}")
|
||||
|
||||
def upload_msg(msg_path, filename):
|
||||
with open(msg_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=30
|
||||
)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["status"]
|
||||
|
||||
def get_folder_resume_date(conn, folder_path):
|
||||
row = conn.execute(
|
||||
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
|
||||
(folder_path,)
|
||||
).fetchone()
|
||||
if not row or not row[0]:
|
||||
return None
|
||||
last_dt = datetime.fromisoformat(row[0])
|
||||
return last_dt - timedelta(hours=1)
|
||||
|
||||
def process_folder(conn, folder, source, folder_path="", counter=None):
|
||||
if counter is None:
|
||||
counter = [0]
|
||||
|
||||
current_path = f"{folder_path}/{folder.Name}"
|
||||
|
||||
try:
|
||||
resume_dt = get_folder_resume_date(conn, current_path)
|
||||
|
||||
items = folder.Items
|
||||
|
||||
if resume_dt:
|
||||
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
|
||||
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
|
||||
items = folder.Items.Restrict(filter_str)
|
||||
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
|
||||
else:
|
||||
print(f"\n Složka: {current_path} | od začátku")
|
||||
|
||||
items.Sort("[ReceivedTime]", False)
|
||||
|
||||
count = 0
|
||||
skipped = 0
|
||||
|
||||
for item in items:
|
||||
try:
|
||||
if not item.MessageClass.upper().startswith("IPM.NOTE"):
|
||||
continue
|
||||
|
||||
try:
|
||||
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
|
||||
except:
|
||||
mid = None
|
||||
|
||||
if not mid:
|
||||
mid = f"entryid:{item.EntryID}"
|
||||
|
||||
if is_uploaded(conn, mid):
|
||||
skipped += 1
|
||||
continue
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
safe_name = f"{item.EntryID[-20:]}.msg"
|
||||
tmp_path = Path(tmp) / safe_name
|
||||
item.SaveAs(str(tmp_path), 3)
|
||||
status = upload_msg(tmp_path, safe_name)
|
||||
|
||||
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
|
||||
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
|
||||
received, current_path, source)
|
||||
|
||||
counter[0] += 1
|
||||
count += 1
|
||||
|
||||
if counter[0] % 1000 == 0:
|
||||
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
print(f" {status.upper():6} | {item.Subject[:60]}")
|
||||
|
||||
except Exception as e:
|
||||
subject = getattr(item, 'Subject', '?')
|
||||
sender = getattr(item, 'SenderEmailAddress', '?')
|
||||
received = getattr(item, 'ReceivedTime', '?')
|
||||
print(f" CHYBA | {subject[:40]} | {e}")
|
||||
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
|
||||
current_path, sender, received, subject, e)
|
||||
|
||||
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA složka {current_path}: {e}")
|
||||
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
|
||||
|
||||
for subfolder in folder.Folders:
|
||||
process_folder(conn, subfolder, source, current_path, counter)
|
||||
|
||||
# --- MAIN ---
|
||||
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
init_db(conn)
|
||||
|
||||
outlook = win32com.client.Dispatch("Outlook.Application")
|
||||
ns = outlook.GetNamespace("MAPI")
|
||||
|
||||
inbox = ns.GetDefaultFolder(6) # 6 = olFolderInbox
|
||||
mailbox_name = inbox.Parent.Name # např. "vbuzalka@its.jnj.com"
|
||||
print(f"Schránka: {mailbox_name}")
|
||||
|
||||
source = "mailbox"
|
||||
print(f"\n=== Inbox ({mailbox_name}) ===")
|
||||
process_folder(conn, inbox, source, f"/{mailbox_name}")
|
||||
|
||||
# Finální DB upload po dokončení
|
||||
print("\nFinální upload DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
conn.close()
|
||||
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
|
||||
@@ -0,0 +1,200 @@
|
||||
"""
|
||||
janssenpc_email_send_new v1.3
|
||||
Verze: 1.3
|
||||
Datum: 2026-05-28
|
||||
Popis: Prochází složky Inbox, Deleted Items a Sent Items v Outlooku (MAPI),
|
||||
ukládá emailové zprávy jako .msg soubory a uploaduje je na https://msgs.buzalka.cz.
|
||||
Zaznamenává zpracované zprávy do SQLite DB (jnjemails.db) a DB periodicky
|
||||
uploaduje na server. Podporuje pokračování od posledního zpracovaného emailu (resume).
|
||||
Folder cesta obsahuje celé jméno schránky (např. /vbuzalka@its.jnj.com/Inbox).
|
||||
Chyby při uploadu se logují do souboru jnjemails_errors.log.
|
||||
"""
|
||||
import win32com.client
|
||||
import requests
|
||||
import sqlite3
|
||||
import urllib3
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timedelta
|
||||
import tempfile
|
||||
import io
|
||||
|
||||
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
|
||||
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
|
||||
|
||||
# olFolderInbox=6, olFolderDeletedItems=3, olFolderSentMail=5
|
||||
FOLDERS_TO_PROCESS = [6, 3, 5]
|
||||
|
||||
logging.basicConfig(
|
||||
filename=LOG_PATH,
|
||||
level=logging.ERROR,
|
||||
format="%(asctime)s | %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
def init_db(conn):
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS messages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
message_id TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
received_at TEXT,
|
||||
folder TEXT,
|
||||
source TEXT,
|
||||
uploaded_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
|
||||
conn.commit()
|
||||
|
||||
def is_uploaded(conn, message_id):
|
||||
row = conn.execute(
|
||||
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
|
||||
).fetchone()
|
||||
return row is not None
|
||||
|
||||
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
""", (message_id, subject, sender, received_at, folder, source))
|
||||
conn.commit()
|
||||
|
||||
def upload_db(db_path):
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"jnjemails_{timestamp}.db"
|
||||
with open(db_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/upload-db",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=60
|
||||
)
|
||||
print(f" DB upload: {resp.json()}")
|
||||
|
||||
def upload_msg(msg_path, filename):
|
||||
with open(msg_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=30
|
||||
)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["status"]
|
||||
|
||||
def get_folder_resume_date(conn, folder_path):
|
||||
row = conn.execute(
|
||||
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
|
||||
(folder_path,)
|
||||
).fetchone()
|
||||
if not row or not row[0]:
|
||||
return None
|
||||
last_dt = datetime.fromisoformat(row[0])
|
||||
return last_dt - timedelta(hours=1)
|
||||
|
||||
def process_folder(conn, folder, source, folder_path="", counter=None):
|
||||
if counter is None:
|
||||
counter = [0]
|
||||
|
||||
current_path = f"{folder_path}/{folder.Name}"
|
||||
|
||||
try:
|
||||
resume_dt = get_folder_resume_date(conn, current_path)
|
||||
|
||||
items = folder.Items
|
||||
|
||||
if resume_dt:
|
||||
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
|
||||
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
|
||||
items = folder.Items.Restrict(filter_str)
|
||||
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
|
||||
else:
|
||||
print(f"\n Složka: {current_path} | od začátku")
|
||||
|
||||
items.Sort("[ReceivedTime]", False)
|
||||
|
||||
count = 0
|
||||
skipped = 0
|
||||
|
||||
for item in items:
|
||||
try:
|
||||
if not item.MessageClass.upper().startswith("IPM.NOTE"):
|
||||
continue
|
||||
|
||||
try:
|
||||
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
|
||||
except:
|
||||
mid = None
|
||||
|
||||
if not mid:
|
||||
mid = f"entryid:{item.EntryID}"
|
||||
|
||||
if is_uploaded(conn, mid):
|
||||
skipped += 1
|
||||
continue
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
safe_name = f"{item.EntryID[-20:]}.msg"
|
||||
tmp_path = Path(tmp) / safe_name
|
||||
item.SaveAs(str(tmp_path), 3)
|
||||
status = upload_msg(tmp_path, safe_name)
|
||||
|
||||
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
|
||||
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
|
||||
received, current_path, source)
|
||||
|
||||
counter[0] += 1
|
||||
count += 1
|
||||
|
||||
if counter[0] % 1000 == 0:
|
||||
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
print(f" {status.upper():6} | {item.Subject[:60]}")
|
||||
|
||||
except Exception as e:
|
||||
subject = getattr(item, 'Subject', '?')
|
||||
sender = getattr(item, 'SenderEmailAddress', '?')
|
||||
received = getattr(item, 'ReceivedTime', '?')
|
||||
print(f" CHYBA | {subject[:40]} | {e}")
|
||||
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
|
||||
current_path, sender, received, subject, e)
|
||||
|
||||
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA složka {current_path}: {e}")
|
||||
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
|
||||
|
||||
for subfolder in folder.Folders:
|
||||
process_folder(conn, subfolder, source, current_path, counter)
|
||||
|
||||
# --- MAIN ---
|
||||
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
init_db(conn)
|
||||
|
||||
outlook = win32com.client.Dispatch("Outlook.Application")
|
||||
ns = outlook.GetNamespace("MAPI")
|
||||
|
||||
counter = [0]
|
||||
for folder_id in FOLDERS_TO_PROCESS:
|
||||
folder = ns.GetDefaultFolder(folder_id)
|
||||
mailbox_name = folder.Parent.Name
|
||||
print(f"\n=== {folder.Name} ({mailbox_name}) ===")
|
||||
process_folder(conn, folder, "mailbox", f"/{mailbox_name}", counter)
|
||||
|
||||
# Finální DB upload po dokončení
|
||||
print("\nFinální upload DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
conn.close()
|
||||
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
|
||||
@@ -0,0 +1,233 @@
|
||||
"""
|
||||
janssenpc_email_send_new v1.4
|
||||
Verze: 1.4.1
|
||||
Datum: 2026-05-29
|
||||
Popis: Prochází složky Inbox, Deleted Items a Sent Items v Outlooku (MAPI),
|
||||
ukládá emailové zprávy jako .msg soubory a uploaduje je na https://msgs.buzalka.cz.
|
||||
Zaznamenává zpracované zprávy do SQLite DB (jnjemails.db) a DB uploaduje na server
|
||||
jednou za 24 hodin (ne při každém běhu). Podporuje pokračování od posledního
|
||||
zpracovaného emailu (resume). Folder cesta obsahuje celé jméno schránky
|
||||
(např. /vbuzalka@its.jnj.com/Inbox). Chyby se logují do jnjemails_errors.log.
|
||||
"""
|
||||
import win32com.client
|
||||
import requests
|
||||
import sqlite3
|
||||
import urllib3
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timedelta
|
||||
import tempfile
|
||||
import io
|
||||
|
||||
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
DB_UPLOAD_MARKER = r"C:\Users\vbuzalka\SQLITE\jnjemails_last_db_upload.txt"
|
||||
DB_UPLOAD_INTERVAL_H = 24
|
||||
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
|
||||
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
|
||||
|
||||
# olFolderInbox=6, olFolderDeletedItems=3, olFolderSentMail=5
|
||||
FOLDERS_TO_PROCESS = [6, 3, 5]
|
||||
|
||||
UPLOAD_LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_uploads.log"
|
||||
|
||||
logging.basicConfig(
|
||||
filename=LOG_PATH,
|
||||
level=logging.ERROR,
|
||||
format="%(asctime)s | %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
# Separate upload logger — logs every upload attempt
|
||||
_upload_log = logging.getLogger("uploads")
|
||||
_upload_log.setLevel(logging.DEBUG)
|
||||
_uh = logging.FileHandler(UPLOAD_LOG_PATH, encoding="utf-8")
|
||||
_uh.setFormatter(logging.Formatter("%(asctime)s | %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
|
||||
_upload_log.addHandler(_uh)
|
||||
|
||||
def init_db(conn):
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS messages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
message_id TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
received_at TEXT,
|
||||
folder TEXT,
|
||||
source TEXT,
|
||||
uploaded_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
|
||||
conn.commit()
|
||||
|
||||
def is_uploaded(conn, message_id):
|
||||
row = conn.execute(
|
||||
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
|
||||
).fetchone()
|
||||
return row is not None
|
||||
|
||||
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
""", (message_id, subject, sender, received_at, folder, source))
|
||||
conn.commit()
|
||||
|
||||
def _db_upload_due() -> bool:
|
||||
"""Return True if 24h elapsed since last DB upload (or never uploaded)."""
|
||||
marker = Path(DB_UPLOAD_MARKER)
|
||||
if not marker.exists():
|
||||
return True
|
||||
try:
|
||||
last = datetime.fromisoformat(marker.read_text().strip())
|
||||
return (datetime.now() - last).total_seconds() >= DB_UPLOAD_INTERVAL_H * 3600
|
||||
except Exception:
|
||||
return True
|
||||
|
||||
def _db_upload_mark():
|
||||
"""Write current timestamp to marker file."""
|
||||
Path(DB_UPLOAD_MARKER).write_text(datetime.now().isoformat())
|
||||
|
||||
def upload_db(db_path, force=False):
|
||||
if not force and not _db_upload_due():
|
||||
return
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"jnjemails_{timestamp}.db"
|
||||
with open(db_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/upload-db",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=60
|
||||
)
|
||||
print(f" DB upload: {resp.json()}")
|
||||
_db_upload_mark()
|
||||
|
||||
def upload_msg(msg_path, filename, folder=""):
|
||||
_upload_log.info("UPLOAD %s | folder=%s", filename, folder)
|
||||
with open(msg_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
data={"folder": folder},
|
||||
timeout=60
|
||||
)
|
||||
resp.raise_for_status()
|
||||
result = resp.json()
|
||||
_upload_log.info("RESPONSE %s | %s", filename, result)
|
||||
return result["status"]
|
||||
|
||||
def get_folder_resume_date(conn, folder_path):
|
||||
row = conn.execute(
|
||||
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
|
||||
(folder_path,)
|
||||
).fetchone()
|
||||
if not row or not row[0]:
|
||||
return None
|
||||
last_dt = datetime.fromisoformat(row[0])
|
||||
return last_dt - timedelta(hours=1)
|
||||
|
||||
def process_folder(conn, folder, source, folder_path="", counter=None):
|
||||
if counter is None:
|
||||
counter = [0]
|
||||
|
||||
current_path = f"{folder_path}/{folder.Name}"
|
||||
|
||||
try:
|
||||
resume_dt = get_folder_resume_date(conn, current_path)
|
||||
|
||||
items = folder.Items
|
||||
|
||||
if resume_dt:
|
||||
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
|
||||
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
|
||||
items = folder.Items.Restrict(filter_str)
|
||||
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
|
||||
else:
|
||||
print(f"\n Složka: {current_path} | od začátku")
|
||||
|
||||
items.Sort("[ReceivedTime]", False)
|
||||
|
||||
count = 0
|
||||
skipped = 0
|
||||
|
||||
for item in items:
|
||||
try:
|
||||
if not item.MessageClass.upper().startswith("IPM.NOTE"):
|
||||
continue
|
||||
|
||||
try:
|
||||
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
|
||||
except:
|
||||
mid = None
|
||||
|
||||
if not mid:
|
||||
mid = f"entryid:{item.EntryID}"
|
||||
|
||||
if is_uploaded(conn, mid):
|
||||
skipped += 1
|
||||
continue
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
safe_name = f"{item.EntryID[-20:]}.msg"
|
||||
tmp_path = Path(tmp) / safe_name
|
||||
item.SaveAs(str(tmp_path), 3)
|
||||
status = upload_msg(tmp_path, safe_name, current_path)
|
||||
|
||||
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
|
||||
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
|
||||
received, current_path, source)
|
||||
|
||||
counter[0] += 1
|
||||
count += 1
|
||||
|
||||
if counter[0] % 1000 == 0:
|
||||
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
print(f" {status.upper():6} | {item.Subject[:60]}")
|
||||
|
||||
except Exception as e:
|
||||
subject = getattr(item, 'Subject', '?')
|
||||
sender = getattr(item, 'SenderEmailAddress', '?')
|
||||
received = getattr(item, 'ReceivedTime', '?')
|
||||
print(f" CHYBA | {subject[:40]} | {e}")
|
||||
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
|
||||
current_path, sender, received, subject, e)
|
||||
|
||||
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA složka {current_path}: {e}")
|
||||
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
|
||||
|
||||
for subfolder in folder.Folders:
|
||||
process_folder(conn, subfolder, source, current_path, counter)
|
||||
|
||||
# --- MAIN ---
|
||||
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
init_db(conn)
|
||||
|
||||
outlook = win32com.client.Dispatch("Outlook.Application")
|
||||
ns = outlook.GetNamespace("MAPI")
|
||||
|
||||
counter = [0]
|
||||
for folder_id in FOLDERS_TO_PROCESS:
|
||||
folder = ns.GetDefaultFolder(folder_id)
|
||||
mailbox_name = folder.Parent.Name
|
||||
print(f"\n=== {folder.Name} ({mailbox_name}) ===")
|
||||
process_folder(conn, folder, "mailbox", f"/{mailbox_name}", counter)
|
||||
|
||||
# Finální DB upload po dokončení
|
||||
print("\nFinální upload DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
conn.close()
|
||||
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
|
||||
@@ -0,0 +1,170 @@
|
||||
# inbox_full_sync_v1.0
|
||||
|
||||
**Název:** inbox_full_sync_v1.0.py
|
||||
**Verze:** 1.0.4
|
||||
**Datum:** 2026-06-01
|
||||
**Autor:** vladimir.buzalka
|
||||
|
||||
---
|
||||
|
||||
## Účel
|
||||
|
||||
Jednorázový skript pro úplný přenos Inboxu z JNJ Outlooku (MAPI) do osobní schránky `vladimir.buzalka@buzalka.cz` přes Microsoft Graph API.
|
||||
|
||||
Spouštět ručně jako záchranná síť nebo iniciální sync. Bezpečné opakovat — duplicity se automaticky přeskočí.
|
||||
|
||||
---
|
||||
|
||||
## Co dělá
|
||||
|
||||
1. Připojí se k Outlooku přes MAPI (`win32com`)
|
||||
2. Projde celý **Inbox** včetně všech podsložek rekurzivně
|
||||
3. Pro každý email zkontroluje SQLite DB — pokud už je přenesen, přeskočí ho
|
||||
4. Nový email uloží jako `.msg` do temp složky, **zašifruje** (Fernet/AES) a odešle jako `.emsg` na `msgs.buzalka.cz/upload`
|
||||
5. Server (`app.py`) dešifruje, parsuje `.msg`, importuje do Graph API a vrátí `graph_id`
|
||||
6. Záznam se uloží do DB (`messages`, `log`)
|
||||
7. Každých 100 přenesených emailů + na konci uploaduje DB na server
|
||||
|
||||
**Online Archive se nepřenáší** — `GetDefaultFolder(6)` vrátí pouze primární schránku.
|
||||
|
||||
---
|
||||
|
||||
## Šifrování (Zscaler bypass)
|
||||
|
||||
JNJ síť používá **Zscaler DLP** — blokuje upload souborů s medicínským obsahem (ECG reporty, klinická data) na externí URL.
|
||||
|
||||
Řešení: soubor se před odesláním zašifruje pomocí **Fernet** (AES-128 CBC + HMAC). Zscaler vidí pouze šifrovaný bináč a nerozpozná obsah.
|
||||
|
||||
- Šifrovací klíč se odvozuje z `TOKEN` přes SHA-256 — žádná extra konstanta, obě strany derivují klíč samostatně
|
||||
- Soubor se odesílá s příponou `.emsg` místo `.msg`
|
||||
- Server (app.py v1.6+) automaticky detekuje `.emsg`, dešifruje a dále zpracuje standardně
|
||||
|
||||
---
|
||||
|
||||
## Konfigurace
|
||||
|
||||
Konstanty jsou přímo v kódu:
|
||||
|
||||
| Konstanta | Hodnota |
|
||||
|---|---|
|
||||
| `TOKEN` | Bearer token pro msgs.buzalka.cz (slouží i jako základ šifrovacího klíče) |
|
||||
| `UPLOAD_URL` | `https://msgs.buzalka.cz/upload` |
|
||||
| `DB_UPLOAD_URL` | `https://msgs.buzalka.cz/upload-db` |
|
||||
| `DB_PATH` | `C:\Users\vbuzalka\SQLITE\jnjemails.db` |
|
||||
| `LOG_PATH` | `C:\Users\vbuzalka\SQLITE\inbox_full_sync_errors.log` |
|
||||
|
||||
---
|
||||
|
||||
## Závislosti
|
||||
|
||||
- Python 3.10+, Windows
|
||||
- Outlook musí být spuštěn
|
||||
- `pywin32`, `requests`, `cryptography`
|
||||
- Server `msgs.buzalka.cz` musí běžet (app.py v1.6+)
|
||||
|
||||
---
|
||||
|
||||
## SQLite DB (`jnjemails.db`)
|
||||
|
||||
### Tabulka `messages`
|
||||
Jeden záznam na každý přenesený email.
|
||||
|
||||
| Sloupec | Popis |
|
||||
|---|---|
|
||||
| `message_id` | Internet Message-ID (nebo `entryid:...` jako fallback) |
|
||||
| `entry_id` | Outlook EntryID — pro zpětné dohledání v MAPI |
|
||||
| `graph_id` | ID zprávy v Graph API — pro sync operace |
|
||||
| `is_read` | Stav přečtení při přenosu (0/1) |
|
||||
| `jnj_folder` | Složka v JNJ při přenosu |
|
||||
| `source` | Vždy `inbox_full_sync` |
|
||||
|
||||
### Tabulka `runs`
|
||||
Jeden záznam na každý běh skriptu.
|
||||
|
||||
| Sloupec | Popis |
|
||||
|---|---|
|
||||
| `script` | `inbox_full_sync` |
|
||||
| `version` | verze skriptu |
|
||||
| `started_at` / `finished_at` | časy běhu |
|
||||
| `transferred` | počet nově přenesených emailů |
|
||||
| `skipped` | počet přeskočených (již v DB) |
|
||||
| `errors` | počet chyb |
|
||||
|
||||
### Tabulka `log`
|
||||
Flat event log — každý console výstup i interní událost jako řádek.
|
||||
|
||||
| Sloupec | Popis |
|
||||
|---|---|
|
||||
| `run_id` | FK na `runs.id` |
|
||||
| `level` | `INFO` / `ERROR` |
|
||||
| `event` | typ události (viz níže) |
|
||||
| `subject` | předmět emailu (pokud relevantní) |
|
||||
| `folder` | složka (pokud relevantní) |
|
||||
| `graph_id` | Graph ID (pokud relevantní) |
|
||||
| `detail` | pro `upload_saved`: `size=XKB`; pro `upload_error`: `error=... \| size=XKB \| body=... \| sender=... \| received=... \| entry_id=... \| message_id=...` |
|
||||
|
||||
#### Události (`log.event`)
|
||||
|
||||
| Event | Popis |
|
||||
|---|---|
|
||||
| `run_start` | start skriptu |
|
||||
| `mailbox` | název schránky |
|
||||
| `folder_start` | vstup do složky (detail = počet položek) |
|
||||
| `folder_done` | konec složky (detail = přeneseno/skip) |
|
||||
| `upload_saved` | nový email úspěšně přenesen (detail = size=XKB) |
|
||||
| `upload_exists` | email již v DB, přeskočen |
|
||||
| `upload_error` | chyba při uploadu — detail obsahuje sender, received, entry_id, message_id pro dohledání v Outlooku |
|
||||
| `progress` | každých 100 přenesených emailů |
|
||||
| `db_upload` | úspěšný upload DB na server |
|
||||
| `db_upload_error` | chyba uploadu DB |
|
||||
| `run_done` | konec skriptu (detail = souhrn) |
|
||||
|
||||
---
|
||||
|
||||
## Užitečné dotazy
|
||||
|
||||
**Poslední běh — kompletní log:**
|
||||
```sql
|
||||
SELECT r.script, r.version, r.started_at,
|
||||
l.level, l.event, l.subject, l.folder, l.detail, l.created_at
|
||||
FROM log l JOIN runs r ON r.id = l.run_id
|
||||
WHERE l.run_id = (SELECT MAX(id) FROM runs)
|
||||
ORDER BY l.created_at
|
||||
```
|
||||
|
||||
**Přehled všech běhů:**
|
||||
```sql
|
||||
SELECT id, script, version, started_at, finished_at,
|
||||
transferred, skipped, errors
|
||||
FROM runs ORDER BY started_at DESC
|
||||
```
|
||||
|
||||
**Chyby z posledního běhu:**
|
||||
```sql
|
||||
SELECT l.event, l.subject, l.folder, l.detail, l.created_at
|
||||
FROM log l
|
||||
WHERE l.run_id = (SELECT MAX(id) FROM runs)
|
||||
AND l.level = 'ERROR'
|
||||
ORDER BY l.created_at
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Návaznost
|
||||
|
||||
- Sdílí DB s `janssenpc_email_send_new_v1.5.py` — záznamy jsou kompatibilní
|
||||
- Emaily přenesené tímto skriptem mají `graph_id` a jsou od té chvíle hlídány sync průchodem v1.5
|
||||
- Server endpoint: `msgs.buzalka.cz/upload` musí vracet `graph_id` (app.py v1.6+)
|
||||
- nginx `client_max_body_size` nastaven na **200M** (SWAG `msgreceiver.subdomain.conf`)
|
||||
|
||||
---
|
||||
|
||||
## Historie verzí
|
||||
|
||||
| Verze | Datum | Změna |
|
||||
|---|---|---|
|
||||
| 1.0.0 | 2026-06-01 | Základní funkce: Inbox full scan, dedup přes DB, entry_id/graph_id/is_read |
|
||||
| 1.0.1 | 2026-06-01 | DB upload každých 100 emailů + finální upload |
|
||||
| 1.0.2 | 2026-06-01 | SQLite tabulky runs + log |
|
||||
| 1.0.3 | 2026-06-01 | Kompletní konzolový výstup zrcadlen do log tabulky, skipped counter |
|
||||
| 1.0.4 | 2026-06-01 | Šifrování Fernet (.emsg) pro bypass Zscaler DLP; rozšířený error detail (sender/received/entry_id/size) |
|
||||
@@ -0,0 +1,384 @@
|
||||
"""
|
||||
inbox_full_sync v1.0
|
||||
Název: inbox_full_sync_v1.0.py
|
||||
Verze: 1.0.3
|
||||
Datum: 2026-06-01
|
||||
Autor: vladimir.buzalka
|
||||
|
||||
Popis:
|
||||
Jednorázový skript pro úplný přenos Inboxu z JNJ Outlooku (MAPI) do osobní
|
||||
schránky vladimir.buzalka@buzalka.cz přes Graph API.
|
||||
|
||||
Prochází celý Inbox včetně všech podsložek. Online Archive se nepřenáší
|
||||
(GetDefaultFolder(6) vrátí pouze primární schránku).
|
||||
|
||||
Každý email se uloží jako .msg do temp složky, odešle na https://msgs.buzalka.cz/upload
|
||||
a přes Graph API se importuje do odpovídající složky v osobní schránce.
|
||||
Dedup zajišťuje SQLite DB — email který je v DB (message_id) se přeskočí.
|
||||
|
||||
Spouštění:
|
||||
Spouštět ručně jako záchranná síť nebo iniciální sync.
|
||||
Bezpečné opakovat — duplicity se přeskočí.
|
||||
|
||||
Závislosti:
|
||||
win32com, requests, sqlite3 (stdlib)
|
||||
Python 3.10+, Windows, Outlook musí být spuštěn
|
||||
|
||||
Konfigurace (konstanty v kódu):
|
||||
TOKEN Bearer token pro msgs.buzalka.cz
|
||||
UPLOAD_URL https://msgs.buzalka.cz/upload
|
||||
DB_UPLOAD_URL https://msgs.buzalka.cz/upload-db
|
||||
DB_PATH C:\\Users\\vbuzalka\\SQLITE\\jnjemails.db
|
||||
LOG_PATH C:\\Users\\vbuzalka\\SQLITE\\inbox_full_sync_errors.log
|
||||
|
||||
SQLite DB (jnjemails.db):
|
||||
messages — přenesené emaily (message_id, entry_id, graph_id, is_read, jnj_folder, ...)
|
||||
runs — jeden záznam na běh (script, version, started_at, finished_at, counts)
|
||||
log — flat event log per run (level, event, subject, folder, graph_id, detail)
|
||||
|
||||
Dotaz pro posledn běh:
|
||||
SELECT r.script, r.version, r.started_at, l.level, l.event,
|
||||
l.subject, l.folder, l.detail, l.created_at
|
||||
FROM log l JOIN runs r ON r.id = l.run_id
|
||||
WHERE l.run_id = (SELECT MAX(id) FROM runs)
|
||||
ORDER BY l.created_at
|
||||
|
||||
Log události (log.event):
|
||||
run_start — start skriptu
|
||||
mailbox — název schránky
|
||||
folder_start — vstup do složky (detail = počet položek)
|
||||
folder_done — konec složky (detail = přeneseno/skip)
|
||||
upload_saved — nový email přenesen
|
||||
upload_exists — email již v DB, přeskočen
|
||||
upload_error — chyba při uploadu (detail = chybová zpráva)
|
||||
progress — každých 100 přenesených
|
||||
db_upload — úspěšný upload DB na server
|
||||
db_upload_error — chyba uploadu DB
|
||||
run_done — konec skriptu (detail = souhrn)
|
||||
|
||||
Historie verzí:
|
||||
1.0.0 2026-06-01 Základní funkce: Inbox full scan, dedup přes DB, entry_id/graph_id/is_read
|
||||
1.0.1 2026-06-01 DB upload každých 100 emailů + finální upload
|
||||
1.0.2 2026-06-01 SQLite tabulky runs + log
|
||||
1.0.3 2026-06-01 Kompletní konzolový výstup zrcadlen do log tabulky, skipped counter
|
||||
"""
|
||||
import win32com.client
|
||||
import requests
|
||||
import sqlite3
|
||||
import urllib3
|
||||
import logging
|
||||
import hashlib
|
||||
import base64
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
from cryptography.fernet import Fernet
|
||||
import tempfile
|
||||
|
||||
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\inbox_full_sync_errors.log"
|
||||
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
|
||||
DB_UPLOAD_URL = "https://msgs.buzalka.cz/upload-db"
|
||||
SCRIPT_NAME = "inbox_full_sync"
|
||||
SCRIPT_VERSION = "1.0.4"
|
||||
# Šifrovací klíč odvozený z TOKENu — stejný algoritmus jako na serveru
|
||||
_FERNET = Fernet(base64.urlsafe_b64encode(hashlib.sha256(TOKEN.encode()).digest()))
|
||||
|
||||
logging.basicConfig(
|
||||
filename=LOG_PATH,
|
||||
level=logging.ERROR,
|
||||
format="%(asctime)s | %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
|
||||
def init_db(conn):
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS messages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
message_id TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
received_at TEXT,
|
||||
folder TEXT,
|
||||
source TEXT,
|
||||
uploaded_at TEXT DEFAULT (datetime('now')),
|
||||
entry_id TEXT,
|
||||
graph_id TEXT,
|
||||
is_read INTEGER DEFAULT 0,
|
||||
jnj_folder TEXT
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
|
||||
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS runs (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
script TEXT NOT NULL,
|
||||
version TEXT,
|
||||
started_at TEXT NOT NULL,
|
||||
finished_at TEXT,
|
||||
transferred INTEGER DEFAULT 0,
|
||||
skipped INTEGER DEFAULT 0,
|
||||
sync_updated INTEGER DEFAULT 0,
|
||||
sync_deleted INTEGER DEFAULT 0,
|
||||
errors INTEGER DEFAULT 0
|
||||
)
|
||||
""")
|
||||
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS log (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
run_id INTEGER REFERENCES runs(id),
|
||||
level TEXT NOT NULL,
|
||||
event TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
folder TEXT,
|
||||
graph_id TEXT,
|
||||
detail TEXT,
|
||||
created_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_log_run_id ON log(run_id)")
|
||||
|
||||
for col, definition in [
|
||||
("entry_id", "TEXT"),
|
||||
("graph_id", "TEXT"),
|
||||
("is_read", "INTEGER DEFAULT 0"),
|
||||
("jnj_folder", "TEXT"),
|
||||
]:
|
||||
try:
|
||||
conn.execute(f"ALTER TABLE messages ADD COLUMN {col} {definition}")
|
||||
except Exception:
|
||||
pass
|
||||
conn.commit()
|
||||
|
||||
|
||||
def start_run(conn):
|
||||
cur = conn.execute(
|
||||
"INSERT INTO runs (script, version, started_at) VALUES (?, ?, datetime('now'))",
|
||||
(SCRIPT_NAME, SCRIPT_VERSION)
|
||||
)
|
||||
conn.commit()
|
||||
return cur.lastrowid
|
||||
|
||||
|
||||
def finish_run(conn, run_id, transferred, skipped, errors):
|
||||
conn.execute("""
|
||||
UPDATE runs SET finished_at=datetime('now'), transferred=?, skipped=?, errors=?
|
||||
WHERE id=?
|
||||
""", (transferred, skipped, errors, run_id))
|
||||
conn.commit()
|
||||
|
||||
|
||||
def db_log(conn, run_id, level, event, subject=None, folder=None, graph_id=None, detail=None):
|
||||
conn.execute("""
|
||||
INSERT INTO log (run_id, level, event, subject, folder, graph_id, detail)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?)
|
||||
""", (run_id, level, event, subject, folder, graph_id, detail))
|
||||
conn.commit()
|
||||
|
||||
|
||||
def info(conn, run_id, event, **kwargs):
|
||||
db_log(conn, run_id, "INFO", event, **kwargs)
|
||||
|
||||
|
||||
def error(conn, run_id, event, **kwargs):
|
||||
db_log(conn, run_id, "ERROR", event, **kwargs)
|
||||
|
||||
|
||||
def is_uploaded(conn, message_id):
|
||||
row = conn.execute(
|
||||
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
|
||||
).fetchone()
|
||||
return row is not None
|
||||
|
||||
|
||||
def save_to_db(conn, message_id, subject, sender, received_at, folder,
|
||||
entry_id=None, graph_id=None, is_read=0):
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO messages
|
||||
(message_id, subject, sender, received_at, folder, source,
|
||||
entry_id, graph_id, is_read, jnj_folder)
|
||||
VALUES (?, ?, ?, ?, ?, 'inbox_full_sync', ?, ?, ?, ?)
|
||||
""", (message_id, subject, sender, received_at, folder,
|
||||
entry_id, graph_id, is_read, folder))
|
||||
conn.commit()
|
||||
|
||||
|
||||
def upload_db(conn, run_id):
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"jnjemails_{timestamp}.db"
|
||||
try:
|
||||
with open(DB_PATH, "rb") as f:
|
||||
resp = requests.post(
|
||||
DB_UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=60,
|
||||
)
|
||||
result = resp.json()
|
||||
msg = f"DB upload: {result}"
|
||||
print(f" {msg}")
|
||||
info(conn, run_id, "db_upload", detail=msg)
|
||||
except Exception as e:
|
||||
msg = str(e)
|
||||
print(f" DB upload CHYBA: {msg}")
|
||||
error(conn, run_id, "db_upload_error", detail=msg)
|
||||
|
||||
|
||||
def upload_msg(msg_path, filename, folder=""):
|
||||
size_kb = Path(msg_path).stat().st_size // 1024
|
||||
with open(msg_path, "rb") as f:
|
||||
encrypted = _FERNET.encrypt(f.read())
|
||||
enc_filename = Path(filename).stem + ".emsg"
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (enc_filename, encrypted, "application/octet-stream")},
|
||||
data={"folder": folder},
|
||||
timeout=60,
|
||||
)
|
||||
if not resp.ok:
|
||||
raise requests.HTTPError(
|
||||
f"{resp.status_code} {resp.reason} | size={size_kb}KB | body={resp.text[:300]}",
|
||||
response=resp,
|
||||
)
|
||||
return resp.json()
|
||||
|
||||
|
||||
def process_folder(conn, run_id, folder, folder_path, counter, skipped_counter, error_counter):
|
||||
current_path = f"{folder_path}/{folder.Name}"
|
||||
items = folder.Items
|
||||
items.Sort("[ReceivedTime]", False)
|
||||
|
||||
count = 0
|
||||
skipped = 0
|
||||
total = items.Count
|
||||
|
||||
msg = f"Složka: {current_path} ({total} položek)"
|
||||
print(f"\n {msg}")
|
||||
info(conn, run_id, "folder_start", folder=current_path, detail=str(total))
|
||||
|
||||
for item in items:
|
||||
subject = getattr(item, 'Subject', '?')
|
||||
try:
|
||||
if not item.MessageClass.upper().startswith("IPM.NOTE"):
|
||||
continue
|
||||
|
||||
try:
|
||||
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
|
||||
except Exception:
|
||||
mid = None
|
||||
if not mid:
|
||||
mid = f"entryid:{item.EntryID}"
|
||||
|
||||
if is_uploaded(conn, mid):
|
||||
skipped += 1
|
||||
skipped_counter[0] += 1
|
||||
continue
|
||||
|
||||
try:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
safe_name = f"{item.EntryID[-20:]}.msg"
|
||||
tmp_path = Path(tmp) / safe_name
|
||||
item.SaveAs(str(tmp_path), 3)
|
||||
size_kb = tmp_path.stat().st_size // 1024
|
||||
result = upload_msg(tmp_path, safe_name, current_path)
|
||||
|
||||
status = result.get("status", "?")
|
||||
graph_id = result.get("graph_id")
|
||||
is_read = 0 if item.UnRead else 1
|
||||
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
|
||||
|
||||
save_to_db(conn, mid, subject, item.SenderEmailAddress,
|
||||
received, current_path,
|
||||
entry_id=item.EntryID, graph_id=graph_id, is_read=is_read)
|
||||
info(conn, run_id, f"upload_{status}",
|
||||
subject=subject, folder=current_path, graph_id=graph_id,
|
||||
detail=f"size={size_kb}KB")
|
||||
|
||||
counter[0] += 1
|
||||
count += 1
|
||||
|
||||
if counter[0] % 100 == 0:
|
||||
msg = f"celkem přeneseno: {counter[0]}"
|
||||
print(f" → {msg}, uploaduji DB...")
|
||||
info(conn, run_id, "progress", detail=msg)
|
||||
upload_db(conn, run_id)
|
||||
|
||||
print(f" {status.upper():6} | {subject[:70]}")
|
||||
|
||||
except Exception as e:
|
||||
sender_str = getattr(item, 'SenderEmailAddress', '?')
|
||||
received_str = getattr(item, 'ReceivedTime', None)
|
||||
received_str = received_str.isoformat() if received_str else '?'
|
||||
entry_id_str = getattr(item, 'EntryID', '?')
|
||||
detail = (
|
||||
f"error={e} | "
|
||||
f"sender={sender_str} | "
|
||||
f"received={received_str} | "
|
||||
f"entry_id={entry_id_str} | "
|
||||
f"message_id={mid}"
|
||||
)
|
||||
print(f" CHYBA | {subject[:50]} | sender={sender_str} | received={received_str} | {e}")
|
||||
error(conn, run_id, "upload_error",
|
||||
subject=subject, folder=current_path, detail=detail)
|
||||
logging.error("folder=%s | %s", current_path, detail)
|
||||
error_counter[0] += 1
|
||||
|
||||
except Exception as e:
|
||||
# Neočekávaná chyba mimo upload blok (MessageClass, EntryID, apod.)
|
||||
print(f" CHYBA (item) | {subject[:50]} | {e}")
|
||||
logging.error("folder=%s | item_error | subject=%s | error=%s", current_path, subject, e)
|
||||
error_counter[0] += 1
|
||||
|
||||
msg = f"složka hotova: přeneseno {count} | skip {skipped}"
|
||||
print(f" → {msg}")
|
||||
info(conn, run_id, "folder_done", folder=current_path, detail=msg)
|
||||
|
||||
for subfolder in folder.Folders:
|
||||
process_folder(conn, run_id, subfolder, current_path, counter, skipped_counter, error_counter)
|
||||
|
||||
|
||||
# --- MAIN ---
|
||||
print(f"=== inbox_full_sync v{SCRIPT_VERSION} ===")
|
||||
print(f"Start: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
|
||||
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
init_db(conn)
|
||||
run_id = start_run(conn)
|
||||
info(conn, run_id, "run_start", detail=f"script={SCRIPT_NAME} version={SCRIPT_VERSION}")
|
||||
|
||||
outlook = win32com.client.Dispatch("Outlook.Application")
|
||||
ns = outlook.GetNamespace("MAPI")
|
||||
|
||||
inbox = ns.GetDefaultFolder(6) # olFolderInbox — primární schránka, bez Online Archive
|
||||
mailbox_name = inbox.Parent.Name
|
||||
print(f"\nSchránka: {mailbox_name}")
|
||||
info(conn, run_id, "mailbox", detail=mailbox_name)
|
||||
|
||||
counter = [0]
|
||||
skipped_counter = [0]
|
||||
error_counter = [0]
|
||||
process_folder(conn, run_id, inbox, f"/{mailbox_name}", counter, skipped_counter, error_counter)
|
||||
|
||||
finish_run(conn, run_id,
|
||||
transferred=counter[0],
|
||||
skipped=skipped_counter[0],
|
||||
errors=error_counter[0])
|
||||
|
||||
summary = f"přeneseno {counter[0]} | skip {skipped_counter[0]} | chyby {error_counter[0]}"
|
||||
print(f"\n=== Hotovo: {summary} ===")
|
||||
info(conn, run_id, "run_done", detail=summary)
|
||||
|
||||
print("Uploaduji DB...")
|
||||
upload_db(conn, run_id)
|
||||
|
||||
print(f"Konec: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
|
||||
print(f"Chyby logovány do: {LOG_PATH}")
|
||||
conn.close()
|
||||
@@ -0,0 +1,464 @@
|
||||
"""
|
||||
janssenpc_email_send_new v1.5
|
||||
Verze: 1.5.2
|
||||
Datum: 2026-06-01
|
||||
Popis: Prochází složky Inbox, Deleted Items a Sent Items v Outlooku (MAPI),
|
||||
ukládá emailové zprávy jako .msg soubory a uploaduje je na https://msgs.buzalka.cz.
|
||||
Zaznamenává zpracované zprávy do SQLite DB (jnjemails.db) a DB uploaduje na server
|
||||
jednou za 24 hodin (ne při každém běhu). Podporuje pokračování od posledního
|
||||
zpracovaného emailu (resume). Folder cesta obsahuje celé jméno schránky
|
||||
(např. /vbuzalka@its.jnj.com/Inbox). Chyby se logují do jnjemails_errors.log.
|
||||
v1.5: tracking entry_id, graph_id, is_read, jnj_folder.
|
||||
Sync průchod posledních 30 dní: detekce smazání, změny přečtení, přesunu složky.
|
||||
v1.5.2: SQLite logging — tabulky runs + log (flat event log per run).
|
||||
"""
|
||||
import win32com.client
|
||||
import requests
|
||||
import sqlite3
|
||||
import urllib3
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timedelta
|
||||
import tempfile
|
||||
|
||||
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
DB_UPLOAD_MARKER = r"C:\Users\vbuzalka\SQLITE\jnjemails_last_db_upload.txt"
|
||||
DB_UPLOAD_INTERVAL_H = 24
|
||||
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
|
||||
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
|
||||
SYNC_DAYS = 30
|
||||
SCRIPT_NAME = "email_send"
|
||||
SCRIPT_VERSION = "1.5.2"
|
||||
|
||||
# olFolderInbox=6, olFolderDeletedItems=3, olFolderSentMail=5
|
||||
FOLDERS_TO_PROCESS = [6, 3, 5]
|
||||
|
||||
UPLOAD_LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_uploads.log"
|
||||
|
||||
logging.basicConfig(
|
||||
filename=LOG_PATH,
|
||||
level=logging.ERROR,
|
||||
format="%(asctime)s | %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
_upload_log = logging.getLogger("uploads")
|
||||
_upload_log.setLevel(logging.DEBUG)
|
||||
_uh = logging.FileHandler(UPLOAD_LOG_PATH, encoding="utf-8")
|
||||
_uh.setFormatter(logging.Formatter("%(asctime)s | %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
|
||||
_upload_log.addHandler(_uh)
|
||||
|
||||
|
||||
def init_db(conn):
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS messages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
message_id TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
received_at TEXT,
|
||||
folder TEXT,
|
||||
source TEXT,
|
||||
uploaded_at TEXT DEFAULT (datetime('now')),
|
||||
entry_id TEXT,
|
||||
graph_id TEXT,
|
||||
is_read INTEGER DEFAULT 0,
|
||||
jnj_folder TEXT
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
|
||||
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS runs (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
script TEXT NOT NULL,
|
||||
version TEXT,
|
||||
started_at TEXT NOT NULL,
|
||||
finished_at TEXT,
|
||||
transferred INTEGER DEFAULT 0,
|
||||
skipped INTEGER DEFAULT 0,
|
||||
sync_updated INTEGER DEFAULT 0,
|
||||
sync_deleted INTEGER DEFAULT 0,
|
||||
errors INTEGER DEFAULT 0
|
||||
)
|
||||
""")
|
||||
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS log (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
run_id INTEGER REFERENCES runs(id),
|
||||
level TEXT NOT NULL,
|
||||
event TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
folder TEXT,
|
||||
graph_id TEXT,
|
||||
detail TEXT,
|
||||
created_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_log_run_id ON log(run_id)")
|
||||
|
||||
# Migrate existing DB
|
||||
for col, definition in [
|
||||
("entry_id", "TEXT"),
|
||||
("graph_id", "TEXT"),
|
||||
("is_read", "INTEGER DEFAULT 0"),
|
||||
("jnj_folder", "TEXT"),
|
||||
]:
|
||||
try:
|
||||
conn.execute(f"ALTER TABLE messages ADD COLUMN {col} {definition}")
|
||||
except Exception:
|
||||
pass
|
||||
conn.commit()
|
||||
|
||||
|
||||
def start_run(conn):
|
||||
cur = conn.execute(
|
||||
"INSERT INTO runs (script, version, started_at) VALUES (?, ?, datetime('now'))",
|
||||
(SCRIPT_NAME, SCRIPT_VERSION)
|
||||
)
|
||||
conn.commit()
|
||||
return cur.lastrowid
|
||||
|
||||
|
||||
def finish_run(conn, run_id, transferred, skipped, sync_updated=0, sync_deleted=0, errors=0):
|
||||
conn.execute("""
|
||||
UPDATE runs SET finished_at=datetime('now'),
|
||||
transferred=?, skipped=?, sync_updated=?, sync_deleted=?, errors=?
|
||||
WHERE id=?
|
||||
""", (transferred, skipped, sync_updated, sync_deleted, errors, run_id))
|
||||
conn.commit()
|
||||
|
||||
|
||||
def db_log(conn, run_id, level, event, subject=None, folder=None, graph_id=None, detail=None):
|
||||
conn.execute("""
|
||||
INSERT INTO log (run_id, level, event, subject, folder, graph_id, detail)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?)
|
||||
""", (run_id, level, event, subject, folder, graph_id, detail))
|
||||
conn.commit()
|
||||
|
||||
|
||||
def is_uploaded(conn, message_id):
|
||||
row = conn.execute(
|
||||
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
|
||||
).fetchone()
|
||||
return row is not None
|
||||
|
||||
|
||||
def save_to_db(conn, message_id, subject, sender, received_at, folder, source,
|
||||
entry_id=None, graph_id=None, is_read=0):
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO messages
|
||||
(message_id, subject, sender, received_at, folder, source,
|
||||
entry_id, graph_id, is_read, jnj_folder)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""", (message_id, subject, sender, received_at, folder, source,
|
||||
entry_id, graph_id, is_read, folder))
|
||||
conn.commit()
|
||||
|
||||
|
||||
def _db_upload_due() -> bool:
|
||||
marker = Path(DB_UPLOAD_MARKER)
|
||||
if not marker.exists():
|
||||
return True
|
||||
try:
|
||||
last = datetime.fromisoformat(marker.read_text().strip())
|
||||
return (datetime.now() - last).total_seconds() >= DB_UPLOAD_INTERVAL_H * 3600
|
||||
except Exception:
|
||||
return True
|
||||
|
||||
|
||||
def _db_upload_mark():
|
||||
Path(DB_UPLOAD_MARKER).write_text(datetime.now().isoformat())
|
||||
|
||||
|
||||
def upload_db(db_path, force=False):
|
||||
if not force and not _db_upload_due():
|
||||
return
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"jnjemails_{timestamp}.db"
|
||||
with open(db_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/upload-db",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=60
|
||||
)
|
||||
print(f" DB upload: {resp.json()}")
|
||||
_db_upload_mark()
|
||||
|
||||
|
||||
def upload_msg(msg_path, filename, folder=""):
|
||||
_upload_log.info("UPLOAD %s | folder=%s", filename, folder)
|
||||
with open(msg_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
data={"folder": folder},
|
||||
timeout=60
|
||||
)
|
||||
resp.raise_for_status()
|
||||
result = resp.json()
|
||||
_upload_log.info("RESPONSE %s | %s", filename, result)
|
||||
return result
|
||||
|
||||
|
||||
def get_folder_resume_date(conn, folder_path):
|
||||
row = conn.execute(
|
||||
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
|
||||
(folder_path,)
|
||||
).fetchone()
|
||||
if not row or not row[0]:
|
||||
return None
|
||||
last_dt = datetime.fromisoformat(row[0])
|
||||
return last_dt - timedelta(hours=1)
|
||||
|
||||
|
||||
def get_item_folder_path(item):
|
||||
"""Reconstruct full folder path from an Outlook item, e.g. /vbuzalka@its.jnj.com/Inbox/Sub."""
|
||||
parts = []
|
||||
obj = item.Parent
|
||||
while True:
|
||||
try:
|
||||
parts.insert(0, obj.Name)
|
||||
obj = obj.Parent
|
||||
except Exception:
|
||||
break
|
||||
return "/" + "/".join(parts)
|
||||
|
||||
|
||||
def process_folder(conn, run_id, folder, source, folder_path="", counter=None, error_counter=None):
|
||||
if counter is None:
|
||||
counter = [0]
|
||||
if error_counter is None:
|
||||
error_counter = [0]
|
||||
|
||||
current_path = f"{folder_path}/{folder.Name}"
|
||||
|
||||
try:
|
||||
resume_dt = get_folder_resume_date(conn, current_path)
|
||||
items = folder.Items
|
||||
|
||||
if resume_dt:
|
||||
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
|
||||
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
|
||||
items = folder.Items.Restrict(filter_str)
|
||||
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
|
||||
else:
|
||||
print(f"\n Složka: {current_path} | od začátku")
|
||||
|
||||
items.Sort("[ReceivedTime]", False)
|
||||
|
||||
count = 0
|
||||
skipped = 0
|
||||
|
||||
for item in items:
|
||||
subject = getattr(item, 'Subject', '?')
|
||||
try:
|
||||
if not item.MessageClass.upper().startswith("IPM.NOTE"):
|
||||
continue
|
||||
|
||||
try:
|
||||
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
|
||||
except Exception:
|
||||
mid = None
|
||||
if not mid:
|
||||
mid = f"entryid:{item.EntryID}"
|
||||
|
||||
if is_uploaded(conn, mid):
|
||||
skipped += 1
|
||||
continue
|
||||
|
||||
graph_id = None
|
||||
try:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
safe_name = f"{item.EntryID[-20:]}.msg"
|
||||
tmp_path = Path(tmp) / safe_name
|
||||
item.SaveAs(str(tmp_path), 3)
|
||||
result = upload_msg(tmp_path, safe_name, current_path)
|
||||
|
||||
status = result.get("status", "?")
|
||||
graph_id = result.get("graph_id")
|
||||
is_read = 0 if item.UnRead else 1
|
||||
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
|
||||
|
||||
save_to_db(conn, mid, subject, item.SenderEmailAddress,
|
||||
received, current_path, source,
|
||||
entry_id=item.EntryID, graph_id=graph_id, is_read=is_read)
|
||||
db_log(conn, run_id, "INFO", f"upload_{status}",
|
||||
subject=subject, folder=current_path, graph_id=graph_id)
|
||||
|
||||
counter[0] += 1
|
||||
count += 1
|
||||
|
||||
if counter[0] % 1000 == 0:
|
||||
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
print(f" {status.upper():6} | {subject[:60]}")
|
||||
|
||||
except Exception as e:
|
||||
db_log(conn, run_id, "ERROR", "upload_error",
|
||||
subject=subject, folder=current_path, detail=str(e))
|
||||
raise
|
||||
|
||||
except Exception as e:
|
||||
sender = getattr(item, 'SenderEmailAddress', '?')
|
||||
received = getattr(item, 'ReceivedTime', '?')
|
||||
print(f" CHYBA | {subject[:40]} | {e}")
|
||||
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
|
||||
current_path, sender, received, subject, e)
|
||||
error_counter[0] += 1
|
||||
|
||||
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA složka {current_path}: {e}")
|
||||
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
|
||||
error_counter[0] += 1
|
||||
|
||||
for subfolder in folder.Folders:
|
||||
process_folder(conn, run_id, subfolder, source, current_path, counter, error_counter)
|
||||
|
||||
|
||||
def sync_recent(conn, run_id, ns):
|
||||
"""Sync last SYNC_DAYS days: detect deleted emails, read-status changes, folder moves."""
|
||||
cutoff = (datetime.now() - timedelta(days=SYNC_DAYS)).isoformat()
|
||||
rows = conn.execute(
|
||||
"""SELECT message_id, entry_id, graph_id, is_read, jnj_folder
|
||||
FROM messages
|
||||
WHERE graph_id IS NOT NULL AND entry_id IS NOT NULL
|
||||
AND received_at > ?""",
|
||||
(cutoff,)
|
||||
).fetchall()
|
||||
|
||||
print(f"\n=== Sync průchod: {len(rows)} záznamů za posledních {SYNC_DAYS} dní ===")
|
||||
deleted = 0
|
||||
updated = 0
|
||||
errors = 0
|
||||
|
||||
for message_id, entry_id, graph_id, is_read, jnj_folder in rows:
|
||||
found = False
|
||||
current_read = None
|
||||
current_folder = None
|
||||
|
||||
try:
|
||||
item = ns.GetItemFromID(entry_id)
|
||||
current_read = 0 if item.UnRead else 1
|
||||
current_folder = get_item_folder_path(item)
|
||||
found = True
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
if found:
|
||||
read_changed = current_read != (is_read or 0)
|
||||
folder_changed = current_folder != (jnj_folder or "")
|
||||
|
||||
if not read_changed and not folder_changed:
|
||||
continue
|
||||
|
||||
payload = {"graph_id": graph_id}
|
||||
if read_changed:
|
||||
payload["is_read"] = bool(current_read)
|
||||
if folder_changed:
|
||||
payload["folder"] = current_folder
|
||||
|
||||
try:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/message-update",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
json=payload,
|
||||
timeout=30,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
result = resp.json()
|
||||
new_graph_id = result.get("graph_id", graph_id)
|
||||
conn.execute(
|
||||
"UPDATE messages SET is_read=?, jnj_folder=?, graph_id=? WHERE message_id=?",
|
||||
(current_read, current_folder, new_graph_id, message_id)
|
||||
)
|
||||
conn.commit()
|
||||
updated += 1
|
||||
|
||||
if read_changed:
|
||||
db_log(conn, run_id, "INFO", "sync_read_update", graph_id=new_graph_id,
|
||||
detail=f"{is_read} → {current_read}")
|
||||
if folder_changed:
|
||||
db_log(conn, run_id, "INFO", "sync_folder_move", graph_id=new_graph_id,
|
||||
folder=current_folder, detail=f"{jnj_folder} → {current_folder}")
|
||||
|
||||
changes = []
|
||||
if read_changed:
|
||||
changes.append(f"read={current_read}")
|
||||
if folder_changed:
|
||||
changes.append(f"folder={current_folder}")
|
||||
print(f" UPDATE | {', '.join(changes)}")
|
||||
|
||||
except Exception as e:
|
||||
db_log(conn, run_id, "ERROR", "sync_update_failed", graph_id=graph_id, detail=str(e))
|
||||
logging.error("sync update failed | graph_id=%s | error=%s", graph_id, e)
|
||||
errors += 1
|
||||
else:
|
||||
try:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/message-delete",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
json={"graph_id": graph_id},
|
||||
timeout=30,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
db_log(conn, run_id, "INFO", "sync_delete", graph_id=graph_id, folder=jnj_folder)
|
||||
conn.execute("DELETE FROM messages WHERE message_id=?", (message_id,))
|
||||
conn.commit()
|
||||
deleted += 1
|
||||
print(f" DELETE | graph_id={graph_id[:20]}...")
|
||||
except Exception as e:
|
||||
db_log(conn, run_id, "ERROR", "sync_delete_failed", graph_id=graph_id, detail=str(e))
|
||||
logging.error("sync delete failed | graph_id=%s | error=%s", graph_id, e)
|
||||
errors += 1
|
||||
|
||||
print(f" → sync hotov: {updated} aktualizováno | {deleted} smazáno | {errors} chyb")
|
||||
return updated, deleted, errors
|
||||
|
||||
|
||||
# --- MAIN ---
|
||||
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
init_db(conn)
|
||||
|
||||
run_id = start_run(conn)
|
||||
|
||||
outlook = win32com.client.Dispatch("Outlook.Application")
|
||||
ns = outlook.GetNamespace("MAPI")
|
||||
|
||||
counter = [0]
|
||||
error_counter = [0]
|
||||
skipped_total = [0]
|
||||
|
||||
for folder_id in FOLDERS_TO_PROCESS:
|
||||
folder = ns.GetDefaultFolder(folder_id)
|
||||
mailbox_name = folder.Parent.Name
|
||||
print(f"\n=== {folder.Name} ({mailbox_name}) ===")
|
||||
process_folder(conn, run_id, folder, "mailbox", f"/{mailbox_name}", counter, error_counter)
|
||||
|
||||
sync_updated, sync_deleted, sync_errors = sync_recent(conn, run_id, ns)
|
||||
error_counter[0] += sync_errors
|
||||
|
||||
finish_run(conn, run_id,
|
||||
transferred=counter[0],
|
||||
skipped=skipped_total[0],
|
||||
sync_updated=sync_updated,
|
||||
sync_deleted=sync_deleted,
|
||||
errors=error_counter[0])
|
||||
|
||||
print("\nFinální upload DB...")
|
||||
upload_db(DB_PATH, force=True)
|
||||
|
||||
conn.close()
|
||||
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
|
||||
@@ -0,0 +1,224 @@
|
||||
"""
|
||||
janssenpc_email_send v1.1
|
||||
Verze: 1.1
|
||||
Datum: 2026-05-28
|
||||
Popis: Prochází všechny složky Outlooku (MAPI), ukládá emailové zprávy jako .msg
|
||||
soubory a uploaduje je na https://msgs.buzalka.cz. Zaznamenává zpracované
|
||||
zprávy do SQLite DB (jnjemails.db) a DB periodicky uploaduje na server.
|
||||
Podporuje pokračování od posledního zpracovaného emailu (resume).
|
||||
Nově: před startem zkontroluje jestli Outlook běží, pokud ne, spustí ho
|
||||
automaticky (cesta z registru) a počká na inicializaci MAPI.
|
||||
"""
|
||||
import win32com.client
|
||||
import requests
|
||||
import sqlite3
|
||||
import urllib3
|
||||
import subprocess
|
||||
import winreg
|
||||
import time
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timedelta
|
||||
import tempfile
|
||||
import io
|
||||
import psutil
|
||||
|
||||
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
|
||||
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
|
||||
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
|
||||
|
||||
def is_outlook_running():
|
||||
return any(p.name().lower() == "outlook.exe" for p in psutil.process_iter())
|
||||
|
||||
def find_outlook_path():
|
||||
try:
|
||||
key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
|
||||
r"SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\OUTLOOK.EXE")
|
||||
path, _ = winreg.QueryValueEx(key, "")
|
||||
winreg.CloseKey(key)
|
||||
return path
|
||||
except FileNotFoundError:
|
||||
return None
|
||||
|
||||
def ensure_outlook_running():
|
||||
outlook_path = find_outlook_path()
|
||||
print(f"Cesta k Outlooku: {outlook_path}")
|
||||
if is_outlook_running():
|
||||
print("Outlook již běží.")
|
||||
else:
|
||||
if not outlook_path:
|
||||
print("CHYBA: Outlook nenalezen v registru.")
|
||||
exit(1)
|
||||
print("Outlook neběží, spouštím...")
|
||||
subprocess.Popen([outlook_path])
|
||||
print("Čekám na inicializaci Outlooku", end="", flush=True)
|
||||
for _ in range(30):
|
||||
time.sleep(2)
|
||||
print(".", end="", flush=True)
|
||||
if is_outlook_running():
|
||||
break
|
||||
print()
|
||||
time.sleep(20) # dát čas MAPI vrstvě nastartovat
|
||||
|
||||
def init_db(conn):
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS messages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
message_id TEXT NOT NULL,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
received_at TEXT,
|
||||
folder TEXT,
|
||||
source TEXT,
|
||||
uploaded_at TEXT DEFAULT (datetime('now'))
|
||||
)
|
||||
""")
|
||||
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
|
||||
conn.commit()
|
||||
|
||||
def is_uploaded(conn, message_id):
|
||||
row = conn.execute(
|
||||
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
|
||||
).fetchone()
|
||||
return row is not None
|
||||
|
||||
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
""", (message_id, subject, sender, received_at, folder, source))
|
||||
conn.commit()
|
||||
|
||||
def upload_db(db_path):
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"jnjemails_{timestamp}.db"
|
||||
with open(db_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
"https://msgs.buzalka.cz/upload-db",
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=60
|
||||
)
|
||||
print(f" DB upload: {resp.json()}")
|
||||
|
||||
def upload_msg(msg_path, filename):
|
||||
with open(msg_path, "rb") as f:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (filename, f, "application/octet-stream")},
|
||||
timeout=30
|
||||
)
|
||||
resp.raise_for_status()
|
||||
return resp.json()["status"]
|
||||
|
||||
def get_folder_resume_date(conn, folder_path):
|
||||
row = conn.execute(
|
||||
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
|
||||
(folder_path,)
|
||||
).fetchone()
|
||||
if not row or not row[0]:
|
||||
return None
|
||||
last_dt = datetime.fromisoformat(row[0])
|
||||
return last_dt - timedelta(hours=1)
|
||||
|
||||
def process_folder(conn, folder, source, folder_path="", counter=None):
|
||||
if counter is None:
|
||||
counter = [0]
|
||||
|
||||
current_path = f"{folder_path}/{folder.Name}"
|
||||
|
||||
try:
|
||||
resume_dt = get_folder_resume_date(conn, current_path)
|
||||
|
||||
items = folder.Items
|
||||
|
||||
if resume_dt:
|
||||
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
|
||||
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
|
||||
items = folder.Items.Restrict(filter_str)
|
||||
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
|
||||
else:
|
||||
print(f"\n Složka: {current_path} | od začátku")
|
||||
|
||||
items.Sort("[ReceivedTime]", False)
|
||||
|
||||
count = 0
|
||||
skipped = 0
|
||||
|
||||
for item in items:
|
||||
try:
|
||||
if not item.MessageClass.upper().startswith("IPM.NOTE"):
|
||||
continue
|
||||
|
||||
try:
|
||||
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
|
||||
except:
|
||||
mid = None
|
||||
|
||||
if not mid:
|
||||
mid = f"entryid:{item.EntryID}"
|
||||
|
||||
if is_uploaded(conn, mid):
|
||||
skipped += 1
|
||||
continue
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
safe_name = f"{item.EntryID[-20:]}.msg"
|
||||
tmp_path = Path(tmp) / safe_name
|
||||
item.SaveAs(str(tmp_path), 3)
|
||||
status = upload_msg(tmp_path, safe_name)
|
||||
|
||||
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
|
||||
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
|
||||
received, current_path, source)
|
||||
|
||||
counter[0] += 1
|
||||
count += 1
|
||||
|
||||
if counter[0] % 1000 == 0:
|
||||
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
print(f" {status.upper():6} | {item.Subject[:60]}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA | {getattr(item, 'Subject', '?')[:40]} | {e}")
|
||||
|
||||
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" CHYBA složka {current_path}: {e}")
|
||||
|
||||
for subfolder in folder.Folders:
|
||||
process_folder(conn, subfolder, source, current_path, counter)
|
||||
|
||||
# --- MAIN ---
|
||||
ensure_outlook_running()
|
||||
|
||||
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
init_db(conn)
|
||||
|
||||
outlook = win32com.client.Dispatch("Outlook.Application")
|
||||
ns = outlook.GetNamespace("MAPI")
|
||||
|
||||
for i in range(1, ns.Folders.Count + 1):
|
||||
root = ns.Folders.Item(i)
|
||||
|
||||
# if "Archive" in root.Name:
|
||||
# print(f"\n=== {root.Name} — přeskočeno ===")
|
||||
# continue
|
||||
|
||||
source = "mailbox"
|
||||
print(f"\n=== {root.Name} ({source}) ===")
|
||||
process_folder(conn, root, source)
|
||||
|
||||
# Finální DB upload po dokončení
|
||||
print("\nFinální upload DB...")
|
||||
upload_db(DB_PATH)
|
||||
|
||||
conn.close()
|
||||
print("\nHotovo.")
|
||||
@@ -0,0 +1,248 @@
|
||||
# Název: janssenpc_file_send.py
|
||||
# Verze: 2.1
|
||||
# Datum: 2026-05-28
|
||||
# Popis: Přejmenuje soubory ve složce ##JNJPrenos, odešle je na msgs.buzalka.cz
|
||||
# a přesune do podsložky Trash. Loguje průběh do file_send.log vedle skriptu.
|
||||
# Podporuje: PANORAMA Site Contacts (xlsx), Panorama Dashboard (xlsx),
|
||||
# Site Visit Report (xlsx), Follow-Up Letter (xlsx),
|
||||
# Clario MayoScore (csv), Clario MayoDiary (csv).
|
||||
|
||||
import os
|
||||
import time
|
||||
import shutil
|
||||
import requests
|
||||
import pandas as pd
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload-dropbox"
|
||||
SOURCE_DIR = Path(r"C:\Users\vbuzalka\OneDrive - JNJ\##JNJPrenos")
|
||||
TRASH_DIR = SOURCE_DIR / "Trash"
|
||||
LOG_FILE = Path(__file__).parent / "file_send.log"
|
||||
|
||||
MAYO_DIARY_COLUMNS = [
|
||||
'Protocol', 'Country', 'Site', 'PI Name', 'Subject ID',
|
||||
'Report Date', 'Report Start Date/Time', 'Report End Date/Time',
|
||||
'Stool Frequency', 'Form Number', 'Role', 'Original Source',
|
||||
]
|
||||
|
||||
MAYO_SCORE_COLUMNS = [
|
||||
'Protocol', 'Study Population', 'Country', 'Site', 'Principal Investigator',
|
||||
'Participant ID', 'Baseline Stool Frequency', 'Visit', 'Visit Date',
|
||||
'Endoscopy Completed?', 'Central Endoscopy Score', 'Local Endoscopy Score',
|
||||
'Partial Mayo Score', 'Full Mayo Score',
|
||||
]
|
||||
|
||||
PANORAMA_COLUMNS = [
|
||||
'Part', 'Source', 'Sector', 'TA', 'Protocol ID', 'Interventional',
|
||||
'Region', 'Country Name', 'Institution Name', 'Site City',
|
||||
'Site Zip/Postal Code', 'Site Address', 'MSID', 'Site ID',
|
||||
'Site Status', 'SM Full Name', 'PI Name', 'St F Subj Enr Act',
|
||||
'ID', 'Category', 'Type', 'Priority', 'Severity', 'Description',
|
||||
'Brief Description - Subject ID', 'Comments', 'Created By',
|
||||
'Create Date', 'Last Modified Date', 'Start Date', 'Due Date',
|
||||
'End Date', 'Status', 'Days Outstanding', 'Action Taken',
|
||||
'Escalated To', 'Visit Report Status', 'Visit Report Approved',
|
||||
'Visit Report Type', 'Visit Report Status End Date', 'Active',
|
||||
'Association', 'Deviation', 'Deviation Closed Date', 'Reason For Exclusion'
|
||||
]
|
||||
|
||||
|
||||
def log(msg: str):
|
||||
ts = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
|
||||
line = f"[{ts}] {msg}"
|
||||
print(line)
|
||||
with LOG_FILE.open("a", encoding="utf-8") as lf:
|
||||
lf.write(line + "\n")
|
||||
|
||||
|
||||
def move_to_trash(f: Path):
|
||||
TRASH_DIR.mkdir(exist_ok=True)
|
||||
dest = TRASH_DIR / f.name
|
||||
if dest.exists():
|
||||
ts = datetime.now().strftime('%Y%m%d_%H%M%S')
|
||||
dest = TRASH_DIR / f"{f.stem}_{ts}{f.suffix}"
|
||||
shutil.move(str(f), dest)
|
||||
|
||||
|
||||
def get_timestamp(file_path: str) -> str:
|
||||
return datetime.fromtimestamp(os.path.getmtime(file_path)).strftime('%Y-%m-%d_%H-%M-%S')
|
||||
|
||||
|
||||
def prejmenuj(directory: Path) -> None:
|
||||
log(f"--- Přejmenování, adresář: {directory} ---")
|
||||
files = [f for f in directory.iterdir() if f.is_file()]
|
||||
log(f" Nalezeno souborů: {len(files)} — {[f.name for f in files]}")
|
||||
|
||||
for f in files:
|
||||
filename = f.name
|
||||
file_path = str(f)
|
||||
|
||||
# 0a. CLARIO MAYO DIARY (CSV)
|
||||
if 'MAYO-DIARY' in filename and filename.endswith('.csv'):
|
||||
log(f" Detekován MayoDiary: {filename}")
|
||||
try:
|
||||
df = pd.read_csv(file_path)
|
||||
missing = set(MAYO_DIARY_COLUMNS) - set(df.columns)
|
||||
if not missing:
|
||||
protocols = df['Protocol'].dropna().unique()
|
||||
log(f" Protocol: {list(protocols)}")
|
||||
if len(protocols) > 0:
|
||||
study = str(protocols[0]).strip()
|
||||
new_name = f"{get_timestamp(file_path)} {study} Clario MayoDiary.csv"
|
||||
f.rename(directory / new_name)
|
||||
log(f" ÚSPĚCH: -> '{new_name}'")
|
||||
else:
|
||||
log(f" VAROVÁNÍ: Sloupec Protocol je prázdný.")
|
||||
else:
|
||||
log(f" PŘESKOČENO: Chybí sloupce: {missing}")
|
||||
except Exception as e:
|
||||
log(f" CHYBA: {e}")
|
||||
continue
|
||||
|
||||
# 0b. CLARIO MAYO SCORE (CSV)
|
||||
if 'Custom.MayoScoreReport' in filename and filename.endswith('.csv'):
|
||||
log(f" Detekován MayoScore: {filename}")
|
||||
try:
|
||||
df = pd.read_csv(file_path)
|
||||
missing = set(MAYO_SCORE_COLUMNS) - set(df.columns)
|
||||
if not missing:
|
||||
protocols = df['Protocol'].dropna().unique()
|
||||
log(f" Protocol: {list(protocols)}")
|
||||
if len(protocols) > 0:
|
||||
study = str(protocols[0]).strip()
|
||||
new_name = f"{get_timestamp(file_path)} {study} Clario MayoScore.csv"
|
||||
f.rename(directory / new_name)
|
||||
log(f" ÚSPĚCH: -> '{new_name}'")
|
||||
else:
|
||||
log(f" VAROVÁNÍ: Sloupec Protocol je prázdný.")
|
||||
else:
|
||||
log(f" PŘESKOČENO: Chybí sloupce: {missing}")
|
||||
except Exception as e:
|
||||
log(f" CHYBA: {e}")
|
||||
continue
|
||||
|
||||
# Ostatní — jen xlsx
|
||||
if not filename.endswith('.xlsx'):
|
||||
log(f" Přeskočeno (neznámý typ): {filename}")
|
||||
continue
|
||||
|
||||
# 1a. PANORAMA SITE CONTACTS (XLSX) — soubor pojmenovaný "PANORAMA Dashboard"
|
||||
if 'PANORAMA Dashboard' in filename:
|
||||
log(f" Detekován PANORAMA Site Contacts: {filename}")
|
||||
try:
|
||||
with pd.ExcelFile(file_path) as xl:
|
||||
sheet_names = xl.sheet_names
|
||||
if 'Site Contacts' in sheet_names:
|
||||
df_a1 = xl.parse('Site Contacts', nrows=1, header=None)
|
||||
a1 = str(df_a1.iloc[0, 0]) if not df_a1.empty else ''
|
||||
else:
|
||||
a1 = None
|
||||
# soubor je nyní zavřen — přejmenování proběhne bez chyby
|
||||
if a1 is None:
|
||||
log(f" PŘESKOČENO: List 'Site Contacts' nenalezen.")
|
||||
elif 'Title: Site Contacts' in a1:
|
||||
new_name = f"{get_timestamp(file_path)} PANORAMA Site Contacts.xlsx"
|
||||
f.rename(directory / new_name)
|
||||
log(f" ÚSPĚCH: -> '{new_name}'")
|
||||
else:
|
||||
log(f" PŘESKOČENO: A1 neodpovídá vzoru ({a1[:50]})")
|
||||
except Exception as e:
|
||||
log(f" CHYBA: {e}")
|
||||
continue
|
||||
|
||||
# 1. PANORAMA DASHBOARD (XLSX)
|
||||
if 'Panorama Dashboard' in filename:
|
||||
log(f" Detekován Panorama: {filename}")
|
||||
try:
|
||||
df = pd.read_excel(file_path, skiprows=5)
|
||||
missing = set(PANORAMA_COLUMNS) - set(df.columns)
|
||||
if not missing:
|
||||
ids = df['Protocol ID'].dropna().unique()
|
||||
log(f" Protocol ID: {list(ids)}")
|
||||
if len(ids) > 0:
|
||||
study = str(ids[0]).strip()
|
||||
new_name = f"{get_timestamp(file_path)} {study} Panorama Deviations and Issues.xlsx"
|
||||
f.rename(directory / new_name)
|
||||
log(f" ÚSPĚCH: -> '{new_name}'")
|
||||
else:
|
||||
log(f" VAROVÁNÍ: Protocol ID je prázdný.")
|
||||
else:
|
||||
log(f" PŘESKOČENO: Chybí sloupce: {missing}")
|
||||
except Exception as e:
|
||||
log(f" CHYBA: {e}")
|
||||
continue
|
||||
|
||||
# 2. SITE VISIT REPORT A FOLLOW-UP LETTER (XLSX)
|
||||
try:
|
||||
df_a1 = pd.read_excel(file_path, nrows=1, header=None)
|
||||
if not df_a1.empty:
|
||||
a1 = str(df_a1.iloc[0, 0])
|
||||
log(f" A1: {a1[:80]}")
|
||||
is_site_visit = "Title: Site Visit Report Details" in a1
|
||||
is_follow_up = "Title: Follow-Up Letter Details" in a1
|
||||
|
||||
if is_site_visit or is_follow_up:
|
||||
suffix = "Site Visit Details.xlsx" if is_site_visit else "FUL details.xlsx"
|
||||
log(f" Detekován {'Site Visit' if is_site_visit else 'Follow-Up Letter'}: {filename}")
|
||||
df = pd.read_excel(file_path, skiprows=5)
|
||||
if 'Protocol ID' in df.columns:
|
||||
ids = df['Protocol ID'].dropna().unique()
|
||||
log(f" Protocol ID: {list(ids)}")
|
||||
if len(ids) > 0:
|
||||
study = str(ids[0]).strip()
|
||||
new_name = f"{get_timestamp(file_path)} {study} {suffix}"
|
||||
f.rename(directory / new_name)
|
||||
log(f" ÚSPĚCH: -> '{new_name}'")
|
||||
else:
|
||||
log(f" VAROVÁNÍ: Protocol ID je prázdný.")
|
||||
else:
|
||||
log(f" PŘESKOČENO: Chybí sloupec Protocol ID.")
|
||||
else:
|
||||
log(f" Přeskočeno (neznámý xlsx obsah): {filename}")
|
||||
except Exception as e:
|
||||
log(f" CHYBA: {e}")
|
||||
|
||||
log("--- Přejmenování dokončeno ---")
|
||||
|
||||
|
||||
# === HLAVNÍ LOGIKA ===
|
||||
|
||||
log("=== Spuštění ===")
|
||||
log(f"Zdrojový adresář: {SOURCE_DIR} (existuje: {SOURCE_DIR.exists()})")
|
||||
|
||||
# 1. Přejmenuj
|
||||
prejmenuj(SOURCE_DIR)
|
||||
|
||||
# 2. Počkej 10 vteřin
|
||||
log("Čekám 10 vteřin...")
|
||||
time.sleep(10)
|
||||
|
||||
# 3. Odešli soubory
|
||||
files = [f for f in SOURCE_DIR.iterdir() if f.is_file()]
|
||||
log(f"Souborů k odeslání: {len(files)}")
|
||||
for f in files:
|
||||
log(f" Nalezen: {f.name}")
|
||||
|
||||
if not files:
|
||||
log("Žádné soubory k odeslání.")
|
||||
else:
|
||||
for f in files:
|
||||
try:
|
||||
with f.open("rb") as fh:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (f.name, fh, "application/octet-stream")},
|
||||
timeout=120,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
status = resp.json().get('status', '?').upper()
|
||||
log(f" {status:10} | {f.name}")
|
||||
move_to_trash(f)
|
||||
log(f" PŘESUNUTO | {f.name} -> Trash")
|
||||
except Exception as e:
|
||||
log(f" CHYBA | {f.name} | {e}")
|
||||
|
||||
log("=== Hotovo ===")
|
||||
@@ -0,0 +1,59 @@
|
||||
import time
|
||||
import requests
|
||||
from pathlib import Path
|
||||
from watchdog.observers import Observer
|
||||
from watchdog.events import FileSystemEventHandler
|
||||
|
||||
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
|
||||
UPLOAD_URL = "https://msgs.buzalka.cz/upload-dropbox"
|
||||
SOURCE_DIR = Path(r"C:\Users\vbuzalka\OneDrive - JNJ\##JNJPrenos")
|
||||
|
||||
|
||||
def upload_file(f: Path):
|
||||
time.sleep(2)
|
||||
if not f.exists() or not f.is_file():
|
||||
return
|
||||
try:
|
||||
with f.open("rb") as fh:
|
||||
resp = requests.post(
|
||||
UPLOAD_URL,
|
||||
headers={"Authorization": f"Bearer {TOKEN}"},
|
||||
files={"file": (f.name, fh, "application/octet-stream")},
|
||||
timeout=120,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
print(f" UPLOADED | {f.name}")
|
||||
f.unlink()
|
||||
except Exception as e:
|
||||
print(f" CHYBA | {f.name} | {e}")
|
||||
|
||||
|
||||
class NewFileHandler(FileSystemEventHandler):
|
||||
def on_created(self, event):
|
||||
if event.is_directory:
|
||||
return
|
||||
upload_file(Path(event.src_path))
|
||||
|
||||
def on_moved(self, event):
|
||||
if event.is_directory:
|
||||
return
|
||||
upload_file(Path(event.dest_path))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Při startu odešli soubory, které už tam jsou
|
||||
for f in SOURCE_DIR.iterdir():
|
||||
if f.is_file():
|
||||
upload_file(f)
|
||||
|
||||
observer = Observer()
|
||||
observer.schedule(NewFileHandler(), str(SOURCE_DIR), recursive=False)
|
||||
observer.start()
|
||||
print(f"Hlídám: {SOURCE_DIR}")
|
||||
|
||||
try:
|
||||
while True:
|
||||
time.sleep(1)
|
||||
except KeyboardInterrupt:
|
||||
observer.stop()
|
||||
observer.join()
|
||||
@@ -0,0 +1,164 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
mcp_jnjemails.py | v1.0 | 2026-06-01
|
||||
MCP server pro dotazování SQLite DB jnjemails (EmailsImport pipeline).
|
||||
Automaticky načte nejnovější jnjemails_*.db z \\tower\JNJEMAILS\db\.
|
||||
"""
|
||||
import sqlite3
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
from mcp.server.fastmcp import FastMCP
|
||||
|
||||
DB_DIR = Path(r"\\tower\JNJEMAILS\db")
|
||||
|
||||
|
||||
def log(msg: str):
|
||||
print(msg, file=sys.stderr, flush=True)
|
||||
|
||||
|
||||
def get_latest_db() -> Path:
|
||||
files = sorted(DB_DIR.glob("jnjemails_*.db"), key=lambda f: f.name)
|
||||
if not files:
|
||||
raise FileNotFoundError(f"Žádný jnjemails_*.db soubor v {DB_DIR}")
|
||||
return files[-1]
|
||||
|
||||
|
||||
def query(sql: str, params=()) -> list[dict]:
|
||||
db_path = get_latest_db()
|
||||
conn = sqlite3.connect(db_path)
|
||||
conn.row_factory = sqlite3.Row
|
||||
try:
|
||||
cur = conn.execute(sql, params)
|
||||
return [dict(row) for row in cur.fetchall()]
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
mcp = FastMCP("jnjemails")
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def db_info() -> dict:
|
||||
"""Informace o aktuální DB: cesta, velikost, název souboru."""
|
||||
db_path = get_latest_db()
|
||||
size_kb = db_path.stat().st_size // 1024
|
||||
return {
|
||||
"path": str(db_path),
|
||||
"file": db_path.name,
|
||||
"size_kb": size_kb,
|
||||
}
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def last_run() -> list[dict]:
|
||||
"""Kompletní log posledního běhu (runs + log tabulky)."""
|
||||
return query("""
|
||||
SELECT r.script, r.version, r.started_at, r.finished_at,
|
||||
r.transferred, r.skipped, r.errors,
|
||||
l.level, l.event, l.subject, l.folder, l.detail, l.created_at
|
||||
FROM log l
|
||||
JOIN runs r ON r.id = l.run_id
|
||||
WHERE l.run_id = (SELECT MAX(id) FROM runs)
|
||||
ORDER BY l.created_at
|
||||
""")
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def list_runs(limit: int = 20) -> list[dict]:
|
||||
"""Přehled všech běhů skriptů (nejnovější nahoře)."""
|
||||
return query("""
|
||||
SELECT id, script, version, started_at, finished_at,
|
||||
transferred, skipped, errors
|
||||
FROM runs
|
||||
ORDER BY started_at DESC
|
||||
LIMIT ?
|
||||
""", (limit,))
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def errors_last_run() -> list[dict]:
|
||||
"""Chyby z posledního běhu."""
|
||||
return query("""
|
||||
SELECT l.event, l.subject, l.folder, l.detail, l.created_at
|
||||
FROM log l
|
||||
WHERE l.run_id = (SELECT MAX(id) FROM runs)
|
||||
AND l.level = 'ERROR'
|
||||
ORDER BY l.created_at
|
||||
""")
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def run_log(run_id: int) -> list[dict]:
|
||||
"""Log konkrétního běhu podle ID (z list_runs)."""
|
||||
return query("""
|
||||
SELECT level, event, subject, folder, graph_id, detail, created_at
|
||||
FROM log
|
||||
WHERE run_id = ?
|
||||
ORDER BY created_at
|
||||
""", (run_id,))
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def search_messages(subject: str = "", sender: str = "", folder: str = "", limit: int = 50) -> list[dict]:
|
||||
"""Hledání v přenesených emailech. Všechny parametry jsou volitelné (LIKE, case-insensitive)."""
|
||||
conditions = []
|
||||
params = []
|
||||
if subject:
|
||||
conditions.append("subject LIKE ?")
|
||||
params.append(f"%{subject}%")
|
||||
if sender:
|
||||
conditions.append("sender LIKE ?")
|
||||
params.append(f"%{sender}%")
|
||||
if folder:
|
||||
conditions.append("jnj_folder LIKE ?")
|
||||
params.append(f"%{folder}%")
|
||||
where = f"WHERE {' AND '.join(conditions)}" if conditions else ""
|
||||
params.append(limit)
|
||||
return query(f"""
|
||||
SELECT id, message_id, subject, sender, received_at,
|
||||
jnj_folder, graph_id, is_read, source, uploaded_at
|
||||
FROM messages
|
||||
{where}
|
||||
ORDER BY received_at DESC
|
||||
LIMIT ?
|
||||
""", params)
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def messages_stats() -> dict:
|
||||
"""Statistiky: celkový počet emailů, podle source, podle složky (top 10)."""
|
||||
total = query("SELECT COUNT(*) as cnt FROM messages")[0]["cnt"]
|
||||
by_source = query("SELECT source, COUNT(*) as cnt FROM messages GROUP BY source ORDER BY cnt DESC")
|
||||
by_folder = query("""
|
||||
SELECT jnj_folder, COUNT(*) as cnt FROM messages
|
||||
WHERE jnj_folder IS NOT NULL
|
||||
GROUP BY jnj_folder
|
||||
ORDER BY cnt DESC
|
||||
LIMIT 10
|
||||
""")
|
||||
no_graph = query("SELECT COUNT(*) as cnt FROM messages WHERE graph_id IS NULL")[0]["cnt"]
|
||||
return {
|
||||
"total": total,
|
||||
"no_graph_id": no_graph,
|
||||
"by_source": by_source,
|
||||
"top_folders": by_folder,
|
||||
}
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def sql(query_str: str) -> list[dict]:
|
||||
"""Libovolný SELECT dotaz proti DB. Pouze SELECT (ostatní jsou blokovány)."""
|
||||
stripped = query_str.strip().upper()
|
||||
if not stripped.startswith("SELECT"):
|
||||
return [{"error": "Povoleny jsou pouze SELECT dotazy."}]
|
||||
try:
|
||||
return query(query_str)
|
||||
except Exception as e:
|
||||
return [{"error": str(e)}]
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
db_path = get_latest_db()
|
||||
log(f"jnjemails MCP server — DB: {db_path.name}")
|
||||
mcp.run()
|
||||
@@ -0,0 +1,15 @@
|
||||
2026-06-01 14:58:02 | open failed [899C0000969E661B0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000002, and it could not be determined automatically.
|
||||
2026-06-01 14:59:10 | open failed [899C0000987BE65C0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 14:59:39 | open failed [899C00009A2FBD760000.msg]: Attachment method missing on attachment __attach_version1.0_#00000002, and it could not be determined automatically.
|
||||
2026-06-01 14:59:56 | open failed [899C00009B8A9A000000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:00:07 | open failed [899C00009C3844690000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:00:17 | open failed [899C00009C3844720000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:08:39 | open failed [899C0000A4683A4B0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:09:12 | open failed [899C0000A5CC64F40000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:09:32 | open failed [899C0000A5CC64FE0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:18:06 | open failed [899C0000B30157A10000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:18:08 | open failed [899C0000B30157A80000.msg]: Attachments of type data MUST have a data stream.
|
||||
2026-06-01 15:18:50 | open failed [899C0000B588253A0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000003, and it could not be determined automatically.
|
||||
2026-06-01 15:22:48 | open failed [899C0000BE168C140000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:24:07 | open failed [899C0000C0CF55D50000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
2026-06-01 15:24:18 | open failed [899C0000C1CA96890000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
|
||||
@@ -0,0 +1,322 @@
|
||||
# parse_emails_v1.0
|
||||
|
||||
**Název:** parse_emails_v1.0.py
|
||||
**Verze:** 1.0
|
||||
**Datum:** 2026-06-01
|
||||
**Autor:** vladimir.buzalka
|
||||
|
||||
---
|
||||
|
||||
## Účel
|
||||
|
||||
Jednorázový import všech `.msg` souborů do MongoDB. Z každého souboru extrahuje **všechny dostupné vlastnosti** — podobně jako EXIF u fotek.
|
||||
|
||||
- **DB:** `emaily`
|
||||
- **Kolekce:** `vbuzalka@its.jnj.com`
|
||||
- `_id` = Internet Message-ID (nebo `filename:<stem>` jako fallback)
|
||||
- Bezpečné přerušit a opakovat — upsert podle `_id`
|
||||
|
||||
---
|
||||
|
||||
## Zdroje dat
|
||||
|
||||
Výhradně `.msg` soubory — žádná závislost na SQLite ani jiné DB.
|
||||
|
||||
| Z každého .msg se extrahuje |
|
||||
|---|
|
||||
| Předmět, normalized subject |
|
||||
| Odesílatel (email, jméno, SMTP adresa) |
|
||||
| Příjemci To/CC/BCC — strukturovaně `[{type, email, name}]` |
|
||||
| Čas doručení a odeslání (UTC) |
|
||||
| Tělo plaintext + HTML (max 2 MB) |
|
||||
| Přílohy — metadata: jméno, velikost, MIME typ, inline flag |
|
||||
| Internet headers — X-Originating-IP, Received, DKIM, X-Mailer, ... |
|
||||
| MAPI: důležitost, citlivost, příznak, konverzační vlákno, kategorie |
|
||||
| In-Reply-To, References — pro rekonstrukci vlákna |
|
||||
| Všechny raw MAPI properties jako `{0xXXXX: value}` |
|
||||
|
||||
---
|
||||
|
||||
## Konfigurace
|
||||
|
||||
Konstanty přímo v kódu:
|
||||
|
||||
| Konstanta | Hodnota |
|
||||
|---|---|
|
||||
| `MSGS_DIR` | `\\tower\JNJEMAILS` |
|
||||
| `MONGO_URI` | `mongodb://192.168.1.76:27017` |
|
||||
| `MONGO_DB` | `emaily` |
|
||||
| `MONGO_COL` | `vbuzalka@its.jnj.com` |
|
||||
| `BATCH_SIZE` | 200 dokumentů na jeden bulk_write |
|
||||
| `LOG_FILE` | `parse_emails_errors.log` (vedle skriptu) |
|
||||
|
||||
---
|
||||
|
||||
## Spouštění
|
||||
|
||||
**Venv:** `U:\PythonProject\Janssen\.venv\Scripts\python.exe`
|
||||
|
||||
**1. spuštění — kompletní import:**
|
||||
```cmd
|
||||
"U:\PythonProject\Janssen\.venv\Scripts\python.exe" "U:\PythonProject\Janssen\EmailsImport\parse_emails_v1.0.py"
|
||||
```
|
||||
|
||||
**Pokračování po přerušení (druhý den):**
|
||||
```cmd
|
||||
"U:\PythonProject\Janssen\.venv\Scripts\python.exe" "U:\PythonProject\Janssen\EmailsImport\parse_emails_v1.0.py" --skip-existing
|
||||
```
|
||||
|
||||
**Test na malém vzorku:**
|
||||
```cmd
|
||||
"U:\PythonProject\Janssen\.venv\Scripts\python.exe" "U:\PythonProject\Janssen\EmailsImport\parse_emails_v1.0.py" --limit 50 --no-indexes
|
||||
```
|
||||
|
||||
### Všechny parametry
|
||||
|
||||
| Parametr | Popis |
|
||||
|---|---|
|
||||
| `--skip-existing` | Načte seznam hotových souborů z MongoDB a přeskočí je. Použij pro pokračování po přerušení. |
|
||||
| `--limit N` | Zpracuje jen prvních N souborů. Vhodné pro test. |
|
||||
| `--no-indexes` | Nevytváří indexy na konci. Použij pokud je přerušíš uprostřed — indexy vytvoř ručně až je vše hotové. |
|
||||
|
||||
---
|
||||
|
||||
## Průběh na konzoli
|
||||
|
||||
Každý email na jednom řádku:
|
||||
```
|
||||
1/69371 OK RE: Protocol deviation CZ10022 jan.novak@its.jnj.com
|
||||
2/69371 OK UCO3001: Draft FUL pro DD5-CZ10022 monitor@4gclinical.com
|
||||
3/69371 ERR ? ?
|
||||
```
|
||||
|
||||
Každých 500 emailů oddělovač s průběhem:
|
||||
```
|
||||
────────────────────────────────────────────────────────────────────────────────
|
||||
Průběh: ok=498 err=2 0.4 msg/s ETA 47h12m
|
||||
────────────────────────────────────────────────────────────────────────────────
|
||||
```
|
||||
|
||||
Na konci souhrn:
|
||||
```
|
||||
====================================================
|
||||
Vysledek: ok=69300 | skip=0 | err=71
|
||||
Celkovy cas: 47h 23m 10s
|
||||
Dokumentu v kolekci: 69300
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Struktura dokumentu v MongoDB
|
||||
|
||||
```json
|
||||
{
|
||||
"_id": "<message-id@domain>",
|
||||
"filename": "7A3F...0000.msg",
|
||||
|
||||
"subject": "RE: Protocol deviation CZ10022",
|
||||
"normalized_subject": "Protocol deviation CZ10022",
|
||||
"importance": 1,
|
||||
"sensitivity": 0,
|
||||
"flag_status": 0,
|
||||
"read_receipt_requested": false,
|
||||
"delivery_receipt_requested": false,
|
||||
"has_attachments": true,
|
||||
"attachment_count": 1,
|
||||
"message_size_bytes": 284512,
|
||||
|
||||
"conversation_topic": "Protocol deviation CZ10022",
|
||||
"conversation_index": "AcqX...",
|
||||
"in_reply_to": "<prev-id@domain>",
|
||||
"internet_references": ["<ref1@domain>", "<ref2@domain>"],
|
||||
"categories": ["UCO3001"],
|
||||
|
||||
"received_at": "2026-05-15T09:23:11",
|
||||
"sent_at": "2026-05-15T09:21:44",
|
||||
|
||||
"sender": {
|
||||
"email": "jan.novak@its.jnj.com",
|
||||
"name": "Novák Jan",
|
||||
"smtp": "jan.novak@its.jnj.com"
|
||||
},
|
||||
"to": "vladimir.buzalka@its.jnj.com",
|
||||
"cc": "petra.free@its.jnj.com",
|
||||
"bcc": "",
|
||||
"display_to": "Buzalka Vladimir",
|
||||
"display_cc": "Free Petra",
|
||||
"recipients": [
|
||||
{ "type": "to", "email": "vbuzalka@its.jnj.com", "name": "Buzalka Vladimir" },
|
||||
{ "type": "cc", "email": "petra.free@its.jnj.com", "name": "Free Petra" }
|
||||
],
|
||||
|
||||
"body_text": "Dobrý den,\n\nposílám...",
|
||||
"body_html": "<html>...",
|
||||
|
||||
"attachments": [
|
||||
{
|
||||
"filename": "PD_report_CZ10022.pdf",
|
||||
"size_bytes": 284512,
|
||||
"mime_type": "application/pdf",
|
||||
"content_id": null,
|
||||
"is_inline": false
|
||||
}
|
||||
],
|
||||
|
||||
"headers": {
|
||||
"message_id": "<xxx@domain>",
|
||||
"x_originating_ip": "10.24.1.55",
|
||||
"x_mailer": "Microsoft Outlook 16.0",
|
||||
"received": ["from SMTP01...", "from EX2019..."],
|
||||
"x_spam_status": "No"
|
||||
},
|
||||
|
||||
"mapi": {
|
||||
"0x0017": 1,
|
||||
"0x0036": 0,
|
||||
"0x0070": "Protocol deviation CZ10022",
|
||||
"0x1035": "<message-id@domain>",
|
||||
"0x1042": "<prev-id@domain>"
|
||||
},
|
||||
|
||||
"parsed_at": "2026-06-01T20:00:00"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Hodnotové kódy
|
||||
|
||||
| Pole | Hodnota | Význam |
|
||||
|---|---|---|
|
||||
| `importance` | 0 | Nízká |
|
||||
| | 1 | Normální |
|
||||
| | 2 | Vysoká |
|
||||
| `sensitivity` | 0 | Normální |
|
||||
| | 1 | Osobní |
|
||||
| | 2 | Soukromé |
|
||||
| | 3 | Důvěrné |
|
||||
| `flag_status` | 0 | Bez příznaku |
|
||||
| | 1 | Označeno (follow up) |
|
||||
| | 2 | Dokončeno |
|
||||
|
||||
---
|
||||
|
||||
## MongoDB indexy
|
||||
|
||||
Automaticky vytvořeny na konci importu (`--no-indexes` přeskočí):
|
||||
|
||||
| Index | Pole |
|
||||
|---|---|
|
||||
| Chronologický | `received_at`, `sent_at` |
|
||||
| Odesílatel | `sender.email` |
|
||||
| Soubor | `filename` (unique) |
|
||||
| Konverzace | `conversation_topic` |
|
||||
| Filtry | `has_attachments`, `categories`, `importance`, `flag_status` |
|
||||
| Full-text | `subject` + `body_text` + `to` + `cc` (text index `text_search`) |
|
||||
|
||||
---
|
||||
|
||||
## Ukázkové dotazy (MongoDB shell / MCP)
|
||||
|
||||
**Emaily o UCO3001 s přílohou:**
|
||||
```javascript
|
||||
db["vbuzalka@its.jnj.com"].find({
|
||||
$text: { $search: "UCO3001" },
|
||||
has_attachments: true
|
||||
}).sort({ received_at: -1 })
|
||||
```
|
||||
|
||||
**Emaily od konkrétního odesílatele:**
|
||||
```javascript
|
||||
db["vbuzalka@its.jnj.com"].find({
|
||||
"sender.email": /covance/i
|
||||
}).sort({ received_at: -1 })
|
||||
```
|
||||
|
||||
**Celé konverzační vlákno:**
|
||||
```javascript
|
||||
db["vbuzalka@its.jnj.com"].find({
|
||||
conversation_topic: "Protocol deviation CZ10022"
|
||||
}).sort({ received_at: 1 })
|
||||
```
|
||||
|
||||
**Označené emaily (follow up):**
|
||||
```javascript
|
||||
db["vbuzalka@its.jnj.com"].find({ flag_status: 1 })
|
||||
```
|
||||
|
||||
**Vysoká priorita s přílohou:**
|
||||
```javascript
|
||||
db["vbuzalka@its.jnj.com"].find({
|
||||
importance: 2,
|
||||
has_attachments: true
|
||||
}).sort({ received_at: -1 })
|
||||
```
|
||||
|
||||
**Statistiky podle odesílatele (top 20):**
|
||||
```javascript
|
||||
db["vbuzalka@its.jnj.com"].aggregate([
|
||||
{ $group: { _id: "$sender.email", count: { $sum: 1 } } },
|
||||
{ $sort: { count: -1 } },
|
||||
{ $limit: 20 }
|
||||
])
|
||||
```
|
||||
|
||||
**Emaily s PDF přílohou:**
|
||||
```javascript
|
||||
db["vbuzalka@its.jnj.com"].find({
|
||||
"attachments.mime_type": "application/pdf"
|
||||
})
|
||||
```
|
||||
|
||||
**Hledání v těle emailu:**
|
||||
```javascript
|
||||
db["vbuzalka@its.jnj.com"].find({
|
||||
$text: { $search: "inactivation notification" }
|
||||
})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Chybový log
|
||||
|
||||
Soubory které selhaly jsou zalogrovány do `parse_emails_errors.log` vedle skriptu:
|
||||
```
|
||||
2026-06-01 20:14:33 | open failed [7A3F...0000.msg]: <důvod>
|
||||
2026-06-01 20:15:01 | extract_message failed [8B2C...0000.msg]: <důvod>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Výkon
|
||||
|
||||
| Parametr | Hodnota |
|
||||
|---|---|
|
||||
| Počet souborů | ~69 000 |
|
||||
| Rychlost | ~0.4 msg/s (síť SMB + htmlBody dekódování) |
|
||||
| Odhadovaný čas | 48 hodin (přes noc, s pokračováním) |
|
||||
| Batch size | 200 dokumentů / bulk_write |
|
||||
| Odhadovaná velikost DB | 2–5 GB |
|
||||
|
||||
---
|
||||
|
||||
## Závislosti
|
||||
|
||||
```
|
||||
extract-msg==0.55.0
|
||||
pymongo
|
||||
python-dateutil
|
||||
```
|
||||
|
||||
Instalace do venv:
|
||||
```cmd
|
||||
"U:\PythonProject\Janssen\.venv\Scripts\pip.exe" install extract-msg pymongo python-dateutil
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Historie verzí
|
||||
|
||||
| Verze | Datum | Změna |
|
||||
|---|---|---|
|
||||
| 1.0 | 2026-06-01 | Iniciální verze |
|
||||
@@ -0,0 +1,644 @@
|
||||
"""
|
||||
parse_emails_v1.0.py
|
||||
Nazev: parse_emails_v1.0.py
|
||||
Verze: 1.0
|
||||
Datum: 2026-06-01
|
||||
Autor: vladimir.buzalka
|
||||
|
||||
Popis:
|
||||
Parsuje vsechny .msg soubory z MSGS_DIR a importuje je jako dokumenty
|
||||
do MongoDB. Z kazdeho souboru extrahuje VSECHNY dostupne vlastnosti —
|
||||
podobne jako EXIF u fotek:
|
||||
|
||||
- predmet, odesilatel, prijemci (To/CC/BCC s typy)
|
||||
- cas doruceni a odeslani (UTC)
|
||||
- telo plaintext + HTML (max 2 MB)
|
||||
- prilohy (metadata: jmeno, velikost, MIME typ, inline flag)
|
||||
- internet headers (X-Originating-IP, Received, DKIM, ...)
|
||||
- MAPI vlastnosti: dulezitost, citlivost, priznak, konverzacni vlakno,
|
||||
kategorie, In-Reply-To, References, ...
|
||||
- vsechny raw MAPI properties jako {0xXXXX: value}
|
||||
|
||||
DB: emaily
|
||||
Kolekce: vbuzalka@its.jnj.com
|
||||
_id: Internet Message-ID (nebo "filename:<stem>" jako fallback)
|
||||
|
||||
Bezpecne prerusit a opakovat:
|
||||
- upsert podle _id — duplicity se automaticky prepisi
|
||||
- --skip-existing nacte seznam hotovych souboru z MongoDB a
|
||||
preskoci je => pokracovani po preruseni bez ztraty prace
|
||||
|
||||
Spousteni:
|
||||
python parse_emails_v1.0.py # kompletni import
|
||||
python parse_emails_v1.0.py --limit 50 # test na prvnich 50
|
||||
python parse_emails_v1.0.py --skip-existing # pokracovani po preruseni
|
||||
python parse_emails_v1.0.py --no-indexes # bez vytvoreni indexu na konci
|
||||
|
||||
Vystup na konzoli:
|
||||
Kazdy email na jednom radku:
|
||||
<poradi>/<celkem> OK/ERR <predmet 60 znaku> <odesilatel>
|
||||
Kazych 500 emailu: oddelovac s prubehem, rychlosti a ETA.
|
||||
Na konci: souhrn ok/skip/err, celkovy cas, pocet dokumentu v kolekci.
|
||||
|
||||
Zavislosti:
|
||||
pymongo, extract-msg, python-dateutil
|
||||
Python 3.10+, Windows nebo Linux
|
||||
Pristup k \\\\tower\\JNJEMAILS (SMB share)
|
||||
MongoDB na 192.168.1.76:27017
|
||||
|
||||
Struktura dokumentu v MongoDB:
|
||||
_id Internet Message-ID (nebo filename: fallback)
|
||||
filename jmeno .msg souboru (20znakovy hex + .msg)
|
||||
subject predmet zpravy
|
||||
normalized_subject predmet bez RE:/FW: prefixu
|
||||
importance 0=nizka 1=normalni 2=vysoka
|
||||
sensitivity 0=normalni 1=osobni 2=soukrome 3=duverne
|
||||
flag_status 0=bez priznaku 1=oznaceno 2=dokonceno
|
||||
read_receipt_requested bool
|
||||
delivery_receipt_requested bool
|
||||
has_attachments bool
|
||||
attachment_count int
|
||||
message_size_bytes velikost .msg souboru na disku
|
||||
conversation_topic tema vlakna (PR_CONVERSATION_TOPIC)
|
||||
conversation_index base64 PR_CONVERSATION_INDEX
|
||||
in_reply_to Message-ID predchozi zpravy
|
||||
internet_references [Message-ID] — cela historia vlakna
|
||||
categories [str] — MAPI kategorie / stitky
|
||||
read_receipt_requested bool
|
||||
delivery_receipt_requested bool
|
||||
received_at datetime UTC — cas doruceni
|
||||
sent_at datetime UTC — cas odeslani
|
||||
sender.email emailova adresa odesilatele
|
||||
sender.name zobrazovane jmeno odesilatele
|
||||
sender.smtp SMTP adresa (pro interni EX adresy)
|
||||
to retezec To (tak jak v Outlooku)
|
||||
cc retezec CC
|
||||
bcc retezec BCC
|
||||
display_to PR_DISPLAY_TO (zkraceny seznam)
|
||||
display_cc PR_DISPLAY_CC
|
||||
recipients [{type, email, name}] — to/cc/bcc s typy
|
||||
body_text plain text telo
|
||||
body_html HTML telo (max 2 MB, None pokud neni)
|
||||
attachments [{filename, size_bytes, mime_type,
|
||||
content_id, is_inline}]
|
||||
headers dict internet headers (lowercase_s_podtrzitky)
|
||||
mapi dict vsech raw MAPI properties {0xXXXX: value}
|
||||
parsed_at datetime UTC — cas parsovani
|
||||
|
||||
Indexy (vytvoreny automaticky na konci):
|
||||
received_at, sent_at, sender.email, filename (unique),
|
||||
conversation_topic, has_attachments, categories, importance,
|
||||
flag_status, text_search (subject + body_text + to + cc)
|
||||
|
||||
Chyby:
|
||||
Soubory ktere selhaly jsou zalogiovany do parse_emails_errors.log
|
||||
v adresari skriptu. Radek: timestamp | open/extract failed | duvod.
|
||||
|
||||
Historie verzi:
|
||||
1.0 2026-06-01 Inicialni verze
|
||||
"""
|
||||
|
||||
import sys
|
||||
import re
|
||||
import logging
|
||||
import argparse
|
||||
import base64
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timezone
|
||||
from typing import Optional
|
||||
|
||||
import extract_msg
|
||||
from dateutil import parser as dtparser
|
||||
from pymongo import MongoClient, UpdateOne, ASCENDING, TEXT
|
||||
|
||||
if hasattr(sys.stdout, "reconfigure"):
|
||||
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
|
||||
|
||||
# ─── KONFIGURACE ──────────────────────────────────────────────────────────────
|
||||
MSGS_DIR = Path(r"\\tower\JNJEMAILS")
|
||||
MONGO_URI = "mongodb://192.168.1.76:27017"
|
||||
MONGO_DB = "emaily"
|
||||
MONGO_COL = "vbuzalka@its.jnj.com"
|
||||
BATCH_SIZE = 200
|
||||
LOG_FILE = Path(__file__).parent / "parse_emails_errors.log"
|
||||
SCRIPT_VERSION = "1.0"
|
||||
# ──────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
logging.basicConfig(
|
||||
filename=str(LOG_FILE),
|
||||
level=logging.ERROR,
|
||||
format="%(asctime)s | %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
|
||||
# ─── Pomocné funkce ───────────────────────────────────────────────────────────
|
||||
|
||||
def safe(obj, *attrs, default=None):
|
||||
"""Bezpecne cteni atributu — vrati prvni non-None hodnotu."""
|
||||
for attr in attrs:
|
||||
try:
|
||||
val = getattr(obj, attr, None)
|
||||
if val is None:
|
||||
continue
|
||||
if isinstance(val, str) and not val.strip():
|
||||
continue
|
||||
return val
|
||||
except Exception:
|
||||
continue
|
||||
return default
|
||||
|
||||
|
||||
def parse_date(raw) -> Optional[datetime]:
|
||||
"""Libovolny datum -> UTC datetime bez tzinfo (pro MongoDB)."""
|
||||
if raw is None:
|
||||
return None
|
||||
if isinstance(raw, datetime):
|
||||
if raw.tzinfo:
|
||||
return raw.astimezone(timezone.utc).replace(tzinfo=None)
|
||||
return raw
|
||||
try:
|
||||
dt = dtparser.parse(str(raw))
|
||||
if dt.tzinfo:
|
||||
return dt.astimezone(timezone.utc).replace(tzinfo=None)
|
||||
return dt
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def to_bson(val):
|
||||
"""Konvertuje hodnotu na BSON-serializovatelny typ."""
|
||||
if isinstance(val, bytes):
|
||||
return val.hex() if len(val) <= 128 else f"<bytes:{len(val)}>"
|
||||
if isinstance(val, datetime):
|
||||
return parse_date(val)
|
||||
if isinstance(val, (str, int, float, bool, type(None))):
|
||||
return val
|
||||
if isinstance(val, list):
|
||||
return [to_bson(v) for v in val]
|
||||
try:
|
||||
return int(val)
|
||||
except Exception:
|
||||
pass
|
||||
return str(val)
|
||||
|
||||
|
||||
# ─── Extrakce částí zprávy ────────────────────────────────────────────────────
|
||||
|
||||
def extract_headers(msg) -> dict:
|
||||
headers = {}
|
||||
try:
|
||||
hdr = msg.header
|
||||
if not hdr:
|
||||
return {}
|
||||
from email.header import decode_header as _dh
|
||||
|
||||
def _decode(v: str) -> str:
|
||||
try:
|
||||
parts = _dh(v)
|
||||
out = ""
|
||||
for part, enc in parts:
|
||||
out += part.decode(enc or "utf-8", errors="replace") if isinstance(part, bytes) else part
|
||||
return out
|
||||
except Exception:
|
||||
return v
|
||||
|
||||
for key in set(hdr.keys()):
|
||||
k = key.lower().replace("-", "_")
|
||||
vals = [_decode(v) for v in hdr.get_all(key, [])]
|
||||
headers[k] = vals if len(vals) > 1 else (vals[0] if vals else "")
|
||||
except Exception as e:
|
||||
logging.error("extract_headers: %s", e)
|
||||
return headers
|
||||
|
||||
|
||||
def extract_recipients(msg) -> list:
|
||||
result = []
|
||||
type_map = {1: "to", 2: "cc", 3: "bcc"}
|
||||
try:
|
||||
for r in msg.recipients:
|
||||
rtype = getattr(r, "type", 1)
|
||||
try:
|
||||
rtype = int(rtype)
|
||||
except Exception:
|
||||
try:
|
||||
rtype = int(rtype.value)
|
||||
except Exception:
|
||||
rtype = 1
|
||||
rec = {
|
||||
"type": type_map.get(rtype, "to"),
|
||||
"email": safe(r, "email", default=""),
|
||||
"name": safe(r, "name", default=""),
|
||||
}
|
||||
result.append(rec)
|
||||
except Exception as e:
|
||||
logging.error("extract_recipients: %s", e)
|
||||
return result
|
||||
|
||||
|
||||
def extract_attachments(msg) -> list:
|
||||
result = []
|
||||
try:
|
||||
for att in msg.attachments:
|
||||
fname = safe(att, "longFilename", "shortFilename", default="")
|
||||
if not fname:
|
||||
continue
|
||||
size = 0
|
||||
try:
|
||||
d = att.data
|
||||
size = len(d) if d else 0
|
||||
except Exception:
|
||||
pass
|
||||
result.append({
|
||||
"filename": fname,
|
||||
"size_bytes": size,
|
||||
"mime_type": safe(att, "mimetype", "mimeType", default="application/octet-stream"),
|
||||
"content_id": safe(att, "cid", default=None),
|
||||
"is_inline": bool(safe(att, "isInline", default=False)),
|
||||
})
|
||||
except Exception as e:
|
||||
logging.error("extract_attachments: %s", e)
|
||||
return result
|
||||
|
||||
|
||||
def extract_mapi_props(msg) -> dict:
|
||||
"""Vsechny raw MAPI properties jako {0xXXXX: value}."""
|
||||
result = {}
|
||||
try:
|
||||
props = msg.props
|
||||
if not hasattr(props, "items"):
|
||||
return {}
|
||||
for key, prop in props.items():
|
||||
try:
|
||||
val = to_bson(prop.value)
|
||||
prop_id = f"0x{key[:4].upper()}" if len(key) >= 4 else f"0x{key.upper()}"
|
||||
result[prop_id] = val
|
||||
except Exception:
|
||||
pass
|
||||
except Exception as e:
|
||||
logging.error("extract_mapi_props: %s", e)
|
||||
return result
|
||||
|
||||
|
||||
# ─── Hlavní extrakce ─────────────────────────────────────────────────────────
|
||||
|
||||
def extract_message(msg_path: Path) -> Optional[dict]:
|
||||
"""Parsuje jeden .msg soubor -> MongoDB dokument."""
|
||||
try:
|
||||
msg = extract_msg.Message(str(msg_path))
|
||||
except Exception as e:
|
||||
logging.error("open failed [%s]: %s", msg_path.name, e)
|
||||
return None
|
||||
|
||||
try:
|
||||
# ── Message-ID ────────────────────────────────────────────────
|
||||
mid = None
|
||||
for attr in ("messageId", "message_id", "internetMessageId"):
|
||||
mid = safe(msg, attr)
|
||||
if mid:
|
||||
break
|
||||
if not mid:
|
||||
mid = f"filename:{msg_path.stem}"
|
||||
mid = str(mid).strip()
|
||||
|
||||
# ── Předmět ───────────────────────────────────────────────────
|
||||
try:
|
||||
subject = msg.subject or ""
|
||||
except Exception:
|
||||
subject = ""
|
||||
|
||||
normalized_subject = safe(msg, "normalizedSubject", "normalized_subject", default="")
|
||||
|
||||
# ── Tělo ──────────────────────────────────────────────────────
|
||||
try:
|
||||
body_text = msg.body or ""
|
||||
except Exception:
|
||||
body_text = ""
|
||||
|
||||
body_html = None
|
||||
try:
|
||||
bh = msg.htmlBody
|
||||
if isinstance(bh, bytes):
|
||||
bh = bh.decode("utf-8", errors="replace")
|
||||
if bh:
|
||||
body_html = bh if len(bh) <= 2 * 1024 * 1024 else bh[:2 * 1024 * 1024]
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# ── Odesílatel ────────────────────────────────────────────────
|
||||
try:
|
||||
sender_email = msg.sender or ""
|
||||
except Exception:
|
||||
sender_email = ""
|
||||
|
||||
sender_name = safe(msg, "senderName", "sender_name", default="")
|
||||
sender_smtp = safe(msg, "senderSmtpAddress", "sent_representing_smtp_address", default="")
|
||||
|
||||
# ── Příjemci ──────────────────────────────────────────────────
|
||||
recipients = extract_recipients(msg)
|
||||
|
||||
try:
|
||||
to_raw = msg.to or ""
|
||||
except Exception:
|
||||
to_raw = ""
|
||||
try:
|
||||
cc_raw = msg.cc or ""
|
||||
except Exception:
|
||||
cc_raw = ""
|
||||
try:
|
||||
bcc_raw = getattr(msg, "bcc", None) or ""
|
||||
except Exception:
|
||||
bcc_raw = ""
|
||||
|
||||
display_to = safe(msg, "displayTo", "display_to", default="")
|
||||
display_cc = safe(msg, "displayCc", "display_cc", default="")
|
||||
|
||||
# ── Časy ──────────────────────────────────────────────────────
|
||||
try:
|
||||
received_at = parse_date(msg.date)
|
||||
except Exception:
|
||||
received_at = None
|
||||
|
||||
sent_at = None
|
||||
for attr in ("clientSubmitTime", "client_submit_time", "sentOn"):
|
||||
v = safe(msg, attr)
|
||||
if v:
|
||||
sent_at = parse_date(v)
|
||||
break
|
||||
|
||||
# ── MAPI vlastnosti ───────────────────────────────────────────
|
||||
importance = 1
|
||||
try:
|
||||
v = msg.importance
|
||||
if v is not None:
|
||||
importance = int(v)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
sensitivity = 0
|
||||
try:
|
||||
v = getattr(msg, "sensitivity", None)
|
||||
if v is not None:
|
||||
sensitivity = int(v)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
flag_status = 0
|
||||
try:
|
||||
v = safe(msg, "flagStatus", "flag_status")
|
||||
if v is not None:
|
||||
flag_status = int(v)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
conversation_topic = safe(msg, "conversationTopic", "conversation_topic", default="")
|
||||
|
||||
conversation_index = ""
|
||||
try:
|
||||
ci = safe(msg, "conversationIndex", "conversation_index")
|
||||
if isinstance(ci, bytes):
|
||||
conversation_index = base64.b64encode(ci).decode()
|
||||
elif ci:
|
||||
conversation_index = str(ci)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
in_reply_to = safe(msg, "inReplyTo", "in_reply_to", default="")
|
||||
|
||||
internet_refs = []
|
||||
try:
|
||||
refs = safe(msg, "internetReferences", "internet_references")
|
||||
if isinstance(refs, list):
|
||||
internet_refs = refs
|
||||
elif isinstance(refs, str) and refs:
|
||||
internet_refs = [r.strip() for r in refs.split() if r.strip()]
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
categories = []
|
||||
try:
|
||||
cats = safe(msg, "categories")
|
||||
if isinstance(cats, list):
|
||||
categories = [str(c) for c in cats if c]
|
||||
elif isinstance(cats, str) and cats:
|
||||
categories = [c.strip() for c in re.split(r"[;,]", cats) if c.strip()]
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
read_receipt = bool(safe(msg, "readReceiptRequested", "read_receipt_requested", default=False))
|
||||
delivery_receipt = bool(safe(msg, "deliveryReceiptRequested", "delivery_receipt_requested", default=False))
|
||||
|
||||
# ── Internet headers ──────────────────────────────────────────
|
||||
headers = extract_headers(msg)
|
||||
|
||||
if not in_reply_to:
|
||||
in_reply_to = headers.get("in_reply_to", "")
|
||||
if not internet_refs:
|
||||
refs_str = headers.get("references", "")
|
||||
if isinstance(refs_str, str) and refs_str:
|
||||
internet_refs = [r.strip() for r in refs_str.split() if r.strip()]
|
||||
|
||||
# ── Přílohy ───────────────────────────────────────────────────
|
||||
attachments = extract_attachments(msg)
|
||||
|
||||
# ── Raw MAPI ──────────────────────────────────────────────────
|
||||
mapi_raw = extract_mapi_props(msg)
|
||||
|
||||
msg.close()
|
||||
|
||||
# ── Dokument ──────────────────────────────────────────────────
|
||||
return {
|
||||
"_id": mid,
|
||||
"filename": msg_path.name,
|
||||
|
||||
"subject": subject,
|
||||
"normalized_subject": normalized_subject,
|
||||
"importance": importance,
|
||||
"sensitivity": sensitivity,
|
||||
"flag_status": flag_status,
|
||||
"read_receipt_requested": read_receipt,
|
||||
"delivery_receipt_requested": delivery_receipt,
|
||||
"has_attachments": len(attachments) > 0,
|
||||
"attachment_count": len(attachments),
|
||||
"message_size_bytes": msg_path.stat().st_size,
|
||||
|
||||
"conversation_topic": conversation_topic,
|
||||
"conversation_index": conversation_index,
|
||||
"in_reply_to": in_reply_to,
|
||||
"internet_references": internet_refs,
|
||||
"categories": categories,
|
||||
|
||||
"received_at": received_at,
|
||||
"sent_at": sent_at,
|
||||
|
||||
"sender": {
|
||||
"email": sender_email,
|
||||
"name": sender_name,
|
||||
"smtp": sender_smtp,
|
||||
},
|
||||
"to": to_raw,
|
||||
"cc": cc_raw,
|
||||
"bcc": bcc_raw,
|
||||
"display_to": display_to,
|
||||
"display_cc": display_cc,
|
||||
"recipients": recipients,
|
||||
|
||||
"body_text": body_text,
|
||||
"body_html": body_html,
|
||||
|
||||
"attachments": attachments,
|
||||
"headers": headers,
|
||||
"mapi": mapi_raw,
|
||||
|
||||
"parsed_at": datetime.now(timezone.utc).replace(tzinfo=None),
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
logging.error("extract_message failed [%s]: %s", msg_path.name, e)
|
||||
return None
|
||||
|
||||
|
||||
# ─── MongoDB indexy ───────────────────────────────────────────────────────────
|
||||
|
||||
def create_indexes(col):
|
||||
print(" Vytvarim indexy...")
|
||||
col.create_index([("received_at", ASCENDING)])
|
||||
col.create_index([("sent_at", ASCENDING)])
|
||||
col.create_index([("sender.email", ASCENDING)])
|
||||
col.create_index([("filename", ASCENDING)], unique=True, sparse=True)
|
||||
col.create_index([("conversation_topic", ASCENDING)])
|
||||
col.create_index([("has_attachments", ASCENDING)])
|
||||
col.create_index([("categories", ASCENDING)])
|
||||
col.create_index([("importance", ASCENDING)])
|
||||
col.create_index([("flag_status", ASCENDING)])
|
||||
col.create_index([
|
||||
("subject", TEXT),
|
||||
("body_text", TEXT),
|
||||
("to", TEXT),
|
||||
("cc", TEXT),
|
||||
], name="text_search", default_language="none")
|
||||
print(" Indexy hotovy.")
|
||||
|
||||
|
||||
# ─── MAIN ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
def main():
|
||||
ap = argparse.ArgumentParser(description=f"parse_emails v{SCRIPT_VERSION}")
|
||||
ap.add_argument("--msgs-dir", default=str(MSGS_DIR),
|
||||
help="Cesta k .msg souborum")
|
||||
ap.add_argument("--limit", type=int, default=0,
|
||||
help="Zpracovat max N souboru (0 = vse)")
|
||||
ap.add_argument("--skip-existing", action="store_true",
|
||||
help="Preskocit soubory ktere jiz jsou v MongoDB (pokracovani)")
|
||||
ap.add_argument("--no-indexes", action="store_true",
|
||||
help="Nevytvorit indexy na konci")
|
||||
args = ap.parse_args()
|
||||
|
||||
msgs_dir = Path(args.msgs_dir)
|
||||
start = datetime.now()
|
||||
|
||||
print(f"=== parse_emails v{SCRIPT_VERSION} ===")
|
||||
print(f"Start: {start.strftime('%Y-%m-%d %H:%M:%S')}")
|
||||
print(f"Zdroj: {msgs_dir}")
|
||||
print(f"MongoDB: {MONGO_URI} -> {MONGO_DB}.{MONGO_COL}")
|
||||
|
||||
# MongoDB
|
||||
client = MongoClient(MONGO_URI, serverSelectionTimeoutMS=5000)
|
||||
try:
|
||||
client.admin.command("ping")
|
||||
print(" MongoDB OK")
|
||||
except Exception as e:
|
||||
print(f" CHYBA: MongoDB neni dostupna -- {e}")
|
||||
sys.exit(1)
|
||||
|
||||
col = client[MONGO_DB][MONGO_COL]
|
||||
|
||||
# Skip existing — nacti seznam uz importovanych souboru
|
||||
existing: set = set()
|
||||
if args.skip_existing:
|
||||
print(" Nacitam existujici zaznamy z MongoDB...")
|
||||
existing = set(col.distinct("filename"))
|
||||
print(f" {len(existing)} jiz importovano")
|
||||
|
||||
# Scan
|
||||
print(f"\nSkenuji {msgs_dir} ...")
|
||||
all_files = sorted(msgs_dir.glob("*.msg"))
|
||||
if args.limit:
|
||||
all_files = all_files[:args.limit]
|
||||
|
||||
to_process = [f for f in all_files if f.name not in existing]
|
||||
skipped = len(all_files) - len(to_process)
|
||||
total = len(to_process)
|
||||
|
||||
print(f" Celkem .msg: {len(all_files)}")
|
||||
print(f" Preskoceno: {skipped}")
|
||||
print(f" Ke zpracovani: {total}\n")
|
||||
|
||||
if total == 0:
|
||||
print("Neni co importovat.")
|
||||
client.close()
|
||||
return
|
||||
|
||||
batch = []
|
||||
ok_count = 0
|
||||
err_count = 0
|
||||
|
||||
def flush():
|
||||
if not batch:
|
||||
return
|
||||
try:
|
||||
col.bulk_write(batch, ordered=False)
|
||||
except Exception as e:
|
||||
logging.error("bulk_write: %s", e)
|
||||
print(f" CHYBA bulk_write: {e}")
|
||||
batch.clear()
|
||||
|
||||
for i, msg_path in enumerate(to_process, 1):
|
||||
doc = extract_message(msg_path)
|
||||
|
||||
if doc is None:
|
||||
err_count += 1
|
||||
else:
|
||||
batch.append(UpdateOne({"_id": doc["_id"]}, {"$set": doc}, upsert=True))
|
||||
ok_count += 1
|
||||
|
||||
if len(batch) >= BATCH_SIZE:
|
||||
flush()
|
||||
|
||||
# Výpis každého emailu
|
||||
status = "ERR " if doc is None else "OK "
|
||||
subject_str = (doc.get("subject") or "")[:60] if doc else "?"
|
||||
sender_str = (doc.get("sender", {}).get("email") or "")[:40] if doc else "?"
|
||||
print(f" {i:>6}/{total} {status} {subject_str:<60} {sender_str}")
|
||||
|
||||
if i % 500 == 0:
|
||||
elapsed = (datetime.now() - start).total_seconds()
|
||||
rate = i / elapsed if elapsed > 0 else 0
|
||||
eta_s = int((total - i) / rate) if rate > 0 else 0
|
||||
print(f" {'─'*80}")
|
||||
print(f" Průběh: ok={ok_count} err={err_count} "
|
||||
f"{rate:.1f} msg/s ETA {eta_s//3600}h{(eta_s%3600)//60}m")
|
||||
print(f" {'─'*80}")
|
||||
|
||||
flush()
|
||||
|
||||
elapsed_total = (datetime.now() - start).total_seconds()
|
||||
print(f"\n{'='*52}")
|
||||
print(f"Vysledek: ok={ok_count} | skip={skipped} | err={err_count}")
|
||||
print(f"Celkovy cas: {int(elapsed_total//3600)}h {int((elapsed_total%3600)//60)}m {int(elapsed_total%60)}s")
|
||||
print(f"Dokumentu v kolekci: {col.count_documents({})}")
|
||||
|
||||
if not args.no_indexes:
|
||||
print()
|
||||
create_indexes(col)
|
||||
|
||||
print(f"\nKonec: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
|
||||
if err_count:
|
||||
print(f"Chyby logovany do: {LOG_FILE}")
|
||||
|
||||
client.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,93 @@
|
||||
import sqlite3
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
DB_DIR = r"\\tower\JNJEMAILS\db"
|
||||
MSG_DIR = r"\\tower\JNJEMAILS"
|
||||
|
||||
|
||||
def find_latest_db(db_dir):
|
||||
dbs = sorted(Path(db_dir).glob("*.db"), key=lambda p: p.stat().st_mtime)
|
||||
if not dbs:
|
||||
raise FileNotFoundError(f"Žádná .db databáze v {db_dir}")
|
||||
return dbs[-1]
|
||||
|
||||
|
||||
def count_msg_files(msg_dir):
|
||||
return sum(1 for f in Path(msg_dir).iterdir() if f.suffix.lower() == ".msg")
|
||||
|
||||
|
||||
def stats(db_path):
|
||||
con = sqlite3.connect(db_path)
|
||||
cur = con.cursor()
|
||||
|
||||
total = cur.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
|
||||
|
||||
date_range = cur.execute(
|
||||
"SELECT MIN(received_at), MAX(received_at) FROM messages WHERE received_at IS NOT NULL"
|
||||
).fetchone()
|
||||
|
||||
top_senders = cur.execute(
|
||||
"SELECT sender, COUNT(*) AS n FROM messages GROUP BY sender ORDER BY n DESC LIMIT 10"
|
||||
).fetchall()
|
||||
|
||||
by_folder = cur.execute(
|
||||
"SELECT folder, COUNT(*) AS n FROM messages GROUP BY folder ORDER BY n DESC"
|
||||
).fetchall()
|
||||
|
||||
by_source = cur.execute(
|
||||
"SELECT source, COUNT(*) AS n FROM messages GROUP BY source ORDER BY n DESC"
|
||||
).fetchall()
|
||||
|
||||
by_month = cur.execute(
|
||||
"""SELECT SUBSTR(received_at, 1, 7) AS month, COUNT(*) AS n
|
||||
FROM messages WHERE received_at IS NOT NULL
|
||||
GROUP BY month ORDER BY month"""
|
||||
).fetchall()
|
||||
|
||||
con.close()
|
||||
return total, date_range, top_senders, by_folder, by_source, by_month
|
||||
|
||||
|
||||
def main():
|
||||
db_path = find_latest_db(DB_DIR)
|
||||
print(f"Databáze: {db_path.name}")
|
||||
print(f"Velikost: {db_path.stat().st_size / 1024 / 1024:.1f} MB")
|
||||
|
||||
msg_count = count_msg_files(MSG_DIR)
|
||||
print(f".msg souborů ve složce: {msg_count:,}")
|
||||
|
||||
total, date_range, top_senders, by_folder, by_source, by_month = stats(db_path)
|
||||
|
||||
print(f"\n{'-'*50}")
|
||||
print(f" Emailu v databazi: {total:,}")
|
||||
if date_range[0]:
|
||||
print(f" Nejstarsi: {date_range[0]}")
|
||||
print(f" Nejnovejsi: {date_range[1]}")
|
||||
|
||||
if by_folder:
|
||||
print(f"\n Slozky:")
|
||||
for folder, n in by_folder:
|
||||
print(f" {folder or '(bez slozky)':<35} {n:>6,}")
|
||||
|
||||
if by_source:
|
||||
print(f"\n Zdroje (source):")
|
||||
for src, n in by_source:
|
||||
print(f" {src or '(prazdny)':<35} {n:>6,}")
|
||||
|
||||
if by_month:
|
||||
print(f"\n Emaily po mesicich:")
|
||||
for month, n in by_month:
|
||||
bar = "#" * min(n // 20, 40)
|
||||
print(f" {month} {bar:<40} {n:>5,}")
|
||||
|
||||
if top_senders:
|
||||
print(f"\n Top 10 odesilatelU:")
|
||||
for sender, n in top_senders:
|
||||
print(f" {(sender or '(neznamy)')[:50]:<52} {n:>5,}")
|
||||
|
||||
print(f"{'-'*50}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
Reference in New Issue
Block a user