Initial commit — clean history (removed large test files, browser profiles, Medidata/Clario downloads)

This commit is contained in:
2026-06-01 15:36:31 +02:00
commit bb604e593e
1304 changed files with 116480 additions and 0 deletions
+3
View File
@@ -0,0 +1,3 @@
DROPBOX_APP_KEY=4scysbfek6ddwwm
DROPBOX_APP_SECRET=gn9ph1q3oro2nq0
DROPBOX_APP_REFRESH_TOKEN=VShbST3VjUgAAAAAAAAAAXeZZzFLns6eE80-VJKIc5oq61PyXW6sCx9Dw5kM1w8c
+45
View File
@@ -0,0 +1,45 @@
# msgreceiver — build & deploy na Unraid
## Umístění na Unraidu
- Appdata: `/mnt/user/appdata/msgreceiver/` (síťově `\\tower\appdata\msgreceiver\`)
- Emaily: `/mnt/user/JNJEMAILS` (mount jako `/msgs` v kontejneru)
## Kopírování souborů z Windows
Všechny soubory z `U:\janssen\EmailsImport\DockerCustomApp\` nakopírovat do `\\tower\appdata\msgreceiver\`.
**DŮLEŽITÉ:** Po každé změně `app.py` je nutný rebuild a restart kontejneru (viz níže). Bez toho běží stará verze.
## Build & restart (SSH)
```bash
# Připojení: ssh root@192.168.1.76, heslo: 7309208104
# Nebo přes paramiko v Pythonu (viz EmailsImport skripty)
cd /mnt/user/appdata/msgreceiver
docker build -t msgreceiver .
docker stop msgreceiver
docker rm msgreceiver
docker run -d --name msgreceiver \
-p 8765:8765 \
-v /mnt/user/JNJEMAILS:/msgs \
--restart unless-stopped \
msgreceiver
```
## Kontejner
- Port: 8765
- Restart policy: unless-stopped
- Endpointy:
- `/upload` (msg + volitelný `folder` → uloží na disk + import do Graph API)
- `/upload-db` (db → /msgs/db, maže staré)
- `/upload-dropbox` (soubory do Dropboxu)
- Auth: Bearer token v app.py
- Dropbox credentials: v `.env` uvnitř image
- Graph API credentials: přímo v app.py (Mail.ReadWrite + Mail.Send, tenant TrialHelp s.r.o.)
## Graph import
Při uploadu .msg s parametrem `folder` (plná cesta z JNJ Outlooku) server:
1. Uloží .msg na disk
2. Parsuje .msg a importuje do schránky `vladimir.buzalka@buzalka.cz` do `Inbox/JNJ/...`
3. Složky se vytvářejí automaticky, mapování: `/vbuzalka@its.jnj.com/X``JNJ/X`, `/Online Archive.../X``JNJ/Online Archive/X`
Klient v1.4 (`janssenpc_email_send_new_v1.4.py`) posílá `folder` automaticky.
+7
View File
@@ -0,0 +1,7 @@
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
COPY .env .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8765"]
+63
View File
@@ -0,0 +1,63 @@
# msgreceiver — deployment instrukce
## Soubory
- Zdrojový skript: `U:\PythonProject\Janssen\EmailsImport\DockerCustomApp\app.py`
- Network share: `\\tower\appdata\msgreceiver\app.py`
- Unraid cesta: `/mnt/user/appdata/msgreceiver/`
## Přihlašovací údaje
- **Unraid SSH:** `root@192.168.1.76`, heslo: `7309208104`
- **Docker kontejner:** `msgreceiver`
## Postup při nové verzi app.py
### 1. Zkopírovat app.py na server
```powershell
Copy-Item "U:\PythonProject\Janssen\EmailsImport\DockerCustomApp\app.py" "\\tower\appdata\msgreceiver\app.py" -Force
```
### 2. Připojit se přes SSH a přebuildovat Docker (přes Python paramiko)
```python
import paramiko
c = paramiko.SSHClient()
c.set_missing_host_key_policy(paramiko.AutoAddPolicy())
c.connect('192.168.1.76', username='root', password='7309208104')
# Build
_, stdout, stderr = c.exec_command('docker build -t msgreceiver /mnt/user/appdata/msgreceiver/ 2>&1')
print(stdout.read().decode())
# Restart
_, stdout, stderr = c.exec_command('docker restart msgreceiver')
print(stdout.read().decode())
c.close()
```
> Poznámka: `sshpass` není na tomto Windows stroji k dispozici, Windows OpenSSH neumí neinteraktivní heslo — proto vždy použij **paramiko**.
## Struktura adresáře na serveru
```
/mnt/user/appdata/msgreceiver/
├── Dockerfile
├── app.py
├── requirements.txt
└── .env ← Dropbox credentials
```
## Dropbox konfigurace (.env)
Proměnné načítané z `.env`:
- `DROPBOX_APP_KEY`
- `DROPBOX_APP_SECRET`
- `DROPBOX_APP_REFRESH_TOKEN`
Upload cesta v Dropboxu: `/!!!Days/Downloads Z230/{filename}`
## API endpointy
Bearer token: `13e1bb01-9fd5-44a8-8ce9-4ee27133d340`
| Endpoint | Přijímá | Chování |
|---|---|---|
| `POST /upload` | `.msg` | Uloží do `/msgs`, přeskočí pokud existuje |
| `POST /upload-db` | `.db` | Smaže všechny staré `.db` v `/msgs/db`, uloží novou |
| `POST /upload-dropbox` | cokoliv | Nahraje do Dropboxu (overwrite) |
+412
View File
@@ -0,0 +1,412 @@
# app.py | v1.6 | 2026-06-01
# FastAPI server pro příjem .msg a .db souborů, upload do Dropboxu a import do Graph API.
# Endpointy: /upload (.msg → /msgs + Graph import), /upload-db (.db → /msgs/db),
# /upload-dropbox (→ Dropbox /!!!Days/Downloads Z230),
# /message-delete, /message-update (sync: smazání, přečtení, přesun složky).
from fastapi import FastAPI, UploadFile, File, Form, Header, HTTPException
from pydantic import BaseModel
import shutil
import base64
import hashlib
import logging
from pathlib import Path
from typing import Optional
import os
import dropbox
import msal
import requests as http_requests
import extract_msg
from dateutil import parser as dtparser
from datetime import timezone
from dotenv import load_dotenv
from cryptography.fernet import Fernet
load_dotenv(Path(__file__).parent / ".env")
app = FastAPI()
log = logging.getLogger("msgreceiver")
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
# Šifrovací klíč odvozený z TOKENu (Fernet = AES-128 CBC + HMAC)
_FERNET = Fernet(base64.urlsafe_b64encode(hashlib.sha256(TOKEN.encode()).digest()))
SAVE_DIR = Path("/msgs")
DB_DIR = Path("/msgs/db")
SAVE_DIR.mkdir(parents=True, exist_ok=True)
DB_DIR.mkdir(parents=True, exist_ok=True)
DROPBOX_APP_KEY = os.getenv("DROPBOX_APP_KEY", "")
DROPBOX_APP_SECRET = os.getenv("DROPBOX_APP_SECRET", "")
DROPBOX_REFRESH_TOKEN = os.getenv("DROPBOX_APP_REFRESH_TOKEN", "")
# --- Graph API config ---
GRAPH_TENANT_ID = "7d269944-37a4-43a1-8140-c7517dc426e9"
GRAPH_CLIENT_ID = "4b222bfd-78c9-4239-a53f-43006b3ed07f"
GRAPH_CLIENT_SECRET = "Txg8Q~MjhocuopxsJyJBhPmDfMxZ2r5WpTFj1dfk"
GRAPH_MAILBOX = "vladimir.buzalka@buzalka.cz"
GRAPH_ROOT_FOLDER = "JNJ" # subfolder under Inbox — root for imported emails
GRAPH_URL = "https://graph.microsoft.com/v1.0"
# Cache: folder path → Graph folder ID
_folder_id_cache: dict[str, str] = {}
_graph_token: Optional[str] = None
def _get_graph_token() -> str:
global _graph_token
msalapp = msal.ConfidentialClientApplication(
GRAPH_CLIENT_ID,
authority=f"https://login.microsoftonline.com/{GRAPH_TENANT_ID}",
client_credential=GRAPH_CLIENT_SECRET,
)
result = msalapp.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
if "access_token" not in result:
raise RuntimeError(f"Graph auth failed: {result}")
_graph_token = result["access_token"]
return _graph_token
def _graph_headers() -> dict:
token = _graph_token or _get_graph_token()
return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
def _ensure_folder(path_parts: list[str]) -> str:
"""Ensure folder hierarchy exists under Inbox, return leaf folder ID."""
cache_key = "/".join(path_parts)
if cache_key in _folder_id_cache:
return _folder_id_cache[cache_key]
headers = _graph_headers()
parent_id = "Inbox"
for i, part in enumerate(path_parts):
partial_key = "/".join(path_parts[: i + 1])
if partial_key in _folder_id_cache:
parent_id = _folder_id_cache[partial_key]
continue
# List children of parent
if parent_id == "Inbox":
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/mailFolders/Inbox/childFolders"
else:
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/mailFolders/{parent_id}/childFolders"
r = http_requests.get(url, headers=headers, timeout=15)
if r.status_code == 401:
_get_graph_token()
headers = _graph_headers()
r = http_requests.get(url, headers=headers, timeout=15)
found = None
for f in r.json().get("value", []):
if f["displayName"].lower() == part.lower():
found = f["id"]
break
if not found:
# Create folder
cr = http_requests.post(url, headers=headers, json={"displayName": part}, timeout=15)
if cr.status_code in (200, 201):
found = cr.json()["id"]
elif cr.status_code == 409:
# Already exists (race condition) — re-fetch
r2 = http_requests.get(url, headers=headers, timeout=15)
for f in r2.json().get("value", []):
if f["displayName"].lower() == part.lower():
found = f["id"]
break
if not found:
raise RuntimeError(f"Cannot create folder '{part}': {cr.text}")
_folder_id_cache[partial_key] = found
parent_id = found
return parent_id
def _map_jnj_folder(folder: str) -> list[str]:
"""Map JNJ folder path to Graph folder parts under JNJ root.
'/vbuzalka@its.jnj.com/Inbox/TMP' → ['JNJ', 'Inbox', 'TMP']
'/Online Archive - vbuzalka@its.jnj.com/Inbox' → ['JNJ', 'Online Archive', 'Inbox']
"""
parts = [p for p in folder.split("/") if p]
if not parts:
return [GRAPH_ROOT_FOLDER]
# First part is mailbox name — strip it but detect Online Archive
mailbox = parts[0]
rest = parts[1:]
prefix = [GRAPH_ROOT_FOLDER]
if "online archive" in mailbox.lower():
prefix.append("Online Archive")
return prefix + rest if rest else prefix
def _make_recipient(addr: str) -> dict:
if "<" in addr and ">" in addr:
name = addr[: addr.index("<")].strip().strip('"')
email = addr[addr.index("<") + 1 : addr.index(">")].strip()
else:
name = addr
email = addr
return {"emailAddress": {"name": name, "address": email}}
def _import_msg_to_graph(msg_path: Path, folder: str) -> Optional[str]:
"""Parse .msg and import into Graph API mailbox. Returns message ID or None."""
try:
msg = extract_msg.Message(str(msg_path))
subject = msg.subject or "(no subject)"
# Čtení těla — extract_msg může selhat na nestandartním kódování (cp1252 apod.)
try:
body_html = msg.htmlBody
if isinstance(body_html, bytes):
body_html = body_html.decode("utf-8", errors="replace")
except Exception:
body_html = None
try:
body_text = msg.body or ""
except Exception:
body_text = ""
try:
sender_email = msg.sender or ""
except Exception:
sender_email = ""
try:
sender_name = getattr(msg, "senderName", None) or sender_email
except Exception:
sender_name = sender_email
try:
to_raw = msg.to or ""
except Exception:
to_raw = ""
try:
cc_raw = msg.cc or ""
except Exception:
cc_raw = ""
try:
date_raw = msg.date
except Exception:
date_raw = None
att_list = []
for att in msg.attachments:
if att.data and att.longFilename:
att_list.append({
"@odata.type": "#microsoft.graph.fileAttachment",
"name": att.longFilename,
"contentType": getattr(att, "mimetype", None) or "application/octet-stream",
"contentBytes": base64.b64encode(att.data).decode(),
})
msg.close()
to_list = [a.strip() for a in to_raw.split(";") if a.strip()]
cc_list = [a.strip() for a in cc_raw.split(";") if a.strip()]
# Map folder and ensure it exists
folder_parts = _map_jnj_folder(folder)
folder_id = _ensure_folder(folder_parts)
payload = {
"subject": subject,
"body": {
"contentType": "HTML" if body_html else "Text",
"content": body_html or body_text,
},
"from": _make_recipient(f"{sender_name} <{sender_email}>"),
"toRecipients": [_make_recipient(a) for a in to_list],
"ccRecipients": [_make_recipient(a) for a in cc_list],
"isRead": True,
"singleValueExtendedProperties": [
{"id": "Integer 0x0E07", "value": "1"}
],
}
if date_raw:
try:
dt = dtparser.parse(str(date_raw))
payload["receivedDateTime"] = dt.astimezone(timezone.utc).strftime(
"%Y-%m-%dT%H:%M:%SZ"
)
except Exception:
pass
if att_list:
payload["attachments"] = att_list
headers = _graph_headers()
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/mailFolders/{folder_id}/messages"
r = http_requests.post(url, headers=headers, json=payload, timeout=30)
if r.status_code == 401:
_get_graph_token()
headers = _graph_headers()
r = http_requests.post(url, headers=headers, json=payload, timeout=30)
if r.status_code in (200, 201):
msg_id = r.json().get("id", "")
log.info("Graph OK: %s%s", subject[:60], "/".join(folder_parts))
return msg_id
else:
log.error("Graph FAIL [%d]: %s | %s", r.status_code, subject[:60], r.text[:200])
return None
except Exception as e:
log.error("Graph import error for %s: %s", msg_path.name, e)
return None
@app.post("/upload")
async def upload_msg(
file: UploadFile = File(...),
authorization: str = Header(None),
folder: str = Form(""),
):
if authorization != f"Bearer {TOKEN}":
raise HTTPException(status_code=401, detail="Unauthorized")
is_encrypted = file.filename.endswith(".emsg")
if not file.filename.endswith(".msg") and not is_encrypted:
raise HTTPException(status_code=400, detail="Only .msg or .emsg files accepted")
# Ukládáme vždy jako .msg
msg_filename = file.filename[:-5] + ".msg" if is_encrypted else file.filename
dest = SAVE_DIR / msg_filename
if dest.exists():
return {"status": "exists", "file": msg_filename}
content = await file.read()
if is_encrypted:
content = _FERNET.decrypt(content)
with dest.open("wb") as f:
f.write(content)
# Import to Graph API if folder was provided by client
graph_id = None
if folder:
graph_id = _import_msg_to_graph(dest, folder)
return {
"status": "saved",
"file": msg_filename,
"graph_id": graph_id,
}
@app.post("/upload-db")
async def upload_db(
file: UploadFile = File(...),
authorization: str = Header(None)
):
if authorization != f"Bearer {TOKEN}":
raise HTTPException(status_code=401, detail="Unauthorized")
if not file.filename.endswith(".db"):
raise HTTPException(status_code=400, detail="Only .db files accepted")
for old in DB_DIR.glob("*.db"):
old.unlink()
dest = DB_DIR / file.filename
with dest.open("wb") as f:
shutil.copyfileobj(file.file, f)
return {"status": "saved", "file": file.filename}
class MessageDeleteRequest(BaseModel):
graph_id: str
class MessageUpdateRequest(BaseModel):
graph_id: str
is_read: Optional[bool] = None
folder: Optional[str] = None
def _retry_graph(method, url, headers_fn, **kwargs):
"""Call Graph API, refresh token once on 401."""
headers = headers_fn()
r = method(url, headers=headers, **kwargs)
if r.status_code == 401:
_get_graph_token()
headers = headers_fn()
r = method(url, headers=headers, **kwargs)
return r
@app.post("/message-delete")
async def message_delete(req: MessageDeleteRequest, authorization: str = Header(None)):
if authorization != f"Bearer {TOKEN}":
raise HTTPException(status_code=401, detail="Unauthorized")
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/messages/{req.graph_id}"
r = _retry_graph(http_requests.delete, url, _graph_headers, timeout=15)
if r.status_code in (200, 204):
log.info("Graph DELETE OK: %s", req.graph_id)
return {"status": "deleted"}
raise HTTPException(status_code=500, detail=f"Graph DELETE failed: {r.status_code} {r.text[:200]}")
@app.post("/message-update")
async def message_update(req: MessageUpdateRequest, authorization: str = Header(None)):
if authorization != f"Bearer {TOKEN}":
raise HTTPException(status_code=401, detail="Unauthorized")
current_graph_id = req.graph_id
result: dict = {"status": "ok"}
# Move first — returns new graph_id which we use for subsequent read-status update
if req.folder:
folder_parts = _map_jnj_folder(req.folder)
folder_id = _ensure_folder(folder_parts)
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/messages/{current_graph_id}/move"
r = _retry_graph(http_requests.post, url, _graph_headers,
json={"destinationId": folder_id}, timeout=15)
if r.status_code in (200, 201):
current_graph_id = r.json().get("id", current_graph_id)
result["moved"] = True
log.info("Graph MOVE OK: %s%s", req.graph_id, "/".join(folder_parts))
else:
log.error("Graph MOVE FAIL [%d]: %s", r.status_code, r.text[:200])
result["moved"] = False
if req.is_read is not None:
url = f"{GRAPH_URL}/users/{GRAPH_MAILBOX}/messages/{current_graph_id}"
r = _retry_graph(http_requests.patch, url, _graph_headers,
json={"isRead": req.is_read}, timeout=15)
result["read_updated"] = r.status_code in (200, 201)
if not result["read_updated"]:
log.error("Graph PATCH isRead FAIL [%d]: %s", r.status_code, r.text[:200])
result["graph_id"] = current_graph_id
return result
@app.post("/upload-dropbox")
async def upload_dropbox(
file: UploadFile = File(...),
authorization: str = Header(None),
):
if authorization != f"Bearer {TOKEN}":
raise HTTPException(status_code=401, detail="Unauthorized")
if not DROPBOX_REFRESH_TOKEN:
raise HTTPException(status_code=500, detail="Dropbox not configured")
content = await file.read()
dbx = dropbox.Dropbox(
app_key=DROPBOX_APP_KEY,
app_secret=DROPBOX_APP_SECRET,
oauth2_refresh_token=DROPBOX_REFRESH_TOKEN,
)
dropbox_path = f"/!!!Days/Downloads Z230/{file.filename}"
dbx.files_upload(content, dropbox_path, mode=dropbox.files.WriteMode.overwrite)
return {"status": "uploaded", "file": file.filename, "dropbox_path": dropbox_path}
@@ -0,0 +1,10 @@
fastapi
uvicorn
python-multipart
dropbox
python-dotenv
msal
requests
extract-msg
python-dateutil
cryptography
@@ -0,0 +1,180 @@
# test_import_msg.py — pokusný import .msg do schránky přes Graph API
# Parsuje .msg soubor a vytvoří zprávu v Inbox cílové schránky.
import base64
import msal
import requests
import extract_msg
import sys
from pathlib import Path
# === CONFIG ===
TENANT_ID = "7d269944-37a4-43a1-8140-c7517dc426e9"
CLIENT_ID = "4b222bfd-78c9-4239-a53f-43006b3ed07f"
CLIENT_SECRET = "Txg8Q~MjhocuopxsJyJBhPmDfMxZ2r5WpTFj1dfk"
MAILBOX = "vladimir.buzalka@buzalka.cz"
AUTHORITY = f"https://login.microsoftonline.com/{TENANT_ID}"
SCOPE = ["https://graph.microsoft.com/.default"]
GRAPH_URL = "https://graph.microsoft.com/v1.0"
TARGET_FOLDER = "JNJ" # subfolder under Inbox
# === MSG FILE ===
MSG_PATH = Path(__file__).parent / "FC130007ACFE5DCB0000.msg"
def get_token():
app = msal.ConfidentialClientApplication(
CLIENT_ID, authority=AUTHORITY, client_credential=CLIENT_SECRET
)
token = app.acquire_token_for_client(scopes=SCOPE)
if "access_token" not in token:
raise RuntimeError(f"Auth failed: {token}")
return token["access_token"]
def parse_msg(path):
"""Parse .msg file and return dict with message properties."""
msg = extract_msg.Message(str(path))
# Read all properties before closing
subject = msg.subject or "(no subject)"
body_html = msg.htmlBody
if isinstance(body_html, bytes):
body_html = body_html.decode("utf-8", errors="replace")
body_text = msg.body or ""
sender_email = msg.sender or ""
sender_name = getattr(msg, "senderName", None) or sender_email
to_raw = msg.to or ""
cc_raw = msg.cc or ""
date_raw = msg.date
att_list = []
for att in msg.attachments:
if att.data and att.longFilename:
att_list.append({
"@odata.type": "#microsoft.graph.fileAttachment",
"name": att.longFilename,
"contentType": getattr(att, "mimetype", None) or "application/octet-stream",
"contentBytes": base64.b64encode(att.data).decode(),
})
msg.close()
# Process after close
to_list = [a.strip() for a in to_raw.split(";") if a.strip()]
cc_list = [a.strip() for a in cc_raw.split(";") if a.strip()]
received = str(date_raw) if date_raw else None
return {
"subject": subject,
"body_html": body_html,
"body_text": body_text,
"sender_email": sender_email,
"sender_name": sender_name,
"to": to_list,
"cc": cc_list,
"received": received,
"attachments": att_list,
}
def make_recipient(addr):
"""Create Graph API recipient object from email address."""
# Handle 'Name <email>' format
if "<" in addr and ">" in addr:
name = addr[:addr.index("<")].strip().strip('"')
email = addr[addr.index("<") + 1 : addr.index(">")].strip()
else:
name = addr
email = addr
return {"emailAddress": {"name": name, "address": email}}
def import_msg(msg_path):
token = get_token()
headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
}
print(f"Parsing: {msg_path}")
data = parse_msg(msg_path)
print(f" Subject: {data['subject']}")
print(f" From: {data['sender_name']} <{data['sender_email']}>")
print(f" To: {data['to']}")
print(f" Date: {data['received']}")
print(f" Attachments: {len(data['attachments'])}")
# 1. Create message in mailFolder (Inbox)
payload = {
"subject": data["subject"],
"body": {
"contentType": "HTML" if data["body_html"] else "Text",
"content": data["body_html"] or data["body_text"],
},
"from": make_recipient(
f"{data['sender_name']} <{data['sender_email']}>"
),
"toRecipients": [make_recipient(a) for a in data["to"]],
"ccRecipients": [make_recipient(a) for a in data["cc"]],
"isRead": True,
# PR_MESSAGE_FLAGS (0x0E07) = 1 → read, NOT draft (without MSGFLAG_UNSENT=0x08)
"singleValueExtendedProperties": [
{
"id": "Integer 0x0E07",
"value": "1",
}
],
}
if data["received"]:
# Graph API expects ISO 8601 UTC format
from datetime import datetime, timezone
try:
from dateutil import parser as dtparser
dt = dtparser.parse(data["received"])
payload["receivedDateTime"] = dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
except Exception as e:
print(f" Warning: cannot parse date '{data['received']}': {e}")
if data["attachments"]:
payload["attachments"] = data["attachments"]
# Find target folder (Inbox/JNJ)
folder_url = f"{GRAPH_URL}/users/{MAILBOX}/mailFolders/Inbox/childFolders"
r_folders = requests.get(folder_url, headers=headers, timeout=15)
folder_id = None
for f in r_folders.json().get("value", []):
if f["displayName"].lower() == TARGET_FOLDER.lower():
folder_id = f["id"]
break
if not folder_id:
# Create the folder if it doesn't exist
r_create = requests.post(
folder_url, headers=headers,
json={"displayName": TARGET_FOLDER}, timeout=15
)
folder_id = r_create.json()["id"]
print(f" Created folder '{TARGET_FOLDER}'")
url = f"{GRAPH_URL}/users/{MAILBOX}/mailFolders/{folder_id}/messages"
print(f"\nPOST -> Inbox/{TARGET_FOLDER}")
r = requests.post(url, headers=headers, json=payload, timeout=30)
if r.status_code in (200, 201):
msg_id = r.json().get("id", "?")
print(f" OK! Message created, id={msg_id[:40]}...")
return r.json()
else:
print(f" FAILED [{r.status_code}]: {r.text[:500]}")
return None
if __name__ == "__main__":
path = sys.argv[1] if len(sys.argv) > 1 else MSG_PATH
import_msg(Path(path))
@@ -0,0 +1,25 @@
import dropbox
from dotenv import load_dotenv
from pathlib import Path
import os
from dropbox import DropboxOAuth2FlowNoRedirect
load_dotenv(Path(__file__).parent / ".env")
APP_KEY = os.getenv("DROPBOX_APP_KEY", "")
APP_SECRET = os.getenv("DROPBOX_APP_SECRET", "")
auth_flow = DropboxOAuth2FlowNoRedirect(
APP_KEY,
APP_SECRET,
token_access_type='offline' # důležité — dá refresh token
)
authorize_url = auth_flow.start()
print(f"Otevři v prohlížeči:\n{authorize_url}")
auth_code = input("Vlož autorizační kód: ").strip()
oauth_result = auth_flow.finish(auth_code)
print(f"Refresh token: {oauth_result.refresh_token}")
# Tento token ulož — platí "navždy" (dokud app neodvoláš)
@@ -0,0 +1,22 @@
import dropbox
from dotenv import load_dotenv
from pathlib import Path
import os
load_dotenv(Path(__file__).parent / ".env")
APP_KEY = os.getenv("DROPBOX_APP_KEY", "")
APP_SECRET = os.getenv("DROPBOX_APP_SECRET", "")
REFRESH_TOKEN = os.getenv("DROPBOX_APP_REFRESH_TOKEN", "")
dbx = dropbox.Dropbox(
app_key=APP_KEY,
app_secret=APP_SECRET,
oauth2_refresh_token=REFRESH_TOKEN,
)
dropbox_path = "/!!!Days/Downloads Z230/AHOJVLADO.TXT"
content = b"AHOJ VLADO"
dbx.files_upload(content, dropbox_path, mode=dropbox.files.WriteMode.overwrite)
print(f"Nahráno: {dropbox_path}")
@@ -0,0 +1,18 @@
"""
db_cleanup_inbox v1.0
Verze: 1.0
Datum: 2026-05-28
Popis: Jednorázový cleanup - smaže záznamy v SQLite DB kde folder = '/Inbox'
(záznamy bez emailové adresy v cestě, vytvořené chybnou verzí skriptu).
"""
import sqlite3
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
conn = sqlite3.connect(DB_PATH)
deleted = conn.execute("DELETE FROM messages WHERE folder = '/Inbox'").rowcount
conn.commit()
conn.close()
print(f"Smazáno záznamů: {deleted}")
print("Hotovo.")
@@ -0,0 +1,177 @@
"""
janssenpc_email_send_new v1.0
Verze: 1.0
Datum: 2026-05-28
Popis: Prochází pouze složku Inbox v Outlooku (MAPI), ukládá emailové zprávy jako .msg
soubory a uploaduje je na https://msgs.buzalka.cz. Zaznamenává zpracované
zprávy do SQLite DB (jnjemails.db) a DB periodicky uploaduje na server.
Podporuje pokračování od posledního zpracovaného emailu (resume).
"""
import win32com.client
import requests
import sqlite3
import urllib3
from pathlib import Path
from datetime import datetime, timedelta
import tempfile
import io
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.commit()
def is_uploaded(conn, message_id):
row = conn.execute(
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
).fetchone()
return row is not None
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
conn.execute("""
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
VALUES (?, ?, ?, ?, ?, ?)
""", (message_id, subject, sender, received_at, folder, source))
conn.commit()
def upload_db(db_path):
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{timestamp}.db"
with open(db_path, "rb") as f:
resp = requests.post(
"https://msgs.buzalka.cz/upload-db",
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=60
)
print(f" DB upload: {resp.json()}")
def upload_msg(msg_path, filename):
with open(msg_path, "rb") as f:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=30
)
resp.raise_for_status()
return resp.json()["status"]
def get_folder_resume_date(conn, folder_path):
row = conn.execute(
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
(folder_path,)
).fetchone()
if not row or not row[0]:
return None
last_dt = datetime.fromisoformat(row[0])
return last_dt - timedelta(hours=1)
def process_folder(conn, folder, source, folder_path="", counter=None):
if counter is None:
counter = [0]
current_path = f"{folder_path}/{folder.Name}"
try:
resume_dt = get_folder_resume_date(conn, current_path)
items = folder.Items
if resume_dt:
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
items = folder.Items.Restrict(filter_str)
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
else:
print(f"\n Složka: {current_path} | od začátku")
items.Sort("[ReceivedTime]", False)
count = 0
skipped = 0
for item in items:
try:
if not item.MessageClass.upper().startswith("IPM.NOTE"):
continue
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except:
mid = None
if not mid:
mid = f"entryid:{item.EntryID}"
if is_uploaded(conn, mid):
skipped += 1
continue
with tempfile.TemporaryDirectory() as tmp:
safe_name = f"{item.EntryID[-20:]}.msg"
tmp_path = Path(tmp) / safe_name
item.SaveAs(str(tmp_path), 3)
status = upload_msg(tmp_path, safe_name)
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
received, current_path, source)
counter[0] += 1
count += 1
if counter[0] % 1000 == 0:
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
upload_db(DB_PATH)
print(f" {status.upper():6} | {item.Subject[:60]}")
except Exception as e:
print(f" CHYBA | {getattr(item, 'Subject', '?')[:40]} | {e}")
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
except Exception as e:
print(f" CHYBA složka {current_path}: {e}")
for subfolder in folder.Folders:
process_folder(conn, subfolder, source, current_path, counter)
# --- MAIN ---
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(DB_PATH)
init_db(conn)
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
inbox = ns.GetDefaultFolder(6) # 6 = olFolderInbox
source = "mailbox"
print(f"\n=== Inbox ===")
process_folder(conn, inbox, source)
# Finální DB upload po dokončení
print("\nFinální upload DB...")
upload_db(DB_PATH)
conn.close()
print("\nHotovo.")
@@ -0,0 +1,195 @@
"""
janssenpc_email_send_new v1.1
Verze: 1.1
Datum: 2026-05-28
Popis: Prochází pouze složku Inbox v Outlooku (MAPI), ukládá emailové zprávy jako .msg
soubory a uploaduje je na https://msgs.buzalka.cz. Zaznamenává zpracované
zprávy do SQLite DB (jnjemails.db) a DB periodicky uploaduje na server.
Podporuje pokračování od posledního zpracovaného emailu (resume).
Nově: chyby při uploadu se logují do souboru jnjemails_errors.log
(timestamp, složka, odesílatel, předmět, chyba).
"""
import win32com.client
import requests
import sqlite3
import urllib3
import logging
from pathlib import Path
from datetime import datetime, timedelta
import tempfile
import io
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
logging.basicConfig(
filename=LOG_PATH,
level=logging.ERROR,
format="%(asctime)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
encoding="utf-8",
)
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.commit()
def is_uploaded(conn, message_id):
row = conn.execute(
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
).fetchone()
return row is not None
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
conn.execute("""
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
VALUES (?, ?, ?, ?, ?, ?)
""", (message_id, subject, sender, received_at, folder, source))
conn.commit()
def upload_db(db_path):
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{timestamp}.db"
with open(db_path, "rb") as f:
resp = requests.post(
"https://msgs.buzalka.cz/upload-db",
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=60
)
print(f" DB upload: {resp.json()}")
def upload_msg(msg_path, filename):
with open(msg_path, "rb") as f:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=30
)
resp.raise_for_status()
return resp.json()["status"]
def get_folder_resume_date(conn, folder_path):
row = conn.execute(
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
(folder_path,)
).fetchone()
if not row or not row[0]:
return None
last_dt = datetime.fromisoformat(row[0])
return last_dt - timedelta(hours=1)
def process_folder(conn, folder, source, folder_path="", counter=None):
if counter is None:
counter = [0]
current_path = f"{folder_path}/{folder.Name}"
try:
resume_dt = get_folder_resume_date(conn, current_path)
items = folder.Items
if resume_dt:
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
items = folder.Items.Restrict(filter_str)
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
else:
print(f"\n Složka: {current_path} | od začátku")
items.Sort("[ReceivedTime]", False)
count = 0
skipped = 0
for item in items:
try:
if not item.MessageClass.upper().startswith("IPM.NOTE"):
continue
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except:
mid = None
if not mid:
mid = f"entryid:{item.EntryID}"
if is_uploaded(conn, mid):
skipped += 1
continue
with tempfile.TemporaryDirectory() as tmp:
safe_name = f"{item.EntryID[-20:]}.msg"
tmp_path = Path(tmp) / safe_name
item.SaveAs(str(tmp_path), 3)
status = upload_msg(tmp_path, safe_name)
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
received, current_path, source)
counter[0] += 1
count += 1
if counter[0] % 1000 == 0:
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
upload_db(DB_PATH)
print(f" {status.upper():6} | {item.Subject[:60]}")
except Exception as e:
subject = getattr(item, 'Subject', '?')
sender = getattr(item, 'SenderEmailAddress', '?')
received = getattr(item, 'ReceivedTime', '?')
print(f" CHYBA | {subject[:40]} | {e}")
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
current_path, sender, received, subject, e)
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
except Exception as e:
print(f" CHYBA složka {current_path}: {e}")
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
for subfolder in folder.Folders:
process_folder(conn, subfolder, source, current_path, counter)
# --- MAIN ---
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(DB_PATH)
init_db(conn)
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
inbox = ns.GetDefaultFolder(6) # 6 = olFolderInbox
source = "mailbox"
print(f"\n=== Inbox ===")
process_folder(conn, inbox, source)
# Finální DB upload po dokončení
print("\nFinální upload DB...")
upload_db(DB_PATH)
conn.close()
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
@@ -0,0 +1,199 @@
"""
janssenpc_email_send_new v1.2
Verze: 1.2
Datum: 2026-05-28
Popis: Prochází pouze složku Inbox v Outlooku (MAPI), ukládá emailové zprávy jako .msg
soubory a uploaduje je na https://msgs.buzalka.cz. Zaznamenává zpracované
zprávy do SQLite DB (jnjemails.db) a DB periodicky uploaduje na server.
Podporuje pokračování od posledního zpracovaného emailu (resume).
Chyby při uploadu se logují do souboru jnjemails_errors.log.
Oprava v1.2: folder cesta obsahuje celé jméno schránky (např. /vbuzalka@its.jnj.com/Inbox)
aby resume logika správně navazovala na záznamy z původního skriptu.
"""
import win32com.client
import requests
import sqlite3
import urllib3
import logging
from pathlib import Path
from datetime import datetime, timedelta
import tempfile
import io
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
logging.basicConfig(
filename=LOG_PATH,
level=logging.ERROR,
format="%(asctime)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
encoding="utf-8",
)
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.commit()
def is_uploaded(conn, message_id):
row = conn.execute(
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
).fetchone()
return row is not None
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
conn.execute("""
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
VALUES (?, ?, ?, ?, ?, ?)
""", (message_id, subject, sender, received_at, folder, source))
conn.commit()
def upload_db(db_path):
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{timestamp}.db"
with open(db_path, "rb") as f:
resp = requests.post(
"https://msgs.buzalka.cz/upload-db",
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=60
)
print(f" DB upload: {resp.json()}")
def upload_msg(msg_path, filename):
with open(msg_path, "rb") as f:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=30
)
resp.raise_for_status()
return resp.json()["status"]
def get_folder_resume_date(conn, folder_path):
row = conn.execute(
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
(folder_path,)
).fetchone()
if not row or not row[0]:
return None
last_dt = datetime.fromisoformat(row[0])
return last_dt - timedelta(hours=1)
def process_folder(conn, folder, source, folder_path="", counter=None):
if counter is None:
counter = [0]
current_path = f"{folder_path}/{folder.Name}"
try:
resume_dt = get_folder_resume_date(conn, current_path)
items = folder.Items
if resume_dt:
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
items = folder.Items.Restrict(filter_str)
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
else:
print(f"\n Složka: {current_path} | od začátku")
items.Sort("[ReceivedTime]", False)
count = 0
skipped = 0
for item in items:
try:
if not item.MessageClass.upper().startswith("IPM.NOTE"):
continue
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except:
mid = None
if not mid:
mid = f"entryid:{item.EntryID}"
if is_uploaded(conn, mid):
skipped += 1
continue
with tempfile.TemporaryDirectory() as tmp:
safe_name = f"{item.EntryID[-20:]}.msg"
tmp_path = Path(tmp) / safe_name
item.SaveAs(str(tmp_path), 3)
status = upload_msg(tmp_path, safe_name)
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
received, current_path, source)
counter[0] += 1
count += 1
if counter[0] % 1000 == 0:
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
upload_db(DB_PATH)
print(f" {status.upper():6} | {item.Subject[:60]}")
except Exception as e:
subject = getattr(item, 'Subject', '?')
sender = getattr(item, 'SenderEmailAddress', '?')
received = getattr(item, 'ReceivedTime', '?')
print(f" CHYBA | {subject[:40]} | {e}")
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
current_path, sender, received, subject, e)
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
except Exception as e:
print(f" CHYBA složka {current_path}: {e}")
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
for subfolder in folder.Folders:
process_folder(conn, subfolder, source, current_path, counter)
# --- MAIN ---
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(DB_PATH)
init_db(conn)
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
inbox = ns.GetDefaultFolder(6) # 6 = olFolderInbox
mailbox_name = inbox.Parent.Name # např. "vbuzalka@its.jnj.com"
print(f"Schránka: {mailbox_name}")
source = "mailbox"
print(f"\n=== Inbox ({mailbox_name}) ===")
process_folder(conn, inbox, source, f"/{mailbox_name}")
# Finální DB upload po dokončení
print("\nFinální upload DB...")
upload_db(DB_PATH)
conn.close()
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
@@ -0,0 +1,200 @@
"""
janssenpc_email_send_new v1.3
Verze: 1.3
Datum: 2026-05-28
Popis: Prochází složky Inbox, Deleted Items a Sent Items v Outlooku (MAPI),
ukládá emailové zprávy jako .msg soubory a uploaduje je na https://msgs.buzalka.cz.
Zaznamenává zpracované zprávy do SQLite DB (jnjemails.db) a DB periodicky
uploaduje na server. Podporuje pokračování od posledního zpracovaného emailu (resume).
Folder cesta obsahuje celé jméno schránky (např. /vbuzalka@its.jnj.com/Inbox).
Chyby při uploadu se logují do souboru jnjemails_errors.log.
"""
import win32com.client
import requests
import sqlite3
import urllib3
import logging
from pathlib import Path
from datetime import datetime, timedelta
import tempfile
import io
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
# olFolderInbox=6, olFolderDeletedItems=3, olFolderSentMail=5
FOLDERS_TO_PROCESS = [6, 3, 5]
logging.basicConfig(
filename=LOG_PATH,
level=logging.ERROR,
format="%(asctime)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
encoding="utf-8",
)
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.commit()
def is_uploaded(conn, message_id):
row = conn.execute(
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
).fetchone()
return row is not None
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
conn.execute("""
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
VALUES (?, ?, ?, ?, ?, ?)
""", (message_id, subject, sender, received_at, folder, source))
conn.commit()
def upload_db(db_path):
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{timestamp}.db"
with open(db_path, "rb") as f:
resp = requests.post(
"https://msgs.buzalka.cz/upload-db",
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=60
)
print(f" DB upload: {resp.json()}")
def upload_msg(msg_path, filename):
with open(msg_path, "rb") as f:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=30
)
resp.raise_for_status()
return resp.json()["status"]
def get_folder_resume_date(conn, folder_path):
row = conn.execute(
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
(folder_path,)
).fetchone()
if not row or not row[0]:
return None
last_dt = datetime.fromisoformat(row[0])
return last_dt - timedelta(hours=1)
def process_folder(conn, folder, source, folder_path="", counter=None):
if counter is None:
counter = [0]
current_path = f"{folder_path}/{folder.Name}"
try:
resume_dt = get_folder_resume_date(conn, current_path)
items = folder.Items
if resume_dt:
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
items = folder.Items.Restrict(filter_str)
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
else:
print(f"\n Složka: {current_path} | od začátku")
items.Sort("[ReceivedTime]", False)
count = 0
skipped = 0
for item in items:
try:
if not item.MessageClass.upper().startswith("IPM.NOTE"):
continue
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except:
mid = None
if not mid:
mid = f"entryid:{item.EntryID}"
if is_uploaded(conn, mid):
skipped += 1
continue
with tempfile.TemporaryDirectory() as tmp:
safe_name = f"{item.EntryID[-20:]}.msg"
tmp_path = Path(tmp) / safe_name
item.SaveAs(str(tmp_path), 3)
status = upload_msg(tmp_path, safe_name)
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
received, current_path, source)
counter[0] += 1
count += 1
if counter[0] % 1000 == 0:
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
upload_db(DB_PATH)
print(f" {status.upper():6} | {item.Subject[:60]}")
except Exception as e:
subject = getattr(item, 'Subject', '?')
sender = getattr(item, 'SenderEmailAddress', '?')
received = getattr(item, 'ReceivedTime', '?')
print(f" CHYBA | {subject[:40]} | {e}")
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
current_path, sender, received, subject, e)
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
except Exception as e:
print(f" CHYBA složka {current_path}: {e}")
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
for subfolder in folder.Folders:
process_folder(conn, subfolder, source, current_path, counter)
# --- MAIN ---
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(DB_PATH)
init_db(conn)
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
counter = [0]
for folder_id in FOLDERS_TO_PROCESS:
folder = ns.GetDefaultFolder(folder_id)
mailbox_name = folder.Parent.Name
print(f"\n=== {folder.Name} ({mailbox_name}) ===")
process_folder(conn, folder, "mailbox", f"/{mailbox_name}", counter)
# Finální DB upload po dokončení
print("\nFinální upload DB...")
upload_db(DB_PATH)
conn.close()
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
@@ -0,0 +1,233 @@
"""
janssenpc_email_send_new v1.4
Verze: 1.4.1
Datum: 2026-05-29
Popis: Prochází složky Inbox, Deleted Items a Sent Items v Outlooku (MAPI),
ukládá emailové zprávy jako .msg soubory a uploaduje je na https://msgs.buzalka.cz.
Zaznamenává zpracované zprávy do SQLite DB (jnjemails.db) a DB uploaduje na server
jednou za 24 hodin (ne při každém běhu). Podporuje pokračování od posledního
zpracovaného emailu (resume). Folder cesta obsahuje celé jméno schránky
(např. /vbuzalka@its.jnj.com/Inbox). Chyby se logují do jnjemails_errors.log.
"""
import win32com.client
import requests
import sqlite3
import urllib3
import logging
from pathlib import Path
from datetime import datetime, timedelta
import tempfile
import io
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
DB_UPLOAD_MARKER = r"C:\Users\vbuzalka\SQLITE\jnjemails_last_db_upload.txt"
DB_UPLOAD_INTERVAL_H = 24
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
# olFolderInbox=6, olFolderDeletedItems=3, olFolderSentMail=5
FOLDERS_TO_PROCESS = [6, 3, 5]
UPLOAD_LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_uploads.log"
logging.basicConfig(
filename=LOG_PATH,
level=logging.ERROR,
format="%(asctime)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
encoding="utf-8",
)
# Separate upload logger — logs every upload attempt
_upload_log = logging.getLogger("uploads")
_upload_log.setLevel(logging.DEBUG)
_uh = logging.FileHandler(UPLOAD_LOG_PATH, encoding="utf-8")
_uh.setFormatter(logging.Formatter("%(asctime)s | %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
_upload_log.addHandler(_uh)
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.commit()
def is_uploaded(conn, message_id):
row = conn.execute(
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
).fetchone()
return row is not None
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
conn.execute("""
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
VALUES (?, ?, ?, ?, ?, ?)
""", (message_id, subject, sender, received_at, folder, source))
conn.commit()
def _db_upload_due() -> bool:
"""Return True if 24h elapsed since last DB upload (or never uploaded)."""
marker = Path(DB_UPLOAD_MARKER)
if not marker.exists():
return True
try:
last = datetime.fromisoformat(marker.read_text().strip())
return (datetime.now() - last).total_seconds() >= DB_UPLOAD_INTERVAL_H * 3600
except Exception:
return True
def _db_upload_mark():
"""Write current timestamp to marker file."""
Path(DB_UPLOAD_MARKER).write_text(datetime.now().isoformat())
def upload_db(db_path, force=False):
if not force and not _db_upload_due():
return
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{timestamp}.db"
with open(db_path, "rb") as f:
resp = requests.post(
"https://msgs.buzalka.cz/upload-db",
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=60
)
print(f" DB upload: {resp.json()}")
_db_upload_mark()
def upload_msg(msg_path, filename, folder=""):
_upload_log.info("UPLOAD %s | folder=%s", filename, folder)
with open(msg_path, "rb") as f:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
data={"folder": folder},
timeout=60
)
resp.raise_for_status()
result = resp.json()
_upload_log.info("RESPONSE %s | %s", filename, result)
return result["status"]
def get_folder_resume_date(conn, folder_path):
row = conn.execute(
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
(folder_path,)
).fetchone()
if not row or not row[0]:
return None
last_dt = datetime.fromisoformat(row[0])
return last_dt - timedelta(hours=1)
def process_folder(conn, folder, source, folder_path="", counter=None):
if counter is None:
counter = [0]
current_path = f"{folder_path}/{folder.Name}"
try:
resume_dt = get_folder_resume_date(conn, current_path)
items = folder.Items
if resume_dt:
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
items = folder.Items.Restrict(filter_str)
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
else:
print(f"\n Složka: {current_path} | od začátku")
items.Sort("[ReceivedTime]", False)
count = 0
skipped = 0
for item in items:
try:
if not item.MessageClass.upper().startswith("IPM.NOTE"):
continue
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except:
mid = None
if not mid:
mid = f"entryid:{item.EntryID}"
if is_uploaded(conn, mid):
skipped += 1
continue
with tempfile.TemporaryDirectory() as tmp:
safe_name = f"{item.EntryID[-20:]}.msg"
tmp_path = Path(tmp) / safe_name
item.SaveAs(str(tmp_path), 3)
status = upload_msg(tmp_path, safe_name, current_path)
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
received, current_path, source)
counter[0] += 1
count += 1
if counter[0] % 1000 == 0:
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
upload_db(DB_PATH)
print(f" {status.upper():6} | {item.Subject[:60]}")
except Exception as e:
subject = getattr(item, 'Subject', '?')
sender = getattr(item, 'SenderEmailAddress', '?')
received = getattr(item, 'ReceivedTime', '?')
print(f" CHYBA | {subject[:40]} | {e}")
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
current_path, sender, received, subject, e)
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
except Exception as e:
print(f" CHYBA složka {current_path}: {e}")
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
for subfolder in folder.Folders:
process_folder(conn, subfolder, source, current_path, counter)
# --- MAIN ---
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(DB_PATH)
init_db(conn)
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
counter = [0]
for folder_id in FOLDERS_TO_PROCESS:
folder = ns.GetDefaultFolder(folder_id)
mailbox_name = folder.Parent.Name
print(f"\n=== {folder.Name} ({mailbox_name}) ===")
process_folder(conn, folder, "mailbox", f"/{mailbox_name}", counter)
# Finální DB upload po dokončení
print("\nFinální upload DB...")
upload_db(DB_PATH)
conn.close()
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
+170
View File
@@ -0,0 +1,170 @@
# inbox_full_sync_v1.0
**Název:** inbox_full_sync_v1.0.py
**Verze:** 1.0.4
**Datum:** 2026-06-01
**Autor:** vladimir.buzalka
---
## Účel
Jednorázový skript pro úplný přenos Inboxu z JNJ Outlooku (MAPI) do osobní schránky `vladimir.buzalka@buzalka.cz` přes Microsoft Graph API.
Spouštět ručně jako záchranná síť nebo iniciální sync. Bezpečné opakovat — duplicity se automaticky přeskočí.
---
## Co dělá
1. Připojí se k Outlooku přes MAPI (`win32com`)
2. Projde celý **Inbox** včetně všech podsložek rekurzivně
3. Pro každý email zkontroluje SQLite DB — pokud už je přenesen, přeskočí ho
4. Nový email uloží jako `.msg` do temp složky, **zašifruje** (Fernet/AES) a odešle jako `.emsg` na `msgs.buzalka.cz/upload`
5. Server (`app.py`) dešifruje, parsuje `.msg`, importuje do Graph API a vrátí `graph_id`
6. Záznam se uloží do DB (`messages`, `log`)
7. Každých 100 přenesených emailů + na konci uploaduje DB na server
**Online Archive se nepřenáší**`GetDefaultFolder(6)` vrátí pouze primární schránku.
---
## Šifrování (Zscaler bypass)
JNJ síť používá **Zscaler DLP** — blokuje upload souborů s medicínským obsahem (ECG reporty, klinická data) na externí URL.
Řešení: soubor se před odesláním zašifruje pomocí **Fernet** (AES-128 CBC + HMAC). Zscaler vidí pouze šifrovaný bináč a nerozpozná obsah.
- Šifrovací klíč se odvozuje z `TOKEN` přes SHA-256 — žádná extra konstanta, obě strany derivují klíč samostatně
- Soubor se odesílá s příponou `.emsg` místo `.msg`
- Server (app.py v1.6+) automaticky detekuje `.emsg`, dešifruje a dále zpracuje standardně
---
## Konfigurace
Konstanty jsou přímo v kódu:
| Konstanta | Hodnota |
|---|---|
| `TOKEN` | Bearer token pro msgs.buzalka.cz (slouží i jako základ šifrovacího klíče) |
| `UPLOAD_URL` | `https://msgs.buzalka.cz/upload` |
| `DB_UPLOAD_URL` | `https://msgs.buzalka.cz/upload-db` |
| `DB_PATH` | `C:\Users\vbuzalka\SQLITE\jnjemails.db` |
| `LOG_PATH` | `C:\Users\vbuzalka\SQLITE\inbox_full_sync_errors.log` |
---
## Závislosti
- Python 3.10+, Windows
- Outlook musí být spuštěn
- `pywin32`, `requests`, `cryptography`
- Server `msgs.buzalka.cz` musí běžet (app.py v1.6+)
---
## SQLite DB (`jnjemails.db`)
### Tabulka `messages`
Jeden záznam na každý přenesený email.
| Sloupec | Popis |
|---|---|
| `message_id` | Internet Message-ID (nebo `entryid:...` jako fallback) |
| `entry_id` | Outlook EntryID — pro zpětné dohledání v MAPI |
| `graph_id` | ID zprávy v Graph API — pro sync operace |
| `is_read` | Stav přečtení při přenosu (0/1) |
| `jnj_folder` | Složka v JNJ při přenosu |
| `source` | Vždy `inbox_full_sync` |
### Tabulka `runs`
Jeden záznam na každý běh skriptu.
| Sloupec | Popis |
|---|---|
| `script` | `inbox_full_sync` |
| `version` | verze skriptu |
| `started_at` / `finished_at` | časy běhu |
| `transferred` | počet nově přenesených emailů |
| `skipped` | počet přeskočených (již v DB) |
| `errors` | počet chyb |
### Tabulka `log`
Flat event log — každý console výstup i interní událost jako řádek.
| Sloupec | Popis |
|---|---|
| `run_id` | FK na `runs.id` |
| `level` | `INFO` / `ERROR` |
| `event` | typ události (viz níže) |
| `subject` | předmět emailu (pokud relevantní) |
| `folder` | složka (pokud relevantní) |
| `graph_id` | Graph ID (pokud relevantní) |
| `detail` | pro `upload_saved`: `size=XKB`; pro `upload_error`: `error=... \| size=XKB \| body=... \| sender=... \| received=... \| entry_id=... \| message_id=...` |
#### Události (`log.event`)
| Event | Popis |
|---|---|
| `run_start` | start skriptu |
| `mailbox` | název schránky |
| `folder_start` | vstup do složky (detail = počet položek) |
| `folder_done` | konec složky (detail = přeneseno/skip) |
| `upload_saved` | nový email úspěšně přenesen (detail = size=XKB) |
| `upload_exists` | email již v DB, přeskočen |
| `upload_error` | chyba při uploadu — detail obsahuje sender, received, entry_id, message_id pro dohledání v Outlooku |
| `progress` | každých 100 přenesených emailů |
| `db_upload` | úspěšný upload DB na server |
| `db_upload_error` | chyba uploadu DB |
| `run_done` | konec skriptu (detail = souhrn) |
---
## Užitečné dotazy
**Poslední běh — kompletní log:**
```sql
SELECT r.script, r.version, r.started_at,
l.level, l.event, l.subject, l.folder, l.detail, l.created_at
FROM log l JOIN runs r ON r.id = l.run_id
WHERE l.run_id = (SELECT MAX(id) FROM runs)
ORDER BY l.created_at
```
**Přehled všech běhů:**
```sql
SELECT id, script, version, started_at, finished_at,
transferred, skipped, errors
FROM runs ORDER BY started_at DESC
```
**Chyby z posledního běhu:**
```sql
SELECT l.event, l.subject, l.folder, l.detail, l.created_at
FROM log l
WHERE l.run_id = (SELECT MAX(id) FROM runs)
AND l.level = 'ERROR'
ORDER BY l.created_at
```
---
## Návaznost
- Sdílí DB s `janssenpc_email_send_new_v1.5.py` — záznamy jsou kompatibilní
- Emaily přenesené tímto skriptem mají `graph_id` a jsou od té chvíle hlídány sync průchodem v1.5
- Server endpoint: `msgs.buzalka.cz/upload` musí vracet `graph_id` (app.py v1.6+)
- nginx `client_max_body_size` nastaven na **200M** (SWAG `msgreceiver.subdomain.conf`)
---
## Historie verzí
| Verze | Datum | Změna |
|---|---|---|
| 1.0.0 | 2026-06-01 | Základní funkce: Inbox full scan, dedup přes DB, entry_id/graph_id/is_read |
| 1.0.1 | 2026-06-01 | DB upload každých 100 emailů + finální upload |
| 1.0.2 | 2026-06-01 | SQLite tabulky runs + log |
| 1.0.3 | 2026-06-01 | Kompletní konzolový výstup zrcadlen do log tabulky, skipped counter |
| 1.0.4 | 2026-06-01 | Šifrování Fernet (.emsg) pro bypass Zscaler DLP; rozšířený error detail (sender/received/entry_id/size) |
+384
View File
@@ -0,0 +1,384 @@
"""
inbox_full_sync v1.0
Název: inbox_full_sync_v1.0.py
Verze: 1.0.3
Datum: 2026-06-01
Autor: vladimir.buzalka
Popis:
Jednorázový skript pro úplný přenos Inboxu z JNJ Outlooku (MAPI) do osobní
schránky vladimir.buzalka@buzalka.cz přes Graph API.
Prochází celý Inbox včetně všech podsložek. Online Archive se nepřenáší
(GetDefaultFolder(6) vrátí pouze primární schránku).
Každý email se uloží jako .msg do temp složky, odešle na https://msgs.buzalka.cz/upload
a přes Graph API se importuje do odpovídající složky v osobní schránce.
Dedup zajišťuje SQLite DB — email který je v DB (message_id) se přeskočí.
Spouštění:
Spouštět ručně jako záchranná síť nebo iniciální sync.
Bezpečné opakovat — duplicity se přeskočí.
Závislosti:
win32com, requests, sqlite3 (stdlib)
Python 3.10+, Windows, Outlook musí být spuštěn
Konfigurace (konstanty v kódu):
TOKEN Bearer token pro msgs.buzalka.cz
UPLOAD_URL https://msgs.buzalka.cz/upload
DB_UPLOAD_URL https://msgs.buzalka.cz/upload-db
DB_PATH C:\\Users\\vbuzalka\\SQLITE\\jnjemails.db
LOG_PATH C:\\Users\\vbuzalka\\SQLITE\\inbox_full_sync_errors.log
SQLite DB (jnjemails.db):
messages — přenesené emaily (message_id, entry_id, graph_id, is_read, jnj_folder, ...)
runs — jeden záznam na běh (script, version, started_at, finished_at, counts)
log — flat event log per run (level, event, subject, folder, graph_id, detail)
Dotaz pro posledn běh:
SELECT r.script, r.version, r.started_at, l.level, l.event,
l.subject, l.folder, l.detail, l.created_at
FROM log l JOIN runs r ON r.id = l.run_id
WHERE l.run_id = (SELECT MAX(id) FROM runs)
ORDER BY l.created_at
Log události (log.event):
run_start — start skriptu
mailbox — název schránky
folder_start — vstup do složky (detail = počet položek)
folder_done — konec složky (detail = přeneseno/skip)
upload_saved — nový email přenesen
upload_exists — email již v DB, přeskočen
upload_error — chyba při uploadu (detail = chybová zpráva)
progress — každých 100 přenesených
db_upload — úspěšný upload DB na server
db_upload_error — chyba uploadu DB
run_done — konec skriptu (detail = souhrn)
Historie verzí:
1.0.0 2026-06-01 Základní funkce: Inbox full scan, dedup přes DB, entry_id/graph_id/is_read
1.0.1 2026-06-01 DB upload každých 100 emailů + finální upload
1.0.2 2026-06-01 SQLite tabulky runs + log
1.0.3 2026-06-01 Kompletní konzolový výstup zrcadlen do log tabulky, skipped counter
"""
import win32com.client
import requests
import sqlite3
import urllib3
import logging
import hashlib
import base64
from pathlib import Path
from datetime import datetime
from cryptography.fernet import Fernet
import tempfile
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\inbox_full_sync_errors.log"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
DB_UPLOAD_URL = "https://msgs.buzalka.cz/upload-db"
SCRIPT_NAME = "inbox_full_sync"
SCRIPT_VERSION = "1.0.4"
# Šifrovací klíč odvozený z TOKENu — stejný algoritmus jako na serveru
_FERNET = Fernet(base64.urlsafe_b64encode(hashlib.sha256(TOKEN.encode()).digest()))
logging.basicConfig(
filename=LOG_PATH,
level=logging.ERROR,
format="%(asctime)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
encoding="utf-8",
)
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now')),
entry_id TEXT,
graph_id TEXT,
is_read INTEGER DEFAULT 0,
jnj_folder TEXT
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.execute("""
CREATE TABLE IF NOT EXISTS runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
script TEXT NOT NULL,
version TEXT,
started_at TEXT NOT NULL,
finished_at TEXT,
transferred INTEGER DEFAULT 0,
skipped INTEGER DEFAULT 0,
sync_updated INTEGER DEFAULT 0,
sync_deleted INTEGER DEFAULT 0,
errors INTEGER DEFAULT 0
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id INTEGER REFERENCES runs(id),
level TEXT NOT NULL,
event TEXT NOT NULL,
subject TEXT,
folder TEXT,
graph_id TEXT,
detail TEXT,
created_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_log_run_id ON log(run_id)")
for col, definition in [
("entry_id", "TEXT"),
("graph_id", "TEXT"),
("is_read", "INTEGER DEFAULT 0"),
("jnj_folder", "TEXT"),
]:
try:
conn.execute(f"ALTER TABLE messages ADD COLUMN {col} {definition}")
except Exception:
pass
conn.commit()
def start_run(conn):
cur = conn.execute(
"INSERT INTO runs (script, version, started_at) VALUES (?, ?, datetime('now'))",
(SCRIPT_NAME, SCRIPT_VERSION)
)
conn.commit()
return cur.lastrowid
def finish_run(conn, run_id, transferred, skipped, errors):
conn.execute("""
UPDATE runs SET finished_at=datetime('now'), transferred=?, skipped=?, errors=?
WHERE id=?
""", (transferred, skipped, errors, run_id))
conn.commit()
def db_log(conn, run_id, level, event, subject=None, folder=None, graph_id=None, detail=None):
conn.execute("""
INSERT INTO log (run_id, level, event, subject, folder, graph_id, detail)
VALUES (?, ?, ?, ?, ?, ?, ?)
""", (run_id, level, event, subject, folder, graph_id, detail))
conn.commit()
def info(conn, run_id, event, **kwargs):
db_log(conn, run_id, "INFO", event, **kwargs)
def error(conn, run_id, event, **kwargs):
db_log(conn, run_id, "ERROR", event, **kwargs)
def is_uploaded(conn, message_id):
row = conn.execute(
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
).fetchone()
return row is not None
def save_to_db(conn, message_id, subject, sender, received_at, folder,
entry_id=None, graph_id=None, is_read=0):
conn.execute("""
INSERT OR IGNORE INTO messages
(message_id, subject, sender, received_at, folder, source,
entry_id, graph_id, is_read, jnj_folder)
VALUES (?, ?, ?, ?, ?, 'inbox_full_sync', ?, ?, ?, ?)
""", (message_id, subject, sender, received_at, folder,
entry_id, graph_id, is_read, folder))
conn.commit()
def upload_db(conn, run_id):
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{timestamp}.db"
try:
with open(DB_PATH, "rb") as f:
resp = requests.post(
DB_UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=60,
)
result = resp.json()
msg = f"DB upload: {result}"
print(f" {msg}")
info(conn, run_id, "db_upload", detail=msg)
except Exception as e:
msg = str(e)
print(f" DB upload CHYBA: {msg}")
error(conn, run_id, "db_upload_error", detail=msg)
def upload_msg(msg_path, filename, folder=""):
size_kb = Path(msg_path).stat().st_size // 1024
with open(msg_path, "rb") as f:
encrypted = _FERNET.encrypt(f.read())
enc_filename = Path(filename).stem + ".emsg"
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (enc_filename, encrypted, "application/octet-stream")},
data={"folder": folder},
timeout=60,
)
if not resp.ok:
raise requests.HTTPError(
f"{resp.status_code} {resp.reason} | size={size_kb}KB | body={resp.text[:300]}",
response=resp,
)
return resp.json()
def process_folder(conn, run_id, folder, folder_path, counter, skipped_counter, error_counter):
current_path = f"{folder_path}/{folder.Name}"
items = folder.Items
items.Sort("[ReceivedTime]", False)
count = 0
skipped = 0
total = items.Count
msg = f"Složka: {current_path} ({total} položek)"
print(f"\n {msg}")
info(conn, run_id, "folder_start", folder=current_path, detail=str(total))
for item in items:
subject = getattr(item, 'Subject', '?')
try:
if not item.MessageClass.upper().startswith("IPM.NOTE"):
continue
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except Exception:
mid = None
if not mid:
mid = f"entryid:{item.EntryID}"
if is_uploaded(conn, mid):
skipped += 1
skipped_counter[0] += 1
continue
try:
with tempfile.TemporaryDirectory() as tmp:
safe_name = f"{item.EntryID[-20:]}.msg"
tmp_path = Path(tmp) / safe_name
item.SaveAs(str(tmp_path), 3)
size_kb = tmp_path.stat().st_size // 1024
result = upload_msg(tmp_path, safe_name, current_path)
status = result.get("status", "?")
graph_id = result.get("graph_id")
is_read = 0 if item.UnRead else 1
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
save_to_db(conn, mid, subject, item.SenderEmailAddress,
received, current_path,
entry_id=item.EntryID, graph_id=graph_id, is_read=is_read)
info(conn, run_id, f"upload_{status}",
subject=subject, folder=current_path, graph_id=graph_id,
detail=f"size={size_kb}KB")
counter[0] += 1
count += 1
if counter[0] % 100 == 0:
msg = f"celkem přeneseno: {counter[0]}"
print(f"{msg}, uploaduji DB...")
info(conn, run_id, "progress", detail=msg)
upload_db(conn, run_id)
print(f" {status.upper():6} | {subject[:70]}")
except Exception as e:
sender_str = getattr(item, 'SenderEmailAddress', '?')
received_str = getattr(item, 'ReceivedTime', None)
received_str = received_str.isoformat() if received_str else '?'
entry_id_str = getattr(item, 'EntryID', '?')
detail = (
f"error={e} | "
f"sender={sender_str} | "
f"received={received_str} | "
f"entry_id={entry_id_str} | "
f"message_id={mid}"
)
print(f" CHYBA | {subject[:50]} | sender={sender_str} | received={received_str} | {e}")
error(conn, run_id, "upload_error",
subject=subject, folder=current_path, detail=detail)
logging.error("folder=%s | %s", current_path, detail)
error_counter[0] += 1
except Exception as e:
# Neočekávaná chyba mimo upload blok (MessageClass, EntryID, apod.)
print(f" CHYBA (item) | {subject[:50]} | {e}")
logging.error("folder=%s | item_error | subject=%s | error=%s", current_path, subject, e)
error_counter[0] += 1
msg = f"složka hotova: přeneseno {count} | skip {skipped}"
print(f"{msg}")
info(conn, run_id, "folder_done", folder=current_path, detail=msg)
for subfolder in folder.Folders:
process_folder(conn, run_id, subfolder, current_path, counter, skipped_counter, error_counter)
# --- MAIN ---
print(f"=== inbox_full_sync v{SCRIPT_VERSION} ===")
print(f"Start: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
conn = sqlite3.connect(DB_PATH)
init_db(conn)
run_id = start_run(conn)
info(conn, run_id, "run_start", detail=f"script={SCRIPT_NAME} version={SCRIPT_VERSION}")
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
inbox = ns.GetDefaultFolder(6) # olFolderInbox — primární schránka, bez Online Archive
mailbox_name = inbox.Parent.Name
print(f"\nSchránka: {mailbox_name}")
info(conn, run_id, "mailbox", detail=mailbox_name)
counter = [0]
skipped_counter = [0]
error_counter = [0]
process_folder(conn, run_id, inbox, f"/{mailbox_name}", counter, skipped_counter, error_counter)
finish_run(conn, run_id,
transferred=counter[0],
skipped=skipped_counter[0],
errors=error_counter[0])
summary = f"přeneseno {counter[0]} | skip {skipped_counter[0]} | chyby {error_counter[0]}"
print(f"\n=== Hotovo: {summary} ===")
info(conn, run_id, "run_done", detail=summary)
print("Uploaduji DB...")
upload_db(conn, run_id)
print(f"Konec: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Chyby logovány do: {LOG_PATH}")
conn.close()
@@ -0,0 +1,464 @@
"""
janssenpc_email_send_new v1.5
Verze: 1.5.2
Datum: 2026-06-01
Popis: Prochází složky Inbox, Deleted Items a Sent Items v Outlooku (MAPI),
ukládá emailové zprávy jako .msg soubory a uploaduje je na https://msgs.buzalka.cz.
Zaznamenává zpracované zprávy do SQLite DB (jnjemails.db) a DB uploaduje na server
jednou za 24 hodin (ne při každém běhu). Podporuje pokračování od posledního
zpracovaného emailu (resume). Folder cesta obsahuje celé jméno schránky
(např. /vbuzalka@its.jnj.com/Inbox). Chyby se logují do jnjemails_errors.log.
v1.5: tracking entry_id, graph_id, is_read, jnj_folder.
Sync průchod posledních 30 dní: detekce smazání, změny přečtení, přesunu složky.
v1.5.2: SQLite logging — tabulky runs + log (flat event log per run).
"""
import win32com.client
import requests
import sqlite3
import urllib3
import logging
from pathlib import Path
from datetime import datetime, timedelta
import tempfile
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
DB_UPLOAD_MARKER = r"C:\Users\vbuzalka\SQLITE\jnjemails_last_db_upload.txt"
DB_UPLOAD_INTERVAL_H = 24
LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_errors.log"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
SYNC_DAYS = 30
SCRIPT_NAME = "email_send"
SCRIPT_VERSION = "1.5.2"
# olFolderInbox=6, olFolderDeletedItems=3, olFolderSentMail=5
FOLDERS_TO_PROCESS = [6, 3, 5]
UPLOAD_LOG_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails_uploads.log"
logging.basicConfig(
filename=LOG_PATH,
level=logging.ERROR,
format="%(asctime)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
encoding="utf-8",
)
_upload_log = logging.getLogger("uploads")
_upload_log.setLevel(logging.DEBUG)
_uh = logging.FileHandler(UPLOAD_LOG_PATH, encoding="utf-8")
_uh.setFormatter(logging.Formatter("%(asctime)s | %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
_upload_log.addHandler(_uh)
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now')),
entry_id TEXT,
graph_id TEXT,
is_read INTEGER DEFAULT 0,
jnj_folder TEXT
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.execute("""
CREATE TABLE IF NOT EXISTS runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
script TEXT NOT NULL,
version TEXT,
started_at TEXT NOT NULL,
finished_at TEXT,
transferred INTEGER DEFAULT 0,
skipped INTEGER DEFAULT 0,
sync_updated INTEGER DEFAULT 0,
sync_deleted INTEGER DEFAULT 0,
errors INTEGER DEFAULT 0
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id INTEGER REFERENCES runs(id),
level TEXT NOT NULL,
event TEXT NOT NULL,
subject TEXT,
folder TEXT,
graph_id TEXT,
detail TEXT,
created_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_log_run_id ON log(run_id)")
# Migrate existing DB
for col, definition in [
("entry_id", "TEXT"),
("graph_id", "TEXT"),
("is_read", "INTEGER DEFAULT 0"),
("jnj_folder", "TEXT"),
]:
try:
conn.execute(f"ALTER TABLE messages ADD COLUMN {col} {definition}")
except Exception:
pass
conn.commit()
def start_run(conn):
cur = conn.execute(
"INSERT INTO runs (script, version, started_at) VALUES (?, ?, datetime('now'))",
(SCRIPT_NAME, SCRIPT_VERSION)
)
conn.commit()
return cur.lastrowid
def finish_run(conn, run_id, transferred, skipped, sync_updated=0, sync_deleted=0, errors=0):
conn.execute("""
UPDATE runs SET finished_at=datetime('now'),
transferred=?, skipped=?, sync_updated=?, sync_deleted=?, errors=?
WHERE id=?
""", (transferred, skipped, sync_updated, sync_deleted, errors, run_id))
conn.commit()
def db_log(conn, run_id, level, event, subject=None, folder=None, graph_id=None, detail=None):
conn.execute("""
INSERT INTO log (run_id, level, event, subject, folder, graph_id, detail)
VALUES (?, ?, ?, ?, ?, ?, ?)
""", (run_id, level, event, subject, folder, graph_id, detail))
conn.commit()
def is_uploaded(conn, message_id):
row = conn.execute(
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
).fetchone()
return row is not None
def save_to_db(conn, message_id, subject, sender, received_at, folder, source,
entry_id=None, graph_id=None, is_read=0):
conn.execute("""
INSERT OR IGNORE INTO messages
(message_id, subject, sender, received_at, folder, source,
entry_id, graph_id, is_read, jnj_folder)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (message_id, subject, sender, received_at, folder, source,
entry_id, graph_id, is_read, folder))
conn.commit()
def _db_upload_due() -> bool:
marker = Path(DB_UPLOAD_MARKER)
if not marker.exists():
return True
try:
last = datetime.fromisoformat(marker.read_text().strip())
return (datetime.now() - last).total_seconds() >= DB_UPLOAD_INTERVAL_H * 3600
except Exception:
return True
def _db_upload_mark():
Path(DB_UPLOAD_MARKER).write_text(datetime.now().isoformat())
def upload_db(db_path, force=False):
if not force and not _db_upload_due():
return
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{timestamp}.db"
with open(db_path, "rb") as f:
resp = requests.post(
"https://msgs.buzalka.cz/upload-db",
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=60
)
print(f" DB upload: {resp.json()}")
_db_upload_mark()
def upload_msg(msg_path, filename, folder=""):
_upload_log.info("UPLOAD %s | folder=%s", filename, folder)
with open(msg_path, "rb") as f:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
data={"folder": folder},
timeout=60
)
resp.raise_for_status()
result = resp.json()
_upload_log.info("RESPONSE %s | %s", filename, result)
return result
def get_folder_resume_date(conn, folder_path):
row = conn.execute(
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
(folder_path,)
).fetchone()
if not row or not row[0]:
return None
last_dt = datetime.fromisoformat(row[0])
return last_dt - timedelta(hours=1)
def get_item_folder_path(item):
"""Reconstruct full folder path from an Outlook item, e.g. /vbuzalka@its.jnj.com/Inbox/Sub."""
parts = []
obj = item.Parent
while True:
try:
parts.insert(0, obj.Name)
obj = obj.Parent
except Exception:
break
return "/" + "/".join(parts)
def process_folder(conn, run_id, folder, source, folder_path="", counter=None, error_counter=None):
if counter is None:
counter = [0]
if error_counter is None:
error_counter = [0]
current_path = f"{folder_path}/{folder.Name}"
try:
resume_dt = get_folder_resume_date(conn, current_path)
items = folder.Items
if resume_dt:
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
items = folder.Items.Restrict(filter_str)
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
else:
print(f"\n Složka: {current_path} | od začátku")
items.Sort("[ReceivedTime]", False)
count = 0
skipped = 0
for item in items:
subject = getattr(item, 'Subject', '?')
try:
if not item.MessageClass.upper().startswith("IPM.NOTE"):
continue
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except Exception:
mid = None
if not mid:
mid = f"entryid:{item.EntryID}"
if is_uploaded(conn, mid):
skipped += 1
continue
graph_id = None
try:
with tempfile.TemporaryDirectory() as tmp:
safe_name = f"{item.EntryID[-20:]}.msg"
tmp_path = Path(tmp) / safe_name
item.SaveAs(str(tmp_path), 3)
result = upload_msg(tmp_path, safe_name, current_path)
status = result.get("status", "?")
graph_id = result.get("graph_id")
is_read = 0 if item.UnRead else 1
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
save_to_db(conn, mid, subject, item.SenderEmailAddress,
received, current_path, source,
entry_id=item.EntryID, graph_id=graph_id, is_read=is_read)
db_log(conn, run_id, "INFO", f"upload_{status}",
subject=subject, folder=current_path, graph_id=graph_id)
counter[0] += 1
count += 1
if counter[0] % 1000 == 0:
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
upload_db(DB_PATH)
print(f" {status.upper():6} | {subject[:60]}")
except Exception as e:
db_log(conn, run_id, "ERROR", "upload_error",
subject=subject, folder=current_path, detail=str(e))
raise
except Exception as e:
sender = getattr(item, 'SenderEmailAddress', '?')
received = getattr(item, 'ReceivedTime', '?')
print(f" CHYBA | {subject[:40]} | {e}")
logging.error("folder=%s | sender=%s | received=%s | subject=%s | error=%s",
current_path, sender, received, subject, e)
error_counter[0] += 1
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
except Exception as e:
print(f" CHYBA složka {current_path}: {e}")
logging.error("folder=%s | CHYBA SLOŽKY | error=%s", current_path, e)
error_counter[0] += 1
for subfolder in folder.Folders:
process_folder(conn, run_id, subfolder, source, current_path, counter, error_counter)
def sync_recent(conn, run_id, ns):
"""Sync last SYNC_DAYS days: detect deleted emails, read-status changes, folder moves."""
cutoff = (datetime.now() - timedelta(days=SYNC_DAYS)).isoformat()
rows = conn.execute(
"""SELECT message_id, entry_id, graph_id, is_read, jnj_folder
FROM messages
WHERE graph_id IS NOT NULL AND entry_id IS NOT NULL
AND received_at > ?""",
(cutoff,)
).fetchall()
print(f"\n=== Sync průchod: {len(rows)} záznamů za posledních {SYNC_DAYS} dní ===")
deleted = 0
updated = 0
errors = 0
for message_id, entry_id, graph_id, is_read, jnj_folder in rows:
found = False
current_read = None
current_folder = None
try:
item = ns.GetItemFromID(entry_id)
current_read = 0 if item.UnRead else 1
current_folder = get_item_folder_path(item)
found = True
except Exception:
pass
if found:
read_changed = current_read != (is_read or 0)
folder_changed = current_folder != (jnj_folder or "")
if not read_changed and not folder_changed:
continue
payload = {"graph_id": graph_id}
if read_changed:
payload["is_read"] = bool(current_read)
if folder_changed:
payload["folder"] = current_folder
try:
resp = requests.post(
"https://msgs.buzalka.cz/message-update",
headers={"Authorization": f"Bearer {TOKEN}"},
json=payload,
timeout=30,
)
resp.raise_for_status()
result = resp.json()
new_graph_id = result.get("graph_id", graph_id)
conn.execute(
"UPDATE messages SET is_read=?, jnj_folder=?, graph_id=? WHERE message_id=?",
(current_read, current_folder, new_graph_id, message_id)
)
conn.commit()
updated += 1
if read_changed:
db_log(conn, run_id, "INFO", "sync_read_update", graph_id=new_graph_id,
detail=f"{is_read}{current_read}")
if folder_changed:
db_log(conn, run_id, "INFO", "sync_folder_move", graph_id=new_graph_id,
folder=current_folder, detail=f"{jnj_folder}{current_folder}")
changes = []
if read_changed:
changes.append(f"read={current_read}")
if folder_changed:
changes.append(f"folder={current_folder}")
print(f" UPDATE | {', '.join(changes)}")
except Exception as e:
db_log(conn, run_id, "ERROR", "sync_update_failed", graph_id=graph_id, detail=str(e))
logging.error("sync update failed | graph_id=%s | error=%s", graph_id, e)
errors += 1
else:
try:
resp = requests.post(
"https://msgs.buzalka.cz/message-delete",
headers={"Authorization": f"Bearer {TOKEN}"},
json={"graph_id": graph_id},
timeout=30,
)
resp.raise_for_status()
db_log(conn, run_id, "INFO", "sync_delete", graph_id=graph_id, folder=jnj_folder)
conn.execute("DELETE FROM messages WHERE message_id=?", (message_id,))
conn.commit()
deleted += 1
print(f" DELETE | graph_id={graph_id[:20]}...")
except Exception as e:
db_log(conn, run_id, "ERROR", "sync_delete_failed", graph_id=graph_id, detail=str(e))
logging.error("sync delete failed | graph_id=%s | error=%s", graph_id, e)
errors += 1
print(f" → sync hotov: {updated} aktualizováno | {deleted} smazáno | {errors} chyb")
return updated, deleted, errors
# --- MAIN ---
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(DB_PATH)
init_db(conn)
run_id = start_run(conn)
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
counter = [0]
error_counter = [0]
skipped_total = [0]
for folder_id in FOLDERS_TO_PROCESS:
folder = ns.GetDefaultFolder(folder_id)
mailbox_name = folder.Parent.Name
print(f"\n=== {folder.Name} ({mailbox_name}) ===")
process_folder(conn, run_id, folder, "mailbox", f"/{mailbox_name}", counter, error_counter)
sync_updated, sync_deleted, sync_errors = sync_recent(conn, run_id, ns)
error_counter[0] += sync_errors
finish_run(conn, run_id,
transferred=counter[0],
skipped=skipped_total[0],
sync_updated=sync_updated,
sync_deleted=sync_deleted,
errors=error_counter[0])
print("\nFinální upload DB...")
upload_db(DB_PATH, force=True)
conn.close()
print(f"\nHotovo. Chyby logovány do: {LOG_PATH}")
+224
View File
@@ -0,0 +1,224 @@
"""
janssenpc_email_send v1.1
Verze: 1.1
Datum: 2026-05-28
Popis: Prochází všechny složky Outlooku (MAPI), ukládá emailové zprávy jako .msg
soubory a uploaduje je na https://msgs.buzalka.cz. Zaznamenává zpracované
zprávy do SQLite DB (jnjemails.db) a DB periodicky uploaduje na server.
Podporuje pokračování od posledního zpracovaného emailu (resume).
Nově: před startem zkontroluje jestli Outlook běží, pokud ne, spustí ho
automaticky (cesta z registru) a počká na inicializaci MAPI.
"""
import win32com.client
import requests
import sqlite3
import urllib3
import subprocess
import winreg
import time
from pathlib import Path
from datetime import datetime, timedelta
import tempfile
import io
import psutil
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload"
DB_PATH = r"C:\Users\vbuzalka\SQLITE\jnjemails.db"
PR_INTERNET_MESSAGE_ID = "http://schemas.microsoft.com/mapi/proptag/0x1035001E"
def is_outlook_running():
return any(p.name().lower() == "outlook.exe" for p in psutil.process_iter())
def find_outlook_path():
try:
key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
r"SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\OUTLOOK.EXE")
path, _ = winreg.QueryValueEx(key, "")
winreg.CloseKey(key)
return path
except FileNotFoundError:
return None
def ensure_outlook_running():
outlook_path = find_outlook_path()
print(f"Cesta k Outlooku: {outlook_path}")
if is_outlook_running():
print("Outlook již běží.")
else:
if not outlook_path:
print("CHYBA: Outlook nenalezen v registru.")
exit(1)
print("Outlook neběží, spouštím...")
subprocess.Popen([outlook_path])
print("Čekám na inicializaci Outlooku", end="", flush=True)
for _ in range(30):
time.sleep(2)
print(".", end="", flush=True)
if is_outlook_running():
break
print()
time.sleep(20) # dát čas MAPI vrstvě nastartovat
def init_db(conn):
conn.execute("""
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
message_id TEXT NOT NULL,
subject TEXT,
sender TEXT,
received_at TEXT,
folder TEXT,
source TEXT,
uploaded_at TEXT DEFAULT (datetime('now'))
)
""")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_message_id ON messages(message_id)")
conn.commit()
def is_uploaded(conn, message_id):
row = conn.execute(
"SELECT 1 FROM messages WHERE message_id = ? LIMIT 1", (message_id,)
).fetchone()
return row is not None
def save_to_db(conn, message_id, subject, sender, received_at, folder, source):
conn.execute("""
INSERT OR IGNORE INTO messages (message_id, subject, sender, received_at, folder, source)
VALUES (?, ?, ?, ?, ?, ?)
""", (message_id, subject, sender, received_at, folder, source))
conn.commit()
def upload_db(db_path):
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"jnjemails_{timestamp}.db"
with open(db_path, "rb") as f:
resp = requests.post(
"https://msgs.buzalka.cz/upload-db",
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=60
)
print(f" DB upload: {resp.json()}")
def upload_msg(msg_path, filename):
with open(msg_path, "rb") as f:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (filename, f, "application/octet-stream")},
timeout=30
)
resp.raise_for_status()
return resp.json()["status"]
def get_folder_resume_date(conn, folder_path):
row = conn.execute(
"SELECT MAX(received_at) FROM messages WHERE folder = ?",
(folder_path,)
).fetchone()
if not row or not row[0]:
return None
last_dt = datetime.fromisoformat(row[0])
return last_dt - timedelta(hours=1)
def process_folder(conn, folder, source, folder_path="", counter=None):
if counter is None:
counter = [0]
current_path = f"{folder_path}/{folder.Name}"
try:
resume_dt = get_folder_resume_date(conn, current_path)
items = folder.Items
if resume_dt:
resume_str = resume_dt.strftime("%Y/%m/%d %H:%M:%S")
filter_str = f"@SQL=\"urn:schemas:httpmail:datereceived\" > '{resume_str}'"
items = folder.Items.Restrict(filter_str)
print(f"\n Složka: {current_path} | pokračuji od: {resume_str}")
else:
print(f"\n Složka: {current_path} | od začátku")
items.Sort("[ReceivedTime]", False)
count = 0
skipped = 0
for item in items:
try:
if not item.MessageClass.upper().startswith("IPM.NOTE"):
continue
try:
mid = item.PropertyAccessor.GetProperty(PR_INTERNET_MESSAGE_ID)
except:
mid = None
if not mid:
mid = f"entryid:{item.EntryID}"
if is_uploaded(conn, mid):
skipped += 1
continue
with tempfile.TemporaryDirectory() as tmp:
safe_name = f"{item.EntryID[-20:]}.msg"
tmp_path = Path(tmp) / safe_name
item.SaveAs(str(tmp_path), 3)
status = upload_msg(tmp_path, safe_name)
received = item.ReceivedTime.isoformat() if item.ReceivedTime else None
save_to_db(conn, mid, item.Subject, item.SenderEmailAddress,
received, current_path, source)
counter[0] += 1
count += 1
if counter[0] % 1000 == 0:
print(f" → celkem {counter[0]} emailů přeneseno, uploaduji DB...")
upload_db(DB_PATH)
print(f" {status.upper():6} | {item.Subject[:60]}")
except Exception as e:
print(f" CHYBA | {getattr(item, 'Subject', '?')[:40]} | {e}")
print(f" → složka hotova: přeneseno {count} | skip {skipped}")
except Exception as e:
print(f" CHYBA složka {current_path}: {e}")
for subfolder in folder.Folders:
process_folder(conn, subfolder, source, current_path, counter)
# --- MAIN ---
ensure_outlook_running()
Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(DB_PATH)
init_db(conn)
outlook = win32com.client.Dispatch("Outlook.Application")
ns = outlook.GetNamespace("MAPI")
for i in range(1, ns.Folders.Count + 1):
root = ns.Folders.Item(i)
# if "Archive" in root.Name:
# print(f"\n=== {root.Name} — přeskočeno ===")
# continue
source = "mailbox"
print(f"\n=== {root.Name} ({source}) ===")
process_folder(conn, root, source)
# Finální DB upload po dokončení
print("\nFinální upload DB...")
upload_db(DB_PATH)
conn.close()
print("\nHotovo.")
+248
View File
@@ -0,0 +1,248 @@
# Název: janssenpc_file_send.py
# Verze: 2.1
# Datum: 2026-05-28
# Popis: Přejmenuje soubory ve složce ##JNJPrenos, odešle je na msgs.buzalka.cz
# a přesune do podsložky Trash. Loguje průběh do file_send.log vedle skriptu.
# Podporuje: PANORAMA Site Contacts (xlsx), Panorama Dashboard (xlsx),
# Site Visit Report (xlsx), Follow-Up Letter (xlsx),
# Clario MayoScore (csv), Clario MayoDiary (csv).
import os
import time
import shutil
import requests
import pandas as pd
from pathlib import Path
from datetime import datetime
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload-dropbox"
SOURCE_DIR = Path(r"C:\Users\vbuzalka\OneDrive - JNJ\##JNJPrenos")
TRASH_DIR = SOURCE_DIR / "Trash"
LOG_FILE = Path(__file__).parent / "file_send.log"
MAYO_DIARY_COLUMNS = [
'Protocol', 'Country', 'Site', 'PI Name', 'Subject ID',
'Report Date', 'Report Start Date/Time', 'Report End Date/Time',
'Stool Frequency', 'Form Number', 'Role', 'Original Source',
]
MAYO_SCORE_COLUMNS = [
'Protocol', 'Study Population', 'Country', 'Site', 'Principal Investigator',
'Participant ID', 'Baseline Stool Frequency', 'Visit', 'Visit Date',
'Endoscopy Completed?', 'Central Endoscopy Score', 'Local Endoscopy Score',
'Partial Mayo Score', 'Full Mayo Score',
]
PANORAMA_COLUMNS = [
'Part', 'Source', 'Sector', 'TA', 'Protocol ID', 'Interventional',
'Region', 'Country Name', 'Institution Name', 'Site City',
'Site Zip/Postal Code', 'Site Address', 'MSID', 'Site ID',
'Site Status', 'SM Full Name', 'PI Name', 'St F Subj Enr Act',
'ID', 'Category', 'Type', 'Priority', 'Severity', 'Description',
'Brief Description - Subject ID', 'Comments', 'Created By',
'Create Date', 'Last Modified Date', 'Start Date', 'Due Date',
'End Date', 'Status', 'Days Outstanding', 'Action Taken',
'Escalated To', 'Visit Report Status', 'Visit Report Approved',
'Visit Report Type', 'Visit Report Status End Date', 'Active',
'Association', 'Deviation', 'Deviation Closed Date', 'Reason For Exclusion'
]
def log(msg: str):
ts = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
line = f"[{ts}] {msg}"
print(line)
with LOG_FILE.open("a", encoding="utf-8") as lf:
lf.write(line + "\n")
def move_to_trash(f: Path):
TRASH_DIR.mkdir(exist_ok=True)
dest = TRASH_DIR / f.name
if dest.exists():
ts = datetime.now().strftime('%Y%m%d_%H%M%S')
dest = TRASH_DIR / f"{f.stem}_{ts}{f.suffix}"
shutil.move(str(f), dest)
def get_timestamp(file_path: str) -> str:
return datetime.fromtimestamp(os.path.getmtime(file_path)).strftime('%Y-%m-%d_%H-%M-%S')
def prejmenuj(directory: Path) -> None:
log(f"--- Přejmenování, adresář: {directory} ---")
files = [f for f in directory.iterdir() if f.is_file()]
log(f" Nalezeno souborů: {len(files)}{[f.name for f in files]}")
for f in files:
filename = f.name
file_path = str(f)
# 0a. CLARIO MAYO DIARY (CSV)
if 'MAYO-DIARY' in filename and filename.endswith('.csv'):
log(f" Detekován MayoDiary: {filename}")
try:
df = pd.read_csv(file_path)
missing = set(MAYO_DIARY_COLUMNS) - set(df.columns)
if not missing:
protocols = df['Protocol'].dropna().unique()
log(f" Protocol: {list(protocols)}")
if len(protocols) > 0:
study = str(protocols[0]).strip()
new_name = f"{get_timestamp(file_path)} {study} Clario MayoDiary.csv"
f.rename(directory / new_name)
log(f" ÚSPĚCH: -> '{new_name}'")
else:
log(f" VAROVÁNÍ: Sloupec Protocol je prázdný.")
else:
log(f" PŘESKOČENO: Chybí sloupce: {missing}")
except Exception as e:
log(f" CHYBA: {e}")
continue
# 0b. CLARIO MAYO SCORE (CSV)
if 'Custom.MayoScoreReport' in filename and filename.endswith('.csv'):
log(f" Detekován MayoScore: {filename}")
try:
df = pd.read_csv(file_path)
missing = set(MAYO_SCORE_COLUMNS) - set(df.columns)
if not missing:
protocols = df['Protocol'].dropna().unique()
log(f" Protocol: {list(protocols)}")
if len(protocols) > 0:
study = str(protocols[0]).strip()
new_name = f"{get_timestamp(file_path)} {study} Clario MayoScore.csv"
f.rename(directory / new_name)
log(f" ÚSPĚCH: -> '{new_name}'")
else:
log(f" VAROVÁNÍ: Sloupec Protocol je prázdný.")
else:
log(f" PŘESKOČENO: Chybí sloupce: {missing}")
except Exception as e:
log(f" CHYBA: {e}")
continue
# Ostatní — jen xlsx
if not filename.endswith('.xlsx'):
log(f" Přeskočeno (neznámý typ): {filename}")
continue
# 1a. PANORAMA SITE CONTACTS (XLSX) — soubor pojmenovaný "PANORAMA Dashboard"
if 'PANORAMA Dashboard' in filename:
log(f" Detekován PANORAMA Site Contacts: {filename}")
try:
with pd.ExcelFile(file_path) as xl:
sheet_names = xl.sheet_names
if 'Site Contacts' in sheet_names:
df_a1 = xl.parse('Site Contacts', nrows=1, header=None)
a1 = str(df_a1.iloc[0, 0]) if not df_a1.empty else ''
else:
a1 = None
# soubor je nyní zavřen — přejmenování proběhne bez chyby
if a1 is None:
log(f" PŘESKOČENO: List 'Site Contacts' nenalezen.")
elif 'Title: Site Contacts' in a1:
new_name = f"{get_timestamp(file_path)} PANORAMA Site Contacts.xlsx"
f.rename(directory / new_name)
log(f" ÚSPĚCH: -> '{new_name}'")
else:
log(f" PŘESKOČENO: A1 neodpovídá vzoru ({a1[:50]})")
except Exception as e:
log(f" CHYBA: {e}")
continue
# 1. PANORAMA DASHBOARD (XLSX)
if 'Panorama Dashboard' in filename:
log(f" Detekován Panorama: {filename}")
try:
df = pd.read_excel(file_path, skiprows=5)
missing = set(PANORAMA_COLUMNS) - set(df.columns)
if not missing:
ids = df['Protocol ID'].dropna().unique()
log(f" Protocol ID: {list(ids)}")
if len(ids) > 0:
study = str(ids[0]).strip()
new_name = f"{get_timestamp(file_path)} {study} Panorama Deviations and Issues.xlsx"
f.rename(directory / new_name)
log(f" ÚSPĚCH: -> '{new_name}'")
else:
log(f" VAROVÁNÍ: Protocol ID je prázdný.")
else:
log(f" PŘESKOČENO: Chybí sloupce: {missing}")
except Exception as e:
log(f" CHYBA: {e}")
continue
# 2. SITE VISIT REPORT A FOLLOW-UP LETTER (XLSX)
try:
df_a1 = pd.read_excel(file_path, nrows=1, header=None)
if not df_a1.empty:
a1 = str(df_a1.iloc[0, 0])
log(f" A1: {a1[:80]}")
is_site_visit = "Title: Site Visit Report Details" in a1
is_follow_up = "Title: Follow-Up Letter Details" in a1
if is_site_visit or is_follow_up:
suffix = "Site Visit Details.xlsx" if is_site_visit else "FUL details.xlsx"
log(f" Detekován {'Site Visit' if is_site_visit else 'Follow-Up Letter'}: {filename}")
df = pd.read_excel(file_path, skiprows=5)
if 'Protocol ID' in df.columns:
ids = df['Protocol ID'].dropna().unique()
log(f" Protocol ID: {list(ids)}")
if len(ids) > 0:
study = str(ids[0]).strip()
new_name = f"{get_timestamp(file_path)} {study} {suffix}"
f.rename(directory / new_name)
log(f" ÚSPĚCH: -> '{new_name}'")
else:
log(f" VAROVÁNÍ: Protocol ID je prázdný.")
else:
log(f" PŘESKOČENO: Chybí sloupec Protocol ID.")
else:
log(f" Přeskočeno (neznámý xlsx obsah): {filename}")
except Exception as e:
log(f" CHYBA: {e}")
log("--- Přejmenování dokončeno ---")
# === HLAVNÍ LOGIKA ===
log("=== Spuštění ===")
log(f"Zdrojový adresář: {SOURCE_DIR} (existuje: {SOURCE_DIR.exists()})")
# 1. Přejmenuj
prejmenuj(SOURCE_DIR)
# 2. Počkej 10 vteřin
log("Čekám 10 vteřin...")
time.sleep(10)
# 3. Odešli soubory
files = [f for f in SOURCE_DIR.iterdir() if f.is_file()]
log(f"Souborů k odeslání: {len(files)}")
for f in files:
log(f" Nalezen: {f.name}")
if not files:
log("Žádné soubory k odeslání.")
else:
for f in files:
try:
with f.open("rb") as fh:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (f.name, fh, "application/octet-stream")},
timeout=120,
)
resp.raise_for_status()
status = resp.json().get('status', '?').upper()
log(f" {status:10} | {f.name}")
move_to_trash(f)
log(f" PŘESUNUTO | {f.name} -> Trash")
except Exception as e:
log(f" CHYBA | {f.name} | {e}")
log("=== Hotovo ===")
+59
View File
@@ -0,0 +1,59 @@
import time
import requests
from pathlib import Path
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
TOKEN = "13e1bb01-9fd5-44a8-8ce9-4ee27133d340"
UPLOAD_URL = "https://msgs.buzalka.cz/upload-dropbox"
SOURCE_DIR = Path(r"C:\Users\vbuzalka\OneDrive - JNJ\##JNJPrenos")
def upload_file(f: Path):
time.sleep(2)
if not f.exists() or not f.is_file():
return
try:
with f.open("rb") as fh:
resp = requests.post(
UPLOAD_URL,
headers={"Authorization": f"Bearer {TOKEN}"},
files={"file": (f.name, fh, "application/octet-stream")},
timeout=120,
)
resp.raise_for_status()
print(f" UPLOADED | {f.name}")
f.unlink()
except Exception as e:
print(f" CHYBA | {f.name} | {e}")
class NewFileHandler(FileSystemEventHandler):
def on_created(self, event):
if event.is_directory:
return
upload_file(Path(event.src_path))
def on_moved(self, event):
if event.is_directory:
return
upload_file(Path(event.dest_path))
if __name__ == "__main__":
# Při startu odešli soubory, které už tam jsou
for f in SOURCE_DIR.iterdir():
if f.is_file():
upload_file(f)
observer = Observer()
observer.schedule(NewFileHandler(), str(SOURCE_DIR), recursive=False)
observer.start()
print(f"Hlídám: {SOURCE_DIR}")
try:
while True:
time.sleep(1)
except KeyboardInterrupt:
observer.stop()
observer.join()
+164
View File
@@ -0,0 +1,164 @@
#!/usr/bin/env python3
"""
mcp_jnjemails.py | v1.0 | 2026-06-01
MCP server pro dotazování SQLite DB jnjemails (EmailsImport pipeline).
Automaticky načte nejnovější jnjemails_*.db z \\tower\JNJEMAILS\db\.
"""
import sqlite3
import sys
from pathlib import Path
from mcp.server.fastmcp import FastMCP
DB_DIR = Path(r"\\tower\JNJEMAILS\db")
def log(msg: str):
print(msg, file=sys.stderr, flush=True)
def get_latest_db() -> Path:
files = sorted(DB_DIR.glob("jnjemails_*.db"), key=lambda f: f.name)
if not files:
raise FileNotFoundError(f"Žádný jnjemails_*.db soubor v {DB_DIR}")
return files[-1]
def query(sql: str, params=()) -> list[dict]:
db_path = get_latest_db()
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
try:
cur = conn.execute(sql, params)
return [dict(row) for row in cur.fetchall()]
finally:
conn.close()
mcp = FastMCP("jnjemails")
@mcp.tool()
def db_info() -> dict:
"""Informace o aktuální DB: cesta, velikost, název souboru."""
db_path = get_latest_db()
size_kb = db_path.stat().st_size // 1024
return {
"path": str(db_path),
"file": db_path.name,
"size_kb": size_kb,
}
@mcp.tool()
def last_run() -> list[dict]:
"""Kompletní log posledního běhu (runs + log tabulky)."""
return query("""
SELECT r.script, r.version, r.started_at, r.finished_at,
r.transferred, r.skipped, r.errors,
l.level, l.event, l.subject, l.folder, l.detail, l.created_at
FROM log l
JOIN runs r ON r.id = l.run_id
WHERE l.run_id = (SELECT MAX(id) FROM runs)
ORDER BY l.created_at
""")
@mcp.tool()
def list_runs(limit: int = 20) -> list[dict]:
"""Přehled všech běhů skriptů (nejnovější nahoře)."""
return query("""
SELECT id, script, version, started_at, finished_at,
transferred, skipped, errors
FROM runs
ORDER BY started_at DESC
LIMIT ?
""", (limit,))
@mcp.tool()
def errors_last_run() -> list[dict]:
"""Chyby z posledního běhu."""
return query("""
SELECT l.event, l.subject, l.folder, l.detail, l.created_at
FROM log l
WHERE l.run_id = (SELECT MAX(id) FROM runs)
AND l.level = 'ERROR'
ORDER BY l.created_at
""")
@mcp.tool()
def run_log(run_id: int) -> list[dict]:
"""Log konkrétního běhu podle ID (z list_runs)."""
return query("""
SELECT level, event, subject, folder, graph_id, detail, created_at
FROM log
WHERE run_id = ?
ORDER BY created_at
""", (run_id,))
@mcp.tool()
def search_messages(subject: str = "", sender: str = "", folder: str = "", limit: int = 50) -> list[dict]:
"""Hledání v přenesených emailech. Všechny parametry jsou volitelné (LIKE, case-insensitive)."""
conditions = []
params = []
if subject:
conditions.append("subject LIKE ?")
params.append(f"%{subject}%")
if sender:
conditions.append("sender LIKE ?")
params.append(f"%{sender}%")
if folder:
conditions.append("jnj_folder LIKE ?")
params.append(f"%{folder}%")
where = f"WHERE {' AND '.join(conditions)}" if conditions else ""
params.append(limit)
return query(f"""
SELECT id, message_id, subject, sender, received_at,
jnj_folder, graph_id, is_read, source, uploaded_at
FROM messages
{where}
ORDER BY received_at DESC
LIMIT ?
""", params)
@mcp.tool()
def messages_stats() -> dict:
"""Statistiky: celkový počet emailů, podle source, podle složky (top 10)."""
total = query("SELECT COUNT(*) as cnt FROM messages")[0]["cnt"]
by_source = query("SELECT source, COUNT(*) as cnt FROM messages GROUP BY source ORDER BY cnt DESC")
by_folder = query("""
SELECT jnj_folder, COUNT(*) as cnt FROM messages
WHERE jnj_folder IS NOT NULL
GROUP BY jnj_folder
ORDER BY cnt DESC
LIMIT 10
""")
no_graph = query("SELECT COUNT(*) as cnt FROM messages WHERE graph_id IS NULL")[0]["cnt"]
return {
"total": total,
"no_graph_id": no_graph,
"by_source": by_source,
"top_folders": by_folder,
}
@mcp.tool()
def sql(query_str: str) -> list[dict]:
"""Libovolný SELECT dotaz proti DB. Pouze SELECT (ostatní jsou blokovány)."""
stripped = query_str.strip().upper()
if not stripped.startswith("SELECT"):
return [{"error": "Povoleny jsou pouze SELECT dotazy."}]
try:
return query(query_str)
except Exception as e:
return [{"error": str(e)}]
if __name__ == "__main__":
db_path = get_latest_db()
log(f"jnjemails MCP server — DB: {db_path.name}")
mcp.run()
+15
View File
@@ -0,0 +1,15 @@
2026-06-01 14:58:02 | open failed [899C0000969E661B0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000002, and it could not be determined automatically.
2026-06-01 14:59:10 | open failed [899C0000987BE65C0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 14:59:39 | open failed [899C00009A2FBD760000.msg]: Attachment method missing on attachment __attach_version1.0_#00000002, and it could not be determined automatically.
2026-06-01 14:59:56 | open failed [899C00009B8A9A000000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:00:07 | open failed [899C00009C3844690000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:00:17 | open failed [899C00009C3844720000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:08:39 | open failed [899C0000A4683A4B0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:09:12 | open failed [899C0000A5CC64F40000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:09:32 | open failed [899C0000A5CC64FE0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:18:06 | open failed [899C0000B30157A10000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:18:08 | open failed [899C0000B30157A80000.msg]: Attachments of type data MUST have a data stream.
2026-06-01 15:18:50 | open failed [899C0000B588253A0000.msg]: Attachment method missing on attachment __attach_version1.0_#00000003, and it could not be determined automatically.
2026-06-01 15:22:48 | open failed [899C0000BE168C140000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:24:07 | open failed [899C0000C0CF55D50000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
2026-06-01 15:24:18 | open failed [899C0000C1CA96890000.msg]: Attachment method missing on attachment __attach_version1.0_#00000001, and it could not be determined automatically.
+322
View File
@@ -0,0 +1,322 @@
# parse_emails_v1.0
**Název:** parse_emails_v1.0.py
**Verze:** 1.0
**Datum:** 2026-06-01
**Autor:** vladimir.buzalka
---
## Účel
Jednorázový import všech `.msg` souborů do MongoDB. Z každého souboru extrahuje **všechny dostupné vlastnosti** — podobně jako EXIF u fotek.
- **DB:** `emaily`
- **Kolekce:** `vbuzalka@its.jnj.com`
- `_id` = Internet Message-ID (nebo `filename:<stem>` jako fallback)
- Bezpečné přerušit a opakovat — upsert podle `_id`
---
## Zdroje dat
Výhradně `.msg` soubory — žádná závislost na SQLite ani jiné DB.
| Z každého .msg se extrahuje |
|---|
| Předmět, normalized subject |
| Odesílatel (email, jméno, SMTP adresa) |
| Příjemci To/CC/BCC — strukturovaně `[{type, email, name}]` |
| Čas doručení a odeslání (UTC) |
| Tělo plaintext + HTML (max 2 MB) |
| Přílohy — metadata: jméno, velikost, MIME typ, inline flag |
| Internet headers — X-Originating-IP, Received, DKIM, X-Mailer, ... |
| MAPI: důležitost, citlivost, příznak, konverzační vlákno, kategorie |
| In-Reply-To, References — pro rekonstrukci vlákna |
| Všechny raw MAPI properties jako `{0xXXXX: value}` |
---
## Konfigurace
Konstanty přímo v kódu:
| Konstanta | Hodnota |
|---|---|
| `MSGS_DIR` | `\\tower\JNJEMAILS` |
| `MONGO_URI` | `mongodb://192.168.1.76:27017` |
| `MONGO_DB` | `emaily` |
| `MONGO_COL` | `vbuzalka@its.jnj.com` |
| `BATCH_SIZE` | 200 dokumentů na jeden bulk_write |
| `LOG_FILE` | `parse_emails_errors.log` (vedle skriptu) |
---
## Spouštění
**Venv:** `U:\PythonProject\Janssen\.venv\Scripts\python.exe`
**1. spuštění — kompletní import:**
```cmd
"U:\PythonProject\Janssen\.venv\Scripts\python.exe" "U:\PythonProject\Janssen\EmailsImport\parse_emails_v1.0.py"
```
**Pokračování po přerušení (druhý den):**
```cmd
"U:\PythonProject\Janssen\.venv\Scripts\python.exe" "U:\PythonProject\Janssen\EmailsImport\parse_emails_v1.0.py" --skip-existing
```
**Test na malém vzorku:**
```cmd
"U:\PythonProject\Janssen\.venv\Scripts\python.exe" "U:\PythonProject\Janssen\EmailsImport\parse_emails_v1.0.py" --limit 50 --no-indexes
```
### Všechny parametry
| Parametr | Popis |
|---|---|
| `--skip-existing` | Načte seznam hotových souborů z MongoDB a přeskočí je. Použij pro pokračování po přerušení. |
| `--limit N` | Zpracuje jen prvních N souborů. Vhodné pro test. |
| `--no-indexes` | Nevytváří indexy na konci. Použij pokud je přerušíš uprostřed — indexy vytvoř ručně až je vše hotové. |
---
## Průběh na konzoli
Každý email na jednom řádku:
```
1/69371 OK RE: Protocol deviation CZ10022 jan.novak@its.jnj.com
2/69371 OK UCO3001: Draft FUL pro DD5-CZ10022 monitor@4gclinical.com
3/69371 ERR ? ?
```
Každých 500 emailů oddělovač s průběhem:
```
────────────────────────────────────────────────────────────────────────────────
Průběh: ok=498 err=2 0.4 msg/s ETA 47h12m
────────────────────────────────────────────────────────────────────────────────
```
Na konci souhrn:
```
====================================================
Vysledek: ok=69300 | skip=0 | err=71
Celkovy cas: 47h 23m 10s
Dokumentu v kolekci: 69300
```
---
## Struktura dokumentu v MongoDB
```json
{
"_id": "<message-id@domain>",
"filename": "7A3F...0000.msg",
"subject": "RE: Protocol deviation CZ10022",
"normalized_subject": "Protocol deviation CZ10022",
"importance": 1,
"sensitivity": 0,
"flag_status": 0,
"read_receipt_requested": false,
"delivery_receipt_requested": false,
"has_attachments": true,
"attachment_count": 1,
"message_size_bytes": 284512,
"conversation_topic": "Protocol deviation CZ10022",
"conversation_index": "AcqX...",
"in_reply_to": "<prev-id@domain>",
"internet_references": ["<ref1@domain>", "<ref2@domain>"],
"categories": ["UCO3001"],
"received_at": "2026-05-15T09:23:11",
"sent_at": "2026-05-15T09:21:44",
"sender": {
"email": "jan.novak@its.jnj.com",
"name": "Novák Jan",
"smtp": "jan.novak@its.jnj.com"
},
"to": "vladimir.buzalka@its.jnj.com",
"cc": "petra.free@its.jnj.com",
"bcc": "",
"display_to": "Buzalka Vladimir",
"display_cc": "Free Petra",
"recipients": [
{ "type": "to", "email": "vbuzalka@its.jnj.com", "name": "Buzalka Vladimir" },
{ "type": "cc", "email": "petra.free@its.jnj.com", "name": "Free Petra" }
],
"body_text": "Dobrý den,\n\nposílám...",
"body_html": "<html>...",
"attachments": [
{
"filename": "PD_report_CZ10022.pdf",
"size_bytes": 284512,
"mime_type": "application/pdf",
"content_id": null,
"is_inline": false
}
],
"headers": {
"message_id": "<xxx@domain>",
"x_originating_ip": "10.24.1.55",
"x_mailer": "Microsoft Outlook 16.0",
"received": ["from SMTP01...", "from EX2019..."],
"x_spam_status": "No"
},
"mapi": {
"0x0017": 1,
"0x0036": 0,
"0x0070": "Protocol deviation CZ10022",
"0x1035": "<message-id@domain>",
"0x1042": "<prev-id@domain>"
},
"parsed_at": "2026-06-01T20:00:00"
}
```
---
## Hodnotové kódy
| Pole | Hodnota | Význam |
|---|---|---|
| `importance` | 0 | Nízká |
| | 1 | Normální |
| | 2 | Vysoká |
| `sensitivity` | 0 | Normální |
| | 1 | Osobní |
| | 2 | Soukromé |
| | 3 | Důvěrné |
| `flag_status` | 0 | Bez příznaku |
| | 1 | Označeno (follow up) |
| | 2 | Dokončeno |
---
## MongoDB indexy
Automaticky vytvořeny na konci importu (`--no-indexes` přeskočí):
| Index | Pole |
|---|---|
| Chronologický | `received_at`, `sent_at` |
| Odesílatel | `sender.email` |
| Soubor | `filename` (unique) |
| Konverzace | `conversation_topic` |
| Filtry | `has_attachments`, `categories`, `importance`, `flag_status` |
| Full-text | `subject` + `body_text` + `to` + `cc` (text index `text_search`) |
---
## Ukázkové dotazy (MongoDB shell / MCP)
**Emaily o UCO3001 s přílohou:**
```javascript
db["vbuzalka@its.jnj.com"].find({
$text: { $search: "UCO3001" },
has_attachments: true
}).sort({ received_at: -1 })
```
**Emaily od konkrétního odesílatele:**
```javascript
db["vbuzalka@its.jnj.com"].find({
"sender.email": /covance/i
}).sort({ received_at: -1 })
```
**Celé konverzační vlákno:**
```javascript
db["vbuzalka@its.jnj.com"].find({
conversation_topic: "Protocol deviation CZ10022"
}).sort({ received_at: 1 })
```
**Označené emaily (follow up):**
```javascript
db["vbuzalka@its.jnj.com"].find({ flag_status: 1 })
```
**Vysoká priorita s přílohou:**
```javascript
db["vbuzalka@its.jnj.com"].find({
importance: 2,
has_attachments: true
}).sort({ received_at: -1 })
```
**Statistiky podle odesílatele (top 20):**
```javascript
db["vbuzalka@its.jnj.com"].aggregate([
{ $group: { _id: "$sender.email", count: { $sum: 1 } } },
{ $sort: { count: -1 } },
{ $limit: 20 }
])
```
**Emaily s PDF přílohou:**
```javascript
db["vbuzalka@its.jnj.com"].find({
"attachments.mime_type": "application/pdf"
})
```
**Hledání v těle emailu:**
```javascript
db["vbuzalka@its.jnj.com"].find({
$text: { $search: "inactivation notification" }
})
```
---
## Chybový log
Soubory které selhaly jsou zalogrovány do `parse_emails_errors.log` vedle skriptu:
```
2026-06-01 20:14:33 | open failed [7A3F...0000.msg]: <důvod>
2026-06-01 20:15:01 | extract_message failed [8B2C...0000.msg]: <důvod>
```
---
## Výkon
| Parametr | Hodnota |
|---|---|
| Počet souborů | ~69 000 |
| Rychlost | ~0.4 msg/s (síť SMB + htmlBody dekódování) |
| Odhadovaný čas | 48 hodin (přes noc, s pokračováním) |
| Batch size | 200 dokumentů / bulk_write |
| Odhadovaná velikost DB | 25 GB |
---
## Závislosti
```
extract-msg==0.55.0
pymongo
python-dateutil
```
Instalace do venv:
```cmd
"U:\PythonProject\Janssen\.venv\Scripts\pip.exe" install extract-msg pymongo python-dateutil
```
---
## Historie verzí
| Verze | Datum | Změna |
|---|---|---|
| 1.0 | 2026-06-01 | Iniciální verze |
+644
View File
@@ -0,0 +1,644 @@
"""
parse_emails_v1.0.py
Nazev: parse_emails_v1.0.py
Verze: 1.0
Datum: 2026-06-01
Autor: vladimir.buzalka
Popis:
Parsuje vsechny .msg soubory z MSGS_DIR a importuje je jako dokumenty
do MongoDB. Z kazdeho souboru extrahuje VSECHNY dostupne vlastnosti —
podobne jako EXIF u fotek:
- predmet, odesilatel, prijemci (To/CC/BCC s typy)
- cas doruceni a odeslani (UTC)
- telo plaintext + HTML (max 2 MB)
- prilohy (metadata: jmeno, velikost, MIME typ, inline flag)
- internet headers (X-Originating-IP, Received, DKIM, ...)
- MAPI vlastnosti: dulezitost, citlivost, priznak, konverzacni vlakno,
kategorie, In-Reply-To, References, ...
- vsechny raw MAPI properties jako {0xXXXX: value}
DB: emaily
Kolekce: vbuzalka@its.jnj.com
_id: Internet Message-ID (nebo "filename:<stem>" jako fallback)
Bezpecne prerusit a opakovat:
- upsert podle _id — duplicity se automaticky prepisi
- --skip-existing nacte seznam hotovych souboru z MongoDB a
preskoci je => pokracovani po preruseni bez ztraty prace
Spousteni:
python parse_emails_v1.0.py # kompletni import
python parse_emails_v1.0.py --limit 50 # test na prvnich 50
python parse_emails_v1.0.py --skip-existing # pokracovani po preruseni
python parse_emails_v1.0.py --no-indexes # bez vytvoreni indexu na konci
Vystup na konzoli:
Kazdy email na jednom radku:
<poradi>/<celkem> OK/ERR <predmet 60 znaku> <odesilatel>
Kazych 500 emailu: oddelovac s prubehem, rychlosti a ETA.
Na konci: souhrn ok/skip/err, celkovy cas, pocet dokumentu v kolekci.
Zavislosti:
pymongo, extract-msg, python-dateutil
Python 3.10+, Windows nebo Linux
Pristup k \\\\tower\\JNJEMAILS (SMB share)
MongoDB na 192.168.1.76:27017
Struktura dokumentu v MongoDB:
_id Internet Message-ID (nebo filename: fallback)
filename jmeno .msg souboru (20znakovy hex + .msg)
subject predmet zpravy
normalized_subject predmet bez RE:/FW: prefixu
importance 0=nizka 1=normalni 2=vysoka
sensitivity 0=normalni 1=osobni 2=soukrome 3=duverne
flag_status 0=bez priznaku 1=oznaceno 2=dokonceno
read_receipt_requested bool
delivery_receipt_requested bool
has_attachments bool
attachment_count int
message_size_bytes velikost .msg souboru na disku
conversation_topic tema vlakna (PR_CONVERSATION_TOPIC)
conversation_index base64 PR_CONVERSATION_INDEX
in_reply_to Message-ID predchozi zpravy
internet_references [Message-ID] — cela historia vlakna
categories [str] — MAPI kategorie / stitky
read_receipt_requested bool
delivery_receipt_requested bool
received_at datetime UTC — cas doruceni
sent_at datetime UTC — cas odeslani
sender.email emailova adresa odesilatele
sender.name zobrazovane jmeno odesilatele
sender.smtp SMTP adresa (pro interni EX adresy)
to retezec To (tak jak v Outlooku)
cc retezec CC
bcc retezec BCC
display_to PR_DISPLAY_TO (zkraceny seznam)
display_cc PR_DISPLAY_CC
recipients [{type, email, name}] — to/cc/bcc s typy
body_text plain text telo
body_html HTML telo (max 2 MB, None pokud neni)
attachments [{filename, size_bytes, mime_type,
content_id, is_inline}]
headers dict internet headers (lowercase_s_podtrzitky)
mapi dict vsech raw MAPI properties {0xXXXX: value}
parsed_at datetime UTC — cas parsovani
Indexy (vytvoreny automaticky na konci):
received_at, sent_at, sender.email, filename (unique),
conversation_topic, has_attachments, categories, importance,
flag_status, text_search (subject + body_text + to + cc)
Chyby:
Soubory ktere selhaly jsou zalogiovany do parse_emails_errors.log
v adresari skriptu. Radek: timestamp | open/extract failed | duvod.
Historie verzi:
1.0 2026-06-01 Inicialni verze
"""
import sys
import re
import logging
import argparse
import base64
from pathlib import Path
from datetime import datetime, timezone
from typing import Optional
import extract_msg
from dateutil import parser as dtparser
from pymongo import MongoClient, UpdateOne, ASCENDING, TEXT
if hasattr(sys.stdout, "reconfigure"):
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
# ─── KONFIGURACE ──────────────────────────────────────────────────────────────
MSGS_DIR = Path(r"\\tower\JNJEMAILS")
MONGO_URI = "mongodb://192.168.1.76:27017"
MONGO_DB = "emaily"
MONGO_COL = "vbuzalka@its.jnj.com"
BATCH_SIZE = 200
LOG_FILE = Path(__file__).parent / "parse_emails_errors.log"
SCRIPT_VERSION = "1.0"
# ──────────────────────────────────────────────────────────────────────────────
logging.basicConfig(
filename=str(LOG_FILE),
level=logging.ERROR,
format="%(asctime)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
encoding="utf-8",
)
# ─── Pomocné funkce ───────────────────────────────────────────────────────────
def safe(obj, *attrs, default=None):
"""Bezpecne cteni atributu — vrati prvni non-None hodnotu."""
for attr in attrs:
try:
val = getattr(obj, attr, None)
if val is None:
continue
if isinstance(val, str) and not val.strip():
continue
return val
except Exception:
continue
return default
def parse_date(raw) -> Optional[datetime]:
"""Libovolny datum -> UTC datetime bez tzinfo (pro MongoDB)."""
if raw is None:
return None
if isinstance(raw, datetime):
if raw.tzinfo:
return raw.astimezone(timezone.utc).replace(tzinfo=None)
return raw
try:
dt = dtparser.parse(str(raw))
if dt.tzinfo:
return dt.astimezone(timezone.utc).replace(tzinfo=None)
return dt
except Exception:
return None
def to_bson(val):
"""Konvertuje hodnotu na BSON-serializovatelny typ."""
if isinstance(val, bytes):
return val.hex() if len(val) <= 128 else f"<bytes:{len(val)}>"
if isinstance(val, datetime):
return parse_date(val)
if isinstance(val, (str, int, float, bool, type(None))):
return val
if isinstance(val, list):
return [to_bson(v) for v in val]
try:
return int(val)
except Exception:
pass
return str(val)
# ─── Extrakce částí zprávy ────────────────────────────────────────────────────
def extract_headers(msg) -> dict:
headers = {}
try:
hdr = msg.header
if not hdr:
return {}
from email.header import decode_header as _dh
def _decode(v: str) -> str:
try:
parts = _dh(v)
out = ""
for part, enc in parts:
out += part.decode(enc or "utf-8", errors="replace") if isinstance(part, bytes) else part
return out
except Exception:
return v
for key in set(hdr.keys()):
k = key.lower().replace("-", "_")
vals = [_decode(v) for v in hdr.get_all(key, [])]
headers[k] = vals if len(vals) > 1 else (vals[0] if vals else "")
except Exception as e:
logging.error("extract_headers: %s", e)
return headers
def extract_recipients(msg) -> list:
result = []
type_map = {1: "to", 2: "cc", 3: "bcc"}
try:
for r in msg.recipients:
rtype = getattr(r, "type", 1)
try:
rtype = int(rtype)
except Exception:
try:
rtype = int(rtype.value)
except Exception:
rtype = 1
rec = {
"type": type_map.get(rtype, "to"),
"email": safe(r, "email", default=""),
"name": safe(r, "name", default=""),
}
result.append(rec)
except Exception as e:
logging.error("extract_recipients: %s", e)
return result
def extract_attachments(msg) -> list:
result = []
try:
for att in msg.attachments:
fname = safe(att, "longFilename", "shortFilename", default="")
if not fname:
continue
size = 0
try:
d = att.data
size = len(d) if d else 0
except Exception:
pass
result.append({
"filename": fname,
"size_bytes": size,
"mime_type": safe(att, "mimetype", "mimeType", default="application/octet-stream"),
"content_id": safe(att, "cid", default=None),
"is_inline": bool(safe(att, "isInline", default=False)),
})
except Exception as e:
logging.error("extract_attachments: %s", e)
return result
def extract_mapi_props(msg) -> dict:
"""Vsechny raw MAPI properties jako {0xXXXX: value}."""
result = {}
try:
props = msg.props
if not hasattr(props, "items"):
return {}
for key, prop in props.items():
try:
val = to_bson(prop.value)
prop_id = f"0x{key[:4].upper()}" if len(key) >= 4 else f"0x{key.upper()}"
result[prop_id] = val
except Exception:
pass
except Exception as e:
logging.error("extract_mapi_props: %s", e)
return result
# ─── Hlavní extrakce ─────────────────────────────────────────────────────────
def extract_message(msg_path: Path) -> Optional[dict]:
"""Parsuje jeden .msg soubor -> MongoDB dokument."""
try:
msg = extract_msg.Message(str(msg_path))
except Exception as e:
logging.error("open failed [%s]: %s", msg_path.name, e)
return None
try:
# ── Message-ID ────────────────────────────────────────────────
mid = None
for attr in ("messageId", "message_id", "internetMessageId"):
mid = safe(msg, attr)
if mid:
break
if not mid:
mid = f"filename:{msg_path.stem}"
mid = str(mid).strip()
# ── Předmět ───────────────────────────────────────────────────
try:
subject = msg.subject or ""
except Exception:
subject = ""
normalized_subject = safe(msg, "normalizedSubject", "normalized_subject", default="")
# ── Tělo ──────────────────────────────────────────────────────
try:
body_text = msg.body or ""
except Exception:
body_text = ""
body_html = None
try:
bh = msg.htmlBody
if isinstance(bh, bytes):
bh = bh.decode("utf-8", errors="replace")
if bh:
body_html = bh if len(bh) <= 2 * 1024 * 1024 else bh[:2 * 1024 * 1024]
except Exception:
pass
# ── Odesílatel ────────────────────────────────────────────────
try:
sender_email = msg.sender or ""
except Exception:
sender_email = ""
sender_name = safe(msg, "senderName", "sender_name", default="")
sender_smtp = safe(msg, "senderSmtpAddress", "sent_representing_smtp_address", default="")
# ── Příjemci ──────────────────────────────────────────────────
recipients = extract_recipients(msg)
try:
to_raw = msg.to or ""
except Exception:
to_raw = ""
try:
cc_raw = msg.cc or ""
except Exception:
cc_raw = ""
try:
bcc_raw = getattr(msg, "bcc", None) or ""
except Exception:
bcc_raw = ""
display_to = safe(msg, "displayTo", "display_to", default="")
display_cc = safe(msg, "displayCc", "display_cc", default="")
# ── Časy ──────────────────────────────────────────────────────
try:
received_at = parse_date(msg.date)
except Exception:
received_at = None
sent_at = None
for attr in ("clientSubmitTime", "client_submit_time", "sentOn"):
v = safe(msg, attr)
if v:
sent_at = parse_date(v)
break
# ── MAPI vlastnosti ───────────────────────────────────────────
importance = 1
try:
v = msg.importance
if v is not None:
importance = int(v)
except Exception:
pass
sensitivity = 0
try:
v = getattr(msg, "sensitivity", None)
if v is not None:
sensitivity = int(v)
except Exception:
pass
flag_status = 0
try:
v = safe(msg, "flagStatus", "flag_status")
if v is not None:
flag_status = int(v)
except Exception:
pass
conversation_topic = safe(msg, "conversationTopic", "conversation_topic", default="")
conversation_index = ""
try:
ci = safe(msg, "conversationIndex", "conversation_index")
if isinstance(ci, bytes):
conversation_index = base64.b64encode(ci).decode()
elif ci:
conversation_index = str(ci)
except Exception:
pass
in_reply_to = safe(msg, "inReplyTo", "in_reply_to", default="")
internet_refs = []
try:
refs = safe(msg, "internetReferences", "internet_references")
if isinstance(refs, list):
internet_refs = refs
elif isinstance(refs, str) and refs:
internet_refs = [r.strip() for r in refs.split() if r.strip()]
except Exception:
pass
categories = []
try:
cats = safe(msg, "categories")
if isinstance(cats, list):
categories = [str(c) for c in cats if c]
elif isinstance(cats, str) and cats:
categories = [c.strip() for c in re.split(r"[;,]", cats) if c.strip()]
except Exception:
pass
read_receipt = bool(safe(msg, "readReceiptRequested", "read_receipt_requested", default=False))
delivery_receipt = bool(safe(msg, "deliveryReceiptRequested", "delivery_receipt_requested", default=False))
# ── Internet headers ──────────────────────────────────────────
headers = extract_headers(msg)
if not in_reply_to:
in_reply_to = headers.get("in_reply_to", "")
if not internet_refs:
refs_str = headers.get("references", "")
if isinstance(refs_str, str) and refs_str:
internet_refs = [r.strip() for r in refs_str.split() if r.strip()]
# ── Přílohy ───────────────────────────────────────────────────
attachments = extract_attachments(msg)
# ── Raw MAPI ──────────────────────────────────────────────────
mapi_raw = extract_mapi_props(msg)
msg.close()
# ── Dokument ──────────────────────────────────────────────────
return {
"_id": mid,
"filename": msg_path.name,
"subject": subject,
"normalized_subject": normalized_subject,
"importance": importance,
"sensitivity": sensitivity,
"flag_status": flag_status,
"read_receipt_requested": read_receipt,
"delivery_receipt_requested": delivery_receipt,
"has_attachments": len(attachments) > 0,
"attachment_count": len(attachments),
"message_size_bytes": msg_path.stat().st_size,
"conversation_topic": conversation_topic,
"conversation_index": conversation_index,
"in_reply_to": in_reply_to,
"internet_references": internet_refs,
"categories": categories,
"received_at": received_at,
"sent_at": sent_at,
"sender": {
"email": sender_email,
"name": sender_name,
"smtp": sender_smtp,
},
"to": to_raw,
"cc": cc_raw,
"bcc": bcc_raw,
"display_to": display_to,
"display_cc": display_cc,
"recipients": recipients,
"body_text": body_text,
"body_html": body_html,
"attachments": attachments,
"headers": headers,
"mapi": mapi_raw,
"parsed_at": datetime.now(timezone.utc).replace(tzinfo=None),
}
except Exception as e:
logging.error("extract_message failed [%s]: %s", msg_path.name, e)
return None
# ─── MongoDB indexy ───────────────────────────────────────────────────────────
def create_indexes(col):
print(" Vytvarim indexy...")
col.create_index([("received_at", ASCENDING)])
col.create_index([("sent_at", ASCENDING)])
col.create_index([("sender.email", ASCENDING)])
col.create_index([("filename", ASCENDING)], unique=True, sparse=True)
col.create_index([("conversation_topic", ASCENDING)])
col.create_index([("has_attachments", ASCENDING)])
col.create_index([("categories", ASCENDING)])
col.create_index([("importance", ASCENDING)])
col.create_index([("flag_status", ASCENDING)])
col.create_index([
("subject", TEXT),
("body_text", TEXT),
("to", TEXT),
("cc", TEXT),
], name="text_search", default_language="none")
print(" Indexy hotovy.")
# ─── MAIN ─────────────────────────────────────────────────────────────────────
def main():
ap = argparse.ArgumentParser(description=f"parse_emails v{SCRIPT_VERSION}")
ap.add_argument("--msgs-dir", default=str(MSGS_DIR),
help="Cesta k .msg souborum")
ap.add_argument("--limit", type=int, default=0,
help="Zpracovat max N souboru (0 = vse)")
ap.add_argument("--skip-existing", action="store_true",
help="Preskocit soubory ktere jiz jsou v MongoDB (pokracovani)")
ap.add_argument("--no-indexes", action="store_true",
help="Nevytvorit indexy na konci")
args = ap.parse_args()
msgs_dir = Path(args.msgs_dir)
start = datetime.now()
print(f"=== parse_emails v{SCRIPT_VERSION} ===")
print(f"Start: {start.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Zdroj: {msgs_dir}")
print(f"MongoDB: {MONGO_URI} -> {MONGO_DB}.{MONGO_COL}")
# MongoDB
client = MongoClient(MONGO_URI, serverSelectionTimeoutMS=5000)
try:
client.admin.command("ping")
print(" MongoDB OK")
except Exception as e:
print(f" CHYBA: MongoDB neni dostupna -- {e}")
sys.exit(1)
col = client[MONGO_DB][MONGO_COL]
# Skip existing — nacti seznam uz importovanych souboru
existing: set = set()
if args.skip_existing:
print(" Nacitam existujici zaznamy z MongoDB...")
existing = set(col.distinct("filename"))
print(f" {len(existing)} jiz importovano")
# Scan
print(f"\nSkenuji {msgs_dir} ...")
all_files = sorted(msgs_dir.glob("*.msg"))
if args.limit:
all_files = all_files[:args.limit]
to_process = [f for f in all_files if f.name not in existing]
skipped = len(all_files) - len(to_process)
total = len(to_process)
print(f" Celkem .msg: {len(all_files)}")
print(f" Preskoceno: {skipped}")
print(f" Ke zpracovani: {total}\n")
if total == 0:
print("Neni co importovat.")
client.close()
return
batch = []
ok_count = 0
err_count = 0
def flush():
if not batch:
return
try:
col.bulk_write(batch, ordered=False)
except Exception as e:
logging.error("bulk_write: %s", e)
print(f" CHYBA bulk_write: {e}")
batch.clear()
for i, msg_path in enumerate(to_process, 1):
doc = extract_message(msg_path)
if doc is None:
err_count += 1
else:
batch.append(UpdateOne({"_id": doc["_id"]}, {"$set": doc}, upsert=True))
ok_count += 1
if len(batch) >= BATCH_SIZE:
flush()
# Výpis každého emailu
status = "ERR " if doc is None else "OK "
subject_str = (doc.get("subject") or "")[:60] if doc else "?"
sender_str = (doc.get("sender", {}).get("email") or "")[:40] if doc else "?"
print(f" {i:>6}/{total} {status} {subject_str:<60} {sender_str}")
if i % 500 == 0:
elapsed = (datetime.now() - start).total_seconds()
rate = i / elapsed if elapsed > 0 else 0
eta_s = int((total - i) / rate) if rate > 0 else 0
print(f" {''*80}")
print(f" Průběh: ok={ok_count} err={err_count} "
f"{rate:.1f} msg/s ETA {eta_s//3600}h{(eta_s%3600)//60}m")
print(f" {''*80}")
flush()
elapsed_total = (datetime.now() - start).total_seconds()
print(f"\n{'='*52}")
print(f"Vysledek: ok={ok_count} | skip={skipped} | err={err_count}")
print(f"Celkovy cas: {int(elapsed_total//3600)}h {int((elapsed_total%3600)//60)}m {int(elapsed_total%60)}s")
print(f"Dokumentu v kolekci: {col.count_documents({})}")
if not args.no_indexes:
print()
create_indexes(col)
print(f"\nKonec: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
if err_count:
print(f"Chyby logovany do: {LOG_FILE}")
client.close()
if __name__ == "__main__":
main()
+93
View File
@@ -0,0 +1,93 @@
import sqlite3
import os
from pathlib import Path
DB_DIR = r"\\tower\JNJEMAILS\db"
MSG_DIR = r"\\tower\JNJEMAILS"
def find_latest_db(db_dir):
dbs = sorted(Path(db_dir).glob("*.db"), key=lambda p: p.stat().st_mtime)
if not dbs:
raise FileNotFoundError(f"Žádná .db databáze v {db_dir}")
return dbs[-1]
def count_msg_files(msg_dir):
return sum(1 for f in Path(msg_dir).iterdir() if f.suffix.lower() == ".msg")
def stats(db_path):
con = sqlite3.connect(db_path)
cur = con.cursor()
total = cur.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
date_range = cur.execute(
"SELECT MIN(received_at), MAX(received_at) FROM messages WHERE received_at IS NOT NULL"
).fetchone()
top_senders = cur.execute(
"SELECT sender, COUNT(*) AS n FROM messages GROUP BY sender ORDER BY n DESC LIMIT 10"
).fetchall()
by_folder = cur.execute(
"SELECT folder, COUNT(*) AS n FROM messages GROUP BY folder ORDER BY n DESC"
).fetchall()
by_source = cur.execute(
"SELECT source, COUNT(*) AS n FROM messages GROUP BY source ORDER BY n DESC"
).fetchall()
by_month = cur.execute(
"""SELECT SUBSTR(received_at, 1, 7) AS month, COUNT(*) AS n
FROM messages WHERE received_at IS NOT NULL
GROUP BY month ORDER BY month"""
).fetchall()
con.close()
return total, date_range, top_senders, by_folder, by_source, by_month
def main():
db_path = find_latest_db(DB_DIR)
print(f"Databáze: {db_path.name}")
print(f"Velikost: {db_path.stat().st_size / 1024 / 1024:.1f} MB")
msg_count = count_msg_files(MSG_DIR)
print(f".msg souborů ve složce: {msg_count:,}")
total, date_range, top_senders, by_folder, by_source, by_month = stats(db_path)
print(f"\n{'-'*50}")
print(f" Emailu v databazi: {total:,}")
if date_range[0]:
print(f" Nejstarsi: {date_range[0]}")
print(f" Nejnovejsi: {date_range[1]}")
if by_folder:
print(f"\n Slozky:")
for folder, n in by_folder:
print(f" {folder or '(bez slozky)':<35} {n:>6,}")
if by_source:
print(f"\n Zdroje (source):")
for src, n in by_source:
print(f" {src or '(prazdny)':<35} {n:>6,}")
if by_month:
print(f"\n Emaily po mesicich:")
for month, n in by_month:
bar = "#" * min(n // 20, 40)
print(f" {month} {bar:<40} {n:>5,}")
if top_senders:
print(f"\n Top 10 odesilatelU:")
for sender, n in top_senders:
print(f" {(sender or '(neznamy)')[:50]:<52} {n:>5,}")
print(f"{'-'*50}")
if __name__ == "__main__":
main()