JEPA Calendar Training Report
Date: 2026-06-12
Summary
The calendar data was prepared and trained in two stages.
First, the model was trained on the new calendar data alone for 6 repeated passes. Each pass used a fresh validation split. This tested whether the model could learn from the calendar data by itself.
Second, the calendar data was mixed back into the original public training set and trained for 4 more repeated passes. Each mixed pass trained a model on every public dataset individually and then trained one combined model across all public data.
The best final result came from mixed pass 2. Its combined validation loss was 0.0064. The previous initial combined pass had a validation loss of 0.0183, so the best new mixed run was about 64.9 percent lower on the JEPA validation objective.
This means the model learned the self-supervised prediction task better in the new mixed run. It does not, by itself, prove that real-world scheduling decisions improved. That still needs a downstream scheduler evaluation.
What Data Was Used
New Calendar Hugging Face Data
The final calendar set included 6 sources:
| Source type | What it adds |
|---|---|
| Korean academic calendar questions and answers | Month-level academic schedule examples. |
| German calendar intent text | Calendar query, set, and remove examples in German. |
| German MASSIVE calendar annotations | Calendar intent examples with richer slot annotations. |
| Microsoft scheduling availability examples | Participant availability blocks and meeting constraints. |
| SGD calendar dialogue turns | Conversational examples about creating, checking, and confirming calendar events. |
| Konkuk academic calendar events | Real dated academic events with start and end dates. |
The calendar set contained:
| Count | Value |
|---|---|
| Prepared calendar windows | 15,532 |
| Train windows per split | 13,902 to 14,042 |
| Validation windows per split | 1,490 to 1,630 |
| Preparation errors | 0 |
Original Public Training Data
The mixed phase used the original public training data plus the new calendar set. The mixed set included:
| Dataset group | What it represents |
|---|---|
| Public holiday calendar data | All-day calendar and holiday structure. |
| New Hugging Face calendar data | Calendar language, scheduling constraints, and academic events. |
| OpenProject and Taiga data | Project tasks, work packages, and issue tracker activity. |
| GitHub event data | Code and project activity patterns. |
| MS-LaTTE | Task timing preference examples. |
| Enron-derived sources | Email and communication activity. |
| Public Jira | Large-scale project and issue activity. |
| SmartToDo coded data | Task and to-do intent examples. |
Blocked or gated placeholder datasets were recorded by the pipeline, but they had zero train windows and were not used for model training.
How The Data Was Prepared
The model does not train directly on raw text, raw calendar files, or raw dialogue rows.
Each source record is converted into a schedule-like numeric window:
| Item | Meaning |
|---|---|
| 96 slots | One 24-hour day split into 15-minute slots. |
| 16 features per slot | Numeric signals such as calendar event, task, deadline, priority, duration, and participant count. |
| Values from 0.0 to 1.0 | All features are normalized into a consistent range. |
For example:
- A full-day academic calendar event becomes a window where the calendar event feature is active across the day.
- A scheduling availability example becomes a window where each time slot shows how many participants are available.
- A calendar dialogue turn becomes a calendar intent window, with extra signals for task-like behavior, completion, and text length.
- A dated academic event becomes a calendar event window, with duration and deadline-like signals when the event spans multiple days.
This gives the JEPA model a common format even though the raw sources are very different.
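The conversion described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual adapter code: the feature positions in `FEATURE_INDEX`, the helper names, and the choice to express availability as a fraction of all participants are assumptions made for the example. The report names the signals but not their order or exact encoding.

```python
SLOTS_PER_DAY = 96        # one 24-hour day split into 15-minute slots
FEATURES_PER_SLOT = 16    # numeric signals per slot

# Hypothetical feature positions; the report does not give the real column order.
FEATURE_INDEX = {"calendar_event": 0, "participant_count": 5}


def empty_window():
    """Return a 96 x 16 window filled with 0.0."""
    return [[0.0] * FEATURES_PER_SLOT for _ in range(SLOTS_PER_DAY)]


def clamp01(value):
    """Keep a feature inside the normalized 0.0 to 1.0 range."""
    return max(0.0, min(1.0, value))


def slot_of(hour, minute=0):
    """Map a clock time to its 15-minute slot index (0 to 95)."""
    return (hour * 60 + minute) // 15


def full_day_event_window():
    """A full-day event: the calendar event feature is active in every slot."""
    window = empty_window()
    for slot in window:
        slot[FEATURE_INDEX["calendar_event"]] = 1.0
    return window


def availability_window(blocks, total_participants):
    """blocks: list of (start_slot, end_slot_exclusive, available_count).

    Assumption: availability is stored as the fraction of participants free.
    """
    window = empty_window()
    column = FEATURE_INDEX["participant_count"]
    for start, end, count in blocks:
        for i in range(start, end):
            window[i][column] = clamp01(count / total_participants)
    return window
```

For example, `availability_window([(slot_of(9), slot_of(10), 3)], 4)` marks slots 36 through 39 with 0.75 in the assumed participant column and leaves every other value at 0.0.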
Model And Training Setup
The same basic setup as the initial public training was used.
| Setting | Value |
|---|---|
| Input features per slot | 16 |
| Slots per window | 96 |
| Time per slot | 15 minutes |
| Training steps per run | 1,000 |
| Batch size | 32 |
| Load mode | Streaming |
| Device | CUDA GPU |
| Mask ratio | 0.40 |
The model was trained with a JEPA-style self-supervised task:
- Take a prepared schedule window.
- Hide part of the window.
- Let the context encoder read the visible part.
- Let the target encoder read the full original window.
- Train the predictor to match the hidden target representation.
- Repeat this many times across the dataset.
In simple terms, the model learns by seeing partial schedule patterns and trying to predict the missing schedule representation.
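The "hide part of the window" step can be illustrated with a small masking sketch. It assumes masking is applied to whole slots chosen uniformly at random, and that hidden slots are zeroed in the context view. The report gives only the mask ratio of 0.40, so the sampling scheme and the function names here are hypothetical, and the encoders and predictor are left out.

```python
import random

SLOTS_PER_WINDOW = 96
MASK_RATIO = 0.40


def sample_slot_mask(rng, slots=SLOTS_PER_WINDOW, ratio=MASK_RATIO):
    """Return a list of booleans, True where a slot is hidden from the context encoder."""
    hidden_count = round(slots * ratio)
    hidden = set(rng.sample(range(slots), hidden_count))
    return [i in hidden for i in range(slots)]


def split_views(window, mask):
    """Build the two inputs of one training example.

    context: hidden slots zeroed out (assumed masking style)
    target:  the full original window, as read by the target encoder
    """
    context = [
        [0.0] * len(row) if hidden else list(row)
        for row, hidden in zip(window, mask)
    ]
    target = [list(row) for row in window]
    return context, target
```

With 96 slots and a ratio of 0.40, this sketch hides 38 slots per window. The predictor would then be trained to match the target encoder's representation at those hidden positions.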
Step 1: Calendar-Only Training
The first stage trained only on the new calendar dataset.
Each pass used:
- The same prepared calendar source set.
- A different random validation split.
- 1,000 training steps.
- Batch size 32.
- Streaming load mode.
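One way to produce a different random validation split on each pass is a seeded per-window draw, sketched below. This is an assumption about the mechanism, not the pipeline's documented behavior: the function name, the use of the pass number as the seed, and the 0.10 validation fraction are all hypothetical. The fraction is chosen only because the validation sizes in this report fall between roughly 9.6 and 10.5 percent of the 15,532 prepared windows.

```python
import random


def fresh_split(window_ids, pass_seed, validation_fraction=0.10):
    """Assign each window to train or validation with an independent seeded draw.

    Because each window is drawn independently, the validation size varies
    slightly from pass to pass instead of being a fixed count.
    """
    rng = random.Random(pass_seed)
    train, validation = [], []
    for window_id in window_ids:
        if rng.random() < validation_fraction:
            validation.append(window_id)
        else:
            train.append(window_id)
    return train, validation


# Usage: one split per pass, reproducible from the pass number.
ids = list(range(15532))
train_ids, validation_ids = fresh_split(ids, pass_seed=1)
```

Seeding by pass number keeps each split reproducible while still giving every pass its own held-out set.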
Results:
| Calendar pass | Train windows | Validation windows | Final train loss | Final validation loss |
|---|---|---|---|---|
| 1 | 14,042 | 1,490 | 0.4266 | 0.3095 |
| 2 | 13,902 | 1,630 | 0.0602 | 0.0620 |
| 3 | 13,928 | 1,604 | 0.0625 | 0.3221 |
| 4 | 13,944 | 1,588 | 0.0808 | 0.3138 |
| 5 | 13,973 | 1,559 | 0.1314 | 0.1406 |
| 6 | 14,011 | 1,521 | 0.5031 | 0.2807 |
Calendar-only interpretation:
- The model can learn from the calendar data alone, although not reliably: passes 1 and 6 ended with a train loss above 0.4.
- The best calendar-only pass was pass 2, with validation loss 0.0620.
- Validation loss varies far more across calendar-only passes (0.0620 to 0.3221) than across mixed passes because the calendar set is much smaller and more mixed in structure.
- Calendar-only training is useful for checking the calendar adapter, but it is not the best final model for general scheduling.
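The difference in spread can be checked directly from the calendar-only and mixed results tables in this report. The snippet below is only a convenience sketch. The loss values are copied from those tables, and the `summarize` helper is a name introduced for the example.

```python
from statistics import mean, pstdev

# Final validation losses copied from the two results tables in this report.
calendar_val = [0.3095, 0.0620, 0.3221, 0.3138, 0.1406, 0.2807]
mixed_val = [0.0374, 0.0064, 0.0170, 0.0069]


def summarize(losses):
    """Basic spread statistics for a list of validation losses."""
    return {
        "min": min(losses),
        "max": max(losses),
        "mean": mean(losses),
        "stdev": pstdev(losses),  # population standard deviation across passes
    }


calendar_summary = summarize(calendar_val)
mixed_summary = summarize(mixed_val)
print("calendar-only:", calendar_summary)
print("mixed:", mixed_summary)
```

With only 6 and 4 passes respectively, these statistics describe the runs that were made. They are not a stable estimate of run-to-run variance.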
Step 2: Mixed Public Training
The second stage mixed the new calendar data back into the original public training set.
Each mixed pass did three things:
- Created a fresh validation split for all prepared public datasets.
- Trained a model on each public dataset individually for 1,000 steps.
- Trained a final combined model across all public datasets.
The combined model is the most important result from each mixed pass.
Results:
| Mixed pass | Train windows loaded | Validation windows loaded | Final train loss | Final validation loss |
|---|---|---|---|---|
| 1 | 6,709,495 | 747,556 | 0.0408 | 0.0374 |
| 2 | 6,709,892 | 747,159 | 0.0414 | 0.0064 |
| 3 | 6,711,263 | 745,788 | 0.0506 | 0.0170 |
| 4 | 6,711,002 | 746,049 | 0.0468 | 0.0069 |
Mixed-training interpretation:
- All 4 mixed passes completed successfully.
- All 4 mixed passes included the new calendar Hugging Face dataset.
- The best mixed pass was pass 2, with validation loss 0.0064.
- Pass 4 was very close, with validation loss 0.0069.
- The average mixed validation loss was 0.0169.
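The best pass and the average can be reproduced from the table with a few lines. The dictionary below simply restates the final validation losses reported above, and the variable names are illustrative.

```python
# Final combined validation loss per mixed pass, copied from the results table.
mixed_validation_loss = {1: 0.0374, 2: 0.0064, 3: 0.0170, 4: 0.0069}

# Pass with the lowest validation loss.
best_pass = min(mixed_validation_loss, key=mixed_validation_loss.get)

# Unweighted mean across the four passes.
average_loss = sum(mixed_validation_loss.values()) / len(mixed_validation_loss)

print(best_pass)               # 2
print(round(average_loss, 4))  # 0.0169
```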
Comparison To The Initial Training
The previous initial combined public run had a validation loss of 0.0183 in its final combined pass report.
The best new mixed run had a validation loss of 0.0064.
| Run | Combined validation loss |
|---|---|
| Previous initial public training | 0.0183 |
| New mixed training, best pass | 0.0064 |
| New mixed training, final pass | 0.0069 |
The best new mixed result was about 64.9 percent lower than the previous initial combined validation loss.
The final mixed pass was about 62.3 percent lower than the previous initial combined validation loss.
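The percentages above are relative reductions against the previous combined validation loss. A short sketch of the arithmetic, using the rounded four-decimal values from the table (the function name is introduced for this example):

```python
def relative_reduction(baseline, new):
    """Percent by which `new` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline loss must be positive")
    return (baseline - new) / baseline * 100.0


previous_loss = 0.0183    # previous initial public training
best_mixed_loss = 0.0064  # new mixed training, pass 2
final_mixed_loss = 0.0069 # new mixed training, pass 4

best_reduction = relative_reduction(previous_loss, best_mixed_loss)
final_reduction = relative_reduction(previous_loss, final_mixed_loss)
print(f"best pass:  {best_reduction:.1f} percent lower")
print(f"final pass: {final_reduction:.1f} percent lower")
```

With the rounded table values, the final pass works out to 62.3 percent and the best pass to about 65.0 percent. The 64.9 percent figure quoted in this report presumably comes from the unrounded losses, which are not listed here.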
This is a strong result for the JEPA self-supervised objective. It suggests that adding the calendar data did not hurt the combined training. Instead, the best mixed runs achieved better held-out JEPA prediction loss than the earlier initial run.
Which Checkpoint To Use
The recommended current model is:
mixed-public-calendar-20260612-pass02, combined checkpoint
Reason:
- It had the best combined validation loss.
- It included the original public training data.
- It included the new calendar data.
The second-best option is:
mixed-public-calendar-20260612-pass04, combined checkpoint
Reason:
- It was the latest mixed pass.
- Its combined validation loss was very close to pass 2.
For general scheduling work, use a mixed combined checkpoint rather than a calendar-only checkpoint. The calendar-only checkpoints are useful for diagnostics, but the mixed model has broader schedule, task, project, email, and calendar representation coverage.
Quality Notes
The results are promising, but they should be understood carefully.
What the training result does show:
- The data pipeline worked.
- The model consumed the new calendar data.
- The removed page flip dataset stayed out of the final runs.
- The mixed model learned the JEPA prediction task well.
- The best new mixed validation loss was better than the previous initial combined run.
What the training result does not prove yet:
- It does not prove that the scheduler makes better real-world decisions.
- It does not prove that every calendar source is equally useful.
- It does not replace a downstream scheduling evaluation.
Recommended next evaluation:
- Run the scheduler on a fixed set of realistic scheduling tasks.
- Compare the previous initial combined checkpoint against the new mixed checkpoint.
- Measure whether the new checkpoint improves ranking, conflict avoidance, time preference handling, and calendar event awareness.
Final Conclusion
The new calendar data is useful when mixed with the original public training data.
The calendar-only runs showed that the model can learn from the calendar data, but the best final model came from mixed training.
The best mixed run reached a validation loss of 0.0064, down from 0.0183 in the previous initial combined report. The recommended model for the next stage is the combined checkpoint from mixed pass 2.