JEPA Calendar Training Report

Date: 2026-06-12

Summary

The calendar data was prepared, and the model was trained on it in two stages.

First, the model was trained on the new calendar data alone for 6 repeated passes. Each pass used a fresh validation split. This tested whether the model could learn from the calendar data alone.

Second, the calendar data was mixed back into the original public training set and training ran for 4 more repeated passes. Each mixed pass first trained on every public dataset individually and then trained one combined model across all public data.

The best final result came from mixed pass 2. Its combined validation loss was 0.0064. The previous initial combined pass had a validation loss of 0.0183, so the best new mixed run was about 64.9 percent lower on the JEPA validation objective.

This means the model learned the self-supervised prediction task better in the new mixed run. It does not, by itself, prove that real-world scheduling decisions improved. That still needs a downstream scheduler evaluation.

What Data Was Used

New Calendar Hugging Face Data

The final calendar set included 6 sources:

Source type                                     What it adds
Korean academic calendar questions and answers  Month-level academic schedule examples.
German calendar intent text                     Calendar query, set, and remove examples in German.
German MASSIVE calendar annotations             Calendar intent examples with richer slot annotations.
Microsoft scheduling availability examples      Participant availability blocks and meeting constraints.
SGD calendar dialogue turns                     Conversational examples about creating, checking, and confirming calendar events.
Konkuk academic calendar events                 Real dated academic events with start and end dates.

The calendar set contained:

Item                                  Value
Prepared calendar windows             15,532
Typical train windows per split       About 13,900 to 14,000
Typical validation windows per split  About 1,500 to 1,600
Preparation errors                    0

Original Public Training Data

The mixed phase used the original public training data plus the new calendar set. The mixed set included:

Dataset group                   What it represents
Public holiday calendar data    All-day calendar and holiday structure.
New Hugging Face calendar data  Calendar language, scheduling constraints, and academic events.
OpenProject and Taiga data      Project tasks, work packages, and issue tracker activity.
GitHub event data               Code and project activity patterns.
MS-LaTTE                        Task timing preference examples.
Enron-derived sources           Email and communication activity.
Public Jira                     Large-scale project and issue activity.
SmartToDo coded data            Task and to-do intent examples.

Blocked or gated placeholder datasets were recorded by the pipeline, but they had zero train windows and were not used for model training.

How The Data Was Prepared

The model does not train directly on raw text, raw calendar files, or raw dialogue rows.

Each source record is converted into a schedule-like numeric window:

Item                    Meaning
96 slots                One 24-hour day split into 15-minute slots.
16 features per slot    Numeric signals such as calendar event, task, deadline, priority, duration, and participant count.
Values from 0.0 to 1.0  All features are normalized into a consistent range.

For example:

  1. A full-day academic calendar event becomes a window where the calendar event feature is active across the day.
  2. A scheduling availability example becomes a window where each time slot shows how many participants are available.
  3. A calendar dialogue turn becomes a calendar intent window, with extra signals for task-like behavior, completion, and text length.
  4. A dated academic event becomes a calendar event window with duration and deadline-like signals when the event spans multiple days.

This gives the JEPA model a common format even though the raw sources are very different.
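The conversion can be sketched in Python. This is a minimal illustration, not the pipeline's code. The feature positions (CALENDAR_EVENT, PARTICIPANTS) and the choice to store availability as a fraction of participants are assumptions, because the report names the signals but does not give their order or scaling.

```python
SLOTS_PER_DAY = 96       # one 24-hour day in 15-minute slots (from the report)
FEATURES_PER_SLOT = 16   # from the report
SLOT_MINUTES = 15

# Hypothetical feature positions; the report does not list the real order.
CALENDAR_EVENT = 0
PARTICIPANTS = 5


def empty_window():
    """Return a 96 x 16 window with every feature set to 0.0."""
    return [[0.0] * FEATURES_PER_SLOT for _ in range(SLOTS_PER_DAY)]


def slot_index(hour, minute):
    """Map a clock time to its 15-minute slot index (0 to 95)."""
    return (hour * 60 + minute) // SLOT_MINUTES


def full_day_event_window():
    """Example 1: a full-day event activates the calendar feature in every slot."""
    window = empty_window()
    for slot in window:
        slot[CALENDAR_EVENT] = 1.0
    return window


def availability_window(blocks, total_participants):
    """Example 2: each slot stores the fraction of participants who are free.

    blocks is a list of (start_slot, end_slot_exclusive, available_count).
    """
    window = empty_window()
    for start, end, available in blocks:
        for i in range(start, end):
            # Clamp so the value stays inside the 0.0 to 1.0 range.
            window[i][PARTICIPANTS] = min(1.0, available / total_participants)
    return window


day = full_day_event_window()
avail = availability_window(
    [(slot_index(9, 0), slot_index(10, 30), 3)], total_participants=4
)
```

Here three of four participants are free from 09:00 to 10:30, so slots 36 through 41 carry 0.75 in the participant feature and every other slot stays at 0.0.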

Model And Training Setup

Training used the same basic setup as the initial public training.

Setting                    Value
Input features per slot    16
Slots per window           96
Approximate time per slot  15 minutes
Training steps per run     1,000
Batch size                 32
Load mode                  Streaming
Device                     CUDA GPU
Mask ratio                 0.40
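For reference, the settings can be collected in one place. The sketch below is an illustration only: the class name TrainConfig and its field names are hypothetical, and only the values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Settings from the report; the field names are illustrative."""

    features_per_slot: int = 16
    slots_per_window: int = 96
    minutes_per_slot: int = 15
    steps_per_run: int = 1_000
    batch_size: int = 32
    load_mode: str = "streaming"
    device: str = "cuda"
    mask_ratio: float = 0.40

    @property
    def window_minutes(self):
        """Total time covered by one window."""
        return self.slots_per_window * self.minutes_per_slot

    @property
    def windows_seen_per_run(self):
        """Number of window samples drawn during one training run."""
        return self.steps_per_run * self.batch_size


cfg = TrainConfig()
```

The derived values are 1,440 minutes per window, which is one full day, and 32,000 window samples per run. That is a little over two passes through the roughly 14,000 calendar train windows. If the combined mixed run also used 1,000 steps, which the table implies but the report does not state separately, it is under 0.5 percent of the roughly 6.7 million mixed train windows.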

The model was trained with a JEPA-style self-supervised task:

  1. Take a prepared schedule window.
  2. Hide part of the window.
  3. Let the context encoder read the visible part.
  4. Let the target encoder read the full original window.
  5. Train the predictor to match the hidden target representation.
  6. Repeat this many times across the dataset.

In simple terms, the model learns by seeing partial schedule patterns and trying to predict the missing schedule representation.
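Steps 2, 3, and 5 can be sketched without any model code. This is a minimal sketch under stated assumptions: the report gives the mask ratio (0.40) but not the masking pattern or the loss function, so uniform random slot masking and a mean squared error over hidden slots are both assumptions. The encoders and the predictor are left out.

```python
import random

SLOTS = 96
MASK_RATIO = 0.40  # from the report


def sample_hidden_slots(rng, slots=SLOTS, mask_ratio=MASK_RATIO):
    """Choose which slot indices to hide (uniform random; an assumption)."""
    n_hidden = round(slots * mask_ratio)
    return sorted(rng.sample(range(slots), n_hidden))


def split_window(window, hidden):
    """Return (visible, masked) lists of (slot_index, features) pairs."""
    hidden_set = set(hidden)
    visible = [(i, row) for i, row in enumerate(window) if i not in hidden_set]
    masked = [(i, row) for i, row in enumerate(window) if i in hidden_set]
    return visible, masked


def masked_mse(predicted, target, hidden):
    """Mean squared error between representations, over hidden slots only."""
    total, count = 0.0, 0
    for i in hidden:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


rng = random.Random(0)
window = [[0.0] * 16 for _ in range(SLOTS)]
hidden = sample_hidden_slots(rng)
visible, masked = split_window(window, hidden)
```

With 96 slots and a 0.40 ratio, 38 slots are hidden and 58 stay visible to the context encoder. The loss only counts the hidden positions, which is what makes the task a prediction of missing content rather than a reconstruction of what was already seen.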

Step 1: Calendar-Only Training

The first stage trained on only the new calendar dataset.

Each pass used a fresh validation split of the 15,532 prepared calendar windows.

Results:

Calendar pass  Train windows  Validation windows  Final train loss  Final validation loss
1              14,042         1,490               0.4266            0.3095
2              13,902         1,630               0.0602            0.0620
3              13,928         1,604               0.0625            0.3221
4              13,944         1,588               0.0808            0.3138
5              13,973         1,559               0.1314            0.1406
6              14,011         1,521               0.5031            0.2807
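The table can be checked with a few lines of Python. The sketch below only restates the reported numbers and derives simple summaries from them. It is not part of the training pipeline.

```python
TOTAL_WINDOWS = 15_532  # prepared calendar windows (from the report)

CALENDAR_PASSES = [
    # (pass, train_windows, val_windows, final_train_loss, final_val_loss)
    (1, 14_042, 1_490, 0.4266, 0.3095),
    (2, 13_902, 1_630, 0.0602, 0.0620),
    (3, 13_928, 1_604, 0.0625, 0.3221),
    (4, 13_944, 1_588, 0.0808, 0.3138),
    (5, 13_973, 1_559, 0.1314, 0.1406),
    (6, 14_011, 1_521, 0.5031, 0.2807),
]

# Every split should account for all prepared windows.
splits_consistent = all(
    train + val == TOTAL_WINDOWS for _, train, val, _, _ in CALENDAR_PASSES
)

# Share of windows held out for validation in each pass.
val_fractions = [val / TOTAL_WINDOWS for _, _, val, _, _ in CALENDAR_PASSES]

best_pass = min(CALENDAR_PASSES, key=lambda row: row[4])
worst_pass = max(CALENDAR_PASSES, key=lambda row: row[4])
spread = worst_pass[4] / best_pass[4]
```

Each split sums to 15,532 windows with about 10 percent held out. Final validation loss ranges from 0.0620 (pass 2) to 0.3221 (pass 3), a roughly fivefold spread across passes that differ only in their split.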

Calendar-only interpretation:

Step 2: Mixed Public Training

The second stage mixed the new calendar data back into the original public training set.

Each mixed pass did three things:

  1. Created a fresh validation split for all prepared public datasets.
  2. Trained each public dataset individually for 1,000 steps.
  3. Trained a final combined model across all public datasets.
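Step 1 can be sketched as follows. The report does not describe how the split is drawn, so this is one possible approach, not the pipeline's method: each window is assigned to validation independently with probability 0.10. That choice is an assumption suggested by the validation counts, which sit near 10 percent of the total but vary from pass to pass. The function name and the seed scheme are hypothetical.

```python
import random


def fresh_split(window_ids, pass_number, val_probability=0.10, base_seed=0):
    """Assign each window to train or validation with a per-pass seed.

    A different pass_number gives a different split; the same pass_number
    reproduces the same split.
    """
    rng = random.Random(base_seed * 1_000 + pass_number)
    train, val = [], []
    for window_id in window_ids:
        if rng.random() < val_probability:
            val.append(window_id)
        else:
            train.append(window_id)
    return train, val


train_ids, val_ids = fresh_split(range(15_532), pass_number=1)
```

Because every window is drawn independently, the validation count is not fixed, which matches the varying counts in the results tables. A split that held out an exact 10 percent would give the same count on every pass.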

The combined model is the most important result from each mixed pass.

Results:

Mixed pass  Train windows loaded  Validation windows loaded  Final train loss  Final validation loss
1           6,709,495             747,556                    0.0408            0.0374
2           6,709,892             747,159                    0.0414            0.0064
3           6,711,263             745,788                    0.0506            0.0170
4           6,711,002             746,049                    0.0468            0.0069

Mixed-training interpretation:

Comparison To The Initial Training

The previous initial combined public run had a validation loss of 0.0183 in its final combined pass report.

The best new mixed run had a validation loss of 0.0064.

Run                               Combined validation loss
Previous initial public training  0.0183
New mixed training, best pass     0.0064
New mixed training, final pass    0.0069

The best new mixed result was about 64.9 percent lower than the previous initial combined validation loss.

The final mixed pass was about 62.3 percent lower than the previous initial combined validation loss.
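The percentages follow from the relative reduction, (baseline - new) / baseline. The short sketch below recomputes them from the rounded losses in the table.

```python
def percent_lower(baseline, new):
    """Relative reduction of new against baseline, in percent."""
    return (baseline - new) / baseline * 100.0


BASELINE = 0.0183  # previous initial combined validation loss

best_reduction = percent_lower(BASELINE, 0.0064)   # mixed pass 2
final_reduction = percent_lower(BASELINE, 0.0069)  # mixed pass 4
```

From the four-decimal values, the best pass works out to about 65.0 percent and the final pass to about 62.3 percent. The 64.9 percent quoted for the best pass differs only in the last digit, which is consistent with it having been computed from unrounded losses, although the report does not say so.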

This is a strong result for the JEPA self-supervised objective. It suggests that adding the calendar data did not hurt the combined training. Instead, the best mixed runs achieved lower held-out JEPA prediction loss than the earlier initial run.

Which Checkpoint To Use

The recommended current model is:

mixed-public-calendar-20260612-pass02, combined checkpoint

Reason: it had the lowest combined validation loss of the four mixed passes (0.0064).

The second-best option is:

mixed-public-calendar-20260612-pass04, combined checkpoint

Reason: its combined validation loss (0.0069) was the second lowest of the four mixed passes.

For general scheduling work, use a mixed combined checkpoint rather than a calendar-only checkpoint. The calendar-only checkpoints are useful for diagnostics, but the mixed model has broader schedule, task, project, email, and calendar representation coverage.
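The ranking behind this recommendation can be reproduced from the mixed pass results. In the sketch below, the name pattern is taken from the two checkpoint names given above. Applying it to passes 1 and 3 is an assumption.

```python
# Final combined validation loss per mixed pass (from the results table).
MIXED_VAL_LOSS = {1: 0.0374, 2: 0.0064, 3: 0.0170, 4: 0.0069}


def checkpoint_name(pass_number):
    """Build a checkpoint name following the pattern used in the report."""
    return f"mixed-public-calendar-20260612-pass{pass_number:02d}"


# Passes ordered from lowest to highest combined validation loss.
ranked_passes = sorted(MIXED_VAL_LOSS, key=MIXED_VAL_LOSS.get)
recommended = checkpoint_name(ranked_passes[0])
runner_up = checkpoint_name(ranked_passes[1])
```

The order is pass 2, pass 4, pass 3, pass 1. This ranks on the JEPA validation loss only, so it says nothing about which checkpoint schedules better in practice.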

Quality Notes

The results are promising, but they should be understood carefully.

What the training result does show:

What the training result does not prove yet:

Recommended next evaluation:

  1. Run the scheduler on a fixed set of realistic scheduling tasks.
  2. Compare the previous initial combined checkpoint against the new mixed checkpoint.
  3. Measure whether the new checkpoint improves ranking, conflict avoidance, time preference handling, and calendar event awareness.
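One of these measurements, conflict avoidance, can be sketched as a small harness. Everything here is hypothetical: the task format, the function names, and the two stand-in schedulers exist only to show the shape of the comparison. A real evaluation would wrap each checkpoint in the actual scheduler and add separate metrics for ranking, time preference handling, and calendar event awareness.

```python
def overlaps(a, b):
    """True if two half-open slot ranges (start, end) share any slot."""
    return a[0] < b[1] and b[0] < a[1]


def conflict_rate(proposals, busy_by_task):
    """Fraction of proposed meetings that collide with an existing busy range."""
    conflicts = sum(
        any(overlaps(slot_range, busy) for busy in busy_by_task[task_id])
        for task_id, slot_range in proposals.items()
    )
    return conflicts / len(proposals)


def compare_checkpoints(tasks, propose_old, propose_new):
    """Score two schedulers on the same fixed task set."""
    busy_by_task = dict(tasks)
    old = {task_id: propose_old(busy) for task_id, busy in tasks}
    new = {task_id: propose_new(busy) for task_id, busy in tasks}
    return {
        "old_conflict_rate": conflict_rate(old, busy_by_task),
        "new_conflict_rate": conflict_rate(new, busy_by_task),
    }


# Stand-in schedulers for illustration only.
def always_nine_am(busy):
    return (36, 40)  # 09:00 to 10:00, ignoring the calendar


def first_free_hour(busy):
    for start in range(0, 96 - 4 + 1):
        candidate = (start, start + 4)
        if not any(overlaps(candidate, b) for b in busy):
            return candidate
    return (0, 4)  # fallback when the whole day is busy


FIXED_TASKS = [
    ("task-a", [(36, 44)]),  # busy 09:00 to 11:00
    ("task-b", [(0, 32)]),   # busy 00:00 to 08:00
]
report = compare_checkpoints(FIXED_TASKS, always_nine_am, first_free_hour)
```

Keeping the task set fixed matters more than the metric itself: both checkpoints must see identical inputs for the difference in conflict rate to mean anything.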

Final Conclusion

The new calendar data is useful when mixed with the original public training data.

The calendar-only runs showed that the model can learn from the calendar data, but the best final model came from mixed training.

The best mixed run reduced validation loss from 0.0183 to 0.0064 compared with the previous initial combined report. The recommended model for the next stage is the combined checkpoint from mixed pass 2.