JEPA Calendar Training Report

Date: 2026-06-12

Summary

The calendar data was prepared, and the model was trained on it in two stages.

First, the model was trained on the new calendar data alone for 6 repeated passes. Each pass used a fresh validation split. This tested whether the model could learn from the calendar data by itself.

Second, the calendar data was mixed back into the original public training set, and training ran for 4 more repeated passes. Each mixed pass trained a model on every public dataset individually and then trained one combined model across all public data.

The best final result came from mixed pass 2. Its combined validation loss was 0.0064. The previous initial combined pass had a validation loss of 0.0183, so the best new mixed run was about 65 percent lower on the JEPA validation objective.

This means the model learned the self-supervised prediction task better in the new mixed run. It does not, by itself, prove that real-world scheduling decisions improved. That still needs a downstream scheduler evaluation.

What Data Was Used

New Calendar Hugging Face Data

The final calendar set included 6 sources:

Source type	What it adds
Korean academic calendar questions and answers	Month-level academic schedule examples.
German calendar intent text	Calendar query, set, and remove examples in German.
German MASSIVE calendar annotations	Calendar intent examples with richer slot annotations.
Microsoft scheduling availability examples	Participant availability blocks and meeting constraints.
SGD calendar dialogue turns	Conversational examples about creating, checking, and confirming calendar events.
Konkuk academic calendar events	Real dated academic events with start and end dates.

The calendar set contained:

Count	Value
Prepared calendar windows	15,532
Typical train windows per split	About 13,900 to 14,000
Typical validation windows per split	About 1,500 to 1,600
Preparation errors	0

Original Public Training Data

The mixed phase used the original public training data plus the new calendar set. The mixed set included:

Dataset group	What it represents
Public holiday calendar data	All-day calendar and holiday structure.
New Hugging Face calendar data	Calendar language, scheduling constraints, and academic events.
OpenProject and Taiga data	Project tasks, work packages, and issue tracker activity.
GitHub event data	Code and project activity patterns.
MS-LaTTE	Task timing preference examples.
Enron-derived sources	Email and communication activity.
Public Jira	Large-scale project and issue activity.
SmartToDo coded data	Task and to-do intent examples.

Blocked or gated placeholder datasets were recorded by the pipeline, but they had zero train windows and were not used for model training.

How The Data Was Prepared

The model does not train directly on raw text, raw calendar files, or raw dialogue rows.

Each source record is converted into a schedule-like numeric window:

Item	Meaning
96 slots	One 24-hour day split into 15-minute slots.
16 features per slot	Numeric signals such as calendar event, task, deadline, priority, duration, and participant count.
Values from 0.0 to 1.0	All features are normalized into a consistent range.

For example:

A full-day academic calendar event becomes a window where the calendar event feature is active across the day.
A scheduling availability example becomes a window where each time slot shows how many participants are available.
A calendar dialogue turn becomes a calendar intent window, with extra signals for task-like behavior, completion, and text length.
A dated academic event becomes a calendar event window with duration and deadline-like signals when the event spans multiple days.

This gives the JEPA model a common format even though the raw sources are very different.
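
The conversion described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the feature column indices, the function names, and the choice to express availability as a fraction of participants are all assumptions made for the example. Only the 96 x 16 shape and the 0.0 to 1.0 range come from this report.

```python
import numpy as np

SLOTS_PER_DAY = 96       # 24 hours in 15-minute slots (from this report)
FEATURES_PER_SLOT = 16   # from this report

# Hypothetical column positions; the real feature layout is not given here.
CALENDAR_EVENT = 0
PARTICIPANT_COUNT = 5


def empty_window():
    """Return an all-zero schedule window of shape (96, 16)."""
    return np.zeros((SLOTS_PER_DAY, FEATURES_PER_SLOT), dtype=np.float32)


def slot_index(hour, minute):
    """Map a clock time to its 15-minute slot index."""
    return (hour * 60 + minute) // 15


def full_day_event_window():
    """A full-day event: the calendar event feature is active in every slot."""
    window = empty_window()
    window[:, CALENDAR_EVENT] = 1.0
    return window


def availability_window(blocks, total_participants):
    """blocks: (start_slot, end_slot_exclusive, available_count) tuples."""
    window = empty_window()
    for start, end, available in blocks:
        # Assumed normalization: available participants as a fraction of all.
        window[start:end, PARTICIPANT_COUNT] = available / total_participants
    return np.clip(window, 0.0, 1.0)


morning = availability_window([(slot_index(9, 0), slot_index(12, 0), 3)], 4)
print(morning.shape)                          # (96, 16)
print(morning[slot_index(9, 0), PARTICIPANT_COUNT])   # 0.75
```

Whatever the real feature layout is, the point is the same: very different sources end up as arrays of one shape and one value range.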

Model And Training Setup

The same basic setup as the initial public training was used.

Setting	Value
Input features per slot	16
Slots per window	96
Approximate time per slot	15 minutes
Training steps per run	1,000
Batch size	32
Load mode	Streaming
Device	CUDA GPU
Mask ratio	0.40
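
A few derived quantities follow directly from these settings. The sketch below is illustrative: the class and property names are hypothetical, it assumes masking is applied per slot, and it assumes the combined run also used 1,000 steps. The train window counts are taken from pass 1 of each results table in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSetup:
    features_per_slot: int = 16
    slots_per_window: int = 96
    minutes_per_slot: int = 15
    steps_per_run: int = 1_000
    batch_size: int = 32
    mask_ratio: float = 0.40

    @property
    def windows_drawn_per_run(self) -> int:
        # One batch per step, so this counts draws, not unique windows.
        return self.steps_per_run * self.batch_size

    @property
    def values_per_window(self) -> int:
        return self.slots_per_window * self.features_per_slot

    @property
    def masked_slots_per_window(self) -> int:
        # Assumption: the mask ratio applies to whole slots.
        return round(self.mask_ratio * self.slots_per_window)


setup = TrainSetup()
print(setup.windows_drawn_per_run)     # 32000
print(setup.values_per_window)         # 1536
print(setup.masked_slots_per_window)   # 38

calendar_train_windows = 14_042    # calendar pass 1
mixed_train_windows = 6_709_495    # mixed pass 1
print(round(setup.windows_drawn_per_run / calendar_train_windows, 2))       # 2.28
print(round(100 * setup.windows_drawn_per_run / mixed_train_windows, 2))    # 0.48
```

Under these assumptions, a calendar-only run draws a bit more than two passes' worth of windows, while a combined run draws well under 1 percent of the mixed training set.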

The model was trained with a JEPA-style self-supervised task:

Take a prepared schedule window.
Hide part of the window.
Let the context encoder read the visible part.
Let the target encoder read the full original window.
Train the predictor to match the hidden target representation.
Repeat this many times across the dataset.

In simple terms, the model learns by seeing partial schedule patterns and trying to predict the missing schedule representation.
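
The forward pass of one such step can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the trained model: single linear maps stand in for the encoders and the predictor, hidden slots are zeroed rather than dropped, the predictor sees only a pooled context vector with no slot position, and the gradient update is omitted. How the target encoder is updated is not stated in this report, so it is not shown either.

```python
import numpy as np

SLOTS, FEATURES, DIM, MASK_RATIO = 96, 16, 8, 0.40

rng = np.random.default_rng(0)

# Stand-in weights (hypothetical); real encoders would be deeper networks.
W_context = rng.normal(scale=0.1, size=(FEATURES, DIM))
W_target = W_context.copy()
W_predictor = rng.normal(scale=0.1, size=(DIM, DIM))


def jepa_forward_loss(window, rng):
    """Mask a window and score the prediction in representation space."""
    n_hidden = round(MASK_RATIO * SLOTS)
    hidden = np.zeros(SLOTS, dtype=bool)
    hidden[rng.choice(SLOTS, size=n_hidden, replace=False)] = True

    # The context encoder reads only the visible part of the window.
    visible_window = window.copy()
    visible_window[hidden] = 0.0
    context_repr = visible_window @ W_context      # shape (SLOTS, DIM)

    # The target encoder reads the full original window.
    target_repr = window @ W_target                # shape (SLOTS, DIM)

    # Predict the hidden representation from the pooled visible context.
    pooled = context_repr[~hidden].mean(axis=0)    # shape (DIM,)
    predicted = pooled @ W_predictor               # broadcast over hidden slots

    # The loss compares representations, not raw feature values.
    loss = float(np.mean((predicted - target_repr[hidden]) ** 2))
    return loss, hidden


example_window = rng.random((SLOTS, FEATURES))
loss, hidden = jepa_forward_loss(example_window, rng)
print(int(hidden.sum()), loss >= 0.0)   # 38 True
```

The key property this shows is that the loss is measured between predicted and target representations of the hidden slots, which is why a low validation loss speaks to the prediction task rather than to scheduling quality directly.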

Step 1: Calendar-Only Training

The first stage trained the model on the new calendar dataset only.

Each pass used:

The same prepared calendar source set.
A different random validation split.
1,000 training steps.
Batch size 32.
Streaming load mode.
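
One way to produce a different random validation split per pass is sketched below. The split method is an assumption: this report does not say how windows were assigned, but the validation sizes in the results table vary from pass to pass while train plus validation always equals 15,532, which is what independent per-window assignment at roughly 10 percent would look like. The function name, the 0.10 fraction, and the seeds are hypothetical.

```python
import numpy as np

N_WINDOWS = 15_532   # prepared calendar windows (from this report)


def fresh_split(n_windows, val_fraction, seed):
    """Assign each window to validation independently with a fixed probability."""
    rng = np.random.default_rng(seed)
    is_val = rng.random(n_windows) < val_fraction
    indices = np.arange(n_windows)
    return indices[~is_val], indices[is_val]


for pass_id in range(1, 7):
    # A new seed per pass gives a new split over the same prepared windows.
    train_idx, val_idx = fresh_split(N_WINDOWS, 0.10, seed=pass_id)
    print(pass_id, len(train_idx), len(val_idx))
```

Because the split changes every pass, validation losses from different passes are measured on different held-out windows and are not strictly comparable to one another.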

Results:

Calendar pass	Train windows	Validation windows	Final train loss	Final validation loss
1	14,042	1,490	0.4266	0.3095
2	13,902	1,630	0.0602	0.0620
3	13,928	1,604	0.0625	0.3221
4	13,944	1,588	0.0808	0.3138
5	13,973	1,559	0.1314	0.1406
6	14,011	1,521	0.5031	0.2807

Calendar-only interpretation:

The model can learn from the calendar data alone.
The best calendar-only pass was pass 2, with validation loss 0.0620.
Validation loss varied widely across the calendar-only passes, from 0.0620 to 0.3221, a much larger absolute spread than in the mixed passes (0.0064 to 0.0374). The likely cause is that the calendar set is much smaller and more heterogeneous in structure.
Calendar-only training is useful for checking the calendar adapter, but it does not produce the best final model for general scheduling.
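
The spread of validation loss can be checked from the final losses in the two results tables of this report. The sketch below is a small helper for that comparison; the function name is hypothetical and the standard deviation is the population form, chosen only for simplicity.

```python
from statistics import mean, pstdev

# Final validation losses copied from the results tables in this report.
calendar_val = [0.3095, 0.0620, 0.3221, 0.3138, 0.1406, 0.2807]
mixed_val = [0.0374, 0.0064, 0.0170, 0.0069]


def spread(losses):
    """Summarize a list of losses: range, mean, and population std."""
    return {
        "min": min(losses),
        "max": max(losses),
        "mean": round(mean(losses), 4),
        "std": round(pstdev(losses), 4),
    }


calendar_stats = spread(calendar_val)
mixed_stats = spread(mixed_val)
print(calendar_stats)
print(mixed_stats)
```

With only 6 and 4 passes, these figures describe the runs that were made; they are too few to support a statistical claim about variance in general.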

Step 2: Mixed Public Training

The second stage mixed the new calendar data back into the original public training set.

Each mixed pass did three things:

Created a fresh validation split for all prepared public datasets.
Trained a model on each public dataset individually for 1,000 steps.
Trained a final combined model across all public datasets.

The combined model is the most important result from each mixed pass.

Results:

Mixed pass	Train windows loaded	Validation windows loaded	Final train loss	Final validation loss
1	6,709,495	747,556	0.0408	0.0374
2	6,709,892	747,159	0.0414	0.0064
3	6,711,263	745,788	0.0506	0.0170
4	6,711,002	746,049	0.0468	0.0069

Mixed-training interpretation:

All 4 mixed passes completed successfully.
All 4 mixed passes included the new calendar Hugging Face dataset.
The best mixed pass was pass 2, with validation loss 0.0064.
Pass 4 was very close, with validation loss 0.0069.
The average mixed validation loss was 0.0169.
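
The figures above can be reproduced from the mixed results table. The sketch below also checks that train plus validation windows add up to the same total in every pass, which suggests (but does not prove) that each pass re-split the same pool of prepared windows. The variable names are illustrative.

```python
# pass: (train windows, validation windows, final validation loss)
mixed_passes = {
    1: (6_709_495, 747_556, 0.0374),
    2: (6_709_892, 747_159, 0.0064),
    3: (6_711_263, 745_788, 0.0170),
    4: (6_711_002, 746_049, 0.0069),
}

# Total windows per pass; a single value means the pool size never changed.
totals = {train + val for train, val, _ in mixed_passes.values()}

# Passes ordered from lowest to highest validation loss.
ranked = sorted(mixed_passes, key=lambda p: mixed_passes[p][2])

average = sum(loss for _, _, loss in mixed_passes.values()) / len(mixed_passes)

print(totals)              # {7457051}
print(ranked[:2])          # [2, 4]
print(round(average, 4))   # 0.0169
```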

Comparison To The Initial Training

The previous initial combined public run had a validation loss of 0.0183 in its final combined pass report.

The best new mixed run had a validation loss of 0.0064.

Run	Combined validation loss
Previous initial public training	0.0183
New mixed training, best pass	0.0064
New mixed training, final pass	0.0069

The best new mixed result was about 65 percent lower than the previous initial combined validation loss.

The final mixed pass was about 62.3 percent lower than the previous initial combined validation loss.

This is a strong result for the JEPA self-supervised objective. It suggests that adding the calendar data did not hurt the combined training. Instead, the best mixed runs achieved better held-out JEPA prediction loss than the earlier initial run.
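
The relative reductions can be recomputed from the losses in the comparison table. The helper name below is illustrative. Computed from the four-decimal losses shown in this report, the reductions come to roughly 65.0 and 62.3 percent; figures derived from unrounded losses can differ in the last digit.

```python
def percent_lower(baseline, new):
    """Relative reduction of `new` against `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


previous_initial = 0.0183
best_mixed = 0.0064    # mixed pass 2
final_mixed = 0.0069   # mixed pass 4

print(round(percent_lower(previous_initial, best_mixed), 1))    # 65.0
print(round(percent_lower(previous_initial, final_mixed), 1))   # 62.3
```

The earlier run and the new runs used different validation splits, and the new validation set also contains calendar windows, so the comparison is indicative rather than exact.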

Which Checkpoint To Use

The recommended current model is:

mixed-public-calendar-20260612-pass02, combined checkpoint

Reason:

It had the best combined validation loss.
It included the original public training data.
It included the new calendar data.

The second-best option is:

mixed-public-calendar-20260612-pass04, combined checkpoint

Reason:

It was the latest mixed pass.
Its combined validation loss was very close to pass 2.

For general scheduling work, use a mixed combined checkpoint rather than a calendar-only checkpoint. The calendar-only checkpoints are useful for diagnostics, but the mixed model has broader schedule, task, project, email, and calendar representation coverage.

Quality Notes

The results are promising, but they should be interpreted carefully.

What the training result does show:

The data pipeline worked.
The model consumed the new calendar data.
The removed page flip dataset stayed out of the final runs.
The mixed model learned the JEPA prediction task well.
The best new mixed validation loss was better than the previous initial combined run.

What the training result does not prove yet:

It does not prove that the scheduler makes better real-world decisions.
It does not prove that every calendar source is equally useful.
It does not replace a downstream scheduling evaluation.

Recommended next evaluation:

Run the scheduler on a fixed set of realistic scheduling tasks.
Compare the previous initial combined checkpoint against the new mixed checkpoint.
Measure whether the new checkpoint improves ranking, conflict avoidance, time preference handling, and calendar event awareness.
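
One of these measurements, conflict avoidance, can be sketched as a paired comparison over a fixed task set. Everything in this sketch is hypothetical: the function names, the representation of a schedule as half-open slot ranges, and the toy schedules, which are made-up inputs for illustration and not evaluation results. The scheduler that would produce real schedules is not shown.

```python
from itertools import combinations


def count_conflicts(placements):
    """Count overlapping pairs among (start_slot, end_slot) half-open ranges."""
    return sum(
        1
        for (a_start, a_end), (b_start, b_end) in combinations(placements, 2)
        if a_start < b_end and b_start < a_end
    )


def compare_checkpoints(baseline_schedules, candidate_schedules):
    """Compare two checkpoints task by task on the same fixed task list."""
    wins = losses = ties = 0
    for baseline, candidate in zip(baseline_schedules, candidate_schedules):
        b, c = count_conflicts(baseline), count_conflicts(candidate)
        if c < b:
            wins += 1      # candidate produced fewer conflicts
        elif c > b:
            losses += 1
        else:
            ties += 1
    return {"wins": wins, "losses": losses, "ties": ties}


# Toy schedules for two tasks (illustrative only).
baseline_out = [[(36, 40), (38, 42)], [(0, 4), (4, 8)]]
candidate_out = [[(36, 40), (40, 44)], [(0, 4), (4, 8)]]
print(compare_checkpoints(baseline_out, candidate_out))
# {'wins': 1, 'losses': 0, 'ties': 1}
```

Keeping the task set fixed and comparing per task makes the two checkpoints directly comparable, which the validation losses from different splits are not.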

Final Conclusion

The new calendar data is useful when mixed with the original public training data.

The calendar-only runs showed that the model can learn from the calendar data, but the best final model came from mixed training.

The best mixed run reached a validation loss of 0.0064, down from 0.0183 in the previous initial combined report. The recommended model for the next stage is the combined checkpoint from mixed pass 2.