Hi Robby Read,
Welcome to Microsoft Q&A.
Thank you for the detailed explanation of your setup. What you are experiencing is a known and common limitation of custom classification models in Azure AI Document Intelligence, especially when multiple document types are involved.
Important clarification upfront
This is not a bug, and it does not mean your training is incorrect. Once the model has learned the signals your existing samples provide, adding more training documents of the same kind does not always improve accuracy.
Why this happens (even when documents look very different)
Custom classification models do not understand documents semantically the way humans do. Instead, they rely on statistical and visual signals such as:
- Page layout and structure
- Text density and distribution
- Common keywords and phrases
- Header/footer placement
- Overall visual composition
Because of this:
- Two documents that are clearly different to a human can still appear statistically similar to the model.
- If both document types share the following, the classifier can repeatedly confuse them:
  - Similar page counts
  - Similar text density
  - Overlapping keywords (for example: *Total*, *Date*, *Reference*)
  - Similar header positioning
Once this happens, adding more documents (beyond ~10–15 per class) often does not improve accuracy, which aligns with what you’re seeing.
Key limitations to be aware of
Custom classification models currently:
- Do not support feature weighting
- Do not allow tuning confidence thresholds
- Do not support “hard negative” or contrastive training
- Do not allow you to explicitly define distinguishing features
Because of these constraints, persistent confusion between two specific document types can occur and may not be solvable through retraining alone.
What does work
1. Add a rule-based pre-classification layer
Before calling the classification model, apply deterministic rules such as:
- Presence or absence of a unique keyword or phrase
- A specific header string
- A known document identifier
- Page count rules
- Regex-based patterns

You can then:
- Route only ambiguous cases to the classifier, or
- Override the classifier’s output when a strong rule matches
This hybrid approach is the most reliable production pattern.
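As a rough illustration of such a rule layer, here is a minimal Python sketch. It assumes you already have the document text (for example from an OCR/read step). The patterns and labels in `RULES` are hypothetical placeholders, not values from your documents, so replace them with strings that are truly unique to each type.

```python
import re
from typing import Optional

# Hypothetical rules: every pattern and label here is a placeholder.
RULES = [
    (re.compile(r"\bPURCHASE ORDER\b", re.IGNORECASE), "purchase_order"),
    (re.compile(r"\bINV-\d{6}\b"), "invoice"),
]

def pre_classify(text: str) -> Optional[str]:
    """Return a document type only when exactly one rule matches.

    Returns None when no rule matches or when rules conflict, so the
    caller can send the document to the classification model instead.
    """
    matches = {label for pattern, label in RULES if pattern.search(text)}
    return matches.pop() if len(matches) == 1 else None
```

Returning `None` on conflicting matches keeps the rules conservative: only unambiguous documents bypass the classifier.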
2. Split the classification workload
Instead of one classification model with 7 document types:
- Create multiple classification models
- Separate the two problematic document types into different models
- Use a lightweight first-pass rule to decide which classifier to call
This reduces class competition and improves accuracy.
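One possible way to wire this up is sketched below. Each classifier is modelled as a plain callable that takes the document bytes and returns `(doc_type, confidence)`; in practice it would wrap a call to one of your trained classification models. The group names and the keyword used in `pick_group` are assumptions made purely for illustration.

```python
from typing import Callable, Dict, Tuple

# A classifier is modelled as a callable returning (doc_type, confidence).
Classifier = Callable[[bytes], Tuple[str, float]]

def pick_group(text: str) -> str:
    """Hypothetical first-pass rule: choose a classifier group by keyword."""
    return "group_a" if "statement" in text.lower() else "group_b"

def classify_split(
    text: str, document: bytes, classifiers: Dict[str, Classifier]
) -> Tuple[str, float]:
    """Send the document to the single classifier chosen by the rule."""
    return classifiers[pick_group(text)](document)
```

Because the two conflicting types live in different groups, neither model ever has to choose between them.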
3. Use confidence-based post-processing
Although thresholds can’t be tuned inside the service:
- Capture the confidence score returned by the classifier
- If confidence is below a safe value (for example, < 0.85):
  - Apply fallback rules
  - Route for secondary validation
  - Or classify using an alternate model
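The post-processing above could look roughly like the following Python sketch. It assumes you have already read the predicted document type and its confidence from the classifier response. The `Decision` type, the `fallback` callable, and the `source` labels are hypothetical names introduced for this example, and 0.85 is only the example value mentioned above, not a recommended setting.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Decision:
    doc_type: Optional[str]  # None means no automatic decision was made
    source: str              # "model", "fallback", or "review"

def resolve(
    doc_type: str,
    confidence: float,
    fallback: Callable[[], Optional[str]],
    threshold: float = 0.85,
) -> Decision:
    """Accept the model's answer only when it is confident enough."""
    if confidence >= threshold:
        return Decision(doc_type, "model")
    # Low confidence: try fallback rules or an alternate model.
    alternative = fallback()
    if alternative is not None:
        return Decision(alternative, "fallback")
    # Nothing matched: route the document for secondary validation.
    return Decision(None, "review")
```

Recording where each decision came from makes it easier to measure later how often the model, the fallback, or a reviewer had the final say.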
4. Strengthen visual differentiation
If you have influence over document templates:
- Add a consistent, unique header or label
- Introduce a distinctive first-page identifier
- Make sure each type carries a repeated keyword that no other type uses
Even a small, consistent visual cue can significantly improve classification.
What will not help
- Adding large numbers of similar training documents
- Re-training repeatedly without changing signals
- Expecting semantic understanding from the classifier
- Relying on classification alone for routing decisions
Recommended best-practice architecture
In real-world implementations, the most stable approach is:
1. Rule-based pre-filter
2. Custom classification model
3. Confidence evaluation
4. Fallback or override logic
Classification should be treated as assistive, not authoritative.
- This behavior is expected with custom classification models
- Adding more samples often does not improve accuracy
- Visually different documents can still conflict statistically
- There is no way to “force” separation inside the classifier today
- Hybrid rule + ML approaches deliver the best results
Please refer this
I hope this helps. Do let me know if you have any further queries.
If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".
Thank you!