Small dataset (~600 images) for facial emotion detection using YOLOv11 + Roboflow — how to collect better training images?

I am building a small computer vision project (MoodAI Player) in which a webcam detects facial emotions and triggers music and LED responses on a Raspberry Pi.

I am planning to train a YOLOv11 model on data annotated in Roboflow, using 6 classes:

happy, sad, angry, neutral, fearful, sleepy

Dataset size: ~600 images (100 per class)
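For context, the exported dataset would feed a YOLO-style data config along these lines (a sketch assuming a standard Ultralytics-format `data.yaml`; the paths are placeholders, not my actual export layout):

```yaml
# YOLO dataset config sketch — paths are placeholders for the Roboflow export
path: datasets/moodai
train: train/images
val: valid/images
test: test/images

names:
  0: happy
  1: sad
  2: angry
  3: neutral
  4: fearful
  5: sleepy
```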

❓ Main Issue (Data Collection)

I think my image-collection strategy might be the weak point, so I want advice specifically on how the training images should be captured.

Current approach:

Mostly frontal faces

Clean background

Good lighting

Face fills ~60–80% of frame

Some variation (angles, lighting, different people)

50 images with glasses / 50 without
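To sanity-check the class balance before uploading to Roboflow, I keep the raw captures in one folder per class and count them with a small script (the folder-per-class layout and the helper names here are my own convention, not anything Roboflow requires):

```python
from collections import Counter
from pathlib import Path

CLASSES = ["happy", "sad", "angry", "neutral", "fearful", "sleepy"]
IMG_EXTS = {".jpg", ".jpeg", ".png"}

def class_counts(root) -> Counter:
    """Count images per class, assuming one subfolder per class under root."""
    root = Path(root)
    counts = Counter()
    for cls in CLASSES:
        cls_dir = root / cls
        if not cls_dir.is_dir():
            counts[cls] = 0  # class folder not created yet
            continue
        counts[cls] = sum(
            1 for p in cls_dir.iterdir() if p.suffix.lower() in IMG_EXTS
        )
    return counts

def imbalance_report(counts: Counter, target: int = 100) -> list:
    """List classes that are still short of the per-class target."""
    return [f"{cls}: {n}/{target}" for cls, n in counts.items() if n < target]
```

Running `imbalance_report(class_counts("raw_captures"))` tells me at a glance which emotions still need capture sessions.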

🤔 Questions

Should I keep images clean or include more real-world noise (background clutter, different distances, etc.)?

How much variation is useful?

Side angles?

Low light / shadows?

Occlusions (hair, hands, etc.)?

Is 100 images per class too small, even with augmentation in Roboflow?

Any common mistakes in emotion datasets I should avoid?
(especially confusing classes like sleepy vs neutral)

🎯 Goal

This is a university project, so I'm not aiming for perfect accuracy, just a model that works reasonably well in real time.
