NORM Resources
No Reference Metric Resources
This webpage lists potential resources for NR metrics:
- Algorithms
- Development tools
- Datasets
- Industry design specifications
This Google Sheet identifies metric components: artifacts, features, measurement points, and key performance indicators.
Algorithms
The following algorithms estimate mean opinion score (MOS), perform root cause analysis (RCA), or both; a sketch of the natural-scene-statistics feature shared by several of the image metrics follows the list.
- Sawatch version 2 (2000)—image and video, MOS and RCA, designed to NORM specifications
- BLIINDS (2012)—image MOS
- BRISQUE (2012)—image MOS
- DIIVINE (2011)—image MOS
- NIQE (2013)—image MOS, machine learning re-training tools provided
- NORM (2018)—NR image quality metric for realistic image synthesis
- No-Reference Video Quality Indicators (various dates)—RCA
These No-Reference Video Quality Indicators (VQIs) were developed by the Video Quality AGH team (VQ AGH). Executables are freely and openly available for download from the http://vq.kt.agh.edu.pl/metrics.html website. Source code is available upon request (for research purposes only).
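Several of the image metrics above, notably BRISQUE and NIQE, start from the same natural-scene-statistics idea: compute mean-subtracted, contrast-normalized (MSCN) coefficients and summarize how their distribution deviates from that of pristine images. The Python sketch below illustrates only that shared first step, assuming NumPy and SciPy; the function name and parameter defaults are illustrative, not taken from any implementation listed here.

```python
# Minimal sketch of the MSCN (mean-subtracted, contrast-normalized) coefficients
# that BRISQUE- and NIQE-style metrics build their features on. Not reference code.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(gray: np.ndarray, sigma: float = 7.0 / 6.0, c: float = 1.0) -> np.ndarray:
    """Return MSCN coefficients for a grayscale image with values in [0, 255]."""
    gray = gray.astype(np.float64)
    mu = gaussian_filter(gray, sigma)                    # local luminance mean
    var = gaussian_filter(gray * gray, sigma) - mu * mu  # local variance (can dip below 0 numerically)
    sd = np.sqrt(np.maximum(var, 0.0))                   # local contrast
    return (gray - mu) / (sd + c)                        # c keeps flat regions from dividing by ~0
```

In BRISQUE and NIQE, generalized Gaussian distributions are then fit to these coefficients (and to products of neighboring coefficients); the fitted shape and scale parameters become the features that feed a trained regressor (BRISQUE) or are compared against a model built from pristine images (NIQE).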
The following full-reference (FR) metrics may be helpful for creating simulated training datasets; a sketch of scoring a clip with VMAF follows the list.
- HDR-VDP—a visual metric that compares a pair of images (a reference and a test image)
- LPLD—learning to predict localized distortions in rendered images
- VMAF—Video Multimethod Assessment Fusion (Netflix's open-source FR video quality metric)
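As one way to generate FR scores for simulated training data, the sketch below runs VMAF through ffmpeg's libvmaf filter, assuming an ffmpeg build compiled with libvmaf. The helper name vmaf_score, the file names, and the JSON log path are placeholders, and the JSON layout read at the end matches recent libvmaf releases (older builds log a different structure).

```python
# Hedged sketch: score a distorted clip against its reference with VMAF via
# ffmpeg's libvmaf filter. Requires ffmpeg built with libvmaf on the PATH.
import json
import subprocess

def vmaf_score(distorted: str, reference: str, log_path: str = "vmaf.json") -> float:
    """Run libvmaf and return the mean pooled VMAF score for the clip."""
    cmd = [
        "ffmpeg", "-i", distorted, "-i", reference,
        # Recent ffmpeg builds treat the first input as the distorted clip and
        # the second as the reference; check the docs of the installed version.
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",
    ]
    subprocess.run(cmd, check=True, capture_output=True)
    with open(log_path) as f:
        result = json.load(f)
    # Pooled-metrics layout as written by libvmaf v2.x JSON logs.
    return result["pooled_metrics"]["vmaf"]["mean"]
```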
Development Tools
- NRMetricFramework—a software framework to train NR metrics; includes subjective datasets suitable for training and reports on the performance of various NR metrics
- Confidence Intervals for Metrics—a method that describes an NR metric's performance in terms of the equivalent number of people in an ad-hoc subjective assessment; code is provided in NRMetricFramework (a generic alternative is sketched below)
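The confidence interval method above is documented with its code in NRMetricFramework, which is MATLAB based. As a generic illustration only (not that method), the Python sketch below attaches a bootstrap confidence interval to the Pearson correlation between an NR metric's scores and MOS; the helper name pearson_ci and its defaults are assumptions for this example.

```python
# Generic sketch: bootstrap confidence interval on the correlation between
# NR metric scores and subjective MOS. Not the NRMetricFramework method.
import numpy as np

def pearson_ci(metric: np.ndarray, mos: np.ndarray,
               n_boot: int = 2000, alpha: float = 0.05, seed: int = 0):
    """Return (correlation, lower bound, upper bound) via bootstrap resampling."""
    rng = np.random.default_rng(seed)
    r = np.corrcoef(metric, mos)[0, 1]
    n = len(metric)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample stimuli with replacement
        boots.append(np.corrcoef(metric[idx], mos[idx])[0, 1])
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return r, lo, hi
```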
Datasets
See the "documentation" folder of NRMetricFramework for more information on most of these datasets, including references and download location.
- AGH-NTIA-Dolby
- Artifacts in synthetic (computer generated) images (local distortion detectability)
- Blurred image database with ground truths (BID)
- CCRIQ (videos and ratings on the Consumer Digital Video Library: search for key word "ccriq")
- CID2013 Camera Image Database
- CVD2014 Camera Video Database
- "ITS4S: A Video Quality Dataset with Four-Second Unrepeated Scenes" (videos and ratings on the Consumer Digital Video Library: search for key word "its4s")
- "ITS4S2: An Image Quality Dataset With Unrepeated Images From Consumer Cameras"
- "ITS4S3: A Video Quality Dataset With Unrepeated Videos, Camera Impairments, and Public Safety Scenarios"
- "ITS4S4: A Video Quality Study of Camera Pans"
- IQA-Experts-3000 Visual Quality Database
- KADID-10k Image Database
- KonPatch-30k Image Database
- KonViD-1K Video Database
- KonIQ-10K Image Database
- LIVE Public-Domain Subjective In the Wild Image Quality Challenge Database
- NORM metric's user study dataset (local distortion detectability)
- USC JVC Just Noticeable Difference (JND)
- VARIUM
Industry Design Specifications
Professionally Produced Entertainment Use Cases
- Video On Demand (VoD) Services
- Description: A studio produces a high-quality movie and provides it to a VoD service provider. The perceptual quality of the video is assumed to be very high, but there could be production, transcoding, file format, or transmission errors. A no-reference method to “watch” the movie and detect any impairments through “visual” inspection can save operational costs for the VoD service provider.
- Goal: Provide frame-specific quality & artifact scores.
- Assumptions: The expectation of quality is very high. The metric should ignore impairments stemming from the production, aesthetics, and artistic intent. The metric should only detect artifacts that result from transmission or file errors.
- Datasets: Professionally produced content, and the same content with visual artifacts injected to simulate transmission errors.
- Description: A VoD service provider needs to measure the perceived video quality of a service after processing/transcoding to complement the full-reference degradation measurements achieved with methods like VMAF.
- Goal: Provide an overall perceptual quality score to verify that the content meets quality minimums prior to publishing on a CDN.
- Assumptions: The expectation of quality is determined by the user’s expectation of a professional service offered in SD/DVD, HD, or UHD quality on a TV, tablet, or smartphone. Content will be encoded at various bitrates & resolutions and then decoded & scaled by the device.
- Datasets: Professionally produced content encoded for a streaming VoD service.
- Description: A VoD service provider or network service provider needs to measure the perceived video quality of a service on a device to determine whether it meets their quality goal and, if not, what the visual impairments are and the possible root cause of the impairments. Could also be used for service benchmarking, device benchmarking, etc.
- Goal: Provide time-bound quality scores, artifact scores, and root cause.
- Assumptions: The expectation of quality is determined by the user’s expectation of a professional service offered in SD/DVD, HD, or UHD quality on a TV, tablet, or smartphone. Content will be encoded at various bitrates & resolutions, transmitted over the Internet, received/buffered/rate adjusted, decoded & scaled by the device.
- Datasets: Professionally produced content encoded for a streaming VoD service.
- Root Cause: Encoding artifacts, buffering in the client, representation changes.
- Content Contribution (e.g. validation of a very high quality file)
- Service Quality (e.g. validation after processing/transcoding)
- End User Quality (e.g. validation of quality as displayed to the user)
- Broadcast/Live Services
- Description: A professional broadcast studio produces a high-quality news or sporting event and provides it as a live service (broadcast). Studio production is typically high quality; however, outdoor events like sports will have higher variability in quality due to weather, lighting, and the available transmission bandwidth to send the video to the production studio. In some cases the video will be pristine (e.g., the Olympics) and in some cases the same as user generated content (remote news crews). Hence, there are capture issues as well as encoding and transmission issues.
- Goal: Provide frame-specific quality & artifact scores.
- Service Quality (e.g. validation after processing/transcoding).
- Description: A VoD service provider or network service provider needs to measure the perceived video quality of a service on a device to determine whether it meets their quality goal and, if not, what the visual impairments are and the possible root cause of the impairments. Could also be used for service benchmarking, device benchmarking, etc.
- Goal: Provide time-bound quality scores, artifact scores, and root cause.
- Assumptions: The expectation of quality is determined by the user’s expectation of a professional service offered in SD/DVD, HD, or UHD quality on a TV, tablet, or smartphone. Content will be encoded at various bitrates & resolutions and then decoded & scaled by the device.
- Datasets: Professionally produced content encoded for a streaming VoD service.
- Root Cause: Encoding artifacts, buffering in the client, representation changes.
- Content Contribution (e.g. validation of a live stream (maybe different types))
- End User Quality (e.g. validation of quality as displayed to the user)
User Generated Content Use Cases
- Social Media Live Content
- Content Capture (e.g. device quality (camera, encode, transmission))
- Service (e.g. real-time service encode and distribution)
- End User Quality (e.g. validation of quality as displayed to the user)
- Social Media On Demand Content
- Content Capture (e.g. device quality (camera, encode, transmission))
- Service (e.g. service encode and distribution)
- End User Quality (e.g. validation of quality as displayed to the user)
- Video Chat
- Content Capture (e.g. device quality (camera, encode, transmission))
- End User Quality (e.g. validation of quality as displayed to the user)
Industrial and Application-Specific Use Cases
- Camera capture
- Description: Understand the quality impact of the entire pipeline (sensor, image processing, encode, decode and display). Optimize the recording bandwidth.
- Digital surveillance (e.g., automotive, LIDAR camera to steer)
- First Responder Video (e.g. fire, police, security, SAR, etc.)
- End User Quality (e.g. validation of quality as displayed to the user)
- AI Quality (e.g. ability of an AI system to recognize objects in the video)
- Medical
- End User Quality (e.g. validation of quality as displayed to the user)
- Artificial Intelligence (AI) Systems (e.g. video not intended for human viewing, like autonomous vehicle systems or video analytics)
- AI Quality (e.g. ability of an AI system to recognize objects in the video)
- Network optimization
- Description: Allow different types of networks (5G, wireless, etc.) to understand the quality of video streams. For example, priority access in a limited wireless bandwidth situation is needed to optimize live video streams that provide situational awareness (e.g., a managed network of first responder cameras providing real-time video feeds to assist the decision making process and coordinate practitioner response).
- Sensors (e.g., video used by device for automated response)
- Immersive environment, including virtual reality, augmented reality, 360-degree video, and free-viewpoint video.
- Description: Includes feedback, latency, functionality, and telepresence (the perception that you are in another place).
- KPIs for network exchange contracts (what quality I deliver versus what quality I receive from a vendor); ensure contracts are based on quality (service level agreements)
Acceptable Constraints—Where Do We Start?
- Specify the task (e.g., entertainment, understand events, recognize people)
- Specify amateur vs professional production
- Specify user expectations (e.g., bit-rate range for this application, what does “good” mean in this context)
- Quality as viewed right now (e.g., ignore potential value of zooming into a 40 MP image, ignore quality changes over time, aka the recency effect)
Additional Design Specifications for NR Metrics
- Open source usage rights
- One metric for both images and video.
- Able to predict the quality of content directly from the camera.
- We need to characterize (but perhaps ignore) the real-world subject, camera operator actions, lens, sensor, image processing, aesthetics, etc.
- Extrapolate “what if”
- Predicted impact of changes to bit-rate, resolution, frame rate, etc.
- Root cause analysis
- Why is the quality bad?
- Measure immediate quality response
- Short observation window
- Add long observation window model later
- Real-time implementation (compute problem)
- Degrades gracefully
- Doesn’t just fail on unexpected conditions; no division by zero
- Can become less accurate but does not produce random results
- Warns user when moving outside of intended usage
- Computes confidence in ratings
- Accuracy level, confidence interval
- Help users trust and understand values
- Distortion flexibility
- Works on different types of impairments
- Focus on type of artifacts that are appropriate for the current task / application
- Non-static, multi-faceted model that can be trained on new applications
- No hard coded constants
- User expectations change; the NR metric can change with them
- Produces both engineering values and experience values
- Scalable
- No requirement for a specific frame size, resolution, frame rate, etc.
- Scalability of complexity; can scale down the implementation; may trade off performance and decoupled factors
- Computationally scalable
- Easily explained to naïve users
- Robust response to new content
- Self-learning
- Implementation may collect historical data and pass it to a self-learning feedback loop. Users could optionally contribute to the pool of knowledge.