Manual vs Data driven - eSTUDENT

Robust ML Challenge

Receipt classification
• Identify the receipt retailer based on their visible logo
• First step of a system for automatic receipt data extraction
• Allows cheap and fast collection of information directly from consumers
• Basis for highly advanced market research methods

Robust optimization
• The problem is simple in theory but complex in practice
• Points out the limitations of current ML methods
• Requires custom model design and advanced data handling
• Provides good experience for general machine learning research

Feature engineering

Deep neural networks

Black box approach
• Can approximate any continuous function (universal approximation)
• The structure of the learned function is unknown
• No simple link between the weights and the function being approximated
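To make the last bullet concrete, here is a minimal Python sketch (the network size and all weight values are hypothetical, chosen only for illustration): three different weight settings of a tiny one-hidden-layer ReLU network that compute exactly the same function, one obtained by swapping hidden units and one by rescaling a unit, which works because relu(c·z) = c·relu(z) for c > 0. The weights alone therefore do not identify the function in any simple way.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def mlp(x, w_in, b_in, w_out):
    """Scalar input, one hidden ReLU layer, linear scalar output (no output bias)."""
    hidden = [relu(w * x + b) for w, b in zip(w_in, b_in)]
    return sum(v * h for v, h in zip(w_out, hidden))

# Hypothetical weights of a 2-hidden-unit network: (w_in, b_in, w_out).
original = ([1.0, -2.0], [0.5, 1.0], [3.0, 0.25])

# Same hidden units in a different order (permutation symmetry).
permuted = ([-2.0, 1.0], [1.0, 0.5], [0.25, 3.0])

# Unit 0 rescaled: incoming weights times c, outgoing weight divided by c.
# Valid because relu(c * z) == c * relu(z) for any c > 0.
c = 2.0
rescaled = ([1.0 * c, -2.0], [0.5 * c, 1.0], [3.0 / c, 0.25])

xs = [-2.0, -0.5, 0.0, 0.75, 1.5]
outputs = {
    name: [mlp(x, *weights) for x in xs]
    for name, weights in [("original", original), ("permuted", permuted), ("rescaled", rescaled)]
}

# All three rows are identical although the weights differ.
for name, ys in outputs.items():
    print(name, ys)
```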

Generalization

Model selection
• Finding the best hyperparameters of a model
• No understanding of the underlying architecture needed
• Naive example: random search
• Advanced example: population-based training
• Problem: performance depends heavily on the architecture itself
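A minimal random search sketch, assuming a hypothetical `validation_score` function that stands in for training a model and scoring it on validation data. The search only samples configurations and keeps the best one; it never looks inside the model, which is why it needs no understanding of the architecture and also why it cannot compensate for a poor architecture.

```python
import math
import random

def validation_score(learning_rate: float, num_layers: int) -> float:
    """Hypothetical stand-in for training a model and scoring it on validation data.
    Best possible score is 0.0, at learning_rate=0.01 and num_layers=4."""
    return -abs(math.log10(learning_rate) + 2.0) - 0.1 * abs(num_layers - 4)

def random_search(n_trials: int, seed: int = 0):
    """Sample hyperparameters independently at random and keep the best configuration."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform in [1e-5, 1e-1]
            "num_layers": rng.randint(1, 8),             # uniform integer in [1, 8]
        }
        score = validation_score(**params)
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history

best_params, best_score, history = random_search(n_trials=50)
print(best_params, round(best_score, 3))
```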

It works

Interpretability
• "If we wish to make AI systems deployed on self-driving cars safe, straightforward black-box models will not suffice, as we will need methods of understanding their rare but costly mistakes." (source: Interpretable ML Symposium at NIPS 2017 http://interpretable.ml/)
• "Treating bias as a technical problem means ignoring the underlying social problem, and has the potential to make things worse." (source: The trouble with bias - NIPS 2017 keynote by Kate Crawford)

Robustness
• A learning algorithm that can reduce the chance of fitting noise is called robust
• SIFT (Scale-invariant feature transform): invariant to image translation, scaling, and rotation, partially invariant to illumination changes, and robust to local geometric distortion
• Key requirement for industry-level solutions
• 98% accuracy is not good enough if you are replacing humans
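As a small illustration of not fitting noise (robust statistics is used here as a swapped-in example, not a method from the slides): fitting a constant to data by least squares gives the mean, fitting it by least absolute deviations gives the median, and a single corrupted measurement moves the first far more than the second. The measurement values are hypothetical.

```python
import statistics

# Hypothetical repeated measurements of one quantity.
clean = [10.1, 9.9, 10.0, 10.2, 9.8]
# The same data with one reading corrupted by noise.
noisy = clean + [100.0]

# Constant model fitted by least squares -> mean; by least absolute deviations -> median.
mean_shift = statistics.mean(noisy) - statistics.mean(clean)
median_shift = statistics.median(noisy) - statistics.median(clean)

print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift:.2f}")
```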

Manual vs Data driven
Manual:
• Engineers make behaviour decisions
• Interpretable
• Known limitations
• Little data needed
Data driven:
• Behaviour is learned from data
• Black box
• Unknown behaviour
• Needs large amounts of data

Manual vs Data driven
Manual:
• Engineers make behaviour decisions
• Interpretable
• Known limitations
• Little data needed
Data driven:
• Engineers make design decisions
• Black box
• Unknown behaviour
• Needs large amounts of data
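A minimal sketch of the two columns on a hypothetical one-dimensional feature (for example, some logo score per receipt) with binary labels: in the manual version an engineer fixes the decision threshold, in the data-driven version the threshold is chosen to maximize accuracy on labelled samples. All data and names are made up for illustration.

```python
# Hypothetical labelled samples: (feature value, label), label 1 = positive class.
samples = [(0.2, 0), (0.35, 0), (0.4, 0), (0.55, 1), (0.6, 0), (0.7, 1), (0.8, 1), (0.9, 1)]

def accuracy(threshold: float, data) -> float:
    """Fraction of samples where the rule 'feature >= threshold' matches the label."""
    return sum((x >= threshold) == bool(y) for x, y in data) / len(data)

# Manual: an engineer decides the behaviour directly.
MANUAL_THRESHOLD = 0.75

# Data driven: the threshold is learned from the labelled data by
# trying each observed feature value as a candidate.
def fit_threshold(data) -> float:
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: accuracy(t, data))

learned_threshold = fit_threshold(samples)
print("manual:", MANUAL_THRESHOLD, accuracy(MANUAL_THRESHOLD, samples))
print("learned:", learned_threshold, accuracy(learned_threshold, samples))
```

Here the learned threshold classifies 7 of 8 samples correctly versus 6 of 8 for the hand-picked one, but it is only as good as the labelled data it was fitted to.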

Real systems

Modular vs End-to-end
[Diagram: a modular system composed of modules A and B, next to an end-to-end system consisting of a single module C]

Modular vs End-to-end
Modular:
• Split a complex problem into solvable subproblems
• Requires annotated data for every subproblem
• Manual design between submodules
• More stable
End-to-end:
• Tackle the entire problem at once
• Requires only one set of annotated data
• With enough data, design decisions are inherent
• Extremely prone to overfitting

Modular vs End-to-end
Modular:
• Split a complex problem into solvable subproblems
• Requires annotated data for every subproblem
• Needs more explicit design decisions
End-to-end:
• Tackle the entire problem at once
• Requires only one set of annotated data
• With enough data, design decisions are inherent
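The difference in annotation cost can be sketched as data schemas. The field names, the bounding-box format, and the file names below are hypothetical: an end-to-end sample carries only the retailer label, while a modular sample also needs a logo box, because each stage is trained on its own targets.

```python
from dataclasses import dataclass

# Hypothetical box format: (left, top, right, bottom) in pixels.
BoundingBox = tuple[int, int, int, int]

@dataclass
class EndToEndSample:
    image_path: str
    retailer: str  # the only annotation an end-to-end classifier needs

@dataclass
class ModularSample:
    image_path: str
    logo_box: BoundingBox  # extra annotation, needed to train the logo localizer
    retailer: str          # annotation for the logo classifier

def end_to_end_targets(samples):
    """One training set: image -> retailer."""
    return [(s.image_path, s.retailer) for s in samples]

def localizer_targets(samples):
    """Stage 1 training set: image -> logo box."""
    return [(s.image_path, s.logo_box) for s in samples]

def classifier_targets(samples):
    """Stage 2 training set: (image, logo region) -> retailer."""
    return [((s.image_path, s.logo_box), s.retailer) for s in samples]

e2e_data = [EndToEndSample("receipt_001.png", "RetailerA")]
modular_data = [ModularSample("receipt_001.png", (12, 4, 96, 40), "RetailerA")]
```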

Model design

Model design
• Layer engineering
• Differentiable programming
• Designing a specialized model for a given problem
• Challenge: find the optimum between a fully modular system and a fully end-to-end system
• Limitations: amount and type of available annotated data, robustness requirements, allowed complexity
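One way to read "differentiable programming" is that a hand-designed model is simply a program whose derivative with respect to its parameters can be computed automatically, control flow included. The sketch below shows this with forward-mode automatic differentiation via dual numbers; frameworks used in practice typically rely on reverse mode, and the `tiny_layer` function is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """A value paired with its derivative (forward-mode automatic differentiation)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def relu(x: Dual) -> Dual:
    # Ordinary control flow; the derivative follows whichever branch is taken.
    return x if x.val > 0 else Dual(0.0, 0.0)

def tiny_layer(x, w, b):
    """Hypothetical hand-designed layer: relu(w * x + b) squared."""
    h = relu(w * x + b)
    return h * h

# Derivative with respect to w at x=2, w=1.5, b=-1: seed w's derivative with 1.
out = tiny_layer(2.0, Dual(1.5, 1.0), -1.0)
print(out.val, out.der)  # analytic: (wx + b)^2 = 4, d/dw = 2(wx + b)x = 8
```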

Receipt classification
• Identify the receipt retailer based on their visible logo
• More complex than logo classification
• With the provided annotations, it becomes an end-to-end problem
• Potential candidate for a two-stage modular system

End-to-end approach
[Diagram: Receipt image → End-to-end classifier → Retailer]
• Treat the problem as standard classification
• No additional annotation types needed
• No model design needed; standard classifiers are fine
• Extremely prone to overfitting
• Needs huge amounts of annotated data to work
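A minimal sketch of the end-to-end view, assuming tiny hypothetical grayscale images stored as nested lists: a nearest-centroid classifier (a swapped-in standard classifier, not the one used in the talk) maps the whole receipt image to a retailer label. It needs only retailer labels, but every pixel, including receipt text unrelated to the logo, contributes to the decision.

```python
# Hypothetical tiny grayscale receipts: lists of pixel rows with values in [0, 1].
def flatten(img):
    return [p for row in img for p in row]

def fit_centroids(images, labels):
    """Average the flattened images of each retailer (nearest-centroid training)."""
    sums, counts = {}, {}
    for img, label in zip(images, labels):
        vec = flatten(img)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, p in enumerate(vec):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, img):
    """Return the retailer whose centroid is closest to the whole image."""
    vec = flatten(img)
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

train_images = [
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],  # retailer A: logo top-left
    [[1, 0, 0], [0, 0, 0], [1, 1, 0]],  # retailer A, with some receipt text
    [[0, 0, 1], [0, 0, 0], [0, 0, 0]],  # retailer B: logo top-right
    [[0, 0, 1], [0, 0, 0], [0, 1, 1]],  # retailer B, with some receipt text
]
train_labels = ["A", "A", "B", "B"]

centroids = fit_centroids(train_images, train_labels)
print(predict(centroids, [[1, 0, 0], [0, 0, 0], [0, 1, 1]]))
```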

Modular approach
[Diagram: Receipt image → Logo localizer → Classifier → Retailer]
• Divide the task into two subproblems
• Additional annotations needed for every subproblem
• Less prone to overfitting
• Needs less data, but the annotations are significantly harder to acquire
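A minimal two-stage sketch under simplifying assumptions: logos are 2×2 patches in tiny hypothetical images, the localizer is a hand-written stand-in (highest-intensity window) for a detector that would in practice be trained on logo box annotations, and the classifier matches the crop against per-retailer templates built from annotated logo crops. All names and values are hypothetical.

```python
PATCH = 2  # hypothetical logo size in pixels

def crop(img, top, left, size=PATCH):
    return [row[left:left + size] for row in img[top:top + size]]

def localize_logo(img):
    """Stage 1, hand-written stand-in for a trained logo localizer:
    return (top, left) of the PATCH x PATCH window with the highest total intensity."""
    best, best_sum = (0, 0), float("-inf")
    for top in range(len(img) - PATCH + 1):
        for left in range(len(img[0]) - PATCH + 1):
            total = sum(sum(row) for row in crop(img, top, left))
            if total > best_sum:
                best, best_sum = (top, left), total
    return best

def classify_logo(patch, templates):
    """Stage 2: nearest template by squared pixel distance."""
    def sq_dist(template):
        return sum((a - b) ** 2
                   for p_row, t_row in zip(patch, template)
                   for a, b in zip(p_row, t_row))
    return min(templates, key=lambda name: sq_dist(templates[name]))

def classify_receipt(img, templates):
    top, left = localize_logo(img)
    return classify_logo(crop(img, top, left), templates)

# Hypothetical templates, e.g. averaged from annotated logo crops.
templates = {"A": [[1, 1], [1, 0]], "B": [[1, 0], [1, 1]]}

receipt_a = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
receipt_b = [[0, 0, 1, 0],
             [0, 0, 1, 1],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]

print(classify_receipt(receipt_a, templates), classify_receipt(receipt_b, templates))
```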

Model design
[Diagram: the modular pipeline (Logo localizer, Classifier) and the End-to-end classifier, with a "?" marking the model to be designed between them]

Model design
[Diagram: Receipt image → ? → Retailer]
• The structure of the model should force the localization by design
• Should not require additional annotations, which are very hard to scale
• Should minimize overfitting on reasonable amounts of data
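One possible structure that meets these requirements, shown as a minimal sketch and not as the method from the talk, is global max pooling over window scores, as used in weakly supervised localization and multiple-instance learning: every logo-sized window is scored per retailer, and the image-level score of a retailer is its best window score. Because the prediction must come from a single window, the winning window gives the logo location without any box annotations. The per-retailer filter weights below are hypothetical hand-set values standing in for weights that would be learned from retailer labels alone; training is not shown.

```python
PATCH = 2  # hypothetical logo size in pixels

def windows(img, size=PATCH):
    """Yield ((top, left), patch) for every size x size window of the image."""
    for top in range(len(img) - size + 1):
        for left in range(len(img[0]) - size + 1):
            yield (top, left), [row[left:left + size] for row in img[top:top + size]]

def patch_score(patch, weights):
    """Linear score of one window for one retailer."""
    return sum(w * p for w_row, p_row in zip(weights, patch) for w, p in zip(w_row, p_row))

def predict_with_location(img, class_weights):
    """Image-level prediction by global max pooling over window scores.
    Returns (retailer, (top, left) of the winning window, score)."""
    best = {}
    for label, weights in class_weights.items():
        # A retailer's image score is the score of its single best window.
        best[label] = max((patch_score(patch, weights), loc) for loc, patch in windows(img))
    label = max(best, key=lambda k: best[k][0])
    score, location = best[label]
    return label, location, score

# Hypothetical hand-set filters standing in for weights learned from retailer labels only.
class_weights = {"A": [[1, 1], [1, -1]], "B": [[1, -1], [1, 1]]}

receipt = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 0]]
print(predict_with_location(receipt, class_weights))
```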

Q&A
