CV

Hosted on GitHub Pages (last update: 05/01/2022)

Personal Info

Name

Ismail Harrando - إسماعيل هرندو

Residence

Toulouse, France

Links

	Google Scholar
	LinkedIn
	GitHub
	Twitter

Experience

R&D Engineer

Natural Language Processing

Linagora

May 2022 - Present

Large Language Models, fine-tuning, evaluation
Retrieval-Augmented Generation
NLP / Information Extraction
Technologies : Python, PyTorch, Transformers, spaCy, Pandas, etc.

Invited Researcher

Digital Humanities

DHLab @ Royal Netherlands Academy of Arts and Sciences

February 2022

Study of Olfactory Information Extraction from Historical Text

R&D Internship in Computer Vision

Studying Transfer Learning for Action Recognition

Atos Bull

February - September 2018

State of the Art & comparison of action recognition models
Studying Transfer Learning under labeled data scarcity
Visualization of learned features / Saliency maps for videos
Semi-supervised action detection and localization
Technologies : Python, TensorFlow, TensorBoard, OpenCV

R&D Internship in Timeseries Prediction and Log Analysis

Neural Predictive Maintenance for HPC systems

Atos Bull

April - September 2017

Timeseries analysis, prediction and classification using Deep Learning models (LSTM)
Log Analysis for failure prediction (using statistical and neural models)
Anomaly detection in metrics
Setting up a Big Data environment for metrics storage and processing
Technologies : Python, TensorFlow, Keras, Pandas, OpenTSDB, Grafana, Jupyter Notebook

Education

PhD in Data Science

Representation, information extraction, and summarization for automatic multimedia understanding

Sorbonne University / EURECOM Data Science Department

2018-2022

Semantic representation of Multimedia Content
Linguistic Information Extraction for Media Enrichment
Content-based Multimedia Recommendation
Multimodal Content Summarization
Awards and honors:
- MediaEval 2019 & 2021 Memorability Task Distinctive Mention
- MediaEval 2021 Fake News Detection Task Distinctive Mention
- TRECVID 2020 & 2021 Video Summarization Task - Best Scoring Participation

Master of Science in Informatics (MoSiG)

Artificial Intelligence and the Web Specialization

Grenoble INP Ensimag / UGA IM²AG

2017-2018

Doubly accredited Computer Science research program (Master's Website)
Relevant Courses : Category Learning and Object Recognition, Computer Vision, Natural Language Processing, Machine Learning Fundamentals, Advanced Machine Learning Algorithms, Knowledge Representation.

Master of Engineering (Ingénieur d'état) in Computer Science

Information Systems option

National School of Applied Science of Tangier

2012-2017

First in Class 2017
Google Student Ambassador (2014-2015 + 2015-2016)
Computer Science Club Advisor
Organizing Member for the Free Software Day (hosting Richard Stallman)
Relevant Courses : Image Processing / Computer Vision, Optimization, Operational Research, Graph Theory, Advanced AI, Data Mining and B.I., Algorithms and Complexity, Signal Processing, Calculus, Algebra

Skills

NLP . CV . DS	PyTorch, HuggingFace, Scikit-learn, TensorFlow, Keras, Numpy, Pandas, NLTK, Matplotlib, OpenCV
KG & SW	RDF/OWL, SPARQL, Graph embeddings, PyTorch-Geometric
Programming	Python, C++, C, Java, C#, Haskell, Prolog, Common Lisp
Web	HTML5, CSS3, PHP (Laravel), JS (jQuery, Angular), Node.js, Bootstrap, Socket.io
Database	Oracle (Optimization, Administration, Distribution), MySQL, MongoDB, Neo4j
Miscellaneous	UML, XML, Git, Linux (Bash), Networking, AWS, LaTeX

Languages	English (TOEFL - C2), French (TCF - C2), Arabic (Native), Italian (elementary), Spanish (elementary)

Research

Brief

I am interested in many fields of Data Science and Artificial Intelligence, especially ones pertaining to language and semantics.
The following is a list of publications and presentations/demos I've done throughout my PhD in these domains of interest.

Information Extraction / Text Analysis

My main interest is extracting high-level descriptors (document class, topics, named entities, etc.) from raw text.

ProZe: Explainable and Prompt-Guided Zero-Shot Text Classification

July 2022, IEEE Internet Computing, Special issue on knowledge-infused learning for computational social systems.

To improve the state of the art on zero-shot text classification, we combine the explanatory power of a common-sense knowledge graph with the world knowledge contained in pretrained language models by conditioning them on domain-specific prompts.

[ paper ]

Keywords :

Zero-Shot Common Sense ConceptNet Prompting Pretrained Language Models Text Classification

Detecting COVID-19-related conspiracy theories in tweets

December 2021, 12th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop (MediaEval'2021), Online.

To tackle the problem of detecting COVID-19-related conspiracy theories in tweets, we used different approaches, such as a combination of TF-IDF and machine learning, transformer-based neural networks, and Natural Language Inference.

[ code ]

Keywords :

Text Representation Deep Language Models Ensembling Text Classification

Explainable zero-shot topic extraction using a common-sense knowledge graph

September 2021, 3rd Conference on Language, Data and Knowledge (LDK'2021), Zaragoza, Spain.

Wanna build a text classifier without any training data that can also explain its predictions? ZeSTE may be what you're looking for!

[ paper | code ]

Keywords :

Zero-Shot Common Sense ConceptNet Text Classification

And cut! Exploring textual representations for media content segmentation and alignment

June 2021, 2nd International Workshop on Data-driven Personalisation of Television (DataTV @ IMX'2021), Online.

In this work, we present an approach to content segmentation that leverages topical coherence, language modeling and word embeddings to detect change of topics.

[ paper ]

Keywords :

Text Segmentation Topic Modeling Language Modeling Text Representation

Named Entity Recognition as Graph Classification

June 2021, 18th Extended Semantic Web Conference (ESWC'2021 - Poster Track), Online.

Injecting real-world information (typically contained in Knowledge Graphs) and hand-crafted features into a pipeline for training end-to-end Natural Language Processing models is an open challenge. In this paper, we propose to approach the task of Named Entity Recognition, which is traditionally viewed as a Sequence Tagging problem, as a Graph Classification problem.

[ paper | code ]

Keywords :

Graph Neural Network Knowledge Graph Named Entity Recognition Knowledge Injection

Topic Modeling

Discovering interpretable topics by leveraging common sense knowledge

September 2021, 3rd Conference on Language, Data and Knowledge (LDK'2021)

How to make the results of topic modeling algorithms more understandable to humans? Try to add some common sense into the process :)

[ paper | code ]

Keywords :

Topic Modeling Interpretability Human evaluation Common Sense

Apples to Apples: A Systematic Evaluation of Topic Models

September 2021, 13th Conference on Recent Advances in NLP (RANLP'2021), Online

Topic Modeling Evaluation is an open problem in the Topic Modeling community. While the reliance on automatic evaluation remains more or less necessary to quickly assess the performance of a given topic model algorithm, there is no study that attempts to evaluate several algorithms in the literature given the same preprocessing, datasets, and metrics. That's what we did!

[ paper ]

Keywords :

NLP Evaluation Topic Modeling Coherence Survey

ToModAPI: A Topic Modeling API to Train, Use and Compare Topic Models

November 2020, 2nd Workshop for NLP Open Source Software (NLP-OSS @ EMNLP'2020), Online.

This API is built to dynamically perform training, inference, and evaluation for different topic modeling techniques. It provides common interfaces and commands for accessing the different models, making it easier to compare them.

[ paper | code ]

Keywords :

Open Source Python Library API Topic Modeling Evaluation

Multimodal Content Analysis

Stories of Love and Violence: Zero-shot interesting events classification for unsupervised TV series summarization

January 2023, Multimedia Systems Journal (2023).

Keywords :

Zero-shot classification Plot summarization

Exploring Multimodality, Perplexity and Explainability for Memorability Prediction

December 2021, 12th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop (MediaEval'2021), Online

Keywords :

Multimodal Deep Learning Convolutional Neural Network Content Representation

Character-based TV Series Summaries using keyword classification

December 2021, International Workshop on Video Retrieval Evaluation (TRECVID'2021), Online

[ code ]

Keywords :

Zero-shot Learning Event-based classification Content Summarization

Predicting Media Memorability with Audio, Video, and Text representation

December 2020, 11th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop (MediaEval'2020), Online

[ paper | code ]

Keywords :

Multimodal Deep Learning Visio-Linguistic Transformer Content Representation

Using Fan-Made Content, Subtitles and Face Recognition for Character-Centric Video Summarization

November 2020, the International Workshop on Video Retrieval Evaluation (TRECVID'2020), Online

[ paper | code ]

Keywords :

Text Matching Content Summarization

Combining Textual and Visual Modeling for Predicting Media Memorability

October 2019, 10th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop (MediaEval'2019), Sophia Antipolis, France.

[ paper | code ]

Keywords :

Multimodal Deep Learning Convolutional Neural Network Content Representation

Semantic Multimedia Representation and Recommendation

Modeling And Using The H2020 MeMAD Knowledge Graph (talk)

June 2019, The EBU Metadata Developer Network Workshop (EBU-MDN), Geneva, Switzerland

In the context of the European research project MeMAD (Methods for Managing Audiovisual Data), we face the challenge of semantically modeling legacy audiovisual metadata and the results of automatic analysis from multiple partners in an interoperable manner. In this talk, we will present an implementation of the EBU-CCDM/EBU Core data model for representing production and broadcasting information of TV and radio programs provided by two industrial partners covering several channels. The resulting MeMAD knowledge graph provides metadata for more than 60K hours of audiovisual content, spanning multiple channels, audiovisual genres, themes and languages.

Keywords :

Multimedia Semantic Modeling Knowledge Graph Ontology

Improving Media Content Recommendation with Automatic Annotations

September 2021, the 3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS @ RecSys'2021), Amsterdam, Netherlands

In this work, we study the potential of using off-the-shelf automatic annotation tools from the Information Extraction literature to improve recommendation performance without any extra cost of training, data collection or annotation.

[ paper | code ]

Keywords :

Content-based Recommender Systems Multimedia Semantic Modeling Knowledge Graph Graph Embeddings Information Extraction

Projects

Coming soon!

Favorites

I love consuming media, and I love talking about it. I also love making lists. These are a few of my favorite things.
The lists here are neither exhaustive (I had to limit my picks to one per artist/franchise) nor representative of the "best" in their respective media. An item in any of these lists reflects an aesthetic, emotional or conceptual appreciation for the media (and sometimes just good ol' nostalgia). Peruse at your leisure! (And if you have any recommendations based on what you see, please let me know :))

Movies

This was the hardest list to narrow down, as I watch quite a lot of movies.
I still would like to watch more non-English films.

Synecdoche, New York

Amarcord

Doubt

The End of the Tour

Fargo

5 Centimeters per Second

Her

Monty Python and the Holy Grail

Incendies

Indie Game

Inside

The Lego Movie

Magnolia

The Matrix

Memories of Murder

Moonlight

The Wizard of Oz

The Shawshank Redemption

The Sixth Sense

The Social Network

Three Billboards Outside Ebbing, Missouri

3 Idiots

Tokyo Story

My Neighbor Totoro

Trainspotting

V for Vendetta

50/50

After Life (Wonderful Life)

About Elly

Tár

Dune: Part II

Before Sunrise

Big Fish

Cat Soup

Chicago

The French Dispatch

Intouchables

The Secret in Their Eyes

The Princess Bride

Aftersun

Everything Everywhere All At Once

Shame

Anime

I used to watch anime and read manga religiously, less so nowadays (it doesn't help that 80% of new productions look and feel the same), but the occasional gem does pop up, and here's my collection :)

The Tatami Galaxy

Made in Abyss

Death Note

Fullmetal Alchemist: Brotherhood

Hunter x Hunter

Mahoujin Guru Guru (1994)

Mushishi

Odd Taxi

Psycho-Pass

Dai Mahou Touge

Rainbow

Serial Experiments Lain

Steins;Gate

Ranking of Kings

Yojouhan Time Machine Blues

Video Games

As much as I believe that video games have the potential to be the best entertainment medium, I don't play as much as I used to :/
PS: Half the games here are added for nostalgia value. I don't know if new games are just not as charming as their predecessors, or we just get jaded and dull as we grow older :[

Final Fantasy: Tactics

Peak everything.

Inscryption

Age of Empires II

Baba is You

Adventure Quest

I'm glad I didn't have a credit card when I was into this game..

Zelda: Breath of the Wild

Croc 2

Dungeon Dice Monsters

Duel Masters: Shadow of the Code

Dofus

Dragon Quest Monsters 2

Duck Tales 2

Ehrgeiz

(only the hidden Quest Mode)

Fez

Digimon Digital Card Battle

Guild Wars 2

Pokémon Platinum

Hercules

Hex: Shards of Destiny

Very well-designed TCG, such a shame it's discontinued..

Yu-Gi-Oh! Joey The Passion

Monster Sanctuary

Monument Valley (1&2)

Painfully pretty.

Ni No Kuni

The best of JRPGs and Monster collection genres.

Super Mario Odyssey

Probably the best entry in the Super Mario ludography.

Portal

"This was a triumph"

Slay the Spire

Easiest way to lose 5 hours without noticing..

The Stanley Parable

Wild concept meets perfect execution

Thomas was alone

All the feelz packaged in little monochromatic quadrilaterals

World of Warcraft

The OG AR experience

Zelda: Oracle of Seasons

First game I finished, still one of the best.

Pokémon Legends: Arceus

Monster Train

Ori and the Blind Forest

Series

I don't have the attention span to watch a series in general, so I usually stick to short series or the ones that are so good you have to binge them.

Community

Black Mirror

We don't talk about the season 5 finale..

Fleabag

Hannibal

The Bear

The IT Crowd

Ramy

Schitt's Creek

The Office

WestWorld

Books

"Reading more books" seems to be the only New Year's resolution that I stick to. Here are a few of the ones I enjoyed, mostly in the last few years. If for any reason you'd like to see what I'm reading, add me on Goodreads :)

Gödel, Escher, Bach

Piranesi

1984

The Master and Margarita

The Ones Who Walk Away from Omelas

Pale Fire

Refuse to Choose

Narcissus and Goldmund

رجال في الشمس

Destiny Disrupted

Flowers for Algernon

Alice in Wonderland

Collected Fictions

Capitalist Realism

A Clockwork Orange

Crime And Punishment

The Denial of Death

Les Désorientés

A Confederacy of Dunces

Extension du domaine de la lutte

Galileo

Gilead

The Hitchhiker's Guide to the Galaxy

Life 3.0

Lord of the Flies

Of Mice and Men

Le Petit Prince

الريح الشتوية

The Name of the Rose

Catch-22

Slaughterhouse Five

Thinking Fast and Slow

The Basis of Morality

Authentic Happiness

The Three-Body Problem

The Stranger

مذكرات الأرقش

Four Thousand Weeks: Time Management for Mortals

Stories of your Life and Others

The Fire Next Time

موت صغير

Albums

I listen to music a lot, but I've only started listening to full albums recently, which is why most picks here are from this century :)
I've also limited myself to one album per artist, but I enjoy the discography of every artist included here (and similar artists).
I am also very open to recommendations!