CV

Hosted on GitHub Pages (last update: 05/01/2022)




Personal Info

Name Ismail Harrando - إسماعيل هرندو
Residence Toulouse, France
Links
Google Scholar
LinkedIn
GitHub
Twitter

Experience

R&D Engineer
Natural Language Processing
Linagora
May 2022 - Present
  • Large Language Models, fine-tuning, evaluation
  • Retrieval-Augmented Generation
  • NLP / Information Extraction
  • Technologies : Python, PyTorch, Transformers, spaCy, Pandas, etc.
Invited Researcher
Digital Humanities
DHLab @ Royal Netherlands Academy of Arts and Sciences
February 2022
  • Study of Olfactory Information Extraction from Historical Text
R&D Internship in Computer Vision
Studying Transfer Learning for Action Recognition
Atos Bull
February - September 2018
  • State of the Art & comparison of action recognition models
  • Studying Transfer Learning under labeled data scarcity
  • Visualization of learned features / Saliency maps for videos
  • Semi-supervised action detection and localization
  • Technologies : Python, TensorFlow, TensorBoard, OpenCV
R&D Internship in Timeseries Prediction and Log Analysis
Neural Predictive Maintenance for HPC systems
Atos Bull
April - September 2017
  • Timeseries analysis, prediction and classification using Deep Learning models (LSTM)
  • Log Analysis for failure prediction (using statistical and neural models)
  • Anomaly detection in metrics
  • Setting up a Big Data environment for metrics storage and processing
  • Technologies : Python, TensorFlow, Keras, Pandas, OpenTSDB, Grafana, Jupyter Notebook

Education

PhD in Data Science
Representation, information extraction, and summarization for automatic multimedia understanding
Sorbonne University / EURECOM Data Science Department
2018-2022
  • Semantic representation of Multimedia Content
  • Linguistic Information Extraction for Media Enrichment
  • Content-based Multimedia Recommendation
  • Multimodal Content Summarization
  • Awards and honors:
    • MediaEval 2019 & 2021 Memorability Task Distinctive Mention
    • MediaEval 2021 Fake News Detection Task Distinctive Mention
    • TRECVID 2020 & 2021 Video Summarization Task - Best Scoring Participation
Master of Science in Informatics (MoSiG)
Artificial Intelligence and the Web Specialization
Grenoble INP Ensimag / UGA IM²AG
2017-2018
  • Doubly accredited Computer Science research program (Master's Website)
  • Relevant Courses : Category Learning and Object Recognition, Computer Vision, Natural Language Processing, Machine Learning Fundamentals, Advanced Machine Learning Algorithms, Knowledge Representation.
Master of Engineering (Ingénieur d'état) in Computer Science
Information Systems option
National School of Applied Science of Tangier
2012-2017
  • First in Class 2017
  • Google Student Ambassador (2014-2015 + 2015-2016)
  • Computer Science Club Advisor
  • Organizing Member for the Free Software Day (hosting Richard Stallman)
  • Relevant Courses : Image Processing / Computer Vision, Optimization, Operational Research, Graph Theory, Advanced AI, Data Mining and B.I., Algorithms and Complexity, Signal Processing, Calculus, Algebra

Skills

NLP . CV . DS PyTorch, HuggingFace, Scikit-learn, TensorFlow, Keras, NumPy, Pandas, NLTK, Matplotlib, OpenCV
KG & SW RDF/OWL, SPARQL, Graph embeddings, PyTorch-Geometric
Programming Python, C++, C, Java, C#, Haskell, Prolog, Common Lisp
Web HTML5, CSS3, PHP (Laravel), JS (jQuery, Angular), Node.js, Bootstrap, Socket.io
Database Oracle (Optimization, Administration, Distribution), MySQL, MongoDB, Neo4j
Miscellaneous UML, XML, Git, Linux (Bash), Networking, AWS, LaTeX

Languages English (TOEFL - C2), French (TCF - C2), Arabic (Native), Italian (elementary), Spanish (elementary)

Research

Brief

I am interested in many fields of Data Science and Artificial Intelligence, especially ones pertaining to language and semantics.
The following is a list of publications and presentations/demos I've done throughout my PhD in these domains of interest.

Information Extraction / Text Analysis

My main interest is extracting high-level descriptors (document class, topics, named entities, etc.) from raw text.

ProZe: Explainable and Prompt-Guided Zero-Shot Text Classification

July 2022, IEEE Internet Computing, Special issue on knowledge-infused learning for computational social systems.

To improve the state of the art on zero-shot text classification, we combine the explanatory power of a common-sense knowledge graph with the world knowledge contained in pretrained language models by conditioning them on domain-specific prompts.

[ paper ]

Detecting COVID-19-related conspiracy theories in tweets

December 2021, 12th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop (MediaEval'2021), Online.

To tackle the problem of detecting COVID-19-related conspiracy theories in tweets, we used different approaches, such as a combination of TF-IDF and machine learning, transformer-based neural networks, and Natural Language Inference.

[ code ]

Explainable zero-shot topic extraction using a common-sense knowledge graph

September 2021, 3rd Conference on Language, Data and Knowledge (LDK'2021), Zaragoza, Spain.

Wanna build a text classifier without any training data that can also explain its predictions? ZeSTE may be what you're looking for!

[ paper | code ]

And cut! Exploring textual representations for media content segmentation and alignment

June 2021, 2nd International Workshop on Data-driven Personalisation of Television (DataTV @ IMX'2021), Online.

In this work, we present an approach to content segmentation that leverages topical coherence, language modeling and word embeddings to detect change of topics.

[ paper ]

Named Entity Recognition as Graph Classification

June 2021, 18th Extended Semantic Web Conference (ESWC'2021 - Poster Track), Online.

Injecting real-world information (typically contained in Knowledge Graphs) and hand-crafted features into a pipeline for training end-to-end Natural Language Processing models is an open challenge. In this paper, we propose to approach the task of Named Entity Recognition, which is traditionally viewed as a Sequence Tagging problem, as a Graph Classification problem.

[ paper | code ]

Topic Modeling

Discovering interpretable topics by leveraging common sense knowledge

September 2021, 3rd Conference on Language, Data and Knowledge (LDK'2021)

How can we make the results of topic modeling algorithms more understandable to humans? Try adding some common sense into the process :)

[ paper | code ]

Apples to Apples: A Systematic Evaluation of Topic Models

September 2021, 13th Conference on Recent Advances in NLP (RANLP'2021), Online

Topic Modeling Evaluation is an open problem in the Topic Modeling community. While the reliance on automatic evaluation remains more or less necessary to quickly assess the performance of a given topic model algorithm, there is no study that attempts to evaluate several algorithms in the literature given the same preprocessing, datasets, and metrics. That's what we did!

[ paper ]

ToModAPI: A Topic Modeling API to Train, Use and Compare Topic Models

November 2020, 2nd Workshop for NLP Open Source Software (NLP-OSS @ EMNLP'2020), Online.

This API is built to dynamically perform training, inference, and evaluation for different topic modeling techniques. The API provides common interfaces and commands for accessing the different models, making it easier to compare them.

[ paper | code ]

Multimodal Content Analysis

Stories of Love and Violence: Zero-shot interesting events classification for unsupervised TV series summarization

January 2023, Multimedia Systems Journal (2023).

Exploring Multimodality, Perplexity and Explainability for Memorability Prediction

December 2021, 12th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop (MediaEval'2021), Online

Character-based TV Series Summaries using keyword classification

December 2021, International Workshop on Video Retrieval Evaluation (TRECVID'2021), Online

[ code ]

Predicting Media Memorability with Audio, Video, and Text representation

December 2020, 11th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop (MediaEval'2020), Online

[ paper | code ]

Using Fan-Made Content, Subtitles and Face Recognition for Character-Centric Video Summarization

November 2020, the International Workshop on Video Retrieval Evaluation (TRECVID'2020), Online

[ paper | code ]

Combining Textual and Visual Modeling for Predicting Media Memorability

October 2019, 10th MediaEval Benchmarking Initiative for Multimedia Evaluation Workshop (MediaEval'2019), Sophia Antipolis, France.

[ paper | code ]

Semantic Multimedia Representation and Recommendation

Modeling And Using The H2020 MeMAD Knowledge Graph (talk)

June 2019, The EBU Metadata Developer Network Workshop (EBU-MDN), Geneva, Switzerland

In the context of the European research project MeMAD (Methods for Managing Audiovisual Data), we face the challenge of semantically modeling legacy audiovisual metadata and the results of automatic analysis from multiple partners in an interoperable manner. In this talk, we will present an implementation of the EBU-CCDM/EBU Core data model for representing production and broadcasting information of TV and Radio programs provided by two industrial partners covering several channels. The resulting MeMAD knowledge graph provides metadata for more than 60K hours of audiovisual content, spanning multiple channels, audiovisual genres, themes and languages.

Improving Media Content Recommendation with Automatic Annotations

September 2021, the 3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS @ RecSys'2021), Amsterdam, Netherlands

In this work, we study the potential of using off-the-shelf automatic annotation tools from the Information Extraction literature to improve recommendation performance without any extra cost of training, data collection or annotation.

[ paper | code ]

Projects

Coming soon!

Favorites

I love consuming media, and I love talking about it. I also love making lists. These are a few of my favorite things.
The lists here are neither exhaustive (I had to limit my picks to one per artist/franchise) nor representative of the "best" in their respective media. An item in any of these lists reflects either an aesthetic, emotional or conceptual appreciation for the media (and sometimes just good ol' nostalgia). Peruse at your leisure! (and if you have any recommendations based on what you see, please let me know :))

Video Games

As much as I believe that video games have the potential to be the best entertainment medium, I don't play as much as I used to :/
PS: Half the games here are added for nostalgia value. I don't know if new games are just not as charming as their predecessors, or if we just get jaded and dull as we grow older :[

Series

I don't have the attention span to watch a series in general, so I usually stick to short series or the ones that are so good you have to binge them.

Contact