Erika Fille T. Legara is a scientist, educator, and advisor on data science and artificial intelligence (AI), data and AI strategy and governance, infrastructure, and education. Erika also sits on the Board of Directors of RCBC (RCB: Philippines). As a scientist, she is interested in the study of complex systems using advanced data-driven analytics. Prior to joining the Asian Institute of Management (AIM) in 2017, Erika was a scientist at A*STAR, Singapore, where she worked closely with government institutions and industry on various R&D initiatives. In 2020, the TOYM and TOWNS awardee received the National Academy of Science and Technology Outstanding Young Scientist award. She is the founding director of AIM’s MSc in Data Science program, where she holds an associate professor position. Legara is also a senior scientist at the Analytics, Computing, and Complex Systems lab at AIM. She is an Asia 21 Young Leader (Class of 2022).
Smart Policy Design, 2021
Harvard Kennedy School of Government
PhD in Physics, 2011
University of the Philippines
MSc in Physics, 2008
University of the Philippines
BSc in Physics, 2006
University of the Philippines
One of the main problems in the study of human migration is predicting how many people will migrate from one place to another. An important model for this problem is the radiation model of human migration, which treats locations as attractors whose attractiveness is moderated by distance as well as by the attractiveness of neighboring locations. In the model, attractiveness is measured by population, which serves as a proxy for economic opportunities and jobs. However, this proxy may not be valid, for example, in developing countries, and it fails to account for people who migrate for non-economic reasons such as quality of life. Here, we extend the radiation model to include the number of amenities (offices, schools, leisure places, etc.) as features in addition to population. We find that the generalized radiation model outperforms the radiation model, with as much as a 10.3 percent relative improvement in mean absolute percentage error, based on actual census data collected five years apart. The best-performing model does not include population information at all, which suggests that amenities already capture the information that population provides. The generalized radiation model also provides a measure of feature importance, presenting another avenue for investigating the effect of amenities on human migration.
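For readers new to the baseline, the standard radiation model (Simini et al., 2012) predicts the flow from origin i to destination j as T_ij = T_i m_i n_j / ((m_i + s_ij)(m_i + n_j + s_ij)), where m_i and n_j are the masses (populations) of origin and destination, T_i is the total outflow from i, and s_ij is the total mass within distance r_ij of i, excluding i and j. The sketch below implements that baseline together with the mean absolute percentage error (MAPE). It is a minimal illustration, not the authors' generalized model: `mass` can be a population vector or, as a hypothetical stand-in, an amenity-based attractiveness score; how the study actually combines amenity features is not specified here, and the coordinates and numbers are made up.

```python
import math


def radiation_flows(coords, mass, outflow):
    """Baseline radiation model: T[i][j] = T_i m_i n_j / ((m_i + s_ij)(m_i + n_j + s_ij)).

    coords  -- list of (x, y) positions, one per location
    mass    -- attractiveness of each location (population in the original model;
               an amenity-based score would be a hypothetical substitute)
    outflow -- T_i, the total number of people leaving each location
    """
    n = len(mass)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_ij = math.dist(coords[i], coords[j])
            # s_ij: total mass within radius r_ij of i, excluding i and j themselves
            s_ij = sum(
                mass[k]
                for k in range(n)
                if k not in (i, j) and math.dist(coords[i], coords[k]) <= r_ij
            )
            m_i, n_j = mass[i], mass[j]
            flows[i][j] = outflow[i] * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij))
    return flows


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Three hypothetical locations on a line.
coords = [(0, 0), (1, 0), (2, 0)]
mass = [10, 20, 30]
outflow = [100, 100, 100]
T = radiation_flows(coords, mass, outflow)
print(round(T[0][1], 2), round(T[0][2], 2))  # 66.67 16.67
```

Note how the farther location receives less flow despite its larger mass, because location 1 sits inside its radius and "absorbs" would-be migrants. A relative improvement in MAPE is conventionally computed as (MAPE_baseline - MAPE_new) / MAPE_baseline.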
We develop a numerical model using both artificial and empirical inputs to analyze taxi dynamics in an urban setting. More specifically, we quantify how the supply of and demand for taxi services, the underlying road network, and the public acceptance of taxi ridesharing (TRS) affect the optimal number of taxis for a particular city, as well as commuters' average waiting time and trip time. Results reveal certain universal features of taxi dynamics with real-time taxi booking: there is a well-defined transition between an oversaturated phase, in which demand exceeds supply, and an undersaturated phase, in which supply exceeds demand. The boundary between the two phases gives the optimal number of taxis a city should accommodate, given its specific demand, road network, and commuter habits. Adding or removing taxis can affect commuter experience very differently in the two phases. In the oversaturated phase, the average waiting time responds exponentially to such changes, while in the undersaturated phase it responds sub-linearly. We analyze various factors that can shift the phase boundary and show that increased acceptance of TRS universally shifts the boundary by reducing the number of taxis needed. We discuss insights into the benefits and costs of TRS, especially how, under certain conditions, TRS not only brings economic benefits to commuters but can also reduce the overall travel time of the shared parties by significantly cutting the time commuters spend waiting for taxis. Our simulations also suggest that simple artificial taxi systems can capture most of the universal features of taxi dynamics. The relevance of the assumptions and of the overall methodology is also illustrated using the empirical road network and taxi demand in Singapore.
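The existence of a supply/demand boundary can be seen even in a deliberately stripped-down queue. The sketch below is not the paper's numerical model: it assumes no road network and no ridesharing, deterministic arrivals of `requests_per_step` bookings per time step, and a fixed trip length of `trip_steps` (all hypothetical parameters). Each booking is dispatched, first come first served, to the taxi that becomes free earliest. In this toy, supply matches demand when the fleet size equals requests_per_step × trip_steps; below that, the backlog keeps growing and riders wait, while at or above it, waiting drops to zero. It makes no attempt to reproduce the exponential and sub-linear responses reported above.

```python
import heapq


def average_wait(n_taxis, trip_steps, requests_per_step, total_steps):
    """Mean waiting time (in steps) in a toy first-come-first-served taxi queue.

    Hypothetical simplification: every trip lasts exactly `trip_steps`, and
    exactly `requests_per_step` bookings arrive at each time step.
    """
    free_at = [0] * n_taxis  # min-heap of the times at which each taxi is next free
    heapq.heapify(free_at)
    waits = []
    for t in range(total_steps):
        for _ in range(requests_per_step):
            earliest = heapq.heappop(free_at)  # taxi that frees up soonest
            pickup = max(t, earliest)  # rider waits if no taxi is free yet
            waits.append(pickup - t)
            heapq.heappush(free_at, pickup + trip_steps)
    return sum(waits) / len(waits)


# Demand keeps 1 * 5 = 5 taxis busy on average; scan fleet sizes around that boundary.
for fleet in (2, 4, 5, 8):
    print(fleet, average_wait(fleet, trip_steps=5, requests_per_step=1, total_steps=200))
```

Sweeping the fleet size this way locates the boundary in the toy; in the study, richer ingredients such as the road network, empirical demand, and TRS acceptance determine where the boundary actually falls.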
Here, we demonstrate a procedure that uses an ensemble of machine learning models to classify passengers (Adult, Child/Student, and Senior Citizen) based on their three-month travel patterns. The method constructs distinct commuter matrices, which we refer to as eigentravel matrices, that capture each commuter’s characteristic travel routine. Comparing various classification models, we show that the gradient boosting method gives the best prediction, with 76% accuracy, 81% better than the minimum model accuracy (42%) computed using the proportional chance criterion.
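The 42% baseline is the proportional chance criterion, C_pro = Σ p_k², where p_k is the share of passengers in class k; it is the accuracy expected from guessing labels at random in proportion to class frequencies. The sketch below computes it and the relative improvement of a model over it. The class counts in the example are hypothetical, since the actual Adult, Child/Student, and Senior Citizen proportions behind the 42% figure are not given here; only the 76% and 42% values come from the text.

```python
def proportional_chance(class_counts):
    """Proportional chance criterion: sum of squared class proportions."""
    total = sum(class_counts)
    return sum((c / total) ** 2 for c in class_counts)


def relative_improvement(model_accuracy, baseline_accuracy):
    """Fractional improvement of a model's accuracy over a chance baseline."""
    return (model_accuracy - baseline_accuracy) / baseline_accuracy


# Hypothetical class counts for Adult, Child/Student, Senior Citizen.
print(round(proportional_chance([50, 30, 20]), 2))  # 0.38
# Figures reported above: 76% model accuracy against a 42% baseline.
print(round(relative_improvement(0.76, 0.42), 2))  # 0.81
```

The second line reproduces the "81% better" figure: (0.76 - 0.42) / 0.42 ≈ 0.81.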
In this work, we focus on the complex relationship between land use and transport, offering an innovative approach to the problem by using land-use features at two levels of granularity (the more general land-use sector types and the more granular amenity structures) to evaluate their impact on public transit ridership in both time and space. To quantify the interdependencies, we explore three machine learning models and show that the decision tree model performs best overall, combining good predictive accuracy, generality, computational efficiency, and “interpretability”.
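Part of the decision tree's appeal is that its fitted rules can be read directly as thresholds on the input features. The sketch below is a from-scratch regression tree (greedy splits that minimize the summed squared error of the two children), not the study's model or any library's implementation. The features (hypothetical counts of offices and schools near a station) and the ridership values are made up for illustration.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Return (feature, threshold) minimising the children's total squared error, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return None if best is None else best[1:]


def build_tree(X, y, max_depth=2):
    """Grow a regression tree; each leaf predicts the mean target of its samples."""
    split = best_split(X, y) if max_depth > 0 and len(y) > 1 else None
    if split is None:
        return {"leaf": sum(y) / len(y)}
    f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }


def predict(tree, row):
    """Follow the threshold tests from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical data: [offices nearby, schools nearby] -> daily boardings.
X = [[1, 0], [2, 1], [3, 0], [10, 1], [12, 0], [11, 2]]
y = [100, 110, 105, 500, 520, 510]
tree = build_tree(X, y, max_depth=1)
print(tree["feature"], tree["threshold"])  # 0 6.5, i.e. the rule "offices <= 6.5"
print(predict(tree, [2, 0]), predict(tree, [11, 1]))  # 105.0 510.0
```

The single learned rule ("stations with at most 6.5 nearby offices get about 105 boardings") is the kind of human-readable output that makes trees attractive when interpretability matters.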
I am currently the co-project lead of a smart city project funded under the DOST-PCIEERD Industry, Energy and Emerging Technology …
My colleague Chris Monterola and I have been tapped by the Philippines Government through the Department of Trade and Industry (DTI) …
We have an ongoing project with Manila Water Corp., funded through the DOST-PCIEERD CRADLE grant. The title of the project is …
Aside from teaching at AIM, I also supervise students in R&D especially when they engage industry/government stakeholders as part …
In the past few years, I have been involved in various industry and government projects. Some of them are listed here.
To support and promote women and gender minorities in ML and DS
Event that brings together experts to share short data stories with the public over free beers.
For International Women’s Day 2020, we’re getting to know the pioneering women across East Asia Pacific who are breaking barriers and …
Data Scientists Needed in Every Industry, Experts Say
Introducing data science as ‘future’ of policy making in PH
This Pinay Is Leading Data Science Education In The Philippines
AIM’s big data bid to regain business school leadership
Filipina Physicist Back from SG to Head AIM’s Data Science Program
Machine-learning program predicts public transport use in Singapore
This course introduces participants to the latest trends in analytics in the era of big data, artificial intelligence, and the Internet of Things. The course explores various data-driven approaches, frameworks, and models used by different industries across functions to improve processes and/or create new and innovative products. In particular, participants will familiarize themselves with the different levels of analytics (descriptive, predictive, and prescriptive) and will be tasked with identifying use cases where these approaches can be applied.
Complex systems are systems composed of heterogeneous, highly interacting agents whose interactions result in emergent behavior; examples include societies, economies, markets, cities, and biological systems such as the immune system and the brain. In this class, students will be exposed to various tools used in characterizing and modeling complex systems. The topics include dynamical systems, chaos, fractals, self-organization, cellular automata modeling, agent-based modeling, and complex networks.
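As a small taste of how simple local rules can produce emergent structure, here is a sketch of an elementary (one-dimensional, two-state) cellular automaton using Wolfram's rule numbering: each cell's next state is the bit of the rule number selected by its (left, self, right) neighborhood read as a 3-bit number. The choice of Rule 90, the grid width, and the periodic boundaries are illustrative assumptions; the course's own exercises may differ.

```python
def step(cells, rule):
    """Apply an elementary CA rule once, with periodic (wrap-around) boundaries."""
    n = len(cells)
    # Neighborhood (left, self, right) read as a 3-bit number selects a bit of `rule`.
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def run(rule, width, steps):
    """Start from a single live cell in the middle and record every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


for row in run(rule=90, width=31, steps=15):
    print("".join("#" if c else "." for c in row))
```

Rule 90 sets each cell to the XOR of its two neighbors, and from a single live cell it draws a Sierpinski-triangle fractal, tying together two of the listed topics (cellular automata and fractals). Changing `rule` to 30 gives a pattern that looks chaotic despite an equally simple rule.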
The module covers the basics of Complexity Science, with particular focus on Complex Networks (network science), which are the backbones of complex systems (e.g., cities, organizations, economies, and financial markets). Complex networks quantify the interactions among the various entities/players in complex systems. Examples of complex networks include social networks such as those generated from Twitter, Facebook, and Instagram, as well as financial networks, biological networks, and organizational networks. Students learn how to visualize, analyze, and model complex networks using Python, NetworkX, and Gephi. At the end of the course, students should be able to view and analyze problems in business and marketing, among others, through the lens of complexity science. They should also be able to argue, in both descriptive and quantitative terms, why system-of-systems thinking is necessary to address most real-world issues.
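To make "quantify the interactions" concrete, here is a minimal, dependency-free sketch of two basic network measurements: normalized degree centrality (degree divided by n - 1, the same normalization NetworkX's `nx.degree_centrality` uses) and breadth-first shortest-path distances. The five-node network is hypothetical; in the course itself, students would use NetworkX on real edge lists.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree divided by (n - 1), the maximum possible degree in a simple graph."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def shortest_path_lengths(adj, source):
    """Hop distances from `source` to every reachable node via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Hypothetical five-person social network.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
adj = build_adjacency(edges)
print(degree_centrality(adj))
print(shortest_path_lengths(adj, "E"))
```

Here A is the most central node (centrality 0.75), and E, on the periphery, is three hops from B and C; the same questions scale directly to networks with millions of nodes once loaded into NetworkX.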
In this course, students learn data science fundamentals as they apply to business; essentially, how the field is applied in the real world. Students are provided with a comprehensive overview of data science and artificial intelligence: what they are and what they’re not. Students are also exposed to the current state of data science and its future direction(s). Data science practitioners share their experiences with the class, from how companies develop a data strategy on the way to becoming truly data-driven organizations, to building data science teams, to the challenges companies have faced and are currently facing. Participants learn about data workflows and pipelines, and they learn to appreciate how to assemble and lead data science enterprises. Finally, the course also covers the fundamentals of data privacy and data/AI ethics.
In this course, students will learn to appreciate the importance of successful data visualizations and intelligible stories in communicating insights. Using real-world datasets, learners will gain the skills needed to fashion effective vizzes that exhibit not only good design elements but also layers of information that, when woven together as a narrative, can drive stakeholders to take action. Storytelling will be emphasized across the sessions. On the technical side, students will also widen their visualization vocabulary. In addition, they will be introduced to the different viz tools available, including Tableau, QGIS, and Gephi (a network visualization tool). They will also, of course, learn how to create visualizations in Python with pandas, networkx, geopandas, matplotlib, and plotly, among others.