Textual Analytics in Social Media

Beliefs drive markets

Uncovering individual beliefs that lead to new market insights and sustainable investment outcomes.

With the growing accessibility of textual data in recent years, the potential to develop structural models that incorporate beliefs and micro-decisions has emerged. Through our team’s extensive analysis of thousands of news and social media outlets, we produce actionable textual insights and analytics. We monitor the exposure of content (reads, comments, likes) and its impact on audiences (speed of spread, sentiment tones, emotions). Additionally, we transform unstructured text from thousands of news and social media sources into structured indicators, enabling deeper insights based on a variety of principles. Our passion lies in generating sustainable research outputs, focusing on extracting and leveraging micro-belief statements, which distill key sentiments and trends from the vast sea of data.
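
As a rough illustration of turning unstructured posts into a structured indicator, the sketch below computes an exposure-weighted daily tone. It is only a minimal sketch under stated assumptions: the `Post` fields, the tiny word lists, and the weighting by reads, comments, and likes are hypothetical choices for illustration, not a description of our production pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    """Hypothetical record for one post and its exposure counts."""
    day: date
    text: str
    reads: int
    comments: int
    likes: int


# Toy lexicon, assumed for illustration only.
POSITIVE = {"bullish", "buy", "rally"}
NEGATIVE = {"bearish", "sell", "crash"}


def tone(text: str) -> int:
    """Count positive minus negative words (whitespace tokens, lowercased)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def daily_indicator(posts):
    """Return {day: exposure-weighted average tone} for days with exposure."""
    weighted = defaultdict(float)
    exposure = defaultdict(float)
    for p in posts:
        weight = p.reads + p.comments + p.likes
        weighted[p.day] += weight * tone(p.text)
        exposure[p.day] += weight
    return {d: weighted[d] / exposure[d] for d in exposure if exposure[d] > 0}
```

A real indicator would replace the word lists with a trained classifier, but the shape of the output (one number per day, weighted by how widely each post was seen) is the point of the sketch.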

Vision

Our Vision

Beliefs are central to asset pricing. Nearly all asset pricing models operate on the assumption that investors determine asset prices based on their beliefs regarding future payoffs. Someone unfamiliar with the field might assume that a significant portion of asset pricing research focuses on understanding how investors form these beliefs. However, this hasn’t been the case. The majority of theoretical and empirical studies in asset pricing aim to reverse-engineer these beliefs, adapting them within intricate models to align with observed asset prices.

One might naturally question if we could achieve further progress by uncovering belief dynamics both theoretically and empirically. Models of investor belief dynamics should address the sources of information that investors utilize. It’s equally important to understand how investors process and digest this information before making investment decisions. Belief Analytics, at least in part, pursues these developments.

Belief measurement

Recent years have seen an improvement in the availability of textual data, but for beliefs data to become a standard component of asset-pricing studies, further advancements in the collection and categorization of belief statements are required.

Acquisition of large amounts of data

Preparing a single plate of fried rice is easy; producing 100 million plates is another challenge. To tackle this scale, our project focuses on creating a robust infrastructure capable of processing vast volumes of specific texts from the Internet. We emphasize crafting automated programs that are fault-tolerant and excel at storing and retrieving data efficiently.
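
One common building block for fault-tolerant collection is retrying a failed download with exponential backoff. The sketch below shows the idea only; the function name `fetch_with_retry`, its parameters, and the choice of which errors count as temporary are assumptions made for illustration, and the actual `fetch` callable is supplied by the caller.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=1.0,
                     retry_on=(ConnectionError, TimeoutError),
                     sleep=time.sleep):
    """Call fetch(url), retrying temporary failures with exponential backoff.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. The last failure is re-raised once max_attempts is
    reached. Errors not listed in retry_on propagate immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; in production the default `time.sleep` would be used.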

The theory of communication

We are constantly exposed to news, opinions, and emotions through social interaction, even when they are outdated. This exposure also subjects us to the spread of disinformation and noise. As a result, the event itself may not be what matters; the “story” that was coded, created, embellished, and distorted is. Using collected belief statements, we build a “big picture” of how financial news and opinions spread and how that spread differs with the characteristics of the posts.

Modeling belief formation

In classical models, investors are assumed to be rational, taking into account all available historical data when learning about relevant stochastic processes for pricing. But when mapping these models into the real world, it is not clear what “all available” means. We aim to close the gap between the simplistic environment in asset-pricing models and the complex inputs investors face in the real world. Moreover, there are reasons to expect that memory and perception of social posts are biased. It is crucial to conduct empirical and theoretical research to better understand how investors select and process information conveyed in vast amounts of media posts.

So far, by the numbers

Texts from 1.1 million videos

- Social media video platform (TikTok China)

Texts from 274.2 million posts

- Chinese Stock Forum

Texts from 9.9 million posts

- Chinese Mutual Fund Forum

Texts from 8.6 million posts

- Chinese Futures Forum


Releasing the power of text by fine-tuning vertical LLMs

Social media content serves as a lens into the public’s perceptions of specific subjects or incidents. Nevertheless, the extraction of valuable insights from such expansive quantities of text data is far beyond the capabilities of even the most experienced human analysts. This is where Large Language Models (LLMs) come to the rescue. Conceptually, an LLM functions as an advanced analytical instrument for processing natural language. Analogous to a parrot echoing phrases within its environment, an LLM emulates human language patterns. A critical distinction, however, is that LLMs are trained on extensive datasets, enabling them to produce coherent and contextually appropriate results when analyzing financial reports, tracking market trends, and generating economic forecasts.

Belief Analytics harnesses its immense collection of textual data to forge state-of-the-art, domain-specialized LLMs tailored for both academic research and industrial applications. Our fine-tuned vertical models are strategically focused on achieving unparalleled performance in specific usages, albeit with a trade-off in the model’s overall generative capabilities.

We believe we can make sustainable contributions via the following three key dimensions:

Data-centric AI

Data-centric AI emphasizes the critical role of data quality, organization, and integrity, contrasting with the traditional emphasis on the complexity of models or algorithms. This paradigm suggests that employing high-quality data can diminish the need for complex model architectures and large-scale datasets, thereby significantly lessening the computational load during fine-tuning.

At Belief Analytics, we concentrate on achieving excellence in our training corpus. This is accomplished through diligent curation, accurate labeling, and comprehensive preprocessing of our text datasets. Our aim is to ensure that the applied corpus is bias-free, balanced, and specifically relevant to each distinct task. This approach not only makes the fine-tuning process more efficient in terms of resources but also enhances the overall effectiveness of our AI solutions.

Developing LLM Benchmarks

Benchmarks play a pivotal role in fine-tuning LLMs, as they provide an essential framework for performance assessment and enhancement. At Belief Analytics, we are dedicated to developing benchmarks tailored specifically for vertical LLMs. This work is crucial for several reasons. Firstly, it provides standardized metrics for evaluating model performance in niche sectors, facilitating objective comparisons and providing deeper insights into specific model strengths. Secondly, these benchmarks are designed to address the unique challenges and needs of various vertical applications, leading to the creation of more focused and efficient models. Finally, these benchmarks are key in driving continuous corpus refinement that fully leverages the data-centric AI approach. By setting clear objectives and success criteria, appropriate benchmarks ensure our LLMs are not only powerful but also precisely tuned to the specific demands of their application domains.
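
To make the idea of standardized metrics concrete, the sketch below scores a classifier against a small labeled benchmark using accuracy and macro-averaged F1. It is a minimal illustration under stated assumptions: the benchmark is assumed to be a list of (text, label) pairs, `model` is any callable that returns a label, and the function names are hypothetical rather than part of an existing benchmark suite.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate(model, benchmark):
    """Run model on every benchmark text and report accuracy and macro-F1."""
    gold = [label for _, label in benchmark]
    pred = [model(text) for text, _ in benchmark]
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return {"accuracy": accuracy, "macro_f1": macro_f1(gold, pred)}
```

Macro-F1 is reported alongside accuracy because it penalizes a model that ignores a minority label, which accuracy alone can hide.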

Purpose-Driven Fine-Tuned LLMs

Our fine-tuned LLMs are developed to understand and process the complex terminologies inherent in finance. They can also be deployed locally to comply with the privacy and security regulations of the financial sector. A key strength of our LLMs is their adaptability, allowing us to automate routine tasks and continually adjust to new demands and areas of interest in finance. This keeps our solutions at the forefront of technology, offering relevant and impactful insights for financial research and industry applications.

Finally, multiple fine-tuned models may operate collaboratively under an advanced coordinating framework. This integration facilitates a diverse array of functionalities coupled with a superior level of automation. Such a synergistic approach empowers our models to deliver intricate and nuanced abilities, catering to a wide spectrum of technology-oriented financial innovations.

Our Key Members

Teams

Founder

Dr Yuxin Xie

Associate Professor

School of Finance

Southwestern University of Finance and Economics

Dr. Yuxin Xie is the founder of Belief Analytics. He has more than twelve years of research experience in behavioral finance, quantitative computing and neuroeconomics. He leads the research and tech development team in building micro-belief datasets and studying their implications.

Chief Scientist

Dr Athanasios Pantelous

Professor

Business School

Monash University

Athanasios Pantelous holds dual PhDs in statistics and modeling, as well as systems theory. He is an expert in applying decision theory and financial network theory and operation to the aggregation of textual data. As part of his role at Belief Analytics, he assists in the construction and validation of sentiment-based indicators.

Data Lead

Dr Xiaomeng Lu

Associate Professor

Survey and Research Center for China Household Finance

Southwestern University of Finance and Economics

Dr. Xiaomeng Lu is head of the household finance research division at the Survey and Research Center for China Household Finance. She specializes in preparing and cleaning large-scale datasets, with a focus on cloud computing. She leads the data management team in planning, collecting and processing huge amounts of data on a daily basis.

IT Lead

Mr Bingxu Wang

PhD Candidate

School of Finance

Southwestern University of Finance and Economics

Having worked in the fields of physics and computer science for six years, Mr. Wang is in charge of building the infrastructure to handle the large amount of network data. Additionally, he helped us create a deep analysis framework utilizing natural language processing.

Operation Lead

Dr Feng Li

Associate Professor

School of Finance

Southwestern University of Finance and Economics

Dr. Li is a distinguished faculty member of our institute whose primary research focuses on financial decisions at the household level. Her depth of knowledge in this niche has not only elevated the academic standing of our institution but also has practical implications that extend beyond academia. Maintaining close ties with both governmental agencies and businesses, Dr. Li serves as a critical bridge between our institute and the broader industry and governmental sectors. Her role is pivotal in fostering collaboration, ensuring that our research is not only theoretically robust but also holds real-world significance.

LLM Lead

Mr Ruohua Tang

PhD Candidate

Business School

Monash University

As the lead for developing Large Language Models (LLMs), Mr. Tang stands at the vanguard of fine-tuning large-scale language models to specific vertical domains. Through strategic refinements, he and his team ensure a balance between general capabilities and exceptional performance in specialized areas. Mr. Tang's visionary leadership paves the way for innovative solutions that align with the distinct requirements of both research and industry standards.

Research

Dr Shao Jia

Assistant Professor

School of Mathematics

University of Birmingham

Dr. Shao holds a PhD in Quantitative Finance and was a quant strategist at a hedge fund in London. She is a fellow of the Royal Statistical Society (RSS) Finance and Economics Section and the RSS National Statistics Advisory Group. She is an expert in statistics and in financial and actuarial mathematics. Her role at Belief Analytics covers data analysis and its implications.

Research

Dr Xiaoqian Wen

Associate Professor

Institute of Chinese Financial Studies

Southwestern University of Finance and Economics

Dr. Xiaoqian Wen is a leading expert in the field of commodity research. With a deep academic background in economics and extensive experience in the commodity markets, she specializes in analyzing global trends, market dynamics, and pricing mechanisms for various commodities. Her primary focus is on harnessing advanced textual and numeric data analytics to interpret vast datasets, aiming to provide valuable insights for strategic decision-making in the commodity sector. Dr. Wen leads a dedicated team in developing sophisticated models for commodity forecasting, significantly influencing trading strategies and policy making.

Research

Mr Liyuan Liu

PhD Candidate

The PBC School of Finance (PBCSF)

Tsinghua University

The primary responsibility of Mr. Liyuan Liu at Belief Analytics is to generate research ideas, conduct preliminary empirical tests, and lead the research team. Based on his extensive education in finance and mathematics, he is interested in the dynamics of agents' beliefs and behaviors by combining empirical evidence with theoretical foundations.

Research

Mr Jinlin Fu

PhD Candidate

Research Institute of Economics and Management

Southwestern University of Finance and Economics

Mr. Jinglin Fu is responsible for various tasks related to tracking and analyzing data from the social media platform TikTok, specifically with a focus on video content. He downloads and organizes data from the platform, which includes tracking metrics such as engagement, views, and likes. Additionally, he works on developing tools and processes for converting video and audio content into text, which can help to better understand trends and patterns in user behavior on the platform.

Our Partner

Our partners provide valuable support in our area of focus. Through the collaborative projects we undertake, we work together toward a common objective.