Introduction
Recent studies have explored COVID-19's impact on financial markets, including cryptocurrencies. Their findings suggest that Bitcoin does not act as a safe-haven asset during crises and instead moves in correlation with stock markets. Prior research has used machine learning to predict Bitcoin prices from historical data, but none has examined how social media sentiment, particularly on Twitter, affected Bitcoin during the pandemic.
This paper conducts a comprehensive Valence Aware Dictionary and sEntiment Reasoner (VADER) analysis of Bitcoin-related tweets during COVID-19. We evaluate 13 preprocessing strategies to enhance sentiment-to-price correlation, focusing on cleaning techniques that refine tweet text for more accurate VADER scoring.
Sentiment Analysis Methods
1. VADER (Valence Aware Dictionary and sEntiment Reasoner)
A lexicon- and rule-based tool optimized for social media, VADER analyzes text for emotional polarity (negative, neutral, positive) and outputs a compound sentiment score between -1 (negative) and +1 (positive).
Key Features:
- Handles slang, emojis, and abbreviations.
- Computationally efficient (no training required).
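To make the compound score concrete, here is a minimal sketch of the lexicon-plus-normalization idea. The toy lexicon and its valences are made up for illustration and are not VADER's real entries, and the sketch skips VADER's rules for negation, punctuation, and capitalization. The normalization step, x / sqrt(x² + 15), matches the one used in the reference vaderSentiment code.

```python
import math

# Hypothetical toy lexicon with made-up valences (not VADER's real entries).
TOY_LEXICON = {"great": 3.0, "good": 2.0, "bad": -2.5, "crash": -2.0}


def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum word valences, then squash the sum into the open interval (-1, 1)."""
    total = sum(lexicon.get(token, 0.0) for token in text.lower().split())
    # Same normalization form as VADER: x / sqrt(x^2 + alpha), with alpha = 15.
    return total / math.sqrt(total * total + alpha)


print(compound_score("great news"))  # positive
print(compound_score("bad crash"))   # negative
```

For real analysis, the vaderSentiment package provides `SentimentIntensityAnalyzer().polarity_scores(text)`, which returns the `neg`, `neu`, `pos`, and `compound` values.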
2. Word2Vec
Word2Vec transforms words into numerical vectors, preserving semantic relationships (e.g., "king – man + woman ≈ queen").
Architectures:
- CBOW: Predicts a word from its context.
- Skip-gram: Predicts context words from a target word.
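The difference between the two architectures is easiest to see in the training examples each one consumes. The sketch below only builds those examples and does not train any vectors. The function name `training_pairs` and the `window` parameter are illustrative choices, not part of any library.

```python
def training_pairs(tokens, window=2):
    """Return (cbow, skipgram) training examples for a list of tokens."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if not context:
            continue
        # CBOW: the whole context is the input, the center word is the label.
        cbow.append((context, target))
        # Skip-gram: the center word is the input, each context word is a label.
        skipgram.extend((target, c) for c in context)
    return cbow, skipgram


cbow, skipgram = training_pairs(["bitcoin", "price", "rises"], window=1)
print(cbow)
print(skipgram)
```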
3. TF-IDF (Term Frequency-Inverse Document Frequency)
Identifies keywords by weighting how often a term appears in a document against how rare that term is across the whole collection of documents.
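As a rough illustration, the sketch below computes one common, unsmoothed TF-IDF variant: relative term frequency times log(N / document frequency). Libraries often use smoothed variants, so their numbers will differ. The function name `tf_idf` is an illustrative choice.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["bitcoin", "rally"], ["bitcoin", "crash"]]
print(tf_idf(docs))
```

In this two-document example, "bitcoin" appears in every document, so its weight is zero, while "rally" and "crash" each receive a positive weight.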
4. N-grams
Groups adjacent words (e.g., bigrams, trigrams) to capture contextual meaning, useful for detecting negations like "not good."
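A minimal sketch of n-gram extraction over a token list follows. The helper name `ngrams` is illustrative.

```python
def ngrams(tokens, n):
    """Return all runs of n adjacent tokens, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "bitcoin is not good".split()
print(ngrams(tokens, 2))
```

The bigram ("not", "good") keeps the negation attached to the word it modifies, which single-word features would lose.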
Related Work
- Price Prediction: Tweet sentiment forecast Bitcoin price fluctuations with 83% accuracy (Li et al., 2021).
- Misinformation Impact: Bot accounts amplified cryptocurrency price volatility (Kaplan et al., 2022).
- COVID-19 Correlation: Agrello and Dogecoin exhibited strong ties to Twitter sentiment (R² > 0.22).
Methodology: Analyzing BTC Tweets During COVID-19
Data Collection
- Source: Custom Python scraper using Twitter API (Tweepy).
- Keywords: "Bitcoin," "BTC," "#XBT," etc.
- Volume: 4.1 million tweets (May–July 2021).
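The scraper itself is not shown in the source. As a hedged illustration of the keyword-matching step only, the sketch below tests whether a tweet mentions one of the three keywords that are listed explicitly. The function `mentions_bitcoin` is hypothetical, and any further keywords covered by "etc." are not guessed here.

```python
import re

# Only the keywords named explicitly in the text; the full list is not known.
KEYWORDS = ["bitcoin", "btc", "#xbt"]

# Match a keyword only when it is not embedded inside a longer word.
_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")(?!\w)",
    re.IGNORECASE,
)


def mentions_bitcoin(tweet):
    """Return True if the tweet contains one of the tracked keywords."""
    return _PATTERN.search(tweet) is not None


print(mentions_bitcoin("Buying BTC today"))
print(mentions_bitcoin("Ethereum rallies"))
```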
Preprocessing Strategies
- Cleaning: Removed URLs, hashtags, and Twitter syntax.
- Splitting: Segmented text into sentences for granular analysis.
- Stopword Removal: Filtered non-essential words (e.g., "the," "and").
Optimal Strategy:
Combining cleaning + splitting improved sentiment-price correlation by 15%.
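The three steps above can be sketched with regular expressions. This is an assumption-laden illustration, not the authors' pipeline: "Twitter syntax" is interpreted here as a leading "RT" marker and @mentions, hashtags are dropped entirely rather than stripped of the "#" sign, and the stopword list is a tiny stand-in.

```python
import re

STOPWORDS = {"the", "and", "a", "is", "to"}  # tiny illustrative list


def clean(text):
    """Remove URLs, a leading RT marker, @mentions, and hashtags."""
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"^RT\s+", "", text)         # retweet marker (assumed)
    text = re.sub(r"@\w+:?", " ", text)        # @mentions (assumed)
    text = re.sub(r"#\w+", " ", text)          # hashtags
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Split after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def remove_stopwords(text):
    """Drop words that appear in the stopword list, ignoring case."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


tweet = "RT @bob: Bitcoin is up! Buy the dip. https://t.co/x #BTC"
for sentence in split_sentences(clean(tweet)):
    print(sentence)
```

Each resulting sentence can then be scored separately and the scores aggregated per tweet.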
Results
- Significant Correlation: Processed sentiment scores aligned with Bitcoin's 24-hour price movements, and the relationship was statistically significant (p < 0.05).
- Volume Polarity: High tweet volume with positive sentiment often preceded price rallies.
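One way to check whether sentiment leads price is a lagged Pearson correlation, sketched below. The helper names are illustrative and the numbers are synthetic, constructed so that each return is exactly twice the previous period's sentiment. They are not data from the study. The sketch assumes equal-length series and a non-negative lag.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment at time t with returns at time t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(sentiment, returns)
    return pearson(sentiment[:-lag], returns[lag:])


# Synthetic series: returns[t + 1] == 2 * sentiment[t] by construction.
sentiment = [0.1, 0.5, -0.3, 0.2, -0.4, 0.6]
returns = [0.0, 0.2, 1.0, -0.6, 0.4, -0.8]
print(lagged_correlation(sentiment, returns, 1))
```

On real data, the lag with the highest correlation gives a rough estimate of how far sentiment runs ahead of price.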
FAQs
Q: How does VADER handle emojis?
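A: The reference Python implementation replaces each emoji with a short text description and scores those words against its lexicon, so a smiling emoji raises the score and an angry one lowers it (illustratively, 😊 → +0.8, 😠 → -0.5).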
Q: Can tweet sentiment predict Bitcoin crashes?
A: To a degree. Sharp spikes in negative sentiment often preceded drops by 6–12 hours, but "often" is not "always," so the signal is an indicator rather than a guarantee.
Q: Why avoid machine learning for this analysis?
A: VADER’s rule-based approach is faster and requires no training data.
Conclusion
Preprocessing Twitter data before VADER scoring strengthens the link between Bitcoin sentiment and price, offering actionable insights for traders. Future work could integrate real-time sentiment tracking with price prediction models.