BloombergGPT: A Large Language Model for Finance

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in the literature. In this work, we present BloombergGPT, a 50-billion-parameter language model that is trained on a wide range of financial data. We construct a 363-billion-token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general-purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed-dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.
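The dataset mix described in the abstract can be made concrete with a little arithmetic. The sketch below uses only the two token counts quoted above; the function and variable names are illustrative and do not come from the paper.

```python
# Token counts quoted in the abstract, in billions of tokens.
FINANCIAL_TOKENS_B = 363
GENERAL_TOKENS_B = 345


def mixture_shares(financial_b: float, general_b: float) -> dict:
    """Return the combined corpus size and each source's share of it."""
    total = financial_b + general_b
    return {
        "total_b": total,
        "financial_share": financial_b / total,
        "general_share": general_b / total,
    }


shares = mixture_shares(FINANCIAL_TOKENS_B, GENERAL_TOKENS_B)
print(f"combined corpus: {shares['total_b']}B tokens")   # combined corpus: 708B tokens
print(f"financial share: {shares['financial_share']:.1%}")  # financial share: 51.3%
print(f"general share:   {shares['general_share']:.1%}")    # general share:   48.7%
```

By these figures the corpus is split roughly evenly between financial and general-purpose text, which is the "mixed dataset" the abstract refers to.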

Task Dataset Model Metric Name Metric Value Global Rank
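Natural Language Inference ANLI test BLOOM 176B (one-shot) A1 33.6 # 16
Natural Language Inference ANLI test BLOOM 176B (one-shot) A2 33.8 # 24
Natural Language Inference ANLI test BLOOM 176B (one-shot) A3 35.17 # 23
Natural Language Inference ANLI test Bloomberg GPT (one-shot) A1 32.9 # 18
Natural Language Inference ANLI test Bloomberg GPT (one-shot) A2 34.4 # 21
Natural Language Inference ANLI test Bloomberg GPT (one-shot) A3 37.33 # 21
Natural Language Inference ANLI test OPT 66B (one-shot) A1 33.1 # 17
Natural Language Inference ANLI test OPT 66B (one-shot) A2 34.2 # 22
Natural Language Inference ANLI test OPT 66B (one-shot) A3 34.92 # 24
Natural Language Inference ANLI test GPT-NeoX (one-shot) A1 32.6 # 19
Natural Language Inference ANLI test GPT-NeoX (one-shot) A2 33.8 # 24
Natural Language Inference ANLI test GPT-NeoX (one-shot) A3 36.17 # 22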
Common Sense Reasoning ARC (Challenge) GPT-NeoX 20B (1-shot) Accuracy 45.39 # 35
Common Sense Reasoning ARC (Challenge) OPT 66B (one-shot) Accuracy 44.54 # 37
Common Sense Reasoning ARC (Challenge) Bloomberg GPT 50B (1-shot) Accuracy 48.63 # 32
Common Sense Reasoning ARC (Challenge) BLOOM 176B (1-shot) Accuracy 50.85 # 29
Common Sense Reasoning ARC (Easy) GPT-NeoX 20B (1-shot) Accuracy 70.79 # 28
Common Sense Reasoning ARC (Easy) BLOOM 176B (1-shot) Accuracy 75.93 # 18
Common Sense Reasoning ARC (Easy) Bloomberg GPT 50B (1-shot) Accuracy 73.99 # 22
Common Sense Reasoning ARC (Easy) OPT 66B (1-shot) Accuracy 71.25 # 25
Common Sense Reasoning BIG-bench (Causal Judgment) PaLM 540B (few-shot, k=3) Accuracy 61.0 # 2
Common Sense Reasoning BIG-bench (Causal Judgment) GPT-NeoX 20B (few-shot, k=3) Accuracy 52.41 # 5
Common Sense Reasoning BIG-bench (Causal Judgment) OPT 66B (few-shot, k=3) Accuracy 51.87 # 6
Common Sense Reasoning BIG-bench (Causal Judgment) BloombergGPT 50B (few-shot, k=3) Accuracy 49.73 # 9
Common Sense Reasoning BIG-bench (Causal Judgment) BLOOM 176B (few-shot, k=3) Accuracy 51.87 # 6
Common Sense Reasoning BIG-bench (Date Understanding) PaLM 540B (few-shot, k=3) Accuracy 53.6 # 4
Common Sense Reasoning BIG-bench (Date Understanding) OPT 66B (few-shot, k=3) Accuracy 49.60 # 7
Common Sense Reasoning BIG-bench (Date Understanding) Bloomberg GPT 50B (few-shot, k=3) Accuracy 54.8 # 3
Common Sense Reasoning BIG-bench (Date Understanding) GPT-NeoX 20B (few-shot, k=3) Accuracy 45.60 # 8
Common Sense Reasoning BIG-bench (Date Understanding) BLOOM 176B (few-shot, k=3) Accuracy 50.00 # 6
Common Sense Reasoning BIG-bench (Disambiguation QA) BLOOM 176B (few-shot, k=3) Accuracy 40.4 # 7
Common Sense Reasoning BIG-bench (Disambiguation QA) OPT 66B (few-shot, k=3) Accuracy 40.4 # 7
Common Sense Reasoning BIG-bench (Disambiguation QA) PaLM 540B (few-shot, k=3) Accuracy 60.8 # 3
Common Sense Reasoning BIG-bench (Disambiguation QA) Bloomberg GPT 50B (few-shot, k=3) Accuracy 34 # 9
Common Sense Reasoning BIG-bench (Disambiguation QA) GPT-NeoX 20B (few-shot, k=3) Accuracy 40.8 # 6
Logical Reasoning BIG-bench (Formal Fallacies Syllogisms Negation) PaLM 540B (few-shot, k=3) Accuracy 53.6 # 4
Logical Reasoning BIG-bench (Formal Fallacies Syllogisms Negation) BLOOM 176B (few-shot, k=3) Accuracy 52.8 # 5
Logical Reasoning BIG-bench (Formal Fallacies Syllogisms Negation) Bloomberg GPT 50B (few-shot, k=3) Accuracy 50.8 # 8
Logical Reasoning BIG-bench (Formal Fallacies Syllogisms Negation) GPT-NeoX 20B (few-shot, k=3) Accuracy 52.8 # 5
Logical Reasoning BIG-bench (Formal Fallacies Syllogisms Negation) OPT 66B (few-shot, k=3) Accuracy 54 # 3
Multiple Choice Question Answering (MCQA) BIG-bench (Hyperbaton) PaLM 540B (few-shot, k=3) Accuracy 70.8 # 7
Multiple Choice Question Answering (MCQA) BIG-bench (Hyperbaton) GPT-NeoX (few-shot, k=3) Accuracy 92 # 1
Multiple Choice Question Answering (MCQA) BIG-bench (Hyperbaton) Bloomberg GPT (few-shot, k=3) Accuracy 92 # 1
Multiple Choice Question Answering (MCQA) BIG-bench (Hyperbaton) OPT 66B (few-shot, k=3) Accuracy 91.6 # 4
Multiple Choice Question Answering (MCQA) BIG-bench (Hyperbaton) BLOOM 176B (few-shot, k=3) Accuracy 92 # 1
Multiple Choice Question Answering (MCQA) BIG-bench (Movie Recommendation) OPT 66B (few-shot, k=3) Accuracy 91.2 # 3
Multiple Choice Question Answering (MCQA) BIG-bench (Movie Recommendation) PaLM 540B (few-shot, k=3) Accuracy 87.2 # 6
Multiple Choice Question Answering (MCQA) BIG-bench (Movie Recommendation) Bloomberg GPT (few-shot, k=3) Accuracy 90.4 # 5
Multiple Choice Question Answering (MCQA) BIG-bench (Movie Recommendation) BLOOM 176B (few-shot, k=3) Accuracy 91.2 # 3
Multiple Choice Question Answering (MCQA) BIG-bench (Movie Recommendation) GPT-NeoX (few-shot, k=3) Accuracy 86.4 # 7
Multiple Choice Question Answering (MCQA) BIG-bench (Navigate) OPT 66B (few-shot, k=3) Accuracy 42 # 8
Multiple Choice Question Answering (MCQA) BIG-bench (Navigate) GPT-NeoX (few-shot, k=3) Accuracy 45.2 # 7
Multiple Choice Question Answering (MCQA) BIG-bench (Navigate) Bloomberg GPT (few-shot, k=3) Accuracy 42 # 8
Multiple Choice Question Answering (MCQA) BIG-bench (Navigate) BLOOM 176B (few-shot, k=3) Accuracy 50 # 6
Multiple Choice Question Answering (MCQA) BIG-bench (Navigate) PaLM 540B (few-shot, k=3) Accuracy 62.4 # 3
Logical Reasoning BIG-bench (Penguins In A Table) PaLM 540B (few-shot, k=3) Accuracy 44.5 # 4
Logical Reasoning BIG-bench (Penguins In A Table) Bloomberg GPT (few-shot, k=3) Accuracy 37.67 # 7
Logical Reasoning BIG-bench (Penguins In A Table) GPT-NeoX (few-shot, k=3) Accuracy 33.56 # 8
Logical Reasoning BIG-bench (Penguins In A Table) OPT 66B (few-shot, k=3) Accuracy 28.08 # 9
Logical Reasoning BIG-bench (Penguins In A Table) BLOOM 176B (few-shot, k=3) Accuracy 40.41 # 6
Logical Reasoning BIG-bench (Reasoning About Colored Objects) Bloomberg GPT (few-shot, k=3) Accuracy 34.8 # 7
Logical Reasoning BIG-bench (Reasoning About Colored Objects) GPT-NeoX (few-shot, k=3) Accuracy 26 # 9
Logical Reasoning BIG-bench (Reasoning About Colored Objects) OPT 66B (few-shot, k=3) Accuracy 31.2 # 8
Logical Reasoning BIG-bench (Reasoning About Colored Objects) BLOOM 176B (few-shot, k=3) Accuracy 36.8 # 6
Logical Reasoning BIG-bench (Reasoning About Colored Objects) PaLM 540B (few-shot, k=3) Accuracy 38 # 5
Multiple Choice Question Answering (MCQA) BIG-bench (Ruin Names) Bloomberg GPT (few-shot, k=3) Accuracy 56 # 4
Multiple Choice Question Answering (MCQA) BIG-bench (Ruin Names) GPT-NeoX (few-shot, k=3) Accuracy 54 # 6
Multiple Choice Question Answering (MCQA) BIG-bench (Ruin Names) OPT 66B (few-shot, k=3) Accuracy 52.8 # 7
Multiple Choice Question Answering (MCQA) BIG-bench (Ruin Names) BLOOM 176B (few-shot, k=3) Accuracy 54.8 # 5
Multiple Choice Question Answering (MCQA) BIG-bench (Ruin Names) PaLM 540B (few-shot, k=3) Accuracy 76 # 3
Sarcasm Detection BIG-bench (SNARKS) BLOOM 176B (few-shot, k=3) Accuracy 72.47 # 4
Sarcasm Detection BIG-bench (SNARKS) PaLM 540B (few-shot, k=3) Accuracy 78.1 # 3
Sarcasm Detection BIG-bench (SNARKS) GPT-NeoX (few-shot, k=3) Accuracy 62.36 # 6
Sarcasm Detection BIG-bench (SNARKS) Bloomberg GPT (few-shot, k=3) Accuracy 69.66 # 5
Common Sense Reasoning BIG-bench (Sports Understanding) Bloomberg GPT (few-shot, k=3) Accuracy 62.8 # 5
Common Sense Reasoning BIG-bench (Sports Understanding) PaLM 540B (few-shot, k=3) Accuracy 80.4 # 3
Common Sense Reasoning BIG-bench (Sports Understanding) OPT 66B (few-shot, k=3) Accuracy 54.4 # 7
Common Sense Reasoning BIG-bench (Sports Understanding) GPT-NeoX (few-shot, k=3) Accuracy 53.2 # 8
Logical Reasoning BIG-bench (Temporal Sequences) BLOOM 176B (few-shot, k=3) Accuracy 36.8 # 4
Logical Reasoning BIG-bench (Temporal Sequences) PaLM 540B (few-shot, k=3) Accuracy 39.6 # 3
Logical Reasoning BIG-bench (Temporal Sequences) Bloomberg GPT (few-shot, k=3) Accuracy 29.2 # 6
Logical Reasoning BIG-bench (Temporal Sequences) GPT-NeoX (few-shot, k=3) Accuracy 21.2 # 8
Logical Reasoning BIG-bench (Temporal Sequences) OPT 66B (few-shot, k=3) Accuracy 23.6 # 7
Question Answering BoolQ OPT 66B (1-shot) Accuracy 57.5 # 54
Question Answering BoolQ BLOOM 176B (1-shot) Accuracy 52.9 # 57
Question Answering BoolQ GPT-NeoX 20B (1-shot) Accuracy 46.4 # 59
Question Answering BoolQ Bloomberg GPT 50B (1-shot) Accuracy 74.6 # 34
Natural Language Inference CommitmentBank GPT-NeoX (one-shot) Accuracy 48.21 # 17
Natural Language Inference CommitmentBank Bloomberg GPT (one-shot) Accuracy 53.57 # 16
Natural Language Inference CommitmentBank OPT 66B (one-shot) Accuracy 44.64 # 19
Natural Language Inference CommitmentBank BLOOM 176B (one-shot) Accuracy 48.21 # 17
Common Sense Reasoning CommonsenseQA OPT 66B (1-shot) Accuracy 66.4 # 20
Common Sense Reasoning CommonsenseQA Bloomberg GPT 50B (1-shot) Accuracy 65.5 # 21
Common Sense Reasoning CommonsenseQA GPT-NeoX 20B (1-shot) Accuracy 60.4 # 27
Common Sense Reasoning CommonsenseQA BLOOM 176B (1-shot) Accuracy 64.2 # 23
Question Answering COPA OPT 66B (one-shot) Accuracy 86 # 25
Question Answering COPA BLOOM 176B (one-shot) Accuracy 84 # 31
Question Answering COPA GPT-NeoX (one-shot) Accuracy 88 # 21
Question Answering COPA Bloomberg GPT (one-shot) Accuracy 86 # 25
Sentence Completion HellaSwag OPT 66B (1-shot) Accuracy 73.5 # 49
Sentence Completion HellaSwag BLOOM 176B (1-shot) Accuracy 73.2 # 50
Sentence Completion HellaSwag BloombergGPT 50B (1-shot) Accuracy 73.9 # 48
Sentence Completion HellaSwag GPT-NeoX 20B (1-shot) Accuracy 68.4 # 52
Multi-task Language Understanding MMLU BLOOM 176B (5-shot) Average (%) 39.1 # 81
Multi-task Language Understanding MMLU Bloomberg GPT 50B (5-shot) Average (%) 39.2 # 79
Multi-task Language Understanding MMLU OPT 66B (5-shot) Average (%) 36 # 84
Question Answering MultiRC BLOOM 176B (1-shot) F1 26.7 # 23
Question Answering MultiRC Bloomberg GPT 50B (1-shot) F1 62.3 # 18
Question Answering MultiRC GPT-NeoX 20B (1-shot) F1 22.9 # 24
Question Answering MultiRC OPT 66B (1-shot) F1 18.8 # 25
Question Answering OpenBookQA Bloomberg GPT 50B (1-shot) Accuracy 51.6 # 32
Question Answering OpenBookQA GPT-NeoX 20B (2-shot) Accuracy 44.2 # 34
Question Answering OpenBookQA OPT 66B (one-shot) Accuracy 58.0 # 27
Question Answering OpenBookQA BLOOM 176B (2-shot) Accuracy 47.2 # 33
Question Answering PIQA OPT 66B (1-shot) Accuracy 77.6 # 35
Question Answering PIQA GPT-NeoX 20B (1-shot) Accuracy 75.8 # 42
Question Answering PIQA Bloomberg GPT 50B (1-shot) Accuracy 77.9 # 34
Question Answering PIQA BLOOM 176B (1-shot) Accuracy 77 # 37
Reading Comprehension RACE OPT 66B (one-shot) Accuracy (High) 37.02 # 17
Reading Comprehension RACE OPT 66B (one-shot) Accuracy (Middle) 47.42 # 17
Reading Comprehension RACE GPT-NeoX (one-shot) Accuracy (High) 34.33 # 18
Reading Comprehension RACE GPT-NeoX (one-shot) Accuracy (Middle) 41.23 # 18
Reading Comprehension RACE Bloomberg GPT (one-shot) Accuracy (High) 41.74 # 15
Reading Comprehension RACE Bloomberg GPT (one-shot) Accuracy (Middle) 54.32 # 15
Reading Comprehension RACE BLOOM 176B (one-shot) Accuracy (High) 39.14 # 16
Reading Comprehension RACE BLOOM 176B (one-shot) Accuracy (Middle) 52.3 # 16
Common Sense Reasoning ReCoRD Bloomberg GPT 50B (1-shot) F1 82.8 # 17
Common Sense Reasoning ReCoRD BLOOM 176B (1-shot) F1 78 # 23
Common Sense Reasoning ReCoRD OPT 66B (1-shot) F1 82.5 # 21
Common Sense Reasoning ReCoRD GPT-NeoX 20B (1-shot) F1 67.9 # 28
Natural Language Inference RTE GPT-NeoX 20B (1-shot) Accuracy 53.8 # 86
Natural Language Inference RTE OPT 66B (1-shot) Accuracy 54.9 # 83
Natural Language Inference RTE BLOOM 176B (1-shot) Accuracy 57.4 # 80
Natural Language Inference RTE Bloomberg GPT 50B (1-shot) Accuracy 69.3 # 56
Common Sense Reasoning WinoGrande Bloomberg GPT (one-shot) Accuracy 64.1 # 42
Common Sense Reasoning WinoGrande BLOOM 176B (1-shot) Accuracy 67 # 38
Common Sense Reasoning WinoGrande OPT 66B (1-shot) Accuracy 66.1 # 40
Common Sense Reasoning WinoGrande GPT-NeoX (one-shot) Accuracy 60.6 # 46
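
Because the table interleaves models within each benchmark, it can help to load rows into a small structure and pick the top entry per dataset. The following is a minimal sketch using a handful of rows copied from the table above (BoolQ, RTE, WinoGrande), with the shot-count suffixes dropped from model names for brevity. The tuple layout and the helper `best_per_dataset` are illustrative assumptions, and the comparison assumes higher values are better, which holds for the accuracy scores used here.

```python
# (dataset, model, metric, value) tuples copied from the table above.
ROWS = [
    ("BoolQ", "OPT 66B", "Accuracy", 57.5),
    ("BoolQ", "BLOOM 176B", "Accuracy", 52.9),
    ("BoolQ", "GPT-NeoX 20B", "Accuracy", 46.4),
    ("BoolQ", "Bloomberg GPT 50B", "Accuracy", 74.6),
    ("RTE", "GPT-NeoX 20B", "Accuracy", 53.8),
    ("RTE", "OPT 66B", "Accuracy", 54.9),
    ("RTE", "BLOOM 176B", "Accuracy", 57.4),
    ("RTE", "Bloomberg GPT 50B", "Accuracy", 69.3),
    ("WinoGrande", "Bloomberg GPT", "Accuracy", 64.1),
    ("WinoGrande", "BLOOM 176B", "Accuracy", 67.0),
    ("WinoGrande", "OPT 66B", "Accuracy", 66.1),
    ("WinoGrande", "GPT-NeoX", "Accuracy", 60.6),
]


def best_per_dataset(rows):
    """Map each dataset to its (model, value) pair with the highest value."""
    best = {}
    for dataset, model, _metric, value in rows:
        if dataset not in best or value > best[dataset][1]:
            best[dataset] = (model, value)
    return best


best = best_per_dataset(ROWS)
for dataset, (model, value) in best.items():
    print(f"{dataset}: {model} ({value})")
```

On this subset, Bloomberg GPT leads on BoolQ and RTE while BLOOM 176B leads on WinoGrande, matching what a manual scan of those rows shows.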
