Unicorns vs Failures: Constructing Comprehensive Datasets for Predictive Modeling

Successful companies were defined as those achieving IPO, acquisition, or unicorn status, with specific valuation and funding thresholds applied. An extensive dataset was compiled, including timelines of key investment rounds. Unsuccessful companies were identified by filtering out those with successful outcomes and applying additional criteria, resulting in a dataset of 32,760 unsuccessful and 1,989 successful companies for model training.


This content originally appeared on HackerNoon and was authored by ExitStrategy

:::info Authors:

(1) Mark Potanin, Corresponding author (potanin.m.st@gmail.com);

(2) Andrey Chertok (a.v.chertok@gmail.com);

(3) Konstantin Zorin (berzqwer@gmail.com);

(4) Cyril Shtabtsovsky (cyril@aloniq.com).

:::

Abstract and 1. Introduction

2 Related works

3 Dataset Overview, Preprocessing, and Features

3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset

3.3 Features

4 Model Training, Evaluation, and Portfolio Simulation and 4.1 Backtest

4.2 Backtest settings

4.3 Results

4.4 Capital Growth

5 Other approaches

5.1 Investors ranking model

5.2 Founders ranking model and 5.3 Unicorn recommendation model

6 Conclusion

7 Further Research, References and Appendix

3.1 Successful Companies Dataset

In this research, a company is deemed successful if it achieves one of three outcomes: Initial Public Offering (IPO), Acquisition (ACQ), or Unicorn status (UNIC), the last defined as a valuation exceeding $1 billion. To assemble the list of successful companies, we first selected IPOs with a valuation above $500M or funds raised over $100M, yielding 363 companies. For acquisitions, we removed companies whose purchase price was below the maximum amount of funds raised or under $100M, leaving 833 companies. To select unicorns, we searched for companies with a valuation above $1 billion, using both Crunchbase data and an additional table of verified unicorns, which gave a total of 1,074 unicorns.
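The selection rules in this paragraph can be sketched as a small labeling function. This is a minimal illustration rather than the authors' code: the `Company` record, its field names, and the order in which the three checks run are assumptions, and "maximum amount of funds raised" is read here as a single `funds_raised` figure.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the text (USD).
IPO_MIN_VALUATION = 500e6
IPO_MIN_FUNDS_RAISED = 100e6
ACQ_MIN_PRICE = 100e6
UNICORN_MIN_VALUATION = 1e9


@dataclass
class Company:
    # Hypothetical record layout; these field names are illustrative
    # and do not mirror the Crunchbase schema.
    name: str
    outcome: Optional[str] = None    # "ipo", "acquired", or None
    valuation: float = 0.0           # latest known valuation, USD
    funds_raised: float = 0.0        # funds raised, USD
    acquisition_price: float = 0.0   # purchase price, USD (acquisitions only)


def success_label(c: Company) -> Optional[str]:
    """Return "IPO", "ACQ", "UNIC", or None if no criterion is met."""
    # IPO: valuation above $500M OR funds raised above $100M.
    if c.outcome == "ipo" and (
        c.valuation > IPO_MIN_VALUATION or c.funds_raised > IPO_MIN_FUNDS_RAISED
    ):
        return "IPO"
    # Acquisition: drop deals priced below the funds raised or under $100M.
    if (
        c.outcome == "acquired"
        and c.acquisition_price >= c.funds_raised
        and c.acquisition_price >= ACQ_MIN_PRICE
    ):
        return "ACQ"
    # Unicorn: valuation strictly above $1 billion.
    if c.valuation > UNICORN_MIN_VALUATION:
        return "UNIC"
    return None
```

For example, an acquisition priced at $150M for a company that raised $200M would not be labeled successful under this reading, because the price falls below the funds raised.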

The final dataset contains a timeline of all crucial investment rounds leading to the success event (i.e., achieving unicorn status, IPO, or ACQ), with the index of this event specified in the success_round column. This approach ensures that the dataset accurately represents the history and progress of each successful company, facilitating effective analysis.
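One way to picture the success_round index is the sketch below. The text does not spell out how the timeline is stored, so the details here are assumptions: rounds are `(date, round_type)` tuples, rounds dated after the success event are dropped, and the success event is appended as the last entry so that its position becomes success_round. The helper name `build_timeline` is hypothetical.

```python
from datetime import date
from typing import List, Tuple

Round = Tuple[date, str]  # (announcement date, round type); illustrative layout


def build_timeline(rounds: List[Round], success_event: Round) -> Tuple[List[Round], int]:
    """Return the rounds leading to the success event and the event's index."""
    ordered = sorted(rounds, key=lambda r: r[0])
    # Keep only rounds dated on or before the success event.
    leading = [r for r in ordered if r[0] <= success_event[0]]
    timeline = leading + [success_event]
    success_round = len(timeline) - 1  # position of the success event
    return timeline, success_round


rounds = [
    (date(2015, 3, 1), "series_a"),
    (date(2013, 1, 1), "seed"),
    (date(2019, 1, 1), "post_ipo_equity"),
]
timeline, success_round = build_timeline(rounds, (date(2018, 6, 1), "ipo"))
```

In this example the timeline holds the seed round, the Series A, and the IPO, and the round dated after the IPO is left out.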

3.2 Unsuccessful Companies Dataset

To supply the model with examples of "unsuccessful" companies, we collected a separate dataset. We excluded companies already present in the successful companies dataset by removing those that had IPO, ACQ, or UNIC flags. We also removed a considerable number of actual unicorns listed on the Crunchbase website [16] to avoid overlap. We excluded companies that have not attracted any rounds since 2016, as well as companies that are subsidiaries or parent companies of other entities. Finally, we used the jobs dataset to exclude companies that have hired employees since 2017.

We also excluded companies with a valuation above $100 million, as they reside in the "gray area" of companies that cannot be clearly categorized as successful or unsuccessful. After applying these filters, we obtained a dataset of 32,760 companies labeled "0" (unsuccessful) and 1,989 companies labeled "1" (successful).
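The exclusion rules of this section can be expressed as a single predicate. The sketch below follows the wording of the text literally and is not the authors' implementation: the `Candidate` record and its field names are hypothetical, and treating "since 2016" and "since 2017" as inclusive of those years is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

SUCCESS_FLAGS = frozenset({"IPO", "ACQ", "UNIC"})
GRAY_AREA_VALUATION = 100e6  # USD; companies valued above this are excluded


@dataclass
class Candidate:
    # Hypothetical record layout assembled from company, rounds, and jobs data.
    name: str
    success_flags: FrozenSet[str] = frozenset()
    on_verified_unicorn_list: bool = False
    last_round_year: Optional[int] = None   # None means no rounds recorded
    is_subsidiary_or_parent: bool = False
    last_hire_year: Optional[int] = None    # None means no hires recorded
    valuation: float = 0.0                  # USD


def is_unsuccessful(c: Candidate) -> bool:
    """Return True if the candidate passes every exclusion filter (label 0)."""
    if c.success_flags & SUCCESS_FLAGS:
        return False  # already in the successful dataset
    if c.on_verified_unicorn_list:
        return False  # listed as an actual unicorn
    if c.last_round_year is None or c.last_round_year < 2016:
        return False  # no rounds attracted since 2016 (inclusive, assumed)
    if c.is_subsidiary_or_parent:
        return False  # subsidiary or parent of another entity
    if c.last_hire_year is not None and c.last_hire_year >= 2017:
        return False  # has hired since 2017 (inclusive, assumed)
    if c.valuation > GRAY_AREA_VALUATION:
        return False  # "gray area" valuation
    return True
```

Keeping each rule as its own early return makes it easy to count how many companies each filter removes, which helps when checking the final class sizes.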


:::info This paper is available on arXiv under CC 4.0 license.

:::

ExitStrategy | Sciencx (2024-08-07T18:20:09+00:00) Unicorns vs Failures: Constructing Comprehensive Datasets for Predictive Modeling. Retrieved from https://www.scien.cx/2024/08/07/unicorns-vs-failures-constructing-comprehensive-datasets-for-predictive-modeling/
