Parameter Efficient Pre-Training: Comparing ReLoRA and GaLore
Can we apply parameter-efficient training methods during pre-training and achieve PEFT-like efficiency gains there too?
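The two methods in the title take different routes: ReLoRA trains low-rank adapters and periodically merges them back into the full weights, while GaLore keeps the full weights trainable but projects gradients (and optimizer state) into a low-rank subspace. As a rough illustration of the latter, here is a minimal sketch of a GaLore-style projected update, assuming PyTorch; the function name galore_style_step, the plain-SGD update in place of Adam, and the toy shapes are our own simplifications, not the paper's API.

import torch

def galore_style_step(weight, grad, proj=None, rank=8, lr=1e-3):
    """One illustrative step with GaLore-style gradient low-rank projection.

    In the actual method the projection is refreshed every T steps via SVD
    and the Adam statistics live in the low-rank space; here we use plain
    SGD to keep the sketch short.
    """
    if proj is None:
        # Top-`rank` left singular vectors of the gradient form the subspace basis.
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        proj = U[:, :rank]                 # (m, rank)
    low_rank_grad = proj.T @ grad          # (rank, n): gradient in the subspace
    update = proj @ low_rank_grad          # projected back to the full (m, n) shape
    with torch.no_grad():
        weight -= lr * update
    return weight, proj

# Toy usage: one 256x256 weight matrix and a fake gradient.
W = torch.randn(256, 256)
g = torch.randn(256, 256)
W, P = galore_style_step(W, g, rank=8)
print(W.shape, P.shape)  # torch.Size([256, 256]) torch.Size([256, 8])

The memory saving comes from storing optimizer statistics only for the rank-r projected gradient rather than the full weight matrix.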