CodeEval-Pro Leaderboard
    
HumanEval Pro and MBPP Pro evaluate LLMs on self-invoking code generation tasks, which test a model's reasoning ability in code generation: the model must first solve a base problem and then reuse its own solution to solve a harder follow-up problem.
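To make this concrete, below is a minimal, hypothetical example of a base problem paired with a self-invoking problem. The task and function names are illustrative, not drawn from the actual benchmark; they only show the structure in which the follow-up solution must call the base solution.

```python
# Base problem (hypothetical): a standard code generation task.
def sort_numbers(numbers: list[int]) -> list[int]:
    """Return the numbers sorted in ascending order."""
    return sorted(numbers)


# Self-invoking problem (hypothetical): a harder follow-up whose
# solution must invoke the base function above.
def median(numbers: list[int]) -> float:
    """Return the median of the numbers, reusing sort_numbers."""
    ordered = sort_numbers(numbers)  # self-invocation of the base solution
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2


assert median([3, 1, 2]) == 2.0
assert median([4, 1, 3, 2]) == 2.5
```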
📝 Notes

1. All self-invoking samples are generated from scratch using our codebase.
2. The pass@1 scores are reported with a greedy generation strategy. Models are ranked by pass@1.
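Since greedy decoding yields exactly one sample per problem, pass@1 here reduces to the fraction of problems whose single greedy completion passes all tests. A minimal sketch follows; the `results` mapping is hypothetical and not the evaluator's actual data structure.

```python
# pass@1 under greedy decoding: one completion per problem, so the score
# is the share of problems whose sample passes every test case.
def pass_at_1(results: dict[str, bool]) -> float:
    """Return pass@1 as a percentage over all problems."""
    return 100.0 * sum(results.values()) / len(results)


# Hypothetical per-problem outcomes (True = greedy sample passed all tests).
results = {"HumanEvalPro/0": True, "HumanEvalPro/1": False, "HumanEvalPro/2": True}
print(f"pass@1 = {pass_at_1(results):.1f}")  # pass@1 = 66.7
```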
🤗 Acknowledgement and More Leaderboards

The leaderboard code is inspired by EvalPlus and CRUXEval. Thanks a lot! We also recommend the following leaderboards for measuring code LM ability on various coding tasks: the EvalPlus Leaderboard, the LiveCodeBench Leaderboard, the BigCodeBench Leaderboard, and the McEval Leaderboard.